Event cameras offer several advantages over traditional frame-based cameras, such as high dynamic range and low latency, and are therefore widely used in computer vision, where event-based keypoint detection is a fundamental task. However, robust event-based keypoint detection remains challenging for three reasons: ground-truth event keypoints are difficult to obtain, descriptors extracted by CNNs usually lack discriminative ability in the presence of intense noise, and fixed keypoint detectors are limited in detecting varied keypoint patterns. To address these challenges, we propose SD2Event, a novel event-based keypoint detection method that learns dynamic detectors and contextual descriptors in a self-supervised manner. It comprises a contextual feature descriptor learning (CFDL) module and a dynamic keypoint detector learning (DKDL) module. SD2Event has three merits. First, the CFDL module models long-range contexts efficiently and effectively. Second, the DKDL module generates dynamic keypoint detectors that can detect keypoints with diverse patterns across various event streams. Third, the proposed self-supervised signals guide the model's adaptation to event data. Extensive experiments on three challenging benchmarks show that the proposed method significantly outperforms state-of-the-art event-based keypoint detection methods.