Visible and Infrared image Fusion (VIF) offers a comprehensive scene description by combining thermal infrared images with the rich textures from visible cameras. However, conventional VIF systems may capture over-exposed, under-exposed, or blurry images under extreme lighting and highly dynamic motion, leading to degraded fusion results. To address these problems, we propose a novel Event-based Visible and Infrared Fusion (EVIF) system that employs a visible event camera as an alternative to traditional frame-based cameras for the VIF task. With extremely low latency and high dynamic range, event cameras can effectively mitigate motion blur and are robust across a wide range of illumination conditions. To produce high-quality fused images, we develop a multi-task collaborative framework that simultaneously performs event-based visible texture reconstruction, event-guided infrared image deblurring, and visible-infrared fusion. Rather than learning these tasks independently, our framework capitalizes on their synergy, leveraging cross-task event enhancement for efficient deblurring and bi-level min-max mutual information optimization to achieve higher fusion quality. Experiments on both synthetic and real data show that EVIF performs well under extreme lighting conditions and in highly dynamic scenes, producing high-quality fused images across a broad range of practical scenarios.