Poster
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong · Zhiqi Li · Yuntao Chen · Feng Wang · Xizhou Zhu · Jiapeng Luo · Wenhai Wang · Tong Lu · Hongsheng Li · Yu Qiao · Lewei Lu · Jie Zhou · Jifeng Dai
Arch 4A-E Poster #78
Highlight
We introduce Deformable ConvNets v4 (DCNv4), a highly efficient and effective operator for a broad spectrum of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements: (1) removing softmax normalization in spatial aggregation to enhance its dynamic property and expressive power, and (2) optimizing memory access to minimize redundant operations for speedup. These improvements yield significantly faster convergence than DCNv3 and a substantial increase in processing speed, with DCNv4 achieving more than three times the forward speed of DCNv3. Our evaluation demonstrates DCNv4's superior performance across various tasks, including image classification, instance and semantic segmentation, and notably image generation. When integrated into generative models such as the U-Net in a latent diffusion model, DCNv4 outperforms its baseline, underscoring its potential to enhance generative models. In practical applications, replacing DCNv3 with DCNv4 in the InternImage model to create FlashInternImage results in up to an 80% speed increase without further modifications. DCNv4's advances in speed and efficiency, combined with its robust performance across diverse vision tasks, position it as a foundational building block for future efficient and effective vision models.
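The first enhancement, dropping softmax normalization in spatial aggregation, can be illustrated with a toy NumPy sketch. It is not the authors' implementation or the DCNv4 kernel: it assumes the K feature vectors have already been gathered at the sampling points, and the names `aggregate`, `sampled`, and `raw_weights` are hypothetical. It only contrasts bounded, softmax-normalized aggregation weights with unbounded raw weights.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def aggregate(sampled, raw_weights, normalize):
    """Combine K sampled feature vectors into one output vector.

    sampled:     (K, C) features already gathered at the K sampling points
                 (the offset prediction and interpolation steps are omitted here)
    raw_weights: (K,) per-point aggregation weights predicted from the input
    normalize:   True  -> softmax over the K points (weights in [0, 1], sum to 1)
                 False -> use the raw weights directly (unbounded)
    """
    w = softmax(raw_weights) if normalize else raw_weights
    return w @ sampled

# Hypothetical toy data: K = 9 sampling points, C = 4 channels.
rng = np.random.default_rng(0)
sampled = rng.normal(size=(9, 4))
raw_weights = rng.normal(size=9) * 3.0

bounded = aggregate(sampled, raw_weights, normalize=True)     # softmax-normalized
unbounded = aggregate(sampled, raw_weights, normalize=False)  # no normalization
```

With normalization, each output channel is a convex combination of the sampled values, so it cannot leave the range they span. Without it, the weights can be negative or larger than one, which is the sense in which the aggregation becomes less constrained.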