
Poster

SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

Jiaben Chen · Huaizu Jiang

Arch 4A-E Poster #163
[ Paper PDF ]
Wed 19 Jun 5 p.m. PDT – 6:30 p.m. PDT

Abstract: Human-centric video frame interpolation has great potential for enhancing entertainment experiences and for commercial applications in the sports analysis industry, e.g., synthesizing slow-motion videos. Although multiple benchmark datasets for video frame interpolation are available in the community, none of them is dedicated to human-centric scenarios. To bridge this gap, we introduce SportsSloMo, a benchmark featuring over 130K high-resolution ($\geq$720p) slow-motion sports video clips, totaling over 1M video frames, sourced from YouTube. We re-train several state-of-the-art methods on our benchmark and observe a noticeable decrease in their accuracy compared to results on existing datasets. This drop highlights the difficulty of our benchmark: human bodies are highly deformable and occlusions are frequent in sports videos, posing significant challenges even for the best-performing methods. To tackle these challenges, we propose human-aware loss terms that add auxiliary supervision for panoptic human segmentation and keypoint detection. These loss terms are model-agnostic and can be easily plugged into any video frame interpolation approach. Experimental results validate the effectiveness of the proposed human-aware loss terms, which yield consistent performance improvements over existing models. The dataset and code will be publicly released to foster future research.
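To illustrate what "model-agnostic, plug-in" auxiliary supervision could look like, below is a minimal PyTorch sketch, not the authors' exact formulation. The module names (`HumanAwareLoss`, `seg_head`, `kpt_head`), loss choices, and weights are assumptions for illustration; the actual paper defines its own human-aware loss terms.

```python
import torch.nn as nn


class HumanAwareLoss(nn.Module):
    """Sketch: combine a standard frame-reconstruction loss with auxiliary
    human-segmentation and keypoint losses on the interpolated frame.
    All names and weights here are illustrative assumptions."""

    def __init__(self, seg_head: nn.Module, kpt_head: nn.Module,
                 w_seg: float = 0.1, w_kpt: float = 0.1):
        super().__init__()
        self.seg_head = seg_head              # predicts human masks from a frame
        self.kpt_head = kpt_head              # predicts keypoint heatmaps from a frame
        self.recon_loss = nn.L1Loss()         # photometric term common in VFI models
        self.seg_loss = nn.CrossEntropyLoss() # segmentation supervision
        self.kpt_loss = nn.MSELoss()          # heatmap regression supervision
        self.w_seg, self.w_kpt = w_seg, w_kpt

    def forward(self, pred_frame, gt_frame, gt_masks, gt_heatmaps):
        # Base interpolation objective: match the ground-truth middle frame.
        loss = self.recon_loss(pred_frame, gt_frame)
        # Auxiliary supervision: the interpolated frame should still yield
        # plausible human segmentation and keypoints.
        loss = loss + self.w_seg * self.seg_loss(self.seg_head(pred_frame), gt_masks)
        loss = loss + self.w_kpt * self.kpt_loss(self.kpt_head(pred_frame), gt_heatmaps)
        return loss
```

In a training loop, such a module would simply replace or augment the interpolation model's existing loss; the backbone itself is untouched, which is what makes this kind of supervision easy to plug into any frame interpolation approach.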