KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding
arXiv:2602.17768v1 Announce Type: new Abstract: Despite recent advancements, video captioning models still face significant limitations in accurately describing fine-grained motion details and suffer from severe hallucination issues. These challenges become particularly prominent when generating captions for motion-centric videos, where precise depiction of intricate movements and limb dynamics is crucial yet often neglected. To alleviate this gap, we introduce an automated annotation pipeline that integrates kinematic-based motion computation with linguistic parsing, enabling detailed decomposition and description of complex human […]