KinMo: Kinematic-aware Human Motion Understanding and Generation
Current human motion synthesis frameworks rely on global action descriptions, creating a modality gap that limits both motion understanding and generation. A single coarse description, such as "run", fails to capture details like variations in speed, limb positioning, and kinematic dynamics, leading to ambiguities between the text and motion modalities. To address this challenge, we introduce KinMo, a unified framework built on a hierarchical, describable motion representation that extends beyond global actions by incorporating kinematic group movements and their interactions. We design an automated annotation pipeline to generate high-quality, fine-grained descriptions for this decomposition, resulting in the KinMo dataset. To leverage these structured descriptions, we propose Hierarchical Text-Motion Alignment, which improves spatial understanding by integrating the additional motion details. Furthermore, we introduce a coarse-to-fine generation procedure that exploits this enhanced spatial understanding to improve motion synthesis. Experimental results show that KinMo significantly improves motion understanding, as demonstrated by stronger text-motion retrieval performance, and enables more fine-grained motion generation and editing. Project Page: https://andypinxinliu.github.io/KinMo
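This listing includes no code, but to make the Hierarchical Text-Motion Alignment idea concrete, the following is a minimal PyTorch sketch of what contrastive alignment across the three description levels (global action, kinematic group movement, group interaction) could look like. The InfoNCE objective, the per-level weights, and all function names here are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def info_nce(text_emb, motion_emb, temperature=0.07):
    # Symmetric InfoNCE between L2-normalized text and motion embeddings;
    # matched (text, motion) pairs along the diagonal are the positives.
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def hierarchical_alignment_loss(text_embs, motion_embs, weights=(1.0, 0.5, 0.5)):
    # Align text and motion at each level of the hierarchy. The inputs are
    # dicts keyed by level, each holding a (batch, dim) tensor from a
    # hypothetical per-level encoder head; the weights are assumptions.
    levels = ('global', 'group', 'interaction')
    return sum(w * info_nce(text_embs[k], motion_embs[k])
               for w, k in zip(weights, levels))

# Toy usage: random features stand in for real encoder outputs.
batch, dim = 8, 256
text_embs = {k: torch.randn(batch, dim) for k in ('global', 'group', 'interaction')}
motion_embs = {k: torch.randn(batch, dim) for k in ('global', 'group', 'interaction')}
print(hierarchical_alignment_loss(text_embs, motion_embs))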
Pablo Garrido, Pinxin Liu, Hyeongwoo Kim, Bindita Chaudhuri, Pengfei Zhang
Computing Technology, Computer Technology
Pablo Garrido, Pinxin Liu, Hyeongwoo Kim, Bindita Chaudhuri, Pengfei Zhang. KinMo: Kinematic-aware Human Motion Understanding and Generation [EB/OL]. (2024-11-23) [2025-04-30]. https://arxiv.org/abs/2411.15472.