A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation
Accepted at ICIP 2025.

We present a novel transformer-based framework for whole-body grasping that addresses both pose generation and motion infilling, enabling realistic and stable object interactions. Our pipeline comprises three stages: Grasp Pose Generation for full-body grasp generation, Temporal Infilling for smooth motion continuity, and a LiftUp Transformer that refines downsampled joints back to high-resolution markers. To overcome the scarcity of hand-object interaction data, we introduce a data-efficient Generalized Pretraining stage on large, diverse motion datasets, yielding robust spatio-temporal representations transferable to grasping tasks. Experiments on the GRAB dataset show that our method outperforms state-of-the-art baselines in terms of coherence, stability, and visual realism. The modular design also supports easy adaptation to other human-motion applications.
Edward Effendy, Kuan-Wei Tseng, Rei Kawakami
Subjects: Computing and Computer Technology; Automation Technology and Equipment
Edward Effendy, Kuan-Wei Tseng, Rei Kawakami. A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation [EB/OL]. (2025-07-01) [2025-07-22]. https://arxiv.org/abs/2507.00676.