ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
Vision-Language-Action (VLA) models have advanced general-purpose robotic manipulation by leveraging pretrained visual and linguistic representations. However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose \textbf{ForceVLA}, a novel end-to-end manipulation framework that treats external force sensing as a first-class modality within VLA systems. ForceVLA introduces \textbf{FVLMoE}, a force-aware Mixture-of-Experts fusion module that dynamically integrates pretrained visual-language embeddings with real-time 6-axis force feedback during action decoding. This enables context-aware routing across modality-specific experts, enhancing the robot's ability to adapt to subtle contact dynamics. We also introduce \textbf{ForceVLA-Data}, a new dataset comprising synchronized vision, proprioception, and force-torque signals across five contact-rich manipulation tasks. ForceVLA improves average task success by 23.2\% over strong $\pi_0$-based baselines, achieving up to 80\% success in tasks such as plug insertion. Our approach highlights the importance of multimodal integration for dexterous manipulation and sets a new benchmark for physically intelligent robotic control. Code and data will be released at https://sites.google.com/view/forcevla2025.
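To make the fusion idea concrete, below is a minimal, hypothetical sketch of a force-aware Mixture-of-Experts fusion layer in the spirit of FVLMoE: a gating network routes a fused vision-language + force representation to modality-specific experts. All names, dimensions, and the top-k routing scheme are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a force-aware MoE fusion layer (not the official
# FVLMoE code). Dimensions, expert count, and top-k routing are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForceAwareMoE(nn.Module):
    def __init__(self, vl_dim=1024, force_dim=6, hidden_dim=512,
                 num_experts=4, top_k=2):
        super().__init__()
        # Project raw 6-axis force-torque readings into the embedding space.
        self.force_proj = nn.Linear(force_dim, vl_dim)
        # Gating network scores experts from the fused VL + force context.
        self.gate = nn.Linear(2 * vl_dim, num_experts)
        # Modality-specific expert MLPs.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(2 * vl_dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, vl_dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, vl_emb, force):
        # vl_emb: (B, vl_dim) pooled vision-language features
        # force:  (B, 6) force-torque signal
        fused = torch.cat([vl_emb, self.force_proj(force)], dim=-1)
        logits = self.gate(fused)                       # (B, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # per-sample routing
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(vl_emb)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](fused[mask])
        return out  # force-aware features consumed by the action decoder
```

Under these assumptions, the gate lets contact feedback modulate which expert processes each timestep, so the same backbone can behave differently in free motion versus contact-rich phases.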
Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Cewu Lu, Wenqiang Zhang
Computing Technology, Computer Technology
Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Cewu Lu, Wenqiang Zhang. ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation [EB/OL]. (2025-05-28) [2025-07-25]. https://arxiv.org/abs/2505.22159.