BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation
In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks. Existing approaches (e.g., ACT) tend to treat multi-view features equally and directly concatenate them for policy learning. However, this introduces redundant visual information and higher computational costs, leading to ineffective manipulation. Fine-grained manipulation tasks typically involve multiple stages, and the view that contributes most varies from stage to stage over time. In this paper, we propose a plug-and-play best-feature-aware (BFA) fusion strategy for multi-view manipulation tasks that is adaptable to various policies. Built upon the visual backbone of the policy network, we design a lightweight network to predict the importance score of each view. Based on the predicted importance scores, the reweighted multi-view features are fused and fed into the end-to-end policy network, enabling seamless integration. Notably, our method demonstrates outstanding performance on fine-grained manipulation: experimental results show that our approach outperforms multiple baselines by a 22-46% success rate on different tasks. Our work provides new insights and inspiration for tackling key challenges in fine-grained manipulation.
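The fusion strategy described above can be illustrated with a minimal sketch. This is not the paper's implementation; the score head (a hypothetical single linear layer `w`, `b`), the pooled per-view feature shape, and the concatenation-based fusion are all assumptions made for illustration only:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def bfa_fuse(view_feats, w, b):
    """Reweight per-view backbone features by predicted importance scores.

    view_feats: (V, D) array, one pooled visual feature per camera view.
    w (D,), b (scalar): parameters of a hypothetical lightweight score head,
    standing in for the paper's importance-prediction network.
    """
    scores = softmax(view_feats @ w + b)       # (V,) importance per view
    reweighted = view_feats * scores[:, None]  # scale each view's feature
    fused = reweighted.reshape(-1)             # concatenate into (V*D,)
    return fused, scores

# Toy usage: 3 camera views, 8-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
w = rng.normal(size=8)
fused, scores = bfa_fuse(feats, w, 0.0)
```

The fused vector keeps the same dimensionality as plain concatenation, so it can be dropped into an existing policy network such as ACT without changing downstream layers; only the per-view scaling differs.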
Weixin Mao, Tiancai Wang, Le Wang, Zihan Lan, Haosheng Li, Haoqiang Fan, Osamu Yoshie
Computing technology; computer technology
Weixin Mao, Tiancai Wang, Le Wang, Zihan Lan, Haosheng Li, Haoqiang Fan, Osamu Yoshie. BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation [EB/OL]. (2025-06-28) [2025-07-17]. https://arxiv.org/abs/2502.11161.