MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning
Category-level object pose estimation, which predicts the pose of objects within a known category without prior knowledge of individual instances, is essential in applications such as warehouse automation and manufacturing. Existing methods that rely on RGB images or point cloud data often struggle with object occlusion and with generalization across different instances and categories. This paper proposes a multimodal-based keypoint learning framework (MK-Pose) that integrates RGB images, point clouds, and category-level textual descriptions. The model uses a self-supervised keypoint detection module enhanced with attention-based query generation, soft heatmap matching, and graph-based relational modeling. In addition, a graph-enhanced feature fusion module is designed to integrate local geometric information and global context. MK-Pose is evaluated on the CAMERA25 and REAL275 datasets, and its cross-dataset capability is further tested on the HouseCat6D dataset. The results demonstrate that MK-Pose outperforms existing state-of-the-art methods in both IoU and average precision without using shape priors. Code will be released at \href{https://github.com/yangyifanYYF/MK-Pose}{https://github.com/yangyifanYYF/MK-Pose}.
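The abstract mentions soft heatmap matching as part of the keypoint detection module. The following is only a minimal, generic sketch of a soft-argmax ("soft heatmap") keypoint read-out of the kind commonly used for differentiable keypoint localization; it is not the authors' released implementation, and all tensor names, shapes, and the temperature parameter are assumptions for illustration.

```python
# Illustrative sketch only: a generic soft-argmax ("soft heatmap") keypoint read-out.
# Not the MK-Pose implementation; shapes and names are hypothetical.
import torch
import torch.nn.functional as F

def soft_heatmap_keypoints(heatmaps: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Differentiably extract 2D keypoint coordinates from per-keypoint heatmaps.

    heatmaps: (B, K, H, W) unnormalized scores, one map per keypoint query.
    returns:  (B, K, 2) expected (x, y) coordinates in pixel units.
    """
    b, k, h, w = heatmaps.shape
    # Softmax over all spatial positions turns each map into a probability distribution.
    probs = F.softmax(heatmaps.view(b, k, -1) / temperature, dim=-1).view(b, k, h, w)
    # Coordinate grids used to take the spatial expectation.
    ys = torch.linspace(0, h - 1, h, device=heatmaps.device)
    xs = torch.linspace(0, w - 1, w, device=heatmaps.device)
    # Expected coordinates: probability-weighted sums over the grid.
    exp_x = (probs.sum(dim=2) * xs).sum(dim=-1)  # marginal over rows, then E[x]; (B, K)
    exp_y = (probs.sum(dim=3) * ys).sum(dim=-1)  # marginal over columns, then E[y]; (B, K)
    return torch.stack([exp_x, exp_y], dim=-1)

if __name__ == "__main__":
    dummy = torch.randn(2, 16, 64, 64)           # 16 hypothetical keypoint queries
    print(soft_heatmap_keypoints(dummy).shape)   # torch.Size([2, 16, 2])
```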
Yifan Yang, Peili Song, Enfan Lan, Dong Liu, Jingtai Liu
Automation technology, automation equipment; computing technology, computer technology
Yifan Yang, Peili Song, Enfan Lan, Dong Liu, Jingtai Liu. MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning [EB/OL]. (2025-07-09) [2025-07-22]. https://arxiv.org/abs/2507.06662.