Attention-Based Recursive Mesh Location for Multi-View 3D Human Pose Estimation
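Xu Huan (徐桓) 1, Xia Hailun (夏海轮) 1, Liang Zeyu (梁泽宇) 1
Author information
- 1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China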
Abstract
Three-dimensional (3D) human pose estimation is a fundamental problem in computer vision, with applications ranging from motion analysis to virtual reality. Despite significant progress in multi-view methods, challenges remain: the high computational overhead of volumetric approaches, sensitivity to 2D detection errors, and inefficient feature aggregation across views. To address these limitations, we propose a novel recursive model architecture with three core innovations: (1) an efficient voxel feature sampling module that projects 2D features into 3D space, avoiding the explicit epipolar computation required by methods based on 2D feature fusion; (2) a 3D relative position encoding scheme that captures spatial relationships between voxels; and (3) a feature fusion module for processing grid features. Extensive experiments on the Human3.6M and CMU Panoptic datasets show that our approach performs comparably to state-of-the-art methods, reaching a mean per-joint position error (MPJPE) of 18.0 mm for absolute positions and 20.0 mm relative to the pelvis on Human3.6M without MPII pre-training. Ablation studies confirm the contributions of the recursive architecture and the symmetric bone length constraints. This work provides a robust and efficient solution for multi-view 3D pose estimation that supports end-to-end differentiable learning and effectively captures cross-view spatial correlations.
Keywords
3D pose estimation / recursive model architecture / volumetric feature sampling
Cite this article
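Xu Huan, Xia Hailun, Liang Zeyu. Attention-Based Recursive Mesh Location for Multi-View 3D Human Pose Estimation [EB/OL]. (2026-03-24) [2026-03-27]. http://www.paper.edu.cn/releasepaper/content/202603-238.
Subject classification
Computing technology, computer technology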
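The abstract reports MPJPE both for absolute joint positions and relative to the pelvis. As a minimal sketch of how these two variants of the metric are commonly computed (not the authors' evaluation code), the NumPy example below may help. The function names `mpjpe` and `root_relative`, the use of joint index 0 as the pelvis, and the toy three-joint pose are all assumptions made for illustration.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance
    between predicted and ground-truth joints, in the input units (e.g. mm)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def root_relative(pose, root_index=0):
    """Express every joint relative to a root joint.
    Assumption: index 0 is the pelvis; real datasets define their own order."""
    return pose - pose[..., root_index:root_index + 1, :]


# Toy pose with three joints (hypothetical values, in millimetres).
gt = np.array([[0.0, 0.0, 0.0],
               [0.0, 100.0, 0.0],
               [0.0, 200.0, 0.0]])

# Prediction shifted uniformly by 10 mm along x: a pure global offset.
pred = gt + np.array([10.0, 0.0, 0.0])

absolute_error = mpjpe(pred, gt)
relative_error = mpjpe(root_relative(pred), root_relative(gt))

print(absolute_error)  # 10.0
print(relative_error)  # 0.0
```

In this toy case the global offset contributes to the absolute error but cancels once both poses are centred on the root joint, which is why the two numbers are reported separately.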