Attention-Based Recursive Mesh Location for Multi-View 3D Human Pose Estimation

Xu Huan, Xia Hailun, Liang Zeyu

Source: Sciencepaper Online (中国科技论文在线)

Xu Huan¹, Xia Hailun¹, Liang Zeyu¹

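Author information

1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China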

Abstract

Three-dimensional (3D) human pose estimation is a fundamental problem in computer vision, with applications ranging from motion analysis to virtual reality. Despite significant progress in multi-view methods, challenges remain, including the high computational overhead of volumetric approaches, sensitivity to 2D detection errors, and inefficient feature aggregation across views. To overcome these limitations, we propose a novel recursive model architecture with three core innovations: (1) an efficient voxel feature sampling module that projects 2D features into 3D space, avoiding the explicit epipolar computation required by methods based on 2D feature fusion; (2) a 3D relative position encoding scheme that captures spatial relationships between voxels; and (3) a feature fusion module for processing grid features. Extensive evaluations on the Human3.6M and CMU Panoptic datasets show that our approach performs comparably to state-of-the-art methods, reaching a mean per-joint position error (MPJPE) of 18.0 mm for absolute positions and 20.0 mm relative to the pelvis on Human3.6M without MPII pre-training. Ablation studies confirm the contributions of the recursive architecture and the symmetric bone length constraints. This work provides a robust and efficient solution for multi-view 3D pose estimation that supports end-to-end differentiable learning and effectively captures cross-view spatial correlations.
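The abstract reports MPJPE in two forms, for absolute positions and relative to the pelvis. As a reading aid, the sketch below shows how these two variants are commonly computed. It is not the authors' evaluation code: the function names, the toy three-joint skeleton, and the choice of joint 0 as the pelvis are all assumptions made for illustration.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints, in the units of the input."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and have the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def root_relative(joints, root_index=0):
    """Express every joint relative to the root joint (assumed here to be the pelvis)."""
    root = joints[root_index]
    return [tuple(c - r for c, r in zip(joint, root)) for joint in joints]

def mpjpe_root_relative(pred, gt, root_index=0):
    """MPJPE after aligning the root joint of the prediction and the ground truth."""
    return mpjpe(root_relative(pred, root_index), root_relative(gt, root_index))

# Hypothetical toy skeleton with three joints, coordinates in millimetres.
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (3.0, 104.0, 0.0), (3.0, 204.0, 12.0)]

absolute_error = mpjpe(pred, gt)                # averages 5, 5 and 13 mm
relative_error = mpjpe_root_relative(pred, gt)  # averages 0, 0 and 12 mm
```

In this toy case the prediction carries a constant offset that root alignment removes, so the two variants measure different things: the absolute form penalises global translation errors, while the root-relative form only scores the pose shape.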

Key words

3D pose estimation / recursive model architecture / volumetric feature sampling

Cite this article

Xu Huan, Xia Hailun, Liang Zeyu. Attention-Based Recursive Mesh Location for Multi-View 3D Human Pose Estimation[EB/OL]. (2026-03-24)[2026-03-27]. http://www.paper.edu.cn/releasepaper/content/202603-238.

Subject classification

Computing technology, computer technology

First published: 2026-03-24

