End-to-end 3D Human Pose Estimation using Dual Decoders
Existing 3D human pose estimation methods mostly split the task into two stages. The first stage detects the 2D coordinates of the human joints in the input image. The second stage takes these 2D coordinates as input and recovers the depth of each joint to produce the 3D pose. However, the accuracy of two-stage methods depends heavily on the first-stage results, and the pipeline contains many redundant processing steps that reduce the inference efficiency of the network. To address these issues, we propose EDD, a fully End-to-end 3D human pose estimation method built on a transformer architecture with Dual Decoders. By learning multiple human poses, the model directly infers all 3D human poses in the image with a pose decoder, then refines the result with a joint decoder that exploits the kinematic relations between joints. Through the attention mechanism, the method adaptively focuses on the features most relevant to each target joint, which effectively overcomes the feature misalignment problem in human pose estimation and greatly improves model performance. Complex post-processing steps such as non-maximum suppression are eliminated, further improving the efficiency of the model. The results show that the method achieves an accuracy of 87.4% on the MuPoTS-3D dataset, significantly improving the accuracy of end-to-end 3D human pose estimation methods based on mixed training.
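To make concrete the claim that attention lets the model focus on the features most relevant to a target joint, the following is a minimal sketch of a single attention step. The abstract does not specify the attention formulation, so standard scaled dot-product attention is assumed here; the names `attend`, `joint_query`, `feature_keys`, and `feature_values` and all numbers are hypothetical toy values, not taken from EDD.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention-weighted sum of `values` and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Hypothetical toy data: one joint query and three image feature tokens.
joint_query = [1.0, 0.0]
feature_keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
# Each value is an assumed (x, y, depth) candidate carried by a token.
feature_values = [[0.1, 0.2, 2.5], [0.9, 0.8, 3.0], [0.5, 0.5, 4.0]]

estimate, weights = attend(joint_query, feature_keys, feature_values)
print("attention weights:", weights)
print("attended estimate:", estimate)
```

In this toy setting the first token's key is aligned with the query, so it receives most of the weight and dominates the attended estimate. A full model would presumably stack many such steps inside the pose and joint decoders, with learned projections for queries, keys, and values.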
Song Mei, Wang Zhang, Jin Lei
Computing technology; computer technology
Artificial intelligence; Computer vision; 3D human pose estimation; Transformer
Song Mei, Wang Zhang, Jin Lei. End-to-end 3D Human Pose Estimation using Dual Decoders [EB/OL]. (2024-02-05) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/202402-45.