National Preprint Platform (ChinaXiv)

High-Resolution Network Human Pose Estimation Based on a Multi-Scale Attention Mechanism

Abstract

In human pose estimation, it is difficult to predict correct poses when the scale of the feature maps varies. To address this challenge, a high-resolution network based on a multi-scale attention mechanism, MSANet (Multiscale-Attention Net), is proposed to improve detection accuracy. Lightweight pyramid convolution and attentional feature fusion are introduced for more efficient extraction of multi-scale information; a self-transformer module is applied in the fusion of the parallel subnetworks for feature enhancement, capturing global features; and in the output stage, the features of each layer are combined with an adaptive spatial feature fusion strategy to form the final output, which more fully exploits the semantic information of high-level features and the fine-grained detail of low-level features to infer invisible and occluded keypoints. Experiments on the public COCO2017 dataset show that the method improves estimation accuracy by 4.2% over the baseline network HRNet.
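The adaptive spatial feature fusion strategy used in the output stage can be illustrated with a minimal NumPy sketch: feature maps from several scales are resized to a common resolution and blended with per-pixel softmax weights. The function names, shapes, and nearest-neighbour resizing below are illustrative assumptions, not the paper's actual implementation (which learns the weight logits with 1×1 convolutions).

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nearest_resize(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]

def asff_fuse(features, logits):
    """Fuse multi-scale feature maps with per-pixel softmax weights.

    features: list of (C, Hi, Wi) maps at different scales.
    logits:   (N, H, W) unnormalised spatial weights, one map per scale,
              at the target resolution (H, W). In MSANet these would be
              learned; here they are supplied directly.
    """
    n, h, w = logits.shape
    assert n == len(features)
    weights = softmax(logits, axis=0)             # weights sum to 1 per pixel
    resized = [nearest_resize(f, h, w) for f in features]
    return sum(wi[None] * fi for wi, fi in zip(weights, resized))

# Usage: three scales, uniform (zero) logits give an equal-weight average.
f1 = np.ones((2, 4, 4))          # low-resolution, high-level features
f2 = np.ones((2, 8, 8)) * 3.0
f3 = np.ones((2, 16, 16)) * 5.0  # high-resolution, fine-grained features
fused = asff_fuse([f1, f2, f3], np.zeros((3, 16, 16)))
```

Because the weights are computed per spatial position, the fusion can lean on high-level semantics in one image region and fine-grained low-level detail in another, which is what helps with occluded keypoints.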

Liu Yuhong, Zhang Rongfen, Zhang Wenwen, Li Li, Chen Na

10.12074/202205.00122V1

Computing Technology; Computer Technology

Keywords: human pose estimation; high-resolution network; multi-scale attention; feature fusion; adaptive spatial feature fusion

Liu Yuhong, Zhang Rongfen, Zhang Wenwen, Li Li, Chen Na. High-Resolution Network Human Pose Estimation Based on a Multi-Scale Attention Mechanism [EB/OL]. (2022-05-18) [2025-08-02]. https://chinaxiv.org/abs/202205.00122.
