National Preprint Platform

Multi-View 3D Reconstruction Method Based on Relative Position Encoding and Attention Mechanism

Abstract

This paper proposes a Transformer-based multi-view 3D reconstruction model. The model achieves strong results among learning-based multi-view 3D reconstruction models and, in particular, improves the completeness of reconstructed scenes, an aspect on which earlier methods performed poorly. Intra-attention and inter-attention capture features within each image and across images, respectively, which strengthens the network's ability to capture depth information and improves accuracy. The method first uses a feature pyramid network to extract features at multiple scales, which lets the network perceive fine detail effectively. In addition, drawing on the relative position encoding of features used in Swin Transformer, the paper constructs an attention function suited to multi-view reconstruction and demonstrates its feasibility experimentally. Finally, a ResNet-based residual network for depth adjustment is constructed, which alleviates to some extent the loss of boundary information caused by training and improves overall reconstruction quality.
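To make the idea of attention with a relative position bias concrete, the following is a minimal sketch in plain Python. It is not the authors' attention function, which the abstract does not specify. It assumes a single head, a 1D token sequence, and a hypothetical `bias_table` of length `2n - 1` indexed by the offset between query and key positions (Swin Transformer applies the same idea to 2D offsets inside a window). The function name `relative_bias_attention` is illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relative_bias_attention(q, k, v, bias_table):
    """Single-head attention over n tokens with a relative position bias.

    q, k, v: lists of n vectors (lists of floats); q and k share dimension d.
    bias_table: list of 2*n - 1 floats. Entry (i - j) + (n - 1) is added to
    the logit between query i and key j (an assumed 1D layout).
    """
    n, d = len(q), len(q[0])
    assert len(k) == n and len(v) == n
    assert len(bias_table) == 2 * n - 1
    out = []
    for i in range(n):
        # Scaled dot-product logit plus a bias that depends only on i - j.
        logits = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            + bias_table[(i - j) + (n - 1)]
            for j in range(n)
        ]
        w = softmax(logits)
        # Weighted sum of the value vectors.
        out.append(
            [sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        )
    return out
```

Under these assumptions, passing `q`, `k`, and `v` derived from the same image's features corresponds to intra-attention, while taking `q` from one view and `k`, `v` from another corresponds to inter-attention. How the paper actually combines the two is not stated in the abstract.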

Zhang Pengkun (张鹏鲲), Yang Xudong (杨旭东)

Computing technology, computer technology; remote sensing technology

3D reconstruction; deep learning; attention mechanism; multi-view geometry

Zhang Pengkun, Yang Xudong. Multi-view 3D reconstruction method based on relative position encoding and attention mechanism [EB/OL]. (2023-05-18) [2025-07-09]. http://www.paper.edu.cn/releasepaper/content/202305-117.
