Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
The task of Human-Object conTact (HOT) detection involves identifying the specific areas of the human body that are touching objects. Nevertheless, current models are restricted to just one type of image, often leading to too much segmentation in areas with little interaction, and struggling to maintain category consistency within specific regions. To tackle this issue, a HOT framework, termed \textbf{P3HOT}, is proposed, which blends \textbf{P}rompt guidance and human \textbf{P}roximal \textbf{P}erception. To begin with, we utilize a semantic-driven prompt mechanism to direct the network's attention towards the relevant regions based on the correlation between image and text. Then a human proximal perception mechanism is employed to dynamically perceive key depth range around the human, using learnable parameters to effectively eliminate regions where interactions are not expected. Calculating depth resolves the uncertainty of the overlap between humans and objects in a 2D perspective, providing a quasi-3D viewpoint. Moreover, a Regional Joint Loss (RJLoss) has been created as a new loss to inhibit abnormal categories in the same area. A new evaluation metric called ``AD-Acc.'' is introduced to address the shortcomings of existing methods in addressing negative samples. Comprehensive experimental results demonstrate that our approach achieves state-of-the-art performance in four metrics across two benchmark datasets. Specifically, our model achieves an improvement of \textbf{0.7}$\uparrow$, \textbf{2.0}$\uparrow$, \textbf{1.6}$\uparrow$, and \textbf{11.0}$\uparrow$ in SC-Acc., mIoU, wIoU, and AD-Acc. metrics, respectively, on the HOT-Annotated dataset. Code is available at https://github.com/YuxiaoWang-AI/P3HOT.
Yuxiao Wang、Yu Lei、Zhenao Wei、Weiying Xue、Xinyu Jiang、Nan Zhuang、Qi Liu
计算技术、计算机技术
Yuxiao Wang,Yu Lei,Zhenao Wei,Weiying Xue,Xinyu Jiang,Nan Zhuang,Qi Liu.Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss[EB/OL].(2025-07-02)[2025-07-16].https://arxiv.org/abs/2507.01630.点此复制
评论