SAMITE: Position Prompted SAM2 with Calibrated Memory for Visual Object Tracking
Visual Object Tracking (VOT) is widely used in applications such as autonomous driving to continuously track targets in videos. Existing methods can be roughly categorized into template matching and autoregressive methods: the former usually neglect temporal dependencies across frames, while the latter tend to become biased towards the object categories seen during training and therefore generalize poorly to unseen classes. To address these issues, some methods adapt the video foundation model SAM2 for VOT, encoding the tracking results of each frame as memory that conditions the remaining frames in an autoregressive manner. Nevertheless, existing methods fail to overcome the challenges of object occlusions and distractions, and have no mechanism to intercept the propagation of tracking errors. To address these limitations, we present SAMITE, a model built upon SAM2 with additional modules, including: (1) Prototypical Memory Bank: We propose to quantify the feature-wise and position-wise correctness of each frame's tracking results, and select the best frames to condition subsequent frames. Because tracking results affected by occlusion or distractors are inaccurate both feature-wise and position-wise, their scores are naturally lower, so they can be filtered out to intercept error propagation; (2) Positional Prompt Generator: To further reduce the impact of distractors, we propose to generate positional mask prompts that provide explicit positional clues for the target, leading to more accurate tracking. Extensive experiments on six benchmarks show the superiority of SAMITE. The code is available at https://github.com/Sam1224/SAMITE.
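The memory selection idea described in (1) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the feature-wise and position-wise scores are computed or combined, so the names `FrameRecord` and `select_memory_frames`, the product used to combine the two scores, and the `k` and `min_score` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameRecord:
    """Hypothetical per-frame tracking summary (not from the paper)."""
    frame_idx: int
    feature_score: float   # assumed feature-wise correctness in [0, 1]
    position_score: float  # assumed position-wise correctness in [0, 1]


def select_memory_frames(records: List[FrameRecord],
                         k: int = 3,
                         min_score: float = 0.5) -> List[int]:
    """Return indices of up to k frames with the highest combined score.

    Assumption: the two scores are combined by a simple product, and frames
    whose combined score falls below min_score are dropped so that unreliable
    (e.g. occluded or distracted) frames never condition later frames.
    """
    scored = [(r.feature_score * r.position_score, r) for r in records]
    kept = [(s, r) for s, r in scored if s >= min_score]
    # Highest score first; ties are broken in favour of the more recent frame.
    kept.sort(key=lambda sr: (-sr[0], -sr[1].frame_idx))
    return [r.frame_idx for _, r in kept[:k]]


records = [
    FrameRecord(0, 1.0, 1.0),  # initial frame with a ground-truth prompt
    FrameRecord(1, 0.9, 0.8),
    FrameRecord(2, 0.3, 0.9),  # low feature score, e.g. occlusion
    FrameRecord(3, 0.8, 0.4),  # low position score, e.g. a distractor
    FrameRecord(4, 0.9, 0.9),
]
selected = select_memory_frames(records, k=2)
```

In this toy input, frames 2 and 3 fall below the threshold and are never written into memory, which is the error-interception behaviour the abstract describes; the actual scoring functions are given in the paper and repository.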
Qianxiong Xu, Lanyun Zhu, Chenxi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao
Computing Technology, Computer Technology
Qianxiong Xu, Lanyun Zhu, Chenxi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao. SAMITE: Position Prompted SAM2 with Calibrated Memory for Visual Object Tracking [EB/OL]. (2025-07-29) [2025-08-11]. https://arxiv.org/abs/2507.21732.