
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching

Source:

arXiv

Abstract

Referring video object segmentation (RVOS) aims to segment objects in videos guided by natural language descriptions. We propose FS-RVOS, a Transformer-based model with two key components: a cross-modal affinity module and an instance sequence matching strategy, which extends FS-RVOS to multi-object segmentation (FS-RVMOS). Experiments show FS-RVOS and FS-RVMOS outperform state-of-the-art methods across diverse benchmarks, demonstrating superior robustness and accuracy.

Authors: Heng Liu, Guanghui Li, Mingqi Gao, Xiantong Zhen, Feng Zheng, Yang Wang


Subject classification: Computing technology, computer technology

Recommended citation: Heng Liu, Guanghui Li, Mingqi Gao, Xiantong Zhen, Feng Zheng, Yang Wang. Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching [EB/OL]. (2025-04-18) [2025-06-04]. https://arxiv.org/abs/2504.13710.
