
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts


Abstract

The ability to segment objects based on open-ended language prompts remains a critical challenge, requiring models to ground textual semantics into precise spatial masks while handling diverse and unseen categories. We present OpenWorldSAM, a framework that extends the prompt-driven Segment Anything Model v2 (SAM2) to open-vocabulary scenarios by integrating multi-modal embeddings extracted from a lightweight vision-language model (VLM). Our approach is guided by four key principles: i) Unified prompting: OpenWorldSAM supports a diverse range of prompts, including category-level and sentence-level language descriptions, providing a flexible interface for various segmentation tasks. ii) Efficiency: By freezing the pre-trained components of SAM2 and the VLM, we train only 4.5 million parameters on the COCO-stuff dataset, achieving remarkable resource efficiency. iii) Instance Awareness: We enhance the model's spatial understanding through novel positional tie-breaker embeddings and cross-attention layers, enabling effective segmentation of multiple instances. iv) Generalization: OpenWorldSAM exhibits strong zero-shot capabilities, generalizing well on unseen categories and an open vocabulary of concepts without additional training. Extensive experiments demonstrate that OpenWorldSAM achieves state-of-the-art performance in open-vocabulary semantic, instance, and panoptic segmentation across multiple benchmarks, including ADE20k, PASCAL, ScanNet, and SUN-RGBD.
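The abstract does not say how the positional tie-breaker embeddings are constructed. As a rough illustration of why such embeddings could help with multiple instances, the sketch below assumes they are per-query offset vectors added to a shared language-prompt embedding, so that several queries derived from the same prompt (for example "person") are no longer identical and can each be decoded into a different mask. The function names, dimensions, and the additive formulation are hypothetical and are not taken from the paper.

```python
import random


def init_tie_breakers(num_queries, dim, seed=0):
    """Create one small random offset vector per query slot.

    Assumption: in a real model these would be learned parameters;
    here they are fixed random values for illustration only.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_queries)]


def make_queries(prompt_embedding, tie_breakers):
    """Return one query per slot: the shared prompt embedding plus that slot's offset."""
    return [
        [p + t for p, t in zip(prompt_embedding, offsets)]
        for offsets in tie_breakers
    ]


# Hypothetical 4-dimensional embedding of a single text prompt.
prompt_embedding = [0.5, -1.0, 0.25, 2.0]

# Without tie-breakers, every query is the same vector.
no_offsets = [[0.0] * 4 for _ in range(3)]
identical_queries = make_queries(prompt_embedding, no_offsets)

# With tie-breakers, each query differs slightly from the others.
tie_breakers = init_tie_breakers(num_queries=3, dim=4)
queries = make_queries(prompt_embedding, tie_breakers)
```

With all-zero offsets the three queries collapse to the same vector, so a decoder would have no basis for assigning them to different instances. The distinct offsets break that symmetry.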

Authors: Shiting Xiao, Rishabh Kabra, Yuhang Li, Donghyun Lee, Joao Carreira, Priyadarshini Panda


Subject classification: Computing technology, computer technology

Recommended citation: Shiting Xiao, Rishabh Kabra, Yuhang Li, Donghyun Lee, Joao Carreira, Priyadarshini Panda. OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts [EB/OL]. (2025-07-07) [2025-08-02]. https://arxiv.org/abs/2507.05427.

