MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation

Source: arXiv
Abstract

The current few-shot segmentation literature lacks a mask selection method that goes beyond visual similarity between the query and example images, which leads to suboptimal predictions. We present MARS, a plug-and-play ranking system that leverages multimodal cues to filter and merge mask proposals robustly. Starting from a set of mask predictions for a single query image, we score, filter, and merge them to improve results. Proposals are evaluated using multimodal scores computed at local and global levels. Extensive experiments on COCO-20i, Pascal-5i, LVIS-92i, and FSS-1000 demonstrate that integrating all four scoring components is crucial for robust ranking, validating our contribution. As MARS can be effortlessly integrated with various mask proposal systems, we deploy it across a wide range of top-performing methods and achieve new state-of-the-art results on multiple existing benchmarks. Code will be available upon acceptance.

Matteo Matteucci, Nico Catalano, Stefano Samele, Paolo Pertino

Computing technology, computer technology

Matteo Matteucci, Nico Catalano, Stefano Samele, Paolo Pertino. MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation[EB/OL]. (2025-04-10)[2025-04-24]. https://arxiv.org/abs/2504.07942.
