Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks. As the popularity of VFMs grows, there is increasing interest in understanding their effectiveness for dense prediction tasks. However, VFMs typically produce low-resolution features, limiting their direct applicability in this context. One way to tackle this limitation is to employ a task-agnostic feature upsampling module that refines the resolution of VFM features. To assess the effectiveness of this approach, we investigate Interactive Segmentation (IS) as a novel benchmark for evaluating feature upsampling methods on VFMs. Due to its inherently multimodal input, consisting of an image and a set of user-defined clicks, as well as its dense mask output, IS creates a challenging environment that demands comprehensive visual scene understanding. Our benchmarking experiments show that selecting appropriate upsampling strategies significantly improves the quality of VFM features. The code is released at https://github.com/havrylovv/iSegProbe.
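For context, a minimal sketch of where such an upsampling module sits in the pipeline, assuming a PyTorch setup and a ViT-style backbone with 14-pixel patches; the shapes and the bilinear baseline are illustrative assumptions, not the paper's specific method:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: a ViT-style VFM maps a 448x448 image to a 32x32 grid
# of patch tokens (patch size 14), i.e. low-resolution features.
batch, channels, h_feat, w_feat = 1, 768, 32, 32
vfm_features = torch.randn(batch, channels, h_feat, w_feat)

# A task-agnostic upsampler refines these features toward input resolution
# before a dense prediction head (e.g. an IS head that also consumes encoded
# user clicks) produces the mask. Bilinear interpolation is the simplest
# baseline; a learned upsampling module would replace this call.
upsampled = F.interpolate(
    vfm_features, size=(448, 448), mode="bilinear", align_corners=False
)

print(upsampled.shape)  # torch.Size([1, 768, 448, 448])
```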
Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger
Computing Technology, Computer Technology
Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger. Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation [EB/OL]. (2025-05-04) [2025-06-17]. https://arxiv.org/abs/2505.02075.