Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks. As the popularity of VFMs grows, there is increasing interest in understanding their effectiveness for dense prediction tasks. However, VFMs typically produce low-resolution features, limiting their direct applicability in this context. One way to tackle this limitation is to employ a task-agnostic feature upsampling module that refines the resolution of VFM features. To assess the effectiveness of this approach, we investigate Interactive Segmentation (IS) as a novel benchmark for evaluating feature upsampling methods on VFMs. Due to its inherently multimodal input, consisting of an image and a set of user-defined clicks, as well as its dense mask output, IS creates a challenging environment that demands comprehensive visual scene understanding. Our benchmarking experiments show that selecting appropriate upsampling strategies significantly improves the quality of VFM features. The code is released at https://github.com/havrylovv/iSegProbe.
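For context, a minimal sketch of where such an upsampling module sits in the pipeline, assuming a PyTorch setup and a ViT-style backbone with 14-pixel patches; the shapes and the bilinear baseline are illustrative assumptions, not the paper's specific method:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: a ViT-style VFM maps a 448x448 image to a 32x32 grid
# of patch tokens (patch size 14), i.e. low-resolution features.
batch, channels, h_feat, w_feat = 1, 768, 32, 32
vfm_features = torch.randn(batch, channels, h_feat, w_feat)

# A task-agnostic upsampler refines these features toward input resolution
# before a dense prediction head (e.g. an IS head that also consumes encoded
# user clicks) produces the mask. Bilinear interpolation is the simplest
# baseline; a learned upsampling module would replace this call.
upsampled = F.interpolate(
    vfm_features, size=(448, 448), mode="bilinear", align_corners=False
)

print(upsampled.shape)  # torch.Size([1, 768, 448, 448])
```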
Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger
Computing Technology, Computer Technology
Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger. Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation [EB/OL]. (2025-05-04) [2025-06-17]. https://arxiv.org/abs/2505.02075.