Visual Instance-aware Prompt Tuning

Source: arXiv
Abstract

Visual Prompt Tuning (VPT) has emerged as a parameter-efficient fine-tuning paradigm for vision transformers, with conventional approaches using dataset-level prompts that remain the same across all input instances. We observe that this strategy yields sub-optimal performance because of high variance in downstream datasets. To address this challenge, we propose Visual Instance-aware Prompt Tuning (ViaPT), which generates instance-aware prompts from each individual input and fuses them with dataset-level prompts, leveraging Principal Component Analysis (PCA) to retain important prompting information. Moreover, we show conceptually that VPT-Deep and VPT-Shallow represent two corner cases that fail to effectively capture instance-specific information, while random dimension reduction on prompts only yields performance between the two extremes. In contrast, ViaPT overcomes these limitations by balancing dataset-level and instance-level knowledge while reducing the number of learnable parameters compared to VPT-Deep. Extensive experiments across 34 diverse datasets demonstrate that our method consistently outperforms state-of-the-art baselines, establishing a new paradigm for analyzing and optimizing visual prompts for vision transformers.
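
The abstract does not spell out how the PCA step or the fusion is implemented, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that instance-level prompt tokens arrive as an `(n, d)` array, that PCA is used to compress the token axis from `n` to `k`, and that "fusing" means concatenating the result with the dataset-level prompts. The function names `pca_reduce_tokens` and `fuse_prompts` are hypothetical.

```python
import numpy as np

def pca_reduce_tokens(tokens: np.ndarray, k: int) -> np.ndarray:
    """Compress n prompt tokens of width d into k tokens via PCA.

    Assumption: the n tokens are treated as features and the d embedding
    coordinates as samples, so PCA shrinks the token axis (n -> k).
    Requires k <= min(n, d).
    """
    samples = tokens.T                                    # (d, n)
    centered = samples - samples.mean(axis=0, keepdims=True)
    # Thin SVD: the columns of u scaled by s are the principal-component scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :k] * s[:k]                             # (d, k)
    return scores.T                                       # (k, d)

def fuse_prompts(dataset_prompts: np.ndarray,
                 instance_tokens: np.ndarray,
                 k: int) -> np.ndarray:
    """Assumed fusion rule: concatenate shared and PCA-reduced instance prompts."""
    reduced = pca_reduce_tokens(instance_tokens, k)
    return np.concatenate([dataset_prompts, reduced], axis=0)

# Toy data standing in for learned and generated prompts.
rng = np.random.default_rng(0)
dataset_prompts = rng.normal(size=(4, 16))    # shared across all inputs
instance_tokens = rng.normal(size=(10, 16))   # produced per input image
fused = fuse_prompts(dataset_prompts, instance_tokens, k=3)
print(fused.shape)  # (7, 16)
```

In this sketch the fused sequence would be prepended to the patch tokens of a vision transformer layer. The dataset-level rows are shared by every input, while the last `k` rows change with each image.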

Xi Xiao, Yunbei Zhang, Xingjian Li, Tianyang Wang, Xiao Wang, Yuxiang Wei, Jihun Hamm, Min Xu

Subject: Computing technology, computer technology

Xi Xiao, Yunbei Zhang, Xingjian Li, Tianyang Wang, Xiao Wang, Yuxiang Wei, Jihun Hamm, Min Xu. Visual Instance-aware Prompt Tuning [EB/OL]. (2025-07-10) [2025-07-25]. https://arxiv.org/abs/2507.07796.
