
CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition

Source: arXiv
Abstract

Continuous sign language recognition (CSLR) focuses on interpreting and transcribing sequences of sign language gestures in videos. In this work, we propose CLIP sign language adaptation (CLIP-SLA), a novel CSLR framework that adapts the powerful pre-trained visual encoder of the CLIP model to sign language tasks through parameter-efficient fine-tuning (PEFT). We introduce two variants, SLA-Adapter and SLA-LoRA, which integrate PEFT modules into the CLIP visual encoder, enabling fine-tuning with minimal trainable parameters. The effectiveness of the proposed framework is validated on four datasets: Phoenix2014, Phoenix2014-T, CSL-Daily, and Isharah-500, where both CLIP-SLA variants outperform several SOTA models with fewer trainable parameters. Extensive ablation studies emphasize the effectiveness and flexibility of the proposed methods with different vision-language models for CSLR. These findings showcase the potential of adapting large-scale pre-trained models for scalable and efficient CSLR, paving the way for future advancements in sign language understanding.
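
The abstract only names the two PEFT variants; as a rough illustration of the mechanisms they refer to, the sketch below shows a bottleneck adapter and a LoRA-wrapped linear layer in PyTorch, applied to a frozen linear projection standing in for a CLIP ViT layer. All class names, ranks, and dimensions here are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of the two PEFT mechanisms named in the abstract:
# a bottleneck adapter (SLA-Adapter style) and a low-rank update
# (SLA-LoRA style). Hyperparameters are placeholder assumptions.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, add a residual."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank residual."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the low-rank factors are trained
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # identity at initialization
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


# Usage: wrap a stand-in 768-d projection and count trainable parameters.
proj = LoRALinear(nn.Linear(768, 768), rank=8)
adapter = BottleneckAdapter(768)
x = torch.randn(2, 197, 768)          # dummy ViT patch-token sequence
out = adapter(proj(x))                # frozen backbone path + PEFT modules
trainable = sum(p.numel() for p in proj.parameters() if p.requires_grad)
total = sum(p.numel() for p in proj.parameters())
print(f"trainable: {trainable} / {total}")  # a small fraction of the total
```

Both modules are initialized so that the wrapped model starts out numerically identical to the frozen pre-trained encoder, which is the usual design choice that lets PEFT fine-tuning begin from the backbone's original behavior.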

Sarah Alyami, Hamzah Luqman

Computing Technology, Computer Technology

Sarah Alyami, Hamzah Luqman. CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition [EB/OL]. (2025-04-02) [2025-05-09]. https://arxiv.org/abs/2504.01666.
