
Adapting Pretrained Language Models for Citation Classification via Self-Supervised Contrastive Learning

Source: arXiv
Abstract

Citation classification, which identifies the intention behind academic citations, is pivotal for scholarly analysis. Previous works suggest fine-tuning pretrained language models (PLMs) on citation classification datasets, leveraging the linguistic knowledge gained during pretraining. However, directly fine-tuning for citation classification is challenging due to labeled-data scarcity, contextual noise, and spurious keyphrase correlations. In this paper, we present a novel framework, Citss, that adapts PLMs to overcome these challenges. Citss introduces self-supervised contrastive learning to alleviate data scarcity and is equipped with two specialized strategies for obtaining contrastive pairs: sentence-level cropping, which enhances focus on the target citation within long contexts, and keyphrase perturbation, which mitigates reliance on specific keyphrases. Unlike previous works that are designed only for encoder-based PLMs, Citss is carefully developed to be compatible with both encoder-based PLMs and decoder-based LLMs, embracing the benefits of enlarged pretraining. Experiments on three benchmark datasets, with both encoder-based PLMs and decoder-based LLMs, demonstrate the superiority of Citss over the previous state of the art. Our code is available at: github.com/LITONG99/Citss
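The abstract names two strategies for building self-supervised contrastive pairs: sentence-level cropping and keyphrase perturbation. The sketch below is a minimal illustration of how such views and a standard contrastive (NT-Xent) loss could be assembled; the function names, crop window, masking probability, and mask token are illustrative assumptions, not the authors' Citss implementation (see the linked repository for that).

```python
# Minimal sketch (not the authors' implementation) of contrastive-pair
# construction for citation contexts. All hyperparameters are assumptions.
import random
import re

import numpy as np


def sentence_crop(context_sentences, citation_idx, window=1):
    """Keep the citing sentence plus a small random window of neighbors,
    so the view stays focused on the target citation."""
    left = max(0, citation_idx - random.randint(0, window))
    right = min(len(context_sentences), citation_idx + 1 + random.randint(0, window))
    return " ".join(context_sentences[left:right])


def keyphrase_perturb(text, keyphrases, mask_token="[MASK]", p=0.5):
    """Randomly mask known keyphrases so the encoder cannot rely on
    spurious keyphrase correlations."""
    for kp in keyphrases:
        if random.random() < p:
            text = re.sub(re.escape(kp), mask_token, text, flags=re.IGNORECASE)
    return text


def nt_xent(z1, z2, temperature=0.1):
    """Normalized-temperature cross-entropy loss over two batches of
    embeddings (shape [batch, dim]); row i of z1 pairs with row i of z2."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    n = z1.shape[0]
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()
```

In use, two independently cropped and perturbed views of each citation context would be encoded by the PLM, and the resulting embedding batches fed to `nt_xent` as `z1` and `z2`, pulling views of the same context together while pushing apart views of different contexts.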

Tong Li, Jiachuan Wang, Yongqi Zhang, Shuangyin Li, Lei Chen

DOI: 10.1145/3711896.3736829

Subject: Computing Technology, Computer Technology

Tong Li, Jiachuan Wang, Yongqi Zhang, Shuangyin Li, Lei Chen. Adapting Pretrained Language Models for Citation Classification via Self-Supervised Contrastive Learning [EB/OL]. (2025-05-20) [2025-06-18]. https://arxiv.org/abs/2505.14471
