
ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers

Source: arXiv
Abstract

Predicting the future citation rates of academic papers is an important step toward the automation of research evaluation and the acceleration of scientific progress. We present $\textbf{ForeCite}$, a simple but powerful framework that augments pre-trained causal language models with a linear head to predict average monthly citation rates. By adapting transformers to regression, ForeCite achieves a test correlation of $\rho = 0.826$ on a curated dataset of 900K+ biomedical papers published between 2000 and 2024, a 27-point improvement over the previous state-of-the-art. Comprehensive scaling-law analysis reveals consistent gains across model sizes and data volumes, while temporal holdout experiments confirm practical robustness. Gradient-based saliency heatmaps suggest a potentially undue reliance on titles and abstract texts. These results establish a new state-of-the-art in forecasting the long-term influence of academic research and lay the groundwork for the automated, high-fidelity evaluation of scientific contributions.
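
The regression setup described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names, the whole-month rounding of paper age, the use of a single pooled hidden-state vector, and the choice of Pearson correlation are all assumptions made for illustration; the abstract does not state how the rate is computed, how hidden states are pooled, or which correlation $\rho$ denotes.

```python
from datetime import date
from math import sqrt
from typing import Sequence


def monthly_citation_rate(citations: int, published: date, observed: date) -> float:
    """Average citations per month (assumed target definition, whole months)."""
    months = (observed.year - published.year) * 12 + (observed.month - published.month)
    return citations / max(months, 1)  # guard against zero elapsed months


def linear_head(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map one pooled hidden-state vector to a scalar prediction (hypothetical head)."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predictions and targets (assumed metric)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy usage: the hidden vectors stand in for language-model outputs.
targets = [
    monthly_citation_rate(120, date(2020, 1, 1), date(2025, 1, 1)),
    monthly_citation_rate(30, date(2020, 1, 1), date(2025, 1, 1)),
    monthly_citation_rate(6, date(2024, 1, 1), date(2025, 1, 1)),
]
hidden_states = [[1.0, 2.0], [0.2, 0.6], [0.1, 0.5]]
predictions = [linear_head(h, [0.5, 0.25], 0.1) for h in hidden_states]
score = pearson(predictions, targets)
```

In a real system the hidden vectors would come from the language model, and the head's weights would be learned jointly with or on top of it; here they are fixed toy values so the arithmetic can be checked by hand.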

Gavin Hull, Alex Bihlo

Subjects: Medical research methods; Computing technology, computer technology

Gavin Hull, Alex Bihlo. ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers [EB/OL]. (2025-05-13) [2025-06-03]. https://arxiv.org/abs/2505.08941.
