ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers

Abstract

Predicting the future citation rates of academic papers is an important step toward the automation of research evaluation and the acceleration of scientific progress. We present ForeCite, a simple but powerful framework that augments pre-trained causal language models with a linear head to predict average monthly citation rates. Adapting transformers for regression tasks, ForeCite achieves a test correlation of $\rho = 0.826$ on a curated dataset of 900K+ biomedical papers published between 2000 and 2024, a 27-point improvement over the previous state-of-the-art. Comprehensive scaling-law analysis reveals consistent gains across model sizes and data volumes, while temporal holdout experiments confirm practical robustness. Gradient-based saliency heatmaps suggest a potentially undue reliance on titles and abstract texts. These results establish a new state-of-the-art in forecasting the long-term influence of academic research and lay the groundwork for the automated, high-fidelity evaluation of scientific contributions.
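The regression setup described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The hidden-state vector stands in for whatever pooled representation a pre-trained causal language model would produce, and all function names are hypothetical. The definition of the target (citations divided by months since publication) and the use of Pearson correlation for $\rho$ are assumptions, since the abstract specifies neither.

```python
from datetime import date
from math import sqrt


def average_monthly_citation_rate(citations: int, published: date, observed: date) -> float:
    """Citations per month between publication and observation (assumed definition)."""
    months = (observed.year - published.year) * 12 + (observed.month - published.month)
    months = max(months, 1)  # guard against division by zero for very recent papers
    return citations / months


def linear_head(hidden_state: list[float], weights: list[float], bias: float) -> float:
    """Map one pooled hidden-state vector to a single scalar prediction."""
    return sum(h * w for h, w in zip(hidden_state, weights)) + bias


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predictions and targets (assumed metric).

    Undefined when either input is constant, because its spread is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy usage: a paper with 120 citations, observed 60 months after publication.
target = average_monthly_citation_rate(120, date(2020, 1, 1), date(2025, 1, 1))
# Hypothetical 2-dimensional hidden state and head parameters.
prediction = linear_head([1.0, 2.0], [0.5, -0.25], 0.1)
```

In a real system the weights and bias would be trained jointly with, or on top of, the language model, and the hidden state would have the model's full embedding width. The sketch only shows how a single linear layer turns a representation into a scalar citation-rate estimate.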

Authors: Gavin Hull, Alex Bihlo

Subject classification: medical research methods; computing technology and computer technology

Recommended citation: Gavin Hull, Alex Bihlo. ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers [EB/OL]. (2025-05-13) [2025-06-03]. https://arxiv.org/abs/2505.08941.
