Delving into LLM-assisted writing in biomedical publications through excess vocabulary

Source: arXiv
Abstract

Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.
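The abstract does not spell out how an "excess" word frequency is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a word's frequency is the fraction of abstracts containing it in a given year, and that the counterfactual (expected) frequency is obtained by linearly extrapolating from two pre-LLM years. The function name `excess_frequency`, the choice of baseline years, and the toy numbers are all hypothetical.

```python
def excess_frequency(freq_by_year, target_year, base_years):
    """Compare a word's observed frequency with a linear-trend expectation.

    freq_by_year: dict mapping year -> fraction of abstracts containing the word.
    base_years:   (earlier, later) pair of pre-LLM years used for extrapolation
                  (an assumption of this sketch, not taken from the abstract).
    Returns (observed p, expected q, gap p - q, ratio p / q).
    """
    y0, y1 = base_years
    # Linear trend through the two baseline years.
    slope = (freq_by_year[y1] - freq_by_year[y0]) / (y1 - y0)
    # Extrapolate the trend to the target year; a frequency cannot be negative.
    q = max(freq_by_year[y1] + slope * (target_year - y1), 0.0)
    p = freq_by_year[target_year]
    gap = p - q
    ratio = p / q if q > 0 else float("inf")
    return p, q, gap, ratio


# Hypothetical frequencies for one style word (illustrative only, not real data).
toy = {2021: 0.0010, 2022: 0.0012, 2024: 0.0100}
p, q, gap, ratio = excess_frequency(toy, target_year=2024, base_years=(2021, 2022))
print(f"observed={p:.4f} expected={q:.4f} gap={gap:.4f} ratio={ratio:.2f}")
```

Under these assumptions, a positive gap for a word means that at least that fraction of abstracts contains the word beyond what the earlier trend predicts, which is why an excess of this kind can be read as a lower bound rather than an exact estimate.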

Dmitry Kobak, Emőke-Ágnes Horvát, Jan Lause, Rita González-Márquez

DOI: 10.1126/sciadv.adt3813

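Subject categories: medical and health theory; medical research methods; current state and development of the biological sciences; research methods and techniques in the biological sciences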

Dmitry Kobak, Emőke-Ágnes Horvát, Jan Lause, Rita González-Márquez. Delving into LLM-assisted writing in biomedical publications through excess vocabulary [EB/OL]. (2025-07-03) [2025-07-21]. https://arxiv.org/abs/2406.07016.
