Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning
Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in model size, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we bring parameter-efficient fine-tuning methods to proteomics for the first time. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning for groups with limited computational resources.
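To make the approach concrete, the following is a minimal sketch (not the authors' code) of LoRA-based parameter-efficient fine-tuning of a protein language model for a sequence classification task, assuming the HuggingFace `transformers` and `peft` libraries and the public ESM-2 checkpoint `facebook/esm2_t12_35M_UR50D`; the rank, alpha, and target-module choices shown are illustrative assumptions, not the paper's reported settings.

```python
# Minimal LoRA fine-tuning sketch for a protein language model (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

model_name = "facebook/esm2_t12_35M_UR50D"  # small public ESM-2 checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# LoRA injects small low-rank update matrices into selected weight matrices;
# only these adapters (plus the classification head) receive gradients,
# so the trainable parameter count is a tiny fraction of the full model.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                 # rank of the low-rank update (hypothetical value)
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],   # attention projection names in ESM-2
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # typically well under 1% of total parameters

# One toy training step on a dummy protein sequence and label.
batch = tokenizer(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"], return_tensors="pt")
labels = torch.tensor([1])
out = model(**batch, labels=labels)
out.loss.backward()
```

Freezing the language model entirely and training only the classification head, the even cheaper baseline highlighted in the abstract, corresponds to skipping the LoRA step and setting `requires_grad = False` on every parameter outside the head.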
Baek Minkyung, Kshirsagar Meghana, Dodhia Rahul, Lavista Ferres Juan, Sledzieski Samuel, Berger Bonnie
Subject areas: Biological Research Methods and Techniques; Computing and Computer Technology; Molecular Biology
Baek Minkyung, Kshirsagar Meghana, Dodhia Rahul, Lavista Ferres Juan, Sledzieski Samuel, Berger Bonnie. Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning [EB/OL]. (2025-03-28) [2025-08-03]. https://www.biorxiv.org/content/10.1101/2023.11.09.566187.