Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning
Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in model size, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we bring parameter-efficient fine-tuning methods to proteomics for the first time. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning for groups with limited computational resources.
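To make the approach concrete, the following is a minimal sketch (not the authors' code) of LoRA-based parameter-efficient fine-tuning of a protein language model for a sequence classification task, assuming the HuggingFace `transformers` and `peft` libraries and the public ESM-2 checkpoint `facebook/esm2_t12_35M_UR50D`; the rank, alpha, and target-module choices shown are illustrative assumptions, not the paper's reported settings.

```python
# Minimal LoRA fine-tuning sketch for a protein language model (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

model_name = "facebook/esm2_t12_35M_UR50D"  # small public ESM-2 checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# LoRA injects small low-rank update matrices into selected weight matrices;
# only these adapters (plus the classification head) receive gradients,
# so the trainable parameter count is a tiny fraction of the full model.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                 # rank of the low-rank update (hypothetical value)
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],   # attention projection names in ESM-2
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # typically well under 1% of total parameters

# One toy training step on a dummy protein sequence and label.
batch = tokenizer(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"], return_tensors="pt")
labels = torch.tensor([1])
out = model(**batch, labels=labels)
out.loss.backward()
```

Freezing the language model entirely and training only the classification head, the even cheaper baseline highlighted in the abstract, corresponds to skipping the LoRA step and setting `requires_grad = False` on every parameter outside the head.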
Baek Minkyung, Kshirsagar Meghana, Dodhia Rahul, Lavista Ferres Juan, Sledzieski Samuel, Berger Bonnie
Subject areas: Biological Research Methods and Techniques; Computing and Computer Technology; Molecular Biology
Baek Minkyung, Kshirsagar Meghana, Dodhia Rahul, Lavista Ferres Juan, Sledzieski Samuel, Berger Bonnie. Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning [EB/OL]. (2025-03-28) [2025-08-03]. https://www.biorxiv.org/content/10.1101/2023.11.09.566187.