solPredict: Antibody apparent solubility prediction from sequence by transfer learning

Source: bioRxiv
Abstract

There is growing interest in developing therapeutic mAbs for subcutaneous administration for several reasons, including patient convenience and compliance. This requires identifying mAbs with superior solubility that are amenable to high-concentration formulation development. However, early selection of developable antibodies with optimal high-concentration attributes remains challenging. Since experimental screening is often material and labor intensive, there is significant interest in developing robust in silico tools capable of screening thousands of molecules based on sequence information alone. In this paper, we present a strategy applying protein language modeling, named solPredict, to predict the apparent solubility of mAbs in histidine buffer (pH 6.0). solPredict feeds embeddings, extracted from a pretrained protein language model using single sequences, into a shallow neural network. A dataset of 220 diverse, in-house mAbs, with extrapolated protein solubility data obtained from the PEG-induced precipitation method, was used for model training and hyperparameter tuning through five-fold cross validation. An independent test set of 40 mAbs was used for model evaluation. solPredict achieves high correlation with experimental data (Spearman correlation coefficient = 0.86, Pearson correlation coefficient = 0.84, R² = 0.69, and RMSE = 4.40). The output from solPredict directly corresponds to experimental solubility measurements (PEG %) and enables quantitative interpretation of results. This approach eliminates the need for 3D structure modeling of mAbs, descriptor computation, and expert-crafted input features. The minimal computational expense of solPredict enables rapid, large-scale, and high-throughput screening of mAbs during early antibody discovery.
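The four evaluation metrics quoted in the abstract can be computed from paired measured and predicted values as in the minimal sketch below. It is not the authors' code: the function names (`rankdata`, `pearson`, `regression_metrics`) are hypothetical, R² is assumed to be the coefficient of determination rather than the squared Pearson coefficient, and the sample values are made up purely for illustration.

```python
import numpy as np


def rankdata(x):
    """Return 1-based ranks, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j to cover the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))


def regression_metrics(y_true, y_pred):
    """Spearman, Pearson, R^2 (coefficient of determination; an assumption) and RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = float(residual @ residual)
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return {
        # Spearman is the Pearson correlation of the ranks.
        "spearman": pearson(rankdata(y_true), rankdata(y_pred)),
        "pearson": pearson(y_true, y_pred),
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(residual ** 2))),
    }


# Illustrative values only (not data from the paper), in PEG % units.
measured = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 14.0, 22.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

Because RMSE is expressed in the same units as the measurement (PEG %), it can be read directly as a typical prediction error, whereas the two correlation coefficients only describe how well the ranking and linear trend are preserved.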

Jiang Min, Chai Qing, Shih James, Feng Jiangyan

Advanced Analytics and Data Sciences, Eli Lilly Corporate Center; BioTechnology Discovery Research, Eli Lilly Biotechnology Center; BioTechnology Discovery Research, Eli Lilly Biotechnology Center; BioTechnology Discovery Research, Eli Lilly Biotechnology Center

DOI: 10.1101/2021.12.07.471655

Subjects: medical and health theory; medical research methods; pharmacy

Keywords: antibody solubility; developability; protein language modeling; transfer learning; machine learning

Jiang Min, Chai Qing, Shih James, Feng Jiangyan. solPredict: Antibody apparent solubility prediction from sequence by transfer learning [EB/OL]. (2025-03-28) [2025-04-28]. https://www.biorxiv.org/content/10.1101/2021.12.07.471655.
