
Natural Language Guided Ligand-Binding Protein Design

Source: arXiv
Abstract

Can AI protein models follow human language instructions and design proteins with desired functions (e.g., binding to a ligand)? Designing proteins that bind to a given ligand is crucial in a wide range of applications in biology and chemistry. Most prior AI models are trained on protein-ligand complex data, which is scarce due to the high cost and time requirements of laboratory experiments. In contrast, there is a substantial body of human-curated text descriptions of protein-ligand interactions and ligand formulas. In this paper, we propose InstructPro, a family of protein generative models that follow natural language instructions to design ligand-binding proteins. Given a textual description of the desired function and a ligand formula in SMILES, InstructPro generates protein sequences that are functionally consistent with the specified instructions. We develop the model architecture, the training strategy, and a large-scale dataset, InstructProBench, to support both training and evaluation. InstructProBench consists of 9,592,829 triples of (function description, ligand formula, protein sequence). We train two model variants: InstructPro-1B (with 1 billion parameters) and InstructPro-3B (with 3 billion parameters). Both variants consistently outperform strong baselines, including ProGen2, ESM3, and Pinal. Notably, InstructPro-1B achieves the highest docking success rate (81.52% at moderate confidence) and the lowest average root mean square deviation (RMSD) relative to ground truth structures (4.026 Å). InstructPro-3B further decreases the average RMSD to 2.527 Å, demonstrating InstructPro's ability to generate ligand-binding proteins that align with the functional specifications.
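For readers unfamiliar with the RMSD metric quoted above, the following is a minimal sketch of how a root mean square deviation is computed between two sets of atomic coordinates. It assumes the two structures are already superimposed and have matching atoms in the same order; the abstract does not describe the alignment or atom selection used in the paper, so this illustrates only the definition of the metric, not the authors' evaluation pipeline. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already aligned and that point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in angstroms (not data from the paper):
# the "predicted" points are the reference points shifted 2 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
predicted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
toy_rmsd = rmsd(reference, predicted)
```

In this toy case every atom is displaced by the same 2 Å, so the RMSD equals 2 Å. In practice an optimal superposition step typically precedes the calculation, which this sketch deliberately leaves out.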

Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li

Subjects: Biological science research methods and techniques; Biochemistry; Molecular biology; Bioengineering

Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li. Natural Language Guided Ligand-Binding Protein Design [EB/OL]. (2025-06-10) [2025-07-21]. https://arxiv.org/abs/2506.09332.
