National Preprint Platform

Semantical and Geometrical Protein Encoding Toward Enhanced Bioactivity and Thermostability

Source: bioRxiv
Abstract

Protein engineering is a pivotal aspect of synthetic biology: it modifies amino acids within existing protein sequences to achieve novel or enhanced functions and physical properties. Accurately predicting the effects of protein variants requires a thorough understanding of protein sequence, structure, and function. Deep learning methods have shown remarkable performance in guiding protein modification toward improved functionality. However, existing approaches rely predominantly on protein sequences; they struggle to encode the geometry of each amino acid's local environment efficiently and often fail to capture crucial details of folding stability, internal molecular interactions, and biological function. Furthermore, existing methods lack a fundamental evaluation of their ability to predict protein thermostability, even though it is a key physical property that is frequently investigated in practice. To address these challenges, this paper introduces a novel pre-training framework that integrates sequential and geometric encoders for protein primary and tertiary structures. The framework guides mutations toward desired traits by simulating natural selection on wild-type proteins, and it evaluates variant effects according to their fitness for performing specific functions. We assess the proposed approach on three benchmarks comprising more than 300 deep mutational scanning assays. Across extensive experiments, its predictions show exceptional performance relative to other zero-shot learning methods while requiring only a minimal number of trainable parameters. This study not only proposes an effective framework for more accurate and comprehensive predictions that facilitate efficient protein engineering, but also strengthens the in silico assessment system so that future deep learning models align better with empirical requirements.
The PyTorch implementation is available at https://github.com/tyang816/ProtSSN.
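The abstract does not state the exact scoring rule, so the following is only a sketch of one common zero-shot convention, assumed here rather than taken from the paper: a variant is scored by the log-odds ratio summed over its mutated sites, score = Σ_i [log p(mutant aa at i) − log p(wild-type aa at i)], where p is a model's per-position amino-acid distribution for the wild-type protein. Predictions are then compared with deep mutational scanning measurements using Spearman's rank correlation, the metric such benchmarks commonly report. The helpers `parse_mutation`, `log_odds_score`, `average_ranks`, and `spearman` are hypothetical, the `K2R` / `K2R:V3I` mutation notation is an assumed convention, and every probability and assay value in the toy example is invented.

```python
import math
from typing import Dict, List, Sequence, Tuple


def parse_mutation(code: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'K2R' into (wild-type aa, 1-based position, mutant aa)."""
    return code[0], int(code[1:-1]), code[-1]


def log_odds_score(probs: Sequence[Dict[str, float]], wild_type: str, mutant: str) -> float:
    """Hypothetical zero-shot score: sum of log p(mutant aa) - log p(wild-type aa) per site.

    probs[i] holds a model's amino-acid probabilities at position i + 1;
    multi-site variants are written as 'K2R:V3I' (assumed notation).
    """
    score = 0.0
    for code in mutant.split(":"):
        wt, pos, mt = parse_mutation(code)
        if wild_type[pos - 1] != wt:
            raise ValueError(f"{code}: wild type has {wild_type[pos - 1]} at position {pos}")
        site = probs[pos - 1]
        score += math.log(site[mt]) - math.log(site[wt])
    return score


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy example: probabilities and assay values are invented, not model or DMS output.
wild_type = "MKV"
probs = [
    {"M": 0.7, "L": 0.2, "A": 0.1},  # truncated distributions, for brevity
    {"K": 0.5, "R": 0.4, "E": 0.1},
    {"V": 0.6, "I": 0.3, "D": 0.1},
]
variants = ["M1L", "K2R", "K2E", "V3I", "V3D", "K2R:V3I"]
assay = [0.8, 0.9, 0.2, 0.85, 0.1, 0.7]  # hypothetical fitness measurements
scores = [log_odds_score(probs, wild_type, v) for v in variants]
print(round(spearman(scores, assay), 3))  # → 0.943
```

A rank correlation is used because DMS readouts differ in scale and units from one assay to another, so only the ordering of variants is compared; the additive sum over sites is the simplest assumption for multi-site variants and ignores epistasis.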

Fan Guisheng, Hong Liang, Zheng Lirong, Tan Yang, Zhou Bingxin

10.1101/2023.12.01.569522

Subjects: Bioengineering; Biological research methods; Biological research techniques; Biophysics

Fan Guisheng,Hong Liang,Zheng Lirong,Tan Yang,Zhou Bingxin.Semantical and Geometrical Protein Encoding Toward Enhanced Bioactivity and Thermostability[EB/OL].(2025-03-28)[2025-05-01].https://www.biorxiv.org/content/10.1101/2023.12.01.569522.