Highly accurate discovery of terpene synthases powered by machine learning reveals functional terpene cyclization in Archaea
Abstract Terpene synthases (TPSs) generate the scaffolds of the largest class of natural products, including several first-line medicines. The number of available protein sequences is growing exponentially, and accurate computational characterization of their function remains an unsolved challenge. We assembled a curated dataset of one thousand characterized TPS reactions and developed a method to devise highly accurate machine-learning models for functional annotation in a low-data regime. Our models significantly outperform existing methods for TPS detection and substrate prediction. By applying the models to large protein sequence databases, we discovered seven TPS enzymes previously undetected by state-of-the-art protein signatures and experimentally confirmed their activity, including the first reported TPSs in Archaea, a major domain of life. Furthermore, we discovered a new TPS structural domain and distinct subtypes of previously known domains. This work demonstrates the potential of machine learning to speed up the discovery and characterization of novel TPSs.
Samusevich Raman, Smrčková Helena, Pluskal Tomáš, Sivic Josef, Tajovská Adéla, Chatpatanasiri Ratthachat, Kulhánek Jonáš, Engst Martin, Hebra Téo, Čalounová Tereza, Bushuiev Roman, Perković Milana, Bushuiev Anton
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague || Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University in Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University in Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University in Prague; Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University in Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague || Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University in Prague; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague; Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University in Prague
Biological science research methods and techniques; Biochemistry; Molecular biology
Samusevich Raman, Smrčková Helena, Pluskal Tomáš, Sivic Josef, Tajovská Adéla, Chatpatanasiri Ratthachat, Kulhánek Jonáš, Engst Martin, Hebra Téo, Čalounová Tereza, Bushuiev Roman, Perković Milana, Bushuiev Anton. Highly accurate discovery of terpene synthases powered by machine learning reveals functional terpene cyclization in Archaea [EB/OL]. (2025-03-28) [2025-04-29]. https://www.biorxiv.org/content/10.1101/2024.01.29.577750