Foldclass and Merizo-search: embedding-based deep learning tools for protein domain segmentation, fold recognition and comparison
Accurate computational methods have made very large numbers of protein structures available, which poses new challenges in storing and searching these structures and in detecting relationships between them. In particular, the new-found abundance of multi-domain structures in the AlphaFold structure database is difficult for traditional structure comparison methods to handle. We address these challenges with Foldclass, a fast, embedding-based structure comparison method that detects structural similarity between protein domains, and we demonstrate the accuracy of Foldclass embeddings for homology detection. We then combine Foldclass with Merizo, a recently developed deep learning-based automatic domain segmentation tool, to build Merizo-search, which first segments multi-domain query structures into domains and then searches a Foldclass embedding database to determine the top matches for each constituent domain. Because Merizo accurately segments complete chains into domains and Foldclass embeds and detects similar domains, Merizo-search can detect per-domain similarities for complete chains. We anticipate that these tools will enable a number of analyses using the wealth of predicted structural data now available. Foldclass and Merizo-search are available at https://github.com/psipred/merizo_search.
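The search step described above (one embedding per segmented domain, then a lookup of the top matches in an embedding database) can be illustrated with a minimal sketch. This is not the Foldclass or Merizo-search implementation: the cosine similarity score, the function names (`cosine_similarity`, `top_k_matches`, `search_chain`) and the three-dimensional toy vectors are all assumptions made for illustration, and the segmentation and embedding models are replaced by hand-written vectors.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query embedding."""
    scored = ((cosine_similarity(query, emb), name) for name, emb in database.items())
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(name, score) for score, name in best]


def search_chain(
    domain_embeddings: Dict[str, Sequence[float]],
    database: Dict[str, Sequence[float]],
    k: int = 1,
) -> Dict[str, List[Tuple[str, float]]]:
    """Search the database once per query domain and collect per-domain hits."""
    return {
        domain_id: top_k_matches(emb, database, k)
        for domain_id, emb in domain_embeddings.items()
    }


# Hypothetical toy data: a real pipeline would obtain these vectors from a
# segmentation model followed by an embedding model.
database = {
    "db_domain_A": [1.0, 0.0, 0.0],
    "db_domain_B": [0.0, 1.0, 0.0],
    "db_domain_C": [0.7, 0.7, 0.0],
}
query_domains = {
    "query_domain_1": [0.9, 0.1, 0.0],
    "query_domain_2": [0.1, 0.9, 0.0],
}

hits = search_chain(query_domains, database, k=2)
for domain_id, matches in hits.items():
    print(domain_id, [(name, round(score, 3)) for name, score in matches])
```

Each domain of the query chain is searched independently, so a two-domain chain can return hits from two unrelated folds. A brute-force scan like this is only practical for small databases; a search at the scale of the AlphaFold structure database would presumably need an indexed or batched similarity search.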
Kandathil Shaun M, Lau Andy M C, Jones David T
Biological science research methods and techniques; computing and computer technology; bioengineering
Kandathil Shaun M, Lau Andy M C, Jones David T. Foldclass and Merizo-search: embedding-based deep learning tools for protein domain segmentation, fold recognition and comparison [EB/OL]. (2025-03-28) [2025-06-04]. https://www.biorxiv.org/content/10.1101/2024.03.25.586696.