Cross-Modality and Self-Supervised Protein Embedding for Compound–Protein Affinity and Contact Prediction
Abstract

Motivation: Computational methods for compound–protein affinity and contact (CPAC) prediction aim to facilitate rational drug discovery by simultaneously predicting the strength and the pattern of compound–protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often forces structure-free methods to rely on protein sequence inputs alone. The scarcity of compound–protein pairs with affinity and contact labels further limits the accuracy and generalizability of CPAC models.

Results: To overcome these challenges of structure naivety and labelled-data scarcity, we introduce cross-modality and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in two modalities, 1D amino-acid sequences and predicted 2D contact maps, which are separately embedded with recurrent and graph neural networks, respectively, as well as jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pretrained under various self-supervised learning strategies by leveraging massive amounts of unlabelled protein data. Our results indicate that individual protein modalities differ in their strengths in predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins.

Availability: Data and source code are available at https://github.com/Shen-Lab/CPAC.

Contact: yshen@tamu.edu

Supplementary information: Supplementary data are included.
Shen Yang, You Yuning
Department of Electrical and Computer Engineering, Texas A&M University; Department of Computer Science and Engineering, Texas A&M University
Biological science research methods and techniques; computing and computer technology; biological science theory and methods
Shen Yang, You Yuning. Cross-Modality and Self-Supervised Protein Embedding for Compound–Protein Affinity and Contact Prediction[EB/OL]. (2025-03-28)[2025-08-02]. https://www.biorxiv.org/content/10.1101/2022.07.18.500559.