CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models
Existing protein language models (PLMs) generate protein sequences based on a single-condition constraint from a specific modality, and struggle to satisfy multiple constraints across different modalities simultaneously. In this work, we introduce CFP-Gen, a novel diffusion language model for Combinatorial Functional Protein GENeration. CFP-Gen facilitates de novo protein design by integrating multimodal conditions with functional, sequence, and structural constraints. Specifically, an Annotation-Guided Feature Modulation (AGFM) module is introduced to dynamically adjust the protein feature distribution based on composable functional annotations, e.g., GO terms, IPR domains, and EC numbers. Meanwhile, the Residue-Controlled Functional Encoding (RCFE) module captures residue-wise interactions to ensure more precise control. Additionally, off-the-shelf 3D structure encoders can be seamlessly integrated to impose geometric constraints. We demonstrate that CFP-Gen enables high-throughput generation of novel proteins with functionality comparable to natural proteins, while achieving a high success rate in designing multifunctional proteins. Code and data are available at https://github.com/yinjunbo/cfpgen.
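To make the idea of adjusting a feature distribution with composable annotations more concrete, the following is a minimal sketch, not the AGFM implementation from the paper. It assumes a FiLM-style scheme in which the embeddings of all active annotations are summed and then split into a per-channel scale and shift applied to layer-normalized residue features. The function names, the random embedding table, the feature dimension, and the annotation labels are all hypothetical and chosen only for illustration.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def make_embedding_table(labels, dim, seed=0):
    """Hypothetical lookup table: one random vector of size 2*dim per label.

    A trained model would learn these embeddings; random values stand in here.
    """
    rng = random.Random(seed)
    return {label: [rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for label in labels}


def compose_annotations(table, labels, dim):
    """Sum the embeddings of all active annotations (no labels -> zero vector)."""
    cond = [0.0] * (2 * dim)
    for label in labels:
        cond = [c + e for c, e in zip(cond, table[label])]
    return cond


def modulate(residue_features, cond):
    """Assumed FiLM-style modulation: normed * (1 + gamma) + beta per residue."""
    dim = len(cond) // 2
    gamma, beta = cond[:dim], cond[dim:]  # first half scales, second half shifts
    out = []
    for vec in residue_features:
        normed = layer_norm(vec)
        out.append([n * (1.0 + g) + b for n, g, b in zip(normed, gamma, beta)])
    return out


DIM = 4
# Illustrative annotation labels only (one GO term, one IPR domain, one EC number).
table = make_embedding_table(["GO:0003824", "IPR000001", "EC:3.1.1.1"], DIM)
features = [[0.5, -1.0, 2.0, 0.1], [1.5, 0.3, -0.7, 0.0]]  # two residues

cond = compose_annotations(table, ["GO:0003824", "EC:3.1.1.1"], DIM)
conditioned = modulate(features, cond)
unconditioned = modulate(features, compose_annotations(table, [], DIM))
```

In this sketch, summing embeddings is what makes the conditions composable: any subset of annotations yields a valid conditioning vector, and an empty subset reduces to plain layer normalization, which corresponds to unconditional generation.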
Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, Xin Gao
Biological science research methods and techniques; Bioengineering
Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, Xin Gao. CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models [EB/OL]. (2025-05-28) [2025-06-07]. https://arxiv.org/abs/2505.22869.