Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
Designing regulatory DNA sequences that achieve precise cell-type-specific gene expression is crucial for advancements in synthetic biology, gene therapy and precision medicine. Although transformer-based language models (LMs) can effectively capture patterns in regulatory DNA, their generative approaches often struggle to produce novel sequences with reliable cell-specific activity. Here, we introduce Ctrl-DNA, a novel constrained reinforcement learning (RL) framework tailored for designing regulatory DNA sequences with controllable cell-type specificity. By formulating regulatory sequence design as a biologically informed constrained optimization problem, we apply RL to autoregressive genomic LMs, enabling the models to iteratively refine sequences that maximize regulatory activity in targeted cell types while constraining off-target effects. Our evaluation on human promoters and enhancers demonstrates that Ctrl-DNA consistently outperforms existing generative and RL-based approaches, generating high-fitness regulatory sequences and achieving state-of-the-art cell-type specificity. Moreover, Ctrl-DNA-generated sequences capture key cell-type-specific transcription factor binding sites (TFBS), short DNA motifs recognized by regulatory proteins that control gene expression, demonstrating the biological plausibility of the generated sequences.
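The constrained objective described above (maximize activity in the target cell type while keeping off-target activity bounded) can be illustrated with a small sketch. The abstract does not say how the constraint is enforced, so the Lagrangian-style penalty below is an assumption for illustration only, not the authors' method. The function names, thresholds, multipliers, and step size are all hypothetical.

```python
from typing import List, Sequence


def penalized_reward(
    target_activity: float,
    off_target_activities: Sequence[float],
    thresholds: Sequence[float],
    multipliers: Sequence[float],
) -> float:
    """Target-cell activity minus weighted off-target constraint violations.

    Assumption: each off-target cell type i has a tolerated activity level
    thresholds[i]; only activity above that level is penalized.
    """
    penalty = sum(
        lam * max(0.0, activity - limit)
        for activity, limit, lam in zip(
            off_target_activities, thresholds, multipliers
        )
    )
    return target_activity - penalty


def update_multipliers(
    multipliers: Sequence[float],
    off_target_activities: Sequence[float],
    thresholds: Sequence[float],
    step_size: float = 0.1,
) -> List[float]:
    """One projected gradient-ascent step on the multipliers.

    A multiplier grows while its constraint is violated, shrinks while the
    constraint is satisfied, and is clipped so it never goes below zero.
    """
    return [
        max(0.0, lam + step_size * (activity - limit))
        for lam, activity, limit in zip(
            multipliers, off_target_activities, thresholds
        )
    ]
```

In an RL loop of this kind, `penalized_reward` would score each generated sequence using predicted activities from some expression model, and `update_multipliers` would run periodically so that persistently violated constraints are penalized more strongly over time.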
Xingyu Chen, Shihao Ma, Runsheng Lin, Jiecong Lin, Bo Wang
Subjects: biological science research methods; biological science research techniques; molecular biology; bioengineering
Xingyu Chen, Shihao Ma, Runsheng Lin, Jiecong Lin, Bo Wang. Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL [EB/OL]. (2025-05-26) [2025-06-19]. https://arxiv.org/abs/2505.20578.