Deciphering 3'UTR mediated gene regulation using interpretable deep representation learning
The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. We hypothesize that these constraints are analogous to the grammar and syntax of human languages and can be modeled by advanced natural language models such as Transformers, which have been very effective in modeling protein sequences and structures. Here we describe 3UTRBERT, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT was pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model was then fine-tuned for specific downstream tasks such as predicting RBP binding sites, m6A RNA modification sites, and RNA sub-cellular localization. Benchmark results showed that 3UTRBERT generally outperformed other contemporary methods in each of these tasks. We also showed that the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationships between sequence elements.
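BERT-style models for nucleotide sequences typically begin by breaking the sequence into overlapping k-mer tokens before masked-language-model pre-training. The sketch below illustrates this preprocessing step in plain Python; the function name, k = 3, and the RNA-alphabet normalization are illustrative assumptions, not the paper's exact implementation.

```python
def kmer_tokenize(seq: str, k: int = 3) -> list[str]:
    """Slide a window of width k over the sequence with stride 1,
    yielding overlapping k-mer tokens (illustrative sketch, not the
    paper's exact tokenizer)."""
    seq = seq.upper().replace("T", "U")  # normalize to the RNA alphabet
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

tokens = kmer_tokenize("AUGGCAUCG")
print(tokens)  # ['AUG', 'UGG', 'GGC', 'GCA', 'CAU', 'AUC', 'UCG']
```

Each token would then be mapped to a vocabulary index (4^k possible k-mers plus special tokens such as [CLS] and [MASK]) before being fed to the Transformer encoder.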
Li Gen, Pang Kuan, Li Xiangtao, Zhang Zhaolei, Yang Yuning, Cao Wuxinhao
Molecular Biology; Bioscience Research Methods and Techniques; Computing and Computer Technology
Li Gen, Pang Kuan, Li Xiangtao, Zhang Zhaolei, Yang Yuning, Cao Wuxinhao. Deciphering 3'UTR mediated gene regulation using interpretable deep representation learning [EB/OL]. (2025-03-28) [2025-05-13]. https://www.biorxiv.org/content/10.1101/2023.09.08.556883.