RNAsamba: coding potential assessment using ORF and whole transcript sequence information
Abstract
Motivation: The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. Consequently, much effort is devoted to unveiling the biological roles of genomic elements, one of the main tasks being the identification of protein-coding and long non-coding RNAs.
Results: We describe RNAsamba, a tool that predicts the coding potential of RNA molecules from sequence information. It uses a deep-learning model that processes both the whole transcript sequence and its ORF to look for patterns that distinguish coding from non-coding RNAs. We evaluated the model on the classification of coding and non-coding transcripts from humans and five other model organisms, and show that RNAsamba outperforms other state-of-the-art methods in most cases. We also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its model does not depend on the presence of complete coding regions. RNAsamba is a fast, easy-to-use tool that can make valuable contributions to genome annotation pipelines.
Availability and implementation: The source code of RNAsamba is freely available at https://github.com/apcamargo/RNAsamba.
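The abstract says the model examines both the whole transcript and its ORF. The sketch below is a rough illustration of how those two inputs could be derived from a raw sequence. It finds the longest ATG-initiated ORF on the forward strand and one-hot encodes both the transcript and that ORF.

It rests on several assumptions, labeled here and in the code:

* The ORF definition is the longest forward-strand ATG-to-stop frame, and a 3'-truncated ORF is allowed when no in-frame stop codon follows the ATG.
* The helper names `longest_orf`, `one_hot` and `encode_inputs` are hypothetical.
* This is not RNAsamba's implementation. Its actual ORF extraction rules and input encodings may differ.

```python
from typing import Dict, List, Optional

# Standard stop codons (DNA alphabet; U is converted to T before scanning).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[str]:
    """Return the longest ATG-initiated ORF on the forward strand (hypothetical helper).

    Assumption: an ORF runs from an ATG to the first in-frame stop codon
    (inclusive). If no stop follows the ATG, the ORF runs to the last full
    codon of the sequence (a partial ORF). Returns None if no ATG is found.
    """
    seq = seq.upper().replace("U", "T")
    best: Optional[str] = None
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if best is None or len(orf) > len(best):
                    best = orf
                start = None
        if start is not None:
            # No in-frame stop codon: keep the truncated ORF up to the last full codon.
            end = start + ((len(seq) - start) // 3) * 3
            orf = seq[start:end]
            if best is None or len(orf) > len(best):
                best = orf
    return best


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a nucleotide sequence over A, C, G, T; other symbols become all zeros."""
    alphabet = "ACGT"
    seq = seq.upper().replace("U", "T")
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]


def encode_inputs(transcript: str) -> Dict[str, List[List[int]]]:
    """Build the two hypothetical model inputs: the whole transcript and its longest ORF."""
    orf = longest_orf(transcript) or ""  # empty ORF input when no ATG exists
    return {"transcript": one_hot(transcript), "orf": one_hot(orf)}


if __name__ == "__main__":
    inputs = encode_inputs("CCAUGAAAUAGGG")
    print(len(inputs["transcript"]), len(inputs["orf"]))
```

Once an ORF is open in a frame, later ATGs in that frame are ignored until a stop codon closes it. Each frame therefore reports the longest ORF starting at its first available ATG, which is the usual "longest ORF" convention.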
Camargo Antonio P., Sourkov Vsevolod, Carazzolle Marcelo F.
Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas; Graduate Program in Genetics and Molecular Biology, Institute of Biology, University of Campinas
Department of Computer Science
Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas
Subjects: Biological science research methods; Biological science research techniques; Molecular biology; Genetics
Camargo Antonio P., Sourkov Vsevolod, Carazzolle Marcelo F. RNAsamba: coding potential assessment using ORF and whole transcript sequence information [EB/OL]. (2025-03-28) [2025-04-26]. https://www.biorxiv.org/content/10.1101/620880.