Tiberius: End-to-End Deep Learning with an HMM for Gene Prediction
For more than 25 years, learning-based eukaryotic gene predictors were driven by hidden Markov models (HMMs) that take a DNA sequence directly as input. Recently, Holst et al. demonstrated with their program Helixer that the accuracy of ab initio eukaryotic gene prediction can be improved by combining deep learning layers with a separate HMM postprocessor. We present Tiberius, a novel deep learning-based ab initio gene predictor that integrates convolutional and long short-term memory layers with a differentiable HMM layer in a single end-to-end model. Tiberius uses a custom gene prediction loss; it was trained for prediction in mammalian genomes and evaluated on the human genome and two other genomes. It significantly outperforms existing ab initio methods, achieving a gene-level F1-score of 62% on the human genome, compared to 21% for the next best ab initio method. In de novo mode, Tiberius predicts the exon-intron structure of two out of three human genes without error. Remarkably, even Tiberius's ab initio accuracy matches that of BRAKER3, which uses RNA-seq data and a protein database. Tiberius's highly parallelized model is the fastest state-of-the-art gene prediction method, processing the human genome in under 2 hours.
Gabriel Lars, Stanke Mario, Becker Felix, Hoff Katharina J
Genetics; biological science research methods and techniques; computing and computer technology
Gabriel Lars, Stanke Mario, Becker Felix, Hoff Katharina J. Tiberius: End-to-End Deep Learning with an HMM for Gene Prediction [EB/OL]. (2025-03-28) [2025-08-07]. https://www.biorxiv.org/content/10.1101/2024.07.21.604459