Deep learning models for RNA secondary structure prediction (probably) do not generalise across families

Source: bioRxiv

Abstract

Motivation: The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions, but seldom address the much more difficult (and practical) inter-family problem.

Results: We demonstrate it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modeled after structure mapping data, that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalisation despite the widespread assumption in the literature, and provide strong evidence that many existing learning-based models have not generalised inter-family.

Availability: Source code and data are available at https://github.com/marcellszi/dl-rna.
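To make the intra-family versus inter-family distinction concrete, the following is a minimal sketch of a leave-one-family-out split, in which every sequence from the held-out family is excluded from training. It is an illustration only and is not taken from the authors' repository; the function name `leave_one_family_out` and the `(family, sequence_id)` record layout are assumptions made for this example.

```python
from collections import defaultdict


def leave_one_family_out(records):
    """Yield (held_out_family, train_ids, test_ids) for each family.

    `records` is a list of (family, sequence_id) pairs. Both fields are
    hypothetical placeholders for whatever a real dataset stores.
    """
    # Group sequence ids by their RNA family label.
    by_family = defaultdict(list)
    for family, seq_id in records:
        by_family[family].append(seq_id)

    # Hold out one whole family at a time, so no member of the test
    # family is ever seen during training.
    for held_out in sorted(by_family):
        test_ids = list(by_family[held_out])
        train_ids = [
            seq_id
            for family in sorted(by_family)
            if family != held_out
            for seq_id in by_family[family]
        ]
        yield held_out, train_ids, test_ids


if __name__ == "__main__":
    toy = [("tRNA", "a"), ("tRNA", "b"), ("5S", "c"), ("SRP", "d")]
    for family, train_ids, test_ids in leave_one_family_out(toy):
        print(family, train_ids, test_ids)
```

By contrast, a random split over individual sequences would typically place members of the same family on both sides of the split, which measures intra-family performance only.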

Ward Max, Datta Amitava, Szikszai Marcell, Wise Michael, Mathews David H.

Department of Computer Science & Software Engineering, The University of Western Australia; Department of Molecular and Cellular Biology, Harvard University
Department of Computer Science & Software Engineering, The University of Western Australia
Department of Computer Science & Software Engineering, The University of Western Australia
Department of Computer Science & Software Engineering, The University of Western Australia; The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia
Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester

10.1101/2022.03.21.485135

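Molecular biology; computing technology, computer technology; current state of biological sciences, development of biological sciences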

Ward Max, Datta Amitava, Szikszai Marcell, Wise Michael, Mathews David H. Deep learning models for RNA secondary structure prediction (probably) do not generalise across families [EB/OL]. (2025-03-28) [2025-08-14]. https://www.biorxiv.org/content/10.1101/2022.03.21.485135.
