National Preprint Platform

Learning to Fold RNAs in Linear Time

Source: bioRxiv
Abstract

RNA secondary structure is helpful for understanding RNA function, so accurate prediction systems are desirable. Both thermodynamics-based models and machine learning-based models have been used in prediction systems for this problem. Compared with thermodynamics-based models, machine learning-based models can compensate for thermodynamic parameters that are measured inaccurately because of experimental limitations. However, existing methods for training machine learning-based models remain expensive because their inference step takes cubic time in the sequence length. To overcome this, we present a linear-time machine learning-based folding system that uses the recently proposed approximate folding tool LinearFold as its inference engine and the structured SVM (sSVM) as its training algorithm. Furthermore, to remedy the non-convergence of naive sSVM training with inexact search inference, we introduce a max-violation update strategy. Our system trains 41× faster than CONTRAfold for one epoch on a diverse dataset, and 14× faster than MXfold on a dataset with longer sequences. With the learned parameters, our system improves the accuracy of LinearFold and is the most accurate among the selected folding tools, which include CONTRAfold, Vienna RNAfold, and MXfold.
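
The max-violation update mentioned in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes that a LinearFold-style beam search has already produced, for each step t, the sparse features of the gold structure's prefix and of the highest-scoring (loss-augmented) prefix in the beam, together with a structured cost for that prediction. The feature names, cost values, and the hypothetical `lr` and `l2` parameters are illustrative only. The sketch picks the step where the prediction outscores the gold prefix by the largest margin and takes one structured-hinge subgradient step there.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Weights = Dict[str, float]


def score(weights: Weights, feats: Counter) -> float:
    """Linear model score w . phi(y) over sparse feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def max_violation_update(
    weights: Weights,
    gold_prefixes: List[Counter],
    pred_prefixes: List[Counter],
    costs: Optional[List[float]] = None,
    lr: float = 0.1,
    l2: float = 0.0,
) -> Tuple[Optional[int], float]:
    """Hypothetical sketch: one sSVM subgradient step at the max-violation step.

    gold_prefixes[t]: features of the gold structure's prefix after step t.
    pred_prefixes[t]: features of the top-scoring beam prefix after step t
                      (assumed to come from loss-augmented beam search).
    costs[t]:         structured cost of pred_prefixes[t]
                      (e.g. number of wrongly predicted positions).
    Returns (step used for the update, or None if no violation; its violation).
    """
    if costs is None:
        costs = [0.0] * len(gold_prefixes)

    # Find the step where the prediction beats gold by the largest margin.
    best_t, best_v = None, 0.0
    for t, (g, p, c) in enumerate(zip(gold_prefixes, pred_prefixes, costs)):
        violation = score(weights, p) + c - score(weights, g)
        if violation > best_v:
            best_t, best_v = t, violation

    # L2 regularization shrinks every weight (a no-op when l2 == 0).
    for f in weights:
        weights[f] *= 1.0 - lr * l2

    # Hinge-loss subgradient: move weights toward gold, away from the prediction.
    if best_t is not None:
        g, p = gold_prefixes[best_t], pred_prefixes[best_t]
        for f in set(g) | set(p):
            weights[f] = weights.get(f, 0.0) + lr * (g[f] - p[f])
    return best_t, best_v


if __name__ == "__main__":
    # Toy usage with made-up features for two beam steps.
    w: Weights = {}
    gold = [Counter({"pair:GC": 1}), Counter({"pair:GC": 1, "pair:AU": 1})]
    pred = [Counter({"pair:GC": 1}), Counter({"pair:GC": 1, "unpaired": 2})]
    step, margin = max_violation_update(w, gold, pred, costs=[0.0, 2.0])
    print(step, margin, sorted(w.items()))
```

Updating at the prefix with the largest violation ensures that each update is made where the model actually prefers a wrong partial structure over the gold one, which is the usual rationale for max-violation updates when the search is inexact and the full gold structure may have been pruned from the beam.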

Authors: Rezaur Rahman Chowdhury F A, Zhang He, Huang Liang

Affiliations:
Rezaur Rahman Chowdhury F A: Baidu Research USA; Washington State University
Zhang He: Baidu Research USA
Huang Liang: Baidu Research USA; School of Electrical Engineering & Computer Science, Oregon State University

DOI: 10.1101/852871

Subjects: Biological research methods, biological research techniques; Computing technology, computer technology; Molecular biology

Keywords: RNA folding; machine learning; structured SVM; max violation; linear time

Cite as: Rezaur Rahman Chowdhury F A, Zhang He, Huang Liang. Learning to Fold RNAs in Linear Time [EB/OL]. (2025-03-28) [2025-05-17]. https://www.biorxiv.org/content/10.1101/852871.