Deep Learning in Plant Genomics and Crop Breeding: Current Applications and Prospects
[Purpose/Significance] Advances in single-cell sequencing and high-throughput technologies have made it possible for plant genomics to accumulate, at low cost, large quantities of data describing multidimensional genome-wide molecular phenotypes. As powerful data-mining tools, deep learning techniques can be used to further predict and interpret these molecular phenotypes. Recent studies have shown that deep learning yields significant results in plant genomics and crop breeding research. However, a comprehensive review of deep learning applications in plant genomics is still lacking.

[Method/Process] When deep learning is applied to genomics, the inputs are usually biological sequences (the predictor variables) and molecular phenotypes (the target variables). We describe the workflow from four perspectives: input data pre-processing, which covers retrieval, encoding, and splitting; model construction and training, which covers the choice of model architecture and hyperparameters; model evaluation; and interpretability. The paper first introduces the background of deep learning approaches, including the latest graph neural networks. It then discusses, with respect to gene characterization and protein characterization, two prominent issues at the intersection of genomics and deep learning: 1) how to model the flow of information from plant genomic DNA sequences to molecular phenotypes; and 2) how deep learning models can be used to identify functional variation in natural populations. The paper also summarizes the current status of deep learning applications in related fields: deep learning for DNA and gene characterization, interpretability of deep learning in genomics applications, graph neural networks in genomics, deep learning for genomic variation, deep learning for protein prediction, AlphaFold in protein structure prediction, deep learning in crop breeding, and unsupervised learning for genome and protein characterization.
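The pre-processing step mentioned above (retrieval, coding, and splitting) can be illustrated with a short sketch. The following is a minimal Python example, assuming one-hot encoding as the coding step and a hold-out-by-chromosome split as the splitting step; neither choice, nor the names `one_hot`, `split_by_chromosome`, and the toy records, comes from the paper. Retrieval (e.g. reading sequences from a FASTA file) is omitted.

```python
from typing import Dict, List, Sequence, Tuple

# Column order of the one-hot encoding (an illustrative choice, not from the paper).
ALPHABET = "ACGT"
_INDEX: Dict[str, int] = {base: i for i, base in enumerate(ALPHABET)}

# One training example: an L x 4 one-hot matrix paired with a phenotype value.
Example = Tuple[List[List[int]], float]


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as an L x 4 one-hot matrix.

    Ambiguous bases (e.g. 'N') become all-zero rows, so the
    sequence length is preserved.
    """
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = _INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    return matrix


def split_by_chromosome(
    records: Sequence[Tuple[str, str, float]],
    test_chroms: Sequence[str],
) -> Tuple[List[Example], List[Example]]:
    """Split hypothetical (chromosome, sequence, phenotype) records.

    Whole chromosomes are held out for testing, which is one common way
    to limit leakage between nearby, correlated sequences; this is an
    assumption for illustration, not the authors' protocol.
    """
    held_out = set(test_chroms)
    train: List[Example] = []
    test: List[Example] = []
    for chrom, seq, target in records:
        example = (one_hot(seq), target)
        (test if chrom in held_out else train).append(example)
    return train, test


if __name__ == "__main__":
    # Hypothetical toy records: (chromosome, sequence, molecular phenotype).
    records = [
        ("chr1", "ACGT", 0.8),
        ("chr1", "ACGN", 0.3),
        ("chr2", "TTGA", 1.2),
    ]
    train, test = split_by_chromosome(records, test_chroms=["chr2"])
    print(len(train), len(test))  # 2 1
    print(one_hot("ACGN")[3])     # [0, 0, 0, 0]
```

A random split is simpler, but because neighbouring genomic windows often share sequence and signal, holding out entire chromosomes tends to give a more honest estimate of how a sequence-to-phenotype model generalizes.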
[Results/Conclusions] This article summarizes how current research applies traditional deep learning algorithms, graph deep learning, generative adversarial networks, and interpretable AI to address these two problems, and then discusses the prospects for deep learning in future plant genomics research and crop improvement. Overall, deep learning has outperformed conventional methods in many genomics research directions, and its application in genomics has already produced early results of scientific and economic significance. Deep learning offers two distinct advantages: 1) end-to-end learning, which integrates multiple pre-processing steps into a single model; and 2) multimodal data processing, which can handle the extremely heterogeneous data found in genomics. As algorithms become more accurate, advances in deep learning have the potential to open new research perspectives in genomics and crop breeding and to enable larger-scale association studies at both the phenotypic and genotypic levels.
Authors: 侯祥英 (Hou Xiangying), 崔运鹏 (Cui Yunpeng), 刘娟 (Liu Juan)
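Subjects: Agricultural science research; Biological science research methods, biological science research techniques; Molecular biology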
Keywords: plant genomics; crop breeding; deep learning; graph deep learning; review
Hou Xiangying, Cui Yunpeng, Liu Juan. Deep learning in plant genomics and crop breeding: current applications and prospects[EB/OL]. (2023-03-31)[2025-08-16]. https://chinaxiv.org/abs/202303.10405.