National Preprint Platform

Genome Sequencing and Analysis of Magnolia officinalis Based on PacBio Third-Generation Sequencing Technology

Abstract (Chinese version, followed by the authors' English version)

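Magnolia officinalis is a well-known traditional medicinal plant belonging to the genus Magnolia in the family Magnoliaceae. It is widely cultivated in China, and its bark, root bark, branch bark, leaves, flowers, and fruits can all be used as medicine or food. To obtain whole-genome sequence information for M. officinalis, this study used leaf DNA as the material, built a whole-genome database with PacBio Sequel third-generation sequencing technology, and applied bioinformatic methods to assemble the resulting nucleotide sequences, annotate their functions, and analyze their evolution. The results showed that 140.91 Gb of third-generation data were obtained after filtering the raw sequencing data, with a read N50 of about 13 784 bp. Assembly yielded a genome of 1.68 Gb with a contig N50 of about 222 069 bp and a single-copy gene completeness of 78.05%. Comparison of the assembled sequences against functional databases including NR, KOG, and KEGG gave functional annotations for 98.40% of the genes. KOG annotation showed that the protein functions of M. officinalis are concentrated in general function prediction; posttranslational modification, protein turnover, and chaperones; and signal transduction mechanisms. GO classification showed that the genes are concentrated in cellular component and biological process. KEGG analysis found that genes involved in metabolic pathways predominate. Comparison with the genomes of grape, Arabidopsis thaliana, rice, poplar, ginkgo, Amborella trichopoda, tea, and Cinnamomum kanehirae showed that 20 801 of the 23 424 genes in M. officinalis could be assigned to 12 129 families, 515 of which are unique to M. officinalis. M. officinalis is closely related to C. kanehirae (Lauraceae), and the two diverged about 122.5 million years ago (mya). This study is the first to use third-generation sequencing technology to resolve the whole genome of M. officinalis; it supports further development and utilization of the species and lays a foundation for whole-genome studies of other medicinal plants.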

Magnolia officinalis is a well-known traditional medicinal plant belonging to the family Magnoliaceae and the genus Magnolia, and it is widely cultivated in China. Its bark, root bark, branch bark, leaves, flowers, and fruits can be used as medicine or food. However, little is known about the whole genome of this species. To obtain whole-genome sequence information for M. officinalis, leaf DNA was used as the material, and PacBio Sequel third-generation sequencing technology was used to build a nucleotide sequence database. Genome assembly, functional annotation, and evolutionary analysis were then carried out with bioinformatic methods. The results showed that 140.91 Gb of third-generation data were obtained after filtering the raw sequencing data, with a read N50 of about 13 784 bp. The assembled M. officinalis genome was 1.68 Gb in size, with a contig N50 of about 222 069 bp and a single-copy gene completeness of 78.05%. After comparison with functional databases such as NR, KOG, and KEGG, 98.40% of the genes in the assembled sequence received functional annotations. KOG annotation showed that the protein functions of M. officinalis were concentrated in general function prediction only; posttranslational modification, protein turnover, and chaperones; and signal transduction mechanisms. GO functional classification indicated that the genes of M. officinalis were concentrated in cellular component and biological process. KEGG analysis found that the M. officinalis genes were mostly involved in metabolic pathways. For comparative genomics analysis, the M. officinalis genome was compared with the genomes of Vitis vinifera, Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Ginkgo biloba, Amborella trichopoda, Camellia sinensis, and Cinnamomum kanehirae. Of the 23 424 genes in M. officinalis, 20 801 could be classified into 12 129 families, and 515 of these gene families were unique to M. officinalis.
The phylogenetic tree constructed from the genomes of the selected reference species indicated that M. officinalis (Magnoliaceae) was closely related to Cinnamomum kanehirae (Lauraceae), and that the two species diverged about 122.5 mya. This study is the first to use third-generation sequencing technology to analyze the whole genome of M. officinalis. It supports further development and utilization of the species and provides information for whole-genome studies of other medicinal plants.
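
The abstract reports a read N50 and a contig N50 without defining the statistic. As a minimal sketch, assuming the conventional definition (the length L such that sequences of length L or longer together cover at least half of the total bases), N50 can be computed as follows. The function name `n50` and the toy lengths are hypothetical illustrations, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the conventional definition: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (not data from the study).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # total is 300; 100 + 80 = 180 >= 150, so N50 is 80
```

Under this definition, the reported contig N50 of about 222 069 bp would mean that contigs of at least that length account for at least half of the 1.68 Gb assembly.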

Lin Xinna (林新娜), Ding Qiaojiao (丁乔娇), Zhang Min (张敏), Yin Yanpeng (尹彦棚), Peng Cheng (彭成), Gao Jihai (高继海), Luo Jiawei (罗加伟)

10.12074/202005.00073V1

Research methods and techniques in biological sciences; botany; genetics

Magnolia officinalis; genome; third-generation sequencing technology; gene annotation

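Lin Xinna, Ding Qiaojiao, Zhang Min, Yin Yanpeng, Peng Cheng, Gao Jihai, Luo Jiawei. Genome sequencing and analysis of Magnolia officinalis based on PacBio third-generation sequencing technology [EB/OL]. (2020-05-28) [2025-07-25]. https://chinaxiv.org/abs/202005.00073.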
