Bioinformatics-Based Classification of Functional Protein Gene Sequences
A primary task of bioinformatics is to extract valuable knowledge from large volumes of biological data. Comparing sequences of unknown function with sequences of known function in order to predict the unknown function is a common means of knowledge discovery in the field. In this paper, we use the BLAST algorithm to align the gene sequences of plant virus-encoded movement proteins and coat proteins. From the distance matrix derived from the alignment results, we extract 5-dimensional feature vectors by principal component analysis (PCA). Finally, we use a support vector machine (SVM) to build a classification model for the two types of protein gene sequences and validate it. On the validation samples, both the correct recognition rate and the correct rejection rate reach 80% or higher, which suggests that the method is suitable for classifying these two types of proteins.
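The two validation figures quoted in the abstract can be made concrete with a small sketch. It assumes (the abstract does not define the terms) that the correct recognition rate is the fraction of target-class samples the model accepts as that class, and that the correct rejection rate is the fraction of other-class samples the model rejects. The function name, the `MP`/`CP` labels, and the ten example labels are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def recognition_and_rejection_rates(y_true, y_pred, target="MP"):
    """Return (correct recognition rate, correct rejection rate).

    Assumed definitions (not stated in the abstract):
      recognition rate = target samples predicted as target / all target samples
      rejection rate   = non-target samples predicted as non-target / all non-target samples
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Target-class samples that the classifier accepted as the target class.
    accepted = sum(1 for t, p in pairs if t == target and p == target)
    # Other-class samples that the classifier correctly kept out of the target class.
    rejected = sum(1 for t, p in pairs if t != target and p != target)
    n_target = sum(1 for t in y_true if t == target)
    n_other = len(y_true) - n_target
    if n_target == 0 or n_other == 0:
        raise ValueError("both classes must be present in y_true")
    return accepted / n_target, rejected / n_other


# Hypothetical validation labels: MP = movement protein, CP = coat protein.
y_true = ["MP", "MP", "MP", "MP", "MP", "CP", "CP", "CP", "CP", "CP"]
y_pred = ["MP", "MP", "MP", "MP", "CP", "CP", "CP", "CP", "CP", "MP"]

recognition, rejection = recognition_and_rejection_rates(y_true, y_pred)
print(f"recognition rate = {recognition:.0%}, rejection rate = {rejection:.0%}")
```

With these made-up labels, four of the five movement-protein samples are accepted and four of the five coat-protein samples are rejected, so both rates come out at 80%. Reporting the two rates separately, rather than a single accuracy, shows whether the model's errors are concentrated in one class.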
Guo Tingting, An Dong, Chen Tingting, Li Lin
Biological science research methods and techniques; computing and computer technology; molecular biology
Bioinformatics; sequence alignment; principal component analysis (PCA); support vector machine (SVM)
Guo Tingting, An Dong, Chen Tingting, Li Lin. Bioinformatics-based classification of functional protein gene sequences [EB/OL]. (2011-04-27)[2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201104-689.