National Preprint Platform

Discovery and Applications of the Association Rules in Protein Sequences


The machine learning algorithms used in protein sequence analysis have grown increasingly complex, which makes both their results and the process that produces them hard to interpret. Simple algorithms with a sound theoretical basis are therefore needed. Association rule discovery is simple in principle, is grounded in probability theory, and yields rules with clear practical meaning. Applying it to protein sequences uncovered tens of thousands of patterns (motifs). Worked examples show how these patterns can be used in protein sequence analysis, for instance in conserved region discovery and secondary structure prediction. From these results, a library of secondary structure rules and a simple secondary structure prediction algorithm were constructed. Experiments show that about 81% of secondary structure can be predicted by at least one association rule.
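The abstract does not spell out how the rules are mined, so the following is only a minimal sketch of generic support/confidence rule discovery, not the authors' algorithm. It assumes rules of the form "k-residue window implies the secondary structure state of the window's central residue". The function names `mine_rules` and `coverage`, the window length `k`, the thresholds, and the toy sequences are all hypothetical choices made for illustration.

```python
from collections import Counter


def mine_rules(sequences, structures, k=3, min_support=2, min_confidence=0.8):
    """Mine rules 'k-mer -> state of its central residue' (illustrative only).

    support    = number of times the k-mer occurs together with the state
    confidence = support / number of times the k-mer occurs at all
    """
    kmer_counts = Counter()   # occurrences of each k-mer (rule antecedent)
    joint_counts = Counter()  # occurrences of each (k-mer, state) pair
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            state = ss[i + k // 2]  # state of the central residue
            kmer_counts[kmer] += 1
            joint_counts[(kmer, state)] += 1

    rules = []
    for (kmer, state), support in joint_counts.items():
        confidence = support / kmer_counts[kmer]
        if support >= min_support and confidence >= min_confidence:
            rules.append((kmer, state, support, confidence))
    # Highest confidence first, then highest support, then alphabetical.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rules


def coverage(sequences, structures, rules, k=3):
    """Fraction of windows whose central state is implied by at least one rule."""
    implied = {(kmer, state) for kmer, state, _, _ in rules}
    covered = total = 0
    for seq, ss in zip(sequences, structures):
        for i in range(len(seq) - k + 1):
            total += 1
            if (seq[i:i + k], ss[i + k // 2]) in implied:
                covered += 1
    return covered / total if total else 0.0


# Toy data (hypothetical): H = helix, C = coil.
toy_sequences = ["AEAAK", "AEAAL"]
toy_structures = ["HHHHC", "HHHHC"]

toy_rules = mine_rules(toy_sequences, toy_structures)
for kmer, state, support, confidence in toy_rules:
    print(f"{kmer} -> {state}  support={support}  confidence={confidence:.2f}")
print("coverage:", coverage(toy_sequences, toy_structures, toy_rules))
```

A statistic like `coverage` is one plausible reading of the reported figure that about 81% of secondary structure is implied by at least one rule. The abstract does not define how that figure was computed.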

Zheng Haoran (郑浩然), Wang Xufa (王煦法), Chen Shuangping (陈双平), Liu Haiyan (刘海燕)

Biological science research methods and techniques; biochemistry; molecular biology

Protein sequence; association rule; secondary structure prediction; motif discovery.

Zheng Haoran, Wang Xufa, Chen Shuangping, Liu Haiyan. Discovery and Applications of the Association Rules in Protein Sequences [EB/OL]. (2006-01-09) [2025-08-23]. http://www.paper.edu.cn/releasepaper/content/200601-101.
