National Preprint Platform

An algorithm for identifying weighted protein complexes based on fuzzy ant colony clustering


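This paper studies the low accuracy, low recall, and poor time performance of protein complex identification in protein-protein interaction networks when ant colony clustering is combined with fuzzy C-means (FCM) clustering, and proposes FAC-PC (algorithm for identifying weighted protein complexes based on fuzzy ant colony clustering). First, a weighted network is built by combining the edge clustering coefficient with the Pearson correlation coefficient of gene co-expression. Second, an EPS (essential protein selection) metric is proposed to select essential proteins; the neighbor nodes of each essential protein are traversed, and a protein fitness measure, PFC (protein fitness calculation), is designed to obtain essential group proteins. These essential group proteins replace the seed nodes in ant colony clustering, which overcomes the low accuracy and slow convergence caused by the large number of pick-up, drop, and repeated merge-and-filter operations in the ant colony algorithm. Next, a similarity measure, SI (similarity improvement), is designed to optimize the pick-up and drop probabilities; ant colony clustering of the nodes then yields the number of clusters. Finally, the essential proteins and the number of clusters obtained from ant colony clustering are used to initialize the FCM algorithm, a membership update strategy is designed to optimize how memberships are updated, and an FCM iterative objective function that accounts for both intra-cluster and inter-cluster distance is proposed; the improved FCM then completes the identification of complexes. FAC-PC is applied to DIP data to identify complexes, and the experimental results show that it achieves relatively high accuracy and recall and identifies protein complexes fairly accurately.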
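The abstract does not state the weighting formula, so the following is only a minimal Python sketch of the network-weighting step under stated assumptions. It assumes the common triangle-based definition of the edge clustering coefficient (shared neighbors divided by `min(deg(u) - 1, deg(v) - 1)`), rescales the Pearson correlation of two expression profiles to [0, 1], and averages the two values. The averaging rule and the names `pearson`, `edge_clustering_coefficient`, and `build_weighted_network` are hypothetical and are not taken from FAC-PC.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def edge_clustering_coefficient(adj, u, v):
    """Assumed definition: shared neighbors / min(deg(u) - 1, deg(v) - 1)."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0


def build_weighted_network(edges, expression):
    """Return {(u, v): weight} with weights in [0, 1].

    Assumption: the weight is the plain average of the edge clustering
    coefficient and the Pearson correlation rescaled from [-1, 1] to [0, 1].
    The paper's actual combination rule is not given in the abstract.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    weights = {}
    for u, v in edges:
        ecc = edge_clustering_coefficient(adj, u, v)
        pcc = (pearson(expression[u], expression[v]) + 1) / 2
        weights[(u, v)] = (ecc + pcc) / 2
    return weights
```

Calling `build_weighted_network(edges, expression)` with a list of interaction pairs and a dict mapping each protein to its expression profile returns one weight per input edge; edges that sit inside triangles and join co-expressed genes receive the largest weights.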

To address the low accuracy, low recall, and poor running efficiency of protein complex identification algorithms based on ant colony and fuzzy C-means (FCM) clustering, this paper proposes a novel protein complex identification algorithm named FAC-PC (algorithm for identifying weighted protein complexes based on fuzzy ant colony clustering). First, it constructs a weighted protein network by combining the Pearson correlation coefficient with the edge clustering coefficient. Second, to overcome the excessive merge, filter, and repeated pick-up and drop operations in the ant colony clustering algorithm, it designs the EPS (essential protein selection) metric to select essential proteins, and the PFC (protein fitness calculation) metric, applied while traversing the neighbors of essential proteins, to obtain essential group proteins. These essential group proteins replace the seed nodes in ant colony clustering, which improves accuracy and time performance. Furthermore, it proposes the SI (similarity improvement) metric to optimize the pick-up and drop probabilities of the ant colony and thereby obtain the number of clusters. Finally, the essential proteins and the number of clusters obtained from the improved ant colony algorithm are used to initialize the FCM algorithm; a membership update strategy is designed to optimize the membership update, and a new FCM objective function is proposed that balances intra-cluster and inter-cluster variation. The improved FCM algorithm then identifies the protein complexes. FAC-PC is applied to DIP data to identify protein complexes. The experimental results show that FAC-PC achieves better accuracy and recall and identifies protein complexes more accurately.
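For orientation on the final stage, the sketch below shows textbook FCM only, not the modified membership update or the intra/inter-cluster objective proposed in FAC-PC, neither of which is specified in the abstract. It assumes proteins are already represented as numeric vectors and that the initial centers and their count are supplied from outside, which loosely mirrors the role the abstract gives to essential proteins and the ant colony cluster count. The function names and the fuzzifier default `m=2.0` are illustrative choices.

```python
from math import dist


def update_memberships(points, centers, m=2.0):
    """Standard FCM update: u[j][i] = d_ji^(-2/(m-1)) / sum_k d_jk^(-2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if any(x == 0.0 for x in d):
            # The point coincides with one or more centers: share full
            # membership among those centers and give 0 to the rest.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        inv = [x ** -exponent for x in d]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


def update_centers(points, memberships, old_centers, m=2.0):
    """Standard FCM center update: mean of the points weighted by u^m."""
    dim = len(points[0])
    centers = []
    for i, old in enumerate(old_centers):
        w = [u[i] ** m for u in memberships]
        total = sum(w)
        if total == 0.0:
            centers.append(old)  # no point supports this cluster; keep it
            continue
        centers.append(tuple(
            sum(wj * p[k] for wj, p in zip(w, points)) / total
            for k in range(dim)
        ))
    return centers


def fcm(points, init_centers, m=2.0, iterations=20):
    """Run a fixed number of textbook FCM iterations from the given centers."""
    centers = list(init_centers)
    for _ in range(iterations):
        u = update_memberships(points, centers, m)
        centers = update_centers(points, u, centers, m)
    return update_memberships(points, centers, m), centers
```

Each row of the returned membership matrix sums to 1, so a protein can belong to several clusters to different degrees, which is why a fuzzy method suits overlapping complexes. A fixed iteration count is used here for brevity; a convergence check on the change in memberships is the more usual stopping rule.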

Mao Yimin (毛伊敏), Liu Yinping (刘银萍), Hu Jian (胡健)

10.12074/201904.00051V1

Biological science research methods and techniques; molecular biology; bioengineering

protein-protein interaction network; ant colony clustering algorithm; fuzzy C-means; fitness; protein complex

Mao Yimin, Liu Yinping, Hu Jian. An algorithm for identifying weighted protein complexes based on fuzzy ant colony clustering [EB/OL]. (2019-04-01) [2025-07-19]. https://chinaxiv.org/abs/201904.00051.
