Mining Contrast Patterns from Multi-class Data
Discovering contrast patterns in multi-class data is challenging. Previous work has focused on two-class data; in medical data, for example, patients with a disease are compared with healthy individuals to find patterns associated with the disease. Simultaneously discovering contrast patterns across multiple phenotypes (several diseases, or several subtypes of one disease) has not been well studied, although it is important for medical researchers. A further bottleneck is that association rule mining generates a huge number of rules, many of them redundant; existing non-redundant rule mining algorithms remove the redundant rules, but many of the remaining rules are still uninteresting, or of too little interest, in a given application domain. To address these problems, this paper uses a statistical metric to define the Vital Pattern (VP, a disease-causing pattern) and the Protect Pattern (PP, a protective pattern), and proposes a new algorithm, MVP, that discovers VPs and PPs in multi-phenotype medical data and draws a clear, intuitive causal graph. The mined patterns have been endorsed by domain experts, and the resulting rules provide precise and useful information for medical researchers. Finally, a classifier is built from these rules; experimental results on patients with multiple types of pneumonia demonstrate the efficiency and practicality of the algorithm and the feasibility of clinically useful classification.
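The abstract does not name the statistical metric behind VP and PP, nor does it describe how MVP searches for them. Purely as an illustration of the general idea, the sketch below scores each candidate pattern against each class in a one-vs-rest comparison using an odds ratio, which is an assumed stand-in metric. The function names, the thresholds, the "risk"/"protective" labels, and the toy records are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table with a 0.5 continuity correction.

    a: pattern present, in class      b: pattern present, other classes
    c: pattern absent,  in class      d: pattern absent,  other classes
    The 0.5 correction (an assumption of this sketch) avoids division by zero.
    """
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def contrast_patterns(records, max_len=2, min_support=2, hi=2.0, lo=0.5):
    """Brute-force one-vs-rest contrast mining (illustrative, not MVP).

    records: list of (set_of_items, class_label) pairs.
    Returns (pattern, class, kind, odds_ratio) tuples, where kind is
    "risk" (odds ratio >= hi) or "protective" (odds ratio <= lo).
    """
    classes = sorted({label for _, label in records})
    items = sorted({item for its, _ in records for item in its})
    results = []
    for k in range(1, max_len + 1):
        for pattern in combinations(items, k):
            wanted = set(pattern)
            for cls in classes:
                a = b = c = d = 0
                for its, label in records:
                    has = wanted <= its        # record contains the pattern
                    in_cls = label == cls
                    if has and in_cls:
                        a += 1
                    elif has:
                        b += 1
                    elif in_cls:
                        c += 1
                    else:
                        d += 1
                if a + b < min_support:        # skip rarely seen patterns
                    continue
                score = odds_ratio(a, b, c, d)
                if score >= hi:
                    results.append((pattern, cls, "risk", score))
                elif score <= lo:
                    results.append((pattern, cls, "protective", score))
    return results

# Hypothetical toy data: two disease types plus a normal group.
records = [
    ({"fever", "cough"}, "A"),
    ({"fever", "cough"}, "A"),
    ({"fever"}, "A"),
    ({"rash"}, "B"),
    ({"rash", "cough"}, "B"),
    ({"rash"}, "B"),
    ({"cough"}, "normal"),
    (set(), "normal"),
    ({"cough"}, "normal"),
]

if __name__ == "__main__":
    for pattern, cls, kind, score in contrast_patterns(records):
        print(pattern, cls, kind, round(score, 3))
```

In this toy data, "fever" occurs only in class A, so it is reported as a risk pattern for A and as a protective pattern for B. A practical miner would also need pruning and a significance test, which this sketch leaves out.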
印莹
Medical research methods; basic medicine
Keywords: rules; interestingness measure; vital pattern; protect pattern
印莹.多类数据中对照模式的挖掘[EB/OL].(2014-11-21)[2025-08-18].http://www.paper.edu.cn/releasepaper/content/201411-388.