
Web Document Classification Based on Association Rules and Genetic Algorithms

Classify Web Document by Genetic Algorithm with Association Rules


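Classifying Web documents such as BBS posts, HTML pages, and e-mail is an important task in Web applications. To address this problem, this paper makes the following contributions: (1) it proposes a computational method for the automatic classification of Chinese text, called Classification by Genetic Algorithm with Association Rules Method (abbreviated CGAA); (2) unlike earlier approaches, the fitness function works under the guidance of association rules, and those rules are mined by the Apriori_CGAA algorithm proposed in this paper; (3) it implements and tests a series of basic genetic procedures, such as CGAA_Roulette_Selection, CGAA_Xover, and CGAA_binaryMutation; (4) it gives extensive experimental results showing that the new CGAA algorithm performs far better than traditional algorithms, with the vector Best-Vector reaching a score as high as 3513.6 after 50 generations of evolution under the CGAA algorithm.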

Classifying Web documents such as BBS posts, HTML pages, and e-mail is an important task for Web applications. To address this problem, this paper presents the following results: (1) It proposes a new text classification method called Classification by Genetic Algorithm with Association Rules Method (CGAA method). (2) Unlike previous work, the fitness function is applied under the guidance of association rules mined by the Apriori_CGAA algorithm. (3) It implements a family of genetic procedures, such as CGAA_Roulette_Selection, CGAA_Xover, and CGAA_binaryMutation, and reports extensive experiments with real data. (4) The experiments show that the CGAA algorithm is superior to other common methods. A Best-Vector with a score of 3513.6 is obtained after running the CGAA algorithm for 50 generations.
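The abstract names three genetic procedures (roulette selection, crossover, binary mutation) and a fitness function guided by association rules, but gives no implementation details. The following is only a minimal sketch of how such a loop could fit together, under stated assumptions: individuals are bit vectors over candidate terms, each rule is a hypothetical `(term_indices, weight)` pair, and `rule_fitness` simply sums the weights of rules whose terms are all switched on. The function names, the elitism step, and the parameter values are illustrative and are not taken from the paper.

```python
import random

def roulette_select(population, scores, rng):
    """Pick one individual with probability proportional to its score."""
    total = sum(scores)
    if total <= 0:                      # no signal yet: fall back to uniform choice
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running > pick:
            return individual
    return population[-1]

def one_point_xover(a, b, rng):
    """Swap the tails of two equal-length bit vectors (length >= 2)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def binary_mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]

def rule_fitness(bits, rules):
    """Assumed fitness: total weight of rules whose terms are all selected."""
    return sum(weight for terms, weight in rules if all(bits[i] for i in terms))

def evolve(rules, n_bits, pop_size=20, generations=50, rate=0.02, seed=0):
    """Run a simple generational GA and return (best_vector, best_score)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=lambda b: rule_fitness(b, rules))
    for _ in range(generations):
        scores = [rule_fitness(b, rules) for b in population]
        children = [best[:]]            # elitism: carry the best vector forward
        while len(children) < pop_size:
            p1 = roulette_select(population, scores, rng)
            p2 = roulette_select(population, scores, rng)
            c1, c2 = one_point_xover(p1, p2, rng)
            children.append(binary_mutate(c1, rate, rng))
            if len(children) < pop_size:
                children.append(binary_mutate(c2, rate, rng))
        population = children
        best = max(population, key=lambda b: rule_fitness(b, rules))
    return best, rule_fitness(best, rules)

if __name__ == "__main__":
    # Toy rules over 4 terms; the weights stand in for rule confidence.
    toy_rules = [((0, 1), 2.0), ((2,), 1.0)]
    print(evolve(toy_rules, n_bits=4))
```

In this sketch the association rules only enter through `rule_fitness`, so a different rule-based scoring scheme can be swapped in without touching the selection, crossover, or mutation steps.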

唐常杰、胡蓉、张天庆、陈安龙、元昌安

Computing technology, computer technology

Chinese document classification, genetic algorithm, natural language processing, CGAA method

Chinese document classification, Genetic Algorithm, Association rules, Natural language processing

唐常杰, 胡蓉, 张天庆, 陈安龙, 元昌安. Web Document Classification Based on Association Rules and Genetic Algorithms [EB/OL]. (2004-03-23) [2025-08-18]. http://www.paper.edu.cn/releasepaper/content/200403-162.
