Tourism Spatial Relation Extraction Based on Clustering
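To address the difficulty of predefining spatial relations between entities in the tourism domain, this paper studies a clustering-based method for acquiring entity spatial relations. A bootstrapping method is first used to iteratively extract, from domain text, a set of feature words that describe spatial relations between entities. The similarity between feature words is then computed with the semantic dictionaries HowNet (《知网》) and Tongyici Cilin Extended (《同义词词林扩展版》), and the feature words are clustered using hierarchical clustering and related methods. Each resulting cluster is an automatically discovered entity relation category, so the entity relation types of the tourism domain are obtained automatically. Experimental results show that the method performs well: spatial relation acquisition based on HowNet achieves the best F value, 0.6911, which differs little from the result of manual classification, verifying the effectiveness and feasibility of the method.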
This paper studies a clustering-based relation extraction approach to address the difficulty of predefining tourism spatial relations. First, spatial feature words are extracted from a domain corpus by bootstrapping iteration. Second, the similarity between words is calculated using semantic thesauruses. Finally, the feature words are clustered, and every cluster represents a kind of entity relation discovered automatically. The experimental results show that this approach performs well. The relation extraction based on HowNet achieves the best F value (0.618), which is close to the result of manual work, demonstrating the effectiveness and feasibility of the method.
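The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes average-linkage agglomerative clustering with a similarity threshold as the stopping rule; the abstract only says "hierarchical clustering", so the linkage, the threshold, the feature words, and the similarity values in `SIM` are all hypothetical stand-ins, not values from HowNet or Tongyici Cilin and not the authors' implementation.

```python
from itertools import combinations

# Hypothetical pairwise similarities between spatial feature words.
# Illustrative values only; a real system would compute them from a
# semantic dictionary such as HowNet or Tongyici Cilin.
SIM = {
    ("north_of", "south_of"): 0.85,
    ("north_of", "east_of"): 0.80,
    ("south_of", "east_of"): 0.82,
    ("inside", "within"): 0.90,
    ("adjacent_to", "near"): 0.88,
}
DEFAULT_SIM = 0.1  # assumed similarity for pairs not listed above


def similarity(a, b):
    """Symmetric lookup of the similarity between two feature words."""
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), DEFAULT_SIM))


def average_link(c1, c2):
    """Average pairwise similarity between two clusters of words."""
    total = sum(similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster(words, threshold=0.5):
    """Merge the most similar pair of clusters until no pair reaches threshold."""
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average-link similarity.
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


if __name__ == "__main__":
    words = ["north_of", "south_of", "east_of",
             "inside", "within", "adjacent_to", "near"]
    for group in cluster(words):
        print(group)
```

Each returned group plays the role of one automatically discovered relation category. In this toy input the directional words, the containment words, and the proximity words end up in three separate groups.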
Jin Zhiyao (靳知瑶), Du Junping (杜军平)
Science and scientific research; computing technology and computer technology
computer applications; relation extraction; HowNet; Tongyici Cilin; clustering
Jin Zhiyao, Du Junping. Tourism Spatial Relation Extraction Based on Clustering [EB/OL]. (2015-10-26) [2025-08-04]. http://www.paper.edu.cn/releasepaper/content/201510-213.