National Preprint Platform

A Study on Generating Hierarchical Paths for Chinese Text Based on Wikipedia

Abstract

[Objective] To generate hierarchical semantic paths for free text using the Wikipedia knowledge base. [Methods] We build a tree-like hierarchical graph from the Chinese Wikipedia data dump. Free text is then represented as an article concept vector through explicit semantic analysis and mapped onto the graph via article-category associations, which yields a set of seed category nodes. Hierarchical paths are generated by diffusing information outward from the seed nodes and then selecting and optimizing paths top-down. [Results] On the test set, the first generated path reached an average relevance of 54.10%, and the top 20 paths were, overall, ordered by descending relevance. [Limitations] We did not analyze how the number of concepts retained in the explicit concept vector affects the quality of the generated paths. [Conclusions] The hierarchical paths generated from the Wikipedia knowledge base reflect the main semantic content of the text.
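The pipeline summarized in the abstract (concept vector, seed category nodes, diffusion, top-down path selection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy category tree, the word-to-article weights, the `decay` factor, and the mean-score ranking are hypothetical stand-ins, since the abstract does not specify the actual weighting, diffusion, or optimization used in the paper.

```python
from collections import defaultdict

# Hypothetical toy data; a real system would derive these from a Wikipedia dump.
PARENT = {  # category -> parent category (tree rooted at "Root")
    "Science": "Root", "Arts": "Root",
    "Computing": "Science", "Biology": "Science", "Music": "Arts",
}
ARTICLE_CATEGORIES = {  # article -> categories it belongs to
    "Algorithm": ["Computing"], "Gene": ["Biology"], "Opera": ["Music"],
}
WORD_ARTICLE_WEIGHT = {  # inverted index: word -> {article: weight}
    "sorting": {"Algorithm": 0.9},
    "data": {"Algorithm": 0.5, "Gene": 0.2},
    "aria": {"Opera": 0.8},
}

def concept_vector(words):
    """ESA-style step: sum per-word article weights into one article vector."""
    vec = defaultdict(float)
    for w in words:
        for article, weight in WORD_ARTICLE_WEIGHT.get(w, {}).items():
            vec[article] += weight
    return dict(vec)

def seed_scores(vec):
    """Map article weights onto the categories those articles belong to."""
    seeds = defaultdict(float)
    for article, weight in vec.items():
        for cat in ARTICLE_CATEGORIES.get(article, []):
            seeds[cat] += weight
    return dict(seeds)

def diffuse(seeds, decay=0.5):
    """Spread each seed's score to its ancestors, damped by `decay` per level."""
    scores = defaultdict(float)
    for cat, score in seeds.items():
        node = cat
        while node is not None:
            scores[node] += score
            score *= decay
            node = PARENT.get(node)  # None once we step past the root
    return dict(scores)

def ranked_paths(scores, root="Root"):
    """Enumerate root-to-leaf paths top-down and rank them by mean node score."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    paths = []

    def walk(node, path):
        path = path + [node]
        if not children[node]:  # leaf category: the path is complete
            paths.append(path)
        for child in children[node]:
            walk(child, path)

    walk(root, [])

    def mean_score(path):
        return sum(scores.get(n, 0.0) for n in path) / len(path)

    return sorted(paths, key=mean_score, reverse=True)

if __name__ == "__main__":
    vec = concept_vector(["sorting", "data"])
    for path in ranked_paths(diffuse(seed_scores(vec))):
        print(" > ".join(path))
```

For the toy input `["sorting", "data"]`, the path through `Computing` ranks first because the `Algorithm` article dominates the concept vector. Ranking by mean node score is only one simple choice; the paper's own relevance measure may differ.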

Xia Tian (夏天)

10.12074/201711.01237V1

Computing technology, computer technology

Semantic path; explicit semantic analysis; hierarchical classification; Wikipedia

Xia Tian. A Study on Generating Hierarchical Paths for Chinese Text Based on Wikipedia [EB/OL]. (2017-10-11) [2025-08-18]. https://chinaxiv.org/abs/201711.01237.
