An Improved Method for Web Information Extraction Based on DOM
Web information extraction based on the DOM (Document Object Model) suffers from an overly complex tag library and low extraction efficiency. To address these problems, this paper proposes an improved method that makes filtering based on DOM structure and semantics-based node pruning more efficient, and makes the extraction of topic information more accurate. Simulation experiments verify the effectiveness of the method.
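The abstract names two stages, structural filtering on the DOM and pruning of nodes unlikely to carry topic information, without giving their details. The sketch below illustrates that general two-stage idea only and is not the paper's algorithm. The tag lists, the `max_link_ratio` threshold, and the use of link-text density as the pruning score are all assumptions made for illustration. In particular, link-text density is a stand-in for the paper's influence degree factor, which the abstract does not define.

```python
from html.parser import HTMLParser

# Assumed tag lists for illustration; the paper's tag library is not given.
NOISE_TAGS = {"script", "style", "nav", "footer", "iframe", "form"}
BLOCK_TAGS = {"div", "ul", "ol", "table", "td", "section", "aside"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """Minimal DOM-like node; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def text_stats(node, in_link=False):
    """Return (total_chars, chars_inside_links) for a subtree."""
    in_link = in_link or node.tag == "a"
    total = link = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            link += len(child) if in_link else 0
        else:
            t, l = text_stats(child, in_link)
            total += t
            link += l
    return total, link


def prune(node, max_link_ratio=0.5):
    """Stage 1: drop noise tags. Stage 2: drop link-dominated blocks."""
    kept = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag in NOISE_TAGS:
                continue  # structural filter
            if child.tag in BLOCK_TAGS:
                total, link = text_stats(child)
                if total and link / total > max_link_ratio:
                    continue  # block is mostly link text (assumed score)
            prune(child, max_link_ratio)
        kept.append(child)
    node.children = kept


def extract_text(node):
    """Concatenate the remaining text in document order."""
    parts = [c if isinstance(c, str) else extract_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract_main_text(html, max_link_ratio=0.5):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root, max_link_ratio)
    return extract_text(builder.root)
```

Restricting the density check to block-level containers keeps inline links inside body paragraphs, since a lone `<a>` element would otherwise always score as pure link text.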
马太保
Computing technology; computer technology
information extraction; DOM; influence degree factor; DOM pruning
马太保.基于DOM的Web信息提取方法的改进[EB/OL].(2010-12-13)[2025-08-02].http://www.paper.edu.cn/releasepaper/content/201012-409.