National Preprint Platform

Text Based Image Retrieval on Scientific Documents Database


This paper analyzes the need for an image retrieval system over a database of scientific documents and studies two problems that must be solved to build one. First, a method is proposed for extracting images from scientific documents. The method converts each document into document images, then uses low-level image features (color histogram, first-order color moment, and second-order color moment) to find the content images within them; it achieves a precision of 94.3%. Second, a rule-based method is proposed for extracting the text related to each image. Different combinations of four kinds of related text (title, abstract, keywords, and surrounding text) are used to index the content images. The experiments show that indexing images with the title and surrounding text gives the best retrieval performance.
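
The low-level features named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `color_features`, the bin count, the per-channel RGB treatment, and the use of the population standard deviation as the second-order moment are all assumptions. The abstract does not say how the features are combined to decide whether a region is a content image, so no decision rule is shown.

```python
from math import sqrt


def color_features(pixels, bins=4):
    """Compute simple per-channel color features for a region.

    pixels: non-empty list of (r, g, b) tuples with integer values in 0..255.
    bins:   number of histogram bins per channel (an assumed parameter).
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("region has no pixels")

    features = {"histogram": [], "mean": [], "std": []}
    for channel in range(3):
        values = [p[channel] for p in pixels]

        # Normalized color histogram: fraction of pixels falling in each bin.
        hist = [0] * bins
        for v in values:
            hist[min(v * bins // 256, bins - 1)] += 1
        features["histogram"].append([h / n for h in hist])

        # First-order color moment: the channel mean.
        mean = sum(values) / n
        features["mean"].append(mean)

        # Second-order color moment: the channel standard deviation
        # (population form, dividing by n; an assumption of this sketch).
        variance = sum((v - mean) ** 2 for v in values) / n
        features["std"].append(sqrt(variance))

    return features


# Usage: a uniform white region versus a region with mixed colors.
blank = color_features([(255, 255, 255)] * 4)
mixed = color_features([(0, 0, 0), (255, 255, 255)])
```

A uniform region yields a second-order moment of zero in every channel, while a region with varied colors does not, which is the kind of contrast such features make measurable.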

Ma Jun (马军), Wang Yu (王瑜), Ma Dekui (马德奎)

Computing technology, computer technology

Scientific document; Image retrieval; Related text; Image extraction

Ma Jun, Wang Yu, Ma Dekui. Text Based Image Retrieval on Scientific Documents Database [EB/OL]. (2008-11-03) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/200811-35.
