National Preprint Platform

Semantic Retrieval Technology of Academic Resources Based on Word Embedding Extension

Abstract

[Purpose/significance] Guided by a statistical approach, this paper explores semantic retrieval based on word embedding expansion as a way to improve the semantic retrieval of academic resources. [Method/process] Using natural language processing and text mining techniques, the metadata of the collected academic resources (mainly academic papers) is preprocessed, and a semantic retrieval system is built by combining the word2vec word embedding tool with the Elasticsearch full-text search engine. [Result/conclusion] The proposed method effectively improves the retrieval of academic information, achieves semantic retrieval of academic resources to a certain extent, and provides a reference for further research on semantic retrieval.
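As a rough illustration of the idea summarized in the abstract, the sketch below expands a query term with its nearest neighbours in an embedding space and turns the result into an Elasticsearch-style `bool` query body. It is not the authors' implementation. The toy vectors, the `title` field, the similarity threshold, and the helper names `expand_term` and `build_query` are all assumptions made for illustration; a real system would load trained word2vec vectors and send the query to an Elasticsearch index.

```python
import math

# Hand-made toy vectors standing in for trained word2vec embeddings (assumption).
TOY_VECTORS = {
    "retrieval": [0.9, 0.1, 0.0],
    "search": [0.8, 0.2, 0.1],
    "query": [0.7, 0.3, 0.2],
    "protein": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_term(term, vectors, top_k=2, min_sim=0.8):
    """Return up to top_k (word, similarity) pairs closest to term.

    Words below min_sim are dropped; an unknown term yields an empty list.
    """
    if term not in vectors:
        return []
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored = [(word, sim) for word, sim in scored if sim >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_query(term, vectors, field="title"):
    """Build an Elasticsearch-style bool query: original term plus expansions.

    Each expansion word is boosted by its similarity, so closer words count more.
    The field name is hypothetical.
    """
    should = [{"match": {field: {"query": term, "boost": 1.0}}}]
    for word, sim in expand_term(term, vectors):
        should.append({"match": {field: {"query": word, "boost": round(sim, 3)}}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(build_query("retrieval", TOY_VECTORS))
```

Weighting each expanded term by its similarity is one possible design choice: it keeps the original term dominant while letting related words contribute to ranking.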

Chen Chuanbao (陈川宝), Meng Xianru (孟现茹), Wang Renwu (王仁武)

Information and knowledge dissemination; science and scientific research; computing technology and computer technology

word2vec; Elasticsearch; semantic retrieval; academic resources

Chen Chuanbao, Meng Xianru, Wang Renwu. Semantic Retrieval Technology of Academic Resources Based on Word Embedding Extension [EB/OL]. (2023-08-27) [2025-08-16]. https://chinaxiv.org/abs/202308.00540.
