Automatic indexing of scientific and technological documents based on multi-feature fusion (基于多特征融合的科技文献自动标引方法研究)
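Purpose/Significance Users urgently need to retrieve valuable information efficiently from highly complex information. Against this background, this paper proposes a multi-feature fusion method for automatic indexing, with the aim of improving the accuracy of text indexing. Method/Process The body text and the abstract are both used as indexing sources. KeyBERT is applied to the abstract and TF-IDF to the body text, so that the word-frequency features of the statistical method and the semantic features of the machine learning method yield two sets of candidate index terms. The two sets are then fused through semantic similarity calculation, combining the strengths of both methods so that the indexing results are both accurate and comprehensive. Result/Conclusion Experiments show that automatic text indexing based on multi-feature fusion is feasible and produces good indexing results.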
Computing technology and computer technology; fundamental theory of automation; automation technology and automation equipment
automatic indexing; multi-feature fusion; candidate word extraction
Automatic indexing of scientific and technological documents based on multi-feature fusion [EB/OL]. (2022-09-02) [2025-10-15]. https://chinaxiv.org/abs/202209.00010
Purpose/Significance In the Big Data era, users urgently need efficient access to valuable information amid highly complex information. This is especially true in literature reading, where it is crucial to grasp the core content and main topics of a text quickly. Method/Process This study uses both the body text and the abstract as indexing sources. It combines the word-frequency features of the statistical method with the semantic features of the machine learning method to obtain candidate index terms, and then fuses the two through semantic similarity calculation so that the indexing results are both accurate and comprehensive. Result/Conclusion Experiments show that automatic text indexing based on multi-feature fusion is feasible and yields good indexing results.
automatic indexing; multi-feature fusion; candidate word extraction
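
The pipeline outlined in the abstract (frequency-based candidates from the body text, semantic candidates from the abstract, then fusion by semantic similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the whitespace tokenizer, the character-trigram similarity (a crude stand-in for embedding-based semantic similarity), the threshold value, and the fusion rule itself are all hypothetical, since the abstract does not specify them. The semantic candidate list is written by hand here where the paper would use KeyBERT output.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; Chinese text would need a word segmenter instead."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if len(t) > 2]


def tfidf_candidates(corpus, target, top_k=5):
    """Rank the terms of corpus[target] by TF-IDF, using a smoothed IDF."""
    docs = [tokenize(d) for d in corpus]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    term_freq = Counter(docs[target])
    n_docs, n_terms = len(docs), len(docs[target])
    scores = {
        term: (count / n_terms) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in term_freq.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def trigram_vector(word):
    """Count the character trigrams of a word padded with boundary markers."""
    padded = f"#{word}#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a, b):
    """Cosine similarity of trigram counts (assumed stand-in for embedding similarity)."""
    va, vb = trigram_vector(a), trigram_vector(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def fuse(frequency_terms, semantic_terms, threshold=0.6):
    """Hypothetical fusion rule: keep every semantic term, then add each
    frequency term that is not a near-duplicate of a term already kept."""
    fused = list(semantic_terms)
    for term in frequency_terms:
        if all(similarity(term, kept) < threshold for kept in fused):
            fused.append(term)
    return fused


# Toy corpus: the first entry plays the role of the body text being indexed.
corpus = [
    "indexing indexing fusion method",
    "method results table",
    "method data table",
]
frequency_terms = tfidf_candidates(corpus, target=0, top_k=2)
semantic_terms = ["indexing", "keyword"]  # placeholder for KeyBERT output on the abstract
print(fuse(frequency_terms, semantic_terms))
```

In this sketch the similarity threshold decides when two candidates count as the same index term. A real system would presumably compare sentence-embedding vectors rather than character trigrams, and could weight candidates by their scores instead of taking a deduplicated union.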