
Research on an Automatic Extraction Method for Patent Technical and Function Words Based on Large-Model Knowledge Distillation: A Case Study in the Vehicle-to-Everything (V2X) Field


[Objective] This paper aims to improve the accuracy of automatically extracting key technical words and the corresponding function words from patents. [Methods] ChatGPT serves as the teacher model and ChatGLM3 as the student model. Through knowledge distillation, training data generated by ChatGPT is used to fine-tune ChatGLM3, yielding several technical word extraction models and one function word extraction model. The technical word extraction models extract technical words from the patent abstract, the first claim, and the technical-function passage respectively, and the function word extraction model extracts function words from the technical-function passage. [Results] Compared with ChatGPT, the fine-tuned technical word and function word extraction models show higher precision and lower recall. The ChatGLM3 model fine-tuned on first claims achieves the highest precision and F1 score, 0.734 and 0.724 respectively. The function word extraction model reaches a precision of 0.649, higher than the 0.53 obtained for function words labeled by a commercial tool. [Limitations] The study covers a single technical field and a single patent language, the validation set is small, and the data-cleaning rules need further optimization. [Conclusions] Through knowledge distillation, the proposed approach improves the accuracy of large language models in automatically extracting technical function information. The study can also support mining cutting-edge and hot technologies from patent texts, and thereby higher-quality intelligent patent analysis.
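The scores quoted in the abstract (precision, sometimes rendered as accuracy, together with recall and F1) can be illustrated with a minimal sketch. It assumes that extracted terms are compared with a manually labeled gold set by exact string match; the abstract does not state the matching rule, and the keyword list suggests a semantic similarity matrix may be involved instead. The function name `precision_recall_f1` and the toy terms are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for extracted terms.

    Assumption: a predicted term counts as correct only if it matches
    a gold term exactly. The paper may use a looser matching rule.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data for one patent; not results from the paper.
gold_terms = {"roadside unit", "on-board unit",
              "resource scheduling", "channel estimation"}
extracted_terms = {"roadside unit", "on-board unit", "base station"}

p, r, f1 = precision_recall_f1(extracted_terms, gold_terms)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.667 recall=0.500 f1=0.571
```

In this toy case the extractor returns few terms, most of them correct, which gives the "higher precision, lower recall" pattern that the abstract reports for the fine-tuned models relative to ChatGPT.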

10.12074/202402.00235V1

Subject categories: Computing technology, computer technology; Automation technology, automation equipment

Keywords: Technical function word extraction; Knowledge distillation; Fine-tuned large model; Semantic similarity matrix

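Research on an Automatic Extraction Method for Patent Technical and Function Words Based on Large-Model Knowledge Distillation: A Case Study in the Vehicle-to-Everything (V2X) Field [EB/OL]. (2024-02-26) [2025-08-02]. https://chinaxiv.org/abs/202402.00235.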
