National Preprint Platform
A Text Semantic Disambiguation Method Based on SCDV and Anisotropy-Adjusted BERT

Text representation must resolve the ambiguity of words and accurately capture a word's semantic features in a specific context. To address polysemy and context dependence, this paper proposes SCDVAB, a model for text semantic disambiguation. It makes two main contributions. First, using a partition-averaging technique, it converts a scenario corpus into document embeddings and introduces anisotropy into the soft-clustering-based sparse composite document vector (SCDV) algorithm, with the aim of improving BERT's contextualized representation ability. Second, it uses the anisotropy-adjusted BERT word embeddings as the static word vectors for document embedding, which improves text semantic disambiguation. Extensive experiments show that SCDVAB clearly outperforms traditional text disambiguation algorithms and effectively improves overall disambiguation performance.
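The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each vocabulary word already has one static vector (for example, BERT contextual vectors averaged per word; no BERT call is made here). The anisotropy adjustment is approximated by mean-centering and removing the top principal direction, a common post-processing step that may differ from the paper's. The soft clustering uses a softmax over distances as a stand-in for the Gaussian mixture posteriors that SCDV normally uses. All function names and parameters are hypothetical.

```python
import numpy as np

def adjust_anisotropy(emb, n_components=1):
    """Center the embeddings and remove their top principal directions.
    Assumption: a generic anisotropy correction, not necessarily the paper's."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]
    return centered - centered @ top.T @ top

def mean_cosine(emb):
    """Average pairwise cosine similarity; values near 1 indicate anisotropy."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(emb)
    return (sim.sum() - n) / (n * (n - 1))

def soft_assign(emb, centers, temperature=1.0):
    """Soft cluster memberships from squared distances to cluster centers.
    Assumption: a simple stand-in for Gaussian mixture posteriors."""
    d2 = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def scdv_document_vectors(docs, emb, probs, idf, sparsity=0.04):
    """Build SCDV-style document vectors.
    docs: list of non-empty lists of word ids; emb: (n_words, dim);
    probs: (n_words, k) soft memberships; idf: (n_words,) weights."""
    n_words, dim = emb.shape
    k = probs.shape[1]
    # Word-cluster vectors: membership-weighted copies of each word vector,
    # concatenated over the k clusters and scaled by idf.
    wcv = (probs[:, :, None] * emb[:, None, :]).reshape(n_words, k * dim)
    wcv = wcv * idf[:, None]
    # Document vector: average of the word-cluster vectors of its words.
    doc_vecs = np.stack([wcv[ids].mean(axis=0) for ids in docs])
    # Sparsify: zero out entries below a fraction of the typical extreme value.
    a_min = doc_vecs.min(axis=1).mean()
    a_max = doc_vecs.max(axis=1).mean()
    threshold = sparsity * (abs(a_min) + abs(a_max)) / 2
    doc_vecs[np.abs(doc_vecs) < threshold] = 0.0
    return doc_vecs

# Toy usage with random vectors standing in for static BERT word embeddings.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(30, 8)) + 5.0   # shared offset mimics anisotropy
adjusted = adjust_anisotropy(word_vectors)
memberships = soft_assign(adjusted, centers=adjusted[:3])  # 3 hypothetical clusters
idf_weights = np.ones(30)                        # placeholder idf values
documents = [[0, 1, 2, 3], [4, 5, 6]]
doc_embeddings = scdv_document_vectors(documents, adjusted, memberships, idf_weights)
```

Comparing `mean_cosine(word_vectors)` with `mean_cosine(adjusted)` gives a quick check of how much the adjustment reduces the tendency of all vectors to point in the same direction.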

Li Baozhen (李保珍), Gu Xiulian (顾秀莲)

10.12074/202205.00137V1

Computing technology, computer technology

semantic disambiguation; anisotropy; BERT; sparse composite document vector; text representation

Li Baozhen, Gu Xiulian. A Text Semantic Disambiguation Method Based on SCDV and Anisotropy-Adjusted BERT [EB/OL]. (2022-05-18) [2025-08-23]. https://chinaxiv.org/abs/202205.00137.
