SPD-BERT: A Pre-trained Spoken Dialog Language Model Fusing Role, Structure, and Semantics
Spoken language understanding (SLU) is a key component of task-oriented dialog systems, and pre-trained language models have achieved important breakthroughs in SLU. However, most existing pre-trained language models are trained on large-scale written-text corpora, while spoken language differs markedly from written language in structure, conditions of use, and expression patterns. This paper constructs a large-scale, two-role, multi-turn spoken dialog corpus and proposes four self-supervised pre-training tasks that fuse role, structure, and semantics: whole-word masking, role prediction, intra-query reverse prediction, and inter-query exchange prediction. A BERT-based spoken dialog language model, SPD-BERT (SPoken Dialog BERT), is pre-trained through multi-task joint learning. The model is evaluated on three manually annotated datasets from an intelligent customer-service scenario in the finance domain, covering intent recognition, entity recognition, and pinyin error correction; the experimental results demonstrate its effectiveness.
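The abstract only names the four self-supervised objectives. As a rough illustration of how training instances for the three dialog-level objectives could be constructed from a two-role, multi-turn dialog, here is a minimal Python sketch; all function names, the "customer"/"agent" role labels, and the instance formats are illustrative assumptions, not the authors' implementation. Whole-word masking is the standard BERT-wwm objective and is omitted.

```python
import random

# A dialog is a list of (role, utterance) pairs. The role labels and
# all names below are assumptions for illustration only.

def role_example(dialog):
    """Role prediction: classify which role produced a sampled utterance."""
    role, utt = random.choice(dialog)
    return {"task": "role", "text": utt, "label": role}

def intra_reverse_example(dialog, p=0.5):
    """Intra-query reverse prediction: with probability p, reverse the
    character order inside one utterance; the model predicts whether
    the utterance was reversed."""
    _, utt = random.choice(dialog)
    is_reversed = random.random() < p
    return {"task": "intra_reverse",
            "text": utt[::-1] if is_reversed else utt,
            "label": int(is_reversed)}

def inter_exchange_example(dialog, p=0.5):
    """Inter-query exchange prediction: with probability p, swap two
    adjacent turns; the model predicts whether the pair was swapped."""
    i = random.randrange(len(dialog) - 1)
    (_, u1), (_, u2) = dialog[i], dialog[i + 1]
    is_swapped = random.random() < p
    a, b = (u2, u1) if is_swapped else (u1, u2)
    return {"task": "inter_exchange", "text_a": a, "text_b": b,
            "label": int(is_swapped)}

if __name__ == "__main__":
    dialog = [("customer", "我想查一下我的账户余额"),
              ("agent", "好的，请提供您的卡号后四位"),
              ("customer", "6228"),
              ("agent", "您的当前余额是一千零二十元")]
    print(role_example(dialog))
    print(intra_reverse_example(dialog))
    print(inter_exchange_example(dialog))
```

In the actual model, such instances would presumably be tokenized and trained jointly with whole-word masking under a multi-task loss, as the abstract describes.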
Li Feng (李锋), Huang Jian (黄健)
Linguistics
dialog system; spoken language understanding; pre-trained language model; intent recognition; entity recognition
Li Feng, Huang Jian. SPD-BERT: A Pre-trained Spoken Dialog Language Model Fusing Role, Structure, and Semantics [EB/OL]. (2022-04-07) [2025-08-06]. https://chinaxiv.org/abs/202204.00048.