National Preprint Platform

A Pre-trained Language Model for Spoken Dialog Integrating Role, Structure, and Semantics

Abstract


Spoken language understanding (SLU) is an important component of task-oriented dialog systems, and pre-trained language models have recently achieved breakthroughs on various SLU tasks. However, these models are trained on large-scale written-language corpora, which differ markedly from spoken language in structure, usage conditions, and expression patterns. This paper constructs a large-scale, bi-role, multi-turn spoken dialog corpus and proposes four self-supervised pre-training tasks that integrate role, structure, and semantics: whole-word masking, role prediction, intra-query reverse prediction, and inter-query exchange prediction. A BERT-based spoken dialog language model, SPD-BERT (SPoken Dialog-BERT), is pre-trained through multi-task learning. Finally, the model is evaluated on three human-annotated intelligent customer service datasets in the finance domain: intent recognition, entity recognition, and pinyin error correction. The experimental results demonstrate the effectiveness of our model.
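The four self-supervised objectives described above can be illustrated with a small data-construction sketch. This is an assumption-laden illustration, not the paper's implementation: the function and field names are invented for clarity, whole-word masking is simplified to token-level masking, and the 50% corruption rates are hypothetical.

```python
import random

# Hypothetical sketch of building training examples for the four
# self-supervised objectives (names and rates are illustrative,
# not from the paper).

MASK = "[MASK]"

def make_examples(dialog, mask_prob=0.15, seed=0):
    """dialog: list of (role, tokens) turns, e.g. role in {"customer", "agent"}."""
    rng = random.Random(seed)
    examples = []
    for role, tokens in dialog:
        # 1) Masked language model: mask ~15% of tokens
        #    (whole-word masking simplified to token masking).
        masked = [MASK if rng.random() < mask_prob else t for t in tokens]
        # 3) Intra-query reverse prediction: half the time, reverse the
        #    token order within the utterance and label it as reversed.
        is_reversed = rng.random() < 0.5
        structure = list(reversed(tokens)) if is_reversed else list(tokens)
        examples.append({
            "mlm_input": masked,
            "mlm_target": list(tokens),
            # 2) Role prediction: classify which speaker uttered the turn.
            "role_label": role,
            "reverse_input": structure,
            "reverse_label": int(is_reversed),
        })
    # 4) Inter-query exchange prediction: half the time, swap two
    #    adjacent turns and label the dialog as exchanged.
    exchanged = rng.random() < 0.5
    turns = [list(t) for _, t in dialog]
    if exchanged and len(turns) >= 2:
        j = rng.randrange(len(turns) - 1)
        turns[j], turns[j + 1] = turns[j + 1], turns[j]
    return examples, {"turns": turns, "exchange_label": int(exchanged)}
```

In multi-task pre-training, losses from the four heads (MLM, role, reverse, exchange) would then be summed over such examples; how the paper weights them is not stated in the abstract.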

Li Feng, Huang Jian

10.12074/202204.00048V1

Linguistics

Keywords: dialog system; spoken language understanding; pre-trained language model; intent recognition; entity recognition

Li Feng, Huang Jian. A Pre-trained Language Model for Spoken Dialog Integrating Role, Structure, and Semantics [EB/OL]. (2022-04-07) [2025-08-06]. https://chinaxiv.org/abs/202204.00048.
