A Data Privacy Protection Method for LLM Fine-Tuning Based on Embedding Perturbation
LIU Xincheng 1, CHENG Xiang 1
Author Information
- 1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract
Large language models typically follow a "pre-training + fine-tuning" paradigm, in which fine-tuning on domain data improves performance on specific tasks. However, some enterprises own high-quality domain data but lack the computational resources to fine-tune a model themselves, so they must turn to providers of fine-tuning as a service (FTaaS) and use the fine-tuned model through the provider's fine-tuning and inference services. In this process, private information in the fine-tuning data is at risk of leakage. Existing methods address this by adding differentially private noise to word embeddings. However, FTaaS providers typically do not grant data owners access to the pre-trained model's weights, so data owners must process their data with an open-source embedding model while the provider relies on the pre-trained model's own embedding layer. The mismatch between the two embedding models leaves their semantic spaces misaligned, which aggravates the semantic shift caused by perturbation and amplifies the impact of noise on the model. In addition, existing noise-injection mechanisms do not distinguish the semantic importance of word embeddings, so critical information may be obscured by excessive noise, degrading model performance. To address these issues, this paper proposes an embedding-perturbation-based privacy protection method for LLM fine-tuning data. To mitigate semantic space misalignment, it introduces a contrastive learning-based semantic alignment method; to make the alignment better suited to word embeddings, a cosine similarity-based representation consistency optimization; and to reduce the utility loss caused by excessive noise injection, a semantic-weighting-based embedding perturbation method. Experimental results show that, under the same privacy protection strength, the proposed method achieves better model performance than baseline methods.
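The contrastive semantic alignment step can be illustrated with a minimal sketch. Assume an InfoNCE-style objective over cosine similarities, where each embedding from the open-source encoder (the anchor) is pulled toward the same token's embedding in the target model's space (the positive), with other in-batch positives serving as negatives. The function names, the in-batch negative scheme, and the temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor (open-source encoder
    output) should be most similar to its own positive (the same token's
    embedding in the target model's space); the other positives in the
    batch act as negatives. Lower loss = better-aligned semantic spaces."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # log-sum-exp with max-shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_denom)
    return loss / len(anchors)
```

In practice the loss would be minimized by gradient descent over the parameters of an alignment mapping; the sketch only computes the objective, which is enough to see that aligned pairs score near zero while misaligned pairs are heavily penalized.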
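The semantic-weighted perturbation idea can likewise be sketched in a few lines. This pure-Python illustration assumes a per-token Laplace mechanism whose share of the overall privacy budget grows with an externally supplied importance weight, so that semantically important tokens receive less noise; the weighting scheme, unit sensitivity, and function names are assumptions for illustration, not the paper's exact mechanism.

```python
import math
import random

def laplace_sample(rng, scale):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def semantic_weighted_perturb(embeddings, weights, epsilon, seed=0):
    """Perturb each token embedding with Laplace noise, allocating the
    overall budget `epsilon` across tokens in proportion to their semantic
    importance `weights` (higher weight -> larger budget share -> smaller
    noise scale). Illustrative sketch only."""
    rng = random.Random(seed)
    n = len(weights)
    total = sum(weights)
    noisy = []
    for emb, w in zip(embeddings, weights):
        eps_i = epsilon * (w / total) * n  # important tokens get a larger share
        scale = 1.0 / eps_i                # Laplace scale b = sensitivity / eps_i (unit sensitivity assumed)
        noisy.append([x + laplace_sample(rng, scale) for x in emb])
    return noisy
```

With weights [4.0, 1.0] and epsilon = 1.0, the important token's noise scale is 0.625 while the unimportant token's is 2.5, so on average the important embedding is perturbed far less, which is the intended effect of semantic weighting.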
Key words
Natural Language Processing/Large Language Models/Differential Privacy/Embedding Perturbation/Semantic Alignment/Privacy Protection
Cite this article
LIU Xincheng, CHENG Xiang. A Data Privacy Protection Method for LLM Fine-Tuning Based on Embedding Perturbation [EB/OL]. (2026-02-12) [2026-02-14]. http://www.paper.edu.cn/releasepaper/content/202602-77.
Subject Classification
Linguistics