National Preprint Platform (国家预印本平台)

Secure Multi-Party Ensemble Decision Tree Models Based on Anonymized Sample IDs


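Abstract (translated from the Chinese): Ensemble tree models are machine learning algorithms with high accuracy and strong interpretability, and they are widely used in fields such as finance and medicine. However, joint modeling that directly uses data held by different enterprises and institutions may leak individuals' private information. How to apply privacy-preserving techniques to carry out joint modeling of ensemble tree models securely and efficiently has therefore become a research hotspot in industry and academia. At present, privacy-preserving ensemble tree models generally represent the sample set of a decision tree node as a bit vector and protect that bit vector with secret sharing, but this approach incurs substantial computational overhead. In addition, computing the information gain function involves many secure division operations, which further increases the model's computational cost. To address these problems, this paper proposes a secure multi-party ensemble tree model based on anonymized sample IDs. To address the overhead of the bit-vector representation, the model uses a decision tree node representation based on anonymized sample IDs; to address the many secure divisions in the information gain computation, it uses a comparison algorithm for the information gain function that involves no secure division. Experimental results on public datasets show that, compared with existing work, the proposed model improves running efficiency and achieves accuracy close to that of centralized training.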
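The division-free comparison mentioned in the abstract can be illustrated with a minimal plaintext sketch. The abstract does not say which gain function the paper uses, so the Gini-style split score below is an assumption, and the names `split_score` and `is_better` are hypothetical. Plain Python integers stand in for values that would be secret-shared in a real protocol; the point is only that two fractional scores can be ranked with additions and multiplications alone.

```python
from fractions import Fraction

def split_score(left_counts, right_counts):
    """Return (numerator, denominator) of sum_k l_k^2/|L| + sum_k r_k^2/|R|.

    Assumption: a Gini-style score. For a fixed parent node, maximizing this
    value is equivalent to minimizing the weighted Gini impurity.
    Both children must be non-empty so that the denominator is positive.
    """
    n_left, n_right = sum(left_counts), sum(right_counts)
    sq_left = sum(c * c for c in left_counts)
    sq_right = sum(c * c for c in right_counts)
    # Common denominator: sq_left/n_left + sq_right/n_right
    #   = (sq_left*n_right + sq_right*n_left) / (n_left*n_right)
    return sq_left * n_right + sq_right * n_left, n_left * n_right

def is_better(score_a, score_b):
    """Compare a_num/a_den > b_num/b_den without dividing.

    With positive denominators, a_num/a_den > b_num/b_den holds exactly when
    a_num*b_den > b_num*a_den, so only multiplication and comparison are used.
    """
    a_num, a_den = score_a
    b_num, b_den = score_b
    return a_num * b_den > b_num * a_den

# Per-class label counts in the left and right child for two candidate splits.
pure_split = split_score([4, 0], [0, 4])
mixed_split = split_score([3, 1], [1, 3])

# Cross-check against exact rational arithmetic (which does divide).
agrees = is_better(pure_split, mixed_split) == (
    Fraction(*pure_split) > Fraction(*mixed_split)
)
```

In a secure multi-party setting the same cross-multiplication would be carried out on shares, which replaces each secure division with secure multiplications followed by a single secure comparison.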

Abstract (authors' English version): As a widely used class of machine learning algorithms with strong interpretability and high adaptability, ensemble decision tree models have been applied in various fields, such as finance and medicine. However, the data in these fields is often strictly private and must not be disclosed or shared freely. How to use privacy-preserving technologies to perform joint modeling of ensemble tree models securely and efficiently has therefore become a research hotspot in industry and academia. At present, the private information about the sample space contained in each node of an ensemble decision tree model is generally represented as a secret-shared bit vector, but this method introduces a large number of redundant zeros and increases the computational cost. In addition, in existing methods the computation of the information gain function relies on an inefficient secure division operator. To solve these problems, this paper proposes a decision tree node representation method based on anonymized sample IDs. To address the redundant zeros in the node sample space representation, anonymized sample IDs produced by a commutative encryption scheme are used to represent the sample space of a node. To address the inefficient secure division in the information gain computation, the fractions are brought to a common denominator and simplified, so that the computation uses only efficient secure addition and multiplication operators. Experiments on public datasets demonstrate that the proposed model improves computational efficiency and incurs almost no loss of accuracy compared with centralized training.
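The commutative-encryption idea behind anonymized sample IDs can be sketched as follows. The abstract does not specify the scheme, so this example assumes exponentiation modulo a prime (in the style of Pohlig-Hellman/SRA), and the names `keygen`, `hash_to_group`, and `encrypt` are hypothetical. The 127-bit modulus is a toy parameter chosen for brevity and is not secure. How the anonymized IDs are then used inside the tree-building protocol is not described in the abstract and is not modeled here.

```python
import hashlib
import math
import secrets

# Toy parameter (assumption): the Mersenne prime 2^127 - 1.
# Far too small and unstructured for real use; illustration only.
P = 2**127 - 1

def keygen():
    """Pick a secret exponent k with gcd(k, P-1) == 1.

    The gcd condition makes x -> x^k mod P a bijection on [1, P-1],
    so distinct inputs map to distinct ciphertexts.
    """
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in [2, P-2]
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(sample_id):
    """Map a sample ID string to an integer in [1, P-1] via SHA-256."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(value, key):
    """Commutative encryption: (x^a)^b = (x^b)^a (mod P)."""
    return pow(value, key, P)

# Two parties, each with a private key, anonymize the same sample IDs.
key_a, key_b = keygen(), keygen()
sample_ids = ["user-001", "user-002", "user-003"]

a_then_b = [encrypt(encrypt(hash_to_group(s), key_a), key_b) for s in sample_ids]
b_then_a = [encrypt(encrypt(hash_to_group(s), key_b), key_a) for s in sample_ids]
```

Because the two encryption orders agree, both parties end up with the same doubly encrypted value for the same sample, and neither can invert it without the other's key. A node can then hold a list of such anonymized IDs whose length matches the number of samples in the node, instead of a bit vector with one entry per sample in the whole dataset.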

Cheng Xiang (程祥), Huang Yuejia (黄岳嘉)

Subject categories: computing technology and computer technology; automation technology and automation equipment

Keywords: Computer Science; Privacy-Enhancing Computation; Secure Multi-Party Computation; Ensemble Decision Tree Models; Homomorphic Encryption

Cheng Xiang, Huang Yuejia. Secure Multi-Party Ensemble Decision Tree Models Based on Anonymized Sample IDs [EB/OL]. (2023-04-10) [2025-08-18]. http://www.paper.edu.cn/releasepaper/content/202304-154.
