Detecting sensitive information of unstructured text using Convolutional Neural Network
With the widespread use of electronic text documents, the leakage of sensitive information from unstructured text is a costly problem for individuals, businesses, and governments. Detecting sensitive information in order to prevent data leakage has therefore become a topic in the field of information security. The detection methods currently used in practice fall roughly into two types: sensitive-word matching and traditional machine learning. Both rely on the frequency with which keywords co-occur with sensitive seed words, and in practice they may fail to accurately detect more complex patterns of sensitive information. In recent years, researchers have proposed using recurrent neural networks (RNNs) for sensitive information detection, exploiting the context of a document to predict its sensitivity more accurately. However, while RNNs improve accuracy, their models are slow to train. This paper therefore proposes using Text-CNN in place of a recurrent neural network. This maintains detection accuracy while shortening the time needed to train and build the detection model, which improves overall detection efficiency and achieves efficient and accurate detection.
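The abstract does not give the architecture, so the following is only a minimal sketch of the commonly described Text-CNN layout: word embeddings, convolution filters of several widths, ReLU, max-over-time pooling, and a sigmoid output that scores a document as sensitive. Every name here (`init_model`, `text_cnn_forward`), the toy vocabulary, and all sizes are hypothetical, and the weights are random rather than trained, so the score itself is meaningless.

```python
import math
import random


def init_model(vocab_size, emb_dim=8, widths=(2, 3, 4), filters_per_width=2, seed=0):
    """Build randomly initialised (untrained) parameters; all sizes are illustrative."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    embeddings = [[rand() for _ in range(emb_dim)] for _ in range(vocab_size)]
    # Each filter is (kernel, bias); the kernel spans `width` consecutive tokens.
    filters = [
        ([[rand() for _ in range(emb_dim)] for _ in range(width)], rand())
        for width in widths
        for _ in range(filters_per_width)
    ]
    out_weights = [rand() for _ in filters]
    out_bias = rand()
    return embeddings, filters, out_weights, out_bias


def text_cnn_forward(token_ids, embeddings, filters, out_weights, out_bias):
    """Return (P(sensitive), pooled features) for one document of token ids."""
    x = [embeddings[t] for t in token_ids]  # seq_len x emb_dim
    pooled = []
    for kernel, bias in filters:
        width = len(kernel)
        best = 0.0  # ReLU outputs are >= 0, so 0 is a safe floor for the max
        # Slide the filter over every window of `width` tokens. Windows do not
        # depend on each other, unlike the sequential steps of an RNN.
        for start in range(len(x) - width + 1):
            s = bias
            for i in range(width):
                s += sum(w * v for w, v in zip(kernel[i], x[start + i]))
            best = max(best, max(0.0, s))  # ReLU, then max-over-time pooling
        pooled.append(best)
    logit = out_bias + sum(w * f for w, f in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit)), pooled


# Toy usage with a hypothetical six-word vocabulary.
vocab = {"<unk>": 0, "the": 1, "patient": 2, "record": 3, "is": 4, "confidential": 5}
ids = [vocab.get(t, 0) for t in "the patient record is confidential".split()]
embeddings, filters, out_w, out_b = init_model(len(vocab))
prob, features = text_cnn_forward(ids, embeddings, filters, out_w, out_b)
print(len(features), prob)
```

Because each window is scored independently, the convolution can be computed in parallel across positions, which is the usual reason given for a CNN training faster than an RNN on the same text.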
Guo Yanhui, Yu Hai
Computing technology, computer technology
Sensitive information; Convolutional neural network; Unstructured text; Data leak prevention
Guo Yanhui, Yu Hai. Detecting sensitive information of unstructured text using Convolutional Neural Network [EB/OL]. (2019-04-04) [2025-08-23]. http://www.paper.edu.cn/releasepaper/content/201904-52.