RNN Text Classification Based on Construction and Constraint by Blacklist and Whitelist
As a fast and efficient data mining technique, text classification has become an active research topic. Neural-network-based text classification methods currently achieve remarkable classification performance. However, neural networks are not interpretable and require large amounts of training data, which makes them unsuitable for cold-start and zero-shot scenarios. Rule-based text classification methods, by contrast, are interpretable, require no training data, and often achieve decent classification accuracy. Their drawback is that the rule set must be updated manually and continuously to improve accuracy, which is inefficient. Moreover, once the rules have been iterated to a certain extent, accuracy becomes difficult to improve further, so rule-based methods underperform neural networks in rich-resource scenarios. Blacklists and whitelists are a type of rule that is simple to construct, and they have been widely used in spam filtering, news classification, and other fields. To build a text classification model with high accuracy and good interpretability that applies to cold-start, zero-shot, low-resource, and rich-resource scenarios, this paper constructs a recurrent neural network from a blacklist and whitelist and proposes a text classification framework. The construction is then optimized to address problems that arise in the process of building the recurrent neural network. Finally, the interpretability of the model is improved by constraining the recurrent neural network with the blacklist and whitelist. Experiments on different datasets verify the effectiveness of the proposed method.
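The abstract does not spell out how the recurrent network is built from the lists, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' construction. It assumes a one-dimensional linear recurrence `h_t = r * h_(t-1) + w[x_t]`, where a blacklisted token contributes +1 and a whitelisted token contributes -1. The word lists, function names, and labels are all hypothetical. With `r = 1` the recurrence reproduces a plain list-counting rule, which illustrates how list-derived weights can give a network sensible behavior before any training data is available.

```python
from typing import Dict, Iterable, Set

# Hypothetical example lists; the paper's actual lists are not given in the abstract.
BLACKLIST: Set[str] = {"lottery", "winner", "prize"}
WHITELIST: Set[str] = {"meeting", "invoice", "agenda"}


def build_input_weights(blacklist: Set[str], whitelist: Set[str]) -> Dict[str, float]:
    """Map each listed token to an input weight: +1 if blacklisted, -1 if whitelisted.

    Assumption of this sketch: a token on both lists is treated as whitelisted.
    """
    weights = {token: 1.0 for token in blacklist}
    weights.update({token: -1.0 for token in whitelist})
    return weights


def rnn_score(tokens: Iterable[str],
              input_weights: Dict[str, float],
              recurrent_weight: float = 1.0) -> float:
    """Run the linear recurrence h_t = recurrent_weight * h_(t-1) + w[x_t]."""
    h = 0.0
    for token in tokens:
        # Tokens on neither list contribute nothing to the hidden state.
        h = recurrent_weight * h + input_weights.get(token, 0.0)
    return h


def classify(tokens: Iterable[str], input_weights: Dict[str, float]) -> str:
    """Label a token sequence by the sign of the final hidden state."""
    return "blacklisted" if rnn_score(tokens, input_weights) > 0 else "allowed"


if __name__ == "__main__":
    weights = build_input_weights(BLACKLIST, WHITELIST)
    print(classify("you are a lottery winner".split(), weights))
    print(classify("agenda for the meeting".split(), weights))
```

Because every weight comes directly from a list entry, each weight remains traceable to a human-readable rule, which is the kind of interpretability the abstract emphasizes. In a trainable version the weights would become parameters initialized this way, and any constraint that keeps them close to the list-derived values would be a further assumption beyond what the abstract states.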
杨林, 徐慧, 王超超, 杨毅
Computing technology, computer technology
text classification; blacklist and whitelist; recurrent neural network; interpretability
杨林, 徐慧, 王超超, 杨毅. RNN Text Classification Based on Construction and Constraint by Blacklist and Whitelist [EB/OL]. (2022-04-06)[2025-08-18]. http://www.paper.edu.cn/releasepaper/content/202204-95.