Research on a Commodity Classification Algorithm Based on a Double-Layer Bayesian Model
With the rapid growth of e-commerce, commodity information has become increasingly complex, and assigning the correct label to each commodity, that is, classifying it, has become a key prerequisite for mining value from commodity data. Based on the characteristics and processing requirements of commodity text, this paper applies Chinese word segmentation and stop-word removal to existing commodity text data and represents the text with the vector space model (VSM). It then proposes a double-layer Bayesian classification model in which each layer is optimized with a Jelinek-Mercer smoothing parameter, improving the performance of commodity classification. Finally, the paper presents and analyzes the experiments. The results show that the double-layer Bayesian model outperforms the single-layer model, with some improvement in both precision and recall; users can choose between the single-layer and double-layer models according to the actual application.
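The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not spell out how the two layers are arranged, so this example assumes a coarse-to-fine hierarchy: a first-layer classifier picks a top-level category and a second-layer classifier, trained only on that category's documents, picks the subcategory. The class names `JMNaiveBayes` and `TwoLayerBayes`, the parameter `lam`, and the toy data are all hypothetical. Inputs are assumed to be token lists that have already been segmented and stripped of stop words, and bag-of-words counts stand in for the VSM representation.

```python
import math
from collections import Counter, defaultdict


class JMNaiveBayes:
    """Multinomial naive Bayes with Jelinek-Mercer smoothing (illustrative)."""

    def __init__(self, lam=0.2):
        # lam is the weight of the background (whole-collection) model;
        # it must be > 0 so that smoothed probabilities are never zero.
        self.lam = lam

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.background = Counter()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.background.update(tokens)
        self.class_totals = {c: sum(wc.values())
                             for c, wc in self.word_counts.items()}
        self.total = sum(self.background.values())
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / self.n_docs)
            for w in tokens:
                p_bg = self.background[w] / self.total
                if p_bg == 0.0:
                    continue  # word never seen in training: ignore it
                denom = self.class_totals[c]
                p_c = self.word_counts[c][w] / denom if denom else 0.0
                # Jelinek-Mercer: interpolate class model and background model
                score += math.log((1 - self.lam) * p_c + self.lam * p_bg)
            if score > best_score:
                best, best_score = c, score
        return best


class TwoLayerBayes:
    """Assumed coarse-to-fine arrangement of two Bayesian layers."""

    def __init__(self, lam=0.2):
        self.lam = lam

    def fit(self, docs, coarse, fine):
        # Layer 1: top-level category over all documents
        self.top = JMNaiveBayes(self.lam).fit(docs, coarse)
        # Layer 2: one subcategory classifier per top-level category
        grouped = defaultdict(lambda: ([], []))
        for tokens, c, f in zip(docs, coarse, fine):
            grouped[c][0].append(tokens)
            grouped[c][1].append(f)
        self.sub = {c: JMNaiveBayes(self.lam).fit(d, l)
                    for c, (d, l) in grouped.items()}
        return self

    def predict(self, tokens):
        c = self.top.predict(tokens)
        return c, self.sub[c].predict(tokens)


# Toy data (hypothetical), already segmented and stop-word filtered
docs = [["apple", "phone", "64gb"], ["android", "phone", "dual", "sim"],
        ["gaming", "laptop", "16gb"], ["ultrabook", "laptop", "ssd"],
        ["cotton", "shirt", "men"], ["slim", "shirt", "women"],
        ["running", "shoes", "men"], ["leather", "shoes", "women"]]
coarse = ["electronics"] * 4 + ["clothing"] * 4
fine = ["phone", "phone", "laptop", "laptop",
        "shirt", "shirt", "shoes", "shoes"]

model = TwoLayerBayes(lam=0.2).fit(docs, coarse, fine)
print(model.predict(["android", "phone"]))
```

A single-layer baseline under the same assumptions would be `JMNaiveBayes().fit(docs, fine)`, which is the comparison the abstract describes; the value of `lam` would normally be tuned on held-out data.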
Mo Siya (莫斯雅), Cui Xiaoyan (崔晓燕)
Computing technology; computer technology
machine learning; Bayesian algorithm; double-layer model; commodity text classification
莫斯雅, 崔晓燕. 基于双层贝叶斯模型的商品分类算法研究 [Research on a Commodity Classification Algorithm Based on a Double-Layer Bayesian Model] [EB/OL]. (2015-12-10) [2025-08-16]. http://www.paper.edu.cn/releasepaper/content/201512-584.