National Preprint Platform (国家预印本平台)

Improvement and Implementation of a Spark-Based Hybrid Collaborative Filtering Algorithm


Traditional collaborative filtering suffers from sparsity, scalability, and personalization problems during recommendation. To address them, this paper applies the idea of algorithm integration to optimize and improve a new hybrid collaborative filtering algorithm on the Spark platform. Following the Stacking ensemble learning approach, multiple weak recommenders are combined by linear weighting into a stronger composite recommender. First, building on nearest-neighbor collaborative filtering, the algorithm optimizes the neighbor similarity calculation strategy with classification (presorting), popularity, and positive-rating degree. This makes the similarity more reasonable, lowers the complexity of computing it, and partly alleviates rating sparsity. Second, the algorithm is tightly integrated with the Spark distributed computing platform: using Spark Streaming and Spark's distributed storage structure, it designs and implements an incremental iterative model of the recommendation algorithm, which addresses the poor scalability and real-time performance of collaborative filtering. The experiments use the UCI public datasets MovieLens and Netflix movie ratings. The results show that the improved algorithm performs well in personalization, accuracy, and scalability, improves to varying degrees over previous algorithms of the same type, and provides a feasible algorithm-integration scheme for recommender-system applications.
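The abstract describes Stacking as a linear weighted combination of weak recommenders but does not say how the weights are obtained. A minimal sketch, assuming the weights are fitted by least squares on held-out ratings (a common Stacking meta-learner) with no intercept term and a tiny ridge term for numerical stability; `fit_blend_weights` and `blend` are hypothetical names, and plain Python stands in for any Spark code:

```python
def fit_blend_weights(base_preds, targets, ridge=1e-6):
    """Fit linear blending weights for k base recommenders (hypothetical helper).

    base_preds: list of rows; row n holds the k base predictions for held-out pair n
    targets:    list of true ratings for the same held-out pairs
    Solves (X^T X + ridge*I) w = X^T y by Gaussian elimination.
    """
    k = len(base_preds[0])
    # Normal equations: A = X^T X + ridge*I, b = X^T y.
    A = [[sum(row[p] * row[q] for row in base_preds) + (ridge if p == q else 0.0)
          for q in range(k)] for p in range(k)]
    b = [sum(row[p] * y for row, y in zip(base_preds, targets)) for p in range(k)]

    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w


def blend(preds, weights):
    """Composite prediction: weighted sum of the base recommenders' predictions."""
    return sum(p * w for p, w in zip(preds, weights))
```

Fitting on held-out rather than training predictions is what distinguishes Stacking from simple averaging: the meta-learner learns how far to trust each weak recommender on data none of them were trained on.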
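The abstract names classification (presorting), popularity, and positive-rating degree as inputs to the neighbor similarity, but gives no formula. The sketch below is one hypothetical way to combine them, not the authors' method: item-item cosine similarity computed only within a category, damped by an assumed popularity term `1 / log(e + mean popularity)`, and scaled by an assumed praise factor (the share of ratings at or above `like_threshold`). The function name, the threshold, and both adjustment formulas are illustrative assumptions.

```python
import math
from collections import defaultdict


def adjusted_item_similarity(ratings, category, like_threshold=4.0):
    """Item-item similarity with hypothetical category, popularity and praise adjustments.

    ratings:  dict mapping user -> {item: rating}
    category: dict mapping item -> category label (the presorting key)
    Returns a dict mapping (item_i, item_j) -> adjusted similarity; only
    items in the same category are ever compared.
    """
    # Invert the user-centric ratings into item -> {user: rating}.
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            by_item[item][user] = r

    # Assumed popularity: number of users who rated the item.
    popularity = {i: len(us) for i, us in by_item.items()}
    # Assumed praise degree: share of the item's ratings at or above like_threshold.
    praise = {i: sum(r >= like_threshold for r in us.values()) / len(us)
              for i, us in by_item.items()}
    # Euclidean norm of each item's rating vector, for cosine similarity.
    norm = {i: math.sqrt(sum(r * r for r in us.values())) for i, us in by_item.items()}

    # Presort: bucket items by category so cross-category pairs are never scored.
    buckets = defaultdict(list)
    for item in by_item:
        buckets[category.get(item)].append(item)

    sims = {}
    for items in buckets.values():
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                i, j = items[a], items[b]
                common = by_item[i].keys() & by_item[j].keys()
                if not common:
                    continue  # no co-raters: leave the pair out
                cos = sum(by_item[i][u] * by_item[j][u] for u in common) / (norm[i] * norm[j])
                # Damp pairs of very popular items (assumed penalty, always < 1).
                damping = 1.0 / math.log(math.e + (popularity[i] + popularity[j]) / 2)
                # Scale into [0.5, 1] by the pair's average praise degree (assumed).
                praise_factor = 0.5 + 0.5 * (praise[i] + praise[j]) / 2
                sims[(i, j)] = sims[(j, i)] = cos * damping * praise_factor
    return sims
```

Bucketing by category is what reduces the cost: with items split evenly into c categories, the number of pairs scored drops by roughly a factor of c compared with all-pairs comparison.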
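The incremental iterative model is only named in the abstract. One common way to make neighborhood CF incremental is to keep sufficient statistics (per-item squared norms and per-pair dot products) and fold each new batch of ratings into them, so similarities never need a full recomputation. The sketch below assumes plain item-item cosine similarity and that each (user, item) rating arrives at most once; the class and method names are hypothetical, and the Python lists passed to `update` merely stand in for the micro-batches Spark Streaming would deliver (no Spark API is used).

```python
import math
from collections import defaultdict


class IncrementalItemCosine:
    """Hypothetical incremental item-item cosine model.

    Keeps sufficient statistics so each micro-batch updates similarities in
    time proportional to the affected users' histories, instead of recomputing
    everything. Assumes each (user, item) rating arrives at most once and that
    item ids are mutually comparable (used to order pair keys).
    """

    def __init__(self):
        self.user_history = defaultdict(dict)  # user -> {item: rating}
        self.sq_norm = defaultdict(float)      # item -> sum of squared ratings
        self.dot = defaultdict(float)          # (i, j) with i < j -> sum_u r_ui * r_uj

    def update(self, batch):
        """Fold one micro-batch of (user, item, rating) triples into the statistics."""
        for user, item, rating in batch:
            history = self.user_history[user]
            if item in history:
                raise ValueError("rating revisions are not handled in this sketch")
            # The new rating contributes to the dot product with every item this user rated before.
            for other, other_rating in history.items():
                key = (item, other) if item < other else (other, item)
                self.dot[key] += rating * other_rating
            history[item] = rating
            self.sq_norm[item] += rating * rating

    def similarity(self, i, j):
        """Current cosine similarity of items i and j (0.0 if either is unseen)."""
        key = (i, j) if i < j else (j, i)
        denom = math.sqrt(self.sq_norm.get(i, 0.0) * self.sq_norm.get(j, 0.0))
        return self.dot.get(key, 0.0) / denom if denom else 0.0
```

For example, after `m.update([("u1", "a", 5), ("u1", "b", 4)])` followed by a later batch `m.update([("u2", "a", 4), ("u2", "b", 5)])`, `m.similarity("a", "b")` equals the cosine that a from-scratch computation over all four ratings would give. In a distributed setting these statistics would be partitioned and kept as streaming state.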

Authors: Wang Yuanlong (王源龙), Sun Weizhen (孙卫真), Xiang Yong (向勇)

DOI: 10.12074/201805.00375V1

Subject: Computing technology; computer technology

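Keywords: ensemble learning; collaborative filtering; sparsity; scalability; Spark; stream computing; incremental model; classification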

Wang Yuanlong, Sun Weizhen, Xiang Yong. Improvement and Implementation of a Spark-Based Hybrid Collaborative Filtering Algorithm [EB/OL]. (2018-05-18)[2025-08-02]. https://chinaxiv.org/abs/201805.00375.
