Design and Implementation of Improved Random Decision Tree Algorithm on Big Data
With the rapid development of Internet technology, the volume of accumulated data keeps growing, and using machine learning algorithms to mine the value in that data has become a research hotspot. However, the growth in data size and dimensionality poses major challenges for machine learning algorithms in terms of execution efficiency and computational cost. The random decision tree algorithm reduces computational cost because it builds trees without using any purity-check function, but its tree structure makes it hard to parallelize and therefore inefficient on big data problems. This paper improves the algorithm by modeling with a nonparametric random method, replacing the tree structure with unsupervised locality-sensitive hashing (LSH) to randomly partition the data space. Because the improved algorithm is inherently non-iterative, it can run flexibly and efficiently in parallel on a distributed platform while maintaining the same accuracy as the original algorithm, and thus meets the dual requirements of storing and efficiently processing big data.
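The idea of swapping the tree structure for LSH can be illustrated with a minimal sketch. The abstract does not say which LSH family the authors use, so the code below assumes random-hyperplane (sign) hashing. The function names (`make_hash`, `fit`, `predict_proba`, `predict`) and the parameters `n_tables` and `n_bits` are hypothetical and are not taken from the paper. Each hash table plays the role of one random tree: its buckets are drawn without looking at the labels, class counts are collected per bucket in a single pass, and predictions average the per-bucket class frequencies over all tables.

```python
import random
from collections import Counter, defaultdict


def make_hash(dim, n_bits, rng):
    """Draw n_bits random hyperplanes; each one contributes one sign bit.

    Assumption: random-hyperplane LSH stands in for whatever LSH family
    the paper actually uses.
    """
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(x):
        # The bucket key is the tuple of sign bits of the projections.
        return tuple(sum(w * v for w, v in zip(p, x)) >= 0.0 for p in planes)

    return h


def fit(X, y, n_tables=10, n_bits=4, seed=0):
    """Build n_tables independent hash tables of per-bucket class counts.

    The hash functions never see the labels (unsupervised partitioning),
    and each table needs only one pass over the data, so the tables can
    be built independently of one another.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    model = []
    for _ in range(n_tables):
        h = make_hash(dim, n_bits, rng)
        buckets = defaultdict(Counter)
        for x, label in zip(X, y):
            buckets[h(x)][label] += 1
        model.append((h, buckets))
    return model


def predict_proba(model, x):
    """Average the class frequencies of the buckets that x falls into."""
    votes = Counter()
    used = 0
    for h, buckets in model:
        counts = buckets.get(h(x))
        if not counts:
            continue  # x landed in a bucket with no training points
        total = sum(counts.values())
        for label, c in counts.items():
            votes[label] += c / total
        used += 1
    if used == 0:
        return {}
    return {label: v / used for label, v in votes.items()}


def predict(model, x):
    """Return the most probable label, or None if no bucket matched."""
    proba = predict_proba(model, x)
    return max(proba, key=proba.get) if proba else None
```

Because no table depends on another, each (hash function, bucket counts) pair in this sketch could be built by a separate worker and the per-bucket counts merged by summation, which is the property the abstract relies on for distributed execution.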
Zhao Mengqi, Li Wensheng
Computing Technology, Computer Technology
Random decision tree; LSH; big data; nonparametric model; machine learning
Zhao Mengqi, Li Wensheng. Design and Implementation of Improved Random Decision Tree Algorithm on Big Data [EB/OL]. (2017-01-05) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201701-89.