Optimization and Reconstruction of Shuffle in MapReduce
Hadoop has become the most mainstream cloud computing platform, and how to optimize MapReduce computing performance on it is an active research topic. Beyond writing high-performance Map and Reduce functions, performance is improved mainly by optimizing the system framework. This paper describes the MapReduce programming framework in detail and analyzes the workflow of the Shuffle stage. It optimizes and reconstructs Shuffle in three ways: compressing the map-side output, reconstructing the transfer protocol used to copy data remotely from the map side to the reduce side, and optimizing memory allocation on the reduce side. Finally, a Hadoop cluster is built and the experimental data are tested with distributed MapReduce algorithms. The experimental results show that the optimized and reconstructed Shuffle significantly improves MapReduce computing performance.
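The first of the three measures, map-side compression, can be illustrated with a small standalone sketch. This is not the paper's implementation and does not use Hadoop: `zlib` stands in for a Hadoop compression codec, and the names `partition`, `map_side`, and `reduce_side` are hypothetical helpers introduced only for illustration. It assumes a word-count job in which each map task partitions, sorts, and compresses its output before the reduce side copies it.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    # Deterministic partitioner; Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(records, num_reducers):
    """Word-count map, then sort and compress each partition (assumed layout)."""
    buckets = defaultdict(list)
    for line in records:
        for word in line.split():
            buckets[partition(word, num_reducers)].append((word, 1))
    compressed = {}
    for pid, pairs in buckets.items():
        pairs.sort()  # map output is sorted by key within a partition
        raw = "\n".join(f"{k}\t{v}" for k, v in pairs).encode("utf-8")
        compressed[pid] = (len(raw), zlib.compress(raw))
    return compressed


def reduce_side(blobs):
    """Decompress the copied map outputs and sum the counts per key."""
    counts = defaultdict(int)
    for blob in blobs:
        for row in zlib.decompress(blob).decode("utf-8").split("\n"):
            key, value = row.split("\t")
            counts[key] += int(value)
    return dict(counts)


lines = ["hadoop mapreduce shuffle", "shuffle sort merge shuffle"] * 500
map_output = map_side(lines, num_reducers=2)

raw_bytes = sum(size for size, _ in map_output.values())
sent_bytes = sum(len(blob) for _, blob in map_output.values())

totals = {}
for pid, (_, blob) in map_output.items():
    totals.update(reduce_side([blob]))

print(f"uncompressed: {raw_bytes} bytes, transferred: {sent_bytes} bytes")
print(totals)
```

The trade-off sketched here is that the map side spends CPU time compressing so that fewer bytes cross the network during the copy phase; whether that pays off on a real cluster depends on the codec and the data.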
Peng Fuquan (彭辅权), Ying Jing (应晶), Wu Minghui (吴明晖), Jin Canghong (金苍宏)
Computing technology, computer technology
cloud computing, Hadoop, MapReduce, Shuffle
Peng Fuquan, Ying Jing, Wu Minghui, Jin Canghong. Optimization and Reconstruction of Shuffle in MapReduce [EB/OL]. (2012-05-24)[2025-06-16]. http://www.paper.edu.cn/releasepaper/content/201205-411.