National Preprint Platform

The Architecture Design and Implementation of Hadoop Based on GlusterFS


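Hadoop is a popular big-data processing platform in industry. With HDFS as its back-end distributed storage system, it can handle data processing at the terabyte and even petabyte scale. HDFS is a distributed file system that accesses data through a central metadata server, so as data volume grows rapidly, the metadata server becomes a single-point bottleneck. GlusterFS is a horizontally scalable (scale-out) distributed file system whose architecture does away with the metadata server and instead uses an elastic hashing algorithm to store and retrieve data. With this design, its storage capacity grows linearly, and its performance also scales linearly with the size of the cluster. This paper replaces HDFS with GlusterFS and combines it with Hadoop MapReduce, designs two different system architectures, and compares them with the native HDFS + MapReduce architecture, running data tests to compare performance across the architectures.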

Apache Hadoop is the most popular industry solution for storing and processing extremely large data sets, from terabytes to petabytes, on commodity hardware using the Hadoop Distributed File System (HDFS). Because HDFS is a metadata-based distributed file system that stores and retrieves all data through a metadata server, that server comes under ever greater pressure as the data size grows and can become a single-point bottleneck as well as a single point of failure. GlusterFS is a scale-out distributed file system that does not depend on a metadata server to locate data; instead it uses the Elastic Hash algorithm, which lets GlusterFS scale linearly without metadata-related performance bottlenecks. In this paper, we present two different architectures for enabling Hadoop MapReduce on GlusterFS, evaluate the performance of each, and compare them with the original HDFS + MapReduce architecture.
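The abstract contrasts HDFS's metadata server with GlusterFS's elastic hashing, where a client works out a file's location itself. The Python sketch below illustrates hash-range file placement, a simplified model of that idea, under stated assumptions: the 32-bit hash space is split into contiguous ranges, one per brick, and a file is placed on the brick whose range contains the hash of its name. The brick names, the use of `zlib.crc32` as the hash function, and the helpers `build_layout`, `file_hash`, and `locate` are hypothetical and chosen for illustration only; GlusterFS's real hash function and its per-directory layouts differ and are not reproduced here.

```python
import zlib
from dataclasses import dataclass

HASH_SPACE = 2**32  # size of the 32-bit hash space shared by all bricks


@dataclass(frozen=True)
class Brick:
    name: str
    start: int  # inclusive lower bound of this brick's hash range
    end: int    # inclusive upper bound of this brick's hash range


def build_layout(brick_names):
    """Split the hash space into equal, contiguous ranges, one per brick (hypothetical helper)."""
    if not brick_names:
        raise ValueError("need at least one brick")
    n = len(brick_names)
    step = HASH_SPACE // n
    layout = []
    for i, name in enumerate(brick_names):
        start = i * step
        # The last brick absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        layout.append(Brick(name, start, end))
    return layout


def file_hash(filename):
    """Map a file name to a 32-bit integer.

    Assumption: CRC32 stands in for GlusterFS's own hash function.
    """
    return zlib.crc32(filename.encode("utf-8")) & 0xFFFFFFFF


def locate(filename, layout):
    """Return the brick whose range contains the file's hash.

    The lookup is pure computation on the client: no metadata server is asked.
    """
    h = file_hash(filename)
    for brick in layout:
        if brick.start <= h <= brick.end:
            return brick.name
    raise ValueError("layout does not cover the whole hash space")


if __name__ == "__main__":
    # Hypothetical three-brick volume.
    layout = build_layout(["server1:/brick", "server2:/brick", "server3:/brick"])
    for f in ["part-00000", "part-00001", "input.txt"]:
        print(f, "->", locate(f, layout))
```

Because any client can evaluate `locate` on its own, there is no central server that every lookup must pass through, which is the property the abstract credits for GlusterFS's linear scaling. In this simplified model, adding a brick means recomputing the ranges, which moves some files to different bricks; the sketch does not model how GlusterFS migrates them.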

Author: Chen Mengfei (陈梦飞)

Subject: Computing Technology, Computer Technology

Keywords: distributed processing system; Hadoop; MapReduce; GlusterFS; architecture comparison

Chen Mengfei. The Architecture Design and Implementation of Hadoop Based on GlusterFS [EB/OL]. (2015-12-11) [2025-08-04]. http://www.paper.edu.cn/releasepaper/content/201512-680.
