National Preprint Platform
The Design and Implementation of DataNode Local Cache of HDFS

Abstract

With the growing variety of Internet applications and the rapid growth of network data, processing and storing massive data has become one of the most important problems facing Internet applications. The Hadoop Distributed File System (HDFS), developed by the Apache Hadoop project, is a distributed file system designed to run on commodity hardware. It is highly reliable and fault-tolerant, provides high-throughput data access, and is well suited to storing massive data sets and processing them in a distributed fashion. However, when the same small pieces of data are read repeatedly from the massive data stored in HDFS, the resulting frequent disk I/O operations cause disk bottlenecks and other overload conditions on the servers. To address this problem, this paper proposes adding a local cache to HDFS DataNodes. The open-source code of the HDFS data access path was analyzed and modified to implement the DataNode local cache. Experiments show that the scheme improves data access efficiency and reduces the servers' CPU and disk utilization.
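The general idea of a cache placed in front of disk reads can be illustrated with a minimal sketch. It is not the paper's implementation, which modifies the HDFS source and whose details are not given in this abstract. The class name `LocalBlockCache`, the `(block_id, offset, length)` key, the LRU eviction policy, and the `read_from_disk` callback are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical cache key: which block, where in it, and how many bytes.
Key = Tuple[int, int, int]


class LocalBlockCache:
    """Read-through LRU cache sitting in front of a disk read function.

    Illustrative only: the eviction policy and key layout are assumptions,
    not details taken from the paper.
    """

    def __init__(self, read_from_disk: Callable[[int, int, int], bytes],
                 capacity_bytes: int) -> None:
        self._read_from_disk = read_from_disk
        self._capacity = capacity_bytes
        self._used = 0
        self._entries: "OrderedDict[Key, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int, offset: int, length: int) -> bytes:
        key = (block_id, offset, length)
        cached = self._entries.get(key)
        if cached is not None:
            # Hit: mark as most recently used and skip the disk entirely.
            self._entries.move_to_end(key)
            self.hits += 1
            return cached

        # Miss: fall back to the disk, then remember the result.
        self.misses += 1
        data = self._read_from_disk(block_id, offset, length)
        if len(data) <= self._capacity:
            self._entries[key] = data
            self._used += len(data)
            # Evict least recently used entries until we fit again.
            while self._used > self._capacity:
                _, evicted = self._entries.popitem(last=False)
                self._used -= len(evicted)
        return data


# Usage: a fake disk that records every read it is asked to perform.
disk_reads = []


def fake_disk_read(block_id: int, offset: int, length: int) -> bytes:
    disk_reads.append((block_id, offset, length))
    return bytes([block_id % 256]) * length


cache = LocalBlockCache(fake_disk_read, capacity_bytes=8)
first = cache.read(1, 0, 4)   # goes to the fake disk
second = cache.read(1, 0, 4)  # served from memory
```

In this sketch, repeated reads of the same small range reach the disk only once, which is the effect the abstract attributes to the DataNode cache. A real DataNode would also have to handle invalidation when blocks are deleted or rewritten, and concurrent readers.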

Zhao Jing (赵婧), Wang Hongbo (王洪波), Cheng Shiduan (程时端)

Computing technology; computer technology

computer application technology; HDFS; DataNode; cache

Zhao Jing, Wang Hongbo, Cheng Shiduan. The Design and Implementation of DataNode Local Cache of HDFS [EB/OL]. (2011-12-05) [2025-08-10]. http://www.paper.edu.cn/releasepaper/content/201112-98.
