The Research and Implementation of a Data Extraction System Based on MapReduce for Multiple Data Sources
MapReduce, a typical distributed parallel programming model on existing big data platforms, is widely used in big data processing. However, applications often need to extract data from sources that differ in structure and in storage method. Shielding MapReduce from the complexity of the underlying data source connections, mapping different data sources to a unified interface, and giving upper-layer applications effective access to heterogeneous data have therefore become problems that data analysis currently needs to solve. Targeting MapReduce, this paper improves on existing methods and designs a unified data model for multiple data sources (HDFS, HBase, and MySQL databases) that integrates the data-operation code, reduces duplicated code, and improves development efficiency.
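The idea of mapping heterogeneous sources to a unified interface can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names (DataSource, DelimitedTextSource, WideColumnSource, RelationalSource, extract) are hypothetical, and the three sources are in-memory stand-ins for HDFS files, HBase tables, and MySQL tables rather than real client connections.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Record = Dict[str, str]  # assumed unified model: one flat field -> value mapping


class DataSource(ABC):
    """Hypothetical unified interface: every source yields flat records."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class DelimitedTextSource(DataSource):
    """Stand-in for a delimited text file such as one stored on HDFS."""

    def __init__(self, lines: Iterable[str], columns: Sequence[str], sep: str = ","):
        self.lines, self.columns, self.sep = lines, columns, sep

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            yield dict(zip(self.columns, line.split(self.sep)))


class WideColumnSource(DataSource):
    """Stand-in for an HBase-style table: row key -> {"family:qualifier": value}."""

    def __init__(self, rows: Dict[str, Dict[str, str]], key_field: str = "id"):
        self.rows, self.key_field = rows, key_field

    def read(self) -> Iterator[Record]:
        for key, cells in self.rows.items():
            record = {self.key_field: key}
            for column, value in cells.items():
                # Drop the column family so field names match the other sources.
                record[column.split(":", 1)[-1]] = value
            yield record


class RelationalSource(DataSource):
    """Stand-in for a relational (MySQL-style) table: column names plus row tuples."""

    def __init__(self, columns: Sequence[str], rows: Iterable[Tuple]):
        self.columns, self.rows = columns, rows

    def read(self) -> Iterator[Record]:
        for row in self.rows:
            yield dict(zip(self.columns, map(str, row)))


def extract(sources: Iterable[DataSource]) -> Iterator[Record]:
    """Upper-layer code sees one record stream, regardless of the backend."""
    for source in sources:
        yield from source.read()


records: List[Record] = list(extract([
    DelimitedTextSource(["1,alice"], ["id", "name"]),
    WideColumnSource({"2": {"info:name": "bob"}}),
    RelationalSource(["id", "name"], [(3, "carol")]),
]))
```

In this sketch the calling code depends only on DataSource.read, so supporting another backend would mean adding one class rather than repeating connection and parsing code in every job, which is the kind of duplication the abstract aims to reduce.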
Wang Wei (汪伟), Li Xin (李昕)
Computing technology, computer technology
Big data analysis, multiple data sources, MapReduce
Wang Wei, Li Xin. The Research and Implementation of a Data Extraction System Based on MapReduce for Multiple Data Sources [EB/OL]. (2015-12-07) [2025-08-19]. http://www.paper.edu.cn/releasepaper/content/201512-363.