Data Stream Computing Framework Based on Storm
The information explosion of the big-data era has made big-data processing critically important. Every industry, and the Internet industry in particular, generates terabytes of service data each day and therefore needs ever more hardware resources to process it. The MapReduce framework and its open-source implementation, Hadoop, provide an efficient way to process big data, but it has gradually become clear that Hadoop is not a universal solution: real-time processing of massive data is its weak point. Storm, a free and open-source distributed real-time computation system developed by Twitter, addresses this shortcoming. Hadoop provides batch processing of offline data, whereas Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. This paper first describes Storm's architecture and operating principles, then uses a concrete example to illustrate a general method for processing data streams in real time, and finally designs a big-data processing system that integrates batch processing with real-time stream computing.
Liu Xinguang (刘心光)
Computing technology, computer technology
Data Stream Computing; Real-time Processing; Hadoop; Storm
Liu Xinguang. Data Stream Computing Framework Based on Storm [EB/OL]. (2013-12-25) [2025-07-01]. http://www.paper.edu.cn/releasepaper/content/201312-858.