Parallelization in Scientific Workflow Management Systems
Over the last two decades, scientific workflow management systems (SWfMS) have emerged as a means to facilitate the design, execution, and monitoring of reusable scientific data processing pipelines. At the same time, the amounts of data generated in various areas of science have outpaced enhancements in computational power and storage capabilities. This is especially true for the life sciences, where new technologies have increased the sequencing throughput from kilobytes to terabytes per day. This trend requires current SWfMS to adapt: native support for parallel workflow execution must be provided to increase performance; dynamically scalable "pay-per-use" compute infrastructures have to be integrated to diminish hardware costs; and adaptive scheduling of workflows in distributed compute environments is required to optimize resource utilization. In this survey, we give an overview of parallelization techniques for SWfMS, both in theory and in their realization in concrete systems. We find that current systems leave considerable room for improvement, and we propose key advancements to the landscape of SWfMS.
Marc Bux, Ulf Leser
Computing technology, computer technology
Marc Bux, Ulf Leser. Parallelization in Scientific Workflow Management Systems [EB/OL]. (2013-03-28) [2025-05-05]. https://arxiv.org/abs/1303.7195.