
DawnPiper: A Memory-scalable Pipeline Parallel Training Framework

Source: arXiv

Abstract

Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can waste significant GPU memory, limiting the model sizes that pipeline parallelism can effectively support. In this paper, we introduce DawnPiper, a memory-scalable pipeline parallel training framework. First, we develop a DL-compilation-based profiling method that transforms the model into a fine-grained computation graph. This refinement enables finer-grained model partitioning and memory optimization while facilitating automatic code generation. Based on observed memory-usage characteristics, we derive a performance-optimal theorem for pipeline parallel partitioning that substantially reduces the partition search space. Second, we propose a binary pipeline partitioning algorithm and use a cost-model-based memory optimization approach to efficiently identify a near-optimal pipeline parallel strategy. DawnPiper achieves up to 4x and 11x increases in the maximum trainable batch size compared to vPipe and PipeDream, respectively, and delivers up to a 1.5x speedup over vPipe.
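The abstract names two mechanisms: compilation-based profiling that refines the model into a fine-grained computation graph, and a binary partitioning algorithm guided by a cost model. The sketch below is a minimal illustration of both ideas in PyTorch, not the paper's implementation; `trace_to_graph`, `estimate_node_memory`, and `binary_partition` are hypothetical names, and the balanced two-stage split stands in for the paper's multi-stage, performance-optimal partitioning.

```python
import torch
import torch.fx as fx


def trace_to_graph(model: torch.nn.Module) -> fx.GraphModule:
    """Trace the model into a node-level (fine-grained) computation graph."""
    # torch.fx symbolic tracing decomposes the model below module granularity,
    # which is the kind of refinement the abstract attributes to
    # compilation-based profiling.
    return fx.symbolic_trace(model)


def estimate_node_memory(node: fx.Node) -> int:
    """Hypothetical per-node activation-memory estimate, in bytes."""
    # Assumes a shape-propagation pass (e.g. torch.fx.passes.shape_prop.ShapeProp)
    # has populated node.meta["tensor_meta"]; a real profiler would instead
    # record allocator statistics during an instrumented run.
    meta = node.meta.get("tensor_meta")
    if meta is None:
        return 0
    numel = 1
    for dim in meta.shape:
        numel *= dim
    return numel * meta.dtype.itemsize


def binary_partition(costs: list[int]) -> int:
    """Binary-search the cut index that best balances a two-stage split."""
    # Prefix sums of non-negative per-node costs are monotone, so the smallest
    # index whose prefix sum reaches half the total can be found by bisection
    # instead of scanning every candidate cut point.
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    target = prefix[-1] / 2
    lo, hi = 0, len(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo  # stage 1 takes nodes[:lo], stage 2 takes nodes[lo:]
```

The bisection above is valid only because prefix sums of non-negative costs are monotone; in the paper, the reduced search space is instead justified by the performance-optimal partitioning theorem, and the cost model weighs per-stage memory-optimization choices rather than raw activation size alone.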

Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian

Subjects: Computing Technology; Computer Technology

Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian. DawnPiper: A Memory-scalable Pipeline Parallel Training Framework [EB/OL]. (2025-05-09) [2025-07-16]. https://arxiv.org/abs/2505.05856.
