Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
The emergence of large language models (LLMs) has revolutionized AI development, yet their training demands computational resources beyond what a single cluster or even a single datacenter can provide, restricting access to large organizations. Decentralized training has emerged as a promising paradigm that leverages dispersed resources across clusters, datacenters, and global regions, democratizing LLM development for broader communities. As the first comprehensive exploration of this emerging field, we present decentralized LLM training as a resource-driven paradigm and categorize it into community-driven and organizational approaches. Furthermore, our in-depth analysis clarifies decentralized LLM training through: (1) its position relative to related domain concepts, (2) trends in the development of decentralized resources, and (3) recent advances discussed under a novel taxonomy. We also provide up-to-date case studies and explore future directions, contributing to the evolution of decentralized LLM training research.
Ying Shen, Jingyan Jiang, Jiajun Luo, Jiajun Song, Haotian Dong, Rongwei Lu, Bowen Li, Zhi Wang
Computing Technology; Computer Technology
Ying Shen, Jingyan Jiang, Jiajun Luo, Jiajun Song, Haotian Dong, Rongwei Lu, Bowen Li, Zhi Wang. Beyond A Single AI Cluster: A Survey of Decentralized LLM Training [EB/OL]. (2025-03-13) [2025-05-04]. https://arxiv.org/abs/2503.11023