Designing Deep Learning Frameworks for LLMs: Challenges, Expectations, and Opportunities

Abstract

Large language models (LLMs) drive significant advancements in real industry applications. LLMs rely on deep learning (DL) frameworks for efficient model construction, distributed execution, and optimized deployment. Their large parameter scale and long execution cycles place extreme demands on DL frameworks in terms of scalability, stability, and efficiency. Therefore, poor usability, limited functionality, and subtle bugs in DL frameworks may hinder development efficiency and cause severe failures or resource waste. However, a fundamental question remains underinvestigated: what challenges do DL frameworks face in supporting LLMs? To seek an answer, we investigate these challenges through a large-scale analysis of issue reports from three major DL frameworks (MindSpore, PyTorch, TensorFlow) and eight associated LLM toolkits (e.g., Megatron). We construct a taxonomy of LLM-centric bugs, requirements, and user questions and enrich it through interviews with 11 LLM users and eight DL framework developers, uncovering key technical challenges and misalignments between user needs and developer priorities. Our contributions are threefold: (1) we develop a comprehensive taxonomy comprising four question themes (nine sub-themes), four requirement themes (15 sub-themes), and ten bug themes (45 sub-themes); (2) we assess the perceived importance and priority of these challenges based on practitioner insights; and (3) we identify five key findings across LLM development and propose five actionable recommendations to improve the reliability, usability, and testability of DL frameworks. Our results highlight critical limitations in current DL frameworks and offer concrete guidance for advancing their support for the next generation of LLM construction and applications.

Authors: Yanzhou Mu, Rong Wang, Juan Zhai, Chunrong Fang, Xiang Chen, Jiacong Wu, An Guo, Jiawei Shen, Bingzhuo Li, Zhenyu Chen

Subject classification: Computing technology, computer technology

Recommended citation: Yanzhou Mu, Rong Wang, Juan Zhai, Chunrong Fang, Xiang Chen, Jiacong Wu, An Guo, Jiawei Shen, Bingzhuo Li, Zhenyu Chen. Designing Deep Learning Frameworks for LLMs: Challenges, Expectations, and Opportunities [EB/OL]. (2025-06-16) [2025-07-01]. https://arxiv.org/abs/2506.13114.
