MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
This paper presents MagicGUI, a foundational mobile GUI agent designed to address critical challenges in perception, grounding, and reasoning within real-world mobile GUI environments. The framework is underpinned by the following six key components: (1) a comprehensive and accurate dataset, constructed via the scalable GUI Data Pipeline, which aggregates the largest and most diverse GUI-centric multimodal data to date from open-source repositories, automated crawling, and targeted manual annotation; (2) enhanced perception and grounding capabilities, facilitating fine-grained multimodal alignment for UI element referencing, grounding, and screen comprehension; (3) a comprehensive and unified action space, encompassing both fundamental UI operations and complex interactive intents to support human-agent interactions; (4) planning-oriented reasoning mechanisms that enable the model to decompose complex user instructions into sequential actions with explicit intermediate meta-plan reasoning; (5) an iterative two-stage training procedure, combining large-scale continued pre-training on 7.8M samples with reinforcement fine-tuning that uses a spatially enhanced composite reward and a dual filtering strategy; and (6) competitive results on both the proprietary Magic-RICH benchmark and over a dozen public benchmarks, with superior performance across GUI perception and agent tasks, along with robust generalization and real-world deployment potential in practical mobile GUI scenarios, as detailed in Figure 1.
Yijia Huang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Yingnan Fu, Jiaming Liu, Liujian Tang, Shaokang Dong, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhiheng Xi, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang
Computing Technology, Computer Technology
Yijia Huang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Yingnan Fu, Jiaming Liu, Liujian Tang, Shaokang Dong, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhiheng Xi, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang. MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning [EB/OL]. (2025-08-08) [2025-08-24]. https://arxiv.org/abs/2508.03700.