Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts (MoE) model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling contexts of up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks, including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms Llama3.1-70B and performs comparably to the significantly larger Llama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we investigate the scaling laws and learning rate schedules of mixture-of-experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Code: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large
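The abstract names a mixed expert routing strategy but does not spell it out. Below is a minimal, hypothetical sketch of one common "mixed routing" pattern in an MoE layer: a shared expert that processes every token, plus top-k gated specialized experts. This is a generic PyTorch illustration of the pattern, not Tencent's released implementation; all class names, parameter names, and dimensions are invented for the example (see https://github.com/Tencent/Hunyuan-Large for the actual code).

```python
# Hypothetical sketch of a mixed-routing MoE layer: one shared expert that
# sees every token, plus top-k gated specialized experts. A generic
# illustration, not Hunyuan-Large's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def ffn(d_model, d_ff):
    # A standard two-layer feed-forward expert.
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                         nn.Linear(d_ff, d_model))

class MixedRoutingMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4, top_k=1):
        super().__init__()
        self.top_k = top_k
        self.shared_expert = ffn(d_model, d_ff)      # always active
        self.experts = nn.ModuleList(ffn(d_model, d_ff)
                                     for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)  # per-token gating scores

    def forward(self, x):  # x: (n_tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)
        routed = torch.zeros_like(x)
        # Only the selected experts run on each token: this sparsity is why
        # an MoE's activated parameter count sits far below its total count.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    routed[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return self.shared_expert(x) + routed

x = torch.randn(8, 64)
print(MixedRoutingMoE()(x).shape)  # torch.Size([8, 64])
```

In a sketch like this, the router's top-k selection is the mechanism that, at Hunyuan-Large's scale, keeps activated parameters (52B) far below total parameters (389B).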
Zilong Zhao, Guanghui Xu, Chong Zha, Yiqing Huang, Jianchen Zhu, Jie Liu, Ruobing Xie, Zhanhui Kang, Roberts Wang, Zhongzhi Chen, Yuhong Liu, Fusheng Xiang, Yangyu Tao, Zhiyuan Xiong, Jie Jiang, Dengpeng Wu, Xiao Feng, Zhichao Hu, Xiaoqin Ren, Xiang Li, Xirui Li, Lulu Wu, Decheng Wu, Meng Chen, Yuyuan Zeng, Mengyuan Huang, Qian Wang, Peijie Yu, Xiaobo Shu, Guiyang Li, Xinhua Feng, Yigeng Hong, Jianfeng Yan, Kan Wu, Jinbao Xue, Chao Yu, Tinghao Yu, Ruibin Chen, Jonny Han, Xuebin Hou, Fan Jiang, Kai Liu, Xipeng Zhang, Dian Jiao, Kai Zhang, Jiawei Song, Weijie Liu, Dong Du, Rui Yuan, Jianqiang Ma, Di Wang, Chengcheng Xu, Hao Fei, Lei Jiang, Hai Wang, Shuaipeng Li, Zhen Yang, Yaping Deng, Han Liu, Chenchen Zhang, Yiqi Chen, Bin Hu, Yanfeng Chen, Xiong Kuang, Feng Zhang, Jun Xia, Suncong Zheng, Hu Chen, Yao Ding, Zhenxiang Yan, Weichao Wang, Liang Dong, Shaohua Chen, Zifan Wu, Tengfei Cao, Zongwei Li, Ze Zhao, Jiahao Bu, Jianglu Hu, Yong Yang, Junqiang Zheng, Feifei Liu, Fengzong Lian, Rongpeng Chen, Xingwu Sun, Hao Gong, Shuang Chen, Winston Hu, Zekun He, Bo Wang, Saiyong Yang, Tao Yang, Chengzhong Xu, Chongqing Zhao, Jiajia Wu, Shihui Hu, Huilin Xu, Yue Mao, Xun Cao, Ao Liu, Wen Ouyang, Yinben Xia, Weiwen Jia, Xuemeng Huang, Rong Gan, Jiaqi Zhu, Yi Shen, Yuchi Deng
Subjects: Computing Technology, Computer Technology
Zilong Zhao, et al. Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent [EB/OL]. (2024-11-04) [2025-06-28]. https://arxiv.org/abs/2411.02265.