
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Source: arXiv
Abstract

As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly important to build models that are efficient at inference. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression technique based on pruning and distillation, called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster at inference. In addition, we introduce an FP8-based training recipe and show that it can achieve results on par with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
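
The abstract's point about constant memory per generated token can be illustrated with a rough back-of-the-envelope sketch. The Python below is not taken from the paper: every dimension in it (`KV_HEADS`, `HEAD_DIM`, `SSM_STATE_ELEMS`, the layer counts) is a hypothetical placeholder chosen only to show the shape of the scaling, namely that an attention key-value cache grows linearly with sequence length while a Mamba-style recurrent state stays fixed.

```python
# Illustrative only: every dimension below is a made-up placeholder,
# not a Nemotron-H configuration.
BYTES_PER_ELEM = 2          # e.g. a 16-bit cache
KV_HEADS = 8                # hypothetical number of key/value heads
HEAD_DIM = 128              # hypothetical head dimension
SSM_STATE_ELEMS = 1 << 20   # hypothetical fixed state size per Mamba-style layer


def attention_cache_bytes(num_layers: int, seq_len: int) -> int:
    """KV cache: one key and one value vector per head, per token, per layer."""
    return 2 * num_layers * seq_len * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM


def ssm_state_bytes(num_layers: int) -> int:
    """Recurrent state: fixed size per layer, independent of sequence length."""
    return num_layers * SSM_STATE_ELEMS * BYTES_PER_ELEM


def total_cache_bytes(attn_layers: int, ssm_layers: int, seq_len: int) -> int:
    """Inference-time cache for a stack mixing attention and SSM layers."""
    return attention_cache_bytes(attn_layers, seq_len) + ssm_state_bytes(ssm_layers)


if __name__ == "__main__":
    for seq_len in (1_024, 8_192, 65_536):
        dense = total_cache_bytes(attn_layers=32, ssm_layers=0, seq_len=seq_len)
        hybrid = total_cache_bytes(attn_layers=4, ssm_layers=28, seq_len=seq_len)
        print(f"{seq_len:>6} tokens: all-attention {dense / 2**20:8.1f} MiB, "
              f"hybrid {hybrid / 2**20:8.1f} MiB")
```

Under these assumed numbers, only the few remaining attention layers contribute a term that grows with context length, which is the intuition behind replacing most self-attention layers with Mamba layers.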

Duncan Riach, Evelina Bakhturina, Guilin Liu, Gargi Prasad, Dima Rekesh, Elad Segal, Samuel Kriman, Gerald Shen, Dusan Stosic, Ewa Dobrowolska, Selvaraj Anandaraj, Eileen Long, Jiaxuan You, Jason Sewall, Erick Galinkin, Ivan Moshkov, Izik Golan, Jiaqi Zeng, Krzysztof Pawelec, Jimmy Zhang, Jing Zhang, Jining Huang, Jinze Xue, Maciej Bala, Jan Kautz, Jane Polak Scowcroft, Kumar Anik, Kunlun Li, Jared Casper, Eric Chung, Fuxiao Liu, Jocelyn Huang, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Jarno Seppanen, Jason Lu, Kezhi Kong, Oleg Rybakov, Oleksii Kuchaiev, Olivier Delalleau, Osvald Nitski, Parth Chadha, Pasha Shamis, Paulius Micikevicius, Pavlo Molchanov, Peter Dykas, Philipp Fischer, Pierre-Yves Aquilanti, Piotr Bialecki, Prasoon Varshney, Pritam Gundecha, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, Chengyu Dong, Christine Harvey, Christopher Parisien, Dan Su, Daniel Korzekwa, Danny Yin, Daria Gitman, David Mosallanezhad, Deepak Narayanan, Denys Fridman, Ding Ma, Dmytro Pykhtar, Dong Ahn, Ellie Evans, Fei Jia, Guo Chen, Haifeng Qian, Helen Ngo, Hongbin Liu, Hui Li, Igor Gitman, Ilia Karmanov, Michael Andersch, Michael Evans, Miguel Martinez, Mike Chrzanowski, Mike Ranzinger, Mikolaj Blaz, Misha Smelyanskiy, Mohamed Fawzy, Mohammad Shoeybi, Mostofa Patwary, Nayeon Lee, Nima Tajbakhsh, Ning Xu, Shubham Pachori, Shubham Toshniwal, Shyamala Prayaga, Siddhartha Jain, Sirshak Das, Slawek Kierat, Somshubra Majumdar, Song Han, Soumye Singhal, Sriharsha Niverty, Stefania Alborghetti, Suseella Panguluri, Swetha Bhendigeri, Syeda Nahida Akter, Szymon Migacz, Tal Shiri, Terry Kong, Timo Roman, Tomer Ronen, Trisha Saar, Tugrul Konuk, Tuomas Rintamaki, Tyler Poon, Ushnish De, Vahid Noroozi, Varun Singh, Vijay Korthikanti, Vitaly Kurin, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wenliang Dai, Wonmin Byeon, Xiaowei Ren, Yao Xu, Yejin Choi, Yian Zhang, Ying Lin, Yoshi Suhara, Zhiding Yu, Zhiqi Li, Zhiyu Li, Zhongbo Zhu, Zhuolin Yang, Zijia Chen, Joey Conway, John Kamalu, Kateryna Chumachenko, Kirthi Sivamani, Jon Barker, Jonathan Cohen, Joseph Jennings, Jupinder Parmar, Karan Sapra, Kari Briski, Przemek Tredak, Rabeeh Karimi, Rahul Kandu, Ran El-Yaniv, Raviraj Joshi, Roger Waleffe, Ruoxi Zhang, Sabrina Kavanaugh, Sahil Jain, Sangkug Lym, Sanjeev Satheesh, Saurav Muralidharan, Sean Narenthiran, Seonmyeong Bak, Sergey Kashirsky, Seungju Han, Shantanu Acharya, Shaona Ghosh, Sharath Turuvekere Sreenivas, Sharon Clay, Shelby Thomas, Shrimai Prabhumoye, Maer Rodrigues de Melo, Makesh Narsimhan Sreedhar, Marcin Chochowski, Markus Kliegl, Marta Stepniewska-Dziubinska, Matthieu Le, Matvei Novikov, Katherine Luna, Keshav Santhanam, NVIDIA, Lawrence McAfee, Leon Derczynski, Mehrzad Samadi, Lindsey Pavao, Luis Vega, Lukas Voegtle

Computing technology, computer technology

Duncan Riach, Evelina Bakhturina, Guilin Liu, et al. Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models[EB/OL]. (2025-04-04)[2025-05-05]. https://arxiv.org/abs/2504.03624.
