Speeding up Model Loading with fastsafetensors

Source: arXiv
Abstract

The rapid increase in model parameter sizes introduces new challenges in loading pre-trained models. Currently, machine learning code often deserializes each parameter as a tensor object in host memory before copying it to device memory. We found that this approach underutilizes storage throughput and significantly slows down loading large models stored in safetensors, a widely used model file format. In this work, we present fastsafetensors, a Python library designed to optimize the deserialization of tensors in safetensors files. Our approach first copies groups of on-disk parameters to device memory, where they are directly instantiated as tensor objects. This design enables further optimization in low-level I/O and high-level tensor preprocessing, including parallelized copying, peer-to-peer DMA, and GPU offloading. Experimental results show performance improvements of 4.8x to 7.5x when loading models such as Llama (7, 13, and 70 billion parameters), Falcon (40 billion parameters), and Bloom (176 billion parameters).
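To make the bulk-copy idea concrete, below is a minimal sketch written with plain PyTorch primitives rather than the fastsafetensors API. The file parsing follows the published safetensors layout (an 8-byte little-endian header length followed by a JSON header mapping tensor names to dtype, shape, and byte offsets); the function name, the single-file scope, and the aligned-offsets assumption are illustrative, not part of the library.

    # Sketch: one bulk host-to-device copy, then zero-copy tensor views.
    import json
    import struct
    import torch

    def bulk_load_safetensors(path, device="cuda:0"):
        with open(path, "rb") as f:
            header_len = struct.unpack("<Q", f.read(8))[0]
            header = json.loads(f.read(header_len))
            payload = f.read()  # all tensor payloads, laid out back to back

        # One host-to-device copy for the whole parameter group,
        # instead of one copy per deserialized tensor.
        blob = torch.frombuffer(bytearray(payload), dtype=torch.uint8).to(device)

        dtypes = {"F32": torch.float32, "F16": torch.float16, "BF16": torch.bfloat16}
        tensors = {}
        for name, meta in header.items():
            if name == "__metadata__":  # optional metadata entry, not a tensor
                continue
            begin, end = meta["data_offsets"]  # byte offsets into the payload
            # Instantiate the tensor directly as a zero-copy view of device
            # memory; assumes offsets are aligned to the element size, which
            # safetensors writers typically ensure.
            tensors[name] = blob[begin:end].view(dtypes[meta["dtype"]]).reshape(meta["shape"])
        return tensors

The library itself goes further than this sketch, per the abstract: it parallelizes the copies, can use peer-to-peer DMA to move data directly from storage to device memory, and offloads preprocessing to the GPU.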

Takeshi Yoshimura, Tatsuhiro Chiba, Manish Sethi, Daniel Waddington, Swaminathan Sundararaman

Subjects: Computing technology; computer technology

Takeshi Yoshimura, Tatsuhiro Chiba, Manish Sethi, Daniel Waddington, Swaminathan Sundararaman. Speeding up Model Loading with fastsafetensors [EB/OL]. (2025-05-29) [2025-06-12]. https://arxiv.org/abs/2505.23072.
