
Tensor-Parallelism with Partially Synchronized Activations

Source: arXiv

Abstract

Training and inference of Large Language Models (LLMs) with tensor-parallelism requires substantial communication to synchronize activations. Our findings suggest that with a few minor adjustments to current practices, LLMs can be trained without fully synchronizing activations, reducing bandwidth demands. We name this "Communication-Aware Architecture for Tensor-parallelism" (CAAT-Net). We train 1B and 7B parameter CAAT-Net models, with a 50% reduction in tensor-parallel communication and no significant drop in pretraining accuracy. Furthermore, we demonstrate how CAAT-Net accelerates both training and inference workloads.
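The abstract does not spell out the mechanism, but the reported 50% reduction in tensor-parallel communication is consistent with synchronizing only part of each activation tensor across ranks. Below is a minimal sketch of that idea for a Megatron-style row-parallel layer, whose partial-sum outputs are normally all-reduced in full; the function name `partially_synced_all_reduce` and the `sync_fraction` parameter are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.distributed as dist

def partially_synced_all_reduce(
    activations: torch.Tensor, sync_fraction: float = 0.5
) -> torch.Tensor:
    """All-reduce only the leading `sync_fraction` of the hidden dimension.

    Assumes a Megatron-style setup: each tensor-parallel rank holds a
    partial-sum activation of shape (..., hidden), and a full all-reduce
    would normally synchronize the entire tensor. Here only a leading
    slice of channels is summed across ranks; the rest stay rank-local,
    cutting communicated bytes by (1 - sync_fraction).

    Requires an already-initialized process group
    (dist.init_process_group) and one process per tensor-parallel rank.
    """
    hidden = activations.shape[-1]
    n_sync = int(hidden * sync_fraction)

    # Collectives need a contiguous buffer, so copy the slice out,
    # reduce it, and write the synchronized values back in place.
    shared = activations[..., :n_sync].contiguous()
    dist.all_reduce(shared, op=dist.ReduceOp.SUM)
    activations[..., :n_sync] = shared
    return activations
```

With `sync_fraction=0.5` this halves the all-reduce volume, matching the 50% figure in the abstract; how the unsynchronized channels are handled downstream (and how training is adjusted to tolerate them) is the substance of the paper and is not reproduced here.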

Itay Lamprecht, Asaf Karnieli, Yair Hanani, Niv Giladi, Daniel Soudry

Subject areas: Computing Technology; Computer Technology

Itay Lamprecht, Asaf Karnieli, Yair Hanani, Niv Giladi, Daniel Soudry. Tensor-Parallelism with Partially Synchronized Activations [EB/OL]. (2025-06-24) [2025-07-17]. https://arxiv.org/abs/2506.19645.
