
Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission

Source: arXiv
Abstract

Hybrid Language Models (HLMs) combine the low-latency efficiency of Small Language Models (SLMs) on edge devices with the high accuracy of Large Language Models (LLMs) on centralized servers. Unlike traditional end-to-end LLM inference, HLMs reduce latency and communication by invoking LLMs only when local SLM predictions are uncertain, i.e., when token-level confidence is low or entropy is high. However, ambiguous or low-confidence predictions still require frequent offloading to the LLM, leading to significant communication overhead in bandwidth-constrained settings. To address this, we propose FedHLM, a communication-efficient HLM framework that integrates uncertainty-aware inference with Federated Learning (FL). FedHLM's key innovation lies in collaboratively learning token-level uncertainty thresholds that govern when LLM assistance is needed. Rather than using static or manually tuned thresholds, FedHLM employs FL to optimize these thresholds in a privacy-preserving, distributed manner. Additionally, it leverages embedding-based token representations for Peer-to-Peer (P2P) resolution, enabling clients to reuse tokens inferred by semantically similar peers without engaging the LLM. We further introduce hierarchical model aggregation: edge servers refine local routing policies through client updates, while cross-cluster coordination aligns global decision boundaries. This layered design captures recurring uncertainty patterns, reducing redundant LLM queries. Experiments on large-scale news classification tasks show that FedHLM reduces LLM transmissions by over 95 percent with negligible accuracy loss, making it well-suited for scalable and efficient edge-AI applications.
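As a rough illustration of the mechanisms described above, the Python sketch below shows (i) an entropy-based rule for deciding, per token, whether to accept the local SLM prediction or offload to the LLM, (ii) a cosine-similarity check against peer-resolved token embeddings for P2P reuse, and (iii) a FedAvg-style weighted average of per-client uncertainty thresholds. All function names, the use of Shannon entropy in nats, the 0.9 similarity cutoff, and the token-count weighting are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def token_entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of the SLM's next-token distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def route_token(slm_probs: np.ndarray, tau: float) -> str:
    """Accept the local SLM token when entropy falls below the
    uncertainty threshold tau; otherwise offload to the server LLM."""
    return "slm" if token_entropy(slm_probs) < tau else "llm"

def p2p_match(token_emb: np.ndarray, peer_embs: np.ndarray,
              sim_thresh: float = 0.9):
    """Before querying the LLM, check whether a semantically similar
    token was already resolved by a peer (cosine similarity over
    embedding representations). Returns a peer index or None.
    The 0.9 cutoff is a hypothetical choice."""
    denom = np.linalg.norm(peer_embs, axis=1) * np.linalg.norm(token_emb)
    sims = peer_embs @ token_emb / np.maximum(denom, 1e-12)
    best = int(np.argmax(sims))
    return best if sims[best] >= sim_thresh else None

def aggregate_thresholds(client_taus, client_weights) -> float:
    """FedAvg-style weighted averaging of per-client thresholds,
    e.g. weighted by local token counts (a hypothetical scheme)."""
    taus = np.asarray(client_taus, dtype=float)
    w = np.asarray(client_weights, dtype=float)
    return float((taus * w).sum() / w.sum())

# A confident distribution stays local; a flat one is offloaded.
tau = 1.0
print(route_token(np.array([0.9, 0.05, 0.03, 0.02]), tau))  # -> "slm"
print(route_token(np.full(4, 0.25), tau))                   # -> "llm"

# Three clients contribute locally tuned thresholds to an edge server.
print(aggregate_thresholds([0.8, 1.2, 1.0], [100, 50, 150]))  # ~0.967

In the full framework this aggregation would run hierarchically, per the abstract: edge servers average the thresholds of their attached clients, and cross-cluster coordination then aligns the edge-level results into a global decision boundary.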

Faranaksadat Solat, Joohyung Lee, Mohamed Seif, Dusit Niyato, H. Vincent Poor

Subjects: Computing Technology; Computer Technology

Faranaksadat Solat, Joohyung Lee, Mohamed Seif, Dusit Niyato, H. Vincent Poor. Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission [EB/OL]. (2025-06-30) [2025-07-21]. https://arxiv.org/abs/2507.00082
