BEFI: Balanced and Efficient Federated Inference of Large Language Models
Lulu Zhang , Qian Tao , Zimu Zhou , Yuanyuan Zhang , Yuxiang Wang , Yongxin Tong
Large language models (LLMs) have achieved remarkable performance across a wide range of tasks. This paper focuses on federated LLM inference, in which the context generated by each silo must be protected, leading to both inefficiency and imbalance in communication overhead. To address this issue, we propose BEFI, which enables Balanced and Efficient Federated LLM Inference through carefully designed mechanisms for information sharing among silos. For important tokens, we propose a fine-grained, fixed-size cache sharing mechanism that enables direct and balanced sharing of these tokens' KV cache. For less critical tokens, we propose a context-free intermediate-state sharing mechanism, which allows tokens of arbitrary length to be shared with constant communication overhead. Finally, we evaluate the effectiveness of BEFI across various LLMs and privacy mechanisms, demonstrating that BEFI reduces communication overhead by up to 96.0% while maintaining balanced communication.
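The fixed-size cache sharing idea described above can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration, not the paper's actual algorithm: it assumes token importance is measured by an attention-derived score and keeps only a constant-size budget of KV-cache entries per silo, so every silo transmits the same amount of cache data regardless of its context length.

```python
import numpy as np

def select_shared_kv(keys, values, importance, budget):
    """Pick a fixed-size subset of the KV cache to share.

    Hypothetical sketch: rank tokens by an importance score
    (e.g. accumulated attention weight) and keep only the
    top-`budget` entries, yielding constant, balanced
    communication across silos.
    """
    # keys/values: (seq_len, head_dim); importance: (seq_len,)
    top = np.argsort(importance)[-budget:]  # indices of the most important tokens
    top = np.sort(top)                      # preserve original sequence order
    return keys[top], values[top], top

rng = np.random.default_rng(0)
seq_len, head_dim, budget = 16, 8, 4
k = rng.standard_normal((seq_len, head_dim))
v = rng.standard_normal((seq_len, head_dim))
scores = rng.random(seq_len)

shared_k, shared_v, idx = select_shared_kv(k, v, scores, budget)
print(shared_k.shape)  # (4, 8)
```

Whatever the scoring rule, the key property is that the shared payload size depends only on `budget`, not on `seq_len`, which is what makes the per-silo communication cost uniform.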
Large Language Model / Inference Optimization / Federated Learning
Higher Education Press 2026