PREP: input-aware expert pruning for efficient MoE deployment

Chaoran ZHANG , Lixin ZOU , Xixun LIN , Wen ZOU

Front. Comput. Sci. ›› 2027, Vol. 21 ›› Issue (7) : 2107346

DOI: 10.1007/s11704-026-52030-x
Artificial Intelligence
LETTER



RIGHTS & PERMISSIONS

Higher Education Press
