PREP: input-aware expert pruning for efficient MoE deployment
Chaoran ZHANG, Lixin ZOU, Xixun LIN, Wen ZOU
Front. Comput. Sci., 2027, Vol. 21, Issue (7): 2107346
Higher Education Press