Low-rank spectral adapter for parameter-efficient fine-tuning of transformer

Kun ZHANG, Guangyi LV, Yubin HUANGFU, Minglong XUE, Richang HONG, Jianping FAN, Xin LI, Si WEI

Front. Comput. Sci. ›› 2027, Vol. 21 ›› Issue (1): 2101306  DOI: 10.1007/s11704-025-50771-9
Artificial Intelligence
RESEARCH ARTICLE

Abstract

With the rapid development of Large Language Models (LLMs), fine-tuning LLMs on downstream data for better capability transfer has become the mainstream approach in LLM applications, where Parameter-Efficient Fine-Tuning (PEFT) methods play the most important role. Considering the core architecture of LLMs, the transformer block, existing PEFT methods focus on using limited data to fine-tune a small number of parameters of only key components, such as the self-attention and feed-forward networks. They have achieved impressive performance, with representative works being the Low-Rank Adapter (LoRA) and its variants (e.g., AdaLoRA, GLoRA). However, existing PEFT methods still suffer from severe shortcomings: sensitivity to the selection of hyper-parameters (e.g., ranks, scales, etc.) and sensitivity to the initialization of the low-rank factors. Inappropriate settings lead to overfitting or underfitting when tuning LLMs, resulting in unstable fine-tuning performance. Meanwhile, searching for optimal hyper-parameters is resource-intensive and experience-dependent. To this end, in this paper, we propose a novel PEFT method, SpecAdapt, which can adapt to various scenarios without sophisticated hyper-parameter tuning. Specifically, to tackle the hyper-parameter sensitivity problem, we design a Singular-guided Weight Decay strategy to control the complexity of the fine-tuned parameters. For stable fine-tuning of LLMs, we develop a simple but effective Gradient Normalization module to improve tuning stability. Extensive experiments on multiple transformer-based pre-trained large models across various benchmarks (i.e., two image benchmarks and one language benchmark) demonstrate the superiority of the proposed SpecAdapt (achieving 75.6% average accuracy and outperforming state-of-the-art methods with fixed hyper-parameters across 19 datasets). We also release the code to support the community.
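
To make the ingredients named in the abstract concrete, the following minimal PyTorch sketch combines a generic LoRA-style low-rank update with (a) a weight-decay penalty computed from the singular values of that update and (b) a simple gradient-normalization step. This is an illustrative sketch only, written under our own assumptions: the names (LowRankAdapter, spectral_decay, normalize_grads), the rank and scaling choices, and the penalty form are hypothetical and are not the paper's actual SpecAdapt formulation, which is given in the full text.

# Illustrative sketch ONLY: generic LoRA-style adapter + singular-value-driven
# weight decay + gradient normalization. Names and design choices are hypothetical,
# not the paper's SpecAdapt.
import torch
import torch.nn as nn


class LowRankAdapter(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # keep the pre-trained weights frozen
            p.requires_grad_(False)
        out_f, in_f = base.weight.shape
        # Small random init for both factors so the singular values of B A are
        # non-degenerate (standard LoRA instead zero-initializes B).
        self.A = nn.Parameter(0.01 * torch.randn(rank, in_f))
        self.B = nn.Parameter(0.01 * torch.randn(out_f, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    def spectral_decay(self) -> torch.Tensor:
        """Weight-decay penalty on the singular values of the low-rank update."""
        delta = self.scale * (self.B @ self.A)
        s = torch.linalg.svdvals(delta)             # spectrum of the adapter update
        return (s ** 2).sum()                       # larger spectral components decay harder


def normalize_grads(params, max_norm: float = 1.0) -> None:
    """Rescale gradients to a common norm so update magnitudes stay comparable."""
    torch.nn.utils.clip_grad_norm_(list(params), max_norm)


if __name__ == "__main__":
    adapter = LowRankAdapter(nn.Linear(64, 64), rank=4)
    x = torch.randn(8, 64)
    task_loss = adapter(x).pow(2).mean()            # stand-in for a downstream loss
    loss = task_loss + 1e-3 * adapter.spectral_decay()
    loss.backward()
    normalize_grads([adapter.A, adapter.B])
    print(float(loss))

Here the penalty (the sum of squared singular values, i.e., the squared Frobenius norm of the update) is just one plausible way to let the spectrum of the adapter drive the decay; the paper's actual Singular-guided Weight Decay strategy and Gradient Normalization module are defined in the full text.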


Keywords

Transformer / PEFT / LoRA / LLMs

Cite this article

Kun ZHANG, Guangyi LV, Yubin HUANGFU, Minglong XUE, Richang HONG, Jianping FAN, Xin LI, Si WEI. Low-rank spectral adapter for parameter-efficient fine-tuning of transformer. Front. Comput. Sci., 2027, 21(1): 2101306. DOI: 10.1007/s11704-025-50771-9



RIGHTS & PERMISSIONS

Higher Education Press
