Low-rank spectral adapter for parameter-efficient fine-tuning of transformer
Kun ZHANG, Guangyi LV, Yubin HUANGFU, Minglong XUE, Richang HONG, Jianping FAN, Xin LI, Si WEI
Front. Comput. Sci., 2027, Vol. 21, Issue (1): 2101306
With the rapid development of Large Language Models (LLMs), fine-tuning LLMs with downstream data for better capability transfer has become the mainstream of LLM applications, where Parameter-Efficient Fine-Tuning (PEFT) methods play the most important role. Considering the core architecture of LLMs, the transformer block, existing PEFT methods focus on using limited data to fine-tune a small number of parameters in only the key components, such as self-attention and the feed-forward network. They have achieved impressive performance, with representative works being the Low-Rank Adapter (LoRA) and its variants (e.g., AdaLoRA, GLoRA). However, existing PEFT methods still suffer from severe shortcomings: sensitivity to the selection of hyper-parameters (e.g., ranks and scales) and sensitivity to the initialization of low-rank factors. Inappropriate settings lead to overfitting or underfitting when tuning LLMs, resulting in unstable fine-tuning performance. Meanwhile, searching for the optimal hyper-parameters is resource-intensive and experience-dependent. To this end, in this paper, we propose a novel PEFT method, SpecAdapt, which can adapt to various scenarios without sophisticated hyper-parameter tuning. Specifically, to tackle the hyper-parameter sensitivity problem, we design a Singular-guided Weight Decay strategy to control the complexity of the fine-tuned parameters. For the stable fine-tuning of LLMs, we develop a simple but effective Gradient Normalization module to improve tuning stability. Extensive experiments on multiple transformer-based pre-trained large models across various benchmarks (i.e., two image benchmarks and one language benchmark) demonstrate the superiority of our proposed SpecAdapt (achieving 75.6% average accuracy and outperforming the state-of-the-art methods with fixed hyper-parameters across 19 datasets). We also release the code to support the community.
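The abstract names two components, a Singular-guided Weight Decay and a Gradient Normalization module, without giving their formulations. The PyTorch sketch below is only one plausible reading of those ideas on top of a standard LoRA layer; the nuclear-norm-style penalty on the singular values of the low-rank update, the fixed-norm rescaling of adapter gradients, and all identifiers such as `singular_guided_weight_decay` and `normalize_adapter_grads` are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are fine-tuned
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


def singular_guided_weight_decay(module: LoRALinear, coeff: float = 1e-2) -> torch.Tensor:
    """Hypothetical regularizer: penalize the singular values of the low-rank update
    so that its effective complexity stays controlled (added to the task loss)."""
    delta = module.lora_B @ module.lora_A        # low-rank update matrix B A
    s = torch.linalg.svdvals(delta)              # its singular values
    return coeff * s.sum()                       # nuclear-norm-style penalty


def normalize_adapter_grads(model: nn.Module, max_norm: float = 1.0) -> None:
    """Hypothetical gradient normalization: rescale adapter gradients to a bounded
    norm before the optimizer step to stabilize fine-tuning."""
    adapter_params = [p for n, p in model.named_parameters()
                      if p.requires_grad and "lora_" in n]
    torch.nn.utils.clip_grad_norm_(adapter_params, max_norm)
```

In a training loop under these assumptions, the penalty from `singular_guided_weight_decay` would be summed over the adapted layers and added to the task loss, and `normalize_adapter_grads` would be called after `loss.backward()` and before `optimizer.step()`.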
Transformer / PEFT / LoRA / LLMs
Higher Education Press