Y-Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning
Yitao LIU, Chenxin AN, Xipeng QIU
Front. Comput. Sci., 2024, Vol. 18, Issue (4): 184320
With the current success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but still needs to save and compute the gradients of the whole computational graph. In this paper, we propose Y-Tuning, an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks. Y-Tuning learns dense representations for the labels defined in a given task and aligns them to fixed feature representations. Without computing the gradients of the text encoder at the training phase, Y-Tuning is not only parameter-efficient but also training-efficient. Experimental results show that for DeBERTa_XXL with 1.6 billion parameters, Y-Tuning achieves more than 96% of the performance of full fine-tuning on the GLUE benchmark with only 2% tunable parameters and much lower training costs.
pre-trained model / lightweight fine-tuning paradigms / label representation
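To make the paradigm concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a frozen text encoder produces fixed features, a learnable embedding table holds dense label representations, and a small cross-attention module aligns the label representations to those features so that only the label-side parameters receive gradients. The class name YTuningSketch, the use of multi-head cross-attention as the alignment module, and the bert-base-uncased checkpoint are illustrative assumptions, not the exact architecture reported in the paper.

```python
# Illustrative sketch only; the paper's actual alignment architecture may differ.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class YTuningSketch(nn.Module):
    """Frozen PTM features + learnable dense label representations,
    aligned with a small cross-attention module (hypothetical design)."""

    def __init__(self, encoder_name: str, num_labels: int, num_heads: int = 8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():          # the PTM stays frozen
            p.requires_grad_(False)
        hidden = self.encoder.config.hidden_size     # e.g. 768 for bert-base (divisible by num_heads)
        self.label_emb = nn.Embedding(num_labels, hidden)  # dense label representations
        self.align = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.score = nn.Linear(hidden, 1)            # one logit per label

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():                        # no gradients through the text encoder
            feats = self.encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state
        batch = input_ids.size(0)
        labels = self.label_emb.weight.unsqueeze(0).expand(batch, -1, -1)  # (B, C, H)
        # label queries attend to the fixed token features
        aligned, _ = self.align(query=labels, key=feats, value=feats,
                                key_padding_mask=~attention_mask.bool())
        return self.score(aligned).squeeze(-1)       # (B, C) classification logits


# Usage: only the label embeddings, alignment module, and scoring head are trained.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = YTuningSketch("bert-base-uncased", num_labels=2)
batch = tok(["a great movie", "a dull movie"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1, 0]))
loss.backward()
```

Because the encoder runs under torch.no_grad(), backpropagation touches only the label embeddings, the attention module, and the scoring head, which is what makes this kind of setup training-efficient as well as parameter-efficient.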