Large language models meet NLP: a survey
Libo QIN, Qiguang CHEN, Xiachong FENG, Yang WU, Yongheng ZHANG, Yinghui LI, Min LI, Wanxiang CHE, Philip S. YU
Front. Comput. Sci., 2026, Vol. 20, Issue 11: 2011361
While large language models (LLMs) such as ChatGPT have shown impressive capabilities in Natural Language Processing (NLP) tasks, a systematic investigation of their potential in this field is still lacking. This study aims to address this gap by exploring the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been solved by LLMs? (3) What is the future of LLMs for NLP? To answer these questions, we take the first step toward providing a comprehensive overview of LLMs in NLP. Specifically, we first introduce a unified taxonomy comprising (1) the parameter-frozen paradigm and (2) the parameter-tuning paradigm, offering a unified perspective on the current progress of LLMs in NLP. Furthermore, we summarize the new frontiers and the corresponding challenges, aiming to inspire further groundbreaking advancements. We hope this work offers valuable insights into the potential and limitations of LLMs, while also serving as a practical guide for building effective LLMs in NLP.
natural language processing / large language models / parameter-frozen paradigm / parameter-tuning paradigm / ChatGPT
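To make the taxonomy concrete, the sketch below contrasts the two paradigms on a toy sentiment task. It is an illustrative example only, not drawn from the surveyed works: it assumes the Hugging Face transformers and peft libraries are installed and uses "gpt2" purely as a placeholder checkpoint. In the parameter-frozen paradigm the task is expressed entirely in the prompt (optionally with in-context demonstrations) and no weights change; in the parameter-tuning paradigm some or all weights are updated, here via LoRA-style parameter-efficient fine-tuning.

```python
# Minimal sketch of the survey's two paradigms (illustrative assumptions:
# transformers + peft installed, "gpt2" as a stand-in causal LM checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

checkpoint = "gpt2"  # placeholder; any causal LM checkpoint could be used
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# --- Parameter-frozen paradigm: no weights are updated. ---
# The NLP task (sentiment classification) is specified purely in the prompt,
# here with one in-context demonstration.
prompt = (
    "Review: The plot was dull and predictable. Sentiment: negative\n"
    "Review: A moving, beautifully acted film. Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=3)
# Decode only the newly generated tokens (the model's predicted label).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:]))

# --- Parameter-tuning paradigm: (a subset of) weights is updated. ---
# LoRA adapters are attached so only a small fraction of parameters is
# trainable; full fine-tuning would instead update all weights.
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
tuned_model = get_peft_model(model, lora_config)
tuned_model.print_trainable_parameters()  # adapters are the only trainable part
# ... a standard training loop over labeled task data would follow here.
```

Full fine-tuning, instruction tuning, and adapter-based methods such as LoRA all fall under the parameter-tuning paradigm; zero-shot, few-shot/in-context, and chain-of-thought prompting fall under the parameter-frozen paradigm.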
© The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn.