Select-and-Answer Prompting: Facilitating LLMs for Improving Zero-Shot Reasoning

Yufang WANG, Xuesong TANG, Kuangrong HAO

Journal of Donghua University (English Edition), 2025, Vol. 42, Issue 5: 513-522. DOI: 10.19884/j.1672-5220.202406009

Information Technology and Artificial Intelligence · Research Article

Abstract

Large language models (LLMs) have demonstrated remarkable generalization abilities across multiple tasks in natural language processing (NLP). For multi-step reasoning tasks, chain-of-thought (CoT) prompting facilitates step-by-step thinking, leading to improved performance. However, despite significant advancements in LLMs, current CoT prompting performs suboptimally on smaller-scale models with fewer parameters. Additionally, the common paradigm of few-shot CoT prompting relies on a set of manual demonstrations, so its performance is contingent on the quality of these annotations and varies with task-specific requirements. To address these limitations, we propose a select-and-answer prompting (SAP) method to enhance language model performance on reasoning tasks without the need for manual demonstrations. The method comprises two primary steps: first, the prompt guides the model to conduct a preliminary analysis and generate several candidate answers; second, the model is asked to derive a final answer from these candidates. The proposed prompting strategy is evaluated on two language models of different sizes and six datasets. On ChatGLM-6B, SAP consistently outperforms few-shot CoT across all datasets. For GPT-3.5, SAP achieves performance comparable to few-shot CoT and outperforms zero-shot CoT in most cases. These experimental results indicate that SAP can significantly improve the accuracy of language models on reasoning tasks.
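To make the two-step procedure concrete, the Python sketch below shows one way SAP-style prompting could be wired up, assuming an OpenAI-style chat-completion client (openai>=1.0); the model name, prompt wording, and helper names are illustrative assumptions, not the authors' exact prompts.

# Minimal sketch of select-and-answer prompting (SAP).
# Assumptions: openai>=1.0 client, OPENAI_API_KEY set in the environment,
# and illustrative prompt wording (not the paper's exact prompts).
from openai import OpenAI

client = OpenAI()

def chat(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def select_and_answer(question: str, n_candidates: int = 3) -> str:
    # Step 1 (select): prompt the model to analyze the question and
    # propose several candidate answers.
    candidates = chat(
        f"Question: {question}\n"
        f"Analyze the question step by step, then list "
        f"{n_candidates} candidate answers, one per line."
    )
    # Step 2 (answer): feed the candidates back and ask for a single
    # final answer derived from them.
    return chat(
        f"Question: {question}\n"
        f"Candidate answers:\n{candidates}\n"
        "Select the best candidate and state the final answer only."
    )

print(select_and_answer(
    "A store had 120 apples and sold 45. How many apples remain?"
))

Keeping selection as a separate call means the final answer is explicitly derived from a generated candidate set rather than from a single free-form completion, which is the core of the select-and-answer idea; no manual demonstrations are required in either prompt.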

Keywords

zero-shot learning / large language model (LLM) / reasoning problem / chain-of-thought (CoT) prompting

Cite this article

Yufang WANG, Xuesong TANG, Kuangrong HAO. Select-and-Answer Prompting: Facilitating LLMs for Improving Zero-Shot Reasoning. Journal of Donghua University (English Edition), 2025, 42(5): 513-522. DOI: 10.19884/j.1672-5220.202406009



Funding

National Natural Science Foundation of China (62176052)
