PesTest: A Comprehensive Benchmark for Psychological Emotional Support Capability of Large Language Models

Tianwei LAN, Yuhang GUO, Siyuan GAO, Hongfei XIA, Zeming LIU, Heyan HUANG

Front. Comput. Sci. DOI: 10.1007/s11704-026-50643-w
RESEARCH ARTICLE

Abstract

Large language models (LLMs) with competent psychological emotional support capabilities have significant potential to deliver effective mental health assistance, which makes comprehensive evaluation of these capabilities a critical research challenge. Current evaluation paradigms are limited in three ways: they typically examine a single task modality, cover a single language, and focus on a narrow range of topics. As a result, a model may perform strongly on a benchmark yet still generate potentially harmful outputs in practical applications. To address these limitations, we introduce PesTest, a multidimensional benchmark that assesses psychological emotional support capabilities across multiple task modalities (judgment-based and dialogue-based evaluations), two languages (English and Chinese), and a broad range of topics comprising 7 categories and 40 sub-category topics. Our evaluation with PesTest reveals two findings. First, current LLMs show particularly limited capabilities in dialogue-based support tasks. Second, performance varies substantially across psychological domains, highlighting the need for domain-specific evaluation. Furthermore, with fine-tuning and few-shot experiments, we achieve improved performance over the original model on the test set. These results indicate that PesTest is effective for enhancing the psychological emotional support capabilities of LLMs. We hope the benchmark will serve as a valuable resource for future research in this field.

Keywords

Psychological emotional support / Large language models / Benchmark of LLMs

Cite this article

Tianwei LAN, Yuhang GUO, Siyuan GAO, Hongfei XIA, Zeming LIU, Heyan HUANG. PesTest: A Comprehensive Benchmark for Psychological Emotional Support Capability of Large Language Models. Front. Comput. Sci. DOI:10.1007/s11704-026-50643-w



RIGHTS & PERMISSIONS

The Author(s) 2026. This article is published with open access at link.springer.com and journal.hep.com.cn
