PesTest: A Comprehensive Benchmark for Psychological Emotional Support Capability of Large Language Models
Tianwei LAN, Yuhang GUO, Siyuan GAO, Hongfei XIA, Zeming LIU, Heyan HUANG
Large language models (LLMs) with competent psychological emotional support capabilities have significant potential to deliver effective mental health assistance. Developing comprehensive methods for evaluating these capabilities is a critical research challenge. Current evaluation paradigms are limited in three ways: they typically examine a single task modality, cover a single language, and focus on a narrow range of topics. These constraints can yield misleading assessments, in which models perform strongly on benchmarks yet generate potentially harmful outputs in practice. To address these limitations, we introduce PesTest, a multidimensional evaluation benchmark that assesses psychological emotional support capabilities through multiple task modalities (judgment-based and dialogue-based evaluations), multilingual support (English and Chinese), and extensive topical coverage comprising 7 categories and 40 sub-category topics. Our evaluation using PesTest reveals two critical findings. First, current LLMs demonstrate particularly limited capabilities in dialogue-based support tasks. Second, performance varies substantially across psychological domains, highlighting the need for domain-specific evaluation. Furthermore, through fine-tuning and few-shot experiments, we achieve improved performance over the original model on the test set. These results validate the effectiveness of PesTest in enhancing the psychological emotional support capabilities of LLMs. We hope the proposed benchmark will serve as a valuable resource for future research in this field.
Psychological emotional support / Large language models / Benchmark of LLMs
© The Author(s) 2026. This article is published with open access at link.springer.com and journal.hep.com.cn