PesTest: A Comprehensive Benchmark for Psychological Emotional Support Capability of Large Language Models

Tianwei LAN; Yuhang GUO; Siyuan GAO; Hongfei XIA; Zeming LIU; Heyan HUANG

doi:10.1007/s11704-026-50643-w

Front. Comput. Sci.

RESEARCH ARTICLE

Abstract

Large language models (LLMs) with competent psychological emotional support capabilities have significant potential to deliver effective mental health assistance. Developing comprehensive methods for evaluating these capabilities is a critical research challenge. Current evaluation paradigms have several limitations: they typically examine a single task modality, cover a single language, and focus on a restricted range of topics. These constraints can yield misleading assessments in which a model performs strongly on a benchmark yet generates potentially harmful outputs in practical applications. To address these limitations, we introduce PesTest, a novel multidimensional evaluation benchmark that assesses psychological emotional support capabilities through multiple task modalities (judgment-based and dialogue-based evaluations), multilingual support (English and Chinese), and extensive topical coverage comprising 7 categories and 40 sub-category topics. Our evaluation using PesTest reveals two critical findings. First, current LLMs show particularly limited capabilities in dialogue-based support tasks. Second, performance varies substantially across psychological domains, highlighting the need for domain-specific evaluation. Furthermore, by fine-tuning a model and running few-shot experiments, we achieve improved performance over the original model on the test set. These results validate the effectiveness of PesTest in enhancing the psychological emotional support capabilities of LLMs. We hope the proposed benchmark will serve as a valuable resource for future research in this field.

Keywords

Psychological emotional support / Large language models / Benchmark of LLMs

Cite this article

Tianwei LAN, Yuhang GUO, Siyuan GAO, Hongfei XIA, Zeming LIU, Heyan HUANG. PesTest: A Comprehensive Benchmark for Psychological Emotional Support Capability of Large Language Models. Front. Comput. Sci. DOI:10.1007/s11704-026-50643-w

RIGHTS & PERMISSIONS

The Author(s) 2026. This article is published with open access at link.springer.com and journal.hep.com.cn
