A3Bench: An Audience-Aligned Multilingual Benchmark for Video Audience Insights Understanding

Yiming Lei, Guozhen Peng, Zeming Liu, Hui Qiu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Annan Li, Yunhong Wang

Front. Comput. Sci. DOI: 10.1007/s11704-026-52159-9
RESEARCH ARTICLE

Abstract

Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding and consistently perform well on vision-centric benchmarks. However, existing benchmarks primarily evaluate factual or event-based comprehension and neglect audience insights, a critical yet underexplored dimension of video understanding that reflects a deep comprehension of cognitive processes from the audience's perspective. As a result, MLLMs shaped by such benchmarks often produce responses that are factually correct but misaligned with the audience's interests. To bridge this gap, we use audience insights derived from video comments as a direct proxy to guide the annotation process and introduce A3Bench, an audience-aligned benchmark for evaluating video audience insights, built from large-scale videos and high-quality multilingual comments. Furthermore, inspired by neuroimaging studies, we propose Cognition Interaction of Thought (CIoT), a structured reasoning framework that emulates key aspects of cognitive processes. Extensive experiments on A3Bench reveal that current MLLMs struggle to understand audience insights, particularly when compared with human-level understanding. In contrast, CIoT improves the performance of these models, highlighting its potential to strengthen MLLMs' understanding of audience insights in future research.

Keywords

Multilingual Video Understanding / Audience Insights / CIoT / MLLMs

Cite this article

Yiming Lei, Guozhen Peng, Zeming Liu, Hui Qiu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Annan Li, Yunhong Wang. A3Bench: An Audience-Aligned Multilingual Benchmark for Video Audience Insights Understanding. Front. Comput. Sci. DOI:10.1007/s11704-026-52159-9

RIGHTS & PERMISSIONS

©The Author(s) 2026.
