A³Bench: An Audience-Aligned Multilingual Benchmark for Video Audience Insights Understanding

Yiming Lei; Guozhen Peng; Zeming Liu; Hui Qiu; Haitao Leng; Shaoguo Liu; Tingting Gao; Qingjie Liu; Annan Li; Yunhong Wang

Front. Comput. Sci. DOI: 10.1007/s11704-026-52159-9

RESEARCH ARTICLE

Abstract

Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding and consistently perform well on vision-centric benchmarks. However, existing benchmarks primarily evaluate factual or event-based comprehension and neglect audience insights, a critical yet underexplored dimension of video understanding that reflects a deep comprehension of cognitive processes from the audience’s perspective. As a result, MLLMs shaped by such benchmarks often produce responses that are factually correct but misaligned with the audience’s interests. To bridge this gap, we use audience insights derived from video comments as a direct proxy to guide the annotation process and introduce A³Bench, an audience-aligned benchmark for evaluating the understanding of video audience insights, built on large-scale videos and high-quality multilingual comments. Furthermore, inspired by neuroimaging studies, we propose Cognition Interaction of Thought (CIoT), a structured reasoning framework that emulates key aspects of cognitive processes. Extensive experiments on A³Bench reveal that current MLLMs struggle to understand audience insights, particularly when compared with human-level understanding. In contrast, CIoT can improve the performance of these models, highlighting its potential to enhance MLLMs’ understanding of audience insights in future research.

Keywords

Multilingual Video Understanding / Audience Insights / CIoT / MLLMs

Cite this article

Yiming Lei, Guozhen Peng, Zeming Liu, Hui Qiu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Annan Li, Yunhong Wang. A³Bench: An Audience-Aligned Multilingual Benchmark for Video Audience Insights Understanding. Front. Comput. Sci. DOI:10.1007/s11704-026-52159-9

RIGHTS & PERMISSIONS

© The Author(s) 2026.
