TrustBench: a comprehensive benchmark upon availability and trustworthiness for large vision-language models

Jian DONG, Zhilei ZHU, Hainan LI, Yanling WANG, Wei BAO, Heng YANG, Jiakai WANG, Qi LI

Front. Comput. Sci., 2027, 21(2): 2102323. DOI: 10.1007/s11704-026-41398-5
Artificial Intelligence
LETTER



RIGHTS & PERMISSIONS

Higher Education Press


Supplementary files: TrustBench Appendix
