TrustBench: a comprehensive benchmark upon availability and trustworthiness for large vision-language models

Jian DONG, Zhilei ZHU, Hainan LI, Yanling WANG, Wei BAO, Heng YANG, Jiakai WANG, Qi LI

Front. Comput. Sci., 2027, 21(2): 2102323. DOI: 10.1007/s11704-026-41398-5
Artificial Intelligence
LETTER



RIGHTS & PERMISSIONS

Higher Education Press


Supplementary files: TrustBench Appendix
