
LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy
Jieru YAO, Xueran LI, Qiang XIE, Longfei HAN, Yiwen JIA, Nian LIU, Dingwen ZHANG, Junwei HAN
Front. Comput. Sci., 2025, Vol. 19, Issue 4: 194331.
[1] Forrest J H, Finlayson N D C, Shearman D J C. Endoscopy in gastrointestinal bleeding. The Lancet, 1974, 304(7877): 394–397
[2] Sharma P, Pante A, Gross S A. Artificial intelligence in endoscopy. Gastrointestinal Endoscopy, 2020, 91(4): 925–931
[3] Liu H, Li C, Wu Q, Lee Y J. Visual instruction tuning. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
[4] Li J, Li D, Xiong C, Hoi S. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 12888–12900
[5] Ye Q, Xu H, Ye J, Yan M, Hu A, Liu H, Qian Q, Zhang J, Huang F, Zhou J. mPLUG-Owl2: revolutionizing multi-modal large language model with modality collaboration. 2023, arXiv preprint arXiv:2311.04257
[6] Wu C, Zhang X, Zhang Y, Wang Y, Xie W. Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. 2023, arXiv preprint arXiv:2308.02463
[7] Li C, Wong C, Zhang S, Usuyama N, Liu H, Yang J, Naumann T, Poon H, Gao J. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36
[8] Hu E J, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations. 2022
[9] Mu Y, Zhang Q, Hu M, Wang W, Ding M, Jin J, Wang B, Dai J, Qiao Y, Luo P. Appendix for embodiedGPT: vision-language pre-training via embodied chain of thought. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36
Part of a collection:
Excellent Young Computer Scientists Vision on Foundation Models