LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Jieru YAO, Xueran LI, Qiang XIE, Longfei HAN, Yiwen JIA, Nian LIU, Dingwen ZHANG, Junwei HAN. LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy. Front. Comput. Sci., 2025, 19(4): 194331 https://doi.org/10.1007/s11704-024-40319-8

References

[1]	Forrest J H, Finlayson N D C, Shearman D J C. Endoscopy in gastrointestinal bleeding. The Lancet, 1974, 304(7877): 394–397

[2]	Sharma P, Pante A, Gross S A. Artificial intelligence in endoscopy. Gastrointestinal Endoscopy, 2020, 91(4): 925–931

[3]	Liu H, Li C, Wu Q, Lee Y J. Visual instruction tuning. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36

[4]	Li J, Li D, Xiong C, Hoi S. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 12888–12900

[5]	Ye Q, Xu H, Ye J, Yan M, Hu A, Liu H, Qian Q, Zhang J, Huang F, Zhou J. mPLUG-Owl2: revolutionizing multi-modal large language model with modality collaboration. 2023, arXiv preprint arXiv: 2311.04257

[6]	Wu C, Zhang X, Zhang Y, Wang Y, Xie W. Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. 2023, arXiv preprint arXiv: 2308.02463

[7]	Li C, Wong C, Zhang S, Usuyama N, Liu H, Yang J, Naumann T, Poon H, Gao J. LLaVA-med: training a large language-and-vision assistant for biomedicine in one day. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36

[8]	Hu E J, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations. 2022

[9]	Mu Y, Zhang Q, Hu M, Wang W, Ding M, Jin J, Wang B, Dai J, Qiao Y, Luo P. Appendix for embodiedGPT: vision-language pre-training via embodied chain of thought. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62272468, 62003256, 62027813, U1801265, 62293543, 62322605, 62036005, 62202015, and U21B2048), the Key-Area Research and Development Program of Shaanxi Province (2023-ZDLSF-41), the Anhui Medical University (2022xkj105, 2023cy021), the Anhui Provincial Key R&D Program (2023s07020001), and the University Synergy Innovation Program of Anhui Province (GXXT-2022-052).