LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Jieru YAO, Xueran LI, Qiang XIE, Longfei HAN, Yiwen JIA, Nian LIU, Dingwen ZHANG, Junwei HAN

Front. Comput. Sci., 2025, Vol. 19, Issue 4: 194331. DOI: 10.1007/s11704-024-40319-8
Artificial Intelligence
LETTER

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62272468, 62003256, 62027813, U1801265, 62293543, 62322605, 62036005, 62202015, and U21B2048), the Key-Area Research and Development Program of Shaanxi Province (2023-ZDLSF-41), the Anhui Medical University (2022xkj105, 2023cy021), the Anhui Provincial Key R&D Program (2023s07020001), and the University Synergy Innovation Program of Anhui Province (GXXT-2022-052).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

2025 Higher Education Press