LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Jieru YAO, Xueran LI, Qiang XIE, Longfei HAN, Yiwen JIA, Nian LIU, Dingwen ZHANG, Junwei HAN

Front. Comput. Sci. 2025, Vol. 19, Issue (4): 194331. DOI: 10.1007/s11704-024-40319-8
Excellent Young Computer Scientists Vision
LETTER

Cite this article

Jieru YAO, Xueran LI, Qiang XIE, Longfei HAN, Yiwen JIA, Nian LIU, Dingwen ZHANG, Junwei HAN. LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy. Front. Comput. Sci., 2025, 19(4): 194331. DOI: 10.1007/s11704-024-40319-8

References

[1] Forrest J H, Finlayson N D C, Shearman D J C. Endoscopy in gastrointestinal bleeding. The Lancet, 1974, 304(7877): 394–397

[2] Sharma P, Pante A, Gross S A. Artificial intelligence in endoscopy. Gastrointestinal Endoscopy, 2020, 91(4): 925–931

[3] Liu H, Li C, Wu Q, Lee Y J. Visual instruction tuning. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36

[4] Li J, Li D, Xiong C, Hoi S. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 12888–12900

[5] Ye Q, Xu H, Ye J, Yan M, Hu A, Liu H, Qian Q, Zhang J, Huang F, Zhou J. mPLUG-Owl2: revolutionizing multi-modal large language model with modality collaboration. 2023, arXiv preprint arXiv:2311.04257

[6] Wu C, Zhang X, Zhang Y, Wang Y, Xie W. Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. 2023, arXiv preprint arXiv:2308.02463

[7] Li C, Wong C, Zhang S, Usuyama N, Liu H, Yang J, Naumann T, Poon H, Gao J. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36

[8] Hu E J, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations. 2022

[9] Mu Y, Zhang Q, Hu M, Wang W, Ding M, Jin J, Wang B, Dai J, Qiao Y, Luo P. Appendix for embodiedGPT: vision-language pre-training via embodied chain of thought. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36

RIGHTS & PERMISSIONS

Higher Education Press
