LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy
Jieru YAO, Xueran LI, Qiang XIE, Longfei HAN, Yiwen JIA, Nian LIU, Dingwen ZHANG, Junwei HAN
[1] Forrest J H, Finlayson N D C, Shearman D J C. Endoscopy in gastrointestinal bleeding. The Lancet, 1974, 304(7877): 394–397
[2] Sharma P, Pante A, Gross S A. Artificial intelligence in endoscopy. Gastrointestinal Endoscopy, 2020, 91(4): 925–931
[3] Liu H, Li C, Wu Q, Lee Y J. Visual instruction tuning. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
[4] Li J, Li D, Xiong C, Hoi S. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 12888–12900
[5] Ye Q, Xu H, Ye J, Yan M, Hu A, Liu H, Qian Q, Zhang J, Huang F, Zhou J. mPLUG-Owl2: revolutionizing multi-modal large language model with modality collaboration. 2023, arXiv preprint arXiv: 2311.04257
[6] Wu C, Zhang X, Zhang Y, Wang Y, Xie W. Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. 2023, arXiv preprint arXiv: 2308.02463
[7] Li C, Wong C, Zhang S, Usuyama N, Liu H, Yang J, Naumann T, Poon H, Gao J. LLaVA-med: training a large language-and-vision assistant for biomedicine in one day. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36
[8] Hu E J, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations. 2022
[9] Mu Y, Zhang Q, Hu M, Wang W, Ding M, Jin J, Wang B, Dai J, Qiao Y, Luo P. Appendix for embodiedGPT: vision-language pre-training via embodied chain of thought. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36