A survey of datasets in medicine for large language models

Deshiwei Zhang, Xiaojuan Xue, Peng Gao, Zhijuan Jin, Menghan Hu, Yue Wu, Xiayang Ying

Intelligence & Robotics, 2024, Vol. 4, Issue 4: 457-78.

DOI: 10.20517/ir.2024.27
Review

Abstract

With the advent of ChatGPT and similar models, large language models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language, presenting novel opportunities and challenges in the medical domain. Although many studies have examined the use of LLMs in medicine, comprehensive reviews of the datasets used in this field remain scarce. This survey addresses that gap by providing an overview of the medical datasets that fuel LLMs, highlighting their distinguishing characteristics and the roles they play at each stage of LLM development: pre-training, fine-tuning, and evaluation. Ultimately, it aims to underline the significance of datasets in realizing the full potential of LLMs to innovate and improve healthcare outcomes.

Keywords

Large language models (LLMs) / NLP / dataset in medicine / Q&A system in medicine

Cite this article

Deshiwei Zhang, Xiaojuan Xue, Peng Gao, Zhijuan Jin, Menghan Hu, Yue Wu, Xiayang Ying. A survey of datasets in medicine for large language models. Intelligence & Robotics, 2024, 4(4): 457-78. DOI: 10.20517/ir.2024.27
