Traditional automated machine learning (AutoML) often faces limitations in manual effort, complexity management, and subjective design choices. This paper introduces a novel LLM-driven AutoML framework centered on decomposed prompting. We hypothesize that by strategically breaking complex AutoML tasks into sequential, guided sub-prompts, large language models (LLMs) operating within a code sandbox on standard PCs can autonomously design, implement, evaluate, and select high-performing machine learning models. To validate this, we first applied decomposed prompting to sleep disorder classification, illustrating its potential benefits in healthcare. To assess the generalizability and robustness of our method across data types, we then evaluated it on the established 20 Newsgroups text classification benchmark. We rigorously compared decomposed prompting against zero-shot and few-shot prompting strategies, as well as a manually engineered baseline. Our results demonstrate that decomposed prompting significantly outperforms these alternatives, enabling the LLM to autonomously achieve superior classifier design and performance, with particularly strong results in the primary sleep disorder domain and robust performance on the benchmark task. These findings underscore the transformative potential of decomposed prompting as a key technique for advancing LLM-driven AutoML across diverse application areas beyond the specific examples explored here, paving the way for more automated and accessible problem-solving in scientific and engineering disciplines.
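The core idea of decomposing an AutoML task into sequential, guided sub-prompts can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the stage names, the `call_llm` stub, and the way context is accumulated between stages are all assumptions made for this example; a real system would replace `call_llm` with an actual LLM API client and execute the generated code in a sandbox.

```python
# Hypothetical sub-prompt sequence for an AutoML classification task.
# Each later stage receives the accumulated outputs of earlier stages.
SUB_PROMPTS = [
    ("analyze", "Summarize the dataset: column types, class balance, missing values."),
    ("design", "Given the analysis above, propose candidate classifiers and features."),
    ("implement", "Write runnable training code for the proposed classifiers."),
    ("evaluate", "Run the code in the sandbox and report accuracy per classifier."),
    ("select", "Pick the best classifier from the reported metrics and justify it."),
]

def call_llm(prompt: str) -> str:
    """Placeholder LLM call; in practice this would query a real model API."""
    return f"[response to: {prompt[:40]}...]"

def decomposed_prompting(task_description: str) -> dict:
    """Run the sub-prompts sequentially, feeding each stage the growing context."""
    context = task_description
    transcript = {}
    for stage, instruction in SUB_PROMPTS:
        prompt = f"{context}\n\nStep '{stage}': {instruction}"
        response = call_llm(prompt)
        transcript[stage] = response
        context += f"\n[{stage}] {response}"  # later stages see earlier outputs
    return transcript

result = decomposed_prompting("Classify sleep disorders from lifestyle features.")
```

The key design choice this sketch highlights is sequencing: rather than asking for a complete solution in one zero-shot or few-shot prompt, each sub-prompt constrains the model to one stage of the workflow while carrying forward the outputs of earlier stages.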
Author Contributions
Y.Z.: literature review, methodology, original draft preparation, writing, and revising. J.P.: reviewing and editing. X.Z.: reviewing and editing. W.S.: supervision, reviewing, and editing. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The data used in this study are from two publicly available sources: the Sleep Health and Lifestyle Dataset, accessible at https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset, and the 20 Newsgroups dataset, accessible at http://qwone.com/~jason/20Newsgroups.
Conflicts of Interest
The authors declare no conflict of interest.