Traditional automated machine learning (AutoML) often faces limitations in manual effort, complexity management, and subjective design choices. This paper introduces a novel LLM-driven AutoML framework centered on decomposed prompting. We hypothesize that by strategically breaking complex AutoML tasks into sequential, guided sub-prompts, large language models (LLMs) operating within a code sandbox on standard PCs can autonomously design, implement, evaluate, and select high-performing machine learning models. To validate this, we first applied decomposed prompting to sleep disorder classification, illustrating its potential benefits in healthcare. To assess the generalizability and robustness of our method across data types, we then evaluated it on the established 20 Newsgroups text classification benchmark. We rigorously compared decomposed prompting against zero-shot and few-shot prompting strategies, as well as a manually engineered baseline. Our results demonstrate that decomposed prompting significantly outperforms these alternatives, enabling the LLM to autonomously achieve superior classifier design and performance, with particularly strong results in the primary sleep disorder domain and robust performance on the benchmark task. These findings underscore the transformative potential of decomposed prompting as a key technique for advancing LLM-driven AutoML across diverse application areas beyond the specific examples explored here, paving the way for more automated and accessible problem-solving in scientific and engineering disciplines.
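The core idea of decomposing an AutoML task into sequential, guided sub-prompts can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the stage names, the `call_llm` stub, and the way context is accumulated between stages are all assumptions made for this example; a real system would replace `call_llm` with an actual LLM API client and execute the generated code in a sandbox.

```python
# Hypothetical sub-prompt sequence for an AutoML classification task.
# Each later stage receives the accumulated outputs of earlier stages.
SUB_PROMPTS = [
    ("analyze", "Summarize the dataset: column types, class balance, missing values."),
    ("design", "Given the analysis above, propose candidate classifiers and features."),
    ("implement", "Write runnable training code for the proposed classifiers."),
    ("evaluate", "Run the code in the sandbox and report accuracy per classifier."),
    ("select", "Pick the best classifier from the reported metrics and justify it."),
]

def call_llm(prompt: str) -> str:
    """Placeholder LLM call; in practice this would query a real model API."""
    return f"[response to: {prompt[:40]}...]"

def decomposed_prompting(task_description: str) -> dict:
    """Run the sub-prompts sequentially, feeding each stage the growing context."""
    context = task_description
    transcript = {}
    for stage, instruction in SUB_PROMPTS:
        prompt = f"{context}\n\nStep '{stage}': {instruction}"
        response = call_llm(prompt)
        transcript[stage] = response
        context += f"\n[{stage}] {response}"  # later stages see earlier outputs
    return transcript

result = decomposed_prompting("Classify sleep disorders from lifestyle features.")
```

The key design choice this sketch highlights is sequencing: rather than asking for a complete solution in one zero-shot or few-shot prompt, each sub-prompt constrains the model to one stage of the workflow while carrying forward the outputs of earlier stages.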
Author Contributions
Y.Z.: literature review, methodology, original draft preparation, writing, and revising. J.P.: reviewing and editing. X.Z.: reviewing and editing. W.S.: supervision, reviewing, and editing. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The data used in this study are from two publicly available sources: the Sleep Health and Lifestyle Dataset, accessible at https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset, and the 20 Newsgroups dataset, accessible at http://qwone.com/~jason/20Newsgroups.
Conflicts of Interest
The authors declare no conflict of interest.