Large language model-based multi-agent systems for automated foundation design: router-driven task classification and expert selection framework
Sompote Youwai, David Phim, Vianne Gayl Murcia, Rianne Clair Onas
AI in Civil Engineering ›› 2026, Vol. 5 ›› Issue (1) : 5
This preliminary study introduces and evaluates a router-based multi-agent framework for automated foundation design calculations through intelligent task classification and expert selection. Three configurations were assessed: single-agent processing, a multi-agent designer-checker architecture, and router-based expert selection, using baseline models including DeepSeek R1, ChatGPT 4 Turbo, Grok 3, and Gemini 2.5 Pro. Initial evaluation on 27 test cases with triple-trial execution shows promising performance: the router-based system achieved 95.00% accuracy for shallow foundation design and 90.63% for pile design, improvements of 8.75 and 3.13 percentage points over standalone Grok 3, respectively, and outperformed conventional workflows by 10.0–43.75 percentage points. Grok 3 demonstrated the best standalone performance, indicating enhanced mathematical reasoning in recent large language models (LLMs). The dual-tier classification framework successfully distinguished foundation types, enabling the appropriate analytical approach for each. While these preliminary results suggest that router-based multi-agent systems are a promising approach to foundation design automation, the limited sample size necessitates comprehensive validation on larger, more diverse datasets before deployment recommendations can be made. Safety-critical requirements demand continued human oversight in professional applications. This work provides a methodological foundation for future research in AI-assisted geotechnical engineering.
Router-based multi-agent systems / Large language models / Foundation design / Geotechnical engineering / Task classification / Expert selection / AI-assisted engineering
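The dual-tier routing idea described in the abstract (tier 1 classifies the foundation design task, tier 2 dispatches it to a matching expert agent) can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the keyword classifier stands in for an LLM-based router, and the expert names and their `solve` stubs are invented for the example.

```python
# Hypothetical sketch of a dual-tier router for foundation design tasks.
# Tier 1 classifies the task type (shallow vs. pile); tier 2 selects the
# expert agent for that type. All names and keywords are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ExpertAgent:
    name: str
    solve: Callable[[str], str]  # stand-in for an LLM-backed design calculation


def classify_task(prompt: str) -> str:
    """Tier 1: crude keyword classifier standing in for an LLM router."""
    text = prompt.lower()
    if any(k in text for k in ("pile", "shaft", "skin friction", "end bearing")):
        return "pile"
    return "shallow"


def route(prompt: str, experts: Dict[str, ExpertAgent]) -> str:
    """Tier 2: dispatch the classified task to the matching expert agent."""
    return experts[classify_task(prompt)].solve(prompt)


experts = {
    "shallow": ExpertAgent(
        "shallow-foundation-expert",
        lambda p: "bearing-capacity check for a spread footing",
    ),
    "pile": ExpertAgent(
        "pile-design-expert",
        lambda p: "axial capacity from skin friction + end bearing",
    ),
}

print(route("Design a driven pile for an 800 kN axial load", experts))
# prints: axial capacity from skin friction + end bearing
```

In the paper's framework the classifier and experts would each be LLM calls; the sketch only shows the control flow that separates classification from expert selection.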