Optimized machine learning algorithms with SHAP analysis for predicting compressive strength in high-performance concrete
Samuel Olaoluwa Abioye , Yusuf Olawale Babatunde , Oluwafikejimi Abigail Abikoye , Aisha Nene Shaibu , Bailey Jonathan Bankole
AI in Civil Engineering ›› 2025, Vol. 4 ›› Issue (1) : 16
This research examines the application of eight machine learning (ML) algorithms for predicting the compressive strength of high-performance concrete (HPC). Precise predictions are crucial for enhancing structural reliability and optimizing resource usage in construction projects. The analysis used the “Concrete Compressive Strength” dataset, sourced from UC Irvine’s publicly available ML repository. The models evaluated were Gradient Boosting Regressor (GBR), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Regression (SVR), Artificial Neural Network (ANN), Multilayer Perceptron (MLP), Lasso, and k-Nearest Neighbors (KNN). To enhance performance, critical data preprocessing steps were undertaken, including feature scaling, cleaning, and normalization. Hyperparameter tuning via Grid Search (GS) and K-fold cross-validation further optimized the models. Among the models analyzed, XGBoost and GBR achieved the highest predictive accuracy, with R² values of 93.49% and 92.09%, respectively, together with lower mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE). SHapley Additive exPlanations (SHAP) analysis revealed cement content and curing age as the most significant factors affecting compressive strength. Validation against experimental data confirmed the reliability of XGBoost and GBR through consistent prediction patterns and close alignment with empirical measurements. The results establish ML as an effective approach for HPC strength prediction, offering advantages in computational efficiency and accuracy over conventional analytical methods.
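The tuning and evaluation workflow the abstract describes — feature scaling, Grid Search over a boosted-tree regressor with K-fold cross-validation, then scoring with R², MSE, MAE, and RMSE — can be sketched as below. This is a minimal illustration, not the authors' code: it uses scikit-learn's `GradientBoostingRegressor` and a small, assumed hyperparameter grid, and substitutes synthetic stand-in data for the actual UCI concrete dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the UCI "Concrete Compressive Strength" dataset:
# 8 mix/curing features (cement, slag, fly ash, water, superplasticizer,
# coarse aggregate, fine aggregate, age) -> compressive strength.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling, as in the preprocessing step described in the abstract.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Grid Search over a small (assumed) hyperparameter grid,
# scored by 5-fold cross-validated R².
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid, cv=cv, scoring="r2", n_jobs=-1,
)
search.fit(X_train_s, y_train)

# Held-out evaluation with the same metrics reported in the paper.
y_pred = search.best_estimator_.predict(X_test_s)
mse = mean_squared_error(y_test, y_pred)
metrics = {
    "r2": r2_score(y_test, y_pred),
    "mse": mse,
    "mae": mean_absolute_error(y_test, y_pred),
    "rmse": float(np.sqrt(mse)),
}
print(search.best_params_)
print(metrics)
```

Swapping `GradientBoostingRegressor` for `xgboost.XGBRegressor` (or any of the other seven estimators) leaves the Grid Search and scoring code unchanged, which is what makes this pipeline convenient for comparing the eight models on equal footing.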
Concrete strength / Machine learning / Prediction / Boosting / Regression / High performance concrete / Hyperparameter tuning / Grid search / Cross validation / SHapley Additive exPlanations