Data proportionality and its impact on machine learning predictions of ground granulated blast furnace slag concrete strength

Jitendra KHATTI , Panagiotis G. ASTERIS , Abidhan BARDHAN

Front. Struct. Civ. Eng. ›› 2025, Vol. 19 ›› Issue (8) : 1305 -1333. DOI: 10.1007/s11709-025-1192-5
RESEARCH ARTICLE


Abstract

Sustainable concrete’s compressive strength (CST) ensures structural safety, durability, and performance while minimizing environmental impact. It supports eco-friendly design, resource optimization, and compliance with green building standards. Determining the CST using laboratory procedures is time-consuming and lengthy. Therefore, the present research introduces a reliable machine learning (ML) model for assessing the CST of ground granulated blast furnace slag (GGBS) concrete by comparing ten ML models. In addition, this work presents the effect of data proportionality on the performance and overfitting of ML models. For that purpose, a database was compiled from the literature and split into three data sets (training:testing), i.e., 70%:30%, 80%:20%, and 85%:15%. The analysis of performance metrics (correlation coefficients of 0.8526 and 0.9780 for 70%:30% and 85%:15%, respectively) showed that the performance of the Takagi−Sugeno fuzzy (TSF) model improved as the training proportion of the database increased. The TSF model predicted the CST of GGBS concrete with a root mean square error of 3.2460 MPa and a performance index of 1.86. In addition, the regression error characteristics curve, score analysis, and uncertainty analysis showed the superiority of the TSF model. Likewise, the a20 (= 93.75), agreement (= 0.90), and scatter (= 0.08) indexes showed that the TSF model is highly reliable in predicting the CST of GGBS concrete. The multicollinearity analysis revealed that the considerable multicollinearity of the GGBS to binder ratio and fine aggregate features affected the performance and curve fitting of the k-nearest neighbor and multilayer perceptron models. The overall analysis shows that an 85% training data set improves generalization by capturing diverse data patterns and minimizing the influence of noise and outliers, resulting in a more robust model. The present investigation helps concrete designers and engineers assess the desired CST of GGBS concrete using mix design parameters.


Keywords

compressive strength / sustainable concrete / concrete technology / multicollinearity / Takagi−Sugeno fuzzy / artificial intelligence

Cite this article

Jitendra KHATTI, Panagiotis G. ASTERIS, Abidhan BARDHAN. Data proportionality and its impact on machine learning predictions of ground granulated blast furnace slag concrete strength. Front. Struct. Civ. Eng., 2025, 19(8): 1305-1333 DOI:10.1007/s11709-025-1192-5


1 Introduction

Concrete is a human-made construction material used in every civil engineering project. Concrete is a mixture of aggregates (coarse and fine), cement, and water. In recent decades, researchers and scientists have started adding industrial by-products to enhance the mechanical properties of concrete, and ground granulated blast furnace slag (GGBS) is one of them. GGBS consists of silicates and alumino-silicates of calcium formed in the blast furnace during iron production. In the formation process, the molten slag is turned, after cooling, into an almost fully non-crystalline, glassy, granular, and fine form called granulated slag [1]. In India, 25% to 40% SiO2, 35% to 45% CaO, 10% to 20% Al2O3, less than 10% MgO, 0.5% to 15% Fe2O3, less than 3% SO3, and smaller amounts of TiO2, K2O, and Na2O are found in GGBS. However, the fluxing stone, ores, and impurities in the coke fed into the blast furnace decide the chemical composition of GGBS. In turn, the chemical composition affects the hydraulic activity [2]. Alkaline activators accelerate the hydraulic activity. The strength increases with increasing Al2O3 content, and magnesia (MgO) significantly increases the strength when CaO is deficient. A slight improvement in strength can be observed by adding 8% to 10% MgO, whereas more than 10% MgO content negatively affects the strength [2]. It is noted that the hydration of GGBS is slower than that of Portland cement when only water is added. Therefore, alkalis and cement content accelerate the hydration process. This process depends on the breakdown and dissolution of slag structures by OH− ions. As a result, the C-S-H gel is formed by the reaction of Na and K alkalis and Ca(OH)2. Piro et al. [3,4] determined the impact of slag and ferrous content on the flow of electric current. Qi et al. [5], Rawat and Pasla [6], Chen et al. [7], Yu et al. [8], and Quan et al. [9] experimentally demonstrated that GGBS with an activator enhances the mechanical properties of concrete. The conventional procedures for assessing the mechanical properties, such as compressive, split tensile, and flexural strength, of GGBS concrete are time-consuming and lengthy. Azimi and Toufigh [10] observed a refinement in pore volumes by adding slag. Tam et al. [11], Charhate et al. [12], Kamath et al. [13], DeRousseau et al. [14], and Dutta et al. [15] assessed the compressive strength (CST) of concrete using multilinear regression methods and compared them with advanced computational methods. The comparison illustrated that 1) a preliminary assessment of the strength of concrete can be obtained by the multilinear regression method and 2) the advanced computational methods are more reliable and accurate than the multilinear regression methods. Yang et al. [16] stated that different machine-learning approaches achieve different accuracies in assessing the CST of fly ash concrete. However, the particle swarm optimized-extreme gradient boosting (PSO_XGBoost) model outperformed the conventional and optimized random forest (RF), support vector regressor (SVR), extreme gradient boosting (XGBoost), genetic programming, and artificial neural network (ANN) models. Wang et al. [17] estimated the CST of geopolymer concrete using gene expression programming. The researchers also reported that the CST of concrete correlates at 0.72 and 0.71 with GGBS and fly ash (FAsh), respectively, indicating a significant impact on concrete strength. Kashem et al. 
[18] reported that the optimized regression tree assesses the CST of rice husk ash (RHA) concrete with a root mean square error (RMSE) of 5.255 MPa. Based on the SHapley Additive exPlanations (SHAP) analysis, the investigators reported that cement (= 10.17) is the most significant variable for the CST of RHA concrete, followed by the curing period-CP (= 9.52), water-W (= 8.13), rice husk ash-RHA (= 3.68), superplasticizer-SP (= 3.54), and aggregate-A (= 3.00). Karim et al. [19] analyzed the synergistic effect of RHA and fine aggregate-FA on the CST of concrete. Kalabarige et al. [20] predicted the CST of sustainably developed concrete with a performance of 0.9614. Jamali et al. [21] measured the capillary pressure of GGBS concrete using a deep neural network. The investigators also found that the capillary pressure rate decreases when cement is replaced with GGBS content. Also, the concrete’s initial and final setting times approximately doubled. Golafshani et al. [22] estimated the CST of geopolymer concrete using ensemble models. Mohammed et al. [23] concluded that the ANN model predicted the CST of normal and high-performance concrete with high performance and the least residuals. Dodo et al. [24] found that the ensemble boosting model predicts the CST better than the multilayer perceptron model. Choudhary et al. [25] compared the deep learning (DL), generalized linear model, extremely randomized tree, distributed random forest, stacked ensemble, and gradient boosting (GB) models in assessing the CST of concrete. The authors noted that the DL model assessed the CST with an RMSE of 3.2 MPa. The authors also found that the curing period is the most sensitive variable in predicting the CST of concrete, followed by FA, GGBS, NaOH, and Na2SiO3. Gogineni et al. [26], Kamath et al. [13], Aslam and Shahab [27], Gogineni et al. [28], Philip et al. [29], Kumar et al. [30], Dinesh and Prasad [31], Singh and Rajhans [32], and Gogineni et al. [33] concluded that 1) GGBS and alkali concentration play an important role in estimating the CST of concrete; 2) machine learning approaches achieve a performance of more than 0.95 in the testing phase; and 3) curing temperature and curing period significantly affect the CST of fiber-reinforced GGBS concrete.

Wadhawan et al. [34] concluded that DT and GB models predict the CST of FAsh concrete with a performance of more than 0.95 in the testing phase. Sami et al. [35] performed a feasibility analysis to assess the tensile and compressive strengths of concrete. The investigators concluded that the rational quadratic covariance function-based Gaussian process regression estimated the strength properties of concrete with a performance of more than 0.99. Paudel et al. [36] obtained a performance of 0.9747 using the XGBoost model in estimating the CST of concrete. The researchers concluded that the fly ash content agrees well with the CST of concrete. On the other hand, the sensitivity analysis showed that cement and water content significantly affect the estimation of the CST of concrete. Kioumarsi et al. [37] utilized C, GGBS, W, coarse aggregate (CA), FA, and CP parameters to estimate the CST of concrete. The researchers noted that the CST of concrete decreases by up to 45.51% when adding 100% GGBS. Cao [38] employed soft computing models using 240 data points to assess the CST of concrete. The investigation noted that the concrete porosity decreases with FAsh and GGBS content. Dash et al. [39], Gupta et al. [40], Kina et al. [41], Al Martini et al. [42], Nhat-Duc [43], Chi et al. [44], and Huang et al. [45] compared different machine, advanced machine, deep, and hybrid learning models in predicting the CST of GGBS concrete. These published works show that the investigators used different databases, which makes it difficult to select the most suitable soft computing approach for predicting concrete strength. Shah et al. [46] noted that the RF model predicts the CST of concrete better than gene expression programming (GEP), ANN, and the M5 tree. Moreover, Shah et al. [47] employed ANN and DT models to predict the CST of GGBS concrete. The authors found that the significant variables were water content, curing period, and cement. Rathakrishnan et al. [48] predicted the CST of high-performance concrete (HPC) using boosting machine learning approaches. The investigators concluded that the optimized boosting approach accurately predicts the HPC strength. Nazar et al. [49] compared the RF, DT, and GEP models in estimating the CST of nanomodified concrete using 94 concrete specimens. In addition, the researchers concluded that the K-fold value affects the prediction capabilities and performance of the models. Hameed et al. [50] introduced genetic algorithm (GA_SVR) and particle swarm optimized (PSO_SVR) support vector regressor approaches to assess the CST of GGBS and FA mixed concrete. The authors concluded that the PSO_SVR model is more reliable than the GA_SVR model. Cao et al. [51] achieved a performance of 0.9899 for the XGBoost model, higher than the MLP and SVR, using 151 data points in predicting the CST of GGBS and FA mixed concrete. Amin et al. [52] performed a permutation analysis and concluded that GGBS, temperature, and fly ash content highly influence geopolymer concrete strength. Ahmed et al. [53] determined the effect of hidden layers and neurons on the performance of ANN models. The authors found that the ANN model is better than the linear regression, MLR, and M5 tree models in assessing the CST of geopolymer concrete. Biswal et al. [54], Shanmugasundaram et al. [55], Ahmad et al. [56], Imran et al. [57], Zhang et al. [58], Ghosh and Ransinchung [59], and Suprakash et al. [60] employed several artificial intelligence models using different training and testing database ratios, i.e., 70%:30%, 75%:25%, and 80%:20%.
Still, the machine, advanced machine, deep, and hybrid learning models have not been developed, trained, and tested on a common database. Also, a comparison has not been made between artificial intelligence models using 70%:30%, 80%:20%, and 85%:15% ratios of training and testing databases to find the most suitable ratio for soft computing approaches.

Song et al. [61] reported that the bagging regressor assessed the CST of fly ash mixed concrete with lower residuals than the GEP, ANN, and DT models. The authors also found that configuring the k-fold value is essential for accuracy. Moreover, Song et al. [62] compared the ANN, DT, RF, and gradient boosting regressor and concluded that the RF model gives the most promising estimation of the CST of fly ash concrete. Tran et al. [63] estimated the CST of GGBS concrete using the ANN model. Also, Shahmansouri et al. [64] employed the ANN model for predicting the CST of silica fume and natural zeolite geopolymer concrete. The researchers observed that increasing NaOH content reduces the concrete strength. Conversely, natural zeolite and silica fume significantly improve the concrete strength. Mai et al. [65] reported that the RF model gives a reliable prediction of the CST of GGBS concrete with a performance of 0.9729. Lavercombe et al. [66] designed eco-friendly concrete for decarbonization and estimated the CST using deep neural network, SVR, GBR, RF, kNN, and DT models. The investigators reported that the GBR model predicted the CST with a determination coefficient of 0.946. Khursheed et al. [67], Khan et al. [68], Aravind et al. [69], and Ahmad et al. [70] stated that computational approaches consume less time in estimating the CST of concrete than the conventional procedure. Also, these approaches are more accurate and reliable than conventional prediction methods, i.e., simple linear and nonlinear regression and multiple linear regression. The literature presents a few researchers who computed the database’s multicollinearity. Still, the impact of database multicollinearity on the accuracy and performance of models has not been analyzed. Tab.1 summarizes the published soft computing models for predicting the CST of sustainably developed concrete.

Novelty of this Investigation—Tab.1 presents that Yang et al. [16], Kashem et al. [18], Karim et al. [19], Golafshani et al. [22], Paudel et al. [36], Cao [38], Cao et al. [51], and Song et al. [61] implemented the GB approach to estimate the CST of concrete. Still, many computational models have not been employed and compared. Hence, this research has the following novel aspects.

1) This investigation employs, trains, tests, and analyses the performance and accuracy of kNN, Chi-square Automatic Interaction Detection (CHAID), MLP, radial basis function (RBF), multi-variable regressor (MVR), expert Chi-square Automatic Interaction Detection (E-CHAID), Automatic Linear Regression-Information Criterion (ALR-ICC), deep neural network (DNN), Takagi-Sugeno Fuzzy (TSF), and least square support vector machine (LS-SVM) models for the first time in estimating the CST of the GGBS concrete.

2) This research also depicts the effect of different ratios of training and testing databases, i.e., 70%:30%, 80%:20%, and 85%:15%, on the performance and accuracy of the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LS-SVM models. In addition, it reveals the suitable ratio of the training and testing database for each model to achieve a better estimation.

3) This work also computes the database multicollinearity using the variance inflation factor (VIF) and analyses its impact on the performance and accuracy of soft computing models.

4) This investigation implements three new metrics, i.e., a20-index, scatter index, and agreement index, to check the reliability of the optimal performance model.

Research Significance—The literature study demonstrates that determining the CST of GGBS concrete using laboratory procedures is time-consuming. It is very complex and hard to determine the CST of concrete in mega-projects through laboratory procedures. Therefore, to help concrete designers and engineers, this investigation introduces a computational method to estimate the CST of GGBS concrete. This research also helps select a suitable training and testing database ratio for the models. Finally, it briefs concrete engineers and designers on database multicollinearity and its effect on the performance and accuracy of computational models.

2 Research methodology

This research uses artificial intelligence approaches and introduces an optimal performance model to estimate the GGBS concrete’s CST. For that purpose, the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LS-SVM models are employed, trained, tested, and analyzed. This work uses an available literature database. The database consists of temperature (T in degrees Celsius), water to binder ratio (w/b in %), GGBS to binder ratio (GGBS/B in %), water content (W in kg/m3), fine aggregate (FA in kg/m3), coarse aggregate (CA in kg/m3), superplasticizer (SP in %), and compressive strength (CST in MPa) of 268 GGBS concrete specimens. The database has been analyzed using the analysis of variance (ANOVA) test, z test, and multicollinearity analysis. After analyzing the hypotheses, the 70%:30%, 80%:20%, and 85%:15% training and testing databases were created by randomly selecting data points. For measuring the performance of the models, the ratio of RMSE to the standard deviation of the observations (RSR), Legate and McCabe’s Index (LMI), Normalized Mean Bias Error (NMBE), Performance Index (PI), Nash-Sutcliffe Efficiency (NS), Weighted Mean Absolute Percentage Error (WMAPE), Variance Accounted For (VAF), Mean Absolute Percentage Error (MAPE), Correlation Coefficient (R), Mean Absolute Error (MAE), and RMSE metrics have been implemented. In addition, the regression error characteristics (REC) curve, score analysis, and uncertainty analysis have been performed to find the optimal performance model for each of the 70%:30%, 80%:20%, and 85%:15% ratios. Moreover, three new indexes, i.e., a20, scatter, and agreement, have been implemented to check the reliability of the optimal performance model over the rest of the models. Finally, this investigation maps the linear path of CST for the optimal performance model using regression analysis. Fig.1 demonstrates the flow of the present research.
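To make the data-proportionality setup concrete, the following minimal Python sketch (not the authors' code) shows how the 268-record database could be divided into the three training:testing ratios; the file name ggbs_concrete.csv and the column labels are illustrative assumptions.

```python
# Illustrative sketch: building the three train:test proportions used in this study.
# The file name and column labels are assumptions, not the authors' actual identifiers.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("ggbs_concrete.csv")              # 268 GGBS concrete records
X = data[["T", "w_b", "GGBS_B", "W", "FA", "CA", "SP"]]
y = data["CST"]

splits = {}
for label, test_size in [("70:30", 0.30), ("80:20", 0.20), ("85:15", 0.15)]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42, shuffle=True)
    splits[label] = (X_tr, X_te, y_tr, y_te)
    print(label, "->", len(X_tr), "training /", len(X_te), "testing points")
```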

3 Data analysis

The present investigation uses the database published by Han et al. [71]. The database consists of T (in degrees Celsius), w/b (in %), GGBS/B (in %), W (in kg/m3), FA (in kg/m3), CA (in kg/m3), SP (in %), and CST (in MPa) of 268 GGBS concrete specimens. The descriptive statistics of the overall database have been calculated to analyze the database for creating the training and testing databases, as summarized in Tab.2. The table shows that T, w/b, GGBS/B, W, FA, CA, SP, and CST range from 5 to 75 °C, 25% to 88.9%, 0% to 85%, 128 to 295 kg/m3, 395 to 947 kg/m3, 723 to 1135 kg/m3, 0% to 2.9%, and 17.2 to 77 MPa, respectively. The frequency distribution visualizes the distribution of data points across various categories or bins, showing how the data are spread or clustered, as illustrated in Fig.2. Fig.2 demonstrates that (a) 197 concrete specimens have temperatures from 20 to 25 °C, (b) 76 and 39 concrete specimens have w/b ratios from 0.4 to 0.45 and from 0.45 to 0.5, respectively, (c) 63, 61, and 59 concrete specimens have GGBS/B ratios from 50% to 60%, 0% to 10%, and 30% to 40%, respectively, (d) 184 concrete specimens have water contents from 160 to 180 kg/m3, (e) 55, 51, 43, and 40 concrete specimens have FA from 800 to 850 kg/m3, 750 to 800 kg/m3, 700 to 750 kg/m3, and 850 to 900 kg/m3, respectively, (f) 108 and 90 concrete specimens have coarse aggregate from 950 to 1000 kg/m3 and 900 to 950 kg/m3, respectively, (g) 98 concrete specimens have SP from 0.65% to 0.80%, and (h) each CST bin from 20 to 80 MPa contains 3 to 40 concrete specimens.

Furthermore, the Pearson method is used to calculate the correlation coefficient and find the relationship between the variables. Correlation coefficient values of ±0 to ±0.2, ±0.21 to ±0.4, ±0.41 to ±0.6, ±0.61 to ±0.8, and ±0.81 to ±1.0 represent no, weak, moderate, strong, and very strong relationships between variables, respectively. Fig.3 illustrates the relationship between variables: (a) temperature has no relationship with w/b (= −0.1842), GGBS/B (= 0.1429), W (= −0.1015), FA (= −0.0167), CA (= 0.1108), and SP (= −0.0007), (b) the w/b ratio has a moderate relationship with W (= 0.5280), CA (= 0.5264), and SP (= −0.4425), (c) the w/b ratio has no relationship with GGBS/B (= −0.1628) and FA (= −0.0026), (d) the GGBS/B ratio has no relationship with W (= −0.0196), FA (= −0.0330), CA (= −0.1589), and SP (= −0.1158), (e) W has a moderate relationship with FA (= −0.5924) and SP (= −0.4231), (f) W has no relationship with CA (= 0.0493), (g) FA has no relationship with CA (= 0.1000) and SP (= 0.0010), and (h) CA has no relationship with SP (= −0.0117). The overall analysis reveals that the CST of GGBS concrete has 1) a strong relationship with the w/b ratio (= −0.6439), 2) a weak relationship with T (= 0.2214), W (= −0.3499), and SP (= 0.3066), and 3) no relationship with the GGBS/B ratio (= 0.0434), FA (= −0.0482), and CA (= −0.1020). It can be noted that the weak or non-existent relationships between certain variables, such as the GGBS/B ratio and CST, can be attributed to the complex interplay of concrete components and their varying effects on strength. Factors like the dominance of the water-to-binder (w/b) ratio in governing CST may overshadow the influence of other variables, such as the GGBS/B ratio, which primarily contributes to long-term strength rather than immediate performance. Additionally, confounding effects from other components, such as FA and SP, can dilute the observable impact of GGBS/B on CST. Low variability in some variables (e.g., FA or SP) or interactions between components can also obscure direct correlations. Furthermore, nonlinear or threshold effects, where certain variables only significantly impact CST beyond specific ranges, may explain the lack of observable relationships in the analyzed data set. These factors highlight the complexity of concrete systems, where strength is influenced by the combined and interactive effects of multiple variables rather than individual ones in isolation.
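As an illustration of how the relationships in Fig.3 can be reproduced, the short sketch below computes the Pearson correlation matrix with pandas; the file and column names are the same assumptions used in the split sketch above.

```python
# Illustrative sketch: Pearson correlation matrix of the inputs and CST,
# mirroring the relationships discussed around Fig. 3 (column names assumed).
import pandas as pd

data = pd.read_csv("ggbs_concrete.csv")
corr = data[["T", "w_b", "GGBS_B", "W", "FA", "CA", "SP", "CST"]].corr(method="pearson")

# Correlation of every variable with compressive strength, sorted by value
print(corr["CST"].round(4).sort_values())
```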

Based on the descriptive statistics and results obtained from the correlation coefficient method, the following research hypothesis statements have been made.

1) The w/b ratio significantly enhances the CST of the GGBS concrete, followed by W and SP.

2) The w/b ratio has a moderate relationship with W (= 0.5280) and CA (= 0.5264), presenting no database multicollinearity.

To analyze the research hypotheses (RH), the ANOVA and Z tests have been performed using the Data Analysis Tool of Microsoft Excel. The results of the ANOVA test are summarized in Tab.3. From the table, it is noted that the T (1.55E+02 > 3.86E+00, 2.41E–31 < 0.05), GGBS/B (4.66E+01 > 3.86E+00, 2.32E–11 < 0.05), W (5.47E+03 > 3.86E+00, 7.88E–283 < 0.05), FA (1.03E+04 > 3.86E+00, 0 < 0.05), CA (6.95E+04 > 3.86E+00, 0 < 0.05), and SP (2.80E+03 > 3.86E+00, 2.12E–214 < 0.05) variables follow the statistical clauses (F > F Crit, P < 0.05) and accept the research hypothesis. A p-value < 0.05 indicates a statistically significant relationship between the independent variables (such as T, GGBS/B, W, FA, CA, and SP) and the dependent variable, CST, of the concrete. Specifically, a p-value less than 0.05 means that the probability of observing the given data (or more extreme results) assuming no actual effect (null hypothesis) is less than 5%. Therefore, the small p-values (such as 2.41E−31 for T, 7.88E−283 for W, etc.) strongly suggest that these variables significantly influence CST, rejecting the null hypothesis that they do not affect CST. By comparing the F-statistics (calculated values) to the critical F-values (F Crit), it is confirmed that the relationships between these variables and CST are statistically valid and not due to random chance. This provides strong evidence for the importance of these factors in predicting the CST of GGBS concrete. Still, the ANOVA test indicates database multicollinearity for the w/b variable (2.11E+00 < 3.86E+00, 1.47E–01 > 0.05) in predicting the CST of GGBS concrete.

The Z test reanalyzes the research hypothesis, and the results obtained from the Z test are summarized in Tab.4. From the table, it is found that the w/b ratio variable exhibits database multicollinearity. Still, the T, GGBS/B, W, FA, CA, and SP variables have followed the hypothesis clauses and accepted the research hypothesis (Z critical two-tail > Z critical one-tail, one- and two-tail p < 0.05).

Hence, the variance inflation factor method ($\mathrm{VIF} = 1/(1 - R^2)$) has been used to compute the multicollinearity of each input variable. Multicollinearity occurs when two or more independent variables in a regression model are highly correlated. In the presence of multicollinearity, the standard errors of the regression coefficients tend to increase. Also, it is hard to determine the true significance of each predictor variable because large standard errors reduce the precision of the coefficient estimates. Highly correlated predictors provide redundant information to the model. This redundancy can lead to overfitting, where the model becomes too complex and captures noise or random fluctuations in the training data rather than the underlying relationships. Johnston et al. [72], Gareth et al. [73], Vittinghoff et al. [74], Menard [75], and Khatti and Grover [76] presented different levels of database multicollinearity. Khatti and Grover [76] introduced five multicollinearity levels based on the VIF values, i.e., problematic level (10 < VIF), moderate level (5 < VIF ≤ 10), considerable level (2.5 < VIF ≤ 5), weak level (0 < VIF ≤ 2.5), and no multicollinearity (VIF = 0). While calculating the multicollinearity using the VIF method, a regression analysis is performed for each variable: one input variable is considered the target variable, the rest are considered as inputs, and multilinear regression is performed. Fig.4 illustrates the computation of VIF for each variable. Tab.5 presents the results of the multicollinearity analysis for the GGBS concrete. It is noted that the w/b variable has weak multicollinearity (VIF = 1.16). Similarly, the T (VIF = 1.00), W (VIF = 1.03), CA (VIF = 2.09), and SP (VIF = 1.79) variables have weak multicollinearity in estimating the CST of the GGBS concrete. Conversely, the GGBS/B (VIF = 3.13) and FA (VIF = 3.26) variables have considerable multicollinearity levels, which may affect the performance and accuracy more than the T, W, CA, and SP variables.
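The VIF procedure described above can be sketched as follows: each input is regressed on the remaining inputs and $\mathrm{VIF} = 1/(1 - R^2)$ is evaluated. This is an illustrative implementation under the same column-name assumptions as before, not the authors' code.

```python
# Sketch of the VIF procedure: regress each input on the remaining inputs and
# evaluate VIF = 1 / (1 - R^2). Column names are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

def vif_table(X: pd.DataFrame) -> pd.Series:
    vifs = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        vifs[col] = 1.0 / (1.0 - r2)
    return pd.Series(vifs).round(2)

data = pd.read_csv("ggbs_concrete.csv")
print(vif_table(data[["T", "w_b", "GGBS_B", "W", "FA", "CA", "SP"]]))
```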

Determining the sensitivity of input variables is essential for soft computing approaches. The types of sensitivity analysis are one-way sensitivity analysis, multi-way sensitivity analysis, scenario analysis, threshold analysis, Monte Carlo simulation, tornado diagram, global sensitivity analysis, and local sensitivity analysis [77]. The most popular global sensitivity analysis method, cosine amplitude, is widely used to estimate the sensitivity of independent variables. Let a database consist of n data points in J-space. Equation (1) presents the data array J. Each element of the data array, i.e., $J_i$, is itself a vector of length m (Eq. (2)). Therefore, the mathematical formulation of cosine amplitude sensitivity analysis (CASA) is given by Eq. (3). The CASA value varies from 0 to 1; 0 indicates low sensitivity, and 1 indicates significant sensitivity [78–81].

$$J = \{J_1, J_2, J_3, \ldots, J_m\}, \quad (1)$$

$$J_i = \{J_{i1}, J_{i2}, J_{i3}, \ldots, J_{im}\}, \quad (2)$$

$$\mathrm{CASA} = \frac{\sum_{k=1}^{m} J_{ik} J_{jk}}{\sqrt{\sum_{k=1}^{m} J_{ik}^{2} \cdot \sum_{k=1}^{m} J_{jk}^{2}}}. \quad (3)$$
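A small sketch of Eq. (3), computing the cosine amplitude of each input with respect to CST, is given below; the file and column names are illustrative assumptions carried over from the earlier sketches.

```python
# Sketch of Eq. (3): cosine amplitude sensitivity of each input with respect to CST.
import numpy as np
import pandas as pd

def cosine_amplitude(x: np.ndarray, y: np.ndarray) -> float:
    # CASA = sum(x_k * y_k) / sqrt(sum(x_k^2) * sum(y_k^2))
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))

data = pd.read_csv("ggbs_concrete.csv")
cst = data["CST"].to_numpy()
for col in ["T", "w_b", "GGBS_B", "W", "FA", "CA", "SP"]:
    print(col, round(cosine_amplitude(data[col].to_numpy(), cst), 4))
```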

Fig.5 demonstrates the sensitivity of the T, w/b, GGBS/B, W, FA, CA, and SP variables in estimating the CST of GGBS concrete. The figure shows that CA (= 0.9534) and FA (= 0.9439) significantly affect the estimation of the CST of GGBS concrete, followed by W (= 0.9313), w/b (= 0.8761), T (= 0.8503), and SP (= 0.8273). The GGBS to binder ratio has the least sensitivity, i.e., 0.7874, comparatively less than the aggregate sensitivities.

In other words, each CA, FA, and W variable affects the assessment of CST of GGBS concrete by more than 15%. Still, the GGBS/B variable is the least important parameter, with a sensitivity of 12%.

4 Details of soft computing approaches

The present investigation employs kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LSSVM models for assessing the CST of GGBS concrete. The reasons for selecting these approaches are as follows: 1) kNN is a non-parametric methodology capable of delineating intricate data relationships without presuming an underlying distribution; 2) CHAID and E-CHAID are advantageous when data relationships can be succinctly represented through a sequence of if-then rules; 3) MLP is effective at processing large data sets and can be trained using diverse optimization strategies to enhance its performance; 4) DNN is acclaimed for its capacity to discern intricate patterns and correlations within data utilizing multiple layers of interconnected neurons; 5) TSF performs better and handles noisy or incomplete databases easily due to its rule-based approach; and 6) LS-SVM is computationally efficient and adept at processing high-dimensional data, making it suitable for extensive regression tasks.

4.1 K-nearest neighbor

The k-nearest neighbor is a supervised machine learning algorithm that estimates the target values using the k-closest data points to a given input variable. The algorithm calculates distances between the input and all other data points in the training set. The kNN is a simple algorithm because it does not involve training or optimizing weights and biases. Instead, it stores training data points and their class labels or output values in memory for predictions [82]. The computational cost of the kNN algorithm is based on the features of the database [83].
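A hedged sketch of a kNN regressor for the CST task follows; it uses scikit-learn with feature scaling and is not the exact configuration listed in Tab.6. The variables X_tr, y_tr, X_te, and y_te are assumed to come from the split sketch in Section 2.

```python
# Illustrative kNN regressor for CST (not the Tab. 6 configuration).
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Distance-based learners benefit from feature scaling; k is the key hyperparameter.
knn = make_pipeline(StandardScaler(),
                    KNeighborsRegressor(n_neighbors=5, weights="distance"))
knn.fit(X_tr, y_tr)                     # training split from the earlier sketch
print("Testing R^2:", knn.score(X_te, y_te))
```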

4.2 Chi-square automatic interaction detection

The CHAID algorithm is a decision tree that identifies the relationship between the database variables. The structure of a CHAID decision tree is similar to traditional decision trees, with nodes representing splitting criteria based on variable values. However, in CHAID, the splitting criterion is determined using the Chi-square test, measuring independence or association between two categorical variables [84]. The CHAID is a non-parametric method, not assuming data distribution, thus robust to outliers and non-normal data [85]. The CHAID algorithm recursively partitions the data set into smaller subsets based on significant relationships between variables. Based on the Chi-square test, the algorithm looks for the variable with the strongest association with the target variable at each node. The process continues until no more splits can be made or a stopping criterion is met.

4.3 Multilayer perceptron

MLP is an ANN consisting of multiple layers of nodes, or neurons, arranged in a feedforward structure. The theory behind MLP is based on the concept of a perceptron, a type of artificial neuron that takes multiple inputs, applies weights to them, and passes the weighted sum through an activation function to produce an output. By stacking multiple layers of perceptrons, the MLP used in this study can learn complex patterns and relationships in the input data [86]. In the MLP network, the information goes from the input to the output layer through the hidden layers, which is called the feedforward process. Conversely, backpropagation calculates the gradient of an error function with respect to the weights and uses it to update the weights in the network [87]. Backpropagation adjusts the weights of an MLP by calculating the gradient of the loss function and propagating the error backward through the network. This iterative process helps minimize prediction errors by updating the weights, allowing the model to learn and optimize its parameters. It is essential for effectively training MLPs, enabling them to solve complex problems in tasks like pattern recognition, classification, and regression. This capability makes MLP a powerful and versatile tool for solving complex problems.
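The following sketch illustrates a feedforward MLP regressor trained by backpropagation via scikit-learn; the layer sizes and other settings are placeholders, not the configuration given in Tab.6, and the training data again come from the earlier split sketch.

```python
# Illustrative MLP regressor trained with backpropagation (placeholder settings).
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                 solver="adam", max_iter=2000, random_state=0))
mlp.fit(X_tr, y_tr)                     # training split from the earlier sketch
print("Training R^2:", mlp.score(X_tr, y_tr), "| Testing R^2:", mlp.score(X_te, y_te))
```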

4.4 Radial basis function

An RBF is a mathematical function that depends only on the distance from a particular point, often referred to as the center. The general form of an RBF is given by Ref. [88]:

$$\phi(x) = f(\|x - c\|),$$

where x is the input vector, c is the center, $\|\cdot\|$ is a norm function, and f is a nonlinear function that shapes the RBF. The most commonly used norm is the Euclidean norm, which calculates the distance between two points in Euclidean space. There are various types of RBFs, with the Gaussian RBF being the most popular choice, as used in this study. The Gaussian RBF is given by:

$$\phi(x) = \exp(-\gamma \|x - c\|^{2}),$$

where γ is a parameter controlling the shape of the Gaussian curve. The RBF networks are good at generalizing to unseen data, making them suitable for function approximation and classification tasks.
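A minimal sketch of the Gaussian RBF defined above is shown below; the value of gamma and the center c are illustrative choices, not fitted network parameters.

```python
# Minimal sketch of the Gaussian RBF: phi(x) = exp(-gamma * ||x - c||^2).
import numpy as np

def gaussian_rbf(x: np.ndarray, c: np.ndarray, gamma: float = 0.1) -> float:
    # Depends only on the distance between the input x and the center c
    return float(np.exp(-gamma * np.sum((x - c) ** 2)))

print(gaussian_rbf(np.array([1.0, 2.0]), np.array([0.0, 0.0])))  # closer to c -> closer to 1
```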

4.5 Multi-variable regression

Multi-variable regression analysis is a statistical technique used to understand the relationship between two or more independent variables and a dependent variable. The basic idea behind multivariable regression analysis is to model the relationship between the independent variables and the dependent variable in the form of a mathematical equation, $y = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$, where y is the dependent variable, a is the intercept, $X_1, X_2, \ldots, X_n$ are the independent variables, and $b_1, b_2, \ldots, b_n$ are the regression coefficients [89].

4.6 Expert Chi-square automatic interaction detection

Exhaustive Chi-square automatic interaction detection (E-CHAID) is a data mining technique used to build decision trees by recursively partitioning a data set based on significant interactions between predictor variables. The structure of an E-CHAID decision tree includes nodes for variables and branches for split points. The root node represents the entire data set, and each subsequent node identifies the best predictor variable for splitting the data based on a chi-square test, creating branches to child nodes for subgroups [90].

4.7 Automatic linear regression–information criterion

The ALR-ICC is a statistical method to select the best model from a set of linear regression models. The theory behind ALR-ICC is based on information criteria, which measure the trade-off between model complexity and goodness of fit. The most common criteria are the Akaike Information Criterion and Bayesian Information Criterion. These criteria penalize models with more parameters to prevent overfitting and select the best balance of goodness of fit and simplicity [91]. The ALR-ICC can handle both forward and backward model selection, adding or removing predictors iteratively based on the criteria. This process helps identify the best subset of predictors that explain the variability in the target variable.

4.8 Deep neural network

A deep neural network is an ANN with multiple layers between the input and output layers. The concept behind deep neural networks originates from DL, a subset of machine learning that emphasizes learning data representations through multiple layers of abstraction. Deep neural networks can acquire hierarchies of features from the input data, enabling them to extract higher-level representations that encapsulate complex patterns within the data. The DNN model solves the problem by updating the weights and bias values to minimize the error through feedforward and backpropagation processes [92].

4.9 Takagi−Sugeno fuzzy

The Takagi−Sugeno (T−S) fuzzy model is a popular method for approximating complex nonlinear systems in fuzzy logic systems. It consists of fuzzy if-then rules representing a nonlinear system. Each rule follows the pattern:

1) if $x$ is $A_1$ and $y$ is $B_1$, then $z = f_1(x, y)$;

2) if $x$ is $A_2$ and $y$ is $B_2$, then $z = f_2(x, y)$;

...

n) if $x$ is $A_n$ and $y$ is $B_n$, then $z = f_n(x, y)$.

Here, x and y are the system inputs, z is the output, $A_1$ to $A_n$ and $B_1$ to $B_n$ are the fuzzy sets’ linguistic variables, and $f_1$ to $f_n$ are the output functions for each rule. The fuzzy rule base, fuzzification interface, and inference engine are the main components of the TSF model [93].
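To illustrate this rule-based inference, the sketch below implements a two-rule Takagi−Sugeno system with Gaussian memberships and weighted-average defuzzification. The rules, membership parameters, and consequent functions are invented for illustration only and are not the 60-rule model configured in Tab.6.

```python
# Conceptual sketch of Takagi-Sugeno inference with weighted-average defuzzification.
import numpy as np

def gauss_mf(x: float, mean: float, sigma: float) -> float:
    return float(np.exp(-0.5 * ((x - mean) / sigma) ** 2))

# Each rule: ({input: (mean, sigma)}, consequent function f_i(inputs)) -- invented values
rules = [
    ({"w_b": (0.35, 0.05), "GGBS_B": (0.2, 0.1)},
     lambda z: 60.0 - 40.0 * z["w_b"] + 5.0 * z["GGBS_B"]),
    ({"w_b": (0.50, 0.05), "GGBS_B": (0.5, 0.1)},
     lambda z: 45.0 - 30.0 * z["w_b"] + 8.0 * z["GGBS_B"]),
]

def tsf_predict(z: dict) -> float:
    # Firing strength = product of memberships; output = weighted average of rule outputs
    weights = [np.prod([gauss_mf(z[k], m, s) for k, (m, s) in mfs.items()]) for mfs, _ in rules]
    outputs = [f(z) for _, f in rules]
    return float(np.dot(weights, outputs) / (np.sum(weights) + 1e-12))

print(round(tsf_predict({"w_b": 0.42, "GGBS_B": 0.35}), 2))  # illustrative CST estimate in MPa
```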

4.10 Least square support vector machine

The LS-SVM algorithm works by first transforming the data into a high-dimensional feature space using a kernel function. The algorithm then seeks to find a hyperplane that separates the transformed data points into two classes while minimizing the sum of square errors. This hyperplane can be represented as a linear combination of the support vectors of the data points closest to the decision boundary. LS-SVM tends to provide more accurate predictions for regression tasks, as it directly optimizes the mean squared error between the predicted and actual target values [94].

4.11 Configuration of hyperparameters

The configuration of the hyperparameters plays a vital role in achieving excellent performance and accuracy. Several researchers have analyzed the effect of hidden layers and neurons on the performance of neural network models. Therefore, the MLP and DNN models have been configured with respect to the number of hidden layers and neurons. It was also found that the LSSVM approach gives the most promising and reliable results with a linear kernel compared to the Gaussian kernel. Therefore, this research configures the LSSVM model with a linear kernel function. Barrena-González et al. [95], Suleymanov et al. [96], Huang et al. [97], and Zolfaghari et al. [98] found that the k value significantly affects the prediction residuals and performance of the kNN model. In addition, the CHAID and E-CHAID models have been configured with a minimum child node of 50 and a parent node of 100. The TSF model has been employed with 60 rules and a weighted-average defuzzification method to predict better. For the first time, the ALR-ICC model has been implemented to predict the CST of GGBS concrete. The model randomly selected the seeds to achieve better accuracy. The hyperparameter configuration of the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LSSVM models is summarized in Tab.6.

5 Results and discussion

This investigation creates three databases using the data proportionality method to find the effect of the database ratio on the performance of soft computing models in predicting the CST of GGBS concrete. Therefore, three databases (train:test), i.e., 70%:30%, 80%:20%, and 85%:15%, have been created and used to train and test the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LSSVM models. The performance of each model has been measured using the following metrics [99–105].

Correlation Coefficient:

$$R = \frac{\sum_{i=1}^{n}(A_i - A_{\mathrm{mean}})(P_i - P_{\mathrm{mean}})}{\sqrt{\sum_{i=1}^{n}(A_i - A_{\mathrm{mean}})^{2}}\sqrt{\sum_{i=1}^{n}(P_i - P_{\mathrm{mean}})^{2}}}.$$

Weighted Mean Absolute Percentage Error:

$$\mathrm{WMAPE} = \frac{\sum_{i=1}^{n}\left|\dfrac{A_i - P_i}{A_i}\right| \times P_i}{\sum_{i=1}^{n} P_i}.$$

Nash-Sutcliffe Efficiency:

$$\mathrm{NS} = 1 - \frac{\sum_{i=1}^{n}(A_i - P_i)^{2}}{\sum_{i=1}^{n}(A_i - A_{\mathrm{mean}})^{2}}.$$

RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{n}(A_i - P_i)^{2}}.$$

Variance Accounted For:

$$\mathrm{VAF} = \left(1 - \frac{\mathrm{var}(A_i - P_i)}{\mathrm{var}(A_i)}\right) \times 100.$$

Performance Index:

$$\mathrm{PI} = \mathrm{Adj}R^{2} + 0.01\,\mathrm{VAF} - \mathrm{RMSE}.$$

RMSE to Observation’s Standard Deviation Ratio (RSR):

$$\mathrm{RSR} = \frac{\mathrm{RMSE}}{\sqrt{\frac{1}{N}\sum_{i=1}^{n}(A_i - A_{\mathrm{mean}})^{2}}}.$$

Normalized Mean Bias Error:

$$\mathrm{NMBE} = \frac{\frac{1}{n}\sum_{i=1}^{n}(P_i - A_i)}{\frac{1}{N}\sum_{i=1}^{n}A_i} \times 100.$$

Mean Absolute Percentage Error:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{n}\left|\frac{A_i - P_i}{A_i}\right|.$$

Legate and McCabe’s Index:

$$\mathrm{LMI} = 1 - \left[\frac{\sum_{i=1}^{n}|A_i - P_i|}{\sum_{i=1}^{n}|A_i - A_{\mathrm{mean}}|}\right].$$

Mean Absolute Error:

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n}|P_i - A_i|}{N}.$$

where A and P are the actual and predicted values, respectively, Amean and Pmean are the averages of the actual and predicted values, respectively, n is the sample size, K is the number of input variables in the regression equation, and SD is the standard deviation. Values of R (= 1), WMAPE (= 0), NS (= 1), RMSE (= 0), VAF (= 100), PI (= 2), RSR (= 0), NMBE (= 0), MAPE (= 0), LMI (close to 1), and MAE (= 0) close to these ideal values represent a reliable and best soft computing model [102]. These evaluation metrics are selected to comprehensively assess model performance, capturing various aspects of accuracy, reliability, and error distribution. Metrics like the correlation coefficient (R) and VAF measure the strength and consistency of the relationship between predicted and observed values, while RMSE and MAE quantify overall prediction errors in absolute terms. The WMAPE and MAPE evaluate relative errors, making them particularly relevant when dealing with variables of different scales. Advanced indices such as the NS and RSR focus on model efficiency and error normalization, providing insights into the model’s predictive skill. Furthermore, NMBE and Legate and McCabe’s index are used to identify systematic biases and assess reliability in specific contexts. Together, these metrics ensure a robust evaluation by addressing error magnitude and distribution while highlighting areas for potential improvement in model performance. The performance of the models is summarized in Electronic Supplementary Material Appendix Table A (A1 for 70%:30%, A2 for 80%:20%, and A3 for 85%:15%).
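For reference, the sketch below implements several of the metrics above directly from the formulas as reconstructed here (RMSE, MAE, R, NS, VAF, RSR, MAPE, NMBE, and LMI); it is an illustrative implementation, not the authors' code.

```python
# Illustrative implementations of several performance metrics from the reconstructed formulas.
import numpy as np

def metrics(A, P) -> dict:
    A, P = np.asarray(A, dtype=float), np.asarray(P, dtype=float)
    rmse = float(np.sqrt(np.mean((A - P) ** 2)))
    return {
        "RMSE": rmse,
        "MAE": float(np.mean(np.abs(P - A))),
        "R": float(np.corrcoef(A, P)[0, 1]),
        "NS": float(1 - np.sum((A - P) ** 2) / np.sum((A - A.mean()) ** 2)),
        "VAF": float((1 - np.var(A - P) / np.var(A)) * 100),
        "RSR": float(rmse / np.std(A)),
        "MAPE": float(np.mean(np.abs((A - P) / A))),
        "NMBE": float(np.mean(P - A) / np.mean(A) * 100),
        "LMI": float(1 - np.sum(np.abs(A - P)) / np.sum(np.abs(A - A.mean()))),
    }

# Example with actual vs. predicted CST values in MPa
print({k: round(v, 4) for k, v in metrics([30.0, 45.0, 52.0], [31.0, 43.0, 50.0]).items()})
```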

5.1 Simulation of results

In the first case, 70% and 30% of the data points have been randomly selected from the 268 data points to train and test the models. Comparing the performance of the models (refer to Table A1), it has been found that the TSF model achieved the highest performance in the testing phase. The TSF model estimated the CST of the GGBS concrete with an RSR of 0.5385, LMI of 0.3699, NMBE of 1.6291 MPa, PI of 1.32, NS of 0.7129, WMAPE of 0.1163%, VAF of 72.50, MAPE of 0.1394%, R of 0.8526, MAE of 4.7830 MPa, and RMSE of 8.1839 MPa, higher than the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, and LSSVM models. Similarly, the TSF model outperformed in the training phase with an RSR of 0.3095, LMI of 0.2782, NMBE of 0.3140 MPa, PI of 1.75, NS of 0.9042, WMAPE of 0.0613%, VAF of 90.42, MAPE of 0.0624%, R of 0.9509, MAE of 2.7435 MPa, and RMSE of 3.7476 MPa, close to the ideal values. Hence, the TSF model has been recognized as the best-performing model using the 70%:30% database in predicting the CST of GGBS concrete. The testing performance analysis of the other models reveals that the MLP (RMSE = 19.8331 MPa, MAE = 12.6710 MPa, R = 0.4142, MAPE = 0.4021%, VAF = −49.89, WMAPE = 0.3163%, NS = −0.7915, PI = 1.27, NMBE = 9.8183, LMI = 1.0552, and RSR = 1.3385), RBF (RMSE = 15.3421 MPa, MAE = 10.9270 MPa, R = 0.4481, MAPE = 0.2815%, VAF = −0.42, WMAPE = 0.2658%, NS = −0.0091, PI = −0.06, NMBE = 5.7252, LMI = 0.8450, and RSR = 1.0045), and LS-SVM (RMSE = 18.0782 MPa, MAE = 13.6913 MPa, R = 0.1987, MAPE = 0.4812%, VAF = −17.12, WMAPE = 0.3417%, NS = −0.4885, PI = 0.14, NMBE = 8.1576, LMI = 1.1401, and RSR = 1.2200) models are significantly affected by the combination of weak and considerable multicollinearity. Therefore, the MLP, RBF, and LS-SVM models achieved a performance of less than 0.5 and predicted the CST of GGBS concrete with an error of more than ±11.0 MPa. Fig.6(a) illustrates the relationship between the actual and predicted CST of the GGBS concrete for the TSF model.

In the second case (80%:20%), the MLP model (refer to Table A2) gained an RMSE of 1.6088 MPa, MAE of 1.0996 MPa, R of 0.9921, MAPE of 0.0273%, VAF of 98.43, WMAPE of 0.0245%, NS of 0.9840, PI of 1.94, NMBE of 0.0578 MPa, LMI of 0.1054, and RSR of 0.1265 in the training phase, comparatively better than the kNN, CHAID, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LSSVM models and close to the ideal values. The analysis of the testing performance shows that the MLP model predicted the CST of GGBS concrete with an RSR of 0.5666, LMI of 0.4107, NMBE of 1.6651 MPa, PI of 1.6651, NS of 0.6789, WMAPE of 0.1221%, VAF of 67.95, MAPE of 0.1424%, R of 0.8389, MAE of 4.8119 MPa, and RMSE of 8.1008 MPa, better than the other models. The E-CHAID and LS-SVM models predicted the CST of GGBS concrete with poor performance and high residuals because of the database multicollinearity. Therefore, the MLP model has been recognized as the best-performing model for the 80%:20% database. Fig.6(b) illustrates the relationship between the actual and predicted CST of the GGBS concrete using the MLP model.

The performance comparison of the models for the 85%:15% database (Table A3 in Electronic Supplementary Material) reveals that the training and testing performances of the TSF model improved with this database. Using the 85%:15% database, the TSF model estimated the CST of the GGBS concrete with an RSR of 0.2191, LMI of 0.2028, NMBE of 0.2610 MPa, PI of 1.86, NS of 0.9520, WMAPE of 0.0602%, VAF of 95.65, MAPE of 0.0765%, R of 0.9780, MAE of 2.4285 MPa, and RMSE of 3.2460 MPa in the testing phase, comparatively better than the other models. Similarly, the TSF model outperformed the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, and LSSVM models with higher training performance (RMSE = 2.3457 MPa, MAE = 1.6017 MPa, R = 0.9831, MAPE = 0.0388%, VAF = 96.65, WMAPE = 0.0361%, NS = 0.9665, PI = 1.89, NMBE = 0.1240 MPa, LMI = 0.1513, and RSR = 0.1832). Hence, the TSF model has been recognized as the best-performing model for the 85%:15% database in predicting the CST of the GGBS concrete. Fig.6(c) reveals the relationship between the actual and predicted CST of GGBS concrete using the TSF model.

The histogram presented in Fig.6(a)–Fig.6(c) demonstrates that 1) the TSF (70%:30%) model has assessed the CST with a residual of 47.54 MPa in the testing phase; 2) the MLP (80%:20%) model has estimated the CST with a residual of 27.66 MPa, comparatively less than the TSF (70%:30%) model, and 3) the TSF (85%:15%) predicted the CST of GGBS concrete with a residual of 8.69 MPa in the testing phase, comparatively lower than the TSF (70%:30%) and MLP (80%:20%) models. The performance comparison for both TSF models reveals that 1) RSR, LMI, NMBE, WMAPE, MAPE, MAE, and RMSE have been decreased by 59.11%, 45.17%, 83.98%, 48.29%, 45.09%, 49.23%, and 60.34%, respectively, and 2) R, VAF, NS, and PI have been increased by 14.70%, 31.93%, 33.54%, and 41.33%, respectively, in the testing phase. Therefore, it can be stated that the TSF model achieves higher performance and accuracy with a large database, while the MLP model achieves better performance with the 80%:20% database. Still, the TSF model (85%:15%) is more reliable and accurate than both models in predicting the CST of GGBS concrete. The TSF model is ideal for handling incomplete or noisy data because it can represent complex, nonlinear systems using a combination of fuzzy logic and rule-based modeling. The TSF model incorporates fuzzy sets, which allow it to process uncertainty and vagueness inherent in noisy or incomplete data sets. Instead of relying on precise inputs, it uses membership functions to classify data into fuzzy categories, making it robust against variability and imperfections.

5.2 Regression error characteristics curve

A valuable tool for comparing and visualizing classification results is the receiver operating characteristic (ROC) curve. The REC curve is a generalization of ROC curves to regression [106]. Plotting the error tolerance on the x-axis against the proportion of predicted points within the tolerance on the y-axis produces the REC curve. The resulting curve estimates the cumulative distribution function of the error. Statistics that are often utilized are visually presented by the REC curve. The area over the curve (AOC) is a biased estimate of the prediction error. The ratio of an individual model’s AOC to the null model’s AOC can be used to determine the R2 value. The lowest AOC value represents the most accurate and reliable soft computing model. Tab.7 summarizes the AOC results for each model using the 70%:30%, 80%:20%, and 85%:15% databases, along with the REC curves, as shown in Fig.7.

From the analysis of the REC curves and AOC results, it has been observed that 1) the TSF model predicted the CST of GGBS concrete best using the 70%:30% database in the training (AOC = 3.67E–03) and testing (AOC = 1.49E–02) phases; 2) the MLP model computed the CST with a low AOC value in the training (= 6.88E–04) and testing (= 1.65E–02) phases using the 80%:20% database; and 3) the TSF model estimated an AOC of 1.46E–03 (in the training phase) and 2.73E–03 (in the testing phase) using the 85%:15% database, close to the AOC of the actual database. Hence, the TSF (70%:30%), MLP (80%:20%), and TSF (85%:15%) models have been identified as the best-performing models in predicting the CST of GGBS concrete.

5.3 Score analysis

Score analysis is another statistical analysis performed to determine the best soft computing model [107,108]. The score analysis is based on the values of the performance metrics of each model. In the score analysis, each model is ranked for every performance metric, and the metric value closest to the ideal value receives the highest rank, equal to the total number of models. The ranks of each model are then added across the performance metrics to obtain a total score in each of the training and testing phases. The grand score is computed by summing the training and testing scores. The model with the highest grand score is the best soft computing model. The present study performs the score analysis for the ten soft computing models and their 11 performance metrics for each of the 70%:30%, 80%:20%, and 85%:15% databases. The results of the score analysis for each database are summarized in Electronic Supplementary Material Appendix Table B (B1 for 70%:30%, B2 for 80%:20%, and B3 for 85%:15%) and presented in Fig.8(a)–Fig.8(c).

Fig.8(a) presents that the TSF model achieved the highest score in both the training (= 110) and testing (= 110) phases using the 70%:30% database. Due to the database multicollinearity, the MLP model did not achieve a good score in the testing phase (= 24). Still, in the testing phase, the kNN model achieved a better score, i.e., 98. Moreover, Fig.8(b) demonstrates that the MLP model gained the highest score in the training (= 110) and testing (= 110) phases, while the kNN model secured second place in the testing phase (score = 96). Fig.8(c) illustrates that the TSF model outperformed the other models developed with the 85%:15% database, scoring 110 in both phases. Fig.9 shows the grand score of each model, and it has been observed that the TSF (70%:30%), MLP (80%:20%), and TSF (85%:15%) models each obtained a grand score of 220 in predicting the CST of GGBS concrete. Therefore, the TSF, MLP, and TSF models have been recognized as the best-performing models using the 70%:30%, 80%:20%, and 85%:15% databases, respectively.

5.4 Uncertainty analysis

Any reliable and trustworthy predictive model estimates the output accurately. In artificial intelligence, uncertainty analysis is performed to assess the quantitative prediction error of the employed model [109]. The uncertainty analysis for any predictive model is performed by computing the margin of error at a 95% confidence level (ME), standard deviation (SD), mean error (ME), absolute error (AE), standard error (SE), upper bound (UB), lower bound (LB), and width of confidence bound (WCB). The width of the confidence bound is the difference between the upper and lower bounds. Considering the values of the ME, SD, ME, SE, and WCB parameters, the reliable and trustworthy model is recognized; lower values of these parameters correspond to the first rank for a computational model. The present research performs uncertainty analysis for each model developed using the 70%:30%, 80%:20%, and 85%:15% databases. The uncertainty analysis results have been summarized in Electronic Supplementary Material Appendix Table C (C1 for 70%:30%, C2 for 80%:20%, and C3 for 85%:15%) and presented in Fig.10(a)–10(c). Table C1 in the Electronic Supplementary Material and Fig.10(a) present that the TSF model achieved the first rank with the lowest uncertainty bandwidth using the 70%:30% database. Conversely, Table C2 in the Electronic Supplementary Material and Fig.10(b) demonstrate that the MLP model secured the first rank with the narrowest uncertainty band using the 80%:20% database. Moreover, the TSF model ranked first with the 85%:15% database in predicting the CST of GGBS concrete, as reported in Electronic Supplementary Material Table C3. Hence, this research recognizes the TSF and MLP models as the best-performing models.

5.5 Discussion on results

This investigation presents three best-performing soft computing models to predict the CST of the GGBS concrete. Interestingly, it has been observed that the performance and accuracy of the TSF model improved as the training share of the database increased (from 70%:30% to 85%:15%). Conversely, the MLP model outperformed the other models using the 80%:20% database, consistent with other published research. Therefore, the index parameters, i.e., the a20-index, scatter index, and agreement index, have been computed for these better-performing models to check their reliability in assessing the CST of the GGBS concrete. The mathematical formulations of the indexes are as follows.

a20-index:

$$\mathrm{a20\text{-}index} = \frac{m20}{H}.$$

Index of Agreement:

$$\mathrm{IOA} = 1 - \frac{\sum_{i=1}^{n}(P_i - A_i)^{2}}{\sum_{i=1}^{n}\left(|P_i - A_{\mathrm{mean}}| + |A_i - A_{\mathrm{mean}}|\right)^{2}}.$$

Index of Scatter:

$$\mathrm{IOS} = \frac{\mathrm{RMSE}}{\text{Avg. of actual values}},$$

where m20 is the number of samples for which the ratio of the laboratory-tested value to the computed value lies between 0.8 and 1.2, and H is the total number of data samples. Fig.11 illustrates the comparison of the indexes for each model. Fig.11(a) demonstrates that the TSF model achieved a higher a20-index in the training (= 97.85 using 70%:30% and = 99.09 using 85%:15%) and testing (= 86.75 using 70%:30% and = 93.75 using 85%:15%) phases using the 70%:30% and 85%:15% databases. Furthermore, the MLP model predicted the CST of the GGBS concrete with an a20-index of 99.52 and 81.36 in the training and testing phases, respectively. It is noted that the kNN (75.90 > 59.18, 91.67 > 58.33) and E-CHAID (61.22 > 59.18, 62.50 > 58.33) models performed better than the MLP model using the 70%:30% and 85%:15% databases in the testing phase. The phenomenon of overfitting and underfitting can be seen here (analyzed and discussed in the next section). It is also noted that the E-CHAID model gained a better a20-index than the CHAID model because E-CHAID 1) handles continuous variables; 2) identifies more complex relationships; and 3) enhances interpretability. Fig.11(b) illustrates the comparison of the agreement indexes for each model.

The comparison shows that the TSF model obtained the highest agreement index in the training and testing phases using the 70%:30% and 85%:15% databases. It can be noted that the TSF model predicted the CST of the GGBS concrete with an agreement index of 0.86 (in the training phase using the 70%:30% database), 0.82 (in the testing phase using the 70%:30% database), 0.92 (in the training phase using the 85%:15% database), and 0.90 (in the testing phase using the 85%:15% database). Similarly, the MLP model gained agreement indexes of 0.95 and 0.79 in the training and testing phases. These agreement index values are close to the ideal value, i.e., 1. Hence, the TSF and MLP models are the best-performing models. Still, the present research seeks the optimal performance model with a suitable database ratio. Therefore, the developed models have been compared using the scatter index. Fig.11(c) shows the comparison of the scatter index for each model. The model having the least scatter index, i.e., closest to 0, is the most reliable and accurate. The comparison shows that 1) the TSF model (70%:30%) predicted the CST of GGBS concrete with a scatter index of 0.08 and 0.20 in the training and testing phases, respectively, 2) the MLP model obtained scatter indexes of 0.04 and 0.50 in the training and testing phases, and 3) the TSF model (85%:15%) assessed the CST of concrete with a scatter index of 0.05 (in training) and 0.08 (in testing). Based on the index comparison, the TSF model (developed with the 85%:15% database) has been recognized as the optimal performance model in this investigation.

The TSF model has assessed the CST of the GGBS concrete with an RMSE of 3.2460 MPa, MAE of 2.4285 MPa, R of 0.9780, MAPE of 0.0765%, VAF of 95.65, WMAPE of 0.0602%, NS of 0.9520, PI of 1.86, NMBE of 0.2610 MPa, LMI of 0.2028, and RSR of 0.2191, close to ideal values and higher than the rest of the models. The performance of the TSF model has been compared with the available models in the literature study, as shown in Tab.8. It can be seen that the TSF model (developed by 85%:15% database) has outperformed the models available in the literature, i.e., PSO_XGBoost, XGB-LGB, XGBoost, VR, XGBR, EBoost, GBM, DT, LBGM, RF, and ANN. Also, the combination of input variables, i.e., temperature, w/b ratio, GGBS-binder ratio, water content, FA, coarse aggregate, and SP, is better than the combination of input variables used in developing the published models. Hence, it can be said that the TSF model is the optimal performance model in predicting the CST of the GGBS concrete.

5.6 Analysis of results

The analysis of the RMSE, MAE, R, MAPE, VAF, WMAPE, NS, PI, NMBE, LMI, RSR, a20, IOA, and IOS reveals that the developed models (except the TSF and MLP models) achieved excellent performance in the training phase but poor performance in the testing phase, indicating a curve-fitting issue. This phenomenon occurred because of multicollinearity in the database. Therefore, the ratio of the testing RMSE to the training RMSE has been calculated to analyze the curve-fitting issue. Fig.12 illustrates the curve fitting for each model developed with the 70%:30%, 80%:20%, and 85%:15% databases. Fig.12 shows that the MLP model predicted the CST of GGBS concrete with overfitting ratios of 5.22, 5.04, and 6.73 using the 70%:30%, 80%:20%, and 85%:15% databases, respectively. The kNN, CHAID, RBF, MVR, DNN, E-CHAID, and ALR-ICC models assessed the CST of GGBS concrete with overfitting ratios close to the best-fit value. Conversely, the comparison for the TSF model reveals that the TSF model developed with the 85%:15% database estimated the CST with an overfitting ratio of 1.38, lower than those of the TSF model developed with the 70%:30% database and the MLP model developed with the 80%:20% database. Hence, the TSF model is recognized as the optimal performance model for assessing the CST of the GGBS concrete.
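The curve-fitting diagnostic used here is simply the ratio of the testing RMSE to the training RMSE, with values close to 1 indicating a good fit. A minimal sketch is given below; the input arrays are placeholders, not the study's data.

```python
import numpy as np

def overfitting_ratio(y_train, yhat_train, y_test, yhat_test):
    """Testing RMSE divided by training RMSE; values much larger than 1
    indicate overfitting to the training data."""
    rmse = lambda y, yhat: np.sqrt(np.mean((np.asarray(y, float) -
                                            np.asarray(yhat, float)) ** 2))
    return rmse(y_test, yhat_test) / rmse(y_train, yhat_train)
```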

Furthermore, a parametric simulation analysis is performed to validate the optimal performance model: one input variable is varied while the remaining input variables are held constant. The results obtained through this analysis are compared with those in Fig.3. Tab.9 presents the details of the variables used for the simulation, and the simulation results are presented graphically in Fig.13.
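The one-variable-at-a-time procedure described above can be sketched as follows. The function name, feature names, and model object are hypothetical, and a scikit-learn-style predict() interface is assumed rather than the study's actual TSF implementation.

```python
import numpy as np
import pandas as pd

def one_at_a_time_simulation(model, X, feature, n_points=50):
    """Vary a single input over its observed range while holding the
    remaining inputs at their mean values, then predict the CST."""
    grid = np.linspace(X[feature].min(), X[feature].max(), n_points)
    base = X.mean().to_frame().T                       # all features at their mean values
    sim = pd.concat([base] * n_points, ignore_index=True)
    sim[feature] = grid                                # sweep the selected feature
    return grid, model.predict(sim[X.columns])

# Hypothetical usage: grid, cst = one_at_a_time_simulation(tsf_model, X_train, "w/b")
```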

The simulation results presented in Fig.13 show that the CST of the GGBS concrete (a) increases with temperature, (b) decreases with increasing w/b ratio, (c) increases gradually with increasing GGBS/B ratio, (d) decreases with increasing W up to 190 kg/m3 and increases thereafter, (e) decreases as FA increases from 665 to 950 kg/m3, (f) decreases as CA increases beyond 990 kg/m3, and (g) increases with increasing SP up to 1.27%. The same trends between the input and output variables are observed in Fig.3, confirming the accuracy of the TSF model (developed with the 85%:15% database) in predicting the CST of the GGBS concrete.

6 Summary and conclusions

The present study introduces an efficient prediction model, i.e., the TSF model, selected by comparing the kNN, CHAID, MLP, RBF, MVR, E-CHAID, ALR-ICC, DNN, TSF, and LS-SVM models for assessing the CST of the GGBS concrete. This research also reveals the impact of data proportionality by investigating the model performance for three training:testing ratios, i.e., 70%:30%, 80%:20%, and 85%:15%. The performance comparison reveals that the MLP model, trained with the 80%:20% database, outperformed the other models for that ratio, whereas the TSF model outperformed the other models for the 70%:30% and 85%:15% databases. Interestingly, the performance of the TSF model improved as the number of training data points increased, indicating that a larger database enhances its prediction capability. It was also noted that the overfitting ratio of the TSF model decreased from 2.18 (70%:30% database) to 1.38 (85%:15% database). Hence, it can be stated that a large database of good quality enables the CST of GGBS concrete to be predicted with the best fit. Moreover, the sensitivity analysis shows that the GGBS/B parameter has the least sensitivity (0.7874) compared with the other factors.

As part of its novel contribution, the data proportionality analysis demonstrates that the TSF model achieved the most accurate predictions with the most extensive training database. Thus, it can be asserted that a substantial database/sample size allows the CST of GGBS concrete to be estimated accurately with the best fit. Nevertheless, the efficacy of the TSF model should be evaluated against a large database obtained from different experimental setups, and its technical robustness should be verified through further laboratory tests. Thus, future research includes: 1) using additional laboratory test data to assess the accuracy of the TSF model against other standalone and hybrid models; 2) examining the impact of data proportionality with a larger data set; and 3) integrating dimension-reduction techniques to remove the least effective parameters in estimating the CST of GGBS concrete.

