Novel computational methods to predict the compressive strength of hydrothermally solidified clay

Aydin SHISHEGARAN; Mehrshad SAMADI; Mina TORABI

doi:10.1007/s11709-025-1237-9

Front. Struct. Civ. Eng. ›› 2025, Vol. 19 ›› Issue (12) :2054 -2072. DOI: 10.1007/s11709-025-1237-9

RESEARCH ARTICLE


Abstract

Hydrothermally solidified clay (HSC) is clay that has hardened under hydrothermal conditions, which involve high temperatures and pressures. HSC has outstanding properties that make it useful for various applications, including construction materials, ceramics, and other industrial uses. Furthermore, the production process of HSC, which can serve as an eco-friendly construction material, stands out for its lower energy consumption, which is consistent with sustainable development objectives. Therefore, this study applied machine learning (ML) techniques, including stronger variable creator machines (SVCM), high-correlated variable creator machines (HCVCM), gene expression programming (GEP), multivariate adaptive regression splines (MARS), group method of data handling (GMDH), and combinations of GEP with SVCM and HCVCM, i.e., SVCM + GEP and HCVCM + GEP, to predict the compressive strength (CS), which is a crucial indicator of soil performance. Based on the proposed ML methods, mathematical equations were derived to predict CS using reliable, published experimental data sets. The performance of the predictive models was assessed using statistical measures, the objective function (OBJ) parameter, and the uncertainty method. In addition, graphical plots, including scatter and Taylor diagrams, were used to assess the effectiveness and accuracy of the suggested approaches. Overall, the results demonstrated that the SVCM method has the highest accuracy for predicting CS. Finally, the SHapley Additive exPlanations (SHAP) method, sensitivity analysis, and a parametric study were used to evaluate the performance of the best predictive model.


Keywords

hydrothermally solidified clay / machine learning methods / statistical parameters / sustainable development / eco-friendly construction material / compressive strength prediction

Cite this article


Aydin SHISHEGARAN, Mehrshad SAMADI, Mina TORABI. Novel computational methods to predict the compressive strength of hydrothermally solidified clay. Front. Struct. Civ. Eng., 2025, 19(12): 2054-2072 DOI:10.1007/s11709-025-1237-9


1 Introduction

Construction development inevitably generates construction waste, such as waste soil. Therefore, recycling and the use of eco-friendly materials are important issues in civil engineering projects [1,2]. Based on an estimate for transparency research, construction worldwide will produce 2200 million tons of waste in 2025, most of which is waste soil that causes environmental pollution and consumes land for landfilling [3]. Coastal waste soils contain soft clay, which has significant compressibility, high moisture content, and low strength [4]. In the conventional approach to solidifying this waste, the waste clay is combined with other substances and then fired at a temperature between 750 and 1100 °C to produce clay bricks [5,6]. Producing each brick requires burning 0.2–0.3 kg of coal or firewood, which emits 0.41 kg of CO₂, one of the major greenhouse gases. This conflicts with sustainability goals.

Hydrothermal solidification can convert inorganic waste into hard materials within 24 h at a temperature below 200 °C and saturated vapor pressure, using just 16.7% of the energy required to produce sintered ceramic [7]. This approach is used to recycle inorganic wastes such as waste soil [8], waste glass [9], fly ash from waste incineration [10–12], and blast furnace slag [13]. For clay treatment, Zhou et al. [14] evaluated how curing sepiolite clay at various hydrothermal temperatures affects the strength of the solidified clay. The influence of the initial dry density, curing condition, and alkaline environment on the strength of hydrothermally solidified clay (HSC) has been evaluated in several recent studies, which indicated that the formation of hydrogarnet, calcium aluminosilicate hydrates, and calcium silicate hydrates could enhance the compressive strength (CS) of HSC [15,16]. According to these studies, compressive strength strongly affects the performance of solidified clay; however, most researchers have reported it only for a limited number of mix proportions tested experimentally [17]. Assessing the effect of mix proportion, curing conditions, and environmental parameters on compressive strength, as an indicator of the performance of solidified clay, would require numerous experimental studies, which are costly and time-consuming. Numerical methods can therefore improve the understanding of how design parameters affect the performance of solidified clay.

Hydrothermal solidification is a method used to stabilize soil by treating it with high-temperature hydrothermal processes. This procedure triggers chemical reactions within the clay soil structure, forming a consolidated matrix that improves the soil’s strength and stability. The process has several beneficial outcomes for the soil, including increased strength, improved load-bearing capacity, reduced compressibility, and enhanced erosion resistance [18]. It utilizes natural clay materials and minimizes the need for external additives or chemicals. This eco-friendly approach reduces the environmental impact associated with conventional soil stabilization techniques. In addition, using locally available clay and site-specific hydrothermal treatment parameters saves costs in construction projects by simplifying processes and reducing imported materials. The importance of HSC in construction projects therefore motivates computational methods to predict its properties, especially its compressive strength.

Artificial intelligence (AI) approaches can estimate compressive strength without costly and time-consuming tests and can propose the best mix proportion for a required target. Successful applications of AI in civil engineering projects have been validated by extensive research [19,20]. However, efforts are ongoing to develop new computational methods or improve existing machine learning (ML) approaches to enhance the performance and accuracy of modeling complex problems [21].

Recently, Shishegaran et al. [22] used a new ML method called the high-correlated variable creator machine (HCVCM) and a new regression method named step-by-step regression to predict the compressive strength of concrete. The stronger variable creator machine (SVCM) is a novel white-box ML system that has been employed to estimate the flexural and compressive strength of recycled stone [23]. These studies showed that HCVCM and SVCM perform better than gene expression programming (GEP) and the adaptive neuro-fuzzy inference system (ANFIS) in predicting the strength of the material. Furthermore, some researchers predicted the CS of cement-based materials according to curing age, cement dosage, and water-to-cement ratio [24–26]. Other parameters, including curing temperature, molding method, raw material, dry density, and other curing conditions, also affect the strength of HSC [27].

Nowadays, mathematical modeling using numerical simulations and data-driven methods is widely employed in various fields of engineering applications [28–32]. The unconfined compressive strength (UCS) test is widely used as a key indicator for assessing the quality of stabilized soil materials. In recent years, numerous studies have employed artificial intelligence and data-driven modeling techniques to predict UCS more accurately and cost-effectively. Dezfuli and Ghanizadeh [33] applied artificial neural networks (ANN) and ANFIS to predict both UCS and indirect tensile strength (ITS) of clayey subgrade soils stabilized with Portland cement and iron ore mine tailings (IOMT), with results indicating the superiority of ANN over ANFIS. Similarly, Ghanizadeh et al. [34] utilized evolutionary polynomial regression (EPR) to model the UCS and Young’s modulus of lime/cement-stabilized clayey subgrade soils, further demonstrating the effectiveness of evolutionary algorithms in geotechnical prediction tasks. In another study, Ghanizadeh and Naseralavi [35] employed Gaussian Process Regression (GPR) to estimate UCS in lean clay stabilized with IOMT and hydrated lime, showcasing the robustness of probabilistic learning approaches. GEP was also successfully implemented by Ghanizadeh and Safi Jahanshahi [36] for modeling the UCS of cement- and lime-stabilized clay soils, revealing that curing time and stabilizer content positively influence UCS, while increased moisture content has a detrimental effect. Additionally, the Group Method of Data Handling (GMDH) was employed by Ghanizadeh et al. [37] to develop interpretable models, with sensitivity analysis underscoring the significance of cement percentage and moisture content as key influencing parameters. More recently, multivariate adaptive regression splines (MARS), coupled with an evolutionary-based search, were used by Ghanizadeh et al. [38] to predict UCS and California Bearing Ratio for expansive soils stabilized with hydrated lime and rice husk ash, indicating high prediction accuracy. Safi Jahanshahi and Ghanizadeh [39] further emphasized the benefits of machine learning techniques in UCS modeling, citing their ability to overcome the high cost and time associated with traditional laboratory testing.

Clay is widely used in civil engineering projects due to its properties, including compressibility, strength, permeability, and aging behavior, and its various characteristics have been extensively studied [40–42]. Although some previous studies used various methods to estimate the CS of hydrothermally solidified clay, newer ML methods that could increase the accuracy of performance prediction for hydrothermally solidified clay have not yet been applied. For instance, Babu et al. [43] compared ANN and ANFIS for estimating the green CS of clay. They found that both data-driven models aligned with the observed values, although ANFIS demonstrated higher accuracy than ANN in predicting compressive strength. Karunakar and Datta [44] successfully modeled the green sand mixture using experimental samples together with ANN and genetic algorithms. Onyelowe et al. [45] employed GEP to estimate the strength characteristics of expansive soil by examining various input parameters. The GEP outcomes aligned with the measured values and provided reliable and accurate results. Jeremiah et al. [46] presented a systematic review of the application of ANN in predicting the geo-mechanical properties of clay stabilized with different materials. Their study emphasizes the effectiveness of ANN models in handling the mathematical modeling of nonlinear parameters involved in stabilizing clays. Jalal et al. [47] developed ANN, ANFIS, and GEP models for the unconfined CS of expansive soils. They found that the GEP and ANN models outperformed the ANFIS model in terms of precision and reliability. In a study by Onyelowe et al. [48], three different training algorithms for ANN were applied to predict the behavior of soil strength parameters. The findings demonstrated that the Levenberg-Marquardt backpropagation algorithm provided the most precise estimates of the soil’s parameters. Iqbal et al. [49] suggested ANFIS and Ensemble Random Forest (ERF) regression techniques to predict the strength properties of soft clay soil. The findings indicated that ERF regression outperformed the ANFIS model. Lin et al. [17] combined experimental data and data-driven methods to estimate the unconfined CS of HSC. They concluded that GPR offered the best prediction for UCS. Furthermore, their sensitivity analysis revealed that the most significant parameters impacting UCS were dry density and curing time.

In this study, data on HSC are collected from previously published studies, and several white-box ML methods presented in recent years are used to estimate the CS of HSC. The novelty of the present study therefore lies in the application of these new white-box methods and the equations derived from them. SVCM and HCVCM were published separately in earlier papers in 2021 and 2023, and they are applied here for prediction.

2 Methodology

The main purpose of this investigation is to assess and compare the effectiveness of white-box data-driven techniques, namely GEP, GMDH, MARS, SVCM, and HCVCM, as well as two ensemble models (SVCM + GEP and HCVCM + GEP), in predicting the CS of HSC. The second aim is to use new ML methods to obtain more reliable and accurate models for estimating the CS of HSC. The research focuses on employing white-box data-driven techniques to formulate precise equations for modeling the CS that incorporate the physical properties of hydrothermally solidified clay. Finally, a comparative analysis was performed to assess the capability of the proposed data-driven methods, using statistical metrics and graphical representations. Figure 1 illustrates the sequence of steps in this investigation. The data were split randomly into training and testing sets.

This section provides a brief overview of the data-driven models, namely SVCM, HCVCM, GEP, MARS, GMDH, and the ensemble models (SVCM combined with GEP and HCVCM combined with GEP), used for modeling the CS of HSC. In addition, it describes the data set used for developing the models, which was collected from the study by Lin et al. [17]. The details of the data are provided in Subsection 2.1.

2.1 Data set and effective parameters

Lin et al. [17] conducted research involving the experimental process of hydrothermally solidifying clay. Their experimental method for HSC involved the use of kaolin clay and Ca(OH)₂ powder. The experimental process included weighing and premixing the kaolin and Ca(OH)₂ powders, blending them with water, and uniaxially pressing the mixture into cylindrical specimens.

The kaolin and Ca(OH)₂ powders were accurately weighed and pre-mixed in a jar for 10 min, then combined with water in a blender and blended for 10 min. The mixture was uniaxially compressed into cylindrical samples 30 mm in diameter and 30 mm in height using a tablet press and mold. These samples were then subjected to hydrothermal reactions and curing in an oven at 60 °C for 12 h before being tested for compressive strength with a universal testing machine at a loading rate of 0.5 mm/min. The experimental process yielded 140 reliable experimental data points, together with the influential parameters, for modeling the CS of hydrothermally solidified clay. Lin et al. [17] present the details of the data. Soil physical parameters were used to derive equations for the prediction of CS. The following functional relationship is used to develop the predictive equations for the estimation of CS [17]:

$$\mathrm{CS} = f(\mathrm{CO}, \omega, C_{temp}, C_{time}, \gamma_d), \tag{1}$$

where $\mathrm{CS}$ is the compressive strength (MPa), $\mathrm{CO}$ is the $\mathrm{Ca(OH)_2}$ content (%), $\omega$ is the moisture content (WC) (%), $C_{temp}$ is the curing temperature (°C), $C_{time}$ is the curing time (h), and $\gamma_d$ is the dry density (DD) (g/cm³).

For the generation of models, the variables on the right side of Eq. (1), i.e., $\mathrm{CO}$, $\omega$, $C_{temp}$, $C_{time}$, and $\gamma_d$, are regarded as input variables, and the left side, i.e., CS, as the output parameter. Table 1 lists the key statistical measures for the data set used in this study.

To generate the proposed models, 112 data points (80% of all data) were used for training, and the remaining 28 data points were used to evaluate the models. The histogram of the data set is illustrated in Fig. 2.
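A random 80/20 split of 140 samples can be sketched as follows. This is only an illustration: the function name `split_indices` and the fixed seed are assumptions, and the paper does not state which tool or seed was used for its split.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # assumed seed, used only to make the sketch reproducible
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(140)
print(len(train_idx), len(test_idx))  # 112 28
```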

The correlation plot of variables is displayed in Fig. 3.

According to Fig. 3, the correlations between the individual input variables and the output are too weak for regression and prediction models to rely on directly. As seen in this figure, C_time and γ_d have the highest correlation with CS, with values of 0.49.
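The entries of a correlation plot such as Fig. 3 are typically pairwise Pearson coefficients. The sketch below assumes that this is the measure shown (the text does not name it) and uses made-up numbers rather than the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only (not taken from the data set).
curing_time = [6.0, 9.0, 12.0, 18.0, 24.0]
strength = [5.1, 9.8, 9.0, 14.2, 13.5]
r = pearson_r(curing_time, strength)
```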

2.2 High-correlated variables creator machine

HCVCM creates new variables from the initial variables to improve the accuracy of predicting the output. Its goal is to produce and discover new inputs that are more strongly correlated with the target than the initial inputs and less strongly correlated with each other; the new variables are therefore more independent of one another and more informative about the output than the initial variables, which improves prediction models. The method proceeds in generations, each derived from the previous one, and each generation is created in three steps. First, new variables are generated using mathematical equations, trigonometric functions, their combinations, and user-defined functions. Second, the coefficient of determination (R²) between the initial inputs and the output, as well as between the new inputs and the output, is calculated to identify the more efficient variables for prediction. Among all new inputs, only those whose R² with the output exceeds the R² of the initial inputs with the target proceed to the next step. In the last step, among the variables selected in the previous step, those whose R² values with each other are lower than the R² values between the initial variables are passed on to create the next generation. Figure 4 demonstrates the process of generating new variables in HCVCM. In this study, the new variables obtained from the first generation are used in the regression technique and in GEP, because the variables generated in the second generation did not improve the precision of these models [22].
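The first two steps of one generation (generate candidates, keep those that beat the initial input) can be sketched as below for a single input. This is a simplified illustration, not the published HCVCM implementation: the transform list, the function names, and the toy data are all assumptions, and the third step (screening the survivors for low mutual R²) is omitted.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical candidate transforms; the actual function list is user-defined.
TRANSFORMS = {
    "sqrt": math.sqrt,
    "square": lambda v: v ** 2,
    "cube": lambda v: v ** 3,
    "log": math.log,
}


def screen_candidates(x, y):
    """Return the baseline R² of x and the transforms whose R² with y is higher."""
    baseline = r_squared(x, y)
    kept = {}
    for name, transform in TRANSFORMS.items():
        candidate = [transform(v) for v in x]
        score = r_squared(candidate, y)
        if score > baseline:  # step 2: keep only variables stronger than the original
            kept[name] = score
    return baseline, kept


# Toy data in which the target is exactly the square of the input.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 2 for v in x]
baseline, kept = screen_candidates(x, y)
```

On this toy data the squared variable reproduces the target exactly, so it is retained, whereas the untransformed input has an R² below 1.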

2.3 Stronger variable creator machine

In several respects, SVCM is similar to HCVCM: it generates variables that improve model accuracy compared with the initial variables. The method runs several loops, and new variables are generated in each. First, new variables are generated from the initial variables using mathematical functions. Second, only those new variables that have a higher correlation with the target than their original variables are selected. These stronger variables are passed to the next loop. Finally, all initial variables and the selected new variables with a coefficient of determination above 0.5 are used in the prediction process with regression techniques or ensemble models [23]. In this study, a revised SVCM (RSVCM) is used, in which all initial variables and the new variables selected in all loops whose coefficient of determination exceeds the mean R² of the original variables are used to estimate the outputs. To improve the RSVCM, sensitivity analysis is used to choose the criterion for admitting new variables. For example, the mean R² of the original variables can be decreased or increased by 10%, 20%, and 30%, and the accuracy of the resulting models can then be compared. One difference between HCVCM, SVCM, and RSVCM is the number of input variables that they pass to ensemble models and regression techniques to predict the output: SVCM uses more input variables than RSVCM and HCVCM [22,23]. Figure 5 demonstrates the process of generating new variables in SVCM.
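The RSVCM admission rule and its sensitivity sweep can be illustrated with a few lines of code. The function name, variable names, and R² values below are hypothetical placeholders chosen only to show how the threshold moves; they are not results from this study.

```python
def select_by_threshold(original_r2, candidate_r2, factor=1.0):
    """Keep candidates whose R² exceeds factor x (mean R² of the original variables)."""
    threshold = factor * sum(original_r2.values()) / len(original_r2)
    return sorted(name for name, r2 in candidate_r2.items() if r2 > threshold)


# Hypothetical R² values, for illustration only.
original_r2 = {"x1": 0.2, "x2": 0.4}              # mean R² = 0.3
candidate_r2 = {"v1": 0.25, "v2": 0.35, "v3": 0.5}

# Sweep the threshold from -30% to +30% of the mean R², as described above.
for factor in (0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3):
    print(factor, select_by_threshold(original_r2, candidate_r2, factor))
```

Each factor yields a different set of admitted variables; the models built from these sets would then be compared on accuracy.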

2.4 Gene expression programming

GEP is an evolutionary computing algorithm used to discover complex functional relationships between the variables in a regression problem. It represents a mathematical expression as a fixed-length chromosome composed of one or more genes. Each position in a gene holds an operator, a variable, or a constant. Each gene consists of a head and a tail. The head may contain symbols from the function set, such as arithmetic operators, as well as terminals, whereas the tail contains only symbols from the terminal set, i.e., variables and constants [50]. The GEP method begins by randomly generating an initial population of chromosomes. The chromosome length and the function and terminal sets are predefined based on the problem domain. Each chromosome in the population is evaluated for fitness, which indicates how effectively it solves the problem at hand. The fitness function is problem-specific and can be defined based on the desired outcome [51]. The chromosomes with the best fitness are selected to produce the subsequent generation. Genetic operators, such as crossover and mutation, are applied to the selected chromosomes to generate new offspring. Crossover exchanges genetic material between two parent chromosomes, whereas mutation introduces random changes to individual symbols.

The GEP algorithm iterates through selection, genetic operators, and replacement until a stopping criterion is satisfied. This criterion may be a time limit, a specified fitness threshold, or a set number of generations. Once the algorithm terminates, the best chromosome in the final population represents the discovered mathematical expression. This expression can be used for prediction, optimization, or other applications, depending on the problem domain. GEP has been effectively applied in many areas, including civil engineering problems [52], where it offers a flexible approach to discovering complex functional relationships.
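How a fixed-length gene maps to an expression can be shown with a small decoder. The sketch below follows the standard breadth-first (Karva) reading used in GEP; the single-character symbols, the three-operator function set, and the name `evaluate_gene` are assumptions made for brevity. With head length h and maximum arity n, the tail length is h(n − 1) + 1, which guarantees that every decoded expression is complete.

```python
import operator

# Assumed minimal function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def evaluate_gene(gene, variables):
    """Decode a gene breadth-first into an expression tree and evaluate it."""
    root = [gene[0], []]  # each node is [symbol, children]
    queue = [root]
    position = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[position], []]
            position += 1
            node[1].append(child)
            queue.append(child)

    def value(node):
        symbol, children = node
        if symbol in FUNCTIONS:
            return FUNCTIONS[symbol][1](*(value(child) for child in children))
        return variables[symbol]

    return value(root)


# Head "+*a" (h = 3) and tail "bcab" (3 * (2 - 1) + 1 = 4 symbols).
# The gene decodes to (b * c) + a; the last two tail symbols are not used.
print(evaluate_gene("+*abcab", {"a": 1, "b": 2, "c": 3}))  # 7
```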

2.5 Multivariate adaptive regression splines model

MARS is a nonparametric regression approach that infers a functional relationship between the variables involved in a problem [53]. It is suitable for modeling complex problems and captures both linear and nonlinear relationships. MARS is adaptive: it automatically selects the most relevant predictors and their interactions, unlike methods such as linear regression, which require the user to select the predictors and interactions manually [38]. MARS employs basis functions (BFs) to establish the input-output relationship between the variables. The BFs are defined as follows [54]:

$$BF = \max(0, x - c) \quad \text{or} \quad BF = \max(0, c - x), \tag{2}$$

where $x$ is the input parameter, $c$ is the threshold value associated with $x$, and $BF$ denotes the basis function. The output of the MARS approach is defined as follows:

$$y = C_0 + \sum_{i=1}^{m} C_i \, BF_i, \tag{3}$$

where $C_0$ is the constant term, $C_i$ is the coefficient of the $i$-th basis function $BF_i$, and $m$ is the number of BFs in the model. The optimal MARS model is determined in two stages. Initially, basis functions are added to the model. Subsequently, the basis functions with the least influence on the MARS model are removed according to the generalized cross-validation (GCV) measure.
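The GCV measure is not written out in the text. In the form commonly given for MARS (an assumption here, since the exact penalty used in this study is not stated), it reads:

```latex
\mathrm{GCV}(M) = \frac{\dfrac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\left(1 - \dfrac{C(M)}{N}\right)^2}
```

where $N$ is the number of training samples, $\hat{y}_i$ is the prediction of the model with $M$ basis functions, and $C(M)$ is the effective number of parameters, which grows with $M$. The numerator rewards fit and the denominator penalizes complexity, so removing a basis function lowers GCV only when the loss in fit is small relative to the reduction in $C(M)$.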

2.6 Group method of data handling

The GMDH model is an advanced approach that builds on inductive self-organizing methods to address complicated practical problems. It has been successfully used to analyze nonlinear structure characteristics [55]. GMDH is a nonlinear framework that uses multilayer quadratic polynomials to predict the output parameter from the effective input variables. It uses a neural network structure in which nodes are organized in layers and connected through quadratic polynomials; each connection generates a new node in the following layer. The network nodes are coupled through the predetermined quadratic equation between the involved variables given in Eq. (4). The weighting coefficients ($w_0, w_1, \ldots, w_5$) of the quadratic polynomial in Eq. (4) are determined using the least squares method to minimize the discrepancy between the real outcome, $y$, and the estimated result, $\hat{y}$, for each combination of the input variables ($x_i$ and $x_j$) [56].

$$\hat{y} = w_0 + w_1 x_i + w_2 x_j + w_3 x_i x_j + w_4 x_i^2 + w_5 x_j^2. \tag{4}$$
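The least-squares fit of a single quadratic neuron can be sketched in plain Python. The helper names and the toy grid data are assumptions, and the normal-equation solver is a simple choice made to keep the sketch self-contained; it is not necessarily the solver used in the study.

```python
from itertools import product


def quad_features(xi, xj):
    """Terms of the quadratic neuron, ordered as w0..w5: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return [1.0, xi, xj, xi * xj, xi ** 2, xj ** 2]


def solve_linear(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        partial = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - partial) / m[r][r]
    return w


def fit_neuron(xi, xj, y):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    rows = [quad_features(a, b) for a, b in zip(xi, xj)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(k)]
    return solve_linear(xtx, xty)


# Toy data on a 3 x 3 grid, generated from known (made-up) weights.
true_w = [1.0, 2.0, 3.0, 0.5, -1.0, 0.25]
grid = list(product([0.0, 1.0, 2.0], repeat=2))
xi = [p[0] for p in grid]
xj = [p[1] for p in grid]
y = [sum(w * f for w, f in zip(true_w, quad_features(a, b))) for a, b in grid]

weights = fit_neuron(xi, xj, y)
```

Because the toy target is generated by the same quadratic form, the fitted weights recover the generating weights up to rounding error.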

2.7 Ensemble models (SVCM + GEP and HCVCM + GEP)

Two ensemble models, one combining SVCM with GEP and the other combining HCVCM with GEP, were used to estimate the CS of HSC. Previous research by Shishegaran et al. [23] found that SVCM and HCVCM can enhance the precision of prediction models. These methods generate new, stronger variables that are more highly correlated with the output and less correlated with each other, and these variables can be used in other AI methods or in regression. Accordingly, the current research uses SVCM and HCVCM to generate new variables that are fed into the GEP model, creating two ensemble models.

3 Results

Equation (1) was used for modeling the CS of hydrothermally solidified clay. Accordingly, the five input variables, $\mathrm{CO}$, $\omega$, $C_{temp}$, $C_{time}$, and $\gamma_d$, were used for modeling CS. Using the SVCM and HCVCM algorithms, the following mathematical expressions were obtained to estimate CS. The formula derived from SVCM is as follows:

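$$
\begin{aligned}
\mathrm{CS} ={} & -0.4028 \times \mathrm{CO} + 6.9531 \times \mathrm{CO}^{0.5} + 4.0120 \times \omega - 0.2926 \times \omega^{2} + 0.0060 \times \omega^{3} - 7.45 \times 10^{-7} \times \omega^{5} \\
& - 3.5587 \times C_{temp} + 0.0453 \times C_{temp}^{2} - 0.0002 \times C_{temp}^{3} + 1.22 \times 10^{-9} \times C_{temp}^{5} \\
& + 6.5697 \times C_{time} - 0.5516 \times C_{time}^{2} + 0.0167 \times C_{time}^{3} \\
& + 2.3706 \times \gamma_d + 3.8656 \times \gamma_d^{2} + 7.7581 \times \gamma_d^{3} - 1.1805 \times \gamma_d^{5} - 0.1724.
\end{aligned}
\tag{5}
$$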

The formula derived from HCVCM is as follows:

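$$
\begin{aligned}
\mathrm{CS} ={} & 3.2430 \times \mathrm{CO} + 0.1249 \times \omega^{2} - 0.0062 \times \omega^{3} + 1.93 \times 10^{-6} \times \omega^{5} \\
& + 0.0032 \times C_{temp}^{2} - 1.93 \times 10^{-5} \times C_{temp}^{3} + 2.14 \times 10^{-10} \times C_{temp}^{5} \\
& + 0.2630 \times C_{time}^{2} - 0.0108 \times C_{time}^{3} \\
& + 178.4070 \times \gamma_d^{2} - 104.8232 \times \gamma_d^{3} + 7.7431 \times \gamma_d^{5} - 146.8054.
\end{aligned}
\tag{6}
$$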

The next step is modeling CS using GEP. GEP encodes candidate solutions as linear chromosomes and expresses them as nonlinear sub-expression trees (Sub-ETs) of different sizes and shapes. Each chromosome comprises several genes, and each gene encodes a Sub-ET. A typical GEP framework includes a set of mathematical functions and a terminal set containing the input parameters and constant values. The root relative squared error (RRSE) serves as the fitness function for assessing the chromosomes in all generations. The terminal set (T) has five input variables as follows:

$$T = \{\mathrm{CO}, \omega, C_{temp}, C_{time}, \gamma_d\}. \tag{7}$$

The mathematical function set ($F$) used to estimate CS is as follows:

$$F = \{+, -, \times, \div, x^{n}, \ln(x), \exp(x), \sin(x), \cos(x), \arcsin(x), \arccos(x), \arctan(x), \sinh(x), \cosh(x), \tanh(x)\}. \tag{8}$$

Many GEP models with different parameter settings were constructed using the trial-and-error method to estimate CS. The generation number was specified as 86494. This indicates that the algorithm will undergo 300000 iterations to optimize the solution. Moreover, the population size, head length, and the number of genes were designated as 43, 5, and 5, respectively. The GEP expression for the prediction of CS was obtained as follows:

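$$
\begin{aligned}
\mathrm{CS} ={} & 1.11089208030304 \times \omega \times \gamma_d + 0.000510080180776197 \times \mathrm{CO} \times C_{temp} \times C_{time} \times \gamma_d - 4.44924241783918 \\
& - 0.0056680093274089 \times \omega \times C_{time}^{2} - 0.0317514507890288 \times \gamma_d \times \omega^{2} \\
& - 9.79976612993958 \times 10^{-6} \times C_{temp} \times C_{time} \times \mathrm{CO}^{2}.
\end{aligned}
\tag{9}
$$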
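Because Eq. (9) is a closed-form expression, it can be evaluated directly. The function below transcribes the equation term by term; the function and argument names are assumptions, and the inputs are expected in the units of Eq. (1) (%, %, °C, h, g/cm³). The example inputs are arbitrary and are not taken from the data set.

```python
def cs_gep(co, omega, c_temp, c_time, gamma_d):
    """Compressive strength (MPa) from the GEP expression in Eq. (9)."""
    return (
        1.11089208030304 * omega * gamma_d
        + 0.000510080180776197 * co * c_temp * c_time * gamma_d
        - 4.44924241783918
        - 0.0056680093274089 * omega * c_time ** 2
        - 0.0317514507890288 * gamma_d * omega ** 2
        - 9.79976612993958e-6 * c_temp * c_time * co ** 2
    )


# Arbitrary illustrative inputs (not from the data set).
prediction = cs_gep(co=30.0, omega=15.0, c_temp=180.0, c_time=12.0, gamma_d=1.4)
```

As with any fitted equation, predictions are only meaningful within the range of the training data summarized in Table 1.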

Employing the MARS technique, a sequence of linear combinations of basis functions is constructed for the estimate of CS. The general expression of MARS is shown below:

$$y = -4.36376 + \sum_{i=1}^{9} C_i \, BF_i. \tag{10}$$

The expansion of MARS is as follows:

$$
\begin{aligned}
\mathrm{CS} ={} & -4.36376 + 36.7767 \times \max(0, \gamma_d - 1.1) + 1.25535 \times \max(0, C_{time} - 9) - 1.97078 \times \max(0, 9 - C_{time}) \\
& + 1.0262 \times \max(0, \omega - 20) - 0.506078 \times \max(0, 20 - \omega) - 0.585456 \times \max(0, 35 - \mathrm{CO}) \\
& + 0.184293 \times \max(0, C_{temp} - 60) - 2.03774 \times \max(0, \omega - 15) - 0.212292 \times \max(0, \mathrm{CO} - 45).
\end{aligned}
\tag{11}
$$

The first 30 basis functions were considered for the development of MARS formulae, and finally, 9 BFs were used for CS estimation. In addition, the value of GCV was equal to 22.57. As seen, the MARS approach splits the domain of the problem and fits several local models in the form of multiple linear regression, which is considered for each divided domain by the MARS method.
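The piecewise-linear structure of Eq. (11) is easy to see in code: each hinge term is zero on one side of its knot and linear on the other. The sketch below transcribes the nine basis functions; the helper name `hinge`, the function name, and the argument names are assumptions.

```python
def hinge(value):
    """Basis function max(0, value) from Eq. (2)."""
    return max(0.0, value)


def cs_mars(co, omega, c_temp, c_time, gamma_d):
    """Compressive strength (MPa) from the MARS expansion in Eq. (11)."""
    return (
        -4.36376
        + 36.7767 * hinge(gamma_d - 1.1)
        + 1.25535 * hinge(c_time - 9)
        - 1.97078 * hinge(9 - c_time)
        + 1.0262 * hinge(omega - 20)
        - 0.506078 * hinge(20 - omega)
        - 0.585456 * hinge(35 - co)
        + 0.184293 * hinge(c_temp - 60)
        - 2.03774 * hinge(omega - 15)
        - 0.212292 * hinge(co - 45)
    )
```

Moisture content appears with knots at both 15 and 20, so the slope of the prediction with respect to ω changes twice, which is how the model fits a separate local linear trend in each sub-domain.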

With the implementation of the GMDH method, a network with four layers, including two intermediate layers, is obtained, as shown in Fig. 6.

GMDH is a self-organizing network that incrementally builds its structure layer by layer, selecting optimal intermediate neurons based on prediction accuracy. This adaptive nature allows it to effectively model complex, nonlinear relationships with multiple inputs. For modeling with multiple inputs, including CO, ω, C_temp, C_time, γ_d, the GMDH network operates as follows:

1) Input Layer. This initial layer receives the raw input variables, specifically CO, ω, C_time, C_temp, and γ_d. These variables serve as the foundation for subsequent modeling processes.

2) First Intermediate Layer. In this layer, the network creates new neurons by exploring all possible pairwise combinations of the initial input variables. Each newly created neuron represents a quadratic polynomial formed from these pairs, following a defined mathematical structure. Given a set of N input variables, the number of possible combinations is determined by the formula N(N−1)/2. For instance, with five inputs, 10 potential neurons are generated. The coefficients of each polynomial (i.e., Eq. (4)) in this layer are calculated using a least squares method, which leverages training data. To prevent overfitting and to manage the complexity of the model, only the “best” performing neurons, those exhibiting the lowest error rates, are retained and selected for progression to the next layer.

3) Second Intermediate Layer. This layer functions similarly to the first, where new neurons are created by combining all possible pairs of the selected neurons from the first layer, again utilizing a quadratic polynomial approach. This iterative refinement allows the model to capture complex relationships between the variables.

4) Output Layer. The final stage is the output layer, where the GMDH network generates predictions of compressive strength. This final output is derived from a combination of neurons in the second intermediate layer, using another quadratic polynomial to represent the predicted values.

This self-organizing, multi-layered structure, built upon pairwise quadratic combinations, allows the GMDH network to effectively capture highly nonlinear and intricate relationships among multiple input variables, ultimately leading to a robust predictive model.
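The pairing and selection logic of one GMDH layer can be sketched as follows. The function names and the error values are hypothetical, and the number of neurons retained per layer is a modeling choice that is not fixed by the description above.

```python
from itertools import combinations

INPUTS = ["CO", "omega", "C_temp", "C_time", "gamma_d"]


def candidate_pairs(names):
    """All unordered pairs of inputs: N(N - 1)/2 candidate neurons."""
    return list(combinations(names, 2))


def keep_best(errors, n_keep):
    """Return the n_keep pairs with the lowest error."""
    return sorted(errors, key=errors.get)[:n_keep]


pairs = candidate_pairs(INPUTS)
print(len(pairs))  # 10

# Hypothetical errors for three candidate neurons, for illustration only.
errors = {("a", "b"): 0.3, ("a", "c"): 0.1, ("b", "c"): 0.2}
print(keep_best(errors, 2))  # [('a', 'c'), ('b', 'c')]
```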

The details of the equation related to the GMDH model are as follows:

For the first intermediate layer:

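$$
\begin{aligned}
(\mathrm{CS}^{*})_{11} &= A_0 + A_1 \times C_{temp} + A_2 \times \gamma_d + A_3 \times C_{temp}^{2} + A_4 \times \gamma_d^{2} + A_5 \times C_{temp} \times \gamma_d; \\
(\mathrm{CS}^{*})_{21} &= B_0 + B_1 \times C_{time} + B_2 \times \gamma_d + B_3 \times C_{time}^{2} + B_4 \times \gamma_d^{2} + B_5 \times C_{time} \times \gamma_d; \\
(\mathrm{CS}^{*})_{31} &= C_0 + C_1 \times \mathrm{CO} + C_2 \times \omega + C_3 \times \mathrm{CO}^{2} + C_4 \times \omega^{2} + C_5 \times \mathrm{CO} \times \omega.
\end{aligned}
\tag{12}
$$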

For the second intermediate layer:

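$$
\begin{aligned}
(\mathrm{CS}^{*})_{12} &= D_0 + D_1 \times (\mathrm{CS}^{*})_{21} + D_2 \times (\mathrm{CS}^{*})_{31} + D_3 \times \left((\mathrm{CS}^{*})_{21}\right)^{2} + D_4 \times \left((\mathrm{CS}^{*})_{31}\right)^{2} + D_5 \times (\mathrm{CS}^{*})_{21} \times (\mathrm{CS}^{*})_{31};\\
(\mathrm{CS}^{*})_{22} &= E_0 + E_1 \times (\mathrm{CS}^{*})_{11} + E_2 \times (\mathrm{CS}^{*})_{31} + E_3 \times \left((\mathrm{CS}^{*})_{11}\right)^{2} + E_4 \times \left((\mathrm{CS}^{*})_{31}\right)^{2} + E_5 \times (\mathrm{CS}^{*})_{11} \times (\mathrm{CS}^{*})_{31}.
\end{aligned}
\tag{13}
$$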

For the output layer:

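$$
(\mathrm{CS}^{*})_{13} = F_0 + F_1 \times (\mathrm{CS}^{*})_{12} + F_2 \times (\mathrm{CS}^{*})_{22} + F_3 \times \left((\mathrm{CS}^{*})_{12}\right)^{2} + F_4 \times \left((\mathrm{CS}^{*})_{22}\right)^{2} + F_5 \times (\mathrm{CS}^{*})_{12} \times (\mathrm{CS}^{*})_{22}.
\tag{14}
$$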

The coefficients related to GMDH expressions are given in Table 2.
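The layered structure of Eqs. (12)–(14) can be read as a small forward pass, sketched below. The function names and the dictionary layout are assumptions for illustration, and the coefficient sets A to F must be supplied by the reader from Table 2; none of the fitted values are reproduced here.

```python
def neuron(c, x1, x2):
    """Quadratic neuron: c0 + c1*x1 + c2*x2 + c3*x1^2 + c4*x2^2 + c5*x1*x2."""
    return c[0] + c[1] * x1 + c[2] * x2 + c[3] * x1 ** 2 + c[4] * x2 ** 2 + c[5] * x1 * x2


def gmdh_predict(coef, c_temp, c_time, co, omega, gamma_d):
    """Evaluate the three-layer network; `coef` maps 'A'..'F' to six coefficients each."""
    # First intermediate layer, Eq. (12)
    cs11 = neuron(coef["A"], c_temp, gamma_d)
    cs21 = neuron(coef["B"], c_time, gamma_d)
    cs31 = neuron(coef["C"], co, omega)
    # Second intermediate layer, Eq. (13)
    cs12 = neuron(coef["D"], cs21, cs31)
    cs22 = neuron(coef["E"], cs11, cs31)
    # Output layer, Eq. (14)
    return neuron(coef["F"], cs12, cs22)
```

Keeping each neuron as the same six-term function makes the correspondence with the equations direct: only the pair of inputs and the coefficient set change from one neuron to the next.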

For developing the ensemble models, including SVCM + GEP and HCVCM + GEP, the results of SVCM and HCVCM are fed to GEP for the generation of ensemble models. The mathematical expressions of the proposed ensemble models are as follows.

The formula extracted from SVCM + GEP is as follows:

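$$
\begin{aligned}
\mathrm{CS} ={}& 4.10838604106427 \times \mathrm{CO}^{0.5} + 0.0111527315400177 \times C_{\mathrm{temp}} \times C_{\mathrm{temp}} \times \gamma_d^{3} \times C_{\mathrm{time}}^{0.5} - 16.3793064656171\\
& - 0.000351391476676616 \times \omega^{3} \times \gamma_d - 0.00114594344916609 \times C_{\mathrm{time}} \times \cosh\left(\mathrm{CO}^{0.5}\right).
\end{aligned}
\tag{15}
$$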

The formula extracted from HCVCM + GEP is as follows:

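$$
\begin{aligned}
\mathrm{CS} ={}& 0.00129758036409283 \times C_{\mathrm{time}}^{3} + 0.00021489291599372 \times C_{\mathrm{temp}}^{2}\,\gamma_d^{2} + 4.61149241793436 \times \gamma_d^{2}\,(\omega^{3})^{0.0142349946860235}\\
& + 2.40391737618562 \times \mathrm{CO}^{0.5} \times (C_{\mathrm{time}}^{3})^{0.070458690769382} - 22.8331960090231 - 0.000584509313740101 \times \omega^{3}.
\end{aligned}
\tag{16}
$$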

3.1 Performance assessment criteria

To evaluate the efficacy and accuracy of all compressive strength prediction models, three common statistical metrics, namely R², RMSE, and MAE, were calculated as follows [57]:

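$$
R^{2} = \left( \frac{\sum_{i=1}^{n} \left[ \left( CS_i^{obs} - \overline{CS^{obs}} \right) \left( CS_i^{pre} - \overline{CS^{pre}} \right) \right]}{\sqrt{\sum_{i=1}^{n} \left( CS_i^{obs} - \overline{CS^{obs}} \right)^{2} \cdot \sum_{i=1}^{n} \left( CS_i^{pre} - \overline{CS^{pre}} \right)^{2}}} \right)^{2},
\tag{17}
$$

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| CS_i^{obs} - CS_i^{pre} \right|,
\tag{18}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( CS_i^{obs} - CS_i^{pre} \right)^{2}},
\tag{19}
$$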

where $n$ is the total number of observed CS values; $CS_i^{obs}$ and $CS_i^{pre}$ are the ith measured and estimated values of CS, respectively; and $\overline{CS^{obs}}$ and $\overline{CS^{pre}}$ are the means of the measured and estimated values. For an ideal model, the RMSE and MAE values should equal zero, and the R² value should equal one.
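The three metrics of Eqs. (17)–(19) can be computed directly from paired observed and predicted values. The sketch below is a plain-Python illustration with assumed function names (`r_squared`, `mae`, `rmse`); note that R² is computed here as the squared Pearson correlation, following Eq. (17), which is not the same quantity as the "1 − SSE/SST" definition used by some libraries.

```python
from math import sqrt


def r_squared(obs, pre):
    """Squared Pearson correlation between observed and predicted values, Eq. (17)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pre = sum(pre) / n
    cov = sum((o - mean_obs) * (p - mean_pre) for o, p in zip(obs, pre))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_pre = sum((p - mean_pre) ** 2 for p in pre)
    return (cov / sqrt(ss_obs * ss_pre)) ** 2


def mae(obs, pre):
    """Mean absolute error, Eq. (18)."""
    return sum(abs(o - p) for o, p in zip(obs, pre)) / len(obs)


def rmse(obs, pre):
    """Root mean square error, Eq. (19)."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pre)) / len(obs))
```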

Moreover, the evaluation of the developed data-driven models is comprehensive, considering two key criteria: the objective function (OBJ) parameter and the uncertainty criterion (U95). The OBJ parameter, a crucial criterion in model selection, considers statistical indicators such as RMSE, MAE, and R² for training and testing data sets. The OBJ equation is expressed as follows [58,59]:

$$
\mathrm{OBJ} = \left( \frac{n_{\mathrm{trainingdata}}}{n_{\mathrm{alldata}}} \times \frac{\mathrm{RMSE}_{\mathrm{trainingdata}} + \mathrm{MAE}_{\mathrm{trainingdata}}}{R^{2}_{\mathrm{trainingdata}} + 1} \right) + \left( \frac{n_{\mathrm{testingdata}}}{n_{\mathrm{alldata}}} \times \frac{\mathrm{RMSE}_{\mathrm{testingdata}} + \mathrm{MAE}_{\mathrm{testingdata}}}{R^{2}_{\mathrm{testingdata}} + 1} \right),
\tag{20}
$$

where n_trainingdata, n_testingdata, and n_alldata denote the numbers of data points in the training set, the testing set, and the whole data set, respectively. The optimal model is the one with the lowest OBJ value.
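As a worked illustration of Eq. (20), the sketch below evaluates OBJ from the training and testing metrics. The function name `obj` and the assumption that n_alldata is the sum of the two subsets are illustrative. The example call uses the SVCM metrics reported in Section 4 together with a 98/42 split of the 140 samples; that split is an assumption made here for illustration, since this section does not state it.

```python
def obj(n_train, rmse_train, mae_train, r2_train,
        n_test, rmse_test, mae_test, r2_test):
    """Objective function of Eq. (20): size-weighted (RMSE + MAE) / (R^2 + 1)."""
    n_all = n_train + n_test  # assumed: all data = training + testing
    train_term = (n_train / n_all) * (rmse_train + mae_train) / (r2_train + 1.0)
    test_term = (n_test / n_all) * (rmse_test + mae_test) / (r2_test + 1.0)
    return train_term + test_term


# SVCM metrics from Section 4; the 98/42 split is a hypothetical 70/30 partition.
svcm_obj = obj(98, 2.418, 1.853, 0.965, 42, 2.889, 2.233, 0.947)
```

Under the assumed split, `svcm_obj` evaluates to about 2.31. Because each term is weighted by the subset size, a model that fits the training set well but degrades on the testing set is penalized through the second term.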

Moreover, we considered the uncertainty criterion to evaluate the models’ performance, as Saberi-Movahed et al. [60] outlined. The main aim of uncertainty analysis is to measure the variability present in the expected results generated by data-driven models. It is crucial to recognize this range of uncertainty in order to assess the generalizability and reliability of our models and to estimate potential errors associated with their predictions. The mathematical equation for calculating the uncertainty measurement (U95) is defined as follows:

$$
U_{95} = \frac{1.96}{n} \sqrt{\sum_{i=1}^{n} \left( CS_i^{obs} - \overline{CS^{obs}} \right)^{2} + \sum_{i=1}^{n} \left( CS_i^{obs} - CS_i^{pre} \right)^{2}}.
\tag{21}
$$

A model with a lower U95 value is typically more precise and reliable in its predictions than models with higher U95 values.
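The uncertainty measure of Eq. (21) combines the spread of the observations with the prediction error. A minimal sketch follows; the function name `u95` is an assumption, and the square root is taken over the sum of the two squared-deviation terms.

```python
from math import sqrt


def u95(obs, pre):
    """Uncertainty measure of Eq. (21) at the 95% confidence level."""
    n = len(obs)
    mean_obs = sum(obs) / n
    spread = sum((o - mean_obs) ** 2 for o in obs)          # deviation from the mean
    error = sum((o - p) ** 2 for o, p in zip(obs, pre))     # prediction error
    return (1.96 / n) * sqrt(spread + error)
```

Since the first term depends only on the observations, differences in U95 between models evaluated on the same data set come entirely from the prediction-error term.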

4 Discussions

The commonly used statistical indices, including R², RMSE, and MAE, were used to compare the accuracy of the generated data-driven methods in estimating CS. Table 3 demonstrates the values of statistical metrics. The models exhibiting the lowest RMSE and MAE and the highest values for R² have superior prediction accuracy.

Table 3 presents the error values calculated by the ML models for the training and testing data sets. According to Table 3, the SVCM model outperformed the other ML approaches on both data sets, with the lowest RMSE and MAE values and the highest R² value. In the training stage, SVCM achieved the highest R² (0.965) and the lowest RMSE (2.418) and MAE (1.853), indicating an excellent fit to the training set. The HCVCM + GEP model also performed well on the training data, with an R² of 0.948, RMSE of 2.472, and MAE of 1.951. The remaining models, GEP, SVCM + GEP, MARS, GMDH, and HCVCM, were ranked based on their R², RMSE, and MAE values in the training stage. Similarly, in the testing stage, SVCM maintained strong performance with an R² of 0.947 and low RMSE (2.889) and MAE (2.233), followed by HCVCM + GEP, GEP, SVCM + GEP, MARS, GMDH, and HCVCM. Based on the performance on both training and testing data, the models can be ranked in terms of accuracy as follows: SVCM, HCVCM + GEP, GEP, SVCM + GEP, MARS, GMDH, and HCVCM. This ranking is based on the coefficient of determination (R²), RMSE, and MAE.

The graphical plots are essential graphical evaluation tools for assessing the difference between predicted and observed values. Figure 7 displays a comparison between observed and estimated CS for each data-driven model.

Figure 7 indicates that the models predict CS reasonably well. The X-axis corresponds to the sample number and represents the individual data points, while the Y-axis gives the value of the output parameter, CS. The black dots represent the actual values of CS for each sample. The blue dots represent the CS values predicted by a model in the training stage, and the red dots represent the CS values predicted by the same model in the testing stage. The estimated values from all the proposed models fluctuate around the observed values. Both the blue (training) and red (testing) series closely follow the black (observed) series, suggesting that the proposed data-driven models effectively capture the underlying patterns in the data.

As seen in Fig. 7, although the precision of the ML models is good, some models estimate negative values of CS for samples with low compressive strength: SVCM in two cases, and GEP and HCVCM + GEP in one case each. Therefore, more caution should be used when applying these models to estimate low values of CS. Furthermore, based on Fig. 7, the SVCM outcomes show the smallest difference between experimental and estimated values of CS compared to the other models. Overall, Fig. 7 provides a visual representation of the performance of the SVCM model, which predicts CS very well.

Figure 8 illustrates the Taylor plot, which is employed to assess the models graphically in training and testing data sets.

The Taylor plot provides a visual comparison of the performance of several models based on their correlation, standard deviation, and RMSE with respect to the observed data. It can be a valuable tool for selecting the best model. The Taylor plot includes the proposed models SVCM, HCVCM, GEP, MARS, GMDH, and the ensemble models (SVCM + GEP and HCVCM + GEP). Each model is represented by a different shape and color, making it easy to identify and compare the results visually. The observed data are represented by a red square, while the SVCM results are represented by a green circle. As seen in Fig. 8, the SVCM model yielded the closest estimated values to the observed ones. The OBJ and U95 values for all data sets are illustrated in Figs. 9 and 10, respectively.
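The quantities that place a model on a Taylor plot can be computed as in the sketch below. The function name `taylor_statistics` is an assumption, and the error measure shown is the centered RMS difference (means removed), which is the quantity tied geometrically to the two standard deviations and the correlation through E′² = σ_obs² + σ_pre² − 2 σ_obs σ_pre R; whether Fig. 8 uses the centered or the plain RMSE is not stated in this section.

```python
from math import sqrt


def taylor_statistics(obs, pre):
    """Return (sd_obs, sd_pre, correlation, centered RMS difference)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pre = sum(pre) / n
    sd_obs = sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_pre = sqrt(sum((p - mean_pre) ** 2 for p in pre) / n)
    corr = sum((o - mean_obs) * (p - mean_pre) for o, p in zip(obs, pre)) / (n * sd_obs * sd_pre)
    crmsd = sqrt(sum(((p - mean_pre) - (o - mean_obs)) ** 2 for o, p in zip(obs, pre)) / n)
    return sd_obs, sd_pre, corr, crmsd
```

A model plots close to the observed point when its standard deviation is similar to that of the observations and its correlation is close to one, which together force the centered RMS difference toward zero.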

Figure 9 compares the accuracy of the models in terms of the OBJ value. The X-axis lists the model names (SVCM, HCVCM + GEP, GEP, SVCM + GEP, MARS, GMDH, and HCVCM), and the Y-axis represents the OBJ value, so that each bar gives the OBJ value of a specific model. It is a valuable tool for comparing models and selecting the best one for a given task.

The OBJ values obtained for the proposed data-driven models are illustrated in Fig. 9. The SVCM model has the lowest OBJ value (2.310) and is therefore identified as the superior model for the prediction of CS. In contrast, the HCVCM model, with an OBJ value of 4.600, had the weakest performance. The other suggested models (HCVCM + GEP, GEP, SVCM + GEP, MARS, and GMDH) can be ranked between these two according to their OBJ values. The result of the uncertainty analysis (U95) for all data sets is illustrated in Fig. 10. As previously discussed, a lower U95 value generally suggests a model with less uncertainty, which is desirable for reliable predictions.

In Fig. 10, the X-axis denotes the model names, and the Y-axis denotes the U95 values; each bar corresponds to the U95 value of a specific model, which allows the uncertainty associated with the different models to be compared. Based on Fig. 10, the SVCM model has the lowest U95 value (1.884), which signifies the lowest level of uncertainty in its predictions and superior performance compared to the other models in predicting CS. In contrast, the HCVCM model, with the highest U95 value of 1.981, exhibits the weakest performance. According to the U95 values, the remaining models were ranked from second to seventh in the following order: HCVCM + GEP, GEP, SVCM + GEP, MARS, GMDH, and HCVCM.

The OBJ and U95 values are consistent with the statistical indicators and graphical plots. According to the statistical evaluations, it can be concluded that the SVCM model exhibited superior results compared to the other proposed data-driven models. Furthermore, the HCVCM + GEP, GEP, SVCM + GEP, MARS, GMDH, and HCVCM models are ranked second to seventh based on their capability and accuracy.

4.1 Comparison between the results of data-driven models

To better evaluate the precision and efficiency of the proposed data-driven methods, they were compared with previous methods on the testing data. The values of the statistical measures are listed in Table 4.

According to Table 4, the R², RMSE, and MAE values of the SVCM model were 0.947, 2.889, and 2.233, whereas these values were 0.989, 1.048, and 0.787 for the GPR model, which is the best approach from the previous study. However, the SVCM model is more accurate than the other data-driven models, with the exception of the GPR model. On the other hand, the main characteristic of the methods suggested in the present study is that they provide mathematical expressions that calculate CS straightforwardly, in contrast to the black-box model presented by previous research. Overall, the statistical values of the developed models show that all data-driven models proposed in the present study performed well in predicting CS.

5 Sensitivity analysis

Sensitivity analysis (SA) is used to determine the importance of each input parameter in influencing the model output. Various approaches have been widely used for the sensitivity analysis and interpretability of machine learning models, including the SHAP (SHapley Additive exPlanations) method [61,62] and leave-one-out analysis [63].

5.1 SHapley Additive exPlanations analysis

To assess the significance of each input parameter, SHAP was applied to the SVCM model, which has the highest accuracy for predicting CS. The feature importance assessment for the estimated value of CS is shown in Fig. 11.

As can be observed, C_time is the most significant input parameter for the estimation of CS. The input variables DD and CO are ranked second and third, respectively, in terms of importance. These findings of the feature importance analysis in this study align with the sensitivity analysis performed by Lin et al. [17]. Furthermore, Fig. 12 illustrates the effect of each input parameter on the CS based on the SHAP values.

As seen in Fig. 12, the SHAP values of C_time, DD, CO, and C_temp exhibit an upward trend as these variables increase, whereas the SHAP value of WC shows a downward trend as it increases. Thus, the general pattern matches the findings of the previous study conducted by Lin et al. [17].
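The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a small model. The sketch below is not the `shap` library and not the authors' workflow: it enumerates all feature subsets, replaces absent features by a baseline value, and averages marginal contributions. The function name `shapley_values`, the toy model, and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical three-input model with one interaction term.
    a, b, c = z
    return 2.0 * a + 3.0 * b - c + a * b


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term `a * b` is shared equally between the two inputs involved. Exhaustive enumeration is feasible here only because the number of inputs is small, as it also is for the five inputs considered in this study.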

5.2 Leave-One-Out sensitivity analysis method

One robust sensitivity analysis approach is the LOO (Leave-One-Out) method, which is also known as the removal method. This method assesses the quantitative influence of each input parameter on the model output by iteratively removing one input variable at a time and observing the resulting changes in the model’s performance metrics. A substantial deterioration in model performance upon removal of a specific input indicates that the excluded variable has a significant influence on the output, whereas a minimal change suggests a lower impact. In this study, the LOO analysis was carried out with the SVCM method to determine the relative significance of the input variables for CS. To this end, various SVCM models were developed by excluding each independent variable from Eq. (1) in turn. The outcomes of the sensitivity analysis are displayed in Table 5.
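The removal procedure can be sketched as a loop that refits the model without one input and records how much the error grows. In the sketch below, an ordinary least-squares regression stands in for SVCM purely for illustration (it is not the method used in the paper), and all function names and the synthetic data are assumptions.

```python
from math import sqrt


def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Stand-in learner (NOT SVCM): linear regression with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda r: w[0] + sum(wi * xi for wi, xi in zip(w[1:], r))


def rmse(model, rows, y):
    return sqrt(sum((t - model(r)) ** 2 for r, t in zip(rows, y)) / len(y))


def loo_sensitivity(rows, y, names):
    """Increase in RMSE when each input is removed and the model is refitted."""
    full = rmse(fit_ols(rows, y), rows, y)
    result = {}
    for k, name in enumerate(names):
        reduced = [[v for j, v in enumerate(r) if j != k] for r in rows]
        result[name] = rmse(fit_ols(reduced, y), reduced, y) - full
    return result


# Synthetic data: x0 dominates, x1 matters slightly, x2 is irrelevant.
ROWS = [[float(i % 4), float((i // 4) % 3), float((i * 7) % 5)] for i in range(24)]
Y = [3.0 * r[0] + 0.1 * r[1] for r in ROWS]
impact = loo_sensitivity(ROWS, Y, ["x0", "x1", "x2"])
```

The input whose removal causes the largest increase in RMSE is read as the most influential. For brevity, the error is evaluated here on the fitting data; a held-out testing set would be the more careful choice.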

The results of the sensitivity analysis demonstrated that C_time and γ_d are the most important variables for CS, which aligns with the findings of Lin et al. [17] and the SHAP results.

6 Parametric study

A parametric study systematically investigates the individual influence of each input parameter on the output parameter. In the present study, it was conducted by isolating each input variable (CO, WC, C_temp, C_time, and DD) and varying its value across its relevant experimental range, while all other input variables were held constant at representative values derived from the experimental data set, as indicated in the figure titles. In this way, the parametric study delineates how each individual input parameter affects the compressive strength. Figure 13 illustrates the effect of each input variable on CS.

Figure 13 directly compares the observed experimental compressive strength values with the values predicted by the SVCM model. The predicted trend of CS with respect to each variable is aligned with the experimental data. This visual verification indicates that the model captures the complex, nonlinear relationships between the input parameters and CS. The close alignment between the observed and predicted data points across all parametric variations supports the robustness and reliability of the model in reproducing the experimental behavior.
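The one-at-a-time procedure used in the parametric study can be sketched as a simple sweep. The function name `parametric_sweep`, the toy model, and the base values below are hypothetical and do not reproduce the SVCM equation or the experimental ranges of the paper.

```python
def parametric_sweep(model, base, name, low, high, steps):
    """Vary one input from low to high (steps >= 2) with the others fixed at base."""
    results = []
    for k in range(steps):
        value = low + (high - low) * k / (steps - 1)
        point = dict(base, **{name: value})  # copy of base with one input replaced
        results.append((value, model(**point)))
    return results


def toy_model(c_temp, c_time, co, omega, gamma_d):
    # Hypothetical stand-in, NOT the SVCM equation from the paper.
    return 0.1 * c_temp + 2.0 * c_time ** 0.5 + co - 0.5 * omega + 5.0 * gamma_d


# Representative values assumed for illustration only.
BASE = {"c_temp": 200.0, "c_time": 12.0, "co": 10.0, "omega": 20.0, "gamma_d": 1.6}
curve = parametric_sweep(toy_model, BASE, "c_time", 4.0, 16.0, 4)
```

Each entry of `curve` pairs a value of the swept input with the corresponding prediction, which is the information plotted against the experimental points in a figure of this kind.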

7 Summary and conclusions

This study presents novel computational methods for estimating the compressive strength of HSC, which is an eco-friendly construction material with lower energy consumption and carbon emissions. Hydrothermal solidification is an innovative technique that enhances soil strength, environmental sustainability, cost-effectiveness, and versatility. This study used a soil sample data set and the influential factors to establish predictive formulas for the estimation of CS. The suggested models were trained and evaluated on 140 soil samples obtained from the literature, using the physical parameters of the clay as inputs. Robust data-driven methods, including SVCM, HCVCM, GEP, MARS, GMDH, and the hybrid models HCVCM + GEP and SVCM + GEP, were used to extract explicit mathematical equations for the estimation of CS. The efficiency of the derived equations was compared using the same training and testing data sets. The accuracy of the ML methods was assessed through statistical indices, graphical diagrams, Taylor plots, the objective function parameter, and the uncertainty method. Furthermore, sensitivity analysis methods and a parametric study were conducted to investigate the influence of each input parameter on CS. The main findings of this study are as follows.

1) The results show that the SVCM model outperformed all other models in both the training and testing stages. It had the highest R² (0.965 in training and 0.947 in testing) and the lowest RMSE (2.418 in training and 2.889 in testing) and MAE (1.853 in training and 2.233 in testing). Further analysis using the Taylor plot, OBJ, and U95 values confirmed that the SVCM model provided the closest estimates to the experimental data. It had the lowest OBJ value (2.310) and the lowest uncertainty (U95 = 1.884), indicating reliable predictions with minimal uncertainty. These metrics indicate excellent predictive performance.

2) The assessment of the ML models indicated that SVCM and HCVCM + GEP were more accurate than other models, followed by the GEP, SVCM + GEP, MARS, GMDH, and HCVCM methods.

3) HCVCM, SVCM, and GEP all produce new variables to fit the experimental values better, but SVCM performs better than the other models because the new variables it generates have a coefficient of determination with the output of more than 0.5 and a lower correlation than the original variables. In other words, SVCM can generate more efficient variables.

4) The white box models present equations that can be used in the design books and by engineers in their design and analyses of CS.

5) In contrast to previously suggested black-box models, such as ANN, the approaches proposed in this work give explicit expressions that make the relationship between CS and its predictor variables clear and understandable.

6) The sensitivity analysis, including the SHAP method and the LOO analysis, indicated that C_time and γ_d were the most influential variables for CS.

7) Parametric study confirmed that the results of the best predictive model align with experimental observations.

8) This work suggests that data-driven methods can be used as alternative approaches to traditional methods for estimating the compressive strength of HSC. The study also recommends further investigations of the applications of proposed data-driven methods to other types of soil and construction materials.

References


[1]	Shi T , Li K M , Wang C Z , Jin Z , Hao X K , Sun P , Han Y X , Pan C G , Fu N , Wang H B . Fracture toughness of recycled carbon fibers reinforced cement mortar and its environmental impact assessment. Case Studies in Construction Materials, 2025, 22: e04866

[2]	Chen Y , Sha A , Jiang W , Lu Q , Du P , Hu K , Li C . Eco-friendly bismuth vanadate/iron oxide yellow composite heat-reflective coating for sustainable pavement: Urban heat island mitigation. Construction and Building Materials, 2025, 470: 140645

[3]	24 Construction Waste Statistics & Tips to Reduce Landfill Debris. 2022 (available at the website of BigRentz)

[4]	Min F , Ma J , Zhang N , Song H , Du J , Wang D . Experimental study on lime-treated waste soil based on water transfer mechanism. KSCE Journal of Civil Engineering, 2021, 25(5): 1645–1652

[5]	Chiang K , Chien K , Hwang S . Study on the characteristics of building bricks produced from reservoir sediment. Journal of Hazardous Materials, 2008, 159(2–3): 499–504

[6]	Zhang L . Production of bricks from waste materials—A review. Construction and Building Materials, 2013, 47: 643–655

[7]	Ishida E H . To restore potentialities of soil-development of hydrothermally solidified soil: Earth ceramics. In: Proceedings of the 4th International Conference Ecomaterials, Tokyo: Society of Non-traditional Technology, 1999, 7–10

[8]	Deng Y , Xu C , Marsheal F , Geng X , Chen Y , Sun H . Constituent effect on mechanical performance of crushed demolished construction waste/silt mixture. Construction and Building Materials, 2021, 294: 123567

[9]	Zhang J , Xu Q , Wang H , Li S . Preparation of hydrothermally solidified materials from waste cathode ray tube panel glass for construction applications. Environmental Science and Pollution Research International, 2022, 29(38): 57516–57522

[10]	Xue Y , Liu X . Detoxification, solidification and recycling of municipal solid waste incineration fly ash: A review. Chemical Engineering Journal, 2021, 420: 130349

[11]	Yang W , Cao X , Zhang Q , Ma R , Fang L , Liu S . Coupled microwave hydrothermal dechlorination and geopolymer preparation for the solidification/stabilization of heavy metals and chlorine in municipal solid waste incineration fly ash. Science of the Total Environment, 2022, 853: 158563

[12]	Zhang Z , Wang Y , Zhang Y , Shen B , Ma J , Liu L . Stabilization of heavy metals in municipal solid waste incineration fly ash via hydrothermal treatment with coal fly ash. Waste Management, 2022, 144: 285–293

[13]	Rungchet A , Chindaprasirt P , Wansom S , Pimraksa K . Hydrothermal synthesis of calcium sulfoaluminate–belite cement from industrial waste materials. Journal of Cleaner Production, 2016, 115: 273–283

[14]	Zhou L , Jing Z , Zhang Y , Wu K , Ishida E H . Stability, hardening and porosity evolution during hydrothermal solidification of sepiolite clay. Applied Clay Science, 2012, 69: 30–36

[15]	Lin M , Chen G , Chen Y , Han D , Xu J . Hydrothermal solidification of alkali-activated clay-slaked lime mixtures. Construction and Building Materials, 2022, 325: 126660

[16]	Chen G , Lin M , Chen Y , Kong G , Geng Z . Alkali-reinforced hydrothermal solidification of waste soil. Materials Chemistry and Physics, 2022, 289: 126505

[17]	Lin M , Su R , Chen G , Chen Y , Ye Z , Hu N . Compressive strength prediction of hydrothermally solidified clay with different machine learning techniques. Journal of Cleaner Production, 2023, 413: 137541

[18]	Ahmadi S , Ghasemzadeh H , Changizi F . Effects of thermal cycles on microstructural and functional properties of nano treated clayey soil. Engineering Geology, 2021, 280: 105929

[19]	Ostovar A , Davari D D , Dzikuć M . Determinants of design with multilayer perceptron neural networks: A comparison with logistic regression. Sustainability, 2025, 17(6): 2611

[20]	Pour M A , Ghiasi M B , Karkehabadi A . Applying Machine Learning Tools for Urban Resilience Against Floods. In: 2025 Fifth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT). Piscataway, NJ: IEEE, 2025

[21]	Tian A , Zhang W , Hei J , Hua Y , Liu X , Wang J , Gao R . Resistance reduction method for building transmission and distribution systems based on an improved random forest model: A tee case study. Building and Environment, 2025, 282: 113256

[22]	Shishegaran A , Varaee H , Rabczuk T , Shishegaran G . High correlated variables creator machine: Prediction of the compressive strength of concrete. Computers and Structures, 2021, 247: 106479

[23]	Shishegaran A , Saeedi M , Mirvalad S , Korayem A H . Computational predictions for estimating the performance of flexural and compressive strength of epoxy resin-based artificial stones. Engineering with Computers, 2023, 39(1): 347–372

[24]	Abdalla A A , Salih Mohammed A . Theoretical models to evaluate the effect of SiO₂ and CaO contents on the long-term compressive strength of cement mortar modified with cement kiln dust (CKD). Archives of Civil and Mechanical Engineering, 2022, 22(3): 105

[25]	Aslam F , Farooq F , Amin M N , Khan K , Waheed A , Akbar A , Javed M F , Alyousef R , Alabdulijabbar H . Applications of gene expression programming for estimating compressive strength of high-strength concrete. Advances in Civil Engineering, 2020, 2020(1): 1–23

[26]	Kaloop M R , Kumar D , Samui P , Hu J W , Kim D . Compressive strength prediction of high-performance concrete using gradient tree boosting machine. Construction and Building Materials, 2020, 264: 120198

[27]	Lan H , Zhang Y , Cheng M , Li Y , Jing Z . An intelligent humidity regulation material hydrothermally synthesized from ceramic waste. Journal of Building Engineering, 2021, 40: 102336

[28]	Samaniego E , Anitescu C , Goswami S , Nguyen-Thanh V M , Guo H , Hamdia K , Zhuang X , Rabczuk T . An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications. Computer Methods in Applied Mechanics and Engineering, 2020, 362: 112790

[29]	Eshaghi M S , Anitescu C , Thombre M , Wang Y , Zhuang X , Rabczuk T . Variational physics-informed neural operator (VINO) for solving partial differential equations. Computer Methods in Applied Mechanics and Engineering, 2025, 437: 117785

[30]	Wang Y , Sun J , Bai J , Anitescu C , Eshaghi M S , Zhuang X , Rabczuk T , Liu Y . Kolmogorov–Arnold-Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov–Arnold Networks. Computer Methods in Applied Mechanics and Engineering, 2025, 433: 117518

[31]	Niu Y , Wang W , Su Y , Jia F , Long X . Plastic damage prediction of concrete under compression based on deep learning. Acta Mechanica, 2024, 235(1): 255–266

[32]	Long X , Li H , Iyela P M , Kang S B . Predicting the bond stress–slip behavior of steel reinforcement in concrete under static and dynamic loadings by finite element, deep learning and analytical methods. Engineering Failure Analysis, 2024, 161: 108312

[33]	Dezfuli H T , Ghanizadeh A R . Prediction of compressive and tensile strength of clayey subgrade soil stabilized with portland cement and iron ore mine tailing using computational intelligence methods. Civil Infrastructure Researches, 2020, 6(1): 73–88

[34]	Ghanizadeh A R , Heidarabadizadeh N , Bayat M , Khalifeh V . Modeling of unconfined compressive strength and Young’s modulus of lime and cement stabilized clayey subgrade soil using Evolutionary Polynomial Regression (EPR). International Journal of Mining and Geo-Engineering, 2022, 56(3): 257–269

[35]	Ghanizadeh A R , Naseralavi S S . Intelligent prediction of unconfined compressive strength and Young’s modulus of lean clay stabilized with iron ore mine tailings and hydrated lime using gaussian process regression. Journal of Soft Computing in Civil Engineering, 2023, 7(4): 1–23

[36]	Ghanizadeh A R , Safi Jahanshahi F . Intelligent modeling of unconfined compressive strength of stabilized clay soil using gene expression programming. Road, 2024, 32(119): 137–156

[37]	Ghanizadeh A R , Bayat M , Tavana Amlashi A , Rahrovan M . Prediction of unconfined compressive strength of clay subgrade soil stabilized with Portland cement and lime using Group Method of Data Handling (GMDH). Journal of Transportation Infrastructure Engineering, 2019, 5(1): 77–96

[38]	Ghanizadeh A R , Safi Jahanshahi F , Ziayi A . Presenting a model for predicting CBR and UCS of expensive soil stabilized with hydrated lime activated with rice husk ash using the hybrid MARS-EBS method. Road, 2025, 33(122): 45–66

[39]	Safi Jahanshahi F , Ghanizadeh A R . Machine learning approaches for resilient modulus modeling of cement-stabilized magnetite and hematite iron ore tailings. Scientific Reports, 2025, 15(1): 4950

[40]	Zheng Y , Zhu T , Chen J , Shan K , Li J . Relationship between pore-size distribution and 1D compressibility of different reconstituted clays based on fractal theory. Fractal and Fractional, 2025, 9(4): 235

[41]	Torabi M , Sarkardeh H , Mirhosseini S M , Samadi M . Effect of water temperature and soil type on infiltration. Geomechanics and Engineering, 2023, 32(4): 445–452

[42]	Ahmadi S , Ghasemzadeh H , Changizi F . Effects of A low-carbon emission additive on mechanical properties of fine-grained soil under freeze-thaw cycles. Journal of Cleaner Production, 2021, 304: 127157

[43]	Babu N N , Ohdar R K , Pushp P T . Evaluation of green compressive strength of clay bonded moulding sand mix: neural network and neuro-fuzzy based approaches. International Journal of Cast Metals Research, 2006, 19(2): 110–115

[44]	Karunakar D B , Datta G L . Controlling green sand mould properties using artificial neural networks and genetic algorithms-A comparison. Applied Clay Science, 2007, 37(1–2): 58–66

[45]	Onyelowe K C , Jalal F E , Onyia M E , Onuoha I C , Alaneme G U . Application of gene expression programming to evaluate strength characteristics of hydrated-lime-activated rice husk ash-treated expansive soil. Applied Computational Intelligence and Soft Computing, 2021, 2021: 1–17

[46]	Jeremiah J J , Abbey S J , Booth C A , Kashyap A . Results of application of artificial neural networks in predicting geo-mechanical properties of stabilised clays-a review. Geotechnics, 2021, 1(1): 147–171

[47]	Jalal F E , Xu Y , Iqbal M , Javed M F , Jamhiri B . Predictive modeling of swell-strength of expansive soils using artificial intelligence approaches: ANN, ANFIS and GEP. Journal of Environmental Management, 2021, 289: 112420

[48]	Onyelowe K C , Iqbal M , Jalal F E , Onyia M E , Onuoha I C . Application of 3-algorithm ANN programming to predict the strength performance of hydrated-lime-activated rice husk ash-treated soil. Multiscale and Multidisciplinary Modeling Experiments and Design, 2021, 4(4): 259–274

[49]	Iqbal M , Onyelowe K C , Jalal F E . Smart computing models of California bearing ratio, unconfined compressive strength, and resistance value of activated ash-modified soft clay soil with adaptive neuro-fuzzy inference system and ensemble random forest regression techniques. Multiscale and Multidisciplinary Modeling Experiments and Design, 2021, 4(3): 207–225

[50]	Ferreira C . Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. 1st ed. London: Springer, 2006

[51]	Samadi M , Sarkardeh H , Jabbari E . Prediction of the dynamic pressure distribution in hydraulic structures using soft computing methods. Soft Computing, 2021, 25(5): 3873–3888

[52]	Torabi M , Sarkardeh H , Mirhosseini S M . Prediction of soil permeability coefficient using the GEP approach. Numerical Methods in Civil Engineering, 2022, 7(1): 9–15

[53]	Friedman J H . Multivariate adaptive regression splines. Annals of Statistics, 1991, 19(1): 1–67

[54]	Samadi M , Jabbari E , Azamathulla H M , Mojallal M . Estimation of scour depth below free overfall spillways using multivariate adaptive regression splines and artificial neural networks. Engineering Applications of Computational Fluid Mechanics, 2015, 9(1): 291–300

[55]	Ghasemi M , Samadi M , Soleimanian E , Chau K W . A comparative study of black-box and white-box data-driven methods to predict landfill leachate permeability. Environmental Monitoring and Assessment, 2023, 195(7): 862

[56]	Torabi M , Sarkardeh H , Mirhosseini S M . Estimating the permeability coefficient of soil using CART and GMDH approaches. Water Science and Technology: Water Supply, 2022, 22(8): 6756–6764

[57]	Shipra E H , Rahaman M S , Ara T , Ullah S M . A machine learning approach to forecast wind speed based on geographical location in Bangladesh. International Journal of Sustainable Energy and Environmental Research, 2024, 13(2): 83–94

[58]	Azma A , Borthwick A G , Ahmadian R , Liu Y , Zhang D . Modeling the discharge coefficient of labyrinth sluice gates using hybrid support vector regression and metaheuristic algorithms. Physics of Fluids, 2025, 37(4): 045117

[59]	Samadi M , Sarkardeh H , Jabbari E . Explicit data-driven models for prediction of pressure fluctuations occur during turbulent flows on sloping channels. Stochastic Environmental Research and Risk Assessment, 2020, 34(5): 691–707

[60]	Saberi-Movahed F , Najafzadeh M , Mehrpooya A . Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: training group method of data handling using extreme learning machine conceptions. Water Resources Management, 2020, 34(2): 529–561

[61]	Azma A , Liu Y , Eftekhari M , Zhang D . Comparison of hybrid deep learning models for estimation of the time-dependent scour depth downstream of river training structures. Physics of Fluids, 2024, 36(10): 101911

[62]	Ghanizadeh A R , Firouzranjbar S , Amlashi A T , Eid E , Dessouky S . Novel integration of forensic-based investigation optimization algorithm and ensemble learning for estimating hydraulic conductivity of coarse-grained road materials. Transportation Geotechnics, 2025, 54: 101624

[63]	Samadi M , Afshar M H , Jabbari E , Sarkardeh H . Prediction of current-induced scour depth around pile groups using MARS, CART, and ANN approaches. Marine Georesources and Geotechnology, 2021, 39(5): 577–588

RIGHTS & PERMISSIONS

Higher Education Press
