Compressive strength prediction and optimization design of sustainable concrete based on squirrel search algorithm-extreme gradient boosting technique

Enming LI; Ning ZHANG; Bin XI; Jian ZHOU; Xiaofeng GAO

doi:10.1007/s11709-023-0997-3

Front. Struct. Civ. Eng., 2023, Vol. 17, Issue 9: 1310–1325. DOI: 10.1007/s11709-023-0997-3

RESEARCH ARTICLE

Abstract

Concrete is the most commonly used construction material. However, its production leads to high carbon dioxide (CO₂) emissions and energy consumption. Therefore, developing waste-substitutable concrete components is necessary. Improving the sustainability and greenness of concrete is the focus of this research. In this regard, 899 data points were collected from existing studies where cement, slag, fly ash, superplasticizer, coarse aggregate, and fine aggregate were considered potential influential factors. The complex relationship between influential factors and concrete compressive strength makes the prediction and estimation of compressive strength difficult. Instead of the traditional compressive strength test, this study combines five novel metaheuristic algorithms with extreme gradient boosting (XGB) to predict the compressive strength of green concrete based on fly ash and blast furnace slag. The intelligent prediction models were assessed using the root mean square error (RMSE), coefficient of determination (R²), mean absolute error (MAE), and variance accounted for (VAF). The results indicated that the squirrel search algorithm-extreme gradient boosting (SSA-XGB) yielded the best overall prediction performance with R² values of 0.9930 and 0.9576, VAF values of 99.30 and 95.79, MAE values of 0.52 and 2.50, RMSE of 1.34 and 3.31 for the training and testing sets, respectively. The remaining five prediction methods yield promising results. Therefore, the developed hybrid XGB model can be introduced as an accurate and fast technique for the performance prediction of green concrete. Finally, the developed SSA-XGB considered the effects of all the input factors on the compressive strength. The ability of the model to predict the performance of concrete with unknown proportions can play a significant role in accelerating the development and application of sustainable concrete and furthering a sustainable economy.

Keywords

sustainable concrete / fly ash / slag / extreme gradient boosting technique / squirrel search algorithm / parametric analysis

1 Introduction

Concrete is the most commonly used construction material after water and is widely used in various engineering projects owing to its excellent mechanical properties and durability. Ordinary Portland Cement (OPC) is the most important cementitious material in concrete, and its production is accompanied by high energy consumption and carbon emissions. According to statistics, industries related to cement production account for approximately 4%–6% of global CO₂ emissions and consume 3% of the world’s energy [1]. In the past, engineers overemphasized the robustness of concrete and ignored the environmental burden of its production. Developing greener and lower-carbon construction materials has recently become a key consideration in energy conservation and sustainable production [2,3]. Consequently, studies on green concrete have developed rapidly over the last decade [4–6]. Different strategies have been proposed to produce low-carbon and low-energy concrete, including replacing natural aggregates with recycled aggregates [6] or introducing industrial waste into concrete [7]. Using supplementary cementitious materials (SCMs) instead of OPC is the most widely discussed method and has been successfully applied to various engineering infrastructures. SCMs are a series of materials with pozzolanic properties, such as silica fume (SF), fly ash (FA), blast furnace slag (BS), and steel slag. Most of them originate from industrial waste; therefore, their application in concrete also provides an effective route for disposing of such waste. Among these wastes, FA and BS are the two most frequently used materials for preparing sustainable concrete (SC) [1,7,8].

FA is a major waste product of thermal power production. In the past, much FA was released into the air, leading to serious environmental disasters. Since 1985, however, adding FA to concrete has been found to improve the late strength and workability of concrete and to reduce its carbon footprint [8]. Sun et al. [9] found that replacing cement with FA at a mass ratio of 40%–70% resulted in lower early compressive strengths of concrete. The high-volume FA (HVFA) concrete showed a 180-d compressive strength comparable to that of the reference concrete. Kara de Maeijer et al. [10] compared the effects of two types of FA, one with particle size d90 < 9.3 μm (FA1) and the second with d90 < 4.6 μm (FA2), on the properties of concrete. Increasing the fineness of FA improves the workability of the mixture. Replacing cement with ultrafine FA (FA2) can improve the resistance of concrete to penetration.

BS, a byproduct collected during the production of iron in blast furnaces, is a common SCM used to replace OPC in concrete. The first commercial slag-based cement was introduced in 1865, and over 200 million tons of slag cement are now utilized worldwide annually [1]. Concrete with slag has a low heat of hydration and high late strength and durability [11].

Guo et al. [12] reported that the addition of BS weakened the brittleness of concrete, which helped improve its fracture energy. Moreover, the potential hydration of BS extends the fatigue life of concrete. BS is also used in ultra-high-performance concrete (UHPC), and the hardenability of BS-based UHPC can be increased by 40% and 60% under standard water curing and high-temperature curing conditions, respectively. The addition of high-volume BS resulted in a denser UHPC microstructure. Afroughsabet et al. [13] concluded that incorporating FA as a fine aggregate (FAG) in concrete can effectively improve its strength and reduce shrinkage and creep. The compressive strength of the concrete with 30% of the sand replaced by FA increased by 45%, and that of the concrete with 30% of the sand replaced by BS increased by 112%. The concrete with 10% FA exhibited the lowest drying shrinkage. In addition, the CO₂ emissions of the concrete in which BS replaced 30% of the sand were approximately 50% of those of the control concrete.

The properties of concrete are influenced by several factors, including the proportions of alternative cement and aggregates, curing time, and water-to-binder ratio. The traditional approach clarifies the effects of different factors on the mechanical properties of concrete through experimental trial and error. However, this approach is often time-consuming and involves high experimental costs. Therefore, it is important to establish reliable models to predict the relationship between mechanical properties and material proportions, which also promotes the large-scale application of green concrete and the development of a low-carbon economy.

Machine learning is currently used to construct prediction models for complex problems and has been shown to achieve a remarkably high level of accuracy. It has also been adopted to predict various properties of concrete with very high accuracy [14–18]. Feng et al. [14] used the AdaBoost algorithm to predict the compressive strength of concrete. The input data considered different components of the mix and curing times. The best model achieved an average accuracy of > 98%. Song et al. [15] fabricated concrete cylinders with several mix ratios and tested their compressive strengths at different ages, collecting a total of 98 data sets. Gene expression programming (GEP), artificial neural network (ANN), and decision tree (DT) algorithms were investigated for the prediction of compressive strength. In addition, the bagging algorithm predicted the results with a coefficient of determination (R²) of 0.95, higher than those of the other models. Kang et al. [16] developed intelligent strength prediction models for steel using boosting- and tree-based models. Shariati et al. [17] proposed intelligent prediction models for the compressive strength of concrete containing BS and FA and found that ANN-GE presented a better prediction performance than the conventional backpropagation neural network. Chopra et al. [18] applied an ANN and GEP to predict the compressive strength of concrete. The concrete data were obtained from laboratory experiments with curing times of 28–91 d. The highest prediction performance was achieved using the Levenberg–Marquardt algorithm, with R² > 0.8.

Good prediction outcomes can be achieved by utilizing machine-learning-based models; however, some advanced machine-learning techniques still have the potential to provide more accurate predictions of concrete strength. In addition, most studies focused only on compressive strength prediction and did not provide a reasonable concrete optimization design. This study proposes hybrid extreme gradient boosting (XGB) models to predict the compressive strength of SC based on FA and BS. A total of six optimization algorithms were employed: five novel meta-heuristic optimization algorithms, namely the chimp optimization algorithm (CHOA), jellyfish search algorithm (JSA), golden jackal optimization (GJO), sand cat swarm optimization (SCSO), and sparrow search algorithm (SSA), and one classical optimization algorithm, the genetic algorithm (GA). Combined with 10-fold cross-validation, they were used to tune the XGB parameters and enhance the robustness of the prediction model for the concrete compressive strength. The prediction models are evaluated using classical mathematical indicators and Taylor diagrams. Finally, the prediction model with the best overall prediction performance was utilized to provide a concrete optimization scenario.

2 Materials

2.1 Data description and analysis

To develop SC compressive strength prediction models, a concrete data set collected from the published literature was investigated [19–21]. This data set consists of 899 data points, where cement (C), water (W), FA, BS, FAG, coarse aggregate (CAG), superplasticizer (SP), and age (A) were considered as potential influential factors on the compressive strength. To demonstrate the data distribution and characteristics, Tab.1 provides the maximum value (Max), minimum value (Min), mean value (Mean), standard deviation (Std), and the 10th, 25th, 50th, 75th, and 90th percentile values (10%, 25%, 50%, 75%, and 90%) of each influential factor and of the compressive strength. In addition, a violin plot was constructed to present the distribution of each variable visually, as shown in Fig.1. In the violin plot, the thick black bar in the middle represents the interquartile range, the extended thin black line denotes the 95% confidence interval, and the white points represent the median value. The width of the violin plot reflects the data density.

Three major correlation measures (Pearson, Spearman, and Kendall), whose values range from −1 to +1, were employed to measure the correlation between pairs of variables. Zero indicates that the two variables are not correlated, positive values indicate a positive correlation, and negative values indicate a negative correlation. A larger absolute value indicates a stronger correlation. As shown in Fig.2, FA, W, CAG, and FAG showed negative correlations according to the three correlation coefficients. According to the three correlation approaches, the most correlated factors were C, A, and SP. BS, FA, and FAG were considered slightly correlated. However, BS and FA are indispensable additives in SC. Therefore, they were also retained as inputs for the prediction models of the SC compressive strength, and the strength is used as the only output.
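
The three coefficients can be illustrated with a minimal pure-Python sketch. The water and strength values below are hypothetical and are not taken from the data set; the Kendall function implements the simple tau-a variant without tie correction, which is an assumption, since the text does not state which variant was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

# Hypothetical values: water content and compressive strength
water = [150, 160, 170, 180, 190]
strength = [52, 47, 45, 38, 30]
print(pearson(water, strength), spearman(water, strength), kendall(water, strength))
```

For this monotonically decreasing toy series, the two rank-based coefficients reach −1, whereas the Pearson coefficient stays slightly above −1 because the relationship is not exactly linear.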

Mutual information (MI) was also calculated to measure the mutual dependence between two variables, which indicates the amount of information they share [22]. The MI results, computed using the 5-nearest-neighbors method, are shown in Fig. A in the Supplementary Materials. C and A have the strongest MI with the predicted variable, compressive strength, whereas FA has the least MI. To further reflect the correlation between different variables, the distance correlation was calculated, which overcomes the inability of Pearson’s linear correlation to capture nonlinear dependence. All variables presented positive nonlinear correlations, as shown in Fig. B in the Supplementary Materials. C and A still exhibited a stronger correlation with compressive strength than the other influencing factors.
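
The difference between distance correlation and Pearson correlation can be sketched as follows. This is a minimal implementation of the standard sample distance correlation for one-dimensional variables, written for illustration only; it is not the code used in the study, and the toy data are hypothetical.

```python
from math import sqrt

def _centered_distances(v):
    """Pairwise distance matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation; always lies between 0 and 1."""
    n = len(x)
    A, B = _centered_distances(x), _centered_distances(y)

    def mean_prod(P, Q):
        return sum(P[i][j] * Q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2, dvar_x, dvar_y = mean_prod(A, B), mean_prod(A, A), mean_prod(B, B)
    if dvar_x * dvar_y == 0:
        return 0.0
    return sqrt(max(dcov2, 0.0) / sqrt(dvar_x * dvar_y))

x = [-2, -1, 0, 1, 2]
print(distance_correlation(x, [2 * v + 1 for v in x]))  # linear dependence
print(distance_correlation(x, [v * v for v in x]))      # purely nonlinear dependence
```

For the quadratic toy relation, the Pearson coefficient is zero by symmetry, while the distance correlation remains clearly positive. Because the measure is non-negative by construction, it cannot distinguish the direction of a relationship.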

To identify and compare the importance of various factors in compressive strength, the cosine amplitude method proposed by Yang and Zhang [23] was utilized for the sensitivity analysis. The results of the sensitivity analysis are shown in Fig.3, where C, BS, and SF are the most important factors influencing the compressive strength of the concrete. This finding is consistent with the fact that the source of the compressive strength in concrete is primarily the hydration of cementitious materials, which continuously generates C-S-H gels and a dense matrix. Therefore, the results of the sensitivity analysis provide a good explanation of the current practical research phenomena.
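
A minimal sketch of the cosine amplitude method is given below, assuming the commonly used form r_ij = Σ x_ik x_jk / sqrt(Σ x_ik² · Σ x_jk²), in which each input and the output are treated as vectors over all samples. The cement and strength values are hypothetical.

```python
from math import sqrt

def cosine_amplitude(x, y):
    """Strength of relation r_ij between two data vectors (cosine of their angle)."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

# Hypothetical samples: cement content (kg/m^3) and compressive strength (MPa)
cement = [250.0, 300.0, 350.0, 400.0]
strength = [28.0, 35.0, 41.0, 50.0]
print(cosine_amplitude(cement, strength))
```

For non-negative data the value lies between 0 and 1, and a value closer to 1 indicates a stronger influence of the input on the output.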

Based on the aforementioned analysis, it can be concluded that all potential factors have some influence on compressive strength; therefore, eight influencing factors (C, BS, FA, W, SP, CAG, FAG, and A) were used as inputs, and compressive strength was used as the output.

3 Methodology

Over the last few years, XGB has shown powerful fitting abilities for complicated nonlinear relationships in various fields [24,25]. It has a prominent ability to handle large-scale data and low hardware resource requirements. In addition, its strong robustness motivates its use for developing intelligent prediction models of SC compressive strength. However, some significant parameters in XGB, such as the learning rate, max_depth, and Num_boosting_rounds, must be tuned carefully to produce more robust prediction models. For this purpose, five novel meta-heuristic optimization algorithms, i.e., CHOA, JSA, GJO, SCSO, and SSA, were employed to select the optimal parameters in XGB. In the following section, a brief introduction to XGB and the five meta-heuristic optimization algorithms is provided.

3.1 Extreme gradient boosting

XGB, developed by Chen and Guestrin [26], is a machine learning system that improves on Friedman’s gradient boosting. The Classification and Regression Tree (CART) is the base model used in XGB. The model generates a weak learner at each step and adds this weak learner to the overall model.

When using the XGB algorithm, there exists a data set D containing n samples and m features, which is formulated as follows:

$$D = \{(x_i, y_i) : i = 1, \ldots, n,\ x_i \in \mathbb{R}^m,\ y_i \in \mathbb{R}\}. \tag{1}$$

The predicted value $\hat{y}_i$ is given by an ensemble tree model represented by Eqs. (2) and (3):

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F, \tag{2}$$

$$F = \{f(x) = w_{s(x)}\} \quad (s : \mathbb{R}^m \to T,\ w \in \mathbb{R}^T), \tag{3}$$

where $f_k(x_i)$ represents the score of the k-th tree for the i-th observation in the data, $F$ denotes the space of regression trees, $s$ is the tree structure that maps an observation to a leaf index, and $w$ is the vector of leaf weights.

Therefore, it is necessary to evaluate how well the model fits the data. To this end, XGB sets up an objective function consisting of two components: a loss term representing the deviation of the model and a regularization term that prevents overfitting. The objective function and regularization term are as follows:

$$L(\phi) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \tag{4}$$

$$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2. \tag{5}$$

In both formulas, $l$ denotes a second-order differentiable loss function that measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i$, and $\Omega(f_k)$ represents the regularization term. The number of leaves of the tree is $T$ and the weight of each leaf is $w$. Two parameters, $\gamma$ and $\lambda$, are introduced to control the complexity of the tree.

An important feature of the model is that traditional optimization methods in Euclidean space cannot be applied to its parameters. As previously stated, the prediction of the model is the sum of the scores of each CART, and the predicted value of the ith instance at the tth iteration is introduced into the objective function as follows:

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \tag{6}$$

where $f_t$ is the tree added at the t-th iteration, chosen to minimize the objective function. A more detailed description of XGB and its features can be found in Ref. [26].
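
The additive prediction of Eq. (2) and the regularized objective of Eqs. (4) and (5) can be sketched on a toy ensemble. The snippet assumes a squared-error loss and uses hand-made two-leaf trees; the helper names (`make_stump`, `omega`, `objective`) and all numbers are hypothetical, and this is not the XGBoost library itself.

```python
def make_stump(feature, threshold, w_left, w_right):
    """A two-leaf regression tree: returns the leaf weight w_{s(x)} for a sample x."""
    def f(x):
        return w_left if x[feature] < threshold else w_right
    f.weights = [w_left, w_right]  # leaf weights w; T = 2 leaves
    return f

def predict(trees, x):
    """Eq. (2): the prediction is the sum of the scores of all trees."""
    return sum(tree(x) for tree in trees)

def omega(leaf_weights, gamma, lam):
    """Eq. (5): gamma * T + 0.5 * lambda * ||w||^2."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(trees, X, y, gamma, lam):
    """Eq. (4) with squared-error loss (an assumption made for this sketch)."""
    loss = sum((yi - predict(trees, xi)) ** 2 for xi, yi in zip(X, y))
    return loss + sum(omega(t.weights, gamma, lam) for t in trees)

# Hypothetical data: one feature (cement content) and strength in MPa
X = [[300.0], [400.0]]
y = [30.0, 45.0]
trees = [make_stump(0, 350.0, 28.0, 44.0), make_stump(0, 350.0, 1.0, 1.5)]
print(objective(trees, X, y, gamma=0.1, lam=0.01))
```

The second stump plays the role of $f_t$ in Eq. (6): it corrects the residual left by the first tree, and the regularization term penalizes both the number of leaves and the magnitude of the leaf weights.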

3.2 Meta-heuristic optimization algorithms

3.2.1 Chimp optimization algorithm

The CHOA is a metaheuristic algorithm introduced by Khishe and Mosavi [27]. This algorithm simulates the hunting activities of chimps. CHOA classifies chimps into four categories: attackers, barriers, chasers, and drivers. The attacker is the group leader. The other three classes help with hunting and follow in declining order of rank. The position update is given by Eqs. (7) and (8):

$$\begin{aligned} X_1(t+1) &= X_{attacker}(t) - a_1 \cdot d_{attacker}, \\ X_2(t+1) &= X_{barrier}(t) - a_2 \cdot d_{barrier}, \\ X_3(t+1) &= X_{chaser}(t) - a_3 \cdot d_{chaser}, \\ X_4(t+1) &= X_{driver}(t) - a_4 \cdot d_{driver}, \end{aligned} \tag{7}$$

$$X_{chimp}(t+1) = \frac{X_1 + X_2 + X_3 + X_4}{4}, \tag{8}$$

where t denotes the current iteration number, and the location of the chimp is adjusted based on the mean of the locations of the four types of chimps. Additional explanations for CHOA can be found in Ref. [27].
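
The update of Eqs. (7) and (8) can be sketched as follows. The coefficients $a_k$ and the distance vectors $d_k$ are not defined in this section, so the sketch takes them as given inputs; the function name and the numbers are hypothetical.

```python
def choa_update(leaders, d, a):
    """Eqs. (7)-(8): average of the four leader-guided candidate positions.

    leaders: positions of attacker, barrier, chaser and driver (four vectors)
    d:       the four distance vectors d_k (assumed to be precomputed)
    a:       the four scalar coefficients a_k (assumed to be precomputed)
    """
    candidates = [
        [x - a_k * d_kj for x, d_kj in zip(leader, d_k)]
        for leader, d_k, a_k in zip(leaders, d, a)
    ]
    dim = len(leaders[0])
    return [sum(c[j] for c in candidates) / 4 for j in range(dim)]

leaders = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
d = [[1.0, 0.0]] * 4
a = [1.0, 1.0, 1.0, 1.0]
print(choa_update(leaders, d, a))  # [1.5, 2.5]
```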

3.2.2 Jellyfish search algorithm

Inspired by the movement of jellyfish in the ocean, Chou and Truong [28] proposed the JSA in 2021. When jellyfish swarms hunt for food in the ocean, they move either with the ocean currents or within the swarm, and a time control mechanism continuously switches between these movements.

Jellyfish in the swarm move through both passive and active movements. Passive motion means that the jellyfish travel around their locations, as shown in Eq. (9). Active motion is described in Eq. (10).

$$X_i(t+1) = X_i(t) + rand(0,1) \cdot \delta \cdot (H_b - L_b), \tag{9}$$

$$X_i(t+1) = X_i(t) + rand(0,1) \cdot D. \tag{10}$$

This motion always occurs in the direction of the best food, as expressed in Eq. (11).

$$D = \begin{cases} X_i(t) - X_j(t), & \text{if } FF(X_i) < FF(X_j), \\ X_j(t) - X_i(t), & \text{else}. \end{cases} \tag{11}$$

C(t) is employed to shift between the ocean current and the passive and active motions, which can be represented by Eq. (12).

$$C(t) = \left(1 - \frac{t}{T_{max}}\right) \cdot (2 \cdot rand(0,1) - 1). \tag{12}$$

Each jellyfish continues to move to a better location within the swarm through active and passive motions, while the time control mechanism shifts between the motions. Two control settings, pop and T_max, are used.
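
Equations (9)–(12) can be sketched as three small functions. The thresholds on C(t) that decide which motion is applied are not given in this section and are therefore left out; drawing a single random scalar per move, and treating FF as a fitness function to be minimized, are assumptions of this sketch.

```python
import random

def time_control(t, t_max, rng):
    """Eq. (12): time control value C(t); its magnitude shrinks as t approaches t_max."""
    return (1 - t / t_max) * (2 * rng.random() - 1)

def passive_move(x, lower, upper, delta, rng):
    """Eq. (9): motion around the jellyfish's own location."""
    r = rng.random()
    return [xi + r * delta * (hb - lb) for xi, lb, hb in zip(x, lower, upper)]

def active_move(x_i, x_j, fitness, rng):
    """Eqs. (10)-(11): move along direction D relative to another jellyfish j."""
    if fitness(x_i) < fitness(x_j):
        direction = [a - b for a, b in zip(x_i, x_j)]
    else:
        direction = [b - a for a, b in zip(x_i, x_j)]
    r = rng.random()
    return [a + r * d for a, d in zip(x_i, direction)]

rng = random.Random(0)
sphere = lambda v: sum(c * c for c in v)  # hypothetical fitness function
print(time_control(20, 100, rng))
print(passive_move([5.0, 5.0], [0.0, 0.0], [10.0, 10.0], 0.1, rng))
print(active_move([1.0, 1.0], [3.0, 3.0], sphere, rng))
```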

3.2.3 Golden jackal optimization

By simulating the collaborative hunting behavior of golden jackals, Chopra and Mohsin Ansari [29] proposed the GJO algorithm in 2021. The social hunting behavior of the golden jackal is divided into two distinct stages: the exploration stage, which requires finding, sensing, and tracking prey, and the exploitation stage, which requires encircling and attacking. A female jackal is accompanied by a male jackal. The new positions of the male and female jackals are identified using Eqs. (13) and (14).

$$P_1(t) = P_M(t) - E \cdot |P_M(t) - rl \cdot Prey(t)|, \tag{13}$$

$$P_2(t) = P_{FM}(t) - E \cdot |P_{FM}(t) - rl \cdot Prey(t)|, \tag{14}$$

where $Prey(t)$ represents the prey’s location vector at iteration t, and $P_M(t)$ and $P_{FM}(t)$ denote the positions of the male and female jackals, respectively. The updated male and female jackal positions are $P_1(t)$ and $P_2(t)$. The jackal locations are adjusted by averaging $P_1(t)$ and $P_2(t)$ according to Eq. (15).

$$P(t+1) = \frac{P_1(t) + P_2(t)}{2}. \tag{15}$$

Exploitation stage: as the prey is pursued by jackals, the prey’s escape energy decreases, and the jackal pair surrounds the prey discovered in the exploration phase. They capture and consume their prey after encircling it. The following equations describe the actions of male and female jackals when hunting together:

$$P_1(t) = P_M(t) - E \cdot |rl \cdot P_M(t) - Prey(t)|, \tag{16}$$

$$P_2(t) = P_{FM}(t) - E \cdot |rl \cdot P_{FM}(t) - Prey(t)|, \tag{17}$$

where $P_1(t)$ and $P_2(t)$ reflect the updated locations of the jackal pair while encircling and attacking the prey.
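
The two stages differ only in where the factor rl is applied, which a short sketch makes explicit. The quantities E and rl are treated here as given scalars, the choice of stage is passed in by the caller, and applying the averaging of Eq. (15) in the exploitation stage as well is an assumption; none of these details are specified in this section.

```python
def gjo_update(male, female, prey, E, rl, exploring):
    """One position update following Eqs. (13)-(17).

    male, female, prey: position vectors; E and rl: given scalars (assumed inputs).
    exploring=True uses Eqs. (13)-(14); exploring=False uses Eqs. (16)-(17).
    """
    def step(p, q):
        if exploring:
            return p - E * abs(p - rl * q)
        return p - E * abs(rl * p - q)

    p1 = [step(m, q) for m, q in zip(male, prey)]
    p2 = [step(f, q) for f, q in zip(female, prey)]
    return [(a + b) / 2 for a, b in zip(p1, p2)]  # Eq. (15)

print(gjo_update([2.0], [4.0], [1.0], E=0.5, rl=2.0, exploring=True))   # [2.5]
print(gjo_update([2.0], [4.0], [1.0], E=0.5, rl=2.0, exploring=False))  # [0.5]
```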

3.2.4 Sand cat swarm optimization

Seyyedabbasi and Kiani [30] drew inspiration from the foraging behavior of sand cats in the desert and established the SCSO. Sand cats locate their prey using low-frequency noise and exhibit rapid hunting behaviors. The advantage of this algorithm is the balance between the exploration and exploitation phases.

The general sensitivity range of the sand cat to noise frequencies, which decreases linearly from 2 to 0 kHz, is expressed by Eq. (18). The sensitivity range of each sand cat is expressed by Eq. (19):

$$\vec{r_G} = s_M - \frac{2 \times s_M \times iter_c}{iter_{Max} + iter_{Max}}, \tag{18}$$

$$\vec{r} = \vec{r_G} \times rand(0,1). \tag{19}$$

The decisive parameters for determining the exploration and exploitation phases are expressed in Eq. (20).

$$\vec{R} = 2 \times \vec{r_G} \times rand(0,1) - \vec{r_G}. \tag{20}$$

The sand cat attacks its prey only when |R| is less than one; that is, it enters the exploitation phase. Otherwise, the sand cat remains in the search phase. The prey search and attack mechanisms used by the sand cat population are mainly described by the position and movement of each sand cat in Eq. (21).

$$\vec{X}(t+1) = \begin{cases} \vec{X_b}(t) - \vec{r} \cdot \vec{X}_{rnd} \cdot \cos\theta, & |R| < 1, \\ \vec{r} \cdot \left(\vec{X_b}(t) - rand(0,1) \cdot \vec{X_b}(t)\right), & |R| \geqslant 1, \end{cases} \tag{21}$$

where $\vec{X}_{rnd}$ represents a random position between the best and current positions, which helps avoid becoming trapped in local optima (Eq. (22)):

$$\vec{X}_{rnd} = |rand(0,1) \cdot \vec{X_b}(t) - \vec{X_c}(t)|, \tag{22}$$

where $\vec{X_b}$ indicates the optimal hunting position and $\vec{X_c}$ indicates the current location.
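
A minimal sketch of one SCSO step following Eqs. (18)–(22) is given below. Setting $s_M$ = 2 is inferred from the stated 2 to 0 kHz range, drawing θ uniformly from [0, 2π) is an assumption because the selection of θ is not described here, and the search branch is coded exactly as Eq. (21) is printed.

```python
import math
import random

def general_sensitivity(iter_c, iter_max, s_m=2.0):
    """Eq. (18): decreases linearly from s_m to 0 over the iterations."""
    return s_m - (2 * s_m * iter_c) / (iter_max + iter_max)

def scso_step(x_c, x_b, iter_c, iter_max, rng):
    """One position update for a sand cat at x_c given the best position x_b."""
    r_g = general_sensitivity(iter_c, iter_max)
    r = r_g * rng.random()               # Eq. (19)
    R = 2 * r_g * rng.random() - r_g     # Eq. (20)
    if abs(R) < 1:                       # exploitation: attack the prey
        theta = rng.uniform(0.0, 2 * math.pi)  # assumption: uniform angle
        new = []
        for b, c in zip(x_b, x_c):
            x_rnd = abs(rng.random() * b - c)  # Eq. (22)
            new.append(b - r * x_rnd * math.cos(theta))
        return new
    # search phase, second case of Eq. (21) as printed
    return [r * (b - rng.random() * b) for b in x_b]

rng = random.Random(0)
print(general_sensitivity(50, 100))                      # 1.0
print(scso_step([5.0, 5.0], [1.0, 2.0], 10, 100, rng))
```

At the final iteration the sensitivity is zero, so the step returns the best position unchanged, which reflects the gradual shift from exploration to exploitation.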

3.2.5 Sparrow search algorithm

Xue and Shen [31] established the SSA. The algorithm assumes a sparrow population of size N and searches for the optimal solution in a region of j dimensions. According to the division of labor, three identities exist in the sparrow community: producers, scroungers, and scouts.

The location update of the producers is described in Eq. (23):

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-iter}{\alpha \cdot iter_{Max}}\right), & \text{if } R_2 < ST, \\ X_{i,j}^{t} + Q \cdot L, & \text{if } R_2 \geqslant ST, \end{cases} \tag{23}$$

where $\alpha$ is a random number in the range 0 to 1, Q is a random number that follows a normal distribution, L is a unit vector, and $R_2 \in [0, 1]$ and $ST \in [0.5, 1]$ indicate the warning and safety values, respectively. When the warning value is less than the safe value, the environment is safe and producers have access to a wide area to search for food. Once the warning value exceeds the safe value, predators are present and all sparrows need to be guided by the producer to another safe place.

The location update Eq. (24) for the scroungers is:

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{w}^{t} - X_{i,j}^{t}}{i^2}\right), & \text{if } i > \dfrac{n}{2}, \\ X_{p}^{t+1} + |X_{i,j}^{t} - X_{p}^{t+1}| \cdot A^{+} \cdot L, & \text{otherwise}, \end{cases} \tag{24}$$

where $X_w$ refers to the worst position in the sparrow population and $X_p$ represents an ideal position for the producer to be located. When $i > n/2$, the scroungers cannot obtain food and energy and must move to another location. A represents a 1 × d matrix in which each element is randomly assigned to 1 or −1.

The sparrows responsible for scouting occupy less than 20% of the population, and details of their exact location movements are available in the study by Xue and Shen [31].
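
The producer update of Eq. (23) can be sketched as follows. The sketch covers only the producers; α is drawn from (0, 1] to avoid division by zero, Q is drawn from a standard normal distribution, and L is taken as a vector of ones. These three choices are assumptions about details the text leaves open.

```python
import math
import random

def producer_update(x, it, iter_max, st, rng):
    """Eq. (23): update one producer position x at iteration `it`."""
    r2 = rng.random()                 # warning value R2 in [0, 1)
    if r2 < st:                       # safe: contract the position and keep searching
        alpha = 1.0 - rng.random()    # random number in (0, 1]
        factor = math.exp(-it / (alpha * iter_max))
        return [xj * factor for xj in x]
    q = rng.gauss(0.0, 1.0)           # danger: random normally distributed jump
    return [xj + q * 1.0 for xj in x]  # L assumed to be a vector of ones

rng = random.Random(1)
print(producer_update([2.0, -4.0], it=10, iter_max=100, st=0.8, rng=rng))
```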

3.2.6 Genetic algorithm

For comparison, a classical GA was also employed [32]. The GA mimics the natural selection process and genetic mechanisms of biological evolution. The GA begins by randomly generating an initial population of chromosomes, each of which comprises multiple genes that represent the specific characteristics of the data. These chromosomes are considered as potential solutions to a given problem. The population is then updated through a series of iterations utilizing three primary operators (selection, crossover, and mutation) to preserve the optimal chromosomes for the next generation. The crossover operator involves the exchange of segments of the genetic code between two selected individuals, leading to the creation of new individuals by combining their genes. In contrast, the mutation operator randomly alters the value of specific genes in an individual, maintaining genetic diversity within the population. The GA continually evaluates the generated solutions and compares them with a fitness function until a predetermined number of iterations or a predefined threshold is reached, resulting in the identification of the optimal solution.
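
The selection, crossover, and mutation loop described above can be sketched with a minimal real-coded GA. Tournament selection, one-point crossover, uniform-reset mutation, and elitism are illustrative choices, not the configuration used in the study, and the fitness function is a hypothetical test function.

```python
import random

def genetic_algorithm(fitness, bounds, pop_size=20, generations=50,
                      mutation_rate=0.1, seed=0):
    """Minimize `fitness` over box `bounds`; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]

    def select():
        # Tournament selection: the fitter of two random individuals wins
        a, b = rng.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, len(bounds)) if len(bounds) > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[g] = rng.uniform(lo, hi)  # mutation: reset one gene
            children.append(child)
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history

sphere = lambda v: sum(c * c for c in v)  # hypothetical fitness function
best, history = genetic_algorithm(sphere, [(-5.0, 5.0)] * 3)
print(best, history[0], history[-1])
```

Because the best chromosome is copied into every new generation, the recorded best fitness never gets worse from one iteration to the next.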

3.3 Model verification and evaluation

Four evaluation indices were used to assess the accuracy of the model: R², variance accounted for (VAF), mean absolute error (MAE), and root mean square error (RMSE). Their equations are shown in Eqs. (25)–(28). These indices describe the differences between real and predicted values. The closer R² is to 1 and VAF is to 100%, the more accurate the prediction. Smaller MAE and RMSE values indicate better model performance.

$$R^2 = 1 - \frac{\sum_{i=1}^{NN} (y_i - y'_i)^2}{\sum_{i=1}^{NN} (y_i - \tilde{y})^2}, \tag{25}$$

$$VAF = \left[1 - \frac{var(y - y')}{var(y)}\right] \times 100\%, \tag{26}$$

$$MAE = \frac{1}{NN} \sum_{i=1}^{NN} |y'_i - y_i|, \tag{27}$$

$$RMSE = \sqrt{\frac{1}{NN} \sum_{i=1}^{NN} (y'_i - y_i)^2}, \tag{28}$$

where $y$ is the measured compressive strength, $y'$ and $\tilde{y}$ are the predicted and average compressive strengths, respectively, NN is the total number of samples, and i is the current sample number.
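
The four indices can be computed with a few lines of Python. The sketch follows the standard definitions of R², VAF, MAE, and RMSE; the measured and predicted strengths are hypothetical.

```python
from math import sqrt

def _var(v):
    """Population variance."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def r2(y, pred):
    mean = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, pred))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def vaf(y, pred):
    residuals = [a - p for a, p in zip(y, pred)]
    return (1 - _var(residuals) / _var(y)) * 100

def mae(y, pred):
    return sum(abs(p - a) for a, p in zip(y, pred)) / len(y)

def rmse(y, pred):
    return sqrt(sum((p - a) ** 2 for a, p in zip(y, pred)) / len(y))

measured = [30.0, 40.0, 50.0]   # hypothetical strengths in MPa
predicted = [32.0, 38.0, 50.0]
print(r2(measured, predicted), vaf(measured, predicted),
      mae(measured, predicted), rmse(measured, predicted))
```

When the residuals have zero mean, as in this toy example, VAF equals 100 × R², which is why the two indices often carry nearly the same information.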

4 Model development and discussion

4.1 Development of hybrid sustainable concrete compressive strength prediction models and discussion

To develop the XGB-based prediction models, the SC data set was randomly divided into training and testing sets to ensure the generalization of the developed SC compressive strength prediction models. Generally, an overly large training set tends to cause overfitting, whereas an overly small training set cannot produce a model with strong generalization ability. The ratio of the training set to the testing set is commonly set to 80%/20% in previous studies [33,34]. However, to enhance the generalization of the developed SC strength prediction model, a ratio of 70%/30% for the training and testing sets was used in this study. The general data distributions for the training and testing sets are shown in Fig. C in Supplementary Materials. The training and testing sets have similar data distributions. On the one hand, this reflects that the whole SC data set is comparatively uniform. On the other hand, it helps ensure that the developed prediction model is not unduly influenced by outliers. The training set was used to establish the prediction model, and a 10-fold cross-validation method was applied to the training set to verify the reliability of the trained model [35]. Six optimization algorithms were employed to search for and optimize three significant parameters in the XGB-based models: the minimum child weight (Min_child_weight), the learning rate (eta), and the number of trees (Num_tree). The other parameters were kept constant to limit the workload. Before employing the meta-heuristic optimization algorithms, several parameters that influence the optimization efficiency need to be defined, and they were set according to the original references. The detailed values are listed in Table A in Supplementary Materials. To optimize the aforementioned three parameters in XGB, five hybrid XGB models were initialized.

In the initialization process, candidate positions of the three parameters in XGB are generated according to the swarm size and the upper and lower search bounds, where the swarm size signifies the number of candidate positions and the bounds determine the search ranges of the three parameters. In this study, the swarm size was set to 50. The search bounds were set to [0,10], [0,10], and [0,800] for Min_child_weight, the learning rate, and Num_tree, respectively. Min_child_weight is used to avoid overfitting: a larger value prevents the model from fitting local patterns, but if the value is too high, underfitting will occur. The learning rate is used to update the leaf node weights; higher learning rates can cause divergence. Num_tree represents the number of trees. After obtaining 50 positions of the three parameters, the corresponding fitness values were obtained, and the best parameter position was selected. In this study, the mean squared error was set as the fitness value. Subsequently, the parameter positions were updated according to the different optimization algorithms until the maximum number of iterations was reached. The general principles of these five optimization scenarios, as well as the general process of this study, are shown in Fig.4. The three parameters optimized in XGB using the five optimization algorithms are listed in Table A in the Appendix.
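
The initialization step, in which 50 candidate positions are scored by a cross-validated MSE and the best one is kept, can be sketched as follows. To stay self-contained, a one-parameter ridge regression on synthetic data stands in for XGB and its three parameters; the model, the data, and all names are hypothetical, and only the evaluation pattern (10 folds, swarm size 50, bounds [0,10], MSE fitness) mirrors the text.

```python
import random

def kfold_indices(n, k, rng):
    """Shuffle the sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mse(lam, X, y, folds):
    """Fitness of one candidate: MSE over all held-out samples of the k folds."""
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        # Stand-in model: ridge regression through the origin with penalty lam
        w = sum(X[i] * y[i] for i in train) / (sum(X[i] ** 2 for i in train) + lam)
        sq_err += sum((y[i] - w * X[i]) ** 2 for i in fold)
    return sq_err / len(X)

rng = random.Random(42)
X = [rng.uniform(1.0, 10.0) for _ in range(100)]      # synthetic input
y = [3.0 * x + rng.gauss(0.0, 1.0) for x in X]        # synthetic target
folds = kfold_indices(len(X), 10, rng)                # 10-fold cross-validation

swarm = [rng.uniform(0.0, 10.0) for _ in range(50)]   # 50 positions within [0, 10]
fitness = [cv_mse(lam, X, y, folds) for lam in swarm]
best_position = swarm[fitness.index(min(fitness))]
print(best_position, min(fitness))
```

In a meta-heuristic search, this best position would then guide the position updates of the next iteration, and the same fitness function would be evaluated again for every updated position.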

The convergence line with iterations for different optimization methods is shown in Fig.5(a). At the beginning of the optimization process, the mean square error (MSE) from 10-fold cross validation was used as the initial fitness value, as shown in Fig.5(b). As the optimization proceeds, the fitness decreases; the final fitness values are shown in Fig.5(c). It can be found that different optimization scenarios presented different optimization speeds and fitness values and thus, caused different model performances. Four mathematical indicators were employed to evaluate the performance of the test sets, as listed in Tab.2: to measure the overall performance of each model, the comprehensive evaluation system proposed was utilized [24], where a higher performance was given a higher score, and the model with the highest cumulative score was considered the best prediction model. As a result, the six developed hybrid XGB models presented desirable prediction potential for SC compressive strength. By means of a systematic evaluation method, the performance of six hybrid XGB-based models can be ranked from high to low: JS-XGB (R²: 0.9937; VAF: 99.37; RMSE: 1.27; MAE: 0.40), SSA-XGB (R²: 0.9930; VAF: 99.30; RMSE: 1.34; MAE: 0.52), GJO-XGB (R²: 0.9927; VAF: 99.27; RMSE: 1.36; MAE: 0.53), GA-XGB (R²: 0.9919; VAF: 99.19; RMSE: 1.44; MAE: 0.62), CHOA-XGB (R²: 0.9916; VAF: 99.16; RMSE: 1.46; MAE: 0.69), SCSO-XGB (R²: 0.9889; VAF: 98.89; RMSE: 1.68; MAE: 0.91) for the training set and SCSO-XGB (R²: 0.9613; VAF: 96.21; RMSE: 2.48; MAE: 2.48), GA-XGB (R²: 0.9597; VAF: 95.96; RMSE: 3.25; MAE: 2.46), SSA-XGB (R²: 0.9576; VAF: 95.79; RMSE: 3.31; MAE: 2.50), GJO-XGB (R²: 0.9536; VAF: 95.46; RMSE: 3.45; MAE: 2.54), CHOA-XGB (R²: 0.9526; VAF: 95.36; RMSE: 3.49; MAE: 2.66), JS-XGB (R²: 0.9523; VAF: 95.33; RMSE: 3.51; MAE: 2.67) for the testing set. For the training set, JS-XGB exhibited the best performance, in which each evaluation indicator was the best among the six prediction models. 
However, for the testing set, JS-XGB exhibited the worst performance for each evaluation indicator, and SCSO-XGB provided the best overall performance. Considering the training and testing sets together, SSA-XGB achieved the best overall performance, with R²: 0.9930; VAF: 99.30; RMSE: 1.34; MAE: 0.52 for the training set and R²: 0.9576; VAF: 95.79; RMSE: 3.31; MAE: 2.50 for the testing set. The cumulative scores for each prediction model are shown in Fig. D in the Supplementary Materials. Comparisons of the predicted and original results for the six SC compressive strength prediction models are shown in Figs. E and F in the Appendix for the training and testing sets, respectively. In these figures, five reference lines are drawn: y = x, y = 1.1x, y = 0.9x, y = 1.2x, and y = 0.8x. For the training set, most predicted points lie between y = 0.9x and y = 1.1x. Some lie between y = 1.1x and y = 1.2x or between y = 0.8x and y = 0.9x, and only one or two points fall outside y = 1.2x. These results indicate that the trained models fit the SC compressive strength very well. For the testing set, several predicted points fell outside the range between y = 0.8x and y = 1.2x, whereas most predicted points agreed well with the original values, which indicates that the developed prediction models generalize well to new SC compressive strength data. To verify the superiority of the six proposed prediction models, several non-optimized classical machine learning methods were also used to predict the SC compressive strength: support vector regression (SVR), random forest (RF), ANN, and extreme learning machine (ELM). Two state-of-the-art machine learning models, LightGBM and CatBoost, were also evaluated [36]. The comparison results for the training and testing sets are shown in the Radar Chart in Fig.6.
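The cumulative scoring described above can be sketched in a few lines of Python. The metric values are the ones quoted in the text; the scoring rule itself is an assumption made for illustration (with six models, the best model on an indicator receives 6 points and the worst receives 1, summed over the four indicators and both data sets), and the names `rank_scores`, `TRAIN`, and `TEST` are hypothetical. The exact rule behind Fig. D may differ.

```python
# Metric values are copied from the training/testing results quoted in the text.
TRAIN = {
    "JS-XGB":   {"R2": 0.9937, "VAF": 99.37, "RMSE": 1.27, "MAE": 0.40},
    "SSA-XGB":  {"R2": 0.9930, "VAF": 99.30, "RMSE": 1.34, "MAE": 0.52},
    "GJO-XGB":  {"R2": 0.9927, "VAF": 99.27, "RMSE": 1.36, "MAE": 0.53},
    "GA-XGB":   {"R2": 0.9919, "VAF": 99.19, "RMSE": 1.44, "MAE": 0.62},
    "CHOA-XGB": {"R2": 0.9916, "VAF": 99.16, "RMSE": 1.46, "MAE": 0.69},
    "SCSO-XGB": {"R2": 0.9889, "VAF": 98.89, "RMSE": 1.68, "MAE": 0.91},
}
TEST = {
    "SCSO-XGB": {"R2": 0.9613, "VAF": 96.21, "RMSE": 2.48, "MAE": 2.48},
    "GA-XGB":   {"R2": 0.9597, "VAF": 95.96, "RMSE": 3.25, "MAE": 2.46},
    "SSA-XGB":  {"R2": 0.9576, "VAF": 95.79, "RMSE": 3.31, "MAE": 2.50},
    "GJO-XGB":  {"R2": 0.9536, "VAF": 95.46, "RMSE": 3.45, "MAE": 2.54},
    "CHOA-XGB": {"R2": 0.9526, "VAF": 95.36, "RMSE": 3.49, "MAE": 2.66},
    "JS-XGB":   {"R2": 0.9523, "VAF": 95.33, "RMSE": 3.51, "MAE": 2.67},
}

# R2 and VAF improve as they grow; RMSE and MAE improve as they shrink.
HIGHER_IS_BETTER = {"R2": True, "VAF": True, "RMSE": False, "MAE": False}


def rank_scores(results):
    """Return {model: summed rank score} for one data set.

    Assumed rule (not confirmed by the source): for each indicator the worst
    model gets 1 point and the best gets len(results) points. Ties, which do
    not occur in this data, would be broken by dictionary order.
    """
    scores = {name: 0 for name in results}
    for metric, higher in HIGHER_IS_BETTER.items():
        # Sort from worst to best so that (position + 1) is the score.
        ordered = sorted(results, key=lambda m: results[m][metric],
                         reverse=not higher)
        for position, name in enumerate(ordered):
            scores[name] += position + 1
    return scores


train_scores = rank_scores(TRAIN)
test_scores = rank_scores(TEST)
total_scores = {m: train_scores[m] + test_scores[m] for m in TRAIN}
best_model = max(total_scores, key=total_scores.get)

for model, score in sorted(total_scores.items(), key=lambda kv: -kv[1]):
    print(f"{model:9s} train={train_scores[model]:2d} "
          f"test={test_scores[model]:2d} total={score:2d}")
```

Under this assumed rule, SSA-XGB receives the largest total (36 of a possible 48), followed by GA-XGB (33). This agrees with the recommendation in the text, although neither the best training model (JS-XGB) nor the best testing model (SCSO-XGB) comes out on top overall.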

According to the Radar Chart, the hybrid XGB models presented better overall performance than the non-optimized machine learning techniques, which reflects the effectiveness of the optimization algorithms and the powerful fitting ability of XGB. Among the classical machine learning techniques, however, SVR modeled the training set well, and RF presented competitive prediction performance for the testing set. In addition, LightGBM and CatBoost showed promising prediction abilities for both the training and testing sets. Therefore, the predictive potential of these advanced techniques should be investigated in future studies. The prediction performance of the non-optimized and hybrid XGB models was also examined using a Taylor Diagram, as shown in Fig.7. Three statistical indicators are used in the Taylor Diagram: RMSE, Pearson correlation coefficient, and standard deviation. A smaller distance between a model's point and the reference point (black dot) signifies better prediction performance. The hybrid XGB models lie closer to the reference point than the non-optimized models. As in the Radar Chart, RF, LightGBM, and CatBoost also show desirable prediction performance. Combined with optimization algorithms, RF, LightGBM, and CatBoost could also provide high prediction accuracy for the SC compressive strength. Finally, SSA-XGB is recommended for predicting the SC compressive strength because of its excellent performance on both the training and testing sets.
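The three statistics behind a Taylor Diagram can be computed as in the following sketch. It assumes the usual construction, in which the RMSE shown on the diagram is the centered one (the mean bias is removed first), so that the three quantities are tied together by the law of cosines: E'² = σ_p² + σ_r² − 2σ_pσ_rR. The function name `taylor_statistics` and the sample strength values are hypothetical and are not data from this study.

```python
import math


def taylor_statistics(reference, predicted):
    """Return (std_ref, std_pred, correlation, centered_rmse).

    Population (divide-by-n) statistics are used so that the law-of-cosines
    identity of the Taylor Diagram holds exactly.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    dev_r = [r - mean_r for r in reference]
    dev_p = [p - mean_p for p in predicted]

    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    covariance = sum(a * b for a, b in zip(dev_r, dev_p)) / n
    correlation = covariance / (std_r * std_p)

    # Centered RMSE: RMS difference after removing each series' mean.
    centered_rmse = math.sqrt(
        sum((b - a) ** 2 for a, b in zip(dev_r, dev_p)) / n
    )
    return std_r, std_p, correlation, centered_rmse


# Illustrative values only (MPa); not taken from the paper's data set.
measured = [20.0, 35.0, 41.0, 55.0, 62.0]
predicted = [22.0, 33.0, 44.0, 52.0, 65.0]

std_r, std_p, corr, crmse = taylor_statistics(measured, predicted)
law_of_cosines = std_r ** 2 + std_p ** 2 - 2.0 * std_r * std_p * corr
print(f"std_ref={std_r:.3f} std_pred={std_p:.3f} R={corr:.4f} E'={crmse:.3f}")
```

On the diagram, the distance from a model's point to the reference point equals the centered RMSE, which is why a shorter distance indicates a better model.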

4.2 Sustainable concrete optimization design based on optimized extreme gradient boosting model

Appropriate design of the component ratio of the SC is key to improving the mechanical properties and reducing waste material consumption. Parametric analysis (PA) was conducted with the SSA-XGB prediction model to determine the effect of each input variable on the output variable. The results can be used in preliminary studies of SC mixture designs as a quick estimate of the compressive strength. In the analysis, one input variable was varied from its lowest to its highest value within a specified interval while all other variables were held constant at their average values, and the resulting change in compressive strength was recorded. The results are shown in Fig.8. a) An increase in C had a beneficial influence on the compressive strength because more cement particles participated in the hydration reaction to form CSH, thereby enhancing the strength. b) A BS content of less than 300 kg/m³ has a positive effect on the compressive strength. The silicon dioxide content of the BS was greater than 95% [3]. The silicon dioxide in the BS combined with the calcium hydroxide formed from cement hydration to generate extra CSH, which enhanced the compressive strength. Because of its dilution effect, the addition of an excessive amount of BS to concrete has a detrimental effect on its compressive strength [37]. c) At an FA content of 200 kg/m³, the compressive strength decreased sharply, and it increased significantly when the FA content exceeded 200 kg/m³. This behavior may be due to the absence of data for FA contents in the 200 kg/m³ range and should be examined further in future studies. An increase in FA has a positive effect on the fluidity and matrix density of the concrete. 
d) Because the hydration of cement requires water, increasing W up to 140 kg/m³ improves the compressive strength; above this threshold, the compressive strength decreases. A further increase in W may create more voids inside the concrete, reducing its compressive strength. e) An increase in SP had little effect on the compressive strength; SP generally affects the workability of concrete [38]. f) CAG can have a detrimental impact on the compressive strength, probably because the introduction of CAG creates excessively weak interfacial transition zones [39]. The compressive strength of concrete is also closely related to the type and size of the CAG [40]. g) FAG at 600−650 kg/m³ had a positive effect on the compressive strength; however, further increases did not affect the compressive strength [41]. h) The compressive strength of SC increases fastest at an early age owing to the rapid hydration reaction, and it continues to grow at a late age owing to the pozzolanic effect of BS and FA [11].
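The one-variable-at-a-time procedure used for the parametric analysis can be sketched as follows. This is a minimal illustration under stated assumptions: `parametric_sweep` is a hypothetical helper, `toy_predict` is an invented stand-in for a trained model (it is not the SSA-XGB model and its coefficients carry no physical meaning), and the three mixes are made-up values. With a real model, `predict` would wrap the trained regressor's prediction call.

```python
def parametric_sweep(predict, rows, feature, n_points=5):
    """Vary one input from its observed minimum to maximum while all other
    inputs are held at their mean values; return [(value, prediction), ...].
    """
    names = list(rows[0].keys())
    # Baseline mix: every variable fixed at its average over the data set.
    baseline = {k: sum(r[k] for r in rows) / len(rows) for k in names}
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)

    curve = []
    for i in range(n_points):
        value = lo + (hi - lo) * i / (n_points - 1)
        mix = dict(baseline)       # copy so the baseline is not modified
        mix[feature] = value       # only the investigated variable changes
        curve.append((value, predict(mix)))
    return curve


def toy_predict(mix):
    """Hypothetical stand-in model: strength rises with cement content C and
    peaks at a water content W of 140 kg/m3. For demonstration only."""
    return 10.0 + 0.08 * mix["C"] - 0.002 * (mix["W"] - 140.0) ** 2


# Made-up mixes (kg/m3) used only to exercise the function.
mixes = [
    {"C": 200.0, "W": 130.0},
    {"C": 300.0, "W": 160.0},
    {"C": 400.0, "W": 190.0},
]

cement_curve = parametric_sweep(toy_predict, mixes, "C")
water_curve = parametric_sweep(toy_predict, mixes, "W")
for value, strength in cement_curve:
    print(f"C = {value:6.1f} kg/m3 -> predicted strength {strength:6.2f}")
```

Because the other inputs stay at their means, such curves show marginal trends only; interactions between variables (for example between W and C) are not captured by this kind of sweep.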

5 Conclusions

In this study, an SC-related data set was extracted from an existing concrete data set. It involves eight influential factors: cement content, BS, FA, water, SP, CAG, FAG, and testing age. The SC compressive strength is the only prediction target. Six XGB-based regression methods were used to model the SC compressive strength: XGB provided the basic prediction framework, in which three XGB-related parameters needed to be tuned, and six optimization algorithms (CHOA, GJO, JS, SCSO, SSA, and GA) were applied to optimize these three parameters. SSA-XGB obtained the highest ranking score among the six hybrid models and is therefore preferred for predicting the SC compressive strength. To verify the superiority of the hybrid XGB models, six non-optimized machine-learning techniques were tested and compared: SVR, RF, ELM, ANN, LightGBM, and CatBoost. A Radar Chart and a Taylor Diagram were used to compare the prediction results. The six developed hybrid prediction models can be used to predict the SC compressive strength in cases with similar conditions. The generalization of the developed models is expected to improve as the data set grows. In addition, these modeling methods can be applied to other types of concrete or concrete-related topics. Given the limited errors and acceptable prediction accuracies, the prediction model implemented in this study can be used as a first assessment of the SC compressive strength, thus saving raw materials, compressive tests, and curing time. Finally, the parametric analysis shows that W, C, and testing age (A) have a direct effect on the compressive strength of the SC. Increases in the amounts of FA and BS also contributed to the compressive strength. An increase in CAG decreased the strength, whereas FAG had little effect on the strength development. 
The developed SSA-XGB model was able to consider the effects of all the input factors on the compressive strength, and the results obtained were consistent with those of previous studies, indicating the reliability of the model. The ability of the model to predict the performance of concrete with unknown proportions can play a significant role in accelerating the development and application of SC and furthering a sustainable economy.

However, some shortcomings remain to be addressed in future studies. First, the potential of LightGBM and CatBoost should be investigated in view of their strong prediction abilities, so that more accurate prediction models for SC compressive strength can be developed. Other regression techniques, such as k-nearest neighbors and decision trees (DTs), may also be useful. Second, this study was based on only one random division of the training and testing sets; repeated cross-validation is worth investigating because it would provide more convincing prediction results. Finally, additional potential factors influencing the compressive strength should be investigated and used as inputs to provide a more robust prediction model.
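As an illustration of the repeated cross-validation mentioned above, the sketch below reshuffles the data before each round of k-fold splitting and averages the fold-wise MSE over all rounds. It is a minimal standard-library example, not the procedure used in this study: the helper names, the number of repeats, the seed, and the mean-value predictor used as a placeholder model are all assumptions.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for n_repeats rounds of k-fold CV,
    reshuffling the sample order before every round."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_splits):
            test_idx = order[fold::n_splits]   # every n_splits-th sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def repeated_cv_mse(fit, predict, X, y, n_splits=10, n_repeats=5, seed=0):
    """Average the test-fold MSE over all folds and repeats."""
    fold_errors = []
    for train_idx, test_idx in repeated_kfold_indices(
            len(y), n_splits, n_repeats, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        squared = [(predict(model, X[i]) - y[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Placeholder model (assumption): always predicts the training-set mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


# Synthetic data used only to exercise the functions.
X = [[float(i)] for i in range(50)]
y = [2.0 * i for i in range(50)]
cv_score = repeated_cv_mse(fit_mean, predict_mean, X, y)
print(f"repeated 10-fold CV MSE of the placeholder model: {cv_score:.2f}")
```

Averaging over several reshuffled rounds reduces the dependence of the reported error on any single random division of the data, which is the concern raised in the text.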

References

[1]	Amran M, Murali G, Khalid N H A, Fediuk R, Ozbakkaloglu T, Lee Y H, Haruna S, Lee Y Y. Slag uses in making an ecofriendly and sustainable concrete: A review. Construction & Building Materials, 2021, 272: 121942

[2]	Li J, Xu G. Circular economy towards zero waste and decarbonization. Circular Economy, 2022, 1(1): 100002

[3]	Zhang N, Xi B, Li J, Liu L, Song G. Utilization of CO₂ into recycled construction materials: A systematic literature review. Journal of Material Cycles and Waste Management, 2022, 24(6): 2108–2125

[4]	Sellami A, Merzoud M, Amziane S. Improvement of mechanical properties of green concrete by treatment of the vegetals fibers. Construction & Building Materials, 2013, 47: 1117–1124

[5]	Xi B, Zhou Y, Yu K, Hu B, Huang X, Sui L, Xing F. Use of nano-SiO₂ to develop a high performance green lightweight engineered cementitious composites containing fly ash cenospheres. Journal of Cleaner Production, 2020, 262: 121274

[6]	Shi C, Li Y, Zhang J, Li W, Chong L, Xie Z. Performance enhancement of recycled concrete aggregate—A review. Journal of Cleaner Production, 2016, 112: 466–472

[7]	Zhou Y, Xi B, Sui L, Zheng S, Xing F, Li L. Development of high strain-hardening lightweight engineered cementitious composites: Design and performance. Cement and Concrete Composites, 2019, 104: 103370

[8]	Malhotra V M. Durability of concrete incorporating high-volume of low-calcium (ASTM Class F) fly ash. Cement and Concrete Composites, 1990, 12(4): 271–277

[9]	Sun J, Shen X, Tan G, Tanner J E. Compressive strength and hydration characteristics of high-volume fly ash concrete prepared from fly ash. Journal of Thermal Analysis and Calorimetry, 2019, 136(2): 565–580

[10]	Kara de Maeijer P, Craeye B, Snellings R, Kazemi-Kamyab H, Loots M, Janssens K, Nuyts G. Effect of ultra-fine fly ash on concrete performance and durability. Construction & Building Materials, 2020, 263: 120493

[11]	Samad S, Shah A. Role of binary cement including Supplementary Cementitious Material (SCM), in production of environmentally sustainable concrete: A critical review. International Journal of Sustainable Built Environment, 2017, 6(2): 663–674

[12]	Guo L P, Sun W, Zheng K R, Chen H J, Liu B. Study on the flexural fatigue performance and fractal mechanism of concrete with high proportions of ground granulated blast-furnace slag. Cement and Concrete Research, 2007, 37(2): 242–250

[13]	Afroughsabet V, Biolzi L, Ozbakkaloglu T. Influence of double hooked-end steel fibers and slag on mechanical and durability properties of high performance recycled aggregate concrete. Composite Structures, 2017, 181: 273–284

[14]	Feng D C, Liu Z T, Wang X D, Chen Y, Chang J Q, Wei D F, Jiang Z M. Machine learning-based compressive strength prediction for concrete: An adaptive boosting approach. Construction & Building Materials, 2020, 230: 117000

[15]	Song H, Ahmad A, Farooq F, Ostrowski K A, Maślak M, Czarnecki S, Aslam F. Predicting the compressive strength of concrete with fly ash admixture using machine learning algorithms. Construction & Building Materials, 2021, 308: 125021

[16]	Kang M C, Yoo D Y, Gupta R. Machine learning-based prediction for compressive and flexural strengths of steel fiber-reinforced concrete. Construction & Building Materials, 2021, 266: 121117

[17]	Shariati M, Mafipour M S, Mehrabi P, Ahmadi M, Wakil K, Trung N T, Toghroli A. Prediction of concrete strength in presence of furnace slag and fly ash using Hybrid ANN-GA (Artificial Neural Network-Genetic Algorithm). Smart Structures and Systems, 2020, 25(2): 183–195

[18]	Chopra P, Sharma R K, Kumar M. Prediction of compressive strength of concrete using artificial neural network and genetic programming. Advances in Materials Science and Engineering, 2016, 2016: 1–10

[19]	Dao D V, Adeli H, Ly H B, Le L M, Le V M, Le T T, Pham B T. A sensitivity and robustness analysis of GPR and ANN for high-performance concrete compressive strength prediction using a monte carlo simulation. Sustainability, 2020, 12(3): 830

[20]	Yeh I C. Modeling concrete strength with augment-neuron networks. Journal of Materials in Civil Engineering, 1998, 10(4): 263–268

[21]	Yeh I C. Design of high-performance concrete mixture using neural networks and nonlinear programming. Journal of Computing in Civil Engineering, 1999, 13(1): 36–42

[22]	Paninski L. Estimation of entropy and mutual information. Neural Computation, 2003, 15(6): 1191–1253

[23]	Yang Y, Zhang Q. A hierarchical analysis for rock engineering using artificial neural networks. Rock Mechanics and Rock Engineering, 1997, 30(4): 207–222

[24]	Zhou J, Qiu Y, Armaghani D J, Zhang W, Li C, Zhu S, Tarinejad R. Predicting TBM penetration rate in hard rock condition: A comparative study among six XGB-based metaheuristic techniques. Geoscience Frontiers, 2021, 12(3): 101091

[25]	Duan J, Asteris P G, Nguyen H, Bui X N, Moayedi H. A novel artificial intelligence technique to predict compressive strength of recycled aggregate concrete using ICA-XGBoost model. Engineering with Computers, 2021, 37(4): 3329–3346

[26]	Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016, 785–794

[27]	Khishe M, Mosavi M R. Chimp optimization algorithm. Expert Systems with Applications, 2020, 149: 113338

[28]	Chou J S, Truong D N. A novel metaheuristic optimizer inspired by behavior of jellyfish in ocean. Applied Mathematics and Computation, 2021, 389: 125535

[29]	Chopra N, Mohsin Ansari M. Golden jackal optimization: A novel nature-inspired optimizer for engineering applications. Expert Systems with Applications, 2022, 198: 116924

[30]	Seyyedabbasi A, Kiani F. Sand Cat swarm optimization: A nature-inspired algorithm to solve global optimization problems. Engineering with Computers, 2022, 1–25

[31]	Xue J, Shen B. A novel swarm intelligence optimization approach: sparrow search algorithm. Systems Science & Control Engineering, 2020, 8(1): 22–34

[32]	Li S, Li D. Artificial Intelligence for Materials Science. Cham: Springer, 2021, 115–131

[33]	Li E, Zhou J, Shi X, Jahed Armaghani D, Yu Z, Chen X, Huang P. Developing a hybrid model of salp swarm algorithm-based support vector machine to predict the strength of fiber-reinforced cemented paste backfill. Engineering with Computers, 2021, 37(4): 3519–3540

[34]	Li E, Yang F, Ren M, Zhang X, Zhou J, Khandelwal M. Prediction of blasting mean fragment size using support vector regression combined with five optimization algorithms. Journal of Rock Mechanics and Geotechnical Engineering, 2021, 13(6): 1380–1397

[35]	van der Gaag M, Hoffman T, Remijsen M, Hijman R, de Haan L, van Meijel B, Vanharten P, Valmaggia L, Dehert M, Cuijpers A. The five-factor model of the Positive and Negative Syndrome Scale II: A ten-fold cross-validation of a revised model. Schizophrenia Research, 2006, 85(1–3): 280–287

[36]	Bentéjac C, Csörgő A, Martínez-Muñoz G. A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 2021, 54(3): 1937–1967

[37]	Chidiac S E, Panesar D K. Evolution of mechanical properties of concrete containing ground granulated blast furnace slag and effects on the scaling resistance test at 28 days. Cement and Concrete Composites, 2008, 30(2): 63–71

[38]	Mardani-Aghabaglou A, Tuyan M, Yılmaz G, Arıöz Ö, Ramyar K. Effect of different types of superplasticizer on fresh, rheological and strength properties of self-consolidating concrete. Construction & Building Materials, 2013, 47: 1020–1025

[39]	Rangaraju P R, Olek J, Diamond S. An investigation into the influence of inter-aggregate spacing and the extent of the ITZ on properties of Portland cement concretes. Cement and Concrete Research, 2010, 40(11): 1601–1608

[40]	Beshr H, Almusallam A, Maslehuddin M. Effect of coarse aggregate quality on the mechanical properties of high strength concrete. Construction & Building Materials, 2003, 17(2): 97–103

[41]	Kronlöf A. Effect of very fine aggregate on concrete strength. Materials and Structures, 1994, 27(1): 15–25

RIGHTS & PERMISSIONS

Higher Education Press
