Smart prediction: Hybrid random forest for high-volume fly ash self-compacting concrete strength

Shashikant KUMAR; Rakesh KUMAR; Sayan SIRIMONTREE; Divesh Ranjan KUMAR; Warit WIPULANUSAT; Suraparb KEAWSAWASVONG; Chanachai THONGCHOM

doi:10.1007/s11709-025-1184-5

Front. Struct. Civ. Eng. ›› 2025, Vol. 19 ›› Issue (6) :892 -918. DOI: 10.1007/s11709-025-1184-5

RESEARCH ARTICLE

Abstract

Sustainable development in the concrete industry necessitates a standardized framework for material development, despite promising experimental results. High-volume fly ash (HVFA) self-compacting concrete’s (SCC) strength characteristics are investigated in this study through the use of sophisticated modeling techniques such as random forest (RF), RF-particle swarm optimization, RF-Bayesian optimization, and RF-differential evolution (RF-DE). Cement was partially replaced with HVFA and silica fume (SF), enhancing fresh and hardened concrete properties such as compressive and split-tensile strengths, passing ability, and filler capacity. Input parameters included cement, SF, fly ash, T-500-time, maximum spread diameter, L-box blocking ratio, J-ring test, V-funnel time, and age. Statistical tools like uncertainty analysis, SHapley Additive exPlanations, and regression error characteristic curves validated the models. The RF-DE model showed the best predictive accuracy among them. Machine learning (ML) is great at predicting compressive strength (CS), but SCC-mix engineers have a hard time understanding it because of its “black-box” nature. To address this, an open-source graphical user interface based on RF-DE was developed, offering precise CS predictions for diverse mix conditions. This user-friendly tool empowers engineers to optimize mix proportions, supporting sustainable concrete design and facilitating the practical application of ML in the industry.

Keywords

high-volume fly ash / silica fume / self-compacting concrete / random forest / differential evolution / Bayesian optimization / particle swarm optimization

Cite this article

Shashikant KUMAR, Rakesh KUMAR, Sayan SIRIMONTREE, Divesh Ranjan KUMAR, Warit WIPULANUSAT, Suraparb KEAWSAWASVONG, Chanachai THONGCHOM. Smart prediction: Hybrid random forest for high-volume fly ash self-compacting concrete strength. Front. Struct. Civ. Eng., 2025, 19(6): 892-918 DOI:10.1007/s11709-025-1184-5

1 Introduction

Among the special features of self-compacting concrete (SCC) are outstanding pumpability, constructability, high flowability under its own weight, and consolidation around reinforcement without vibration [1]. Owing to these qualities, SCC is a valuable material on projects with congested reinforcement [2]. The qualities of SCC are typically characterized by fresh-state workability properties or rheological parameters [3]. Workability-based tests, such as slump flow, the L-box blocking ratio, the U-box, Orimet, and V-funnel time, are used to assess these attributes [4]. The quantities of the constituents used in SCC mixtures, including fly ash (FA), fine and coarse aggregate, high-range water reducers (HRWR), and supplementary cementitious materials (SCM), significantly affect the fresh and hardened properties of SCC [5].

In SCC, workability is typically improved and the required HRWR dosage is decreased when 15%–25% of the cement is replaced with fly ash (low-volume replacement) [6]. It has also been demonstrated that using high-volume fly ash (HVFA) improves the slump of the concrete mixture and reduces cracking, because FA lowers the heat of hydration of the mixture [7]. Therefore, foot bridges, dams, roads, marine constructions, and precast thin structures are built using SCC with HVFA [8]. The pozzolanic reactivity and filling capacity of microsilica make it useful for enhancing the hardness and durability characteristics of SCC. Permeability is a major factor in the durability of a number of these structures [9].

The need to reduce the harmful impacts of industrial waste and increasing environmental consciousness have led to a surge in the use of industrial byproducts in recent years [10]. The mixed composition of SCC achieves desirable levels of self-compaction and flowability. To fill the gaps created by the coarse aggregate particles, SCC heavily uses fines such as sand and admixtures such as FA and silica fume (SF). With these particles, a superplasticizer (SP) is typically added to improve the mixture’s flowability. The required flowability and self-compaction properties are more easily attained in SCC with relatively low concentrations of coarse aggregate and relatively high water-to-binder ratios [11]. In addition to improving concrete’s properties, using FA and SF in a ternary blend makes SCC more economical and environmentally friendly. Owing to their adverse impacts on human health and the quality of the air, rivers, seas, and groundwater, these waste products have consequently drawn much attention as research topics in recent years [12]. Thus, it is better to find applications for these materials than to release them into the environment. In addition to reducing the negative impact that the production of concrete has on the environment, this type of concrete has an effect on the performance of concrete in its fresh state, as well as on its strength, density, and durability [5,13].

Yazıcı [14] evaluated the mechanical properties and durability characteristics of SCC blends with 10% SF and a fixed w/b ratio of 0.28, in which class C FA replaced cement in amounts ranging from 30% to 60%. He found that adding 10% SF to concrete enhances its toughness and fresh properties. Furthermore, the characteristics of the component materials affect the compactability of SCC. Consequently, the strength development of concrete needs to be examined when a large volume of cement is replaced. Similarly, Askari et al. [15] examined the mechanical characteristics of SCC and reported that a high FA content increased the compressive strength (CS) between 28 and 120 d of curing, indicating that the pozzolanic activity of FA continued over time. Even with a 10% substitution of SF for cement, SCC containing a high volume of FA was able to maintain and even improve its tensile strength. According to Askari et al. [15], the rounded shape of FA particles enables HVFA to reduce the amount of SP required to achieve the desired level of self-compacting ability. Wongkeo et al. [16] investigated how the chloride ion resistance and CS of SCC change when Portland cement is replaced with high-calcium class C FA or SF at weight percentages of 50%, 60%, and 70%. They found that both properties are improved by the addition of SF and FA when Portland cement is substituted in large quantities.

To prevent unnecessary material waste and test repetition, models for forecasting the strength characteristics of concrete are constantly being developed. Concrete properties can be modeled via well-known models such as best fit curves derived from regression analysis. However, regression analysis approaches may not accurately capture the underlying nature of concrete because of the nonlinear nature of the material [17]. Furthermore, the impact of the constituent materials in concrete may not be well measured via regression models [18]. Some of the advanced current machine learning (ML) tools that have been shown to be useful in the field of civil engineering are artificial neural networks (ANNs) [19,20], gene expression programming (GEP) model [21–24] and genetic programming [25]. For modeling intricate relationships and streamlining procedures, GEP has shown promise in the field of civil engineering. Shishegaran et al. [23] achieved greater accuracy than conventional regression techniques by using GEP to predict concrete CS using ultrasonic pulse velocity and rebound number data. GEP enhanced membrane performance and wastewater treatment efficiency in a different study [24] by optimizing ultrafiltration process parameters. The versatility of GEP in forecasting material properties and streamlining engineering procedures is demonstrated by these applications, which aid engineers in making data-driven choices and improving system performance. Through experimentation, the output models are validated, and these methods simulate responses on the basis of integrated input parameters. In regard to data classification, prediction, optimization, and forecasting, ANNs can produce appropriate results [26]. To simulate the characteristics of concrete durability, ANNs have been extensively used in the past ten years [27,28]. Computational intelligence can be useful for estimating the CS of an SCC, as demonstrated by Dutta et al. [29]. 
The estimation of CS via the adaptive neuro-fuzzy inference system, extreme learning machine, and multivariate adaptive regression splines models has been ongoing for some time [30,31]. Pazouki et al. [32] designed a hybrid model combining the radial basis function neural network and the fruit fly optimization algorithm to forecast the CS of SCC. Several studies have attempted to predict other characteristics of SCC, such as slump flow and flexural strength. Saha et al. [33] used support vector regression (SVR) to forecast multiple SCC characteristics. They were able to forecast the CS, V-funnel time, slump flow time, and L-box blocking ratio using 115 data points for SCC with an FA mineral admixture. The coefficient of determination (R²) was 0.965 for the slump flow, 0.954 for the L-box ratio, 0.979 for the V-funnel time, and 0.977 for the strength.

To determine the significance of each variable, a method that ranks the predictors is needed. One such algorithm is random forest (RF), which can handle nonlinear and nonparametric regression and classification tasks and measures the importance of a variable by permuting its values and calculating the resulting difference in prediction accuracy [34]. Owing to its strong predictive ability, use of bagging, support for continuous and categorical parameters, short calculation time, and resistance to overfitting, RF often outperforms other ML algorithms [35].

Recent advancements in ML have illustrated the potential of integrating data-driven models with physical principles to improve prediction accuracy in engineering applications. Guo and Yin [36] incorporated local time-updating discrete schemes into deep learning frameworks, ensuring that predictions conform to established physical laws and thereby enhancing the modeling of multi-dimensional consolidation processes. Zhuang et al. [37] developed physics-informed neural networks that predict material behavior without the need for labeled data sets by integrating governing equations directly into the learning process. These methodologies underscore the increasing trend of hybrid modeling techniques that integrate data-driven insights with domain expertise, consistent with this study's aim of improving the predictive efficacy of RF models via advanced optimization algorithms.

The use of recycled aggregates in SCC strength prediction has been the subject of several related studies that have explored various boosting strategies. One study [38] used RF, extreme gradient boosting, gradient boosting, categorical boosting, K-nearest neighbors, and extremely randomized trees to predict the CS of SCC made with recycled aggregates. In all, 515 data points collected from the literature were processed via the algorithms. With a test R² value of 0.7128, the RF method was the most accurate at predicting CS. The results show that these methodologies are effective at predicting the different properties of SCC.

The aforementioned literature review indicates that further work is needed to fill certain gaps before more accurate and reliable models can be developed. The reliability of a model's predictions is compromised when critical input parameters are missing. Picking too few parameters might lead to inaccurate results because the model will give too much weight to a small number of parameters. In addition, during training the model modulates the impact of each input parameter on its output by assigning a different weight to each parameter. By taking into account more effective input parameters to build robust and dependable ML models, this study intends to fill these particular gaps in the literature.
Despite RF's advantages, its predictive accuracy is lower than it could be because some hyperparameters are still chosen arbitrarily. In recent years, optimization algorithms based on neural networks have been widely used to find efficient solutions and increase the precision of regression algorithms [39–41]. Metaheuristic algorithms are well established for a variety of reasons, including their adaptability, their ability to handle nonlinearity, their capacity for learning, and their demonstrated performance across a wide range of applications. They are also supported by implementation tools and frameworks. Additional research into hybrid RF approaches for SCC materials is necessary because these methods are still in their infancy.

ML models for predicting the CS of SCC have been the subject of much research, but many questions remain. First, the conventional regression methods used in many current studies struggle to capture the nonlinear relationships between the input variables and CS. Second, although ML models such as SVR and ANNs have demonstrated encouraging outcomes, their black-box nature and limited interpretability restrict their applicability in concrete mix design. Third, the hyperparameters of ML models are frequently chosen manually, which can result in less-than-ideal performance. Finally, most studies do not provide easily accessible tools for professionals in the industry, which impedes the practical implementation of these predictive models.

To fill these gaps, this study offers a novel method that improves prediction accuracy and reliability by combining the RF model with sophisticated optimization algorithms, namely particle swarm optimization (PSO), Bayesian optimization (BO), and differential evolution (DE). By automatically optimizing internal parameters, the hybrid RF models, in contrast to conventional models, are intended to give more precise and reliable predictions of the CS of HVFA-SCC.
To make the model more transparent and easier to use, SHapley Additive exPlanations (SHAP) analysis is used to quantify the impact of each input parameter. In addition, the creation of an open-source graphical user interface (GUI) based on the RF-DE model helps engineers to quickly and accurately predict strength by bridging the gap between sophisticated ML techniques and real-world concrete design. This thorough approach overcomes the shortcomings of earlier research by improving predictive performance and offering a useful tool that aids in the design of high-performance and sustainable SCC mixtures.

2 Materials properties

The cement used in this experimental investigation was grade 43 ordinary Portland cement (OPC), which has a specific gravity of 3.15 and conforms to Indian Standard (IS) 8112-2013 [42]. The alternative cementitious material, with a specific gravity of 2.2, was made up of SF and FA. Low-calcium FA samples were used; they were obtained from the National Thermal Power Corporation in Kahalgaon and conform to IS 3812-Part I [43]. The other pozzolanic material used in this study was microsilica (compressed SF), which is supplied by Elkem South Asia Pvt. Ltd. and complies with IS 15388-2003 [44]. SF was chosen because it improves the characteristics of concrete.

Packed Pakur stone, with a nominal size of 16 mm, a specific gravity of 2.713, and a water absorption of 0.78%, was used as the coarse aggregate. River sand from the Sone River in India was used as the fine aggregate; it has a specific gravity of 2.66 and a water absorption of 1.35%. The physical characteristics of the aggregates were determined in accordance with IS 383-2016 [45]. A poly-carboxylic ether-based SP containing a viscosity-modifying agent was used in this investigation. The chemical admixtures, from the Master Glenium SKY 8630/8632 brand, were supplied by BASF India Limited and had a specific gravity of 1.06.

A Rigaku Ultima IV system was used to qualitatively analyze the mineralogical composition of the SF, FA, OPC, and HVFA-SCC samples via CuKα radiation. The X-ray diffraction (XRD) patterns of HVFA-SCC and various binding materials are displayed in Fig.1.

2.1 Mixture proportions

The SCC mix designs were developed in compliance with IS 10262-2019 [46]. The control mixture contained neither FA nor SF. Binary blend mixtures (OPC + FA) were designed to include low-calcium FA in amounts ranging from 10% to 100% by weight of cement. The ternary blend mixtures contained OPC, FA, and SF; for each step of FA substitution, the SF added to the ternary blend mixtures ranged from 2% to 10% by cement weight.

The w/b ratio was 0.36 for all the SCC mixtures. The chemical admixture dosage was set at 2.2% by cement weight, which gave the SCC mixtures the proper fresh characteristics. Because both the coarse and fine aggregates absorb water during casting, extra water was added to the mixture to bring the aggregates to a saturated surface-dry condition. This additional water is not counted in the w/b ratio because it does not take part in the cement hydration reactions. Tab.1 displays the mixture proportions of HVFA-SCC.
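The water bookkeeping described above can be illustrated with a short calculation. This is a minimal sketch: the w/b ratio, SP dosage, and absorption values come from the text, whereas the batch quantities in the example call are hypothetical placeholders (not values from Tab.1), and the aggregates are assumed to be dry at batching.

```python
# Values stated in the text.
W_B_RATIO = 0.36            # water-to-binder ratio for all mixtures
SP_DOSAGE = 0.022           # chemical admixture, fraction of cement weight
COARSE_ABSORPTION = 0.0078  # water absorption of the coarse aggregate
FINE_ABSORPTION = 0.0135    # water absorption of the fine aggregate


def mix_water(cement, fly_ash, silica_fume, coarse_agg, fine_agg):
    """Return water and admixture quantities (kg/m^3) for one mixture.

    Assumes dry aggregates, so the absorption water is the full
    absorption capacity of each aggregate fraction.
    """
    binder = cement + fly_ash + silica_fume
    free_water = W_B_RATIO * binder  # the only water counted in w/b
    absorption_water = (COARSE_ABSORPTION * coarse_agg
                        + FINE_ABSORPTION * fine_agg)
    return {
        "free_water": free_water,
        "absorption_water": absorption_water,
        "total_water": free_water + absorption_water,
        "superplasticizer": SP_DOSAGE * cement,
    }


# Hypothetical batch quantities in kg/m^3, chosen only for illustration.
result = mix_water(cement=300.0, fly_ash=180.0, silica_fume=20.0,
                   coarse_agg=750.0, fine_agg=900.0)
for name, value in result.items():
    print(f"{name}: {value:.2f} kg/m^3")
```

The total water batched exceeds the free water, yet the w/b ratio stays at 0.36 because only the free water enters the ratio.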

2.2 Test procedures

2.2.1 Fresh property tests

Following the methodology prescribed by the European Guidelines and EFNARC for SCC, the essential fresh properties of the SCC mixtures (filling ability, passing ability, deformability, and segregation resistance) were assessed. The tests used were the slump flow time, spread diameter, L-box blocking ratio, J-ring test, and V-funnel flow time.

2.2.2 Hardened properties test

The three water-cured cubes of 150 mm × 150 mm × 150 mm for each mixture were subjected to CS tests at 14, 28, 56, and 90 d, in compliance with IS 516-2021 [25]. In accordance with IS 516-2021, the split tensile strength of cylindrical samples with heights of 300 mm and diameters of 150 mm was evaluated after 28 d of water curing [25].

2.2.3 Microstructural property tests

XRD analysis was used to determine the chemical composition of the paste that was found in the transition zone. The OPC, SF, FA, and HVFA-SCC samples were ground into a powder with the help of a mortar and pestle. Each individual powder was then subjected to an XRD analysis. A Rigaku Ultima IV device was used to apply CuKα radiation in order to observe and analyze the samples’ qualitative mineralogical compositions. The X’pert High Score Plus program was used to help with the quantitative phase analysis of the raw material.

3 Machine learning methodology

3.1 Random forest

ML is a collection of algorithmic structures that allow computers to learn and improve by looking for patterns in data [47]. There are three primary types: supervised, unsupervised, and reinforcement learning. Supervised learning algorithms, in which the user predefines the target, model the relationships between the predictor variables and the dependent variable, and can therefore be applied to regression and classification problems. RF, a popular supervised learning approach, was first proposed by Breiman [35]. It employs an ensemble of classification and regression trees (CART) to perform classification, prediction, and variable selection [48]. The procedure consists of drawing random subsets of the data and variables, growing a fixed number of trees, and finally aggregating their outputs. Compared with other predictive models, the RF method provides an unbiased estimate of the error rate, is relatively robust against data noise, and resists overfitting. Variable importance is one of many metrics it offers to help with interpretation; it is based on how much worse a prediction becomes when the data for that predictor are randomly permuted.

The input parameters $x_i = (x_1, \ldots, x_p)^T$ and output parameters $y_i$ are both random variables in a decision tree, and the joint probability distribution $P_{xy}(x_i, y_i)$ of the relationships between them is unknown. Analyses are carried out at each node on the basis of binary division. The first node in the decision tree, referred to as the "root", houses data on all of the input parameters, whereas the nodes that are not split further, referred to as terminal nodes, determine the decision tree's final form.

Because the RF method relies on bootstrap resampling, about one third of the data does not contribute to the growth of any given tree. This subset, known as the "out-of-bag (OOB) data", is used to validate the regression model and to quantify the significance of the parameters. Hence, the RF method can provide an objective estimate of the error without cross-validation or an independent test set. For the same reason, the RF procedure can be characterized as a gray-box approach rather than a "black box", because it is considerably more interpretable than other methods, such as neural networks. In summary, each decision tree searches the training set for the optimal feature and split at each node and is tested on its OOB data; all the decision trees together form the RF; and the prediction for new data is the mean of the predictions made by the n decision trees (for regression).
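The "one third" OOB share follows from bootstrap sampling: when n rows are drawn with replacement, a given row is left out of a tree with probability (1 − 1/n)^n, which approaches e^(−1) ≈ 0.368. The sketch below checks this numerically with the Python standard library only; n = 240 matches the data set size reported in Section 3.5, while the number of trees and the random seed are arbitrary assumptions.

```python
import math
import random


def oob_fraction(n_samples: int, n_trees: int, seed: int = 0) -> float:
    """Average share of rows left out of a bootstrap sample (the OOB rows)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # One bootstrap sample: n_samples draws with replacement.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += (n_samples - len(in_bag)) / n_samples
    return total / n_trees


n = 240  # number of data points reported in Section 3.5
simulated = oob_fraction(n_samples=n, n_trees=500)
theoretical = (1 - 1 / n) ** n  # tends to exp(-1) as n grows

print(f"simulated OOB share:   {simulated:.3f}")
print(f"theoretical OOB share: {theoretical:.3f}")
print(f"limit exp(-1):         {math.exp(-1):.3f}")
```

Each tree therefore has roughly 37% of the rows available as unseen data, which is what makes the OOB error usable as a built-in validation estimate.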

Selecting the optimal algorithm and building a robust model architecture through hyperparameter optimization can be demanding and time-consuming when trying to create a powerful ML model [49,50]. For tree-based ML algorithms, a number of hyperparameters can greatly affect the accuracy of model predictions [51]. Tuning the hyperparameters correctly therefore requires an optimization strategy. Hyperparameter optimization is crucial but challenging because of its combinatorial complexity and computational demands. Grid search (GS) explores the options exhaustively but scales poorly with the number of hyperparameters, whereas random search (RS) lacks efficiency. Automated methods such as BO offer informed search strategies, minimize manual effort, and outperform GS and RS in terms of generalizability and efficiency. The automated approaches that have been suggested include the genetic algorithm, BO, PSO, and DE. Because every method has its own benefits and drawbacks, no single approach can be recommended for every problem [52]. PSO, one of the most popular metaheuristic algorithms, is useful for solving many different types of optimization problems [53]. This research examines how accurately hybrid RF algorithms predict the CS of SCC. To that end, the hyperparameters of the RF model are optimized via three state-of-the-art optimization methods: PSO, BO, and DE.
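One practical detail of such hybrids is that PSO and DE search over continuous vectors, while most RF hyperparameters are integers. The sketch below shows one possible encoding. The hyperparameter names and bounds are illustrative assumptions modeled on common RF settings, not the values tuned in this study, and `decode` is a hypothetical helper.

```python
# Hypothetical search space: (lower bound, upper bound, is_integer).
SEARCH_SPACE = {
    "n_estimators": (50, 500, True),
    "max_depth": (2, 20, True),
    "min_samples_leaf": (1, 10, True),
    "max_features": (0.1, 1.0, False),  # fraction of the input features
}


def decode(position):
    """Map a vector in [0, 1]^d to a dictionary of RF hyperparameters."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position length must match the search space")
    params = {}
    for u, (name, (low, high, is_int)) in zip(position, SEARCH_SPACE.items()):
        u = min(max(u, 0.0), 1.0)  # keep the coordinate inside [0, 1]
        value = low + u * (high - low)
        params[name] = int(round(value)) if is_int else value
    return params


example = decode([0.0, 0.5, 1.0, 0.25])
print(example)
```

An optimizer would then evaluate each candidate vector by training an RF with `decode(position)` and returning a validation error as the objective value.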

3.2 Particle swarm optimization

PSO was proposed by Kennedy and Eberhart [54]. The Reynolds boid model and the behavior of schools of fish and flocks of birds served as inspiration for this population-based stochastic optimization method [55]. PSO has numerous similarities to evolutionary computing techniques. Its basic premise is to initialize the system with a large number of randomly generated solutions and then update them over generations until the optimal solution is found. The members of the swarm are called particles; each particle represents a candidate solution to the optimization problem and has a fitness value. Through social interaction and knowledge sharing among the particles, PSO seeks to identify the optimal or near-optimal solution of an objective function (OBJ) [56]. Owing to its versatility and speed of convergence, PSO is used to solve complicated mathematical and engineering problems in a wide range of domains [55,57]. The PSO model incorporates a population of particles, wherein individual particles traverse a search space while being influenced by their immediate surroundings. The position of each particle is influenced by its own knowledge and that of its neighbors, so a particle's search strategy is shaped by the knowledge of its fellow particles. In a d-dimensional search space, the vectors $X_i^t = (x_{i,1}^t, x_{i,2}^t, x_{i,3}^t, x_{i,4}^t, \ldots, x_{i,d}^t)$ and $V_i^t = (v_{i,1}^t, v_{i,2}^t, v_{i,3}^t, v_{i,4}^t, \ldots, v_{i,d}^t)$ represent the position and velocity of particle $i$ at time $t$. Equations (1) and (2) show how $X_i^t$ and $V_i^t$ change throughout evolution when both the personal best ($p_{\mathrm{best}}$) and global best ($g_{\mathrm{best}}$) experiences are considered. Here, $w^t$ is the inertia weight that controls the change in velocity, $r_1$ and $r_2$ are random variables with values between 0 and 1, and $c_1$ and $c_2$ are acceleration constants.

$$X_i^{t+1} = X_i^t + V_i^{t+1}, \tag{1}$$

$$V_i^{t+1} = w^t V_i^t + c_1 r_1 \left(p_{\mathrm{best},i}^t - X_i^t\right) + c_2 r_2 \left(g_{\mathrm{best}}^t - X_i^t\right). \tag{2}$$
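The update rules of Eqs. (1) and (2) can be sketched in a few lines of standard-library Python. This is an illustrative implementation under stated assumptions, not the authors' code: the inertia weight is held constant rather than varied with t, positions are clipped to the bounds, and the swarm size, coefficients, and the sphere test function are arbitrary choices.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside box `bounds` using Eqs. (1) and (2)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in X]
    p_best_val = [objective(x) for x in X]
    g_idx = min(range(n_particles), key=p_best_val.__getitem__)
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (2): inertia + pull to personal best + pull to global best.
                V[i][j] = (w * V[i][j]
                           + c1 * r1 * (p_best[i][j] - X[i][j])
                           + c2 * r2 * (g_best[j] - X[i][j]))
                # Eq. (1): move the particle, then clip to the bounds (assumption).
                lo, hi = bounds[j]
                X[i][j] = min(max(X[i][j] + V[i][j], lo), hi)
            val = objective(X[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = X[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = X[i][:], val
    return g_best, g_best_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso(sphere, bounds=[(-5.0, 5.0)] * 3)
print("best position:", best_x)
print("best value:", best_val)
```

In a hybrid RF-PSO setting, `objective` would be replaced by a function that trains an RF for the hyperparameters encoded in the position vector and returns a validation error.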

3.3 Bayesian optimization

BO searches for the optimum of the OBJ by constructing a probabilistic surrogate model from the previous evaluation results of the target. It has recently been used extensively for hyperparameter tuning in ML. The basic principle of BO is to find the optimal hyperparameters over many iterations by updating the posterior probability distribution from the OBJ's prior probability and the known observation points. The next candidate point is then located using the updated posterior distribution. To configure the optimal combination of hyperparameters quickly, each new set of hyperparameters is chosen on the basis of the outcomes of the preceding ones. Equation (3) represents BO, where $x^*$ is the optimal combination of hyperparameters, $\chi$ is the hyperparameter space, and $f(x)$ is the objective to be optimized (written here as a maximization; minimizing an error is equivalent to maximizing its negative).

$$x^* = \arg\max_{x \in \chi} f(x). \tag{3}$$

Both the probabilistic surrogate model and the acquisition function are crucial to BO. The surrogate model uses the historical evaluations to construct a probability distribution over the OBJ. The acquisition function then chooses the next combination of parameters to evaluate, the result is added to the surrogate model, and the process repeats until the maximum number of iterations is reached. Because the acquisition function is far cheaper to evaluate than the costly black-box objective, it can be optimized thoroughly at each iteration. The expected improvement (EI) can be expressed in closed form via Eqs. (4) and (5), provided that the model prediction $y$ at configuration $\lambda$ follows a normal distribution with mean $\mu(\lambda)$ and standard deviation $\sigma$. In this context, $f_{\min}$ represents the best value that has been observed up to this point, whereas $\phi(\cdot)$ and $\Phi(\cdot)$ represent the standard normal density and the standard normal cumulative distribution function, respectively. This closed form makes the EI a popular choice, even though many acquisition functions are available.

$$E[I(\lambda)] = E[\max(f_{\min} - y, 0)], \tag{4}$$

$$E[I(\lambda)] = \left(f_{\min} - \mu(\lambda)\right) \Phi\!\left(\frac{f_{\min} - \mu(\lambda)}{\sigma}\right) + \sigma\, \phi\!\left(\frac{f_{\min} - \mu(\lambda)}{\sigma}\right). \tag{5}$$
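The closed form of Eq. (5) is simple to evaluate directly. The sketch below does so with the standard library; the function names and the two candidate configurations (surrogate mean and standard deviation, in arbitrary error units) are hypothetical and serve only to show how EI trades off a good predicted mean against high uncertainty.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_min):
    """Closed-form EI of Eq. (5) for a minimization problem."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    return (f_min - mu) * normal_cdf(z) + sigma * normal_pdf(z)


f_min = 5.0  # best (lowest) error observed so far, hypothetical
ei_a = expected_improvement(mu=4.8, sigma=0.1, f_min=f_min)  # slightly better, certain
ei_b = expected_improvement(mu=5.2, sigma=1.0, f_min=f_min)  # slightly worse, uncertain
print(f"EI of candidate A: {ei_a:.4f}")
print(f"EI of candidate B: {ei_b:.4f}")
```

Candidate B has the worse predicted mean but the larger EI, because its high uncertainty leaves more room for a large improvement; this is how EI balances exploitation and exploration.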

3.4 Differential evolution

DE has been extensively used in structural engineering as a nature-inspired approach to solving challenging optimization problems [58]. Storn and Price developed this algorithm in 1995 [59]. It searches for the optimal solution iteratively by repeatedly manipulating a population of NP candidate solutions (or individuals) $x_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{in})$, where $i = 1, \ldots, NP$, at each iteration $t$. The first significant operator is mutation, which perturbs a candidate solution using the differences between pairs of selected individuals. The following mutation operators form the foundation of the (μ + λ)-constrained DE (Eqs. (6)–(9)).

$$v_i = x_{r_1} + F (x_{r_2} - x_{r_3}), \tag{6}$$

$$v_i = x_{r_1} + F (x_{r_2} - x_{r_3}) + F (x_{r_4} - x_{r_5}), \tag{7}$$

$$v_i = x_i + F (x_{r_1} - x_i) + F (x_{r_2} - x_{r_3}), \tag{8}$$

$$v_i = x_i + F (x_{\mathrm{best}} - x_i) + F (x_{r_1} - x_{r_2}). \tag{9}$$

These operators are commonly named rand/1, rand/2, current-to-rand/1, and current-to-best/1, respectively. In the equations above, $v_i$ is the mutant vector; $r_1, r_2, \ldots, r_5$ are integers randomly selected from the set $\{1, \ldots, NP\}$ such that $r_1 \neq r_2 \neq r_3 \neq r_4 \neq r_5$; $F$ is the mutation (or scaling) factor; and $x_{\mathrm{best}}$ is the best individual in the population. To guarantee that $v_i$ meets all boundary constraints, a projection scheme is applied. With this operation, each out-of-bound component of $v_i$ is modified in accordance with the rule in Eq. (10), where $x_j^l$ and $x_j^u$ are the $j$th components of the lower and upper bound vectors $x^l$ and $x^u$, respectively.

$$v_{ij} = \begin{cases} 2 x_j^l - v_{ij}, & \text{if } v_{ij} < x_j^l, \\ 2 x_j^u - v_{ij}, & \text{if } v_{ij} > x_j^u, \\ v_{ij}, & \text{otherwise}. \end{cases} \tag{10}$$

The crossover step of the (μ + λ)-constrained DE is given by Eq. (11), in which the trial vector $u_{ij}$ is created by a binomial crossover. In this equation, $rand$ represents a uniformly distributed random number between 0 and 1, $j_{rand}$ represents a randomly selected integer between 1 and $n$, and CR represents the crossover rate.

$$u_{ij} = \begin{cases} v_{ij}, & \text{if } rand \leq CR \text{ or } j = j_{rand}, \\ x_{ij}, & \text{otherwise}. \end{cases} \tag{11}$$

The iterative strategy stops when a termination criterion is satisfied. In this study, the evolutionary search continues until the maximum number of fitness evaluations is reached, which can be determined by calculating the maximum number of iterations $t$.
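The DE loop can be sketched with the rand/1 mutation of Eq. (6), the boundary reflection of Eq. (10), and the binomial crossover of Eq. (11). This sketch is a simplified stand-in rather than the (μ + λ)-constrained variant described above: it uses the classic one-to-one greedy selection between each parent and its trial vector, and the population size, F, CR, iteration count, and sphere test function are arbitrary assumptions.

```python
import random


def reflect(v, lo, hi):
    """Boundary repair in the spirit of Eq. (10): mirror an out-of-bound value."""
    if v < lo:
        v = 2.0 * lo - v
    elif v > hi:
        v = 2.0 * hi - v
    # A very large step can overshoot even after one reflection; clip as a fallback.
    return min(max(v, lo), hi)


def differential_evolution(objective, bounds, NP=20, F=0.5, CR=0.9,
                           max_iter=200, seed=0):
    """Minimize `objective` inside box `bounds` with a basic rand/1/bin DE."""
    rng = random.Random(seed)
    n = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [objective(x) for x in pop]
    for _ in range(max_iter):
        for i in range(NP):
            # rand/1 mutation, Eq. (6): three distinct individuals other than i.
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            v = [reflect(pop[r1][j] + F * (pop[r2][j] - pop[r3][j]), *bounds[j])
                 for j in range(n)]
            # Binomial crossover, Eq. (11); j_rand forces at least one mutant gene.
            j_rand = rng.randrange(n)
            u = [v[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                 for j in range(n)]
            # Greedy one-to-one selection (a simplification of the mu + lambda scheme).
            fu = objective(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = min(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


de_x, de_val = differential_evolution(sphere, bounds=[(-5.0, 5.0)] * 3)
print("best position:", de_x)
print("best value:", de_val)
```

Swapping Eq. (6) for Eqs. (7)–(9) only changes the line that builds `v`; the repair, crossover, and selection steps stay the same.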

3.5 Data preprocessing

The authors’ prior research data set was utilized in this study [5]. A total of 240 test points and nine input features were used to predict the CS of HVFA-SCC. The RF-PSO, RF-BO, and RF-DE hybrid ML algorithms were employed in this analysis in addition to the base RF model. The variables that were model as input parameters in this study were as follows: SF, maximum spread diameter, L-box blocking ratio, cement, FA, J-ring test (mm), age (d), and V-funnel time (s). The output for the nine input parameters was the CS of HVFA-SCC at 14, 28, 56, and 90 d. Tab.2, which displays a variety of experimental data, displays the descriptive statistics for the input and output parameters. To ascertain the correlation between each input feature and the CS of HVFA-SCC, a statistical analysis of the input and output attributes was also carried out. Fig.2(a)–Fig.2(i) illustrate the preparation and analysis of the plot of all the input variables with CS. Mutual information (MI) has been employed by numerous researchers to establish information-based feature selection criteria, as it is capable of evaluating the mutual interdependence of variables. Fig.3 shows that the MI values of age, FA, maximum spread diameter, cement, and SF (kg) are significantly greater than those of the other input variables. There is a decrease in the MI values of the V-funnel time (s) and the L-box blocking ratio (h₂/h₁). The correlation between two variables with respect to each other can be measured using the distance correlation method, which is scale insensitive and can account for both linear and nonlinear relationships. Fig.4’s distance correlation bar plot reveals that the J-ring test and age are the variables that have the strongest correlation with output. The analysis of the study reveals that the V-funnel time (s) and the L-box blocking ratio (h₂/h₁) have lower MI values. 
Distance correlation measures the strength of association between two variables, whether linear or nonlinear: if two variables are correlated, changes in one are accompanied by changes in the other. MI, on the other hand, quantifies how much knowing one variable reduces the uncertainty about another. Variables that share information indicate a web of interdependencies, which may involve a third variable. The V-funnel time (s) and the L-box blocking ratio (h₂/h₁) may therefore be related to the output only indirectly or nonlinearly, or may share little information with it. By retaining all nine input variables in view of their possible importance, this study was able to produce accurate output predictions. The data set was preprocessed before the CS prediction models for HVFA-SCC were created. To ensure an unbiased assessment of the models, the data set was then split at random into two sections: 70% for training and 30% for testing. After experimenting with different splits (such as 60:40 and 75:25), this ratio was chosen to strike a balance between generalization performance and training accuracy. K-fold cross-validation was also used to increase the model’s robustness.
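The random 70:30 split and the K-fold procedure described above can be sketched in a few lines of standard-library Python. This is only an illustration: the function names, the random seed, and the choice of five folds are assumptions, since the text does not state the value of K or how the shuffling was done.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.30, seed=42):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# 240 data points, as reported for the HVFA-SCC data set.
train_idx, test_idx = train_test_split_indices(240)
print(len(train_idx), len(test_idx))  # 168 72

for fold_train, fold_val in k_fold_indices(train_idx, k=5):
    # A model would be fitted on fold_train and scored on fold_val here.
    pass
```

Cross-validating only inside the training portion keeps the 72 test points untouched until the final evaluation.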

4 Results and discussion

4.1 Fresh state results

For every SCC mixture, the slump flow values were within the range of (750 ± 20) mm. The T-500 time increased with increasing FA content. A significantly reduced V-funnel flow time was observed in SCC containing a large amount of FA, indicating a decrease in viscosity. Adding SF to binary cementitious blends of OPC and FA makes SCC mixtures more viscous. The ternary cementitious mixes of SCC, made up of OPC, SF, and FA, meet the EFNARC 2005 standards [60]. The T-500 vs J-ring test and maximum spread diameter test results are presented in Fig.5, which shows good correlation.

According to the experimental results, the simultaneous addition of FA and SF reduced bleeding and segregation in the SCC mixtures. In addition, the addition of SF was desirable because it increased the viscosity of the SCC mixtures. The SCC mixtures prepared with ternary cementitious blends of FA and SF showed a more uniform viscous flow and more consistent flowability test results. One possible explanation is that SF and FA have distinct particle geometries. Fig.6 shows excellent correlation between the T-500 and V-funnel test results and between the T-500 and L-box test results.

4.2 Hardened properties

The CS was evaluated after 14, 28, 56, and 90 d of curing in fresh water, and the split tensile strength was evaluated after 28 d. Fig.7 presents the results. After 90 d of curing in fresh water, adding SF at percentages between 6% and 8% increased the CS of all the SCC mixtures.

Compared with the control concrete, all of the FA-containing concretes showed lower strengths at 28 d; nonetheless, they still achieved strengths between 45 and 55 MPa. The addition of SF strengthened the ternary SCC mixtures even after only 28 d of curing in fresh water. The split tensile strengths of the concretes containing the ternary blended cements with FA and SF were greater than those of the other mixes.

4.3 Microstructural properties

Scanning electron microscopy (SEM) and X-ray diffraction (XRD) were used to assess the microstructural characteristics of the concrete mixtures. These tests were conducted to corroborate the mechanical test results and to examine the changes caused by substituting HVFA and SF for cement.

4.3.1 X-ray diffraction analysis

The XRD patterns at 28 and 90 d for HVFA-SCC with 40% FA are presented in Fig.1(d) and Fig.1(e), respectively. Compared with the 28 d pattern, the peaks of mullite and quartz decreased at 90 d. The quantitative analysis of the XRD traces shown in Fig.1(d) and Fig.1(e) indicates that the quartz content dropped from 11.44% to 6.32% and the mullite content from 4.98% to 2.28%, reflecting their consumption. Fig.1(d) and Fig.1(e) also show peaks for ettringite (E) and calcium hydroxide (CH) at approximately 30°.

The dominant peak detected via XRD of the ternary HVFA-SCC system with SF, as shown in Fig.1(f), was attributed to Ca(OH)₂, denoted as CH. Every HVFA-SCC mixture showed CH peaks, which indicated that each mixture had undergone a hydration reaction. The XRD patterns in Fig.1(f) clearly reveal that the Ca(OH)₂ intensity is lower in the HVFA-SCC mixtures with added SF than in those without SF. An analysis of the XRD pattern of the HVFA-SCC with 40% FA and 8% SF revealed that after 28 d, the quartz content dropped from 11.4% to 7.27% and the mullite content from 4.98% to 2.43%. Additionally, after 28 d, the Ca(OH)₂ content dropped from 6.31 to 4.53 mass percent, per the quantitative analysis of the CH peak. Ca(OH)₂ was rapidly consumed as a result of the pozzolanic reaction of the Al₂O₃ and/or SiO₂ in the SF with Ca(OH)₂.

4.3.2 Scanning electron microscopy analysis

SEM examination was performed to assess the alterations in the microstructure of the concrete mixtures. Fig.8 shows the SEM micrograph of the control mixture. The development of the interfacial transition zone (ITZ) between the aggregates and the mortar phase was found to be inadequate. Additionally, there were voids and cracks, which led to poorer mechanical performance than that of the binary and ternary mixtures of SCC.

In the binary mix of HVFA-SCC, an excellent ITZ formed, as shown in Fig.9. The SEM micrograph of the binary mixture of SCC revealed a dense mortar matrix with fewer voids than in the control mixture. Furthermore, Fig.9 shows that the HVFA-SCC sample that does not contain SF has a lower concentration of CH and E, a denser appearance, and a significant amount of calcium silicate hydrate (CSH), which develops at later ages. The fiber-like CSH gel evidently aggregates into larger crystals by adhering to the plate-like CH crystal surfaces.

As shown in Fig.8, the ternary mixture of HVFA-SCC with SF displayed a very dense mortar matrix and the fewest voids. The improved mechanical performance was the outcome of proper HVFA, SF, and cement proportions and distributions. HVFA-SCC with SF appears to create a net-like structure, as shown in Fig.10, which suggests the presence of secondary CSH.

4.4 Hyperparameters of the developed machine learning models

During the training phase, the architectures, functions, and hyperparameters of the base RF model were fine-tuned through iterative refinement to achieve optimal performance. A systematic process of trial and error was conducted to maximize the effectiveness of the RF model. The first step in training the final models was to establish a baseline by finding the hyperparameters that resulted in the highest average prediction accuracy across the test data set. The optimum hyperparameter values obtained for the RF model were min_samples_leaf = 2, min_samples_split = 2, max_depth = 16, and n_estimators = 90. The integration of optimization algorithms has been studied for its potential to enhance the performance of RF models in engineering applications [61,62]. To improve model output, optimization algorithms (OA) are used to fine-tune RF hyperparameters such as the number of trees, tree depth, and the minimum number of samples per split and per leaf. In this research, three optimization algorithms, namely PSO, BO, and DE, are used to get the most out of RF. BO uses a probabilistic surrogate of the objective to guide the search, whereas PSO and DE are population-based methods that explore the parameter space globally. They enhance RF models by identifying optimal hyperparameter settings. Fig.11 shows the procedure followed in the present investigation to create the hybrid RF models. The optimal hyperparameter values obtained with PSO, BO, and DE are displayed in Tab.3. Several strategies were used to control overfitting, one of which was the RF algorithm’s built-in ensemble learning approach, which reduces overfitting by averaging predictions from several decision trees trained on various data subsets.
Hyperparameter tuning using BO, PSO, and DE further decreased overfitting by selecting optimal values for the number of trees, maximum tree depth, and minimum samples per split and leaf. K-fold cross-validation, in which the models are trained on several subsets of the data and validated on unseen folds, was also used to ensure model generalizability and avoid overfitting to any particular subset. By estimating the prediction error using out-of-bag (OOB) samples, the RF algorithm offered an internal validation mechanism that eliminated the need for a separate validation set.
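To make the hybridization idea concrete, the following sketch runs a plain DE/rand/1/bin search over four RF hyperparameters. It is a minimal illustration, not the authors’ implementation: the search bounds, population size, scale factor, and crossover rate are assumed values, and `placeholder_cv_error` is a hypothetical stand-in for the cross-validated error of an RF trained with the candidate settings (its toy minimum is placed at the base RF values quoted above purely so that the sketch has a known optimum).

```python
import random

# Illustrative search bounds for four RF hyperparameters (assumed, not from the paper).
BOUNDS = {
    "n_estimators": (10, 200),
    "max_depth": (2, 30),
    "min_samples_split": (2, 10),
    "min_samples_leaf": (1, 10),
}

def decode(vector):
    """Round a real-valued DE vector to integer hyperparameters within bounds."""
    params = {}
    for value, (name, (low, high)) in zip(vector, BOUNDS.items()):
        params[name] = int(min(max(round(value), low), high))
    return params

def placeholder_cv_error(params):
    """Hypothetical objective: replace with the cross-validated RMSE of an RF
    trained with `params`. This toy bowl only makes the sketch runnable."""
    target = {"n_estimators": 90, "max_depth": 16,
              "min_samples_split": 2, "min_samples_leaf": 2}
    return sum(((params[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
               for k in BOUNDS)

def differential_evolution(objective, pop_size=20, generations=60,
                           f_scale=0.7, crossover=0.9, seed=0):
    """Minimise `objective` over BOUNDS with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    limits = list(BOUNDS.values())
    dim = len(limits)
    population = [[rng.uniform(lo, hi) for lo, hi in limits]
                  for _ in range(pop_size)]
    scores = [objective(decode(ind)) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    gene = population[a][d] + f_scale * (population[b][d] - population[c][d])
                    lo, hi = limits[d]
                    gene = min(max(gene, lo), hi)  # clip to the search box
                else:
                    gene = population[i][d]
                trial.append(gene)
            trial_score = objective(decode(trial))
            if trial_score <= scores[i]:  # greedy one-to-one selection
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return decode(population[best]), scores[best]

best_params, best_error = differential_evolution(placeholder_cv_error)
print(best_params, best_error)
```

In a real run the objective would train and cross-validate an RF for every candidate, so the number of fitness evaluations (population size times generations) dominates the cost.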

4.5 Statistical results

This section analyzes the predictive performance of the models developed for predicting the CS of HVFA-SCC. R², Adj.R², LMI, NS, VAF, WI, WMAPE, and RSR were computed and tabulated to evaluate the efficiency and reliability of the ML models.

Equations (12)–(19) define these performance parameters, where $k$ is the number of observations and $q$ is the number of input variables used for prediction. The term $x_{mean}$ is the mean of the actual output values, whereas $x_i$ and $\hat{x}_i$ represent the actual and predicted outputs, respectively [50,63–68].

(12)

$$R^2 = \frac{\sum_{i=1}^{k} (x_i - x_{mean})^2 - \sum_{i=1}^{k} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{k} (x_i - x_{mean})^2},$$

(13)

$$Adj.R^2 = 1 - \frac{k - 1}{k - q - 1} (1 - R^2),$$

(14)

$$LMI = 1 - \left[ \frac{\sum_{i=1}^{k} | x_i - \hat{x}_i |}{\sum_{i=1}^{k} | x_i - x_{mean} |} \right], \quad 0 < LMI \le 1,$$

(15)

$$NS = 1 - \frac{\sum_{i=1}^{k} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{k} (x_i - x_{mean})^2}, \quad -\infty < NS \le 1,$$

(16)

$$VAF = \left( 1 - \frac{var(x_i - \hat{x}_i)}{var(x_i)} \right) \times 100\%,$$

(17)

$$WI = 1 - \left[ \frac{\sum_{i=1}^{k} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{k} \left( | \hat{x}_i - x_{mean} | + | x_i - x_{mean} | \right)^2} \right], \quad 0 < WI \le 1,$$

(18)

$$WMAPE = \frac{\sum_{i=1}^{k} \left| \frac{x_i - \hat{x}_i}{x_i} \right| \times x_i}{\sum_{i=1}^{k} x_i},$$

(19)

$$RSR = \frac{RMSE}{\sqrt{\frac{1}{k} \sum_{i=1}^{k} (x_i - x_{mean})^2}}.$$
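As a cross-check on Eqs. (12)–(19), the eight indicators can be computed directly from a list of actual and predicted values. The sketch below is illustrative only: the function name and the four-point toy data are assumptions, the variance in VAF is taken as the population variance, and RSR is computed with the square root in its denominator.

```python
import math

def regression_metrics(actual, predicted, n_inputs):
    """Compute the indicators of Eqs. (12)-(19) for positive-valued targets."""
    k = len(actual)
    mean_actual = sum(actual) / k
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)

    r2 = (ss_tot - ss_res) / ss_tot
    adj_r2 = 1 - (k - 1) / (k - n_inputs - 1) * (1 - r2)
    lmi = 1 - sum(abs(r) for r in residuals) / sum(abs(a - mean_actual) for a in actual)
    ns = 1 - ss_res / ss_tot

    mean_res = sum(residuals) / k
    var_res = sum((r - mean_res) ** 2 for r in residuals) / k  # population variance
    var_act = ss_tot / k
    vaf = (1 - var_res / var_act) * 100

    wi_den = sum((abs(p - mean_actual) + abs(a - mean_actual)) ** 2
                 for a, p in zip(actual, predicted))
    wi = 1 - ss_res / wi_den
    wmape = sum(abs(r / a) * a for r, a in zip(residuals, actual)) / sum(actual)
    rmse = math.sqrt(ss_res / k)
    rsr = rmse / math.sqrt(ss_tot / k)

    return {"R2": r2, "AdjR2": adj_r2, "LMI": lmi, "NS": ns,
            "VAF": vaf, "WI": wi, "WMAPE": wmape, "RSR": rsr}

# Hypothetical strengths in MPa, used only to exercise the formulas.
metrics = regression_metrics([40, 50, 60, 70], [42, 49, 61, 68], n_inputs=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

As written, Eqs. (12) and (15) are algebraically identical, so the R² and NS entries coincide for any data set.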

A comprehensive comparative analysis of the proposed models was one of the primary objectives of this research. The statistical parameters used to evaluate each model’s predictive performance on the training and testing data sets are examined in this section using Tab.4 and Tab.5, which list the performance indicators with their rank values and summarize the score analysis. The training phase revealed significant differences in performance between the models. Notably, the RF-BO model outperformed the other models in the training phase, achieving a score of 32 with an R² of 0.99744. Conversely, the RF-PSO model achieved the least favorable result, with a score of 8 and an R² of 0.976967. Interestingly, the base RF model performed better than the hybrid RF-PSO model. The performance of the RF-DE model in training was comparable to that of RF-BO, with R² = 0.997377. In testing, the RF-DE model outperformed the other three models, with R² = 0.990134 and a total score of 32. RF-BO was the second-best model, with a score of 24 and R² = 0.989456. The base RF model obtained the lowest score of 10 in the testing stage, whereas RF-PSO obtained a score of 14. Except for the R² and Adj.R² values, the performance parameters were better for the RF-PSO model than for the RF model in the testing stage. As a result, throughout training and testing, the RF-DE and RF-BO models outperformed the other two. To enhance comparison and visual clarity, graphical representations are selectively used.
Notably, spider plots facilitate a comprehensive assessment of error criteria across the most promising approaches, as shown in Fig.12(a) and Fig.12(b) for the training and testing phases, respectively. The overlapping areas of RF-DE and RF-BO in the spider plots reflect the similar performance of the two models.

Fig.13(a) and Fig.13(b) compare the experimental CS values with the values predicted by the developed models. The data points of the best-performing model should lie on the line y = x. For the RF-BO and RF-DE models during training, the predicted and observed values coincide on the line y = x, as shown in Fig.13(a). However, during the testing phase, some data points for these models deviated from the line. Fig.13(a) and Fig.13(b) show that the data points of the other two models (RF and RF-PSO) are farther from the line y = x than those of the RF-DE and RF-BO models during both the training and testing phases. Because its data points lie somewhat far from the line y = x in the regression plots, the base RF model clearly had poor predictive performance in testing.

A recently proposed heatmap matrix, called the accuracy matrix, can be used to evaluate the precision of a model across a variety of performance metrics. It allows fast and easy evaluation of model accuracy without inspecting the individual performance indicator values [69–73]. Fig.14 depicts the accuracy matrices of the four models used to predict the CS of HVFA-SCC: RF, RF-PSO, RF-BO, and RF-DE. The accuracy matrix demonstrates that the RF-DE and RF-BO models perform better than the other models during training and testing, followed by the RF-PSO and RF models.

4.6 Regression error characteristic (REC) curve

The REC curve is a tool for comparing regression models [74,75]. It plots the error tolerance on the horizontal axis against the accuracy, i.e., the fraction of predictions whose error falls within that tolerance, on the vertical axis. Each ascending curve corresponds to a distinct regression model, so the relative positions of the curves allow an intuitive comparison: the curve of a more accurate model lies closer to the upper left corner. The area over the REC curve (AOC) measures the predictive error of the model; a smaller AOC corresponds to greater overall accuracy. Fig.15(a) and Fig.15(b) show the REC curves for the RF, RF-PSO, RF-BO, and RF-DE models during training and testing, respectively, for predicting the CS of HVFA-SCC. These charts show that the base RF and RF-PSO models had the lowest prediction accuracy. The REC curves of the RF-DE and RF-BO models lie noticeably closer to the upper left corner in both the training and testing sets, which confirms that these models performed better and supports the findings of the statistical analysis. The RF-DE and RF-BO models achieved the lowest AOC values during training and testing, as shown in Fig.16, which compares the models’ AOC values. RF-PSO had the greatest AOC value during training and the second-highest AOC value during testing.
Thus, the REC curves confirm that the hybrid RF-PSO model performs worse than the base RF model in training, whereas in testing its performance improves over that of the base RF model.
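The construction of an REC curve and its AOC can be sketched as follows. This is a minimal illustration under stated assumptions: the error measure is the absolute deviation, the curve is treated as a step function rather than interpolated, and the function names and toy data are hypothetical.

```python
def rec_curve(actual, predicted):
    """Return (tolerances, accuracies) of the REC curve for absolute errors."""
    errors = sorted(abs(a - p) for a, p in zip(actual, predicted))
    n = len(errors)
    tolerances, accuracies = [0.0], [0.0]
    for rank, err in enumerate(errors, start=1):
        if err == tolerances[-1]:
            accuracies[-1] = rank / n  # tied errors share one point on the curve
        else:
            tolerances.append(err)
            accuracies.append(rank / n)
    return tolerances, accuracies

def area_over_curve(tolerances, accuracies):
    """Area between the step-shaped REC curve and the line accuracy = 1."""
    area = 0.0
    for j in range(1, len(tolerances)):
        width = tolerances[j] - tolerances[j - 1]
        area += width * (1.0 - accuracies[j - 1])
    return area

# Hypothetical strengths in MPa, used only to exercise the functions.
tol, acc = rec_curve([40, 50, 60, 70], [42, 49, 61, 68])
aoc = area_over_curve(tol, acc)
print(tol, acc, aoc)  # [0.0, 1, 2] [0.0, 0.5, 1.0] 1.5
```

With the step-function convention the AOC equals the mean absolute error of the predictions (1.5 MPa for the toy data), which is why a smaller AOC signals a more accurate model.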

4.7 Uncertainty analysis

Uncertainty analysis is a tool for assessing the possible errors and unknowns in predictive models. It involves several steps. The initial step is to identify the sources of uncertainty, after which the uncertainty associated with each source must be quantified. For this purpose, the standard deviation ($S_e$) and mean ($\bar{e}$) of the prediction errors are computed using Eqs. (20) and (21), and the individual prediction error $e_i$ is defined in Eq. (22), where $d_i$ and $y_i$ stand for the actual and predicted output values, respectively.

(20)

$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n - 1}},$$

(21)

$$\bar{e} = \frac{\sum_{i=1}^{n} e_i}{n - 1},$$

(22)

$$e_i = y_i - d_i.$$

To quantify the overall uncertainty associated with the developed models, the individual uncertainties are first merged into a single uncertainty measure, which is then propagated to support accurate interpretation of the data. In accordance with previous methods [76,77], the Wilson score method uses the calculated $S_e$ and $\bar{e}$ values to compute confidence intervals for the mean error. The results of the uncertainty analysis for the training and testing phases are displayed in Tab.6, and Fig.17 and Fig.18 compare the uncertainty bandwidths of all the models in these phases. Both the RF-DE and RF-BO models yield the best results in terms of the mean error in both phases (Tab.6). Models with narrower uncertainty bandwidths exhibit better predictive ability. As shown in Fig.17 and Fig.18, the RF-DE and RF-BO models outperform the RF and RF-PSO models in both phases because of their narrower uncertainty bandwidths. Additionally, the RF-PSO model has the maximum bandwidth in training, whereas the RF model is the most uncertain model in testing.
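A rough numerical sketch of the error statistics and the resulting confidence band is given below. Several points are assumptions rather than the authors’ procedure: the interval uses a normal approximation (mean error ± 1.96 standard errors) instead of the Wilson score method named in the text, the mean error is the conventional arithmetic mean (divisor n), and the function name and toy data are hypothetical.

```python
import math
import statistics

def uncertainty_summary(actual, predicted, z=1.96):
    """Mean error, sample standard deviation, and an approximate 95% band."""
    errors = [y - d for d, y in zip(actual, predicted)]  # e_i = y_i - d_i
    n = len(errors)
    mean_error = statistics.fmean(errors)   # arithmetic mean, divisor n
    std_error = statistics.stdev(errors)    # sample standard deviation, divisor n - 1
    margin = z * std_error / math.sqrt(n)   # normal-approximation margin (assumption)
    return {
        "mean_error": mean_error,
        "std": std_error,
        "lower": mean_error - margin,
        "upper": mean_error + margin,
        "bandwidth": 2 * margin,
    }

# Hypothetical strengths in MPa, used only to exercise the function.
summary = uncertainty_summary([40, 50, 60, 70], [42, 49, 61, 68])
print(summary)
```

A model whose band is narrow and centred near zero is both precise and unbiased, which is the sense in which narrower bandwidths indicate better predictive ability.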

4.8 Akaike information criterion (AIC)

The AIC [78] is used to compare the generalization capacities of the proposed models. The primary goal of this analysis is to verify the robustness of the developed data-driven models. The AIC technique has been used to assess models developed for a range of engineering problems. Equation (23) presents the AIC used in this study,

(23)

$$AIC = N \times \ln(RMSE^2) + 2p,$$

where $N$ is the number of data points used in the training or testing phase, RMSE is the root mean square error, and $p$ is the total number of input variables employed in this investigation.

The model with the lowest AIC value is considered optimal. The AIC values obtained during testing are compared in Fig.19. During training, the RF-DE model’s AIC value was the lowest of the four models (RF-DE, RF-BO, RF-PSO, and RF). As a result, the RF-DE model generalizes well compared with the other two hybrid models and the base RF model.
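Equation (23) is simple to evaluate once the RMSE is known. The sketch below is illustrative: the function name and the toy data are assumptions, and the penalty term uses nine parameters to mirror the nine input variables of this study.

```python
import math

def aic(actual, predicted, n_params):
    """AIC = N * ln(RMSE^2) + 2 * p, following Eq. (23)."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n  # RMSE squared
    return n * math.log(mse) + 2 * n_params

# Hypothetical strengths in MPa, used only to exercise the formula.
value = aic([40, 50, 60, 70], [42, 49, 61, 68], n_params=9)
print(round(value, 4))
```

Because all four models in this study use the same inputs, the penalty term 2p is identical for each of them and the ranking is driven by the RMSE term.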

4.9 Objective function criterion

The suggested models were also assessed in both phases via the OBJ criterion of Gandomi et al. [79]. The OBJ is calculated from R² and the mean absolute error (MAE) using Eq. (24).

(24)

$$OBJ = \left( \frac{N_{TR} - N_{TS}}{N_{AB}} \right) \times \left( \frac{MAE_{TR}}{R_{TR}^2} \right) + \left( \frac{2 N_{TS}}{N_{AB}} \right) \times \left( \frac{MAE_{TS}}{R_{TS}^2} \right).$$

In Eq. (24), $N_{TR}$, $N_{TS}$, and $N_{AB}$ denote the numbers of data points in the training, testing, and total data sets, respectively. $MAE_{TR}$ and $MAE_{TS}$ stand for the MAE for the training and testing sets, respectively, and $R_{TR}^2$ and $R_{TS}^2$ are the corresponding R² values. The model that obtains the lowest OBJ value is considered the best. The OBJ values of the RF (OBJ = 1.66), RF-PSO (OBJ = 1.53), and RF-BO (OBJ = 0.74) models are larger than the OBJ value of the RF-DE model (OBJ = 0.72), as shown in Fig.20. Consequently, RF-DE performs better in terms of accuracy than the other models that were tested.
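The weighting in Eq. (24) can be illustrated with a short function. The sample counts below follow the 70:30 split of 240 data points (168 training, 72 testing); the MAE and R² inputs are hypothetical placeholders, not values from the paper, and the function name is an assumption.

```python
def objective_criterion(n_train, n_test, mae_train, r2_train, mae_test, r2_test):
    """OBJ of Eq. (24): a weighted sum of MAE / R^2 for training and testing."""
    n_all = n_train + n_test
    train_term = (n_train - n_test) / n_all * (mae_train / r2_train)
    test_term = 2 * n_test / n_all * (mae_test / r2_test)
    return train_term + test_term

# Hypothetical error statistics, used only to exercise the formula.
obj = objective_criterion(n_train=168, n_test=72,
                          mae_train=0.5, r2_train=0.99,
                          mae_test=1.0, r2_test=0.99)
print(round(obj, 4))  # 0.8081
```

For this split the weights are 0.4 for the training term and 0.6 for the testing term, so the criterion rewards models that hold their accuracy on unseen data.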

4.10 Feature sensitivity analysis using SHapley additive exPlanations and local interpretable model-agnostic explanations (LIME)

In this study, the SHAP framework is employed to provide both local and global interpretability for each input parameter. Local interpretability shows how each input contributes to the prediction for an individual data point, while global interpretability offers insights into the overall influence and importance of each parameter across the entire data set [80]. SHAP is well suited to ensemble ML models and provides stable, quantitative information. It is widely used in the literature because it provides information similar to feature importance. The SHAP method explains a complicated ML model by means of a simpler explanation model that approximates the original model. This approach can be summed up using Eqs. (25) and (26), where $s$ is the explanation model, $x' \in \{0, 1\}^M$ is the simplified input vector, $M$ is the total number of input features, and $\phi_j$ is the influence of the $j$th input feature.

In Eq. (26), $F$ represents the set of all features and $S$ is a subset of $F$ that does not contain the $i$th feature. A model’s Shapley regression values are determined by comparing the model’s predictions with and without the $i$th input feature.

(25)

$$s(x') = \phi_0 + \sum_{j=1}^{M} \phi_j x'_j,$$

(26)

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right].$$
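For a model with only a few inputs, Eq. (26) can be evaluated exactly by enumerating every subset. The sketch below does this for a three-feature toy function. It is illustrative only: the toy model is hypothetical and is not the RF-DE model, and "removing" a feature is approximated by replacing it with a baseline value, which is a simplifying assumption rather than the way the SHAP library treats tree ensembles.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values; absent features are set to their baseline value."""
    n = len(instance)
    features = range(n)

    def value(subset):
        x = [instance[j] if j in subset else baseline[j] for j in features]
        return model(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from Eq. (26).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(x):
    """Hypothetical model: one additive term and one interaction term."""
    return 2 * x[0] + x[1] * x[2]

phi = shapley_values(toy_model, instance=[1, 2, 3], baseline=[0, 0, 0])
print(phi)
```

The additive feature receives its full contribution of 2, while the interaction term of 6 is shared equally between the two interacting features. The values sum to the difference between the prediction for the instance (8) and for the baseline (0), which is the additive property expressed by Eq. (25).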

Fig.21 displays the global SHAP explanation of the input variables, based on the best-performing RF-DE model, which was used to assess the significance of each input parameter for the CS. The inputs are listed on the vertical axis in descending order of their effect on the output; the order is determined by the mean of the global SHAP values, as shown in Fig.21 and Fig.22. The SHAP values of the input variables are displayed on the horizontal axis. A positive SHAP value indicates that the parameter increases the predicted output, whereas a negative value indicates that it decreases it. The color of each point encodes the value of the input parameter: as the color transitions from red to blue, the value of the input parameter decreases within the data range.

The CS of HVFA-SCC increases with age, SF, V-funnel time, J-ring test, and T-500, as indicated by the red points in Fig.21 at the rightmost position. Conversely, low values of these factors reduce the predicted CS. Additionally, the results demonstrate a negative correlation between the CS and the FA, maximum spread diameter, and L-box blocking ratio. Fig.22 shows that the key factors that significantly affect the CS of HVFA-SCC are age, SF, J-ring test parameters, and T-500 test parameters. In this study, cement ranks as the fourth most influential factor affecting SCC strength. Among the factors that drive the model predictions, the SCM SF is the second most important, while FA is the sixth most important. Since cement is a prerequisite for SCC, these factors likely influence its strength. Notably, material properties interact, affecting overall performance. Therefore, the impact of one feature on another may vary depending on the circumstances. The conclusions of the study were influenced by the data set that was used. Increasing the number of data points can result in more precise outcomes.

LIME and SHAP are complementary methods for analyzing ML models. Because it explains how each feature affects predictions across the entire data set, SHAP offers global insights that are well suited to understanding the behavior of the entire model. LIME, on the other hand, provides local interpretability: it explains the model’s prediction for a single instance by using a simplified model fitted to perturbed input data. When SHAP and LIME are combined, a thorough grasp of local decision-making and global feature importance is obtained, improving the interpretability of intricate “black-box” models. The explanations produced using LIME are shown in Fig.23 and Fig.24. The negative and positive coefficients of the RF-DE model are depicted by red and green bars, respectively, in Fig.23. Positive coefficients indicate a positive relationship between the independent and dependent attributes, whereas negative coefficients indicate a negative relationship. As can be seen in Fig.24, age has the maximum positive coefficient of 11.09, and SF (10.30) and maximum spread diameter (2.22) have the maximum negative coefficients. Fig.23 provides values in ranges, whereas Fig.24 gives the exact coefficient values.

4.11 Sensitivity analysis

Sensitivity analysis assesses the significance of individual model parameters (input factors) for the response of the model under examination. One of the local sensitivity analysis methods often used in the literature to analyze geotechnical engineering problems is the cosine amplitude method (CAM). CAM quantifies the impact of each input parameter on the model output by computing the angular similarity between their respective vectors. The method is valued for its computational efficiency, simplicity, and capacity to handle high-dimensional data sets, which makes it particularly useful in disciplines such as geotechnical engineering and environmental modeling. CAM has been extensively employed to identify critical parameters within complex systems [23,81–85]. It is based on the cosine similarity formula, which assesses the degree of alignment between two vectors: one containing the input parameter values and another containing the model outputs. The cosine similarity is calculated as shown in Eq. (27),

(27)

$$R_i = \frac{\sum_{i=1}^{N} (x_i \cdot y_i)}{\sqrt{\sum_{i=1}^{N} x_i^2 \cdot \sum_{i=1}^{N} y_i^2}},$$

where $R_i$ indicates the impact of the input variable on the CS of HVFA-SCC, $x_i$ signifies the value of the input variable for the $i$th observation, $y_i$ denotes the corresponding output variable, and $N$ represents the total number of observations. The sensitivity analysis of HVFA-SCC, presented in Fig.25, highlights the key factors influencing its CS. The most significant contributors are age ($R_i$ of 0.915), J-ring test parameters (0.986), and T-500 test parameters (0.984), indicating that curing time and flowability play a major role in determining strength. Cement content (0.610) and SF (0.483) also have moderate impacts, with SF being more influential than FA (0.163). In contrast, V-funnel time (0.065) and L-box blocking ratio (0.065) have minimal effects on CS. These results suggest that optimizing curing time, flowability, and the use of supplementary materials, particularly SF, is essential for achieving high CS in HVFA-SCC mixes.
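The CAM strength in Eq. (27) is the cosine of the angle between an input column and the output column, and can be computed as sketched below. The function name and the toy vectors are assumptions for illustration; the square root in the denominator follows the cosine similarity definition.

```python
import math

def cosine_amplitude(x, y):
    """Cosine similarity between an input vector x and an output vector y."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator

# Hypothetical columns: an input that scales with the output, and one that does not.
strength = [20.0, 40.0, 60.0]
proportional_input = [1.0, 2.0, 3.0]
unrelated_input = [3.0, 0.0, -1.0]

print(cosine_amplitude(proportional_input, strength))  # close to 1
print(cosine_amplitude(unrelated_input, strength))     # 0.0
```

Values close to 1 indicate that the input and output vectors point in nearly the same direction, whereas values near 0 indicate little alignment. Because the vectors are not centred, strictly positive inputs tend to produce fairly high values, so the ranking is more informative than the absolute magnitudes.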

4.12 Development of the graphical user interface

For design engineers, the utility of ML is diminished because new techniques that produce more accurate and dependable predictions from laboratory experiments are developed rapidly, often without regard for the empirical methods used in practice. Both engineers and researchers need ready access to such models for validation, improvement, and ongoing development. This work closes this gap by offering a practical ML model capable of estimating the CS of HVFA-SCC. A public GUI, available at https://github.com/rakesh0312/self-compacting-flow-parameters, was developed to enable accurate, instant predictions. The GUI, built with Tkinter, gives the RF-DE model a straightforward user interface for predicting the CS of HVFA-SCC, as shown in Fig.26 [86]. In the first prediction stage, users input the parameters that correspond to the data sets used to train the ML models: cement, FA, SF, T-500, maximum spread diameter, J-ring test, L-box blocking ratio, V-funnel time, and age. The screen interface displays the user-defined input parameters along with the computed CS values. Most importantly, the prediction is recomputed automatically whenever the input variables are changed, and it is typically finished in just a few seconds.

5 Conclusions

In SCC, FA and SF have significant potential to replace part of the cement. Making profitable use of HVFA and SF was the main goal of this study. The following inferences can be drawn from the test results. It is feasible to produce SCC by combining SF and HVFA. When HVFA and SF were added to SCC, the fresh, hardened, and microstructural properties improved. Although the mix showed enhanced hardened and microstructural performance, the addition of SF resulted in a greater T-500 time, decreased slump flow, greater V-funnel time, and greater J-ring flow. The mechanical properties that were examined demonstrated that the use of supplementary cementitious materials led to comparatively greater strengths after 56 d. For every mix in the set, the maximum CS is achieved at 6%–8% SF.

SEM image analysis of SCC made with SF and HVFA revealed fewer voids, enhanced cement paste, improved aggregate packing, and a dense mortar matrix. Microstructural improvements in SCC with ternary blends were observed, likely due to the reduced ettringite and CH contents. The RF-DE, RF-BO, RF-PSO, and RF models of this study successfully predict the CS of HVFA-SCC from the following parameters: age, cement, FA, SF, T-500, maximum spread diameter, J-ring test, V-funnel time, and L-box blocking ratio. The models were evaluated with their best hyperparameters using a wide range of performance metrics, including R², Adj.R², VAF, WI, WMAPE, RSR, LMI, and NS. In terms of predictive accuracy, the RF-DE model was the best-performing model during testing (R² = 0.990134, WMAPE = 0.014812). Validation with the REC curve and uncertainty analysis showed that the RF-DE model was more precise than RF-BO, RF-PSO, and RF. The AIC and OBJ criteria provided additional evidence that the RF-DE model was robust. The SHAP analyses revealed that age, SF, J-ring test, and cement had significant impacts on the strength results and confirmed that the CS of HVFA-SCC could be accurately predicted from the selected input factors. With the GUI based on the RF-DE model, the relationships between strength and mix components can be better understood and the CS of HVFA-SCC can be predicted more accurately.

This study’s main limitations include its restriction to ternary blended SCC, which contained only FA and SF combinations as supplementary cementitious materials. Because they are data-driven, ML models are very effective; however, their performance depends on the input parameters that are chosen, which limits their capacity to capture the physical mechanisms that underlie concrete behavior. Prediction accuracy may be further increased by extending the range of input variables. The comparatively small sample size also emphasizes the need for bigger data sets to improve the generalizability and robustness of the model. To verify the model’s dependability in a variety of situations, it is essential to compare it against other experimental data. Additionally, continual model improvement, frequent laboratory validation, and adjustment to changing construction technologies are required because of the risk of technological obsolescence. To improve the model’s adaptability and suitability for different formulations, future studies should investigate how different cementitious materials affect the CS of SCC.

The results of the study have important ramifications for future research as well as engineering practice. When paired with an intuitive GUI, the hybrid RF-DE model offers a useful tool for improving the accuracy and efficiency of concrete mix design. Through the provision of prompt and accurate CS predictions, the model streamlines the design process, reduces material waste, and promotes environmentally friendly building methods. By adding SHAP analysis, engineers can better interpret the model, comprehend the impact of important input parameters, and make well-informed design choices. This data-driven approach improves SCC’s performance, cost-effectiveness, and environmental impact by replacing conventional trial-and-error methods with more accurate and effective design processes. Looking ahead, the model’s use can be expanded to include real-time prediction based on sensor data, which would improve structural performance evaluation and concrete quality control even more.
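The hybrid RF-DE model pairs a random forest with differential evolution for hyperparameter search. As a rough illustration of the search step only, the sketch below implements a basic DE/rand/1/bin loop in plain Python. The objective `toy_validation_error` is a hypothetical stand-in for the validation error of a random forest, and the parameter names, bounds, and control settings (`f`, `cr`, population size) are assumptions rather than the values used in the study. In practice, integer hyperparameters such as the number of trees would also need rounding before the forest is trained.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over box `bounds` with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])  # mutation
                    value = min(max(value, lo), hi)                  # clip to bounds
                else:
                    value = pop[i][j]                                # keep target gene
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: the trial replaces the target if it is no worse
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at 300 trees, depth 12."""
    n_trees, max_depth = params
    return ((n_trees - 300.0) / 100.0) ** 2 + (max_depth - 12.0) ** 2


best_params, best_error = differential_evolution(
    toy_validation_error, bounds=[(50.0, 500.0), (2.0, 30.0)]
)
print(best_params, best_error)
```

Replacing the toy objective with a function that trains a random forest and returns its cross-validated error would turn this loop into a hyperparameter tuner of the general kind described here, at the cost of one model fit per objective evaluation.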

References

[1]	Okamura H, Ouchi M. Self-compacting concrete. Journal of Advanced Concrete Technology, 2003, 1(1): 5–15

[2]	Tian J, Wang W, Du Y. Damage behaviors of self-compacting concrete and prediction model under coupling effect of salt freeze–thaw and flexural load. Construction & Building Materials, 2016, 119: 241–250

[3]	Jau W C, Yang C T. Development of a modified concrete rheometer to measure the rheological behavior of conventional and self-consolidating concretes. Cement and Concrete Composites, 2010, 32(6): 450–460

[4]	Bosiljkov V B. SCC mixes with poorly graded aggregate and high volume of limestone filler. Cement and Concrete Research, 2003, 33(9): 1279–1286

[5]	Kumar S, Rai B. Synergetic effect of fly ash and silica fume on the performance of high volume fly ash self-compacting concrete. Journal of Structural Integrity and Maintenance, 2022, 7(1): 61–74

[6]	Dinakar P, Kartik Reddy M, Sharma M. Behaviour of self compacting concrete using Portland pozzolana cement with different levels of fly ash. Materials & Design, 2013, 46: 609–616

[7]	Kumar S, Rai B. Pulse velocity–strength and elasticity relationship of high volume fly ash induced self-compacting concrete. Journal of Structural Integrity and Maintenance, 2019, 4(4): 216–229

[8]	Şahmaran M, Li V C. Durability properties of micro-cracked ECC containing high volumes fly ash. Cement and Concrete Research, 2009, 39(11): 1033–1043

[9]	Kumar S, Rai B, Biswas R, Samui P, Kim D. Prediction of rapid chloride permeability of self-compacting concrete using multivariate adaptive regression spline and minimax probability machine regression. Journal of Building Engineering, 2020, 32: 101490

[10]	Liew K M, Sojobi A O, Zhang L W. Green concrete: Prospects and challenges. Construction & Building Materials, 2017, 156: 1063–1095

[11]	Valcuende M, Marco E, Parra C, Serna P. Influence of limestone filler and viscosity-modifying admixture on the shrinkage of self-compacting concrete. Cement and Concrete Research, 2012, 42(4): 583–592

[12]	Rahman M E, Muntohar A S, Pakrashi V, Nagaratnam B H, Sujan D. Self compacting concrete from uncontrolled burning of rice husk and blended fine aggregate. Materials & Design, 2014, 55: 410–415

[13]	Khodabakhshian A, Ghalehnovi M, de Brito J, Asadi Shamsabadi E. Durability performance of structural concrete containing silica fume and marble industry waste powder. Journal of Cleaner Production, 2018, 170: 42–60

[14]	Yazıcı H. The effect of silica fume and high-volume class C fly ash on mechanical properties, chloride penetration and freeze-thaw resistance of self-compacting concrete. Construction & Building Materials, 2008, 22(4): 456–462

[15]	Askari A, Sohrabi M R, Rahmani Y. An investigation into mechanical properties of self compacting concrete incorporating fly ash and silica fume at different ages of curing. Advanced Materials Research, 2011, 261–263: 261–263

[16]	Wongkeo W, Thongsanitgarn P, Ngamjarurojana A, Chaipanich A. Compressive strength and chloride resistance of self-compacting concrete containing high level fly ash and silica fume. Materials & Design, 2014, 64: 261–269

[17]	Awoyera P O. Nonlinear finite element analysis of steel fibre-reinforced concrete beam under static loading. Journal of Engineering Science and Technology, 2016, 11: 1669–1677

[18]	Sadrmomtazi A, Sobhani J, Mirgozar M A. Modeling compressive strength of EPS lightweight concrete using regression, neural network and ANFIS. Construction & Building Materials, 2013, 42: 205–216

[19]	Zhang G, Eddy Patuwo B, Hu M Y. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 1998, 14(1): 35–62

[20]	Asteris P G, Ashrafian A, Rezaie-Balf M. Prediction of the compressive strength of self-compacting concrete using surrogate models. Computers and Concrete, 2019, 24(2): 137–150

[21]	Shishegaran A, Khalili M R, Karami B, Rabczuk T, Shishegaran A. Computational predictions for estimating the maximum deflection of reinforced concrete panels subjected to the blast load. International Journal of Impact Engineering, 2020, 139: 103527

[22]	Shishegaran A, Saeedi M, Kumar A, Ghiasinejad H. Prediction of air quality in Tehran by developing the nonlinear ensemble model. Journal of Cleaner Production, 2020, 259: 120825

[23]	Shishegaran A, Varaee H, Rabczuk T, Shishegaran G. High correlated variables creator machine: Prediction of the compressive strength of concrete. Computers & Structures, 2021, 247: 106479

[24]	Shishegaran A, Boushehri A N, Ismail A F. Gene expression programming for process parameter optimization during ultrafiltration of surfactant wastewater using hydrophilic polyethersulfone membrane. Journal of Environmental Management, 2020, 264: 110444

[25]	Mansouri I, Azmathulla H M, Hu J W. Gene expression programming application for prediction of ultimate axial strain of FRP-confined concrete. Advances in Civil and Architectural Engineering, 2018, 9(16): 64–76

[26]	Parichatprecha R, Nimityongskul P. Analysis of durability of high performance concrete using artificial neural networks. Construction & Building Materials, 2009, 23(2): 910–917

[27]	Khan K, Iqbal M, Jalal F E, Nasir Amin M, Waqas Alam M, Bardhan A. Hybrid ANN models for durability of GFRP rebars in alkaline concrete environment using three swarm-based optimization algorithms. Construction & Building Materials, 2022, 352: 128862

[28]	Amiri M, Hatami F. Prediction of mechanical and durability characteristics of concrete including slag and recycled aggregate concrete with artificial neural networks (ANNs). Construction & Building Materials, 2022, 325: 126839

[29]	Dutta S, Murthy A R, Kim D, Samui P. Prediction of compressive strength of self-compacting concrete using intelligent computational modeling. Computers, Materials & Continua, 2017, 53(2): 157–174

[30]	Naghsh M A, Shishegaran A, Karami B, Rabczuk T, Shishegaran A, Taghavizadeh H, Moradi M. An innovative model for predicting the displacement and rotation of column-tree moment connection under fire. Frontiers of Structural and Civil Engineering, 2021, 15(1): 194–212

[31]	Shishegaran A, Karami B, Safari Danalou E, Varaee H, Rabczuk T. Computational predictions for predicting the performance of steel panel shear wall under explosive loads. Engineering Computations, 2021, 38(9): 3564–3589

[32]	Pazouki G, Golafshani E M, Behnood A. Predicting the compressive strength of self-compacting concrete containing Class F fly ash using metaheuristic radial basis function neural network. Structural Concrete, 2022, 23(2): 1191–1213

[33]	Saha P, Debnath P, Thomas P. Prediction of fresh and hardened properties of self-compacting concrete using support vector regression approach. Neural Computing & Applications, 2020, 32(12): 7995–8010

[34]	Scornet E. On the asymptotics of random forests. Journal of Multivariate Analysis, 2016, 146: 72–83

[35]	Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32

[36]	Guo H, Yin Z Y. A novel physics-informed deep learning strategy with local time-updating discrete scheme for multi-dimensional forward and inverse consolidation problems. Computer Methods in Applied Mechanics and Engineering, 2024, 421: 116819

[37]	Zhuang X, Guo H, Alajlan N, Zhu H, Rabczuk T. Deep autoencoder based energy method for the bending, vibration, and buckling analysis of Kirchhoff plates with transfer learning. European Journal of Mechanics. A, Solids, 2021, 87: 104225

[38]	de-Prado-Gil J, Palencia C, Silva-Monteiro N, Martínez-García R. To predict the compressive strength of self compacting concrete with recycled aggregates utilizing ensemble machine learning models. Case Studies in Construction Materials, 2022, 16: e01046

[39]	Ly H B, Pham B T, Le L M, Le T T, Le V M, Asteris P G. Estimation of axial load-carrying capacity of concrete-filled steel tubes using surrogate models. Neural Computing & Applications, 2021, 33(8): 3437–3458

[40]	Chen H, Asteris P G, Jahed Armaghani D, Gordan B, Pham B T. Assessing dynamic conditions of the retaining wall: Developing two hybrid intelligent models. Applied Sciences, 2019, 9(6): 1042

[41]	Firouzi N, Dohnal F. Dynamic stability of the Mindlin-Reissner plate using a time-modulated axial force. Mechanics Based Design of Structures and Machines, 2025, 53(1): 446–463

[42]	IS8112-2013. Ordinary Portland Cement, 43 Grade-Specification. New Delhi: Bureau of Indian Standards, 2013

[43]	IS3812-2013. Specification for Pulverized Fuel Ash, Part-1: For Use as Pozzolana in Cement, Cement Mortar and Concrete. New Delhi: Bureau of Indian Standards, 2013

[44]	IS15388-2003. Indian Standard Specification for Silica Fume. New Delhi: Bureau of Indian Standards, 2003

[45]	IS383-2016. Specification for Coarse and Fine Aggregates from Natural Sources for Concrete. New Delhi: Bureau of Indian Standards, 2016

[46]	IS10262-2019. Concrete Mix Proportioning—Guidelines. New Delhi: Bureau of Indian Standards, 2019

[47]	Awada M, Srour F J, Srour I M. Data-driven machine learning approach to integrate field submittals in project scheduling. Journal of Management Engineering, 2021, 37(1): 4020104

[48]	Zhang Y, Javanmardi A, Liu Y C, Yang S J, Yu X X, Hsiang S M, Jiang Z H, Liu M. How does experience with delay shape managers’ making-do decision: Random forest approach. Journal of Management Engineering, 2020, 36(4): 4020030

[49]	Elshawi R, Maher M, Sakr S. Automated machine learning: State-of-the-art and open challenges. 2019, arXiv: 1906.02287

[50]	Kumar R, Rai B, Samui P. A comparative study of prediction of compressive strength of ultra-high performance concrete using soft computing technique. Structural Concrete, 2023, 24(4): 5538–5555

[51]	Hutter F, Kotthoff L, Vanschoren J, eds. Automated Machine Learning. Cham: Springer, 2019

[52]	Gomes G F, de Almeida F A, Ancelotti A C Jr, da Cunha S S Jr. Inverse structural damage identification problem in CFRP laminated plates using SFO algorithm based on strain fields. Engineering with Computers, 2021, 37(4): 3771–3791

[53]	Yang L, Shami A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 2020, 415: 295–316

[54]	Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN’95-International Conference on Neural Networks. Perth: IEEE, 1995, 1942–1948

[55]	Cui Z, Shi Z. Boid particle swarm optimisation. International Journal of Innovative Computing and Applications, 2009, 2(2): 77–85

[56]	Kushwaha N, Pant M. Modified particle swarm optimization for multimodal functions and its application. Multimedia Tools and Applications, 2019, 78(17): 23917–23947

[57]	Engelbrecht A P. Computational Intelligence: An Introduction. 2nd ed. Chichester: John Wiley & Sons Ltd, 2007

[58]	Marano G C, Quaranta G, Avakian J, Palmeri A. Identification of passive devices for vibration control by evolutionary algorithms. Metaheuristic Applications in Structures and Infrastructures, 2013, 373–387

[59]	Storn R, Price K. Differential evolution—A simple and efficient adaptive scheme for global optimization over continuous spaces. Journal of Global Optimization, 1997, 11: 341–359

[60]	Soufiane F. The European Guidelines for Self-Compacting Concrete: Specification, Production and Use. EFNARC Technical Report. 2005

[61]	Koopialipoor M, Fallah A, Armaghani D J, Azizi A, Mohamad E T. Three hybrid intelligent models in estimating flyrock distance resulting from blasting. Engineering with Computers, 2019, 35(1): 243–256

[62]	Le L T, Nguyen H, Dou J, Zhou J. A comparative study of PSO-ANN, GA-ANN, ICA-ANN, and ABC-ANN in estimating the heating load of buildings’ energy efficiency for smart city planning. Applied Sciences, 2019, 9(13): 2630

[63]	Kumar R, Rai B, Samui P. Machine learning techniques for prediction of failure loads and fracture characteristics of high and ultra-high strength concrete beams. Innovative Infrastructure Solutions, 2023, 8(8): 219

[64]	Kumar D R, Samui P, Wipulanusat W, Keawsawasvong S, Sangjinda K, Jitchaijaroen W. Bearing capacity of eccentrically loaded footings on rock masses using soft computing techniques. Engineered Science, 2023, 24: 929

[65]	Karami B, Shishegaran A, Taghavizade H, Rabczuk T. Presenting innovative ensemble model for prediction of the load carrying capacity of composite castellated steel beam under fire. Structures, 2021, 33: 4031–4052

[66]	Bigdeli A, Shishegaran A, Naghsh M A, Karami B, Shishegaran A, Alizadeh G. Surrogate models for the prediction of damage in reinforced concrete tunnels under internal water pressure. Journal of Zhejiang University-Science A, 2021, 22(8): 632–656

[67]	George C, Zumba E, Procel Silva M A, Selvan S S, Christo M S, Kumar R, Kumar Singh A, S S, Onyelowe K. Predicting the fire-induced structural performance of steel tube columns filled with SFRC-enhanced concrete: using artificial neural networks approach. Frontiers in Built Environment, 2024, 10: 1403460

[68]	Tahera N, Urs K S, Raj R, Kumar H, Soundalgekar T, Deepa M A. Comparative analysis of sloshing effects on elevated water tanks’ dynamic response using ANN and MARS. Discover Materials, 2025, 5(1): 9

[69]	Kumar R, Prakash S, Rai B, Samui P. Development of a prediction tool for the compressive strength of ternary blended ultra-high performance concrete using machine learning techniques. Journal of Structural Integrity and Maintenance, 2024, 9(3): 2385206

[70]	Sathvik S, Oyebisi S, Kumar R, Shakor P, Adejonwo O, Tantri A, Suma V. Analyzing the influence of manufactured sand and fly ash on concrete strength through experimental and machine learning methods. Scientific Reports, 2025, 15(1): 4978

[71]	George C, Kumar R, Ramaraju H K. Comparison of experimental and analytical studies in light gauge steel sections on CFST using SFRC in beams subjected to high temperatures. Asian Journal of Civil Engineering, 2025, 26(2): 667–681

[72]	Satyanarayana A, Dushyanth V B R, Riyan K A, Geetha L, Kumar R. Assessing the seismic sensitivity of bridge structures by developing fragility curves with ANN and LSTM integration. Asian Journal of Civil Engineering, 2024, 25: 5865–5888

[73]	Kumar R, Kumar D R, Wipulanusat W, Thongchom C, Samui P, Rai B. Estimation of the compressive strength of ultrahigh performance concrete using machine learning models. Intelligent Systems with Applications, 2025, 25: 200471

[74]	Sathvik S, Kumar R, Ulloa N, Shakor P, Ujwal M S, Onyelowe K, Kumar G S, Christo M S. Modelling the mechanical properties of concrete produced with polycarbonate waste ash by machine learning. Scientific Reports, 2024, 14(1): 11552

[75]	Kumar R, Karthik S, Kumar A, Tantri A, Shahaji S. Machine learning approach for predicting the compressive strength of biomedical waste ash in concrete: A sustainability approach. Discover Materials, 2025, 5(1): 46

[76]	Newcombe R G. Two-sided confidence intervals for the single proportion: Comparison of seven methods. Statistics in Medicine, 1998, 17(8): 857–872

[77]	Kumar D R, Samui P, Wipulanusat W, Keawsawasvong S, Sangjinda K, Jitchaijaroen W. Soft computing techniques for predicting penetration and uplift resistances of dual pipelines in cohesive soils. Engineered Science, 2023, 24: 897

[78]	Parzen E, Tanabe K, Kitagawa G, eds. Selected Papers of Hirotugu Akaike. New York: Springer, 1998, 199–213

[79]	Gandomi A H, Alavi A H, Sahab M G, Arjmandi P. Formulation of elastic modulus of concrete using linear genetic programming. Journal of Mechanical Science and Technology, 2010, 24(6): 1273–1278

[80]	Hilloulin B, Tran V Q. Using machine learning techniques for predicting autogenous shrinkage of concrete incorporating superabsorbent polymers and supplementary cementitious materials. Journal of Building Engineering, 2022, 49: 104086

[81]	Dev K L, Kumar D R, Wipulanusat W. Machine learning prediction of the unconfined compressive strength of controlled low strength material using fly ash and pond ash. Scientific Reports, 2024, 14(1): 27540

[82]	Jitchaijaroen W, Keawsawasvong S, Wipulanusat W, Kumar D R, Jamsawang P, Sunkpho J. Machine learning approaches for stability prediction of rectangular tunnels in natural clays based on MLP and RBF neural networks. Intelligent Systems with Applications, 2024, 21: 200329

[83]	He L, Wen Z, Jin Y, Torrent D, Zhuang X, Rabczuk T. Inverse design of topological metaplates for flexural waves with machine learning. Materials & Design, 2021, 199: 109390

[84]	Hamdia K M, Ghasemi H, Zhuang X, Alajlan N, Rabczuk T. Sensitivity and uncertainty analysis for flexoelectric nanostructures. Computer Methods in Applied Mechanics and Engineering, 2018, 337: 95–109

[85]	Vu-Bac N, Lahmer T, Zhuang X, Nguyen-Thoi T, Rabczuk T. A software framework for probabilistic sensitivity analysis for computationally expensive models. Advances in Engineering Software, 2016, 100: 19–31

[86]	Pant A, Ramana G V. Prediction of pullout interaction coefficient of geogrids by extreme gradient boosting model. Geotextiles and Geomembranes, 2022, 50(6): 1188–1198

RIGHTS & PERMISSIONS

Higher Education Press
