Application of extreme gradient boosting in predicting the viscoelastic characteristics of graphene oxide modified asphalt at medium and high temperatures
Huong-Giang Thi HOANG, Hai-Van Thi MAI, Hoang Long NGUYEN, Hai-Bang LY
University of Transport Technology, Hanoi 100000, Vietnam
banglh@utt.edu.vn
Received: 2022-11-03; Accepted: 2023-05-16; Published: 2024-05-15
Issue Date
Revised Date
2024-05-29
Abstract
Complex modulus (G*) is one of the important criteria for asphalt classification according to AASHTO M320-10, and is often used to predict the linear viscoelastic behavior of asphalt binders. In addition, phase angle (φ) characterizes the deformation resilience of asphalt and is used to assess the ratio between the viscous and elastic components. It is thus important to quickly and accurately estimate these two indicators. The purpose of this investigation is to construct an extreme gradient boosting (XGB) model to predict G* and φ of graphene oxide (GO) modified asphalt at medium and high temperatures. Two data sets are gathered from previously published experiments, consisting of 357 samples for G* and 339 samples for φ, and these are used to develop the XGB model using nine inputs representing the asphalt binder components. The findings show that XGB is an excellent predictor of G* and φ of GO-modified asphalt, evaluated by the coefficient of determination R2 (R2 = 0.990 and 0.9903 for G* and φ, respectively) and root mean square error (RMSE = 31.499 and 1.08 for G* and φ, respectively). In addition, the model’s performance is compared with experimental results and five other machine learning (ML) models to highlight its accuracy. In the final step, the Shapley additive explanations (SHAP) value analysis is conducted to assess the impact of each input and the correlation between pairs of important features on asphalt’s two physical properties.
Huong-Giang Thi HOANG, Hai-Van Thi MAI, Hoang Long NGUYEN, Hai-Bang LY. Application of extreme gradient boosting in predicting the viscoelastic characteristics of graphene oxide modified asphalt at medium and high temperatures. Front. Struct. Civ. Eng., 2024, 18(6): 899-917. DOI: 10.1007/s11709-024-1025-y
1 Introduction
Over the past decades, significant increases in loads and traffic flow, together with the negative impacts of climate change, have placed heavy demands on the performance of asphalt pavements [1]. Asphalt, an organic binder, is a vital pavement material used worldwide in road construction. However, asphalt is easily oxidized by climate factors such as heat, sunlight, and rain, and by factors related to the mixing, transportation, compaction, and operation processes. Asphalt as traditionally used can hardly meet current and future requirements for pavement construction. Therefore, the use of additives to improve the mechanical properties of asphalt and ensure a more environmentally friendly product is highly desirable. Various modifiers have been proposed and their effectiveness has been proven, including fibers [2–4], crumb rubber (CR) [5–8], styrene-butadiene-rubber [9,10], and ethylene glycidyl acrylate [11]. These additives can improve asphalt binders’ performance in terms of permanent deformation, fatigue and aging resistance, and moisture sensitivity. Asphalt modifiers remain an active research area.
Recently, there has been rapid development in materials science and nanotechnology. Nanomaterials, described as those with at least one dimension in the 1–100 nm range [12], were first added to asphalt for pavement engineering application by Xiao et al. [12,13] and Goli et al. [14]. Since then, a wide variety of different nanomaterials have been included in asphalt concrete in order to enhance its mechanical properties. Nanomaterials such as nano-clay [15–18], carbon nanofibers [19,20], and carbon nanotubes [14,21] have been shown to enhance asphalt properties such as high-temperature behavior, resistance to aging, visco-elasticity, fatigue, and moisture damage. Nanoscale modifiers play an ever-increasing role in asphalt modification.
Graphene oxide (GO), an emerging carbon-based nanomaterial, has received increasing attention from researchers, especially in the pavement engineering community [22]. Compared with other nanomaterials, GO has a unique structure and surface area, consisting of one or several layers. The thickness of the layers is a few nanometers, while the two remaining dimensions can reach several tens of micrometers. In addition, GO layers are held together by the van der Waals force, which diminishes as the layer separation increases [23]. Because the chemical structure of GO is similar to that of asphalt binders, its compatibility with asphalt is noticeably superior to that of other additives [23,24]. It has been shown that GO, as a potential asphalt modifier, may greatly enhance a variety of asphalt characteristics, such as anti-aging performance and surface free energy [25], fatigue performance [23,24], softening point, ductility, and high-temperature properties [26,27]. In particular, GO considerably increases the complex modulus (G*) of asphalt at medium and high temperatures while decreasing the phase angle (φ) [28,29]. G* is one of the important criteria for classifying asphalt, as stated in AASHTO M320-10, and is also a critical factor in predicting the linear viscoelastic behavior of asphalt binders. Meanwhile, φ is the phase lag between the sinusoidal strain applied to the sample and the resulting sinusoidal stress in the strain-controlled test mode. This property is often used to characterize the deformation resilience of binders. Normally, these two factors are determined experimentally with a dynamic shear rheometer over a wide range of temperatures and frequencies. However, the process is complicated, costly, time-consuming, and requires special equipment.
To simplify this process, a different approach is needed to quickly determine these two parameters of GO-modified asphalt at medium and high temperatures.
Machine learning (ML) is widely used in many areas due to its efficiency, especially for highly complex problems [30–32]. Regarding research in the road sector, ML models have been used to solve problems related to asphalt binders and mixtures, for instance, stiffness and Marshall parameters [33], dynamic modulus [34–38], phase angle [39], rutting depth [40], international roughness index [41,42], and fatigue life [43]. Some physicomechanical parameters of polymer-modified asphalt [44] and rubber asphalt [45] have also been predicted by ML algorithms with high accuracy (coefficient of determination R2 up to 0.976 [35]). However, to the best of the authors’ knowledge, the application of ML algorithms in predicting GO-modified asphalt’s properties, such as G* and φ, has not yet taken place.
Tree-based algorithms have been increasingly applied to various problems due to their ability to handle complex and nonlinear relationships between input and output parameters and their robustness against overfitting. Some popular tree-based models include decision trees (DT), random forests (RF), light gradient boosting machines (LGB), and extreme gradient boosting (XGB). These models have been employed with great success in civil engineering for various tasks, such as predicting material properties, structural performance, soil or rock material, geotechnical engineering, and optimizing design parameters. For instance, tree-based models have been used to predict the soil shear strength [46], advance rate of a tunnel boring machine [47], flyrock distance [48], rockburst prediction [49], slope stability [50], and pile friction bearing capacity [51]. Due to the demonstrated effectiveness of tree-based models in civil engineering applications, the XGB model has been selected in this study. XGB is an advanced tree-based model that employs gradient boosting techniques to iteratively refine the model’s performance by combining weak learners to form a strong learner. It has shown superior performance in many applications due to its ability to handle missing values, regularization, and parallelization of the learning process. Additionally, XGB offers a high level of interpretability, which can be advantageous in understanding the relationships between inputs and the predicted viscoelastic properties of GO-modified asphalt.
Asphalt binders are a critical component of road construction and maintenance. Their performance is influenced by factors such as temperature, load, and material composition. One of the key challenges in designing asphalt mixtures is predicting the viscoelastic properties of asphalt binders under various conditions. At medium to high temperatures, asphalt mixtures experience thermal stresses that can affect their deformation resistance and susceptibility to rutting. Therefore, accurately predicting G* and φ of asphalt binders in these temperature ranges is essential for optimizing their performance. As mentioned, GO-modified asphalt has attracted significant attention due to its improved mechanical properties compared with traditional asphalt. However, predicting the viscoelastic characteristics of GO-modified asphalt at medium and high temperatures remains a challenge. Traditional methods for estimating G* and φ can be time-consuming and may not always provide accurate results. ML techniques, such as XGB, have shown promise in capturing complex relationships between input parameters and material properties.
Therefore, this research aims to develop a robust and accurate XGB model for predicting the viscoelastic characteristics of GO-modified asphalt, namely G* and φ, at medium and high temperatures. To achieve this goal, two data sets are used, containing 357 experimental results (G* data set) and 339 experimental results (φ data set). An XGB model is selected to estimate G* and φ of GO-modified asphalt at temperatures ranging from 30 to 82 °C and at a frequency of 10 rad/s. This paper is organized as follows: the first part introduces the context of the work; the second part describes the two constructed data sets; the third part presents an overview of the predictive ML models, including the XGB model, and the other techniques used in this study; the fourth part presents the prediction results and discussion (including a comparison with experimental results and five other ML models); and the final part gives the conclusion.
2 Data set description and analysis
The key to successfully training any ML model lies in the reliability of the database. The database in this study is therefore built carefully according to several principles, by assembling experimental results from multiple sources in the literature. First, it is not easy to find a single study in the literature based on a large number of experiments. The largest number of samples comes from Ref. [27] with 108 samples, whereas the smallest comes from Ref. [29] with 18 samples. Second, the GO in the database comes from two sources: commercial products with laminate structures (304 samples) or GO synthesized in the laboratory with a single layer (53 samples). In addition, chemically modified GO is also collected from Ref. [26] with 35 samples. Third, original asphalt is mainly considered in this study (260 samples), whereas the remaining 97 samples use styrene-butadiene-styrene-modified (SBS-modified) asphalt collected from different works, such as Refs. [1,22,26,29]. Considering that the original asphalts have different rheological properties, the G* and φ before or after being modified are both used as input parameters. Fourth, as mentioned above, asphalt is a material that easily oxidizes, which changes its viscoelastic properties. Therefore, samples of GO-modified asphalt under unaged conditions, short-term aging by thin film oven test (TFOT), rolling thin film oven test (RTFOT), long-term aging by pressure aging vessel test (PAV), ultraviolet (UV) aging for 9 d, and UV aging for 12 d are all considered. The aging condition is another input in the database and is coded as a categorical variable (i.e., from 1 to 6).
Finally, different initial asphalts have different effective working temperatures, so the factors related to the preparation process to obtain a uniform dispersion of GO in asphalt are also considered in this study (i.e., mixing temperature, time, and speed, defined by revolutions per minute: RPM).
On that basis, the study uses two data sets to develop the predictive models for G* and φ. The G* and φ data sets contain 357 and 339 experimental samples, respectively, collected from eight references (see Tab.1). Each data set is split into two parts: 70% of the data are used to develop the ML models (i.e., training data set), and the remaining 30% are used to evaluate the model’s accuracy (i.e., testing data set). The ratio is chosen based on several suggestions in the literature, and the split is performed randomly to ensure that the accuracy of the models is representative.
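As an illustration of the random 70/30 split described above, the following is a minimal sketch in plain Python. The function name `split_70_30`, the seed, and the rounding of the training size are assumptions made for this example; the paper does not state how the split was implemented.

```python
import random


def split_70_30(samples, train_fraction=0.7, seed=0):
    """Randomly split a list of samples into training and testing parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle (seed is arbitrary)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# Placeholder records standing in for the 357 G* samples (not real data)
g_star_samples = list(range(357))
train, test = split_70_30(g_star_samples)
print(len(train), len(test))
```

Under these assumptions, the 357 G* samples yield 250 training and 107 testing samples.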
The G* data set consists of nine input parameters: GO content (%), GO layers (number), GO sheet thickness (nm), GO diameter (μm), mixing temperature (°C), RPM (r/min), mixing time (min), asphalt aging type, and initial G* (kPa). The output parameter is G* (kPa). The statistical description of the G* data set is provided in Tab.2, comprising the mean, standard deviation (Std), min, Q25%, Q50%, Q75%, and max values of all variables. Similarly, the φ data set has nine input parameters; it differs from the G* data set only in that the initial φ (°) replaces the initial G*, and the output parameter is the φ (°) of the asphalt after being modified. Tab.3 presents the descriptive statistics for this data set.
In addition, Fig.1(a) depicts a correlation analysis between all the parameters of the G* data set, whereas Fig.1(b) depicts the corresponding analysis for the φ data set. Based on the value of the Pearson correlation coefficient (r) [52], it can be observed that the majority of correlations between the variables in the G* data set are rather weak, such as the correlation between the initial G* and GO thickness (r = 0.01), or between GO layer and G* (r = 0.02). There is almost no relationship between the initial G* and GO layer (r = 0). Nevertheless, there are some strong correlations, such as between the GO thickness and GO layer (r = 0.92), RPM and mixing temperature (r = 0.94), and initial G* and G* (r = 0.97).
Similarly, the values of r in Fig.1(b) show that most correlations between the variables of the φ data set are rather weak. However, some strong correlations exist, including the correlation between the GO thickness and GO layer (r = 0.91), between RPM and mixing temperature with the aging condition (r = 0.94), and between the initial φ and φ after being GO-modified (r = 0.97). The variable correlations of the two data sets are thus quite similar, with several inputs strongly correlated with each other, such as the GO thickness with the GO layer and RPM with mixing temperature. However, in this study, all variables are retained for assessment in order to increase the generality of the predictive models.
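The Pearson coefficient used in the correlation analysis above can be computed directly from its definition. The sketch below is a plain-Python illustration; the helper name `pearson_r` and the sample values for GO layers and GO thickness are hypothetical and are not taken from the data sets.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: number of GO layers and GO sheet thickness (nm)
layers = [1, 1, 2, 5, 8, 10]
thickness = [1.0, 1.2, 2.1, 4.8, 8.5, 9.9]
r = pearson_r(layers, thickness)
print(round(r, 3))
```

A value of r close to 1 or −1, as for these two illustrative columns, indicates that the two inputs carry largely redundant information.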
3 Methods
3.1 Machine learning methods
3.1.1 Machine learning
ML is concerned with developing and studying techniques that allow systems to learn from existing databases and solve problems automatically. In the ML approach, the essential ideas are usually validated experimentally rather than rigorously proven. The advantage of ML is therefore that a problem can be solved by learning from the database, without knowing the detailed relationship between input and output. The learning process starts from observations in the data sets. The primary objective of ML is to enable computers to learn about a process rapidly and automatically, without human involvement, so that they can adapt their behavior to become more effective. The ML algorithm produces an inference function that predicts the output value from the inputs, and different ML models may be employed to perform this mapping. This work uses the XGB model to predict the G* and φ of GO-modified asphalt.
3.1.2 Extreme gradient boosting
XGB is an improved, distributed implementation of gradient boosting, first proposed by Chen and Guestrin [55]. It is an upgraded algorithm derived from gradient tree boosting (GTB). The fundamental process of the GTB algorithm is to sequentially combine weak (high-error) base learning trees into a more robust tree model. The XGB model incorporates a regularization term into the loss function to penalize the complexity of the model, which enhances the performance of the GTB model. The regularization term smooths the learned parameters of the model and helps to avoid overfitting. The primary purpose of the XGB method is to optimize the value of the objective function within a gradient boosting framework. With parallel tree boosting, XGB can solve various data science problems rapidly and accurately.
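For reference, the regularized objective described above can be written as follows. This is the general form given by Chen and Guestrin; the symbols ($M$ trees $f_m$, $T$ leaves with weights $w$, penalty coefficients $\gamma$ and $\lambda$, and a differentiable loss $l$) are notation chosen for this sketch and do not appear in the text of this paper.

```latex
\hat{y}_i = \sum_{m=1}^{M} f_m(x_i), \qquad
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{m=1}^{M} \Omega\left(f_m\right), \qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}
```

The first sum measures the fit to the training data, while $\Omega$ penalizes trees with many leaves or large leaf weights, which is the mechanism that limits overfitting.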
3.2 Cross-validation (CV)
In ML applications, overfitting is a common problem that can reduce the model’s predictive reliability. CV is performed to mitigate this phenomenon. In this research, CV is applied to each data set in the following manner: the whole data set is separated into the training (70% of data) and testing data set (30% of data). The training data set is partitioned into K equal parts (folds), and the training is conducted K times. Each iteration uses (K − 1) folds for training and the remaining fold for validation, so that every fold serves as validation data exactly once. The assessment of the final model is based on the average accuracy over the K iterations of CV. The value of K should be selected appropriately, with K = 5 or 10 often used. In this investigation, K = 5 is used as suggested in Ref. [56].
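The K-fold procedure above can be sketched in plain Python as follows. The helper names (`k_fold_indices`, `cross_val_mean`), the contiguous fold layout (which assumes the training data have already been shuffled), and the toy mean-predictor are all assumptions for illustration, not the authors' implementation.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index lists for K-fold CV."""
    # Distribute any remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_mean(fit, score, X, y, k=5):
    """Average validation score over K folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Toy stand-ins: a "model" that predicts the training mean, scored by negative MAE
def fit_mean(X, y):
    return sum(y) / len(y)


def neg_mae(model, X, y):
    return -sum(abs(v - model) for v in y) / len(y)


folds = list(k_fold_indices(250, k=5))  # e.g., 250 training samples
cv = cross_val_mean(fit_mean, neg_mae, list(range(10)), [2.0] * 10, k=5)
```

In practice, `fit` and `score` would wrap the actual regressor and the chosen metric (R2 in this study).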
3.3 Shapley additive explanations (SHAP) values
SHAP values provide a method of model explanation in which each prediction is explained by the contribution of each feature of the data set to the output of the model [57]. SHAP values are calculated using game theory to quantify the contribution of each feature to the prediction for a given observation. In a linear regression (LR) model, the coefficients are used to compute the relative weights of the features and to explain the prediction for each data point. However, such coefficients are global and do not account for differences between individual observations: in the vast majority of cases, the effect of a particular feature on one data point is not the same as its effect on another data point. This supports the view that local explanations are more reliable than global ones. To explain black-box ML models, SHAP makes use of the idea of local explainability. SHAP perturbs the input and examines how the prediction changes. If the model prediction does not vary significantly when the input value of a feature is altered, that feature may not be a significant predictor for that particular data point. The contributions of all features, i.e., the SHAP values, add up to the difference between the final prediction and the mean prediction. A SHAP value is thus not the difference between the prediction with and without a feature; rather, it is the contribution that the feature makes to the difference between the actual prediction and the mean prediction.
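The additive property described above can be illustrated with an exact Shapley computation on a toy model. The sketch below enumerates all feature orderings, which is only feasible for a handful of features; it is not the algorithm used by the SHAP library for tree models. The function names, the toy model, and the background rows are hypothetical.

```python
from itertools import permutations


def coalition_value(model, x, background, subset):
    """Mean model output with features in `subset` fixed to x, others drawn from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by averaging marginal contributions over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        fixed = set()
        previous = coalition_value(model, x, background, fixed)
        for j in order:
            fixed.add(j)
            current = coalition_value(model, x, background, fixed)
            phi[j] += current - previous  # marginal contribution of feature j
            previous = current
    return [p / len(orderings) for p in phi]


def toy_model(f):
    # Hypothetical response: one additive term plus one interaction term
    return 2.0 * f[0] + f[1] * f[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
x = [3.0, 2.0, 3.0]
phi = shapley_values(toy_model, x, background)
base = sum(toy_model(row) for row in background) / len(background)
print(phi, base + sum(phi), toy_model(x))
```

The mean prediction over the background rows plus the sum of the per-feature contributions reproduces the model output for `x`, which is the decomposition that SHAP plots visualize.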
3.4 Model performance assessment
In this study, two statistical measures, namely R2 and root mean square error (RMSE), are utilized to assess the performance of the models. R2 quantifies the agreement between the predicted and experimental values and ranges from −∞ to 1; the closer R2 is to 1, the more accurate the model. RMSE is the square root of the mean squared difference between the predicted and experimental values; the lower the RMSE, the more accurate the model’s prediction. The ideal model achieves R2 = 1 and RMSE = 0. These metrics are determined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (e_i - p_i)^2}{\sum_{i=1}^{n} (e_i - \bar{e})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (e_i - p_i)^2}$$

where n is the number of samples, e is the experimental value, p is the predicted value, and $\bar{e}$ is the mean of the experimental values.
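The two metrics can be sketched in plain Python as follows, using e for experimental and p for predicted values as in the definition above. The helper names `r2_score` and `rmse` are chosen for this example (they are local functions, not library calls), and the sample values are hypothetical.

```python
import math


def r2_score(e, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_e = sum(e) / len(e)
    ss_res = sum((ei - pi) ** 2 for ei, pi in zip(e, p))
    ss_tot = sum((ei - mean_e) ** 2 for ei in e)
    return 1.0 - ss_res / ss_tot


def rmse(e, p):
    """Root mean square error between experimental and predicted values."""
    return math.sqrt(sum((ei - pi) ** 2 for ei, pi in zip(e, p)) / len(e))


# Hypothetical experimental and predicted G* values (kPa)
experimental = [120.0, 250.0, 480.0, 900.0]
predicted = [130.0, 240.0, 500.0, 880.0]
print(r2_score(experimental, predicted), rmse(experimental, predicted))
```

A model that always predicts the mean of the experimental values obtains R2 = 0, and a worse model obtains a negative R2, which is why the lower bound of R2 is −∞.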
4 Modeling process
The modeling process in developing and validating the XGB model for predicting the viscoelastic characteristics of GO-modified asphalt at medium and high temperatures is as follows.
1) Data collection and preprocessing
Two data sets were collected from previously published experiments, consisting of 357 samples for G* and 339 samples for φ. These data sets were processed and cleaned to ensure data quality and consistency.
2) Feature selection
Nine input features representing the asphalt binder components were selected for the XGB model. These inputs were chosen based on their relevance and importance in determining the viscoelastic properties of asphalt binders.
3) Data splitting
The data sets were split into training and testing sets using a 70–30 ratio. The training set was used to develop the XGB model, while the testing set was reserved for evaluating the model’s performance.
4) Model development
The XGB model was developed using the training set, with hyperparameters tuned to optimize the model’s performance. As mentioned earlier, 5-fold CV technique was applied to ensure the model’s robustness and generalizability.
5) Model evaluation
The XGB model’s performance was assessed using the testing set by calculating R2 and RMSE for both G* and φ predictions. These metrics were used to determine the accuracy and reliability of the model.
6) Model validation and comparison
The XGB model’s performance was also validated using experimental results from the literature. Additionally, the model was compared to five other ML models (artificial neural network (ANN), LGB, LR, DT, and RF) to demonstrate its superiority in predicting the viscoelastic properties of GO-modified asphalt.
7) Feature importance analysis
SHAP values were used to analyze the influence of each input on asphalt’s physical properties (G* and φ) and to identify correlations between the most important variables for both data sets.
5 Results and discussion
5.1 Hyperparameter selection
As mentioned above, ML algorithms are widely used in many fields and applications. However, building an accurate ML model is a complex and time-consuming process that involves determining the appropriate algorithm as well as its parameters. The hyperparameters of an ML model therefore need to be tuned before the learning process. This is an important step to reduce overfitting and increase the predictive capability and adaptation to new data, thereby improving the model’s performance [58]. In this study, a trial and error process is performed to optimize the hyperparameters of the XGB model. Four hyperparameters, namely max depth, n_estimators (denoted estimators), min child weight (denoted M.C.W), and learning rate, are shown to affect the performance of the XGB model during optimization [59]. A grid search method is used to find the best values of these parameters, and the corresponding search domain is detailed in Tab.4. The remaining hyperparameters of the XGB model are kept at their default values in Python Scikit Learn to simplify the search process. In addition, R2 is taken as the main criterion for quantifying prediction quality, so that appropriate hyperparameters with high accuracy and the least deviation are identified. The CV technique with K = 5 is used to assess the ML model.
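The grid search described above can be sketched as an exhaustive loop over all hyperparameter combinations. In the plain-Python illustration below, the search domain is hypothetical (the actual domain is given in Tab.4, which is not reproduced here), and `toy_cv_score` is a made-up stand-in for the mean 5-fold validation R2 of a trained model.

```python
from itertools import product

# Hypothetical search domain; the values used in the study are listed in Tab.4
search_domain = {
    "max_depth": [2, 4, 6, 8],
    "n_estimators": [100, 300, 500],
    "min_child_weight": [1, 3, 5],
    "learning_rate": [0.05, 0.2, 0.4],
}


def grid_search(domain, cv_score):
    """Evaluate every combination in `domain` and keep the one with the highest score."""
    names = list(domain)
    best_params, best_score = None, float("-inf")
    for values in product(*(domain[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_score(params):
    # Made-up surrogate for the 5-fold validation R2 (not a real model)
    return (1.0
            - 0.01 * abs(params["max_depth"] - 6)
            - 0.1 * abs(params["learning_rate"] - 0.05)
            - 0.005 * (params["min_child_weight"] - 1))


best_params, best_score = grid_search(search_domain, toy_cv_score)
print(best_params, best_score)
```

In a real run, `cv_score` would train the regressor with the given hyperparameters on each of the K training folds and return the average validation R2.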
Fig.2 and Fig.3 show the validation scores, reflecting the influence of four hyperparameters on the model’s prediction performance for the G* and φ data sets, respectively. It is worth noticing that the best XGB model is chosen where R2 reaches its highest value. For the G* data set, the XGB model has the best performance (R2 = 0.953) when M.C.W = 1, max depths are 6 and 8, the learning rates are 0.05 and 0.2, and estimators vary from 300 to 500. Similarly, for the φ data set, when M.C.W = 1, the learning rates are 0.05, 0.2, and 0.4, max depths are 3 and 4, and estimators vary from 600 to 800, the model achieves its highest performance with R2 = 0.983.
In addition, it can be seen that the max depth is an important parameter that significantly affects the predictive accuracy of the XGB model. For both data sets, low values of max depth (i.e., 1, 2, and 3) yield lower validation scores. It is also important to note that increasing the max depth increases the complexity of the tree model. Similarly, increasing the learning rate (to values of 0.3, 0.4, and 0.5) tends to decrease the predictive accuracy. The number of estimators and M.C.W appear to be the least influential variables for the model’s accuracy.
5.2 Predictive performance of extreme gradient boosting models
In general, an ML model may achieve a high validation score, but its performance can change significantly once it is evaluated on the testing data set. This change depends strongly on which samples are assigned to each data set, and it might reduce the applicability of the constructed model when dealing with new data. It is important to note that the validation score indicates the model’s robustness, whereas the accuracy evaluated on the testing data set shows how effective the model is when predicting unseen data. To ensure the generalizability of the model and achieve the highest predictive accuracy, the five XGB models with the highest validation scores are selected for further assessment, denoted as XGB1 to XGB5. The testing phase is used to assess these models by comparing their respective prediction capacities.
In this section, the RMSE and R2 values calculated during the testing part of the five models are used for comparison purposes on the G* data set and φ data set (Fig.4). For illustration purposes, the validation scores of all models are also plotted. It can be seen that all XGB models have high predictive performance and stability. There is no significant difference in predictive performance between these models in terms of evaluating the testing parts. Details of the RMSE and R2 values, along with the hyperparameters of the models, are presented in Tab.5. In general, the XGB3 model provides slightly higher performance than the remaining models. Compared with the other models, the XGB3 model has the highest R2 in validation and testing, with R2 = 0.953, 0.990 (for G* data set), and 0.983, 0.990 (for φ data set). Moreover, the RMSE values also achieve the smallest value in the testing part, with 31.499 and 1.080, respectively, for the G* and φ data sets. Based on these findings, the model XGB3 is the best model for both data sets, and the typical prediction results are presented in the next section.
5.3 Representative prediction results
In this section, the representative prediction results of the XGB3 model (referred to as XGB for simplicity) for the two data sets are shown to highlight its performance. Regression graphs comparing the actual and predicted values for the G* and φ data sets are shown in Fig.5. In each figure, regression lines are also plotted to display the correlation between the predicted and experimental values. The Pearson correlation coefficients (r) are 1 and 0.995 for the training and testing parts, respectively. The close correlation between the predicted and experimental values confirms the strong accuracy of the finely tuned XGB model.
In addition, the error values are also shown to highlight the accuracy of the XGB model (Fig.6). In general, the error values for the training part are small. For the G* data set, most errors are in the range from –0.02 to 0.02 kPa, with only 11 samples exhibiting errors outside this range, none exceeding ±0.03 kPa. For the φ data set, the majority of the samples have very small errors centered around zero. Only a few samples have errors outside the range from –0.2° to 0.2° (i.e., 4 out of 273 samples). For the testing part, the error range for both data sets is wider than for the training part. Specifically, for the G* data set, 9 samples have error values outside the range from –50 to 50 kPa, whereas the remaining 98 samples have errors within this range, mostly concentrated around 0. For the φ data set, most errors are in the range from –2° to 2°, with only 8 samples having a larger error. Based on the cumulative distribution for the testing phase, 80% of the samples in the G* data set have an error in the range from –25 to 25 kPa, and 80% of the samples in the φ data set have an error in the range from –1° to 1°. Thus, these error values show that the XGB model can accurately predict the values of G* and φ of modified asphalt at high temperatures.
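The share of samples whose error falls within a given band, as read from the cumulative distribution above, can be computed as in the following plain-Python sketch. The helper name `fraction_within` and the sample values are hypothetical and are not taken from the testing data.

```python
def fraction_within(experimental, predicted, tolerance):
    """Fraction of samples whose prediction error lies within +/- tolerance."""
    errors = [p - e for e, p in zip(experimental, predicted)]
    inside = sum(1 for err in errors if abs(err) <= tolerance)
    return inside / len(errors)


# Hypothetical G* values (kPa): errors are 10, -10, 40, 5, and 0 kPa
experimental = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 340.0, 405.0, 500.0]
share = fraction_within(experimental, predicted, tolerance=25.0)
print(share)
```

For these illustrative values, four of the five errors lie within ±25 kPa.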
5.4 Performance comparison with relevant machine learning methods
To confirm the performance of the XGB model, this section compares it with other relevant ML models. Five ML models are selected based on their relevance: ANN, LGB, LR, DT, and RF. The hyperparameters of these models are selected prior to the comparison, with most of the models using the default values from the Python package. Two statistical measures (R2 and RMSE) are used for the comparison. The results for the training and testing data sets are shown for the G* and φ data sets (Fig.7). For the G* data set, the XGB model performs well on both training and testing, with almost perfect accuracy. The accuracy of the DT and RF models is also very high, but slightly lower than that of the XGB model. Meanwhile, the LGB model has the lowest accuracy for both criteria. For the φ data set, the XGB model again has the best performance, while RF is the worst predictor, with the smallest R2 and the highest RMSE for both the training and testing data sets. The details of the comparison for the six models are shown in Tab.6. Overall, the XGB model outperforms all the other ML models selected in this study.
It should be noted that, in addition to the comparison with other tree-based models (LGB, DT, and RF), which share a closer methodological background with XGB, the inclusion of ANN and LR places the performance of the XGB model in a broader context. LR is a simple statistical model that has been widely used for predicting relationships between inputs and outputs. It serves as a baseline that shows how the XGB model performs against a basic linear approach, which helps to emphasize the advantages of more complex tree-based models. ANN, in contrast, is a popular ML technique that can handle complex, nonlinear relationships. Although ANN is not a tree-based model, comparing the XGB model against it helps to demonstrate the effectiveness and generalization capability of the XGB model in handling the nonlinearities and complexities of the problem at hand. Overall, the comparison with a diverse set of models demonstrates the versatility and robustness of the XGB model, validates the choice of XGB within the context of this study, and provides a broader understanding of its performance in predicting the viscoelastic characteristics of GO-modified asphalt.
5.5 Performance comparison with experimental results
To highlight the performance of the XGB model, Fig.8 compares the model’s predictions with experimental results from the two data sets. In each figure, the blue lines represent the predicted results of the XGB model, while the blue circles represent experimental data taken from various references [1,26–28,53]. To increase the generalizability of the comparison, test samples are selected from various sources and differ in original asphalt, GO type, GO content, and aging type. Twelve comparisons are conducted in total, six for each data set. The results show that the XGB model captures the characteristics of the GO-modified asphalt well and thus constructs a good relationship between the inputs and outputs of the two data sets. Only a few samples of the φ data set deviate slightly from the model’s predictions, as can be seen in Fig.8(f), Fig.8(g), and Fig.8(l). Overall, the XGB model is shown to be valid and robust in predicting the G* and φ of GO-modified asphalt.
5.6 Sensitivity analysis
This section uses the best XGB model for predicting the G* and φ of GO-modified asphalt to conduct a sensitivity analysis. SHAP value analysis is performed to evaluate the importance of the input parameters for the output of the G* data set (Fig.9(a)) and the φ data set (Fig.9(b)). Note that the vertical axis ranks the input parameters from most important (top) to least important (bottom). The horizontal axis shows the SHAP value, i.e., the contribution of the input to the model output. The color of each point represents the magnitude of the input value, ranging from pink (high values) to blue (low values). Each dot represents a single data point in the data set, with its position along the x-axis representing its effect on the output. The dots are stacked to show the density when multiple dots fall on the same position along the x-axis. A higher SHAP value indicates a more positive influence of that parameter on the output, and vice versa.
The initial characteristics of the asphalt, GO content, and mixing temperature are the three parameters that most significantly affect G* and φ, of which the initial properties of the asphalt (initial G* and initial φ) are the most critical. The initial property positively affects the corresponding output: the output achieves higher values when the initial property increases, and vice versa. Since both data sets are collected from different experiments covering various types of asphalt, these findings are important and representative. Therefore, to accurately predict the G* and φ of GO-modified asphalt, the original asphalt’s physical properties should be present in the input space. The findings are consistent with Hosseini et al.’s study [60] on the use of additives such as CR, polyphosphoric acid, and SBS for asphalt modification. Meanwhile, if the GO content is increased, G* increases and φ decreases. At high temperatures, the asphalt viscosity decreases and the flowability increases, which is the primary factor responsible for the rutting of asphalt pavement. Therefore, a high value of G* means better resistance to rutting in high-temperature environments, while low values of φ indicate better deformation resilience of asphalt [27,29,53]. Thus, increasing GO content improves the high-temperature performance of GO-modified asphalt. The color distribution of the dots along the horizontal axis for the mixing temperature shows that it is a complex parameter: when the value of this variable changes, the SHAP value changes nonlinearly.
RPM and aging type, in descending order, are the next most influential parameters for G*, while the remaining parameters of this data set have a negligible influence on the output. In fact, when the values of those other variables change, the SHAP values fluctuate insignificantly around 0. The φ data set behaves similarly: aging type, diameter, and layer are parameters with decreasing influence on φ, while RPM, GO thickness, and mixing time have a minimal influence. Thus, changing the mixing time within the range of the current data set has almost no effect on the G* and φ values of the GO-modified asphalt. This conclusion can help material engineers design the mixing process when introducing GO into asphalt to improve the viscoelastic properties of GO-modified asphalt.
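The per-feature attributions discussed above can be illustrated with a small, self-contained sketch. It computes exact Shapley values by enumerating feature coalitions, replacing absent features with a baseline (here the feature means), and then ranks features by mean absolute Shapley value. This is not the tree-specific algorithm applied to the XGB model in this study. The toy model, its coefficients, and the sample values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model: the third input has almost no effect on the output.
def toy_model(z):
    initial_property, go_content, mixing_time = z
    return 5.0 * initial_property + 2.0 * go_content + 0.01 * mixing_time

samples = [[1.0, 0.5, 30.0], [2.0, 1.0, 60.0], [3.0, 3.0, 45.0]]
baseline = [sum(col) / len(col) for col in zip(*samples)]  # feature means

all_phi = [shapley_values(toy_model, x, baseline) for x in samples]

# Global importance: mean absolute Shapley value per feature
importance = [sum(abs(phi[i]) for phi in all_phi) / len(all_phi)
              for i in range(3)]
print(importance)
```

For each sample, the Shapley values sum to the difference between the model output and the output at the baseline. This additivity is what allows the contributions of individual inputs to be compared on a common axis.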
With the recommended XGB model, the importance of each feature is evaluated, and the influence of each input parameter on the output is clearly shown on the SHAP value graph. However, this graph only considers the effect of each single variable on the output and does not consider the influence of pairs of variables. Analyzing the coupling effect of variables on the outputs could be beneficial to the research community. Therefore, this study develops SHAP dependence and interaction graphs to highlight the correlation between the critical input parameters and the two outputs.
In this section, a 2D dependency graph analysis is used to evaluate the relationship between pairs of important variables and their influence on G* (Fig.10) and φ (Fig.11). The three important input parameters considered are the initial physical property, GO content, and mixing temperature. On these graphs, the horizontal axis shows the value of the main input variable, while the SHAP value of that variable for the corresponding output is displayed on the vertical axis. The color bar on the right represents the value of the second input variable, ranging from pink (high values) to blue (low values).
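The structure of such a dependency graph can be sketched as follows, assuming a hypothetical two-input model with an interaction term and a zero baseline. For two inputs, the exact Shapley value of the first input is the average of its marginal contribution over both orderings. Each resulting triple corresponds to one dot: horizontal position, vertical position, and color. The model and values are illustrative only.

```python
def shap_first_input(model, x1, x2, b1, b2):
    """Exact Shapley value of the first input of a two-input model.

    Averages the marginal contribution of x1 over both orderings,
    with absent inputs replaced by the baseline (b1, b2).
    """
    enters_first = model(x1, b2) - model(b1, b2)
    enters_second = model(x1, x2) - model(b1, x2)
    return 0.5 * (enters_first + enters_second)

# Hypothetical model in which the two inputs interact
def toy_model(initial_property, go_content):
    return initial_property + 0.5 * initial_property * go_content

baseline = (0.0, 0.0)
points = []
for x1 in (1.0, 2.0):
    for x2 in (0.0, 3.0):
        phi1 = shap_first_input(toy_model, x1, x2, *baseline)
        # (horizontal axis, vertical axis, color value)
        points.append((x1, phi1, x2))

for point in points:
    print(point)
```

At a fixed value of the main input, the vertical spread between dots of different color reflects the interaction with the second input. Without the interaction term, all dots at the same horizontal position would coincide.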
The dependence and interaction between initial G* and GO content (Fig.10(a)) and mixing temperature (Fig.10(b)) show that the relationship between the initial G* and the G* of GO-modified asphalt is almost linear. G* reaches its maximum value when the initial G* is ~2000 kPa, the GO content is at its maximum of 3%, and the mixing temperature is between 120 and 150 °C. In general, an increase in GO content positively affects G*, except for asphalt whose initial G* varies between 750 and 1100 kPa. This conclusion is consistent with the studies of Zeng et al. [27] and Wang et al. [53], which also show that G* increases gradually with increasing GO content. However, the arrangement of data points representing the correlation between GO content and mixing temperature (Fig.10(c)) has only a slight slope, indicating that the influence of GO content on G* is insignificant. In some cases, once the GO content exceeds a specific value, the improvement of high-temperature performance is not obvious, especially for monolayer GO with less than 0.5% content. This may be due to the agglomeration of nano-sized materials affecting the high-temperature performance of the asphalt [61].
Considering the φ data set, the dependency graphs for the initial φ with GO content (Fig.11(a)) and mixing temperature (Fig.11(b)) demonstrate that φ increases as its initial value increases, up to a limiting value within its data range. In other words, this is a linear relationship with a very steep slope. The φ reaches its maximum value at a GO content of 3% and a mixing temperature of about 150 °C. This finding has been confirmed by the experimental results of Zeng et al. [27]. It is worth noting that, for each initial φ value, changing the GO content or mixing temperature does not significantly change φ. This is more evident in the dependency graph between GO content and mixing temperature, as φ decreases when GO content increases and reaches its maximum value at 0.5% GO content (Fig.11(c)).
Overall, sensitivity analysis can be useful for material engineers and provides important information before the experimental phase. Calculations can be conducted before performing time-consuming and costly tests. Furthermore, it is important to consider the most critical input parameters discovered in the sensitivity analysis for accurate predictions of G* and φ of GO-modified asphalt. With the help of SHAP analysis, it is possible to simplify later tests by neglecting factors that have little influence on the results.
6 Limitations of this study
Despite the promising results obtained with the XGB model for predicting the viscoelastic characteristics of GO-modified asphalt at medium and high temperatures, there are some limitations to this study that should be pointed out.
1) Limited temperature range. The study focuses on medium to high temperatures, which may not capture the behavior of GO-modified asphalt at low temperatures. Including low-temperature performance in the analysis could provide a more comprehensive understanding of the material’s behavior under a wider range of temperature conditions.
2) Data set size. Although the study uses data sets with 357 samples for G* and 339 samples for φ, increasing the size and diversity of the data set could improve the model’s generalizability and performance. Expanding the data set to include additional asphalt binder compositions, testing conditions, and sources could lead to more robust predictions.
3) Model comparison. While the study compares the XGB model with five other ML models (ANN, LGB, LR, DT, and RF), there might be other relevant models or techniques not considered in this comparison. A more extensive comparison with a broader range of models could provide a better understanding of the XGB model’s performance relative to other approaches.
4) Hyperparameter tuning. Another limitation is the use of grid search for hyperparameter tuning, which can be computationally expensive and may not always find the optimal combination of hyperparameters. In future work, metaheuristic algorithms could be employed to more efficiently explore the hyperparameter space and potentially improve the performance of the XGB model.
By acknowledging these limitations, future research can build upon the findings of this study, address the identified gaps, and further improve the understanding and prediction of the viscoelastic characteristics of GO-modified asphalt.
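To make the cost noted in limitation 4 concrete, the following minimal sketch runs an exhaustive grid search in which every hyperparameter combination is scored by 5-fold cross-validation. A simple nearest-neighbour regressor stands in for XGB so that the sketch needs only the standard library. The data, the hyperparameter names (`k`, `weighted`), and the grid values are hypothetical and are not those used in this study.

```python
import itertools
import math
import random

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train, x, k, weighted):
    """Stand-in regressor: (optionally distance-weighted) mean of k neighbours."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    if weighted:
        weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def cv_rmse(data, k, weighted, n_folds=5):
    """RMSE pooled over all held-out samples of an n_folds cross-validation."""
    squared_error = 0.0
    for held_out in k_fold_indices(len(data), n_folds):
        held = set(held_out)
        train = [pair for i, pair in enumerate(data) if i not in held]
        for i in held_out:
            x, y = data[i]
            squared_error += (knn_predict(train, x, k, weighted) - y) ** 2
    return math.sqrt(squared_error / len(data))

def grid_search(data, grid):
    """Evaluate every hyperparameter combination; return (best RMSE, params)."""
    names = list(grid)
    scored = []
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        scored.append((cv_rmse(data, **params), params))
    return min(scored, key=lambda item: item[0])

# Synthetic data (not from the paper): noisy samples of a smooth curve
rng = random.Random(42)
data = [(x / 10.0, math.sin(x / 10.0) + rng.gauss(0.0, 0.1)) for x in range(60)]
grid = {"k": [1, 3, 5, 9], "weighted": [False, True]}

best_rmse, best_params = grid_search(data, grid)
print(best_params, round(best_rmse, 4))
```

The number of model fits equals the product of the grid sizes times the number of folds (here 4 × 2 × 5 = 40), which grows quickly as hyperparameters are added. This is the motivation for the more efficient search strategies suggested above.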
7 Conclusions
This study proposes the XGB model to predict G* and φ of GO-modified asphalt and to construct a correlation between input parameters and the corresponding outputs. Two databases encompassing 357 and 339 experimental results are used to develop the XGB model. To achieve reliable simulation results and find an appropriate model with robust predictive ability, the study applies the CV technique with 5 folds on the training data set.
The research results show that the XGB model accurately predicts G* and φ of GO-modified asphalt, with a testing-set performance of R2 = 0.990 and RMSE = 31.499 for the G* data set, and R2 = 0.990 and RMSE = 1.080 for the φ data set. The model also demonstrates validity in predicting the outputs when compared with experimental samples from the literature. In addition, the model has superior predictive performance compared to five other ML models: ANN, LGB, LR, DT, and RF. Finally, SHAP value analysis is conducted to construct correlations between the three most important variables for the two data sets. The results show that the initial G* and initial φ are the most important parameters of their respective data sets.
This work contributes to a better understanding of the behavior of GO-modified asphalt binders and provides a valuable tool for engineers and researchers working on asphalt mixture design and performance evaluation.
References
[1]
Hu K, Yu C, Yang Q, Chen Y, Chen G, Ma R. Multi-scale enhancement mechanisms of graphene oxide on styrene-butadiene-styrene modified asphalt: An exploration from molecular dynamics simulations. Materials and Design, 2021, 208: 109901
[2]
Anurag K, Xiao F, Amirkhanian S N. Laboratory investigation of indirect tensile strength using roofing polyester waste fibers in hot mix asphalt. Construction and Building Materials, 2009, 23(5): 2035–2040
[3]
Putman B J, Amirkhanian S N. Utilization of waste fibers in stone matrix asphalt mixtures. Resources, Conservation and Recycling, 2004, 42(3): 265–274
[4]
Sengoz B, Isikyakar G. Analysis of styrene-butadiene-styrene polymer modified bitumen using fluorescent microscopy and conventional test methods. Journal of Hazardous Materials, 2008, 150(2): 424–432
[5]
Shen J, Amirkhanian S, Xiao F, Tang B. Influence of surface area and size of crumb rubber on high temperature properties of crumb rubber modified binders. Construction and Building Materials, 2009, 23(1): 304–310
[6]
Xiao F, Amirkhanian S N, Shen J, Putman B. Influences of crumb rubber size and type on reclaimed asphalt pavement (RAP) mixtures. Construction and Building Materials, 2009, 23(2): 1028–1034
[7]
Cao W. Study on properties of recycled tire rubber modified asphalt mixtures using dry process. Construction and Building Materials, 2007, 21(5): 1011–1015
[8]
Sun Y, Luo Y, Jia D. Preparation and properties of natural rubber nanocomposites with solid-state organomodified montmorillonite. Journal of Applied Polymer Science, 2008, 107(5): 2786–2792
[9]
Zhang H, Wang Y, Wu Y, Zhang L, Yang J. Study on flammability of montmorillonite/styrene-butadiene rubber (SBR) nanocomposites. Journal of Applied Polymer Science, 2005, 97(3): 844–849
[10]
Zhang B, Xi M, Zhang D, Zhang H, Zhang B. The effect of styrene-butadiene-rubber/montmorillonite modification on the characteristics and properties of asphalt. Construction and Building Materials, 2009, 23(10): 3112–3117
[11]
Yildirim Y. Polymer modified asphalt binders. Construction and Building Materials, 2007, 21(1): 66–72
[12]
Xiao F, Amirkhanian A N, Amirkhanian S N. Long-term ageing influence on rheological characteristics of asphalt binders containing carbon nanoparticles. International Journal of Pavement Engineering, 2011, 12(6): 533–541
[13]
Amirkhanian A N, Xiao F, Amirkhanian S N. Characterization of unaged asphalt binder modified with carbon nano particles. International Journal of Pavement Research and Technology, 2011, 4(5): 281–286
[14]
Goli A, Ziari H, Amini A. Influence of carbon nanotubes on performance properties and storage stability of SBS modified asphalt binders. Journal of Materials in Civil Engineering, 2017, 29(8): 04017070
[15]
Abdullah M E, Zamhari K A, Hainin M R, Oluwasola E A, Hassan N A, Yusoff N I M. Engineering properties of asphalt binders containing nanoclay and chemical warm-mix asphalt additives. Construction and Building Materials, 2016, 112: 232–240
[16]
de Melo J V S, Trichês G. Evaluation of properties and fatigue life estimation of asphalt mixture modified by organophilic nanoclay. Construction and Building Materials, 2017, 140: 364–373
[17]
El-Shafie M, Ibrahim I M, Abd El Rahman A M M. The addition effects of macro and nano clay on the performance of asphalt binder. Egyptian Journal of Petroleum, 2012, 21(2): 149–154
[18]
You Z, Mills-Beale J, Foley J M, Roy S, Odegard G M, Dai Q, Goh S W. Nanoclay-modified asphalt materials: Preparation and characterization. Construction and Building Materials, 2011, 25(2): 1072–1078
[19]
Khattak M J, Khattab A, Rizvi H R, Zhang P. The impact of carbon nano-fiber modification on asphalt binder rheology. Construction and Building Materials, 2012, 30: 257–264
[20]
Khattak M J, Khattab A, Rizvi H R. Mechanistic characteristics of asphalt binder and asphalt matrix modified with nano-fibers. Geo-Frontiers 2011: Advances in Geotechnical Engineering, 2011: 4812–4822
[21]
Arabani M, Faramarzi M. Characterization of CNTs-modified HMA’s mechanical properties. Construction and Building Materials, 2015, 83: 207–215
[22]
Zhou H Y, Zhang L. Study on physical and rheological properties of graphene oxide compounded SBS modified asphalt. Journal of Highway and Transportation Research and Development, 2021, 38(1): 10–18 (in Chinese)
Liu K, Zhu J, Zhang K, Wu J, Yin J, Shi X. Effects of mixing sequence on mechanical properties of graphene oxide and warm mix additive composite modified asphalt binder. Construction and Building Materials, 2019, 217: 301–309
[25]
Zhu J, Zhang K, Liu K, Shi X. Adhesion characteristics of graphene oxide modified asphalt unveiled by surface free energy and AFM-scanned micro-morphology. Construction and Building Materials, 2020, 244: 118404
[26]
Duan S, Li J, Muhammad Y, Su Z, Meng F, Yang H, Yao X. Synthesis and evaluation of high-temperature properties of butylated graphene oxide composite incorporated SBS (C4H9-GO/SBS)-modified asphalt. Journal of Applied Polymer Science, 2019, 136(46): 48231
[27]
Zeng W, Wu S, Pang L, Sun Y, Chen Z. The utilization of graphene oxide in traditional construction materials: Asphalt. Materials, 2017, 10(1): 48
[28]
Li Y, Wu S, Amirkhanian S. Investigation of the graphene oxide and asphalt interaction and its effect on asphalt pavement performance. Construction and Building Materials, 2018, 165: 572–584
[29]
Wu S, Zhao Z, Li Y, Pang L, Amirkhanian S, Riara M. Evaluation of aging resistance of graphene oxide modified asphalt. Applied Sciences, 2017, 7(7): 702
[30]
Nguyen H Q, Ly H B, Tran V Q, Nguyen T A, Le T T, Pham B T. Optimization of artificial intelligence system by evolutionary algorithm for prediction of axial capacity of rectangular concrete filled steel tubes under compression. Materials, 2020, 13(5): 1205
[31]
Nguyen T A, Ly H B. Prediction of critical elastic buckling load of cellular H-section beams using support vector machine. Transport and Communications Science Journal, 2020, 71(5): 500–513 (in Vietnamese)
[32]
Ly H B, Nguyen T A. Artificial neural network based modeling of the axial capacity of rectangular concrete filled steel tubes. Transport and Communications Science Journal, 2020, 71(2): 154–166 (in Vietnamese)
[33]
Baldo N, Manthos E, Miani M. Stiffness modulus and marshall parameters of hot mix asphalts: Laboratory data modeling by artificial neural networks characterized by cross-validation. Applied Sciences, 2019, 9(17): 3502
[34]
Daneshvar D, Behnood A. Estimation of the dynamic modulus of asphalt concretes using random forests algorithm. International Journal of Pavement Engineering, 2022, 23(2): 250–260
[35]
Behnood A, Golafshani E M. Predicting the dynamic modulus of asphalt mixture using machine learning techniques: An application of multi biogeography-based programming. Construction and Building Materials, 2021, 266: 120983
[36]
Barugahare J, Amirkhanian A N, Xiao F, Amirkhanian S N. Predicting the dynamic modulus of hot mix asphalt mixtures using bagged trees ensemble. Construction and Building Materials, 2020, 260: 120468
[37]
Behnood A, Daneshvar D. A machine learning study of the dynamic modulus of asphalt concretes: An application of M5P model tree algorithm. Construction and Building Materials, 2020, 262: 120544
[38]
Gong H, Sun Y, Dong Y, Han B, Polaczyk P, Hu W, Huang B. Improved estimation of dynamic modulus for hot mix asphalt using deep learning. Construction and Building Materials, 2020, 263: 119912
[39]
Hussain F, Ali Y, Irfan M, Ashraf M, Ahmed S. A data-driven model for phase angle behaviour of asphalt concrete mixtures based on convolutional neural network. Construction and Building Materials, 2021, 269: 121235
[40]
Majidifard H, Jahangiri B, Rath P, Contreras L U, Buttlar W G, Alavi A H. Developing a prediction model for rutting depth of asphalt mixtures using gene expression programming. Construction and Building Materials, 2021, 267: 120543
[41]
Gong H, Sun Y, Shu X, Huang B. Use of random forests regression for predicting IRI of asphalt pavements. Construction and Building Materials, 2018, 189: 890–897
[42]
Abdelaziz N, Abd El-Hakim R T, El-Badawy S M, Afify H A. International Roughness Index prediction model for flexible pavements. International Journal of Pavement Engineering, 2020, 21(1): 88–99
[43]
Xiao F, Amirkhanian S, Juang C H. Prediction of fatigue life of rubberized asphalt concrete mixtures containing reclaimed asphalt pavement using artificial neural networks. Journal of Materials in Civil Engineering, 2009, 21(6): 253–261
[44]
Golzar K, Jalali-Arani A, Nematollahi M. Statistical investigation on physical–mechanical properties of base and polymer modified bitumen using artificial neural network. Construction and Building Materials, 2012, 37: 822–831
[45]
Specht L, Khatchatourian O. Application of artificial intelligence to modelling asphalt-rubber viscosity. International Journal of Pavement Engineering, 2014, 15(9): 799–809
[46]
Momeni E, He B, Abdi Y, Armaghani D J. Novel hybrid XGBoost model to forecast soil shear strength based on some soil index tests. Computer Modeling in Engineering & Sciences, 2023, 136(3): 2527–2550
[47]
Zhou J, Qiu Y, Zhu S, Armaghani D J, Khandelwal M, Mohamad E T. Estimation of the TBM advance rate under hard rock conditions using XGBoost and Bayesian optimization. Underground Space, 2021, 6(5): 506–515
[48]
Yari M, Armaghani D J, Maraveas C, Ejlali A N, Mohamad E T, Asteris P G. Several tree-based solutions for predicting flyrock distance due to mine blasting. Applied Sciences, 2023, 13(3): 1345
[49]
Li D, Liu Z, Armaghani D J, Xiao P, Zhou J. Novel ensemble tree solution for rockburst prediction using deep forest. Mathematics, 2022, 10(5): 787
[50]
Asteris P G, Rizal F I M, Koopialipoor M, Roussis P C, Ferentinou M, Armaghani D J, Gordan B. Slope stability classification under seismic conditions using several tree-based intelligent techniques. Applied Sciences, 2022, 12(3): 1753
[51]
Huat C Y, Moosavi S M H, Mohammed A S, Armaghani D J, Ulrikh D V, Monjezi M, Hin Lai S. Factors influencing pile friction bearing capacity: Proposing a novel procedure based on gradient boosted tree technique. Sustainability, 2021, 13(21): 11862
[52]
Mukaka M M. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Medical Journal: The Journal of Medical Association of Malawi, 2012, 24(3): 69–71
[53]
Wang R, Yue J, Li R, Sun Y. Evaluation of aging resistance of asphalt binder modified with graphene oxide and carbon nanotubes. Journal of Materials in Civil Engineering, 2019, 31(11): 04019274
[54]
Habib N Z, Aun N C, Zoorob S E, Lee P I. Use of graphene oxide as a bitumen modifier: An innovative process optimization study. Advanced Materials Research, 2015, 1105: 365–369
[55]
Chen T Q, Guestrin C. Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, 2016, 785–794
[56]
Baturynska I, Martinsen K. Prediction of geometry deviations in additive manufactured parts: Comparison of linear regression with machine learning algorithms. Journal of Intelligent Manufacturing, 2021, 32(1): 179–200
[57]
Lundberg S M, Lee S I. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach: NeurIPS, 2017
[58]
Yang L, Shami A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 2020, 415: 295–316
[59]
Zhang W, Wu C, Zhong H, Li Y, Wang L. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geoscience Frontiers, 2021, 12(1): 469–477
[60]
Hosseini A S, Hajikarimi P, Gandomi M, Nejad F M, Gandomi A H. Genetic programming to formulate viscoelastic behavior of modified asphalt binder. Construction and Building Materials, 2021, 286: 122954
[61]
Liu K, Zhang K, Shi X. Performance evaluation and modification mechanism analysis of asphalt binders modified by graphene oxide. Construction and Building Materials, 2018, 163: 880–889
RIGHTS & PERMISSIONS
Higher Education Press