1. Department of Civil Engineering, Bogazici University, Istanbul 34342, Turkey
2. School of Business, Quantitative Methods Division, Istanbul University, Istanbul 34320, Turkey
ozer.cinicioglu@boun.edu.tr
Received: 2018-10-18
Accepted: 2019-02-25
Published: 2020-02-15
Issue Date / Revised Date: 2019-10-22
Abstract
The purpose of this study is the accurate prediction of undrained shear strength using Standard Penetration Test (SPT) results and soil consistency indices, such as water content and Atterberg limits. In this study, along with the conventional methods of simple and multiple linear regression, three machine learning models (random forest, gradient boosting, and a stacked model) are developed for the prediction of undrained shear strength. These models are applied to a relatively large data set of 230 observations from different projects around Turkey. As an improvement over the available studies in the literature, this study applies correct statistical analysis techniques to a relatively large database. For example, it uses a train/test split of the data set to guard against overfitting of the developed models. Furthermore, the validity and consistency of the prediction results are ensured through the correct use of statistical tools, such as p-values and cross-validation, which were missing from previous studies. All of the models, both those developed here and the prior ones in the literature, were applied to the test data set. Their performances were evaluated in terms of the resulting root mean squared error (RMSE) and coefficient of determination (R2). Accordingly, the models developed in this study demonstrate superior prediction capabilities compared with all of the prior studies. Moreover, to facilitate the use of machine learning algorithms for prediction purposes, the entire source code prepared for this study and the collected data set are provided as supplements to this study.
With ever-growing project sites, constrained budgets, and shorter project deadlines, correlations in geotechnical engineering, together with field and laboratory tests, have formed the basis of the design process. Therefore, it is invaluable to provide engineers with tools to obtain critical soil properties from the results of commonly used field tests. However, most conventional field testing methods do not allow direct measurement of the necessary design parameters. Relating the results of field tests to these parameters therefore requires the use of empirical correlations. One such design parameter is undrained shear strength.
Undrained shear strength is an essential parameter in geotechnical design when cohesive soils are considered. Stability calculations based on the undrained strength of soils are both more practical and safer when the problem involves rapid loading and the generation of positive excess pore water pressure. Undrained shear strength can be measured with laboratory tests (e.g., unconfined compression test, triaxial test). It can also be measured with the field vane shear test, which is the only field test that can directly measure undrained shear strength. However, laboratory tests require high-quality undisturbed samples, increasing the economic burden on the project, and it is generally impossible to conduct vane tests on most soil profiles. Therefore, a more practical and cost-effective alternative for determining undrained shear strength is to use statistical methods that explore the relationship between undrained shear strength and field test results. This alternative is attractive for design engineers, because reliable predictions of undrained shear strength can be made without the economic burden of laboratory testing. This has also been acknowledged by previous studies [1–3], which use SPT results to predict undrained shear strength. However, the statistical analyses in most of the available works in the literature are preliminary. They use only one independent variable to predict the dependent one, and they fail to report the number of observations used and the level of significance of the results obtained. A statistical analysis conducted in that manner naturally involves ambiguities, which may lead to underestimation or overestimation of undrained shear strength [2]. Additionally, even though many different equations are suggested in the literature, comparing the performance of the suggested models with that of their preceding alternatives is mostly neglected.
As detailed further in Section 3 for the prediction of undrained shear strength, this omission in turn leaves their results open to debate.
Additionally, the use of statistical analysis in geotechnical engineering is invaluable, since its application promises to reveal the underlying mechanisms of soil behavior. In this regard, there are remarkable contributions on how to treat the uncertainties in different domains of engineering using sensitivity and stochastic analyses [4–6]. On the other hand, the results of a statistical analysis may be considered valid and applicable estimates of the underlying relationship only if the analysis is conducted in a statistically correct manner. Otherwise, the resulting equation is nothing more than an invalid estimate that is bound to fail when a new data set emerges. Therefore, as discussed further in the text, both practitioners and researchers have a primary duty to use the necessary confirmatory statistical metrics when conducting a statistical analysis.
In this context, one of the intended contributions of this paper is a critical review of the existing literature on the use of SPT for predicting undrained shear strength. To that end, the prior studies that lack the correct statistical metrics for model development are identified first. Next, the available data set is randomly divided into training and testing sets. The training set, which corresponds to 80% of the whole data set, is then used to develop new simple and multiple linear regression models for the prediction of undrained shear strength in a statistically correct manner. The test set consists of the remaining 20% of the data, chosen at random and reserved for testing only. Both the models suggested in this work and the prior ones in the literature are evaluated on this test set. After that, two machine learning algorithms, random forest and gradient boosting, are proposed for the prediction of undrained shear strength. Finally, to improve the accuracy of the suggested models further, a stacked model is developed. It combines the multiple linear regression, random forest, and gradient boosting models. The superior performance of the resulting machine learning models suggests that they may be considered an attractive alternative to the conventional method of linear regression. The remainder of the paper is organized as follows.
- Section 2 introduces the data set used in this study, accompanied by a brief description of SPT. It reviews previous models for the prediction of undrained shear strength and explains the models used here for estimating undrained shear strength (linear regression, random forest, and gradient boosting).
- Section 3 compares the results of the models developed in this study with those of the models frequently used in practice.
- Section 4 summarizes the conclusions.
All of the analyses in this study are conducted using the caret (Classification And REgression Training) package in the 2017 version of R, a free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing [7].
Data set and method
In this study, the constructed models explore the relationship between the undrained shear strength of fine-grained soils and the SPT results, together with parameters describing the soil properties. These parameters include the water content (wn) and the Atterberg limits: liquid limit (LL), plastic limit (PL), and plasticity index (PI).
Although SPT is the most commonly used in situ test for site exploration, it cannot be used to measure any mechanical property directly. However, the general trend in SPT results is that better soil conditions correspond to higher blow counts. This indication of a possible positive correlation between SPT blow count and cu has led many researchers to estimate the undrained shear strength of cohesive soils from SPT results [1,8–11].
SPT typically involves driving a standard sampler into the ground with the energy delivered by a 63.5 kg hammer dropped from a height of 760 mm. The process is repeated until the sampler has penetrated 450 mm into the soil, and the number of hammer blows required to penetrate each 150 mm interval is recorded. The test is stopped if the number of blows required to penetrate any 150 mm interval exceeds 50, or if more than 100 blows are required for the entire 300 mm. The SPT-N value is the sum of the blows required to penetrate the final 300 mm. Borehole diameter, hammer configuration, and many other factors cause SPT hammer efficiencies to vary widely. Accordingly, following the suggestions of Bolton Seed et al. [12] and Skempton [13], it is now common practice to apply corrections to the raw SPT-N values to make them useful to the engineer. A hammer efficiency of 60% is now the standard level to which all N values are normalized (N60). Hence, this study follows the example of Hettiarachchi and Brown [14], and the correlations are based on N60. Another widely used correction of the SPT-N value is the overburden correction. Overburden correction is necessary for cohesionless soils, especially when the SPT results are used to predict relative density. Since this study deals with fine-grained soils, overburden correction is not applied.
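The blow-counting and stopping rules above, and the efficiency correction, can be sketched in Python. The correction uses the commonly cited form N60 = N · Em · CB · CS · CR / 0.60. Here Em is the measured hammer energy ratio as a fraction, and CB, CS, and CR are the borehole-diameter, sampler, and rod-length factors. The paper does not state which factors were applied, so all of them are treated as hypothetical user inputs, and the function names are illustrative.

```python
from typing import Optional, Sequence


def spt_n(increment_blows: Sequence[int]) -> Optional[int]:
    """Raw SPT-N from the blow counts of three successive 150-mm intervals.

    Returns None to represent "refusal", following the stopping rules in the text.
    """
    if len(increment_blows) != 3:
        raise ValueError("expected blow counts for three 150-mm intervals")
    # Stopping rule 1: more than 50 blows in any single 150-mm interval.
    if any(b > 50 for b in increment_blows):
        return None
    # N is the sum of the blows for the final 300 mm (second and third intervals);
    # the first interval is the seating drive and is not counted.
    n = increment_blows[1] + increment_blows[2]
    # Stopping rule 2: more than 100 blows over the 300 mm (kept for completeness;
    # it cannot trigger once rule 1 has passed, since 50 + 50 = 100).
    if n > 100:
        return None
    return n


def n60(n: int, hammer_efficiency: float,
        c_borehole: float = 1.0, c_sampler: float = 1.0, c_rod: float = 1.0) -> float:
    """Normalize raw N to a 60% energy ratio.

    hammer_efficiency: measured energy ratio as a fraction (e.g. 0.45).
    The C factors are hypothetical inputs; 1.0 means no correction.
    """
    return n * hammer_efficiency * c_borehole * c_sampler * c_rod / 0.60


if __name__ == "__main__":
    raw = spt_n([6, 9, 11])  # 9 + 11 = 20 blows for the final 300 mm
    print(raw, n60(raw, 0.45))  # a 45%-efficient hammer gives roughly 15 blows at 60%
```

A hammer that delivers less than 60% of the theoretical energy needs more blows for the same soil, so its raw N is scaled down to N60.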
The accuracy of a statistical analysis increases with the number of observations in the data set, and the validity of a suggested model improves if the observations are sampled from different populations. Accordingly, the data set collected for this study covers 230 observations from different projects around Turkey. The observed parameters include undrained shear strength (cu), SPT blow count corrected for hammer efficiency (N60), Atterberg limits (LL, PL, PI), and natural water content (wn). SPT blow counts cannot be used to estimate strength values exceeding 200 kPa, since SPT is not a suitable test for materials of such strength. These materials are referred to as intermediate geomaterials, representing a state between soil and rock. When intermediate geomaterials are encountered, the number of SPT blows exceeds the aforementioned limits before the target penetration depth is reached, and the test is terminated prematurely. For such tests, the number of blows is not recorded and the result is given as “refusal”. On the other hand, samples obtained from intermediate geomaterials can be tested in the laboratory to measure undrained shear strength. SPTs that resulted in “refusal” cannot be used in the analyses. To keep the interpretation of the data consistent, data points with undrained shear strength values greater than 200 kPa are therefore excluded from the data set. Thus, the final data set used in this study covers a total of 214 observations. Figure 1 provides scatter plots of each of the independent variables used in this study, water content (wn), LL, PL, and PI, versus cu, the target variable of prediction. Additionally, a statistical summary of the final data set is given in Table 1. The entire data set is given in Ref. [15].
For evaluation of the performance of the suggested models, all of the statistical analyses conducted in this research use an 80%–20% train-test split on the data set.
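The exclusion rule and the 80%–20% split can be sketched together in a few lines of Python. The record layout (a dict with a "cu" key in kPa) and the seed are hypothetical, and a plain random shuffle is assumed. The paper's exact splitting routine is not stated here. With 214 retained observations, taking the floor of 80% yields 171 training and 43 test observations, the counts reported later for the random forest model.

```python
import random


def filter_and_split(records, cu_limit_kpa=200.0, train_fraction=0.8, seed=2019):
    """Drop observations with cu above the limit, then make a random train/test split.

    records: list of dicts with at least a "cu" key (kPa); the key name is hypothetical.
    """
    kept = [r for r in records if r["cu"] <= cu_limit_kpa]  # exclude cu > 200 kPa
    shuffled = list(kept)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    n_train = int(train_fraction * len(shuffled))  # floor, e.g. 171 of 214
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    demo = [{"cu": 100.0} for _ in range(214)] + [{"cu": 250.0} for _ in range(16)]
    train, test = filter_and_split(demo)
    print(len(train), len(test))
```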
Most of the earlier research in the literature on estimating undrained shear strength from SPT results used either simple or multiple linear regression models for prediction. However, most of these works lack the statistical metrics needed to develop a valid regression model. Previous studies on the subject are given in Table 2, along with the equations proposed. Table 2 also gives the number of observations used in each study and the values of performance metrics such as R2 (coefficient of determination), p-value, and root mean squared error (RMSE). Where a study does not report a metric, this is indicated as not available (NA). Therefore, the following section briefly describes simple and multiple linear regression models. It also revisits the metrics they require for validation and performance evaluation: p-value, RMSE, and R2.
A discussion on the available linear regression models in literature
Linear regression modeling is one of the oldest and simplest forms of regression. In linear regression, a direct linear relationship between the predictor variables x and the response variable y is assumed, as given in Eq. (1):

$$\hat{y} = \beta_0 + \sum_{i=1}^{p} \beta_i x_i \qquad (1)$$

The terms in Eq. (1) are defined as follows:
- $\hat{y}$ is the prediction of the response variable y based on the values of the independent variables $x_i$.
- $\beta_i$ is the coefficient of the corresponding independent variable $x_i$.
- $\beta_0$ is the intercept.
- p is the number of independent variables in the model.
The resulting regression equation is verified through t-tests on each of its variables. The result of each significance test is reported as a p-value. The p-value gives the probability of obtaining, purely by chance, an estimate of the coefficient $\beta_i$ of the variable $x_i$ at least as extreme as the one observed, if its actual value were zero. Therefore, only the variables with p-values smaller than 0.05 are considered statistically significant and hence included in the resulting regression equation.
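The t-test procedure above can be sketched in Python with NumPy. This is an illustration, not the paper's R workflow. It fits Eq. (1) by ordinary least squares and computes each coefficient's t-statistic as the estimate divided by its standard error. As a simplifying assumption, it converts the t-statistic to a two-sided p-value with the standard normal distribution rather than the exact t distribution with n − p − 1 degrees of freedom. The two are very close for training sets of about 170 observations; exact values would use, for example, `2 * scipy.stats.t.sf(abs(t), dof)`. The function name is hypothetical.

```python
import math

import numpy as np


def ols_with_t_tests(X, y):
    """Least-squares fit of Eq. (1) with an intercept, plus per-coefficient t-tests.

    Returns (beta, t_stats, p_values); index 0 is the intercept beta_0.
    ASSUMPTION: p-values use a normal approximation to the t distribution.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # single predictor given as a 1-D array
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]  # n - p - 1 residual degrees of freedom
    sigma2 = float(resid @ resid) / dof  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)  # covariance of the coefficient estimates
    se = np.sqrt(np.diag(cov))
    t_stats = beta / se
    # Two-sided p-value P(|Z| > |t|) = erfc(|t| / sqrt(2)) under the normal approximation.
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats])
    return beta, t_stats, p_values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    noise_predictor = rng.normal(size=200)  # unrelated to y by construction
    y = 5.0 + 3.0 * x + rng.normal(scale=1.0, size=200)
    beta, t, p = ols_with_t_tests(np.column_stack([x, noise_predictor]), y)
    print(beta, p)
```

Variables whose p-value exceeds 0.05 would then be dropped and the model refitted, as is done for LL, PL, and PI in Model 2.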
The fit of a linear regression model can be evaluated using different measures, such as root mean squared error (RMSE) and the coefficient of determination R2. RMSE is the square root of the mean of the squared residuals, where a residual is the difference between the observed value $y_i$ and the predicted value $\hat{y}_i$. It is given in Eq. (2), where N is the total number of observations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$

Hence, RMSE measures the absolute fit of the model to the data, with lower values indicating a better fit.
The main difference between RMSE and R2 is that R2 is a relative measure of fit rather than an absolute one. R2 is calculated as given in Eq. (3). The numerator is the sum of squared errors, which represents the variation left unexplained by the model. The denominator is the total sum of squares of the mean model, where $\bar{y}$ is the mean of the observed data:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \qquad (3)$$

Hence, R2 usually ranges from 0 to 1. It is interpreted as the proportion of the variation in the data that the model explains, relative to the mean model.
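When the fit of the suggested model is worse than that of the mean model in the denominator, the resulting R2 value is negative. A negative value shows that the prediction model, the regression line in this case, fits worse than simply predicting the mean with a horizontal line [16]. For an ordinary least-squares fit with an intercept, the in-sample R2 cannot be negative. Negative values therefore typically arise when a model is evaluated on data other than those used to fit it.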
One of the biggest pitfalls of R2 is that the in-sample R2 of a least-squares fit never decreases as independent variables are added, even when those variables carry no real information. R2 can therefore increase artificially. Adjusted R2 was developed to solve this problem. It incorporates the model's degrees of freedom to offset the artificial increase. Adjusted R2 should therefore be used for evaluation, especially when multiple linear regression is used. Its formulation is given in Eq. (4), where n is the sample size and k is the number of independent variables:

$$R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1} \qquad (4)$$
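Eqs. (2) to (4) translate directly into code. The plain-Python sketch below uses illustrative function names. It also reproduces the negative-R2 case discussed above, in which predictions that run opposite to the observations fit worse than the mean model.

```python
import math


def rmse(y, y_hat):
    """Eq. (2): square root of the mean squared residual."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n)


def r_squared(y, y_hat):
    """Eq. (3): 1 - SSE / SST, where SST is the sum of squares about the mean."""
    y_bar = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    sst = sum((a - y_bar) ** 2 for a in y)  # variation around the mean model
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Eq. (4): penalize R2 for the number of independent variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    print(r_squared(y, [2.5] * 4))               # mean model
    print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))    # worse than the mean model
    print(rmse(y, [4.0, 3.0, 2.0, 1.0]))
```

Take y = [1, 2, 3, 4]. Predicting the reversed sequence [4, 3, 2, 1] gives an SSE of 20 against a total sum of squares of 5, so R2 = 1 − 20/5 = −3. Predicting the mean 2.5 everywhere gives R2 = 0.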
Both RMSE and R2 are important for evaluating the fit of a regression analysis, and it is best to use both. However, RMSE should be preferred, especially when the objective of the regression analysis is prediction, since it expresses the typical prediction error in the units of the response variable.
Table 2 summarizes the prior works of different researchers on the prediction of undrained shear strength of cohesive soils. The predictor in these correlations is given as N60, even though some of the correlations were originally developed with uncorrected N values. As several researchers [14,17] have suggested, most SPT hammers operate close to a 60% energy efficiency level, so it is prudent to use N60 instead of N in these correlations.
As evident in Table 2, the main problem of the earlier works is the lack of the statistical evidence necessary to judge their validity. None of the earlier works conducted t-tests or used the corresponding p-values to verify the suggested models. Additionally, R2 values are presented only in Refs. [1,2,14], whereas RMSE results are missing from all of the earlier works in Table 2. Some of these works [1,2,14] report the range of the prediction error as a performance measure of the regression analysis. However, the range of the error cannot represent the total magnitude of the prediction error. It is therefore a less effective measure of prediction performance than RMSE. Furthermore, most of these works provide no information on the number of observations in their data sets. A correlation equation can only be as general as the observations used in its derivation, so the number of observations in the data set must be known to judge the validity of the results. Also, with the exception of Hettiarachchi and Brown [14], none of the prior studies used a train/test split of their data. This leaves the developed models prone to overfitting, rendering them unstable and extremely variable. Some researchers [1,3,14] did compare their results with preceding alternatives, but the lack of correct metrics for the comparison leaves their results open to debate. For that reason, the models given in Table 2 cannot be considered valid statistical models, and the relatively high R2 values reported in Refs. [2,3] are only an indication of model overfitting.
Considering all of the points above, the correct use of statistical analysis is essential for revealing the underlying relation between undrained shear strength and SPT results in a scientifically valid and acceptable form. Therefore, in this work, all of the analyses are performed and presented along with the results of the statistical significance tests conducted.
Proposed models
Model 1: simple linear regression
Several prior works use simple linear regression to predict undrained shear strength: Sanglerat [9], Decourt [11], Nixon [10], Ajayi and Balogum [18], Kulhawy and Mayne [19], Sivrikaya and Toğrol [1], and Hettiarachchi and Brown [14]. For comparison with these works, a simple linear regression model with N60 as the predictor variable is developed first. The results of the simple regression analysis and the resulting equation are given in Table 3 and Eq. (5), respectively.
Equation (5) is made unit-independent by dividing both sides by atmospheric pressure (pa = 100 kPa), as shown in Eq. (6):
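The normalization step can be written out generically. For illustration only, assume that Eq. (5) has the linear form below, with an intercept a (kPa) and a slope b (kPa per blow). Both are placeholders rather than the fitted values from Table 3. Dividing by pa then gives dimensionless coefficients:

```latex
c_u = a + b\,N_{60}
\quad\Longrightarrow\quad
\frac{c_u}{p_a} = \frac{a}{p_a} + \frac{b}{p_a}\,N_{60},
\qquad p_a = 100\ \text{kPa}
```

For example, hypothetical values a = 20 kPa and b = 5 kPa give cu/pa = 0.2 + 0.05 N60. Because N60 is a dimensionless blow count, the normalized equation can be used with cu in any stress unit, provided pa is expressed in that same unit.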
Figure 2 presents the measured vs. predicted values of the undrained shear strength as estimated by the simple linear regression analysis.
Model 2: multiple linear regression
As the second model of the research, a multiple linear regression model is developed using all of the variables present in the data set. Accordingly, Table 4 presents the results of the multiple linear regression.
As Table 4 shows, only the variables N60 and wn proved to be statistically significant at the 1% significance level. LL, PL, and PI resulted in p-values well above the acceptable limit of 0.05. Accordingly, LL, PL, and PI are excluded in the next step of the regression analysis, since there is not enough statistical evidence of their effect on the prediction of undrained shear strength. Before proceeding with the resulting model of N60 and wn only, the model is further improved. The improvement accounts for the possibility that the effect of one variable (N60, wn, LL, PL, PI) on undrained shear strength depends on another variable. Accordingly, an interaction term wn × LL is added to the model with N60 and wn. A summary of the resulting regression analysis, along with the p-values, is given in Table 5. As can be seen in Table 5, all of the variables in the new model (N60, wn, and wn × LL) proved to be significant, and the values of the performance metrics improved. This result shows that the effect of wn on undrained shear strength differs for different levels of LL. This is expected, since the magnitude of wn relative to LL defines a soil sample's consistency. Two different soil samples might have the same wn, but the sample with the smaller LL is more likely to be softer, because its water content is closer to its liquid limit. Accordingly, Eqs. (7) and (8) represent the final regression equations: Eq. (7) is normalized with pa (= 100 kPa) to make it unit-independent, as given in Eq. (8).
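The claim that the interaction term makes the effect of wn depend on LL can be made explicit. Write the model with N60, wn, and wn × LL in the notation of Eq. (1), using placeholder coefficients β0 to β3; the fitted values of Eq. (7) are not reproduced here. The partial derivative with respect to wn is then:

```latex
\begin{aligned}
c_u &= \beta_0 + \beta_1 N_{60} + \beta_2 w_n + \beta_3\,(w_n \times \mathit{LL}) \\
\frac{\partial c_u}{\partial w_n} &= \beta_2 + \beta_3\,\mathit{LL}
\end{aligned}
```

Without the interaction term, the change in predicted cu per unit change in water content would be the constant β2. With it, that change varies linearly with the liquid limit of the soil.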
The regression analysis is conducted using the training set, which corresponds to 80% of the whole data. The remaining 20% is later used as test data for performance evaluation. Figure 3 presents the measured vs. predicted values of the undrained shear strength as estimated by the multiple linear regression analysis.
Model 3: random forest
Random forest is a powerful machine learning algorithm used for both regression and classification problems. It is essentially a tree-based method, which involves breaking up the predictor space into a number of simple regions. To make a prediction for a given observation, the method uses the mean response of the training observations in the region to which that observation belongs. Since the splitting rules used to break up the predictor space can be summarized in a tree, these methods are known as decision tree methods [20].
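The idea of recursively splitting the predictor space and predicting the mean of each region can be sketched in a few lines of Python. This is a minimal illustration of recursive binary splitting, not the tree-growing code of the randomForest package. It assumes hypothetical settings: an exhaustive search over observed thresholds, a depth limit, and a minimum leaf size.

```python
def _sse(values):
    """Sum of squared deviations from the mean (the error of predicting the mean)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2, min_leaf=2):
    """Recursive binary splitting. X: list of feature rows, y: responses.

    Returns a nested dict; each leaf holds the mean response of its region.
    """
    best = None
    if depth < max_depth:
        for j in range(len(X[0])):
            # Candidate thresholds: observed values, excluding the largest.
            for thr in sorted({row[j] for row in X})[:-1]:
                left = [i for i, row in enumerate(X) if row[j] <= thr]
                right = [i for i, row in enumerate(X) if row[j] > thr]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, thr, left, right)
    if best is None:
        return {"leaf": sum(y) / len(y)}  # region prediction = mean response
    _, j, thr, left, right = best
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```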
The random forest model builds on the bagged-tree approach. It uses bootstrap resampling to create multiple training sets and fits a tree to each bootstrapped training set. The bootstrap is a resampling technique that repeatedly draws random samples from the training data with replacement, so an observation may appear more than once in a selected subset [20]. During tree fitting, random forest randomly picks m predictors from the total number available and searches only within these randomly selected predictors for the best possible split [20]. This number of predictors m is referred to as mtry in the randomForest package of the R programming language. It is a critical tuning parameter of the random forest algorithm [21].
The procedure for building a random forest can be summarized as follows [21,22]:
1) Draw a sample from the available data set using the bootstrap resampling method (i.e., drawing samples with replacement).
2) Grow a tree on the bootstrap sample. At each split, randomly select mtry of the predictors and search only among them for the best split.
3) Repeat Steps 1 and 2 until a user-defined number of trees, typically 500, has been grown.
For each regression tree grown, a different bootstrap sample is drawn from the training set. About two-thirds of the observations end up in the bootstrap sample and are used to determine the regression function. The remainder, termed the out-of-bag (OOB) sample, is used to validate the accuracy of the function [23].
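Random forest has two sources of randomness: bootstrap resampling of observations and random selection of mtry candidate predictors at each split. Both can be sketched in Python. The chance that a given observation is never drawn in n draws with replacement is (1 − 1/n)^n, which approaches e^(−1) ≈ 0.368. On average, therefore, about 63% of the distinct observations are in bag and about 37% are OOB. The predictor names below are those of this data set and serve only as an example. Which predictors entered the random forest is not assumed, and the seeds and function names are hypothetical.

```python
import random


def bootstrap_sample(n, rng):
    """Indices of one bootstrap sample (drawn with replacement) and its out-of-bag set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]  # never drawn: used for validation
    return in_bag, oob


def split_candidates(predictors, mtry, rng):
    """The random subset of mtry predictors searched for the best split at one node."""
    return rng.sample(predictors, mtry)


def mean_oob_fraction(n, n_trees=500, seed=1):
    """Average share of observations left out of bag across n_trees bootstrap samples."""
    rng = random.Random(seed)
    return sum(len(bootstrap_sample(n, rng)[1]) for _ in range(n_trees)) / (n_trees * n)


if __name__ == "__main__":
    rng = random.Random(0)
    print(mean_oob_fraction(171))  # close to 0.37 for 171 training observations
    print(split_candidates(["N60", "wn", "LL", "PL", "PI"], 2, rng))
```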
The random forest algorithm employed in this analysis was implemented through the caret package [24] in R [7]. In the random forest model, two parameters need to be optimized to maximize prediction performance: ntree and mtry [25]. ntree is the number of trees grown. Prediction accuracy is more sensitive to mtry than to ntree [21,25]. Therefore, ntree is fixed at 500 throughout the analysis, which is the default value for the random forest model in the caret package.
To determine the best mtry value, 10-fold cross-validation repeated 5 times is used to search over a grid of candidate parameter values. In k-fold cross-validation, the training data are randomly split into k folds (here, k = 10). The model is fitted on k − 1 folds and its error is measured on the remaining fold. This is repeated until each fold has served once as the validation set. Repeating the whole procedure with different random splits (here, 5 times) and averaging the results makes the cross-validation estimate more reliable. As the results in Fig. 4 show, the optimum value of mtry is 2 for this model.
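Repeated k-fold cross-validation can be sketched in plain Python. In caret, this resampling scheme is typically requested with `trainControl(method = "repeatedcv", number = 10, repeats = 5)`. The sketch below is an independent illustration, not the paper's code. The `fit_predict` interface, the mean-model example, and the seed are hypothetical.

```python
import random


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """List of (train_idx, valid_idx) pairs: k folds per repeat, reshuffled each repeat."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k nearly equal folds
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            splits.append((train, folds[f]))  # each fold validates exactly once per repeat
    return splits


def cv_rmse(fit_predict, X, y, k=10, repeats=5, seed=0):
    """Average validation RMSE over repeated k-fold CV.

    fit_predict(X_train, y_train, X_valid) -> predictions for X_valid (hypothetical interface).
    """
    scores = []
    for train, valid in repeated_kfold(len(y), k, repeats, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in valid])
        se = [(y[i] - p) ** 2 for i, p in zip(valid, preds)]
        scores.append((sum(se) / len(se)) ** 0.5)
    return sum(scores) / len(scores)


def mean_model(X_train, y_train, X_valid):
    """Baseline model: predict the training mean for every validation observation."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_valid)
```

To tune a parameter such as mtry, cv_rmse would be evaluated for each candidate value, and the value with the lowest average RMSE would be selected.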
Unlike linear regression models, random forest does not produce an explicit equation showing how a prediction is obtained. Instead, as explained earlier, it splits the predictor space into regions, choosing each split to minimize the residual sum of squares within the resulting regions. Of the available data set, 171 observations were used to train the model and 43 observations were used to test it. Figure 5 shows the result of the random forest regression model on the test data, comparing the measured and predicted cu values.
Model 4: gradient boosting
Gradient boosting is another avenue for improving the prediction results of a decision tree. It can be applied to many statistical learning methods for both regression and classification problems. To see how it differs from bagging, recall that bagging generates multiple training data sets from the original set using the bootstrap. It then fits a decision tree to each of these bootstrapped samples and finally combines all of the fitted trees into a single model. Gradient boosting works in a similar way, except that the trees are grown sequentially: each tree is grown using information from the previously grown trees. This sequential growth allows the algorithm to learn slowly [20].
To break this process down further, start from an initial model. Each new decision tree is fitted to the current residuals of the model rather than to the response. The new tree is then added to the model, which updates the residuals. These trees are typically small, so the residuals improve slowly. In the gradient boosting algorithm, the shrinkage parameter λ scales down the contribution of each new tree, slowing the process even further. This allows more, and differently shaped, trees to improve the fit to the residuals, which improves the performance of the model. Hence, the shrinkage parameter is an essential tuning parameter in the gradient boosting algorithm [26].
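The residual-fitting loop described above can be sketched for squared-error loss on a single predictor. For this loss, the negative gradient is simply the residual. The sketch makes two simplifying assumptions. First, it uses depth-1 trees (stumps, i.e., interaction depth 1) rather than the depth-3 trees selected in this study. Second, it is not the gbm implementation; R's gbm package, which caret's `method = "gbm"` wraps, exposes the tuning parameters as `n.trees`, `interaction.depth`, and `shrinkage`. The function names are hypothetical.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree (one split) to residuals r on a single predictor x."""
    pairs = sorted(zip(x, r))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - m_left) ** 2 for v in left) + sum((v - m_right) ** 2 for v in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, m_left, m_right)
    if best is None:  # all x equal: the stump predicts the mean residual
        mean_r = sum(r) / n
        return lambda v: mean_r
    _, threshold, m_left, m_right = best
    return lambda v: m_left if v <= threshold else m_right


def gradient_boost(x, y, n_trees=100, shrinkage=0.1):
    """Squared-error gradient boosting with stumps; returns a prediction function."""
    f0 = sum(y) / len(y)  # initial model: the mean response
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the ordinary residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, resid)
        stumps.append(h)
        # Add only a shrunken fraction of the new tree, so learning is slow.
        pred = [pi + shrinkage * h(xi) for pi, xi in zip(pred, x)]

    def predict(v):
        return f0 + shrinkage * sum(h(v) for h in stumps)

    return predict
```

As an example, take x = 0, ..., 19 with a step response: 0 for the first ten points and 10 for the rest. Each stump splits at 9.5, and the remaining error shrinks by a factor of 1 − λ per tree. With λ = 0.1, the fit after 5 trees is still about 3 units off at the ends, while after 200 trees it is essentially exact.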
The shrinkage parameter (λ), the number of trees, and the interaction depth must be tuned to obtain the best predictive model. The shrinkage parameter is a positive number, often between 0.001 and 0.1, that controls the learning rate of the gradient boosting algorithm, that is, how quickly the algorithm learns from the data. If a very low value of the shrinkage parameter is chosen, learning is very slow and a large number of trees is required. However, a very large number of trees can lead to overfitting. Friedman [27] suggested a number of trees between 100 and 500 for optimal results. The interaction depth controls the complexity of the boosted model; in simpler terms, it controls the number of splits in each tree. James et al. [20] suggested that an interaction depth between 1 and 5 often works well. To determine the shrinkage parameter and interaction depth that give the best predictive model, 10-fold cross-validation repeated 5 times is applied, as for the random forest model. According to the cross-validation results presented in Fig. 6, the best model for the prediction of undrained shear strength is obtained with a shrinkage parameter λ of 0.0101 and an interaction depth of 3.
As with the random forest model and many other supervised learning algorithms, gradient boosting does not yield an explicit prediction equation. In Fig. 7, the measured values of undrained shear strength are compared with those predicted by the gradient boosting algorithm. The comparison was done using the test data, which were not used in training the model.
Model 5: the stacked model
The main goal of using a stacked model is to increase prediction performance by combining different models. Models that have already been developed can be stacked to further improve their predictive capabilities [28]. Figure 8 presents the simple structure of a stacked model. Stacking feeds the predictions of the lower layer to an upper-layer stacking function. The machine learning models pass their predictions to the upper layer, which makes decisions based on the performances of the models in the layer below. The top layer, labeled the stacking function, can be any machine learning algorithm or a simpler function. The models to include in the stacked model should be selected using a clearly defined performance criterion, such as a predetermined cutoff for model accuracy or for the RMSE of the individual model. Only the models that meet this criterion are incorporated into the stacked model. In this study, as presented in Fig. 8, the lower layer consists of the multiple linear regression, random forest, and gradient boosting models. The weights of the stacking function are determined from the prediction performances of these models, expressed as RMSE.
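The text does not give the exact weighting formula, so the sketch below assumes one common choice. It first discards models above a hypothetical RMSE cutoff. It then gives each remaining model a weight inversely proportional to its RMSE, normalized so the weights sum to one. The stacked prediction is the weighted average of the base models' predictions. The model labels and numbers are illustrative, not the values in Table 7.

```python
def inverse_rmse_weights(rmse_by_model, max_rmse=None):
    """Hypothetical weighting: w_m proportional to 1 / RMSE_m among models passing the cutoff."""
    eligible = {m: e for m, e in rmse_by_model.items() if max_rmse is None or e <= max_rmse}
    if not eligible:
        raise ValueError("no model meets the performance criterion")
    inverse = {m: 1.0 / e for m, e in eligible.items()}  # RMSE values assumed > 0
    total = sum(inverse.values())
    return {m: v / total for m, v in inverse.items()}  # weights sum to one


def stacked_predict(preds_by_model, weights):
    """Weighted average of the base models' predictions for the same observations."""
    first = next(iter(weights))
    n = len(preds_by_model[first])
    return [sum(w * preds_by_model[m][i] for m, w in weights.items()) for i in range(n)]


if __name__ == "__main__":
    w = inverse_rmse_weights({"mlr": 1.0, "rf": 2.0, "gbm": 4.0})
    print(w)  # the lowest-RMSE model receives the largest weight
    print(stacked_predict({"mlr": [10.0], "rf": [20.0], "gbm": [40.0]}, w))
```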
When building the stacked model, it is important to ensure that the predictions of the models incorporated into it are not highly correlated with each other. If the predictions of the individual models are highly correlated, combining them is unlikely to improve the performance of the stacked model. Therefore, the correlations between the predictions of the individual models should be checked before the stacked model is created. If the correlation between two models exceeds a predetermined threshold, the one with the lower performance should be omitted from the stacked model.
In this study, the upper limit for this correlation is set to 0.70. On the commonly used scale for interpreting Spearman’s correlation coefficient, this value corresponds to a ‘strong’ relationship. Table 6 presents the correlations between the outputs of the individual models. As the table shows, the models used in this study (linear regression, random forest, and gradient boosting) are not highly correlated with each other, and hence they can be stacked. Figure 9 presents the measured versus predicted values of undrained shear strength as estimated by the stacked model. As with the other models, the comparison was done using the test data, which were not used in training the model.
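The screening rule can be sketched in plain Python under two assumptions. First, it uses Spearman rank correlation between the models' predictions. The text cites Spearman's scale, but this does not establish which coefficient Table 6 reports. Second, it makes a greedy pass from the lowest to the highest RMSE. A model is kept only if its correlation with every already-kept, better-performing model is at most 0.70. Function names are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation (assumes neither input is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def screen_models(preds_by_model, rmse_by_model, limit=0.70):
    """Keep models best-first; drop any whose correlation with a kept model exceeds limit."""
    kept = []
    for m in sorted(preds_by_model, key=lambda name: rmse_by_model[name]):
        if all(spearman(preds_by_model[m], preds_by_model[k]) <= limit for k in kept):
            kept.append(m)
    return kept
```

In this greedy pass, a pair of highly correlated models always loses the one with the higher RMSE, matching the rule stated in the text.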
Results, comparison and discussion: suggested models vs. existing work
This section compares the performances of the models developed in this work with those of the earlier studies in the literature detailed in Section 2 (Table 2). For that purpose, the test data set, which corresponds to 20% of the whole data, is used to check the prediction performances of all of the available models. Several different equations exist in the literature for the prediction of undrained shear strength of cohesive soils. As detailed in Section 2, all of them except those of Sivrikaya [2] and Nassaji and Kalantari [3] use simple linear regression with N60 as the only predictor of undrained shear strength, cu. Sivrikaya [2] and Nassaji and Kalantari [3], on the other hand, use multiple linear regression models whose equations include N60, water content, liquid limit, and plasticity index. All of the models, including these nine equations from the literature, were applied to the test data set. Their performances were evaluated in terms of the resulting RMSE, coefficient of determination (R2), and adjusted R2. Hence, as presented in Table 7, a total of 14 models, nine existing and five developed in this work, are compared in this section.
Comparing the linear regression equations from the literature with those proposed here, both linear regression equations developed in this study outperformed their predecessors. The simple linear regression model (Eq. (6)) resulted in an RMSE of 27.93 and an R2 of 0.55, whereas the next-best equation, that of Hettiarachchi and Brown [14], gave an RMSE of 30.14 and an R2 of 0.47. The multiple linear regression model given in Eq. (8) performed even better, with a lower RMSE of 24.55 and higher R2 and adjusted R2 values of 0.68 and 0.67, respectively. As Table 7 clearly shows, the proposed models perform markedly better than their predecessors in terms of both R2 and RMSE.
Another noteworthy observation from Table 7 is that some of the earlier models yield negative R2 values on the test data set: the equations suggested by Sanglerat [9], Nixon [10], Decourt [11], Kulhawy and Mayne [19], Sivrikaya and Toğrol [1] and Sivrikaya [2]. As detailed in Section 2.1, a negative R2 occurs only when the mean of the observed data fits the outcomes better than the model's predictions do. Figure 10 provides a visual comparison of the predictive performance of all the models. Because Fig. 10 shows only the region in which the observations lie, some of the less successful predictions [1,9–11] are truncated at the upper boundary.
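The condition for a negative R2 follows from the residual-based definition. Here that definition is assumed to match the one in Section 2.1. In it, y_i are the measured cu values, ŷ_i are the predictions and ȳ is the mean of the measured values:

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2 < 0 \iff SS_{\mathrm{res}} > SS_{\mathrm{tot}}
```

SS_tot is exactly the residual sum of squares of the trivial model that predicts ȳ for every observation. A negative R2 therefore means the equation does worse than that constant. An ordinary least-squares fit with an intercept cannot do this on its own training data. An equation calibrated on a different data set, however, offers no such guarantee when applied to new data.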
The poor performance of the earlier studies can be attributed to several factors, such as the absence of verification and evaluation metrics like the p-value, R2 and RMSE. In addition, none of these studies used cross-validation in model development. Moreover, with the exception of Hettiarachchi and Brown [14], none of the earlier works used a train/test split of the data when developing their models. Hence, the whole data set was used both to develop the model and to test the results. This practice makes their results prone to overfitting. Overfitted models have high variance: they fit the noise in the data used to build them, so their predictions for new data are unstable. In Ref. [14], by contrast, the data were split into training and test sets and the analyses were conducted accordingly. This may be one of the reasons why the equation suggested by Hettiarachchi and Brown [14] performs better than the earlier ones in the literature, as Table 7 shows.
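The two safeguards named above can be sketched in Python: a held-out test set, and k-fold cross-validation restricted to the training portion. This sketch is illustrative and does not reproduce the procedure in Ref. [29]. The seed, the number of folds (k = 10) and the hypothetical example model `fit_proportional` are assumptions. With the 230 observations of this study, an 80/20 split holds out 46 test rows.

```python
import random


def train_test_split(n, test_fraction=0.20, seed=1):
    """Shuffle row indices 0..n-1 and hold out `test_fraction` of them."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def k_fold_indices(indices, k=10, seed=1):
    """Partition the given indices into k folds of near-equal size."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_rmse(x, y, train_idx, fit, k=10, seed=1):
    """Mean RMSE over k folds of the training rows only.

    `fit(xs, ys)` must return a function that predicts y from one x.
    The test rows are never passed in, so they stay untouched.
    """
    scores = []
    for held_out in k_fold_indices(train_idx, k, seed):
        held = set(held_out)
        fit_rows = [i for i in train_idx if i not in held]
        predict = fit([x[i] for i in fit_rows], [y[i] for i in fit_rows])
        se = [(y[i] - predict(x[i])) ** 2 for i in held_out]
        scores.append((sum(se) / len(se)) ** 0.5)
    return sum(scores) / len(scores)


def fit_proportional(xs, ys):
    """Hypothetical example model: least-squares slope through the origin."""
    k = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)
    return lambda x_new: k * x_new
```

Each model is scored only on folds it did not see during fitting. When tuning a model, each candidate parameter setting would be scored this way and the one with the lowest mean RMSE retained.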
The results of the machine learning models also show the superior performance of the models suggested in this work. As Table 7 shows, the gradient boosting model resulted in an RMSE of 25.15 and an R2 of 0.66. The random forest model performed even better, with an RMSE of 23.50 and an R2 of 0.70. Furthermore, the stacked model, which combines the linear regression, gradient boosting and random forest models suggested in this work, has the lowest RMSE (22.89) and the highest R2 (0.73). It therefore gives the best prediction performance among both the earlier models and those developed in this study.
The stacked model owes its superior performance to its ability to combine the predictions of other models. Here, combining the outputs of the multiple linear regression, random forest and gradient boosting models produced the best prediction performance. Furthermore, for the random forest and gradient boosting algorithms, cross-validation selects the optimum tuning parameters. This contributes to their better results compared with the equations available in the literature.
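The following is a minimal Python sketch of the stacking idea. The base models' predictions become the inputs of a linear meta-model whose weights are fitted by ordinary least squares. The linear meta-learner, the function names and the plain normal-equation solver are assumptions made for illustration. Breiman's stacked regressions [28] additionally constrain the weights to be non-negative, which is omitted here. In practice the meta-model should be fitted on out-of-fold predictions rather than on the base models' in-sample fitted values.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b by Gaussian elimination
    with partial pivoting (a is a list of rows, b a list)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_stack(base_preds, y):
    """Fit y ~ w0 + sum_j w_j * pred_j by ordinary least squares.

    base_preds: one prediction vector per base model (ideally out-of-fold).
    Returns [w0, w1, ..., wm] from the normal equations (X^T X) w = X^T y.
    """
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


def predict_stack(weights, preds_for_one_site):
    """Combine the base models' predictions for a single observation."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds_for_one_site))
```

A linear meta-model keeps the combination interpretable: each weight shows how much the stack relies on a given base model.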
Hettiarachchi and Brown [14] provide the test data set used in their study, which covers 12 observations of N60 and the corresponding undrained shear strength, cu. As a further verification, this data set is used to evaluate the simple linear regression models available in the literature (Table 2) and the one developed in this study (Eq. (6)). Since this data set provides no information on the other predictor variables (the water content wn and the Atterberg limits), the multiple linear regression and machine learning models are not included in this comparison. Because all the compared models use the same single predictor, N60, the results are reported in terms of RMSE and R2 only. Figure 11 shows the predictions of all the simple linear models on the test data set [14], and Table 8 lists the corresponding RMSE and R2 values. As Table 8 shows, Eq. (6) gives the smallest RMSE and the greatest R2, and hence the most accurate predictions among the simple linear regression equations.
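In code, evaluating several single-predictor equations on an external data set, as done here with the 12 observations of Ref. [14], reduces to a loop over the equations. This Python sketch uses placeholder coefficients, because the equations in Table 2 and Eq. (6) are not reproduced here. Each equation is represented generically as cu = a + b·N60 with a hypothetical (a, b) pair, and a = 0 gives the proportional form.

```python
from math import sqrt


def evaluate_linear(equations, n60, cu):
    """Score equations of the form cu = a + b * N60 on one data set.

    equations: mapping label -> (a, b); labels and coefficients are
    hypothetical placeholders, not the values of Table 2 or Eq. (6).
    Returns (label, RMSE, R2) tuples sorted from lowest to highest RMSE.
    """
    n = len(cu)
    mean_cu = sum(cu) / n
    ss_tot = sum((c - mean_cu) ** 2 for c in cu)
    results = []
    for label, (a, b) in equations.items():
        ss_res = sum((c - (a + b * x)) ** 2 for x, c in zip(n60, cu))
        results.append((label, sqrt(ss_res / n), 1.0 - ss_res / ss_tot))
    return sorted(results, key=lambda row: row[1])
```

Since RMSE and R2 share the same SS_res on a fixed data set, ranking by lowest RMSE and by highest R2 gives the same order. This agrees with a single equation, Eq. (6), being best on both metrics in Table 8.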
The equations available in the literature use either simple or multiple linear regression. Equations based on simple linear regression benefit from the inherent simplicity of linking N60 directly to cu through a proportionality constant. This is attractive both for the researchers developing the model and for the end user. However, this approach is more prone to errors that stem from the mechanics of the problem. In essence, an SPT sampler is a miniature open-ended pipe pile [14]. During an SPT, the driving effort must be sufficient to overcome the sum of the skin resistance and the end-bearing resistance. The state of the soil has only a minor influence on the skin resistance. However, the mechanism by which end bearing develops is a function of the state of the soil. As with pipe piles, the possible formation of a soil plug during driving significantly affects the magnitude of the end resistance. Penetration in stiff to very stiff clays is more likely to result in plug formation, whereas plug formation is generally not expected in soft clays. Therefore, to increase the accuracy of cu predictions, the state of the soil must be taken into account. The simplest ways to describe the in situ state of a cohesive soil are to use either the overconsolidation ratio or the Atterberg limits in combination with the in situ moisture content. As the liquid limit, plastic limit and in situ moisture content are generally more readily available, this study developed its multiple linear regression model using these as inputs, as previously done by Sivrikaya [2] and Nassaji and Kalantari [3]. However, as Table 4 shows, wn is statistically significant, whereas the Atterberg limits are not. Nevertheless, a parameter's influence may still depend on its interaction with a parameter that is not statistically significant on its own.
Possible interactions between the Atterberg limits and wn were therefore investigated, and the interaction term was found to be significant. Table 5 presents the results of this regression analysis. This outcome is not surprising, since soil behavior is known to depend on the magnitude of wn relative to LL: as wn approaches LL, the response becomes softer, and vice versa. In other words, the influence of wn on undrained shear strength depends on its interaction with LL.
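A generic interaction model shows why the effect of wn can depend on LL. The terms below are illustrative only. The actual terms and coefficients are those reported in Table 5 and Eq. (8), which are not reproduced here:

```latex
c_u = \beta_0 + \beta_1 N_{60} + \beta_2 w_n + \beta_3 LL + \beta_4 \,(w_n \cdot LL) + \varepsilon,
\qquad
\frac{\partial c_u}{\partial w_n} = \beta_2 + \beta_4 \, LL
```

Whenever β4 ≠ 0, the change in cu per unit change in wn varies with LL. The model can therefore capture the influence of wn relative to LL even when LL alone is not significant.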
In this study, several algorithmic models are used to predict the undrained shear strength from N60, the Atterberg limits and the water content. As explained in previous sections, unlike the regression models, the machine learning algorithms (random forest, gradient boosting and the stacked model) do not produce explicit equations. Although this may be considered a disadvantage for everyday use in practice, their higher prediction accuracy outweighs this shortcoming. In addition, to ease the use of the machine learning models developed in this work, their entire source code is given in Ref. [29].
Conclusions
This paper pursued two objectives toward the accurate prediction of undrained shear strength from SPT results. First, simple and multiple linear regression models were developed, and their performance was tested against the linear regression models suggested in the literature. Both linear regression models suggested in this work performed better in terms of the performance metrics R2 and RMSE. This result is not surprising. The main problem with the prior studies is that they did not use the necessary statistical techniques, such as p-values for validation, a training/test split of the data for model verification, and cross-validation. As a result, their reported relationships may reflect random chance, and their models are prone to overfitting. This research explains these problems of the prior models and describes the statistical metrics in full detail.
The second objective of this study is to demonstrate machine learning algorithms as a superior alternative to conventional linear regression for predicting undrained shear strength. For this purpose, random forest, gradient boosting and stacked models were developed and used. This choice proved successful, as shown by the superior prediction capability of the suggested machine learning algorithms. This is especially true of the stacked model, which incorporates the MLR, gradient boosting and random forest models. With the stacked model, several different models can be applied simultaneously. Other machine learning approaches have also achieved high prediction accuracy, such as the hybrid nonlinear modeling used in Ref. [30] and the artificial neural network and adaptive neuro-fuzzy inference system models used in Ref. [31]. Future studies could apply these and other machine learning models to further improve the prediction accuracy of undrained shear strength. To make the suggested models easy for anyone to use, the entire source code is provided in an open data repository as Ref. [29]. Moreover, the data used in this study are provided in Ref. [15]. This enlarges the database available in the literature on the subject and allows engineers who do not have training data of their own to use the provided code for prediction.
References
[1]
Sivrikaya O, Toğrol E. Determination of undrained strength of fine-grained soils by means of SPT and its application in Turkey. Engineering Geology, 2006, 86(1): 52–69
[2]
Sivrikaya O. Comparison of artificial neural networks models with correlative works on undrained shear strength. Eurasian Soil Science, 2009, 42(13): 1487–1496
[3]
Nassaji F, Kalantari B. SPT capability to estimate undrained shear strength of fine-grained soils of Tehran, Iran. Electronic Journal of Geotechnical Engineering, 2011, 16: 1229–1238
[4]
Vu-Bac N, Lahmer T, Zhuang X, Nguyen-Thoi T, Rabczuk T. A software framework for probabilistic sensitivity analysis for computationally expensive models. Advances in Engineering Software, 2016, 100: 19–31
[5]
Hamdia K M, Silani M, Zhuang X, He P, Rabczuk T. Stochastic analysis of the fracture toughness of polymeric nanoparticle composites using polynomial chaos expansions. International Journal of Fracture, 2017, 206(2): 215–227
[6]
Hamdia K M, Ghasemi H, Zhuang X, Alajlan N, Rabczuk T. Sensitivity and uncertainty analysis for flexoelectric nanostructures. Computer Methods in Applied Mechanics and Engineering, 2018, 337: 95–109
[7]
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing, 2017
[8]
Terzaghi K, Peck R B. Soil Mechanics in Engineering Practice. New York: Wiley, 1967
[9]
Sanglerat G. The Penetrometer and Soil Exploration: Interpretation of Penetration Diagrams, Theory and Practice. Developments in Geotechnical Engineering 1. Amsterdam: Elsevier Publishing Company, 1972
[10]
Nixon I K. Standard penetration test: State of the art report. In: Proceedings of the 2nd European Symposium on Penetration Testing, Vol 1. Stockholm: AA Balkema Publishers, 1982, 3–24
[11]
Decourt L. General Report/Discussion session 2: SPT, CPT, pressuremeter testing and recent developments in in-situ testing, Part 2: The standard penetration test, state-of-the-art report. In: The 12th International Conference on Soil Mechanics and Foundation Engineering. Rio de Janeiro: Taylor & Francis, 1989, 2405–2416
[12]
Seed H B, Tokimatsu K, Harder L F, Chung R M. Influence of SPT procedures in soil liquefaction resistance evaluations. Journal of Geotechnical Engineering, 1985, 111(12): 1425–1445
[13]
Skempton A W. Standard penetration test procedures and the effects in sands of overburden pressure, relative density, particle size, ageing and overconsolidation. Géotechnique, 1986, 36(3): 425–447
[14]
Hettiarachchi H, Brown T. Use of SPT blow counts to estimate shear strength properties of soils: Energy balance approach. Journal of Geotechnical and Geoenvironmental Engineering, 2009, 135(6): 830–834
[15]
Khalid W, Cinicioglu E N, Cinicioglu O. Undrained Shear Strength, SPT, Water Content, Atterberg Limits-2018. Mendeley Data v1, 2018
[16]
Bisht D C, Jangid A. Discharge modelling using adaptive neuro-fuzzy inference system. International Journal of Advanced Science and Technology, 2011, 31: 99–114
[17]
McGregor J A, Duncan J M. Performance and Use of the Standard Penetration Test in Geotechnical Engineering Practice. Blacksburg, Virginia: Virginia Polytechnic Institute and State University, 1998
[18]
Ajayi L A, Balogum L A. Penetration testing in tropical lateritic and residual soils—Nigerian experience. In: First International Symposium on Penetration Testing. Rotterdam: Balkema Pub., 1988, 315–328
[19]
Kulhawy F H, Mayne P W. Manual on Estimating Soil Properties for Foundation Design. Palo Alto, California: Electric Power Research Institute, 1990
[20]
James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning. New York: Springer, 2013
[21]
Zhou J, Shi X, Du K, Qiu X, Li X, Mitri H S. Feasibility of random-forest approach for prediction of ground settlements induced by the construction of a shield-driven tunnel. International Journal of Geomechanics, 2017, 17(6): 04016129
[22]
Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32
[23]
Adusumilli S, Bhatt D, Wang H, Bhattacharya P, Devabhaktuni V. A low-cost INS/GPS integration methodology based on random forest regression. Expert Systems with Applications, 2013, 40(11): 4653–4659
[24]
Kuhn M. Building predictive models in R using the CARET package. Journal of Statistical Software, 2008, 28(5): 1–26
[25]
Kuhn M, Johnson K. Applied Predictive Modeling. New York: Springer, 2013
[26]
Bishop C M. Pattern Recognition and Machine Learning. New York: Springer, 2011
[27]
Friedman J H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 2001, 29(5): 1189–1232
[28]
Breiman L. Stacked regressions. Machine Learning, 1996, 24(1): 49–64
[29]
Khalid W, Cinicioglu E N, Cinicioglu O. Code for Predicting Undrained Shear Strength Using CARET Package in R. Mendeley Data v1, 2019
[30]
Badawy M F, Msekh M A, Hamdia K M, Steiner M K, Lahmer T, Rabczuk T. Hybrid nonlinear surrogate models for fracture behavior of polymeric nanocomposites. Probabilistic Engineering Mechanics, 2017, 50: 64–75
[31]
Hamdia K M, Lahmer T, Nguyen-Thoi T, Rabczuk T. Predicting the fracture toughness of PNCs: A stochastic approach based on ANN and ANFIS. Computational Materials Science, 2015, 102: 304–313
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature