1 Introduction
In engineering practice, most structural components contain material defects caused by local damage during manufacturing, operation, and maintenance. Cracks are among the most common defects and can be found on the surfaces of various structural components; they carry important information about the durability and safety of the entire structure. Under different load conditions, these defects can coalesce to form larger cracks, which is one of the main causes of structural collapse [1]. Therefore, detecting and evaluating the growth of cracks is very important and has great significance for repair and maintenance. One of the most common methods of detecting defects is human-based visual inspection, owing to its simplicity and the shortage of alternatives [2]. However, because of the variety and large number of defects, the detection task is not only time-consuming but also costly. In addition, the detection ability and experience of the inspectors significantly affect the quality of the process. Therefore, reliable alternative methods are needed to improve the accuracy and efficiency of damage detection.
To overcome the drawbacks of manual inspection, various methods of detecting and evaluating the propagation of cracks have been proposed and widely studied by many researchers. For example, several advances in the extended finite element formulation combined with the genetic algorithm to detect any type of defect (cracks or holes) of any shape in structures have been reported in Ref. [
3]. Vu-Bac et al. [
4] proposed the node-based smoothed extended finite element method to analyze fracture problems related to two-dimensional (2D) elasticity. A fully-automated method named CrackTree, to detect cracks from images, was studied in Ref. [
5]. Butcher et al. [
6] used random neural architectures, a non-invasive technique, to detect defects in reinforced concrete. Yagi et al. [
7] used the tetrahedral finite element model to evaluate the crack propagation behaviors in a T-shaped tubular joint. The virtual element method was used to analyze crack propagation in 2D [
8]. Wang et al. [
9] used the effective notch stress method to evaluate the crack propagation and fatigue strength of rib-deck welds. Several other approaches have been employed such as Refs. [
10–
16]. The aforementioned methods have exhibited impressive results; however, their main drawback is their high computational and time cost.
Deep learning (DL) has in recent years gained significant traction in numerous domains for performing complex tasks, including computer vision, speech recognition, natural language processing, and machine translation. It has outperformed traditional methods in applications such as image classification [
17–
20], object detection [
21–
23], image segmentation [
24,
25], and time series forecasting [
26–
28]. DL is a data-driven approach that uses mathematical functions to map input examples to the corresponding outputs without any manual intervention. The model building process consists of three main steps: choosing a suitable network structure, a loss function for the model to learn, and an optimizer to update the model's parameters. In the field of crack segmentation and prediction of crack propagation, a typical project inspects buildings or bridges using a remotely controlled drone that flies around the structure to take pictures of different surfaces. (This is especially meaningful in places that are difficult for humans to reach.) The images are then processed by a computer to identify potential areas of the structure that could be showing damage and to assess the development of that damage. An accurate model reduces the human effort needed to process these photos, reducing or eliminating the time-consuming, costly, and inefficient requirement for inspectors to check each photo. Crack segmentation and crack propagation evaluation methods based on DL have already been used with outstanding results [
29–
35]. Because a large amount of data is available and hardware performance has increased dramatically, DL-based methods outperform many other approaches. Herein, crack detection and crack propagation prediction using automated DL show high accuracy and high computational efficiency, saving implementation time and cost. Furthermore, a skeleton extraction algorithm can be applied to the segmentation result to measure crack characteristics such as length, width, and area [
36].
DL algorithms search for different patterns and trends, and no single algorithm is best for all data sets or use cases. To find the best solution, candidate models must be trained, their hyper-parameters tuned, and their outputs evaluated so that the best-performing one can be selected. This may not be a problem initially, but with large amounts of data even a single epoch can take a considerable amount of time. Choosing a suitable algorithm is therefore extremely important.
To the best of the authors’ knowledge, few studies have comprehensively explored the issue of cracking, especially cracks in concrete surfaces, with DL. Specifically, in this study, five loss functions commonly used in general segmentation problems (namely, binary cross-entropy (BCE) loss, Dice loss, Tversky loss, focal Tversky loss, and Lovasz-Softmax loss) are investigated for the concrete crack segmentation problem. In addition, the gated recurrent unit (GRU) and long short-term memory (LSTM) are investigated for evaluating crack propagation. Notably, the GRU has been less explored than the LSTM in the literature. Considering the huge potential of DL algorithms, this study aims to propose DL networks to identify cracks in concrete surface images and predict crack propagation in structures. We use the SegNet and U-Net networks to automatically segment cracks in images. Then, GRU and LSTM networks are employed to predict crack propagation on experimental examples.
To sum up, the main contributions of this work are as follows.
1) The frameworks and basic characteristics of SegNet, U-Net, GRU, and LSTM are described. In addition, various loss functions (which are used for imbalanced data and small region of interest (ROI) segmentation) are explored, and the optimizers and model evaluation metrics are presented.
2) The concrete crack segmentation task is investigated using the SegNet and U-Net models (automatic crack segmentation methods based on convolutional neural networks (CNNs)). Both models are explored by experimenting with various loss functions and two different optimizers to evaluate and compare the performance of each model based on the intersection over union (IoU) metric. The model with the best prediction performance is identified and is ready for practical application.
3) Case studies are carried out by experimental investigation to evaluate and demonstrate the effectiveness of the GRU and LSTM models in the task of predicting crack propagation; they are implemented on public crack data sets such as the fatigue analysis of a fuselage panel, an ADB610 steel specimen, and an L-shaped concrete specimen.
The remaining sections of the paper are structured as follows. Section 2 presents the deep neural network architectures for crack segmentation (SegNet and U-Net) and crack propagation evaluation (GRU and LSTM). In addition, the loss functions, optimization algorithms, and model evaluation metrics used in this study are also presented in Section 2. Section 3 provides a brief description of the data sets used, and the performance of the proposed models is demonstrated and compared in detail with experimental results; some discussions regarding our approach are also provided in this section. The conclusions of the paper and suggestions for further research are given in Section 4.
2 Methodology
2.1 Encoder−decoder network
In DL, CNNs are a class of artificial neural network (ANN) most commonly applied to analyze and process visual imagery; they are among the most extensively utilized architectures in the DL community, particularly for computer vision tasks. The CNN was first introduced by Fukushima et al. [
37] in their paper on the “Neocognitron” for a visual pattern recognition mechanism. Subsequently, Waibel et al. [
38] introduced CNN for phoneme recognition. In 1998, LeCun et al. [
39] proposed a CNN architecture for document recognition. Convolutional, pooling, and fully connected (FC) layers are the three types of layers that commonly make up a CNN. Features from the input images are extracted using the convolutional layer (Fig.1). Herein, the convolution operation is carried out between the input image and a filter of a specific size. As the filter moves across the input image, the dot product is computed between the filter and the corresponding patch of the input image. The pooling layer (Fig.2), which often follows the convolutional layers, aims to scale down the size of the feature map in order to reduce the computational cost. Max pooling and average pooling are the two most commonly used kinds of pooling operations. In some models, a convolutional layer with stride > 1 is used instead of a pooling layer to reduce the data size. The FC layer consists of neurons along with the weights and biases used to link neurons between two distinct layers. These layers are frequently positioned before the output layer and make up the final few layers of CNN architectures.
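To make the three layer types concrete, the following minimal sketch (assuming TensorFlow/Keras; the layer sizes are purely illustrative) stacks a convolutional layer, a pooling layer, a strided convolution, and an FC layer:

```python
# Minimal sketch (assumption: TensorFlow/Keras; layer sizes are illustrative only).
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(256, 256, 3))
x = layers.Conv2D(16, kernel_size=3, padding="same", activation="relu")(inputs)  # convolutional layer
x = layers.MaxPooling2D(pool_size=2)(x)                                           # pooling layer: 256x256 -> 128x128
x = layers.Conv2D(32, kernel_size=3, strides=2, padding="same", activation="relu")(x)  # stride > 1 also downsamples
x = layers.Flatten()(x)
outputs = layers.Dense(2, activation="softmax")(x)                                # fully connected (FC) layer
model = tf.keras.Model(inputs, outputs)
model.summary()
```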
Encoder−decoder architectures [40] are commonly used in segmentation models based on DL. Such models learn how to map data from the input domain to the output domain through a two-stage (encoder and decoder) network, as shown in Fig.3. Convolutional layers and pooling layers are used at the encoder stage to downsample the input image. A latent space representation $z$ is created by applying an encoding function $z = f(x)$, which compresses the input $x$. In contrast, the decoder stage consists of convolutional layers and unpooling layers (or transposed convolution layers) to upsample the feature maps. It implements a decoding function $y = g(z)$ to predict the output $y$ from the latent space representation $z$. The latent space representation $z$ captures the semantic information of the input, which is helpful for predicting the outcome. The transposed convolution (Fig.1) and unpooling (Fig.2) operations are the reverse of the convolution and pooling operations, respectively.
2.1.1 SegNet architecture
SegNet is an image segmentation network with a fully convolutional encoder−decoder architecture that was introduced by Badrinarayanan et al. [
41]. As depicted in Fig.4, the SegNet architecture is composed of an encoder network, a corresponding decoder network, and finally a pixel-wise classifier. The 13 convolutional layers of the encoder network’s architecture are exactly the same as the 13 convolutional layers in the VGG16 network [
42], which was created for image classification. The removal of the FC layers, done to maintain higher-resolution feature maps at the encoder output, also significantly reduces the number of parameters in the encoder network. There is a corresponding decoder layer for each encoder layer, resulting in a total of 13 convolutional layers in the decoder network. A multiclass Softmax classifier receives the output of the final decoder and generates class probabilities for each individual pixel. In addition, the decoder conducts nonlinear upsampling using the pooling indices computed in the corresponding encoder's max-pooling step. The decoder network's role is to map the low-resolution encoder feature maps back to full-input-resolution feature maps so that pixel-wise classification may be performed.
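The following simplified sketch (TensorFlow/Keras assumed) illustrates the SegNet-style pattern of Conv–BatchNorm–ReLU encoder and decoder blocks; note that the original SegNet upsamples with the stored max-pooling indices, which this sketch approximates with plain UpSampling2D for brevity:

```python
# Simplified sketch of a SegNet-style encoder/decoder (assumption: TensorFlow/Keras).
# The true SegNet reuses the encoder's max-pooling indices for upsampling; here that
# step is approximated with UpSampling2D to keep the example short.
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D(2)(x)               # halves spatial resolution

def decoder_block(x, filters):
    x = layers.UpSampling2D(2)(x)                  # stand-in for index-based unpooling
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = tf.keras.Input(shape=(256, 256, 3))
x = encoder_block(inputs, 64)
x = encoder_block(x, 128)
x = decoder_block(x, 128)
x = decoder_block(x, 64)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # pixel-wise classifier (binary crack mask)
segnet_like = tf.keras.Model(inputs, outputs)
```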
2.1.2 U-Net architecture
U-Net is a U-shaped encoder−decoder architecture proposed by Ronneberger et al. [
43] for segmenting biomedical images, as illustrated in Fig.5. It is designed to learn efficiently from relatively few training samples and is one of the methods most frequently employed in semantic segmentation tasks. The U-Net architecture is composed of an encoder network (contracting path) to collect context and a symmetric decoder network (expansive path) to enable exact localization; the two paths are connected via a bridge. The contracting path follows the typical architecture of a CNN. It performs the role of a feature extractor by using a sequence of encoder blocks to learn an abstract representation of the input image. A semantic segmentation mask is produced from this abstract representation by the expansive path. Skip connections join the expansive path with the matching feature maps from the encoder blocks. Because of the depth of the network, these skip connections provide additional features from previous layers that might otherwise be lost during learning. A 1 × 1 convolution with a sigmoid activation function is employed to process the last output of the expansive path; the sigmoid activation function produces a segmentation mask that represents the pixel-wise classification.
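A minimal U-Net-style sketch with two resolution levels is given below (TensorFlow/Keras assumed; the depth and channel widths are illustrative, not those of the original U-Net):

```python
# Minimal U-Net-style sketch with skip connections (assumption: TensorFlow/Keras).
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = tf.keras.Input(shape=(256, 256, 3))
# Contracting path (encoder)
c1 = conv_block(inputs, 64);  p1 = layers.MaxPooling2D(2)(c1)
c2 = conv_block(p1, 128);     p2 = layers.MaxPooling2D(2)(c2)
# Bridge
b = conv_block(p2, 256)
# Expansive path (decoder) with skip connections to the matching encoder features
u2 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(b)
u2 = conv_block(layers.Concatenate()([u2, c2]), 128)
u1 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(u2)
u1 = conv_block(layers.Concatenate()([u1, c1]), 64)
# 1x1 convolution with sigmoid -> pixel-wise segmentation mask
outputs = layers.Conv2D(1, 1, activation="sigmoid")(u1)
unet_like = tf.keras.Model(inputs, outputs)
```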
2.2 Recurrent neural network (RNN)
The RNNs [
44] are a sort of ANN that is well-suited to processing time-series data and other sequential data. The current output of RNN is based on the previous information in the sequence. Here we briefly introduce RNN as a feedforward network extension that can handle variable-length sequences, as well as some of the most common recurrent architectures in use, namely LSTM [
45] and GRU [
46].
The basic architecture of the RNN unit is shown in Fig.6. Its input includes the output of the previous step ($h_{t-1}$) and the current input ($x_t$), which are passed through a tanh activation function, with no gates present:
$$ h_t = \tanh\left(W_x x_t + W_h h_{t-1} + b\right), $$
where all weights ($W_x$, $W_h$) and biases ($b$) are parameters to be learned, and tanh is the hyperbolic tangent activation function.
A major drawback of the simple RNN is that, because of the vanishing gradient problem, it is unable to learn long-term relationships from the data. The vanishing gradient affects the RNN more than other neural network architectures because the gradient must be propagated through many time steps. To address this issue, the LSTM and GRU architectures were developed.
2.2.1 Long short-term memory
The LSTM was created to address the issue of vanishing gradient and since then it has become one of the most widely used RNN architectures. The workflow of LSTM is similar to that of RNN, with the exception of the operations performed within the LSTM unit. Various variants of LSTM exist [
47], but the most common variant of LSTM is shown in Fig.7 and described by:
$$ f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right), $$
$$ i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right), $$
$$ o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right), $$
$$ \tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right), $$
$$ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, $$
$$ h_t = o_t \odot \tanh\left(c_t\right), $$
where σ and tanh are the logistic sigmoid function and the hyperbolic tangent function, respectively, and ⊙ denotes point-wise multiplication of two vectors.
2.2.2 Gated recurrent unit
The GRU was suggested as a simpler alternative to the LSTM (as it employs one fewer gate and does not distinguish between hidden states and memory cells) and has since gained popularity. It works by the same mechanism as the LSTM to alleviate the vanishing gradient problem. The most popular variant of GRU is shown in Fig.8 and defined as follows:
$$ z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right), $$
$$ r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right), $$
$$ \tilde{h}_t = \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right), $$
$$ h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t, $$
where $z_t$ and $r_t$ are the update gate and the reset gate, respectively.
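As a quick illustration of the difference in complexity, the following snippet (TensorFlow/Keras assumed) counts the trainable parameters of a GRU layer and an LSTM layer of the same width; the GRU is smaller because it uses one fewer gate:

```python
# Quick check (assumption: TensorFlow/Keras) that a GRU layer has fewer trainable
# parameters than an LSTM layer of the same width.
import tensorflow as tf

inputs = tf.keras.Input(shape=(30, 1))          # 30 timesteps, 1 feature
lstm = tf.keras.Model(inputs, tf.keras.layers.LSTM(64)(inputs))
gru = tf.keras.Model(inputs, tf.keras.layers.GRU(64)(inputs))
print("LSTM parameters:", lstm.count_params())  # input, forget, output gates + cell candidate
print("GRU parameters:", gru.count_params())    # update, reset gates + candidate state
```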
2.3 Loss functions
One of the most important components of DL-based image segmentation is the loss function, sometimes referred to as the objective function; its purpose is to evaluate how well the predicted segmentation compares with the ground truth. Choosing the correct loss function is crucial when designing a DL architecture, because it drives the learning process of the algorithm. To learn and optimize an objective function, DL algorithms use a stochastic gradient descent (SGD) method. If a model is to learn accurately and quickly, the loss function must have a mathematical form that also covers edge cases. These loss functions are inherited from traditional machine learning, where they measure the disparity between the predicted and actual labels of a model. Categorical cross-entropy, for instance, is derived from the multinoulli distribution, while BCE is derived from the Bernoulli distribution. In our study, we emphasize semantic segmentation rather than instance segmentation, so the number of classes at the pixel level is limited to two. Herein, we will look at five loss functions that are widely used in semantic segmentation.
2.3.1 Binary cross-entropy loss
Cross-entropy [
48] measures the difference between two probability distributions for a given random variable or set of events. It is frequently employed in classification tasks and also works well for segmentation, because segmentation is classification at the pixel level.
In this problem, each pixel belongs to one of two classes, either white or black (1 or 0), so the BCE loss is used. It can be described as follows:
$$ L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log \hat{y}_i + \left(1 - y_i\right) \log\left(1 - \hat{y}_i\right) \right], $$
where $\hat{y}_i$ is the predicted value, $y_i$ is the corresponding target value, and $N$ is the number of pixels.
2.3.2 Dice loss
Among segmentation evaluation metrics, the Dice coefficient is frequently employed in the field of computer vision. Dice loss [
49] directly optimizes the Dice coefficient with some adaptations; it can be defined as:
$$ L_{\mathrm{Dice}} = 1 - \frac{2\sum_{i} y_i \hat{y}_i + \varepsilon}{\sum_{i} y_i + \sum_{i} \hat{y}_i + \varepsilon}, $$
where ε is added to the numerator and denominator to ensure that the function is defined even when both $y$ and $\hat{y}$ are zero.
2.3.3 Tversky loss
The Tversky index can alternatively be viewed as a generalization of the Dice coefficient. Tversky loss [
50] reshapes the Dice loss and emphasizes false negatives (FN) to achieve a better trade-off between precision and recall. Similarly to the Dice loss, the Tversky loss can be defined as follows:
$$ L_{\mathrm{Tversky}} = 1 - \frac{\mathrm{TP} + \varepsilon}{\mathrm{TP} + \beta\,\mathrm{FN} + (1 - \beta)\,\mathrm{FP} + \varepsilon}, $$
where TP denotes the (soft) pixel-level count of true positives, and β is a hyper-parameter that controls the balance between FN and false positives (FP).
2.3.4 Focal Tversky loss
Focal Tversky loss [
51] uses the concept of focal loss to attempt to learn hard examples that occur with low probability, such as highly imbalanced data and small ROIs, through the use of the γ coefficient. It can be defined as:
$$ L_{\mathrm{FT}} = \left(1 - \mathrm{TI}\right)^{1/\gamma}, $$
where TI is the Tversky index defined above and γ varies in the range [1, 3].
2.3.5 Lovasz-Softmax loss
Lovasz-Softmax loss [
52] is a loss function for multi-class semantic segmentation that combines the Lovasz extension of the Jaccard loss with the Softmax operation. In the binary case, the Lovasz hinge applied to the Jaccard loss is employed; it can be defined as:
$$ L_{\mathrm{Lovasz}} = \overline{\Delta_{J}}\big(m(F)\big), $$
where $\overline{\Delta_{J}}$ denotes the Lovasz extension of the Jaccard loss and $m(F)$ is the vector of pixel-wise hinge errors,
with $m_i = \max\left(1 - F_i\, y_i^{*},\, 0\right)$, where $F_i$ is the predicted score and $y_i^{*} \in \{-1, +1\}$ is the ground-truth label of pixel $i$.
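The region-based losses above can be sketched as follows (TensorFlow/Keras assumed; y_true and y_pred are binary/probability masks, and the β, γ, and ε values are illustrative defaults, not those used in the experiments). BCE is available as a built-in Keras loss, and the Lovasz-Softmax loss is omitted here for brevity:

```python
# Hedged sketch of the region-based losses (assumption: TensorFlow/Keras, binary masks in [0, 1]).
import tensorflow as tf

EPS = 1e-6  # epsilon keeps the ratios defined when both masks are empty

def dice_loss(y_true, y_pred):
    y_true = tf.reshape(y_true, [-1]); y_pred = tf.reshape(y_pred, [-1])
    inter = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + EPS) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + EPS)

def tversky_index(y_true, y_pred, beta=0.7):
    y_true = tf.reshape(y_true, [-1]); y_pred = tf.reshape(y_pred, [-1])
    tp = tf.reduce_sum(y_true * y_pred)
    fn = tf.reduce_sum(y_true * (1.0 - y_pred))
    fp = tf.reduce_sum((1.0 - y_true) * y_pred)
    return (tp + EPS) / (tp + beta * fn + (1.0 - beta) * fp + EPS)

def tversky_loss(y_true, y_pred, beta=0.7):
    return 1.0 - tversky_index(y_true, y_pred, beta)

def focal_tversky_loss(y_true, y_pred, beta=0.7, gamma=1.33):
    # (1 - TI)^(1/gamma), following the focal Tversky definition above
    return tf.pow(1.0 - tversky_index(y_true, y_pred, beta), 1.0 / gamma)
```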
2.4 Optimization algorithms
To minimize the loss function in DL networks when mapping input data to output data, an optimization algorithm is used to update the parameters (weights) for each new iteration. The optimization algorithms have a great influence on the accuracy as well as the training speed of DL models. This increases the need to choose a suitable optimization algorithm for each specific task. In this study, we will go over two optimizers that are widely used in DL models, namely SGD and adaptive moment estimation (Adam).
2.4.1 Stochastic gradient descent optimizer
SGD [53] is an iterative technique for optimizing a loss function. Essentially, because SGD uses an estimate of the gradient instead of the true gradient, it is a stochastic approximation of gradient descent optimization. To train a DL network with SGD, the loss function is first used to compute the gradient estimate; the parameters Γ are then adjusted at the kth iteration. For each minibatch of n examples $\{x^{(1)}, \ldots, x^{(n)}\}$ drawn from the training set, with corresponding targets $y^{(i)}$, the update can be computed as:
$$ \hat{g}_k = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\Gamma} L\big(f(x^{(i)}; \Gamma_k),\, y^{(i)}\big), $$
$$ \Gamma_{k+1} = \Gamma_k - \eta_k\, \hat{g}_k, $$
where $\eta_k$ is the learning rate at the kth iteration.
The learning rate is a very important hyperparameter; it determines the magnitude of the parameter updates. In practice, it is necessary to gradually lower the learning rate throughout the training process.
2.4.2 Adaptive moment estimation optimizer
Adam [
54] is an extension of SGD and is one of the most frequently utilized optimization algorithms in DL. It combines the advantages of the RMSprop and AdaGrad algorithms to handle sparse gradients in noisy problems. From estimates of both the first-order and second-order moments of the gradients, the Adam algorithm computes an individual adaptive learning rate for each parameter. The update of the weights can be represented as follows:
$$ m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, $$
where $g_t$ is the gradient vector at timestep $t$; $\beta_1$ and $\beta_2$ are the exponential decay rates; $m_t$ and $v_t$ are estimates of the first biased moment (mean) and the second biased moment (uncentered variance) of the gradients at timestep $t$, respectively.
$$ \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, $$
where $\hat{m}_t$ and $\hat{v}_t$ are estimates of the first bias-corrected moment (mean) and the second bias-corrected moment (uncentered variance) of the gradients at timestep $t$, respectively.
$$ \Gamma_{t+1} = \Gamma_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, $$
where η is the learning rate and $\Gamma_t$ denotes the model parameters at timestep $t$.
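A single Adam update step, written out in plain NumPy, illustrates the three equations above (the gradient and parameter values are illustrative; the hyper-parameter values follow the defaults quoted later in Section 3.2):

```python
# Worked single Adam step in plain NumPy (a sketch; gradient and parameter values are illustrative).
import numpy as np

eta, beta1, beta2, eps = 1e-4, 0.9, 0.999, 1e-8
theta = np.array([0.5, -0.3])          # current parameters
m = np.zeros_like(theta)               # first-moment estimate
v = np.zeros_like(theta)               # second-moment estimate

g = np.array([0.2, -0.1])              # gradient at timestep t (illustrative)
t = 1
m = beta1 * m + (1 - beta1) * g        # biased first moment
v = beta2 * v + (1 - beta2) * g**2     # biased second moment
m_hat = m / (1 - beta1**t)             # bias correction
v_hat = v / (1 - beta2**t)
theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
print(theta)
```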
2.5 Evaluation metrics
2.5.1 Dice and intersection over union metrics
To be able to detect a crack, we need to distinguish it from the background. Since this is a binary classification problem, all possible cases can be classified into four categories as shown in Tab.1. Here, true positive (TP) indicates the number of pixels at which a true crack is actually identified as a crack, FP is the number of pixels at which a true non-crack is identified as a crack. In a similar way, FN indicates the number of pixels where a true crack is identified as a non-crack, and true negative (TN) is the number of pixels where true non-crack is correctly identified as a non-crack.
Intuitively, a successful prediction is one that maximizes the overlap between the predicted object and its corresponding actual object. In the problem of semantic segmentation, two related but different metrics, Dice and IoU (or Jaccard) [
55], are commonly used for the evaluation of segmentation tasks. They can be defined as:
$$ \mathrm{Dice} = \frac{2\,\|A \cap B\|}{\|A\| + \|B\|}, \qquad \mathrm{IoU} = \frac{\|A \cap B\|}{\|A \cup B\|}, $$
where A and B are the target and prediction masks for a given class, and ‖·‖ denotes the norm (here, the number of pixels in a mask).
Both Dice and IoU metrics are bounded between zero (when two masks have no overlap) and one (when two masks completely overlap). For model comparison, only one of them is enough. The IoU metric has a simple and intuitive expression, so it is more widely used.
Dice and IoU metrics can be rephrased in terms of TP, FP, and FN:
$$ \mathrm{Dice} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}, \qquad \mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. $$
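A short sketch (NumPy assumed) computing both metrics from binary masks via TP, FP, and FN:

```python
# Dice and IoU computed from binary masks via TP/FP/FN (NumPy assumed; masks are illustrative).
import numpy as np

def dice_iou(y_true, y_pred):
    y_true = y_true.astype(bool); y_pred = y_pred.astype(bool)
    tp = np.logical_and(y_true, y_pred).sum()
    fp = np.logical_and(~y_true, y_pred).sum()
    fn = np.logical_and(y_true, ~y_pred).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

# Example: two 2x2 masks overlapping on one pixel, with three pixels in their union.
print(dice_iou(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [1, 0]])))  # (0.5, ~0.333)
```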
2.5.2 Mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE) metrics
Regression models deal with continuous data, and their predictions lie in a continuous range. MAE, MSE, and RMSE [56] are three commonly used metrics for evaluating regression models. Because they are quantities to be minimized, all of them can also serve as loss functions.
The MAE is the mean of the absolute difference between the actual values and the predicted values. It can be given by:
$$ \mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|, $$
where m is the total number of data points, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
Because it is the mean error, the MAE is the easiest to understand. However, since it is an absolute value function, the direction of the error is not indicated.
The mean of the squared difference between the actual values and the predicted values is known as the MSE. It can be given by:
$$ \mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2, $$
where m is the total number of data points, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
The MSE is quite similar to the MAE. However, the MSE penalizes larger errors much more than smaller errors, so the MSE is more commonly used than the MAE.
The square root of the mean of the squared differences between the actual values and the predicted values is known as the RMSE. It can be given by:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}, $$
where m denotes the total number of data points, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
The RMSE can be interpreted in the units of y, so it is even more prevalent than the MSE. The RMSE is also a good tool for estimating the standard deviation of the error distribution.
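A quick numerical check of the three metrics (NumPy assumed; the data are illustrative):

```python
# Quick numerical check of MAE, MSE, and RMSE (NumPy assumed; values are illustrative).
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.3, 3.6])

mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
print(mae, mse, rmse)   # RMSE is back in the units of y, unlike MSE
```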
3 Results and discussion
3.1 Data set
For crack segmentation, we experiment with 100 concrete crack images and their corresponding masks, taken from the open-source data set in Ref. [57]. This data set consists of 537 color images of cracks (asphalt and concrete are the two major scenes) with their corresponding masks. All images have been fixed at a size of 544 × 384 pixels, and each image presents one or more cracks. Some representative images of concrete cracks and their corresponding masks are shown in Fig.9.
A schematic of crack segmentation using SegNet/U-Net consists of the following steps: preparing a crack data set, training the model, and testing the trained model, as shown in Fig.10. For training purposes, all images are re-sampled to 256 × 256 pixels with a 70-30 train-validation split (70 images for the training set and 30 for the validation set). The size of each image is reduced before being fed into the network, which greatly reduces the number of training parameters and the complexity of the network, leading to a significant reduction in training time without much impact on the results.
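A sketch of this preparation step is given below (assumptions: TensorFlow/Keras; the folder names and PNG format are hypothetical placeholders for the data set of Ref. [57]):

```python
# Sketch of data preparation (assumptions: TensorFlow/Keras; "images/" and "masks/" are
# hypothetical folders; the 70-30 split follows the text).
import glob
import numpy as np
import tensorflow as tf

def load_pair(img_path, mask_path, size=(256, 256)):
    img = tf.image.resize(tf.io.decode_png(tf.io.read_file(img_path), channels=3), size) / 255.0
    mask = tf.image.resize(tf.io.decode_png(tf.io.read_file(mask_path), channels=1), size) / 255.0
    return img.numpy(), (mask.numpy() > 0.5).astype(np.float32)

img_files = sorted(glob.glob("images/*.png"))
mask_files = sorted(glob.glob("masks/*.png"))
pairs = [load_pair(i, m) for i, m in zip(img_files, mask_files)]
x = np.stack([p[0] for p in pairs]); y = np.stack([p[1] for p in pairs])

n_train = int(0.7 * len(x))            # 70-30 train-validation split
x_train, y_train = x[:n_train], y[:n_train]
x_val, y_val = x[n_train:], y[n_train:]
```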
For the evaluation of crack propagation, several examples are designed to illustrate and validate the capabilities of the GRU and LSTM models. All data sets are obtained from numerical results and experimental data (such as a fuselage panel [
32], L-shaped concrete specimen [
58], ADB610 steel sample [
59]) that have been published in previous studies.
Fig.11 provides a schematic of the process of prediction of crack propagation using the GRU/LSTM, and shows the steps of preparing a crack data set, training the model, and evaluating the learned model. The data set in each example is split into three sets, namely training set, validation set, and testing set. The parameters of the model are fitted using the training set. The validation set is used to adjust the model’s hyperparameters, which helps the model perform better during training. The testing set, which is distinct from both the training set and the validation set, is solely used to evaluate the model’s final performance once it has been trained. When it comes to time-series data, the data are not randomly split into independent samples. For each point in time, historical data are used as features, while future data cannot be used. Hence, previous data are utilized to train the model and the later data on the testing set are utilized to evaluate the model.
3.2 Results on crack segmentation
In this subsection, we present the crack segmentation results from trained SegNet and U-Net networks and then discuss the outcome obtained. In each network, we employ the SGD and Adam optimizers with various loss functions in turn to train the models. The SGD optimizer is employed with a fixed learning rate of 0.01 and momentum of 0.9. The Adam optimizer is used with an initial learning rate of 0.0001; the default exponential decay rates for the first and second moment estimates are
β1 = 0.9,
β2 = 0.999, respectively, and
ϵ = 10^{−8} is a small default value that maintains numerical stability. All models are trained for 200 epochs. After each convolutional layer, batch normalization [60] is employed, which is a technique for speeding up the learning process of the network by normalizing the outputs of the preceding layer. In addition, all networks employ the ReLU activation function [
61], which is critical in allowing the neural network model to learn complicated nonlinear relationships.
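The optimizer settings above can be sketched as follows (TensorFlow/Keras assumed; unet_like and focal_tversky_loss refer to the illustrative sketches in Sections 2.1.2 and 2.3, and the batch size is an assumption, as it is not specified here):

```python
# Sketch of the two optimizer configurations (TensorFlow/Keras assumed; reuses the earlier
# illustrative names unet_like, focal_tversky_loss, x_train, y_train, x_val, y_val).
import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
adam = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# IoU on the validation set is evaluated separately (e.g., with dice_iou from Section 2.5).
unet_like.compile(optimizer=adam, loss=focal_tversky_loss, metrics=["accuracy"])
history = unet_like.fit(x_train, y_train, validation_data=(x_val, y_val),
                        batch_size=4, epochs=200)   # batch size is an assumption
```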
3.2.1 Using SegNet
Fig.12 illustrates the change in loss values (on the training set) and IoU values (on the validation set) with respect to the epochs during the training process of the SegNet network that uses the SGD optimizer. It can be observed that for all five models, the training losses rapidly decrease at the beginning and slowly at the end; meanwhile, the IoU values gradually increase and reach around 76% after 200 epochs. In addition, the IoU curves demonstrate that the use of the dice loss leads to the best performance; in contrast, the BCE loss shows the lowest performance.
The loss curves on the training set, as well as the IoU curves on the validation set in relation to the epochs of the networks using the Adam optimizer, are depicted in Fig.13. It can be seen that the training losses first decrease rapidly and begin to converge after about 75 epochs; meanwhile, the IoU values increase quickly and converge to about 77% after the 75th epoch for all five models. Also, the IoU curves reveal that the performances of all models are similar.
The performance of the concrete crack segmentation models based on the IoU value is summarized in Tab.2. The results show that the model using the SGD optimizer and the Dice loss achieves the highest IoU (78.11%), while the model using the SGD optimizer and the BCE loss shows a significantly lower IoU than the other models, at only 72.40%.
3.2.2 Using U-Net
The qualitative results of the U-Net model, which employs the Adam optimizer and the focal Tversky loss function, are depicted in Fig.14. Herein, the source image, the ground truth, and the outputs of several samples from the training set at various epochs during training are shown. It is clear from Fig.14 that the model is learning properly; the output at the 200th epoch is so close to the ground truth that the human eye can hardly distinguish between them.
In Fig.15, the loss curves on the training set and the IoU curves on the validation set against the epochs of the networks using the SGD optimizer are shown. It is clear from Fig.15 that the training losses reduce rapidly at first and begin to converge after around 75 epochs; meanwhile, the IoU values climb rapidly and converge to roughly 79% for all five models after the 75th epoch. Furthermore, the IoU curves show that the performance of all models is similar.
Fig.16 illustrates the change of the loss values (on the training set) and the IoU values (on the validation set) with respect to the epochs during the training process of the U-Net network using the Adam optimizer. It can be seen from Fig.16 that for all five models, the training loss gradually decreases, but there is still no clear sign of convergence after 200 epochs; meanwhile, the IoU values gradually increase and start to converge after about 75 epochs, eventually reaching an IoU value of about 81% at the end of the training process. Moreover, the IoU curves revealed that the performances of all models are relatively similar.
The obtained metrics from the trained models on the validation set are reported in Tab.3. It can be observed from Tab.3 that all the considered models achieve high IoU, that is 78% or more; the model using the Adam optimizer and focal Tversky loss outperforms the others, with an IoU value of 81.17%.
According to the aforementioned results, the U-Net model with Adam optimizer and focal Tversky loss performs the best in concrete crack segmentation. Some of the crack samples on the testing set with detections obtained by the U-Net model, which employs the Adam optimizer and the focal Tversky loss, are depicted in Fig.17. The raw images, the predicted images, and the ground truth are displayed in order from top to bottom in Fig.17. The segmented cracks are close to the ground truth, as may be observed. This demonstrates the efficacy of the crack segmentation approach used in this study.
3.3 Results for crack propagation
In this subsection, we present the results for crack propagation and discuss the outcome. Herein, three crack examples are looked at to verify the performance of the LSTM and GRU networks. All data sets used are obtained from the Paris model [
62] or experiments in previous studies. The proportion of the data set used in the training process is 84% (for the first two examples) and 62% (for the last example), and the rest of the data set is used as the testing set. In the training process of each example, the training data are further divided into two subsets: the training set accounts for 80% and the validation set for 20%. For training purposes, the authors choose a sequence length of 30 timesteps. Input-output pairs are created in which each input consists of the previous 30 timesteps and the corresponding output is the next timestep.
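A sketch of this sliding-window preparation (NumPy assumed; the series values are illustrative, not data from the cited references):

```python
# Sliding-window preparation of input-output pairs (NumPy assumed; "series" stands for
# the 1-D crack-growth record of one example and is illustrative).
import numpy as np

def make_windows(series, seq_len=30):
    """Each input is the previous `seq_len` values; the output is the next value."""
    x, y = [], []
    for i in range(len(series) - seq_len):
        x.append(series[i:i + seq_len])
        y.append(series[i + seq_len])
    return np.array(x)[..., np.newaxis], np.array(y)   # shapes (samples, 30, 1) and (samples,)

series = np.linspace(0.02, 0.25, 200)      # illustrative monotone crack-length history
x_all, y_all = make_windows(series)
n_train = int(0.84 * len(x_all))           # 84% for training (first two examples)
x_train, y_train = x_all[:n_train], y_all[:n_train]
x_test, y_test = x_all[n_train:], y_all[n_train:]
```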
To demonstrate the robustness and reliability of the approach, the architecture of the GRU and LSTM networks with four GRU/LSTM layers is utilized in all cases, as detailed in Fig.18. The Adam optimizer is employed with default parameters and the
MSE loss function to train the models. In addition to the tanh activation function and the sigmoid activation function (the sigmoid is used for the recurrent step), the “dropout” technique [
63] with a dropout rate of 0.2 in each GRU/LSTM layer is also utilized to avoid the overfitting problem. Training is performed for 100 epochs for all models.
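A minimal sketch of this four-layer recurrent model (TensorFlow/Keras assumed; the number of units per layer is an assumption, since it is not restated here, and the Keras dropout argument is used as a stand-in for the dropout technique of Ref. [63]):

```python
# Minimal sketch of the four-layer GRU/LSTM network (TensorFlow/Keras assumed;
# units per layer are an assumption; x_train/y_train come from the windowing sketch above).
import tensorflow as tf
from tensorflow.keras import layers

def build_rnn(cell=layers.GRU, units=64, seq_len=30):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(seq_len, 1)))
    for i in range(4):                                  # four GRU/LSTM layers
        model.add(cell(units,
                       activation="tanh",               # tanh activation
                       recurrent_activation="sigmoid",  # sigmoid for the recurrent step
                       dropout=0.2,                     # dropout rate of 0.2 per layer
                       return_sequences=(i < 3)))       # last layer returns a vector
    model.add(layers.Dense(1))                          # next-timestep prediction
    model.compile(optimizer="adam", loss="mse")         # Adam with default parameters, MSE loss
    return model

gru_model = build_rnn(layers.GRU)
lstm_model = build_rnn(layers.LSTM)
gru_model.fit(x_train, y_train, validation_split=0.2, epochs=100, verbose=0)
```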
3.3.1 Example 1: A fuselage panel’s fatigue analysis
An aluminum alloy fuselage panel [
32] is illustrated in Fig.19. The panel has a 0.02 m long center crack in the wall, an applied pressure of 0.06 MPa, a thickness of 0.00248 m, and a radius of 3.25 m. Young's modulus E = 70 GPa and Poisson's ratio ν = 0.33 are the material characteristics of the aluminum alloy. Under constant-amplitude mode I loading, this problem is treated as an infinite plate with a through-the-thickness center crack. The constant C = 1.5 × 10^{−10} m/cycle and the exponent m = 3.8 of the Paris law [62] are provided.
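For orientation, a hedged sketch of how a Paris-law crack-growth history for such a center-cracked panel could be generated is given below (the stress range and cycle increment are illustrative assumptions, not values from Ref. [32]; only C and m are taken from the text):

```python
# Hedged sketch of Paris-law crack growth for a center-cracked plate
# (assumptions: delta_sigma and dN are illustrative; Delta K = delta_sigma * sqrt(pi * a)).
import numpy as np

C, m_exp = 1.5e-10, 3.8         # Paris-law constants given in the text
a = 0.01                        # half of the 0.02 m center crack
delta_sigma = 80.0              # MPa, assumed stress range for illustration only
dN = 100                        # cycle increment per integration step

history = []
for cycle in range(0, 200000, dN):
    delta_K = delta_sigma * np.sqrt(np.pi * a)      # stress intensity range, MPa*sqrt(m)
    a += C * delta_K**m_exp * dN                    # da/dN = C * (Delta K)^m
    history.append((cycle + dN, a))
```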
Fig.20 depicts the convergence history of the loss functions on the training and validation sets that are obtained during the training of the GRU and LSTM networks. It is clear from Fig.20 that in only a few epochs, the loss values rapidly fall and converge to zero. This demonstrates that the networks are learning properly.
Fig.21 describes the crack growth trend predicted by the GRU and LSTM networks on the testing set, compared with the actual data. It can be seen that the prediction results obtained from the networks agree well with the actual data and that the GRU network outperforms the LSTM network. The evaluation metrics obtained from the trained networks on the testing set are summarized in Tab.4. The outcomes demonstrate that the coefficient of determination (R2) value obtained from the GRU network is larger than that of the LSTM network, with values of 0.9858 and 0.9435, respectively.
3.3.2 Example 2: Fatigue life prediction
In this example, experimental data are used to forecast the growth of the fatigue crack in the ADB610 steel sample [59], which confirms the applicability of the GRU/LSTM networks. The specifications of this material, tested at a stress ratio of 0.3, are shown in Tab.5.
The convergence history of the loss functions on the training set and validation set produced during the training of GRU and LSTM networks is shown in Fig.22. The loss values rapidly reduce and converge to zero after only a few epochs, as seen in Fig.22. This shows that the networks are appropriately learning.
The crack growth trend predicted by the GRU and LSTM networks on the testing set is depicted in Fig.23 and compared to the actual data. It can be seen that the network prediction results are consistent with the experimental data, with the GRU network outperforming the LSTM network. Tab.6 summarizes the evaluation metrics acquired from the trained networks on the testing set. The R2 value obtained from the GRU network is larger than that obtained from the LSTM network, with values of 0.9588 and 0.9221, respectively.
3.3.3 Example 3: L-shape concrete specimen damage
As shown in Examples 1 and 2, both GRU and LSTM give good results (with
R2 > 0.92) for crack length prediction. However, the relationship between length and cycle is a monotonic function. To increase the complexity of the task, the authors further investigate Example 3 where the load−displacement curve is predicted instead of crack length to further explore the potential of both GRU and LSTM. This example uses experimental data from Ref. [
58], which depicts the failure behavior of a basic L-shaped concrete sample. Fig.24 shows the geometry of a specimen with a thickness of 100 mm. The Young’s modulus and the Poisson’s ratio of this concrete material are
E = 25.85 GPa and
ν = 0.18, respectively.
The convergence history of the loss functions on the training and validation sets, during the training of the GRU and LSTM networks, is shown in Fig.25. It can be observed that the training losses initially decrease quickly and begin to converge after only about the first 10 epochs. This shows that the neural networks are correctly learning.
On the testing set, the crack growth trends predicted by the GRU and LSTM networks are compared with the actual data in Fig.26. It is clear that there is good agreement between the values predicted by the networks and the actual values, and the GRU network again outperforms the LSTM network in terms of prediction outcomes. The errors and R2 obtained from the trained networks on the testing set are presented in Tab.7. The R2 values obtained from the two networks do not differ much; the R2 value obtained by the GRU network is larger than that of the LSTM network, with values of 0.9823 and 0.9247, respectively.
4 Conclusions
Pixel-level concrete crack detection and crack propagation prediction have great significance in structural engineering. In this study, DL networks are used to implement two main tasks, namely:
1) For pixel-level concrete crack detection: Unlike object detection tasks that predict bounding boxes, segmentation identifies crack regions at the pixel level, which is much more informative. Our work proposes DL models based on the SegNet and U-Net networks to build a semantic segmentation model of concrete cracks automatically. The performance of the SegNet and U-Net models, using different optimizers and loss functions, is cross-compared. The experimental findings indicate that the U-Net model using the Adam optimizer and the focal Tversky loss, with an IoU of 81.17%, outperforms the other models.
2) For the task of evaluating crack propagation: A DL approach based on time-series forecasting is used to solve the crack propagation problem; here, GRU and LSTM networks are employed. The crack propagation of several materials is studied through experimental examples. To demonstrate the capability of the DL algorithms, GRU and LSTM networks with the same network architecture are used in all experimental examples. The prediction results of both the GRU and the LSTM are consistent with the experimental data, and the GRU outperforms the LSTM in all experimental cases.
The networks are verified on the input data sets; they can therefore be applied to other materials and databases by changing the input to the network. In future studies, beyond the scope of this work, a larger data set will be built to increase the robustness of the proposed method, and further comparative studies will be carried out. To simultaneously identify cracks and predict crack propagation, the combination of both networks (U-Net and GRU) into an end-to-end network will be studied. In addition, the Bayesian optimization method will be used to optimize the hyper-parameters of the models. Finally, CycleGAN [
64] is also of interest to our future work.