Hard-rock tunnel lithology identification using multi-scale dilated convolutional attention network based on tunnel face images

Wenjun ZHANG , Wuqi ZHANG , Gaole ZHANG , Jun HUANG , Minggeng LI , Xiaohui WANG , Fei YE , Xiaoming GUAN

Front. Struct. Civ. Eng. ›› 2023, Vol. 17 ›› Issue (12) : 1796–1812. DOI: 10.1007/s11709-023-0002-1
RESEARCH ARTICLE


Abstract

For real-time classification of rock-masses in hard-rock tunnels, quick determination of the rock lithology on the tunnel face during construction is essential. Motivated by current breakthroughs in artificial intelligence technology in machine vision, a new automatic detection approach for classifying tunnel lithology based on tunnel face images was developed. The method benefits from residual learning for training a deep convolutional neural network (DCNN), and a multi-scale dilated convolutional attention block is proposed. The block with different dilation rates can provide various receptive fields, and thus it can extract multi-scale features. Moreover, the attention mechanism is utilized to select the salient features adaptively and further improve the performance of the model. In this study, an initial image data set made up of photographs of tunnel faces consisting of basalt, granite, siltstone, and tuff was first collected. After classifying and enhancing the training, validation, and testing data sets, a new image data set was generated. A comparison of the experimental findings demonstrated that the suggested approach outperforms previous classifiers in terms of various indicators, including accuracy, precision, recall, F1-score, and computing time. Finally, a visualization analysis was performed to explain the process of the network in the classification of tunnel lithology through feature extraction. Overall, this study demonstrates the potential of using artificial intelligence methods for in situ rock lithology classification utilizing geological images of the tunnel face.

Keywords

hard-rock tunnel face / intelligent lithology identification / multi-scale dilated convolutional attention network / image classification / deep learning

Cite this article

Wenjun ZHANG, Wuqi ZHANG, Gaole ZHANG, Jun HUANG, Minggeng LI, Xiaohui WANG, Fei YE, Xiaoming GUAN. Hard-rock tunnel lithology identification using multi-scale dilated convolutional attention network based on tunnel face images. Front. Struct. Civ. Eng., 2023, 17(12): 1796-1812 DOI:10.1007/s11709-023-0002-1


1 Introduction

Lithology identification is a crucial and fundamental indicator in tunnel engineering, geology, and geotechnical investigation [1,2]. Long tunnel construction involves many unknown geological conditions, and difficult strata with various lithologies, such as granite, siltstone, or tuff, will unavoidably be encountered [3]. During tunnel excavation, construction techniques such as support methods, tunnel boring machine (TBM) operational parameters, and explosive charges ought to be adapted to strata with different lithologies [4–10]. The construction schedule therefore changes continually during development and implementation, and lithology has a decisive influence on underground construction [11]. To ensure safety and productivity in tunnel construction, it is essential to determine the lithology at the construction site promptly and precisely.

The conventional lithology identification methods employed in tunnelling are geological survey, geological analysis, and non-destructive geophysical exploration [3]. Some of these approaches, however, necessitate complex drilling operations at the tunnel face. In tunnels with difficult topography and great depth, surface drilling is challenging to realize, which delays construction and increases expenses [12]. New lithology identification methods are required for highly efficient tunnelling in such circumstances. Deep learning [13] has become the predominant tool in many fields. Nonlinear classifiers created by stacking deep convolutional neural networks (DCNNs) can extract local and abstract features from input signals, minimizing the requirement for feature engineering while increasing classification accuracy. Basic derivative deep learning models, including AlexNet [14], GoogLeNet [15], VGGNet [16] and ResNet [17], have been developed and successfully applied in image classification, detection, and segmentation tasks [18,19]. Artificial intelligence methods have progressively become an active area of research in geotechnical engineering and underground construction owing to their outstanding generalization ability and strong robustness [20–28].

Deep learning has been extensively researched for identification of rock lithology. Patel and Chatterjee [29] constructed a probabilistic neural network with color histogram information as input to categorize limestone rocks. Xu et al. [30] achieved lithology identification from rock microscopic images. Cai et al. [31] developed a method for identifying minerals using Raman spectroscopy and an extensive database of mineral Raman spectra. Cao et al. [32] integrated clustering with lithology identification to characterize reservoirs. Fu et al. [33] employed a combination of fine-tuning methods and convolutional neural networks (CNNs) to identify the lithology in drill core photos. Numerous researchers employ transfer learning, a prevalent deep learning approach for insufficient samples, to classify lithologies [34–36]. A real-time construction schedule analysis system based on lithological prediction utilizing the Markov process was proposed by Ref. [11]. Using Big Data from TBM construction, Liu et al. [3] created a time-related intelligent model for tunnel lithology prediction.

Researchers are increasingly concentrating on the use of computer vision-based artificial intelligence algorithms to extract geological characteristics from images of rock tunnel faces and direct tunnel construction. Examples include the extraction of fracture traces [37–39], categorization of rock structures [40], segmentation of rock fragments [41], segmentation of weak interlayers [42], and segmentation of water inflow [43,44]. Numerous studies have been performed on rock-mass assessment based on artificial intelligence methods. Jalalifar et al. [45] proposed a rock engineering classification system based on neuro-fuzzy methods to predict rock-mass rating. Wang et al. [46] and Zhao et al. [47] developed intelligent classification systems for the surrounding rock of drilled and blasted tunnels based on drilling parameters. Real-time prediction of rock-mass classification based on TBM operation Big Data was accomplished by Refs. [48,49]. Hu et al. [50] gathered 80 samples from earlier research, each containing four criteria for classifying rock-mass quality. The aforementioned studies demonstrate the viability and superiority of employing deep learning for rock-mass feature extraction. However, owing to the poor on-site construction environment, variable illumination conditions, and diverse construction processes, image-based recognition of rock lithology in real-world tunnel excavation has rarely been studied.

In this study, we developed an efficient and accurate method applied to intricate tunnel scenes to meet the requirement of rapid in situ rock-mass classification of tunnel faces in hard rock. The feature extraction ability of the model was improved by introducing a specific multi-scale dilated convolutional attention block in the ResNet-101 backbone, thus effectively suppressing the influence of obstacles and enhancing the model’s application effect in the tunnel scenario. To demonstrate the higher performance of the suggested model, a comprehensive analysis was carried out utilizing a variety of feature extraction models. Further, gradient-weighted class activation mapping (Grad-CAM) technology was applied to visualize and explain the model’s discriminative process and explore why DCNNs classify the input images as belonging to various lithology categories.

This paper is organized as follows. Section 2 introduces the image data set and data augmentation methods. Section 3 introduces the tunnel lithology prediction framework, and the proposed multi-scale dilated convolutional attention block is described in detail. Section 4 presents the experimental results and discussions on the corresponding evaluation metrics and visualization analysis based on comparative experiments with other models. Finally, Section 5 concludes the paper.

2 Data preparation

2.1 Data acquisition and preprocessing

The capture of high-quality image data of the tunnel face serves as the foundation for the subsequent research. Because of the complex construction procedure, the environmental conditions inside the tunnel are poor, which affects image acquisition. To reduce noise as far as possible, this study captured images at the stage after excavation of the tunnel face and before installation of the supporting lining, because the dust and light environment in the tunnel at this stage is relatively steady compared with that at other stages. Most of the image samples were obtained by photographing the tunnel face with a high-resolution camera, while a small number of images were retrieved from the Internet. In addition to the timing of shooting, the location and direction of image acquisition must be considered. In general, the camera should be oriented as nearly perpendicular to the tunnel face as feasible. An additional light source should ensure that the tunnel face receives even illumination without casting any shadows; therefore, two LED supplementary lights were positioned symmetrically. A diagram of the image acquisition setup is shown in Fig.1. Tunnel face images of four lithologies common in tunnel engineering, i.e., basalt, granite, siltstone, and tuff, were selected for recognition and analysis. A high-quality data set is a prerequisite for training a model to high accuracy. The collected images differ in size, and some are partly obscured by personnel and equipment, which hinders model training and recognition. To minimize the GPU memory required for training and accelerate model convergence, an original data set containing 1630 images was formed after manual cropping and selection, on the premise of preserving the essential information of the photographs.
Each image has a resolution of 512 × 512 pixels. Four representative tunnel face images of the data set are shown in Fig.2(a)–Fig.2(d).

2.2 Data augmentation

For model training using deep learning techniques, the images for each type of rock were randomly divided into training, validation, and testing data sets at a ratio of 6:2:2. The training model’s weight is mainly determined by the images in the training set, which gives the model its capacity for representation. During model iteration, the images in the validation set are used to modify the model’s parameters, evaluate the model’s performance at a preliminary stage, and confirm the model’s generalization performance to determine whether or not to terminate training. The images in the testing set are completely unrelated to the others and are used to assess how well the final model generalizes.

Over-fitting occurs when a model has an excessive number of parameters or an overly complex structure relative to the available data and fits the given data set too closely [51]; that is, it cannot fit other data accurately or forecast future observations. It arises when a model absorbs so much detail and noise from the training data that its performance on fresh data suffers. The most visible sign of over-fitting is a model that performs well on the training set but poorly on the testing set. Numerous data augmentation strategies were used to avoid over-fitting and strengthen the model, as shown in Fig.3(a)–Fig.3(f). These augmentation methods include random crop, vertical flip, random rotation, and brightness and contrast adjustment. After the augmentation procedure, the number of images in the training and validation sets increased sixfold. The testing set was not augmented because it is used only to gauge how effectively the model generalizes and is not employed in model training. In total, 8150 images were generated, as listed in Tab.1.
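The split-and-augment bookkeeping above can be checked with a short calculation: a 6:2:2 split of the 1630 original images, with sixfold enlargement of the training and validation sets only, reproduces the 8150-image total reported in Tab.1.

```python
# Data-set bookkeeping described above: 1630 original images, a 6:2:2 split,
# and sixfold augmentation of the training and validation sets only
# (the testing set is left untouched).
total_images = 1630
train = int(total_images * 0.6)           # 978 training images
val = int(total_images * 0.2)             # 326 validation images
test = total_images - train - val         # 326 testing images

AUG_FACTOR = 6                            # original + 5 augmented variants
augmented_total = (train + val) * AUG_FACTOR + test
print(train, val, test, augmented_total)  # 978 326 326 8150
```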

3 Implementation details

This study used a method based on PyTorch, an open-source deep learning framework developed to provide a high degree of flexibility and speed for deep neural network implementation. The proposed network is initialized with random weights drawn from Kaiming initialization [52], allowing extremely deep models to converge. Appropriate hyper-parameters are then selected for network training. The predicted value is obtained by propagating the input images through the network, and the chosen loss function calculates the error between the predicted and expected values. An optimization algorithm then updates the network parameters to minimize this error, and this process continues until the network converges.
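The forward–loss–backward–update cycle just described can be sketched in PyTorch as follows; the one-layer model and toy data here are stand-ins for illustration, not the network proposed in this paper.

```python
import torch
import torch.nn as nn

# Sketch of the training cycle: forward pass, cross-entropy loss,
# backward pass, and parameter update. The model is a stand-in
# single linear layer, not the proposed network.
model = nn.Linear(8, 4)                        # 4 lithology classes
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(2, 8)                          # a toy input batch
y = torch.tensor([0, 2])                       # expected labels
for _ in range(3):                             # loop until convergence in practice
    opt.zero_grad()
    loss = criterion(model(x), y)              # error between prediction and target
    loss.backward()                            # propagate gradients
    opt.step()                                 # update parameters to reduce the error
```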

3.1 Overview of the proposed method

To address the degradation issue in DCNN models, ResNet, a residual network, uses residual (skip) connections between layers [17]. The degradation problem refers to the phenomenon whereby training and testing error increase as the network layers go deeper. ResNet-101 uses five convolution stages, as shown in Fig.4, to extract the characteristics of the input images. Global average pooling is applied after the last convolutional layer, regardless of the size of the original input, to create a one-dimensional feature map. The softmax layer is applied to the fully connected (FC) output to produce a classification score from 0 to 1 for each label, and the label with the maximum score is ultimately assigned to the input image. Fig.5(a) depicts the ResNet-101 architecture. Max-pooling is used once after the first convolutional layer to halve the size of the feature map. The four stages contain 3, 4, 23, and 3 bottleneck blocks, respectively; each bottleneck block consists of 1 × 1, 3 × 3, and 1 × 1 convolutional layers with a residual connection, as shown in Fig.5(b). The structure of the proposed multi-scale dilated convolutional attention block is shown in Fig.5(c); it mainly comprises dilated convolution, a squeeze-and-excitation (SE) attention module, and a residual connection. First, the input features are evenly divided into four groups. Dilated convolution with dilation rates of 1, 2, 3, and 4 is then employed for feature representation. Next, the outputs of the four dilated convolutional branches are concatenated and passed through a 1 × 1 convolutional layer, followed by the attention module. Finally, the entire multi-scale dilated convolutional attention block is closed by a skip connection. Tab.2 presents the detailed configuration of each layer of the model.
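The block just described can be sketched roughly in PyTorch. The channel count, SE reduction ratio, and layer details below are illustrative assumptions; the actual per-layer configuration is given in Tab.2.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Squeeze-and-excitation channel attention (reduction ratio assumed)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * s                               # channel-wise rescale

class MultiScaleDilatedAttentionBlock(nn.Module):
    """Sketch of the proposed block: split into 4 groups, dilated 3x3 convs
    with rates 1-4, concat + 1x1 fusion, SE attention, skip connection."""
    def __init__(self, channels):
        super().__init__()
        g = channels // 4                          # average split into 4 groups
        self.branches = nn.ModuleList([
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in (1, 2, 3, 4)])
        self.fuse = nn.Conv2d(channels, channels, 1)
        self.se = SEAttention(channels)

    def forward(self, x):
        groups = torch.chunk(x, 4, dim=1)
        feats = [branch(grp) for branch, grp in zip(self.branches, groups)]
        out = self.se(self.fuse(torch.cat(feats, dim=1)))
        return out + x                             # residual (skip) connection

block = MultiScaleDilatedAttentionBlock(64)
y = block(torch.randn(1, 64, 32, 32))              # spatial size is preserved
```

Setting `padding=d` for dilation `d` keeps the spatial size of each branch identical, so the four multi-scale outputs can be concatenated and added back to the input.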

3.2 Group convolution

Group convolution was first applied in AlexNet [14]. Owing to the constrained processing capabilities at the time, researchers split the convolution operation into two groups and ran them on dual GPUs in parallel to increase computational performance. Through extensive testing, researchers found that group convolution can, to a certain extent, increase the network's recognition accuracy while reducing the amount of computation required [53,54]. In this study, the input features were evenly divided into four groups. As shown in Fig.6, group convolution produces a multi-channel feature map of the same size as the output of conventional convolution, but with fewer parameters. In other words, multi-scale image representations can be obtained by extracting rock features at a finer granularity. The numbers of parameters are calculated as follows:

$P_{\text{conventional}} = W \times H \times C_{\text{in}} \times C_{\text{out}}$, (1)

$P_{\text{group}} = W \times H \times \frac{C_{\text{in}}}{G} \times \frac{C_{\text{out}}}{G} \times G$, (2)

where $W$ and $H$ represent the width and height of the convolution filters, respectively; $C_{\text{in}}$ and $C_{\text{out}}$ represent the numbers of input and output channels, respectively; $G$ represents the number of groups in the group convolution; and $P_{\text{conventional}}$ and $P_{\text{group}}$ represent the numbers of parameters of conventional convolution and group convolution, respectively.
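As a worked example of the two parameter-count formulas above, consider a 3 × 3 convolution with 64 input and 64 output channels split into G = 4 groups (the channel values are chosen for illustration, not taken from the paper's configuration):

```python
# Parameter counts from the two formulas above: a 3x3 convolution with 64
# input and 64 output channels, split into G = 4 groups.
W = H = 3
C_in = C_out = 64
G = 4

p_conventional = W * H * C_in * C_out               # 36864 parameters
p_group = W * H * (C_in // G) * (C_out // G) * G    # 9216 parameters
print(p_conventional, p_group, p_conventional // p_group)  # 36864 9216 4
```

Note that the ratio of the two counts equals G, i.e., splitting into G groups divides the parameter count by G.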

3.3 Dilated convolution

The pooling layer is typically employed in conventional image identification and segmentation to broaden the receptive field and condense the size of the feature map [55]; the original size is then restored by upsampling. Information is inevitably lost in this process, which degrades prediction accuracy. Dilated convolution, by contrast, expands the mapping range of the convolution kernel without increasing the number of parameters, allowing each convolution output to cover a wider range of the input. Dilated convolution enlarges the receptive field and maintains the resolution without information loss [56]. However, an excessive expansion rate can break the continuity of information and cause data loss. In this study, the original 3 × 3 convolution in the improved residual module was replaced with a 3 × 3 dilated convolution. As shown in Fig.7, a 3 × 3 kernel with dilation 2 is equivalent to a 5 × 5 kernel but has fewer parameters. If the dilation rate (DR) is excessively large, the characteristics within the receptive field lose some correlation. To address these issues, this study used dilated convolutions with DR = 1, 2, 3, and 4 to expand information at different scales [57]. The dilated convolution operation was carried out on the feature maps extracted by group convolution. Multi-scale dilated convolution extracts information from lithologic images more effectively than conventional convolution: without adding parameters, the detection range of the receptive field is expanded, making rock identification more accurate. If the kernel size of the dilated convolution is $K \times K$ and the DR is $D$, the kernel is equivalent to one of size $K' \times K'$, calculated as follows:

$K' = K + (K - 1) \times (D - 1)$. (3)
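The equivalent-kernel formula can be checked directly: with a 3 × 3 kernel, the four dilation rates used in this study give equivalent receptive fields of 3, 5, 7, and 9.

```python
# Equivalent kernel size of a dilated convolution, from the formula above:
# K' = K + (K - 1) * (D - 1).
def equivalent_kernel(K, D):
    return K + (K - 1) * (D - 1)

# The four dilation rates used in this study, applied to a 3x3 kernel:
print([equivalent_kernel(3, d) for d in (1, 2, 3, 4)])  # [3, 5, 7, 9]
```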

3.4 Squeeze-and-excitation attention

The human visual system served as the inspiration for the attention mechanism, which can automatically focus on the key elements processed by the neural network, further optimizing it and raising the effectiveness of the model. A tunnel face image contains multiple layers of semantic information, and extracting both low-level and high-level semantic information is the key and challenging aspect of increasing the recognition rate. The attention mechanism can learn the semantic content of the image on its own and weight it according to its significance, filtering out the aspects useful for recognition [58]. Research demonstrates that this module performs well in a wide range of classical networks, and therefore this study adopted a highly portable channel attention module [59]. SE is a typical attention module [60]. Its core idea is automatic feature recalibration, so that the DCNN selectively emphasizes informative features and suppresses useless ones. The structure of the SE block is shown in Fig.8. The squeeze operation uses global average pooling to generate channel-wise statistics $z$; the excitation operation then generates a weight representing the importance of each feature channel based on the correlations between channels. The final output of the SE block is obtained by a rescaling operation. Suppose $x \in \mathbb{R}^{W \times H \times C}$ is the input feature of the SE module, where $C$ is the number of channels and $H$ and $W$ are the spatial dimensions of the feature. The entire process of SE attention is as follows:

$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j), \quad c = 1, 2, ..., C$, (4)

$s = \sigma(W_2 \delta(W_1 z))$, (5)

$\tilde{X}_c = F_{\text{scale}}(x_c, s_c)$, (6)

where $x_c$ represents the $c$th feature map; $\sigma$ and $\delta$ refer to the sigmoid and ReLU functions, respectively; $W_1$ and $W_2$ represent the parameters of the two FC layers (a dimensionality-reduction and a dimensionality-increasing layer, respectively); the vector $s$ represents the weights of the corresponding channels; and $F_{\text{scale}}(x_c, s_c)$ refers to channel-wise multiplication between the feature map $x_c \in \mathbb{R}^{H \times W}$ and the scalar $s_c$.
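A small numeric sketch of the squeeze and rescale steps described above, with toy 2 × 2 feature maps; a single offset sigmoid stands in for the two learned FC layers of the excitation step (an assumption, since those weights are learned during training):

```python
import math

# Squeeze step: z_c is the global average of each H x W feature map.
# Toy input with C = 2 channels of size 2 x 2.
x = [[[1.0, 3.0], [5.0, 7.0]],    # channel 0
     [[2.0, 2.0], [2.0, 2.0]]]    # channel 1
H = W = 2
z = [sum(sum(row) for row in ch) / (H * W) for ch in x]    # [4.0, 2.0]

# Excitation sketch: the real module computes s = sigmoid(W2 relu(W1 z));
# here a bare sigmoid with an offset stands in for the learned FC layers.
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

s = [sigmoid(zc - 3.0) for zc in z]                        # weights in (0, 1)

# Rescale step: multiply every value in channel c by its weight s_c.
x_tilde = [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]
```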

3.5 Hyper-parameter settings

Two categories of parameters influence how well the neural network performs when its structure is established: ordinary parameters (such as weight and bias) and hyper-parameters (such as the number of training epochs, initial learning rate, and batch size). The hyper-parameters must be manually specified, while the ordinary parameters are derived through training of the model itself. The hyper-parameters are presented in Tab.3. It is commonly known that various hyper-parameter settings have a major impact on how effectively deep learning models function. After a number of tests were carried out in this study, the aforementioned hyper-parameters produced an effective model for identifying rock lithology. The optimum hyper-parameter configuration might yield better results; however, this is beyond the scope of this investigation.

3.6 Loss function

For multi-class image classification problems, cross-entropy is typically employed as the loss function; it gauges how well the predicted result matches the expected one. The cross-entropy loss measures the difference between the actual output of the network and the correct label, and the network weights are updated through adaptive moment estimation [61]. The cross-entropy loss is expressed as follows:

$x_i = \frac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}}$, (7)

$H(p, q) = -\sum_{i=1}^{N} p(x_i) \log q(x_i)$, (8)

where $a_i$ and $x_i$ represent the output value and predicted probability of class $i$, respectively; $p(x_i)$ is the expected output probability distribution; and $q(x_i)$ is the actual output probability distribution.
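A plain-Python sketch of the softmax and cross-entropy formulas above for the four-class lithology problem; the logits are toy values, not network outputs:

```python
import math

# Softmax over raw outputs a_i, then cross-entropy against a one-hot target,
# following the two formulas above. Logits are illustrative toy values.
logits = [2.0, 0.5, 0.2, 0.1]                    # a_i: raw network outputs
denom = sum(math.exp(a) for a in logits)
probs = [math.exp(a) / denom for a in logits]    # x_i = e^{a_i} / sum_k e^{a_k}

target = [1.0, 0.0, 0.0, 0.0]                    # expected distribution p (one-hot)
loss = -sum(p * math.log(q) for p, q in zip(target, probs) if p > 0)
```

Because the target is one-hot, the loss reduces to the negative log-probability of the correct class, which the optimizer then drives toward zero.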

4 Experiment results and discussion

In the experiments, all models were trained and validated on an Nvidia RTX 3090 GPU with 64 GB of memory. To ensure the fairness of the comparison, all the experiments were carried out under the same hardware environment, and the data sets and their partitioning methods were precisely the same.

4.1 Evaluation metrics

The assessment metrics accuracy, precision, recall, and F1-score are commonly used to evaluate the performance of the proposed classification algorithms during model iteration. These criteria can be determined from the basic counts, i.e., true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as presented in Eqs. (9)–(12). Accuracy is the ratio of correctly classified samples to all samples in the testing set, calculated as follows:

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. (9)

Precision is the percentage of correctly classified positive samples among all samples that the classifier has determined to be positive. It is defined as follows:

$\text{Precision} = \frac{TP}{TP + FP}$. (10)

Recall gives the percentage of TPs among all samples that are actually positive. The formula is as follows:

$\text{Recall} = \frac{TP}{TP + FN}$. (11)

The F1-score is the harmonic mean of precision and recall; it is introduced as a composite measure to balance the influence of precision and recall. It is calculated as follows:

$F_{\alpha} = \frac{(\alpha^2 + 1) \times \text{Precision} \times \text{Recall}}{\alpha^2 \times (\text{Precision} + \text{Recall})}$, (12)

where $\alpha$ is the harmonic coefficient used to balance the impact of precision and recall; it is set to 1 in this study, yielding the F1-score.
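The four metrics can be computed directly from the basic counts; the counts below are illustrative, not taken from Tab.4.

```python
# Accuracy, precision, recall, and F_alpha from the basic counts defined
# above, for one class of a toy evaluation (counts are illustrative).
TP, TN, FP, FN = 82, 240, 6, 8

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
alpha = 1  # harmonic coefficient; alpha = 1 gives the F1-score
f_alpha = (alpha**2 + 1) * precision * recall / (alpha**2 * (precision + recall))

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f_alpha, 3))
```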

4.2 Training and validation results

This study belongs to the field of image classification; the authors apply CNNs, which perform well in image tasks, to recognize and categorize tunnel face images, a distinctive application scenario. To evaluate the efficacy of the proposed method, several CNN image classification models were selected for comparison. As numerous CNNs are currently available for image classification, comparing the proposed network with all existing networks would be unrealistic. Thus, LeNet [62], AlexNet [14], ResNet [17], MobileNet [54], and ConvNext [63] were selected as five representative networks for comparative analysis. LeNet, the first CNN, was proposed for handwritten digit recognition and was the ground-breaking work on CNNs. AlexNet won the ILSVRC championship in 2012 and became the first network to implement group convolution. ResNet, the 2015 ILSVRC champion, performed exceptionally well, largely owing to its residual module. MobileNet is an example of a lightweight network; its primary characteristics are increased speed and decreased volume while maintaining a certain level of accuracy. ConvNext, the most recent of the compared CNNs, was put forward in 2022; it uses a pure CNN architecture without any additional modules and achieves a higher accuracy rate. The effectiveness of the suggested strategy was confirmed by comparison with these classification methods. Fig.9 illustrates the changes in cross-entropy loss and accuracy as the training epochs increase for the six deep learning approaches (i.e., LeNet, AlexNet, ResNet-101, MobileNet-v2, ConvNext-XL, and the proposed method).
Fig.9(a) shows that the loss curves of the DCNN models (i.e., AlexNet, ResNet-101, MobileNet-v2, ConvNext-XL, and the proposed method) converge faster than that of the shallow CNN model (i.e., LeNet) during training [62]. Owing to its limited number of layers and basic structure, LeNet, the earliest CNN, performs worst in the complex-background image classification task of this study. Although AlexNet and LeNet follow similar design principles, they differ greatly: AlexNet adopts the nonlinear, non-saturating ReLU activation function, whose gradient decays significantly more slowly during training than the saturating nonlinearities (such as the sigmoid and tanh functions) used in earlier neural networks. Consequently, although AlexNet is structurally more complex than LeNet, its convergence is significantly faster. The enlarged subfigure shows that, although the loss value of the suggested method is slightly larger than those of the other DCNN models, it does not fluctuate significantly, indicating that the suggested approach is more robust. As shown in Fig.9(b), the proposed method achieves high accuracy from the start. The accuracy on the validation set, ordered from low to high, is: LeNet, MobileNet-v2, AlexNet, ResNet-101, ConvNext-XL, and the proposed method. Further analysis of the loss curves shows that the convergence rate of the proposed method is relatively slow, whereas that of ConvNext-XL is the fastest; however, both converge to a low level after 200 training epochs. According to the validation accuracy curves, the proposed method is second only to ConvNext-XL in initial accuracy, but after 200 epochs of training it achieves the highest accuracy.
It is worth noting that, in practical application, image prediction uses the final trained model; that is, the initial accuracy does not determine the model's performance in image classification prediction. The model's final training outcome is what matters most. According to the overall trend of the curves, the accuracy of the six classification models reached high, stable values after 100 training epochs. Therefore, the results of epochs 101–200 were adopted in this study and are summarized in the boxplot in Fig.10 to better compare the recognition performance of the models. The boxplot clearly depicts the distribution range of the accuracies of these models. It is evident that the suggested strategy improves recognition accuracy while maintaining convergence speed in the challenging complex-background image classification problem addressed in this study. The loss and accuracy curves of the proposed model are plotted in Fig.11 to quantify its fitting degree and convergence. Fig.11(a) shows that the model converges in both the training and validation processes. No over-fitting or under-fitting was observed during training, and the maximum validation accuracy was 96%, as shown in Fig.11(b). This further demonstrates the soundness of the hyper-parameter setup in this research.

4.3 Testing results

A confusion matrix, a specific table layout, is used in machine learning to visualize how well an algorithm performs: each row of the matrix represents the instances of an actual class, while each column represents the instances of a predicted class. In this study, confusion matrices were constructed for comparison, as shown in Fig.12(a)–Fig.12(f). The quantitative comparison of the confusion matrices of the six methods revealed a striking improvement in rock lithology classification for the suggested strategy. With a probability of 98.78%, siltstone had the highest true positive probability, followed by basalt, tuff, and granite, with probabilities of 97.62%, 95.35%, and 94.59%, respectively. According to the analysis of misclassification across lithology categories, granite had a 5.41% probability of being incorrectly categorized as tuff, and tuff a 4.65% probability of misclassification. Owing to their straightforward network structures, the accuracies of model 1 (i.e., LeNet) and model 4 (i.e., MobileNet-v2) for the identification of tuff were only 38.03% and 38.10%, respectively. Group convolution and deeper network topologies raised the tuff recognition accuracies of model 2 (i.e., AlexNet) and model 3 (i.e., ResNet-101) to 78.57% and 75.00%, respectively, while the suggested approach raised the tuff identification accuracy to 97.62%. For model 5 (i.e., ConvNext-XL), the identification probability of siltstone was 97.56%, followed by granite, basalt, and tuff, with probabilities of 93.02%, 91.89%, and 89.29%, respectively. Compared with model 1, the first CNN, the tuff accuracy of the proposed method increased by 59.59%; compared with model 3, it increased by 22.62%.
The comparably high misjudgment probabilities between basalt and tuff arise from the similar fine-scale features and gray levels in the images of the two groups. Therefore, it is critical to expand the basalt and tuff image data sets.

The confusion matrices in Fig.12 can be used to calculate the evaluation indices, which help in assessing how much our model outperforms the competing models. The calculated results are presented in Tab.4, and the performance of the compared models under the various assessment criteria is presented in Fig.13(a)–Fig.13(d). The classification of basalt and tuff images is slightly poorer than that of the other two categories, as the assessment criteria for these two categories exhibit markedly lower values. The average evaluation results (accuracy, precision, recall, F1-score, and execution time) of the six methods are presented in Fig.14. As can be clearly observed, our proposed model outperforms the other classification models in accuracy, precision, recall, and F1-score. Specifically, the proposed model achieved an F1-score of 96.7%, 15.3% higher than that of the baseline ResNet-101. In terms of execution time, model 4 (i.e., MobileNet-v2) was the fastest, followed by model 2 (i.e., AlexNet), which uses group convolution, and model 1 (i.e., LeNet), whose network topology is simple. The running time of model 3 (i.e., ResNet-101) was relatively long owing to its complexity and numerous parameters. The suggested method was not the fastest because it employs a deep network to handle image classification against a complex background; however, it uses group convolution to boost running speed, making it significantly quicker than the baseline ResNet-101. In terms of model memory, model 1 used the least owing to its simple architecture, and model 4, a compact network, used only 8.73 MB.
As the model proposed in this study is an improvement of ResNet-101, its memory footprint is comparable to that of model 3, whereas the size of model 5 reaches 1.5 GB as a result of its complex structure and numerous network layers. Such a memory cost is not worthwhile for the accuracy it buys. To reduce the model size while increasing accuracy, it is preferable to incorporate more sophisticated modules, as suggested in this paper. The inference speed and size of the final trained models are presented in Tab.5 for a more thorough analysis of performance. In conclusion, the proposed model is computationally more efficient in recognizing rock lithology in complicated scenes and achieves a higher level of accuracy than comparable models.
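The evaluation indices above follow directly from the confusion matrix. As a minimal sketch (the confusion-matrix values here are made up for illustration, not taken from Tab.4), the per-class precision, recall, and F1-score can be computed as:

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray):
    """Per-class precision/recall/F1 and overall accuracy from a confusion
    matrix whose rows are true classes and columns are predicted classes."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)  # column sums: predicted counts per class
    recall = tp / cm.sum(axis=1)     # row sums: true counts per class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Hypothetical 4-class matrix (basalt, granite, siltstone, tuff): note the
# off-diagonal basalt/tuff entries modeling their mutual confusion.
cm = np.array([[45, 1, 0, 4],
               [ 1, 48, 1, 0],
               [ 0, 1, 49, 0],
               [ 5, 0, 0, 45]])
acc, p, r, f1 = metrics_from_confusion(cm)
print(round(acc, 3))  # 0.935
```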

4.4 Visualization analysis

In general, deep learning networks are viewed as a “black box” with poor interpretability. To foster trust in intelligent systems, we must create “transparent” models that can explain why they predict what they predict. In this section, the feature-selection process of the model is examined and explained using Grad-CAM technology [64], which also reveals why the DCNN assigns input images to particular lithology categories. Grad-CAM proceeds as follows:

\[ \alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \]

\[ L_{\mathrm{Grad\text{-}CAM}}^c = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big), \]

where $y^c$ represents the score predicted by the network for category $c$; $A^k$ is the $k$th feature map; $A_{ij}^k$ is the activation at position $(i, j)$ of $A^k$; $Z$ is the number of pixels in the feature map; and $\alpha_k^c$ are the neuron importance weights.
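The two equations translate almost directly into code. Below is a minimal NumPy sketch, assuming the feature maps and the class-score gradients of the chosen convolutional layer have already been obtained from the network (here they are random placeholders):

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM localization map.

    feature_maps: (K, H, W) activations A^k of the chosen conv layer.
    grads:        (K, H, W) gradients dy^c/dA^k for the target class c.
    """
    # alpha_k^c = (1/Z) * sum_ij dy^c/dA_ij^k  (global average pooling)
    alphas = grads.mean(axis=(1, 2))                  # shape (K,)
    # L^c = ReLU(sum_k alpha_k^c * A^k)
    cam = np.maximum((alphas[:, None, None] * feature_maps).sum(axis=0), 0.0)
    return cam

K, H, W = 8, 7, 7
rng = np.random.default_rng(0)
cam = grad_cam(rng.standard_normal((K, H, W)), rng.standard_normal((K, H, W)))
print(cam.shape, cam.min() >= 0.0)
```

The resulting map is then upsampled to the input resolution and overlaid on the image to produce visualizations such as Fig.15 and Fig.16.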

Grad-CAM maps of two typical granite images are displayed in Fig.15. The proposed model predicted one of them correctly with high confidence (Fig.15(a)) and misclassified the other as tuff (Fig.15(e)). The network’s region of interest (ROI) corresponds to the regions highlighted in red. As the network layers deepen, the model pays increasingly greater attention to the ROI and uses it as the basis for classification. As can be observed in Fig.15, the ROI in Fig.15(e) is concentrated in an area mixed with tuff, which is why the model misclassified it. Additionally, the class-discriminative localization maps of the baseline ResNet-101 and the proposed model are illustrated for several test images to show the efficiency of the attention mechanism for rock lithology classification in complicated scenes. As Fig.16 shows, with the help of the attention mechanism the salient features cover more of the object to be recognized (the regions marked in deep color), and the identification probability increases. The predicted probabilities for Fig.16(a) and Fig.16(d) increased from 77.5% to 82.7% and from 95.6% to 98.4%, respectively. This confirms that the attention mechanism directs the network toward vital information and can lessen the impact of interference from other objects. The predicted probability for Fig.16(a) is lower than that for Fig.16(d); further inspection shows that a larger proportion of Fig.16(a) is occupied by interfering objects, indicating that interference from equipment and personnel reduces the accuracy of lithology identification.

To further examine how the network extracts tunnel rock lithology image features, the feature maps of the five convolution stages are output in sequence, as displayed in Fig.17(b)–Fig.17(f). The first stage, as can be observed in Fig.17(b), acts as a set of edge detectors: almost all of the original image information is preserved in the activation output, and the response maps mainly consist of background, texture, and contour information. The middle-level response maps mainly capture the fine-grained characteristics of the rock, as shown in Fig.17(c). As the layers deepen, the activation output becomes increasingly abstract, as exhibited in Fig.17(d) and Fig.17(e), and moves beyond human intuition: information about the visual appearance of the rock images becomes increasingly scarce, while information about the rock categories becomes increasingly abundant. In brief, the data set of this study is derived from tunnel faces under construction, whose dark backgrounds and heavy image noise make lithology identification considerably more difficult; the analysis above demonstrates the superiority of the proposed approach.
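The edge-detector behavior of the first stage can be mimicked with a fixed filter. The Sobel kernel below is a hand-crafted stand-in for the learned first-stage filters, applied to a synthetic step-edge image rather than a real tunnel-face photograph:

```python
import numpy as np

def conv2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 2-D cross-correlation with a small kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

# Sobel filter responding to vertical edges (horizontal intensity changes).
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

# A synthetic image with a vertical step edge: left half dark, right half bright.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
response = conv2d(img, sobel_x)
print(response.max())  # strong response along the vertical edge
```

Learned filters differ from this hand-crafted one, of course, but the shallow-layer responses in Fig.17(b) show qualitatively similar contour and texture selectivity.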

5 Conclusions and outlook

To identify the rock lithology quickly and accurately in a tunnel face under construction, a deep learning-based automated method for classifying lithology from tunnel face images was proposed, employing a multi-scale dilated convolutional attention network. The specific conclusions are as follows.

1) A data set of imagery of tunnel faces from four common lithology types was constructed for tunnel engineering. Additionally, a multi-scale dilated convolutional attention block that inherits the advantages of residual learning was designed to address fine-grained rock lithology recognition under complicated tunnel scenes.

2) A comparative study with other classifiers revealed that the developed model for tunnel lithology prediction is superior to other deep learning models in terms of assessment criteria such as accuracy, precision, recall, and F1-score, achieving a maximum accuracy of 96%. In addition, owing to the inclusion of group convolution, the computational time of the recommended method is less than that of ResNet-101.

3) The Grad-CAM algorithm, which produces “visual explanations” for decisions, was employed to explain the feature extraction process of our proposed DCNN model in an innovative way, making it more transparent. The findings demonstrated that the multi-scale dilated attention convolution module can extract multi-scale features, and that the attention mechanism can adaptively select the salient features to further enhance the model’s performance.

It is worth noting that much can still be done to increase the richness of the rock samples. Only four rock categories were considered in this model owing to the actual engineering situation. Although the accuracy is outstanding under the given circumstances, more research is required to determine how well this model classifies additional rock types. To increase the applicability of the suggested strategy, more attention needs to be given to data gathering in the future. Additionally, the presence of multiple lithologies on the tunnel face may lead to misjudgment by the proposed method, which limits its application to composite strata. In such conditions, a segmentation algorithm should be incorporated to improve the proposed model, which will be the subject of future work.

References

[1]

Xu Z H, Liu F M, Lin P, Shao R Q, Shi X S. Non-destructive, in-situ, fast identification of adverse geology in tunnels based on anomalies analysis of element content. Tunnelling and Underground Space Technology, 2021, 118: 104146

[2]

Xu Z H, Ma W, Lin P, Hua Y L. Deep learning of rock microscopic images for intelligent lithology identification: Neural network comparison and selection. Journal of Rock Mechanics and Geotechnical Engineering, 2022, 14(4): 1140–1152

[3]

Liu Z B, Li L, Fang X L, Qi W B, Shen J M, Zhou H Y, Zhang Y L. Hard-rock tunnel lithology prediction with TBM construction Big Data using a global-attention-mechanism-based LSTM network. Automation in Construction, 2021, 125: 103647

[4]

Xu Z H, Shi H, Lin P, Liu T H. Integrated lithology identification based on images and elemental data from rocks. Journal of Petroleum Science Engineering, 2021, 205: 108853

[5]

Xu Z H, Wang W Y, Lin P, Nie L C, Wu J, Li Z M. Hard-rock TBM jamming subject to adverse geological conditions: Influencing factor, hazard mode and a case study of Gaoligongshan Tunnel. Tunnelling and Underground Space Technology, 2021, 108: 103683

[6]

Ren D J, Shen S L, Arulrajah A, Cheng W C. Prediction model of TBM disc cutter wear during tunnelling in heterogeneous ground. Rock Mechanics and Rock Engineering, 2018, 51(11): 3599–3611

[7]

Kanik M. Evaluation of the limitations of RMR89 system for preliminary support selection in weak rock class. Computers and Geotechnics, 2019, 115: 103159

[8]

Peng R, Meng X R, Zhao G M, Ouyang Z H, Li Y M. Multi-echelon support method to limit asymmetry instability in different lithology roadways under high ground stress. Tunnelling and Underground Space Technology, 2021, 108: 103681

[9]

Ayawah P E A, Sebbeh-Newton S, Azure J W A, Kaba A G A, Anani A, Bansah S, Zabidi H. A review and case study of artificial intelligence and machine learning methods used for ground condition prediction ahead of tunnel boring machines. Tunnelling and Underground Space Technology, 2022, 125: 104497

[10]

de Miguel-García E, Gómez-González J F. A new methodology to estimate the powder factor of explosives considering the different lithologies of volcanic lands: A case study from the island of Tenerife, Spain. Tunnelling and Underground Space Technology, 2019, 91: 103023

[11]

Bi L, Ren B Y, Zhong D H, Hu L X. Real-time construction schedule analysis of long-distance diversion tunnels based on lithological predictions using a Markov process. Journal of Construction Engineering and Management, 2015, 141(2): 04014076

[12]

Li S C, Liu B, Xu X J, Nie L C, Liu Z Y, Song J, Sun H F, Chen L, Fan K. An overview of ahead geological prospecting in tunneling. Tunnelling and Underground Space Technology, 2017, 63: 69–94

[13]

Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504–507

[14]

Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60(6): 84–90

[15]

Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA: IEEE, 2015, 1–9

[16]

Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR). San Diego, CA: ICLR, 2015

[17]

He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV: IEEE, 2016, 770–778

[18]

Lei M F, Liu L H, Shi C H, Tan Y, Lin Y X, Wang W D. A novel tunnel-lining crack recognition system based on digital image technology. Tunnelling and Underground Space Technology, 2021, 108: 103724

[19]

Sun X H, Shi C H, Liu L H, Lei M F. Concrete crack image recognition system based on improved seed filling algorithm. Journal of South China University of Technology (Natural Science Edition), 2022, 50(5): 127–136, 146 (in Chinese)

[20]

Tinoco J, Gomes Correia A, Cortez P. Support vector machines applied to uniaxial compressive strength prediction of jet grouting columns. Computers and Geotechnics, 2014, 55: 132–140

[21]

Makasis N, Narsilio G A, Bidarmaghz A. A machine learning approach to energy pile design. Computers and Geotechnics, 2018, 97: 189–203

[22]

Han X L, Jiang N J, Yang Y F, Choi J, Singh D N, Beta P, Du Y J, Wang Y J. Deep learning based approach for the instance segmentation of clayey soil desiccation cracks. Computers and Geotechnics, 2022, 146: 104733

[23]

Zhang W G, Li H R, Li Y Q, Liu H L, Chen Y M, Ding X M. Application of deep learning algorithms in geotechnical engineering: A short critical review. Artificial Intelligence Review, 2021, 54(8): 5633–5673

[24]

Huang M Q, Ninić J, Zhang Q B. BIM, machine learning and computer vision techniques in underground construction: Current status and future perspectives. Tunnelling and Underground Space Technology, 2021, 108: 103677

[25]

Bai X D, Cheng W C, Sheil B B, Li G. Pipejacking clogging detection in soft alluvial deposits using machine learning algorithms. Tunnelling and Underground Space Technology, 2021, 113: 103908

[26]

Chen J, Zhang D M, Huang H W, Shadabfar M, Zhou M L, Yang T J. Image-based segmentation and quantification of weak interlayers in rock tunnel face via deep learning. Automation in Construction, 2020, 120: 103371

[27]

Wang Z F, Cheng W C. Predicting jet-grout column diameter to mitigate the environmental impact using an artificial intelligence algorithm. Underground Space, 2021, 6(3): 267–280

[28]

Hu A F, Li T, Chen Y, Ge H B, Li Y J. Deep learning for preprocessing of measured settlement data. Journal of Hunan University (Natural Sciences), 2021, 48(9): 43–51

[29]

Patel A K, Chatterjee S. Computer vision-based limestone rock-type classification using probabilistic neural network. Geoscience Frontiers, 2016, 7(1): 53–60

[30]

Xu Z H, Ma W, Lin P, Shi H, Pan D D, Liu T H. Deep learning of rock images for intelligent lithology identification. Computers & Geosciences, 2021, 154: 104799

[31]

Cai Y Y, Xu D G, Shi H. Rapid identification of ore minerals using multi-scale dilated convolutional attention network associated with portable Raman spectroscopy. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2022, 267: 120607

[32]

Cao Z M, Yang C, Han J, Mu H W, Wan C, Gao P. Lithology identification method based on integrated K-means clustering and meta-object representation. Arabian Journal of Geosciences, 2022, 15(17): 1462

[33]

Fu D, Su C, Wang W J, Yuan R Y. Deep learning based lithology classification of drill core images. PLoS One, 2022, 17(7): e0270826

[34]

Li N, Hao H Z, Gu Q, Wang D R, Hu X M. A transfer learning method for automatic identification of sandstone microscopic images. Computers & Geosciences, 2017, 103: 111–121

[35]

Polat Ö, Polat A, Ekici T. Automatic classification of volcanic rocks from thin section images using transfer learning networks. Neural Computing & Applications, 2021, 33(18): 11531–11540

[36]

Seo W, Kim Y, Sim H, Song Y, Yun T S. Classification of igneous rocks from petrographic thin section images using convolutional neural network. Earth Science Informatics, 2022, 15(2): 1297–1307

[37]

Chen J Y, Zhou M L, Huang H W, Zhang D M, Peng Z C. Automated extraction and evaluation of fracture trace maps from rock tunnel face images via deep learning. International Journal of Rock Mechanics and Mining Sciences, 2021, 142: 104745

[38]

Chen J Y, Chen Y F, Cohn A G, Huang H W, Man J H, Wei L J. A novel image-based approach for interactive characterization of rock fracture spacing in a tunnel face. Journal of Rock Mechanics and Geotechnical Engineering, 2022, 14(4): 1077–1088

[39]

Xue Y D, Cao Y P, Zhou M L, Zhang F, Shen K, Jia F. Rock mass fracture maps prediction based on spatiotemporal image sequence modeling. Computer-Aided Civil and Infrastructure Engineering, 2023, 38(4): 470–488

[40]

Chen J Y, Yang T J, Zhang D M, Huang H W, Tian Y. Deep learning based classification of rock structure of tunnel face. Geoscience Frontiers, 2021, 12(1): 395–404

[41]

Qiao W D, Zhao Y F, Xu Y, Lei Y M, Wang Y J, Yu S, Li H. Deep learning-based pixel-level rock fragment recognition during tunnel excavation using instance segmentation model. Tunnelling and Underground Space Technology, 2021, 115: 104072

[42]

Cheng W C, Bai X D, Sheil B B, Li G, Wang F. Identifying characteristics of pipejacking parameters to assess geological conditions using optimisation algorithm-based support vector machines. Tunnelling and Underground Space Technology, 2020, 106: 103592

[43]

Chen J Y, Zhou M L, Zhang D M, Huang H W, Zhang F S. Quantification of water inflow in rock tunnel faces via convolutional neural network approach. Automation in Construction, 2021, 123: 103526

[44]

Chen J Y, Huang H W, Cohn A G, Zhou M L, Zhang D M, Man J H. A hierarchical DCNN-based approach for classifying imbalanced water inflow in rock tunnel faces. Tunnelling and Underground Space Technology, 2022, 122: 104399

[45]

Jalalifar H, Mojedifar S, Sahebi A A, Nezamabadi-pour H. Application of the adaptive neuro-fuzzy inference system for prediction of a rock engineering classification system. Computers and Geotechnics, 2011, 38(6): 783–790

[46]

Wang M N, Zhao S G, Tong J J, Wang Z L, Yao M, Li J W, Yi W H. Intelligent classification model of surrounding rock of tunnel using drilling and blasting method. Underground Space, 2021, 6(5): 539–550

[47]

Zhao S G, Wang M N, Yi W H, Yang D, Tong J J. Intelligent classification of surrounding rock of tunnel based on 10 machine learning algorithms. Applied Sciences, 2022, 12(5): 2656

[48]

Hou S K, Liu Y R, Yang Q. Real-time prediction of rock mass classification based on TBM operation Big Data and stacking technique of ensemble learning. Journal of Rock Mechanics and Geotechnical Engineering, 2022, 14(1): 123–143

[49]

Qiu D H, Fu K, Xue Y G, Tao Y F, Kong F M, Bai C H. TBM tunnel surrounding rock classification method and real-time identification model based on tunneling performance. International Journal of Geomechanics, 2022, 22(6): 04022070

[50]

Hu J H, Zhou T, Ma S W, Yang D J, Guo M M, Huang P L. Rock mass classification prediction model using heuristic algorithms and support vector machines: A case study of Chambishi copper mine. Scientific Reports, 2022, 12(1): 928

[51]

Xu J J, Zhang H, Tang C S, Cheng Q, Tian B G, Liu B, Shi B. Automatic soil crack recognition under uneven illumination condition with the application of artificial intelligence. Engineering Geology, 2022, 296: 106495

[52]

He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015, 1026–1034

[53]

Xie S N, Girshick R, Dollár P, Tu Z W, He K M. Aggregated residual transformations for deep neural networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE, 2017, 1492–1500

[54]

Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M, Adam H. MobileNets: Efficient convolutional neural networks for mobile vision applications. 2017, arXiv: 1704.04861

[55]

Wang P Q, Chen P F, Yuan Y, Liu D, Huang Z H, Hou X D, Cottrell G. Understanding convolution for semantic segmentation. In: Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Tahoe, NV: IEEE, 2018, 1451–1460

[56]

Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: Proceedings of the 4th International Conference on Learning Representations (ICLR). San Juan: ICLR, 2016

[57]

Vo D M, Lee S W. Semantic image segmentation using fully convolutional neural networks with multi-scale images and multi-scale dilated convolutions. Multimedia Tools and Applications, 2018, 77(14): 18689–18707

[58]

Mnih V, Heess N, Graves A, Kavukcuoglu K. Recurrent models of visual attention. In: Ghahramani Z, Welling M, Cortes C, Lawrence N D, Weinberger K Q, eds. Advances in Neural Information Processing Systems 27 (NIPS 2014). Montreal, QC: NIPS, 2014

[59]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L, Polosukhin I. Attention is all you need. In: Guyon I, Luxburg U V, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach, CA: NIPS, 2017

[60]

Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT: IEEE, 2018, 7132–7141

[61]

Kingma D P, Ba J. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR). San Diego, CA: ICLR, 2015

[62]

Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324

[63]

Liu Z, Mao H, Wu C Y, Feichtenhofer C, Darrell T, Xie S. A ConvNet for the 2020s. In: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA: IEEE, 2022

[64]

Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017, 618–626
