1. School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
2. JSTI Group Guizhou Engineering Survey and Design Co., Ltd., Guangzhou 510800, China
3. Guangzhou Municipal Engineering Testing Co., Ltd., Guangzhou 510520, China
gongfa.chen@gdut.edu.cn
History: Received 2023-07-30; Accepted 2023-10-27; Published 2024-03-15; Issue/Revised Date 2024-04-22
Abstract
Training samples for deep learning networks are typically obtained through various field experiments, which require a significant expenditure of manpower, material resources, and time. However, it is possible to use simulated data to augment the training samples. In this paper, by comparing an actual experimental model with the simulated model generated by the gprMax [1] forward simulation method, the feasibility of obtaining simulated samples through gprMax simulation is validated. Subsequently, the samples generated by gprMax forward simulation are used to train the network to detect objects in existing real samples. At the same time, aiming at the detection and intelligent recognition of road sub-surface defects, the Swin-YOLOX algorithm is introduced, and the excellence of the detection network, which is improved by augmenting the real samples with simulated samples, is further verified. By comparing the prediction performance of the object detection models, it is observed that the model trained with mixed samples achieves a recall of 94.74% and a mean average precision (mAP) of 97.71%, surpassing the model trained only on real samples by 12.95% and 15.64%, respectively. The feasibility and excellence of training the model with mixed samples are thus confirmed. This study demonstrates the potential of using a fusion of simulated and existing real samples instead of repeatedly acquiring new real samples by field experiment, thereby improving detection efficiency, saving resources, and providing a new approach to the problem of multiple interpretations in ground penetrating radar (GPR) data.
1 Introduction
In recent years, road collapse accidents in China have become more frequent and severe due to the increasing traffic burden and the impact of various environmental factors. From January 2019 to January 2022, there were more than 717 road collapse accidents nationwide, resulting in a total of 108 deaths and 174 injuries [2]. Due to their low efficiency, traditional road inspection techniques are unable to meet the requirements of modern road construction and can also cause destructive damage to road structures. Ground penetrating radar (GPR) has become the main method for detecting sub-surface defects in roads due to its high efficiency and convenience [3]. However, due to the lack of objective criteria for evaluating abnormal GPR images and the ambiguous and subjective nature of radar data interpretation, technical personnel often make incorrect judgments and miss detections when analyzing gray-scale images. In addition, manual interpretation of defects is time-consuming, resulting in longer detection cycles and inability to meet increasing engineering demands [4]. Tong et al. [5] have demonstrated that manual defect detection gives poor results due to various interfering factors, such as noise, blurriness and distortion that exist in GPR data. Therefore, it is necessary to use other analytical methods for defect detection. Recently, based on the abundance of GPR detection data, the application of deep learning and other algorithms has made some progress in road defect detection [6].
However, deep learning detection algorithms often require input of a large number of training samples [7]. If following traditional practices, obtaining real training samples typically requires an extensive amount of fieldwork and experimentation, which consumes significant human resources, material resources, and time [8]. In the current stage of development of computer vision and image processing technologies, more and more researchers are applying sample-generation techniques such as generative adversarial networks [9] and 3D reconstruction to obtain new images and to achieve image enhancement, or are using methods such as Poisson fusion [10] to merge a large number of numerically simulated images with real images. These approaches expand the data set, improve image quality, enhance data diversity, optimize intelligent detection methods [11] and improve practical application of the techniques in real-world engineering scenarios [12]. However, these methods still have limitations, such as limited diversity of the generated samples [13]. To further improve the generalization ability of deep learning detection methods, it is necessary to consider other sample generation methods. Many researchers have conducted relevant studies on this issue. For example, regarding typical road defects, the gprMax software [14] (open-source software developed by Professor Giannopoulos from the University of Edinburgh) [15] has been used to perform forward simulations of 3D GPR for inflated and water-filled cavity defects. Giannopoulos and colleagues compared their simulation results with engineering examples to verify the reliability of the forward simulation. However, they did not combine the forward simulation with deep learning to address the issue of insufficient training samples. 
Therefore, in this study, the feasibility of gprMax forward simulation is investigated and verified by modifying model parameters to obtain simulated training samples which are compared with real experimental samples. The obtained training samples are then fed into a detection network for learning and training in order to address the issue of insufficient data sets for sub-surface road defects.
2 Methodology
In this section, the GPR working principle and the relevant theory behind gprMax forward modeling are introduced. In addition, the network architecture of the YOLOX object detection algorithm based on the Swin Transformer is introduced in detail. This serves as a theoretical reference for the subsequent detection experiments.
2.1 Forward modeling based on gprMax
2.1.1 Working principle of ground penetrating radar
GPR, also known as georadar [16], is a system that uses radar waves to obtain information about sub-surface structural layers and to determine the presence of road sub-surface defects. The GPR system consists of two main components: the transceiver system and the data acquisition and imaging system. The transceiver system, which is the core of the GPR system, is responsible for emitting electromagnetic waves and receiving their reflections, and transmitting the resulting electric signals (data) to the signal processor. The data acquisition and imaging system, which consists of a signal processor and a host, is mainly responsible for processing and analyzing the data to image the sub-surface features [17].
During the detection process, the GPR system sets parameters such as start-stop time, stacking number, and transmission frequency through the corresponding acquisition software on the display unit. When the emitted electromagnetic waves encounter underground anomalous objects, these objects, whose properties differ from those of the general sub-surface medium, cause the waves to scatter and reflect. The reflected, or echo, signal is received by the receiving antenna of the GPR, as shown in Fig.1(a). Based on the waveform characteristics represented by the GPR echo signals (Fig.1(b)), the detection of underground anomalies can be achieved [18].
To ensure the effectiveness of GPR, relevant operational parameters need to be set according to the detection requirements.
1) Matching the electromagnetic wave velocity of the medium (v) and its relative permittivity (εr):
Because media differ in relative permittivity and conductivity, the propagation characteristics of electromagnetic waves also vary between media. The penetration depth of radar electromagnetic signals depends on the medium, specifically on its conductivity: the higher the conductivity of the medium, the shallower the penetration depth of the electromagnetic waves. Therefore, prior to detection, it is often necessary to carry out core drilling verification or pre-testing to determine the velocity v of electromagnetic waves in the medium. The relative permittivity εr of the medium can then be calculated using Eq. (1) to adjust and ensure the accuracy of the parameters used by the GPR, thus obtaining a more precise analysis:

v = c/√εr, (1)

where c is the speed of light in a vacuum.
2) GPR resolution
GPR resolution can be divided into vertical and horizontal resolution, representing the smallest scales that the radar can distinguish in the vertical and horizontal directions. In practical applications, the vertical resolution (ΔRv) and horizontal resolution (ΔRh) can be calculated using the following formulae:

ΔRv = λ/4, (2)

ΔRh = √(λZ/2), (3)

where λ represents the wavelength in the medium and Z represents the burial depth of the target.
3) Antenna center frequency (f)
The GPR emits electromagnetic waves with a center frequency between 10 and 5000 MHz. Different antenna frequencies correspond to different detection scales. When detecting sub-surface defects in roads, the center frequency of the antenna can first be chosen by experience or by using Eq. (4), and then the available antenna with the closest frequency can be selected. In the equation, ΔRv is the vertical resolution and ΔRh is the horizontal resolution.
4) Time window (W)
The time window W refers to the delay time for receiving data during a single GPR scan. If the time window is too small, the radar waves may not reach the target to be detected, resulting in no detection results. Conversely, if the time window is too large, the energy of the electromagnetic waves will be continuously attenuated, which may result in a lack of reflected signals in the later part of the detection process. Typically, the time window can be determined using the following formula:

W = 2KH/v,

where H is the maximum detection depth, v is the electromagnetic wave velocity in the medium, and K is a weighting factor typically taken as 1.3 to 1.5.
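The survey-design relations discussed in this section can be collected into a small helper. This is a minimal sketch using the standard textbook forms (v = c/√εr, vertical resolution λ/4, horizontal resolution √(λZ/2), time window 2KH/v); the permittivity and depth in the example are illustrative values, not measurements from this study:

```python
import math

C = 3.0e8  # speed of light in a vacuum (m/s)

def gpr_parameters(eps_r: float, f_hz: float, depth_m: float, k: float = 1.4):
    """Estimate basic GPR survey parameters for a target at depth_m metres."""
    v = C / math.sqrt(eps_r)              # wave velocity in the medium, Eq. (1)
    lam = v / f_hz                        # wavelength in the medium
    dr_v = lam / 4.0                      # vertical resolution
    dr_h = math.sqrt(lam * depth_m / 2)   # horizontal resolution
    w = k * 2.0 * depth_m / v             # time window with weighting factor k
    return {"v": v, "wavelength": lam, "res_v": dr_v, "res_h": dr_h, "window_s": w}

# Illustrative case: dry sand (eps_r ~ 4) with a 600 MHz antenna, target at 1 m.
params = gpr_parameters(eps_r=4.0, f_hz=600e6, depth_m=1.0)
```

For these example values the wave velocity is 1.5e8 m/s and the wavelength 0.25 m, giving a vertical resolution of about 6 cm.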
When GPR is used to detect sub-surface defects in roads, continuous measurements are often made, with a large number of electromagnetic pulses being emitted and received; the superimposed waveform graphs (A-Scans) are then less effective and more difficult to interpret [19]. Therefore, in practical detection, waveform graphs are often transformed into greyscale images (B-Scans). Fig.2 explains the principle of this transformation: the amplitude of a signal is represented by the intensity of the color, where darker colors indicate stronger reflected-wave amplitudes, and the thickness of the color blocks represents the frequency of the signals, where thicker blocks indicate lower frequencies of the reflected-wave signals, and vice versa. Fig.3 shows the greyscale image obtained after transforming the waveform graph, which provides a more intuitive and clearer indication of abnormal conditions in the road sub-surface structure.
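The amplitude-to-grey mapping described above can be sketched as a simple min-max normalization. This is a simplified illustration only; practical processing chains also apply gain, filtering, and static correction before imaging:

```python
import numpy as np

def bscan_to_grayscale(traces: np.ndarray) -> np.ndarray:
    """Map a B-scan matrix (samples x traces) of signed amplitudes to 8-bit grey levels."""
    a = traces.astype(np.float64)
    a_min, a_max = a.min(), a.max()
    if a_max == a_min:
        return np.zeros(a.shape, dtype=np.uint8)
    scaled = (a - a_min) / (a_max - a_min)   # normalize amplitudes to [0, 1]
    return (scaled * 255.0).round().astype(np.uint8)

# Example: two tiny synthetic A-scans stacked side by side as columns.
demo = np.array([[0.0, 1.0], [-1.0, 0.5]])
img = bscan_to_grayscale(demo)  # the strongest amplitude maps to 255, the weakest to 0
```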
2.1.2 Theory of gprMax forward modeling
GprMax forward modeling refers to the computer simulation of the propagation of electromagnetic waves in sub-surface media using electromagnetic wave theory and relevant software [20]. By studying and analyzing the anomaly-related information carried by reflected waves, the simulation results can be compared and validated against actual GPR images, thereby improving the interpretation accuracy of GPR results. When the electromagnetic field of GPR is studied, the finite difference solution is usually computed step by step over a certain time range. After years of research, the theoretical basis and practical application of the finite difference time domain (FDTD) method [21] have matured. The FDTD solution is based on Maxwell's curl equations [22,23].
GprMax is an open-source software for electromagnetic wave simulation. Due to its high computational efficiency, convenient parameter configuration, and realistic simulation results, gprMax is well suited for forward modeling in GPR applications for road sub-surface structure detection. GprMax is primarily written in Python 3, so when using the software, Python commands are used to perform the desired tasks. Before performing forward modeling with gprMax, a simulation model needs to be created based on the research content, including the model’s dimensions, spatial resolution, infill materials, radar movement step size, and other parameter settings. Once the parameters are determined, they are written into an input file using Python. Running and calculating the input model will generate output files with the OUT extension containing parameters such as those related to reflection signals, sampling rates, and antenna frequencies.
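The input-file workflow can be illustrated with a short script that writes a minimal model description. The command keywords (#domain, #dx_dy_dz, #material, and so on) follow gprMax's documented input syntax, but every numeric value and identifier here is an illustrative assumption, not a parameter from this study:

```python
# Hedged sketch: assembling a minimal 2D gprMax input file from Python.
from pathlib import Path

lines = [
    "#title: cavity_demo",
    "#domain: 2.0 1.0 0.002",           # model size in metres (one cell thick -> 2D)
    "#dx_dy_dz: 0.005 0.005 0.002",     # spatial resolution
    "#time_window: 2.0e-8",             # time window in seconds
    "#material: 6 0.001 1 0 soil",      # eps_r, sigma, mu_r, sigma*, name
    "#box: 0 0 0 2.0 0.8 0.002 soil",   # sub-surface half-space
    "#cylinder: 1.0 0.4 0 1.0 0.4 0.002 0.08 free_space",  # air-filled cavity
    "#waveform: ricker 1 600e6 src_pulse",
    "#hertzian_dipole: z 0.1 0.85 0 src_pulse",  # transmitter above the soil
    "#rx: 0.14 0.85 0",                 # receiver offset from the transmitter
    "#src_steps: 0.01 0 0",             # move source between A-scans
    "#rx_steps: 0.01 0 0",              # move receiver between A-scans
]
Path("cavity_demo.in").write_text("\n".join(lines) + "\n")
```

Running gprMax on such a file (for example, python -m gprMax cavity_demo.in -n 180 for 180 A-scans) produces the OUT files mentioned above, which can then be merged into a B-scan.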
2.2 Relevant object detection networks
Swin Transformer [24] represents a breakthrough in the field of computer vision. It has achieved remarkable performance in various visual tasks, including object detection and instance segmentation. In the domain of object detection algorithms, one-stage detectors include YOLO, SSD, RetinaNet, and CenterNet, while two-stage detectors include R-CNN, Faster R-CNN, Mask R-CNN, and RepPoints. Among one-stage detectors, the YOLO series is one of the most popular models, striking a balance between high detection speed and accuracy. It has been successfully applied in fields such as agriculture, geology, remote sensing, and medicine.
2.2.1 YOLOX detection model
YOLOX combines the cross stage partial darknet (CSPDarknet) feature extraction architecture, focus technique, mosaic data augmentation, and anchor-free concept [25]. It innovatively introduces decoupled prediction heads and SimOTA dynamic positive matching method. As a result, it achieves faster speed, higher recognition accuracy, and a more lightweight model. It can be easily deployed on low-power mobile devices, making it highly valuable for advanced driver-assistance systems.
As shown in Fig.4, the YOLOX model consists of the CSPDarknet backbone for feature extraction, the enhanced feature pyramid networks (FPN) as the neck, with the YOLO head for classification and regression [26]. The backbone primarily consists of three main modules: Focus, CSPNet, and spatial pyramid pooling (SPP) network. The model first performs a slicing operation (feature extraction) on an input image within the backbone network. It samples from the entire image at equal intervals of space to obtain multiple appropriately sized sampling images. Then, it combines the images in the channel dimension, transferring the information from the image to the channel space, resulting in downsampled images without loss of information. CSPNet splits the original stack of residual blocks into two parts: the backbone part continues with the original stack of residual blocks, while the other part, acting as a residual edge, undergoes minimal processing and connects directly to the next layer.
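The slicing operation described above can be sketched with plain array indexing. This is a schematic of the Focus idea only, not the YOLOX implementation itself:

```python
import numpy as np

def focus_slice(x: np.ndarray) -> np.ndarray:
    """Sample every second pixel in four phases and stack the results along
    the channel axis: (H, W, C) -> (H/2, W/2, 4C), discarding no pixels."""
    return np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )

feat = np.arange(4 * 4 * 3).reshape(4, 4, 3)
out = focus_slice(feat)  # shape (2, 2, 12): downsampled with no information loss
```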
The three effective feature layers extracted by CSPDarknet are fed into path aggregation feature pyramid network (PAFPN) for the next network construction. Multi-scale feature fusion and further feature extraction are performed by upsampling and downsampling. YOLOHead analyses coordinates of the corresponding target bounding box, and assesses confidence level and class scores based on the channel information of the feature points in the feature map. The final detection results are determined by non-maximum suppression [27]. YOLOX ingeniously integrates CSPDarknet, PAFPN, decoupled heads, data augmentation, label assignment, and anchor-free mechanism, surpassing other algorithms in the YOLO series in terms of detection speed and accuracy.
2.2.2 Swin Transformer network
Swin Transformer uses a hierarchical structure similar to that commonly used in convolutional neural networks [28]. An important feature of convolutional neural networks is that pooling operations with a certain stride increase the receptive field of each convolutional kernel; the features extracted after each pooling can thus capture objects of different sizes, and the receptive field of the nodes expands as the network deepens. Similarly, Swin Transformer introduces a pooling-like operation called Patch Merging. At each stage (except for Stage 1), the first step is to perform downsampling using a patch merging layer. As shown in Fig.5, assuming the input to the Patch Merging layer is a single-channel feature map of size 4 × 4, patch merging divides each 2 × 2 group of adjacent pixels into a patch. Then, the pixels at corresponding positions (same color) within each patch are concatenated to form four feature maps. These four feature maps are concatenated along the depth dimension and passed through a LayerNorm layer. Finally, a fully connected layer performs a linear transformation in the depth dimension of the feature map, reducing the depth from 4C to 2C. From this simple example, it can be observed that after the patch merging layer, the height and width of the feature map are halved while the depth is doubled, yielding multi-scale features.
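The Patch Merging step can be sketched numerically. This is a schematic with NumPy; the real layer also applies LayerNorm before the linear projection, which is omitted here, and the weights below are random placeholders:

```python
import numpy as np

def patch_merging(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """(H, W, C) -> (H/2, W/2, 2C): gather each 2 x 2 patch into 4C channels,
    then reduce the depth with a linear layer (LayerNorm omitted in this sketch)."""
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )  # shape (H/2, W/2, 4C)
    return merged @ weight  # weight has shape (4C, 2C)

rng = np.random.default_rng(0)
c = 1
x = rng.standard_normal((4, 4, c))          # single-channel 4 x 4 map, as in Fig.5
weight = rng.standard_normal((4 * c, 2 * c))
y = patch_merging(x, weight)                # height and width halved, depth doubled
```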
Swin Transformer consists of two successive Swin Transformer Blocks, which are the core components of this algorithm. The Swin Transformer Block uses two modules, namely window multi-head self-attention (W-MSA) and shifted-window multi-head self-attention (SW-MSA) [29], to achieve hierarchical transformers. This enables the extraction of multi-scale image features, as a CNN does, making Swin Transformer suitable as a backbone network [30] for various visual downstream tasks such as object detection and image segmentation. As shown in Fig.6, information first passes through the left module, consisting of a Layer Norm, a W-MSA unit, a multi-layer perceptron (MLP) layer, and skip connections, to perform window-based multi-head self-attention calculation. The information then passes through the right module, consisting of a Layer Norm, an SW-MSA unit, an MLP layer, and skip connections, to perform shifted-window multi-head self-attention calculation. The input and output dimensions of both modules are the same, and they are directly concatenated.
Swin Tiny (referred to as Swin-T) is a Swin Transformer model with a computational complexity similar to that of ResNet-50 [31]. Swin-T is similar to Darknet-53 and ResNet-50 in that different layers can output features at different scales, and their numbers of model parameters and floating-point operations are also very close. As shown in Fig.7, the Swin-T model is built with four groups of Swin Transformer blocks (Stages 1–4), using the Swin Transformer Block configuration (2, 2, 6, 2), with downsampling factors of (4, 8, 16, 32). The input image (H, W, 3) first passes through a patch partition layer, which divides it into 4 × 4 patches, transforming the image data dimension to (H/4, W/4, 48). The image data are then transformed by a Linear Embedding layer, which doubles the number of channels, resulting in dimensions of (H/4, W/4, 96). In Stage 1, the Swin Transformer Block performs multi-head self-attention calculation and feature extraction based on a 7 × 7 window, producing feature maps with unchanged dimensions of (H/4, W/4, 96). In the subsequent process, each Patch Merging layer divides the input feature map into 2 × 2 patches and doubles the number of channels, achieving downsampling of the feature map. The dimensions of the feature maps are successively transformed to (H/8, W/8, 192), (H/16, W/16, 384), and (H/32, W/32, 768), in a manner similar to pooling operations in CNNs. The Swin Transformer block groups in Stages 2–4 only extract features from the image data without changing the dimensions of the feature maps.
2.3 Transfer learning
Transfer learning consists of two main concepts: domain and task. The domain is the subject of learning, which includes the source domain and the target domain. The source domain represents a domain where training is completed, containing experienced knowledge and unlabeled data. In contrast, the target domain represents a domain where no prior training has occurred, and it lacks experienced knowledge and labeled data. In essence, transfer learning leverages the experienced knowledge from the source domain to enhance the learning capability and efficiency of the target domain, enabling it to better perform the intended tasks.
Due to the scarcity of road sub-surface defect data, most source domains nowadays consist of common image samples, such as animals and plants. Therefore, in this study, transfer learning methods are employed to train the detection model, addressing the challenge of limited data. Pre-trained network models are transferred to the sub-surface defect recognition model and through iterative learning and training, the number of training parameters is significantly reduced, resulting in smaller initial loss values and faster convergence rates. This approach improves training stability and ease of debugging. Ultimately, it produces a model with high accuracy, stability, and strong generalization ability for road sub-surface defect recognition.
3 Road sub-surface defect detection and recognition algorithm based on Swin-YOLOX and simulated sample generation based on gprMax
The road sub-surface defect detection and recognition algorithm based on Swin-T (referred to as the Swin-YOLOX algorithm) introduces an attention mechanism by replacing the CSPDarknet backbone network of the original YOLOX model with Swin-T for the extraction of road sub-surface defect features [32]. Then, the feature maps from the Stages 2–4 blocks of Swin-T are passed through PAFPN for multi-scale feature fusion. Finally, the gprMax forward simulation method is used to expand the road sub-surface defect data set to increase the number of positive samples for defect detection.
3.1 YOLOX network model based on Swin-T
The hierarchical transformer module of the Swin-T model can extract features at different scales. The Swin-T model’s shifted-window strategy is beneficial for capturing global contextual information in images, and its multi-head attention mechanism learns task-specific relevant information in different representation subspaces. Based on these advantages, the CSPDarknet backbone network in the original YOLOX model is replaced by Swin-T to extract road sub-surface defect features that possess rich global contextual information and differential features. As shown in Fig.8, Swin-T is positioned as the backbone network in YOLOX. The input image first passes through a Patch partition layer, reducing the width and height of the image to 1/4 of the original size and increasing the depth of the image from 3 to 48. Then, the subsequent Stages 2–4 blocks output feature maps of 80 × 80, 40 × 40, and 20 × 20 scales, respectively, with feature channel numbers of 192, 384, and 768.
As the network deepens, the semantic information of the features changes from low-dimensional to high-dimensional. Each network layer causes a certain degree of feature information loss, so it is necessary to fuse features from different levels to complement the semantic information [33]. The PAFPN, based on the idea of bidirectional FPN for cross-scale feature fusion, uses the FPN + PANet structure to extract different scale features from the backbone network and combines bottom-up and top-down channels to enhance the representation capability of the backbone network. As shown in the middle part of Fig.8, the three effective feature layers obtained from Swin-T are used to construct the PAFPN network. These three effective feature layers are combined by convolution, upsampling, downsampling, and lateral connections to achieve feature fusion, enabling the three feature maps from different levels to share semantic information. The feature maps of 20 × 20 and 40 × 40 generated by the Stages 4 and 3 blocks of the Swin-T backbone network are convolved and upsampled to match the scale of the previous layer. Then, the feature maps of the same scale are laterally connected to achieve feature fusion, and, after convolution operations, feature maps P3 and P4 are obtained. These are then convolved and downsampled to generate feature maps of the same scale as P4 and P5, and then laterally connected to achieve feature fusion. After convolution, feature maps N3, N4 and N5 are obtained, with dimensions of 20 × 20 × 768, 40 × 40 × 384, and 80 × 80 × 192, respectively. Adaptive feature pooling selects different feature maps for different object predictions, avoiding the rigid matching of target size and network depth.
Using the channel information of the feature points in the feature maps N3, N4, and N5, the task is divided into classification and regression parts to analyze target coordinates, estimate confidence, and identify target types, and finally the results are integrated to complete the object prediction. The loss function consists of the localization prediction loss L_loc, the confidence prediction loss L_conf, and the classification prediction loss L_cls. L_loc is used to adjust the regression parameters of the predicted boxes corresponding to the feature points, while L_conf and L_cls use the binary cross-entropy loss: L_conf adjusts whether the feature points are contained within the predicted boxes, and L_cls adjusts the category of the feature points contained within the predicted boxes. The three YOLOHead modules calculate the feature point loss function based on Eq. (8):

Loss = (λ·L_loc + L_conf + L_cls)/N_pos, (8)

where λ represents the balance coefficient for the localization loss, and N_pos represents the number of feature points classified as positive samples.
Finally, the predicted results for the input image include the coordinates of the predicted boxes (bx, by, bw, bh) and the confidence of the predicted boxes (pc). Here, (bx, by) represents the offset of the center of the predicted box relative to the origin of the coordinate system in the upper-left corner of the predicted image, while bw and bh represent the ratios of the width and height of the predicted box to those of the whole image. The confidence pc combines two aspects: the probability that the predicted box contains an object, denoted as Pr(object), and the accuracy of the predicted box, which can be represented by the intersection over union (IOU) between the predicted box and the ground truth. When the box does not contain an object, Pr(object) is 0, and when it contains an object, Pr(object) is 1. Therefore, the confidence can be defined as pc = Pr(object) × IOU.
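The confidence definition can be made concrete with a small helper. This is a sketch only: boxes are given here as corner pairs (xmin, ymin, xmax, ymax) for simplicity, whereas the network itself predicts center offsets and scales:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence(p_object: float, box_pred, box_truth) -> float:
    """pc = Pr(object) x IOU(pred, truth), as defined in the text."""
    return p_object * iou(box_pred, box_truth)

# Two unit examples: partially overlapping boxes, then a perfect prediction.
pc = confidence(1.0, (0, 0, 2, 2), (1, 0, 3, 2))  # IOU = 2/6
```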
3.2 Simulated sample generation based on sub-surface defect experiment
Although the YOLOX model based on Swin-T has shown outstanding performance in object detection, it heavily relies on the quantity of samples. In cases where there is a scarcity of road sub-surface defect samples, prediction results are poorer and fail to achieve the desired detection goals. Therefore, it is very desirable to obtain simulated training samples through forward simulations using gprMax to better train the deep learning model.
To validate the accuracy and feasibility of the simulated training samples, a series of ideal cavity experiments is conducted in this study. These experiments simulate the ideal conditions of GPR detection work and set up well-defined cavities to extract the sample data. The experiments use a horizontal model where the road surface layer is used as the boundary, and sand is filled inside the boundary. Real soil cavities and artificially embedded defects are used to create sub-surface cavity defects. Moreover, a dual-frequency GPR is used in the experiment, with two antennas providing more comprehensive and detailed detection results. The sand used in this experiment is compacted sand commonly used in roadbeds, and is an ideal material with good scattering characteristics and a low dielectric constant, making it suitable for radar detection.
The model and the on-site experimental layout are shown in Fig.9 and Fig.10, respectively. The gray structural wall in the figures simulates the road surface, and the internal space simulates the roadbed. The internal filling material of the shielding structure is sand, compacted manually by pounding. The surrounding materials mainly consist of the same materials as the road surface, using 30 cm of cement-stabilized crushed stone and 15 cm of cement concrete or asphalt concrete, to simulate actual cement and asphalt road surfaces. Additionally, to maintain the stability of the defective soil during the simulation, a small amount of water is added to the roadbed material inside the shielding structure. Two positions are identified, named position 1 (1#) and position 2 (2#). The focus of this experiment is to study the effectiveness of GPR in detecting sub-surface cavity defects. The experiment is divided into three parts. Part A: two cavities are manually created at positions 1 and 2 without any further treatment, and GPR measurements are then taken several times. Part B: the plane size of the existing cavities is enlarged, and the GPR measurements are repeated. Part C: a square acrylic box is placed inside the cavity at position 1 and filled with sand, while the cavity at position 2 remains unchanged. The experimental details are shown in Tab.1.
The collected GPR data are imported into Reflexw software (Beijing Dipper Technology Service Co., Ltd., China) for preprocessing, including static correction, filtering, and other processing steps. The data are then converted to B-scan images and exported. The exported images are shown in Fig.11.
To validate the accuracy of the simulated samples generated by gprMax, a numerical model of the same size, materials, and shape as the experimental site is created using Python code. Following the high-frequency antenna used in Part A, the radar antenna frequency is set to 600 MHz. The radar simulation is then run along the same survey line as in Part A, obtaining a simulated B-scan. The simulated B-scan is compared with the real B-scan in Fig.12.
By comparing the greyscale images obtained from the simulation and those collected in the experiment, it can be observed that the simulated images closely match the waveform and location of the actual GPR images. This indicates that the simulated samples have a high level of accuracy and can be combined with real samples in the training set.
Generating simulated samples requires the design of various parameters, including the model's dimensions, spatial resolution, time window size, medium properties, and the sizes and positions of objects within the model. The proposed model range is 12 m × 3.05 m with a spatial resolution of 0.005 m. The details of the medium properties, defect sizes, and positions are given in Tab.2.
Once the designed parameters are incorporated into the Python code for running gprMax, simulated images can be generated in batches. These simulated images not only increase the sample quantity but also preserve the characteristics of the defects, thus providing better training results for subsequent object detection networks. Examples of road sub-surface defect images after augmentation are shown in Fig.13. The number of defect samples has approximately doubled after augmentation. Images with indistinct defect features and repetitive samples are removed, resulting in a total of 880 final simulated images.
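Batch generation can be sketched as a parameter sweep that writes one input file per defect configuration. The file names, cavity radius, depths, and positions below are illustrative assumptions, not the actual sweep used to produce the 880 samples:

```python
# Hedged sketch: batch-writing gprMax input files over a grid of cavity positions.
from pathlib import Path

def cavity_model(depth: float, x_centre: float) -> str:
    """Return a minimal gprMax input file for one air-filled cavity (2D model)."""
    return "\n".join([
        "#domain: 12.0 3.05 0.002",
        "#dx_dy_dz: 0.005 0.005 0.002",
        "#time_window: 6e-8",
        "#material: 6 0.001 1 0 soil",
        "#box: 0 0 0 12.0 2.8 0.002 soil",
        f"#cylinder: {x_centre} {2.8 - depth} 0 {x_centre} {2.8 - depth} 0.002 0.1 free_space",
        "#waveform: ricker 1 600e6 src",
        "#hertzian_dipole: z 0.1 2.85 0 src",
        "#rx: 0.14 2.85 0",
    ]) + "\n"

out_dir = Path("sim_inputs")
out_dir.mkdir(exist_ok=True)
count = 0
for depth in (0.5, 1.0, 1.5):          # cavity burial depths (m), illustrative
    for x_centre in (3.0, 6.0, 9.0):   # horizontal cavity positions (m), illustrative
        name = f"cavity_d{depth}_x{x_centre}.in"
        (out_dir / name).write_text(cavity_model(depth, x_centre))
        count += 1
```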
4 Road sub-surface defect detection experiment
The detection of road sub-surface defects based on GPR can be categorized into two main types: signal interpretation and image recognition. Considering the data format of GPR detection results, signal interpretation focuses primarily on one-dimensional data (A-Scans), while image recognition is mainly concerned with two-dimensional data (B-Scans). Deep learning intelligent recognition methods based on B-Scans, especially object detection algorithms, have gained significant popularity [34].
4.1 Annotation and processing of the sample data set
The radar images of defects are annotated using the open-source graphical labeling tool Labelme. Rectangular bounding boxes are used to mark the sub-surface defects in the B-Scan images, producing JSON-format files containing the label information corresponding to each bounding box. The label information includes the path of the marked image and the coordinates of the two points on the diagonal of the bounding box, indicating the position and size of the marked defect. Based on the 880 simulated samples collected in the previous section, all the sample images are labeled. An example of the marked images and the obtained labeling information is shown in Fig.14.
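Reading the label information back from a Labelme-style JSON file can be sketched as follows. The record below is a constructed example mirroring Labelme's "shapes" list, not actual annotation data from this study:

```python
import json

record = json.loads("""
{
  "imagePath": "bscan_001.png",
  "shapes": [
    {"label": "cavity", "shape_type": "rectangle",
     "points": [[120.0, 45.0], [260.0, 130.0]]}
  ]
}
""")

def rect_to_bbox(shape: dict):
    """Two diagonal points -> (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = shape["points"]
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

# Collect every rectangle annotation as (label, bounding box).
boxes = [(s["label"], rect_to_bbox(s)) for s in record["shapes"]
         if s["shape_type"] == "rectangle"]
```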
Based on the number of samples obtained (see the previous section) and the training requirements, the samples are divided into three parts: a training set, a validation set, and a test set. Two Swin-YOLOX models are trained: one using only real samples as the training set, and the other using a mixture of real and simulated samples. In addition, for a meaningful comparison of the two models’ performances, both models use the same network parameters. The specific sample quantities for each data set are shown in Tab.3.
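The partition into training, validation, and test sets can be sketched as a seeded shuffled split; the 8:1:1 ratio below is an illustrative assumption rather than the paper’s exact proportions (those are given in Tab.3):

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and partition samples into train/val/test sets.

    A fixed seed keeps the split reproducible, so the two models being
    compared see identical data partitions.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```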
4.2 Experimental relevant configuration
In terms of the training environment, the Swin-YOLOX model is configured on a deep learning server running the Ubuntu 20.04.5 operating system, with Python 3.9.12 and PyTorch 1.13.7, and GPU acceleration via CUDA 11.7. The YOLOX and Swin-T parts of the Swin-YOLOX model are initialized by transfer learning, using pretrained YOLOX network weights and Swin-T weights trained on ImageNet, respectively. The two Swin-YOLOX models use the same network parameter settings, as shown in Tab.4.
4.3 Model evaluation index
The evaluation indexes selected in this study to comprehensively assess the training effectiveness and detection performance of the model mainly include precision (P), recall (R), and mean average precision (mAP). Precision and recall are fundamental performance evaluation indexes in object detection models. Specifically, P represents the ratio of correctly detected defect images to all detected defect images, while R represents the ratio of correctly detected defects to all defects. The expressions for these indexes are as follows:

P = TP / (TP + FP)
R = TP / (TP + FN)
where TP is the number of true positives, which refers to the number of images correctly predicted as defects by the model; FP is the number of false positives, which refers to the number of non-defect images incorrectly predicted as defects; and FN is the number of false negatives, which refers to the number of defect images incorrectly predicted as non-defects.
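The definitions above, P = TP/(TP + FP) and R = TP/(TP + FN), translate directly to code:

```python
def precision(tp, fp):
    """P = TP / (TP + FP): fraction of detected defects that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """R = TP / (TP + FN): fraction of real defects that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0
```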
mAP is a comprehensive evaluation index that combines precision and recall. It is essentially the mean of the average precision (AP) values over all classes, where each AP value is calculated as the area under the precision−recall (P−R) curve. These can be expressed as follows:

AP = ∫₀¹ P(R) dR
mAP = (1/N) Σᵢ₌₁ᴺ APᵢ
where N is the number of classes. The mAP value ranges from 0 to 1, with higher values indicating greater detection accuracy.
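The area-under-the-P−R-curve calculation can be sketched with every-point interpolation, where precision is first replaced by its running maximum from the right so the envelope is monotonically non-increasing; mAP then averages the per-class AP values. This is a generic sketch of the standard computation, not the exact implementation used in the paper.

```python
def average_precision(recalls, precisions):
    """AP as the area under the P-R curve (every-point interpolation).

    recalls must be sorted in increasing order; precision is first made
    monotonically non-increasing from the right before integrating.
    """
    r = [0.0] + list(recalls)
    p = [0.0] + list(precisions)
    for i in range(len(p) - 2, -1, -1):     # build the precision envelope
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))

def mean_average_precision(ap_per_class):
    """mAP = (1/N) * sum of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```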
4.4 Experimental results and analysis
Fig.15 and Fig.16 show the regression loss curves and classification loss curves of the first Swin-YOLOX model; Fig.17 and Fig.18 show those of the second Swin-YOLOX model. train_loss_bbox and val_loss_bbox represent the regression losses of the training and validation sets, used to evaluate the bounding box localization; train_loss_cls and val_loss_cls represent the corresponding classification losses, used to evaluate the classification accuracy. The loss value represents the error between the predicted values and the ground truth values; a lower loss indicates better training performance of the model.
It can be observed that both models converge relatively quickly and the curves flatten out. However, compared to the Swin-YOLOX model trained only with real samples, the Swin-YOLOX model trained with a mixture of real and simulated samples showed a faster decrease in loss, eventually reaching almost zero. This indicates that the latter model has more stable network performance and better detection results.
The P−R curves provide a comprehensive and intuitive representation of the model’s detection performance in terms of recall and average precision. Fig.19 and Fig.20 show the P−R curves of the two training models, with recall on the x-axis and precision on the y-axis.
By comparing the two plots, it can be seen that the mixed-sample model significantly outperforms the real-sample model in terms of mAP. Tab.5 shows the recall and mAP for each defect category in the two training models. The mAP and recall of the real-sample model are 82.07% and 81.79%, respectively, while those of the mixed-sample model are 97.71% and 94.74%, an increase of 15.64% and 12.95% over the real-sample model. Overall, the mixed-sample training model shows higher recall and mAP, indicating superior classification and recognition performance. Therefore, it can be concluded that the mixed-sample model has stronger feature extraction capabilities and better overall performance than the real-sample model.
To further evaluate the detection performance of the proposed algorithm for road sub-surface defects, the algorithms selected for comparison include not only the one-stage object detection algorithms YOLOv3 and YOLOv5 and the original YOLOX algorithm, but also the classical two-stage algorithm Faster R-CNN. Each of these algorithms was trained on the mixed-sample training data set (as shown in Tab.3), and the trained models were then tested on the same mixed-sample test set shown in Tab.3. To objectively evaluate the performance of these object detection algorithms in detecting road sub-surface defects, they were assessed using the evaluation indexes of Subsection 4.3 together with the frames per second (FPS) index.
Tab.6 shows the performance results of the five object detection algorithms for detecting road sub-surface defects. YOLOv3 (Darknet-53) represents the YOLOv3 algorithm based on the Darknet-53 backbone network, while YOLOX (CSPDarknet) represents the YOLOX algorithm based on the CSPDarknet backbone network.
From Tab.6, it can be observed that the overall detection accuracy of the one-stage detection algorithms (YOLOv3, YOLOv5, and YOLOX) is not as high as that of the two-stage detection algorithm, Faster R-CNN, but their detection speed is significantly faster, by more than 50% (see the FPS data in Tab.6). The YOLOX series of algorithms, owing to the integration of multiple techniques, outperforms the other algorithms in detection speed; its detection accuracy is better than that of YOLOv3 (Darknet-53) but slightly lower than that of Faster R-CNN. The Swin-YOLOX model in this study outperforms the YOLOX (CSPDarknet) model in detection accuracy, achieving an mAP@0.5 of 97.71%. It lags slightly in detection speed but offers the best overall performance, which is generally sufficient to meet real-time detection requirements.
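The FPS index used above is simply the number of images processed per unit time. A generic timing sketch, with a placeholder inference callable standing in for any of the trained detectors, could look like:

```python
import time

def measure_fps(infer_fn, images, warmup=2):
    """Measure frames per second of an inference callable.

    infer_fn is any function taking one image; a few warm-up calls are made
    first so one-off setup cost (e.g., lazy model initialization) does not
    skew the timing.
    """
    for img in images[:warmup]:
        infer_fn(img)
    start = time.perf_counter()
    for img in images:
        infer_fn(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```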
The experimental results show that CSPDarknet can only model local information and lacks global spatial context in its extracted features. In contrast, the Swin-YOLOX model in this study uses Transformer modules to obtain richer global context and more discriminative feature information, which significantly improves the overall detection performance of YOLOX for road sub-surface targets.
4.5 Detection result images
Fig.21 illustrates some detection results of mixed-sample trained models. It is evident that the model can accurately identify and locate the defects, with the predicted labels almost overlapping with the ground truth labels. This demonstrates that the model is capable of meeting the requirements of engineering detection applications.
5 Discussion and conclusions
5.1 Discussion
This study focuses on the characteristics, intelligent classification, and localization of road sub-surface defects based on GPR and deep learning theory. However, there are still several areas that require further exploration.
1) Multiple defect feature patterns. In this study, two types of single defects are simulated by forward simulation and field experiments. However, in real road conditions, different defects often coexist. Therefore, a combination of multiple defects, such as a combination of cavities and water-rich defects, is worth further investigation. Summarizing the image features and patterns of these defects can provide references for defect detection work.
2) Index of defect characteristics. This study focuses on the intelligent detection method using radar images. Although radar reflection signals are unstable, they contain rich original information. In future research, filtering, wavelet transform and other methods can be employed to remove noise and other factors. In addition, data fusion techniques can be applied to integrate image data with radar reflection wave data, thereby enhancing defect feature indicators. This will enable better utilization of the inherent features in signals and improve the effectiveness of intelligent detection.
3) Deep learning models. This study mainly adopts the Swin-YOLOX model but lacks comparisons with other models. Future research needs to increase the number of defect image samples and model types to improve training and detection performance.
5.2 Conclusions
In this study, simulated samples generated by gprMax and samples collected from the field (‘real’ samples) are used to train the Swin-YOLOX model, which facilitates the detection and recognition of road sub-surface defects. The process of generating simulated samples, the detection principle of Swin-YOLOX, and the process of detecting road sub-surface defects based on Swin-YOLOX are introduced. Based on the training and detection results, the following conclusions can be drawn.
1) By inputting the parameters obtained from actual GPR detections into gprMax for forward simulation, the generated B-Scan images closely match the characteristics, size, and location of defects as observed in the B-Scan images captured by GPR. This validates the feasibility of using the simulated samples generated by gprMax software to augment the training data set.
2) A comparison between models trained on real samples alone and models trained on a mixture of real and simulated samples showed that the latter achieves higher accuracy, reaching 97.71%, which is 15.64% higher than that achieved by the model trained without simulated samples. This indicates that the model trained on a mixture of real and simulated samples exhibits higher accuracy in defect prediction and can be effectively applied in the detection and recognition of road sub-surface defects.
3) Using the GPR and the Swin-YOLOX deep learning algorithm, this study realizes intelligent recognition and localization of road defects using a large-scale generated data set. Finally, the entire process is automated by writing code, establishing a preliminary framework for intelligent safety evaluation of road sub-surface defects.
References
[1] Wang Z F, Wang J, Chen K F, Li Z P, Xu J, Li Y, Sui Q M. Unsupervised learning method for rebar signal suppression and defect signal reconstruction and detection in ground penetrating radar images. Measurement, 2023, 211: 112652
[2] Wai-Lok Lai W, Dérobert X, Annan P. A review of ground penetrating radar application in civil engineering: A 30-year journey from Locating and Testing to Imaging and Diagnosis. NDT & E International, 2018, 96: 58–78
[3] Guo H, Zhuang X, Rabczuk T. A deep collocation method for the bending analysis of Kirchhoff plate. Computers, Materials & Continua, 2019, 59(2): 433–456
[4] Dong Z H, Ye S B, Gao Y Z, Fang G Y, Zhang X J, Xue Z J, Zhang T. Rapid detection methods for asphalt pavement thicknesses and defects by a vehicle-mounted ground penetrating radar (GPR) system. Sensors, 2016, 16(12): 2067
[5] Tong Z, Gao J, Zhang H T. Innovative method for recognizing subgrade defects based on a convolutional neural network. Construction & Building Materials, 2018, 169: 69–82
[6] Samaniego E, Anitescu C, Goswami S, Nguyen-Thanh V M, Guo H, Hamdia K, Zhuang X, Rabczuk T. An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications. Computer Methods in Applied Mechanics and Engineering, 2020, 362: 112790
[7] Zhuang X, Guo H, Alajlan N, Zhu H, Rabczuk T. Deep autoencoder based energy method for the bending, vibration, and buckling analysis of Kirchhoff plates with transfer learning. European Journal of Mechanics A/Solids, 2021, 87: 104225
[8] Guo H, Zhuang X, Chen P, Alajlan N, Rabczuk T. Stochastic deep collocation method based on neural architecture search and transfer learning for heterogeneous porous media. Engineering with Computers, 2022, 38(6): 5173–5198
[9] Yue Y P, Liu H, Meng X, Li Y G, Du Y L. Generation of high-precision ground penetrating radar images using improved least square generative adversarial networks. Remote Sensing, 2021, 13(22): 4590
[10] Ye W J, Liu C J, Chen Y H, Liu Y J, Liu C M, Zhou H H. Multi-style transfer and fusion of image's regions based on attention mechanism and instance segmentation. Signal Processing: Image Communication, 2023, 110: 116871
[11] Guo H, Zhuang X, Chen P, Alajlan N, Rabczuk T. Analysis of three-dimensional potential problems in non-homogeneous media with physics-informed deep collocation method using material transfer learning and sensitivity analysis. Engineering with Computers, 2022, 38(6): 5423–5444
[12] Guo H, Zhuang X, Fu X, Zhu Y, Rabczuk T. Physics-informed deep learning for three-dimensional transient heat transfer analysis of functionally graded materials. Computational Mechanics, 2023, 72(3): 513–524
[13] Guo H, Zhuang X, Alajlan N, Rabczuk T. Physics-informed deep learning for melting heat transfer analysis with model-based transfer learning. Computers & Mathematics with Applications, 2023, 143: 303–317
[14] Gao R X, Zhu H Q, Liao Q, Qu B L, Hu L T, Wang H R. Detection of coal fire by deep learning using ground penetrating radar. Measurement, 2022, 201: 111585
[15] Warren C, Giannopoulos A, Giannakis I. gprMax: Open source software to simulate electromagnetic wave propagation for ground penetrating radar. Computer Physics Communications, 2016, 209: 163–170
[16] Berenger J. Perfectly matched layer for the FDTD solution of wave-structure interaction problems. IEEE Transactions on Antennas and Propagation, 1996, 44(1): 110–117
[17] Zhang J, Yang X, Li W G, Zhang S B, Jia Y Y. Automatic detection of moisture damages in asphalt pavements from GPR data with deep CNN and IRS method. Automation in Construction, 2020, 113: 103119
[18] Fauchard C, Dérobert X, Cariou J, Côte P. GPR performances for thickness calibration on road test sites. NDT & E International, 2003, 36(2): 67–75
[19] Khudoyarov S, Kim N, Lee J J. Three-dimensional convolutional neural network-based underground object classification using three-dimensional ground penetrating radar data. Structural Health Monitoring, 2020, 19(6): 1884–1893
[20] Rabczuk T, Ren H, Zhuang X. A nonlocal operator method for partial differential equations with application to electromagnetic waveguide problem. Computers, Materials & Continua, 2019, 59(1): 31–55
[21] Yee K S. Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media. IEEE Transactions on Antennas and Propagation, 1966, 14(3): 302–307
[22] Kunz K S, Luebbers R J. The Finite Difference Time Domain Method for Electromagnetics. Leiden: CRC Press, 1993
[23] Ren H, Zhuang X, Rabczuk T. A higher order nonlocal operator method for solving partial differential equations. Computer Methods in Applied Mechanics and Engineering, 2020, 367: 113132
[24] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An image is worth 16 × 16 words: Transformers for image recognition at scale. 2020, arXiv:2010.11929
[25] Ge Z, Liu S T, Wang F, Li Z M, Sun J. YOLOX: Exceeding YOLO series in 2021. 2021, arXiv:2107.08430
[26] Ji W, Liu Q J, Huang C W, Yang R, Huang H L, Xu G H. YOLOX traffic sign detection based on Swin-Transformer. Radio Communications Technology, 2023, 49(3): 547–555
[27] Wu G X, Li Y C. Non-maximum suppression for object detection based on the chaotic whale optimization algorithm. Journal of Visual Communication and Image Representation, 2021, 74: 102985
[28] Zheng C W, Lin H. YOLOv5 helmet wearing detection method based on Swin Transformer. Computer Measurement and Control, 2023, 31(3): 15–21
[29] Iqbal A, Sharif M. BTS-ST: Swin transformer network for segmentation and classification of multimodality breast cancer images. Knowledge-Based Systems, 2023, 267: 110393
[30] Üzen H, Türkoğlu M, Yanikoglu B, Hanbay D. Swin-MFINet: Swin transformer based multi-feature integration network for detection of pixel-level surface defects. Expert Systems with Applications, 2022, 209: 118269
[31] Yang H N, Yang D P. CSwin-PNet: A CNN-Swin Transformer combined pyramid network for breast lesion segmentation in ultrasound images. Expert Systems with Applications, 2023, 213: 119024
[32] Jiang S, Kong R N, Li P C, Lu C W, Zhang S, Li M. Intelligent detection algorithm of obstacles in front of open-pit mine cars based on Swin Transformer and CNN. Metal Mine, 2023, 5: 228–236
[33] Lu S L, Liu X Y, He Z X, Zhang X, Liu W B, Karkee M. Swin-Transformer-YOLOv5 for real-time wine grape bunch detection. Remote Sensing, 2022, 14(22): 5853
[34] Ishitsuka K, Iso S, Onishi K, Matsuoka T. Object detection in ground-penetrating radar images using a deep convolutional neural network and image set preparation by migration. International Journal of Geophysics, 2018, 2018: 1–8
RIGHTS & PERMISSIONS
Higher Education Press