RESEARCH ARTICLE

Area-based non-maximum suppression algorithm for multi-object fault detection

  • Jieyin BAI , 1 ,
  • Jie ZHU 2 ,
  • Rui ZHAO 1 ,
  • Fengqiang GU 3 ,
  • Jiao WANG 3
  • 1. Nanrui Group Co., Ltd., Beijing 100192, China
  • 2. State Grid Beijing Electric Power Company, Beijing 100031, China
  • 3. Beijing Kedong Electric Power Control System Co., Ltd., Beijing 100192, China

Received date: 24 Sep 2019

Accepted date: 15 Nov 2019

Published date: 15 Dec 2020

Copyright

2020 Higher Education Press

Abstract

Unmanned aerial vehicle (UAV) photography has become the main method of power system inspection; however, automated fault detection remains a major challenge. Conventional algorithms encounter difficulty in processing all the detected objects in power transmission lines simultaneously. Deep-learning-based object detection provides a new approach to fault detection. However, the traditional non-maximum suppression (NMS) algorithm fails to delete redundant annotations when dealing with objects having two labels, such as insulators and dampers. In this study, we propose an area-based non-maximum suppression (A-NMS) algorithm to solve the problem of one object having multiple labels. The A-NMS algorithm is also used in the fusion stage of cropping detection to detect small objects. Experiments prove that A-NMS and cropping detection achieve a mean average precision of 88.58% and a recall of 91.23% on the aerial image dataset, realizing multi-object fault detection in aerial images.

Cite this article

Jieyin BAI , Jie ZHU , Rui ZHAO , Fengqiang GU , Jiao WANG . Area-based non-maximum suppression algorithm for multi-object fault detection[J]. Frontiers of Optoelectronics, 2020 , 13(4) : 425 -432 . DOI: 10.1007/s12200-020-0967-5

Introduction

The conventional inspection of power transmission lines by patrol personnel has been gradually replaced by unmanned aerial vehicles (UAVs). However, viewing the numerous images provided by UAVs individually is a time-consuming and complex task. Therefore, examining how to use computer technology to perform automatic recognition has become a popular topic.
During the early stage of machine learning, traditional image recognition and machine learning were often used to locate and detect faults. Sun constructed a slope model based on the appearance model of insulators [1]. Zhang extracted the H vector from the hue, saturation, and value color space to perform contour matching [2]. After deep learning was proposed by Hinton and Salakhutdinov in 2006 [3], convolutional neural network (CNN) [4–7] and object detection [8–14] algorithms have become increasingly powerful and effective. Using images captured by UAVs, Wang et al. applied a multi-object detection algorithm to electrical components, achieving an accuracy of 92.7% [15].
In this study, we detect several types of common faults in power transmission lines using an object detection algorithm. However, two problems associated with this algorithm must be solved. The first is that a single object may receive multiple labels, and the second is that the detection capability for small objects is low. For the first problem, existing approaches include the traditional non-maximum suppression (NMS) algorithm for universal objects [10], the polygonal non-maximum suppression algorithm for curve text detection [16], and the mask non-maximum suppression algorithm for segmentation-based oriented scene text detection [17]. To improve the detection of small objects, whose length and width are less than 5% of the original scale, feature pyramid networks predict objects by fusing different feature layers [9]. In addition, the single shot detector (SSD) generates anchors on multiple feature maps [13], whereas Cascade region-based CNN (R-CNN) provides a multi-regression architecture to train high-quality detectors [14]. The structure of the detection network is presented in Fig. 1.
Fig.1 Structure of the detection network, where the detector is followed by a classifier

To solve the two aforementioned problems, we use faster R-CNN as a benchmark and propose an area-based non-maximum suppression (A-NMS) algorithm to delete redundant labels for a single object and improve the ability to detect small objects during the box fusion stage of cropping detection. We also discuss the selection of different parameters and compare the performance of the combinational algorithms. The experimental results demonstrate the feasibility of using object detection algorithms to detect the faults in transmission lines and achieve accurate localization and discrimination of the multiple common faults displayed in photos captured using UAVs.

Basic problem

Our study involves the detection of string-off insulators and shedding dampers. This requires our system to perform fault localization and recognition in one step. Therefore, insulators are classified as intact insulators, labeled “good”, and string-off insulators, labeled “bad”, as illustrated in Figs. 2(a) and 2(b), respectively. Dampers are classified as intact dampers, labeled “double”, and shedding dampers, labeled “single”, as illustrated in Figs. 2(c) and 2(d), respectively. These are the four types of objects that have to be detected.
Fig.2 Four objects to be detected. (a) Intact insulator labeled “good”; (b) string-off insulator labeled “bad”; (c) intact damper labeled “double”; (d) shedding damper labeled “single”

Faster R-CNN is a state-of-the-art two-stage object detection algorithm. First, the input image is processed using a backbone CNN to obtain a feature map. Then, this feature map passes through two branches. One branch is the region proposal network (RPN) [10] used to generate default boxes and perform preliminary regression of the bounding boxes, whereas the other branch performs region-of-interest pooling with respect to the feature map and bounding boxes. Finally, fully connected layers are introduced to perform classification and precise regression. In the faster R-CNN algorithm, the RPN is the core module. It initially generates many default boxes and subsequently deletes the boxes that are out of bounds. Thereafter, NMS is utilized to remove the overlapping boxes and select the top N bounding boxes for the next network.
Image processing methods include histogram equalization [18] and image filtering [19]. The traditional NMS algorithm requires the coordinates and scores of the detected boxes belonging to a certain class of object. All the detected boxes are ranked according to their scores, and the box with the highest score in the current set is extracted in each iteration. Then, the intersection-over-union (IoU) is calculated between the extracted box and each remaining box. If the IoU is larger than the overlap threshold (usually set to 0.5), the two boxes are considered to represent the same object, and the box with the lower score is deleted. This procedure is repeated until all the boxes are processed.
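The iterative procedure just described can be sketched in a few lines of plain Python. This is an illustrative version, not the authors' implementation; the function name and the (x1, y1, x2, y2) box format are our own conventions:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Traditional NMS sketch. boxes: list of (x1, y1, x2, y2);
    scores: parallel list of floats. Returns indices of kept boxes."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    def iou(a, b):
        iw = min(a[2], b[2]) - max(a[0], b[0])
        ih = min(a[3], b[3]) - max(a[1], b[1])
        inter = max(0, iw) * max(0, ih)
        return inter / (area(a) + area(b) - inter)

    # rank the detected boxes by score, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)  # highest-scoring box in the current set
        keep.append(i)
        # delete every remaining box whose IoU with box i exceeds the threshold
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

Note that the suppression decision is made per class, which is exactly why one physical object carrying boxes of two different classes escapes suppression.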
For one object, only one label is expected as the output. Figure 3 presents the detection results of a string-off insulator after the traditional NMS process. Three boxes have been produced for this insulator; however, only the blue box is the expected label, and the upper and lower parts have been unexpectedly labeled as intact insulators (displayed as red boxes). Traditional NMS can only process the detected boxes belonging to the same class and cannot remove redundant labels belonging to different classes. To solve this problem, we propose the A-NMS algorithm.
Fig.3 Detection results of a string-off insulator. The blue box is the expected “bad” label, whereas the red boxes are unexpected “good” labels

Area-based non-maximum suppression algorithm

The A-NMS algorithm considers intact and string-off insulators as belonging to the same class. Similarly, it considers intact and shedding dampers as belonging to the same class. The class set of detected boxes must be obtained before applying the A-NMS algorithm. In faster R-CNN, NMS is used twice, i.e., once during the RPN phase and once during the fast R-CNN phase. In the RPN phase, the only available information is the probability that a detected box belongs to the foreground; there is no specific classification. Therefore, the A-NMS algorithm replaces the NMS algorithm in the fast R-CNN phase. Based on the class set C, the A-NMS algorithm extracts the detected box set B and box score set S belonging to the insulators, calculates the areas of all the boxes, and selects the box with the maximum area. If the IoU of the upper red box and the blue box in Fig. 3 is less than the threshold of 0.5, these two boxes are not regarded as the same object, which is not the desired behavior. Therefore, we propose the intersection-over-smaller (IoS) estimation rule provided in Eq. (1), which denotes the percentage of the smaller box that is covered by the larger box. The IoS of the upper red box and the blue box in Fig. 3 is approximately 1.0. Therefore, the two boxes are regarded as the same object.
IoS(b_i, b_j) = inter(b_i, b_j) / min(area(b_i), area(b_j)),   (1)

where IoS(b_i, b_j) is the IoS value of the boxes b_i and b_j, inter(b_i, b_j) is the intersection area of the boxes b_i and b_j, and min(area(b_i), area(b_j)) is the smaller of the two box areas.
Similar to the traditional NMS algorithm, the A-NMS algorithm has a hyperparameter that should be set, i.e., the overlap threshold T. If the IoS is above the overlap threshold T, the two boxes are regarded as the same object. Then, if the absolute score difference is smaller than a certain value (0.1), the box with the smaller area is deleted; otherwise, the box with the lower score is deleted. The same condition is applicable to dampers. The selection of the overlap threshold T is discussed in Section 5.1. The A-NMS algorithm is the basis of the box fusion algorithm discussed in Section 4.
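Combining Eq. (1) with the deletion rule above, a minimal sketch of A-NMS might look as follows. The default threshold T = 0.7 and the 0.1 score-gap constant follow the text; the function names and box format are illustrative, assuming `boxes` already holds all detections of one merged class (e.g., both “good” and “bad” insulator boxes):

```python
def a_nms(boxes, scores, t=0.7, score_gap=0.1):
    """A-NMS sketch. boxes: list of (x1, y1, x2, y2); scores: parallel
    list of floats. Returns the indices of the surviving boxes."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    def ios(a, b):  # intersection-over-smaller, Eq. (1)
        iw = min(a[2], b[2]) - max(a[0], b[0])
        ih = min(a[3], b[3]) - max(a[1], b[1])
        inter = max(0, iw) * max(0, ih)
        return inter / min(area(a), area(b))

    # process boxes from the largest area downward, per the algorithm
    order = sorted(range(len(boxes)), key=lambda i: area(boxes[i]), reverse=True)
    dead = set()
    for pos, i in enumerate(order):
        if i in dead:
            continue
        for j in order[pos + 1:]:
            if j in dead or ios(boxes[i], boxes[j]) <= t:
                continue
            # boxes i and j are regarded as the same object
            if abs(scores[i] - scores[j]) < score_gap:
                dead.add(j)      # scores are close: delete the smaller box
            elif scores[j] < scores[i]:
                dead.add(j)      # delete the lower-scoring box
            else:
                dead.add(i)      # the larger box scored much lower
                break
    return [i for i in range(len(boxes)) if i not in dead]
```

In the Fig. 3 scenario, the two part-insulator boxes are almost entirely covered by the whole-insulator box (IoS near 1.0) with similar scores, so both smaller boxes are deleted and a single label survives.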

A-NMS algorithm for box fusion during cropping detection

A CNN serves as the image feature extractor in an image detection network. From the bottom layer to the top layer, the feature maps become progressively smaller. In most conventional CNNs, such as ResNet [7] and DenseNet [20], the final feature maps are downsampled by a factor of at least 32 before the pooling layer. This signifies that an object with an area of 32 × 32 pixels in the original image shrinks to a single pixel in the final feature map. Therefore, such small objects are difficult to detect. However, if the downsampling factor is made considerably smaller, the network may not extract high-level semantic features, which hinders the improvement of classifiers.
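As a quick sanity check of the downsampling argument, a hypothetical helper (ours, for illustration only) maps an object's pixel extent to feature-map cells for a stride-32 backbone:

```python
def feature_map_extent(obj_pixels, stride=32):
    """Approximate number of feature-map cells an object of the given
    pixel extent spans after total downsampling by `stride`. With the
    32x stride typical of ResNet-style backbones, a 32 x 32 object
    collapses to a single cell."""
    return max(1, obj_pixels // stride)
```

For example, a 32-pixel-wide damper occupies a single cell of the final feature map, while a 600-pixel-wide insulator still spans 18 cells, which is why the small object is so much harder to localize.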
In this study, a cropping detection method is proposed for the efficient detection of small objects. The larger the input image in an object detection algorithm, the more effectively small objects are detected. However, the time required for model inference increases. To accelerate model inference, the short edges of all the input images are set to 600 pixels. The parallel processing frame can process several images simultaneously; thus, no additional time is required for cropping detection.
Fig.4 Diagram of image cropping. (a) Original image, where the 1/4 and 3/4 points on the x-axis and y-axis are the cropping points; (b) top left subpicture cropped by the yellow lines from the original image; (c) top right subpicture cropped by the purple lines from the original image; (d) bottom left subpicture cropped by the green lines from the original image; (e) bottom right subpicture cropped by the orange lines from the original image

As illustrated in Fig. 4, the points located at 1/4 and 3/4 of the x-axis and y-axis are the cropping points. The original input image is cropped into four subpictures: the top left, top right, bottom left, and bottom right subpictures. Because the short edge of the input image is set to 600 pixels, the cropped pictures must be zoomed back to the input size. After this process, the area of a small object in the processed image becomes approximately twice that in the original image, which improves the ability of the detection network to detect small objects.
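The cropping geometry can be made concrete as follows. Since the cropping lines sit at the 1/4 and 3/4 points, each subpicture spans three quarters of each axis and neighbouring subpictures overlap, which is consistent with the 0.75 scale factor in Eq. (2). This helper (names are ours; the zoom back to input size is left to an image library) returns the four crop rectangles:

```python
def crop_rects(w, h):
    """Return the pixel rectangles (x1, y1, x2, y2) of the four
    subpictures in Fig. 4 for a w x h image. Each subpicture spans
    3/4 of each axis, so the crops overlap in the middle."""
    cw, ch = 3 * w // 4, 3 * h // 4  # subpicture width and height
    return {
        "top_left":     (0,      0,      cw, ch),
        "top_right":    (w - cw, 0,      w,  ch),
        "bottom_left":  (0,      h - ch, cw, h),
        "bottom_right": (w - cw, h - ch, w,  h),
    }
```

The overlap guarantees that any object whose extent is smaller than half the image appears complete in at least one subpicture.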
In the fusion phase, the coordinates in the subpictures must be initially transformed to those in the original images. Specifically, the coordinates must be multiplied by 3/4 to zoom out and must be given an offset of 1/4 based on different locations. Equation (2) provides a matrix equation to achieve coordinate translation of the top right subpicture. The offset is the displacement along the x-axis in case of the subpictures. Other subpictures can be processed using a similar method.
[X_old; Y_old] = [0.75, 0; 0, 0.75][X_cur; Y_cur] + [0.25; 0],   (2)

where (X_old, Y_old) are the coordinates in the original image and (X_cur, Y_cur) are the coordinates in the cropped subpicture.
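Generalizing the coordinate translation to all four subpictures gives a small mapping function. This is an illustrative sketch assuming normalized coordinates in [0, 1]; the function name and the subpicture keys are our own:

```python
def to_original(x_cur, y_cur, which):
    """Map a normalized coordinate (x_cur, y_cur) of a zoomed
    subpicture back to normalized coordinates in the original image,
    per Eq. (2): scale by 0.75, then offset by 0.25 along the axes
    on which the subpicture is shifted."""
    dx = 0.25 if which in ("top_right", "bottom_right") else 0.0
    dy = 0.25 if which in ("bottom_left", "bottom_right") else 0.0
    return 0.75 * x_cur + dx, 0.75 * y_cur + dy
```

For the top right subpicture this reduces exactly to Eq. (2): an x-offset of 0.25 and no y-offset.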
When the transformation results of the four subpictures are marked on the original image, one object contains multiple detected boxes, which makes it suitable to fuse the overlapping detected boxes using the A-NMS algorithm. However, the A-NMS algorithm cannot be used directly in the box fusion algorithm because it deletes the detected boxes with small areas. Because an object may exist in more than one cropped image, the obtained detection boxes may be incomplete. Therefore, before the boxes with small areas are deleted, they must be fused into the largest box. This signifies that the final output box is the outermost contour of the two detection boxes, as shown in Fig. 5.
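The outermost-contour fusion of Fig. 5 amounts to taking the element-wise extremes of the two boxes' corners; a one-step sketch (function name ours):

```python
def fuse_boxes(box_a, box_b):
    """Fuse two overlapping detection boxes into their outermost
    contour (Fig. 5): the smallest axis-aligned box enclosing both.
    Boxes are (x1, y1, x2, y2)."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
```

The fused box then replaces the two partial boxes before the A-NMS suppression step runs.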
Fig.5 Box fusion algorithm. (a) Two detection boxes; (b) fused detection box. The background green box is the output box obtained by fusing the two detection boxes in (a)

In the box fusion algorithm, a hyperparameter pertaining to the overlap threshold T must be set. The selection of this threshold is discussed in the experiment section.

Experimental results

In this study, we used faster R-CNN combined with the ResNet101 network as the benchmark model. The data were obtained from images taken during the daily patrol inspections of an electric company and comprised approximately 8000 pictures. The data covered a variety of geographical environments, weather conditions, shooting angles, and shooting distances within the normal range. The pixel size varied from 400 × 300 to 2000 × 1500. All the short edges of the images were zoomed to 600 pixels during training. In practical application scenarios, the resolution of the patrol images should not be less than 300 × 300 pixels, and their aspect ratio should be approximately 4:3 or 16:9. The dataset was divided into a training set, validation set, and test set in a 7:2:1 ratio. The network was initialized using weights pre-trained on the MS COCO [21] and VOC2007 [22] datasets. Random horizontal flipping, random cropping, random noise, and other data augmentation methods were used. The loss function followed that of faster R-CNN, and the optimization method was stochastic gradient descent. The initial learning rate was 0.001, and the number of iterations was 100.

Selection of the A-NMS algorithm and box fusion overlap threshold

Selecting the overlap threshold T is important in the traditional NMS algorithm, the proposed A-NMS algorithm, and the box fusion algorithm. In this subsection, we initially discuss the influence of T on the performance of different schemes.
We use the conventional standard, i.e., the mean average precision (mAP), to measure the algorithms' performance. The experimental results are presented in Fig. 6. The overlap threshold T is varied from 0.3 to 0.9 at intervals of 0.1. The test result of the benchmark model is displayed as a black bar, whereas the bars of other colors represent the mAP for different T values obtained using different algorithms. The performance of the traditional NMS algorithm is optimal when T is 0.5 or 0.6, whereas the performances of the A-NMS and box fusion algorithms are optimal when T is 0.7–0.9. The reason for this result is that the traditional NMS algorithm estimates the IoU, whereas the A-NMS and box fusion algorithms estimate the IoS. Under the IoS criterion, two boxes are regarded as the same object only when the majority of the smaller box is covered by the larger box.
Fig.6 Sensitivity of different methods to the threshold T in NMS, A-NMS, and cropping detection. Threshold T is 0.3–0.9 with intervals of 0.1. Black bars indicate the traditional NMS algorithm, orange bars indicate the A-NMS algorithm, and green bars indicate the cropping detection method

In the following comparison experiments, the NMS overlap threshold T is set to the optimal value to avoid multiple variables. Thus, T is 0.5 for the traditional NMS algorithm, whereas it is 0.7 for the A-NMS and box fusion algorithms.
Fig.7 Detection results of the NMS and A-NMS algorithms. The green box represents a string-off insulator labeled “bad”, whereas the blue box represents an intact insulator labeled “good”. (a) Detection results of the traditional NMS algorithm. The “bad” box is correct, whereas the “good” box is an additional box; (b) detection results of the A-NMS algorithm. The “bad” box is correct

Figures 7(a) and 7(b) present the detection results obtained using the NMS and A-NMS algorithms, respectively. The A-NMS algorithm deletes the additional box and solves the problem that cannot be handled by the traditional NMS algorithm.

Cropping detection test

Figure 8 presents the experimental results obtained with and without the cropping detection method. In the experiment, we evaluated the effect of cropping detection on the four object types. Figure 8 reveals that the cropping detection method performed better for small dampers: after cropping and magnification, the dampers were more likely to be correctly detected. For insulators, however, the detection of intact insulators improved by 3%, whereas the detection of string-off insulators worsened by 1%. This is because insulators may occupy a large part of the image, which can cause errors during box fusion. Therefore, our algorithm requires further improvement.
Fig.8 Impact of cropping detection on four types of objects: “good”, “bad”, “double”, and “single”. Red bars indicate the results of the benchmark method, whereas blue bars indicate the results of the cropping detection method. Here, AP refers to average precision

Figure 9 presents the results of the benchmark model and cropping detection. Cropping detection magnifies small objects, increasing their identification probability. The detector identified the objects located in the upper part of the picture, which were not detected previously.
Fig.9 Results of the benchmark and cropping detection methods. Detection results of (a) benchmark algorithm and (b) cropping detection method

Comprehensive test

In this subsection, we verify the performances of different methods and evaluate various indicators, including the detection speed.
Tab.1 mAP and recall for different methods. “√” indicates that the corresponding algorithm is used

benchmark                  | A-NMS | cropping detection | mAP    | recall
faster R-CNN + ResNet101   |       |                    | 0.8142 | 0.8421
faster R-CNN + ResNet101   | √     |                    | 0.8594 | 0.8875
faster R-CNN + ResNet101   | √     | √                  | 0.8858 | 0.9123
Tab.2 Detection time of different methods in different GPU environments

detection scheme             | number of GPUs | time/ms
benchmark                    | 1              | 210
benchmark + A-NMS            | 1              | 212
A-NMS + cropping detection   | 1              | 850
A-NMS + cropping detection   | 4              | 220
Table 1 lists the mAP and recall values for the different methods, all of which are based on the benchmark model of faster R-CNN with ResNet101. Compared with the traditional NMS algorithm, the A-NMS algorithm increased the mAP and recall values by 4.52% and 3.54%, respectively. This indicates that the A-NMS algorithm can decrease the probability of error during the detection of insulators and dampers.
The aim of cropping detection is to enhance the ability to detect small objects. Compared with the benchmark model, the cropping detection method achieved a 7.16% increase in mAP and a 7.02% increase in recall. These results demonstrate the effectiveness of the proposed cropping detection method.
Different detection methods require different amounts of time in different GPU environments. Table 2 lists the detection times of the different detection schemes. All the GPUs are NVIDIA GeForce GTX 1070 cards. The benchmark model with the A-NMS algorithm requires only one GPU, and the experimental results indicate that the A-NMS algorithm added only approximately 2 ms. Using only one GPU, the cropping detection method requires approximately 850 ms, which is very slow. Therefore, four GPUs are used to accelerate detection because the four subpictures must be detected in parallel. The time required by the parallel architecture with four GPUs is 220 ms (approximately 4.5 frames per second (FPS)), which is only 10 ms more than that required by the benchmark model.
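The parallel frame can be sketched with a simple executor. In the paper each subpicture is dispatched to its own GPU; here a thread pool and a placeholder `detect_fn` (a hypothetical stand-in for model inference) illustrate the fan-out/fan-in pattern:

```python
from concurrent.futures import ThreadPoolExecutor

def detect_parallel(subpictures, detect_fn, workers=4):
    """Run the detector on the four subpictures concurrently and
    collect the results in input order. `detect_fn` stands in for
    per-GPU model inference; with 4 workers the wall-clock time is
    roughly that of a single subpicture plus fusion overhead."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(detect_fn, subpictures))
```

The returned per-subpicture detections would then be coordinate-translated and fused as described in Section 4.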

Conclusions

By focusing on detecting faults in the electrical components of transmission lines, we propose an A-NMS algorithm to solve the problems of a single object having multiple labels and the difficulty of detecting small objects. We conduct a detailed comparison and analysis of the different schemes. The experimental results indicate that the proposed A-NMS algorithm not only correctly removes additional and incorrect labels but also increases the detector's ability to sense small objects. The proposed method achieves an mAP of 88.58%, a recall of 91.23%, and a detection speed of 4.5 FPS. However, there are cases in which the background is mislabeled as an object. Therefore, further research is required on how to remove erroneous objects and develop a more robust detector.

Acknowledgements

This paper was supported by the National Grid Corporation Headquarters Science and Technology Project: Key Technology Research, Equipment Development and Engineering Demonstration of Artificial Smart Driven Electric Vehicle Smart Travel Service (No. 52020118000G).
References

1. Sun J. Research on Diagnosis of Insulator Crack Based on Edge Detection. Beijing: North China Electric Power University, 2008 (in Chinese)
2. Zhang F Y. Identification and Research of Abnormal Patrol Diagram of Transmission Line Based on Computer Vision. Changchun: Jilin University, 2015 (in Chinese)
3. Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504–507
4. Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of Conference on Neural Information Processing Systems, 2012, 1106–1114
5. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, 2015, 1–5
6. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016, 2818–2826
7. He K, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016, 770–778
8. Lee K P, Wu B H, Peng S L. Deep-learning-based fault detection and diagnosis of air-handling units. Building and Environment, 2019, 157: 24–33
9. Lin T Y, Dollár P, Girshick R, He K M. Feature pyramid networks for object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017, 2117–2125
10. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Conference on Neural Information Processing Systems, 2015, 91–99
11. Yan W, Yu L. On accurate and reliable anomaly detection for gas turbine combustors: a deep learning approach. arXiv:1908.09238, 2019
12. Luo B, Wang H, Liu H, Li B, Peng F. Early fault detection of machine tools based on deep learning and dynamic identification. IEEE Transactions on Industrial Electronics, 2019, 66(1): 509–518
13. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C. SSD: single shot multibox detector. In: Proceedings of European Conference on Computer Vision, 2016, 21–37
14. Cai Z, Vasconcelos N. Cascade R-CNN: delving into high quality object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2018, 6154–6162
15. Wang W G, Tian B, Liu Y, Liu L, Li J X. Research on power component identification of UAV inspection image based on RCNN. Journal of Earth Information Science, 2017, 2(19): 256–263
16. Liu Y, Jin L, Zhang S, Sheng Z. Detecting curve text in the wild: new dataset and new solution. arXiv:1712.02170, 2017
17. Dai Y, Huang Z, Gao Y, Chen K. Fused text segmentation networks for multi-oriented scene text detection. In: Proceedings of the 24th International Conference on Pattern Recognition, 2018, 3604–3609
18. Abdurashitov A, Lychagov V V, Sindeeva O A, Semyachkina-Glushkovskaya O V, Tuchin V V. Histogram analysis of laser speckle contrast image for cerebral blood flow monitoring. Frontiers of Optoelectronics, 2015, 8(2): 187–194
19. Sudhakar M, Reddy V, Rao Y. Influence of optical filtering on transmission capacity in single mode fiber communications. Frontiers of Optoelectronics, 2015, 8(4): 424–430
20. Huang G, Liu Z, van der Maaten L. Densely connected convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017, 4700–4708
21. Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L. Microsoft COCO: common objects in context. In: Proceedings of European Conference on Computer Vision, 2014, 740–755
22. Everingham M, Van Gool L, Williams C K I, Winn J, Zisserman A. The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303–338
