Detection of small ships in optical remote sensing images plays an essential role in military and civilian fields, but it becomes much more difficult when noise dominates. To solve this issue, a method based on a low-level vision model is proposed in this paper. A global channel, a high-frequency channel, and a low-frequency channel are introduced before applying the discrete wavelet transform, and an improved extended contrast sensitivity function is constructed from a self-adaptive center-surround contrast energy and a newly proposed function. The wavelet coefficients are weighted by the improved extended contrast sensitivity function, and the saliency image is obtained from the three-channel process after the inverse discrete wavelet transform. Experimental results show that the proposed method outperforms all competing methods, with precision, recall, and F-score of 100.00%, 90.59%, and 97.96%, respectively. Furthermore, our method is robust against noise and has great potential for providing more accurate target detection in engineering applications.
Mingzhu SONG, Hongsong QU, Guixiang ZHANG, Guang JIN. Detection of small ship targets from an optical remote sensing image[J]. Frontiers of Optoelectronics, 2018, 11(3): 275-284. DOI: 10.1007/s12200-018-0744-x
Introduction
Ships are the main targets in sea surface monitoring and wartime combat. In remote sensing image processing, ship detection and identification technology has been intensively studied in recent years. Ship target detection methods for optical remote sensing can be classified into the following four categories [1–5]: methods based on the gray statistical feature, the image edge feature, the fractal model and fuzzy theory, and the visual sensitivity model. Small ship targets are susceptible to noise and the shadow of the sea, so the methods based on the gray statistical feature, edge feature information, and fractal model may fail [5]. Therefore, the method based on the visual sensitivity model is selected.
At present, saliency detection based on the visual sensitivity model is widely used. Goferman et al. [6] realized a saliency model based on context awareness. Erdem [7] proposed a visual saliency detection method based on region covariance. Pandivalavan and Karuppiah [8] proposed a region-based computational visual attention model for saliency detection. Kapoor et al. [9] introduced a set of fuzzy features to mark out the salient region in an image. Zhang et al. [10] proposed a novel graph-based optimization framework for salient object detection. However, all the algorithms above focus on noiseless color images with large targets and abundant texture information. The influence of panchromatic imaging, small targets, and noise is not considered. Therefore, existing algorithms have limitations for detecting small targets in noisy panchromatic images. From this perspective, we modify the nonparametric low-level vision model first proposed by Murray et al. [11] and Song et al. [5] and optimize the robustness of small-ship detection through the design of an extended contrast sensitivity function (ECSF) for different channels.
Proposed detection method
This paper first describes the algorithm design using the human visual contrast and spatial relations, based on the following three assumptions [12]:
1) The induction effect operating on a particular spatial frequency stimulus in the intensity channel is determined by the characteristics of the surround stimulus with the same spatial frequency.
2) When the central stimulus has the same orientation as the surround stimulus, assimilation in the intensity channel is stronger and vice versa.
3) When the contrast energy of the surround features increases, assimilation in the intensity channel increases, the contrast effect decreases, and vice versa.
Based on the assumptions above, we believe that the spatial frequency of the environment and the center-surround contrast stimulus are the two main factors that determine changes in visual perception. This paper therefore focuses on how to construct the extended contrast sensitivity function for different spatial frequencies. The final visual saliency information is obtained by designing contrast sensitivity weighting at different scales. The flow chart of the proposed method is shown in Fig. 1.
Fig.1 Flow chart of the proposed method. WT: wavelet transform; WT-1: inverse wavelet transform
Impulse noise, Gaussian noise, and Poisson noise are the main factors affecting detection accuracy. Considering that different noises have different frequency characteristics, we convert the image information into a global channel $I_g$, a high-frequency channel $I_h$, and a low-frequency channel $I_l$ in order to separate noise interference at different frequencies. The global channel is used for global saliency detection, the high-frequency channel is used to detect the saliency of impulse interference, and the low-frequency channel is used to remove the interference of Gaussian and Poisson noise and to preserve the number of absolute targets. The global channel is normalized by the gray-scale quantization range,
$$I_g = \frac{I_0}{2^N - 1},$$
where $I_0$ is the original image and $N$ is the number of quantization bits.
Considering that impulse noise is isolated from the ship targets, whereas Gaussian noise and Poisson noise are mixed into the background, we design a filter to extract the high-frequency and low-frequency information effectively. Let $S_{xy}$ denote the rectangular sub-image window of size $m \times n$ centered at $(x, y)$; the filter output over $S_{xy}$ is controlled by a balance factor that adjusts the proportion of the frequency components. The outputs of this filtering stage form the high-frequency and low-frequency channels.
In this case, the high-frequency channel mainly contains the information to be ignored after detection, and the low-frequency channel mainly contains the most significant low-frequency information for the final calibration.
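To make the three-channel conversion concrete, the following Python sketch separates a panchromatic image into global, high-frequency, and low-frequency channels. The normalization by the quantization range follows the text; the median filter is only a stand-in for the paper's balance-factor filter, and defining the high-frequency channel as a residual is an assumption consistent with the channel descriptions above.

```python
import numpy as np
from scipy.ndimage import median_filter

def split_channels(raw, n_bits=8, win=5):
    """Sketch of the three-channel conversion.

    The global channel is the image normalized by its quantization range.
    The low-frequency channel is a locally smoothed version (a median
    filter is used here as a stand-in for the paper's balance-factor
    filter), and the high-frequency channel is taken as the residual,
    which mostly carries impulse-like interference.
    """
    i_g = raw.astype(np.float64) / (2 ** n_bits - 1)  # global channel
    i_l = median_filter(i_g, size=win)                # low-frequency channel
    i_h = i_g - i_l                                   # high-frequency residual
    return i_g, i_h, i_l
```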
After the channel conversion, each of the three channels is decomposed by the wavelet transform into a multiscale spatial pyramid containing wavelet planes oriented horizontally (h), vertically (v), and diagonally (d); the resulting wavelet coefficients carry the local contrast information of the global, high-frequency, and low-frequency channels.
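As an illustration of this multiscale decomposition, the sketch below uses PyWavelets to obtain the oriented wavelet planes of one channel. The choice of the Haar wavelet and the number of levels are assumptions; the paper does not specify them here.

```python
import pywt

def oriented_wavelet_planes(channel, wavelet="haar", levels=4):
    """Decompose one channel into a multiscale pyramid of oriented planes.

    Returns the coarsest approximation plus, for each scale, a tuple of
    (horizontal, vertical, diagonal) detail planes holding the local
    contrast information.
    """
    coeffs = pywt.wavedec2(channel, wavelet, level=levels)
    approximation, details = coeffs[0], coeffs[1:]
    return approximation, details
```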
Scale adaptive center-surround contrast energy
Contrast is an important feature in an image vision model and is the main factor affecting human visual attention. When a complex scene is analyzed, the vision system judges the salient target according to the contrast stimulus [5]. In this paper, the center-surround contrast energy is used to characterize the salient region. The difference between the noise-edge information and the local information at different scales is considered in order to separate effective and ineffective information. The wavelet coefficient matrix is convolved with a binary filter with a directional characteristic to obtain the center and surround activities $a_c$ and $a_s$, from which the contrast energy coefficient is constructed as
$$z = \frac{a_c^2}{a_c^2 + a_s^2}.$$
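The sketch below implements the traditional center-surround contrast energy in the form used by the SIM/CIWaM models that this paper builds on [11,12]; the center and surround kernel sizes are illustrative assumptions, and the scale-adaptive modification described next is not included.

```python
import numpy as np
from scipy.ndimage import convolve

def center_surround_energy(plane, center_size=3, surround_size=9):
    """Traditional center-surround contrast energy of one wavelet plane.

    Center and surround activities are local averages of the coefficient
    magnitudes; the relative energy a_c^2 / (a_c^2 + a_s^2) lies in [0, 1].
    """
    mag = np.abs(plane)

    center = np.ones((center_size, center_size))
    center /= center.sum()

    surround = np.ones((surround_size, surround_size))
    pad = (surround_size - center_size) // 2
    surround[pad:pad + center_size, pad:pad + center_size] = 0.0  # hollow ring
    surround /= surround.sum()

    a_c = convolve(mag, center, mode="nearest")
    a_s = convolve(mag, surround, mode="nearest")
    return a_c ** 2 / (a_c ** 2 + a_s ** 2 + 1e-12)
```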
Compared with ordinary images, sea-surface images containing small ship targets lack texture information. At a low scale, the difference in the center-surround contrast energy is not significant, and aliasing between the noise-edge information and the background information is considerable, which directly reduces the difference between target and non-target points in the subsequent saliency map. Therefore, we improve the center-surround contrast energy from the traditional model (Eq. (5)) to a new model (Eq. (6)) by introducing a scale-dependent suppression factor.
The suppression factor is introduced into the center-surround contrast energy; it varies with the order of magnitude of the edge-region energy calculated along its scale direction and is scaled by an amplification factor whose value is fixed in this paper. The change curve of the center-surround contrast energy after introducing this scale factor is shown in Fig. 2.
It can be seen from Fig. 2 that introducing the scale factor into the center-surround contrast energy suppresses the overall contrast of the global image. The ratio between the center-surround contrast energy values at different positions does not change, but the overall range of values narrows. The noise-edge positions with large values remain significant, whereas the values at the other positions become smaller and are suppressed. The resulting images are shown in Fig. 3.
Improved ECSF
The ECSF was developed by Otazu et al. [12] to quantitatively analyze the center-surround contrast energy,
$$\mathrm{ECSF}(z, s) = z \cdot g(s) + k(s),$$
where $g(s)$ is the weight function and $k(s)$ is an additional function ensuring a nonzero lower bound [12]. The main drawback of the low-level vision model for the images considered in this paper is that noise, as a stimulus at a different frequency, acts in a "target-guided" way on visual perception, while the intensity of the target information is often weaker than that of the noise. Owing to the loss of detail information of small ship targets in sea images and the relationship between noise type and frequency, we adopt the strategy of neglecting the target edge and construct a new ECSF based on the Barten and Daly models of the human-eye transfer function. Based on these models, we design an improved extended contrast sensitivity function.
In this design, the spatial scale and the spatial frequency are related through the size of the input image. A luminance factor, which decreases as the imaging brightness increases, accounts for the illumination condition. Two parameters define the spread of the spatial sensitivity of the weight function and one defines its peak spatial scale sensitivity. Two weighting factors are further introduced: one mainly affects the low-frequency component, which is mainly caused by Gaussian noise and Poisson noise, and the other mainly affects the high-frequency component, which is mainly caused by impulse noise.
Corresponding parameters likewise define the spread of the spatial sensitivity and the peak spatial scale sensitivity of the additional function that ensures the nonzero lower bound, which is designed in the same manner.
The distinguishing feature of our ECSF is that it accounts for noise interference and luminance simultaneously through the composite weight function and its factors. When the center-surround contrast energy is weighted by the improved ECSF, the responses of different frequency components are stretched effectively. The proposed ECSF model is shown in Fig. 4.
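For orientation, the following sketch shows an ECSF-style weighting of the general form ECSF(z, s) = z·g(s) + k(s) from Otazu et al. [12], which the improved ECSF extends with luminance and noise-frequency factors. All numeric parameters below are illustrative assumptions rather than the fitted values of this paper.

```python
import numpy as np

def ecsf_like(z, scale, peak=4.0, spread=2.0, lower_spread=3.0):
    """ECSF-style weighting: z * g(s) + k(s).

    g(s) is a band-pass weight peaking at `peak` on the scale axis;
    k(s) provides the nonzero lower bound. Parameter values are
    illustrative only.
    """
    g = np.exp(-((scale - peak) ** 2) / (2.0 * spread ** 2))
    k = np.exp(-((scale - peak) ** 2) / (2.0 * lower_spread ** 2))
    return z * g + k

# Example: weight the contrast energy of a wavelet plane at scale s = 3
# alpha = ecsf_like(z_plane, scale=3.0)
```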
The wavelet coefficient matrices are modulated by the ECSF values, and the saliency information of each channel is obtained by performing the inverse wavelet transform. The final saliency map is then computed from the saliency maps of the global, high-frequency, and low-frequency channels, with a weighting coefficient equal to 0.1 in this paper.
In this way, the difference between the global and high-frequency saliency information is calibrated by the low-frequency channel information, improving the significance of the absolute targets and excluding the impact of low-frequency noise.
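The exact fusion formula is not reproduced above, so the sketch below only illustrates one reading consistent with the description: the difference between the global and high-frequency saliency maps is calibrated by the low-frequency map, with the 0.1 coefficient assumed to be the low-frequency weight. The clipping and normalization steps are additional assumptions.

```python
import numpy as np

def fuse_saliency(s_g, s_h, s_l, low_weight=0.1):
    """Hypothetical three-channel fusion of the saliency maps.

    s_g, s_h, s_l: saliency maps of the global, high-frequency, and
    low-frequency channels. The global/high-frequency difference is
    calibrated by the low-frequency map (0.1 is assumed to be its
    weight); the authors' exact operator may differ.
    """
    diff = np.clip(s_g - s_h, 0.0, None)   # suppress high-frequency (impulse) responses
    fused = diff + low_weight * s_l        # low-frequency calibration of absolute targets
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / (rng + 1e-12)
```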
Experiments and discussion
A target that occupies 2 to 30 pixels is called a small ship target in this paper. We test 40 panchromatic satellite images with a size of 256 × 256 from Google Earth; 40 images with impulse noise with a mean value of 0 and a variance of 0.002; 40 images with Gaussian noise with a mean value of 0 and a variance of 0.002; 40 images with Poisson noise with a mean value of 0 and a variance of 0.002; and 40 images with all three noises above (multiple noise). Seven methods are used for comparison with the proposed method. The results are shown in Fig. 5. The comparison algorithms are context aware (CA) [6], covariance (COV) [7], spectral residual (SR) [13], spatially weighted dissimilarity (SWD) [14], saliency estimation using a low-level model (SIM) [11], minus contrast sensitivity (MCS) [5], and unsupervised surface detection (USD) [15].
Fig.5 Contrast of saliency maps using different methods for noise-free and noisy images, from left to right: original, CA, COV, SR, SWD, SIM, MCS, USD, OURS. (a) Noise-free image; (b) impulse; (c) Gaussian; (d) Poisson; (e) multiple (top row: typical image 1; bottom row: typical image 2)
The number of detected targets is obtained by binarization and erosion-dilation operations. The criteria for evaluating the algorithms are precision, recall, and F-score:
$$\mathrm{precision} = \frac{N_r}{N_r + N_f}, \qquad \mathrm{recall} = \frac{N_r}{N_r + N_l}, \qquad F = \frac{(1 + \beta^2)\,\mathrm{precision}\cdot\mathrm{recall}}{\beta^2\,\mathrm{precision} + \mathrm{recall}},$$
where $N_r$ is the number of correct detections, $N_f$ the number of false alarms, $N_l$ the number of missed targets, and $\beta$ is used to balance the precision and the recall (its value is fixed in this paper). The performance comparison of different methods is shown in Figs. 6 and 7.
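A minimal implementation of the counting step and the three criteria is given below. The binarization threshold is an assumed value, and β = 0.5 is not stated explicitly above; it is inferred here from the reported scores (for example, P = 100% and R = 90.59% yield F = 97.96% only when β = 0.5), so treat it as a derived assumption.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation, label

def count_targets(saliency, threshold=0.5):
    """Count detected targets from a saliency map.

    The map is binarized at `threshold` (an assumed value), cleaned by an
    erosion-dilation step, and connected regions are counted.
    """
    binary = saliency >= threshold
    cleaned = binary_dilation(binary_erosion(binary))
    _, n_regions = label(cleaned)
    return n_regions

def detection_scores(n_correct, n_false, n_missed, beta=0.5):
    """Precision, recall, and F-beta score for a set of detections.

    beta = 0.5 weights precision above recall; this value is inferred
    from the reported scores, not stated directly in the text.
    """
    precision = n_correct / (n_correct + n_false)
    recall = n_correct / (n_correct + n_missed)
    f_beta = ((1 + beta ** 2) * precision * recall
              / (beta ** 2 * precision + recall))
    return precision, recall, f_beta

# Illustrative counts only: 77 correct detections, 0 false alarms,
# 8 missed targets reproduce (1.000, 0.9059, 0.9796).
print(detection_scores(77, 0, 8))
```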
For noise-free images, the precision, recall, and F-score of our method are 100%, 90.59%, and 97.96%, respectively, which are 7.15% higher, 4.93% lower, and 7.48% higher than the best results of the other algorithms. From the contrast experiment on noise-free images: the CA algorithm, which uses context-aware saliency, may detect weak texture information as targets, giving rise to a higher false-alarm rate. The COV algorithm, which uses a mean descriptor and a covariance descriptor to describe seven-dimensional feature vectors of image regions, cannot work effectively when little texture information is available, which results in a higher omission ratio. The SR method, which analyzes and removes the general information in the logarithmic spectrum, neglects secondary targets and decreases the number of correct detections. In bright sea areas, the SWD algorithm needs to calculate the spatial distance between image blocks and the center bias; the resulting weighting is larger, which increases the background value. The SIM algorithm extracts excess weak texture and edge information during scale decomposition, which may blur the distinction between targets and background; moreover, its ECSF does not focus strongly enough on the target area and produces false alarms. The MCS algorithm, with its channel and MECSF design, cannot always stretch the contrast and cannot remove the influence of low-frequency noise, which results in false alarms. The USD algorithm smooths the amplitude spectrum with different Gaussian kernel functions after the discrete Fourier transform, which weakens the energy of some targets during processing and leads to an imbalance of false alarms and omissions after the subsequent fusion correction.
Fig.6 Performance of different methods for noise-free images
When applied to images with impulse, Gaussian, and Poisson noise, the precision, recall, and F-score of our method are (100%, 90.59%, 97.96%) for impulse noise, (94.52%, 81.18%, 91.51%) for Gaussian noise, and (100%, 83.53%, 96.21%) for Poisson noise, whereas the highest precision, recall, and F-score among the other methods are (87.34%, 90.59%, 86.03%), (81.36%, 80.95%, 75.00%), and (90.91%, 94.12%, 89.29%), respectively. A further application to images with multiple noises shows that the precision, recall, and F-score of our method are 94.44%, 80%, and 91.15%, while the highest values among the other methods are 59.26%, 88.24%, and 61.70%, respectively. For noisy images, the influence of the noises, from strongest to weakest, is impulse noise, Gaussian noise, and Poisson noise, and almost none of the comparison algorithms can resist impulse and Gaussian noise. For images with multiple noises, the number of correct detections decreases dramatically, and the false-alarm and omission ratios increase significantly.
Fig.7 Performance of different methods for noisy images. (a) Impulse; (b) Gaussian; (c) Poisson; (d) multiple
In our algorithm, the influence of different types of noise on sea-surface target detection is fully considered. By means of the scale-adaptive center-surround contrast energy and the improved ECSF design, effective separation of signal and noise at different frequencies is realized. At the same time, the three-channel design effectively filters out high-frequency noise and "re-calibrates" the signal targets. Analysis shows that there are two main reasons for omissions and false alarms in our algorithm: 1) when the scale of a single target is too small, the target is lost in the noise; 2) a connected region is formed when targets are too close to each other.
To verify the robustness of the proposed algorithm against interference noise, an additive-noise test is designed. Considering that changes in the amount of Gaussian noise and Poisson noise have a negligible impact on the detection, we design the following two tests. Test 1: add impulse noise with a mean value of 0 and a variance of 0, 0.005, 0.010, 0.015, and 0.020 to the normal images. Test 2: add Gaussian noise with a mean value of 0 and a variance of 0.005 as well as Poisson noise to the normal images; then add impulse noise with a mean value of 0 and a variance of 0, 0.005, 0.010, 0.015, and 0.020 to these noisy images, respectively. The detection results are shown in Figs. 8 and 9.
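A sketch of how the two noise tests could be simulated is shown below, using scikit-image's noise generator. Mapping the paper's impulse-noise "variance" onto the salt-and-pepper `amount` parameter is an assumption, as is applying Poisson noise without additional parameters.

```python
from skimage.util import random_noise

IMPULSE_LEVELS = (0.0, 0.005, 0.010, 0.015, 0.020)

def build_noise_tests(img):
    """Generate the image sets for Test 1 and Test 2.

    Test 1: impulse (salt & pepper) noise at increasing levels.
    Test 2: Gaussian noise (variance 0.005) plus Poisson noise first,
    then the same impulse levels on top.
    """
    test1 = [random_noise(img, mode="s&p", amount=a) if a > 0 else img
             for a in IMPULSE_LEVELS]
    base = random_noise(random_noise(img, mode="gaussian", mean=0.0, var=0.005),
                        mode="poisson")
    test2 = [random_noise(base, mode="s&p", amount=a) if a > 0 else base
             for a in IMPULSE_LEVELS]
    return test1, test2
```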
Fig.8 Noise images (top row) and saliency maps (bottom row) of (a) Test 1 and (b) Test 2. Noise variance, from left to right: 0, 0.005, 0.010, 0.015, 0.020
It can be seen from Test 1 that when the variance of the impulse noise is less than 0.020, the precision, recall, and F-score are all higher than 80%. False alarms are mainly caused by the impulse noise, and omissions mainly result from regional connectivity. In Test 2, the added Gaussian and Poisson noise has a direct impact on the detection. The noise masks some small targets, so that the noise and target scales become similar and some targets cannot be recognized even by the human eye; therefore, the false-alarm and omission ratios increase directly. The precision of the other comparison algorithms is below 40% on this data set because of their high false-alarm rates. When the targets that can still be recognized by the human eye in Test 2 are taken as the actual target numbers and the target shrink threshold is increased, the detection results are as shown in Table 1.
Tab.1 Detection results of Test 2 (with human-eye recognizable targets as ground truth)

variance   precision   recall    F-score
0          95.59%      86.67%    93.66%
0.005      96.83%      89.71%    95.32%
0.010      90.91%      87.72%    90.25%
0.015      91.38%      89.83%    91.07%
0.020      86.44%      89.47%    87.03%
Real remote sensing images do not contain much noise. To verify the effectiveness of the proposed algorithm on real imagery, an actual remote sensing camera imaging and testing experiment was designed. In the experiment, a self-developed low-light camera (CMOS sensor: GSENSE400) was used, sea-surface scenes were generated by an image simulator, and local areas (256 × 256 pixels) of the images were processed by the proposed algorithm. The experiment was conducted under two scenarios: the first was a natural-light environment (illumination: 4000 lx, exposure time: 20 ms), and the second was a low-light environment (illumination: 0.05 lx, exposure time: 1000 ms). The detection results are shown in Fig. 10. The performance of the proposed algorithm is good for both sparse targets and target groups.
Fig.10 Detection results of imaging test of (a) first group and (b) second group. Top row: original images; bottom row: saliency maps
For the time-consumption experiment, a computer with an Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70 GHz and MATLAB R2015a software are used. The average time consumption of our method is 1.4008 s, which is less than that of CA (43.0158 s), COV (26.2670 s), SIM (1.6437 s), and USD (1.6453 s) and more than that of SR (0.0994 s), SWD (0.2462 s), and MCS (1.3936 s). The time consumption mainly arises from the multichannel conversion between the spatial domain and the transform domain.
Conclusions
This paper presents a new approach to detecting small ships in optical remote sensing images based on a low-level vision model. The channel separation, scale-adaptive center-surround contrast energy, and improved ECSF are the key elements of our improved model for detecting small ships. The performance of the proposed model is validated and compared with that of existing methods. The comparison results show that the proposed method achieves higher precision, recall, and F-score with robust anti-noise ability. However, its time consumption is large; therefore, we plan to research optimization methods in future work.
Acknowledgements
This work is extended research based on the low-level vision models SIM and MCS. We would like to thank Dr. Fang XU and Murray et al. for sharing their evaluation codes. This work was supported by the Major Projects of the Ministry of Science and Technology (No. 2016YFB0501202) and the Natural Science Foundation of Jilin Province, China (No. 20170101164JC).
References

Wang Y Q, Ma L, Tian Y. State-of-the-art of ship detection and recognition in optical remotely sensed imagery. Acta Automatica Sinica, 2011, 37(9): 1029–1039
Shi Z, Yu X, Jiang Z, Li B. Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(8): 4511–4523
Hu J, Gao J B, Posner F L, Zheng Y, Tung W W. Target detection within sea clutter: a comparative study by fractal scaling analyses. Fractals: Complex Geometry, Patterns & Scaling in Nature & Society, 2011, 14(3): 187–204
Song M Z, Qu H S, Jin G. Weak ship target detection of noisy optical remote sensing image on sea surface. Acta Optica Sinica, 2017, 37(10): 1011004-1–1011004-8
Goferman S, Zelnik-Manor L, Tal A. Context-aware saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(10): 1915–1926
Pandivalavan M, Karuppiah M. Saliency detection for content aware computer vision applications. International Arab Journal of Information Technology, 2017, 14(4): 528–533
Kapoor A, Biswas K K, Hanmandlu M. An evolutionary learning based fuzzy theoretic approach for salient object detection. Visual Computer, 2017, 33(5): 665–685
Zhang J X, Ehinger K A, Wei H K, Zhang K J, Yang J Y. A novel graph-based optimization framework for salient object detection. Pattern Recognition, 2017, 64(C): 39–50
Murray N, Vanrell M, Otazu X, Parraga C A. Saliency estimation using a non-parametric low-level vision model. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2011, 433–440
Otazu X, Parraga C A, Vanrell M. Toward a unified chromatic induction model. Journal of Vision, 2010, 10(12): 5
Hou X, Zhang L. Saliency detection: a spectral residual approach. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2007, 1–8
Duan L, Wu C, Miao J, Qing L, Fu Y. Visual saliency detection by spatially weighted dissimilarity. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2011, 473–480
Xu F, Liu J H, Zeng D D, Wang X. Detection and identification of unsupervised ships and warships on sea surface based on visual saliency. Optics and Precision Engineering, 2017, 25(5): 1300–1311