1. Chongqing Smart City Institute, Chongqing Jiaotong University, Chongqing 400074, China
2. Chongqing Geomatics and Remote Sensing Center, Chongqing 401147, China
guandongjie_2000@163.com
Received: 2021-07-15; Accepted: 2022-03-02; Issue date: 2022-07-20
Abstract
Realizing accurate perception of urban boundary changes is conducive to the formulation of regional development planning and to research on urban sustainable development. In this paper, an improved fully convolutional neural network is proposed for perceiving large-scale urban change: the network structure and update strategy are modified to extract richer feature information and to meet the requirements of urban construction land extraction from large-scale, low-resolution imagery. Taking the Yangtze River Economic Belt of China as an empirical object to verify the practicability of the network, the results show that the improved fully convolutional neural network model reached a kappa coefficient of 0.88, better than traditional fully convolutional neural networks, and that it performs well in construction land extraction at the scale of small and medium-sized cities.
Accurate judgment of urban land spaces is the basic premise for most state policy making (Weng, 2001; Jiang and Yao, 2010); a lack of quantitative information on urban space usually leads to decision-making failures on the part of governments, which further aggravates a series of human-land conflicts. Extracting urban land information from remote sensing satellite images is regarded as a common way to comprehensively grasp urban sprawl (Xi and Ng, 2007; Vizzari et al., 2018; Shao et al., 2019). However, urban land extraction methods face several problems. First, traditional classifiers have poor transfer-learning capacity and struggle to capture small-scale or fine image features when delineating land boundaries (Waldner and Diakogiannis, 2020; Zhang et al., 2020a, 2020b). Second, the forms of urban land are irregular, resulting in low classification precision when construction land boundaries are extracted by traditional classifiers (Pan et al., 2013; Zhang et al., 2018a). Third, large-scale land use classification based on remote sensing images is mainly processed by professionals, which restricts the efficiency of data production and increases its cost. Deep learning (DL) algorithms excavate the spatial relationship characteristics of satellite images (Zhang et al., 2019; Zhou et al., 2020) and overcome the deficiency of low spatial resolution image information, providing a potential solution for the efficient and accurate extraction of large-scale land information; they have been widely used in remote sensing image classification (Kussul et al., 2017; Hu et al., 2018; Xu et al., 2018).
Consequently, this paper proposes an improved fully convolutional neural network (IFCN) built on the Caffe platform (Jia et al., 2014), providing an effective method for data acquisition at different levels of urban space on large scales, which is expected to provide high-precision data support for the exploration of urban spatial development and scientific layout.
Traditional convolutional neural networks (CNNs) improve the paradigm of machine operation in image processing through translation invariance and have proven to perform well in computer vision applications (Tian et al., 2018). Since LeNet5 appeared (Lecun et al., 1998), scholars have improved hierarchical structures based on its design concepts to meet the demands of land cover/use classification tasks, making breakthrough achievements in the classification precision of satellite image data sets (Flood et al., 1998; Nguyen et al., 2013; Lagkvist et al., 2016). To date, end-to-end DL algorithms include patch-based convolutional neural networks and fully convolutional neural networks (FCNs) (Zhu et al., 2017). For example, Huang et al. (2018) constructed an improved STDCNN to classify land use in Hong Kong and Shenzhen and obtained more practical land use classification results, and Martins et al. (2020) proposed a multi-scale object-based CNN method for large-scale land cover classification of high-resolution images. Patch-based CNNs involve redundant operations on adjacent patches, resulting in low classification efficiency (Volpi and Tuia, 2017); moreover, the patches input to CNNs are usually inconsistent with the real objects, leading to excessive expansion (or contraction) and geometric distortion of the edges of the classification results, which reduces classification accuracy. Fully convolutional neural networks perform semantic segmentation on each image at the pixel level, maintaining a two-dimensional output structure with fewer training samples and shorter calculation time than traditional CNNs (Middel et al., 2019), and have played an excellent role in satellite image classification (Liu et al., 2018; Pan et al., 2018; Mohammadimanesh et al., 2019; Ptucha et al., 2019). Long et al. (2015) applied FCNs to the PASCAL VOC 2012 data set, enabling more rapid and accurate remote sensing image classification through a fully convolutional neural network. FCNs also perform well in complex classification: Mohammadimanesh et al. (2019) designed an FCN based on SAR images to classify a complex land cover ecosystem; Wurm et al. (2019) utilized the transfer-learning capacity of FCNs to segment marginalized urban living areas in remote sensing images; and Persello and Stein (2017) used an FCN to identify and delimit the farmland of small farms from satellite images. Recently, the application field of FCNs has broadened further, with critical breakthroughs in semantic segmentation of three-dimensional spatial data and land cover extraction from historical images (Mboga et al., 2020; Wen et al., 2020).
Urban mapping is a significant application field of DL and has produced several excellent methods. Guo et al. (2021) presented a saliency-guided edge-preservation network to generate up-to-date building footprints; trained in a semi-supervised manner, it can feasibly produce up-to-date building footprints over large-scale areas. Ji et al. (2019) trained a model on dual-time (pre-change and post-change) remote sensing images to realize urban change detection. Owing to the favorable end-to-end capacity of FCNs, they are widely applied in urban mapping at local scales (Maggiori et al., 2016; Li et al., 2018), regional scales (Fu et al., 2017), and even global scales (He et al., 2019; Qiu et al., 2020). These studies take large cities as the study object, while recently attention has been devoted to the development of small and medium-sized cities. Urban development is reflected by changes in urban boundaries, and it is meaningful to realize the rapid identification of the boundaries of small and medium-sized cities with a high-precision algorithm. However, accurately extracting construction land across city scales with one algorithm remains a challenge, owing to the differences in spatial composition and spatial morphology between big cities and small cities. Because of the limited availability of satellite remote sensing image data and the characteristics of different image channels, images of different city levels generally present different feature information in the network when urban features are extracted by FCNs. There are two ways to address the defects of FCNs: one is combining FCNs with traditional methods or other networks to optimize the classification results (Wu et al., 2015; Deng et al., 2018); the other is modifying the network structure to optimize network performance (Jean et al., 2016; Zhong et al., 2016; Ding et al., 2018).
Given the difficulty of multi-scale urban construction land extraction from low-resolution remote sensing images, this paper presents an improved fully convolutional neural network (IFCN) model that adopts low-resolution remote sensing images as source data, improves the front-end and deconvolution processes based on the VGG16 network, and optimizes the feature extraction capability of the network for low-resolution satellite image data. The potential of the designed network for automatic semantic segmentation of urban construction land of different city levels over large-scale areas is also considered.
The main aims of this paper are as follows. (i) An innovative FCN is proposed to extract multi-scale urban construction land information supported by VIIRS nighttime light satellite image data, extracting as much information as possible from limited, low spatial resolution remote sensing image samples to achieve high-precision semantic segmentation. (ii) The spatial characteristics of multi-scale cities in specific remote sensing images are explored to improve the training-sample selection method, which makes full use of the feature mapping of image band combinations to improve classification precision.
2 The study area and data collection
The Yangtze River Economic Belt (Fig.1) covers 97°36′–122°95′ E longitude and 21°14′–35°12′ N latitude and stretches across eastern, central, and western China along the Yangtze River Basin, covering nine provinces and two municipalities, namely Sichuan, Yunnan, Guizhou, Hunan, Hubei, Jiangxi, Anhui, Jiangsu, Zhejiang, Chongqing, and Shanghai (China Statistical Yearbook, 2019), and forming three urban agglomerations: the Yangtze River Delta, the Central Yangtze River, and Cheng-Yu. The regional urbanization ratio increased from 14.81% in 1978 to 58.29% in 2017 as multiple central cities sprang up, and the urban density has reached 44.25 cities per 10000 km2, twice the average Chinese urban density.
Pixel-level classification is suitable for large-area surface detection, but it requires small internal differences in the satellite data (Liu et al., 2021). NPP-VIIRS data are a product released by the National Geophysical Data Center (available at the NOAA website). These images eliminate the influence of sunlight and auroras and record surface light intensity under cloudless conditions at night, capturing the weak light signals of cities, residential areas, fishing boats, and other lit areas and distinguishing them from dark rural and mountainous areas; this opens a new direction for extracting urban boundaries and urban construction land areas from source data. Moreover, the images cover a wide range, which supports their strong application potential in large-scale earth observation and makes them an ideal data source for this research.
The main data used in this paper include the remote sensing image data set and network structure files. 1) The remote sensing image data set: nighttime light data (VIIRS) were used as feature samples for the fully convolutional neural network. This paper selected the 2017 annual synthesis data of VIIRS in China, with a spatial resolution of 500 m, as the nighttime light data source. The spatial reference of the VIIRS data was set to WGS1984. After preprocessing, the image met the requirements of fully convolutional neural network data sampling and was used for the training and test samples of the network, serving at the same time as the image data source to be classified. 2) Network structure files: this paper designed a network based on several common fully convolutional neural network models, all of which were collected from GitHub.
3 Method
3.1 FCN model establishment
CNNs are characterized by sparse connections and weight sharing (Lecun et al., 1998; Wagner et al., 2013); the deeper the hidden layers, the more closely the network approximates a continuous function, making it feasible to achieve low-resolution pixel-level image classification through an improved neural network.
The convolution operation has the property of translation invariance; thus, FCNs use the Laplacian operator and gradient operator for image segmentation, automatically learning image signal features through convolution kernels and acquiring the heatmap of convolution layer i through a nonlinear calculation. The primary process is as follows:

Ci = f(Ci−1 ⊗ Wi + Bi)

where Ci corresponds to both the convolution layer and the feature map of layer i of the FCN, Wi corresponds to the weight of the convolution kernel in layer i, Bi is the bias value, ⊗ corresponds to the cross-correlation operation, and f is the nonlinear activation function.
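As a minimal numerical sketch of this layer operation (illustrative only, not the paper's Caffe implementation; the function names are our own), the cross-correlation, bias, and nonlinear activation can be written as:

```python
import numpy as np

def relu(x):
    # Nonlinear activation f applied after each convolution
    return np.maximum(0.0, x)

def conv_layer(feature, kernel, bias):
    """Valid cross-correlation of a single-channel feature map with one
    kernel, plus bias and activation: C_i = f(C_{i-1} (x) W_i + B_i)."""
    kh, kw = kernel.shape
    oh = feature.shape[0] - kh + 1
    ow = feature.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(feature[r:r + kh, c:c + kw] * kernel) + bias
    return relu(out)
```

A 4 × 4 map convolved with a 3 × 3 kernel yields a 2 × 2 heatmap, illustrating the spatial shrinkage that the later pooling and deconvolution stages must compensate for.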
The FCNs realize end-to-end classification through deconvolution. The deconvolution process is essentially an upsampling process: deconvolution layers upsample the low-resolution heatmaps obtained from the pooling process to ensure that the network outputs a segmented image with the same resolution and size as the input image. A bilinear interpolation method is applied to calculate the deconvolution (Zeiler and Fergus, 2014). The operational process is as follows:

y = C^T x

where x represents the input image, y represents the output image, and C^T denotes the deconvolution process matrix (the transpose of the convolution matrix).
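The bilinear weights used to initialize such a deconvolution layer can be sketched as follows; the construction mirrors the one commonly used in FCN reference code and is illustrative rather than the paper's exact implementation:

```python
import numpy as np

def bilinear_kernel(factor):
    """Bilinear interpolation weights for a deconvolution (upsampling)
    layer with the given integer upsampling factor."""
    size = 2 * factor - factor % 2
    center = (size - 1) / 2 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    # Separable triangular weights in each axis, peaking at the center
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))
```

For a 2x upsampling layer this yields a 4 × 4 kernel whose weights sum to the upsampling factor squared, so constant inputs are upsampled to the same constant.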
The ReLU function was used as the activation function of the convolutional layers to avoid the disappearance of the gradient, and the Xavier random initialization method was utilized to assign the initial weight values, thus accelerating convergence and suppressing oscillation of the network. The mathematical formulas are as follows:

f(x) = max(0, x)

W ~ U[−√6/√(ni + ni+1), √6/√(ni + ni+1)]

where ni represents the number of neurons in layer i, W represents the weight value, and U represents the uniform distribution, whose variance is 2/(ni + ni+1).
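A sketch of Xavier initialization under these definitions, assuming ni and ni+1 are the fan-in and fan-out of the layer (the function name and seed are illustrative):

```python
import numpy as np

def xavier_uniform(n_in, n_out, shape, seed=0):
    """Xavier (Glorot) uniform initialization: W ~ U[-a, a] with
    a = sqrt(6 / (n_in + n_out)), giving Var(W) = 2 / (n_in + n_out)."""
    a = np.sqrt(6.0 / (n_in + n_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-a, a, size=shape)
```

Keeping the weight variance tied to fan-in and fan-out keeps activation magnitudes roughly constant across layers, which is what suppresses early-training oscillation.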
The adaptive moment estimation (Adam) method is suitable for network training on low-resolution images, and its hyperparameters have intuitive interpretations. Empirical results also show that the Adam algorithm is superior to other algorithms (Duchi et al., 2011). Therefore, to improve the training rate of stochastic gradient descent and avoid the gradient disappearance problem, Adam is selected as the network weight optimizer in this paper:

mt = β1·mt−1 + (1 − β1)·gt
nt = β2·nt−1 + (1 − β2)·gt²
m̂t = mt/(1 − β1^t), n̂t = nt/(1 − β2^t)
θt = θt−1 − γ·m̂t/(√n̂t + τ)

where mt is the first-order moment estimation of the gradient gt, nt is the second-order moment estimation of the gradient, m̂t is the correction of mt, n̂t is the correction of nt, γ corresponds to the learning rate, and τ is used to guarantee that the denominator is not 0.
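One Adam update step under these definitions might look as follows; β1, β2, and the default learning rate are the commonly used values, which the paper does not specify:

```python
import numpy as np

def adam_step(theta, grad, m, n, t, lr=1e-3, beta1=0.9, beta2=0.999, tau=1e-8):
    """One Adam update: moment estimates m_t and n_t, their bias
    corrections, and the resulting parameter step."""
    m = beta1 * m + (1 - beta1) * grad          # first-order moment
    n = beta2 * n + (1 - beta2) * grad ** 2     # second-order moment
    m_hat = m / (1 - beta1 ** t)                # bias correction of m
    n_hat = n / (1 - beta2 ** t)                # bias correction of n
    theta = theta - lr * m_hat / (np.sqrt(n_hat) + tau)
    return theta, m, n
```

The bias corrections matter early in training (small t), when the moving averages are still dominated by their zero initialization.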
Softmaxwithloss is a commonly used loss function: a combination of the Softmax and the cross-entropy loss that is relatively stable when calculating the loss of a segmentation task and objectively reflects the changes of the algorithm. The mathematical formula is as follows:

loss(y, z) = −log(e^(zy) / Σj e^(zj))

where z represents the bottom blob of the loss function in the Caffe framework, y is the ground-truth class, and loss(y, z) represents the top blob of the loss function.
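A numerically stable sketch of this combined Softmax cross-entropy loss for a single pixel's score vector (illustrative; not Caffe's C++ implementation):

```python
import numpy as np

def softmax_with_loss(z, y):
    """loss(y, z) = -log( exp(z_y) / sum_j exp(z_j) ), computed via the
    log-sum-exp trick so large scores do not overflow."""
    z = z - z.max()                        # stability shift, cancels out
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]
```

With equal scores over two classes the loss is log 2; as the score of the true class dominates, the loss approaches 0.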
The batch normalization layer and dropout layer were applied in the forward and back propagation of the network to reduce generalization error, enhance the generalization ability of the network, and suppress overfitting through local response and multi-neuron combination.
3.2 Realization techniques
3.2.1 Sample collection
This paper adopted a checkerboard approach in ArcGIS to extract network samples, which ensures the adequacy and coverage of training samples across urban types and helps avoid overfitting and underfitting of the model, as shown in Fig.2. The grid completely covers the study area and divides it into 400 sub-regions, from which 228 sampling areas were selected under the principle that the image area accounts for more than 90% of the sub-region.
Intelligent identification of the position information of each element is critical for judging city boundaries when a gray image is used as the original image to be classified. Urban forms include grid, radial, cluster, and other shapes, and all of them are included in the samples. From each sampling area, 2–8 samples were extracted. In total, 1017 feature samples of nighttime light remote sensing image data (NPP-VIIRS) were selected; the pixel resolution of each VIIRS data sample is 36 × 36. The collected samples were randomly assigned as training samples and test samples at a ratio of 21:4.
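The 21:4 random assignment can be sketched as follows (the function name and seed are illustrative, not part of the paper's pipeline):

```python
import random

def split_samples(samples, train_parts=21, test_parts=4, seed=0):
    """Randomly assign samples to training and test sets at the
    paper's 21:4 ratio."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_parts / (train_parts + test_parts))
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to the 1017 samples, this yields 854 training samples and 163 test samples.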
Labelme was used to collect labels meeting the label standards of the fully convolutional neural network model in a Caffe deep-learning environment. Accurate label boundary calibration is the basis for ensuring the classification precision of the model. This paper applied the global land cover data set FROM-GLC30 (2017) of the China National Earth System Science Data Center (Gong et al., 2019) as the sample label reference. All samples and labels were clipped to 224 × 224 pixels to keep the sample and label data sets consistent and to meet the requirements of network operation.
3.2.2 Network structure settings
The proposed method improves the front-end and deconvolution process based on traditional FCNs to enhance the capability of feature information extraction of the IFCN, thus constructing a fully convolutional neural network model suitable for construction land extraction.
It is critical to retain local information for semantic segmentation (Barth et al., 2019). However, limited by the data volume of low spatial resolution remote sensing images, the heatmaps obtained through traditional FCNs contain little shallow classification information, leading to local information loss and error-prone deep classification information; when deconvolution is carried out on a fusion map carrying so little information, the boundaries of the resulting construction land classification are ambiguous and inaccurate. Traditional FCN models are therefore not applicable to this paper. As shown in Fig.3, the shallow heatmaps of FCNs contain more image details and location information, while the deeper heatmaps are more robust but rougher. For remote sensing images with different resolutions, heatmaps at different levels have different effects on satellite image extraction; hence, image extraction can be optimized by the abundant use of multi-level network heatmaps.
This paper improved the structure, parameters, and data entry mode of traditional FCNs to meet the requirements of segmentation, and the fcn8s-heavy-pascal caffemodel was selected as the pre-training model (available at the Caffe berkeleyvision website). In the deconvolution phase, we revised the skip connections, fusing the output feature information of pooling layers 4, 3, and 2 with the output features of multiple layers in the deconvolution structure; redesigned and averaged the multiples of the four upsampling processes to balance deep and shallow information in the network; and added a crop layer with a corrected offset to optimize the detection capacity for urban boundaries. In the convolution phase, two batch normalization layers were added for the first two pooling layers, and the learning rate and bias coefficients of the first four convolution layers were reduced, improving the learning ability of the shallow convolutional layers and optimizing the network's extraction of spatial location information. The specific network structure diagram is shown in Fig.4.
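The revised skip-connection fusion can be sketched schematically as follows, assuming (as in FCN8s extended by one shallower level) that the fused score maps come from pooling layers 4, 3, and 2; nearest-neighbour upsampling stands in for the bilinear deconvolution layers, and all names are illustrative:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling as a stand-in for a bilinear
    deconvolution layer."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_heatmaps(score_pool4, score_pool3, score_pool2, score_deep):
    """Sum the deepest score map with progressively shallower
    pooling-layer scores, upsampling 2x before each fusion
    (shapes: deep HxW, pool4 2Hx2W, pool3 4Hx4W, pool2 8Hx8W)."""
    fused = upsample2x(score_deep) + score_pool4
    fused = upsample2x(fused) + score_pool3
    fused = upsample2x(fused) + score_pool2
    return fused
```

Adding the pool2 branch is what carries the shallow spatial-position information into the final upsampled prediction.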
We verified the validity of the above parameter adjustment strategies. Four strategies were adjusted on the basis of FCN8s, and the training accuracy under each adjustment mode was compared with that of FCN8s and the IFCN. Across the four modes, only one variable differs while the controlled quantities remain consistent; the results after 20000 iterations are shown in Tab.1. Compared with FCN8s, the overall accuracy (OA) and MeanIU of modes ①, ②, and ③ increased by different amplitudes under the same number of iterations, but the gradient decreased quickly. The OA and MeanIU of mode ④ decreased slightly and its loss value increased significantly; after adding the skip connection, the network's gradient descent speed decreased and its learning ability increased. Through these adjustments, the performance of the IFCN is improved over FCN8s.
3.3 Model validation
This paper divided the 1017 VIIRS nighttime light image samples into training and testing samples, input them to the network, and carried out 200000 iterations. The experimental results show that the loss value of the IFCN changes minimally after 200000 iterations. After training, the overall precision, mean precision, MeanIU, and fwavacc reached 99.48%, 98.92%, 97.59%, and 98.98%, respectively. The variation in precision during model iteration is shown in Fig.5. The MeanIU climbed rapidly before 50000 iterations, indicating obvious gradient descent during training with no gradient disappearance or oscillation. After 50000 iterations, the curve flattens, and the precision remains invariant after 180000 iterations, indicating that network fitting has been completed and optimal performance achieved. The loss value decreased rapidly before 25000 iterations, then slowly approached 0, reaching 592 after 200000 iterations, which met the model application benchmarks.
4 Results
4.1 Results of construction land extraction in the experimental area
The VIIRS images of the Yangtze River Economic Belt were fed into the trained IFCN network to obtain the binary results. The experimental results showed that the automatic extraction of construction land can be realized by the IFCN model.
We classified construction land and non-construction land in the test areas with four classifiers; the results are shown in Fig.6. The SVM (Support Vector Machine) adopts a linear kernel, so the urban boundary it obtains is distinct, but for the same reason a few non-construction land areas are wrongly determined to be construction land, and the model is unable to identify smaller cities and small facilities such as reservoirs and parks. U-Net has five pooling layers and, in the process of upsampling, fuses multi-scale features one by one to form thicker, more prominent features, but it has difficulty identifying urban areas with low DN values in VIIRS nighttime light satellite images, and a few suburban boundaries are misclassified in its results. More detailed features of urban boundaries are obtained by the IFCN and threshold classification, indicating that these classifiers are better at identifying non-construction land within the city; however, threshold classification requires optimal threshold selection at the decision stage, which makes it scenario-dependent and time-consuming (Shi et al., 2021). Compared with these classifiers, the IFCN is better at detecting the detailed features of small-city boundaries and has better segmentation capacity for roads and water bodies.
To further verify the validity of the model, we explored the construction land extraction performance of the IFCN on large-scale, low-resolution satellite images through quantitative precision evaluation. The kappa coefficient and F1-score were used to measure the precision of the classifiers.
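From a binary confusion matrix over construction land (positive class) and non-construction land (negative class), the two measures can be computed as follows (a standard formulation; the variable names are illustrative):

```python
def kappa_and_f1(tp, fp, fn, tn):
    """Kappa coefficient and F1-score from a binary confusion matrix."""
    total = tp + fp + fn + tn
    po = (tp + tn) / total                      # observed agreement
    pe = ((tp + fp) * (tp + fn) +
          (fn + tn) * (fp + tn)) / total ** 2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return kappa, f1
```

Unlike overall accuracy, kappa discounts the agreement expected by chance, which matters here because non-construction land dominates the scene.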
The results are shown in Tab.2. The kappa coefficient and F1-score of the IFCN increased by 5% and 4%, respectively, compared with the other classifiers. The IFCN infers object geometry from context information, thus separating categories. The recall ratio of the IFCN reaches 95%, much higher than the 90% of SVM classification, 87% of threshold classification, and 48% of U-Net; the optimized deconvolution structure combines shallow spatial position information with deep robust information, significantly improving classification precision.
4.2 Results of construction land extraction in small and medium-sized cities
Forty cities were selected to verify the construction land segmentation capacity of the model: 20 medium-sized cities and 20 small cities served as test sites. City scale was divided in accordance with the Chinese State Council's publication "Notice on Adjusting the Standards for the Division of Urban Size". The extraction results are shown in Fig.7. Most sites have good connectivity in the construction land segmentation results, with inconspicuous salt-and-pepper noise. The construction land boundaries are clear, and the forms of urban sprawl, such as radial patterns and group patterns, are distinctly shown in the classification results. In addition, the identification of non-construction land within cities is comparatively accurate, and urban morphology is relatively intact.
In this paper, the shape index (SI) and total edge length (TE) were used to verify the segmentation capacity of the IFCN for small and medium-sized city boundaries. We extracted the actual construction land distribution of 40 cities by the manual vectorization method and compared it with the extraction results of IFCN using the quantitative method. We revealed the accuracy of urban spatial shape recognition of the IFCN through shape index difference (SID), and explored the accuracy of urban area recognition of the IFCN through total edge length difference (TED). The specific results are shown in Tab.3.
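A sketch of these metrics, assuming the common landscape-ecology definition SI = 0.25·P/√A (the paper does not state its exact formula) and defining SID/TED as relative differences against the manually vectorized reference:

```python
import math

def shape_index(perimeter, area):
    """Landscape shape index SI = 0.25 * P / sqrt(A): 1.0 for a square
    patch, larger for more irregular boundaries (one common definition;
    assumed here, not quoted from the paper)."""
    return 0.25 * perimeter / math.sqrt(area)

def relative_difference(extracted, reference):
    """SID / TED as the relative difference (%) between the metric of
    the extracted boundary and that of the reference boundary."""
    return abs(extracted - reference) / reference * 100.0
```

Under this reading, a SID of 7.79% means the extracted urban shape deviates from the manually vectorized shape by under 8% on average.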
For medium-sized cities, the minimum and maximum SID are 0.6% and 23.87%, respectively, the average SID is 7.79%, and the morphological similarity between the extracted and real results is higher than 90%, which shows that the IFCN extracts the urban morphological features of medium-sized cities well. The minimum and maximum TED were 3.03% and 23.69%, the average TED was 13.74%, and the urban perimeter similarity was higher than 85%, which demonstrates that the network extracts the boundaries of medium-sized cities well. Binzhou, Xiangtan, and Yuxi have high TED, which may be related to the size of the convolution kernels of the front-end structure of the IFCN. Compared with medium-sized cities, the segmentation of small cities is relatively poor, with individual cities segmented extremely poorly (the SID of Lushui is 60.04%, and its TED is 445.53%). The average SID of small cities reached 13.99% and the average TED 18.55%, higher than those of medium-sized cities. Nevertheless, the similarity between the real and extracted results is still higher than 80%, which shows that the IFCN retains good semantic segmentation capacity for the construction land of small cities.
5 Discussion
5.1 Model generalization capacity
To inspect the generalization capacity of the IFCN on low-resolution images, we selected one mega-city (Qingdao), one large city (Harbin), one medium-sized city (Sanya), and one small city (Lhasa) as verification objects; none is included in the training samples. Taking the municipal districts of the four cities as the validation scope, the generalization ability of the network was verified by the F1-score method, with the results shown in Tab.4. The precision ratios of the four cities are all higher than 90%, while the recall ratios are lower, with Harbin's recall below 80%: far more construction land pixels are misclassified as non-construction land than the reverse. The F1-scores of the four cities reached 92.25%, 83.47%, 85.51%, and 85.51%, respectively, indicating that the network has good generalization ability.
The segmentation results are superimposed on the original nighttime light images for intuitive presentation, where the light red layer represents the segmentation result. The original images distinguish construction land from non-construction land through the luminous radiation intensity value: the higher the value of an area, the more likely it is to be a city. The extraction results of construction land for the cities are shown in Fig.8. The construction land segmentation results for each city match the distribution of high luminous radiation intensity in the original image well, and the boundary shape of the construction land coincides with the original image.
5.2 Model performance
This paper also examines the mechanism by which the IFCN improves accuracy. One city was taken as the detection object and input into the trained model. Three common FCNs were selected as comparison objects, and the same training samples and parameters were used to train all networks. After an identical number of iterations, the loss values of FCN32s, FCN16s, FCN8s, and the IFCN were 3302, 3102, 698, and 421, respectively, and their mean IU values were 0.8913, 0.9377, 0.9721, and 0.9758, respectively. The segmentation results are shown in Fig.9. FCN32s and FCN16s performed poorly in identifying construction land boundaries: the urban boundaries are rough and significantly misclassified. FCN8s had fewer misclassifications, but the connectivity of its segmentation results is poor and its urban boundary detection is weak. Common FCNs are thus not suitable for semantic segmentation of low-resolution, single-band satellite images. Compared with them, the superiority of the IFCN is attributed to its emphasis on the shallow information module: the improved model retains the characteristics of urban suburbs and predicts the shape of urban boundaries well, with prominent performance in the pixel-level classification of construction land and non-construction land.
Tab.5 shows representative papers extracting land cover through DL since 2018. The application of DL in land cover classification mainly focuses on optimizing the extraction of artificial surfaces such as cities, on agricultural land extraction, and on target detection, while urban extraction focuses on large-city mapping and urban complex classification. We sorted the literature by spatial scale and remote sensing source and analyzed the extraction accuracy of the different networks. In general, the kappa coefficient of each DL method is higher than 79%, the F1-score higher than 89%, and the OA higher than 87%, with no significant correlation between classification complexity and classification accuracy. High-resolution remote sensing images are currently considered the main image source for deep learning; they improve the accuracy of network training but, at the same time, increase the difficulty of application to large-scale land cover classification. The classification accuracy of the IFCN is equivalent to that of the other networks and reaches the normal level of urban extraction models, which meets the requirements of global-scale city classification; it can be effectively applied to the multi-temporal, rapid extraction of multi-scale urban construction land.
Fig.10 shows the final output heatmaps of the front-end networks of the four FCNs. Each model outputs 512 heatmaps, and each heatmap contains a varying degree of information extracted from the image. Compared with the other FCNs, the heatmap output of the IFCN contains more valid maps, and the characteristic information carried in each map is more balanced, indicating that the IFCN has preferable extraction capacity for sample information under the same sample background.
The optimized network structure improves the semantic segmentation capacity of the IFCN on low-resolution images. Fig.11 shows the upsampling process of the deconvolution layers of the four FCNs. After four upsampling steps, the output heatmap of the IFCN contains abundant feature information compared with the common FCNs: the heatmap of FCN8s contains fewer detailed features than that of the IFCN, and the heatmaps of FCN16s and FCN32s are visually blurred. As pool2 contains spatial location information, the proposed IFCN significantly improves segmentation results by adding pool2 to the deconvolution structure, thereby retrieving more spatial detail. The IFCN owns more skip connections, which integrate deep semantic information with shallow appearance information to enhance the network's segmentation capacity; this difference confirms the importance of skip connections in semantic segmentation. The IFCN improves the capacity to obtain detailed information in the deconvolution output heatmaps by combining the multi-flow learning process. Owing to the smaller convolution kernels in the deconvolution layers, the reasonable crop offsets, and the parameter optimization of the shallow convolutional layers in the front-end network, network context information is further exploited in the IFCN.
Some deficiencies remain to be addressed. The use of multiple pooling layers loses valuable information and ignores the correlation between the whole and its parts, which reduces the performance of network segmentation. Several new convolutional structures offer inspiration for replacing the pooling layers and thereby resolving this design problem. Future work will continue to explore network design and sample selection methods to provide sound technical methods for the classification of low-resolution, large-scale remote sensing images.
6 Conclusions
In this paper, an improved fully convolutional neural network (IFCN) model was applied to construction land extraction from low-resolution remote sensing images. Taking the Yangtze River Economic Belt of China as the experimental subject, 1017 VIIRS data samples were input to the network. This paper analyzed the extraction capacity of the network, verified the ability of the designed network to detect the boundaries of small and medium-sized cities, and revealed the optimization mechanism of the improved model. The results demonstrate that the IFCN achieves high precision in the classification of pixel-level objects and, compared with traditional classifiers, better discriminates the detailed features of small-area urban boundaries and better segments roads, parks, water bodies, and similar objects. The shape index and total edge length were used to verify the IFCN's extraction of construction land in small and medium-sized cities and confirmed its preferable semantic segmentation capacity there. With the improved network structure, the semantic segmentation capacity of the IFCN is enhanced on low-resolution images; it performs better in extracting feature information and predicts urban and suburban features and urban edge forms better than traditional classifiers.
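For reproducibility, the two landscape metrics used in this verification can be computed directly from a binary extraction raster. The sketch below assumes the common raster definitions (total edge length E as the count of cell borders between built-up and background cells, shape index SI = 0.25 E / sqrt(A)); the exact formulas used in the paper may differ:

```python
import numpy as np

def edge_and_shape_index(mask):
    """Total edge length E and shape index SI = 0.25 * E / sqrt(A)
    for a binary construction-land raster (1 = built-up, 0 = other).
    E counts unit cell borders between built-up and background,
    including borders on the raster boundary; SI equals 1.0 for a
    square patch and grows as the boundary becomes more convoluted."""
    m = np.pad(np.asarray(mask, dtype=int), 1)   # pad so outer borders count
    edges = sum(np.abs(np.diff(m, axis=ax)).sum() for ax in (0, 1))
    area = m.sum()
    return edges, 0.25 * edges / np.sqrt(area)

# A 2x2 built-up block: E = 8 cell borders, SI = 1.0 (a perfect square)
edges, si = edge_and_shape_index([[1, 1], [1, 1]])
```

Comparing E and SI between an extracted city boundary and the reference boundary gives a shape-sensitive check that pixel-wise accuracy measures such as Kappa do not provide.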
Barth R, IJsselmuiden J, Hemming J, Van Henten E J (2019). Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation. Comput Electron Agric, 161: 291–304
China Statistical Yearbook (2019). China 2010 Population Census Data. Beijing: China Statistics Press
Flood N, Watson F, Collett L (2019). Using a U-net convolutional neural network to map woody vegetation extent from high resolution satellite imagery across Queensland, Australia. ITC J, 82: 101897
Fu G, Liu C, Zhou R, Sun T, Zhang Q (2017). Classification for high resolution remote sensing imagery using a fully convolutional network. Remote Sens (Basel), 9(6): 1–21
Fu Y Y, Liu K K, Shen Z Q, Deng J S, Gan M Y, Liu X G, Lu D M, Wang K (2019). Mapping impervious surfaces in town–rural transition belts using China’s GF-2 imagery and object-based deep CNNs. Remote Sens (Basel), 11(3): 280
Gebrehiwot A, Hashemi-Beni L, Thompson G, Kordjamshidi P, Langan T E (2019). Deep convolutional neural network for flood extent mapping using unmanned aerial vehicles data. Sensors (Basel), 19(7): 1486
Gong P, Liu H, Zhang M N, Li C, Wang J, Huang H, Clinton N, Ji L, Li W, Bai Y, Chen B, Xu B, Zhu Z, Yuan C, Ping Suen H, Guo J, Xu N, Li W, Zhao Y, Yang J, Yu C, Wang X, Fu H, Yu L, Dronova I, Hui F, Cheng X, Shi X, Xiao F, Liu Q, Song L (2019). Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci Bull (Beijing), 64(6): 370–373
Guo H N, Shi Q, Marinoni A, Du B, Zhang L P (2021). Deep building footprint update network: a semi-supervised method for updating existing building footprint from bi-temporal remote sensing images. Remote Sens Environ, 264: 112589
Han Z M, Dian Y Y, Xia H, Zhou J J, Jian Y F, Yao C H, Wang X, Li Y (2020). Comparing fully deep convolutional neural networks for land cover classification with high-spatial-resolution Gaofen-2 images. ISPRS Int J Geoinf, 9(8): 478
He C Y, Liu Z F, Gou S Y, Zhang Q F, Zhang J S, Xu L L (2019). Detecting global urban expansion over the last three decades using a fully convolutional network. Environ Res Lett, 14(3): 034008
He D, Shi Q, Liu X P, Zhong Y F, Zhang X C (2021). Deep subpixel mapping based on semantic information modulated network for urban land use mapping. IEEE Trans Geosci Remote Sens, PP(99): 1–19
Hu Y, Zhang Q, Zhang Y, Yan H (2018). A deep convolution neural network method for land cover mapping: a case study of Qinhuangdao, China. Remote Sens (Basel), 10(12): 2053–2069
Huang B, Zhao B, Song Y (2018). Urban land-use mapping using a deep convolutional neural network with high spatial resolution multispectral remote sensing imagery. Remote Sens Environ, 214: 73–86
Jean N, Burke M, Xie M, Davis W M, Lobell D B, Ermon S (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301): 790–794
Ji S P, Zhang C, Xu A J, Shi Y, Duan Y L (2018). 3D convolutional neural networks for crop classification with multi-temporal remote sensing images. Remote Sens (Basel), 10(1): 75
Ji S, Wei S, Lu M (2019). Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery dataset. IEEE Trans Geosci Remote Sens, 57(1): 574–586
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R (2014). Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia
Jiang B, Yao X (2010). Geospatial analysis and modeling of urban structure and dynamics: an overview. In: Geospatial Analysis and Modelling of Urban Structure and Dynamics. Dordrecht: Springer, 3–11
Kussul N, Lavreniuk M, Skakun S, Shelestov A (2017). Deep learning classification of land cover and crop types using remote sensing data. IEEE Geosci Remote Sens Lett, 14(5): 778–782
Längkvist M, Kiselev A, Alirezaie M, Loutfi A (2016). Classification and segmentation of satellite orthoimagery using convolutional neural networks. Remote Sens (Basel), 8(4): 329
Li H, Jia Y, Zhou Y (2018). Urban expansion pattern analysis and planning implementation evaluation based on using fully convolution neural network to extract land range. Neuroquantology, 16(5): 814–822
Liu S J, Shi Q, Zhang L P (2021). Few-shot hyperspectral image classification with unknown classes using multitask deep learning. IEEE Trans Geosci Remote Sens, 59(6): 5085–5102
Liu S, Ding W, Liu C, Liu Y, Wang Y, Li H (2018). ERN: edge loss reinforced semantic segmentation network for remote sensing images. Remote Sens (Basel), 10(9): 1339–1362
Liu T, Abd-Elrahman A (2018). An object-based image analysis method for enhancing classification of land covers using fully convolutional networks and multi-view images of small unmanned aerial system. Remote Sens (Basel), 10(3): 457
Long J, Shelhamer E, Darrell T (2015). Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Martins V S, Kaleita A L, Gelder B K, da Silveira H L F, Abe C A (2020). Exploring multiscale object-based convolutional neural network (multi-OCNN) for remote sensing image classification at high spatial resolution. ISPRS J Photogramm Remote Sens, 168: 56–73
Mboga N, Grippa T, Georganos S, Vanhuysse S, Smets B, Dewitte O, Wolff E, Lennert M (2020). Fully convolutional networks for land cover classification from historical panchromatic aerial photographs. ISPRS J Photogramm Remote Sens, 167: 385–395
Middel A, Lukasczyk J, Zakrzewski S, Arnold M, Maciejewski R (2019). Urban form and composition of street canyons: a human-centric big data and deep learning approach. Landsc Urban Plan, 183: 122–132
Mohammadimanesh F, Salehi B, Mahdianpari M, Gill E, Molinier M (2019). A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem. ISPRS J Photogramm Remote Sens, 151: 223–236
Nguyen T, Han J, Park D C (2013). Satellite image classification using convolutional learning. In: Proceedings of the AIP Conference, Albuquerque
Pan X, Gao L, Marinoni A, Zhang B, Yang F, Gamba P (2018). Semantic labeling of high resolution aerial imagery and lidar data with fine segmentation network. Remote Sens (Basel), 10(5): 743–767
Persello C, Stein A (2017). Deep fully convolutional networks for the detection of informal settlements in VHR images. IEEE Geosci Remote Sens Lett, 14(12): 2325–2329
Shi C, Pun C M (2019). Adaptive multi-scale deep neural networks with perceptual loss for panchromatic and multispectral images classification. Inform Sciences, 490: 1–17
Shi Q, Liu M, Li S, Liu X, Wang F, Zhang L (2021). A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Trans Geosci Remote Sens, 60: 1–16
Stark T, Wurm M, Zhu X X, Taubenböck H (2020). Satellite-based mapping of urban poverty with transfer-learned slum morphologies. IEEE J Sel Top Appl Earth Obs Remote Sens, 13: 5251–5263
Tan Y H, Xiong S Z, Yan P (2020). Multi-branch convolutional neural network for built-up area extraction from remote sensing image. Neurocomputing, 396: 358–374
Tian Y, Pei K, Jana S, Ray B (2018). DeepTest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering
Vizzari M, Hilal M, Sigura M, Antognelli S, Joly D (2018). Urban-rural-natural gradient analysis with CORINE data: an application to the metropolitan France. Landsc Urban Plan, 171: 18–29
Volpi M, Tuia D (2017). Dense semantic labeling of subdecimeter resolution images with convolutional neural networks. IEEE Trans Geosci Remote Sens, 55(2): 881–893
Wagner R, Thom M, Schweiger R, Palm G, Rothermel A (2013). Learning convolutional neural networks from few samples. In: The 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, 1–7
[51]
WaldnerF, DiakogiannisF I. ( 2020). Deep learning on edge: extracting field boundaries from satellite images with a convolutional neural network. Remote Sens Environ, 245: 111741
[52]
WangQ, GaoJ Y, YuanY. ( 2018). Embedding structured contour and location prior in siamesed fully convolutional networks for road detection. IEEE Trans Intell Transp Syst, 19( 1): 230– 241
Weng Q (2001). A remote sensing–GIS evaluation of urban expansion and its impact on surface temperature in the Zhujiang Delta, China. Int J Remote Sens, 22(10): 1999–2014
Wu H, Zhang H, Zhang J F, Xu F J (2015). Fast aircraft detection in satellite images based on convolutional neural networks. In: 2015 IEEE International Conference on Image Processing, New York
Wurm M, Stark T, Zhu X X, Weigand M, Taubenböck H (2019). Semantic segmentation of slums in satellite images using transfer learning on fully convolutional neural networks. ISPRS J Photogramm Remote Sens, 150: 59–69
Xi J Y, Ng C N (2007). Spatial and temporal dynamics of urban sprawl along two urban–rural transects: a case study of Guangzhou, China. Landsc Urban Plan, 79(1): 96–109
Xu Y, Wu L, Xie Z, Chen Z (2018). Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens (Basel), 10(1): 144–156
Zeiler M D, Fergus R (2014). Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. Cham: Springer
Zhang C, Harrison P A, Pan X, Li H, Sargent I, Atkinson P M (2020a). Scale sequence joint deep learning (SS-JDL) for land use and land cover classification. Remote Sens Environ, 237: 111593
Zhang C, Sargent I, Pan X, Li H, Gardiner A, Hare J, Atkinson P M (2018a). An object-based convolutional neural network (OCNN) for urban land use classification. Remote Sens Environ, 216: 57–70
Zhang C, Sargent I, Pan X, Li H, Gardiner A, Hare J, Atkinson P M (2019). Joint deep learning for land cover and land use classification. Remote Sens Environ, 221: 173–187
Zhang C, Yue P, Tapete D, Shangguan B, Wang M, Wu Z (2020b). A multi-level context-guided classification method with object-based convolutional neural network for land cover classification using very high resolution remote sensing images. ITC J, 88: 102086
Zhang D J, Zhang J S, Pan Y Z, Duan Y M (2018b). Fully convolutional neural networks for large scale cropland mapping with historical label dataset. In: IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium
Zhong Y, Fei F, Zhang L (2016). Large patch convolutional neural networks for the scene classification of high spatial resolution imagery. J Appl Remote Sens, 10(2): 025006
Zhou W, Ming D, Lv X, Zhou K, Bao H, Hong Z (2020). SO–CNN based urban functional zone fine division with VHR remote sensing image. Remote Sens Environ, 236: 111458
Zhu X X, Tuia D, Mou L, Xia G S, Zhang L, Xu F, Fraundorfer F (2017). Deep learning in remote sensing: a comprehensive review and list of resources. IEEE Geosci Remote Sens Mag, 5(4): 8–36
RIGHTS & PERMISSIONS
Higher Education Press