Research on the Perception Evaluation of Urban Green Spaces Using Panoramic Images and Deep Learning: A Case Study of Zhujiang Park in Guangzhou

Xukai ZHAO; Guangsi LIN

doi:10.15302/J-LAF-0-020024

Landsc. Archit. Front. ›› 2024, Vol. 12 ›› Issue (6) : 7-18. DOI: 10.15302/J-LAF-0-020024

Highlights

	● Explores a convenient image collection and processing workflow using panoramic cameras for urban green spaces
	● Develops a deep-learning-based evaluation method for park landscape visual quality, enabling unbiased analysis
	● Applies quantitative computation and statistical approaches that integrate subjective and objective evaluation metrics to rapidly identify areas needing optimization measures

Abstract

Visual quality assessment of urban green spaces is a major topic in landscape architecture research, yet traditional methods face limitations in practice. The rapid development of artificial intelligence and street-view big data offers opportunities for advancing green space perception studies. However, the lack of full street view image coverage of green spaces in China poses challenges for related research. Focusing on public landscape perception evaluation, this research took Zhujiang Park in Guangzhou, China as a case study. The research team used a convenient panoramic-camera image collection method and an effective processing workflow, and then employed the SegFormer-B5 semantic segmentation model and the ViT-base-p16 image classification model to calculate four objective evaluation metrics (green view index, sky view factor, road visibility index, and artificial structure visibility index) and four subjective evaluation metrics (attractiveness, richness, naturalness, and depression) for visual quality assessment. Based on the spatial distribution results of these metrics, comprehensive analyses were conducted and low-score areas were identified. Research results indicate that vegetation and water features significantly enhance park attractiveness and positive perceptions, while excessive sky and artificial structures produce negative effects; oppressive artificial landscapes and constrained architectural views also lower overall landscape quality. The image collection and visual perception evaluation methods proposed in this study provide a scientific basis for the renovation and management of urban green spaces.

Keywords

Landscape Perception Evaluation / Visual Landscape Assessment / Panoramic Camera / Artificial Intelligence / Urban Green Space / Semantic Segmentation / Image Classification

Xukai ZHAO, Guangsi LIN. Research on the Perception Evaluation of Urban Green Spaces Using Panoramic Images and Deep Learning: A Case Study of Zhujiang Park in Guangzhou. Landsc. Archit. Front., 2024, 12(6): 7‒18 https://doi.org/10.15302/J-LAF-0-020024

1 Introduction

Urban green spaces (natural or semi-natural land uses within cities) are essential components of landscapes, offering residents a broad range of ecosystem services and opportunities for connection with nature and recreational activities^{^[1]}. Visual perception is one of the significant ways through which people perceive the environment^{^[2]}, and assessing the visual quality of urban landscapes, also known as visual landscape assessment, constitutes a major topic of landscape research^{^[3]}. Such assessments provide valuable insights for researchers and governments into urban landscape quality. Traditional visual landscape assessment methods primarily employ scenic beauty estimation^{^[4]} and questionnaire surveys^{^[5]}. Although these methods effectively collect people's preferences on specific landscapes, they still have numerous limitations: they rely heavily on costly manual judgments of images by experts or respondents, making the process labor- and resource-intensive, complex to implement, and limited in data sources^{^[6]} ^{^[7]}. Furthermore, the complexity of real-world landscapes often makes it challenging to apply findings from generalized studies to new scenarios.

The recent development of artificial intelligence (AI) technologies offers solutions to address these issues. AI has demonstrated tremendous potential in research on intelligent built environments and is widely recognized as highly promising in the fields of sustainable smart cities and landscape planning^{^[8]} ^{^[9]}. Among these advancements, street view images (SVIs) have emerged as a new form of crowd-sourced data that offers a realistic depiction of urban environments and supports feedback based on genuine human perceptions, which serves as a high-quality data source for evaluating the visual quality of urban built environments^{^[6]} ^{^[10]}. Related research spans five key areas: landscape design and environmental assessment, thermal environment, neighborhood morphology, environmental perception of neighborhood, and socioeconomic factor analysis^{^[7]}.

With the increasing availability of satellites and coverage of street view services, satellite imagery and SVIs have become vital data sources for understanding large-scale urban landscapes. However, these sources also have certain limitations. For instance, satellite imagery does not reflect human-eye perception. Currently, most green spaces, communities, and educational institutions in many Chinese cities are not accessible to map service providers for image collection; certain roads also lack street view service coverage. Additionally, the lack of timely updates is a major shortcoming of publicly available data^{^[11]}. Consequently, scholars have attempted to use wearable cameras, drones, or other devices to manually collect images as a supplement to or replacement for SVIs. For instance, Yan Li et al. used GoPro cameras mounted on a car to collect street images of Xining City, China and developed a vacancy estimation model using object detection techniques to infer the storefront vacancy rate^{^[12]}. Similarly, Junjie Luo et al. employed drones to establish an oblique dataset for the river landscape visual evaluation of a section of the Grand Canal in Tianjin, China^{^[11]}. Despite these efforts, no studies have specifically focused on manually collecting images of parks as a substitute for SVIs. Given that certain routes or terrains within parks (e.g., staircases, stepping stones) are unsuitable for collecting images with a GoPro while riding or driving, and that drones cannot replicate human perspectives, further exploration of image collection devices and their application in visual landscape assessment is necessary.

Simultaneously, as quality of life improves, there is a growing demand for high-quality green spaces, requiring urban planners to accurately identify and improve low-quality areas within these spaces. However, because existing research is hard to apply in practice, relevant assessments rely primarily on designers' personal experience and subjective judgment, often overlooking the public's actual needs and preferences. AI technology has the potential to address this challenge by effectively simulating public perceptions and conducting visual evaluations on environmental images^{^[6]} ^{^[7]} ^{^[10]}. Nevertheless, AI-based public perception evaluation methods specific to parks have yet to be developed.

This study aims to establish an intelligent perception framework for urban green spaces based on urban park image collection and deep learning techniques. The goal is to enable rapid, accurate, and comprehensive evaluation of park visual quality and to identify low-quality areas, so as to inform spatial renewal and improvement plans. Specifically, this study focuses on the following questions. How can green space images be collected more conveniently? How can AI-based algorithmic systems be developed to accurately reflect public perceptions and preferences of parks, thereby identifying spaces of low visual quality? And in what ways can the perception evaluation results from such a system support theories related to visual landscape assessment? These explorations aim to promote the development of quantitative and evidence-based research on landscape perception and provide effective decision-making guidance for the renewal of urban green spaces.

2 Materials and Methods

2.1 Study Area

This study focused on Zhujiang Park, located in the Tianhe District of Guangzhou, Guangdong Province, China, which is an ecological park that integrates ecological, recreational, and cultural functions. It features diverse space types for various activities, covering an area of approximately 28 hm². The park is highly popular and serves as a representative green space in the subtropical region of China.

2.2 Technical Framework

First, this study adopted a convenient approach to collecting park images using a panoramic camera, and verified its feasibility with on-site operations. Subsequently, the SegFormer-B5 model trained on the ADE20K dataset was used to automatically identify 150 categories of objects in the collected images and to calculate four objective evaluation metrics: green view index (GVI), sky view factor (SVF), road visibility index (RVI), and artificial structure visibility index (ASVI). Additionally, four subjective evaluation metrics (attractiveness, richness, naturalness, and depression) were employed. A public perception dataset was established through pairwise comparisons of the images, classifying each image into high or low values for the four subjective dimensions. The ViT-base-p16 model was trained on this dataset to enable effective prediction of the subjective metrics. Next, the spatial distribution of both objective and subjective evaluation metrics was visualized, enabling the identification of areas associated with low-scoring images. Finally, correlations between objective and subjective metrics were analyzed to provide insights for park renovations (Fig.1).

Fig.1 Technical framework.

2.3 Data Collection and Processing

The images were collected on July 6, 2023, between 9:00 and 13:00, under clear weather conditions with temperatures around 30℃. A collector walked along all paths in the park, holding the Insta360 ONE RS at a height of approximately 1.7 m. A handheld GPS sensor (Garmin eTrex 221x) recorded the location of each shooting point. Based on previous experience, the research team captured images at road intersections, turning points, the midpoints between two turning points, and landmarks (e.g., buildings, pavilions, sculptures) to provide comprehensive visual information and ensure high efficiency. In this study, the distance between two adjacent collection points was no more than 40 m (about 50 walking steps). A total of 275 panoramic images were captured, all located along the centerline of the paths (Fig.2).

Fig.2 Image collection points.
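The 40 m spacing rule described above can be checked against the recorded GPS track. The following is a minimal sketch, not the authors' procedure: the function names, the spherical-Earth radius, and the toy coordinates are all assumptions introduced for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def gaps_exceeding(points, max_gap_m=40.0):
    """Return index pairs of consecutive points that are farther apart than max_gap_m."""
    return [
        (i, i + 1)
        for i in range(len(points) - 1)
        if haversine_m(*points[i], *points[i + 1]) > max_gap_m
    ]


# Made-up coordinates for illustration only (not the surveyed points).
toy_points = [(23.1200, 113.3400), (23.1203, 113.3400), (23.1208, 113.3400)]
too_far = gaps_exceeding(toy_points)
```

In this toy track the first gap is roughly 33 m and the second roughly 56 m, so only the second pair is flagged and an extra shooting point would be needed between them.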

Collected images were processed with Insta360 Studio and all were clear enough for the study. Then, the research team extracted perspectives at 0° and 180° in flat mode, generating 550 images that represent the surroundings at each point. The images were subsequently matched with GPS spatial data using ArcMap 10.6.

2.4 Deep Learning-based Image Evaluation Methods

2.4.1 Objective Evaluation Metrics Extraction With Semantic Segmentation Model

Physical elements in the environment (both natural and artificial) significantly influence the visual quality of landscapes and people's aesthetic perceptions. Semantic segmentation technology, a key technique for scene understanding, significantly improves the accuracy of identifying physical elements by pixel-level classification.

This study employed the SegFormer-B5 model^{^[13]}, recognized for its high accuracy, to extract objective physical elements. The model consists of a hierarchical Transformer encoder and a lightweight All-MLP decoder. The Transformer encoder extracts image features using a self-attention mechanism to weigh important areas, enhancing segmentation performance. The All-MLP decoder fuses multi-level features and predicts semantic segmentation masks, outputting results through a fully connected layer. The model was trained using the ADE20K dataset^{^[14]}, an open dataset for scene understanding released by MIT in 2016, which includes 150 element categories. Testing results show that the SegFormer-B5 model outperforms earlier models such as FCN, PSPNet, and DeepLabV3+^{^[7]}, as well as advanced models like FPN and UPerNet on the ADE20K validation set^①.

① Model comparison data are available on the OpenMMLab GitHub page.

From the 150 element categories, this research extracted 13 common visual elements in parks^②, and calculated GVI and SVF drawing from existing visual perception research^{^[15]}~^{^[17]}. Additionally, as Zhujiang Park has numerous roads and artificial structures (e.g., walls, benches, streetlights, and fences), this study introduced RVI and ASVI as metrics^{^[11]} (Tab.1).

② The 13 common visual elements in parks include wall, building, sky, tree, shrub, ground cover, first-class road, second-class road, third-class road, fence, skyscraper, bench, and streetlight.

Tab.1 Objective evaluation metrics

Dimension	Metric	Definition	Source
Natural	Green view index (GVI)	Proportion of pixels representing vegetation (tree, shrub, and ground cover)	Refs. [15], [16]
	Sky view factor (SVF)	Proportion of pixels representing sky	Ref. [17]
Artificial	Road visibility index (RVI)	Proportion of pixels representing road (first-class, second-class, and third-class roads)	Ref. [11]
	Artificial structure visibility index (ASVI)	Proportion of pixels representing artifacts (wall, building, fence, skyscraper, bench, and streetlight)	Ref. [11]
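Each metric in Tab.1 is a pixel proportion, so it can be computed directly from a segmentation mask. The sketch below assumes the mask has already been converted to class names matching the 13 elements listed in footnote ②; the mapping from a model's integer class ids to these names, the function name, and the toy mask are assumptions for illustration.

```python
from collections import Counter

# Grouping of class names into the four metrics, following Tab.1.
METRIC_CLASSES = {
    "GVI": {"tree", "shrub", "ground cover"},
    "SVF": {"sky"},
    "RVI": {"first-class road", "second-class road", "third-class road"},
    "ASVI": {"wall", "building", "fence", "skyscraper", "bench", "streetlight"},
}


def objective_metrics(label_mask):
    """Return the pixel share of each metric for one segmented image.

    label_mask is a 2-D list (rows of pixels) of class-name strings.
    Pixels of classes outside the four groups count toward the total only.
    """
    counts = Counter(name for row in label_mask for name in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("label_mask contains no pixels")
    return {
        metric: sum(counts[c] for c in classes) / total
        for metric, classes in METRIC_CLASSES.items()
    }


# Toy 2 x 5 mask standing in for a real segmentation result.
toy_mask = [
    ["sky", "tree", "tree", "shrub", "building"],
    ["first-class road", "ground cover", "tree", "tree", "water"],
]
metrics = objective_metrics(toy_mask)
```

For the toy mask, six of the ten pixels are vegetation, so GVI is 0.6, while SVF, RVI, and ASVI are each 0.1; the "water" pixel belongs to none of the four groups.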

2.4.2 Subjective Perception Score Prediction With Image Classification Model

Traditional studies on subjective landscape perceptions often adopt methods like rating scales, pairwise comparisons, or categorization^{^[18]}. For instance, the Likert five-point scale requires respondents to rate images from 1 to 5. After scores are obtained, deep learning image classification models can learn the relationship between scores and image features and simulate the human process of rating images from 1 to 5, enabling large-scale, rapid subjective perception evaluation. Existing studies mainly rely on large urban perception datasets. For example, the Place Pulse 2.0 dataset^{^[19]} includes over 110,000 images from 56 cities; over 80,000 online volunteers evaluated the images through pairwise comparisons across various dimensions to generate perception scores. This dataset has been used in subsequent research to train image classification models for subjective perception score prediction^{^[6]}. Relevant studies demonstrate that combining subjective visual surveys, image semantic segmentation, and image classification models can effectively and fairly collect and map street-level perceptions^{^[6]}. Although the dataset lacks park-scene image data that can be directly applied to this study, its construction methods and subjective perception prediction approaches provide a foundation for the subjective evaluation in this study.

(1) Establishing subjective evaluation metrics

Drawing from traditional visual landscape assessment research^{^[20]}~^{^[24]}, four subjective evaluation metrics were selected: attractiveness, richness, naturalness, and depression. Attractiveness refers to the degree to which a park scene appeals to individuals, encompassing factors like beauty and uniqueness^{^[20]}. Richness reflects the diversity and complexity of park elements, including species and design elements^{^[20]} ^{^[21]}. Naturalness represents the balance between human intervention and the natural state in the perceived park environment, informing park maintenance and management strategies^{^[22]}. Depression measures the extent to which a park induces feelings of melancholy or discomfort^{^[23]}, and is often used to assess the impact of urban landscapes on physical and mental health^{^[24]}. Parks inducing high levels of depression may cause visitors discomfort and negatively affect their overall experience.

(2) Collecting pairwise comparison results

Compared with directly obtaining numerical ratings from participants, pairwise comparison is a more effective and accurate way to gather perception data^{^[19]}. First, to ensure the coverage of as many kinds of park scenes as possible, the 550 photos were manually screened^③, and those with excessive similarity were excluded, leaving 200 valid photos. Next, the research team developed an online rating system using JavaScript, which dynamically adjusted the displayed images based on user selections and the existing relationships between images to ensure each photo had enough comparisons and valid ratings. In each comparison, two images were randomly selected from the 200 photos (Fig.3). Participants were asked to choose the image that better aligned with their preferences based on a question (e.g., "Which scene do you think exhibits more attractiveness/richness/naturalness/depression?"). Each participant performed four experiments, each focusing on a single metric. To avoid fatigue, the number of comparisons in each experiment was limited to approximately 50 and each experiment was kept under 10 min. The experiment involved 35 master's students (12 males and 23 females), most of them majoring in Landscape Architecture at South China University of Technology, none of whom had color blindness or color weakness. The experiment was conducted online over three days (March 3 to 5, 2024). In total, this research yielded 6,702 pairwise comparison results across the four metrics, an average of 1,675.5 results per metric.

③ Manual screening refers to image selection based on the collector's subjective perception and personal experience, without rigid quantitative criteria.

Fig.3 Subjective rating system based on pairwise comparison of images.

(3) Calculation of subjective evaluation metrics

Drawing on existing research^{^[25]}, this research used the "strength of schedule" method to statistically analyze the subjective ratings, obtaining high and low scores for each metric (Fig.4).

Fig.4 Examples of image scoring across the four metrics.

For the subjective evaluation metric m, this research defined the frequency with which a given image i was selected (W_{i, m}) and not selected (L_{i, m}) as follows:

(1)

W_{i, m} = \frac{w_{i, m}}{w_{i, m} + l_{i, m}},

(2)

L_{i, m} = \frac{l_{i, m}}{w_{i, m} + l_{i, m}},

where w_{i, m} and l_{i, m} represent the number of times image i was chosen and not chosen during comparisons, respectively.

The perception score (Q_{i, m}) of each image i for the evaluation metric m can be defined as follows:

(3)

Q_{i, m} = W_{i, m} + \frac{1}{n_{i}^{w}} \sum_{k_{1} = 1}^{n_{i}^{w}} W_{k_{1}, m} - \frac{1}{n_{i}^{l}} \sum_{k_{2} = 1}^{n_{i}^{l}} L_{k_{2}, m},

where n_i^w and n_i^l represent the total number of times image i was selected and not selected, respectively; k_1 indexes the images that image i was selected over, and k_2 indexes the images that were selected over image i. To further categorize the image scores Q_{i, m} into low and high values, the research team defined the binary label y_{i, m} ∈ {0, 1}, where 0 represents a low score and 1 represents a high score:

(4)

y_{i, m} = \begin{cases} 1 & \text{if } Q_{i, m} > μ_{m} + σ_{m} \\ 0 & \text{if } Q_{i, m} < μ_{m} - σ_{m} \end{cases},

where μ_m and σ_m represent the mean and standard deviation of the perception scores across all data for the evaluation metric m, respectively.
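The scoring and labeling steps in Eqs. (1) to (4) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: it assumes the subtracted term in Eq. (3) averages the non-selection rates L of the images that beat image i (the usual strength-of-schedule form), sets a term to zero when an image never won or never lost, uses the population standard deviation, assigns 1 to high scores and 0 to low scores, and leaves images between the two thresholds unlabeled. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def q_scores(comparisons):
    """Compute Q for one metric from a list of (winner_id, loser_id) pairs."""
    wins, losses, beaten, lost_to = {}, {}, {}, {}
    for winner, loser in comparisons:
        wins[winner] = wins.get(winner, 0) + 1
        losses[loser] = losses.get(loser, 0) + 1
        beaten.setdefault(winner, []).append(loser)   # images that `winner` beat
        lost_to.setdefault(loser, []).append(winner)  # images that beat `loser`
    images = set(wins) | set(losses)
    # Eq. (1) and Eq. (2): selection and non-selection rates.
    W = {i: wins.get(i, 0) / (wins.get(i, 0) + losses.get(i, 0)) for i in images}
    L = {i: 1.0 - W[i] for i in images}
    Q = {}
    for i in images:
        # Eq. (3); a missing term is treated as 0 (assumption).
        win_term = mean([W[k] for k in beaten[i]]) if i in beaten else 0.0
        loss_term = mean([L[k] for k in lost_to[i]]) if i in lost_to else 0.0
        Q[i] = W[i] + win_term - loss_term
    return Q


def binary_labels(Q):
    """Eq. (4): 1 above mean + SD, 0 below mean - SD, otherwise unlabeled."""
    values = list(Q.values())
    mu, sigma = mean(values), pstdev(values)
    labels = {}
    for i, q in Q.items():
        if q > mu + sigma:
            labels[i] = 1
        elif q < mu - sigma:
            labels[i] = 0
    return labels


# Toy comparisons among four hypothetical images.
toy_comparisons = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "c")]
toy_q = q_scores(toy_comparisons)
toy_labels = binary_labels(toy_q)
```

In the toy data, image "a" wins every comparison and ends with Q = 4/3, image "c" loses every comparison and ends with Q = -1/3, and the two middle images score 0.5 and fall between the thresholds.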

(4) Image classification model training

After the above calculations, each of the 200 images was assigned a value of "0" or "1" for all four metrics, forming the public perception dataset. The image classification model used these values as labels and the images as explanatory variables for training. This research employed the ViT-base-p16 model^{^[26]} for image classification. The ViT-base-p16 model divides input images into patches and treats each patch as a sequence element for input into a Transformer model. Using a self-attention mechanism, it weights important areas in the input images, effectively capturing important information. During training, the ViT-base-p16 model was first pre-trained on the large-scale ImageNet-1k dataset to learn general representations of images. It was then fine-tuned on the public perception dataset for each of the metrics, resulting in four separate models for predicting attractiveness, richness, naturalness, and depression scores for all park images.

The performance of the model was evaluated using five-fold cross-validation. Specifically, the dataset of 200 images was randomly divided into five subsets. In each training iteration, four subsets were designated as the training set, and the remaining subset served as the validation set. The average accuracy across all five iterations was calculated to assess overall performance. The model with the highest accuracy was selected for scoring the subjective metrics. This approach ensured the robustness of the training process while enhancing the model's generalizability to new data, supporting reliable performance despite the limited sample size.
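The splitting scheme described above can be sketched as follows. The fine-tuning of the classifier itself is omitted; the helper names, the fixed random seed, and the way folds are formed (striding over a shuffled index list) are assumptions for illustration only.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment of images to folds
    folds = [indices[f::k] for f in range(k)]  # k disjoint subsets
    for f in range(k):
        val = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, val


def mean_accuracy(fold_accuracies):
    """Average the validation accuracy over all folds."""
    return sum(fold_accuracies) / len(fold_accuracies)


# With 200 labeled images, each iteration trains on 160 and validates on 40.
splits = list(k_fold_indices(200, k=5, seed=0))
```

Each image appears in exactly one validation fold, so the averaged accuracy reflects performance on all 200 images while every model is validated only on images it did not see during training.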

2.5 Integrated Evaluation of Subjective and Objective Metrics

The trained SegFormer-B5 and ViT-base-p16 models were employed to calculate the objective and subjective evaluation metrics, respectively, for all 550 images. For each location, the average values of the two images were taken as the final scores. The scores of these data points were visualized in ArcMap 10.6 to create spatial distribution maps of both objective and subjective metrics, identifying low-score areas. Since the data did not conform to a normal distribution, Spearman correlation analysis was applied to examine the relationships among the subjective metrics, the objective metrics, and the major elements that take up a large proportion of the images, including vegetation (trees, shrubs, and ground cover) and park paths (first-class, second-class, and third-class roads).
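The per-location averaging and the rank-based correlation can be sketched in plain Python. This is a minimal illustration, not the software used in the study: the function names and toy values are assumptions, ties receive average ranks, and significance testing is left out.

```python
def location_score(score_0deg, score_180deg):
    """Average the scores of the two views extracted at one shooting point."""
    return (score_0deg + score_180deg) / 2


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's r_s: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-location values (made up, not the study's data).
toy_gvi = [0.9, 0.7, 0.5, 0.3]
toy_naturalness = [0.8, 0.6, 0.65, 0.1]
rho = spearman(toy_gvi, toy_naturalness)
```

Because only ranks enter the calculation, the coefficient does not require normally distributed data, which is the reason given above for choosing Spearman over Pearson correlation.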

3 Results and Discussion

3.1 Results of Objective Evaluation Metrics

Fig.5 shows examples of different landscape elements identified through semantic segmentation using the SegFormer-B5 model. Tab.2 summarizes the results of four objective evaluation metrics. Specifically, the average GVI was the highest (0.7115, with tree coverage at 0.3973, shrub coverage at 0.1691, and ground cover at 0.1450), indicating excellent vegetation conditions of the park, which is the main constituent of the park's landscape. The low average SVF (0.0737) corresponds to the high vegetation coverage, reflecting the dense tree canopy. The low averages of RVI and ASVI also reveal that the park is dominated by natural landscapes. The median values for these two metrics are close to their averages, indicating a relatively low coverage of roads and artificial structures. Moreover, the low SD suggests consistent internal park structures, contributing to a uniform visitor experience.

Fig.5 Examples of semantic segmentation results.

Tab.2 Results of objective evaluation metrics

Objective metric	Maximum	Average	Median	SD
GVI	0.9731	0.7115	0.7351	0.1630
SVF	0.2815	0.0737	0.0643	0.0557
RVI	0.4237	0.1236	0.1011	0.0910
ASVI	0.3787	0.0286	0.0127	0.0462

3.2 Training Results of Subjective Evaluation Metric Prediction Model

The distribution of five-fold cross-validation data and model prediction accuracy for the subjective evaluation metrics (Fig.6) shows that, although the model's accuracy fluctuated across different metrics, the overall trend was stable. The average prediction accuracies for the test set were 69% (attractiveness), 70.5% (richness), 82% (naturalness), and 68.5% (depression), demonstrating high reliability.

Fig.6 Results of five-fold cross-validation and model prediction accuracy.

The statistical results for subjective metrics (Tab.3) showed that naturalness had the highest mean value, indicating that the naturalness of Zhujiang Park was particularly prominent in human perception. This aligns with the semantic segmentation results. The range of naturalness was the largest (0.0443 ~ 0.8855), and the mean and median were close with the highest SD, reflecting significant spatial heterogeneity in vegetation distribution. Both attractiveness and depression also had relatively high mean values, suggesting that park scenes with high naturalness generally have strong appeal. However, excessively dense vegetation may increase feelings of depression. The SDs of these two metrics were moderate and similar, indicating relatively consistent variability across the sample. In contrast, the distribution of richness was more concentrated, with a lower SD and a narrower range (0.0732 ~ 0.5826), indicating relatively small differences in this metric. The lower mean value suggests that the diversity of visual elements in the park was relatively inadequate. This contrasts with the high variability in naturalness, indicating that while naturalness varies greatly across different scenes, the richness of visual elements is comparatively insufficient. This highlights a need to enhance landscape diversity in the park.

Tab.3 Statistical results of subjective evaluation metrics

Subjective metric	Minimum	Maximum	Average	Median	SD
Attractiveness	0.0783	0.7540	0.4285	0.4485	0.1476
Richness	0.0732	0.5826	0.2841	0.2724	0.0942
Naturalness	0.0443	0.8855	0.4303	0.3776	0.2021
Depression	0.1619	0.8710	0.3821	0.4120	0.1468

3.3 Integrated Evaluation Results of Objective and Subjective Metrics

Overall, the spatial distribution patterns of objective and subjective metrics in Zhujiang Park showed similarities (Fig.7, Fig.8). The lawn area in front of the west entrance of the park (Zone C), characterized by open lawns and short trees with sparse shrubs, has wide paths and a higher SVF, but lower scores in GVI and naturalness, as well as relatively low attractiveness. The Kuailv Lake area in the central part of the park (Zone E), despite low GVI and high SVF, exhibited high attractiveness. This aligns with previous findings that people generally prefer water features^{^[5]}. The scenic forest area in the eastern part of the park (Zone F) had high GVI and naturalness, making it a valuable asset in the bustling city center of Guangzhou. Its winding, undulating paths, combined with a low proportion of roads and artificial structures, contributed to its overall high attractiveness. Some areas in the southwest part of the park have dense vegetation and diverse spatial variations, leading to higher richness. However, the variability of scenes between different points results in varying levels of attractiveness. The service buildings on the eastern side of the park (Zone G) have monotonous facades and low attractiveness, requiring special attention in park management.

Fig.7 Examples of park scenes.

Fig.8 Spatial distribution results of objective and subjective metrics.

Spearman correlation analysis (Fig.9) revealed a significant positive correlation between naturalness and attractiveness (r_s = 0.60), indicating that scenes with high naturalness are more favored by people. This finding aligns with previous research showing that visitors prefer environments with abundant vegetation. Such preferences may positively influence park usage frequency and visitor satisfaction^{^[27]}. The proportion of ground cover was significantly negatively correlated with richness (r_s = −0.48), suggesting that an increase in ground cover may reduce the overall richness. In Zhujiang Park, areas with a high proportion of ground cover are primarily located in the western part of the park, characterized by open lawns, leading to lower spatial richness. Naturalness was significantly positively correlated with GVI (r_s = 0.71), proportion of tree (r_s = 0.47), and proportion of shrub (r_s = 0.46). GVI and naturalness represent the objective and subjective ecological environment, respectively, but the perception of naturalness is influenced not only by vegetation proportions but also by other factors, such as the overall composition of green elements and the presence of additional materials in the images (e.g., water, soil, or permeable pavements). Depression shows a significant positive correlation with both naturalness (r_s = 0.64) and proportion of shrub (r_s = 0.65), indicating that dense vegetation may evoke feelings of depression.

Fig.9 Spearman correlation analysis results (* indicates significant correlation at the 0.05 level, ** indicates significant correlation at the 0.01 level).

Furthermore, SVF, RVI, and ASVI show positive correlations with each other and negative correlations with all four subjective perception metrics as well as GVI. This suggests that increases in the proportions of sky, roads, and artificial structures are associated with decreases in vegetation and naturalness. In Zhujiang Park, areas with higher proportions of sky, roads, buildings, walls, and benches, e.g., the children's play area in the northwest (Zone B) and the lawn area in front of the west entrance (Zone C), tend to have lower vegetation coverage, wide paths, and open spaces. Their attractiveness and naturalness are lower than those of the densely vegetated scenic forest area in the eastern part, though the openness of these areas reduces feelings of depression.

4 Conclusions and Prospects

The European Landscape Convention emphasizes that landscapes are a vital public interest deserving recognition and protection^{^[28]}. Understanding how individuals observe and perceive landscapes and incorporating these insights into landscape planning and management is critical. This study adopts advanced image collection and AI technologies to develop a methodological framework for landscape research and practice centered on landscape perception. Overall, this study makes three key contributions, as follows.

1) This research implemented a convenient and efficient workflow by combining panoramic-camera image collection in urban green spaces with advanced semantic segmentation and image classification models for unbiased assessment of park visual quality. This method overcomes limitations of traditional visual assessments, such as inefficiency in processing large volumes of images and fatigue in multi-scene evaluations, and validates the application of image big data and deep learning in landscape perception research.

2) Traditional visual assessment studies often rely on small-sized image datasets and lack accurate quantification of subjective and objective elements. This research precisely extracted and evaluated objective metrics and predicted scoring on subjective metrics, revealing that the presence of vegetation and water features enhances park attractiveness and stimulates positive perceptions. Conversely, higher proportions of sky, roads, and artificial structures are found to have negative effects.

3) Traditional research findings are often difficult to apply directly to preference prediction in new scenarios. The intelligent method demonstrated in this paper can learn subjective scoring from a subset of scene images and predict scores for new scenes, helping park managers efficiently identify low-scoring areas. This provides actionable guidance for urban green space renewal, demonstrating significant practical value.

Despite these contributions, this research has certain limitations. The image data and the number of participants were relatively limited, and the study focused exclusively on summer landscapes of Zhujiang Park, which may make it difficult to generalize the findings to parks of other types or to other seasons. Notably, the children's play area in the northwest had lower attractiveness according to the research results, likely due to the preferences of the selected participants (university students), who may find areas characterized by low vegetation and richness less appealing. This also underlines a limitation of prior studies based on street view big data, which train models on generalized public preferences and fail to reflect the various needs of different user groups^{^[19]} ^{^[25]}. Additionally, panoramic camera images may have distortion, potentially affecting accuracy. Future studies should expand the dataset to include more diverse urban green spaces and seasonal landscapes; gather ratings from a broader range of users to improve green space perception datasets; and pay attention to the necessity of conducting preference surveys across diverse user groups.

Finally, the subjective and objective metrics extracted in this study could be integrated with other data, such as park vitality, functional usage, and environmental quality, to further explore the relationships among landscape attractiveness, user behavior patterns, and the physical characteristics of a park. Such studies would support urban managers in systematic decision-making: developing more precise strategies, optimizing park functions, and improving urban landscape quality.

References


[1]	Wolch, J. R., Byrne, J., & Newell, J. P. (2014). Urban green space, public health, and environmental justice: The challenge of making cities 'just green enough'. Landscape and Urban Planning, (125), 234–244.

[2]	Daniel, T. C. (2001). Whither scenic beauty? Visual landscape quality assessment in the 21st century. Landscape and Urban Planning, 54(1–4), 267–281.

[3]	Gobster, P. H., Ribe, R. G., & Palmer, J. F. (2019). Themes and trends in visual assessment research: Introduction to the Landscape and Urban Planning special collection on the visual assessment of landscapes. Landscape and Urban Planning, (191), 103635.

[4]	Daniel, T. C. (1976). Measuring Landscape Esthetics: The Scenic Beauty Estimation Method. Department of Agriculture, Forest Service, Rocky Mountain Forest and Range Experiment Station.

[5]	Cai, K., Huang, W., & Lin, G. (2022). Bridging landscape preference and landscape design: A study on the preference and optimal combination of landscape elements based on conjoint analysis. Urban Forestry & Urban Greening, (73), 127615.

[6]	Zhao, X., Lu, Y., & Lin, G. (2024). An integrated deep learning approach for assessing the visual qualities of built environments utilizing street view images. Engineering Applications of Artificial Intelligence, (130), 107805.

[7]	He, N., & Li, G. (2021). Urban neighbourhood environment assessment based on street view image processing: A review of research trends. Environmental Challenges, (4), 100090.

[8]	Sanchez, T. W., Shumway, H., Gordner, T., & Lim, T. (2022). The prospects of artificial intelligence in urban planning. International Journal of Urban Sciences, 27(2), 179–194.

[9]	Cheng, Y., & Fan, B. (2023). Digital landscape process. Chinese Landscape Architecture, 39(6), 6–12.

[10]	Biljecki, F., & Ito, K. (2021). Street view imagery in urban analytics and GIS: A review. Landscape and Urban Planning, (215), 104217.

[11]	Luo, J., Zhao, T., Cao, L., & Biljecki, F. (2022). Semantic Riverscapes: Perception and evaluation of linear landscapes from oblique imagery using computer vision. Landscape and Urban Planning, (228), 104569.

[12]	Li, Y., & Long, Y. (2024). Inferring storefront vacancy using mobile sensing images and computer vision approaches. Computers, Environment and Urban Systems, (108), 102071.

[13]	Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, (34), 12077–12090.

[14]	Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2017). Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 633–641). Computer Vision Foundation.

[15]	Qiu, W., Li, W., Liu, X., Zhang, Z., Li, X., & Huang, X. (2023). Subjective and objective measures of streetscape perceptions: Relationships with property value in Shanghai. Cities, (132), 104037.

[16]	Song, Q., Li, W., Li, M., & Qiu, W. (2022). Social inequalities in neighborhood-level streetscape perceptions in Shanghai: The coherence and divergence between the objective and subjective measurements. Social Science Research Network.

[17]	Xia, Y., Yabuki, N., & Fukuda, T. (2021). Sky view factor estimation from street view images based on semantic segmentation. Urban Climate, (40), 100999.

[18]	Lange, E., & Legwaila, I. (2012). Visual landscape research—Overview and outlook. Chinese Landscape Architecture, 28(3), 5–14.

[19]	Dubey, A., Naik, N., Parikh, D., Raskar, R., & Hidalgo, C. A. (2016). Deep learning the city: Quantifying urban perception at a global scale. Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I (pp. 196–212). Springer.

[20]	Sun, D., Li, Q., Gao, W., Huang, G., Tang, N., Lyu, M., & Yu, Y. (2021). On the relation between visual quality and landscape characteristics: A case study application to the waterfront linear parks in Shenyang, China. Environmental Research Communications, 3(11), 115013.

[21]	Zhang, G., Yang, J., & Jin, J. (2021). Assessing relations among landscape preference, informational variables, and visual attributes. Journal of Environmental Engineering and Landscape Management, 29(3), 294–304.

[22]	Wartmann, F. M., Stride, C., Kienast, F., & Hunziker, M. (2021). Relating landscape ecological metrics with public survey data on perceived landscape quality and place attachment. Landscape Ecology, (36), 2367–2393.

[23]	"Depressing." Oxford English Dictionary. Oxford University Press.

[24]	Gong, Y., Palmer, S., Gallacher, J., Marsden, T., & Fone, D. (2016). A systematic review of the relationship between objective measurements of the urban environment and psychological distress. Environment International, (96), 48–57.

[25]	Zhang, F., Zhou, B., Liu, L., Fung, H. H., Lin, H., & Ratti, C. (2018). Measuring human perceptions of a large-scale urban region using machine learning. Landscape and Urban Planning, (180), 148–160.

[26]	Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations.

[27]	Talal, M. L., Santelmann, M. V., & Tilt, J. H. (2021). Urban park visitor preferences for vegetation—An on-site qualitative research study. Plants, People, Planet, 3(4), 375–388.

[28]	Council of Europe. (2000). Explanatory Report to the European Landscape Convention.

Acknowledgements

· Project of "A study on the Inclusive Design in the Recreational Place of Urban Green Space in Response to Passive and Active Exclusion," National Natural Science Foundation of China (No. 52378054)
· Project of "Research on Green Space Supply Evaluation Methods Based on Public Perception," Fundamental Research Funds for the Central Universities (No. CGPY202410)
· Project of "Research and Application of Deep Learning-Driven Park Perception Evaluation Methods," South China University of Technology Step Climbing Program (No. j2tw202402095)

Received: 02 Nov 2023
Accepted: 05 Apr 2024
Published: 15 Dec 2024
Just Accepted Date: 16 Aug 2024
Issue Date: 27 Dec 2024
