Comparison and Applicability Study of Analysis Methods for Social Media Text Data: Taking Perception of Urban Parks in Beijing as an Example

Zhenyu SHANG; Kexin CHENG; Yuqing JIAN; Zhifang WANG

doi:10.15302/J-LAF-1-020083

Landsc. Archit. Front. ›› 2023, Vol. 11 ›› Issue (5) :8 -21. DOI: 10.15302/J-LAF-1-020083


Abstract

The booming Internet technology and media have generated large sets of social media data, with which the social sensing analyses based on users' reviews have become a research hotspot and have been increasingly applied in the study of urban park usage and perception. However, most existing studies adopt a single model for text data processing. To fill this gap, this study aims to compare social media text data analysis methods and assess their advantages, disadvantages and applicability in park perception research. The Lexicon-based classification analysis model (lexicon model) and LDA (Latent Dirichlet Allocation) model widely used in relevant research were selected. Based on text data obtained from public reviews of 10 urban parks in Beijing on Dianping, this study explored the perception topic distribution of each park and all parks in general, and compared the classification results of perception topics between these two models. Results show that the lexicon model is conducive to the parallel comparison of perception frequency between parks, while the LDA model can directly reflect each park's characteristics and visitors' perception preferences; the combined use of the two models can optimize park perception assessment. Results from the two methods reveal that visitors to urban parks in Beijing focused more on their social recreation needs and visual aesthetics brought by the natural landscape, as well as conditions of the transportation facilities and the consumption in the parks. This research can provide optimization suggestions for the selection and use of social media text analysis methods, and a basis and guidance for park construction and management improvement.

● Exploring the advantages, disadvantages, and applicability of two text analysis models

● The lexicon model is more suitable for parallel comparison between perceived objects by users

● The Latent Dirichlet Allocation (LDA) model can better capture the characteristics of each individual perceived object

● Taking advantage of the two models’ strengths is vital for optimizing landscape perception assessment


Keywords

Social Sensing / Text Analysis / Lexicon / Latent Dirichlet Allocation (LDA) / Urban Park / Landscape Perception



1 Introduction

With the rapid development of Internet technology, people socialize more and more frequently through online media, generating a huge amount of information that provides the data foundation for the study of social sensing^[¹^] ^[²^]. This type of research analyzes people's perceptions of spaces, as well as human mobility patterns and social relations between individuals, by mining information on human behavioral characteristics contained in big data^[³^] ^[⁴^]. With the growth of social media data in recent years, there has been a gradual increase in research on geospatial sentiment perceptions^[⁵^] ^[⁶^]. Social media data can be divided into three main categories: check-in data, image data with geolocation, and text data. The methods commonly used in early research were the analysis of arrival rate and motivational preference based on check-in data^[⁷^]^~^[¹⁰^], as well as perceived sentiment analysis based on image contents and their geolocation^[¹¹^]^~^[¹⁵^]. In recent years, as the intuitive expression of sentiment in text data has been gradually recognized, there has been increasing research on perception analysis through text data mining^[¹⁶^]^~^[¹⁹^]. In social perception analysis research, the objects of study mainly include public sentiment on hot topics and responses to risks and disasters^[²⁰^]^~^[²²^], as well as the perception of using public facilities, especially the post-occupancy perception of scenic areas^[²³^]^~^[²⁶^] and urban green spaces^[²⁷^]^~^[²⁹^]. As an important type of public open space, urban parks provide residents with services such as access to nature, recreation, relaxation, and leisure^[³⁰^] ^[³¹^]. Thus, researchers are paying more attention to studying park perception through text data, i.e., analyzing the post-occupancy perception of visitors to guide and lay the groundwork for urban park construction and renewal.

Methods to analyze social media text data typically include word frequency analysis and semantic analysis^[³²^]^~^[³⁴^]. The advancement of text mining technology has made it possible to build text analysis models to explore the internal laws and topics in text data, and topic models have become the basis for perception analysis and satisfaction evaluation. Commonly used topic models for text analysis include the Lexicon-based classification analysis model ("lexicon model" hereafter)^[³⁵^], K-means model^[³⁶^] ^[³⁷^], Latent Dirichlet Allocation (LDA) model^[⁵^], Naive Bayes model^[³⁸^] ^[³⁹^], Linear Regression and Logistic Regression models^[⁴⁰^], Random Forest and Decision Tree models^[⁴¹^], etc. Most existing studies tend to adopt a single model for text data processing in perception analysis without exploring the advantages and disadvantages of different models and their applicability.

This study aims to compare social media text data analysis methods and reveal their applicability in park perception research. Since the lexicon model and LDA model are widely used in research on the perception of scenic spots and urban parks, they were chosen for comparison. Most lexicon models first semantically analyze high-frequency words in the obtained text, then establish a corresponding lexicon according to an existing standard system to classify the words, expand the lexicon to make it more complete, and finally classify and analyze the text content according to the optimized lexicon^[⁴²^]^~^[⁴⁴^]. The LDA model is a machine learning-based model, mainly used for topic extraction and classification in text analysis^[⁴⁵^]^~^[⁴⁹^]. This study focuses on the following questions: when analyzing social media texts related to park perception, how do the processes and analysis results of the lexicon model and the LDA model differ? What are the advantages and disadvantages of the two models? On this basis, we further explore how to combine the two models to guide urban park planning, and summarize the applicability of text analysis methods in park perception research.

2 Data Processing and Research Methods

2.1 Study Area and Data Sources

Covering an area of approximately 16,410 km², Beijing had a permanent resident population of 21.89 million as of 2020, 1,050 parks of various types, and a total park green space area of 357.2 km²^[⁵⁰^]. As a super first-tier city with rapidly advancing Internet technology, its residents frequently use social media, providing mass data for this study.

Dianping was chosen as the source of text data. It is one of the highly influential social review platforms in China with a large number of reviews, and the number of active users is increasing year by year. Meanwhile, the growing active participation of users enhances the accuracy of the review data^[⁵¹^]. This study used the Request module in Python to obtain all the text review data and reviewer information from April 2006 to September 2020 in the catalogue of Beijing parks on Dianping and selected the top 10 urban parks, ranked by the number of reviews, as the objects of study (Tab.1).

To ensure the accuracy of the model analysis, the study pre-processed the acquired text data. First, short reviews^[⁵¹^] with fewer than 50 characters were deleted. After this step, Beijing Garden Expo Park had the fewest reviews (6,531). We then randomly selected the same number of reviews for each of the other 9 parks using the SPSS software, obtaining a total of 65,310 reviews.
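The filtering and balancing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study performed the random selection in SPSS, and the function name `filter_and_balance`, the dictionary layout, and the fixed seed are hypothetical choices made here for reproducibility.

```python
import random

MIN_CHARS = 50  # threshold stated in the text: reviews with fewer characters are dropped


def filter_and_balance(reviews_by_park, min_chars=MIN_CHARS, seed=0):
    """Drop short reviews, then downsample every park to the smallest park's size.

    reviews_by_park: dict mapping a park name to a list of review strings
    (hypothetical layout, not taken from the paper).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept = {
        park: [text for text in reviews if len(text) >= min_chars]
        for park, reviews in reviews_by_park.items()
    }
    # The park with the fewest remaining reviews sets the common sample size.
    target = min(len(reviews) for reviews in kept.values())
    # Sample without replacement so no review is counted twice.
    return {park: rng.sample(reviews, target) for park, reviews in kept.items()}
```

With 10 parks and a smallest park of 6,531 reviews, this procedure yields 10 × 6,531 = 65,310 reviews, which matches the total reported above.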

In this study, jieba (a Chinese word segmentation tool) in Python was used to segment the text. Compared with other similar tools, jieba supports a customized lexicon, which makes word segmentation more accurate and effective and allows it to adapt to the language context^[⁵¹^]. In the next step, we cleansed the data by filtering out meaningless symbols and words^[⁴⁸^], using a lexicon of Chinese stop words that combined and deduplicated several lists, including the HIT (Harbin Institute of Technology) Stop Word List. To handle semantically similar Chinese words, the HIT-CIR Tongyici Cilin (Expanded) was used to substitute synonyms in the segmentation results, improving the accuracy and processing efficiency of the model^[⁴⁶^]. Finally, a manual screening was conducted to adjust the segmentation and synonym replacement results based on the actual use of the parks and the perception contents. There were instances of inappropriate synonym substitution, such as replacing "cherry," "daffodil," or "begonia" with "chamomile tablet," and "Haidian District" or "Chaoyang District" with "Baiyun District." In such cases, we kept the original words and discarded the substitutions.
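The cleaning stage that follows segmentation can be illustrated with a small, library-free sketch. It assumes the reviews have already been segmented into token lists (for example by jieba); the function name `clean_tokens` and the `keep_original` set, which stands in for the manual screening of bad substitutions, are hypothetical. English tokens are used purely for readability.

```python
def clean_tokens(tokens, stop_words, synonyms, keep_original=frozenset()):
    """Remove stop words and apply synonym substitution to one segmented review.

    tokens:        list of already-segmented words
    stop_words:    set of words to discard
    synonyms:      dict mapping a word to its canonical synonym
    keep_original: words whose substitution was judged inappropriate during
                   manual screening, so the original word is kept
    """
    cleaned = []
    for token in tokens:
        token = token.strip()
        if not token or token in stop_words:
            continue  # drop empty strings and stop words
        if token not in keep_original:
            token = synonyms.get(token, token)  # substitute only when a mapping exists
        cleaned.append(token)
    return cleaned


# Toy example: "cherry" is protected from the inappropriate substitution.
example = clean_tokens(
    ["the", "cherry", "blossoms", "pretty", " ", "Haidian District"],
    stop_words={"the"},
    synonyms={"pretty": "beautiful", "cherry": "chamomile tablet"},
    keep_original={"cherry"},
)
```

Keeping the protected words in a separate set, rather than editing the synonym table, leaves the original thesaurus untouched and makes the manual overrides easy to audit.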

2.2 Research Methods

2.2.1 Lexicon-based Perception Topic Classification Model

This study adopted the model for classifying and evaluating landscape service-based urban park perception topics proposed by Zhifang Wang et al. in 2021, as its validity has been demonstrated and its overall performance was excellent^[⁴²^]. Landscape service research focuses on the comprehensive effects of landscape patterns and functions, and on the spatial process and relationship between service providers and demanders. Because parks are an important type of urban green space landscape, this model can effectively reflect tourists' perceptions and evaluations of the parks^[⁴²^]. In this study, we structured the pre-processed text data with Python and extracted high-frequency words, manually classified these words to build a Chinese lexicon for landscape service perceptions, expanded the lexicon using both the Word2vec word embedding model and manual additions^[⁵²^], and finally assigned each word to a perception topic. According to existing literature^[⁵³^]^~^[⁵⁷^], 9 topic classifications of urban park perception categorized by landscape services were identified (Tab.2)^[⁴²^].

The next step was to match the obtained text data of reviews related to park perception with the lexicon to identify words used in the reviews, then extract the perception topics covered in each review, and finally calculate the perception frequency of each topic. The frequency was determined by the ratio of the number of reviews related to a perception topic to the total number of reviews for a park^[⁴²^]:

$$F_i = \frac{N_i}{M}, \tag{1}$$

where F_i is the perception frequency of topic i, N_i is the number of reviews for a park mentioning contents of topic i, and M is the total number of reviews of the park.
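As a minimal sketch of equation (1), the function below counts, for each topic, the reviews that contain at least one lexicon word of that topic and divides by the total number of reviews. The function name `perception_frequency`, the lexicon layout (topic name mapped to a set of words), and the toy data are assumptions for illustration, not the authors' code.

```python
def perception_frequency(segmented_reviews, lexicon):
    """Return {topic: F_i} with F_i = N_i / M for one park.

    segmented_reviews: list of token lists, one per review (M = number of reviews)
    lexicon:           dict mapping a topic name to the set of words assigned to it
    """
    m = len(segmented_reviews)
    if m == 0:
        raise ValueError("at least one review is required")
    counts = {topic: 0 for topic in lexicon}
    for tokens in segmented_reviews:
        token_set = set(tokens)
        for topic, words in lexicon.items():
            if token_set & words:  # the review mentions this topic at least once
                counts[topic] += 1
    return {topic: n / m for topic, n in counts.items()}


toy_reviews = [["boating", "lake"], ["temple", "boating"], ["quiet"], ["flowers"]]
toy_lexicon = {"recreational activities": {"boating"}, "religion": {"temple"}}
toy_frequency = perception_frequency(toy_reviews, toy_lexicon)
```

Because one review can mention several topics, the frequencies of a park do not sum to 1; this is why the total perception frequency of a park can exceed 1.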

2.2.2 LDA-based Perception Topic Classification Model

As an unsupervised language processing model that automatically analyzes texts, LDA quickly extracts topics from unstructured texts (i.e., documents) to realize the dimensionality reduction of documents^[⁵⁸^]. In this model, a document consists of multiple topics in a certain probability distribution. Similarly, each topic consists of multiple words in a certain probability distribution. The larger the probability value is, the more closely related the set and its components are^[⁵⁹^] ^[⁶⁰^]. The LDA model calculates probability distributions of both "document–topic" and "topic–word," so as to classify document topics and corresponding words (keywords).

This study utilized the gensim toolkit of Python to invoke the LDA model for topic analysis of the text data. Determination of the number of topics (K value) needs to consider the granularity of the topic, the interpretability of the topic content, as well as whether it is convenient for comparative analysis. In this study, we first calculated the Coherence score of different numbers of topics, which can effectively represent the degree of similarity between keywords in a topic—a higher Coherence score indicates that the model is more effective in analyzing this number of topics^[⁶¹^]^~^[⁶³^]. Then, topics with high coherence scores were manually selected to determine the appropriate number of topics conducive to desired modeling results. After this, the actual weight (i.e., the perception frequency) of each topic was calculated as follows.

1) Determine the number of topics as K, and the total number of reviews as N;

2) Calculate the expected probability F₀ of each of the K topics in a review, i.e., F₀ = 1/K;

3) Obtain the actual probability of the jth topic in each review as F_j from the LDA model (j = 1, 2, 3, ..., K), and compare the values of F_j and F₀;

4) Count the number of reviews in which F_j > F₀ as A_j;

5) Obtain the actual weight of the jth topic as Q_j = A_j/N.
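The five steps above can be sketched as follows. The sketch assumes the document-topic probabilities are already available as a dense N × K list of lists (one row per review, one column per topic); how that matrix is exported from the LDA toolkit is not described in the text, and the function name `topic_weights` is hypothetical.

```python
def topic_weights(doc_topic_probs):
    """Return [Q_1, ..., Q_K] where Q_j = A_j / N.

    doc_topic_probs: list of N rows, each a list of K topic probabilities F_j
    """
    n = len(doc_topic_probs)          # N: total number of reviews
    k = len(doc_topic_probs[0])       # K: number of topics
    f0 = 1.0 / k                      # expected probability of each topic
    counts = [0] * k                  # A_j: reviews in which F_j > F_0
    for row in doc_topic_probs:
        for j, f_j in enumerate(row):
            if f_j > f0:
                counts[j] += 1
    return [a_j / n for a_j in counts]


toy_probs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.10, 0.10, 0.80],
    [0.34, 0.33, 0.33],
]
toy_weights = topic_weights(toy_probs)
```

A review counts toward every topic whose probability exceeds 1/K, so, as with the lexicon-based frequency, the weights across topics need not sum to 1.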

Based on the analysis results, the topics of each park were named by three researchers specializing in landscape architecture, considering both the keywords and the corresponding high-weight review text. Meanwhile, "noisy" topics were removed due to their low weight and weak correlation of content.

2.2.3 Correlation Analysis of Topic Distribution

The study conducted a correlation analysis on the distribution of varied perception topics obtained from the two models. The distribution of these topics in each review text is a dichotomous variable, with results of "yes" ("1") or "no" ("0"). Thus, we calculated the Phi coefficient in SPSS for the correlation test, mainly utilizing the 2-by-2 contingency table of binary variable values. As shown in Fig.1, when the values are mostly distributed on the main diagonal, it means that the correlation between different variable distributions is high and the coefficient can be calculated by equation (2):

$$\Phi = \frac{N_{11} N_{00} - N_{10} N_{01}}{\sqrt{(N_{11} + N_{10})(N_{11} + N_{01})(N_{10} + N_{00})(N_{01} + N_{00})}}, \tag{2}$$

where X and Y denote the two binary variables (each taking the value "1" or "0"); N₁₁, N₁₀, N₀₁, and N₀₀ are the numbers of reviews counted for each combination of values of the two variables; and Φ is the correlation coefficient of the two variables' distributions. A module related to the Phi coefficient in SPSS was used for data analysis. When the significance level is less than 0.05, a Φ value closer to 1 indicates a stronger correlation between the two topics.
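For readers who want to reproduce the coefficient outside SPSS, the standard Phi coefficient for two binary variables can be computed as in the sketch below. The function name `phi_coefficient` is hypothetical, and the sketch covers only the coefficient itself, not the significance test reported by SPSS.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient of two equal-length sequences of 0/1 values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Cells of the 2-by-2 contingency table.
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = math.sqrt(
        (n11 + n10) * (n11 + n01) * (n10 + n00) * (n01 + n00)
    )
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (n11 * n00 - n10 * n01) / denominator
```

When all reviews fall on the main diagonal of the table (both topics present or both absent), the function returns 1.0; when they all fall off the diagonal, it returns -1.0.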

2.2.4 Semantic Analysis of Topic Contents

This study utilized Python for word frequency analysis of the review texts and illustrated the high-frequency words via word clouds, where the size of the words indicates their frequency. These illustrations can effectively visualize the main contents of the selected review texts, while analysis of word frequency for review texts from each park can help reveal corresponding perception topics.
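The word counting behind such a word cloud can be sketched with the Python standard library. The sketch assumes reviews are already segmented into token lists; the function name `top_words` is hypothetical, and rendering the cloud itself (where word size follows these counts) is left to a plotting tool of choice.

```python
from collections import Counter


def top_words(segmented_reviews, n=10):
    """Return the n most frequent words across all reviews as (word, count) pairs."""
    counts = Counter(token for tokens in segmented_reviews for token in tokens)
    return counts.most_common(n)


toy_top = top_words(
    [["cherry", "lake"], ["cherry", "crowded"], ["cherry", "lake"]],
    n=2,
)
```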

2.3 Technical Route

Based on text data obtained from public reviews of 10 urban parks in Beijing on Dianping, this study conducted text analysis with two types of models to explore the perception topic distribution of each park and of all parks in general, and compared the classification results of perception topics between the two models. The specific technical route is shown in Fig.2.

3 Research Results and Analyses

3.1 The Lexicon Model Facilitating Parallel Comparison Between Parks

Benefiting from manual presets, the lexicon model covers relatively comprehensive and well-defined topics. The perception analysis results of each park were confined to the lexicon contents, which facilitates further interpretation of the results and the parallel comparison of perception frequencies and differences between parks.

Statistical results of the topic classification using the lexicon model show significant differences between parks in visitors' perception frequency of different topics (Fig.3). The total perception frequencies of Yuanmingyuan Park (3.40) and Olympic Forest Park (3.40) were the highest, while those of Beijing World Park (2.90) and Chaoyang Park (2.97) were the lowest. The largest difference between topics' perception frequencies was found in Olympic Forest Park: 0.88 for recreational activities and 0.02 for religion. In addition, among the perception topics of all the parks, recreational activities and aesthetic appreciation were perceived more frequently by visitors, while education and religion were perceived less frequently. Compared with other parks, Yuanmingyuan Park, Badachu Park, Jingshan Park, and Chaoyang Park showed a higher perception frequency in history and culture (0.68), religion (0.45), aesthetic appreciation (0.79), and social interaction (0.60), respectively. Moreover, the topic of education was perceived less frequently in Yuyuantan Park and Badachu Park than in other parks.

3.2 The LDA Model Highlighting Each Park's Characteristics

The perception analysis results from the LDA model show significant differences between the perception types of the 10 parks, and the social media reviews closely reflected each park's landscape characteristics and visitors' perception preferences. After the review text data were processed by the LDA model, the appropriate number of topics for each park was determined according to the Coherence score (Tab.3), after which the topics were named considering their interpretability and "noisy" topics were removed. For instance, in the results of Beijing Garden Expo Park, we identified one topic as noisy because its keywords included "arrive soon," "check," "sun umbrella," "turn left," "kite festival," "excellent," "wait for the bus," "Gate 3," "fully," "department," etc., which conveyed very vague perception contents, and because it had a relatively low weight of 0.016. The final distribution of topics varied from park to park (Tab.4).

Tab.4 shows that there were mainly 8 or 9 topics perceived by visitors in each park. Yuanmingyuan Park, Yuyuantan Park, and Olympic Forest Park had the most topics, while Beijing World Park had the least. In terms of the perception contents, although variation existed among all the parks, some topics like transportation (including information on buses, subways, parking lots, etc.) can be found in most parks. In addition, together with the word frequency analysis (Fig.4, Fig.5), it can be seen that some perception topics were expressed differently depending on each park's characteristics, such as the spring cherry blossom landscape in Yuyuantan Park versus the autumn foliage landscape in Fragrant Hills Park. Moreover, topics related to festivals reflected characteristic perception results, such as the Spring Festival temple fair in Chaoyang Park, the band performance in Olympic Forest Park, and other types of gatherings.

An overall perception analysis of the review text data from all 10 parks by the LDA model identified 10 topics. Among these topics, the perception frequencies of transportation and tickets (0.60), spring view (0.53), collective memory and perception (0.52), and social activities (0.48) were higher than those of hiking activities (0.30), cultural history (0.29), gatherings and performances (0.26), autumn view (0.20), religious culture (0.14), and featured constructions (0.11). This implies that visitors to urban parks in Beijing prioritized their social interaction needs and the visual aesthetics brought by the natural landscape, as well as the conditions of transportation facilities and consumption in the parks.

3.3 Similarities and Differences of the Lexicon Model and LDA

3.3.1 Similarities in Perception Topics Between the Two Models

As can be seen from the results of the overall perception analyses of the 10 parks, topics of recreational activities and aesthetic appreciation from the lexicon model, as well as topics of transportation and tickets, spring view, collective memory and perception, and social activities from the LDA model were most frequently perceived. It shows that visitors to these parks paid more attention to whether their own social recreation needs and aesthetic needs (by enjoying the natural landscape) were satisfied. Meanwhile, they cared about the status of transportation facilities and consumption in parks.

Correlation can be found from the distribution of the 9 perception topics used in the lexicon model and the 10 topics generated from the LDA model. The results of the correlation analysis of these topics between the two models are shown in Tab.5.

Between the perception topics obtained from the LDA model and those from the lexicon model, strong correlations were found for the following pairs (LDA topic listed first): spring view with environmental improvement, biodiversity, recreational activities, and aesthetic appreciation; religious culture with history and culture, and with religion; hiking activities with religion; autumn view with aesthetic appreciation; social activities with recreational activities and social interaction; collective memory and perception with education; and cultural history with history and culture, aesthetic appreciation, and education. In addition, the topic of physical and mental recovery in the lexicon model, as well as the topics of transportation and tickets, featured constructions, and gatherings and performances from the LDA model, had weak correlations with other perception topics (Fig.6).

Results from both models revealed visitors' special attention to the landscapes of nature and cultural history, as well as recreational activities. Besides, results from the LDA model reflected a comprehensive perception of different natural landscapes and sightseeing activities, such as seasonal landscape perceptions that include botanical landscapes, aesthetics, and excursion activities. Meanwhile, the LDA model classified recreational activities into more specific topics, such as gatherings and performances and hiking activities. Different from the clear classification of topics in the lexicon model, the LDA model generated topics with less distinct differences. For example, it might be difficult to effectively differentiate between visitors' leisure activities and appreciation activities.

3.3.2 Differences in Perceived Contents Across Topics Under the Two Models

The types of perception topics identified for each park differed significantly between the two models. In terms of the results of individual parks, Tab.2 and Tab.4 show a significant difference between the topic types obtained by the LDA model and those related to urban parks concluded from the literature review. Firstly, the perception topics extracted by the LDA model differed from park to park. For example, worship activities and hiking activities appeared in only one or two parks. Secondly, topics obtained by the LDA model were mostly those with high perception frequency, excluding the low-frequency ones. Thirdly, these finely classified topics explicitly represented the characteristics of each park and covered more contents than were included in the lexicon model. For example, in Yuyuantan Park, cherry blossoms and related activities were frequently perceived by visitors: under the topic of the cherry blossom festival, one review stated that "it's so crowded, affecting the viewing experience," while under the topic of cherry blossom view, a visitor wrote that "they are very beautiful; the wind blows and cherry blossoms fall." In contrast, the lexicon model was able to capture all the preset perception contents (Fig.3), even when the topics were perceived less frequently. However, the perception topics and contents covered were limited by the range of the lexicon, resulting in an emphasis on visitors' perception of the manually selected landscape services and a neglect of their perception of the surrounding environment and landscape elements. For example, transportation conditions and ticket prices appeared frequently in the results of the LDA model but never in those of the lexicon model.

4 Discussion

4.1 Advantages and Disadvantages of the Two Models in Analyzing Perception Texts for Parks

The comparative analysis shows that there was a significant difference in the classification of perception topics between the lexicon model and the LDA model. Specific advantages and disadvantages of the two models can be summarized based on the classification of park perception types, identification of perception contents, and the scope of model application (Tab.6).

4.2 Application Suggestions for Combining the Two Models

Based on the research results, a possible optimizing strategy is to expand lexicon contents of the lexicon model, including the perception topics and words identified by the LDA model. For example, the LDA model identified high-frequency words depicting perception contents like transportation facilities outside parks and tickets. These words can be added to the lexicon by Word2vec.
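One simple way to find candidate words for such an expansion is to list the LDA keywords that the current lexicon does not yet cover. The sketch below does only that; it is an illustrative assumption rather than the authors' procedure, it does not perform the Word2vec expansion mentioned above, and the function name `uncovered_keywords` and the toy data are hypothetical.

```python
def uncovered_keywords(lda_topics, lexicon):
    """Return, per LDA topic, the keywords absent from every lexicon topic.

    lda_topics: dict mapping an LDA topic name to its list of keywords
    lexicon:    dict mapping a lexicon topic name to its set of words
    """
    covered = set().union(*lexicon.values())  # all words already in the lexicon
    return {
        name: [word for word in keywords if word not in covered]
        for name, keywords in lda_topics.items()
    }


candidates = uncovered_keywords(
    {"transportation and tickets": ["subway", "ticket", "lake"]},
    {"aesthetic appreciation": {"lake", "scenery"}},
)
```

The resulting candidate list would still need manual review before any word is assigned to an existing or new perception topic.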

To improve their applicability, we may combine the two models, allowing for their respective characteristics and advantages. When carrying out perception analysis of parks at the regional scale, the lexicon model can be used to analyze the current situation and provide a basis for the construction, management, and improvement of the parks; then based on these results, we can select perception topics that require further assessment by the LDA model. For the perception analysis of individual parks, start by using the LDA model to identify the park's characteristics and items that draw visitors' attention, then conduct a more comprehensive analysis with a lexicon model optimized based on these results to identify any problems and propose corresponding suggestions for park improvement.

5 Conclusions

In recent years, research on social sensing analysis has paid more attention to the use of spontaneous reviews from big data, aiming to extract valuable information through semantic analysis. The accumulation of social media data and the continuous optimization of analysis methods have enriched the research contents of social sensing, better reflecting the sentimental and cognitive situation of users' interaction with space. Differing from the earlier studies that focused on scenic spots and tourist destinations^[⁶⁴^] ^[⁶⁵^], more and more current research surveys smaller-scale urban parks^[⁶⁶^] ^[⁶⁷^] utilizing a variety of methods for data analysis. However, there is a lack of comparative studies on these methods and exploration of their applicability. To fill the gap, this study employed two commonly used topic analysis models for text data, i.e. the lexicon model and the LDA model, to analyze the same research objects separately, explore the differences in the application of the two models in researching visitors' perception of urban parks, and finally clarify each model's strengths, weaknesses, and optimization paths. The results can not only guide the construction and management of urban parks but also provide a reference for relevant research on social perception through text analysis.

There are still some limitations in this study. In terms of data sources, the review text from Dianping provides little information on user profiles, making it difficult for the analysis to fully reflect visitors' perceptions of urban parks. In addition, the classification results of the unsupervised LDA model cannot be controlled. In response to this problem, improved semi-supervised and supervised machine learning topic classification models have been proposed^[⁶⁸^] ^[⁶⁹^], which need to be further explored. Finally, in addition to the two models studied in this research, there are many other text classification models based on big data and different algorithms, each with its own advantages and disadvantages. Future research also needs to examine these characteristics.

References


[1]	Ferreira, A. P., Silva, T. H., & Loureiro, A. A. (2020). Uncovering spatiotemporal and semantic aspects of tourists mobility using social sensing. Computer Communications, (160), 240–252.

[2]	Li, Y., Guo, J., & Chen, Y. (2022). A new approach for tourists' visual behavior patterns and perception evaluation based on multi-source data. Journal of Geo-information Science, 24(10), 2004–2020.

[3]	Liu, Y., Liu, X., Gao, S., Gong, L., Kang, C., Zhi, Y., Chi, G., & Shi, L. (2015). Social sensing: A new approach to understanding our socioeconomic environments. Annals of the Association of American Geographers, 105(3), 512–530.

[4]	Liu, Y. (2016). Revisiting several basic geographical concepts: A social sensing perspective. Acta Geographica Sinica, 71(4), 564–575.

[5]	Mao, T., Wu, Y., & Huang, W. (2023). Content mining and sentiment analysis of online comments for ethnic museums in autonomous regions. Economic Geography, 43(8), 229–236.

[6]	He, X. (2019). Research on Social Sensing and Spatiotemporal Pattern of Xiong'an New District Based on Weibo Data [Master's thesis]. Hebei Normal University.

[7]	Zhang, S., & Zhou, W. (2018). Recreational visits to urban parks and factors affecting park visits: Evidence from geotagged social media data. Landscape and Urban Planning, (180), 27–35.

[8]	Donahue, M. L., Keeler, B. L., Wood, S. A., Fisher, D. M., Hamstead, Z. A., & McPhearson, T. (2018). Using social media to understand drivers of urban park visitation in the Twin Cities, MN. Landscape and Urban Planning, (175), 1–10.

[9]	Li, F., Li, F., Li, S., & Long, Y. (2019). Deciphering the recreational use of urban parks: Experiments using multi-source big data for all Chinese cities. Science of the Total Environment, (701), 134896.

[10]	Liang, H., & Zhang, Q. (2021). Temporal and spatial assessment of urban park visits from multiple social media data sets: A case study of Shanghai, China. Journal of Cleaner Production, (297), 126682.

[11]	Van Berkel, D. B., Tabrizian, P., Dorning, M. A., Smart, L., Newcomb, D., Mehaffey, M., Neale, A., & Meentemeyer, R. K. (2018). Quantifying the visual-sensory landscape qualities that contribute to cultural ecosystem services using social media and LiDAR. Ecosystem Services, (31), 326–335.

[12]	Oteros-Rozas, E., Martín-López, B., Fagerholm, N., Bieling, C., & Plieninger, T. (2018). Using social media photos to explore the relation between cultural ecosystem services and landscape features across five European sites. Ecological Indicators, (94), 74–86.

[13]	Richards, D. R., & Friess, D. A. (2015). A rapid indicator of cultural ecosystem service usage at a fine spatial scale: Content analysis of social media photographs. Ecological Indicators, (53), 187–195.

[14]	Pan, Y., & Li, J. (2021). Landscape preference based on user-generated photograph metadata: The case of Xixi National Wetland Park. Natural Protected Areas, (1), 100–108.

[15]	Zhu, X., Gao, M., Zhang, R., & Zhang, B. (2021). Quantifying emotional differences in urban green spaces extracted from photos on social networking sites: A study of 34 parks in three cities in northern China. Urban Forestry & Urban Greening, (62), 127133.

[16]	Wartmann, F. M., Acheson, E., & Purves, R. S. (2018). Describing and comparing landscapes using tags, texts, and free lists: An interdisciplinary approach. International Journal of Geographical Information Science, 32(8), 1572–1592.

[17]	Yan, Y., Chen, J., & Wang, Z. (2020). Mining public sentiments and perspectives from geotagged social media data for appraising the post-earthquake recovery of tourism destinations. Applied Geography, (123), 102306.

[18]	Marcotte, C., & Stokowski, P. A. (2021). Place meanings and national parks: A rhetorical analysis of social media texts. Journal of Outdoor Recreation and Tourism, (35), 100383.

[19]	Bai, H., Song, Z., Liang, S., Zhang, P., & Zhang, G. (2023). Imagery perception analysis and comprehensive attraction evaluation of tourism destinations based on Internet text data—Taking Nanjing City as example. Areal Research and Development, 42(4), 89.

[20]	Zhao, Y., Pang, S., & Wu, Z. (2021). Research on geographic semantic ontology model based on social sensing data for emergency management of events. Information Science, (2), 44–53.

[21]	Chen, Y., Gong, C., Fan, Y., Li, X., Liang, Y., & Hu, M. (2022). Spatio-temporal variation assessment of urban waterlogging in Zhengzhou using social media data. Journal of China Hydrology, 42(3), 26, 48–52.

[22]	Li, S., Zhao, F., Zhou, Y., Tian, X., & Huang, H. (2022). Analysis of public opinion and disaster loss estimates from typhoons based on Microblog data. Journal of Tsinghua University (Science and Technology), 62(1), 43–51.

[23]	Yang, B., & Zhang, J. (2017). Research on tourism image and perception of Tianmu Mountain based on network text analysis—Based on travel notes and comments of Ctrip. Journal of Fujian Forestry Science and Technology, 44(4), 118–125.

[24]	Wang, X., & Xia, M. (2018). Research on tourist preference and satisfaction in Huangshan Scenic Spot based on network review data. Tourism Overview, (18), 59–60.

[25]	Wight, A. C. (2020). Visitor perceptions of European Holocaust Heritage: A social media analysis. Tourism Management, (81), 104142.

[26]	Xu, Z., Dong, J., Chen, Z., Fu, W., Wang, M., & Dong, J. (2021). Image perception of the historical ancient town scenic spot of Yunshuiyao. Journal of Chinese Urban Forestry, 19(2), 115–120.

[27]	Park, S. B., Kim, J., Lee, Y. K., & Ok, C. M. (2020). Visualizing theme park visitors' emotions using social media analytics and geospatial analytics. Tourism Management, (80), 104127.

[28]	Widmar, N. O., Bir, C., Clifford, M., & Slipchenko, N. (2020). Social media sentiment as an additional performance measure? Examples from iconic theme park destinations. Journal of Retailing and Consumer Services, (56), 102157.

[29]	Wan, C., Shen, G. Q., & Choi, S. (2021). Eliciting users' preferences and values in urban parks: Evidence from analyzing social media data from Hong Kong. Urban Forestry & Urban Greening, (62), 127172.

[30]	Li, L., Zhang, C., Han, L., Qing, L., & Ji, H. (2021). Research on multi-scale evaluation system of parks based on comment text—Taking Chengdu parks as an example. Intelligent City, (2), 3–6.

[31]	Jiang, Q., Wang, G., Liang, X., & Liu, N. (2022). Research on the perception of cultural ecosystem services in urban parks via analyses of online comment data. Landscape Architecture Frontiers, 10(5), 32–51.

[32]	Jing, F., Sun, H., & Long, D. (2017). Tourist experience elements structure characteristics analysis of Xixi National Wetland Park based on web text. Journal of Zhejiang University (Science Edition), 44(5), 623–630.

[33]	Wang, X., & Li, X. (2017). Research on the analysis of social services value of forest park in Beijing based on network big data. Chinese Landscape Architecture, (10), 14–18.

[34]	Zhao, S., & Liu, B. (2019). Research on visitor perception of urban parks based on analysis of network text data—Take the main urban area of Nanjing as an example. 2019 Urban Development and Planning Proceedings (pp. 263–272). Chinese Society for Urban Studies.

[35]	Gao, X., Jin, Y., Wang, X., & Hao, J. (2021). Research on product perceptual evaluation method based on online review mining. Modern Manufacturing Engineering, (12), 13–20.

[36]	Lu, X. (2014). Research on text clustering algorithm based on K-means. Computer Programming Skills & Maintenance, (24), 33–35.

[37]	Wang, D., Li, J., & Shi, Y. (2020). Methods of government document clustering based on K-means algorithm. Software Guide, 19(6), 201–204.

[38]	Ma, W., Chen, G., Li, X., Su, W., Chai, Y., Pu, Y., Zeng, J., & Liu, X. (2021). Chinese comment classification based on Naive Bayesian algorithm. Journal of Computer Applications, 41(S2), 31–35.

[39]	Permana, F. C., Rosmansyah, Y., & Abdullah, A. S. (2017). Naive Bayes as opinion classifier to evaluate students satisfaction based on student sentiment in Twitter social media. Journal of Physics: Conference Series, (893), 012051.

[40]	Han, X., & Li, Y. (2022). Research on the influencing factors of social media rumor-refuting information dissemination effect in emergencies. Information Studies: Theory & Application, 45(8), 97–103.

[41]	Zeng, Y., Li, Z., & Zhou, Y. (2020). Article feature extraction and flow control based on text mining. Electronic Technology & Software Engineering, (2), 176–177.

[42]	Wang, Z., Miao, Y., Xu, M., Zhu, Z., Qureshi, S., & Chang, Q. (2021). Revealing the differences of urban parks' services to human wellbeing based upon social media data. Urban Forestry & Urban Greening, (63), 127233.

[43]	Wang, Z., Zhu, Z., Xu, M., & Qureshi, S. (2021). Fine-grained assessment of greenspace satisfaction at regional scale using content analysis of social media and machine learning. Science of the Total Environment, (776), 145908.

[44]	Zheng, T., Yan, Y., Zhang, W., Zhu, J., Wang, C., Rong, Y., & Lu, H. (2022). Landscape assessment on urban parks using social media data. Acta Ecologica Sinica, 42(2), 561–568.

[45]	Taecharungroj, V., & Mathayomchan, B. (2019). Analysing TripAdvisor reviews of tourist attractions in Phuket, Thailand. Tourism Management, (75), 550–568.

[46]	Dong, S., & Wang, Q. (2019). LDA-based tourist perception dimension recognition: Research framework and empirical research—Taking the National Mine Park as an example. Journal of Beijing Union University (Humanities and Social Sciences), 17(2), 42–49.

[47]	Liang, C., & Li, R. (2020). Tourism destination image perception analysis based on the Latent Dirichlet Allocation model and dominant semantic dimensions: A case of the Old Town of Lijiang. Progress in Geography, 39(4), 614–626.

[48]	Song, Y., Wang, R., Fernandez, J., & Li, D. (2021). Investigating sense of place of the Las Vegas Strip using online reviews and machine learning approaches. Landscape and Urban Planning, (205), 103956.

[49]	Zhou, W. (2021). Research on Tourism Destination Evaluation Based on Improved AHP of LDA: A Case Study of 5A Scenic Spots in Jiangxi Province. [Master's thesis]. Jiangxi University of Finance and Economics.

[50]	Beijing Statistics Bureau (2021). Beijing statistics yearbook. China Statistics Press.

[51]	Zhu, Z. (2020). An Assessment Framework of Green Space Satisfaction Using Social Media Data: Content Analysis with Machine Learning. [Master's thesis]. Peking University.

[52]	Wang, Z., Miao, Y., Zhu, Z., Zhou, J., & Wang, S. (2020). A method for landscape service identification of parks. (No. CN111310444A). China National Intellectual Property Administration.

[53]	Buchel, S., & Frantzeskaki, N. (2015). Citizens' voice: A case study about perceived ecosystem services by urban park users in Rotterdam, the Netherlands. Ecosystem Services, (12), 169–177.

[54]	Huang, S., Pearce, J., Wen, J., Dowling, R. K., & Smith, A. J. (2020). Segmenting Western Australian national park visitors by perceived benefits: A factor-item mixed approach. International Journal of Tourism Research, 22(6), 814–824.

[55]	Willemen, L., Verburg, P. H., Hein, L., & van Mensvoort, M. E. (2008). Spatial characterization of landscape functions. Landscape and Urban Planning, 88(1), 34–43.

[56]	Sun, R., Li, F., & Chen, L. (2019). A demand index for recreational ecosystem services associated with urban parks in Beijing, China. Journal of Environmental Management, (251), 109612.

[57]	van Riper, C. J., Kyle, G. T., Sutton, S. G., Barnes, M., & Sherrouse, B. C. (2012). Mapping outdoor recreationists' perceived social values for ecosystem services at Hinchinbrook Island National Park, Australia. Applied Geography, 35(1–2), 164–173.

[58]	Wang, J., Wang, M., & Du, B. (2019). A study of the change trend of social concern in the field of consumption in China—The LDA Model analysis based on the text of Daily Economic News List in People's Daily Online (2007–2017). Journal of Baoding University, 32(2), 41–49.

[59]	Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, (3), 993–1022.

[60]	Brandt, T., Bendler, J., & Neumann, D. (2017). Social media analytics and value creation in urban smart tourism ecosystems. Information & Management, 54(6), 703–713.

[61]	Röder, M., Both, A., & Hinneburg, A. (2015). Exploring the Space of Topic Coherence Measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (pp. 399–408). Association for Computing Machinery.

[62]	Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring Topic Coherence over Many Models and Many Topics. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 952–961). Association for Computational Linguistics.

[63]	Syed, S., & Weber, C. T. (2018). Using machine learning to uncover latent research topics in fishery models. Reviews in Fisheries Science & Aquaculture, 26(3), 319–336.

[64]	Chen, Y., Zhu, Y., & Fu, G. (2022). Visitor perception toward outstanding universal value of Xinjiang Tianshan—Based on web text analysis. Special Zone Economy, 398(3), 124–128.

[65]	Liu, Q., Wang, X., & Liu, J. (2022). Study on relationship among tourist perceived value, satisfaction and environmental responsibility behavior in forest park. Ecological Economy, 38(2), 137–141.

[66]	Cao, K., & Chen, Y. (2021). Service evaluation of Shenzhen parks based on social data. Special Zone Economy, (4), 127–129.

[67]	Ye, Y., & Qiu, H. (2022). Urban park image perception based on network text analysis. Journal of Chinese Urban Forestry, 20(1), 90–95.

[68]	Han, D., Wang, C., & Xiao, M. (2018). Text categorization scheme based on semi-supervised learning and Latent Dirichlet allocation model. Computer Engineering and Design, 39(10), 3265–3271.

[69]	Guo, X., Ding, J., Jiang, H., & Chen, Z. (2020). ZeroNet text content analysis based on semi-supervised LDA topic model. Information Technology, (3), 32–38.

RIGHTS & PERMISSIONS

Higher Education Press 2023