1 Introduction
Rural tourism (RT) is experiencing increased popularity globally, driven by the accelerated pace of urban life and the growing detachment from agricultural roots (
Roberts et al., 2017). In the post-COVID-19 era, tourists are increasingly seeking health, fresh air, and greenery, enhancing the appeal of RT within the tourism industry (
Smith et al., 2019). Compared to congested urban environments, the expansive countryside and the self-sufficiency offered by sustainable agricultural products promote a healthy rural lifestyle (
Jia et al., 2021).
RT is an important industry for poverty alleviation, significantly boosting rural productivity, optimizing resource use, and providing developmental platforms (
Gao and Wu, 2017). During the 1970s, RT emerged as a crucial strategy for revitalizing rural economies and social structures in developed countries, a trend that gradually extended to developing nations (
Su et al., 2019). Research indicates that RT effectively enhances related service facilities (
Lin et al., 2017), promotes economic upgrading (
Sharpley, 2014), and supports sustainable rural development (
Dai et al., 2023). In China, rural areas operate as agricultural-based social systems (
Wang and Yotsumoto, 2019). The Chinese government’s 2018 rural revitalization policy (
Guo and Sun, 2016;
Liu, 2018) further highlights the potential of RT in poverty alleviation (
Li et al., 2021), particularly in rural areas adjacent to metropolitan cities.
Beyond examining the outcome of RT on rural development, scholars have focused on the spatial structure of RT. Methodologies have evolved from the early application of location theory to empirical studies on tourism destinations (
Li et al., 2019). However, most studies have concentrated on specific rural areas within policy frameworks, involving various administrative levels such as national, provincial and municipal (
Shen and Chou, 2022;
Wang and Li, 2022). Moreover, these studies often explore the macroscopic spatial distribution characteristics of rural areas but overlook tourists’ self-organized, human-centered choice of rural tourism destinations. In addition, factors influencing rural tourism include service levels (
An and Alarcón, 2021) and local structures (
Streimikiene and Bilan, 2015).
Destination image (DI) represents the internalized understanding of a place, personalized and combined with the subjective logic of tourists. It reflects the dynamic interaction between objective destination attributes and subjective tourist logic (
Currie, 2020). Numerous scholars have constructed the Tourism Destination Image (TDI) model to study the components and formation processes. Among these, the TDI model based on cognitive and affective components has been extensively utilized and validated across different cultural contexts (
Molinillo et al., 2018). The cognitive image reflects a set of impressions of a destination, representing various resource attributes such as natural scenery, historical attractions, social culture, and accommodation-related infrastructure (
Iordanova and Stylidis, 2019). In contrast, affective image refers to the feelings tourists associate with the destination and its attributes (
Stylidis et al., 2017).
Social media (SM) serves as an open and interactive platform based on internet technology, enabling users to create and share DI-related information. It provides an impartial space for contact and communication among different groups and is a primary means for tourists to obtain travel-related information and make decisions (
Koo et al., 2016). The dissemination and sharing of user-generated content (UGC) have significantly increased, allowing tourists to transition from passive information receivers to active participants and even information evaluators (
Varkaris and Neuhofer, 2017). Shared travel experiences on social media significantly influence others’ willingness to visit a destination (
Liu et al., 2019), and can even change their destination choices (
Kim et al., 2015). Consequently, social media has become a common and trusted tool for tourists to select destinations and develop a comprehensive DI (
Zhou et al., 2023).
With growing competition in the global tourism industry and the crucial role tourism plays in the development of destinations, analyzing tourists’ future behavioral intentions has become a prominent research topic (
Chen and Tsai, 2007). In tourism research, behavioral intentions have historically served as a criterion for assessing the success of tourism destinations, and have been used as a dependent variable to study the relationship between DI and individual motivated behavior (
Tan and Wu, 2016). DI significantly influences behavioral intentions, often exerting a direct or indirect positive impact (
Chaulagain et al., 2019;
Kani et al., 2017). However, some studies have not found a relationship between DI and behavioral intentions (
Kock et al., 2016). Behavioral intentions, however, are not actual behaviors: they remain at the level of stated attitudes and do not capture how tourists actually behave in destinations or how that behavior subsequently develops.
To date, no research has explored the relationship between DI and substantive tourism behaviors. Furthermore, spatial assessment and modeling of tourism behavior, as well as the exploration of multi-scale changes in factors affecting tourism behavior are also scarce. Considering the current popular demand for RT and the lack of research context on rural DI analysis, this study aims to fill that gap by innovatively applying MGWR to the spatial scale study of rural environments.
In this paper, we construct a dataset of 37 candidate variables based on the spatial and emotional image of rural tourism destinations and the built environment of the rural background. We first employ three global regression models (OLS, SLM, and SEM) to identify the independent and control variables that are significantly correlated with the two dependent variables, namely the spatial capacity of rural tourism behavior (RTB) and resident duration. We then apply two local regression models (GWR and MGWR) to capture the spatial scale effects of the motivational factors and to explore whether rural image and the built environment have a significant, spatially varying effect on tourism behavior. To the best of our knowledge, this is the first study to apply MGWR to rural areas. By analyzing the motivational factors that shape the spatiotemporal characteristics of rural tourism behavior, together with their spatial scale effects, the study offers practical guidance for destination institutions and policymakers seeking to implement people-oriented optimization and management measures.
2 Study area and data
2.1 Study area
Beijing and Tianjin, both significant economic regions in northern China, are municipalities and mega-cities experiencing rapid globalization and modernization. These cities are geographically adjacent, serving as major economic growth poles within the Beijing-Tianjin-Hebei urban agglomeration, and are key demonstration areas for technological innovation (
Mao, 2017). The convenient and efficient transportation between the two cities promotes the integrated development of the region, which makes the combined rural areas of Beijing and Tianjin a representative setting for tourism research. The area is characterized by beautiful natural landscapes and rich cultural heritage, which form diverse types of rural territory and make it a high-density area for RT development. This regional attractiveness appeals to city residents for daily tourism activities. In the context of RT, the concept of rural identity in postmodernity has become increasingly ambiguous and complex, often including a blend of village and town ideologies (
Zhou, 2014). According to China’s Rural Revitalization Promotion Law, rural areas include both towns and villages. Compared to decentralized villages, towns serve as the basic administrative units with concentrated basic service facilities, reflecting the actual needs and experiences of urban residents in RT. Therefore, in this study, the term “rural area” refers to geographical spaces relative to cities, with townships serving as the basic research units.
From a data-driven perspective, this study extracts tourism units under the self-organized RTB of urban residents as research samples. These samples are based on the static attributes of rural areas and the dynamic population flow between urban and rural areas. The selection criteria for the former include tourist-oriented point of interest (POI) facilities in rural areas and the official list of National Key Rural Tourism Towns. For the latter, cell phone signaling data are used, supplemented by a comparison of the rate of change in tourism activities between weekends and midweek, along with data structure thresholds of rural arrivals. Eventually, 116 study samples were identified in the Beijing-Tianjin rural areas (Fig. 1).
2.2 Data collection and measurement
This study examines RTB from two dimensions, spatial capacity and resident duration, using spatial and temporal data. Anonymous cell phone signaling data were obtained from Unicom, one of the three major communication operators in China. China lifted epidemic restrictions in February 2023. To minimize the residual impact of the epidemic on travel, and given the suitable weather and high recreational demand of residents in Beijing and Tianjin in June, the data cover the period from June 1 to June 30, 2023, including both weekdays and weekends. These data provide residents’ travel trajectories from the central urban areas of Beijing and Tianjin to rural areas, capturing the number of individual trips and their corresponding lengths of stay. Using PostgreSQL database software, an attribute table of the cell phone signaling data was established, followed by data pre-processing steps such as cleaning, filtering and screening. Subsequently, based on the extracted rural travel units, the population flow and length of stay of urban residents in rural areas during weekends were summarized, and the daily average number of people and the average length of stay were calculated as dependent variables.
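The weekend aggregation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout `(town, visit_date, stay_hours)` and the function name `weekend_daily_averages` are hypothetical, the averaging is taken over weekend days that have at least one recorded visit, and the original processing was carried out in PostgreSQL rather than Python.

```python
from collections import defaultdict
from datetime import date

def weekend_daily_averages(trips):
    """Aggregate person-visits into the two dependent variables.

    trips: iterable of (town, visit_date, stay_hours), one row per
    person-visit (a hypothetical layout, not the operator's schema).
    Returns {town: (mean visitors per weekend day, mean stay in hours)}.
    """
    visitors = defaultdict(lambda: defaultdict(int))  # town -> day -> count
    stays = defaultdict(list)                         # town -> stay lengths
    for town, day, hours in trips:
        if day.weekday() >= 5:  # Saturday = 5, Sunday = 6
            visitors[town][day] += 1
            stays[town].append(hours)
    result = {}
    for town, per_day in visitors.items():
        # Average over weekend days with at least one visit (an assumption).
        daily_avg = sum(per_day.values()) / len(per_day)
        mean_stay = sum(stays[town]) / len(stays[town])
        result[town] = (daily_avg, mean_stay)
    return result
```

Weekday records are simply skipped, so the same input table can hold the full month of June 2023.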
Active characterization data sourced from questionnaires (
Han et al., 2021), GPS sensors (
Liu et al., 2022), and official statistics (
Kulshrestha et al., 2020) used in previous studies on travel mobility exhibit limitations, including small sample sizes, microspatial scales, and a high degree of subjectivity (
Domenech et al., 2020). These limitations result in low temporal and spatial accuracy. In contrast, cell phone signaling data offers the advantage of generating real-time, objective information about tourists with high spatial and temporal resolution (
Saluveer et al., 2020). This type of data has been validated for monitoring the spatial structure and patterns of tourism mobility (
Xu et al., 2021).
The databases for independent and control variables contained 37 variables, categorized into two types: DI and rural background characteristics (Table A.1 in Appendix A). Data for spatial and emotional images under DI were derived from images and texts posted by the public on the social media platforms Weibo and Little Red Book. Weibo, often referred to as “China’s Twitter”, is one of the largest social media platforms in China and serves as an effective source for vivid user expression (
Song and Wu, 2018). Similarly, Little Red Book has emerged as a popular social networking platform in recent years. We obtained data from 116 study samples via the application program interfaces (API) of both Weibo and Little Red Book. The data collection period spanned from 0:00 on January 1, 2023, to 23:59 on December 31, 2023, resulting in a total of 24,793 entries. After data cleaning and screening, 19,909 valid entries related to rural DI, along with 77,446 accompanying images, were retained.
Rural background characteristics include objective built environment and socio-economic attributes. The categorization of the objective built environment is based on the five elements proposed by Kevin
Lynch (1984), which were adapted to suit the research focus on rural tourism in this study, forming a novel type of variable relevant to the digital era. We utilized Python to collect POI data from AMap. Data on administrative boundaries and related road networks at all levels were sourced from OpenStreetMap (OSM) and AMap. Socio-economic attribute data were obtained from the 2022
China Statistical Yearbook (Township) published by the National Bureau of Statistics.
Notably, the independent variables under the DI were calculated using deep learning techniques (Fig. 2). To enhance the accuracy of the study, the raw data were manually screened by three senior experts in the related research fields. Additionally, the data underwent calibration testing and cross-comparison among the scholars to ensure the reliability of the screening process (
Siegel et al., 2023). Following individual labeling and cross-verification by the three scholars, 191 rural tourism scenes were identified. Considering the uniqueness of rural areas, the proximity of scene types, and the similarity of tourists’ perceptions, all labels were consolidated into 12 rural spatial image categories.
A VGG-19 deep convolutional neural network model, involving 16 convolutional layers and 3 fully connected layers, was utilized to recognize and classify all image data (
Li et al., 2021;
Xu et al., 2024). The attitudinal tendencies characterized in texts posted by tourists were analyzed using deep learning algorithms from the Sentiment Analysis Module of the Baidu Cloud Intelligence Platform. The level of nostalgia exhibited in the countryside was assessed using the VGG-19 model, with a scale ranging from 1 to 5.
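As a quick check on the architecture described above, the following sketch counts the layers of the standard VGG-19 layout. The 12-unit output head is an assumption that matches the 12 rural spatial image categories; the text does not state how the classification head was configured, and the function name is illustrative.

```python
# Standard VGG-19 ("configuration E") layout: integers are the output
# channels of 3x3 convolutions, "M" marks a 2x2 max-pooling layer.
VGG19_CONV_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                     512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg19_layer_summary(num_classes):
    """Count the weight layers of VGG-19 for a given number of classes."""
    conv_layers = [c for c in VGG19_CONV_CONFIG if c != "M"]
    # Two 4096-unit hidden layers plus the output layer; the output size
    # is an assumption here (12 rural spatial image categories).
    fc_layers = [4096, 4096, num_classes]
    return {
        "conv": len(conv_layers),
        "pool": VGG19_CONV_CONFIG.count("M"),
        "fc": len(fc_layers),
        "weight_layers": len(conv_layers) + len(fc_layers),
        "output_units": fc_layers[-1],
    }

summary = vgg19_layer_summary(num_classes=12)
```

The count of 16 convolutional plus 3 fully connected layers gives the 19 weight layers that the model name refers to.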
3 Methodology
Fig. 3 illustrates the procedural steps and analytical flow of the methodology. Global and local regression analyses were performed using five independent models.
3.1 Global regression modeling
3.1.1 Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS) is a classical regression method that assumes a steady-state and constant relationship between variables across spatial dimensions (
Kashki et al., 2021). The OLS formula is as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$
where $Y$ represents the dependent variable (e.g., spatial capacity and length of stay in tourism behavior); $\beta_0$ is the intercept; $X_1, X_2, \dots, X_n$ are the explanatory variables influencing $Y$ (e.g., destination image and rural background characteristics); $\beta_1, \beta_2, \dots, \beta_n$ are the corresponding regression coefficients; and $\varepsilon$ is the random error term.
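A minimal sketch of the OLS fit follows, assuming the explanatory variables are held in a NumPy array with one column per variable. The function name `fit_ols` is illustrative and is not the software used in the study.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate (intercept, coefficients) by ordinary least squares.

    X: (n_samples, n_variables) array of explanatory variables.
    y: (n_samples,) array of the dependent variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta
```

The same coefficient vector applies to every sample, which is exactly the spatial stationarity assumption that the local models relax.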
OLS assumes that tourism behaviors across the study samples are independent and do not exhibit spatial variation (
Oshan et al., 2020). The Spatial Lag Model (SLM) and Spatial Error Model (SEM) are methodological variants of OLS that account for spatial weighting and dependence (
Ward and Gleditsch, 2018). This study further examines the potential presence of spatial dependencies by using SLM and SEM.
3.1.2 Spatial lag model (SLM)
The SLM accounts for spatial dependence in the dependent variable among neighboring units, highlighting spatial spillover effects (
Ward and Gleditsch, 2018). The formula for SLM is
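$$y_i = \beta_0 + \beta x_i + \rho W_i y_i + \varepsilon_i$$
where $\beta_0$ is the intercept; $\beta$ is the regression coefficient of the explanatory variable $x_i$; $x_i$ is the chosen explanatory variable; $\rho$ is the spatial lag parameter (spatial effect coefficient); $W_i$ is the spatial weight matrix, so that $W_i y_i$ is the spatially lagged dependent variable; and $\varepsilon_i$ is the random error term.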
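The spatial lag term is the only new ingredient relative to OLS, so it is worth seeing how it is built. The sketch below assumes a binary neighbor matrix that is row-standardized, which is a common convention but is not stated in the text; estimating the lag parameter itself requires maximum likelihood or a similar method and is not shown.

```python
import numpy as np

def row_standardize(adjacency):
    """Scale each row of a neighbor matrix so that it sums to one."""
    A = np.asarray(adjacency, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave isolated units as all-zero rows
    return A / row_sums

def spatial_lag(W, y):
    """Return Wy: for each unit, the weighted mean of its neighbors' y."""
    return np.asarray(W, dtype=float) @ np.asarray(y, dtype=float)
```

With row-standardized weights, each entry of the lag is simply the average of the dependent variable over that unit's neighbors.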
3.1.3 Spatial Error Model (SEM)
The SEM assumes that OLS residuals exhibit spatial dependence. In SEM, the random error term in the OLS formula is decomposed into two terms, $\lambda W_i \xi_i$ and $\varepsilon_i$ (
Chen et al., 2016). The SEM formula is
$$y_i = \beta_0 + \beta x_i + \lambda W_i \xi_i + \varepsilon_i$$
where $\beta_0$ is the intercept; $\beta$ is the regression coefficient of the explanatory variable $x_i$; $x_i$ is the selected explanatory variable; $W_i$ is the spatial weight matrix; $\xi_i$ represents the spatial component of the error; $\lambda$ denotes the level of correlation between these components; and $\varepsilon_i$ is the spatially uncorrelated error term.
3.2 Local regression modeling
3.2.1 Geographically weighted regression (GWR)
The previously discussed OLS, SLM and SEM all assume that the relationship between dependent and independent variables is spatially uniform. In contrast, rather than estimating global values of the regression parameters, GWR calculates localized parameters specific to each sample location, considering the geographical context (
Oshan et al., 2020). The GWR model is expressed as follows (
Fotheringham and Oshan, 2016):
$$y_i = \beta_{i0} + \sum_{j=1}^{m} \beta_{ij} x_{ij} + \varepsilon_i$$
where $y_i$ is the value of the dependent variable at study sample $i$; $\beta_{i0}$ is the intercept at the spatial location $(u_i, v_i)$ where sample $i$ is located; $\beta_{ij}$ is the locally weighted regression parameter of the $j$-th variable (a DI or rural background characteristic) at sample $i$; $m$ is the number of independent variables; $x_{ij}$ is the observed value of the $j$-th independent variable at sample $i$; and $\varepsilon_i$ is the random error term. The parameter estimates $\beta_{ij}$ for each independent variable at sample $i$ can be expressed in matrix form:
$$\hat{\beta}(i) = (X^{\top} W_i X)^{-1} X^{\top} W_i y$$
where $\hat{\beta}(i)$ is the column vector of parameter estimates ($m \times 1$); $X$ is the matrix of selected independent variables ($n \times m$); $y$ is the vector of observed values ($n \times 1$) of the dependent variable (
Fotheringham and Oshan, 2016); and $W_i$ is a diagonal matrix of spatial weights ($n \times n$) based on the distance of each observation from location $i$.
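The local estimator is a weighted least squares fit repeated at every sample location, with weights that decay with distance. The following sketch makes that concrete under two assumptions that are not taken from the text: a fixed-bandwidth Gaussian kernel, and an intercept column added to the design matrix. The function name `gwr_coefficients` is illustrative.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients from distance-weighted least squares.

    coords: (n, 2) sample coordinates; X: (n, m) explanatory variables;
    y: (n,) dependent variable; bandwidth: Gaussian kernel bandwidth
    (a fixed-distance kernel is assumed here for simplicity).
    Returns an (n, m + 1) array: local intercept, then local slopes.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    betas = np.empty((len(y), design.shape[1]))
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # diagonal of W_i
        XtW = design.T * w                          # X^T W_i
        betas[i] = np.linalg.solve(XtW @ design, XtW @ y)
    return betas
```

A single bandwidth is shared by all variables in this form, which is the restriction that MGWR removes.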
3.2.2 Multi-scale geographically weighted regression (MGWR)
The bandwidth used in GWR is global and may not be optimal for all features, as different features may exhibit local relationships with the dependent variable at varying scales. To address this,
Fotheringham et al. (2017) proposed the MGWR method. MGWR incorporates spatially smoothed variables into the OLS and GWR models, allowing for different bandwidths for different variables (
Fotheringham et al., 2017). This approach improves the explanation of non-stationary spatial processes, reduces the influence of subjective factors, and decreases bias and covariance in parameter estimation (
Oshan et al., 2019). The MGWR model is formulated as follows:
$$y_i = \sum_{j=0}^{m} \beta_{bwj}(u_i, v_i)\, x_{ij} + \varepsilon_i$$
where $y_i$ is the dependent variable at sampling point $i$; $x_{ij}$ is the observed value of explanatory variable $j$ at sampling point $i$ (with $x_{i0} = 1$ for the intercept); $\beta_{bwj}$ denotes the local regression coefficient of variable $j$; $bwj$ is the bandwidth used for the regression coefficient of variable $j$; $(u_i, v_i)$ are the centroid coordinates of sampling point $i$; and $\varepsilon_i$ is the random error term.
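MGWR is usually fitted by backfitting: each variable's local coefficients are re-estimated in turn against the partial residual left by the other variables, using that variable's own bandwidth. The sketch below illustrates this idea only. It assumes Gaussian kernels with bandwidths supplied by the caller, whereas MGWR as proposed also searches for the optimal bandwidth of each variable; the function names are hypothetical.

```python
import numpy as np

def gaussian_weights(coords, bandwidth):
    """Pairwise Gaussian kernel weights between all sample locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def mgwr_backfit(coords, X, y, bandwidths, n_iter=500, tol=1e-12):
    """Backfitting with one (given) bandwidth per column of X.

    X must already contain a column of ones if an intercept is wanted.
    Returns an (n, p) array of local coefficients.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    kernels = [gaussian_weights(coords, bw) for bw in bandwidths]
    betas = np.zeros((n, p))
    for _ in range(n_iter):
        previous = betas.copy()
        for j in range(p):
            # Partial residual: what the other variables leave unexplained.
            fitted_others = (betas * X).sum(axis=1) - betas[:, j] * X[:, j]
            partial = y - fitted_others
            # Univariate weighted regression of the residual on x_j.
            W = kernels[j]
            betas[:, j] = (W @ (X[:, j] * partial)) / (W @ (X[:, j] ** 2))
        if np.max(np.abs(betas - previous)) < tol:
            break
    return betas
```

Passing `bandwidths=[0.3, 1.0]`, for example, lets the intercept vary locally while the slope is smoothed over a much wider area, which is the kind of variable-specific scale that a single GWR bandwidth cannot express.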
4 Results
4.1 Spatial pattern of RTB
Fig. 4 shows the spatial distribution of the spatial capacity and resident duration of urban residents traveling to rural areas. Xihongmen in Beijing receives the highest urban flow, with a weekend daily average of 52,688, while Luozhuangzi in Tianjin has the lowest (434). The average weekend daily length of stay is highest at Chuanfangyu in Tianjin (4.02 h) and lowest at Badaling in Beijing (0.91 h).
We calculated Moran’s I index to test the spatial autocorrelation of the dependent variables in the study area. As shown in Table 1, the Moran’s I values for spatial capacity and resident duration were 0.314 and 0.226, with Z-scores higher than the critical value of 2.58 and P-values lower than 0.001. Both therefore pass the significance test at the 99.9% confidence level. These values indicate that the global spatial autocorrelation of each dependent variable within the study area is significantly positive, and the probability that the clustered pattern of each dependent variable is the result of random chance is less than 0.1%.
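For readers unfamiliar with the statistic, global Moran’s I is the ratio of the spatially weighted cross-products of deviations from the mean to the total variance, scaled by the number of samples over the sum of weights. The sketch below assumes a user-supplied weight matrix; the weighting scheme used in the study is not stated, and the significance test (Z-score) is not shown.

```python
import numpy as np

def morans_i(y, W):
    """Global Moran's I for values y under spatial weight matrix W."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()  # deviations from the mean
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)
```

Values above zero indicate that similar values cluster together, as with the 0.314 and 0.226 reported here, while values below zero indicate a checkerboard-like pattern.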
In addition, we used local Moran’s I to characterize local spatial correlations and to reveal the local clustering pattern of each dependent variable. High-high clusters of spatial capacity are mainly located in the south of the main city of Beijing and near the main city of Tianjin, while low-low clusters are mainly located in Pinggu district of Beijing and Jizhou district of Tianjin. This is due to the high transportation accessibility of the areas near the main cities, which gives them obvious geographical advantages: urban residents can reach such rural areas for recreational activities within a short time and distance. High-high clusters of resident duration are located near the main city of Tianjin and in the northern part of Jizhou District, and low-low clusters are mainly distributed in the southern part of Daxing District in Beijing (Fig. 5). This indicates that, in terms of length of stay, rural areas in Beijing are significantly less attractive than those in Tianjin. In addition to the influence of geographical location, this may also be related to the ecological environment and recreational facilities.
4.2 Model performance comparison
Prior to constructing the model, it was necessary to address data redundancy and conduct data pre-processing in order to reduce errors in the results arising from the coexistence of multiple highly correlated variables. Following Pearson correlation analysis and multicollinearity diagnosis, 25 of the initial 37 candidate variables were retained for inclusion in the final model.
As shown in Table 2, several variables significantly influence spatial capacity, including cultural customs, transportation, sky, uniqueness, intersection density, cost distance, morphological compactness, and transportation service density, with cost distance exerting the highest impact. Regarding resident duration, cultural customs, landmarks, uniqueness, and water coverage were significant, with uniqueness showing the greatest influence. In addition, the variance inflation factor (VIF) for the selected variables in the OLS model was below 10, indicating no serious multicollinearity issues among the variables.
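The screening logic can be sketched as follows, assuming the candidate variables are stored as columns of a NumPy array. The VIF of each variable is taken from the diagonal of the inverse Pearson correlation matrix, which is equivalent to 1 / (1 - R²) from regressing that variable on all the others. The function names are illustrative and do not describe the software used in the study.

```python
import numpy as np

def pearson_matrix(X):
    """Pearson correlation matrix between the columns of X."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)

def variance_inflation_factors(X):
    """VIF per column, from the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(pearson_matrix(X)))
```

A variable whose VIF exceeds the threshold of 10 used in the text would be a candidate for removal before model fitting.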
The results indicate that the autoregressive lag coefficients of both the SLM and SEM do not exhibit significant effects on spatial capacity. However, the autoregressive lag coefficients for resident duration are significant (P < 0.05), suggesting that resident duration is influenced by spatially neighboring units (Table A.2 in Appendix A).
As shown in Table 3, the OLS model performs relatively weakly among the five models, as evidenced by lower R2 and Adjusted R2 values and higher AICc values. The Adjusted R2 values for OLS are 0.534 and 0.262, indicating that 46.6% and 73.8% of the variance in spatial capacity and resident duration, respectively, remain unexplained. The Adjusted R2 values for SLM and SEM are higher, and their AICc values are slightly lower, indicating better performance than OLS, though the improvement is not considerable. This may be due to unknown variables in the model or the scale of spatial processes involved.
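The arithmetic behind these figures is straightforward and can be sketched as below. The adjusted R² formula is the standard one; the number of predictors passed to it in any example is hypothetical, since the text does not tabulate it at this point, and the unexplained share follows the text in taking one minus the Adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n samples and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def unexplained_share(adj_r2):
    """Share of variance left unexplained, as read off the Adjusted R2."""
    return 1.0 - adj_r2
```

For the OLS values reported above, `unexplained_share(0.534)` gives 0.466 and `unexplained_share(0.262)` gives 0.738.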
To investigate potential local spatial variability, GWR and MGWR were employed for local modeling of spatial non-stationarity. For spatial capacity, the Adjusted R2 values for GWR and MGWR increased to 0.607 and 0.648, respectively, and the AICc values were 251.892 and 240.370. For resident duration, the Adjusted R2 values for GWR and MGWR increased to 0.392 and 0.584, respectively, and the AICc values decreased to 302.108 and 261.083. Overall, the local regression models outperformed the global regression models, with MGWR exhibiting the highest Adjusted R2 and lowest AICc values. MGWR thus captures local-scale variability in spatial capacity and resident duration more accurately than GWR, and it provides the best fit among the five models.
As shown in Table 4, GWR and MGWR reflect significant differences in spatial scale effects. The classical GWR, which employs a uniform bandwidth for all variables, only reveals the average scale of effect. In contrast, MGWR allows the scale of effect to differ between variables. For instance, for spatial capacity, public space has the smallest bandwidth (44), indicating the most local scale of effect, with spatially steeper coefficients and greater spatial heterogeneity. Overall, MGWR outperforms classical GWR, with its primary advantage being that the spatial scale of effect is tailored to each variable rather than fixed as in GWR.
4.3 Multi-scale localized effects of motivational factors
Fig. 6 presents the coefficients of the GWR and MGWR models for the motivational factors significantly influencing spatial capacity. The models exhibit differences in scale effects and in the degree of influence of the various factors. Specifically, there is a positive correlation between cultural customs and spatial capacity, particularly in areas with a lower proportion of cultural customs (Fig. 6(a, f)). Transportation, a crucial determinant of spatial capacity distribution, shows greater spatial heterogeneity in the MGWR model than in the GWR model (Fig. 6(b, g)). The influence of public space in the MGWR model is concentrated in southeastern Beijing, showing stronger spatial sensitivity (Fig. 6(c, h)). The coefficients for the variable sky are smaller in both models, indicating a weaker positive relationship with spatial capacity (Fig. 6(d, i)).
Intersection density behaves inconsistently between the GWR and MGWR models, although both show spatial effects approaching a global scale (Fig. 6(e, j)). Cost distance exhibits a negative effect but is not significant in northwestern Beijing in the MGWR model (Fig. 6(k, p)). Morphological compactness exerts a relatively negative effect, while cropland cover and accommodation service density also have negative impacts, primarily distributed in the Tianjin region, with smaller scale effects (Fig. 6(l, q, m, r, n, and s)). Transportation service density shows nearly identical spatial patterns in both models, decreasing progressively from the north-western side of Beijing to the southern part of Tianjin, and is identified as the most influential motivational factor (Fig. 6 (o, t)).
Fig. 7 illustrates the GWR and MGWR coefficients for the motivational factors significantly affecting resident duration. Unlike for spatial capacity, cultural customs negatively impact resident duration, with substantial differences in spatial patterns between the GWR and MGWR models, the latter showing more conservative behavior (Fig. 7(a, f)). Transportation is negatively correlated with resident duration, especially in the southwestern parts of Beijing and Tianjin (Fig. 7(b, g)). Landmarks have a relatively weak negative effect, decreasing gradually from the northeast to the southwest, but the MGWR model exhibits a significantly smaller scale effect (Fig. 7(c, h)). Uniqueness positively influences resident duration, although the spatial patterns of the two models do not coincide and MGWR shows a weaker positive correlation than GWR (Fig. 7(d, i)). The positive effect of diversity operates at a smaller scale in MGWR than in GWR, indicating greater spatial heterogeneity (Fig. 7(e, j)).
The negative effect of intersection density in MGWR is concentrated in Tianjin’s Jizhou district (Fig. 7(k, p)). The cost distance exhibits both positive and negative effects on resident duration, with differing scale effects between the two models (Fig. 7(l, q)). The water coverage is an essential dynamical factor for resident duration distribution near Tianjin’s main city (Fig. 7(m, r)). Accommodation service density, the most influential dynamical factor, positively affects resident duration, particularly in the southern region adjacent to Beijing’s main city (Fig. 7(n, s)). In the GWR model, tourist attraction density negatively impacts resident duration in the northern and western mountainous areas and the south of Beijing, whereas the MGWR model does not capture this effect (Fig. 7(o, t)).
Fig. 8 shows the spatial distribution of local R2 values in the GWR and MGWR models. Both models predict spatial capacity better than resident duration. For spatial capacity, local R2 values exceed 0.7 in most of Tianjin and are also high in western and southern Beijing, suggesting good model performance in these areas (Fig. 8(a, b)). In contrast, lower local R2 values in northern Beijing in the GWR model indicate poor predictive performance. For resident duration, a substantial gap exists between the two models’ performance, with significantly lower local R2 values for Beijing than for Tianjin in the GWR model (Fig. 8(c, d)). The MGWR model generally exhibits higher local R2 values except for the western part of Beijing. Overall, despite the similarity in local goodness-of-fit for the two dependent variables in the GWR and MGWR models, MGWR demonstrates more conservative spatial scale effects.
5 Discussion
With the growth of the national economy, the quality of residents’ tourism demand has increased, and the phenomenon of reverse urbanization has drawn significant attention to RT, establishing it as an important mode of tourism (
Wang et al., 2018). This study explores the influence of destination attributes on RT by investigating the mechanisms through which DI and rural background characteristics affect the spatial capacity and length of stay of tourists, framed from the publicness perspective of SM. The results reveal that motivational factors at the levels of DI and rural background characteristics differ in their impact on spatial capacity and resident duration, displaying a spatial multi-scale effect.
Regarding spatial capacity, the GWR and MGWR models perform better in the western and southern parts of Beijing and in Tianjin, regions where motivational factors most significantly impact spatial capacity. Traditional cultural practices not only influence the lifestyle of local residents, but also attract tourists (
Ma et al., 2021). Areas with significant traffic impact are mainly in southern and western Beijing, as well as near the main city of Tianjin. The well-developed transportation infrastructure in these areas attracts more tourists who travel by self-driving, cycling, and other means, highlighting the transportation image in combination with rural characteristics. The scale effect of public spaces in MGWR is relatively small, with high-value impacts concentrated in Tongzhou District, the sub-center of Beijing connected to Tianjin. High-quality public spaces such as Songzhuang Artist Gathering Area and Grand Canal Forest Park meet tourists’ needs to interact with nature and experience culture, enhancing tourist satisfaction and natural landscapes protection (
Yang et al., 2021).
A moderate increase in intersection density (ID) improves destination road network accessibility and enhances tourist experience (
Randelli and Martellozzo, 2019). Conversely, the cost distance weakens spatial capacity by increasing travel time and economic costs (
Juschten and Hössinger, 2021). The negative correlation between morphological compactness and spatial capacity is strongest in northwest Beijing, where low form compactness restricts centralized transportation and infrastructure development, resulting in poor tourism experience (
Li et al., 2018). Regions with higher cropland cover have a stronger weakening effect on spatial capacity due to the emphasis on cropland use and protection, which reduces tourism-related facilities construction (
Yang et al., 2021), thereby diminishing the diversity and convenience of tourist experiences (
Randelli and Martellozzo, 2019). Compared to the mixed effects of accommodation service density in GWR, the smaller scale effect in MGWR shows that high accommodation service density produces greater negative effects, as it makes tourists feel crowded and the destination feel commercialized (
Ye et al., 2019). Transportation service density has the greatest impact on spatial capacity. High positive coefficients cluster in the mountainous areas of Beijing, where transportation service density is lower and the natural environment is good, with less human interference (
Yang et al., 2019).
In terms of resident duration, the dynamic factors have their most significant impact in rural areas near the main city of Tianjin. Cultural customs may restrict tourists’ freedom of movement, shortening their stay (
Ma et al., 2021;
Wu et al., 2023). Transportation has a strong correlation with resident duration in southwest Beijing, while landmark singularity leads to a monotonous tourist experience, reducing stay time (
Soszyński et al., 2021). Uniqueness can increase rural regional competitiveness (
Lin and Kuo, 2018) and extend visitors’ length of stay (
Jacobsen et al., 2018). Diverse DI fulfills the psychological needs of tourists, increasing both their willingness to stay and their length of stay (
Lyu et al., 2021). Significant samples in MGWR are mainly distributed in the Beijing-Tianjin border area, where the twin-city effect increases the diversity of tourism resources and opportunities. The scale effects of intersection density and cost distance in MGWR are significantly lower than those in GWR, mainly in Jizhou District of Tianjin and northeastern Beijing. Abundant water resources and associated activities such as boating and fishing increase tourist satisfaction and resident duration (
López-Sanz et al., 2021). High-value impact areas near the main city of Tianjin benefit from favorable locations, making it easy for tourists to engage in various water-related activities.
Compared to its role for spatial capacity, accommodation service density in MGWR becomes a key factor positively influencing length of stay. The effect is stronger where accommodation service density is low, as in southern Beijing, which offers a personalized and private experience and reduces tourists’ sense of crowding (
Ye et al., 2019). The tourist attraction density in MGWR shows a smaller scale effect than GWR but is more spatially sensitive. High tourist attraction density attracts tourists to stay longer to fully experience different attractions and activities (
Jacobsen et al., 2019). Despite tourists’ preference for Beijing’s natural resource areas, an increase in tourism services is insufficient to significantly extend their stay.
This study has some limitations. First, the research was conducted at the township level rather than the village level. Second, owing to constraints on data acquisition, article length, and time, the cell phone signaling and social media data do not yet include traveler attributes such as gender and age, which limits the fine-grained categorization of tourists and the development directions that can be proposed for RT. Our team is currently pursuing this line of research. Future research should take villages as the research unit and differentiate actor attributes. Traditional data such as questionnaires could also be added to combine “big data” with “small data”. Finally, the research method can be applied to other rural areas to improve generalizability.
6 Conclusion
This study provides a data-driven analysis of the effects of DI and background characteristics on the spatial capacity and resident duration of rural tourism behavior (RTB). It also examines the spatial multi-scale effects of the motivational factors. The findings aim to help managers and planners accurately understand tourists’ self-organized behaviors, thereby facilitating the formulation and adjustment of policies to enhance human-land coupling relationships and promote sustainable tourism development in rural areas. Using cell phone signaling data, we measured the spatial capacity and resident duration of urban residents’ actual RTB. We used public social media data to deconstruct and reorganize DI, and computed rural background characteristic variables from multi-source heterogeneous big data.
We applied and compared global regression models (OLS, SLM, SEM) and local regression models (GWR, MGWR) to determine the significance, direction of influence, and scale effects of the dynamic factors. Our results indicate that the global models explain spatial variability only weakly, while the local models offer a significantly better fit and capture local spatial patterns more accurately. MGWR in particular demonstrates superior simulation performance and reveals multi-scale effects, aligning well with practical situations and demand orientation. In addition, using the MGWR model mitigates common issues associated with aggregated data, such as the modifiable areal unit problem (
Fotheringham and Wong, 1991).
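The contrast between global and local models described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it relies on synthetic data, a basic GWR with a single fixed Gaussian bandwidth (MGWR would instead calibrate a separate bandwidth for each covariate), and hypothetical function names (`ols_fit`, `gwr_fit`, `r_squared`) and parameter values chosen only for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Global OLS: one coefficient vector for the whole study area."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def gwr_fit(coords, X, y, bandwidth):
    """Basic GWR: a separate weighted least-squares fit at every location.

    Weights follow a Gaussian distance-decay kernel with one fixed
    bandwidth shared by all covariates (an assumption of this sketch).
    """
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        Xw = X * w[:, None]
        # Solve (X^T W X) beta = X^T W y for location i.
        betas[i] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return betas


def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: the effect of x1 on y changes from west to east.
rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0.0, 10.0, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
true_slope = -1.0 + 0.2 * coords[:, 0]
y = 2.0 + true_slope * x1 + rng.normal(scale=0.1, size=n)

beta_global = ols_fit(X, y)
betas_local = gwr_fit(coords, X, y, bandwidth=1.0)

r2_ols = r_squared(y, X @ beta_global)
r2_gwr = r_squared(y, np.sum(X * betas_local, axis=1))
print("OLS R^2:", round(r2_ols, 3))
print("GWR R^2:", round(r2_gwr, 3))
```

Because the simulated slope changes sign across the area, a single global coefficient averages the opposing effects away, whereas the local fits recover the spatial pattern. Real applications would also select the bandwidth by a criterion such as AICc and compare models on that basis rather than on in-sample fit alone.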
To the best of our knowledge, previous research has primarily focused on the relationship between DI and behavioral intentions. While behavioral intentions can indicate the likelihood of an individual’s decisions, they do not represent actual behaviors. Starting from a data-driven viewpoint, this study uses the MGWR model to analyze, from a bottom-up perspective, the intrinsic relationship between DI and rural tourism behaviors and its multi-scale spatial variation, which fills a theoretical gap in related studies. Additionally, there is a notable gap in research exploring DI at the village level and its correlation with substantive tourism behaviors. This study addresses this gap, offering a foundation for interpreting and geographically modeling the relationship between DI and actual tourism behaviors. The study’s results can provide tourism managers with a direction for DI development that fits human needs, provide rural planners with a targeted list of objects to optimize, and help policymakers improve the return on funding.
2095-2635/2025 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd.