1. School of Geography and Planning, Sun Yat-sen University, Guangzhou 510000, China
2. Guangdong Guodi Institute of Resources and Environment, Guangzhou 510000, China
liuxp3@mail.sysu.edu.cn
History+
Received
Accepted
Published
2023-08-04
2024-02-02
Issue Date
Revised Date
2025-04-24
Abstract
Urban expansion has far-reaching implications for economy, environment, and socio-cultural aspects of a city. Therefore, it is essential to have a thorough understanding of the complex dynamics and driving factors behind urban expansion in order to make informed decisions that promote the long-term sustainability of a city. Currently, cellular automata (CA) and agent-based modeling (ABM) have been widely employed to simulate urban land growth. However, existing research lacks a comprehensive consideration of the influence of individual agent attributes and land population capacity on site selection decisions. Consequently, we propose a novel approach that incorporates fine-scale population data into the site-selection decision simulation process, allowing for a granular depiction of individual decision attributes. Moreover, the site selection process integrates assessment criteria, including population capacity and neighborhood development status. Furthermore, to address the issue of fragmented simulated residential land use outcomes, population redistribution is iteratively conducted. Additionally, by integrating extended reinforcement learning mechanisms, the site selection process of residential multi-agent systems experiences a significant improvement in overall simulation accuracy. The proposed model was applied to simulate urban expansion in Shenzhen, Guangdong province, China. The results demonstrated that this model effectively enhances the behavioral decision-making capabilities of intelligent agents, thereby providing insights into the mechanisms underlying urban expansion. These findings hold considerable significance for making informed urban planning decisions and advancing the goal of sustainable urban development.
Jinding GAO, Chao LIANG, Jiaojiao GUO, Xiaoping LIU, Honghui ZHANG, Geng LIU.
Residential land growth simulation of agent-based model by coupling big data and reinforcement learning.
Front. Earth Sci., 2025, 19(3): 389-405 DOI:10.1007/s11707-024-1121-2
1 Introduction
In recent years, China has experienced rapid urbanization, leading to unprecedented socio-economic prosperity with remarkable advancements in economy, education, and infrastructure nationwide. However, this accelerated development has also introduced new challenges for urban areas, as evidenced by the phenomenon of “ghost cities.” This phenomenon arises from the disparity between the expansion of urban residential land and the spatial distribution of population (Zheng et al., 2017), resulting in an evident manifestation of inefficient utilization of urban land resources (Li et al., 2015). Thus, in the context of contemporary new urbanization, it becomes paramount to integrate the preferences of micro-agents on the “demand side” in urban development, emphasizing a human-centered approach to development. In this context, the incorporation of sophisticated and scientifically-grounded decision-making simulations for urban land use growth holds great significance.
Based on existing research, several models have been developed for land use simulation. Representative models built on cellular automata (CA), such as FLUS (Liu et al., 2017) and PLUS (Liang et al., 2021), have found widespread application in urban land use simulation. Nonetheless, these models frequently neglect the influence of micro-level individual decision-making on agent-based groups. According to a World Bank report, a comprehensive understanding and simulation of human location choice behavior is of paramount importance in urban growth simulation (World Bank, 2015). Given the intrinsic characteristics of multi-agent systems, intelligent agents display individual heterogeneity, manifesting in diverse behavioral criteria. Their decision-making process is molded by interactions with the environment, consequently shaping the evolutionary dynamics of the entire system. Recognizing this, several studies have adopted individual-based modeling techniques to simulate the intricate behaviors of individuals (Saeedi, 2018; Kourosh Niya et al., 2020; Li et al., 2020; Hashemi Aslani et al., 2022; Mirzahossein et al., 2022), with the objective of unraveling the underlying evolutionary patterns within complex systems. Embracing this modeling approach provides distinct advantages in effectively simulating complex spatial systems.
In the realm of micro-individual-based modeling, discrete choice models have been extensively employed to simulate agent decision-making processes. Among these models, the discrete choice model grounded on the assumption of stochastic utility maximization proves to be a powerful tool for analyzing residential location choices. An increasing body of research has broadened the scope of urban simulation models based on discrete choice theory, revealing their versatility and practicality. An example of such expansion is the development of the UrbanSim model, which aims to capture the complex dynamics of residential and employment decision-making processes (Waddell et al., 2010). Through these simulations, the model identifies the distinct housing demands linked to each location and elucidates the resulting impacts on urban development patterns. The city model proposed by Jjumba and Dragićević (2012) uses irregular spatial units at the cadastral scale to simulate land use changes. This is achieved by incorporating five distinct types of agents, which encompass residents and government entities among others. The SILO and MATSim models were integrated to simulate fine-scale land use changes by incorporating essential processes such as household relocation, population dynamics, and real estate development (Ziemke et al., 2016). These models have found extensive application in the examination of urban land use and have been implemented in various cities outside China, including Suwon in Korea (Jin and Lee, 2018) and Atlanta (Wang and Yuan, 2018). However, given China’s distinctive policies, national conditions, and data accessibility, additional research is warranted to localize and adapt these models to the specific Chinese context.
While residential choice models developed using discrete choice theory provide significant benefits in terms of accurate prediction and comprehensive evaluation, it is important to acknowledge the limitations that exist within these models.
1) Limited comprehension of the complex interactions between natural resource dynamics and human behavior systems, particularly the oversight or oversimplification of individual human behaviors. Previous individual-based decision simulation studies have been limited by data availability and modeling constraints, heavily relying on statistical or survey data that do not fully meet the requirements for micro-scale individual simulation data (Acheampong, 2018; Wang et al., 2021). The emergence of big data sources, such as Internet data and cell phone signaling, combined with advancements in data processing technology, has facilitated the acquisition and processing of vast amounts of fine-scale spatio-temporal data. Some studies have already started utilizing such data to analyze residential spatial choice behavior (Duan et al., 2022). However, the integration of these data with discrete choice models for residential location simulation is still relatively scarce. Additionally, existing methods face challenges in accurately estimating the parameters of the model utility function.
While current models rely on the analytic hierarchy process (Balta and Öztürk, 2021) and logistic regression (Dragićević and Hatch, 2018), these approaches are often influenced by subjective factors and struggle to fully capture the complex nonlinear relationship between human-land interactions and the drivers or constraints of urban expansion. Alternatively, some scholars have employed objective methods, such as artificial neural networks (ANN), random forests (RF), and gradient boosted decision trees (GBDT) (Kavak et al., 2018; Zhang et al., 2021; Tsagkis et al., 2023), to discern the decision-making patterns of agents, thereby enhancing the accuracy of multi-agent simulations. However, most studies analyze the factors influencing residential site selection primarily from a land perspective, while neglecting the broader social attributes that residential multi-agents consider when making site selection decisions. Consequently, such analyses can lead to spatial misalignment between people and land.
Furthermore, in the analysis of the human-land relationship, a prevailing trend is the absence of explicit categorization of agents, opting instead for the creation of a homogeneous agent type. Nonetheless, a subgroup of researchers has advocated for the incorporation of peer influence in residential location choices, leading to improved model accuracy through the adoption of hierarchical agent modeling that considers similarity and heterogeneity. While certain studies have categorized agents based on income (Haase et al., 2010; Hosseinali et al., 2013; Acheampong and Asabere, 2021), they have not extensively examined and analyzed the interplay between each agent type and spatial factors that drive decision-making processes.
2) The exploration of mechanisms governing the interaction among multiple agents within the same spatial context remains limited. Land development should ideally result from the coordinated efforts of residents, developers, and government entities. However, most current studies assume a one-to-one correspondence between land and agent, which overlooks important issues related to limited plot capacities and the possibility that too few agents select a given plot. Although some scholars have recognized the distinction between single-family and multi-family housing development (Dahal and Chow, 2014), their analyses often assume that developers face constraints in terms of development conditions and categories. As a result, the discrete site selection by multiple resident agents frequently leads to the creation of isolated parcels and fragmented land development. To address this, subsequent phases of the simulation process necessitate the involvement of real estate agents and government entities to select and redistribute the simulation results.
3) The current process of land use simulation tends to focus more on the interactions within the human-biophysical environment, rather than within the human system itself (Groeneveld et al., 2017). Previous studies have predominantly emphasized economic and locational factors, often overlooking the essential role of social dynamics, such as mutual and self-learning mechanisms among individuals. To bridge this gap, certain efforts have been made to integrate learning mechanisms into the simulation of agents’ land use changes (Bone et al., 2011). These endeavors have highlighted the learning characteristics of individuals, emphasizing how agents acquire and reinforce their own experiences through the incorporation of learning mechanisms into the location selection process (Li et al., 2020). These studies have extended reinforcement learning (RL) models and integrated the learning outcomes with land utility functions to construct hybrid utility functions. However, with regard to the design of learning rules, these models have primarily focused on the selections of a single agent and have not adequately addressed the varying levels of attractiveness associated with different selection ratios for the same location.
Therefore, this study employs multiple sources of large-scale data to develop an individual-level learning model for simulating the growth of residential land. The analysis and optimization of the residential land growth model are focused on three primary dimensions. (i) The accuracy of influential factor weights is improved by categorizing micro-level population data and exploring the human-land relationship from a human perspective. (ii) By incorporating a maximum capacity for each plot and introducing evaluation modules for developer agents and government agents, the model reduces fragmented land development and thereby mitigates the associated risks. (iii) Through the extension of the reinforcement learning model, the simulation incorporates the learning process of agents as they consider both their own attributes and the behavior of other individuals during the site selection process. This leads to a reduction in arbitrary decision-making and an optimization of residential location choices. The proposed model is applied to conduct simulations of residential land growth within the study area, and the accuracy of these simulations is validated against actual growth patterns. The experimental findings are analyzed and discussed to provide insights into the effectiveness and implications of the developed approach.
2 Study area and data
2.1 Study area
The study area is Shenzhen, located in the southern part of Guangdong Province at the mouth of the Pearl River (22°27′N−22°52′N, 113°46′E−114°37′E), encompassing a total area of 1997.47 km² (Fig. 1). It consists of ten districts, with the southern areas of Futian, Luohu, and Nanshan being the most prosperous and highly developed regions. Being the first special economic zone of China, Shenzhen has witnessed rapid population growth and accelerated urbanization, resulting in a continuous reduction of available land resources. Therefore, Shenzhen serves as a suitable location for the investigation of urban simulation and expansion dynamics. As of the end of 2019, Shenzhen’s permanent resident population reached approximately 13.4388 million.
2.2 Study data
The data used in this study encompass a wide range of information, including fundamental geographic data, socio-economic indicators, climate and environmental data, land-use statistics, population characteristics, points of interest (POI), road network data, and subway line information. A detailed summary of these data sources is provided in Table 1.
The study area was represented using pixels with a resolution of 50 m. For land use classification (Fig. 2), we employed five categories: non-construction land (N), public administration service land (P), commercial land (C), residential land (R), and industrial land (I), with the latter four representing urban land types. As the conversion from urban land to non-urban land is relatively infrequent in Shenzhen, our research primarily focuses on the transformation from non-construction land to urban land and the transitions among different urban land use types.
The population portrait data (Fig. 3) was obtained through collaborative efforts, providing a spatial resolution of 200 m×200 m. It encompasses a wide range of attributes, including daytime and nighttime population counts, as well as proportions of high, medium, and low-income categories, among other variables. The population figures are derived from a combination of mobile signaling data, Tencent location data, and survey data. The integration of these data sources ensures a comprehensive representation of population dynamics at a fine scale, contributing to a more detailed analysis of location-specific decision-making processes. The attribute values primarily stem from the amalgamation of behavioral characteristics obtained from various mobile applications, and their prediction is facilitated by business rule identification and machine learning models. To ensure the data’s accuracy, rigorous validation was conducted by comparing it with actual small-area data, thereby meeting the required precision standards. To strike a balance between analysis precision and ease of partitioning, the initially provided grid, with a resolution of 200 m, was resampled to a finer resolution of 50 m. As a result, the resulting grid cells are one-sixteenth the size of the original cells while maintaining the unchanged proportions of population income, allowing for a more granular depiction of population dynamics and enabling a more precise analysis of location-based decision-making processes.
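The resampling step can be illustrated with a minimal Python sketch. It assumes that the population count of each 200 m cell is divided evenly among its sixteen 50 m cells while the income shares are copied unchanged; the even split, the function name `resample_cell`, and the sample values are illustrative assumptions rather than details given in the text.

```python
def resample_cell(count, income_shares, factor=4):
    """Split one coarse cell into factor x factor fine cells.

    Assumption: the population count is shared evenly among the fine
    cells, and the income shares are copied to each of them unchanged.
    """
    n_fine = factor * factor
    fine_count = count / n_fine
    return [
        [{"count": fine_count, "income_shares": dict(income_shares)}
         for _ in range(factor)]
        for _ in range(factor)
    ]


# Hypothetical 200 m cell with 320 residents.
coarse = {"count": 320.0,
          "income_shares": {"high": 0.2, "medium": 0.5, "low": 0.3}}
fine = resample_cell(coarse["count"], coarse["income_shares"])

# The total population is preserved across the sixteen 50 m cells.
total = sum(cell["count"] for row in fine for cell in row)
print(len(fine) * len(fine[0]), total)
```

Under this assumption the total population of the study area is unchanged by resampling, which is the property the simulation relies on when agents are later generated from the fine grid.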
The locational selection process of the multi-agent system is guided by a comprehensive set of drivers, encompassing basic geographic information, socio-economic indicators, environmental conditions, points of interest, and the OpenStreetMap road network. The dynamic changes in urban land use are profoundly influenced by the intricate interplay between the natural environment and human activities. To capture the complexity of this relationship, we carefully curated a set of 13 driving factors derived from diverse sources, including nature, ecology, socio-economics, and transportation.
To ensure data consistency and facilitate meaningful analysis, these driving factors underwent meticulous and uniform processing, resulting in a spatial resolution of 50 m (Fig. 4). This finer resolution allows us to discern and comprehend the spatial variation of these factors across the study area, enabling a deeper understanding of how different environmental and socio-economic variables interact and influence the patterns of urban land use. By incorporating these driving factors into the simulation, we aim to gain valuable insights into the mechanisms driving urban land use changes and enhance the accuracy of our model’s predictions.
3 Methods
The research framework of this study is illustrated in Fig. 5, where each pixel represents a land parcel characterized by specific attributes and statistics. This study defines three major types of agents: resident agents (RA), who play a pivotal role in selecting residential locations; policy maker agents (PMA), responsible for assessing the appropriateness of land parcels for development; and developer agents (DA). In the model, it is assumed that each new resident agent has the ability to select a suitable residential location from the available developable land. The availability of such land is regulated by policy maker agents. The decision-making process of resident agents involves considering their own attributes along with the desirability of different land parcels. Developer agents determine whether a land parcel should be developed, and their decision-making process takes into account the selection status of resident agents and the condition of the land parcel. Together, these rules capture the dynamics of residential location selection and land development interactions. Furthermore, the study employs reinforcement learning to enhance and fine-tune the selection structure after the initial stage of selection.
This study develops three sub-models to simulate the decision-making process of resident agents in selecting residential locations: the resident agent selection sub-model, the developer agent redistribution sub-model, and the resident agent learning optimization adjustment sub-model. The selection sub-model captures the land parcel choice process of all agents, while the redistribution sub-model allows developer agents to address fragmented land parcels and minimize development risks. The optimization adjustment sub-model incorporates learning mechanisms to enhance the concentration of location choices. During the simulation, resident agents enter the process sequentially to make their location selections. The redistribution sub-model evaluates the parcels chosen by agents and guides resident agents to select alternative parcels if the initially chosen ones do not meet the development criteria. In the learning optimization adjustment sub-model, resident agents integrate knowledge from the location choices made by other agents and their own initial selection experiences to optimize and adjust their subsequent location choices. These interconnected models operate in a sequential and nested manner, collectively simulating the process of residential land selection by human agents.
3.1 Residential choice model
3.1.1 Mining human-land relationships based on the GBDT model
Residential location selection is a complex process influenced by various factors, such as the natural environment, accessibility, socio-economic factors, and neighborhood environment, as identified in prior studies. These factors interact collectively and in a nonlinear manner, shaping the choices of resident agents within diverse spatial contexts. Gaining a deeper understanding of the intricate relationship between resident agents and their decision-making process regarding residential location selection has emerged as a prominent area of investigation in current research. To improve the simulation of resident agent location choices, this study adopts the Gradient Boosting Decision Tree (GBDT) model. The GBDT model is employed to uncover the spatially nonlinear relationship between resident agents and the social and natural environments. By using the GBDT model, the study creates a robust framework to assess the spatial influence on resident agents’ decision-making process when selecting residential locations.
The GBDT algorithm, proposed by Friedman in 2001, has gained significant popularity due to its effectiveness. It is based on the concept of boosting and falls under the class of ensemble methods. GBDT employs an additive model that linearly combines regression trees as base learners and is fitted with a forward stagewise algorithm. Through iterative training, GBDT constructs an ensemble model that effectively captures the intricate relationship between individuals and the environment.
1) Principle of GBDT
The fundamental principle of GBDT lies in its iterative process: at each iteration, a new weak learner, a classification and regression tree (CART), is fitted to correct the errors of the current ensemble. The resulting model takes the standard additive form:

$$F_M(x) = \sum_{m=1}^{M} \beta_m T_m(x)$$

where $T_m$ represents the $m$-th decision tree, $x$ is the sample data, $M$ is the number of decision trees, and $\beta_m$ is the weight of the $m$-th decision tree. The final model is obtained after $M$ iterations.
2) Calculation of Feature Importance
The feature importance in GBDT quantifies the impact of various natural and social environmental indicators on residential population density within this module. The importance of a feature variable $j$ is first computed in an individual decision tree $T$:

$$\hat{J}_j^2(T) = \sum_{t=1}^{L} \hat{i}_t^2 \, \mathbf{1}(v_t = j)$$

where $L$ is the number of tree nodes at which a split occurs, $v_t$ is the feature associated with node $t$, and $\hat{i}_t^2$ represents the reduction in squared loss obtained from splitting node $t$. A larger value means that splits on feature variable $j$ reduce the loss more, and hence that the feature is more important. The aggregate importance of the feature is assessed by averaging its importance across the individual decision trees:

$$\hat{J}_j^2 = \frac{1}{M} \sum_{k=1}^{M} \hat{J}_j^2(T_k)$$

where $M$ is the total number of decision trees and $\hat{J}_j^2(T_k)$ represents the importance of feature variable $j$ in the $k$-th decision tree. In this study, the importances are normalized so that they sum to 1 over all feature variables.
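To make the boosting and feature-importance steps concrete, the following is a minimal Python sketch, not the implementation used in this study. It assumes squared loss and depth-one trees (stumps), and it credits each split's squared-error reduction to the feature that was split on. The function names (`fit_stump`, `fit_gbdt`, `predict`), the learning rate, and the toy data are all illustrative assumptions.

```python
def fit_stump(X, residuals):
    """Return the single split that most reduces squared error, or None."""
    n = len(X)
    mean_all = sum(residuals) / n
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        # Every distinct value except the largest is a valid threshold.
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            gain = max(base_sse - sse, 0.0)
            if best is None or gain > best[0]:
                best = (gain, j, thr, lm, rm)
    return best


def fit_gbdt(X, y, n_trees=20, learning_rate=0.5):
    """Forward stagewise boosting: each stump fits the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    trees = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, lv, rv = stump
        trees.append((j, thr, lv, rv))
        importance[j] += gain  # credit the loss reduction to feature j
        pred = [p + learning_rate * (lv if row[j] <= thr else rv)
                for row, p in zip(X, pred)]
    # Normalizing the summed gains gives the same result as normalizing
    # the per-tree average, so the importances sum to 1.
    total = sum(importance)
    if total > 0:
        importance = [v / total for v in importance]
    return f0, trees, importance


def predict(f0, trees, row, learning_rate=0.5):
    return f0 + sum(learning_rate * (lv if row[j] <= thr else rv)
                    for j, thr, lv, rv in trees)


# Toy data: the target depends on feature 0 only.
X = [[1, 5], [2, 3], [3, 6], [4, 2], [5, 4], [6, 1]]
y = [10, 20, 30, 40, 50, 60]
f0, trees, importance = fit_gbdt(X, y)
print(importance)
```

In practice a library implementation would replace this sketch; the point here is only that the normalized importances can be read directly as the relative weights of the driving factors.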
3.1.2 Discrete choice model
Discrete choice models are used to investigate individuals’ decision-making processes, and here they are applied to the residential selection of intelligent agents. First, the graded land utility function for each location is calculated, considering the attractiveness of land parcels to agents at different levels, as determined in the previous step. The utility function is a weighted sum of the influencing factors plus an uncertainty term:

$$U_{ij} = \sum_{k} w_k x_{jk} + \varepsilon_{ij}$$

where $U_{ij}$ denotes the utility obtained by resident agent $i$ when selecting land parcel $j$ among the candidate land parcels, $x_{jk}$ represents the $k$-th influencing factor at parcel $j$, $w_k$ is the preference coefficient of $x_{jk}$, whose value is obtained from the feature importance computed with the GBDT method in the previous step, and $\varepsilon_{ij}$ is the uncertainty term, generated through a random number generator.
Furthermore, resident agents display diverse perspectives that depend on their income levels, with higher-income agents possessing wider-ranging viewpoints. Based on their individual perspectives, resident agents choose a specific number of land parcels and assess them using the corresponding land utility function. Prior studies have established that resident agents adhere to the principle of utility maximization in their residential location selection, guided by their personal preferences. The probability of selecting each location follows the multinomial logit form:

$$P_{ij}^{t} = \frac{\exp(U_{ij})}{\sum_{k \in C_i} \exp(U_{ik})}$$

where $P_{ij}^{t}$ represents the probability that agent $i$ selects $j$ as a residential site at time $t$, $U_{ij}$ is the utility of location $j$ for agent $i$, and $C_i$ is the set of candidate locations assessed by agent $i$. A larger value signifies a greater probability of $j$ being selected by agent $i$ as a residential site. The denominator is the sum of the exponential functions of the locational utilities of the candidate locations. The formula is derived under the assumption that the stochastic component of decision-maker utility follows an independently and identically distributed extreme value distribution, based on the approach presented by McFadden for computing the choice probabilities of multiple alternative schemes (McFadden, 1977).
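The selection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the factor values, the weights, the uniform noise term, and the function names `parcel_utility` and `choice_probabilities` are hypothetical, and the weights merely stand in for normalized GBDT feature importances.

```python
import math
import random


def parcel_utility(factors, weights, rng, noise_scale=0.05):
    """Weighted sum of driving factors plus a small random term.

    Assumption: the noise is uniform on [0, noise_scale); the text only
    states that it comes from a random number generator.
    """
    deterministic = sum(w * x for w, x in zip(weights, factors))
    return deterministic + noise_scale * rng.random()


def choice_probabilities(utilities):
    """Multinomial logit probabilities over the candidate parcels."""
    shift = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - shift) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
weights = [0.5, 0.3, 0.2]          # hypothetical importances, summing to 1
candidates = [[0.9, 0.8, 0.7],     # hypothetical normalized factor values
              [0.2, 0.1, 0.3],
              [0.5, 0.5, 0.5]]
utilities = [parcel_utility(f, weights, rng) for f in candidates]
probs = choice_probabilities(utilities)

# The agent draws one parcel according to the choice probabilities.
chosen = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
print(probs, chosen)
```

Drawing the parcel from the probabilities, rather than always taking the maximum, keeps some heterogeneity among agents with identical attributes.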
3.2 Redistribution model
The transition of land use arises from the collaborative efforts of resident agents, developers, and government entities. Within this dynamic framework, resident agents make land use choices based on utility functions. It is essential to consider that population capacity varies across regions and is specific to different land uses, thus requiring a priori calculation of population capacity. For land parcels designated for residential use, the determination of accommodatable population relies on a comprehensive assessment combining current population levels and vacancy rates:
where Rmax is the maximum number of people that can be housed on the site, Rnow represents the actual number of people on the site now, and k is the vacancy rate.
For land parcels with a non-residential land use and no existing resident population, a k-nearest neighbor algorithm is used. The algorithm first identifies the four nearest residential parcels within a 300-m radius of the target non-residential parcel. The population capacity of the target parcel is then determined by computing the average population count of these four parcels. If fewer than four residential parcels lie within the 300-m radius, the search range is expanded until four suitable neighboring parcels are identified. The calculation is as follows:

$$R_{\max} = \frac{1}{4} \sum_{n=1}^{4} R_n$$

where $R_n$ is the population count of the $n$-th nearest residential parcel.
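The neighbor search described above can be sketched as follows. This is a minimal Python illustration: the 100 m expansion step, the function name `estimate_capacity`, and the sample coordinates are assumptions, since the text does not state how the search range is widened.

```python
import math


def estimate_capacity(target, residential, k=4, start_radius=300.0, step=100.0):
    """Average population of the k nearest residential parcels.

    target: (x, y) in metres.
    residential: list of ((x, y), population) tuples.
    The radius starts at 300 m and grows by `step` (an assumed value)
    until at least k residential parcels fall inside it.
    """
    if len(residential) < k:
        raise ValueError("need at least k residential parcels")
    radius = start_radius
    while True:
        in_range = [(math.dist(target, xy), pop)
                    for xy, pop in residential
                    if math.dist(target, xy) <= radius]
        if len(in_range) >= k:
            nearest = sorted(in_range)[:k]
            return sum(pop for _, pop in nearest) / k
        radius += step


# Hypothetical parcels: only three lie within 300 m of the target,
# so the search expands once to 400 m before averaging.
parcels = [((100.0, 0.0), 40), ((0.0, 200.0), 60), ((250.0, 0.0), 80),
           ((0.0, 400.0), 100), ((1000.0, 1000.0), 500)]
capacity = estimate_capacity((0.0, 0.0), parcels)
print(capacity)
```

Because the distant parcel with 500 residents is never among the four nearest, it does not distort the estimate.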
The main force behind urban land expansion is land development carried out by profit-oriented developers. In light of this, the current study adopts two decision-making criteria based on relevant research. First, the number of individuals selecting a particular land parcel is a crucial consideration. If this number falls below 1/5 of the parcel’s capacity, the land parcel is deemed unsuitable for development. Second, the study assesses the proportion of existing residential land within the surrounding neighborhood. Choosing land units situated in close proximity to urban areas and established residential zones offers several advantages, including convenient access to transportation infrastructure and reduced investment risks. If the proportion of existing residential land in the immediate vicinity is below 1/9, indicating that not even one residential land parcel is present nearby, the land parcel is excluded from the development options.
After the site selection process in the current round is complete, some resident agents will have selected land parcels that are classified as non-convertible by government agents or temporarily excluded from development by developers. These resident agents proceed to the subsequent round of site selection, following the methodology outlined in Section 3.1, and the newly selected parcels are then reevaluated using the assessment method described in Section 3.2. This iterative process continues until all resident agents have been assigned residential locations.
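The two developer criteria can be expressed as a short Python check. This is a sketch under assumptions: the neighborhood is taken to be a 3 × 3 window of land use codes (suggested by the 1/9 threshold but not stated explicitly), and the function name `is_developable` is hypothetical.

```python
def is_developable(n_selecting, capacity, neighborhood,
                   min_fill=1 / 5, min_res_share=1 / 9):
    """Apply the two developer criteria to one candidate parcel.

    n_selecting: number of resident agents that chose the parcel.
    capacity: population capacity of the parcel.
    neighborhood: land use codes of the surrounding window
        (assumed here to be 3 x 3), with 'R' marking residential land.
    """
    # Criterion 1: enough agents relative to the parcel's capacity.
    if n_selecting < min_fill * capacity:
        return False
    # Criterion 2: enough existing residential land nearby.
    res_share = sum(1 for code in neighborhood if code == "R") / len(neighborhood)
    return res_share >= min_res_share


window_with_r = ["N", "N", "R", "N", "N", "N", "C", "N", "N"]
window_without_r = ["N", "N", "I", "N", "N", "N", "C", "N", "N"]

print(is_developable(25, 100, window_with_r))     # both criteria met
print(is_developable(15, 100, window_with_r))     # too few agents
print(is_developable(25, 100, window_without_r))  # no residential land nearby
```

Agents whose parcel fails either check would re-enter the next selection round.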
3.3 The learning model
This study utilizes an extended reinforcement learning framework to elucidate the decision-making behavior of resident agents during site selection processes. By incorporating elements of observational learning and self-learning, this approach enhances our understanding of the complex decision-making dynamics involved. A notable contribution of this study is the introduction of a novel computational equation for decision functions, which enhances and strengthens the attractiveness of target parcels. Through this extension, the learning process in site selection decision-making is effectively simulated.
3.3.1 The model of reinforcement learning
The fundamental principle of reinforcement learning is based on the idea that strategies that have shown relatively higher historical returns are more likely to be adopted in future decision-making. Incorporating past experiences is vital in shaping the learning behavior of individuals. Among the diverse array of implementation algorithms, the Roth-Erev (RL) model has garnered extensive utilization due to its simplicity and ease of implementation (Roth and Erev, 1995). Duffy and Feltovich (1999) advanced the Roth-Erev model by integrating individual learning from peers. Building on conventional approaches, Li et al. (2019) further extended the RL model by refining the reward function and enhancing the strategy for reinforcing positions. The calculation of the attractiveness of location j at time t involves two key scenarios. When the location was selected at time t−1, the corresponding calculation formula is as follows:
If position j is not selected, its calculation is given by
where ∈(0,1) denotes the depreciation factor, representing the extent to which past experiences are retained. signifies the propensity of agent i for selecting location j at time t, a value that evolves iteratively during the learning process. is the number of location strategies available to the agent at time t, and is the reward function. The reward function is calculated using the following formula:
where denotes the number of newly selected siting strategies in neighborhood j, and is the proportion of the total number of available land units in neighborhood j.
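As a point of reference, the basic Roth-Erev update can be sketched in Python as below. This is a deliberately simplified version: it keeps only the retention of past propensities and the reinforcement of the selected location, and it omits the experimentation and peer-learning terms. The parameter `retain`, the reward value, and the function names are illustrative assumptions, not the formulation used in this study.

```python
def update_propensities(q, chosen, reward, retain=0.9):
    """One simplified Roth-Erev step.

    q: current propensities, one per candidate location.
    chosen: index of the location selected in the previous round.
    reward: reinforcement for the chosen location (supplied by the caller).
    retain: assumed weight on past experience, between 0 and 1.
    """
    return [retain * qj + (reward if j == chosen else 0.0)
            for j, qj in enumerate(q)]


def selection_probabilities(q):
    """Choice probabilities proportional to the propensities."""
    total = sum(q)
    return [qj / total for qj in q]


q0 = [1.0, 1.0, 1.0]
q1 = update_propensities(q0, chosen=1, reward=0.6)
probs = selection_probabilities(q1)
print(q1, probs)
```

After one round the reinforced location has the highest propensity, so strategies with higher historical returns become more likely to be chosen again.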
This study presents two key modifications building upon the prior work by Li et al. (2019). First, the reward function is adjusted to accommodate the involvement of multiple agents in the site selection decision-making process, wherein each agent selects individual parcels. Furthermore, the study recognizes the varying impact of selection quantities on the subsequent attractiveness and improvement of the chosen parcels. In contrast to Li’s approach (Li et al., 2019), which immediately designates selected parcels for development, this study places emphasis on the utility enhancement after selecting the central cell. To achieve this, comprehensive testing is conducted to assign distinct weights to the central and surrounding cells, effectively highlighting their disparate importance in the decision-making process. The calculations are performed under two scenarios, with the first scenario focusing on the selection of location j at time t−1. The corresponding calculation formula is provided as follows:
where is the number of selections in plot j at moment t−1 and N is the number of selections available in plot j. is the sum of the selected numbers of plots around plot j at moment t−1, and is the sum of the selectable numbers of plots around plot j.
If location j is not selected at moment t−1, the formula is
3.3.2 Hybrid land utility function
Throughout the learning process, a hybrid utility function is utilized to shape the site selection decisions of resident agents. This utility function integrates both the learning outcomes and the inherent attractiveness of the land parcels. The calculation formula for this hybrid utility function is as follows:
where is the mixed land use value of land unit j for class i agent at time t, α is the weight of the attractiveness of the plot based on population and surroundings, β represents the weight of the HA learning outcome, represents an influence factor, is the preference coefficient of , and the uncertainty term, produced by a random number generator, expresses the uncertainty in the decision.
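One plausible reading of this hybrid utility can be sketched in Python as a weighted sum of a parcel attractiveness term and a learned propensity term, plus a random disturbance. The additive form, the uniform noise, the function name `hybrid_utility`, and all numeric values are assumptions for illustration only; the source formula may combine these terms differently.

```python
import random


def hybrid_utility(factors, preferences, learned, alpha=0.5, beta=0.5,
                   noise_scale=0.05, rng=None):
    """Hypothetical hybrid utility of one land unit for one agent class.

    factors: influence factor values for the land unit
    preferences: preference coefficients, one per factor
    learned: learning outcome (propensity) for the land unit
    alpha, beta: weights of attractiveness and learning outcome
    noise_scale: half-width of the assumed uniform uncertainty term
    """
    if len(factors) != len(preferences):
        raise ValueError("factors and preferences must have equal length")
    rng = rng or random.Random()
    # Attractiveness as a preference-weighted sum of influence factors.
    attractiveness = sum(w * x for w, x in zip(preferences, factors))
    # Uncertainty term drawn from a random number generator.
    noise = rng.uniform(-noise_scale, noise_scale)
    return alpha * attractiveness + beta * learned + noise


# Hypothetical example with the noise switched off for reproducibility.
u = hybrid_utility(factors=[0.8, 0.4], preferences=[0.6, 0.4],
                   learned=0.5, alpha=0.7, beta=0.3, noise_scale=0.0)
```

Setting `noise_scale` to zero isolates the deterministic part, which is convenient when checking how α and β trade off the two components.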
3.4 Accuracy indices
To assess the efficacy of the agent-based model simulation, this study employs evaluation methods inspired by prior research and incorporates neighborhood-based evaluation metrics. These metrics encompass measurements of match, close match, and mismatch. The precise calculation formulas for these metrics are provided below:
where MAC represents the degree of fit of the simulation results; the higher its value, the higher the accuracy of the simulation. N is the total number of pixels, x represents the perfectly matched pixels, y represents the near-matched pixels, and r denotes the half weight applied to near-matched pixels.
4 Model application and results analysis
4.1 Human-land relationship analysis
To examine the intricate relationship between humans and the environment, this study initiates the calculation process by estimating population figures for three income categories (high, medium, and low) based on income proportion attributes derived from population profile data. Thirteen distinct factors, encompassing aspects such as points of interest (POI) accessibility, development space, and housing prices, are employed as feature variables, while population density per unit area serves as the target variable. Using the gradient boosting decision tree (GBDT) algorithm, three models are constructed. The regression analysis reveals an R-squared value of 82.6%, indicating commendable performance of the regression models. The significance of individual features is depicted in Fig. 6, offering valuable insights into the relationships under investigation.
The study reveals variations in the drivers of site selection decisions among distinct income groups. Notably, the low-income group demonstrates a pronounced influence from factors encompassing proximity to subway stations, city centers, and factories. Among these factors, the proximity to subway stations has the highest impact, accounting for approximately 26.7% of the overall influence on their decision-making. Similarly, the middle-income group places primary importance on the distance to subway stations, followed by the distance to the city center and hospitals. In contrast, the high-income group assigns significantly greater weightage to the distance from the city center, which holds the highest influence at approximately 46.9% of their overall decision-making process. Additionally, this group takes into account rental costs and other factors when making their site selection decisions.
In practice, the empirical findings highlight distinct preferences and considerations among different income groups within the study area. Specifically, the low-income group demonstrates a stronger inclination toward practical considerations, notably commuting costs and convenience. As a result, they tend to select sites with easy access to subway stations and the city center. These areas often offer relatively lower housing rental costs and a reduced cost of living, making them attractive options for this income segment. Conversely, the middle-income group, benefiting from increased income levels, places greater emphasis on the availability of essential services. They prioritize locations in close proximity to hospitals, schools, and other vital infrastructure, indicating their preference for areas with comprehensive amenities and services to meet their needs. The high-income group, with even higher income levels, displays an even more pronounced preference for amenities and convenience. They tend to favor high-priced and high-rent areas in the central city, driven by factors such as convenient transportation, comprehensive facilities, and elevated service quality. As a result, they tend to concentrate their residential choices in the central city area, where they can enjoy the benefits of upscale living and a wide range of amenities.
Utilizing the GBDT calculation, the analysis yields numerical outcomes that highlight the significance of spatial factors within distinct income groups. These significance values are then utilized as weighting coefficients, effectively integrating them into the spatial factors. As a result, the study derives spatial attractiveness weights that are specific to each income group. These spatial attractiveness weights are visually represented in Fig. 7. These findings provide a fundamental basis for subsequent multi-agent site selection decisions made by residents.
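The weighting step can be illustrated with a short Python sketch that combines factor layers into a per-parcel attractiveness score, using feature importances as weights. The min-max normalization, the convention that higher factor values mean higher attractiveness, the factor names, and the importance values below are assumptions for illustration rather than the study's actual data or procedure.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant layer maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def spatial_attractiveness(factor_layers, importances):
    """Importance-weighted sum of normalized factor layers.

    factor_layers: dict factor name -> list of per-parcel values,
                   where a higher value is assumed to be more attractive
    importances: dict factor name -> feature importance for one income group
    """
    total = sum(importances.values())
    n_parcels = len(next(iter(factor_layers.values())))
    scores = [0.0] * n_parcels
    for name, values in factor_layers.items():
        weight = importances[name] / total  # weights sum to 1
        for idx, v in enumerate(min_max_normalize(values)):
            scores[idx] += weight * v
    return scores


# Hypothetical accessibility layers for three parcels and one income group.
layers = {
    "subway_access": [0.0, 5.0, 10.0],
    "center_access": [10.0, 5.0, 0.0],
}
group_importance = {"subway_access": 0.6, "center_access": 0.4}
scores = spatial_attractiveness(layers, group_importance)
```

Running the same function with a different importance dictionary for each income group would yield one attractiveness surface per group.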
4.2 Distribution and redistribution
As highlighted in Section 3.2, population capacity tends to vary across diverse regions and land use types, necessitating the precalculation of population capacity. In this study, a vacancy rate of 5% is assigned to residential land, which allows for an additional 5% of the existing population to be accommodated in the selected parcels. For non-construction and public management land, the capacity is determined as 95% of the average population count of adjacent residential parcels. However, water bodies, commercial land, and industrial land are expressly excluded from the selection process within the research framework. Employing the stipulated site selection rules, the number of individuals that can be accommodated for each land parcel is computed (Fig. 8).
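The capacity rules stated above can be expressed as a small Python sketch. The land-use labels, the function name `parcel_capacity`, the rounding down to whole persons, and the handling of parcels without residential neighbors are assumptions; only the 5% and 95% rules and the excluded land-use types come from the text.

```python
EXCLUDED = {"water", "commercial", "industrial"}
CONVERTIBLE = {"non_construction", "public_management"}


def parcel_capacity(land_use, population=0, neighbor_residential_pops=()):
    """Additional population a parcel can accommodate (whole persons).

    land_use: land-use label of the parcel (labels are hypothetical)
    population: existing population, used for residential parcels
    neighbor_residential_pops: populations of adjacent residential parcels
    """
    if land_use in EXCLUDED:
        return 0  # excluded from the selection process
    if land_use == "residential":
        # 5% vacancy rate: room for 5% of the existing population.
        return population * 5 // 100
    if land_use in CONVERTIBLE:
        pops = list(neighbor_residential_pops)
        if not pops:
            return 0  # assumption: no residential neighbors, no capacity
        # 95% of the average population of adjacent residential parcels.
        return sum(pops) * 95 // (100 * len(pops))
    raise ValueError(f"unknown land use: {land_use}")


# Hypothetical parcels.
cap_residential = parcel_capacity("residential", population=1000)
cap_public = parcel_capacity("public_management",
                             neighbor_residential_pops=[200, 400])
cap_water = parcel_capacity("water", population=500)
```

Integer arithmetic is used so that capacities are whole persons and do not depend on floating-point rounding.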
By utilizing the spatial attractiveness results computed in Section 4.1 and employing the discrete random selection method described earlier, the study conducts a simulated allocation of new population proportionally among high, middle, and low income levels. Notably, residents of different income levels exhibit varying levels of spatial awareness, with higher-income residents generally possessing a broader field of vision. Drawing upon the study by Liu et al. (2010) on field of vision settings and considering the accessibility of information in the current context, adjustments are made accordingly. Specifically, the field of vision for high, middle, and low-income residents is set at 30, 20, and 10 land parcels, respectively. These values correspond to the number of randomly selectable land parcels that each income group can perceive during the site selection process. Before entering the random selection process, agents are randomly shuffled, and only one agent enters the process at a time. During the simulation of site selection decisions, agents consider their income level attributes and select an appropriate number of pixels within their field of vision based on the corresponding spatial attractiveness data. Subsequently, the random utility function is employed to evaluate and guide site selection decisions. This iterative process continues until each agent completes its site selection, marking the conclusion of the multi-agent residential site selection simulation round. Analyzing the results of the initial site selection round (Fig. 9), it is evident that the overall distribution of multi-agent site selections is concentrated primarily in the southern, central, and south-western core areas, closely mirroring the population distribution within the study area. Furthermore, a notable presence of new land parcels is observed along the subway line regions.
However, due to the stochastic nature of the random utility function, the initial site selection results still exhibit dispersed land parcels. Therefore, subsequent evaluations conducted by real estate and government multi-agent systems are essential to reassess and redistribute less productive land parcels.
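A single site selection round of this kind might be sketched in Python as follows. The field of vision sizes (30, 20, 10) and the shuffled, one-agent-at-a-time order follow the text; the rule of picking the visible parcel with the highest noisy utility, the uniform noise, the data structures, and all example values are assumptions for illustration.

```python
import random

FIELD_OF_VISION = {"high": 30, "middle": 20, "low": 10}


def select_site(income, attractiveness, capacity, rng, noise_scale=0.05):
    """Pick one parcel for an agent and consume one unit of its capacity.

    attractiveness: dict parcel id -> score for this income group
    capacity: dict parcel id -> remaining capacity (modified in place)
    """
    available = [p for p, c in capacity.items() if c > 0]
    if not available:
        return None
    # The agent only perceives a random subset of the available parcels.
    k = min(FIELD_OF_VISION[income], len(available))
    visible = rng.sample(available, k)
    # Assumed random utility: attractiveness plus a uniform disturbance.
    best = max(visible, key=lambda p: attractiveness[p]
               + rng.uniform(-noise_scale, noise_scale))
    capacity[best] -= 1
    return best


def run_round(agents, attractiveness_by_income, capacity, seed=0):
    """Shuffle agents, then let them choose one at a time."""
    rng = random.Random(seed)
    order = list(agents)
    rng.shuffle(order)
    choices = {}
    for agent_id, income in order:
        choices[agent_id] = select_site(
            income, attractiveness_by_income[income], capacity, rng)
    return choices


# Hypothetical setup: 50 parcels with capacity 1 each, 20 agents.
parcels = range(50)
attractiveness_by_income = {
    income: {p: (p % 10) / 10 for p in parcels} for income in FIELD_OF_VISION
}
capacity = {p: 1 for p in parcels}
agents = [(i, ["high", "middle", "low"][i % 3]) for i in range(20)]
choices = run_round(agents, attractiveness_by_income, capacity, seed=42)
```

Because each agent sees only a random subset of parcels and the utility carries a random term, repeated rounds with different seeds produce different and partly scattered selections, which is the behavior that motivates the redistribution step.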
As discussed earlier, after completing the round of site selection simulations, some parcels may be scattered and fail to meet the development objectives. Thus, it becomes imperative for multi-agent systems comprising government and developers to assess the compliance of these parcels with the conversion requirements. Drawing upon the aforementioned redistribution criteria and guided by the principle of intensive land use, all parcels chosen during the decision simulations of resident agents undergo evaluation. If a parcel satisfies the conversion criteria, the resident agent responsible for its selection is deemed to have completed the residential site selection. Conversely, if a parcel fails to meet the conversion conditions, the resident agent who initially chose it must reselect and reassess until all resident agents have finalized their selection process.
Through the iterative “select-assess-reselect” process, a redistribution is carried out for all resident agents. The redistribution process (Fig. 10) demonstrates a gradual decline in the participation of resident agents as the iterations progress, ultimately resulting in the completion of the redistribution task. Spatially, the redistribution results indicate a decrease in the number of scattered parcels, with a higher concentration of parcels observed in the core area. This concentration is primarily observed in the southern, central, and south-western regions of the study area. These findings demonstrate the effectiveness and applicability of the redistribution module.
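The "select-assess-reselect" loop can be illustrated with a simplified Python sketch. Here a parcel is assumed to satisfy the conversion criteria when at least `min_selections` agents chose it, and agents on failing parcels move to the most selected parcel that still has spare capacity. This threshold, the deterministic reselection rule, and the example data are hypothetical stand-ins for the study's redistribution criteria.

```python
from collections import Counter


def redistribute(choices, capacity, min_selections=2, max_iterations=50):
    """Iteratively move agents away from parcels that fail the assessment.

    choices: dict agent id -> selected parcel id
    capacity: dict parcel id -> maximum number of agents
    Returns the final choices and the number of pending agents per iteration.
    """
    choices = dict(choices)
    counts = Counter(choices.values())
    pending_history = []
    for _ in range(max_iterations):
        # Assess: agents on parcels with too few selections must reselect.
        pending = [a for a, p in choices.items()
                   if counts[p] < min_selections]
        pending_history.append(len(pending))
        if not pending:
            break
        for agent in pending:
            current = choices[agent]
            if counts[current] >= min_selections:
                continue  # parcel passed after earlier moves this iteration
            # Reselect: already chosen parcels with spare capacity.
            candidates = [p for p in capacity
                          if p != current and 0 < counts[p] < capacity[p]]
            if not candidates:
                continue
            target = max(candidates, key=lambda p: counts[p])
            counts[current] -= 1
            counts[target] += 1
            choices[agent] = target
    return choices, pending_history


# Hypothetical example: five agents, four parcels with capacity 3 each.
initial = {"a1": "A", "a2": "A", "a3": "B", "a4": "C", "a5": "D"}
capacity = {"A": 3, "B": 3, "C": 3, "D": 3}
final, history = redistribute(initial, capacity)
```

In this example the number of pending agents drops from three to zero in one pass, and the agents end up concentrated on two parcels instead of four.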
4.3 Reinforcing the learning process
Following the preliminary selection by resident agents, the learning model described in Section 3.3 is employed to conduct separate learning processes for three income groups. Through multiple rounds of iterative learning, the number of selected grid cells (x) reaches a stable state, while the iteration count (Iteration) exhibits a significant and effective decrease, as illustrated in the overall trend depicted in Fig. 11. The number of redistributions, initially 1757 in the first round, rapidly declines to 1240 and then gradually decreases to 1180 before stabilizing. This indicates the ability to achieve residents' site selection decisions at an accelerated iteration rate. Furthermore, the final count of selected grid cells (Grids Num) demonstrates a declining trend, decreasing from an initial 83846 to 83806. This signifies an ongoing tendency toward residents' site clustering, effectively realizing the principles of intensive land development.
The results of the resident multi-agent site selection simulations after reinforcement learning demonstrate a notable pattern (Fig. 12). With the incorporation of the learning mechanism, the probability of selecting neighboring cells surrounding the chosen parcels gradually increases as their attractiveness strengthens. This leads to the emergence of clustered patches of parcels, indicating the impact of the learning process on shaping the spatial distribution. To highlight the effectiveness of the learning process, we focused on two specific regions and observed the progressive merging of initially isolated parcels. This merging process generates interconnected clusters of cells, resulting in a more compact and relatively concentrated distribution pattern that closely resembles the actual spatial distribution of residential areas.
4.4 Model performance evaluation
Drawing on the literature, this study establishes an evaluation function to assess the simulated growth of residential land. The simulation utilizes land use data from 2009 as a baseline and population data from 2009 to 2014 as the basis for predicting the growth and distribution of residential land. The simulation results are then compared with the actual land use data from 2014 to assess the accuracy and validity of the model. The degree of fit between the predicted results and the actual data is classified into three categories: perfect match, close match, and poor match. A perfect match refers to pixels where the spatial transformation trend of the predicted results aligns with that of the actual data, while a close match indicates adjacency or proximity within a 300 m neighborhood. On the other hand, a poor match represents a set of results that fall outside this neighborhood.
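The three-way classification can be sketched in Python as follows. The 300 m neighborhood comes from the text; the 30 m cell size, the use of Euclidean distance between cell indices, and the example coordinates are assumptions for illustration.

```python
import math


def classify_matches(simulated, actual, cell_size_m=30.0, radius_m=300.0):
    """Count perfect, close, and poor matches of simulated cells.

    simulated: iterable of (row, col) cells converted in the simulation
    actual: iterable of (row, col) cells converted in the observed data
    cell_size_m: assumed pixel size in meters
    radius_m: neighborhood radius that defines a close match
    """
    actual = set(actual)
    result = {"perfect": 0, "close": 0, "poor": 0}
    for r, c in simulated:
        if (r, c) in actual:
            result["perfect"] += 1
        elif any(math.hypot(r - ar, c - ac) * cell_size_m <= radius_m
                 for ar, ac in actual):
            # An observed conversion lies within the neighborhood.
            result["close"] += 1
        else:
            result["poor"] += 1
    return result


# Hypothetical cells: one exact hit, one 240 m away, one far away.
actual_cells = [(0, 0), (5, 5)]
simulated_cells = [(0, 0), (0, 8), (40, 40)]
matches = classify_matches(simulated_cells, actual_cells)
```

The brute-force distance check is adequate for a small example; a spatial index or a dilation of the observed raster would be a more practical choice for full-size grids.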
Following each iteration of learning by the resident agents, their respective site selection matching degree is computed. The matching error curve and corresponding numerical values are presented in Fig. 13 and Table 2, respectively. It is evident that the combined number of perfectly matched and closely matched pixels gradually increases during the iteration process, reaching a stable state. Conversely, the number of poorly matched pixels exhibits an overall declining trend. These observations directly highlight the efficacy of the learning function.
4.4.1 Sensitivity analysis of parameters
To test the sensitivity of the learning model’s parameters, we designed a series of parameter sets and ran the ABM-learning model. We varied the values of parameters across different sets: {0.6, 0.8, 1.0, 1.2}, {0.6, 0.8, 1.0, 1.1, 1.2, 1.5}, and {0.6, 0.8, 1.0, 1.1, 1.2, 1.5}. To account for the inherent randomness in the selection process, each model was run 50 times, and the overall average accuracy was calculated across the runs. The results, presented in Fig. 14, demonstrate that the simulation accuracy exhibits slight fluctuations. Notably, when a exceeds 1, there is a modest improvement in accuracy for the close match and adjacent match compared to scenarios where a is less than 1. However, the overall magnitude of change is not significant.
In the final selection results after reaching a stable iteration of learning, this study examines two approaches to validate the outcomes of the simulated residential land additions, considering the relatively large number of newly generated pixels. The first approach, MAC_N (Mean Accuracy Change in relation to the Number of actual additions), evaluates the accuracy relative to the actual changes in residential land. The second approach, MAC_P (Mean Accuracy Change in relation to the simulated results), assesses the accuracy relative to the simulated outcomes. The results demonstrate that the MAC_N accuracy exceeds 90%, indicating a high degree of consistency between the simulated and actual additions of residential land. Only a small proportion, less than 10%, of the actual additions were not captured in the simulation. This level of accuracy confirms the acceptable performance of the model in predicting and simulating residential land additions.
From Fig. 14, it can be observed that the overall distribution pattern of the simulation results is similar. The simulation exhibits a higher level of accuracy in the central region and along the subway lines. However, certain newly developed areas, particularly those situated farther from the subway stations and in the south-eastern region, display instances of underestimation. These areas are characterized by relatively scattered simulation outcomes, primarily attributable to the lack of established supporting amenities and ongoing infrastructure development, resulting in diminished attractiveness for the intelligent agents. Concurrently, the central city area, benefiting from well-established amenities, exhibits sporadic instances of overestimation. This is due to its heightened appeal to individuals, leading to a higher likelihood of being selected as a residential location by the resident agents.
5 Discussion
5.1 Big data and rule fine-grained mining
Traditional land-use simulation methods heavily rely on historical macro-statistical data or individual sample surveys, which often encounter difficulties in capturing individual attributes and behavioral rules, thus limiting their ability to comprehensively reflect the intricate dynamics of human-land interactions. In contrast, this study leverages the advantages of fine-grained population big data, characterized by its extensive sample size, refined spatial scale, and detailed attribute information. By employing intelligent mining techniques to extract individual behavioral rules for intelligent agents, the study introduces a stratification based on income levels to account for the varying factors that influence residential site selection among different income groups. This refined approach enhances the accuracy of model calibration and simulation, enabling a more realistic and precise representation of the complex human-land relationship.
5.2 Consideration of population capacity enhances the rationality of site selection
The utilization of discrete choice methods in residential site selection can sometimes result in scattered parcels, with some areas having only a few resident agents selecting them. To address this issue and promote more efficient land use, the study conducts a redistribution of resident agent selections. The redistribution process takes into account factors such as maximum capacity, minimum selection quantity, and the number of residential sites in the neighborhood. By considering these factors, the redistribution aims to cluster parcels more effectively, aligning with the principle of intensive land use.
5.3 Learning models help optimize site selection results
In this study, learning models have been employed for resident agents to enhance their location selection strategies. By leveraging their own experiences as well as the experiences of others, this approach effectively reduces the likelihood of arbitrary choices and enhances the rationality of site selection decisions. Furthermore, the study has introduced adaptive modifications to the extended reinforcement model, which bolster the utility value of areas exhibiting a higher proportion of selected locations, as well as the utility value of neighboring cells. This approach aligns with the location selection process of resident agents in urban settings, where a comprehensive consideration of both individual and social experiences is paramount. Notably, the study methodology differs from that of Li et al. (2019) in that it allows multiple individuals to select the same location and positions the learning phase after the initial selection process, enabling the refinement of site selection strategies through the assimilation of lessons learned from the outcomes of previous selections.
6 Conclusion and expectation
This study utilizes micro-level population profiling data to categorize individuals into three income levels, enabling the construction of models for discrete selection in residential land use. By incorporating incremental capacity constraints and redistribution mechanisms, the study aims to enhance the clustering of land parcels while introducing a learning mechanism to improve the selection process. The application of these approaches demonstrates an improved decision-making capability of the model to a certain extent. The integration of micro-level population data and the implementation of learning mechanisms have significantly enhanced the intelligence of resident agents, making the model more applicable to land use simulations in human-centric cities. In future research, further exploration can be conducted to consider household needs and provide recommendations for developers’ site selection and the allocation of various service facilities.
It is important to acknowledge that in real-world scenarios, households are typically considered the primary unit of analysis in residential site selection. Owing to challenges in acquiring household-level data and the complexities involved in capturing internal household dynamics during the selection process, this study nevertheless focuses on individual-level data for resident agents’ site selection. Furthermore, the current study categorizes resident agents solely based on income, while population data may encompass additional attributes such as the presence of children and vehicle ownership. Future research will involve conducting more nuanced simulations by incorporating these additional attributes, as they play a significant role in shaping residential location preferences and decisions. Lastly, the study does not account for the mobility of resident agents within the city. Future research will expand upon this aspect by integrating migration patterns and urban renewal processes.
References
[1] Acheampong R A (2018). Towards incorporating location choice into integrated land use and transport planning and policy: a multi-scale analysis of residential and job location choice behaviour. Land Use Policy, 78: 397–409
[2] Acheampong R A, Asabere S B (2021). Simulating the co-emergence of urban spatial structure and commute patterns in an African metropolis: a geospatial agent-based model. Habitat Int, 110: 102343
[3] Balta M Ö, Öztürk A (2021). Examining the dynamics of residential location choice in metropolitan areas using an analytical hierarchy process. J Urban Plann Dev, 147(4): 05021048
[4] Bone C, Dragicevic S, White R (2011). Modeling-in-the-middle: bridging the gap between agent-based modeling and multi-objective decision-making for land use change. Int J Geogr Inf Sci, 25(5): 717–737
[5] Dahal K R, Chow T E (2014). An agent-integrated irregular automata model of urban land-use dynamics. Int J Geogr Inf Sci, 28(11): 2281–2303
[6] Dragićević S, Hatch K (2018). Urban geosimulations with the Logic Scoring of Preference method for agent-based decision-making. Habitat Int, 72: 3–17
[7] Duan J, Shi M, Wang Y (2022). Enhancing the discrete choice model of residential location with big data and representation learning. CICTP, 2022: 2526–2535
[8] Duffy J, Feltovich N (1999). Does observation of others affect learning in strategic environments? An experimental study. Intern J Game Theory, 28: 131–152
[9] Groeneveld J, Müller B, Buchmann C M, Dressler G, Guo C, Hase N, Hoffmann F, John F, Klassert C, Lauf T, Liebelt V, Nolzen H, Pannicke N, Schulze J, Weise H, Schwarz N (2017). Theoretical foundations of human decision-making in agent-based land use models–a review. Environ Model Softw, 87: 39–48
[10] Haase D, Lautenbach S, Seppelt R (2010). Modeling and simulating residential mobility in a shrinking city using an agent-based approach. Environ Model Softw, 25(10): 1225–1240
[11] Hashemi Aslani Z, Omidvar B, Karbassi A (2022). Integrated model for land-use transformation analysis based on multi-layer perception neural network and agent-based model. Environ Sci Pollut Res Int, 29(39): 59770–59783
[12] Hosseinali F, Alesheikh A A, Nourian F (2013). Agent-based modeling of urban land-use development, case study: simulating future scenarios of Qazvin city. Cities, 31: 105–113
[13] Jin J, Lee H Y (2018). Understanding residential location choices: an application of the UrbanSim residential location model on Suwon, Korea. Intern J Urban Sci, 22(2): 216–235
[14] Jjumba A, Dragićević S (2012). High resolution urban land-use change modeling: Agent iCity approach. Appl Spat Anal Policy, 5(4): 291–315
[15] Kavak H, Padilla J J, Lynch C J, Diallo S Y (2018). Big data, agents, and machine learning: towards a data-driven agent-based modeling approach. Proceedings of the Annual Simulation Symposium, 2018: 1–12
[16] Kourosh Niya A, Huang J, Kazemzadeh-Zow A, Karimi H, Keshtkar H, Naimi B (2020). Comparison of three hybrid models to simulate land use changes: a case study in Qeshm Island, Iran. Environ Monit Assess, 192(5): 302
[17] Li F, Li Z, Chen H, Chen Z, Li M (2020). An agent-based learning-embedded model (ABM-learning) for urban land use planning: a case study of residential land growth simulation in Shenzhen, China. Land Use Policy, 95: 104620
[18] Li F, Liang J, Clarke K, Li M, Liu Y, Huang Q (2015). Urban land growth in eastern China: a general analytical framework based on the role of urban micro-agents’ adaptive behavior. Reg Environ Change, 15(4): 695–707
[19] Li F, Xie Z, Clarke K C, Li M, Chen H, Liang J, Chen Z (2019). An agent-based procedure with an embedded agent learning model for residential land growth simulation: the case study of Nanjing, China. Cities, 88: 155–165
[20] Liang X, Guan Q, Clarke K C, Liu S, Wang B, Yao Y (2021). Understanding the drivers of sustainable land expansion using a patch-generating land use simulation (PLUS) model: a case study in Wuhan, China. Comput Environ Urban Syst, 85: 101569
[21] Liu X, Li X, Chen Y (2010). Agent-based model of residential location. Acta Geogr Sin, 65(6): 695–707
[22] Liu X, Liang X, Li X, Xu X, Ou J, Chen Y, Li S, Wang S, Pei F (2017). A future land use simulation model (FLUS) for simulating multiple land use scenarios by coupling human and natural effects. Landsc Urban Plan, 168: 94–116
[23] McFadden D (1977). Modelling the Choice of Residential Location. Cowles Foundation Discussion Papers, 710
[24] Mirzahossein H, Noferesti V, Jin X (2022). Residential development simulation based on learning by agent-based model. TeMA-Journal of Land Use, Mobil Environ, 15(2): 193–207
[25] Roth A E, Erev I (1995). Learning in extensive-form games: experimental data and simple dynamic models in the intermediate term. Games Econ Behav, 8(1): 164–212
[26] Saeedi S (2018). Integrating macro and micro scale approaches in the agent-based modeling of residential dynamics. Int J Appl Earth Obs Geoinf, 68: 214–229
[27] Tsagkis P, Bakogiannis E, Nikitas A (2023). Analysing urban growth using machine learning and open data: an artificial neural network modelled case study of five Greek cities. Sustain Cities Soc, 89: 104337
[28] Waddell P, Wang L, Charlton B, Olsen A (2010). Microsimulating parcel-level land use and activity-based travel: development of a prototype application in San Francisco. J Transp Land Use, 3(2): 65–84
[29] Wang D, Yuan C (2018). Modeling and forecasting household energy consumption and related CO2 emissions integrating UrbanSim and transportation models: an Atlanta BeltLine case study. Transp Plann Technol, 41(4): 448–462
[30] Wang H, Zeng W, Cao R (2021). Simulation of the urban jobs–housing location selection and spatial relationship using a multi-agent approach. ISPRS Int J Geoinf, 10(1): 16
[31] World Bank (2015). World Development Report 2015: Mind, Society, and Behavior. The World Bank
[32] Zhang W, Valencia A, Chang N-B (2021). Synergistic integration between machine learning and agent-based modeling: a multidisciplinary review. IEEE Trans Neural Netw Learn Syst, 34(5): 2170–2190
[33] Zheng Q, Deng J, Jiang R, Wang K, Xue X, Lin Y, Huang Z, Shen Z, Li J, Shahtahmassebi A R (2017). Monitoring and assessing “ghost cities” in Northeast China from the view of nighttime light remote sensing data. Habitat Int, 70: 34–42
[34] Ziemke D, Nagel K, Moeckel R (2016). Towards an agent-based, integrated land-use transport modeling system. Procedia Comput Sci, 83: 958–963
RIGHTS & PERMISSIONS
Higher Education Press