1 Introduction
With accelerating urbanization, mobility systems are growing more complex, encompassing a wide array of transportation modes, densely interconnected road networks, and highly dynamic traffic flows. This multifaceted complexity poses significant challenges to traditional urban management frameworks. Meanwhile, urban mobility data have expanded exponentially, drawing from diverse sources like real-time traffic flows, public transit, ride-sourcing platforms, and mobile devices, greatly enriching the resources available for intelligent urban governance. Despite the widespread use of data-driven methods (
Nicolas et al., 2021;
Tyagi and Bhushan, 2023) in traffic flow prediction, route planning, and congestion control, these approaches remain limited in addressing complex scenarios and diverse needs (
Liu et al., 2020). Traditional methods are often hampered by heavy data dependence, limited generalization, and difficulty capturing dynamics, underscoring the urgent need for more advanced and intelligent technological solutions for smart city mobility.
Fortunately, large language models (LLMs), exemplified by GPT-4, offer significant potential for enabling mobility computing in smart cities. With their exceptional natural language understanding and generation capabilities, LLMs can efficiently handle both structured and unstructured data. They extract implicit patterns and rules from complex mobility data through contextual semantic understanding and sophisticated reasoning, yielding interpretable insights. This paradigm shift transforms traditional decision-making approaches and unlocks new solutions for effective urban mobility management (
Chen et al., 2024). Under multivariable dynamic interactions, LLMs’ multitasking capabilities and cross-domain knowledge transfer have the potential to support forward-looking decision-making, offering valuable insights (
Zhang et al., 2024b). For example, Alipay of China’s Ant Group has released the “Zhi Xiao Bao” mobile app based on the Ant Bailing multi-modal LLM, which integrates dozens of mobility services such as booking tickets, ordering food, hailing taxis, and looking up leisure venues. Users can chat with the AI agent in natural language (e.g., voice or text) to obtain the corresponding services.
Despite LLMs’ promising prospects for empowering smart city mobility, the field is in its infancy and faces numerous challenges, such as unstable model inference and high computational costs. For instance, when prompts are not properly constrained, an LLM may output content irrelevant to the target task, leading to failed urban management decisions. At the same time, LLMs have vast numbers of parameters, and running them demands substantial computing power, which hinders real-time decision-making. In this Comment, as illustrated in Fig. 1, we provide an overview of smart city mobility applications empowered by LLMs. Unlike prior works that primarily explore generalized NLP applications or AI’s foundational roles in smart cities, we highlight the distinctive challenges and opportunities that LLMs present in this domain. Specifically, we identify critical limitations in current LLM-based urban mobility studies (
Zheng et al., 2023;
Ullah et al., 2024;
Zhang et al., 2024a), such as data encapsulation, model stability, and model complexity, which can impede effective urban management decision-making. To tackle these challenges, we propose practical solutions from both data and model perspectives, such as data alignment based on spatiotemporal indexing, multi-level model integration, and model pruning based on knowledge distillation. Ultimately, we hope to achieve more flexible and secure data use, and more stable and accurate model computation, enabling LLMs to empower smart city applications reliably and in a personalized manner.
2 Application potential of LLM for smart city mobility
Driven by leading companies in China, e.g., Alibaba and Baidu, LLMs are gradually penetrating and transforming various fields and serving stakeholders (e.g., travelers, mobility service platforms, and government agencies) in smart city mobility, greatly enhancing the efficiency and quality of urban travel (
Palmer, 2024).
In terms of individual travel, fine-tuning LLMs with extensive traffic industry terminology and users’ historical search data enables conversational interfaces that support users in location search, navigation, and traffic information queries (
Wang et al., 2024). This enhances the efficiency and convenience of personal travel.
Urban mobility service platforms like Mobility as a Service (MaaS) represent an innovative, user-centric model that integrates various transportation services into a single digital solution (
Wong et al., 2020). Designed to offer seamless, one-stop travel experiences, MaaS combines public transit, bike-sharing, ride-hailing, car rental, and other mobility options into a unified platform. Through a single application, users can plan their trips, book services, and make payments, ensuring a smooth and connected travel experience. Within the MaaS platform, LLMs can leverage users’ query histories and travel trajectories to deliver personalized travel recommendations. Meanwhile, LLMs fully harness their advanced text processing capabilities to gather real-time traffic and external environment information from various data sources like social media, news, and weather forecasts, enabling timely alerts. For example, if an accident causes road congestion, the platform can swiftly identify and notify users while offering alternative route suggestions. Zhou et al. (
2024b) developed a dynamic path planning framework combining Markov chains, Bayesian inference, and an LLM (i.e., Llama3 8B), which evaluates route rationality by driving time and vehicle waiting time and can dynamically adjust and generate alternative routes. The effectiveness of the framework was verified using SUMO (Simulation of Urban MObility) software.
From the government’s perspective, LLMs can significantly influence transportation planning and policy formulation (
Gao et al., 2024). Designing urban transportation systems requires considering geographic information, human mobility, and the distribution of points of interest (POIs). With their multi-source data integration capabilities, LLMs serve as a vast AI corpus that can perceive social environments, thereby assisting managers in adjusting urban transportation networks and providing real-time response and decision support. Additionally, LLMs can be utilized for infrastructure optimization and travel demand management (
Zhou et al., 2024a), offering well-informed recommendations for target cities based on layout information from other cities. Regarding traffic regulation, vision-based LLMs can help monitor road conditions and violations, reducing workforce needs. LLMs can reconstruct vehicle trajectories from limited roadside detection devices and provide accurate traffic flow estimations at the segment level (
Wei et al., 2024), enabling the government to monitor the overall network’s traffic conditions in real time and formulate accurate travel guidance strategies. For example, Baidu’s intelligent connected car-road-cloud platform uses pre-set and local knowledge bases as corpus support, with a traffic vision large model, a traffic language large model, and a traffic expert large model as technical support. Through various monitoring and visualization interfaces, it provides integrated traffic services across dimensions such as people, vehicles, and roads, promoting efficient city governance. LLMs can also help traffic managers summarize accident information from lengthy accident description documents and generate clear, easy-to-understand accident reports, reducing labor costs. For example, Zheng et al. (
2023) designed prompts for traffic safety analysis, used ChatGPT to extract, impute, and analyze accident information, and realized the automatic generation of accident reports.
Beyond these applications, the robust reasoning and generalization capabilities of LLMs can empower more complex future smart city mobility scenarios, such as autonomous vehicle scheduling and multi-modal travel choices. For instance, LLMs could be developed for vehicle scheduling across an entire transportation network (
Chen and Lu, 2024;
Cui et al., 2024), enabling real-time interconnection of autonomous vehicles (AVs) at scale for efficient dispatching. These models can process vast amounts of dynamic data—such as traffic conditions, passenger demand, and vehicle availability—while reconciling competing priorities, such as minimizing waiting time, optimizing energy efficiency, and mitigating congestion in high-traffic areas. By integrating contextual understanding, LLMs can address complex scheduling conflicts, such as balancing simultaneous demands from multiple regions during peak hours, in ways that traditional rule-based systems might struggle to achieve. Furthermore, LLMs can enhance multi-modal travel choices by serving as a central coordinating system that seamlessly connects different transportation modes, such as bus, metro, bike-sharing, and ride-sourcing services, to provide users with a unified and convenient travel experience. For example, LLMs can analyze and predict disruptions in real time (e.g., delays in public transit) while dynamically suggesting alternative routes or travel modes (
Fang et al., 2024). By incorporating individual travel preferences through local fine-tuning, LLMs can generate highly personalized travel itineraries, taking into account factors such as cost, time, environmental impact, and even user-specific constraints like accessibility needs. This level of customization and adaptability not only streamlines multi-modal travel but also makes it inclusive and user-centric.
3 Multi-modal data encapsulation and representation
While LLMs are increasingly applied in smart city mobility systems, large-scale and reliable data are a prerequisite for LLMs to learn domain-specific knowledge adaptively. Although existing smart mobility systems hold vast, multi-source, and heterogeneous data (e.g., human/vehicle trajectories, checkpoint videos, and remote sensing data), challenges emerge in encapsulating and feeding these data into LLMs for urban mobility computing. These challenges include aligning data structures, effectively representing data semantics, and protecting data privacy.
3.1 Data encapsulation
First, multi-modal mobility data exhibit substantial structural differences. For example, human travel trajectories typically manifest as continuous spatiotemporal trajectory chains, whereas traffic conditions are discrete timestamped data, and intersection monitoring appears as continuous video streams. Data heterogeneity complicates LLMs’ ability to effectively associate multi-modal traffic data with travel information. Although previous studies (
Fawzy et al., 2023) explored unsupervised learning methods for encapsulating heterogeneous multi-modal data, they face limitations when applied to LLMs due to high training costs.
To meet LLMs’ need for a unified data format and the flexibility of model deployment in various urban scenarios, a data alignment module can be integrated to encapsulate multi-modal data into a single input package that LLMs can process, such as a JSON file, through standardized data workflows. Achieving this necessitates establishing efficient temporal and spatial indexing specifications, such as utilizing geographic coordinates and contextual feature descriptions to create a unified spatial index. As a result, all heterogeneous data can be processed in batches by LLMs. Further, different types of data need to undergo feature extraction and transformation via multi-modal LLMs. Specifically, textual data can be processed through a natural language processing module to extract key information, while image and video data can be transformed into structured features through a computer vision module. These features are then aligned and integrated using cross-modal learning algorithms (
Li et al., 2020). Meanwhile, to enable LLMs to perceive changes in urban travel scenarios continuously, each data processing module needs to integrate an incremental learning mechanism that adapts to continuous data streams and dynamically updates the knowledge base without complete retraining. This is especially important in real-time tasks such as traffic prediction and traffic light control, where abnormal event information (e.g., traffic accidents or public events) needs to be reflected in predictive results. Also, to prevent LLMs from generating biased decision results, a bias correction mechanism (
Chen et al., 2024) based on adversarial networks can be embedded to balance the data distribution and ensure that the data features learned by LLMs are non-discriminatory. Ultimately, the LLM synthesizes information from various modalities to generate outputs with contextual understanding and logical reasoning capabilities.
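As a minimal illustration of the alignment idea above, the sketch below groups heterogeneous mobility records under a shared spatiotemporal key and emits a single JSON package. The grid-cell size, hourly time bucketing, record fields, and sample values are all illustrative assumptions, not a prescribed schema.

```python
import json
from datetime import datetime, timezone

def spatiotemporal_key(lat, lon, ts, cell=0.01):
    # Coarse unified index: ~1 km grid cell plus an hourly time bucket.
    # The cell size and hourly bucketing are illustrative choices.
    hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H")
    return f"{round(lat / cell) * cell:.2f}_{round(lon / cell) * cell:.2f}_{hour}"

def encapsulate(records):
    # Group heterogeneous records (trajectory points, traffic states,
    # video metadata, ...) under a shared spatiotemporal index and emit
    # one JSON package an LLM can consume in a single pass.
    package = {}
    for rec in records:
        key = spatiotemporal_key(rec["lat"], rec["lon"], rec["ts"])
        package.setdefault(key, []).append(
            {"modality": rec["modality"], "payload": rec["payload"]}
        )
    return json.dumps(package, indent=2)

records = [
    {"modality": "trajectory", "lat": 31.2304, "lon": 121.4737,
     "ts": 1700000000, "payload": {"user": "u1", "speed_kmh": 24}},
    {"modality": "traffic_state", "lat": 31.2306, "lon": 121.4739,
     "ts": 1700000300, "payload": {"segment": "s42", "occupancy": 0.61}},
]
print(encapsulate(records))
```

Because both sample records fall in the same grid cell and hour, they are packaged under one key, which is exactly the batching behavior the alignment module is meant to provide.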
3.2 Data semantics
While LLMs excel at commonsense understanding and logical reasoning, the semantic representation of urban travel data is crucial for them to quickly grasp the underlying motives and logic of human mobility patterns. For instance, LLMs can infer travel opportunities (e.g., jobs and shopping) by analyzing demographic and economic attributes, thereby predicting individuals’ travel choices. Previous work (
Zhang et al., 2024b) attempted to enhance the semantic representation of travel data using techniques like semantic annotation and Bayesian reasoning. However, these methods showed limited performance in spatiotemporal semantic parsing under large-scale urban travel scenarios.
In the future, a geospatial semantic module can be introduced to encode target cities’ functional characteristics, organizational structure, and road network topology. This module will generate comprehensive geospatial context representations, enabling LLMs to handle complex urban scenarios with greater accuracy and adaptability. For instance, such a module could dynamically parse and integrate real-time spatiotemporal data (e.g., traffic conditions and public event schedules) while simultaneously encoding the multi-layered organizational structure of cities, such as the relationships between neighborhoods, transit hubs, and commercial zones. Additionally, the module would interpret road network topologies, capturing details such as intersection layouts, connectivity, and hierarchical road classifications, which are critical for applications like route optimization and autonomous vehicle navigation. Meanwhile, constructing a geographic knowledge graph (
Wang et al., 2019;
Dsouza et al., 2021) on this foundation can effectively assist LLMs in understanding the dynamic relationships among people across regions. For example, by integrating event text data to identify venues for upcoming performances, the knowledge graph can dynamically update the attractiveness of areas and their related travel flow characteristics. Moreover, urban managers can design prompt templates to dialogically convey physical urban travel rules to LLMs, reinforcing and refining their identified semantic knowledge of travel patterns. For instance, supplying traffic regulation data can help LLMs learn universal traffic rules and enhance the logic behind their route-planning solutions. To ensure seamless integration with existing data structures, the geospatial semantics module will interface with various data sources through a custom standardized preprocessing and encoding process so that all heterogeneous data sets can be coordinated into a unified semantic framework.
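To make the knowledge-graph update concrete, here is a toy sketch of a geographic graph whose area attractiveness is raised when an event is registered at a venue. The node names, relation types, and the 0.5 spillover factor for connected transit hubs are illustrative assumptions, not part of any cited system.

```python
from collections import defaultdict

class GeoKnowledgeGraph:
    # Toy geographic knowledge graph: nodes are places, edges are typed
    # relations, and attractiveness scores are updated from event data.
    def __init__(self):
        self.edges = defaultdict(list)          # node -> [(relation, node)]
        self.attractiveness = defaultdict(float)

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def register_event(self, venue, boost=1.0):
        # An upcoming event raises the venue's attractiveness and, more
        # weakly (0.5x, an illustrative factor), that of transit hubs
        # directly serving it.
        self.attractiveness[venue] += boost
        for relation, nbr in self.edges[venue]:
            if relation == "served_by":
                self.attractiveness[nbr] += 0.5 * boost

kg = GeoKnowledgeGraph()
kg.add_edge("ConcertHall", "served_by", "MetroStationA")
kg.add_edge("ConcertHall", "located_in", "DowntownZone")
kg.register_event("ConcertHall", boost=2.0)
print(kg.attractiveness["MetroStationA"])  # raised by the nearby event
```

An LLM could then read these updated scores as structured context when predicting travel flows toward the venue.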
3.3 Data privacy protection
Since LLMs utilize user input for training, they are susceptible to data leakage and theft, posing personal safety and privacy risks. Consequently, it is crucial to integrate established privacy-preserving technologies, such as blockchain technology (
Karger et al., 2021), into LLM-based systems to secure communications. Additionally, employing federated learning (
Li et al., 2021) in training LLMs facilitates distributed model training and fine-tuning, enhancing data protection by allowing data providers to share only model parameters, rather than raw data, thereby significantly bolstering data security.
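The parameter-sharing step of federated learning can be sketched as FedAvg-style weighted averaging, where each client contributes only a locally trained parameter vector and its data size, never raw data. The two-client numbers below are illustrative.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    # Federated averaging: the server combines locally trained parameter
    # vectors weighted by each client's data size; raw trajectories and
    # queries never leave the clients.
    total = sum(client_sizes)
    stacked = np.stack(client_params)
    weights = np.array(client_sizes, dtype=float) / total
    return (weights[:, None] * stacked).sum(axis=0)

# Two mobility platforms fine-tune locally and share only parameters.
params_a = np.array([0.2, 0.4])   # client A's local update
params_b = np.array([0.6, 0.8])   # client B's local update
global_params = fedavg([params_a, params_b], client_sizes=[100, 300])
print(global_params)              # weighted toward the larger client
```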
4 Robust and lightweight model construction
LLMs have demonstrated substantial potential in smart city mobility. However, challenges in assessing model plausibility, regulating outputs, integrating model architectures, and compressing models limit their robustness across various urban mobility scenarios.
4.1 Model plausibility
A key advantage of LLMs is their capacity to generate interpretable results. However, they can hallucinate, producing information that appears credible but is fabricated, which may mislead policymakers and transportation engineers. The absence of ground-truth labels for evaluating interpretive results and the labor-intensive process of manually constructing evaluation data sets make it difficult to quantify and optimize the credibility of model outputs. Future research could assess the support for model interpretations through result decomposition and external corpora like Wikipedia or traffic regulations. Additionally, employing more powerful LLMs aligned with human preferences to evaluate outcomes on a larger scale could be a promising approach.
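One crude way to approximate such support checking is term overlap between a decomposed claim and an external corpus such as traffic regulations. The sketch below is purely illustrative: it assumes claims are already decomposed into key terms, and real systems would need semantic matching rather than substring search.

```python
def support_score(claim_terms, corpus_sentences):
    # Fraction of a decomposed claim's key terms that appear in at least
    # one sentence of an external reference corpus. A low score flags a
    # possibly hallucinated interpretation for human review.
    hits = sum(
        any(term in sent for sent in corpus_sentences)
        for term in claim_terms
    )
    return hits / len(claim_terms)

corpus = [
    "vehicles must yield to pedestrians at crosswalks",
    "speed limit in school zones is 30 km/h",
]
print(support_score(["speed limit", "school zones"], corpus))  # fully supported
```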
4.2 Model stability and accuracy
In transportation fields like autonomous driving and travel prediction, the stability of model outputs is crucial, as it directly impacts traffic safety and application effectiveness. For instance, unstable next-location predictions can negatively affect user experience in location-based services, resulting in lower user retention rates. The inherent randomness in LLM outputs makes it challenging for current LLM-based models to deliver consistent content. Moreover, constraining the output format of these models remains difficult. Future research should focus on designing more effective prompts to control outputs, creating cost-efficient methods to refine non-compliant samples iteratively, and developing model frameworks tailored for specific output formats.
While LLMs excel in general tasks due to large-scale data pre-training, their performance can lag behind deep learning models trained on specific domain problems. This paper focuses on leveraging LLMs’ capabilities in handling multi-modal data such as natural language text and videos, enabling a single LLM to comprehend general domain knowledge. The definition of general knowledge depends on the specific smart city context. For traffic management systems, general knowledge might encompass traffic flow prediction and traffic signal control, whereas for carbon emission monitoring systems it might include patterns of environmental change and pollution emission trends. However, accuracy in model inference is critical for urban decision-makers. To improve the decision-making accuracy of LLMs in large-scale dynamic urban environments, it is essential to build auxiliary decision-making modules that integrate deep learning models at multiple levels: input, feature, and decision. These smaller deep learning models are trained on specific tasks to serve as domain expert knowledge. This integrated and collaborative framework, with an LLM as the backbone and smaller deep learning models as plug-ins dynamically invoked per task requirements, can maximize the strengths and mitigate the limitations of both. Taking human mobility prediction as an example, traffic flow predictions from deep learning models can be fed into the LLM as inputs, enhancing its understanding of the traffic environment. The spatiotemporal feature vectors extracted by deep neural networks can be concatenated with the LLM’s intermediate layers to enrich its feature mining. In addition, ensemble learning methods (
Dong et al., 2020) can be employed to combine the predictions of LLMs and multiple deep learning models through a voting mechanism, serving as the final decision output. This improves overall model accuracy while enhancing the robustness and reliability of decisions. It should be noted that integrating deep learning models will inevitably introduce more model parameters. On the one hand, practitioners need to balance the model’s computational complexity and accuracy according to specific tasks. On the other hand, designing effective model pruning strategies and distributed processing frameworks can help alleviate computational challenges.
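The voting mechanism described above can be sketched as weighted majority voting over model outputs. The model names, weights, and next-location predictions below are hypothetical placeholders.

```python
from collections import Counter

def ensemble_vote(predictions, weights=None):
    # Combine next-location predictions from an LLM and several deep
    # learning models via (optionally weighted) majority voting; the
    # label with the largest total weight becomes the final decision.
    weights = weights or [1.0] * len(predictions)
    tally = Counter()
    for pred, w in zip(predictions, weights):
        tally[pred] += w
    return tally.most_common(1)[0][0]

# Hypothetical outputs for a human mobility prediction task:
# LLM backbone, then two task-specific deep learning plug-ins.
preds = ["station_A", "station_A", "station_B"]
print(ensemble_vote(preds, weights=[1.5, 1.0, 1.0]))
```

Weighting lets practitioners tilt the final decision toward whichever component has proven more reliable on a given task.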
4.3 Model complexity
Furthermore, while the superior capabilities of LLMs stem from their vast parameters, they also bring limitations such as high computational costs and slow inference speeds. However, the field of urban mobility computing urgently requires “smaller” models to better serve users and decision-makers on edge devices and platforms, as well as in regions with limited resources and infrastructure, adapting to real-time data updates. Knowledge distillation (
Wang et al., 2021) is an effective method. It involves transferring knowledge from a large, pre-trained model (teacher) to a smaller, more efficient model (student). For instance, in traffic flow prediction, a multi-modal model fine-tuned with extensive traffic data (e.g., real-time sensor data, historical traffic patterns, and weather information) can be used as the teacher. The student model, designed to handle specific traffic scenarios such as rush hour congestion, learns to mimic the teacher’s decision-making process, capturing essential patterns in traffic behavior without the need for the teacher model’s extensive computational resources. Additionally, model quantization techniques (
Yao et al., 2022) can speed up inference by lowering the precision of the model’s parameters. For instance, converting the model’s 32-bit floating-point values to 8-bit integers significantly reduces the memory footprint and accelerates processing time. In a traffic monitoring system, the quantized model can quickly process large-scale data from traffic cameras, sensors, or GPS signals, enabling faster response times for tasks such as traffic signal timing, en-route vehicle diversion, and congestion mitigation. Moreover, model pruning (
Ma et al., 2023) techniques provide another avenue for creating lightweight models by removing redundant parameters, neurons, or layers that contribute minimally to the model’s output. For example, neurons or layers that contribute little to the final output can be pruned based on their activation levels. Alternatively, by setting a parameter-importance threshold, network connections with smaller weights can be removed.
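Minimal numpy sketches of the three compression techniques discussed, namely soft-target distillation loss, int8 quantization, and magnitude pruning, are given below. The temperature, per-tensor scale scheme, pruning threshold, and sample weights are illustrative choices, not recommendations.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = np.asarray(z, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions: the standard soft-target distillation objective
    # the student minimizes to mimic the teacher.
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return float(np.sum(t * np.log(t / s)))

def quantize_int8(weights):
    # Map float weights to int8 with a per-tensor scale, shrinking the
    # memory footprint roughly 4x versus float32.
    scale = float(np.abs(weights).max()) / 127 or 1.0
    return np.round(weights / scale).astype(np.int8), scale

def magnitude_prune(weights, threshold=0.05):
    # Zero out connections whose absolute weight falls below the
    # importance threshold (the threshold here is illustrative).
    return np.where(np.abs(weights) < threshold, 0.0, weights)

w = np.array([0.30, -0.02, 0.15, 0.01, -0.40], dtype=np.float32)
pruned = magnitude_prune(w)
q, scale = quantize_int8(pruned)
loss = distillation_loss([2.0, 0.5, 0.1], [1.8, 0.6, 0.2])
print(pruned)      # small-magnitude weights removed
print(q * scale)   # dequantized approximation of the pruned weights
print(loss)        # small when student tracks the teacher closely
```

In practice pruning and quantization are applied to trained networks and followed by fine-tuning, but the arithmetic above is the core of each step.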
In addition to developing lightweight models, carefully designing task-specific prompts can constrain the LLM’s response space, reducing unnecessary computation. Techniques like prompt tuning and template optimization enable the model to focus on specific urban mobility tasks, enhancing efficiency. Meanwhile, mobility computing tasks can be decomposed and assigned to algorithms with varying computational complexity. For instance, simpler tasks can be handled by rule-based or traditional machine learning models, while LLMs are reserved for tasks requiring complex reasoning or language understanding. Additionally, computations can be offloaded to edge devices or distributed systems, balancing the load and reducing central processing demands.
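The task-decomposition idea can be sketched as a simple dispatcher that routes each subtask to the cheapest adequate tier. The task types and tier names below are hypothetical.

```python
def route_task(task):
    # Dispatch a mobility computing subtask to the cheapest adequate
    # model tier; the tiers and routing rules here are illustrative.
    if task["type"] == "threshold_alert":          # a fixed rule suffices
        return "rule_engine"
    if task["type"] in ("flow_forecast", "eta"):   # well-posed prediction
        return "small_dl_model"
    return "llm"                                   # open-ended reasoning

print(route_task({"type": "threshold_alert"}))     # cheap tier
print(route_task({"type": "summarize_incident"}))  # reserved for the LLM
```

Reserving the LLM for open-ended tasks keeps average per-request cost low while preserving its reasoning capability where it matters.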
5 Conclusions
As urban transportation systems grow more complex, traditional frameworks struggle to align with the dynamic demands of real-time and multi-modal environments. LLMs, with their advanced natural language processing and multitasking capabilities, offer promising solutions for understanding and optimizing these systems. However, implementing and deploying them requires careful, gradual progress. While LLMs can process large-scale heterogeneous mobility data and deliver context-aware, interpretable decision support, challenges such as model instability and high computational costs must be addressed to unlock their potential. Going forward, research should prioritize improving multi-modal data utilization through efficient data encapsulation and representation (including data alignment, feature transformation, and data integration) while boosting computational efficiency and accuracy via model compression and integration. Additionally, lightweight model construction will be crucial for developing deployable model applications. On this basis, the LLM-based smart city mobility computing framework can flexibly integrate and expand components such as anomaly identification, cluster analysis, and deviation analysis. Achieving these goals necessitates collaboration across disciplines such as urban science, transportation engineering, computer science, and social science. For example, insights from urban planning can guide the design of LLMs that reflect city-specific transportation patterns and long-term infrastructure goals. Meanwhile, advances in computer science, in areas such as efficient algorithms and distributed computing, are critical to improving the scalability and efficiency of LLM-based solutions. By tackling these challenges, LLMs hold significant promise for improving urban transportation systems, supporting the evolution of smart cities, and enabling more adaptive, efficient, and sustainable urban management.