1 Introduction
In modern society, traffic management faces complex challenges: accelerated urbanization, higher population density, and increased traffic demand lead to frequent congestion, pollution, and accidents, creating an urgent need for effective solutions. Intelligent transportation systems (ITS) are an essential tool for addressing these issues, integrating advanced technologies to enhance transportation efficiency, service control, and vehicle manufacturing. ITS architecture typically includes four main subsystems: road, vehicle, traveler, and management (
Meneguette et al., 2018;
Alams et al., 2016). Through the collaboration of these four subsystems, ITS achieves intelligent traffic flow management, dynamic vehicle monitoring, personalized traveler services, and effective control of the traffic environment, which enhances urban traffic efficiency and sustainability.
With the rapid development of AI, ITS applications are expanding. AI processes extensive traffic data in real time, providing precise decision support for traffic managers. For instance, intelligent traffic signals are designed to adapt in real time to traffic conditions, allocating signal time efficiently to improve road capacity and minimize traffic congestion. Additionally, intelligent monitoring systems detect abnormal traffic behaviors (e.g., wrong-way driving, speeding) and analyze this data to issue alerts, enabling timely responses that significantly enhance traffic safety (
Veres and Moussa, 2019;
Yin et al., 2021). These applications make traffic management more intelligent and support the sustainable development of urban traffic. However, the increasing complexity of urban traffic networks, characterized by greater diversity of and interactions between transportation modes, poses challenges to traditional AI techniques. Recently, the emergence of AI-generated content (AIGC) has provided new solutions to these challenges. Unlike traditional AI, AIGC focuses on creating new content and generating solutions, which provides greater creativity and flexibility. By performing in-depth data analysis, it can generate innovative solutions based on existing information, providing diverse strategies for traffic management. On the technology side, by integrating generative AI algorithms, natural language processing (NLP), and computer vision, AIGC holds significant potential to enhance human-computer interaction in traffic scenarios. Such applications fall mainly into three areas: dialog and reasoning, prediction and decision-making, and multimodal generation. On the system side, the National ITS Reference Architecture in the United States defines the physical objects, execution functions, and information exchange processes in ITS from a physical perspective. The architecture consists of four subsystems: road subsystem, vehicle subsystem, traveler subsystem, and management subsystem (
Meneguette et al., 2018;
Alams et al., 2016). Because each of the four subsystems has different needs, AIGC applications in ITS are both targeted and diverse.
Therefore, this paper focuses on specific applications of the three core AIGC technologies within the four ITS subsystems, including real-time traffic information, traffic flow prediction, accident risk assessment, and traffic safety education. These applications improve user experience and advance traffic management for greater intelligence, efficiency, safety, and sustainability. Additionally, this paper examines the challenges of AIGC in practical applications and proposes solutions to guide future development and implementation. By analyzing these applications and challenges, we aim to provide practical insights for optimizing ITS.
In summary, the main contributions of this review are as follows:
1) This paper proposes a review framework for generative AI applications in ITS. The cross-dimensional framework systematically connects core AIGC technologies with ITS subsystems. It allows potential application directions of AIGC to be examined along different dimensions, clarifies the technical paths by which generative AI can advance ITS, and provides a reference for subsequent research.
2) This paper comprehensively reviews the cutting-edge applications, potential directions and future challenges of AIGC in ITS. We systematically analyze how generative AI can effectively solve key problems in ITS in terms of dialog and reasoning, prediction and decision-making, and multimodal generation. Meanwhile, this paper discusses the open challenges encountered in applying generative AI in ITS. By discussing the possible future research directions and challenges to be addressed, this paper provides a literature reference for researchers, policy makers, and industry professionals to promote the progress of intelligent transportation.
The rest of this review is organized as follows: Section 2 introduces the methodology adopted in this review. Section 3 introduces the key AIGC technologies for text, image, audio, and multimodal tasks. Section 4 examines three core technologies of AIGC (dialog and reasoning, prediction and decision-making, and multimodal generation) and analyzes their specific applications across the four ITS subsystems. Section 5 discusses the main challenges AIGC faces within ITS and suggests possible solutions. Finally, Section 6 summarizes current AIGC applications in intelligent traffic and outlines future development directions.
2 Research methodology
To comprehensively retrieve relevant literature for this review, we used multiple academic databases, including Google Scholar, Web of Science, and ScienceDirect, to obtain a broad and diverse range of sources. For the review of AIGC technologies, we first categorized the research by modality: text generation, image generation, audio generation, and multimodal generation. For each modality, we implemented a search strategy based on core keywords and expanded it to closely related technical terms. For instance, in the domain of text generation, we broadened the search to include terms like “text creation,” “language generation,” and “natural language generation.” After gathering the literature, we organized the key advancements within each modality chronologically, emphasizing significant milestones and landmark breakthroughs. Synthesizing the advances in this way keeps the review concise and logically structured and makes the evolution of AIGC technologies clear.
In the process of identifying literature related to the application of AIGC in ITS, this study employed a systematic keyword combination strategy. Specifically, pairing the three primary technical domains of AIGC (dialog and reasoning, prediction and decision-making, and multimodal generation) with the four major subsystems of ITS (road, vehicle, traveler, and management) generated a series of application-specific keywords. For example, combining the road subsystem with dialog and reasoning technology produced the keyword “Dialogue and Reasoning for Road Subsystems,” which covers functional scenarios such as traffic management and road monitoring. To further improve the coverage and precision of the search, a synonym expansion strategy was also employed, such as expanding “Dialogue Systems” to “Conversational AI”, thereby capturing a wider array of relevant literature. This combination and expansion strategy broadened coverage of the diverse application scenarios of AIGC in ITS and improved the efficiency of literature retrieval.
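The keyword combination strategy described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `build_queries`, the query template, and the synonym table are hypothetical, and only the single synonym pair mentioned in the text is included. The subsystem names follow the four-subsystem architecture given in the Introduction.

```python
from itertools import product

# Technical domains and ITS subsystems as named in this review.
TECHNOLOGIES = [
    "dialogue and reasoning",
    "prediction and decision-making",
    "multimodal generation",
]
SUBSYSTEMS = ["road", "vehicle", "traveler", "management"]

# Hypothetical synonym table (assumption): the text only mentions expanding
# "Dialogue Systems" to "Conversational AI".
SYNONYMS = {"dialogue and reasoning": ["conversational AI"]}


def build_queries(technologies, subsystems, synonyms):
    """Return one search phrase per (technology term, subsystem) pair."""
    queries = []
    for tech, subsystem in product(technologies, subsystems):
        # Each technology contributes its own name plus any synonyms.
        for term in [tech] + synonyms.get(tech, []):
            queries.append(f"{term} for {subsystem} subsystems")
    return queries


queries = build_queries(TECHNOLOGIES, SUBSYSTEMS, SYNONYMS)
for query in queries:
    print(query)
```

Without synonyms, the three domains and four subsystems give 12 base phrases; each synonym adds one further phrase per subsystem.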
3 AIGC
3.1 Text generation
Text generation initially relied on models such as Recurrent Neural Network (RNN), Long Short-Term Memory network (LSTM), and Gated Recurrent Unit (GRU). In 2017, Google introduced the Transformer model, which differs from traditional RNN and CNN structures by employing an encoder-decoder architecture based on the attention mechanism. This model eliminates the need for recurrent and convolutional operations, significantly enhancing text processing performance (
Vaswani, 2017). Subsequently, the GPT series of models, built on the Transformer, developed rapidly. GPT is based on a generative pre-trained Transformer architecture that employs a decoder-only Transformer model to generate output distributions for target tokens (
Radford, 2018). GPT-2 further improved the quality of generated text through extensive pre-training on large data sets (
Radford et al., 2019). In 2020, GPT-3 expanded the parameter scale to 175B, which is more than a 100-fold increase over GPT-2, representing an extreme attempt at model scaling (
Brown, 2020). In 2022, OpenAI released ChatGPT, a conversational AI model fine-tuned from GPT-3.5. This model was trained on human dialog data and exhibits an extensive knowledge base, complex problem-solving abilities, and context tracking and modeling capabilities, while aligning with human values (
Ouyang et al., 2022).
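To make the attention mechanism behind the Transformer and the decoder-only GPT family more concrete, the following is a minimal sketch of single-head scaled dot-product self-attention with a causal mask, written in plain Python. It assumes toy list-of-lists inputs and omits the learned projections, multiple heads, and positional encodings of real models; the function names are illustrative, not taken from any library.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of equal-length vectors, one per token position.
    Returns (outputs, weights); weights[i][j] is 0.0 for every j > i.
    """
    d_k = len(q[0])
    outputs, all_weights = [], []
    for i, q_i in enumerate(q):
        # Position i may attend only to positions 0..i (decoder-only setting).
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ]
        outputs.append(out)
        all_weights.append(weights + [0.0] * (len(q) - i - 1))
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings (assumption)
outputs, weights = causal_self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every position is computed independently of the others, which is why attention avoids the step-by-step recurrence of RNN-based generators.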
3.2 Image generation
In 2013, the Variational Autoencoder (VAE) was introduced as a latent variable model, providing a new perspective on image generation (
Kingma and Welling, 2013). VAE uses a lower bound estimator to fit an approximate inference model (also known as a recognition model) to intractable posterior distributions, achieving efficiency and flexibility in generating samples, thus becoming a significant component of image generation techniques. Subsequently, Generative Adversarial Networks (GANs) utilized an adversarial process between a generator and a discriminator to produce high-quality images, significantly enhancing the realism and diversity of generated images (
Goodfellow et al., 2014). In 2018, Normalizing Flow introduced a precise probability density modeling method that transforms simple densities into rich data distributions through a series of transformations, enhancing the quality and diversity of generated images (
Papamakarios et al., 2021). Recently, diffusion models have emerged as a prominent technology for image generation due to their gradual process of simulating data generation, which greatly enhances the realism and quality of generated images (
Ho et al., 2020;
Dhariwal and Nichol, 2021).
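For reference, the two objectives mentioned in this subsection can be written in their standard textbook form. The notation is an assumption introduced here for illustration: $q_\phi(z \mid x)$ denotes the recognition model, $p_\theta(x \mid z)$ the generative model, $G$ the generator, and $D$ the discriminator.

```latex
% VAE: the evidence lower bound, maximized jointly over \theta and \phi
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big)

% GAN: the minimax game between generator G and discriminator D
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term of the VAE bound rewards accurate reconstruction, while the KL term keeps the approximate posterior close to the prior; in the GAN objective, the discriminator maximizes the value that the generator tries to minimize.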
3.3 Audio generation
Recently, audio generation has made significant advancements. WaveNet, proposed by DeepMind, generates high-quality audio waveforms using deep Convolutional Neural Networks (CNNs) with dilated causal convolutions (
Oord et al., 2016). Tacotron 2, released by Google, is a speech synthesis system that enhances speech naturalness and fluency by combining CNN and RNN, using Mel spectrograms as conditional inputs for WaveNet (
Shen et al., 2018). Transformer-TTS further propelled speech synthesis by introducing and adapting multi-head attention mechanisms to replace RNN structures and the original attention mechanisms in Tacotron 2, resulting in more efficient synthesis (
Li et al., 2019). Whisper, launched by OpenAI, improved performance through extensive data and parameters, supporting multiple languages and maintaining high accuracy in noisy environments (
Radford et al., 2023). MusicLM, released by Google, uses multi-stage autoregressive modeling and incorporates text conditions to generate high-quality music segments from textual descriptions, demonstrating innovative capabilities in music creation (
Agostinelli et al., 2023). These technological advancements have not only propelled the development of audio generation but also provided important references for future model improvements.
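The dilated causal convolution used by WaveNet can be sketched in a few lines of plain Python. This is a simplified, single-channel illustration under stated assumptions: it has no gating, residual connections, or learned weights, and the function names are hypothetical.

```python
def dilated_causal_conv(signal, weights, dilation):
    """Compute y[t] = sum_k weights[k] * signal[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    future inputs (the causal property).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


impulse = [1.0] + [0.0] * 7
print(dilated_causal_conv(impulse, [1.0, 1.0], dilation=2))
print(receptive_field(2, [1, 2, 4, 8]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is what lets a stack of small filters cover long stretches of raw audio.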
3.4 Multimodal generation
The advancement of multimodal generation technology has broadened the application fields of generative AI. The complex connections and interactions between modalities make multimodal representation spaces harder to learn than unimodal ones. OpenAI’s CLIP (Contrastive Language–Image Pretraining) model enables cross-modal search and matching through joint training of text and images (
Radford et al., 2021). Microsoft’s Florence is a vision-language model that adapts well to various tasks, including classification, retrieval, object detection, visual question answering (VQA), image description, video retrieval, and action recognition, demonstrating flexibility across scenes, objects, static images, and dynamic videos (
Yuan et al., 2021). In 2023, the releases of Gemini and GPT-4 signaled further progress in multimodal generation technology (
Team et al., 2023;
Achiam et al., 2023). Gemini can generate integrated content across text, images, and audio through multimodal fusion, while GPT-4 achieves cross-modal understanding and generation via joint training on text and images.
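Cross-modal matching of the kind CLIP enables can be sketched as a nearest-neighbor search in a shared embedding space. The example below assumes that text and image embeddings have already been produced by some jointly trained encoders; the three-dimensional vectors are made-up toy values, and `best_match` is a hypothetical helper, not part of any CLIP library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(text_embedding, image_embeddings):
    """Return (name, similarity) of the image closest to the text query."""
    scored = (
        (name, cosine(text_embedding, emb))
        for name, emb in image_embeddings.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy embeddings (assumption): real encoders output hundreds of dimensions.
images = {"bus": [0.9, 0.1, 0.0], "bicycle": [0.1, 0.9, 0.1]}
query = [0.8, 0.2, 0.0]  # stands in for the embedding of a text prompt
name, score = best_match(query, images)
print(name, round(score, 3))
```

Because both modalities live in one space after joint training, the same similarity function serves text-to-image and image-to-text retrieval.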
4 Applications of AIGC in ITS
The application of AIGC within ITS demonstrates significant potential in enhancing traffic management efficiency and user experience. AIGC encompasses a wide range of technologies, including generative AI algorithms, NLP, and computer vision (
Yang et al., 2024c;
Wang et al., 2023). The primary technologies underlying AIGC can be divided into three essential components:
1) Dialogue and reasoning. It enables users to interact with the system through natural language, featuring contextual understanding and emotional recognition capabilities. By integrating information from various modalities, these technologies generate a more immersive user experience and facilitate human-like communication between chatbots, virtual assistants, and customers, thereby improving human-computer interaction (
Liu et al., 2024b;
Yang et al., 2024c). For example, AIGC equips in-vehicle intelligent virtual assistants (IVAs) with enhanced NLP, dialog generation, and reasoning capabilities, allowing them to respond more flexibly to user needs, enhance the user’s travel experience, and provide dynamic travel suggestions.
2) Prediction and decision-making. As another crucial application of AIGC, prediction and decision-making models focus on analyzing historical data to identify potential trends. This enables the interpretation, prediction, and response to complex scenarios and interactions across various domains, thereby assisting users or managers in making informed choices. Such technologies provide accurate predictions in complex situations, and possess the ability to adapt to dynamic changes (
Shoaib et al., 2023;
Wang et al., 2023). For example, large language models (LLMs) can accurately capture the global spatial and temporal patterns of traffic flows and predict future demand for transportation modes, such as buses, taxis, and shared bikes, which is expected to help achieve efficient vehicle allocation and scheduling.
3) Multimodal generation. It is the foundation of AIGC, driving innovation across diverse fields by automatically generating high-quality text, images, and videos, and by integrating multimodal inputs and outputs (
Zhang et al., 2024c;
Wang et al., 2023). For example, AIGC helps to solve the problem of generating scarce data samples and has become an effective tool for automatic vehicle image recognition. In the field of traffic scene generation, advanced generative models are also widely used in the training, development and testing of ITS, which can greatly reduce the time and cost of real-world testing.
ITS is designed to improve transportation services and create an integrated management system for people, roads, and vehicles (
Patel et al., 2019;
Sayed et al., 2023). The National ITS Reference Architecture in the United States divides the ITS framework into four subsystems from a physical perspective: road subsystem, vehicle subsystem, traveler subsystem, and management subsystem (Fig. 1). AIGC integrates with these four subsystems through three application technology areas (dialog and reasoning, prediction and decision-making, and multimodal generation), resulting in 11 specific applications (Fig. 2). The following sections discuss the applications of AIGC in each of these three technology areas and how AIGC contributes to urban traffic development in different subsystem scenarios.
4.1 Application of dialogue and reasoning in ITS
Dialogue and reasoning technology is one of the core applications of AIGC in ITS, facilitating smooth interaction between users and systems through NLP and intelligent reasoning. This interaction extends beyond mere information exchange to encompass in-depth analysis of driving behavior and individual needs, enabling personalized services and recommendations that significantly enhance user experience. AIGC-powered intelligent virtual assistants respond to travel demands in real time, providing intuitive navigation and dynamic travel suggestions while enhancing human-vehicle interaction (HVI) and capturing drivers’ emotional states to improve safety and convenience. In traffic accident analysis, this technology enables rapid identification and analysis of accidents and the effective formulation of preventive measures.
4.1.1 Human-vehicle interaction
HVI is closely related to human-robot interaction (HRI) and encompasses research in perception, information exchange, reasoning, and decision-making (
Goodrich and Schultz, 2008). As autonomous driving technology advances, drivers have more time and opportunities to engage in various tasks beyond driving, opening new avenues for intelligent HVI. Vehicles integrate sensors to capture the behaviors, emotions, and individual preferences of drivers and passengers, providing precise functionalities and services to enhance the driving experience (
Biondi et al., 2019). The personalized data generation of AIGC enhances HVI by allowing customization of driving experiences based on driver preferences (e.g., adjusting speed, routes, and interior ambiance) and providing tailored feedback (e.g., suggesting optimal driving habits, alerting to potential dangers, and offering emergency assistance), thereby making autonomous driving more enjoyable, efficient, and safe (
Zhang et al., 2024b). The role of IVAs varies with the current task, the level of automation, and the driver’s cognitive state. With AIGC assistance, HVI integrates visual (
Martinelli et al., 2020;
Lim et al., 2020), tactile (
Blomeyer and Schulte-Gehrmann, 2019), auditory (
Liao et al., 2017;
Tran and Tsai, 2020), and physiologic sensing (
Dahiya, 2019;
Manjakkal et al., 2021) technologies to recognize and interact with human behaviors and intentions (
Capallera et al., 2022). These interactions can be categorized into implicit and explicit types (
Capallera et al., 2022;
Biondi et al., 2019). Implicit interactions rely on AI algorithms to infer user behavioral states or intentions, and explicit interactions are triggered by active commands, such as voice and gestures (
Janssen et al., 2020).
4.1.2 Intelligent virtual assistants
AIGC empowers IVAs with enhanced NLP, dialog generation, and reasoning functions, which enable them to respond more flexibly to user needs. Through generative language models, IVAs produce personalized dialogs in real time, address complex travel queries and provide dynamic travel suggestions (
Ren, 2024;
Liu et al., 2024b). For instance, IVAs use AIGC to create personalized voice prompts and interactive feedback, and to further provide intuitive guidance through virtual traffic images and audio navigation prompts. In addition, IVAs can analyze users’ historical behaviors and preferences to generate travel suggestions that align better with user needs, enhancing interaction efficiency and service quality (
Ren, 2024). Furthermore, AIGC-driven IVAs can dynamically recommend routes based on real-time traffic conditions and utilize big data to predict future traffic flow. Personalized travel services help users plan optimal travel times and routes in advance, enhancing their experience and overall satisfaction (
Liu et al., 2024b). As AIGC evolves, IVAs will gain enhanced capabilities for better understanding user needs as well as delivering smarter and optimized services. In the future, the integration of AIGC and IVAs will yield an even more intelligent and convenient interactive experience.
4.1.3 Intelligent traffic accident analysis
Intelligent traffic accident analysis utilizes data analysis and AI for the real-time identification, responsibility determination, accident pattern recognition, and formulation of preventive measures related to traffic accidents (
Zheng et al., 2023). As urbanization and the number of vehicles increase, the road traffic system becomes increasingly complex. Therefore, it is crucial to achieve precise identification and analysis of traffic accidents, as well as to determine responsibility. The advantages of AIGC in intelligent traffic accident analysis include its ability to automatically generate and analyze large amounts of data. It can accurately identify accident causes and responsibilities while optimizing processing flows in real time, significantly enhancing the efficiency and accuracy of accident management (
Yu et al., 2024;
Wang et al., 2024a;
Chen et al., 2024). For real-time accident identification, Yu et al. (
2024) proposed a fine-grained traffic accident detection framework based on Transformers, which combines RGB and optical flow information for accident classification, spatiotemporal localization, and severity estimation. Wang et al. (
2024a) developed traffic accident profiles based on generative image supplement samples and AI auto-labeling, simplifying the traffic accident management process and improving its efficiency and accuracy. For responsibility determination, Chen et al. (
2024) introduced an automated method based on LLMs and collision detection for determining responsibility in minor accidents, achieving accident assessments without human intervention and generating detailed accident reports. For accident pattern recognition and preventive measure formulation, Zheng et al. (
2023) developed a smartphone-based collision report generation framework utilizing LLMs and cross-modal encoders, enhancing the efficiency of accident identification and analysis. Zhou et al. (
2024) explored the application of the large visual language model GPT-4V in understanding and processing complex traffic events, finding that GPT-4V effectively recognizes and analyzes accident scenarios and suggests reasonable safety decisions. Grigorev et al. (
2024) developed TrafficSafetyGPT, a language model tailored for traffic safety tasks that provides precise and safe traffic management recommendations, offering technical support for intelligent traffic safety measures. In the future, AIGC will enhance accident data analysis by integrating historical and real-time data. This will enable accurate identification of accident patterns and influencing factors. As a result, traffic management authorities can predict high-incident areas and times, allowing them to formulate appropriate safety measures. This advancement will drive ITS toward greater safety and efficiency.
4.2 Application of prediction and decision-making in ITS
In the field of ITS, prediction and decision-making technologies serve as another essential application. The core of these technologies lies in the comprehensive collection and in-depth analysis of various types of traffic-related data, including historical data and real-time data. On one hand, combined with intelligent road facilities and generative AI, these technologies can assist drivers with driving tasks, promptly identify and warn of dangerous driving behaviors, and ensure driving safety. On the other hand, by analyzing and predicting from large amounts of traffic data, AIGC can accurately perceive traffic trends and conditions, help traffic management personnel and drivers make sound decisions, and promote the rational allocation of resources.
4.2.1 Intelligent road infrastructure
With the development of AIGC, the intelligence level of road infrastructure has significantly improved. Real-time collection and analysis of traffic data form the core task in this field. By deploying sensors, cameras, and other intelligent devices on roads, a large amount of real-time traffic data are captured and processed through complex algorithms and models to achieve real-time inference of traffic conditions. This AIGC-based infrastructure enables automated monitoring and provides accurate, real-time traffic data (e.g., flow, speed, patterns) to traffic management.
Some scholars believe that the future structure of roads will gradually shift from the current human-vehicle-road component-based system to an intelligent system focused on vehicle-road and vehicle-to-vehicle interactions. In response to this trend, the integration of AIGC into road infrastructure construction is becoming increasingly important (
Pompigna and Mauro, 2022;
Shiwakoti et al., 2020;
Mahmassani, 2016). According to Okem et al. (
2023), intelligent roads generated by AIGC will no longer be passive infrastructure but, through real-time information interaction and adaptive functions, become an essential component of urban traffic systems that promotes overall mobility. Zhao and Wu (
2015) categorized the functions of intelligent roads into four parts: automatic real-time road condition monitoring, real-time information interaction, automatic adaptation to traffic conditions, and energy support for vehicles. The realization of these functions relies on the synergy of AI algorithms and sensors. Mao et al. (
2021) suggest that AIGC-generated intelligent road information can provide more comprehensive coverage of road conditions, enabling vehicles and drivers to obtain detailed traffic and weather information in real-time, thereby enhancing the refined control capabilities of traffic management departments. Through augmented reality (AR) technology, intelligent roads can also provide real-time information to drivers, enhancing the interactive experience (
Lee, 2024).
In addition, AIGC can provide real-time information to vehicles and pedestrians on the road through traffic signal control, promoting the intelligent development of urban transportation systems. With the assistance of AIGC, traffic signal control systems precisely adjust signal timing patterns and promptly deliver traffic information (e.g., road conditions) to vehicles and pedestrians. This reduces vehicle wait times, improves traffic flow efficiency, and enables intelligent traffic signal control. One of the primary goals of ITS is to optimize traffic flow and reduce congestion. AIGC holds significant potential in traffic guidance and information dissemination. It can broadcast AI-detected traffic events and road condition changes through various channels (e.g., electronic screens and navigation apps) to vehicles and pedestrians in real time, enhancing the timeliness and accuracy of travel decisions. Recently, LLMs and specialized machine learning models have been widely adopted, though their potential in traffic information dissemination remains underexplored (
Lin, 2024;
Kohnke et al., 2023;
Jeon et al., 2023). Gomez et al. (
2019) highlighted that intelligent information dissemination is a key component in advancing the intelligence of urban and highway transportation systems. Syum Gebre et al. (2024) integrated physics-informed neural networks (PINNs) with GPT-4 to provide real-time responses to user inquiries, which not only improved traffic prediction accuracy but also enhanced real-time navigation for users. In the future, AIGC is expected to drive even more transformative changes in traffic signal control. An AIGC-based traffic signal control system captures and integrates multi-source data to predict traffic conditions accurately. With advanced training and adaptive learning, these systems can adjust signal intervals, optimize traffic flow, reduce congestion, and improve urban traffic efficiency.
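As a point of reference for what adjusting signal intervals means in practice, the sketch below splits the green time of one signal cycle across phases in proportion to predicted demand. It is a deliberately simple proportional rule, not the method of any system surveyed here; the function name, the phase names, and the default cycle length, lost time, and minimum green are all hypothetical values chosen for illustration.

```python
def allocate_green(demand, cycle_s=90.0, lost_time_s=10.0, min_green_s=7.0):
    """Split the usable green time of one cycle in proportion to demand.

    demand maps a phase name to predicted demand (e.g., queued vehicles).
    Every phase receives at least min_green_s seconds.
    """
    usable = cycle_s - lost_time_s
    n = len(demand)
    if usable < n * min_green_s:
        raise ValueError("cycle too short for the minimum green times")
    total = sum(demand.values())
    if total == 0:
        # No predicted demand: share the usable time equally.
        return {phase: usable / n for phase in demand}
    spare = usable - n * min_green_s
    return {
        phase: min_green_s + spare * d / total
        for phase, d in demand.items()
    }


plan = allocate_green({"north_south": 30, "east_west": 10})
for phase, seconds in plan.items():
    print(f"{phase}: {seconds:.1f} s")
```

In an AIGC-based controller, the demand values would come from a learned traffic prediction model rather than being supplied by hand, and the allocation rule itself could be learned as well.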
4.2.2 Driver assistance decision support
AIGC is rapidly becoming a major driving force in the field of driver assistance decision support, showing great potential, especially in recognizing unsafe driving behaviors and alerting drivers to them (
Ren, 2024;
Liu et al., 2023). By analyzing real-time multimodal data, such as images, audio, and sensor data, AIGC can effectively identify risky behaviors like speeding, sudden braking, drowsy driving, and distracted driving. It can also generate personalized safety alerts, optimizing the driver’s decision-making process, thereby significantly enhancing driving safety (
Li et al., 2023). For example, by integrating edge computing and cloud computing technologies, AIGC can detect and alert for distracted driving behavior in real-time, increasing driver awareness and safety (
Gumaei et al., 2020;
Qiu et al., 2022). Additionally, AIGC can continuously adjust and improve the alert content by learning driver habits and feedback, dynamically tuning the frequency of safety reminders, thus enhancing both safety and comfort. This provides a novel solution for decision support within ITS (
Ren, 2024;
Liu et al., 2023). For example, Wang et al. (
2024d) proposed the concept of Mobility Digital Twin (MDT), which includes Vehicle Digital Twin (VDT), Driver Digital Twin (DDT), and Traffic Digital Twin (TDT). The DDT system consists of four components: the real driver, the digital driver, multimodal interfaces, and associated applications. The real driver serves as the physical entity and foundation of the DDT system, acting both as the data generator and the recipient of system services. The digital driver is a virtual replica of the human driver, capable of accurately and comprehensively reflecting the real driver’s behavior in real time. During the driving process, the digital driver model interacts with the human driver in real time, monitoring fluctuations in the real driver’s emotions and decline in driving abilities. In summary, the continuous development of AIGC will provide smarter and more humanized experiences for driver assistance systems, offering strong support for the comprehensive enhancement of traffic safety and driving efficiency.
4.2.3 Intelligent traffic prediction and management
Traffic prediction is a core task in ITS, aimed at predicting future traffic characteristics, such as traffic flow and public transportation demand, by analyzing historical data (
Liu et al., 2024a). This prediction assists traffic managers in better allocating resources, optimizing signal control, and improving road safety (
Miao et al., 2024;
Zhou et al., 2024). Traditional traffic prediction primarily relies on machine learning and deep learning models, such as RNNs (
Almukhalfi et al., 2024), CNNs (
Yuan et al., 2018), and Graph Convolutional Networks (GCNs) (
Feng et al., 2023). While these models have made progress in capturing spatiotemporal dependencies, they face challenges in handling complex non-Euclidean structures and long-tail scenarios, often leading to accuracy limitations and excessive model complexity. Moreover, traditional models struggle to adapt to dynamic traffic environments and have high training costs due to reliance on large data sets (
Liu et al., 2024a;
Jiang et al., 2023). Generative AI (AIGC) technology offers new solutions for traffic prediction. Compared to traditional models, AIGC (e.g., LLMs) captures spatiotemporal dependencies better through parameter expansion and large-scale pretraining, reducing dependence on specific adjacency matrices, greatly enhancing model adaptability and predictive accuracy (
Qu et al., 2023). For instance, TrafficBERT, using the Transformer architecture, effectively captures road condition changes in large data sets through multi-head self-attention mechanisms, outperforming traditional statistical and deep learning models without relying on specific road or weather data (
Zhao et al., 2023). AIGC demonstrates exceptional capability in traffic flow prediction. For example, the ST-LLM (Spatiotemporal Large Language Model) integrates spatiotemporal embeddings into a unified prediction framework, capturing global spatiotemporal patterns of traffic flow precisely, particularly excelling in few-shot and zero-shot scenarios (
Liu et al., 2024a). Moreover, AIGC shows significant potential in traffic demand prediction, which aims to forecast future demand for transport services such as taxis, ride-hailing, and bike-sharing at specific times or in specific regions. For instance, Liu et al. (
2024a) effectively predicted taxi and bike-sharing demand using the proposed ST-LLM, achieving efficient vehicle allocation and dispatch. Yuan et al. (
2025) developed UniST, a universal model for urban spatiotemporal prediction, which utilizes generative pretraining and spatiotemporal knowledge prompts to excel in demand prediction and other tasks.
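As a minimal illustration of the self-attention mechanism referred to above (it is not the TrafficBERT or ST-LLM implementation), the following Python sketch computes single-head scaled dot-product attention over a short sequence of traffic observations. The function names, the identity query/key/value projections, and the sample readings are assumptions made for illustration; real models learn separate projection matrices and use multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), which keeps the sketch short.
    sequence: list of T feature vectors, each a list of d floats.
    Returns (outputs, weights); weights[i][j] is how strongly
    time step i attends to time step j.
    """
    dim = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, sequence)) for j in range(dim)]
        )
    return outputs, weights


# Hypothetical normalized [speed, occupancy] readings at four time steps.
observations = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.8], [0.2, 0.9]]
outputs, weights = self_attention(observations)
```

Each row of `weights` sums to one, so every output is a weighted average of all time steps. In this toy input the free-flow step attends more to similar free-flow steps than to congested ones, which is the sense in which attention captures dependencies without a predefined adjacency matrix.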
As ITS continues to develop, intelligent traffic management has become increasingly important. By dynamically monitoring and controlling road traffic, vehicles, and infrastructure, it optimizes traffic flow, improves road efficiency, reduces congestion, and enhances travel safety. As an emerging technology, AIGC plays a vital role in this field. It can detect vehicle safety conditions, identify traffic flow status in real time, monitor road traffic controls, and perceive changes in the surrounding environment, providing detailed data support for traffic managers and helping decision-makers formulate sound traffic management plans that optimize road traffic efficiency (
Ren, 2024). In addition, the development of AIGC technology is continuously driving cities toward a true transformation into smart cities. Lifelo et al. (
2024) proposed that metaverses built on AIGC technology can fully integrate digital and physical spaces, allowing immersive real-time interactions among transportation managers, citizens, and urban infrastructure, making smart cities more sustainable. Villarreal et al. (
2023) explored the application of ChatGPT in helping novices solve complex mixed traffic control problems in ITS. The results showed that ChatGPT can improve the success rate of mixed traffic control tasks and help smart city transportation systems operate more efficiently. Zhang et al. (
2024c) combined an LLM with the Transportation Fundamental Models (TFM). The results showed that AIGC can help humans decompose complex traffic management tasks efficiently and can provide decision support, which significantly improves the accuracy and response speed of traffic data analysis and shows great potential for making smart city management more efficient and intelligent. Another important aspect is the application of Digital Twin (DT) technology in smart cities, particularly in traffic management, where it enhances the efficiency of resource allocation and public safety through real-time monitoring and optimization of urban traffic systems. When integrated with AIGC, DTs can further improve the simulation of traffic scenarios, thereby increasing the intelligence level of ITS (
del Campo et al., 2024).
4.3 Application of multimodal generation in ITS
Multimodal information generation technology, as one of the foundational technologies of AIGC, has become a core driver of innovation across various fields. It possesses powerful capabilities to automatically generate high-quality text, images, and videos, enabling further integration of multimodal information inputs and outputs. For example, in road network design and planning within the transportation sector, AIGC can assist in generating optimal solutions and providing recommendations, enhancing road network traffic capacity. In autonomous driving scenario generation, it provides rich, diverse, and high-fidelity data for the development of autonomous driving technologies. In driving behavior simulation, it creates highly realistic scenarios to improve drivers’ safety awareness and skills. In traffic data and scenario generation, it addresses data scarcity issues and constructs effective scenarios for training and testing. In generating content for traffic safety education and publicity, it can create diverse materials and interactive platforms to increase public awareness and understanding.
4.3.1 Road network design and planning
The application of AIGC in road network design and planning enables road planners to generate optimal design solutions and quickly simulate traffic flow performance under different scenarios, assessing each option’s advantages and disadvantages. This helps select the most efficient and cost-effective solution. AIGC can also generate road network optimization recommendations based on historical data and real-time demand, such as adding or optimizing road nodes and adjusting lane configurations, to improve network capacity.
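One way to picture how candidate designs might be compared is with the Bureau of Public Roads (BPR) volume-delay function, a standard link-performance formula in transport planning. The sketch below is only an illustration under assumed inputs: the candidate names, capacities, and demand are hypothetical, and an AIGC-assisted planning tool would rely on far richer simulation than this single-link calculation.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """BPR volume-delay function: t = t0 * (1 + alpha * (v / c) ** beta)."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


def rank_designs(designs, demand, free_flow_time):
    """Sort candidate link designs by estimated travel time at a given demand.

    designs: dict mapping a design name to its capacity (vehicles/hour).
    Returns a list of (name, travel_time) pairs, fastest first.
    """
    times = {
        name: bpr_travel_time(free_flow_time, demand, capacity)
        for name, capacity in designs.items()
    }
    return sorted(times.items(), key=lambda item: item[1])


# Hypothetical lane configurations for one corridor (capacities in veh/h).
candidates = {
    "two_lanes": 3600.0,
    "three_lanes": 5400.0,
    "two_lanes_plus_bus": 4000.0,
}
ranking = rank_designs(candidates, demand=4500.0, free_flow_time=10.0)
```

Ranking by travel time alone ignores construction cost, so a practical comparison would weigh the estimated delay of each option against its cost.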
In urban transportation, road network planning is closely related to the Network Design Problem (NDP) (
Bagloee et al., 2017). Researchers have previously applied traditional machine learning and metaheuristic methods, such as artificial neural networks, simulated annealing, and genetic algorithms, to study the NDP in depth (
Karoonsoontawong and Waller, 2006;
Akgüngör and Doğan, 2009;
Xu et al., 2009). However, research using AIGC to generate optimized design solutions remains relatively limited. Yan and Li (
2023) suggest that traditional traffic scenario simulations are constrained by data costs and complexity, making it challenging to capture complex dynamic traffic processes comprehensively. AIGC models can learn from small amounts of real traffic data, generating more realistic traffic scenarios that simulate dynamic changes under varying traffic volumes, thus enhancing evaluation and optimization effectiveness.
In practical applications, the US Department of Transportation allocated 15 million dollars to support AI system development and launched the Complete Streets AI initiative, incorporating AIGC into transportation infrastructure construction. The initiative aims to provide decision-support tools for state and local governments, aiding in the design and deployment of Complete Streets networks.
4.3.2 Autonomous driving scenario generation
The development and testing of autonomous driving systems require high-quality data; however, real-world data sets are often small, fail to cover all scenarios, and are costly to collect and label (
Liu et al., 2019). Data generation within AIGC is a promising alternative that can provide large-scale, diverse, and high-fidelity data for the development of autonomous driving technology, helping these systems better handle complex real-world driving conditions (
Zhang et al., 2024d;
Liu et al., 2019).
For example, in the urban metaverse, AIGC technology can use a smart city transportation digital twin platform to dynamically generate virtual mappings of complex traffic scenarios, providing real-time decision support for ITS (
Wang et al., 2024d;
Jin et al., 2023). Specifically, DT technology enables precise environmental modeling for connected and automated vehicles (CAVs) in smart cities, enhancing their performance in the physical world (
Xiong et al., 2023). However, DTs primarily focus on environmental modeling while overlooking the interactions and intentions of traffic participants (
Xue et al., 2024). The metaverse, integrating virtual reality (VR) and AR technologies, creates an immersive training environment, allowing CAVs to undergo more realistic training and interact with other road users (
Xu et al., 2023b;
Shi et al., 2023;
Mourtzis, 2023). In this process, AIGC technology plays a crucial role (
Wen et al., 2023). AIGC can generate diverse traffic scenarios within the metaverse, enabling CAVs to train for various complex and unknown traffic situations, particularly those they have never encountered before (
Wang et al., 2024e). Furthermore, by integrating AIGC technology, the metaverse can simulate not only regular traffic conditions but also extreme and safety-critical scenarios, allowing autonomous vehicles to respond more accurately and safely to unpredictable real-world situations (
Xu et al., 2023b). In summary, the combination of digital twins and AIGC, facilitated by the metaverse, provides diverse and innovative training environments that enhance CAVs’ decision-making and interactive reasoning capabilities, enabling them to navigate complex traffic conditions and advancing the development of ITS.
4.3.3 Driving behavior simulation
Unsafe driving behaviors such as fatigue driving, speeding, and distracted driving have become major causes of traffic accidents, leading to numerous fatalities, injuries, and property losses (
Shaik, 2023). Driving behavior simulation technology can simulate the consequences of these risky behaviors in real driving scenarios, visually demonstrating the potential accident risks. This enhances drivers’ awareness of the dangers, encouraging them to drive more cautiously and avoid unsafe behaviors, thus effectively improving road safety (
Amini et al., 2023). Recently, scholars have conducted extensive research on driving behavior simulation. Fang et al. (
2021) proposed a Semantic Context-Induced Attention Fusion Network (SCAFNet), which, through video semantic region segmentation and fusion technology, improved attention prediction for drivers in accident scenarios. Amini et al. (
2023) found through driving behavior simulation that distracted driving significantly impacts driver reaction times and gaze patterns. Moreover, digital twin and AR technologies can enhance the accuracy and realism of driving behavior simulation, enabling drivers to experience more immersive training in complex scenarios. This, in turn, significantly improves their ability to respond to unexpected situations and enhances overall driving safety (
Calvi et al., 2020;
Liao et al., 2023;
Wang et al., 2024b). For instance, VDT technology is utilized to generate virtual vehicles and replicate or extend real vehicle behavior patterns through data-driven and physics-based approaches. This allows for the simulation of various traffic scenarios, helping drivers anticipate and respond to potential issues (
Wang et al., 2023;
Xu et al., 2023b). When integrated with AIGC technology, VDT can further enhance the diversity and complexity of simulated scenarios (
Wang et al., 2023). AIGC enables the real-time generation of a wide range of complex traffic situations, including unprecedented extreme events, providing more comprehensive training data for driving behavior simulation. By leveraging AIGC, driving behavior simulation can overcome traditional limitations, offering a novel approach to achieving safer and more efficient driving behavior modeling (
Wang et al., 2024c).
4.3.4 Traffic data and scenario generation
The execution of traffic tasks relies on vast amounts of data from various sources. However, real-world data are often incomplete, inconsistent, and unevenly distributed, making them difficult to use directly for model training. With the continuous development of AIGC, generating traffic data and constructing traffic scenarios have become significantly easier (
Qu et al., 2023). Furthermore, as discussed in previous sections, the rise of the metaverse presents significant opportunities for developing AI models specifically designed for AR/VR, driven by the vast amount of data generated within the metaverse. In particular, with the application of AIGC technology, the metaverse will continuously produce massive volumes of multimodal traffic data and traffic scenarios (
Zhang et al., 2024a).
In terms of generating scarce data samples, AIGC has become an effective tool for tackling challenges in traffic simulation, automatic vehicle image recognition, and traffic flow prediction modeling, gaining increasing attention (
Lateef et al., 2023). For example, Generative Adversarial Networks (GANs) can generate synthetic data, especially for long-tail scenarios, effectively expanding training samples and enhancing model recognition capabilities. Zhang et al. (
2024d) combined AR and GANs to generate high-quality synthetic data for training roadside perception detectors that can adapt to varying weather and lighting conditions. Additionally, the multimodal capabilities of LLMs have significantly reduced data labeling costs and greatly improved labeling efficiency. Another crucial task in traffic research is data imputation, as some data may be missing due to equipment malfunctions or insufficient measurements. AIGC has shown strong capabilities in data recovery (
Qu et al., 2023). For instance, Zhang et al. (
2024d) used GPT-3.5 to generate human-like text to fine-tune a BERT model, improving the accuracy of traffic data imputation. Chen et al. (
2023) developed GATGPT, which combines LLMs with Graph Attention Networks to improve imputation accuracy by capturing spatiotemporal dependencies in multivariate time series with missing values.
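To make the imputation task concrete, the sketch below fills gaps in sensor readings with a deliberately simple baseline: the mean of neighboring sensors at the same time step, falling back to the sensor's own previous value. It is not the method of GATGPT or of any other cited work; the sensor names, the adjacency dictionary, and the readings are hypothetical. Learned models replace these fixed rules with weights estimated from data.

```python
def impute(readings, neighbors):
    """Fill missing readings (None) with a simple spatial-then-temporal rule.

    readings: dict mapping sensor id -> list of float or None, one per time step.
    neighbors: dict mapping sensor id -> list of adjacent sensor ids.
    Returns a new dict; the input is left unchanged.
    """
    filled = {sensor: list(series) for sensor, series in readings.items()}
    for sensor, series in readings.items():
        for t, value in enumerate(series):
            if value is not None:
                continue
            # Spatial rule: average the observed neighbors at this time step.
            observed = [
                readings[n][t]
                for n in neighbors.get(sensor, [])
                if readings[n][t] is not None
            ]
            if observed:
                filled[sensor][t] = sum(observed) / len(observed)
            elif t > 0 and filled[sensor][t - 1] is not None:
                # Temporal fallback: carry the previous value forward.
                filled[sensor][t] = filled[sensor][t - 1]
    return filled


# Hypothetical flow readings (veh/5 min) from three loop detectors.
readings = {
    "A": [10.0, None, 30.0],
    "B": [20.0, 40.0, None],
    "C": [None, 60.0, None],
}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": []}
completed = impute(readings, neighbors)
```

A gap with neither an observed neighbor nor a previous value, such as the first reading of sensor `C`, is left as `None` rather than guessed.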
Meanwhile, the advancement of AIGC further promotes the rapid development of digital twin technology in smart cities. Xu et al. (
2024) summarized four typical applications of generative AI that promote the development of smart cities: transportation data augmentation, missing data imputation, synthetic mobility data generation, and traffic simulation scenario generation. One advantage of AIGC technology is that it enables advanced generative models to be used across the training, development, and testing of ITS, significantly reducing the time and cost of real-world testing. For example, Hu et al. (
2023) developed a Cooperative Adaptive Cruise Control (CACC) simulation platform to evaluate the effectiveness of connected vehicles and autonomous control systems. Yang et al. (
2024a) proposed the “World Center Diffusion Transformer” (WcDT), which uses diffusion probabilistic models and Transformers to generate multimodal autonomous driving trajectories. Wang et al. (
2024b) introduced an autonomous scenario understanding framework, RoadScene2Graph (RS2G), which dynamically captures relationships between road users and transfers simulated knowledge to real-world scenarios. Yang et al. (
2024b) also proposed the Transformer-based PSG4DFormer model, which predicts panoptic segmentation masks and generates the corresponding scene graphs, integrating LLMs to achieve dynamic scene understanding. Huang et al. (
2024) further combined imitation learning with diffusion generative models, proposing the Versatile Behavior Diffusion (VBD) framework, which simulates interactive scenarios among multiple traffic participants, generating consistent multi-agent interactions with refined scene editing capabilities.
4.3.5 Traffic safety education and public awareness content generation
AIGC demonstrates strong capabilities in video content creation and processing and has been widely used for generating trailers and promotional videos (
Ren, 2024;
Loeschcke et al., 2022). Building on this foundation, AIGC can play an even larger role. For example, AIGC’s powerful content generation capabilities can be fully utilized to create diverse traffic safety education materials. Through AIGC, vivid videos can be produced that use real cases and impactful visuals to illustrate the dangers of traffic accidents, raising public awareness. It can also produce illustrated tutorials that explain traffic safety knowledge and regulations in a clear and concise manner, enabling the public to quickly understand and master essential information. Additionally, AIGC can construct realistic simulation scenarios, allowing the public to experience proper responses in different traffic situations firsthand, thus enhancing traffic safety awareness and encouraging adherence to traffic regulations.
Moreover, AIGC can facilitate the creation of an interactive Q&A platform. On this platform, citizens can access detailed explanations of traffic policies, gaining insights into the significance and purpose behind various traffic regulations. When encountering traffic-related questions, citizens can receive real-time answers through the platform, whether about the meaning of road signs, traffic signal rules, or specific driving guidelines. Such an interactive Q&A platform not only provides convenient traffic information services but also helps increase public understanding and support for traffic management efforts, contributing to a safer and more orderly traffic environment.
5 Application challenges
5.1 Safety risks of fake content
AIGC is susceptible to the “hallucination” issue, where generated content may not align with real-world facts or user inputs. The hallucination problem in generative AI mainly stems from limitations in data, the training process, and inference mechanisms. Flaws and biases in the data, issues with architecture and strategies during training, and randomness in sampling and decoding constraints during inference can all lead to content that deviates from actual circumstances (
Ji et al., 2023). Especially in complex and unknown scenarios, the model may fail to capture all variables and details accurately, resulting in fake or misleading content. In complex traffic environments, the safety risks posed by fake content are particularly severe. For example, inaccurate road condition recognition or erroneous decision-making could directly lead to traffic accidents or even trigger system crashes, threatening public safety and undermining the efficiency and stability of the entire traffic network. Thus, effective measures must be taken to improve traffic data quality, optimize the training process, and strengthen inference mechanisms to reduce the risk of hallucination, ensuring the reliability and safety of generative AI in the field of transportation.
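One simple safeguard, offered here only as an illustration, is to screen generated traffic states against basic physical constraints before they reach downstream decisions. The sketch below checks the fundamental relation flow = density × speed together with range limits. The field names, thresholds, and tolerance are assumptions, and such a check would complement rather than replace the data-quality and training measures discussed above.

```python
def is_plausible(state, max_speed_kmh=130.0, max_density_veh_km=200.0, tolerance=0.10):
    """Return True if a generated traffic state passes basic sanity checks.

    state: dict with 'speed_kmh', 'density_veh_km', and 'flow_veh_h'.
    The thresholds and tolerance are illustrative assumptions.
    """
    speed = state["speed_kmh"]
    density = state["density_veh_km"]
    flow = state["flow_veh_h"]
    if not (0.0 <= speed <= max_speed_kmh):
        return False
    if not (0.0 <= density <= max_density_veh_km):
        return False
    if flow < 0.0:
        return False
    # Fundamental relation of traffic flow: q = k * v.
    expected_flow = density * speed
    allowed_gap = tolerance * max(expected_flow, 1.0)
    return abs(flow - expected_flow) <= allowed_gap


# Hypothetical model outputs; the second and third should be rejected.
generated = [
    {"speed_kmh": 60.0, "density_veh_km": 30.0, "flow_veh_h": 1800.0},
    {"speed_kmh": 60.0, "density_veh_km": 30.0, "flow_veh_h": 4000.0},
    {"speed_kmh": 250.0, "density_veh_km": 10.0, "flow_veh_h": 2500.0},
]
accepted = [state for state in generated if is_plausible(state)]
```

Rule-based screening of this kind catches only physically inconsistent outputs; content that is plausible but wrong still requires human oversight or cross-checking against sensor data.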
5.2 Workforce transformation and human-machine collaboration
With the advancement of AIGC, traditional job markets are facing significant disruption. Previously, technological innovation primarily focused on replacing physical labor, freeing people from inefficient, repetitive tasks. However, AIGC is not only capable of handling simple repetitive tasks but also of simulating and replacing human thinking and decision-making processes in complex situations. For instance, in the field of intelligent driving, AIGC demonstrates observation and decision-making behaviors remarkably similar to those of human drivers through autoregressive, prompt-based, and contextual techniques. Consequently, it is gradually replacing the role of traditional drivers. With the proliferation of AIGC, the labor market is undergoing corresponding adjustments. Human roles are no longer confined to executing repetitive tasks; rather, they now require the ability to collaborate effectively with emerging technologies (
Nardo et al., 2020). For example, the autonomous driving field has already given rise to new roles, such as “autonomous vehicle safety operator,” indicating that the future work environment will center on human-machine collaboration. The future workforce will be more involved in the oversight, control, and optimization of AI systems rather than traditional operational work. This shift brings new challenges in skill requirements and increases the demand for vocational training and reskilling.
5.3 Social cognition and emotional trust problem
With the advancement of AIGC, human-machine interactions in the transportation sector are becoming increasingly frequent. However, the adoption of AI technology in transportation still faces challenges in social cognition and emotional trust (
Kim et al., 2023). On one hand, due to the complex and often opaque decision-making logic of such technology, public trust remains relatively low (
Tjoa and Guan, 2020). For example, an autonomous driving system may respond differently than a human driver to sudden traffic situations, potentially causing discomfort or even panic among passengers, leading to an aversion to ITS and hindering the adoption and widespread use of these technologies. On the other hand, excessive trust in AIGC could lead to insufficient monitoring of AI systems, increasing safety risks (
Aroyo et al., 2021). For instance, if traffic management departments overly rely on algorithms for traffic flow control, they might overlook data anomalies or system vulnerabilities, compromising the safety and efficiency of the entire traffic network. Therefore, finding a balanced level of trust in AIGC is crucial. It is essential to avoid both the rejection of technology due to mistrust and the risks associated with blind trust in it. To achieve this balance, it is necessary to establish a comprehensive regulatory and oversight system to ensure that AIGC’s decision-making process is transparent and auditable. Additionally, providing training for users and operators can help them correctly understand and utilize these technologies and make effective judgments and interventions when needed. This balance will contribute to the safe and effective application of generative AI in the transportation sector.
5.4 Legal and ethical issues
The application of AIGC raises legal and ethical concerns. The development and optimization of AIGC rely on extensive data training, often including content protected by copyright law. As a result, the generated content may face issues related to intellectual property protection. The application of AIGC in the transportation sector inevitably encounters numerous regulatory challenges, such as privacy protection, data security, and road safety. However, these models may not fully comply with existing regulations, and potential vulnerabilities or inaccurate responses could lead to accidents, posing potential legal liability risks. Defining the legal responsibility of AIGC in such contexts is a pressing issue that needs further exploration. In addition to legal issues, AIGC raises ethical concerns. Although AIGC can optimize decision-making by aggregating traffic data, it may inadvertently capture human biases and trends, potentially resulting in negative effects such as inducement or discrimination (
Ferrara, 2023). This potential bias could exacerbate inequalities in the distribution of traffic resources, making it essential to consider the impacts on social fairness and ethical principles when designing and applying AIGC.
6 Conclusions
AIGC plays an increasingly important role in ITS, providing powerful tools and methods to enhance transportation efficiency, safety, and sustainability. In this review, we first introduced the key technologies of AIGC in the four generative tasks of text, image, audio, and multimodal content. We then systematically discussed the critical application solutions of AIGC’s three core technologies — dialog and reasoning, prediction and decision-making, and multimodal information generation — in the four subsystems of ITS. Finally, we examined the safety risks of false content, workforce transformation and human-machine collaboration, social cognition and emotional trust challenges, as well as legal and ethical issues posed by AIGC. This review makes significant contributions by innovatively proposing a cross-dimensional framework that systematically connects core AIGC technologies with ITS subsystems, enabling the observation and analysis of potential AIGC applications from multiple perspectives. This framework not only clarifies the technical pathways through which generative AI can promote ITS development but also serves as a valuable reference for subsequent research. Additionally, the paper comprehensively reviews the cutting-edge applications, potential directions, and future challenges of AIGC in ITS, systematically analyzing how generative AI addresses key issues in dialog and reasoning, prediction and decision-making, and multimodal generation. By discussing open challenges and potential future research directions, this review provides a critical literature reference for researchers, policymakers, and industry professionals, ultimately advancing the progress of ITS.
This research provides important theoretical support and practical guidance for government authorities, researchers, and decision-makers in the transportation industry. For traffic management departments, the systematic framework proposed in this study can help them select appropriate technological solutions based on specific needs, thereby addressing practical transportation issues more efficiently and promoting the innovation and implementation of intelligent transportation technologies. For researchers, this study not only systematically reviews the cutting-edge applications of AIGC in ITS but also identifies key gaps and technical challenges in current research, offering clear direction for future research and technological breakthroughs. For policymakers, this study delves into the social impacts and ethical issues of AIGC in ITS, providing a solid theoretical foundation for the formulation of sound regulations and policies, thus facilitating the standardized development and sustainable application of AIGC technologies in the transportation sector.
However, this review has several limitations. First, during the literature selection and classification process, the authors’ subjective judgment may have influenced the objectivity of the review. Specifically, in constructing the classification framework and selecting key studies, the authors’ research background, academic preferences, and understanding of the field’s major trends may have subtly affected the final selection of literature. Second, the division of AIGC’s three application technology areas and ITS’s four subsystems into 12 application scenarios may lead to insufficient coverage of certain subfields in the literature. Although efforts were made to address this issue by expanding the scope of keywords, the comprehensiveness and persuasiveness of the review may still be affected. Third, the review relies largely on theoretical research and lacks a thorough analysis of real-world applications and experimental data, so the practical implications of its conclusions may be limited. While some practical cases have been included, the integration of theory and practice remains inadequate, especially in discussing technical challenges, economic costs, and social impacts in real-world deployments. These limitations highlight the need for further refinement in future research to enhance the objectivity, scope, and practical relevance of the review.
Existing research mostly focuses on specific subsystems and technologies, such as traffic signal optimization, autonomous driving navigation, or passenger information services. While research targeting specific functions can address local issues to some extent, it cannot comprehensively meet the complex, multidimensional demands of ITS. Future research should focus on developing a universal large model that integrates the three AIGC technologies and coordinates the four subsystems of ITS to enhance overall efficiency and effectiveness. This model must possess powerful comprehensive capabilities, enabling it to handle the diverse needs of the road, vehicle, traveler, and management subsystems simultaneously. Furthermore, it should ensure that each subsystem operates efficiently and independently within its specific domain. Additionally, future studies should examine the technical challenges, economic costs, and social impacts of deploying such a universal large model in real-world scenarios, laying a solid theoretical and practical foundation for its widespread application in ITS. Through this research direction, ITS will move toward a more intelligent and integrated phase, providing robust technological support for the sustainable development of urban transportation.
The Author(s). This article is published with open access at link.springer.com and journal.hep.com.cn