Predicting the Unpredictable: A Reproducible Framework with Open Multi-Source Data for Irregular Non-commuting OD Flows

Hongmeng Cui, Bingfeng Si, Yueqing Li, Jingwen Xue, Dazhuang Chi

Urban Rail Transit, 2026, Vol. 12, Issue 1: 91-119.

Original Research Papers

Abstract

Real-time prediction of dynamic origin–destination (OD) passenger flows is essential for efficient passenger flow management in urban rail transit (URT) systems. Existing studies have primarily focused on commuting OD flows, which exhibit strong regularity and are supported by abundant data samples. In contrast, non-commuting OD flows, especially those generated by irregular passengers with limited historical data, are characterized by high stochasticity and data sparsity. They have received relatively little attention, and the studies that do address them often report unsatisfactory predictive performance. To address these challenges, this study proposes a novel real-time OD flow prediction framework for irregular non-commuting passengers based on multi-source data fusion and feature extraction. Specifically, individual-level spatiotemporal behavioral features are extracted from metro automatic fare collection (AFC) data using a density-based clustering algorithm. Land-use and geo-economic data are then integrated to characterize individual travel preferences and to construct a multidimensional behavioral indicator system. Building on these features, hierarchical clustering and machine learning models are employed to perform personalized destination prediction. Empirical experiments on Nanjing Metro data demonstrate that the proposed framework substantially improves prediction accuracy for non-commuting passengers and provides new insights into dynamic OD modeling. The results highlight the applicability and potential of the method for real-time passenger flow prediction in complex urban rail systems.
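To make the feature-extraction step more concrete, the following is a minimal sketch of density-based clustering applied to one passenger's tap-in times. It assumes a DBSCAN-style algorithm; the abstract does not name the specific algorithm or its parameters, so `eps`, `min_pts`, the sample times, and the two derived features (`n_clusters`, `noise_ratio`) are all hypothetical illustrations rather than the authors' indicator system.

```python
from typing import Dict, List, Optional

NOISE = -1  # label for trips that belong to no dense time window


def dbscan_1d(values: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each value with a cluster id (0, 1, ...) or NOISE.

    Plain DBSCAN on one dimension. Distances are linear, so times
    just before and just after midnight are NOT treated as close.
    """
    n = len(values)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nb)
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


def temporal_features(labels: List[int]) -> Dict[str, float]:
    """Summarize how regular a passenger's departure times are (hypothetical features)."""
    n_clusters = len({lab for lab in labels if lab != NOISE})
    noise_ratio = sum(1 for lab in labels if lab == NOISE) / len(labels)
    return {"n_clusters": n_clusters, "noise_ratio": noise_ratio}


# Hypothetical tap-in times for one passenger, in minutes after midnight.
times = [605, 612, 598, 1130, 1142, 1125, 845, 300]
labels = dbscan_1d(times, eps=20, min_pts=3)
features = temporal_features(labels)
print(labels, features)
```

In this toy input, the three trips around 10:00 and the three around 19:00 form two dense windows, while the two isolated trips are labeled as noise. A high noise ratio would be one plausible signal of the irregular, non-commuting behavior the framework targets.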

Keywords

Urban rail transit / Real-time OD passenger flow prediction / Non-commuting passengers / Multi-source data fusion / Machine learning

Cite this article

Hongmeng Cui, Bingfeng Si, Yueqing Li, Jingwen Xue, Dazhuang Chi. Predicting the Unpredictable: A Reproducible Framework with Open Multi-Source Data for Irregular Non-commuting OD Flows. Urban Rail Transit, 2026, 12(1): 91-119. DOI: 10.1007/s40864-025-00267-3


Funding

Fundamental Research Funds for the Central Universities (2022YJS067)

National Natural Science Foundation of China (72288101)

Beijing Natural Science Foundation (L211026)

RIGHTS & PERMISSIONS

The Author(s)
