DualMamba: a patch-based model with dual mamba for long-term time series forecasting

Guang-Yu WEI, Hui-Chuan HUANG, Zhi-Qing ZHONG, Wen-Long SUN, Yong-Hao WAN, Ai-Min FENG

Front. Comput. Sci., 2026, 20(2): 2002315. DOI: 10.1007/s11704-025-41293-5
Artificial Intelligence
RESEARCH ARTICLE


Abstract

The field of time series forecasting has seen widespread application of Transformer-based architectures. However, the quadratic complexity of the attention mechanism limits their performance in long-term time series forecasting. The introduction of the patching mechanism has alleviated this issue to some extent, but models still struggle to effectively unify intra-patch and inter-patch information. To address this problem, we propose DualMamba, a novel Mamba-based model for time series forecasting, which segments the time series into subseries-level patches and employs dual Mamba modules to capture local and global information separately. Specifically, patch-wise dependencies over the time series guide the local module, in which each patch is represented point-wise. Furthermore, we design an information fusion mechanism that integrates intra-patch and inter-patch information, effectively incorporating global information into local contexts. This allows the model to capture both local details and global trends. Extensive experiments on several real-world datasets demonstrate that DualMamba achieves state-of-the-art performance in most cases and exhibits reliable robustness, making it highly adaptable to various types of time series.
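
The full text is not openly accessible on this page, so the abstract above is the only available description of the architecture. The PyTorch sketch below is therefore only a hypothetical illustration of the general structure the abstract outlines (patch segmentation, a local intra-patch branch, a global inter-patch branch, and a fusion step), not the authors' implementation. The class names, patch length, concatenation-plus-linear fusion rule, and the MambaBlock stand-in (a residual GRU mixer used in place of a real selective state space layer such as the one in the mamba_ssm package) are all assumptions made for illustration.

# Hypothetical sketch of a patch-based dual-branch forecaster in the spirit of
# the abstract. NOT the authors' implementation: MambaBlock is a lightweight
# stand-in for a selective state space (Mamba) layer, and the patching and
# fusion details are assumptions made for illustration only.
import torch
import torch.nn as nn


class MambaBlock(nn.Module):
    """Placeholder sequence mixer with Mamba-like I/O: (B, L, D) -> (B, L, D).
    In practice this would be replaced by a real Mamba layer."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mix = nn.GRU(d_model, d_model, batch_first=True)  # stand-in mixer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (B, L, D)
        out, _ = self.mix(x)
        return self.norm(x + out)               # residual + norm


class DualBranchPatchForecaster(nn.Module):
    """Assumed skeleton: patch the series, model intra-patch (local) and
    inter-patch (global) dependencies with two sequence modules, fuse, project."""
    def __init__(self, seq_len=96, pred_len=96, patch_len=16, stride=16, d_model=64):
        super().__init__()
        assert (seq_len - patch_len) % stride == 0
        self.patch_len, self.stride = patch_len, stride
        self.n_patches = (seq_len - patch_len) // stride + 1
        self.embed_point = nn.Linear(1, d_model)          # point-wise embedding inside a patch
        self.embed_patch = nn.Linear(patch_len, d_model)  # patch-wise embedding
        self.local_mamba = MambaBlock(d_model)            # intra-patch (local) branch
        self.global_mamba = MambaBlock(d_model)           # inter-patch (global) branch
        self.fuse = nn.Linear(2 * d_model, d_model)       # assumed fusion: concat + linear
        self.head = nn.Linear(self.n_patches * d_model, pred_len)

    def forward(self, x):                       # x: (B, seq_len, C), channel-independent
        B, L, C = x.shape
        x = x.permute(0, 2, 1).reshape(B * C, L)                      # (B*C, L)
        patches = x.unfold(-1, self.patch_len, self.stride)           # (B*C, N, P)
        # Local branch: run the sequence mixer over the points of every patch.
        local = self.embed_point(patches.unsqueeze(-1))               # (B*C, N, P, D)
        local = self.local_mamba(local.flatten(0, 1))                 # (B*C*N, P, D)
        local = local.mean(dim=1).view(B * C, self.n_patches, -1)     # pool -> (B*C, N, D)
        # Global branch: run the sequence mixer over the patch tokens.
        glob = self.global_mamba(self.embed_patch(patches))           # (B*C, N, D)
        # Fusion: inject global context into each local patch representation.
        fused = self.fuse(torch.cat([local, glob], dim=-1))           # (B*C, N, D)
        y = self.head(fused.flatten(1))                               # (B*C, pred_len)
        return y.view(B, C, -1).permute(0, 2, 1)                      # (B, pred_len, C)


if __name__ == "__main__":
    model = DualBranchPatchForecaster()
    out = model(torch.randn(2, 96, 7))   # 2 samples, 96 steps, 7 variables
    print(out.shape)                     # torch.Size([2, 96, 7])

In an actual reproduction, the stand-in mixer would be replaced by a genuine Mamba layer and the fusion rule by whatever mechanism the paper specifies.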


Keywords

long-term time series forecasting / state space model / mamba / patching

Cite this article

Guang-Yu WEI, Hui-Chuan HUANG, Zhi-Qing ZHONG, Wen-Long SUN, Yong-Hao WAN, Ai-Min FENG. DualMamba: a patch-based model with dual mamba for long-term time series forecasting. Front. Comput. Sci., 2026, 20(2): 2002315. DOI: 10.1007/s11704-025-41293-5



RIGHTS & PERMISSIONS

Higher Education Press
