Offline model-based reinforcement learning with causal structured world models

Zhengmao ZHU, Honglong TIAN, Xionghui CHEN, Kun ZHANG, Yang YU

Front. Comput. Sci., 2025, 19(4): 194347. DOI: 10.1007/s11704-024-3946-y
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims at learning good policies from historical data without interacting with the environment. Previous model-based offline RL methods employ a straightforward prediction scheme that maps states and actions directly to next-step states. However, such a prediction scheme tends to capture spurious relations induced by the preferences of the sampling policy behind the offline data. Ideally, the environment model should instead focus on causal influences, which facilitates learning an effective policy that generalizes well to unseen states. In this paper, we first provide theoretical results showing that causal environment models can outperform plain environment models in offline RL, by incorporating the causal structure into the generalization error bound. We also propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structured World Models (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly, and, as a result, outperforms both plain model-based offline RL algorithms and existing causal model-based offline RL algorithms.
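The abstract describes the idea of a causal structured world model only at a high level; the full method is given in the paper body. As a rough illustration of what such a model can look like, the sketch below predicts each next-state dimension from only its causal parents among the current state and action dimensions, with non-parents zeroed out by a binary mask. This is a minimal, hypothetical sketch, not the authors' implementation: the class name CausalWorldModel, the per-dimension MLP heads, and the hand-written mask are assumptions for illustration, and the mask is taken as given (e.g., recovered beforehand by conditional-independence testing on the offline data).

```python
# Illustrative sketch of a causal structured world model (not the paper's code).
# Each next-state dimension is predicted only from its causal parents among the
# current state and action dimensions, as indicated by a binary mask.
import torch
import torch.nn as nn


class CausalWorldModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int,
                 causal_mask: torch.Tensor, hidden: int = 128):
        """causal_mask: (state_dim, state_dim + action_dim) binary tensor;
        entry [i, j] = 1 iff input dimension j is a causal parent of
        next-state dimension i."""
        super().__init__()
        self.register_buffer("mask", causal_mask.float())
        in_dim = state_dim + action_dim
        # One small MLP per next-state dimension; non-parent inputs are masked out.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(state_dim)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)            # (batch, state_dim + action_dim)
        preds = [head(x * self.mask[i]) for i, head in enumerate(self.heads)]
        return torch.cat(preds, dim=-1)                   # predicted next state


# Usage with toy shapes: 3-dim state, 1-dim action, hand-specified causal mask.
mask = torch.tensor([[1, 0, 0, 1],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1]])
model = CausalWorldModel(state_dim=3, action_dim=1, causal_mask=mask)
next_state = model(torch.randn(8, 3), torch.randn(8, 1))  # shape (8, 3)
```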

Keywords

reinforcement learning / offline reinforcement learning / model-based reinforcement learning / causal discovery

Cite this article

Zhengmao ZHU, Honglong TIAN, Xionghui CHEN, Kun ZHANG, Yang YU. Offline model-based reinforcement learning with causal structured world models. Front. Comput. Sci., 2025, 19(4): 194347. DOI: 10.1007/s11704-024-3946-y



RIGHTS & PERMISSIONS

Higher Education Press
