Solving Inventory Management Problems through Deep Reinforcement Learning
Qinghao Wang, Yijie Peng, Yaodong Yang
Journal of Systems Science and Systems Engineering, 2022, Vol. 31, Issue 6: 677-689.
Inventory management (e.g., lost sales) is a central problem in supply chain management. Lost-sales inventory systems with lead times and complex cost functions are notoriously hard to optimize. Owing to their powerful capability for representing complex functions, deep reinforcement learning (DRL) methods can learn decisions through trial and error in the environment and have recently shown remarkable success in challenging sequential decision-making problems. This paper studies typical lost-sales and multi-echelon inventory systems. We first formulate the inventory management problem as a Markov decision process that accounts for ordering cost, holding cost, fixed cost, and lost-sales cost, and then develop a solution framework, DDLS, based on double deep Q-networks (DQN).
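To make the formulation concrete, the sketch below shows a minimal lost-sales inventory environment of the kind a DQN agent could be trained on: the state is on-hand inventory plus the pipeline of outstanding orders induced by the lead time, and the per-period cost combines ordering, fixed, holding, and lost-sales terms. All class and parameter names, the Poisson demand, and the default cost values are illustrative assumptions, not the paper's exact DDLS formulation.

```python
import numpy as np

class LostSalesInventoryEnv:
    """Minimal sketch of a lost-sales inventory MDP with a positive lead time.

    State: on-hand inventory plus the pipeline of outstanding orders.
    Cost: per-unit ordering, fixed ordering, holding, and lost-sales terms.
    Names and default values are illustrative assumptions only.
    """

    def __init__(self, lead_time=2, c_order=1.0, c_fixed=5.0,
                 c_hold=1.0, c_lost=9.0, demand_mean=5.0, seed=0):
        self.lead_time = lead_time
        self.c_order, self.c_fixed = c_order, c_fixed
        self.c_hold, self.c_lost = c_hold, c_lost
        self.demand_mean = demand_mean
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.on_hand = 0.0
        self.pipeline = [0.0] * self.lead_time  # orders placed but not yet arrived
        return self._state()

    def _state(self):
        return np.array([self.on_hand] + self.pipeline, dtype=np.float32)

    def step(self, order_qty):
        # Oldest outstanding order arrives; the new order joins the pipeline.
        arrival = self.pipeline.pop(0)
        self.pipeline.append(float(order_qty))
        self.on_hand += arrival

        # Demand is realized; unmet demand is lost rather than backlogged.
        demand = self.rng.poisson(self.demand_mean)
        sold = min(self.on_hand, demand)
        lost = demand - sold
        self.on_hand -= sold

        cost = (self.c_order * order_qty
                + (self.c_fixed if order_qty > 0 else 0.0)
                + self.c_hold * self.on_hand
                + self.c_lost * lost)
        # DRL agents maximize reward, so the per-period cost is negated.
        return self._state(), -cost, False, {}
```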
In the lost-sales scenario, numerical experiments demonstrate that increasing the fixed ordering cost distorts ordering behavior, whereas our DQN solutions with an improved state space remain flexible across cost parameter settings that traditional heuristics find challenging to handle. We then study the effectiveness of our approach in multi-echelon scenarios. Empirical results demonstrate that parameter sharing can significantly improve DRL performance: as a form of information sharing, parameter sharing among multi-echelon suppliers promotes collaboration among agents and improves decision-making efficiency. Our research further demonstrates the potential of DRL for solving complex inventory management problems.
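As an illustration of the parameter-sharing idea, the sketch below uses a single Q-network whose weights are shared by all echelon agents, with a one-hot echelon index appended to each agent's local observation so one set of parameters serves every supplier. The architecture, dimensions, and names are assumptions for illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class SharedQNetwork(nn.Module):
    """One Q-network shared by all echelon agents (parameter sharing).

    Each agent feeds its local observation plus a one-hot echelon index,
    so a single set of weights serves every echelon. A sketch of the idea,
    not the paper's exact architecture.
    """

    def __init__(self, obs_dim, n_echelons, n_actions, hidden=128):
        super().__init__()
        self.n_echelons = n_echelons
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_echelons, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, echelon_idx):
        # echelon_idx: LongTensor of shape (batch,) identifying each agent.
        one_hot = torch.nn.functional.one_hot(echelon_idx, self.n_echelons).float()
        return self.net(torch.cat([obs, one_hot], dim=-1))

# Usage sketch: all echelons query (and are trained through) the same weights.
q_net = SharedQNetwork(obs_dim=4, n_echelons=3, n_actions=10)
obs = torch.zeros(3, 4)        # one local observation per echelon
idx = torch.arange(3)          # echelon identities 0, 1, 2
greedy_orders = q_net(obs, idx).argmax(dim=-1)
```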
Keywords: Inventory management / deep reinforcement learning / parameter sharing / lost sales / multi-echelon models