Solving Inventory Management Problems through Deep Reinforcement Learning
Qinghao Wang, Yijie Peng, Yaodong Yang
Journal of Systems Science and Systems Engineering, 2022, Vol. 31, Issue 6: 677-689.
Inventory management (e.g., lost sales) is a central problem in supply chain management. Lost-sales inventory systems with lead times and complex cost functions are notoriously hard to optimize. Owing to their powerful capability for representing complex functions, deep reinforcement learning (DRL) methods can learn decisions through trial and error in the environment and have recently shown remarkable success in challenging sequential decision-making problems. This paper studies typical lost-sales and multi-echelon inventory systems. We first formulate the inventory management problem as a Markov decision process that accounts for ordering cost, holding cost, fixed cost, and lost-sales cost, and then develop a solution framework, DDLS, based on double deep Q-networks (DQN).
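To make the formulation concrete, the sketch below shows a minimal lost-sales inventory environment of the kind a DQN agent could be trained on: the state is on-hand inventory plus the pipeline of outstanding orders induced by the lead time, and the per-period cost combines ordering, fixed, holding, and lost-sales terms. All class and parameter names, the Poisson demand, and the default cost values are illustrative assumptions, not the paper's exact DDLS formulation.

```python
import numpy as np

class LostSalesInventoryEnv:
    """Minimal sketch of a lost-sales inventory MDP with a positive lead time.

    State: on-hand inventory plus the pipeline of outstanding orders.
    Cost: per-unit ordering, fixed ordering, holding, and lost-sales terms.
    Names and default values are illustrative assumptions only.
    """

    def __init__(self, lead_time=2, c_order=1.0, c_fixed=5.0,
                 c_hold=1.0, c_lost=9.0, demand_mean=5.0, seed=0):
        self.lead_time = lead_time
        self.c_order, self.c_fixed = c_order, c_fixed
        self.c_hold, self.c_lost = c_hold, c_lost
        self.demand_mean = demand_mean
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.on_hand = 0.0
        self.pipeline = [0.0] * self.lead_time  # orders placed but not yet arrived
        return self._state()

    def _state(self):
        return np.array([self.on_hand] + self.pipeline, dtype=np.float32)

    def step(self, order_qty):
        # Oldest outstanding order arrives; the new order joins the pipeline.
        arrival = self.pipeline.pop(0)
        self.pipeline.append(float(order_qty))
        self.on_hand += arrival

        # Demand is realized; unmet demand is lost rather than backlogged.
        demand = self.rng.poisson(self.demand_mean)
        sold = min(self.on_hand, demand)
        lost = demand - sold
        self.on_hand -= sold

        cost = (self.c_order * order_qty
                + (self.c_fixed if order_qty > 0 else 0.0)
                + self.c_hold * self.on_hand
                + self.c_lost * lost)
        # DRL agents maximize reward, so the per-period cost is negated.
        return self._state(), -cost, False, {}
```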
In the lost-sales scenario, numerical experiments demonstrate that increasing the fixed ordering cost distorts ordering behavior, whereas our DQN solutions with an improved state space remain flexible across cost parameter settings that traditional heuristics find challenging to handle. We then study the effectiveness of our approach in multi-echelon scenarios. Empirical results demonstrate that parameter sharing can significantly improve DRL performance: as a form of information sharing, parameter sharing among multi-echelon suppliers promotes collaboration among agents and improves decision-making efficiency. Our research further demonstrates the potential of DRL for solving complex inventory management problems.
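As an illustration of the parameter-sharing idea, the sketch below uses a single Q-network whose weights are shared by all echelon agents, with a one-hot echelon index appended to each agent's local observation so one set of parameters serves every supplier. The architecture, dimensions, and names are assumptions for illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class SharedQNetwork(nn.Module):
    """One Q-network shared by all echelon agents (parameter sharing).

    Each agent feeds its local observation plus a one-hot echelon index,
    so a single set of weights serves every echelon. A sketch of the idea,
    not the paper's exact architecture.
    """

    def __init__(self, obs_dim, n_echelons, n_actions, hidden=128):
        super().__init__()
        self.n_echelons = n_echelons
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_echelons, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, echelon_idx):
        # echelon_idx: LongTensor of shape (batch,) identifying each agent.
        one_hot = torch.nn.functional.one_hot(echelon_idx, self.n_echelons).float()
        return self.net(torch.cat([obs, one_hot], dim=-1))

# Usage sketch: all echelons query (and are trained through) the same weights.
q_net = SharedQNetwork(obs_dim=4, n_echelons=3, n_actions=10)
obs = torch.zeros(3, 4)        # one local observation per echelon
idx = torch.arange(3)          # echelon identities 0, 1, 2
greedy_orders = q_net(obs, idx).argmax(dim=-1)
```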
Keywords: Inventory management / deep reinforcement learning / parameter sharing / lost sales / multi-echelon models