Improving deep reinforcement learning by safety guarding model via hazardous experience planning
Pai PENG, Fei ZHU, Xinghong LING, Peiyao ZHAO, Quan LIU