Derivative-free reinforcement learning: a review
Hong QIAN, Yang YU
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize the methods of derivative-free reinforcement learning to date, and organize them along aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
reinforcement learning / derivative-free optimization / neuroevolution reinforcement learning / neural architecture search
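To make the sampling-and-updating framework mentioned in the abstract concrete, the minimal Python sketch below performs a cross-entropy-style direct policy search over a parameter vector. It is only an illustration, not an algorithm from the surveyed literature; the `evaluate_policy` surrogate, the population size, and the elite fraction are illustrative assumptions. In a real application, `evaluate_policy` would roll out the parameterized policy in the environment and return the episodic reward, using only function values and no gradients.

```python
import numpy as np

def evaluate_policy(theta):
    """Placeholder for a black-box policy evaluation.

    In practice this would roll out a policy parameterized by `theta`
    (e.g., a neural network) in the environment and return the average
    episodic return; no gradient with respect to `theta` is used.
    A simple surrogate objective stands in here so the sketch runs.
    """
    return -np.sum(theta ** 2)  # surrogate: higher is better, optimum at 0

def sampling_and_updating(dim, iterations=100, pop_size=20, elite_frac=0.2, sigma=0.5):
    """Minimal cross-entropy-style sampling-and-updating loop.

    Each iteration samples candidate parameter vectors around the current
    mean (exploration), evaluates them as black boxes, and moves the mean
    toward the best-performing samples (exploitation).
    """
    mean = np.zeros(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iterations):
        population = mean + sigma * np.random.randn(pop_size, dim)   # sampling
        returns = np.array([evaluate_policy(p) for p in population])
        elite = population[np.argsort(returns)[-n_elite:]]           # keep the best samples
        mean = elite.mean(axis=0)                                     # updating the search distribution
        sigma = float(elite.std(axis=0).mean()) + 1e-3                # adapt the search radius
    return mean

if __name__ == "__main__":
    best = sampling_and_updating(dim=10)
    print("best parameters found:", best)
```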