Derivative-free reinforcement learning: a review

Hong QIAN, Yang YU

Front. Comput. Sci. ›› 2021, Vol. 15 ›› Issue (6): 156336. DOI: 10.1007/s11704-020-0241-4

REVIEW ARTICLE

Abstract

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize the derivative-free reinforcement learning methods to date, and organize them along aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
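
To make the sampling-and-updating framework mentioned above concrete, the following is a minimal sketch of a basic evolution strategy applied to direct policy search. It assumes a generic black-box episodic-return oracle `evaluate(theta)`; the function names, toy reward, and hyperparameters are illustrative placeholders rather than anything prescribed by the article.

```python
import numpy as np

def evaluate(theta):
    # Hypothetical black-box episodic-return oracle: in practice this would
    # roll out the policy parameterized by theta in the environment and
    # return the (possibly noisy) cumulative reward. A toy quadratic stands in here.
    return -np.sum((theta - 1.0) ** 2)

def es_policy_search(dim=10, pop_size=20, sigma=0.1, lr=0.02, iters=200):
    """Sampling-and-updating loop of a basic evolution strategy:
    sample perturbed parameter vectors (exploration), score them with the
    return oracle, and move the mean toward high-return samples (exploitation)."""
    theta = np.zeros(dim)                      # current policy parameters
    for _ in range(iters):
        eps = np.random.randn(pop_size, dim)   # sampling step: Gaussian perturbations
        returns = np.array([evaluate(theta + sigma * e) for e in eps])
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta += lr / (pop_size * sigma) * eps.T @ advantages  # updating step
    return theta

if __name__ == "__main__":
    best = es_policy_search()
    print("final return:", evaluate(best))
```

The loop never queries a gradient of the return with respect to the parameters; it only compares sampled returns, which is what makes the scheme derivative-free.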

Keywords

reinforcement learning / derivative-free optimization / neuroevolution reinforcement learning / neural architecture search

Cite this article

Hong QIAN, Yang YU. Derivative-free reinforcement learning: a review. Front. Comput. Sci., 2021, 15(6): 156336. DOI: 10.1007/s11704-020-0241-4



RIGHTS & PERMISSIONS

Higher Education Press
