Parallel exploration via negatively correlated search

Peng YANG; Qi YANG; Ke TANG; Xin YAO

doi:10.1007/s11704-020-0431-0

PDF(876 KB)

Front. Comput. Sci. ›› 2021, Vol. 15 ›› Issue (5) : 155333. DOI: 10.1007/s11704-020-0431-0

RESEARCH ARTICLE

Parallel exploration via negatively correlated search

Author information +

History +

Abstract

Effective exploration is key to a successful search process. The recently proposed negatively correlated search (NCS) tries to achieve this by coordinated parallel exploration, where a set of search processes are driven to be negatively correlated so that different promising areas of the search space can be visited simultaneously. Despite successful applications of NCS, the negatively correlated search behaviors were mostly devised by intuition, while deeper (e.g., mathematical) understanding is missing. In this paper, a more principled NCS, namely NCNES, is presented, showing that the parallel exploration is equivalent to a process of seeking probabilistic models that both lead to solutions of high quality and are distant from previous obtained probabilistic models. Reinforcement learning, for which exploration is of particular importance, are considered for empirical assessment. The proposed NCNES is applied to directly train a deep convolution network with 1.7 million connection weights for playing Atari games. Empirical results show that the significant advantages of NCNES, especially on games with uncertain and delayed rewards, can be highly owed to the effective parallel exploration ability.

Keywords

evolutionary computation / reinforcement learning / exploration

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Peng YANG, Qi YANG, Ke TANG, Xin YAO. Parallel exploration via negatively correlated search. Front. Comput. Sci., 2021, 15(5): 155333 https://doi.org/10.1007/s11704-020-0431-0

This is a preview of subscription content, contact us for subscripton.

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Tang K, Yang P, Yao X. Negatively correlated search. IEEE Journal on Selected Areas in Communications, 2016, 34(3): 542–550 CrossRef Google scholar

[2]	Back T. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, 1996 CrossRef Google scholar

[3]	ˇ Crepinšek M, Liu S H, Mernik M. Exploration and exploitation in evolutionary algorithms: a survey. ACM Computing Surveys (CSUR), 2013, 45(3): 1–33 CrossRef Google scholar

[4]	Niu Q, Liu B, Jiang K, Wang H. An improved negatively correlated search inspired by Particle Swarm Optimization. Journal of Physics: Conference Series, IOP Publishing, 2019, 1267(1): 12–37 CrossRef Google scholar

[5]	Wang S, Yang X, Cai Z, Zou L, Gao S. An improved firefly algorithm enhanced by negatively correlated search mechanism. In: Proceedings of IEEE International Conference on Progress in Informatics and Computing. 2018, 67–72 CrossRef Google scholar

[6]	Chen H, Peng Q, Li X, Todo Y, Gao S. An efficient negative correlation gravitational search algorithm. In: Proceedings of IEEE International Conference on Progress in Informatics and Computing (PIC). 2018, 73–79 CrossRef Google scholar

[7]	Xu Z, Lei Z, Yang L, Li X, Gao S. Negative correlation learning enhanced search behavior in backtracking search optimization. In: Proceedings of the 10th International Conference on Intelligent Humanmachine Systems and Cybernetics. 2018, 310–314 CrossRef Google scholar

[8]	Yang P, Tang K, Lozano J A, Cao X. Path planning for single unmanned aerial vehicle by separately evolving waypoints. IEEE Transactions on Robotics, 2015, 31(5): 1130–1146 CrossRef Google scholar

[9]	Li G, Qian C, Jiang C, Lu X, Tang K. Optimization based layer-wise magnitude-based pruning for DNN compression. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2018, 2383–2389 CrossRef Google scholar

[10]	Niu Q, Jiang K, Liu B. A novel binary negatively correlated search for wind farm layout optimization. In: Proceedings of IEEE Congress on Evolutionary Computation. 2019, 191–196 CrossRef Google scholar

[11]	Jiao D, Yang P, Fu L, Ke L, Tang K. Optimal energy-delay scheduling for energy-harvesting WSNs with interference channel via negatively correlated Search. IEEE Internet of Things Journal, 2020, 7(3): 1690–1703 CrossRef Google scholar

[12]	Lin Y, Liu H, Xie G, Zhang Y. Time series forecasting by evolving deep belief network with negative correlation search. In: Proceedings of Chinese Automation Congress. 2018, 3839–3843 CrossRef Google scholar

[13]	Wierstra D, Schaul T, Glasmachers T, Sun Y, Peters J, Schmidhuber J. Natural evolution strategies. The Journal ofMachine Learning Research, 2014, 15(1): 949–980

[14]	Zhang L, Tang K, Yao X. Explicit planning for efficient exploration in reinforcement learning. Advances in Neural Information Processing Systems, 2019, 32: 7488–7497

[15]	Chrabaszcz P, Loshchilov I, Hutter F. Back to basics: benchmarking canonical evolution strategies for playing atari. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2018, 1419–1426 CrossRef Google scholar

[16]	He J, Yao X. From an individual to a population: an analysis of the first hitting time of population-based evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 2002, 6(5): 495–511 CrossRef Google scholar

[17]	Liu Y, Yao X. Ensemble learning via negative correlation. Neural Networks, 1999, 12(10): 1399–1404 CrossRef Google scholar

[18]	Yao X, Liu Y, Lin G. Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation, 1999, 3(2): 82–102 CrossRef Google scholar

[19]	Yang P, Tang K, Lu X. Improving estimation of distribution algorithm on multimodal problems by detecting promising areas. IEEE Transactions on Cybernetics, 2015, 45(8): 1438–1449 CrossRef Google scholar

[20]	Reynolds D A. Gaussian mixture models. Encyclopedia of Biometrics, 2009, 741: 659–663 CrossRef Google scholar

[21]	Schütze O, Coello C A C, Tantar A A, Tantar E, Bouvry P. EVOLVE—a Bridge Between Probability, Set Oriented Numerics and Evolutionary Computation. Springer Berlin Heidelberg, 2013 CrossRef Google scholar

[22]	Kailath T. The divergence and bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology, 1967, 15(1): 52–60 CrossRef Google scholar

[23]	Lei Y, Yang P, Tang K, Zhou D X. Optimal stochastic and online learning with individual iterates. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019, 5392–5402

[24]	Yang P, Tang K, Yao X. Turning high-dimensional optimization into computationally expensive optimization. IEEE Transactions on Evolutionary Computation, 2018, 22(1): 143–156 CrossRef Google scholar

[25]	Yang P, Tang K, Yao X. A parallel divide-and-conquer-based evolutionary algorithm for large-scale optimization. IEEE Access, 2019, 7: 163105–163118 CrossRef Google scholar

[26]	Qian C. Distributed pareto optimization for large-scale noisy subset selection. IEEE Transactions on Evolutionary Computation, 2020, 24(4): 694707 CrossRef Google scholar

[27]	Hou N, He F, Zhou Y, Chen Y. An efficient GPU-based parallel tabu search algorithm for hardware/software co-design. Frontiers of Computer Science, 2020, 14(5): 145316 CrossRef Google scholar

[28]	Qian C, Yu Y, Tang K, Jin Y, Yao X, Zhou Z H. On the effectiveness of sampling for evolutionary optimization in noisy environments. Parallel Problem Solving from Nature PPSN XIII, 2014, 8672: 302–311 CrossRef Google scholar

[29]	Qian C, Yu Y, Zhou Z H. Analyzing evolutionary optimization in noisy environments. Evolutionary Computation, 2018, 26(1): 141 CrossRef Google scholar

[30]	Zhou Z H, Feng J. Deep forest: towards an alternative to deep neural networks. In: Proceedings of International Joint Conference on Artificial Intelligence. 2017, 3553–3559 CrossRef Google scholar

[31]	Oh J, Singh S, Lee H. Value prediction network. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 6118–6128

[32]	Ha D, Schmidhuber J. Recurrent world models facilitate policy evolution. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 2450–2462

[33]	Zhong S, Liu Q, Zhang Z, Fu Q. Efficient reinforcement learning in continuous state and action spaces with Dyna and policy approximation. Frontiers of Computer Science, 2019, 13(1): 106126 CrossRef Google scholar

[34]	Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare MG, . Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529–533 CrossRef Google scholar

[35]	Mnih V, Badia A P, Mirza M, Graves A, Lillicrap T, Harley T,. Asynchronous methods for deep reinforcement learning. In: Proceedings of International Conference on Machine Learning, 2016, 1928–1937

[36]	Arulkumaran K, Deisenroth M P, Brundage M, Bharath A A. Deep reinforcement learning: a brief survey. IEEE Signal Processing Magazine, 2017, 34(6): 26–38 CrossRef Google scholar

[37]	Qian H, Yu Y. Derivative-free reinforcement learning: a review. Frontiers of Computer Science. 2020, DOI:10.1007/s11704-020-0241-4

[38]	Tang H, Houthooft R, Foote D, Stooke A, Chen X, Duan Y, Schulman J, DeTurck F, Abbeel P. Exploration: a study of count-based exploration for deep reinforcement learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 2753–2762

[39]	Raykar V, Agrawal P. Sequential crowdsourced labeling as an epsilongreedy exploration in a markov decision process. In: Proceedings of the 7th International Conference on Artificial Intelligence and Statistics. 2014, 832–840

[40]	Andrieu C, Freitas N D, Doucet A, Jordan M. An introduction toMCMC for machine learning. Machine Learning, 2003, 50(1–2): 5–43 CrossRef Google scholar

[41]	Plappert M, Houthooft R, Dhariwal P, Sidor S, Chen R, Chen X, Asfour T, Abbeel P, Andrychowicz M. Parameter space noise for exploration. In: Proceedings of International Conference on Machine learning. 2018

[42]	Pathak D, Agrawal P, Efros A A, Darrell T. Curiosity-driven exploration by self-supervised prediction. In: Proceedings of International Conference on Machine Learning. 2017, 2778–2787 CrossRef Google scholar

[43]	Lehman J, Stanley K O. Abandoning objectives: evolution through the search for novelty alone. Evolutionary Computation, 2011, 19(2): 189223 CrossRef Google scholar

[44]	Conti E, Madhavan V, Such F P, Lehman J, Stanley K, Clune J. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 5027–5038