Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach
Yizhou YANG, Longde CHEN, Sha LIU, Lanning WANG, Haohuan FU, Xin LIU, Zuoning CHEN
Reinforcement learning (RL) is gaining importance in automating penetration testing as it reduces human effort and increases reliability. Nonetheless, given the rapidly expanding scale of modern network infrastructure, the limited testing scale and monotonous strategies of existing RL-based automated penetration testing methods leave them less effective in practice. In this paper, we present CLAP (Coverage-Based Reinforcement Learning to Automate Penetration Testing), an RL penetration testing agent that provides comprehensive network security assessments with diverse adversary testing behaviours on a massive scale. CLAP employs a novel neural architecture, the coverage mechanism, to address the enormous and growing action spaces of large networks. It also utilizes a Chebyshev decomposition critic to identify distinct adversary strategies and to strike a balance among them. Experimental results across various scenarios demonstrate that CLAP outperforms state-of-the-art methods, further reducing attack operations by nearly 35%. CLAP also improves training efficiency and stability, and can effectively perform penetration testing over large-scale networks with up to 500 hosts. Additionally, the proposed agent is able to discover Pareto-dominant strategies that are both diverse and effective in achieving multiple objectives.
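For readers unfamiliar with decomposition-based critics, the sketch below shows weighted Chebyshev scalarization, the standard construction that such a critic builds on: a vector of per-objective returns is collapsed into a single scalar by the largest weighted deviation from an ideal point. The weight vector, ideal point, and function names here are illustrative assumptions for exposition, not CLAP's actual implementation.

import numpy as np

def chebyshev_scalarize(returns, weights, ideal_point):
    # returns:     per-objective returns achieved by a policy, shape (m,)
    # weights:     preference weights over the m objectives (non-negative)
    # ideal_point: assumed best attainable return per objective, shape (m,)
    # The Chebyshev value is the largest weighted gap to the ideal point;
    # minimizing it pushes all objectives towards the ideal simultaneously.
    return np.max(weights * np.abs(ideal_point - returns))

# Sweeping the weight vector traces out different trade-offs, which is how
# a family of behaviour-diverse policies can be compared (values assumed).
weights = np.array([0.7, 0.3])            # e.g., stealth vs. speed
ideal = np.array([1.0, 1.0])              # assumed per-objective optima
print(chebyshev_scalarize(np.array([0.8, 0.5]), weights, ideal))  # 0.15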
network security / penetration testing / reinforcement learning / artificial intelligence
Yizhou Yang received his PhD degree from The Australian National University, Australia. He is currently an assistant researcher at Zhongguancun Laboratory, China. His research interests include deep learning applications, network security, and wireless communication systems
Longde Chen is currently a researcher at the National Research Centre of Parallel Computer Engineering and Technology, China. His research interests include parallel computing, large model software applications, and cybersecurity
Sha Liu is a doctoral student at the University of Science and Technology of China. He is currently an associate researcher at the National Research Centre of Parallel Computer Engineering and Technology, China. His research interests include parallel computing and large model software applications
Lanning Wang is currently a professor at the Faculty of Geographical Science, Beijing Normal University, China. His research focuses on climate model and earth system model development, specifically in the areas of the dynamical core of the atmospheric model, model coupling technology, and high-performance computing technology for earth system models. He led the development of the Beijing Normal University – Earth System Model (BNU-ESM) and actively participated in the Coupled Model Intercomparison Project Phase 5 (CMIP5). He received the ACM Gordon Bell Prize in 2016
Haohuan Fu (Senior Member, IEEE) received the PhD degree in computing from Imperial College London. He is currently a professor with the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science, Tsinghua University, China. He is also the deputy director of the National Supercomputing Center, China. His research interests include high-performance computing in earth and environmental sciences, computer architectures, performance optimizations, and programming tools in parallel computing. He received the ACM Gordon Bell Prize in 2016, 2017, and 2021, and the Most Significant Paper Award from FPL in 2015
Xin Liu received her PhD degree from PLA Information Engineering University, China, in 2006. She is currently a research fellow at the National Research Centre of Parallel Computer Engineering and Technology, China. Her research interests include parallel algorithms and parallel application software
Zuoning Chen is currently a research fellow at the National Research Centre of Parallel Computer Engineering and Technology, China. Her research interests include operating systems and system security