Advanced persistent threat detection via mining long-term features in provenance graphs

Fan XU, Qinxin ZHAO, Xiaoxiao LIU, Nan WANG, Meiqi GAO, Xuezhi WEN, Dalin ZHANG

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (10) : 1910809 DOI: 10.1007/s11704-024-40610-8
Information Security
RESEARCH ARTICLE

Abstract

Advanced Persistent Threats (APTs) are difficult to detect because of their “low-and-slow” attack patterns and frequent use of zero-day vulnerabilities. Extracting long-term features is therefore often crucial to detection. In this work, we propose a novel end-to-end APT detection framework named Long-Term Feature Association Provenance Graph Detector (LT-ProveGD). Specifically, LT-ProveGD encodes the contextual information of the dynamic provenance graph in a space-efficient manner while preserving its topological information. To counter “low-and-slow” attacks, LT-ProveGD uses an autoencoder with an integrated multi-head attention mechanism to extract long-term dependencies within the encoded representations. Furthermore, to facilitate the detection of previously unknown attacks, we leverage the Jenks natural breaks method, which enables detection without relying on attack-specific information. Extensive experiments on five widely used datasets, comparing against state-of-the-art attack detection methods, demonstrate the superior effectiveness of LT-ProveGD.
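
The thresholding idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each sample has already been reduced to a single anomaly score (for example an autoencoder reconstruction error) and that two classes (normal and anomalous) are wanted; the abstract does not state the scores or the number of classes actually used, and the function names `jenks_two_class_break` and `flag_anomalies` are illustrative only. For two classes, Jenks natural breaks amounts to choosing the split of the sorted scores that minimizes the total within-class sum of squared deviations.

```python
from typing import List


def sum_sq_dev(values: List[float]) -> float:
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def jenks_two_class_break(scores: List[float]) -> float:
    """Return the largest score of the lower class under a two-class
    Jenks natural breaks split (exhaustive search over split points)."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        raise ValueError("need at least two scores to split into two classes")
    best_cost = float("inf")
    best_idx = 1
    # Try every way to cut the sorted scores into a lower and an upper class.
    for i in range(1, len(ordered)):
        cost = sum_sq_dev(ordered[:i]) + sum_sq_dev(ordered[i:])
        if cost < best_cost:
            best_cost, best_idx = cost, i
    return ordered[best_idx - 1]


def flag_anomalies(scores: List[float]) -> List[bool]:
    """Flag scores that fall in the upper class (assumed to be anomalous)."""
    threshold = jenks_two_class_break(scores)
    return [s > threshold for s in scores]


# Hypothetical scores, not taken from the paper.
scores = [0.11, 0.09, 0.12, 0.10, 0.95, 0.13, 0.88]
threshold = jenks_two_class_break(scores)
flags = flag_anomalies(scores)
```

Because the break is derived from the score distribution itself, no labeled attack samples are needed to set the threshold, which is the property the abstract relies on for previously unknown attacks.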

Keywords

advanced persistent threats / provenance graph / long-term features extraction

Cite this article

Fan XU, Qinxin ZHAO, Xiaoxiao LIU, Nan WANG, Meiqi GAO, Xuezhi WEN, Dalin ZHANG. Advanced persistent threat detection via mining long-term features in provenance graphs. Front. Comput. Sci., 2025, 19(10): 1910809. DOI: 10.1007/s11704-024-40610-8

RIGHTS & PERMISSIONS

Higher Education Press
