Human-machine interactive streaming anomaly detection by online self-adaptive forest
Qingyang LI, Zhiwen YU, Huang XU, Bin GUO
Anomaly detectors distinguish normal from abnormal data, usually by computing an anomaly score for each instance and ranking the instances by score. A static unsupervised streaming anomaly detector has difficulty adjusting its anomaly score calculation dynamically. In real scenarios, anomaly detection often needs to be regulated by human feedback, which helps tune the detector. In this paper, we propose a human-machine interactive streaming anomaly detection method, named ISPForest, which can be adaptively updated online under the guidance of human feedback. In particular, the feedback is used to adjust both the anomaly score calculation and the structure of the detector, ideally yielding more accurate anomaly scores in the future. Our main contribution is an improved tree-based streaming anomaly detection model that can be updated online in terms of both anomaly score calculation and model structure. Our approach is instantiated for the powerful class of tree-based streaming anomaly detectors, and we conduct experiments on a range of benchmark datasets. The results demonstrate that incorporating feedback improves the performance of anomaly detectors with little human effort.
anomaly detection / human-machine interaction / human feedback / random space tree / ensemble method
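The idea of letting feedback adjust the anomaly score calculation can be illustrated with a minimal sketch. This is not the ISPForest algorithm, whose details are not given here. It swaps in a generic multiplicative-weights update over an ensemble, and the class name `FeedbackWeightedEnsemble`, the `learning_rate` parameter, and the assumption that each ensemble member outputs a score in [0, 1] are all hypothetical choices made for illustration.

```python
import math


class FeedbackWeightedEnsemble:
    """Toy ensemble (hypothetical, not ISPForest): combines per-member
    anomaly scores using weights that are adjusted by human feedback."""

    def __init__(self, n_members, learning_rate=0.5):
        # Start with uniform weights over the ensemble members.
        self.weights = [1.0 / n_members] * n_members
        self.learning_rate = learning_rate

    def score(self, member_scores):
        # Final anomaly score: weighted sum of member scores (each in [0, 1]).
        return sum(w * s for w, s in zip(self.weights, member_scores))

    def feedback(self, member_scores, is_anomaly):
        # A human label sets the target score: 1 for anomaly, 0 for normal.
        target = 1.0 if is_anomaly else 0.0
        # Multiplicative-weights update: members far from the target lose weight.
        updated = [
            w * math.exp(-self.learning_rate * abs(s - target))
            for w, s in zip(self.weights, member_scores)
        ]
        total = sum(updated)
        # Renormalize so the weights sum to 1 again.
        self.weights = [w / total for w in updated]
```

In this sketch, a member that scored a confirmed anomaly highly gains influence, so the same instance receives a higher combined score after the feedback is applied. Adjusting the structure of the detector, which the abstract also mentions, is not covered.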
Qingyang Li received the bachelor’s degree from Northwestern Polytechnical University, China in 2016. She is currently a PhD student with the School of Computer Science, Northwestern Polytechnical University, China. Her research interests include ubiquitous computing, machine learning, and human-computer interaction.
Zhiwen Yu received the PhD degree in computer science from Northwestern Polytechnical University, China in 2005. He is currently a Professor and the Dean of the School of Computer Science, Northwestern Polytechnical University, China. He was an Alexander Von Humboldt Fellow with Mannheim University, Germany and a Research Fellow with Kyoto University, Japan. His research interests include ubiquitous computing, HCI, and mobile sensing and computing.
Huang Xu received the PhD degree in computer science from Northwestern Polytechnical University, China in 2019. His primary research interests include data mining and ubiquitous computing. He has published in refereed conference proceedings, including ACM SIGKDD, IJCAI, and IEEE ICDM.
Bin Guo received the PhD degree in computer science from Keio University, Japan in 2009. He was a Postdoctoral Researcher with the Institut TELECOM SudParis, France. He is currently a Professor with Northwestern Polytechnical University, China. His research interests include ubiquitous computing, mobile crowd sensing and computing, and HCI.