Performance issue monitoring, identification and diagnosis of SaaS software: a survey

Rui WANG , Xiangbo TIAN , Shi YING

Front. Comput. Sci., 2025, Vol. 19, Issue (1): 191201
DOI: 10.1007/s11704-023-2701-0

REVIEW ARTICLE

Abstract

SaaS (Software-as-a-Service) is a service model provided by cloud computing. It has high QoS (Quality of Service) requirements because of the way it delivers software as a service. However, manual identification and diagnosis of performance issues is typically expensive and laborious because of the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research effort has been devoted to automatically identifying and diagnosing performance issues of SaaS software. In this survey, we comprehensively review methods for automatically identifying and diagnosing performance issues of SaaS software. We divide them into three steps according to their function: performance log generation, performance issue identification and performance issue diagnosis. We then comprehensively review these methods along their development history. Meanwhile, we give our proposed solution for each step. Finally, the effectiveness of our proposed methods is demonstrated by experiments.


Keywords

SaaS software / performance log generation / performance issue identification / performance issue diagnosis



1 Introduction

SaaS (Software-as-a-Service) provides software as a service to consumers [1-5], and its key success factor is QoS (Quality of Service). Among QoS attributes [6-10], performance is the most important one, as it directly affects user experience and satisfaction with SaaS software. In the running environment provided by cloud computing, if the average response time of a SaaS software service is too long and thus violates the Service Level Objective (SLO) [11], the software is considered to be suffering a performance issue [12]. When this happens, it tends to cause consumer dissatisfaction and may even cause large economic losses [13,14]. Therefore, operation and maintenance managers need to discover performance issues in a timely manner and take measures to ensure the normal operation of the system. However, because SaaS software is deployed in a cloud platform environment, applications and services at all levels interact frequently, so the various components of the system generate massive amounts of data, much of it multidimensional and noisy. This not only increases the difficulty of identifying and diagnosing SaaS software performance problems in the traditional way, but also reduces the timeliness and accuracy of identification and diagnosis. As a result, manual identification and diagnosis of performance issues is typically expensive and relies on empirical knowledge, owing to the complexity of the application software and the dynamic nature of the deployment environment, and it is vital and valuable to automatically identify and diagnose performance issues of SaaS software. In recent years, with the rapid development of machine learning and deep learning, more and more auto-learning methods have been proposed; they are data-driven and well suited to automatically learning the data distribution from a large amount of data. Based on this understanding, machine learning methods are very suitable for solving the performance identification and diagnosis problem. However, this problem is non-trivial because several challenges exist in automatically identifying and diagnosing performance issues of SaaS software.

(1) Insufficient information: there are many reasons that lead SaaS software to performance issues, such as problems in the software itself, a lack of resources in the running environment, bursts of user requests and faults in third-party services. Performance issue identification and diagnosis need various kinds of information, including information about the software and its running environment. However, this information is often insufficient, which affects accuracy and timeliness.

(2) Large data: traditional log-based performance issue identification and diagnosis methods usually need domain-specific background knowledge obtained by managers' analysis of logs. However, the frequent interaction between the different components of SaaS software can generate millions of log records containing a lot of noise, which makes it difficult to analyze the log data.

(3) Complex interaction: with the rapid development of the microservice architecture, SaaS software is becoming more and more complex. The resulting complex interactions pose challenges to performance issue identification and diagnosis methods.

(4) Multi-dimensional information: information of different dimensions is collected from the SaaS software. These data describe the running state of the SaaS software from different angles. How to reasonably use these data is very challenging.

To tackle these challenges, a large amount of research has been conducted in this field, resulting in a rich body of papers and methods that address the problem from different angles. To the best of our knowledge, several surveys have summarized the related research, such as [15-17]. However, most of them cover only a part of performance issue identification and diagnosis. For example, [15,16] concentrate on performance issue identification while [17] focuses on log parsing [18-20]. Little effort has been made to systematically summarize the differences and connections among them.

In this paper, we try to fill this gap by comprehensively reviewing performance issue identification and diagnosis methods for SaaS software. Specifically, as shown in Fig.1, we divide these methods into three steps according to their function: performance log generation, performance issue identification and performance issue diagnosis. The detailed information of the three steps is summarized in Tab.1.

Performance log generation provides the raw material for the analysis of SaaS software operation through log scraping or system monitoring. Performance issue identification judges whether the SaaS software is suffering a performance issue using metric-based, log-based or behavior-based methods. Performance issue diagnosis aims to find out the reason for a performance issue. We then review the different methods of the three steps and their main characteristics along their development history. Meanwhile, we give our proposed solution for each step. Finally, the effectiveness of our proposed methods is shown by experiments.

The rest of this paper is organized as follows. Section 2 gives the preliminaries of this paper. Then, we review performance log generation, performance issue identification and performance issue diagnosis, and give our proposed solution for each step, in Sections 3, 4 and 5, respectively. Afterwards, we present experiments on our proposed methods in Section 6. Finally, we conclude this paper and give future directions in Section 7.

2 Preliminaries

In this section, we list the notations and some basic definitions used in subsequent sections. Specifically, Tab.2 illustrates the notations used in this paper.

Then, some definitions about log are given as follows.

Definition 1 Performance log. It refers to the time series log data used to support the performance issue identification and diagnosis.

Definition 2 Status log. It refers to the log recording the performance-related information of the system at run time.

Definition 3 Event log. It refers to the log recording context execution information.

Definition 4 Error log. It refers to the log recording information about exceptions.

The specific information of the different types of performance logs is shown in Tab.3. Next, we give some definitions about performance issues.

Definition 5 Average Response Time (ART). It reflects the time that users can expect to wait for a response from the software. The specific definition is given as follows:

ART_{\Delta t} = E[T_r^{\Delta t}] = \sum_{i=1}^{n} (t_{r_i} - t_{s_i}) / n,     (1)

where T_r^{\Delta t} represents the response time of a request r in the time interval \Delta t; n denotes the number of requests answered in the time interval \Delta t; t_{r_i} and t_{s_i} represent the response (completion) time and the arrival time of the ith request, respectively.

Definition 6 Slow-to-All-requests-ratio (SARatio). It reflects the current state of the system, i.e., the response speed for users' requests. The specific definition is as follows:

SARatio_{\Delta t} = \frac{|Slow_{\Delta t}|}{|Slow_{\Delta t}| + |Normal_{\Delta t}|},     (2)

where |Slow_{\Delta t}| and |Normal_{\Delta t}| represent the numbers of slow requests and normal requests in the time interval \Delta t, respectively. In this paper, we consider the system to be suffering a performance issue when SARatio is greater than 5%.
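The two indicators can be computed directly from the request records of one monitoring window. The following is a minimal sketch assuming each record carries the request arrival time and the response completion time; the "slow" threshold and the field names are illustrative, not values taken from this paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    t_arrival: float   # time the request was received (seconds)
    t_response: float  # time the response was returned (seconds)

def art(requests: List[Request]) -> float:
    """Average Response Time over one time interval (Definition 5)."""
    return sum(r.t_response - r.t_arrival for r in requests) / len(requests)

def sa_ratio(requests: List[Request], slow_threshold: float = 1.0) -> float:
    """Slow-to-All-requests ratio over one time interval (Definition 6)."""
    slow = sum(1 for r in requests if r.t_response - r.t_arrival > slow_threshold)
    return slow / len(requests)

# A window is flagged as a performance issue when SARatio exceeds 5%.
window = [Request(0.0, 0.3), Request(1.0, 2.8), Request(1.5, 1.7)]
print(art(window), sa_ratio(window) > 0.05)
```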

3 Performance log generation

Log data is an important record of and analysis basis for software operation [21,22]. Log data is divided into two classes according to its format: text data and metric data. The former is often semi-structured text [23,24], such as traces and program logs [25], while the latter is often time series data consisting of the value of a metric and its corresponding collection time. The two types of log data have different collection methods: the former usually uses log scraping to obtain data, while the latter usually utilizes system monitoring. The two methods of obtaining log data are introduced as follows.

3.1 Log scraping

In the operation and maintenance of large-scale software, determining what to log, where to log and how to log is still a problem that plagues developers and managers [26-28]. In recent years, there have been some related studies, and Fig.2 shows their categories. Some studies paid attention to the macro development of logging. Specifically, Gujral et al. [29] analyzed the logging questions on six popular technical Q&A websites in 2021. This study demonstrates that logging problems exist in various fields, such as databases, networking and mobile computing. They utilized Latent Dirichlet Allocation (LDA) [30-36] for semantic analysis, which revealed the development trend of log-related topics.

Some studies concentrated on what to log. Specifically, Yuan et al. [37] used a case-study method and described the results of log recording on five large-scale software systems in 2012. By analyzing 250 randomly sampled reported failures, the authors found that about half of them could not be diagnosed using the existing log data. However, most of these undiagnosed failures could be exposed through a common set of fault modes, such as system call return errors. This means that if the relevant information is logged, the diagnosis of these failures can be significantly simplified. Fu et al. [38] systematically studied real development logs in 2014 and obtained six valuable findings, such as the types of code snippets that are logged, the factors influencing logging decisions and the feasibility of automatic logging.

Some studies focused on where to log. Zhu et al. [39] proposed a "learning to log" framework to guide logging work in 2015. A logging advice tool called LogAdvisor was developed, which can automatically learn where to log from existing logging instances and provide feasible suggestions for developers. It identifies the important factors influencing where logging should be done, extracts structural, textual and syntactic features, and then utilizes machine learning and noise-handling techniques to accurately recommend logging locations. Li et al. [40] conducted a comprehensive study in 2020 to uncover guidelines on code-block-level logging locations. They combined deep learning and manual investigation to analyze logging statements and their surrounding code. In their experiments, both the syntactic-based and the semantic-based deep learning model achieved good performance. However, in the cross-system setting, the syntactic-based model maintained good performance while the performance of the semantic-based model was poor. These experimental results suggest that hidden syntactic logging guidelines hold across systems. Gholamian et al. [41] proposed a method for predicting log statements using source code clones and natural language processing (NLP) in 2021. It first utilizes source code clones to predict the location and code of a log statement. Then, it predicts the description of a log statement with an NLP model. Finally, it automatically checks the other detailed information of a log statement. Experimental results show the effectiveness and good performance of log-aware clone detection for automated log location and description prediction.

Other studies paid attention to how to log. Specifically, Cinque et al. [42] argue that event logs have been widely used to analyze the failure behavior of various systems over the past thirty years. However, the implementation of logging mechanisms lacks a systematic approach, and the logs collected when reporting software failures are often inaccurate, which threatens the effectiveness of log-based failure analysis. Therefore, they proposed a rule-based logging method in 2013 to address these limitations of existing logging mechanisms, and experimental results show its effectiveness for analyzing software failure problems. Li et al. [43] proposed to guide log revisions by learning from evolution history in 2019. This method addresses the problem that logging code has to evolve along with bug fixes and feature updates. In addition, they argue that such revisions are valuable since log statements with similar context are common. Based on this idea, they designed and implemented an automatic tool, LogTracker. This tool learns log revision rules by mining the correlation between log context and modifications, and then recommends candidate log revisions based on the learned rules. Experimental results show its effectiveness. Zhang et al. [44] quantitatively and qualitatively studied production and test logging statements in 2022. Their study reveals the differences and relationships between production and test logging statements and the reasons for using test logs.

Based on the existing research, several log scraping tools are available, such as Fluentd, Splunk [45] and the Flume framework [46]. Fluentd uses regular expressions to match target tags and puts the collected information in a specified location. Splunk assigns a specific token to each log file and uses it to search the specified file. The Flume framework, illustrated in Fig.3, is a good option for log scraping. As shown in Fig.3, there are two kinds of channels: a file-based channel and a memory-based channel. The file-based channel avoids data loss but its efficiency is low; therefore, this channel is usually backed by the HBase [47-53] database for storage. The memory-based channel is efficient but does not protect against data loss; therefore, Kafka [54-57] is usually used to store log events, and the overall framework of Kafka is shown in Fig.4.
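As a rough illustration of the scraping idea (not of how Fluentd or Flume are implemented internally), the sketch below tails an application log file, keeps only the lines matching a pattern and forwards them to a Kafka topic with the kafka-python client; the file path, pattern and topic name are placeholders.

```python
import re
import time
from kafka import KafkaProducer  # pip install kafka-python

LOG_FILE = "/var/log/webapp/app.log"             # placeholder path
PATTERN = re.compile(r"response_time=(\d+)ms")   # placeholder pattern

producer = KafkaProducer(bootstrap_servers="broker:9092")

def follow(path):
    """Yield new lines appended to the file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

for line in follow(LOG_FILE):
    if PATTERN.search(line):
        # Forward matching performance log events to the collection pipeline.
        producer.send("perf-logs", line.encode("utf-8"))
```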

3.2 System monitoring

System monitoring aims to compensate for missing performance-related information, which is helpful for the analysis of SaaS [58] software. To the best of our knowledge, some commercial cloud platforms have their own monitoring tools, such as the CloudWatch tool of Amazon EC2 [59,60], the Ceilometer tool of OpenStack and the AzureWatch [61] tool of Microsoft Azure. However, these cloud-platform monitoring systems are usually not used to monitor other systems, since they are closely tied to their own cloud solutions. Given this circumstance, some third-party tools have been developed to monitor service compositions across different cloud platforms, such as Nimsoft [62], PCMONS [63], Ganglia [64], Nagios [65] and Zabbix [66].

In addition, research on cloud monitoring has been conducted, concentrating on monitoring technologies, platforms and frameworks. Next, we introduce this research.

Andreozzi et al. [67] proposed a grid system monitoring service, GridICE, in 2005, which can provide fault monitoring reports, service level agreement violation reports and user-defined event mechanisms. However, it can only monitor basic physical infrastructure that changes slowly. To solve this problem, Clayman et al. [25] proposed Lattice, a monitoring framework for rapidly changing virtual resource environments. This framework overcomes the limitation of existing monitoring frameworks such as Ganglia, Nagios and GridICE, which can only monitor slowly changing physical infrastructure.

To solve the real-time monitoring problem of large-scale systems, König et al. [68] proposed a layered monitoring architecture based on distributed monitors in 2012. It can perform real-time monitoring of large-scale systems with strong scalability to meet analysis needs.

To manage the information of multi-tenant cloud platforms in a reliable and scalable way, Povedano-Molina et al. [69] proposed DARGOS, a resource management and monitoring framework for distributed cloud platforms, in 2013. It can monitor the physical infrastructure resources and virtual resources of a multi-tenant cloud platform.

To support performance-enhancing functionalities, Meng et al. [70] proposed a new monitoring model called Monitoring as a Service (MaaS). This model supports both functional and non-functional requirements. However, the above-mentioned research neglects the information and control that cloud customers have over the customization of monitoring metrics for their rented cloud resources. To solve this problem, Calero et al. [71] proposed MonPaaS, a new monitoring architecture integrating Nagios and OpenStack, in 2015. It enables cloud providers to see a complete overview of the infrastructure, and provides each cloud consumer with a monitoring platform as a service, so that they can automatically see their leased cloud resources and define other resources or services to be monitored.

Alhamazani et al. [72] proposed a new method for efficient QoS monitoring and benchmarking of cloud applications hosted in multi-cloud environments in 2019. The highlight of this method is its capability to monitor the individual components of the target application. Experiments were conducted on several cloud platforms, including Amazon EC2 and Microsoft Azure, and the experimental results show its effectiveness.

To address node failures and performance issues caused by increased workload, Wang et al. [73] discussed problems existing in the monitoring system of the OpenStack cloud platform and proposed a new scheme for cloud service monitoring in 2022. This scheme allows users and managers to optimize computing resources according to changing business requirements.

However, existing monitoring frameworks work under the control of the service providers, which may create dissatisfaction among customers over Service Level Agreement (SLA) violations. Therefore, to solve this problem, Badshah et al. [74] proposed a reliable framework in 2022. This framework monitors the providers' services by adopting third-party monitoring services with SLA and penalty management, so that providers who violate the SLA can be penalized. Simulation results demonstrate that this framework is able to satisfy the requirements of users. Mezni et al. [75,76] provided an understanding of the lifecycle management phases of the emerging big service model, as well as the role of big data frameworks and multi-cloud strategies in provisioning these services, in 2021 and 2023. This provides a reference for big data processing frameworks.

3.3 Our proposed performance log generation method

In this paper, a multilayer performance log generation framework is designed to record the related log information and the running environment information. Then, the information is processed to support the performance issue identification and diagnosis of SaaS software. Fig.5 gives the framework of performance log generation.

As shown in Fig.5, the implementation of the above framework is divided into four steps: performance log collection, performance log transformation, performance log processing and performance log storage. Specifically, the performance logs from the SaaS application WebApp are collected by Flume agents and then flow into the Flume sink node. In this process, two log event stream branches are generated. The first flows directly into HBase for storage; the second first flows into a Kafka [77] broker and then enters Spark Streaming for calculation, and the results are finally written into MySQL [78]. Real-time monitoring for PaaS and IaaS is implemented by different monitoring technologies, and the monitoring data is stored in MySQL. MySQL is capable of handling intricate data structures, provides robust support for complex queries and possesses transactional capabilities. These are critical for analyzing and aggregating log data in the context of SaaS software performance issues, where data relationships and aggregations play a crucial role in troubleshooting and optimizing performance. In addition, HBase is adopted to store text log information, including error information. Transferring data from MySQL to HBase allows for unified management of the data, enabling advanced analysis over the globally stored data. To ensure data timeliness and reduce communication overhead, a push & pull hybrid mode [79] is used as the data transmission strategy; it switches between push and pull modes in real time according to the performance state of the system.
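To make the processing branch more concrete, the following is a minimal sketch assuming Spark Structured Streaming as the streaming engine, a Kafka topic named perf-logs and a MySQL table response_time_stats reachable over JDBC; all of these names, the event schema and the windowing logic are illustrative rather than the exact pipeline used in our experiments.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import DoubleType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("PerfLogProcessing").getOrCreate()

# Assumed schema of one performance log event forwarded by Flume into Kafka.
schema = StructType([
    StructField("ts", TimestampType()),
    StructField("response_time", DoubleType()),
])

# Read the performance log event stream from the Kafka broker.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "perf-logs")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Aggregate the average response time per one-minute window.
stats = (events.groupBy(window(col("ts"), "1 minute"))
         .agg(avg("response_time").alias("art"))
         .select(col("window.start").alias("window_start"), col("art")))

def write_to_mysql(batch_df, batch_id):
    # Write each micro-batch of aggregates to MySQL via JDBC.
    (batch_df.write.format("jdbc")
     .option("url", "jdbc:mysql://mysql-host:3306/perf")
     .option("dbtable", "response_time_stats")
     .option("user", "monitor").option("password", "secret")
     .mode("append").save())

query = stats.writeStream.outputMode("update").foreachBatch(write_to_mysql).start()
query.awaitTermination()
```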

4 Performance issue identification

Performance issue identification, also known as performance anomaly detection, aims to judge whether the system is suffering a performance issue according to historical observation data. Solutions for performance issue identification are divided into three classes: metric-based methods, log-based methods and behavior-based methods.

4.1 Metric-based method

Metric-based methods identify performance issues from manually defined metrics. They are black-box approaches that do not require knowledge of the internal structure of the system. Many studies of this kind exist, as summarized below.

Lan et al. [80] proposed an outlier-based method to identify node-level anomalies in large-scale distributed systems. In [80], a group of techniques was first utilized to build a unified data format and automatically analyze the collected data. Meanwhile, a feature extraction technique based on Independent Component Analysis (ICA) is used to reduce the data size, while unsupervised techniques are used to identify outliers. Experimental results demonstrate that the proposed method has high accuracy and low computational overhead. After that, they conducted further research in [81], where they proposed a scalable and nonparametric method to effectively identify performance anomalies in large-scale systems. This method can be used in various distributed parallel systems with peer-to-peer properties. Specifically, it takes a "divide and conquer" approach to solve scalability challenges, while nonparametric clustering and two-stage majority voting are used to improve detection flexibility and accuracy. In addition, they designed a probabilistic model to quantitatively assess decentralized design options. Experiments were conducted on a suite of applications on production systems, and the results demonstrate that this method outperforms existing methods in terms of accuracy with negligible runtime overhead.
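A minimal sketch of this general recipe (ICA for dimensionality reduction followed by an unsupervised outlier detector) is shown below using scikit-learn; the specific detector (IsolationForest) and the parameter choices are illustrative, not the exact algorithms of [80,81].

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.ensemble import IsolationForest

# X: one row per node and time window, one column per collected metric
# (CPU, memory, I/O, ...). Random data stands in for monitoring data here.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))

# 1) Reduce the metric vectors to a few independent components.
ica = FastICA(n_components=5, random_state=0)
X_ic = ica.fit_transform(X)

# 2) Flag outlying windows with an unsupervised detector.
detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(X_ic)   # -1 marks a suspected performance anomaly

print("suspicious windows:", np.where(labels == -1)[0][:10])
```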

Odyurt et al. [82] introduced the concept of software passports based on Extra-Functional Behavior (EFB) metrics in 2019. Among these metrics, they concentrate on CPU time and the read and write communication event counts of different processes. Experimental results show its efficiency.

To apply deep learning to performance issue identification, Wang et al. [83] proposed a deep learning method based on Independent Component Analysis (ICA) and the restricted Boltzmann machine (RBM) in 2020. This method adopts ICA to obtain features and an RBM to identify the performance issues of SaaS software. Experimental results demonstrate that the proposed method outperforms classical shallow classification algorithms while keeping competitive efficiency.

However, all the above-mentioned methods are not applicable to large-scale, diverse and dynamically changing KPI [84-88] streams.

To solve this problem, Zhang et al. [89] proposed PUAD, a new method based on PU learning [90-96], in 2021, which achieves accurate KPI anomaly detection. Specifically, it integrates clustering, PU learning and semi-supervised learning to improve the accuracy of anomaly detection. In addition, a new active learning method was proposed, which selects the samples most likely to be positive in each iteration to avoid false alarms. Experiments were conducted on 208 real-world KPI streams, and the experimental results show its superior performance.

4.2 Log-based method

Log-based methods identify performance issues based on the analysis of the logs of SaaS software. They are grey-box approaches that require knowledge of part of the system execution path.

Some research concentrates on log processing and feature extraction, as introduced below.

He et al. [97] proposed an online log parsing method based on a fixed-depth tree [98] in 2017. This method can accurately and efficiently parse raw logs in a streaming manner, and log templates are automatically extracted during parsing. Experimental results on five real-world datasets show the superior performance of this method. However, this method is not adaptive, so it cannot handle software/firmware upgrades.

After that, Du et al. [99] proposed a log parsing method based on the longest common subsequence [100,101] in 2018. This method adopts online stream processing to convert unstructured system logs into structured data. Experimental results demonstrate that this method outperforms the state-of-the-art methods in terms of efficiency and accuracy. Similarly, it is also not adaptive and cannot handle software upgrades.

To solve this problem, Meng et al. [102] proposed an adaptive log parsing framework supporting intra-service and cross-service incremental template learning and updating in 2020. Specifically, this method transforms the template generation problem into a word classification problem and learns the features of template words and variable words. Experimental results on four public datasets demonstrate that the proposed method supports accurate adaptive template updates and log parsing for new services. However, this method cannot perform real-time monitoring and processing for SaaS software.

With the development of various software, more and more tasks require real-time monitoring. To meet this requirement, Vervaet et al. [103] proposed USTEP, an online log parsing method based on an evolution tree [104-106]. Experimental results on 13 real-world datasets show its superior effectiveness and robustness.

With the rapid increase in log data size, efficiency is becoming more and more important for log parsing. Therefore, Dai et al. [107] proposed Logram, an automatic log parsing method, in 2022. It utilizes n-gram dictionaries to achieve efficient log parsing. Experiments were conducted on 16 public log datasets, where Logram was compared with five state-of-the-art methods. Experimental results demonstrate that this method outperforms all comparison methods in terms of efficiency and accuracy.
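To give a feel for what these parsers produce, the following is a deliberately simplified template extraction sketch: it masks obviously variable tokens (numbers, durations) and groups messages by the resulting template. Real parsers such as Drain, Spell, USTEP or Logram are considerably more sophisticated; the regular expressions and log lines here are only illustrative.

```python
import re
from collections import defaultdict

VARIABLE_TOKEN = re.compile(r"^(\d+|0x[0-9a-fA-F]+|[\d.]+ms)$")

def to_template(message: str) -> str:
    """Replace variable-looking tokens with the placeholder <*>."""
    tokens = [("<*>" if VARIABLE_TOKEN.match(t) else t) for t in message.split()]
    return " ".join(tokens)

logs = [
    "Request 4711 served in 35ms",
    "Request 4712 served in 803ms",
    "Connection to node 10 lost",
]

groups = defaultdict(list)
for line in logs:
    groups[to_template(line)].append(line)

for template, lines in groups.items():
    print(template, "->", len(lines), "messages")
```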

Other studies focus on the techniques of performance issue identification, which are given as follows.

Fu et al. [108] proposed a performance issue identification method based on unstructured log analysis in 2009. An algorithm was designed to convert the free-form text messages in log files into log keys and construct log sequences. By training on the log sequences, a finite state automaton (FSA) model was learned to represent the normal workflow of each component in the system, and a performance measurement model describing normal execution performance was learned from the timing information in the log messages. These models can automatically detect anomalies in newly arrived log files, and their effectiveness is shown by experiments. However, this method uses separate models for different types of data and builds an ensemble to generate the final predictions, which neglects the correlations between data sources.

After that, Xu et al. [109] proposed a method for automatically mining console logs based on program analysis, information retrieval, data mining and machine learning. Specifically, it first analyzes the system source code to obtain log templates, each of which lists variables and message types in the form of <name, type>. These templates are matched against the log messages to convert them from unstructured text into structured data. By selecting variables to group log messages and extracting information from them, such as execution traces, feature vectors are constructed. Log messages of the same group are strongly correlated, and there are correlations between features; vectors deviating from the correlation pattern are anomaly vectors. PCA is used to perform anomaly detection on the feature vectors, classifying them as normal or abnormal, and decision trees are used to visualize the anomaly detection results in an easy-to-understand form. However, it also uses separate models for different types of data and builds an ensemble to generate the final predictions, which neglects the correlations between data sources.
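A minimal sketch of the PCA step is shown below, assuming the feature vectors are per-trace event count vectors; flagging points with a large residual outside the principal subspace is the core idea, while the count matrix and the threshold are illustrative rather than those of [109].

```python
import numpy as np
from sklearn.decomposition import PCA

# Each row is an event count vector for one execution trace
# (columns = log message types); random data stands in here.
rng = np.random.default_rng(1)
counts = rng.poisson(lam=3.0, size=(200, 12)).astype(float)

# Fit a low-dimensional "normal" subspace on the count vectors.
pca = PCA(n_components=4)
projected = pca.fit_transform(counts)
reconstructed = pca.inverse_transform(projected)

# Residual (squared prediction error) of each trace outside the subspace.
spe = ((counts - reconstructed) ** 2).sum(axis=1)

# Flag the traces whose residual exceeds an empirical threshold.
threshold = np.percentile(spe, 95)
anomalies = np.where(spe > threshold)[0]
print("suspected anomalous traces:", anomalies[:10])
```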

To fill this gap, Nedelkoski et al. [110] proposed an unsupervised multimodal performance issue identification method in 2019. This method uses LSTM neural networks and distributed tracing to identify performance issues of software. Specifically, it utilizes single-modality sequential text data to model the causality between services, and on this basis builds a multimodal structure from multimodal tracing data. This method can not only model the normal system behavior but also identify performance issues. Experiments conducted on cloud data demonstrate that this method outperforms all baselines. However, this method suffers from overfitting.

To solve the overfitting problem, Geiger et al. [111] proposed a performance issue identification method based on unsupervised generative adversarial networks in 2020. Specifically, this method introduces a GAN [112-116] structure with cycle consistency to map time series to time series. In addition, a new similarity measure was proposed to evaluate the similarity between two time series. Experimental results on three well-known datasets demonstrate that this method outperforms all baselines. However, this method is only suitable for similar data distributions and its robustness is poor.

To increase the robustness of performance issue identification, Wang et al. [117] proposed a robust log anomaly detection method based on contrastive learning and a multi-scale masked sequence-to-sequence model in 2021. Specifically, it combines the BERT [118-121] model with contrastive learning to extract features. Then, it utilizes a multi-scale masked sequence-to-sequence model to learn context information at different scales. Experimental results on public datasets demonstrate that the proposed method outperforms four baselines in terms of accuracy and robustness. However, it is a supervised learning method and needs a large amount of labeled data, which is time-consuming and requires a large amount of manual labeling.

After that, to reduce the reliance on labeled data, Yang et al. [122] proposed PLELog, a novel log-based anomaly detection method. It is a semi-supervised method that avoids time-consuming manual labeling by incorporating knowledge of historical anomalies via probabilistic label estimation. In addition, an attention-based GRU neural network was designed to effectively identify anomalies. Experiments conducted on two public datasets show its superior effectiveness, and its application in real systems further shows its practicability.

Recently, Farzad et al. [123] proposed a novel anomaly detection method, which uses radius-based fuzzy C-means and a multilayer perceptron network to detect anomalies. The cluster centers and a radius are used to select reliable positive and negative log messages. This model was tested on three well-known datasets, i.e., BGL, OpenStack and Thunderbird. The experimental results demonstrate that this model outperforms the state-of-the-art methods.

In 2022, Zhang et al. [124] proposed DeepTraLog, a deep learning based anomaly detection approach for microservices. This method uses a unified graph representation to represent the complex structure of a trace and its related logs. Based on this unified graph representation, a gated graph neural network model is trained with a loss function constructed from the deep SVDD model. Experimental results show that the proposed method outperforms state-of-the-art trace/log anomaly detection approaches. However, it neglects the monitoring data of the microservice system.

In the same year, they proposed TraCRL, a trace representation method [125]. This method combines graph neural networks with contrastive learning to learn the representation of a trace. Specifically, it first constructs an operation invocation graph, where nodes and edges represent service operations and operation invocations, respectively. Then, a graph neural network-based model for trace representation is trained on the operation invocation graphs of traces. Experimental results show that the proposed method can significantly improve the performance of trace anomaly detection and offline trace sampling. However, it cannot fully utilize the attribute information of invocation relationships.

4.3 Behavior-based method

Behavior-based methods identify performance issues based on the analysis of the behavior of SaaS software, such as component interactions and execution paths. They are white-box approaches that obtain data for reasoning about component behavior through application instrumentation.

Until now, some research on behavior-based performance issue identification has been conducted. Aguilera et al. [126] proposed a novel behavior-based performance issue identification method, which infers causal paths between application components and attributes delay to a specific node.

After that, Chen et al. [127] utilized probabilistic context-free grammars to represent execution paths. In their method, an execution path is regarded as anomalous if it cannot be parsed by the grammar.

Then, Barham et al. [128] used clustering to group paths; in this method, an execution path is regarded as anomalous if it does not match any cluster. Afterwards, Chen et al. [129] adopted statistical methods to periodically analyze the interaction between components using the χ2 test.

Recently, Lim et al. [130] proposed a performance issue identification method based on the hidden Markov random field (HMRF) [131,132]. It can identify recurring and unknown performance issues by transforming performance issue identification into a clustering problem based on HMRF. Specifically, this method utilizes historical data to train the clustering model and an EM-based iterative algorithm to optimize the threshold and the cluster centers. Experimental results show its superior performance.

To unify traced response times and call trajectories in an interpretable way, Liu et al. [133] proposed a new unsupervised performance issue identification method based on service tracing and a fault root cause localization algorithm in 2020. This method adds posterior flows on the basis of deep Bayesian networks to achieve nonlinear mapping. Tests conducted on TrainTicket show the effectiveness of this method.

Performance metrics can only represent the average load of a system, which cannot help managers find the root cause of a fault. Therefore, Kohyarnejadfard et al. [134] proposed an anomaly detection framework in 2021. This method first collects system call flows during process execution with the Linux Trace Toolkit Next Generation (LTTng). Then, it utilizes machine learning to reveal system call anomalies. Experiments conducted on real datasets show its effectiveness in different scenarios.

Microservices are a common architecture for Web applications. Although they can improve the abstraction, modularity and extensibility of Web applications, they bring challenges for fault root-cause localization. Therefore, Cai et al. [135] introduced the concept of the service dependency graph (SDG) to describe the complex calling relationships among nodes and developed TraceModel, an anomaly detection and root-cause localization framework, in 2021. Experimental results on real-world datasets demonstrate that this model outperforms state-of-the-art root-cause localization methods.

Tracing data can not only reflect the running time of calls between services but also express the calling relationships between services. Therefore, Li et al. [136] proposed a semi-supervised method to detect anomalies. It first uses a semi-supervised method based on application tracing and performance monitoring data to obtain anomaly thresholds. Then, it uses a dynamic sliding window to detect anomalies and a trace-based ranking algorithm to locate the root cause. Experimental results on the public dataset of the AIOps Challenge 2020 show its superior performance.

4.4 Our proposed performance issue identification method

In fact, performance issue identification is a binary classification problem: it classifies the state of SaaS software into two classes, performance anomaly and performance normality. In this subsection, a performance issue identification method based on the HMRF [137] is proposed and introduced as follows.

4.4.1 Performance issue identification formalization

In the performance issue identification problem, we try to determine the current performance state of SaaS software by analyzing the observed performance metric data. Let S_t \in \{0, 1\} denote the state of the SLO ({compliance, violation}) at time t. Let M_t = [m_1, \ldots, m_i, \ldots, m_n] denote a vector of values of n collected metrics at time t, where m_i is the ith metric. In the complex running environment of SaaS software, the health state of the system is often unobservable before the occurrence of performance issues, but we can infer the current health state of the system through other system characteristic parameters (low-level metrics such as CPU and memory). This is consistent with the hidden state of a Hidden Markov Random Field (HMRF), so the SLO performance state S of the system can be modeled as the hidden state of an HMRF. In addition, in order to reasonably infer the hidden state, it is necessary to analyze the changes of the observable system characteristic parameters caused by it. This is also consistent with the structure of an HMRF, so these observations M = [m_1, \ldots, m_i, \ldots, m_n] can be modeled as the observed variables of the HMRF. The following are formal definitions for performance issue identification:

Definition 7 X and L are two random fields whose state spaces are \mathcal{X} = \{1, 2, \ldots, x\} and \mathcal{L} = \{1, 2, \ldots, l\}, respectively; \mathcal{T} = \{1, 2, \ldots, T\} denotes the time epoch indices, and for t \in \mathcal{T} we have X_t \in \mathcal{X} and L_t \in \mathcal{L}.

Definition 8 The observation data set X = (x_1, \ldots, x_i, \ldots, x_n) corresponds to the set of measurements; the random variable x_i corresponds to the value of metric m_i at the tth epoch. Let \chi be an observable instance of X and \mathbf{X} be the set of all possible instances, so that

\mathbf{X} = \{\chi = (x_1, \ldots, x_T) \mid x_t \in \mathcal{X}, t \in \mathcal{T}\},

where x_t denotes the vector of values of the n collected metrics at time t.

Definition 9 The hidden label set L = (l_1, \ldots, l_i, \ldots, l_n) is the set of hidden variables; the hidden variable l_i denotes the SLO state of the system related to m_i at the tth epoch. Let \ell be a configuration of L and \mathbf{L} be the set of all possible configurations, so that

\mathbf{L} = \{\ell = (l_1, \ldots, l_T) \mid l_t \in \mathcal{L}, t \in \mathcal{T}\},

where l_t denotes the SLO state of the system related to M_t at the tth epoch.

Definition 10 Temporal-neighboring constraint. Every pair of neighboring records (M) tends to indicate the same SLO state (S) and is independent of past, non-neighboring records. Each hidden variable l_i is related only to its neighbors within an SLO state type. The hidden variables are mutually related via a neighborhood system N.

The concept of the HMRF is derived from the Hidden Markov Model (HMM), which is defined as a stochastic process generated by a Markov chain whose state sequence cannot be observed directly, only through a sequence of observations; each observation is assumed to be a stochastic function of the state sequence. Here, we consider a special case of an HMM in which the underlying stochastic process is a Markov Random Field (MRF) instead of a Markov chain, and is therefore not restricted to one dimension. We refer to this special case as a hidden Markov random field model.

Performance issue identification is to determine whether the SaaS software is in a performance error state. The problem can be formalized as an HMRF problem. The SaaS software performance state can be modeled as hidden variables L = \{l_t, t \in \mathcal{T}\}, where \mathcal{T} = \{1, 2, \ldots, T\} represents the snapshot times. The hidden variable l_t denotes the performance state associated with the performance status log record at the tth snapshot. Assume that there are T performance status instances (performance status log records), including SLO violation instances and SLO compliance instances. Instances that are not adjacent may belong to different classes. Neighboring instances indicating the same performance state form a neighborhood, in which every pair of records is interconnected via Definition 10. Our objective is to train a performance issue identification model based on the HMRF, which can judge a new performance state. If it is classified as the SLO violation performance state, the SaaS software is in a performance error state, thereby achieving the purpose of identification.

4.4.2 Performance issue identification modeling by HMRF

We select a classifier based on the HMRF-MAP framework. The framework calculates the maximum a posteriori (MAP) probability P(L = \ell \mid X = \chi) that the hidden state L of the system corresponds to a performance issue at the current moment t, given the sequence of observations X = (x_{t,1}, \ldots, x_{t,i}, \ldots, x_{t,n}). It has important practical advantages: it is interpretable and can incorporate expert knowledge and constraints.

We use the HMRF to construct the MAP estimation model as follows: 1) define a neighborhood system N_p and the potential V_{N_p}(\ell) of the neighborhood system; 2) define the prior potential function V(\ell) to give P(\ell); 3) derive the likelihood function P(\chi \mid \ell); 4) multiply P(\ell) and P(\chi \mid \ell) to yield the posterior probability P(L = \ell \mid X = \chi).

L represents the HMRF model of the configuration \ell. According to the Hammersley-Clifford theorem, the prior probability of \ell can be expressed as a Gibbs distribution

P(L = \ell) = \frac{1}{Z_1} e^{-V(\ell)} = \frac{1}{Z_2} e^{-\sum_{N_p \in N} V_{N_p}(\ell)},     (3)

where Z_2 is a normalizing constant and V(\ell) denotes the overall potential function, which can be expressed as a sum of potentials V_{N_p}(\ell) over all the neighborhoods N for the label configuration \ell. According to Definition 10, every pair of records within each neighborhood N_p tends to be grouped into the same issue cluster. With this, V_{N_p}(\ell) can be defined as

V_{N_p}(\ell) = w_p \sum_{l_t, l_{t'} \in N_p, \, t \neq t'} I[l_t \neq l_{t'}],     (4)

where I denotes an indicator function (I[true] = 1, I[false] = 0) and w_p denotes the normalizing weight for the total violations of the temporal-neighboring constraint in the neighborhood N_p, which gives more weight to neighboring records that are not grouped within the same cluster.

The likelihood function P_\theta(X = \chi \mid L = \ell) models the conditional dependence and follows a Gaussian distribution N(\mu_l, \sigma_l^2); each class can be represented by its mean \mu_l and variance \sigma_l^2. According to characteristic (3) of the HMRF model, the conditional probability of \chi given \ell is

P_\theta(X = \chi \mid L = \ell) = \prod_{t \in \mathcal{T}} P(x_t \mid l_t) = \prod_{t \in \mathcal{T}} \frac{1}{\sqrt{2\pi\sigma_{l_t}^2}} \exp\!\left(-\frac{(x_t - \mu_{l_t})^2}{2\sigma_{l_t}^2}\right).     (5)

In Eq. (5), the variance \sigma_l^2 \in \{\sigma_v^2, \sigma_c^2\} represents the variance corresponding to the performance issue class and the normal class, respectively. Similarly, \mu_l \in \{\mu_v, \mu_c\} represents the corresponding means. Here, we consider \theta = \{\mu_v, \mu_c, \sigma_v^2, \sigma_c^2\} as the parameter vector associated with the Probability Density Function (PDF).

\ell cannot be obtained deterministically from x; hence, an estimate \hat{\ell} should be derived from x. One way to estimate \hat{\ell} is based on the statistical MAP criterion. The objective in this case is to have an estimation rule that yields the \hat{\ell} maximizing the following posterior probability distribution

\hat{\ell} = \arg\max_{\ell} \frac{P_\theta(X = \chi \mid L = \ell) \, P(L = \ell)}{P(X = \chi)}.     (6)

Considering the prior probability Eq. (3) and the likelihood function Eq. (5) of the HMRF, we can deduce from Eq. (6)

\hat{\ell} = \arg\min_{\ell} \left\{ \ln\!\left(\frac{1}{Z}\right) + \sum_{t \in \mathcal{T}} \frac{(x_t - \mu_{l_t})^2}{2\sigma_{l_t}^2} + \sum_{N_p \in N} V_{N_p}(\ell) \right\},     (7)

where 1/Z is a constant. By minimizing Eq. (7), we can optimize the estimate and obtain \theta = \{\mu_v, \mu_c, \sigma_v^2, \sigma_c^2\}.

4.4.3 MAP estimation by Hopfield neural network

As the performance issue identification problem is viewed as a MAP estimation problem of the HMRF-modeled performance issue solved with a Hopfield Neural Network (HNN), it suffices to establish a relation between Eq. (7) and the energy of an HNN, and to provide an updating rule so that convergence is guaranteed. The clique potential function V_{N_p}(\ell) in Eq. (7) is considered as

V_{N_p}(\ell) = W_{sq}^t(\ell) \, v_s^t v_q^t,     (8)

where v_s^t and v_q^t represent the outputs of the sth and qth neurons, respectively, and W_{sq}^t is the connection weight between them, which depends on \ell. The values of the connection strengths are considered as

W_{sq}^t(\ell) = w_p \sum_{l_t, l_{t'} \in N_p, \, t \neq t'} I[l_t \neq l_{t'}],     (9)

where w_p is a parameter associated with the clique potential function and is an HMRF model parameter.

Substituting Eq. (9) into Eq. (8), we get

V_{N_p}(\ell) = w_p \left( \sum_{l_t, l_{t'} \in N_p, \, t \neq t'} I[l_t \neq l_{t'}] \right) v_s^t v_q^t.     (10)

Considering the Potts model, i.e., a generalization of the Ising model, we can rewrite Eq. (3) as

P(L = \ell) = \frac{1}{Z_2} \exp\!\left(-\sum_{N_p \in N} w_p \left( \sum_{l_t, l_{t'} \in N_p, \, t \neq t'} I[l_t \neq l_{t'}] \right) v_s^t v_q^t \right).     (11)

For implementing the HMRF model with a Hopfield network, we may interpret x as the initialization of the network (each metric is considered as a neuron). Similarly, \mu_l can be interpreted as the present state of the network. Thus, we may rewrite Eq. (7) as

\hat{\ell} = \arg\min \left\{ \sum_{s} \frac{(I_s - v_s^t)^2}{2\sigma^2} - \sum_{(s,q), \, s \neq q} W_{sq}^t(\ell) \, v_s^t v_q^t \right\}.     (12)

Now the problem is reduced to the minimization of Eq. (12). We can establish a relation between Eq. (12) and the energy function E of the HNN and provide an updating rule so as to reach a minimum of E. In the high-gain limit, the energy can be written as

E(v^t) = -\sum_{s=1}^{r} \sum_{q=1, \, q \neq s}^{r} W_{sq}^t(\ell) \, v_s^t v_q^t - \sum_{s=1}^{r} I_s v_s^t + \frac{1}{2R} \sum_{s=1}^{r} (v_s^t)^2.     (13)

By proper adjustment of coefficients, Eq. (12) can be made equivalent to minimizing E(v^t).
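The sketch below illustrates the kind of energy minimization Eq. (13) implies: given a symmetric weight matrix W and biases I, it repeatedly flips single binary units whenever the flip lowers the energy, which is the basic greedy Hopfield-style update; the weights and biases here are arbitrary, not the ones derived from the HMRF potentials above.

```python
import numpy as np

def energy(v, W, I, R=1.0):
    """Hopfield-style energy of a binary state v (cf. Eq. (13))."""
    return -v @ W @ v - I @ v + (v @ v) / (2.0 * R)

def minimize(W, I, v0, n_sweeps=20):
    """Greedy asynchronous updates: flip a unit whenever it lowers the energy."""
    v = v0.astype(float).copy()
    for _ in range(n_sweeps):
        for s in range(len(v)):
            candidate = v.copy()
            candidate[s] = 1.0 - candidate[s]   # binary units in {0, 1}
            if energy(candidate, W, I) < energy(v, W, I):
                v = candidate
    return v

rng = np.random.default_rng(2)
W = rng.normal(size=(6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
I = rng.normal(size=6)
v = minimize(W, I, rng.integers(0, 2, size=6))
print("state:", v, "energy:", energy(v, W, I))
```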

4.4.4 Model parameter estimation by EM algorithm

In order to obtain the optimal parameters of the HMRF model from a set of metric data, we design Algorithm 1 based on Expectation Maximization (EM). The HMRF model parameters are estimated recursively in the EM framework. In the HMRF-MAP framework, x is considered as the observed data and \ell as the unobserved data to be estimated; for the estimation of \ell, the observed data x is modeled with the HMRF. The aim of Algorithm 1 is to estimate \theta = \{\mu_v, \mu_c, \sigma_v^2, \sigma_c^2\} based on the observed data x. The algorithm begins with an arbitrary initial value \theta^0 at time 0, and the performance record labels are estimated using the parameters \theta^t at time t.
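As a simplified illustration of this step, the sketch below runs EM for a two-class (violation/compliance) Gaussian model on a one-dimensional metric stream; it deliberately omits the neighborhood potential V_{N_p}(\ell) and the HNN-based MAP step of Algorithm 1, so it is only the parameter estimation skeleton, with made-up data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
# Synthetic response-time-like metric: mostly "compliance", some "violation" samples.
x = np.concatenate([rng.normal(0.3, 0.05, 900), rng.normal(1.5, 0.3, 100)])

# theta = {mu_v, mu_c, sigma_v^2, sigma_c^2}; pi are the mixing weights.
mu = np.array([1.0, 0.2]); var = np.array([0.5, 0.5]); pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: posterior responsibility of each class for every sample.
    dens = np.stack([pi[k] * norm.pdf(x, mu[k], np.sqrt(var[k])) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate means, variances and mixing weights.
    n_k = resp.sum(axis=1)
    mu = (resp * x).sum(axis=1) / n_k
    var = (resp * (x - mu[:, None]) ** 2).sum(axis=1) / n_k
    pi = n_k / len(x)

print("mu_v, mu_c =", mu, "var_v, var_c =", var)
```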

4.4.5 Performance issue identification algorithm

Fig.6 depicts all the steps required for the implementation of our performance issue identification method, which we explain here. The first step is to obtain the performance issue identification model by training on historical data, i.e., the system state data collected by our performance log generation system, processed so that it meets the input requirements of the model. The second step is to continuously monitor the system status and dynamically update the model according to the latest data collected while the system is running, so that the model remains consistent with the current running status of the system. In the third step, the probability of the current state of the system is calculated and output; if the MAP estimate obtained by the HNN outputs "1", the current performance log record is judged to indicate a performance issue. After outputting the results, we update the historical performance log record space.

Algorithm 2 gives a complete description of the performance issue identification algorithm through the analysis and design of its key parts, such as the construction, solution and parameter estimation of the performance issue identification model.

5 Performance issue diagnosis

Performance issue diagnosis aims to find the reason for a SaaS software performance issue. In general, this problem can be approached through text mining or through the analysis of resources, and the existing solutions are divided into two classes according to these angles: log-based methods and process-monitoring-based methods.

5.1 Log-based method

When the system suffers a performance issue, log-based methods can be used to diagnose it [138]. These methods first obtain the related information from the log data and then adopt data mining and machine learning to diagnose the performance issue.

At present, much research on log-based methods has been done. Fu et al. [139] designed LogMaster, an event correlation mining and prediction system based on logs to predict system failures, in 2012. LogMaster parses the log data into an event sequence, where each event is represented as a 9-tuple of information. Specifically, an algorithm called Apriori-LIS was designed to mine event rules, and a system failure prediction system was built on the event correlation graph.

Zou et al. [140] implemented UiLog, a fault log analysis system that helps system managers know the real-time status of the entire system. Specifically, it first conducts fault log analysis and quantifies the fault classification results of manual learning using a fault keyword matrix to confirm the fault type. Then, it performs fault log correlation: it combines the fault log analysis results with a time window, performs fault cause analysis, and clusters the relevant logs using the same fault cause as the clustering criterion to find the root cause of the fault. Finally, a fault tree is generated based on the cause and type of the fault in order to diagnose new logs.

In 2019, Guo et al. [141] proposed GMTA, a graph-based trace analysis method that helps understand microservice architectures and diagnose performance problems. It supports various needs such as visualizing service dependencies, making architectural decisions, analyzing changes in service behavior, detecting performance issues and locating root causes. This method can efficiently process traces produced on the fly. Specifically, it divides traces into different paths and further groups them into business flows. The authors used a case study conducted with eBay's monitoring team and Site Reliability Engineering team to demonstrate its substantial benefits in industrial-scale microservice systems. However, there is plenty of useless logging, which makes mining event correlations difficult.

To solve this problem, Huo et al. [142] proposed IWApriori, an association rule mining and self-updating method, in 2020. This method contains two steps: log preprocessing, and association rule mining and updating based on IWApriori. It can effectively improve rule integrity. Experimental results on a real-world log dataset demonstrate that this method outperforms other methods in terms of time performance, space performance and the effectiveness of the mined rules.

After that, Wang et al. [143] proposed a novel root-cause metric localization method that incorporates log anomaly detection. In [143], it is believed that the root-cause metric value should change along with the anomaly score of the system fault. Specifically, this method collects anomaly scores using log anomaly detection algorithms and identifies the root-cause metric by robust correlation analysis. Experimental results on public microservice systems show its superior performance.
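A minimal sketch of this correlation idea is given below, assuming we already have an anomaly score series from a log anomaly detector and one time series per candidate metric; using plain Pearson correlation as the ranking criterion is a simplification of the robust analysis in [143], and the metric names and data are made up.

```python
import numpy as np

def rank_root_cause_metrics(anomaly_score: np.ndarray, metrics: dict) -> list:
    """Rank candidate metrics by |correlation| with the anomaly score series."""
    ranking = []
    for name, series in metrics.items():
        corr = np.corrcoef(anomaly_score, series)[0, 1]
        ranking.append((name, abs(corr)))
    return sorted(ranking, key=lambda kv: kv[1], reverse=True)

rng = np.random.default_rng(4)
score = np.concatenate([rng.normal(0, 0.1, 80), rng.normal(1, 0.1, 20)])  # anomaly starts at t=80
metrics = {
    "cpu_usage": score * 0.8 + rng.normal(0, 0.1, 100),   # tracks the anomaly
    "disk_io": rng.normal(0, 0.1, 100),                   # unrelated
}
print(rank_root_cause_metrics(score, metrics))
```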

Considering that the use of tracing data can make root-cause localization more accurate, Li et al. [136] proposed a new anomaly detection algorithm in 2021. It uses semi-supervised learning to obtain the anomaly threshold and a trace-based ranking algorithm to locate the root cause. They tested this method on public datasets, and its superior performance is shown by experiments.

Similarly, Liu et al. [144] proposed MicroHECL, a highly efficient root cause localization approach for availability issues of microservice systems, in 2021. Based on a dynamically constructed service call graph, MicroHECL analyzes possible anomaly propagation chains and ranks candidate root causes by correlation analysis. Combining machine learning with statistical methods, the authors designed customized models for the detection of different types of service anomalies. Meanwhile, this method uses a pruning strategy to eliminate irrelevant service calls from the anomaly propagation chain analysis. Experimental results show its superior performance. However, the above approaches rely on empirical techniques.

To solve this problem, Gan et al. [145] proposed a novel machine-learning-driven root cause analysis method in 2021, which focuses on practicality and scalability. Specifically, this method uses unsupervised machine learning to reduce the cost of trace labeling and determine the root cause of unpredictable performance. The experiments were conducted on both local clusters and large clusters on Google Compute Engine, and the experimental results show the superior performance of the proposed method. Similarly, Ma et al. [146] proposed a novel framework for anomaly detection and root cause identification in microservice systems in 2022. This method first uses historical data to train an anomaly detector without pre-defined thresholds. Then, it uses a causal relationship extraction approach to construct impact graphs. Finally, a heuristic investigation algorithm based on random walks is designed to identify the root cause of the anomaly.

To improve the efficiency of root cause identification, Li et al. [147] proposed a root cause analysis method based on intervention recognition. This method first constructs a causal relationship graph based on knowledge of the microservice system architecture and some causal assumptions. Then, linear regression is used to infer the root cause. Finally, experiments were conducted on a real-world dataset, and the experimental results demonstrate that the proposed method outperforms all comparison methods.

5.2 Process-monitoring-based method

Process-monitoring-based methods diagnose performance issues by online monitoring and analysis; they continuously observe the running environment of the target system. Many related studies have been conducted, and they differ considerably in their aims, analysis methods and information sources.

Zhao et al. [148] designed LProf, a non-intrusive analyzer for distributed service request flows. It first infers how to parse logs by statically analyzing the binary code of the application. Then, it integrates the different logs and associates them with a specific request; note that it infers the request flow from runtime log data without modifying the source code. LProf is a profiling tool that helps diagnose actual performance anomalies, and experiments show its effectiveness for performance issue diagnosis.

Bare et al. [149] proposed ASDF, an online diagnosis framework, in 2009. This framework can monitor and analyze data sources over time and localize a performance issue to a specific node or set of nodes. Its flexible structure allows system managers to customize the data sources and analysis modules. The authors demonstrated the effectiveness of ASDF for performance issue diagnosis.

Attariyan et al. [150] proposed a performance summarization technique for automatic performance issue diagnosis in 2012. It first attributes a performance cost to each basic block. Then, dynamic information flow tracking is used to estimate the likelihood that each potential root cause was executed. Finally, the cost of each potential root cause is aggregated. Based on this idea, the authors developed a tool, called X-ray, to help users solve software performance issues without the support of developers.

To help performance analysts effectively compare load test results, Malik et al. [151] proposed one supervised method and three unsupervised methods in 2013. These methods can effectively identify performance issues that violate the SLA. Meanwhile, they use a small, manageable collection of important performance counters to analyze the root cause of a performance issue.

With the growing size and complexity of high-performance computing systems, the influence of performance variation is becoming more and more important. To reduce this influence as much as possible, Tuncer et al. [152] proposed a new machine-learning-based framework to automatically diagnose performance issues. This framework utilizes historical data to learn anomaly features. Specifically, it first transforms the collected time series data into statistical features. Then, the relevant anomaly features are selected. Afterwards, these features are used to diagnose anomalies. Finally, experimental results on real-world HPC systems demonstrate that this framework outperforms existing anomaly diagnosis techniques.
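As an illustration of this kind of pipeline, the following hedged Python sketch turns windows of metric time series into simple statistical features and feeds them to a generic classifier. The feature set, window shape and RandomForest model are assumptions for illustration only and are not the features or model used in [152].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(window):
    """Turn one (T, n_metrics) window of raw time-series data into simple statistical features."""
    funcs = [np.mean, np.std, np.min, np.max,
             lambda x, axis: np.percentile(x, 95, axis=axis)]
    return np.concatenate([f(window, axis=0) for f in funcs])

# Hypothetical training data: 200 windows of 60 snapshots over 20 metrics,
# each labelled with an anomaly type (0 = healthy).
rng = np.random.default_rng(0)
windows = rng.normal(size=(200, 60, 20))
labels = rng.integers(0, 4, size=200)

X = np.stack([window_features(w) for w in windows])        # shape (200, 100)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:5]))
```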

In cloud platforms, although performance monitoring metrics can fully describe the current status of a service, empirical alarm thresholds alone often fail to identify the root cause in real time. Therefore, Li et al. [153] proposed a general anomaly detection algorithm. This algorithm uses offline and online learning to dynamically update the feature matrix and the anomaly threshold of each metric. Then, deviation degree, scoring and ranking algorithms are used to identify the root cause of a fault. This algorithm was tested on the dataset of the AIOps challenge (2021), and the experimental results show its good performance.

Although data monitoring infrastructures exist, the lack of labels describing the state of the system is a pervasive problem. To fill this gap, Borghesi et al. [154] proposed a method for obtaining labels from service monitoring tools. These labels are then used to train a deep learning model. Finally, experimental results demonstrate that this method can accurately diagnose real faults.

5.3 Our proposed performance issue diagnosis method

The problem to be solved in this part is to diagnose performance issues during SaaS software operation and to help operation and maintenance managers analyze performance issues at a finer granularity and find their causes. In this section, an automatic performance issue diagnosis method is proposed. This method further diagnoses the identified performance issues and constructs an RBM classification model. This model classifies the current performance issues of the system and finds their causes.

Based on the above reasons, a performance issue diagnosis method based on RBM is proposed. First, we build a performance issue diagnosis model and train its parameters by maximum likelihood estimation (MLE) so that it can correctly distinguish the type of performance issue; then, we apply the model to new data to diagnose the running state of the system; finally, once the operation and maintenance managers obtain the type of the running state, they take corresponding effective measures.

5.3.1 Performance issue diagnosis formalization

Performance issue diagnosis can be regarded as a multi-classification problem in supervised learning. Suppose that there exist $T$ instances of performance issues, each of which belongs to one of $l$ issue types ($l \leq T$). The $T$ issue instances (i.e., violation instances) and the non-issue instances (i.e., compliance instances) can be grouped into $l+1$ clusters ($l$ issue types plus 1 non-issue type). Each cluster consists of performance issues of the same type. The SLO compliance records can be treated as a special non-issue type. The problem is to automatically group the record space into $l+1$ clusters and assign a new record to the correct cluster. If the new record can be well classified into a cluster, then the cause of the performance issue is discovered.

Restricted Boltzmann Machine (RBM) is a bipartite graph model, which means that it contains two layers, the visible layer and the hidden layer. The visible layer is composed of visible units (corresponding to visible variables) $v=[v_1,v_2,\ldots,v_D]$ representing the observable data; the hidden layer is composed of hidden units (corresponding to hidden variables) $h=[h_1,h_2,\ldots,h_N]$. Each unit of the hidden layer is connected with all the units of the visible layer and vice versa, while there is no connection within each layer. Both $v$ and $h$ are binary vectors.

In the performance issue diagnosis problem, we try to infer performance issues by analyzing the observed metric data. This problem can be formulated as an RBM problem. The performance issues can be modeled as hidden variables $L=\{l_t, t\in \mathcal{T}\}$, where $\mathcal{T}=\{1,2,\ldots,T\}$ denotes the snapshot times. The hidden variable $l_t$ represents the performance issue associated with the record at the $t$th snapshot. Our objective is to train an RBM-based performance issue diagnosis model to categorize the performance issues in the entire performance status log record space into several issue types (including the non-issue type). Based on this model, new performance status records can be classified. If a record is classified into a certain type of issue, the cause of the performance issue is obtained, thereby achieving the purpose of diagnosis.

5.3.2 Performance issue diagnosis based on RBM

Similarly, performance issue diagnosis is also a pattern classification problem: the task is to induce or learn a classifier function $F$ mapping the current system state to a known system performance state class. The training dataset of the classifier is derived from the multi-category labeled observation log, providing pre-classified instances to learn from. Since labeled historical records are available and supervised learning typically achieves higher accuracy than unsupervised learning in this setting, we again choose supervised learning to train the model.

5.3.2.1 Feature matrix generation

We define any individually measurable variable of the node being observed as a feature, with the goal of constructing a feature matrix for data analysis. Let $m$ be the number of features collected from $n$ nodes. In order to capture the trend and the correlation of these features, each node samples $T$ snapshots. Therefore, there are $n$ matrices $F^i\ (i=1,2,\ldots,n)$, each representing the feature matrix collected from the $i$th node. In each matrix $F^i$, the element $f^i_{h,j}$ represents the value of feature $h$ collected at the $j$th snapshot, where $1\leq j\leq T$ and $1\leq h\leq m$. In order to facilitate the data analysis, we reorganize each matrix $F^i$ into an $(m\times T)$-dimensional vector $f^i=[f^i_{1,1}, f^i_{1,2}, \ldots, f^i_{m,T}]^{\mathrm{T}}$. Finally, we construct a single large-scale matrix as the feature matrix

$$F_{(m\times T)\times n}=[f^1,f^2,\ldots,f^n].$$

We regard all the features as equally important. However, the collected data may have different scales. In order to convert the data into a uniform scale, the matrix $F$ is first normalized to $F'$ so that the feature values fall between 0.0 and 1.0. Then $F'$ is adjusted to $F''$ so that its columns have zero mean. This ensures that the subsequent feature extraction captures the true variance, thus avoiding distorted results due to mean differences. In the data matrix, each column represents a node and each row gives the values of a particular feature.
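The following minimal Python/NumPy sketch illustrates the feature matrix construction and the two normalization steps described above. Per-feature min-max scaling is one reasonable reading of "controlled between 0.0 and 1.0", and the toy data are placeholders.

```python
import numpy as np

def build_feature_matrix(node_samples):
    """node_samples: list of n arrays, each of shape (m, T) -- m features over T snapshots.
    Returns the ((m*T) x n) feature matrix F of Section 5.3.2.1."""
    columns = [Fi.reshape(-1) for Fi in node_samples]    # flatten each node matrix into the vector f^i
    return np.stack(columns, axis=1)

def normalize_and_center(F):
    """Min-max normalize each feature row into [0, 1] (F'), then shift columns to zero mean (F'')."""
    fmin = F.min(axis=1, keepdims=True)
    frange = F.max(axis=1, keepdims=True) - fmin
    frange[frange == 0] = 1.0                            # guard against constant features
    F_norm = (F - fmin) / frange                         # F'
    return F_norm - F_norm.mean(axis=0, keepdims=True)   # F'' with zero-mean columns

# Toy example: 3 nodes, 4 features, 5 snapshots each
nodes = [np.random.rand(4, 5) for _ in range(3)]
F = build_feature_matrix(nodes)                          # shape (20, 3)
F_centered = normalize_and_center(F)
print(F.shape, F_centered.mean(axis=0))                  # column means are (numerically) zero
```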

5.3.2.2 ICA-based feature extraction

We analyze the overall performance of the SaaS software by measuring the correlation between performance metrics, which requires extensive calculation. We try to learn features by unsupervised learning; both Principal Component Analysis (PCA) and ICA are important choices for feature learning. In this paper, we choose ICA and verify this choice experimentally. We apply ICA to the matrix $F$ and treat each column of $F$ as a data point in $\mathbb{R}^{m\times T}$. First, calculate the covariance matrix of $F$:

$$\mathrm{cov}=\frac{1}{n}FF^{\mathrm{T}}.$$

Then calculate the nonzero eigenvalues of $\mathrm{cov}$ in descending order: $\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_r$. Let $V=\mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_r)$ and $E=[e_1,e_2,\ldots,e_r]$, where $e_i$ is the eigenvector corresponding to $\lambda_i$. The whitened data of $F$ are defined as

$$X=V^{-1/2}E^{\mathrm{T}}F,$$

where $X$ is an $r\times n$ matrix and $r\leq m\times T$. After whitening, ICA projects each data point $x_i\in\mathbb{R}^{r}$ to a data point $y_i\in\mathbb{R}^{s}$, as follows:

$$y_i=M^{\mathrm{T}}x_i.$$

The dimensionality of $y_i$ is set to $s$. The goal is to find an optimal projection matrix $M$ such that the $y_i$ are maximally independent. The process of finding $M$ is the feature extraction process, as shown in Algorithm 3.
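A minimal NumPy/scikit-learn sketch of the whitening step and the ICA projection is given below. FastICA is used as a stand-in for Algorithm 3 (which is not reproduced here), and the data are random placeholders.

```python
import numpy as np
from sklearn.decomposition import FastICA

def whiten(F, r=None):
    """Whiten the centered feature matrix F (rows = features, columns = nodes): X = V^{-1/2} E^T F."""
    n = F.shape[1]
    cov = (F @ F.T) / n
    eigvals, eigvecs = np.linalg.eigh(cov)                # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]  # descending order
    keep = eigvals > 1e-10 if r is None else slice(0, r)  # keep nonzero (or top-r) eigenvalues
    V_inv_sqrt = np.diag(1.0 / np.sqrt(eigvals[keep]))
    return V_inv_sqrt @ eigvecs[:, keep].T @ F            # shape (r, n)

# Stand-in for the centered feature matrix (20 flattened features, 50 nodes/records).
F_centered = np.random.rand(20, 50) - 0.5
X = whiten(F_centered)

# Project each whitened column x_i to a maximally independent y_i = M^T x_i.
ica = FastICA(whiten=False, random_state=0)               # the data were already whitened above
Y = ica.fit_transform(X.T).T                              # columns of Y are the extracted features y_i
print(X.shape, Y.shape)
```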

5.3.2.3 RBM-based performance issues classification

After the performance issue features are extracted, a Boltzmann machine classifier is established for these features and trained. The training process mainly uses the sample set of performance issues to train an RBM whose units follow the Gibbs (Boltzmann) distribution.

RBM is an energy-based model. The energy of the joint configuration of visible variable $v$ and hidden variable $h$ is

$$E(v,h;\theta)=-h^{\mathrm{T}}Wv-b^{\mathrm{T}}v-a^{\mathrm{T}}h,$$

where the model parameters are $\theta=\{W,b,a\}$: $W$ is the weight of the connection between the visible units and the hidden units, and $b$ and $a$ are the biases of the visible units and the hidden units, respectively. The Boltzmann distribution is the distribution over this energy configuration.

Because of the special structure of the RBM (no connections within a layer and full connections between the layers), the activation states $h_j$ of the hidden units are conditionally independent given $v$. Conversely, given $h$, the activation states $v_i$ of the visible units are also independent of each other, that is,

$$P(h_j=1\,|\,v)=\sigma\Big(\sum_i W_{ij}v_i+a_j\Big),$$

$$P(v_i=1\,|\,h)=\sigma\Big(\sum_j W_{ij}h_j+b_i\Big),$$

where $\sigma(x)=1/(1+\exp(-x))$ is the logistic sigmoid function.
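A small NumPy sketch of these conditional probabilities is given below; the layer sizes and the randomly initialized parameters are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
D, N = 6, 4                               # visible and hidden layer sizes
W = rng.normal(scale=0.1, size=(D, N))    # W[i, j]: weight between visible unit i and hidden unit j
b = np.zeros(D)                           # visible biases
a = np.zeros(N)                           # hidden biases

v = rng.integers(0, 2, size=D).astype(float)
p_h_given_v = sigmoid(v @ W + a)          # P(h_j = 1 | v) = sigma(sum_i W_ij v_i + a_j)
h = (rng.random(N) < p_h_given_v).astype(float)
p_v_given_h = sigmoid(W @ h + b)          # P(v_i = 1 | h) = sigma(sum_j W_ij h_j + b_i)
print(p_h_given_v, p_v_given_h)
```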

An RBM can be used to model the distribution of the observed data $P_\theta(v)$ by learning from the training data $v_t$. The training is done in an unsupervised manner. In fact, an RBM is a Markov random field, and training it is a structured prediction problem involving an exponential number of output configurations. The exact gradient is therefore very expensive to compute, so plain gradient descent is difficult to apply. To solve this problem, G. E. Hinton et al. proposed an efficient learning algorithm, Contrastive Divergence (CD).
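The following sketch shows one CD-1 update on a single binary training vector, assuming the same toy parameter shapes as in the previous sketch. The learning rate of 0.06 echoes the experimental setting in Section 6.3.1 but is otherwise arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.06, rng=None):
    """One Contrastive Divergence (CD-1) step on a single binary training vector v0."""
    rng = np.random.default_rng(0) if rng is None else rng
    ph0 = sigmoid(v0 @ W + a)                      # positive phase: P(h | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(W @ h0 + b)                      # reconstruct the visible layer
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + a)                      # negative phase: P(h | v1)
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    a += lr * (ph0 - ph1)
    return W, a, b

# Toy usage with the same shapes as the previous sketch.
rng = np.random.default_rng(1)
W, a, b = rng.normal(scale=0.1, size=(6, 4)), np.zeros(4), np.zeros(6)
v0 = rng.integers(0, 2, size=6).astype(float)
W, a, b = cd1_update(v0, W, a, b, rng=rng)
```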

One approach, first proposed by G. E. Hinton et al. and explicitly discussed by Hugo Larochelle, is to use an RBM directly as a classifier, called the Classification Restricted Boltzmann Machine (ClassRBM).

ClassRBM has a similar structure to the ordinary RBM. It models the joint distribution of an input $y=[y_1,y_2,\ldots,y_n]$ and a target class $l\in\{0,1,2,\ldots,C\}$ using a hidden layer of binary stochastic units $h=[h_1,h_2,\ldots,h_N]$. This is done by first defining an energy function

$$E(y,h,l;\Theta)=-h^{\mathrm{T}}Wy-b^{\mathrm{T}}y-a^{\mathrm{T}}h-d^{\mathrm{T}}e_l-h^{\mathrm{T}}Ue_l,$$

with parameters $\Theta=\{W,b,a,d,U\}$, where $U$ is the weight of the connection between the target class and the hidden units, and $e_l=(1_{i=l})_{i=0}^{C}$ is the "one out of $C$" (one-hot) representation of $l$. From the energy function, we assign probabilities to values of $l$, $y$ and $h$ as follows:

$$p(y,h,l)=\frac{\exp(-E(y,h,l))}{Z},\tag{22}$$

where $Z$ is a normalization constant (also called the partition function) which ensures that Eq. (22) is a valid probability distribution. $(y,l)$ forms the visible layer: the $y$ part carries the sample feature information, and the $l$ part carries the class label information, represented by a binary vector. When the sample belongs to the $k$th class, the $k$th node of the $l$ part takes the value 1 and the other nodes take the value 0, i.e., $l^{(k)}=1$ and $l^{(i)}=0$ for $i\neq k$.

From the energy function of the ClassRBM model, the posterior activation probability of the hidden units can be deduced, as shown in Eq. (23):

$$p(h\,|\,l,y)=\prod_j p(h_j\,|\,l,y),\tag{23}$$

with $p(h_j=1\,|\,l,y)=\sigma\big(a_j+U_{jl}+\sum_i W_{ij}y_i\big)$.

Similarly, the activation probability of the visible units in this model is given by Eq. (24):

$$p(y\,|\,h)=\prod_i p(y_i\,|\,h),\tag{24}$$

with $p(y_i=1\,|\,h)=\sigma\big(b_i+\sum_j W_{ij}h_j\big)$.

ClassRBM can be trained in two ways. One is the generative training objective, which maximizes the joint probability over the training data $y_t$ and its label $l_t$. The other is the discriminative training objective, which maximizes the probability of the label $l_t$ conditioned on the training data $y_t$. Since the discriminative training objective is more suitable for relatively small datasets, the following training objective is used:

$$f(\Theta,D_{\mathrm{train}})=-\frac{1}{|D_{\mathrm{train}}|}\sum_{t=1}^{|D_{\mathrm{train}}|}\ln p(l_t\,|\,y_t),\tag{25}$$

where $D_{\mathrm{train}}=\{(y_t,l_t)\}$ is the set of training examples.

Minimizing the objective function Eq. (25) is equivalent to maximizing $p(l_1,\ldots,l_{|D_{\mathrm{train}}|}\,|\,y_1,\ldots,y_{|D_{\mathrm{train}}|})$, as we assume that the training examples are independent of each other. It is worth noting that $y_t$ does not have to be binary in this case, as the distribution of $y_t$ is not modeled.

The gradient of $f(\Theta,D_{\mathrm{train}})$ with respect to the parameters $\Theta$ is:

$$\nabla_{\Theta}f(\Theta,D_{\mathrm{train}})=\frac{1}{|D_{\mathrm{train}}|}\sum_{t=1}^{|D_{\mathrm{train}}|}\Big(\mathbb{E}_{p(h|l_t,y_t)}\big[\nabla_{\Theta}E(l_t,y_t,h)\big]-\mathbb{E}_{p(l,h|y_t)}\big[\nabla_{\Theta}E(l,y_t,h)\big]\Big),$$

where $\mathbb{E}_{p(y)}[f(y)]$ denotes the expectation of $f(y)$ over the distribution $p(y)$. The pseudo code of the model parameter update procedure is given in Algorithm 4.

Given the training set $D_{\mathrm{train}}=\{(y_1,l_1),(y_2,l_2),\ldots,(y_i,l_i),\ldots,(y_n,l_n)\}$ and the label set $L=\{0,1,2,\ldots,C\}$, $l\in L$, the ClassRBM model is trained on the historical data. Our goal is to further classify the current issue state of the system identified by the performance issue identification model and obtain specific label information for the state data. After the model is trained, a newly identified sample $y_{\mathrm{new}}$ is assigned to the class $l^{*}$ according to the following decision rule:

$$l^{*}=\underset{l\in\{1,2,\ldots,C\}}{\arg\max}\; p(l\,|\,y_{\mathrm{new}}).$$

Obviously, the data vector $y_{\mathrm{new}}$ is assigned to the class for which $p(l\,|\,y_{\mathrm{new}})$ is the largest.
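Because the hidden units can be summed out analytically, $p(l\,|\,y)$ can be computed exactly, which makes the decision rule easy to implement. The following NumPy sketch computes the exact class posterior of a ClassRBM and applies the arg-max decision rule (here taken over all classes for illustration); the parameter shapes and values are random placeholders, not a trained model.

```python
import numpy as np

def classrbm_posterior(y, W, a, d, U):
    """Exact p(l | y) for a ClassRBM, with the hidden units summed out analytically:
    log p(l | y) = d_l + sum_j softplus(a_j + U_jl + sum_i W_ij y_i) + const."""
    pre = a[:, None] + U + (y @ W)[:, None]                 # shape (N, C+1): hidden pre-activation per class
    log_unnorm = d + np.logaddexp(0.0, pre).sum(axis=0)     # logaddexp(0, x) = softplus(x)
    z = log_unnorm - log_unnorm.max()
    return np.exp(z) / np.exp(z).sum()                      # normalized posterior over classes

# Toy parameters: D input features, N hidden units, C+1 classes (class 0 = non-issue).
rng = np.random.default_rng(0)
D, N, num_classes = 8, 16, 5
W = rng.normal(scale=0.1, size=(D, N))
U = rng.normal(scale=0.1, size=(N, num_classes))
a, d = np.zeros(N), np.zeros(num_classes)

y_new = rng.random(D)                                       # an identified performance-issue record
p = classrbm_posterior(y_new, W, a, d, U)
l_star = int(np.argmax(p))                                  # decision rule: class with the largest posterior
print(p, l_star)
```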

5.3.2.4 Performance issues diagnosis algorithm

The above supervised learning is applied to analyze performance issues automatically. In this paper, the performance issue diagnosis problem is regarded as a multi-classification problem. The multi-classification method for performance issues is divided into three stages: feature matrix generation, feature extraction and classification. All the steps required for the performance issue diagnosis method are shown in Fig.7. Through data transformation, the generated performance status log data set is integrated into a (usually high-dimensional) feature matrix. Here, a feature is defined as any individually measurable variable of the node being observed, such as CPU utilization, available memory size, I/O, or network traffic. Through feature extraction, the feature matrix is reduced in dimension while retaining the most relevant information in the data. This not only speeds up the training of the performance issue diagnosis model by reducing the data dimensionality, but also improves the diagnosis ability of the model by eliminating inherent data dependencies. The newly arrived system performance state (an identified performance issue) is input into the trained RBM-based performance issue diagnosis model, the likelihood estimate of each performance issue type is calculated, and the performance issue is assigned to the type $l$ with the largest likelihood estimate. This diagnosis process subdivides the identified performance issues into specific classes to find their causes.
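As a rough, hedged approximation of this three-stage pipeline, the scikit-learn sketch below chains min-max normalization, ICA feature extraction and an RBM-based classification stage. scikit-learn provides no ClassRBM, so a BernoulliRBM feature extractor followed by logistic regression stands in for the final stage; the data are synthetic and the hyperparameters only echo those reported in Section 6.3.1.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import FastICA
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 500 performance-status records, 40 raw features, 19+1 classes.
rng = np.random.default_rng(0)
X = rng.random((500, 40))
y = rng.integers(0, 20, size=500)

pipeline = make_pipeline(
    MinMaxScaler(),                                          # feature matrix normalization
    FastICA(n_components=10, random_state=0),                # ICA-based feature extraction
    MinMaxScaler(),                                          # BernoulliRBM expects inputs in [0, 1]
    BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=30, random_state=0),
    LogisticRegression(max_iter=1000),                       # classification head standing in for ClassRBM
)
pipeline.fit(X, y)
print(pipeline.predict(X[:5]))
```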

Algorithm 5 gives a complete description of the performance issue diagnosis algorithm by analyzing and designing the key parts of the performance issue diagnosis model, such as data input, training, solution and parameter estimation.

6 Experiment design and results analysis

To verify the effect of our proposed method, the experiments were conducted on a PC with Windows 10, 1 CPU and 8 GB of memory. The experimental dataset was collected from a real Integrated Disaster Reduction Application System (IDRAS) [138]. This system relies on a cloud service platform and the SOA concept, and contains many web components with independent functions. It is developed in Java and deployed on CloudStack. The proposed method is implemented in Python. 20 metrics are collected from the PaaS and IaaS layers to represent the observed state of the system, as shown in Tab.4. We collected 2862 performance log records (i.e., 2862 snapshots), including 491 performance issue instances; these issue instances, together with the non-issue instances, cover 19+1 performance issue types. Experiments were conducted with 5-fold cross-validation to avoid overfitting.
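A minimal sketch of the 5-fold cross-validation protocol is shown below, using stratified folds and synthetic data whose shapes mirror the dataset description; the fold construction is an assumption, since the paper does not state whether the folds were stratified.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic data whose shapes mirror the dataset description: 2862 records, 20 metrics, 19+1 classes.
rng = np.random.default_rng(0)
X = rng.random((2862, 20))
y = rng.integers(0, 20, size=2862)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train the identification/diagnosis model on the train split, evaluate on the test split
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test records")
```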

6.1 Evaluation metrics

We choose the most common classification metrics, recall, precision and F1, as indicators to test the effectiveness of the proposed performance issue identification method (a binary classification method) and the proposed performance issue diagnosis method (a multi-class classification method). For each issue type, recall measures the fraction of the relevant records whose issue types are successfully classified. Precision measures the fraction of classified records that are indeed relevant. F1 is the balance of the precision and recall metrics. Classification results are expressed in a confusion matrix [155], as shown in Tab.5.

More specifically, recall is the proportion of correctly classified performance issues among the actual issue records:

$$\mathrm{Recall}=\frac{TP}{TP+FN}.$$

Precision is the proportion of correctly classified performance issues among the records classified as issues:

$$\mathrm{Precision}=\frac{TP}{TP+FP},$$

where TP (True Positive) is the number of correctly classified records (i.e., the testing records that are assigned to the correct issue type). FP (False Positive) is the number of classified records that are irrelevant (i.e., the irrelevant testing records that are assigned to the target issue type). FN (False Negative) is the number of relevant records that fail to be classified (i.e., the relevant testing records that are not assigned to the target issue type).

Recall and precision are often in tension, so it is not appropriate to use only one of them to evaluate the performance of the identification model. F1 combines them to evaluate models and is widely used in the performance evaluation of prediction models and other software engineering research fields [156]. F1 is the harmonic mean of recall and precision:

$$F1=\frac{2\times \mathrm{Recall}\times \mathrm{Precision}}{\mathrm{Recall}+\mathrm{Precision}}.$$
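The per-class metrics can be computed directly from the TP, FP and FN counts, as in the short NumPy sketch below; the toy labels are placeholders.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, issue_type):
    """Per-class precision, recall and F1 computed directly from the TP, FP, FN counts."""
    tp = np.sum((y_pred == issue_type) & (y_true == issue_type))
    fp = np.sum((y_pred == issue_type) & (y_true != issue_type))
    fn = np.sum((y_pred != issue_type) & (y_true == issue_type))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (placeholders): three issue types 0, 1, 2.
y_true = np.array([0, 1, 1, 2, 2, 2, 0, 1])
y_pred = np.array([0, 1, 2, 2, 2, 1, 0, 1])
for c in np.unique(y_true):
    print(c, precision_recall_f1(y_true, y_pred, c))
```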

6.2 The experiments of our proposed performance issue identification method

Our proposed performance issue identification method was evaluated in terms of system cost and effectiveness, as introduced below.

6.2.1 System cost experiments

This section mainly introduces the system cost experiments. The proposed performance issue identification method incurs some system cost since it needs to monitor the system state in real time. To verify the system cost of our proposed method, the experiment was conducted while the number of concurrent requests increased from 1 to 100. Fig.8 gives the influence of performance issue identification on service performance.

As shown in Fig.8, the response time of the service gradually increases with the number of concurrent requests. When the number of concurrent requests exceeds 100, the response time of the service increases rapidly, which indicates that the system cannot process this many concurrent requests. The service response time increase caused by performance issue identification is about 5–15 ms. These experimental results demonstrate that the system cost of our proposed method is very small.

6.2.2 Effectiveness experiments

This section mainly introduces the effectiveness experiments, which were used to verify the effectiveness of our proposed method. To verify the effectiveness of HMRF-MAP, we compared it with 7 comparison models, including Support Vector Machine (SVM) [157-159], Naive Bayes Classifier (NBC) [160], KNeighbors Classifier (KNC) [161], Nearest Centroid Classifier (NCC) [162], and Logistic Regression (LR) [163]. Tab.6 gives the experimental results.

As shown in Tab.6, HMRF-MAP outperforms all comparison models in terms of F1. Specifically, it is 0.91% better than the second-ranked model (SVM) and 32.05% better than the worst model (MNB). This further shows the effectiveness of HMRF-MAP.

To verify the effect of our proposed method on response time, we compared the response time when using the performance issue identification method with that of manual identification while the system is suffering a performance issue. Fig.9 gives the response times of the performance issue identification method and manual identification.

As shown in Fig.9, the average response time increases rapidly when the system is suffering a performance issue. From the figure we can conclude that the proposed performance issue identification method obviously reduces the response time of the system compared with manual identification during a performance issue. The experimental results demonstrate that our proposed performance issue identification method can effectively reduce response time during a system performance issue and restore the service capability of the system in time.

6.3 The experiments of our proposed performance issue diagnosis method

Our proposed performance issue diagnosis method was evaluated in terms of system cost and effectiveness, as introduced below.

6.3.1 System cost experiments

This section mainly introduces the system cost experiments, where $\eta=0.06$, the number of iterations is 30 and the number of hidden units is 100. The proposed performance issue diagnosis method incurs some system cost since it needs to diagnose the performance issue. To be able to quickly find the reasons for a performance issue, the system cost of our proposed method should be as small as possible. To verify the system cost of our proposed method, the experiment was conducted while the number of concurrent requests increased from 1 to 200. Fig.10 gives the influence of our proposed performance issue diagnosis method on service performance.

As shown in Fig.10, the response time increases with the number of concurrent requests. Specifically, the increase in service response time due to the performance issue diagnosis method is about 6–37 ms. This demonstrates that the system cost of our proposed method is very small and totally acceptable.

6.3.2 Effectiveness experiments

This section mainly introduces the effectiveness experiments, which were used to verify the effectiveness of our proposed method. To verify the effectiveness of the RBM with ICA, we compared it with 5 comparison models: Gaussian Naive Bayes Classifier (GaussianNBC) [164], Decision Tree (DT) [165], Boosting [166], Maximum Entropy (ME) [167], and RBM without ICA. Tab.7 gives the experimental results.

As shown in Tab.7, the performance issue diagnosis model constructed with RBM is superior to the other four shallow classification algorithms in average precision, average recall and average F1 value. This further shows the effectiveness of RBM and ICA.

To verify the effect of our proposed method on response time, we compared the response times when using the performance issue identification method, the performance issue diagnosis method and manual identification while the system is suffering a performance issue. Fig.11 gives the response times of these three approaches.

As shown in Fig.11, the average response time increases rapidly when the system is suffering a performance issue. Specifically, the performance issue diagnosis method is more effective in reducing response time during a system performance issue than the performance issue identification method and manual identification. The experimental results demonstrate that our proposed performance issue diagnosis method can effectively reduce response time during a system performance issue and restore the service capability of the system in time.

7 Conclusion and future directions

In this paper, we first divide the performance issue identification and diagnosis methods for SaaS software into three steps according to their function: performance log generation, performance issue identification and performance issue diagnosis. Then, we comprehensively review these methods by their development history and describe their characteristics and the differences among them. Meanwhile, we give our proposed solution for each step. Finally, the effectiveness of our proposed methods is shown by experiments on a real Integrated Disaster Reduction Application System (IDRAS).

With the rapid development of the SaaS model, identifying and diagnosing performance issues during the operation of SaaS software systems has become increasingly important. The identification and diagnosis methods studied in this article have clear application value. However, as the SaaS model develops, software is becoming increasingly complex and diverse. In response to this reality, the method proposed in this article can be improved in the following aspects.

(1) The method proposed in this article mainly targets certain specific attributes in performance logs and applies them to a specific SaaS software application system. However, when identifying and diagnosing performance issues in different types of SaaS software, these attributes or combinations of attributes may no longer be suitable. Therefore, in future work, we will apply our method to various application systems and search for methods to automatically construct classification features.

(2) The methods proposed in this article use HMRF and RBM, respectively, to establish the identification and diagnosis models. When establishing the models, all training data are stored in memory. However, as the models undergo more and more updates, more and more performance status logs are saved. This makes updating and constructing the models take longer and longer, affecting the efficiency of identification and diagnosis. Therefore, in future work, parallel approaches can be considered to construct the models.

(3) The method proposed in this article covers two steps of SaaS software performance analysis: identifying performance issues and identifying their causes. The next step should be to select corresponding measures to resolve performance issues and evaluate performance. Identifying performance issues and their causes can accelerate the restoration of service levels, and appropriate problem-solving solutions or strategies can better address the situation and stop losses in a timely manner. Therefore, in future work, we will address the resolution of performance issues of SaaS software.

References

[1]

Chen Z, Kim M, Cui Y . SaaS application mashup based on high speed message processing. KSII Transactions on Internet and Information Systems (TIIS), 2022, 16( 5): 1446–1465

[2]

De León Guillén M Á D, Morales-Rocha V, Fernández Martínez L F . A systematic review of security threats and countermeasures in SaaS. Journal of Computer Security, 2020, 28( 6): 635–653

[3]

Soni D, Kumar N . Machine learning techniques in emerging cloud computing integrated paradigms: a survey and taxonomy. Journal of Network and Computer Applications, 2022, 205: 103419

[4]

Li W, Zhang Y, Guo Z, Liu L . Study on SaaS cloud service development for telecom operators. Telecommunications Science, 2012, 28( 1): 132–136

[5]

Ju J, Wang Y, Fu J, Wu J, Lin Z. Research on key technology in SaaS. In: Proceedings of 2010 International Conference on Intelligent Computing and Cognitive Informatics. 2010, 384−387

[6]

O’Dywer R, Neville S W. Assessing QoS consistency in cloud-based software-as-a-service deployments. In: Proceedings of 2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). 2017, 1−6

[7]

He Q, Han J, Yang Y, Grundy J, Jin H. QoS-driven service selection for multi-tenant SaaS. In: Proceedings of the 5th IEEE International Conference on Cloud Computing. 2012, 566−573

[8]

Varshney S, Sandhu R, Gupta P K. QoS based resource provisioning in cloud computing environment: a technical survey. In: Proceedings of the 3rd International Conference on Advances in Computing and Data Sciences. 2019, 711−723

[9]

Park J, Jeong H Y . The QoS-based MCDM system for SaaS ERP applications with social network. The Journal of Supercomputing, 2013, 66( 2): 614–632

[10]

Luo H, Shyu M L . Quality of service provision in mobile multimedia-a survey. Human-centric Computing and Information Sciences, 2011, 1( 1): 5

[11]

Thain D, Tannenbaum T, Livny M. Distributed computing in practice: the condor experience. Concurrency and Computation: Practice and Experience, 2005, 17(2−4): 323−356

[12]

Berman F, Fox G, Hey A J G. Grid Computing: Making the Global Infrastructure A Reality. New York: John Wiley & Sons, 2003

[13]

Gao J, Pattabhiraman P, Bai X, Tsai W T. SaaS performance and scalability evaluation in clouds. In: Proceedings of the 6th International Symposium on Service Oriented System (SOSE). 2011, 61−71

[14]

Wang R, Ying S . SaaS software performance issue identification using HMRF-MAP framework. Software: Practice and Experience, 2018, 48( 11): 2000–2018

[15]

Munshi M, Shrimali T, Gaur S . A review of enhancing online learning using graph-based data mining techniques. Soft Computing, 2022, 26( 12): 5539–5552

[16]

Batool I, Khan T A . Software fault prediction using data mining, machine learning and deep learning techniques: a systematic literature review. Computers and Electrical Engineering, 2022, 100: 107886

[17]

El-Masri D, Petrillo F, Guéhéneuc Y G, Hamou-Lhadj A, Bouziane A . A systematic literature review on automated log abstraction techniques. Information and Software Technology, 2020, 122: 106276

[18]

Zhong Y, Guo Y, Liu C . FLP: a feature-based method for log parsing. Electronics Letters, 2018, 54( 23): 1334–1336

[19]

Zhang C, Meng X. Log parser with one-to-one markup. In: Proceedings of the 3rd International Conference on Information and Computer Technologies (ICICT). 2020, 251−257

[20]

Fang L, Di X, Liu X, Qin Y, Ren W, Ding Q. QuickLogS: a quick log parsing algorithm based on template similarity. In: Proceedings of the 20th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). 2021, 1085−1092

[21]

Zeng L, Xiao Y, Chen H, Sun B, Han W . Computer operating system logging and security issues: a survey. Security and Communication Networks, 2016, 9( 17): 4804–4821

[22]

Chen B, Jiang Z M . A survey of software log instrumentation. ACM Computing Surveys, 2022, 54( 4): 90

[23]

Behera A, Panigrahi C R, Pati B. Unstructured Log Analysis for System Anomaly Detection—A Study. Advances in Data Science and Management: Proceedings of ICDSM 2021. Singapore: Springer Nature Singapore, 2022, 497-509, 149−158

[24]

Fu Q, Lou J G, Lin Q, Ding R, Zhang D, Xie T. Contextual analysis of program logs for understanding system behaviors. In: Proceedings of the 10th Working Conference on Mining Software Repositories (MSR). 2013, 397−400

[25]

Clayman S, Galis A, Mamatas L. Monitoring virtual networks with lattice. In: Proceedings of 2010 IEEE/IFIP Network Operations and Management Symposium Workshops. 2010, 239−246

[26]

Yao K, De Pádua G B, Shang W, Sporea C, Toma A, Sajedi S . Log4perf: Suggesting and updating logging locations for web-based systems’ performance monitoring. Empirical Software Engineering, 2020, 25( 1): 488–531

[27]

Rong G, Zhang Q, Liu X, Gu S. A systematic review of logging practice in software engineering. In: Proceedings of the 24th Asia-Pacific Software Engineering Conference (APSEC). 2017, 534−539

[28]

He S, He P, Chen Z, Yang T, Su Y, Lyu M R . A survey on automated log analysis for reliability engineering. ACM Computing Surveys, 2022, 54( 6): 130

[29]

Gujral H, Lal S, Li H . An exploratory semantic analysis of logging questions. Journal of Software: Evolution and Process, 2021, 33( 7): e2361

[30]

Schwarz C . Ldagibbs: A command for topic modeling in Stata using latent dirichlet allocation. The Stata Journal: Promoting Communications on Statistics and Stata, 2018, 18( 1): 101–117

[31]

Joung J, Kim H M . Automated keyword filtering in latent dirichlet allocation for identifying product attributes from online reviews. Journal of Mechanical Design, 2021, 143( 8): 084501

[32]

Li H, Liu J, Zhang S. Hierarchical latent dirichlet allocation models for realistic action recognition. In: Proceedings of 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2011, 1297−1300

[33]

Fu J, Liu N, Hu C, Zhang X . Hot topic classification of microblogging based on cascaded latent dirichlet allocation. ICIC Express Letters, Part B: Applications, 2016, 7( 3): 621–625

[34]

Wu J, Son G, Wang S . A competency mining method based on latent dirichlet allocation (LDA) model. Journal of Physics: Conference Series, 2020, 1682: 012059

[35]

Liu Y, Jin Z. A text classification model constructed by latent dirichlet allocation and deep learning. In: Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering. 2015

[36]

Rus V, Niraula N, Banjade R. Similarity measures based on latent dirichlet allocation. In: Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing. 2013, 459−470

[37]

Yuan D, Park S, Huang P, Liu Y, Lee M M, Tang X, Zhou Y, Savage S. Be conservative: enhancing failure diagnosis with proactive logging. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 293−306

[38]

Fu Q, Zhu J, Hu W, Lou J G, Ding R, Lin Q, Zhang D, Xie T. Where do developers log? An empirical study on logging practices in industry. In: Proceedings of the 36th International Conference on Software Engineering. 2014, 24−33

[39]

Zhu J, He P, Fu Q, Zhang H, Lyu M R, Zhang D. Learning to log: helping developers make informed logging decisions. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering. 2015, 415−425

[40]

Li Z. Studying and suggesting logging locations in code blocks. In: Proceedings of the 42nd ACM/IEEE International Conference on Software Engineering: Companion Proceedings. 2020, 125−127

[41]

Gholamian S. Leveraging code clones and natural language processing for log statement prediction. In: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2021, 1043−1047

[42]

Cinque M, Cotroneo D, Pecchia A . Event logs for the analysis of software failures: a rule-based approach. IEEE Transactions on Software Engineering, 2013, 39( 6): 806–821

[43]

Li S, Niu X, Jia Z, Liao X, Wang J, Li T . Guiding log revisions by learning from software evolution history. Empirical Software Engineering, 2020, 25( 3): 2302–2340

[44]

Zhang H, Tang Y, Lamothe M, Li H, Shang W . Studying logging practice in test code. Empirical Software Engineering, 2022, 27( 4): 83

[45]

Zadrozny P, Kodali R. Big Data Analytics Using Splunk: Deriving Operational Intelligence from Social Media, Machine Data, Existing Data Warehouses, and Other Real-Time Streaming Sources. Berkeley: Apress, 2013

[46]

Patel H A, Meniya A D . A survey on commercial and open source cloud monitoring. International Journal of Science and Modern Engineering (IJISME), 2013, 1( 2): 42–44

[47]

George L. HBase: The Definitive Guide: Random Access to Your Planet-Size Data. Sebastopol: O’Reilly Media, Inc., 2011

[48]

Serrano D, Han D, Stroulia E. From relations to multi-dimensional maps: towards an SQL-to-HBase transformation methodology. In: Proceedings of the 8th IEEE International Conference on Cloud Computing. 2015, 81−89

[49]

Bhupathiraju V, Ravuri R P. The dawn of big data-Hbase. In: Proceedings of 2014 Conference on IT in Business, Industry and Government (CSIBIG). 2014, 1−4

[50]

Saloustros G, Magoutis K. Rethinking Hbase: design and implementation of an elastic key-value store over log-structured local volumes. In: Proceedings the 14th International Symposium on Parallel and Distributed Computing. 2015, 225−234

[51]

Zhang C, Liu X. HBaseMQ: a distributed message queuing system on clouds with HBase. In: Proceedings of 2013 Proceedings IEEE INFOCOM. 2013, 40−44

[52]

Hou Y, Yuan S, Xu W, Wei D. Transformation of an E-R model into HBase tables: a data store design for IHE-XDS document registry. In: Proceedings of the 12th IEEE International Conference on Ubiquitous Intelligence and Computing and the 12th IEEE International Conference on Autonomic and Trusted Computing and the 15th IEEE International Conference on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom). 2015, 1809−1812

[53]

Bao X, Liu L, Xiao N, Liu F, Zhang Q, Zhu T. HConfig: resource adaptive fast bulk loading in HBase. In: Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. 2014, 215−224

[54]

Giblin C, Rooney S, Vetsch P, Preston A. Securing Kafka with encryption-at-rest. In: Proceedings of 2021 IEEE International Conference on Big Data (Big Data). 2021, 5378−5387

[55]

Wang Z, Dai W, Wang F, Deng H, Wei S, Zhang X, Liang B. Kafka and its using in high-throughput and reliable message distribution. In: Proceedings of the 8th International Conference on Intelligent Networks and Intelligent Systems (ICINIS). 2015, 117−120

[56]

Wu H. Research proposal: reliability evaluation of the apache Kafka streaming system. In: Proceedings of 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). 2019, 112−113

[57]

Zhang H, Fang L, Jiang K, Zhang W, Li M, Zhou L. Secure door on cloud: a secure data transmission scheme to protect Kafka’s data. In: Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems (ICPADS). 2020, 406−413

[58]

Tsai W, Bai X, Huang Y . Software-as-a-service (SaaS): perspectives and challenges. Science China Information Sciences, 2014, 57( 5): 1–15

[59]

Liu D, Pei D, Zhao Y. Application-aware latency monitoring for cloud tenants via CloudWatch+. In: Proceedings of the 10th International Conference on Network and Service Management (CNSM) and Workshop. 2014, 73−81

[60]

Stephen A, Benedict S, Kumar R P A . Monitoring IaaS using various cloud monitors. Cluster Computing, 2019, 22( 5): 12459–12471

[61]

Da Silva Rocha É, Da Silva L G F, Santos G L, Bezerra D, Moreira A, Gonçalves G, Marquezini M V, Mehta A, Wildeman M, Kelner J, Sadok D, Endo P T . Aggregating data center measurements for availability analysis. Software: Practice and Experience, 2021, 51( 5): 868–892

[62]

Tasquier L, Venticinque S, Aversa R, Di Martino B. Agent based application tools for cloud provisioning and management. In: Proceedings of the 3rd International Conference on Cloud Computing. 2012, 32−42

[63]

De Chaves S A, Uriarte R B, Westphall C B . Toward an architecture for monitoring private clouds. IEEE Communications Magazine, 2011, 49( 12): 130–137

[64]

Massie M L, Chun B N, Culler D E . The ganglia distributed monitoring system: design, implementation, and experience. Parallel Computing, 2004, 30( 7): 817–840

[65]

Nagios X. The industry standard in IT infrastructure monitoring. See Logon-int.com/nagios/ website, 2011

[66]

Mardiyono A, Sholihah W, Hakim F. Mobile-based network monitoring system using Zabbix and telegram. In: Proceedings of the 3rd International Conference on Computer and Informatics Engineering (IC2IE). 2020, 473−477

[67]

Andreozzi S, De Bortoli N, Fantinel S, Ghiselli A, Rubini G L, Tortone G, Vistoli M C . GridiCE: a monitoring service for grid systems. Future Generation Computer Systems, 2005, 21( 4): 559–571

[68]

König B, Calero J M A, Kirschnick J . Elastic monitoring framework for cloud infrastructures. IET Communications, 2012, 6( 10): 1306–1315

[69]

Povedano-Molina J, Lopez-Vega J M, Lopez-Soler J M, Corradi A, Foschini L . DARGOS: a highly adaptable and scalable monitoring architecture for multi-tenant clouds. Future Generation Computer Systems, 2013, 29( 8): 2041–2056

[70]

Meng S, Liu L . Enhanced monitoring-as-a-service for effective cloud management. IEEE Transactions on Computers, 2013, 62( 9): 1705–1720

[71]

Calero J M A, Aguado J G . MonPaaS: An adaptive monitoring platformas a service for cloud computing infrastructures and services. IEEE Transactions on Services Computing, 2015, 8( 1): 65–78

[72]

Alhamazani K, Ranjan R, Jayaraman P P, Mitra K, Liu C, Rabhi F, Georgakopoulos D, Wang L . Cross-layer multi-cloud real-time application QoS monitoring and benchmarking as-a-service framework. IEEE Transactions on Cloud Computing, 2019, 7( 1): 48–61

[73]

Wang H, Zhang X, Ma Z, Li L, Gao J. An microservices-based openstack monitoring system. In: Proceedings of the 11th International Conference on Educational and Information Technology (ICEIT). 2022, 232−236

[74]

Badshah A, Jalal A, Farooq U, Rehman G U, Band S S, Iwendi C . Service level agreement monitoring as a service: an independent monitoring service for service level agreements in clouds. Big Data, 2023, 11( 5): 339–354

[75]

Mezni H, Sellami M, Aridhi S, Charrada F B . Towards big services: a synergy between service computing and parallel programming. Computing, 2021, 103( 11): 2479–2519

[76]

Mezni H . Web service adaptation: a decade’s overview. Computer Science Review, 2023, 48: 100535

[77]

Kumar R, Jain K, Maharwal H, Jain N, Dadhich A . Apache CloudStack: open source infrastructure as a service cloud computing platform. International Journal of Advancement in Engineering Technology, Management & Applied Science, 2014, 1( 2): 111–116

[78]

Schwartz B, Zaitsev P, Tkachenko V. High Performance MySQL: Optimization, Backups, and Replication. Sebastopol: O’Reilly Media, Inc., 2012

[79]

Sun W, Zhang X, Guo C J, Sun P, Su H. Software as a service: configuration and customization perspectives. In: Proceedings of 2008 IEEE Congress on Services Part II (Services-2 2008). 2008, 18−25

[80]

Lan Z, Zheng Z, Li Y . Toward automated anomaly identification in large-scale systems. IEEE Transactions on Parallel and Distributed Systems, 2010, 21( 2): 174–187

[81]

Yu L, Lan Z . A scalable, non-parametric method for detecting performance anomaly in large scale computing. IEEE Transactions on Parallel and Distributed Systems, 2016, 27( 7): 1902–1914

[82]

Odyurt U, Meyer H, Pimentel A D, Paradas E, Alonso I G. Software passports for automated performance anomaly detection of cyber-physical systems. In: Proceedings of the 19th International Conference on Embedded Computer Systems. 2019, 255−268

[83]

Wang R, Ying S . SaaS software performance issue diagnosis using independent component analysis and restricted boltzmann machine. Concurrency and Computation: Practice and Experience, 2020, 32( 14): e5729

[84]

Zhao N, Han B, Cai Y, Su J. SeqAD: an unsupervised and sequential autoencoder ensembles based anomaly detection framework for KPI. In: Proceedings of the 29th IEEE/ACM International Symposium on Quality of Service (IWQOS). 2021, 1−6

[85]

Chaturvedi A. Method and system for near real time reduction of insignificant key performance indicator data in a heterogeneous radio access and core network. In: Proceedings of 2020 IEEE Wireless Communications and Networking Conference Workshops (WCNCW). 2020, 1−7

[86]

Kusrini E, Safitri K N, Fole A. Design key performance indicator for distribution sustainable supply chain management. In: Proceedings of 2020 International Conference on Decision Aid Sciences and Application (DASA). 2020, 738−744

[87]

Hinderks A, Schrepp M, Mayo F J D, Escalona M J, Thomaschewski J . Developing a UX KPI based on the user experience questionnaire. Computer Standards & Interfaces, 2019, 65: 38–44

[88]

Fotrousi F, Fricker S A, Fiedler M, Le-Gall F. KPIs for software ecosystems: a systematic mapping study. In: Proceedings of the 5th International Conference of Software Business. 2014, 194−211

[89]

Zhang S, Zhao C, Sui Y, Su Y, Sun Y, Zhang Y, Pei D, Wang Y. Robust KPI anomaly detection for large-scale software services with partial labels. In: Proceedings of the 32nd IEEE International Symposium on Software Reliability Engineering (ISSRE). 2021, 103−114

[90]

Jiang Y, Haihong E, Song M, Zhang K. Research and application of newborn defects prediction based on spark and PU-learning. In: Proceedings of the 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS). 2018, 657−663

[91]

Shu S, Lin Z, Yan Y, Li L. Learning from multi-class positive and unlabeled data. In: Proceedings of 2020 IEEE International Conference on Data Mining (ICDM). 2020, 1256−1261

[92]

Chen X, Chen W, Chen T, Yuan Y, Gong C, Chen K, Wang Z. Self-PU: Self boosted and calibrated positive-unlabeled training. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 141

[93]

Han K, Chen W, Xu M. Investigating active positive-unlabeled learning with deep networks. In: Proceedings of the 34th Australasian Joint Conference on Advances in Artificial Intelligence. 2022, 607−618

[94]

Hu W, Le R, Liu B, Ji F, Ma J, Zhao D, Yan R. Predictive adversarial learning from positive and unlabeled data. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 7806−7814

[95]

Qiu J, Cai X, Zhang X, Cheng F, Yuan S, Fu G . An evolutionary multi-objective approach to learn from positive and unlabeled data. Applied Soft Computing, 2021, 101: 106986

[96]

Gong C, Liu T, Yang J, Tao D . Large-margin label-calibrated support vector machines for positive and unlabeled learning. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30( 11): 3471–3483

[97]

He P, Zhu J, Zheng Z, Lyu M R. Drain: an online log parsing approach with fixed depth tree. In: Proceedings of 2017 IEEE International Conference on Web Services (ICWS). 2017, 33−40

[98]

Plaat A, Schaeffer J, Pijls W, De Bruin A . Best-first fixed-depth minimax algorithms. Artificial Intelligence, 1996, 87( 1-2): 255–293

[99]

Du M, Li F . Spell: online streaming parsing of large unstructured system logs. IEEE Transactions on Knowledge and Data Engineering, 2019, 31( 11): 2213–2227

[100]

Wang B, Yang X, Li J. Locating longest common subsequences with limited penalty. In: Proceedings of the 22nd International Conference on Database Systems for Advanced Applications. 2017, 187−201

[101]

Weems B P, Bai Y. Finding longest common increasing subsequence for two different scenarios of non-random input sequences. In: Proceedings of 2005 International Conference on Foundations of Computer Science. 2005, 64−72

[102]

Meng W, Liu Y, Zaiter F, Zhang S, Chen Y, Zhang Y, Zhu Y, Wang E, Zhang R, Tao S, Yang D, Zhou R, Pei D. LogParse: making log parsing adaptive through word classification. In: Proceedings of the 29th International Conference on Computer Communications and Networks (ICCCN). 2020, 1−9

[103]

Vervaet A, Chiky R, Callau-Zori M. USTEP: unfixed search tree for efficient log parsing. In: Proceedings of 2021 IEEE International Conference on Data Mining (ICDM). 2021, 659−668

[104]

Chakrabarti A, Striegel A, Manimaran G. A case for tree evolution in QoS multicasting. In: Proceedings of the 10th IEEE International Workshop on Quality of Service (Cat. No.02EX564). 2002, 116−125

[105]

Li K. A random-walk-based dynamic tree evolution algorithm with exponential speed of convergence to optimality on regular networks. In: Proceedings of the 4th International Conference on Frontier of Computer Science and Technology. 2009, 80−85

[106]

Tomer A, Schach S R. The evolution tree: a maintenance-oriented software development model. In: Proceedings of the 4th European Conference on Software Maintenance and Reengineering. 2000, 209−214

[107]

Dai H, Li H, Chen C S, Shang W, Chen T H . Logram: efficient log parsing using n-gram dictionaries. IEEE Transactions on Software Engineering, 2022, 48( 3): 879–892

[108]

Fu Q, Lou J G, Wang Y, Li J. Execution anomaly detection in distributed systems through unstructured log analysis. In: Proceedings of the 9th IEEE International Conference on Data Mining. 2009, 149−158

[109]

Xu W, Huang L, Fox A, Patterson D, Jordan M I. Detecting large-scale system problems by mining console logs. In: Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles. 2009, 117−132

[110]

Nedelkoski S, Cardoso J, Kao O. Anomaly detection from system tracing data using multimodal deep learning. In: Proceedings of the 12th IEEE International Conference on Cloud Computing (CLOUD). 2019, 179−186

[111]

Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K. TadGAN: Time series anomaly detection using generative adversarial networks. In: Proceedings of 2020 IEEE International Conference on Big Data (Big Data). 2020, 33−43

[112]

Luo W, Wang P, Wang J, An W . The research process of generative adversarial networks. Journal of Physics: Conference Series, 2019, 1176( 3): 032008

[113]

Tran N T, Tran V H, Nguyen N B, Nguyen T K, Cheung N M . On data augmentation for GAN training. IEEE Transactions on Image Processing, 2021, 30: 1882–1897

[114]

Liu Z, Sabar N, Song A. Improving evolutionary generative adversarial networks. In: Proceedings of the 34th Australasian Joint Conference on Artificial Intelligence. 2022, 691−702

[115]

Sinha R, Sankaran A, Vatsa M, Singh R. AuthorGAN: Improving GAN reproducibility using a modular GAN framework. 2019, arXiv preprint arXiv: 1911.13250

[116]

Xia W, Zhang Y, Yang Y, Xue J H, Zhou B, Yang M H . GAN inversion: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45( 3): 3121–3138

[117]

Wang X, Cao Q, Wang Q, Cao Z, Zhang X, Wang P . Robust log anomaly detection based on contrastive learning and multi-scale mass. The Journal of Supercomputing, 2022, 78( 16): 17491–17512

[118]

Zhang Z, Wu S, Jiang D, Chen G. BERT-JAM: boosting BERT-enhanced neural machine translation with joint attention. 2020, arXiv preprint arXiv: 2011.04266

[119]

Trang N T M, Shcherbakov M. Vietnamese question answering system from multilingual BERT models to monolingual BERT model. In: Proceedings of the 9th International Conference System Modeling and Advancement in Research Trends (SMART). 2020, 201−206

[120]

Shi L, Liu D, Liu G, Meng K. AUG-BERT: an efficient data augmentation algorithm for text classification. In: Proceedings of the 8th International Conference in Communications, Signal Processing, and Systems. 2020, 2191−2198

[121]

Praechanya N, Sornil O. Improving Thai named entity recognition performance using BERT transformer on deep networks. In: Proceedings of the 6th International Conference on Machine Learning Technologies. 2021, 177−183

[122]

Yang L, Chen J, Wang Z, Wang W, Jiang J, Dong X, Zhang W. Semi-supervised log-based anomaly detection via probabilistic label estimation. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE). 2021, 1448−1460

[123]

Farzad A, Gulliver T A . Log message anomaly detection with fuzzy c-means and MLP. Applied Intelligence, 2022, 52( 15): 17708–17717

[124]

Zhang C, Peng X, Sha C, Zhang K, Fu Z, Wu X, Lin Q, Zhang D. DeepTraLog: Trace-log combined microservice anomaly detection through graph-based deep learning. In: Proceedings of the 44th International Conference on Software Engineering. 2022, 623−634

[125]

Zhang C, Peng X, Zhou T, Sha C, Yan Z, Chen Y, Yang H. TraceCRL: contrastive representation learning for microservice trace analysis. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2022, 1221−1232

[126]

Aguilera M K, Mogul J C, Wiener J L, Reynolds P, Muthitacharoen A . Performance debugging for distributed systems of black boxes. ACM SIGOPS Operating Systems Review, 2003, 37( 5): 74–89

[127]

Chen Y Y M. Path-Based Failure and Evolution Management. Berkeley: University of California at Berkeley, 2004

[128]

Barham P, Donnelly A, Isaacs R, Mortier R. Using magpie for request extraction and workload modelling. In: Proceedings of the 6th Symposium on Operating System Design & Implementation. 2004, 18

[129]

Chen H, Jiang G, Ungureanu C, Yoshihira K. Failure detection and localization in component based systems by online tracking. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 750−755

[130]

Lim M H, Lou J G, Zhang H, Fu Q, Teoh A B J, Lin Q, Ding R, Zhang D. Identifying recurrent and unknown performance issues. In: Proceedings of 2014 IEEE International Conference on Data Mining. 2014, 320−329

[131]

Fischer A, Igel C . Training restricted Boltzmann machines: an introduction. Pattern Recognition, 2014, 47( 1): 25–39

[132]

Carreira-Perpinan M Á, Hinton G E. On contrastive divergence learning. In: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics. 2005, 33−40

[133]

Liu P, Xu H, Ouyang Q, Jiao R, Chen Z, Zhang S, Yang J, Mo L, Zeng J, Xue W, Pei D. Unsupervised detection of microservice trace anomalies through service-level deep Bayesian networks. In: Proceedings of the 31st IEEE International Symposium on Software Reliability Engineering (ISSRE). 2020, 48−58

[134]

Kohyarnejadfard I, Aloise D, Dagenais M R, Shakeri M . A framework for detecting system performance anomalies using tracing data analysis. Entropy, 2021, 23( 8): 1011

[135]

Cai Y, Han B, Su J, Wang X. TraceModel: an automatic anomaly detection and root cause localization framework for microservice systems. In: Proceedings of the 17th International Conference on Mobility, Sensing and Networking (MSN). 2021, 512−519

[136]

Li M, Tang D, Wen Z, Cheng Y. Microservice anomaly detection based on tracing data using semi-supervised learning. In: Proceedings of the 4th International Conference on Artificial Intelligence and Big Data (ICAIBD). 2021, 38−44

[137]

Liu J, Hu Y, Wu B, Wang Y, Xie F . A hybrid generalized hidden markov model-based condition monitoring approach for rolling bearings. Sensors, 2017, 17( 5): 1143

[138]

Wang R, Ying S, Sun C, Wan H, Zhang H, Jia X. Model construction and data management of running log in supporting saas software performance analysis. In: Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering (SEKE 2017). 2017, 149−154

[139]

Fu X, Ren R, Zhan J, Zhou W, Jia Z, Lu G. LogMaster: mining event correlations in logs of large-scale cluster systems. In: Proceedings of the 31st IEEE Symposium on Reliable Distributed Systems. 2012, 71−80

[140]

Zou D, Qin H, Jin H, Qiang W, Han Z, Chen X. Improving log-based fault diagnosis by log classification. In: Proceedings of the 11th IFIP International Conference on Network and Parallel Computing. 2014, 446−458

[141]

Guo X, Peng X, Wang H, Li W, Jiang H, Ding D, Xie T, Su L. Graph-based trace analysis for microservice architecture understanding and problem diagnosis. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2020, 1387−1397

[142]

Huo Y, Dong J, Ge Z, Xie P, An N, Yang Y. IWApriori: an association rule mining and self-updating method based on weighted increment. In: Proceedings of the 21st Asia-Pacific Network Operations and Management Symposium (APNOMS). 2020, 167−172

[143]

Wang L, Zhao N, Chen J, Li P, Zhang W, Sui K. Root-cause metric location for microservice systems via log anomaly detection. In: Proceedings of 2020 IEEE International Conference on Web Services (ICWS). 2020, 142−150

[144]

Liu D, He C, Peng X, Lin F, Zhang C, Gong S, Li Z, Ou J, Wu Z. MicroHECL: High-efficient root cause localization in large-scale microservice systems. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 2021, 338−347

[145]

Gan Y, Liang M, Dev S, Lo D, Delimitrou C. Sage: practical and scalable ML-driven performance debugging in microservices. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 2021, 135−151

[146]

Ma M, Lin W, Pan D, Wang P . ServiceRank: root cause identification of anomaly in large-scale microservice architectures. IEEE Transactions on Dependable and Secure Computing, 2022, 19( 5): 3087–3100

[147]

Li M, Li Z, Yin K, Nie X, Zhang W, Sui K, Pei D. Causal inference-based root cause analysis for online service systems with intervention recognition. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 3230−3240

[148]

Zhao X, Zhang Y, Lion D, Ullah M F, Luo Y, Yuan D, Stumm M. lprof: a non-intrusive request flow profiler for distributed systems. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. 2014, 629−644

[149]

Bare K, Kavulya S P, Tan J, Pan X, Marinelli E, Kasick M, Gandhi R, Narasimhan P. ASDF: an automated, online framework for diagnosing performance problems. In: Casimiro A, Lemos R, Gacek C, eds. Architecting Dependable Systems VII. Berlin: Springer, 2010, 201−226

[150]

Attariyan M, Chow M, Flinn J. X-ray: automating root-cause diagnosis of performance anomalies in production software. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 307−320

[151]

Malik H, Hemmati H, Hassan A E. Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of the 35th International Conference on Software Engineering (ICSE). 2013, 1012−1021

[152]

Tuncer O, Ates E, Zhang Y, Turk A, Brandt J, Leung V J, Egele M, Coskun A K. Online diagnosis of performance variation in HPC systems using machine learning. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(4): 883–896

[153]

Li M, Tang D, Wen Z, Cheng Y. Universal anomaly detection method based on massive monitoring indicators of cloud platform. In: Proceedings of 2021 IEEE International Conference on Software Engineering and Artificial Intelligence (SEAI). 2021, 23−29

[154]

Borghesi A, Molan M, Milano M, Bartolini A. Anomaly detection and anticipation in high performance computing systems. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(4): 739–750

[155]

Stehman S V. Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 1997, 62(1): 77–89

[156]

Powers D M W. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. 2020, arXiv preprint arXiv: 2010.16061

[157]

Hsieh W W. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels. Cambridge: Cambridge University Press, 2009

[158]

Qin J, He Z S. A SVM face recognition method based on Gabor-featured key points. In: Proceedings of 2005 International Conference on Machine Learning and Cybernetics. 2005, 5144−5149

[159]

Hearst M A, Dumais S T, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and Their Applications, 1998, 13(4): 18–28

[160]

Rish I. An empirical study of the naive Bayes classifier. In: Proceedings of IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. 2001, 41–46

[161]

Larose D T, Larose C D. k-nearest neighbor algorithm. In: Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2014, 149–164

[162]

Manning C, Raghavan P, Schütze H. Vector space classification. In: An Introduction to Information Retrieval. 2009, 289–317

[163]

Freedman D A. Statistical Models: Theory and Practice. Cambridge, England: Cambridge University Press, 2009

[164]

Lam P, Wang L, Ngan H Y, Yung N H, Yeh A G. Outlier detection in large-scale traffic data by naïve Bayes method and Gaussian mixture model method. 2015, arXiv preprint arXiv: 1512.08413

[165]

Loh W Y. Classification and regression trees. WIREs: Data Mining and Knowledge Discovery, 2011, 1(1): 14–23

[166]

Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting. In: Proceedings of the 2nd European Conference on Computational Learning Theory. 1995, 23−37

[167]

Phillips S J, Anderson R P, Schapire R E. Maximum entropy modeling of species geographic distributions. Ecological Modelling, 2006, 190(3-4): 231–259


