Performance issue monitoring, identification and diagnosis of SaaS software: a survey
Rui WANG, Xiangbo TIAN, Shi YING
Front. Comput. Sci., 2025, Vol. 19, Issue 1: 191201
SaaS (Software-as-a-Service) is a service model provided by cloud computing. Because the software is delivered as a service, it places high requirements on QoS (Quality of Service). However, manually identifying and diagnosing performance issues is typically expensive and laborious because of the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research effort has been devoted to automatically identifying and diagnosing performance issues of SaaS software. In this survey, we comprehensively review methods for automatically identifying and diagnosing performance issues of SaaS software. We divide them into three steps according to their function: performance log generation, performance issue identification, and performance issue diagnosis. We then review the methods for each step following their development history, and we present our proposed solution for each step. Finally, we demonstrate the effectiveness of our proposed methods through experiments.
SaaS software / performance log generation / performance issue identification / performance issue diagnosis
Higher Education Press