Superior F1-score: I/O feature driven algorithms for stream computing systems workload identification
Yuxiao HAN, Yubo LIU, Ziyan ZHANG, Fei LI, Zhiguang CHEN, Nong XIAO
Front. Comput. Sci., 2026, Vol. 20, Issue (5): 2005102
Workload identification is fundamental to resource management in stream computing systems and is a key factor in improving their cost-effectiveness. However, existing workload identification algorithms often fail to handle the diversity of workload types and the complexity of deployment environments, so they usually cannot provide useful guidance for improving the performance of stream computing systems.
In this work, we propose two workload identification algorithms for different scenarios. The first is the Fine-Grained I/O traces Workload Identification (FGWI) algorithm, which suits systems that are not sensitive to overhead and primarily seek a high identification F1-score. FGWI analyzes the basic, time, spatial, and temporal access features of every I/O operation and then uses CatBoost to classify the workloads, meeting the high F1-score requirement. The second is a simplified version of FGWI called Aggregated I/O traces Workload Identification (AWI), which focuses mainly on the temporal access features of minute-level aggregated I/O traces to reduce overhead. We evaluate both algorithms in experiments driven by traces collected from Alibaba Cloud. The results show that FGWI achieves an average 8.2% improvement in F1-score over state-of-the-art algorithms, while AWI keeps its time overhead at only 0.22% relative to FGWI and still achieves an average 6.8% improvement in F1-score over state-of-the-art algorithms. Both algorithms are robust and scalable across disks, which supports their effectiveness for workload identification.
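The idea of minute-level aggregation can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`timestamp_s`, `op`, `offset`, `size`), the function names, and the three summary features are assumptions chosen only for illustration, since the abstract does not specify the trace schema or the exact feature set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IORecord:
    """One I/O operation from a trace (hypothetical schema)."""
    timestamp_s: float  # seconds since the start of the trace
    op: str             # "R" for read, anything else counts as a write
    offset: int         # byte offset on the disk
    size: int           # request size in bytes


def aggregate_per_minute(records):
    """Collapse per-I/O records into per-minute counters.

    Minutes without any I/O do not appear in the result.
    """
    buckets = defaultdict(lambda: {"reads": 0, "writes": 0, "bytes": 0})
    for r in records:
        minute = int(r.timestamp_s // 60)
        bucket = buckets[minute]
        if r.op == "R":
            bucket["reads"] += 1
        else:
            bucket["writes"] += 1
        bucket["bytes"] += r.size
    return dict(sorted(buckets.items()))


def temporal_features(per_minute):
    """Derive a few illustrative temporal features from the aggregate."""
    if not per_minute:
        return {"mean_ios_per_minute": 0.0,
                "peak_ios_per_minute": 0,
                "read_ratio": 0.0}
    totals = [b["reads"] + b["writes"] for b in per_minute.values()]
    reads = sum(b["reads"] for b in per_minute.values())
    return {
        "mean_ios_per_minute": sum(totals) / len(totals),
        "peak_ios_per_minute": max(totals),
        "read_ratio": reads / sum(totals),
    }
```

Under these assumptions, a feature vector built this way for each disk would be passed to a classifier (the abstract names CatBoost for FGWI). Because the classifier then sees one row per minute instead of one row per I/O operation, this kind of aggregation is the sort of step that could reduce overhead.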
stream computing system / workload identification / I/O feature
Higher Education Press