Research on performance optimization of virtual data space across WAN
Jiantong HUO, Zhisheng HUO, Limin XIAO, Zhenxue HE
Front. Comput. Sci., 2024, Vol. 18, Issue 6: 186505
In high-performance computing over a wide-area network (WAN), national supercomputing centers are geographically scattered and the network topology is complex, which makes it difficult to form a unified view of resources. To aggregate the widely dispersed storage resources of the national supercomputing centers in China, we previously proposed a global virtual data space, GVDS, within the project “High Performance Computing Virtual Data Space”, part of the National Key Research and Development Program of China. GVDS enables large-scale high-performance computing applications to run efficiently across WANs. However, applications running on GVDS are often data-intensive and require large amounts of data from multiple supercomputing centers, so GVDS suffers from performance bottlenecks in data migration and data access across WANs. To address this problem, this paper proposes a performance optimization framework for GVDS that consists of a multitask-oriented data migration method and a request access-aware IO proxy resource allocation strategy (the IO proxy is a module of GVDS). In a WAN environment, the framework makes migration decisions based on the amount of data to be migrated and the number of data sources, which keeps the average migration latency low when multiple data migration tasks run in parallel. It also allocates the thread resources of the IO proxy node fairly among different types of requests, which improves application performance across WANs. Experimental results show that the framework effectively reduces the average data access delay of GVDS and greatly improves application performance.
storage aggregation across WANs / large-scale applications / GVDS / data migration / allocation of IO proxy resource
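The abstract does not specify how the IO proxy divides its threads among request types. As a purely illustrative sketch, the following uses max-min fairness, a standard technique that is an assumption here and not necessarily the authors' strategy. The function name, the request-type names, and the demand figures are all hypothetical.

```python
def max_min_fair_threads(total_threads, demand):
    """Split a thread pool among request types by max-min fairness.

    demand maps a (hypothetical) request type to the number of threads
    it could usefully occupy. No type receives more than its demand, and
    threads left over by small demands are re-shared among the rest.
    """
    alloc = {kind: 0 for kind in demand}
    active = {kind for kind, d in demand.items() if d > 0}
    remaining = total_threads
    while remaining > 0 and active:
        # Equal share of what is left, at least one thread per round.
        share = max(remaining // len(active), 1)
        for kind in sorted(active):
            if remaining == 0:
                break
            give = min(share, demand[kind] - alloc[kind], remaining)
            alloc[kind] += give
            remaining -= give
        # Types whose demand is fully met drop out of later rounds.
        active = {kind for kind in active if alloc[kind] < demand[kind]}
    return alloc


if __name__ == "__main__":
    # Hypothetical demands for a 16-thread IO proxy node.
    print(max_min_fair_threads(
        16, {"data_read": 20, "data_write": 4, "metadata": 2}))
```

With these hypothetical inputs, the small demands (`data_write`, `metadata`) are satisfied in full and the threads they leave unused go to `data_read`, so no request type is starved by a heavier one.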
Higher Education Press