ICCG: low-cost and efficient consistency with adaptive synchronization for metadata replication
Chenhao ZHANG, Liang WANG, Jing SHANG, Zhiwen XIAO, Limin XIAO, Meng HAN, Bing WEI, Runnan SHEN, Jinquan WANG
Front. Comput. Sci., 2025, Vol. 19, Issue (1): 191105
The rapid growth in the storage scale of wide-area distributed file systems (DFS) calls for fast and scalable metadata management. Metadata replication is a widely used technique for improving the performance and scalability of metadata management. Because file systems must meet POSIX requirements, many existing metadata management techniques adopt costly designs to preserve metadata consistency, which leads to unacceptable performance overhead. We propose a new metadata consistency maintenance method (ICCG), which comprises an incremental consistency guaranteed directory tree synchronization (ICGDT) and a causal consistency guaranteed replica index synchronization (CCGRI), to maintain system performance without sacrificing metadata consistency. ICGDT uses a flexible consistency scheme, based on the state of files and directories maintained in a conflict state tree, to provide incremental consistency for metadata, satisfying both consistency and performance requirements. CCGRI ensures low-latency, consistent access to data by establishing causal consistency for replica indexes through multi-version extent trees and logical time. Experimental results demonstrate the effectiveness of our methods. Compared with the strong consistency policies widely used in modern DFSes, our methods significantly improve system performance. For example, in file creation, ICCG improves the performance of directory tree operations by at least 36.4 times.
metadata management / metadata replication / consistency / directory tree / replica index
Higher Education Press