Optimizing B+-tree for hybrid memory with in-node hotspot cache and eADR awareness
Peiquan JIN, Zhaole CHU, Gaocong LIU, Yongping LUO, Shouhong WAN
Optimizing B+-tree for hybrid memory with in-node hotspot cache and eADR awareness
The advance in Non-Volatile Memory (NVM) has changed the traditional DRAM-only memory system. Compared to DRAM, NVM has the advantages of non-volatility and large capacity. However, as the read/write speed of NVM is still lower than that of DRAM, building DRAM/NVM-based hybrid memory systems is a feasible way of adding NVM into the current computer architecture. This paper aims to optimize the well-known B+-tree for hybrid memory. The novelty of this study is two-fold. First, we observed that the space utilization of internal nodes in B+-tree is generally below 70%. Inspired by this observation, we propose to maintain hot keys in the free space within internal nodes, yielding a new index named HATree (Hotness-Aware Tree). The new idea of HATree is to use the unused space of the parent of leaf nodes (PLNs) as the hotspot data cache. Thus, no extra space is needed, and the in-node hotspot cache can efficiently improve query performance. Second, to further improve the update performance of HATree, we propose to utilize the eADR technology supported by the third-generation Intel Xeon Scalable Processors to enhance HATree with instant log persistence, which results in the new HATree-Log structure. We conduct extensive experiments on real hybrid memory architecture involving DRAM and Intel Optane Persistent Memory to evaluate the performance of HATree and HATree-Log. Three state-of-the-art indices for hybrid memory, namely NBTree, LBTree, and FPTree, are included in the experiments, and the results suggest the efficiency of HATree and HATree-Log.
hybrid memory / B+-tree / hotspot / in-node cache / eADR
Peiquan Jin received his PhD degree in computer science and technology from the University of Science and Technology of China, China in 2003. He is currently an associate professor in the School of Computer Science and Technology, University of Science and Technology of China, China. He is a member of ACM and IEEE, and a senior member of CCF. His research interests focus on big data management, databases on new storage, and information retrieval
Zhaole Chu is currently pursuing the PhD degree with the School of Computer Science and Technology, University of Science and Technology of China, China. His research interests include AI4DB and databases for persistent memory
Gaocong Liu is currently pursuing the Master degree with the School of Computer Science and Technology, University of Science and Technology of China, China. His research interests include databases for persistent memory
Yongping Luo is currently pursuing the PhD degree with the School of Computer Science and Technology, University of Science and Technology of China, China. His research interests include databases for persistent memory and big data management
Shouhong Wan received her PhD degree in computer science and technology from the University of Science and Technology of China, China. She is currently an associate professor in the School of Computer Science and Technology, University of Science and Technology of China, China. She is a member of ACM and IEEE. Her research interests focus on big data management, image processing, and information retrieval
[1] |
Yang J, Kim J, Hoseinzadeh M, Izraelevitz J, Swanson S. An empirical guide to the behavior and use of scalable persistent memory. In: Proceedings of the 18th USENIX Conference on File and Storage Technologies. 2020, 169−182
|
[2] |
Raoux S, Burr G W, Breitwisch M J, Rettner C T, Chen Y C, Shelby R M, Salinga M, Krebs D, Chen S H, Lung H L, Lam C H. Phase-change random access memory: a scalable technology. IBM Journal of Research and Development, 2008, 52(4−5): 465−479
|
[3] |
Apalkov D, Khvalkovskiy A, Watts S, Nikitin V, Tang X, Lottis D, Moon K, Luo X, Chen E, Ong A, Driskill-Smith A, Krounbi M . Spin-transfer torque magnetic random access memory (STT-MRAM). ACM Journal on Emerging Technologies in Computing Systems, 2013, 9( 2): 13
|
[4] |
Akinaga H, Shima H . Resistive random access memory (ReRAM) based on metal oxides. Proceedings of the IEEE, 2010, 98( 12): 2237–2251
|
[5] |
Liu J, Chen S . Initial experience with 3D XPoint main memory. Distributed and Parallel Databases, 2020, 38( 4): 865–880
|
[6] |
Liu J, Chen S, Wang L . LB+Trees: optimizing persistent index performance on 3DXPoint memory. Proceedings of the VLDB Endowment, 2020, 13( 7): 1078–1090
|
[7] |
Lu B, Hao X, Wang T, Lo E . Dash: scalable hashing on persistent memory. Proceedings of the VLDB Endowment, 2020, 13( 8): 1147–1161
|
[8] |
Luo Y, Jin P, Zhang Q, Cheng B. TLBtree: a read/write-optimized tree index for non-volatile memory. In: Proceedings of the 37th International Conference on Data Engineering. 2021, 1889−1894
|
[9] |
Wang Q, Lu Y, Li J, Shu J. Nap: a black-box approach to NUMA-aware persistent memory indexes. In: Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation. 2021, 93−111
|
[10] |
Zhang B, Zheng S, Qi Z, Huang L . NBTree: a lock-free PM-friendly persistent B+-tree for eADR-enabled PM systems. Proceedings of the VLDB Endowment, 2022, 15( 6): 1187–1200
|
[11] |
Zuo P, Hua Y, Wu J . Level hashing: a high-performance and flexible-resizing persistent hashing index structure. ACM Transactions on Storage, 2019, 15( 2): 13
|
[12] |
Lu B, Ding J, Lo E, Minhas U F, Wang T . APEX: a high-performance learned index on persistent memory. Proceedings of the VLDB Endowment, 2021, 15( 3): 597–610
|
[13] |
Benson L, Makait H, Rabl T . Viper: an efficient hybrid PMem-DRAM key-value store. Proceedings of the VLDB Endowment, 2021, 14( 9): 1544–1556
|
[14] |
Yan B, Cheng X, Jiang B, Chen S, Shang C, Wang J, Huang G, Yang X, Cao W . Revisiting the design of LSM-tree based OLTP storage engine with persistent memory. Proceedings of the VLDB Endowment, 2021, 14( 10): 1872–1885
|
[15] |
Zhang W, Zhao X, Jiang S, Jiang H. ChameleonDB: a key-value store for optane persistent memory. In: Proceedings of the 16th European Conference on Computer Systems. 2021, 194−209
|
[16] |
Gugnani S, Kashyap A, Lu X . Understanding the idiosyncrasies of real persistent memory. Proceedings of the VLDB Endowment, 2020, 14( 4): 626–639
|
[17] |
Oukid I, Lasperas J, Nica A, Willhalm T, Lehner W. FPTree: a hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory. In: Proceedings of 2016 International Conference on Management of Data. 2016, 371−386
|
[18] |
Liu G, Luo Y, Jin P. HATree: a hotness-aware tree index with in-node hotspot cache for NVM/DRAM-based hybrid memory architecture. In: Proceedings of the 27th International Conference on Database Systems for Advanced Applications. 2022, 560−568
|
[19] |
Zhou X, Shou L, Chen K, Hu W, Chen G . DPTree: differential indexing for persistent memory. Proceedings of the VLDB Endowment, 2019, 13( 4): 421–434
|
[20] |
Hu D, Chen Z, Wu J, Sun J, Chen H . Persistent memory hash indexes: an experimental evaluation. Proceedings of the VLDB Endowment, 2021, 14( 5): 785–798
|
[21] |
Chen S, Jin Q . Persistent B+-trees in non-volatile main memory. Proceedings of the VLDB Endowment, 2015, 8( 7): 786–797
|
[22] |
Yang J, Wei Q, Chen C, Wang C, Yong K L, He B. NV-Tree: reducing consistency cost for NVM-based single level systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies. 2015, 167−181
|
[23] |
Yang J, Wei Q, Wang C, Chen C, Yong K L, He B . NV-Tree: a consistent and workload-adaptive tree structure for non-volatile memory. IEEE Transactions on Computers, 2016, 65( 7): 2169–2183
|
[24] |
Cao Z, Dong S, Vemuri S, Du D H C. Characterizing, modeling, and benchmarking RocksDB key-value workloads at facebook. In: Proceedings of the 18th USENIX Conference on File and Storage Technologies. 2020, 209−223
|
[25] |
Baldassin A, Barreto J, Castro D, Romano P . Persistent memory: a survey of programming support and implementations. ACM Computing Surveys, 2022, 54( 7): 152
|
[26] |
Scargall S. Programming Persistent Memory: A Comprehensive Guide for Developers. Berkeley: Apress, 2020
|
[27] |
Yao A C C . On random 2–3 trees. Acta Informatica, 1978, 9( 2): 159–170
|
[28] |
Chen J, Chen L, Wang S, Zhu G, Sun Y, Liu H, Li F. HotRing: a hotspot-aware in-memory key-value store. In: Proceedings of the 18th USENIX Conference on File and Storage Technologies. 2020, 239−252
|
[29] |
Lehman P L, Yao S B . Efficient locking for concurrent operations on B-trees. ACM Transactions on Database Systems, 1981, 6( 4): 650–670
|
[30] |
Lo C C, Sue K L. Second chance replacement policy for mobile database overflow. In: Proceedings of the Global Telecommunications Conference. 2002, 1683−1687
|
[31] |
Luo Y, Jin P, Zhang Z, Zhang J, Cheng B, Zhang Q . Two birds with one stone: boosting both search and write performance for tree indices on persistent memory. ACM Transactions on Embedded Computing Systems, 2021, 20( 5s): 50
|
[32] |
Jin P, Xie X, Wang N, Yue L . Optimizing R-tree for flash memory. Expert Systems with Applications, 2015, 42( 10): 4676–4686
|
[33] |
Kim B K, Lee D H . LSB-Tree: a log-structured b-tree index structure for NAND flash SSDs. Design Automation for Embedded Systems, 2015, 19( 1−2): 77–100
|
[34] |
Purandare D R, Wilcox P, Litz H, Finkelstein S. Append is near: log-based data management on ZNS SSDs. In: Proceedings of the 12th Conference on Innovative Data Systems Research. 2022
|
[35] |
Cooper B F, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing. 2010, 143−154
|
[36] |
Jamil S, Salam A, Khan A, Burgstaller B, Park S S, Kim Y. Scalable NUMA-aware persistent B+-tree for non-volatile memory devices. Cluster Computing, 2023, 26(5): 2865−2881
|
[37] |
Leis V, Scheibner F, Kemper A, Neumann T. The ART of practical synchronization. In: Proceedings of the 12th International Workshop on Data Management on New Hardware. 2016
|
/
〈 | 〉 |