Using multi-threads to hide deduplication I/O latency with low synchronization overhead

Rui Zhu, Lei-hua Qin, Jing-li Zhou, Huan Zheng

Journal of Central South University, 2013, 20(6): 1582-1591. DOI: 10.1007/s11771-013-1650-4

Abstract

Data deduplication, as a compression method, has been widely used in most backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up explodes, the two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Since the CPU-intensive work has been vastly parallelized and sped up by multi-core and many-core processors, the I/O latency is likely to become the bottleneck in data deduplication. To alleviate the challenge of I/O latency in multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture was proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array was designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves 3–5 times performance improvement when incorporated with the locality-based ChunkStash and the local-similarity-based SiLo methods. Moreover, Multi-Dedup dramatically decreases the synchronization overhead and achieves 1.5–2 times performance improvement compared with traditional lock-based synchronization methods.
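To make the prefix-based concurrent index concrete, the following is a minimal Go sketch of the general idea: the fingerprint index is partitioned into shards keyed by a prefix of each chunk's hash, so threads deduplicating different chunks synchronize only on the shard their fingerprint maps to. The shard count, the names (prefixIndex, lookupOrInsert), and the per-shard mutex are illustrative assumptions, not the paper's actual data structure.

```go
// Hypothetical sketch of a prefix-partitioned deduplication index.
// Assumes SHA-1 chunk fingerprints and 256 shards keyed by the first
// byte of the fingerprint; not the paper's implementation.
package main

import (
	"crypto/sha1"
	"fmt"
	"sync"
)

const shardBits = 8 // first 8 bits of the fingerprint select the shard

// indexShard holds the fingerprints whose hashes share one prefix.
// Each shard carries its own lock, so threads touching different
// prefixes never contend with each other.
type indexShard struct {
	mu      sync.Mutex
	entries map[[sha1.Size]byte]struct{}
}

type prefixIndex struct {
	shards [1 << shardBits]indexShard
}

func newPrefixIndex() *prefixIndex {
	idx := &prefixIndex{}
	for i := range idx.shards {
		idx.shards[i].entries = make(map[[sha1.Size]byte]struct{})
	}
	return idx
}

// lookupOrInsert reports whether the chunk is a duplicate; if not, it
// records the fingerprint. Only the shard selected by the fingerprint's
// first byte is locked, keeping synchronization local to one prefix.
func (idx *prefixIndex) lookupOrInsert(fp [sha1.Size]byte) bool {
	shard := &idx.shards[fp[0]]
	shard.mu.Lock()
	defer shard.mu.Unlock()
	if _, dup := shard.entries[fp]; dup {
		return true
	}
	shard.entries[fp] = struct{}{}
	return false
}

func main() {
	idx := newPrefixIndex()
	chunks := [][]byte{[]byte("chunk-a"), []byte("chunk-b"), []byte("chunk-a")}

	var wg sync.WaitGroup
	for _, c := range chunks {
		wg.Add(1)
		go func(data []byte) { // one deduplication thread per chunk
			defer wg.Done()
			fp := sha1.Sum(data)
			// Exactly one of the two "chunk-a" threads inserts first;
			// the other sees the fingerprint and reports a duplicate.
			if idx.lookupOrInsert(fp) {
				fmt.Printf("duplicate: %q\n", data)
			}
		}(c)
	}
	wg.Wait()
}
```

With 2^k shards, two threads contend only when their fingerprints share the same k-bit prefix, so lock contention drops roughly by a factor of 2^k relative to one global lock; this is the intuition behind the low synchronization overhead the abstract claims, though the paper's prefix-based index reportedly goes further than simple per-shard locking.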

Keywords

multi-thread / multi-core / parallel / data deduplication

Cite this article

Rui Zhu, Lei-hua Qin, Jing-li Zhou, Huan Zheng. Using multi-threads to hide deduplication I/O latency with low synchronization overhead. Journal of Central South University, 2013, 20(6): 1582-1591. DOI: 10.1007/s11771-013-1650-4

References

[1] Biggar H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.

[2] Rhea S, Cox R, Pesterev A. Fast, inexpensive content-addressed storage in Foundation [C]. Proceedings of the 2008 USENIX Annual Technical Conference (USENIX ATC'08). Boston: USENIX Association, 2008: 143-156.

[3] Policroniades C, Pratt I. Alternatives for detecting redundancy in storage systems data [C]. Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ATC'04). Boston: USENIX Association, 2004: 73-86.

[4] Manber U. Finding similar files in a large file system [C]. Proceedings of the USENIX Winter 1994 Technical Conference. San Francisco: USENIX Association, 1994: 17-21.

[5] Ziv J, Lempel A. A universal algorithm for sequential data compression [J]. IEEE Transactions on Information Theory, 1977, 23(3): 337-343.

[6] Biggar H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.

[7] Zhu B, Li K, Patterson H. Avoiding the disk bottleneck in the data domain deduplication file system [C]. Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08). San Jose: USENIX Association, 2008: 1-14.

[8] Bhagwat D, Eshghi K, Long D D, Lillibridge M. Extreme binning: Scalable, parallel deduplication for chunk-based file backup [C]. Proceedings of the 17th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'09). London: IEEE Press, 2009: 1-9.

[9] Xia W, Jiang H, Feng D, Hua Y. SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput [C]. Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC'11). Portland: USENIX Association, 2011: 26-38.

[10] Debnath B, Sengupta S, Li J. ChunkStash: Speeding up inline storage deduplication using flash memory [C]. Proceedings of the 2010 USENIX Annual Technical Conference (USENIX ATC'10). Boston: USENIX Association, 2010: 16-16.

[11] EMC white paper. EMC data domain SISL scaling architecture: A detailed review [EB/OL]. [2012-02-12]. http://www.emc.com/collateral/hardware/white-papers/h7221-data-domain-sisl-sclg-arch-wp.pdf.

[12] Liu C-y, Xue Y-b, Ju D-p, Wang D-s. A novel optimization method to improve de-duplication storage system performance [C]. 2009 15th International Conference on Parallel and Distributed Systems (ICPADS'09). Hong Kong: IEEE Press, 2009: 228-235.

[13] Guo F-l, Efstathopoulos P. Building a high-performance deduplication system [C]. Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC'11). Portland: USENIX Association, 2011: 1-14.

[14] Xia W, Jiang H, Feng D, Tian L. Accelerating data deduplication by exploiting pipelining and parallelism with multicore or manycore processors [C]. Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12). San Jose: USENIX Association, 2012: 1-2.

[15] Gharaibeh A, Al-kiswany S, Gopalakrishnan S, Ripeanu M. A GPU accelerated storage system [C]. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC'10). Chicago: ACM, 2010: 167-178.

[16] Bhatotia P, Rodrigues R, Verma A. Shredder: GPU-accelerated incremental storage and computation [C]. Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12). San Jose: USENIX Association, 2012: 157-172.

[17] Triplett J, McKenney P E, Walpole J. Resizable, scalable, concurrent hash tables via relativistic programming [C]. Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC'11). Portland: USENIX Association, 2011: 102-116.

[18] McKenney P E, Slingwine J D. Read-copy update: Using execution history to solve concurrency problems [C]. Parallel and Distributed Computing and Systems. Las Vegas: ACTA Press, 1998: 509-518.

[19] Triplett J, McKenney P E, Walpole J. Scalable concurrent hash tables via relativistic programming [J]. ACM SIGOPS Operating Systems Review, 2010, 44(3): 102-109.

[20] Dubnicki C, Gryz L, Heldt L, Kaczmarczyk M, Kilian W, Strzelczak P, Szczepkowski J, Ungureanu C, Welnicki M. HYDRAstor: A scalable secondary storage [C]. Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST'09). San Francisco: USENIX Association, 2009: 197-210.

[21] Muthitacharoen A, Chen B-j, Mazieres D. A low-bandwidth network file system [J]. ACM SIGOPS Operating Systems Review, 2001, 35(4): 174-187.

[22] Quinlan S, Dorward S. Venti: A new approach to archival data storage [C]. Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST'02). Monterey: USENIX Association, 2002: 4-4.

[23] Lillibridge M, Eshghi K, Bhagwat D, Deolalikar V, Trezise G, Camble P. Sparse indexing: Large scale, inline deduplication using sampling and locality [C]. Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST'09). San Francisco: USENIX Association, 2009: 111-123.

[24] Fan L, Cao P, Almeida J, Broder A Z. Summary cache: A scalable wide-area web cache sharing protocol [J]. IEEE/ACM Transactions on Networking, 2000, 8(3): 281-293.

[25] Chen S-q, Huang J-g, Chen J-e. Approximation algorithm for multiprocessor parallel job scheduling [J]. Journal of Central South University of Technology (English Edition), 2002, 9(4): 267-272.
