Leach: an automatic learning cache for inline primary deduplication system

Bin LIN; Shanshan LI; Xiangke LIAO; Jing ZHANG; Xiaodong LIU

doi:10.1007/s11704-014-3377-2

Front. Comput. Sci., 2014, Vol. 8, Issue 2: 175-183. DOI: 10.1007/s11704-014-3377-2

RESEARCH ARTICLE

Abstract

Deduplication technology is increasingly used to reduce storage costs. Although it has been applied successfully to backup and archival systems, existing techniques can hardly be deployed in primary storage systems because of the latency cost of detecting duplicated data: every unit has to be checked against a very large fingerprint index before it is written. In this paper we introduce Leach, a self-learning in-memory fingerprint cache for inline primary storage that reduces the write cost in deduplication systems. Leach is motivated by a characteristic of real-world I/O workloads: the access patterns of duplicated data are highly skewed. Leach adopts a splay tree to organize the on-disk fingerprint index, automatically learns the access patterns, and maintains hot working sets in cache memory, with the goal of servicing the majority of duplicate detections from the cache. Leveraging the working set property, Leach provides optimizations that reduce the cost of splay operations on the fingerprint index and of cache updates. In comprehensive experiments on several real-world datasets, Leach outperforms the conventional LRU (least recently used) cache policy by reducing the number of cache misses, and it significantly improves write performance without a great impact on cache hits.
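To make the splay-tree idea concrete, the following is a minimal in-memory sketch, not the authors' implementation. It assumes SHA-1 chunk fingerprints and keeps the whole index in memory, whereas the paper's index is on disk. The names `FingerprintIndex`, `write`, and `hot_set` are hypothetical and introduced only for illustration. The sketch shows how a looked-up fingerprint is splayed to the root, so that frequently duplicated fingerprints gather near the top of the tree, where they could be kept in a cache.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: bytes                       # chunk fingerprint
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _rotate_right(x: Node) -> Node:
    y = x.left
    x.left, y.right = y.right, x
    return y


def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root: Optional[Node], key: bytes) -> Optional[Node]:
    """Move `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = _rotate_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = _rotate_left(root.left)
        return root if root.left is None else _rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zag-zag
        root.right.right = splay(root.right.right, key)
        root = _rotate_left(root)
    elif key < root.right.key:                       # zag-zig
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = _rotate_right(root.right)
    return root if root.right is None else _rotate_left(root)


class FingerprintIndex:
    """Hypothetical in-memory stand-in for a splay-organized fingerprint index."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.unique_chunks = 0

    def write(self, chunk: bytes) -> bool:
        """Return True if `chunk` is a duplicate, False if it was newly stored."""
        fp = hashlib.sha1(chunk).digest()            # assumed fingerprint function
        self.root = splay(self.root, fp)
        if self.root is not None and self.root.key == fp:
            return True
        node = Node(fp)                              # split the tree around fp
        if self.root is not None:
            if fp < self.root.key:
                node.left, node.right = self.root.left, self.root
                self.root.left = None
            else:
                node.left, node.right = self.root, self.root.right
                self.root.right = None
        self.root = node
        self.unique_chunks += 1
        return False

    def hot_set(self, depth: int) -> List[bytes]:
        """Fingerprints in the top `depth` levels: a rough proxy for cache contents."""
        out: List[bytes] = []
        frontier = [self.root] if self.root is not None else []
        for _ in range(depth):
            out.extend(n.key for n in frontier)
            frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
        return out
```

Calling `write` on each incoming chunk reports whether it duplicates an earlier one, and `hot_set(d)` lists the fingerprints that currently sit within `d` levels of the root. The recursive `splay` is adequate for small demonstrations; a large index would need an iterative version, and the sketch does not model the optimizations for splay cost and cache updates that the abstract mentions.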

Keywords

deduplication / duplicate detection / splay tree / cache

Cite this article

Bin LIN, Shanshan LI, Xiangke LIAO, Jing ZHANG, Xiaodong LIU. Leach: an automatic learning cache for inline primary deduplication system. Front. Comput. Sci., 2014, 8(2): 175-183. DOI: 10.1007/s11704-014-3377-2

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
