HC-Store: putting MapReduce’s foot in two camps

Huiju WANG; Furong LI; Xuan ZHOU; Yu CAO; Xiongpai QIN; Jidong CHEN; Shan WANG

doi:10.1007/s11704-014-3376-3

Front. Comput. Sci., 2014, Vol. 8, Issue 6: 859-871. DOI: 10.1007/s11704-014-3376-3

RESEARCH ARTICLE

Abstract

MapReduce is a popular framework for large-scale data analysis. Because data access is critical to MapReduce’s performance, recent work has applied different storage models, such as column-store and PAX-store, to MapReduce platforms. However, data access patterns differ widely across queries, and no single storage model achieves optimal performance on its own. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of the incoming MapReduce tasks, our storage model determines whether to access the underlying pure column-store or PAX-store. We studied the properties of the different storage models and created a cost model to decide the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store is able to outperform both PAX-store and column-store, especially when confronted with diverse workloads.
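The abstract does not give the cost model itself, so the following is only a toy sketch of what a runtime, cost-based choice between the two layouts could look like. Every name and number in it is an assumption made for illustration: `TableStats`, `CostParams`, the seek and bandwidth figures, and in particular the two cost formulas (one seek per accessed column file plus a tuple-reconstruction charge for the pure column-store; one seek per accessed column per row group for the PAX-store). None of it is taken from the paper.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class TableStats:
    num_rows: int
    column_widths: dict   # column name -> bytes per value
    rows_per_group: int   # rows held in one PAX row group


@dataclass(frozen=True)
class CostParams:
    # Hypothetical placeholder values, not measurements from the paper.
    seek_seconds: float = 0.01
    bytes_per_second: float = 100e6
    stitch_seconds_per_value: float = 1e-6


def scan_seconds(stats, accessed, params):
    """Time to read only the accessed columns (same in both layouts here)."""
    bytes_read = stats.num_rows * sum(stats.column_widths[c] for c in accessed)
    return bytes_read / params.bytes_per_second


def column_store_cost(stats, accessed, params):
    """Assumed model: one seek per column file, plus a per-value charge
    for stitching the extra columns back into tuples."""
    seeks = len(accessed)
    stitched_values = stats.num_rows * (len(accessed) - 1)
    return (seeks * params.seek_seconds
            + scan_seconds(stats, accessed, params)
            + stitched_values * params.stitch_seconds_per_value)


def pax_store_cost(stats, accessed, params):
    """Assumed model: one seek per accessed column in every row group,
    with no reconstruction charge because a group holds whole rows."""
    groups = ceil(stats.num_rows / stats.rows_per_group)
    seeks = groups * len(accessed)
    return seeks * params.seek_seconds + scan_seconds(stats, accessed, params)


def choose_store(stats, accessed, params=CostParams()):
    """Return the layout with the lower estimated cost for this task."""
    if not accessed:
        raise ValueError("a task must access at least one column")
    col = column_store_cost(stats, accessed, params)
    pax = pax_store_cost(stats, accessed, params)
    return "column-store" if col <= pax else "pax-store"


stats = TableStats(
    num_rows=1_000_000,
    column_widths={"a": 8, "b": 8, "c": 8, "d": 8},
    rows_per_group=100_000,
)
print(choose_store(stats, ["a"]))                  # narrow projection
print(choose_store(stats, ["a", "b", "c", "d"]))   # wide projection
```

Under these made-up parameters the single-column task is routed to the pure column-store and the four-column task to the PAX-store, which matches the general intuition that the best layout depends on the task's access pattern. The real HC-store cost model may weigh entirely different factors.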

Keywords

MapReduce / Hadoop / HC-store / cost model / column-store / PAX-store

Cite this article

Huiju WANG, Furong LI, Xuan ZHOU, Yu CAO, Xiongpai QIN, Jidong CHEN, Shan WANG. HC-Store: putting MapReduce’s foot in two camps. Front. Comput. Sci., 2014, 8(6): 859-871. DOI: 10.1007/s11704-014-3376-3

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
