HC-Store: putting MapReduce’s foot in two camps

Huiju WANG, Furong LI, Xuan ZHOU, Yu CAO, Xiongpai QIN, Jidong CHEN, Shan WANG

Front. Comput. Sci., 2014, Vol. 8, Issue 6: 859-871. DOI: 10.1007/s11704-014-3376-3
RESEARCH ARTICLE


Abstract

MapReduce is a popular framework for large-scale data analysis. Because data access is critical to MapReduce’s performance, recent work has applied different storage models, such as column-store and PAX-store, to MapReduce platforms. However, data access patterns vary widely across queries, and no single storage model achieves optimal performance on its own. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of incoming MapReduce tasks, HC-store determines whether to access the underlying pure column-store or the PAX-store. We study the properties of the two storage models and build a cost model that decides the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store outperforms both PAX-store and column-store, especially under diverse workloads.

Keywords

MapReduce / Hadoop / HC-store / cost model / column-store / PAX-store

Cite this article

Huiju WANG, Furong LI, Xuan ZHOU, Yu CAO, Xiongpai QIN, Jidong CHEN, Shan WANG. HC-Store: putting MapReduce’s foot in two camps. Front. Comput. Sci., 2014, 8(6): 859‒871 https://doi.org/10.1007/s11704-014-3376-3

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg