Efficient query processing framework for big data warehouse: an almost join-free approach

Huiju WANG; Xiongpai QIN; Xuan ZHOU; Furong LI; Zuoyan QIN; Qing ZHU; Shan WANG

doi:10.1007/s11704-014-4025-6

Front. Comput. Sci. ›› 2015, Vol. 9 ›› Issue (2) :224 -236. DOI: 10.1007/s11704-014-4025-6

RESEARCH ARTICLE

Efficient query processing framework for big data warehouse: an almost join-free approach

Huiju WANG ¹^,²^,³^,^*
, Xiongpai QIN ¹^,²
, Xuan ZHOU ¹
, Furong LI ¹^,²
, Zuoyan QIN ¹
, Qing ZHU ¹^,²
, Shan WANG ¹^,²

Author information +

History +

PDF (527KB)

Abstract

The rapidly increasing scale of data warehouses is challenging today’s data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema — it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users’ demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prohibits a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, while numerous join results can actually be reused by different queries.

In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the postprocessing phase, so that data warehouse queries can be transformed into massive parallelized filter-aggregation operations on the fact table. In contrast to the conventional query processing models, our approach is efficient, scalable and stable despite of the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.

Keywords

data warehouse / large scale / TAMP / join-free / multi-version schema

Cite this article

Download citation ▾

Huiju WANG, Xiongpai QIN, Xuan ZHOU, Furong LI, Zuoyan QIN, Qing ZHU, Shan WANG. Efficient query processing framework for big data warehouse: an almost join-free approach. Front. Comput. Sci., 2015, 9 (2) : 224-236 DOI:10.1007/s11704-014-4025-6

登录浏览全文

4963

注册一个新账户忘记密码

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Chaudhuri S, Dayal U. An overview of data warehousing and olap technology. SIGMOD Record, 1997, 26(1): 65-74

[2]	Dean J, Ghemawat S. Mapreduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation. 2004, 137-150

[3]	Apache hadoop.

[4]	Pavlo A, Paulson E, Rasin A, Abadi D J, DeWitt D J, Madden S, Stonebraker M. A comparison of approaches to large-scale data analysis. In: Proceedings of the 35th SIGMOD International Conference on Management of Data. 2009, 165-178

[5]	Afrati F N, Ullman J D. Optimizing joins in a map-reduce environment. In: Proceedings of the 2010 International Conference on Extending Databas Technology. 2010, 99-110

[6]	Dawei Jiang G CA. K. H. Map-join-reduce: Towards scalable and efficient data analysis on large clusters. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(9): 1299-1311

[7]	Olston C, Reed B, Srivastava U, Kumar R, Tomkins A. Pig latin: a notso-foreign language for data processing. In: Proceedings of the 2008 SIGMOD International Conference on Management of Data. 2008, 1099-1110

[8]	Dittrich J, Quiané-Ruiz J A, Jindal A, Kargin Y, Setty V, Schad J. Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing). Proceedings of the VLDB Endowment, 2010, 3(1): 518-529

[9]	Floratou A, Patel J M, Shekita E J, Tata S. Column-oriented storage techniques for mapreduce. Proceedings of the VLDB Endowent, 2011, 4(7): 419-429

[10]	Lin Y, Agrawal D, Chen C, Ooi B C, Wu S. LLAMA: leveraging columnar storage for scalable join processing in the mapreduce framework. In: Proceedings of the 2011 SIGMOD International Conference on Management of Data. 2011, 961-972

[11]	Xu Y, Kostamaa P, Gao L. Integrating hadoop and parallel DBMS. In: Proceedings of the 2010 SIGMOD Conference on Management of Data. 2010, 969-974

[12]	Abouzeid A, Bajda-Pawlikowski K, Abadi D J, Rasin A, Silberschatz A. Hadoopdb: An architectural hybrid of mapreduce and DBMS technologies for analytical workloads. Proceedings of the VLDB Endowment, 2009, 2(1): 922-933

[13]	Swami A, Gupta A. Optimization of large join queries. SIGMOD Record, 1988, 17(3): 8-17

[14]	Raman V, Swart G, Qiao L, Reiss F, Dialani V, Kossmann D, Narang I, Sidle R. Constant-time query processing. In: Proceedings of the 2008 International Conference of Data Engineering. 2008, 60-69

[15]	Valduriez P. Join indices. ACM Transactions on Database Systems, 1987, 12: 218-246

[16]	Markl V, Ramsak F, Bayer R. Improving OLAP performance by multidimensional hierarchical clustering. In: Proceedings of the 1999 International Symposium on Database Engineering and Applications. 1999, 165-177

[17]	Karayannidis N, Tsois A, Sellis T K, Pieringer R, Markl V, Ramsak F, Fenk R, Elhardt K, Bayer R. Processing star queries on hierarchically-clustered fact tables. In: Proceedings of the 28th VLDB Conference. 2002, 730-741

[18]	Bayer R. The universal b-tree for multidimensional indexing: general concepts. In: Proceedings of the 1997 International Conference on Worldwide Computing and Its Applications. 1997, 198-209

[19]	Theodoratos D, Tsois A. Heuristic optimization of olap queries in multidimensionally hierarchically clustered databases. In: Proceedings of ACM 4th International Workshop on Data Warehousing and OLAP. 2001, 48-55

[20]	Korth H F, Kuper G M, Feigenbaum J, Gelder A V, Ullman J D. System/u: A database system based on the universal relation assumption. ACM Transactions on Database Systems, 1984, 9(3): 331-347

[21]	Floratou A, Patel J M, Shekita E J, Tata S. Column-oriented storage techn<?Pub Caret?>iques for mapreduce. Proceedings of the VLDB Endowent, 2011, 4(7): 419-429

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg

PDF (527KB)

1474

Accesses

Citation

Detail

Sections

Recommended

About the journal

Aims & scope

Description

Editorial board

Abstracting / indexing

Contact us

Browse

Just accepted

All volumes and issues

Collections

Featured articles

Most accessed

Most cited

Collections

Multimedia collections

Authors & reviewers

Online submission

Call for papers

Guidelines for authors

Download templates

Guidelines for reviewers

Abstract

Keywords

Cite this article

References

RIGHTS & PERMISSIONS