Skyline-join query processing in distributed databases

Mei BAI, Junchang XIN, Guoren WANG, Roger ZIMMERMANN, Xite WANG

PDF(1075 KB)
PDF(1075 KB)
Front. Comput. Sci. ›› 2016, Vol. 10 ›› Issue (2) : 330-352. DOI: 10.1007/s11704-015-4534-y
RESEARCH ARTICLE

Skyline-join query processing in distributed databases

Author information +
History +

Abstract

The skyline-join operator, as an important variant of skylines, plays an important role in multi-criteria decision making problems. However, as the data scale increases, previous methods of skyline-join queries cannot be applied to new applications. Therefore, in this paper, it is the first attempt to propose a scalable method to process skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according a partition function, which can guarantee that the tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan can ensure that calculations are equally assigned to all the data nodes, and the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.

Keywords

skyline-join / distributed / filtering strategy / scheduling plan / rotation

Cite this article

Download citation ▾
Mei BAI, Junchang XIN, Guoren WANG, Roger ZIMMERMANN, Xite WANG. Skyline-join query processing in distributed databases. Front. Comput. Sci., 2016, 10(2): 330‒352 https://doi.org/10.1007/s11704-015-4534-y

References

[1]
Borzsony S, Kossmann D, and Stocker K. The skyline operator. In: Proceedings of the 17th IEEE International Conference on Data Engineering. 2001, 421–430
CrossRef Google scholar
[2]
Balke W-T, Güntzer U, Zheng J X. Efficient distributed skylining for web information systems. In: Proceedings of the 9th International Conference on Extending Database Technology. 2004, 256–273
CrossRef Google scholar
[3]
Afrati F-N, Koutris P, Suciu D, Ullman J-D. Parallel skyline queries. In: Proceedings of the 15th International Conference on Database Theory. 2012, 274–284
CrossRef Google scholar
[4]
Chen L, Lian X. Dynamic skyline queries in metric spaces. In: Proceedings of the 11th International Conference on Extending Database Technology. 2008, 333–343
CrossRef Google scholar
[5]
Sun D L, Wu S, Li J Z, Tung A K H. Skyline-join in distributed databases. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 176–181
CrossRef Google scholar
[6]
Nagendra M, Candan K S. Skyline-sensitive joins with LR-pruning. In: Proceedings of the 15th ACM International Conference on Extending Database Technology. 2012, 252–263
CrossRef Google scholar
[7]
Jin W, Ester M, Hu Z, Han J. The multi-relational skyline operator. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 1276–1280
CrossRef Google scholar
[8]
Vlachou A, Doulkeridis C, Polyzotis N. Skyline query processing over joins. In: Proceedings of the 2011 ACM SIGMOD International conference on Management Of Data. 2011, 73–84
CrossRef Google scholar
[9]
Jin W, Morse MD, Patel JM, Ester M, Hu Z. Evaluating skylines in the presence of equijoins. In: Proceedings of the 26th IEEE International Conference on Data Engineering. 2010, 249–260
CrossRef Google scholar
[10]
Kung H-T, Luccio F, Preparata F P. On finding the maxima of a set of vectors. Journal of the ACM, 1975, 22(4): 469–476
CrossRef Google scholar
[11]
Chomicki J, Godfrey P, Gryz J, Liang D. Skyline with presorting. In: Proceedings of the 19th International Conference on Data Engineering. 2003, 717–719
CrossRef Google scholar
[12]
Tan K-L, Eng P-K, Ooi B C. Efficient progressive skyline computation. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001, 301–310
[13]
Kossmann D, Ramsak F, Rost S. Shooting stars in the sky: An online algorithm for skyline queries. In: Proceedings of the 28th International Conference on Very Large Data Bases. 2002, 275–286
CrossRef Google scholar
[14]
Papadias D, Tao Y, Fu G, Seeger B. An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management Of Data. 2003, 467–478
CrossRef Google scholar
[15]
Lee K C, Zheng B, Li H, Lee WC. Approaching the skyline in Z order. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 279–290
[16]
Dean J, Ghemawat S. Mapreduce: Simplified data processing on large clusters. In: Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. 2004, 137–150
[17]
Wang H J, Qin X P, Zhou X, Li F R, Qin Z Y, Zhu Q,Wang S. Efficient query processing framework for big data warehouse: an almost joinfree approach. Frontiers of Computer Science, 2015, 9(2): 224–236
CrossRef Google scholar
[18]
Vlachou A, Doulkeridis C, Kotidis Y, Vazirgiannis M. Skypeer: Efficient subspace skyline computation over distributed data. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 416–425
CrossRef Google scholar
[19]
Chen L, Cui B, Lu H, Xu L H, Xu Q Q. iSky: Efficient and progressive skyline computing in a structured P2P network. In: Proceedings of the 28th IEEE International Conference on Distributed Computing Systems. 2008, 160–167
[20]
Cui B, Lu H, Xu Q Q, Chen L J, Dai Y F, Zhou Y L. Parallel distributed processing of constrained skyline queries by filtering. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 546–555
CrossRef Google scholar
[21]
Afrati F N, Koutris P, Suciu D, Ullman J D. Parallel skyline queries. In: Proceedings of the 15 th ACM International Conference on Digital Telecommunications. 2012, 274–284
CrossRef Google scholar
[22]
Köhler H, Yang J, Zhou X F. Efficient parallel skyline processing using hyperplane projections. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management Of Data. 2011, 85–96
CrossRef Google scholar
[23]
Park Y, Min J-K, Shim K. Parallel computation of skyline and reverse skyline queries using mapreduce. In: Proceedings of International Conference on Very Large Data Bases. 2013, 2002–2013
CrossRef Google scholar
[24]
Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation. 2004, 137–150
[25]
Bartolini I, Ciaccia P, Patella M. SaLSa: computing the skyline without scanning the whole sky. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 405–414
CrossRef Google scholar
[26]
Lin X M, Zhang Y, Zhang W J, Cheema M A. Stochastic skyline operator. In: Proceedings of the 27th IEEE International Conference on Data Engineering. 2011, 721–732
CrossRef Google scholar
[27]
Godfrey P, Shipley R, Gryz J. Maximal vector computation in large data sets. In: Proceedings of the 31st International Conference on Very Large Data Bases. 2005, 229–240
[28]
Khalefa M E, Mokbel M F, Levandoski J J. Skyline query processing for uncertain data. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 2010, 1293–1296
CrossRef Google scholar
[29]
Lian X, Chen L. Efficient processing of probabilistic group subspace skyline queries in uncertain databases. Information Systems, 2013, 38(3): 265–285
CrossRef Google scholar
[30]
Bloom B H. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 1970, 13(7): 422–426
CrossRef Google scholar

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg
AI Summary AI Mindmap
PDF(1075 KB)

Accesses

Citations

Detail

Sections
Recommended

/