Guaranteeing the response deadline for general aggregation trees

Jiangfan LI, Chendie YAO, Junxu XIA, Deke GUO

Front. Comput. Sci., 2020, Vol. 14, Issue 6: 146504. DOI: 10.1007/s11704-019-8437-1
RESEARCH ARTICLE

Abstract

It is essential to answer queries within their time deadlines, even if the responses are not exact or complete. To reduce query latency, systems usually partition a large-scale data computation into a series of tasks distributed over many processes and combine the outputs through an aggregation tree. An obstacle is that the processes involved in a query usually differ in speed, so not all of them can complete their tasks in time. This directly degrades the response quality (the number of outputs received by the root of the aggregation tree). In this paper, we propose a general aggregation tree model, Tarot, that maximizes the response quality by systematically addressing the following challenges: (1) fine-grained partitioning of the query deadline along the multi-level aggregation tree; (2) learning the distribution of durations at each level of the aggregation tree to optimize the wait durations at aggregators; (3) adaptively reassigning tasks among processes according to their status; and (4) periodically aggregating the outputs received from the lower level to avoid missing the deadline. Prior models do not consider these four aspects simultaneously. Extensive evaluations indicate that Tarot adapts to multi-level trees and considerably improves the response quality compared with prior work while guaranteeing the query deadline.
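
To make the idea of partitioning a deadline and choosing a wait duration concrete, the following is a minimal sketch for a hypothetical two-level tree. It is not Tarot's algorithm, which the abstract does not specify. It assumes that a worker's output counts only if the worker finishes within the aggregator's wait and the aggregator's message then reaches the root in the remaining time. The function names `empirical_cdf` and `best_wait`, the sample durations, and the grid search are all illustrative assumptions.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F(t): the fraction of observed durations that are <= t."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(t):
        return bisect_right(ordered, t) / n

    return cdf


def best_wait(worker_durations, upper_durations, deadline, step=1):
    """Pick the aggregator wait that maximizes expected quality at the root.

    Hypothetical two-level model (an assumption, not the paper's model):
    quality(wait) = P(worker done by wait) * P(upper hop <= deadline - wait).
    Both probabilities are estimated from previously observed durations.
    """
    f_worker = empirical_cdf(worker_durations)
    f_upper = empirical_cdf(upper_durations)
    best = (0, 0.0)  # (wait, expected fraction of outputs reaching the root)
    wait = 0
    while wait <= deadline:
        quality = f_worker(wait) * f_upper(deadline - wait)
        if quality > best[1]:  # strict '>' keeps the earliest best wait
            best = (wait, quality)
        wait += step
    return best


if __name__ == "__main__":
    # Made-up duration samples, in arbitrary time units.
    print(best_wait([2, 4, 6, 8], [1, 1, 2, 3], deadline=8))
```

With these made-up samples, waiting longer collects more worker outputs but leaves less time for the upper hop, so the best wait falls strictly inside the deadline rather than at either end.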

Keywords

aggregation query / performance variations / tasks reassignment

Cite this article

Jiangfan LI, Chendie YAO, Junxu XIA, Deke GUO. Guaranteeing the response deadline for general aggregation trees. Front. Comput. Sci., 2020, 14(6): 146504 https://doi.org/10.1007/s11704-019-8437-1


RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature