Guaranteeing the response deadline for general aggregation trees

Jiangfan LI; Chendie YAO; Junxu XIA; Deke GUO

doi:10.1007/s11704-019-8437-1

Front. Comput. Sci., 2020, Vol. 14, Issue 6: 146504. DOI: 10.1007/s11704-019-8437-1

RESEARCH ARTICLE

Abstract

It is essential to answer queries within their time deadlines, even if the responses are not exact or complete. To reduce query latency, systems usually partition large-scale data computations into a series of tasks spread over many processes and combine the task outputs through aggregation trees. An obstacle is that the processes involved in a query usually differ in speed, so not all of them can complete their tasks in time. This directly degrades the response quality (the number of outputs received by the root of the aggregation tree). In this paper, we propose a general aggregation tree model, Tarot, that maximizes the response quality by systematically addressing the following challenges: (1) fine-grained partitioning of the query deadline along the multi-level aggregation tree; (2) learning the distribution of durations at each level of the aggregation tree to optimize the wait durations at aggregators; (3) adaptively reassigning tasks among processes according to their status; and (4) periodically aggregating the outputs received from the lower level to avoid missing the deadline. Prior models do not consider these four aspects simultaneously. Extensive evaluations indicate that Tarot adapts to multi-level trees and considerably improves the response quality compared with prior work while guaranteeing the query deadline.
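To make the trade-off behind the aggregator wait duration concrete, the following is a minimal sketch for a two-level tree. It is not Tarot's algorithm, which the abstract does not specify. It assumes that durations are independent, that their distributions are estimated from previously observed samples, and that response quality is the expected number of leaf outputs that reach the root before the deadline. The names `empirical_cdf`, `best_wait`, `leaf_samples`, `upstream_samples`, and `fanout` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F(t) = fraction of observed durations that are <= t."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def best_wait(deadline, leaf_samples, upstream_samples, fanout, step=1):
    """Grid-search the aggregator wait that maximizes expected quality.

    Assumption (not from the paper): quality for a wait w is
    fanout * P(leaf done by w) * P(aggregator-to-root hop fits in deadline - w).
    """
    leaf_cdf = empirical_cdf(leaf_samples)
    up_cdf = empirical_cdf(upstream_samples)
    best_w, best_q = 0, 0.0
    w = 0
    while w <= deadline:
        # Waiting longer collects more leaf outputs but leaves less time
        # for the aggregated result to travel up to the root.
        quality = fanout * leaf_cdf(w) * up_cdf(deadline - w)
        if quality > best_q:  # strict '>' keeps the shortest best wait
            best_w, best_q = w, quality
        w += step
    return best_w, best_q
```

For example, `best_wait(45, [10, 20, 30, 40], [5, 5, 5, 50], fanout=4, step=5)` returns a wait of 40 with an expected quality of 3.0: waiting until every leaf sample has finished still leaves 5 time units, which is enough for three of the four observed upstream durations. A grid search is used here only for simplicity; a multi-level tree would need the remaining deadline to be split again at each level.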

Keywords

aggregation query / performance variations / tasks reassignment

Cite this article

Jiangfan LI, Chendie YAO, Junxu XIA, Deke GUO. Guaranteeing the response deadline for general aggregation trees. Front. Comput. Sci., 2020, 14(6): 146504 DOI:10.1007/s11704-019-8437-1
