An adaptive strategy for statistics collecting in distributed database

Jintao GAO, Wenjie LIU, Zhanhuai LI

Front. Comput. Sci., 2020, Vol. 14, Issue 5: 145610. DOI: 10.1007/s11704-019-9107-z
RESEARCH ARTICLE

Abstract

Collecting statistics is a time- and resource-consuming operation in database systems. In a distributed database it is even more challenging to collect statistics efficiently without degrading system performance while keeping the statistics correct. Traditional strategies usually consider only one of these dimensions when collecting statistics and therefore lack adaptiveness. In this paper, we propose an adaptive strategy for statistics collecting (ASC) that balances collection efficiency, correctness of the statistics, and impact on system performance. We formally define the procedure of collecting statistics, abstract the relationships among these three factors, and introduce an elastic structure (ESI) that stores the necessary information generated while the strategy runs. Using the information stored in ESI, ASC picks an appropriate time to trigger collection, filters out unnecessary tasks, and allocates collection tasks to appropriate execution locations with suitable execution models. We implement and evaluate our strategy in a distributed database. Experiments show that, compared with other strategies, our approach generally improves the efficiency and correctness of statistics collection and reduces its negative effect on system performance.

Keywords

statistics collecting / distributed database / adaptive strategy / query optimization

Cite this article

Jintao GAO, Wenjie LIU, Zhanhuai LI. An adaptive strategy for statistics collecting in distributed database. Front. Comput. Sci., 2020, 14(5): 145610 https://doi.org/10.1007/s11704-019-9107-z

RIGHTS & PERMISSIONS

2020 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature