Learning database optimization techniques: the state-of-the-art and prospects

Shao-Jie QIAO, Han-Lin FAN, Nan HAN, Lan DU, Yu-Han PENG, Rong-Min TANG, Xiao QIN

Front. Comput. Sci., 2025, Vol. 19, Issue (12): 1912612. DOI: 10.1007/s11704-025-41116-7
Information Systems
REVIEW ARTICLE

Abstract

Artificial intelligence-enabled database technology, known as AI4DB (Artificial Intelligence for Databases), is an active research area attracting significant attention and innovation. This survey first introduces the background of learning-based database techniques. It then reviews advanced query optimization methods for learning databases, focusing on four popular directions: cardinality/cost estimation, learning-based join order selection, learning-based end-to-end optimizers, and text-to-SQL models. Cardinality/cost estimation methods are classified as supervised or unsupervised according to the learning model used, and illustrative examples explain how they work. Various query optimizers are described in detail to clarify the role of each component in a learning query optimizer. We also discuss the challenges and development opportunities of learning query optimizers. The survey further explores text-to-SQL models, a new research area within AI4DB. Finally, we consider the future development prospects of learning databases.

Keywords

AI4DB / cardinality/cost estimation / join order selection / end-to-end optimizer / Text-to-SQL

Cite this article

Shao-Jie QIAO, Han-Lin FAN, Nan HAN, Lan DU, Yu-Han PENG, Rong-Min TANG, Xiao QIN. Learning database optimization techniques: the state-of-the-art and prospects. Front. Comput. Sci., 2025, 19(12): 1912612. DOI: 10.1007/s11704-025-41116-7

RIGHTS & PERMISSIONS

The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn
