OCACO: an operator-level cardinality and cost joint estimator
Tao JI, Haoyang LI, Kai ZHONG, Jing ZHANG, Cuiping LI, Hong CHEN
Front. Comput. Sci., 2026, Vol. 20, Issue (9): 2009349
Cardinality and cost estimation are critical components of query optimization, as they directly influence the construction of efficient physical execution plans. While machine learning-based estimators have achieved notable success, they face several challenges: (1) Training data derived from rigid, template-driven benchmarks diverges significantly in distribution from real-world query workloads, and manually designed templates cannot exhaustively represent the full spectrum of query patterns. (2) These methods generalize poorly, especially for sub-plan estimation and for queries that deviate significantly from the training query templates. Furthermore, the inherent inefficiency of operator-level cardinality estimation frequently undermines its applicability for accurate cost estimation. (3) These approaches frequently fail to leverage the rich semantic information and dynamic dependencies between operators.
To address these challenges, we propose a novel operator-level cardinality and cost estimator that simultaneously estimates the cardinality and cost of all sub-plans within a query plan. First, we leverage large language models to generate high-quality and diverse SQL queries, which serve as the foundation for pre-training and fine-tuning our model. Second, we introduce a semantic-based operator encoding strategy, augmented with a novel tree-structure-aware neural network, to effectively represent each sub-plan. Third, we propose a specialized loss function tailored for joint cardinality and cost prediction at the operator level, fully utilizing labels from each sub-plan. Extensive experiments on both synthetic and real-world datasets demonstrate that our method consistently outperforms state-of-the-art approaches.
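The abstract does not state the form of the loss function. As a rough illustration of what using the labels of every sub-plan can mean, the sketch below averages squared log-space errors for cardinality and cost over all operators in a plan tree. The names `PlanNode`, `iter_subplans`, `joint_loss`, and `cost_weight` are hypothetical, and the log-space squared error is an assumption chosen for simplicity, not the loss proposed in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class PlanNode:
    """One operator in a query plan; each node is the root of a sub-plan."""
    op: str
    true_card: float  # observed output rows of this sub-plan
    true_cost: float  # observed cost (e.g. runtime) of this sub-plan
    children: List["PlanNode"] = field(default_factory=list)


def iter_subplans(node: PlanNode) -> Iterator[PlanNode]:
    """Yield every operator in post-order (children before their parent)."""
    for child in node.children:
        yield from iter_subplans(child)
    yield node


def joint_loss(
    root: PlanNode,
    predictions: List[Tuple[float, float]],
    cost_weight: float = 1.0,
) -> float:
    """Mean over all sub-plans of squared log-errors for cardinality and cost.

    `predictions` holds one (pred_card, pred_cost) pair per operator,
    in the same post-order as `iter_subplans`.
    """
    nodes = list(iter_subplans(root))
    if len(nodes) != len(predictions):
        raise ValueError("need exactly one prediction per operator")
    total = 0.0
    for node, (pred_card, pred_cost) in zip(nodes, predictions):
        card_err = math.log1p(pred_card) - math.log1p(node.true_card)
        cost_err = math.log1p(pred_cost) - math.log1p(node.true_cost)
        # Every operator contributes, not only the root of the plan.
        total += card_err ** 2 + cost_weight * cost_err ** 2
    return total / len(nodes)
```

Because every operator contributes a term, an error at a leaf scan is penalized even when the prediction at the root of the plan happens to be accurate, which is the intuition behind supervising at the operator level.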
cardinality / cost / machine learning / AI4DB
Higher Education Press