1. Engineering Research Center of Database and Business Intelligence, Ministry of Education, Beijing 100872, China
2. School of Information, Renmin University of China, Beijing 100872, China
licuiping@ruc.edu.cn
History: Received 2024-05-21 · Accepted 2024-09-08 · Online 2024-09-11
Abstract
The interconnection between query processing and data partitioning is pivotal for the acceleration of massive data processing during query execution, primarily by minimizing the number of scanned block files. Existing partitioning techniques predominantly focus on query accesses on numeric columns for constructing partitions, often overlooking non-numeric columns and thus limiting optimization potential. Additionally, these techniques, despite creating fine-grained partitions from representative queries to enhance system performance, suffer notable performance declines due to unpredictable fluctuations in future queries. To tackle these issues, we introduce LRP, a learned robust partitioning system for dynamic query processing. LRP first proposes a method for data and query encoding that captures comprehensive column access patterns from historical queries. It then employs Multi-Layer Perceptron and Long Short-Term Memory networks to predict shifts in the distribution of historical queries. To create high-quality, robust partitions based on these predictions, LRP adopts a greedy beam search algorithm for optimal partition division and implements a data redundancy mechanism to share frequently accessed data across partitions. Experimental evaluations reveal that LRP yields partitions with more stable performance under incoming queries and significantly surpasses state-of-the-art partitioning methods.
Pengju LIU, Pan CAI, Kai ZHONG, Cuiping LI, Hong CHEN.
LRP: learned robust data partitioning for efficient processing of large dynamic queries.
Front. Comput. Sci., 2025, 19(9): 199607. DOI: 10.1007/s11704-024-40509-4
1 Introduction
Data partitioning is a pervasive concept in our daily lives, with a simple analogy being the organization of goods within supermarkets into distinct sections for easier navigation. This principle is extensively adopted in database management systems (DBMS) as a crucial step in database physical design [1–3] to enhance the efficiency of data retrieval and management. Horizontal partitioning (HP) [4–9] stands out as a critical branch of data partitioning that aims for minimal row granularity during data division to optimize data management. It involves selecting commonly used column distributions or mining the query-data access relationship as partitioning features to finely allocate table data into smaller partition files, thereby accelerating queries. For example, in Fig.1①, if a two-dimensional tablespace (400 MB in size) is partitioned based on the areas targeted by queries (black dots), we only need to access two partitions (165 MB in total) to fetch all the required data. This method circumvents the need to scan the entire tablespace, which often occurs when using simplistic data-driven partitioning strategies, such as uniform or random division of the tablespace.
Recent HP studies [7–14] regard query acceleration as the primary optimization objective, extracting valuable predicate conditions to greedily split a given tablespace for maximizing data skipping. However, there are two major limitations:
(1) Current research only considers numeric column-related predicates, making the split partitions not fully adaptable to real querying areas. For example, by comparing Fig.1① and 1②, the measured querying boundaries (black dots) are larger than the real querying ones (green dots) due to the neglect of non-numeric predicates. This discrepancy results in an inaccurate data layout (solid boxes, 165 MB) that scans an additional 25 MB of data compared to the optimal layout (dotted boxes, 145 MB). Thus, considering predicates related to non-numeric data is crucial for partitioning, especially for tables like those of TPC-H [15], which have only a few numeric columns. In such tables, 73.6% of the columns are non-numeric, and 86% of queries utilize these columns in filter conditions.
(2) Another limitation is that most studies demonstrate superior performance for static queries but are difficult to adapt to query shifts. Only a few studies [11,13,14] adapt to dynamic query workloads through periodic data re-partitioning, which can be a costly operation for DBMS. For example, in Fig.1③, we have created two partitions for old queries (black dots). However, as new queries (red dots) arrive, the amount of scanned data can increase from 165 to 400 MB. This occurs because the partially queried areas outside these partitions are randomly allocated to different partitions, potentially requiring a search of the entire table. By assessing the similarity of querying boundaries between new and old queries (introduced from [9]), we categorize new queries into two types: similarity queries (red dotted boxes) and exploratory queries (other red dots). If a robust partition is created in advance for incoming similarity and exploratory queries (see Fig.1④), we can save 215 MB of data scanning without incurring the expensive re-partitioning otherwise required to adapt to query shifts.
Overall, there are three primary challenges. The first challenge arises from the absence of non-numeric-type predicates, which alters the candidate split condition set for partitioning and consequently impacts the order in which the tablespace is split. Hence, how to fully incorporate the complete predicate conditions in the partition construction process (C1) is crucial. The second challenge involves the dynamic nature of collected workloads; it is essential to predict query distribution changes accurately to create robust partitions (C2) that mitigate the need for frequent re-partitioning. The third challenge, a common objective in previous studies, is refining predicate-based tablespace partitioning (C3) to improve query performance.
Our proposed solution. We propose a Learned Robust Partitioning system (LRP) that utilizes historical query logs to generate robust data layouts. Firstly, LRP identifies queries with similar characteristics submitted at distinct times, then designs complete data encoding schemes and a logical tree structure to embed these raw queries into structured query vectors. This vector representation incorporates various types of predicate conditions and their logical relationships (addressing C1). Secondly, LRP employs two neural networks, multi-layer perceptron (MLP) [16] and long short-term memory (LSTM) [17], selected based on the temporal features of the queries, to predict incoming similarity queries. LRP also introduces a novel loss function that replaces the traditional Mean Squared Error (MSE) with a partition semantic correlation error to improve prediction accuracy (addressing C2). Thirdly, LRP decodes the predicted query vectors into candidate predicates and utilizes a beam search algorithm to determine the locally optimal predicate condition for each tablespace split. Furthermore, LRP introduces an effective data redundancy mechanism that replicates frequently accessed data to minimize query access contention at each partition (addressing C3). To enhance the robustness of fixed layouts against exploratory queries, LRP employs a data-driven KD-Tree [4] to partition the remaining tablespace after predicate-based splitting.
It is important to note that, similar to QdTree [8] and PAW [9], our approach does not directly optimize multi-table queries, such as their join phases. Instead, it indirectly optimizes them by reducing the number of blocks required for joins.
Contributions. In summary, we make the following contributions:
● We propose multiple data and query encoding schemes to capture the often-overlooked access features of non-numeric columns. Then, we present a logical tree for the transformation of raw queries into vectors.
● We design two optional predictive networks, along with a loss function integrating partition semantics, to accurately predict changes in similarity queries, ensuring robust layout creation over these predictions.
● We employ beam search and KD-Tree strategies to find the optimal predicate allocation order for splitting a given tablespace. Moreover, we design a data redundancy policy that aims to minimize query access contention at a negligible storage cost.
● Through rigorous testing on benchmark datasets, we demonstrate that our method outperforms existing techniques in both performance and robustness.
2 Preliminaries
In this section, we introduce the relevant concepts of queries and partitioning, followed by defining the two main problems to be addressed. Next, we present the related work. Tab.1 summarizes the necessary notations used in this paper.
2.1 Similarity queries
The DBMS optimizer generates a query plan to execute a user query efficiently. Within this plan, filter operations are typically executed first, determining the satisfied data rows through predicate pushdown. Given the complete predicates extracted from a query q and a table dataset with m columns, we can characterize the query feature of q as the column domains of the satisfied rows, represented in vector form as v(q) = (l1, u1, …, lm, um), where li and ui denote the lower and upper bounds of the i-th column domain for the data queried by q. Similarly, we represent the domains of the entire table as V(T) = (L1, U1, …, Lm, Um), where Li and Ui denote the lower and upper bounds of the i-th column's data.
In real-world scenarios, many new and old queries often share the same filter columns and similar predicate conditions [9]. This leads to certain rows being repeatedly accessed by user queries, as illustrated by the red dotted boxes in Fig.1③. We define two optional metrics to measure the similarity between two queries:
The first optional similarity metric is the overlapping data area accessed by the two queries, measured by calculating the size (in bytes) of the set of co-accessed rows, where each co-accessed row contributes its row length in bytes to the total.
The second similarity metric evaluates the access distance between the two queries on each column dimension, calculated by finding the maximum difference between their column boundaries.
Then we can define how to identify queries that satisfy different levels of similarity.
Definition 1 (ε-similar queries). Let ε be a distance threshold. Consider a historical load and a future load with distinct submission timestamps. If we can find two query subsets, one from each load, such that for every query in the future subset there always exists a mapped query in the historical subset whose overlap area is sufficiently large or whose access distance is at most ε, then we deem the two loads ε-similar. The remaining queries of the future load are exploratory queries.
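The two metrics can be sketched in code. This is a minimal sketch under our own assumptions: queries are modeled as per-column [lo, hi] intervals, the overlap is approximated by the geometric volume of the intersected hyper-rectangle (the paper measures actual co-accessed rows in bytes), and the function names are ours.

```python
def overlap_size(q1, q2, row_bytes=1.0):
    """Approximate co-accessed data size as the volume of the
    intersected hyper-rectangle, scaled by an average row width."""
    vol = 1.0
    for (l1, u1), (l2, u2) in zip(q1, q2):
        side = min(u1, u2) - max(l1, l2)
        if side <= 0:          # disjoint on some dimension
            return 0.0
        vol *= side
    return vol * row_bytes

def access_distance(q1, q2):
    """Maximum boundary difference over all column dimensions."""
    return max(max(abs(l1 - l2), abs(u1 - u2))
               for (l1, u1), (l2, u2) in zip(q1, q2))

def is_eps_similar(q1, q2, eps):
    """Distance-based check corresponding to Definition 1."""
    return access_distance(q1, q2) <= eps
```

For two 2D queries covering (0–10, 0–10) and (5–15, 0–10), the overlap volume is 5 × 10 and the access distance is 5, so they are ε-similar for any ε ≥ 5.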
2.2 Horizontal partitioning layout
A query-aware data layout consists of disjoint partitions created from query samples. Each partition is materialized as block files whose sizes are limited to a predefined maximum (as in HDFS [18]). The I/O cost of processing a query over the layout is evaluated as the total size of the accessed partitions. To build a data layout, recent works [8,9,12–14,19] propose a partition tree structure, similar to an index tree, which has been proven effective in guiding data allocation. We give its definition as follows:
Definition 2 (Partition index tree). The partition index tree acts as a router, allocating data to specific partitions. Its creation starts with a root node covering the entire table space, from which we select a feasible predicate to split it into multiple child nodes. Each leaf node is then processed sequentially until no more child nodes can be generated. All nodes maintain metadata to guide query skipping and support data routing. Each leaf node is ultimately materialized as a single partition, with a node size determined by the number of rows it contains, constrained to lie within the minimum and maximum partition size bounds.
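The routing role of the partition index tree can be illustrated with a small sketch. The class and field names are ours, and the tree is simplified to binary splits of the form row[col] <= value; the paper's tree additionally keeps per-node metadata for query skipping.

```python
class PTNode:
    """A node of a (simplified) partition index tree: internal nodes
    hold a binary split condition (col, value); leaves hold a partition id."""
    def __init__(self, col=None, value=None, children=None, pid=None):
        self.col, self.value = col, value
        self.children = children or []   # [left, right] for a binary split
        self.pid = pid                   # partition id if leaf

def route_row(node, row):
    """Follow split conditions from the root down to a leaf partition."""
    while node.children:
        node = node.children[0] if row[node.col] <= node.value else node.children[1]
    return node.pid
```

Routing every table row through such a tree yields the leaf-to-partition assignment that is later materialized as block files.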
Next, we formally outline the two main problems to be addressed in this paper.
Problem 1 (Robust partitioning). Given a training set and a test set of queries in which each mapping satisfies ε-similarity, we consider two cases: (1) the training set is evenly divided into an ordered sequence based on query submission time; (2) submission time is not recorded, rendering all queries unordered. Our goal is to find an optimal model that fits the mapping, enabling it to generate a predicted load from the historical one. The prediction should maximize the cumulative co-accessed data area with the real future load, while ensuring that a robust data layout built upon the prediction has the smallest query cost difference versus the layout generated from the real future load.
Problem 2 (Optimal layout). Given the predicted load and a set of relation tables, this problem asks for constructing an individual partition tree for each table, followed by routing its table data through the tree to create a robust data layout, such that the total I/O cost of executing future queries over all layouts is minimized.
2.3 Related work
Data-driven partitioning. In most popular database products such as TiDB [20], ClickHouse [21], and Snowflake [22], partitioning rules based on data distribution are still recommended as the preferred option. This method is suitable for data with prior statistics but requires careful selection of partition boundaries, including hash and range partitioning [4], along with partition maintenance structures like SMA [23] and Zone Maps [24]. Such methods exhibit low sensitivity and high adaptability to rapidly changing load scenarios, but deliver relatively lower performance. Conversely, our LRP ensures both superior performance and minimized partition updates to accommodate dynamic workloads.
Query-driven partitioning. There are two approaches for creating query-driven partitioning rules. Classifier-based methods [7,10] extract representative predicate conditions from historical loads, cluster tuples based on the similarity of their satisfying predicates, and subsequently compute classification features for each cluster. Tree-based methods [8,9,12–14,19] incrementally build a partition tree by selecting numeric-type predicates with the maximum query skipping benefit as the split condition at each tree expansion stage. Jigsaw [25] provides optimal data skipping with tetris-shaped partitions managed by logical segments, yet it requires multiple hash tables for frequent tuple reconstruction. Unlike prior studies, LRP is the first tree partitioner to identify additional non-numeric predicates and implement a refined beam search policy for selecting better predicate split combinations.
Adaptive partitioning. To cope with dynamic loads, [11,13,14,19] manage re-partitioning based on the filling of a predefined-length query window, while [8] relies on periodically monitoring distribution differences between new and old data. Learning-based methods [26–30] use RNN/RL-style models to monitor and predict query distribution changes, aiding in calculating potential re-partitioning benefits. Another solution is to create a robust layout that reduces the re-partitioning frequency. PAW [9] introduces a similar-load scenario and searches for a fixed expansion value to fit query changes, which cannot always guarantee high layout quality. LRP addresses this by training prediction networks that incorporate partition evaluation factors into the loss function.
3 LRP overview
3.1 LRP framework
Overview. Fig.2 shows the architecture of the LRP framework with a three-stage task.
Step 1 — Query feature embedding. Initially, LRP reads raw queries from real system logs and encodes non-numeric column data using predefined coding rules based on the table schema. The non-numeric columns in the raw table domain are then updated based on the encoding scheme of the corresponding columns. Next, LRP extracts complete logical predicate trees (defined in Subsection 4.2) from queries, encodes them, and serializes them into structured query vectors by a level-by-level logical domain computation.
Step 2 — Similarity load prediction. Depending on whether the submission time is known, these query vectors are classified as ordered or unordered. Accordingly, LRP adopts the corresponding network, LSTM for ordered and MLP for unordered, to predict changes in query vectors.
Steps 3–5 — Data layout construction. To create the data layout instance, LRP decodes predicted vectors into logical predicates, followed by a step-by-step construction of the partition tree. Starting from the root node, LRP generates the candidate predicate split set for the current node and applies a beam search-based split policy to select the optimal split condition for creating child nodes. The tree is incrementally expanded until all leaf nodes meet the partition size requirements. LRP then further reallocates leaf node data by replicating frequently accessed data across multiple partitions for greater data skipping. All table data is routed through the tree to leaf nodes, which are subsequently materialized into the specified partition files.
Query processing. When processing a new query, the query optimizer uses the integrated partition tree to identify the required partitions, developing an appropriate query plan and forwarding it to the query executor to obtain the result set.
Example 1 Consider a query and the table domain in Fig.2, where a numeric column and the categorical country column respectively use min-max values and distinct values as their raw domain representations. First, LRP uses a dictionary table to encode the country column because it is an enumeration type, i.e., {1:‘USA’, 2:‘China’, 3:‘Germany’, 4:‘Japan’, 5:‘India’}, obtaining a new numeric table domain. Subsequently, all predicates are extracted from the query, and those related to the encoded columns are processed; for instance, an equality predicate on the country column is converted into a comparison against the corresponding dictionary key. Through logical computation among the predicates, LRP obtains the vector representation of the query. Since individual queries lack temporal order, LRP selects the MLP network to generate the vector prediction, which is then decoded into four predicates. Employing a split order recommended by the tree split policy, LRP constructs a four-layer partition tree comprising five leaf nodes. These leaf nodes are further refined by replicating high-frequency hot data among partitions. Finally, by routing the entire table data through the tree, five partition files are created as the final data layout.
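The dictionary-encoding step of Example 1 can be sketched as follows. The dictionary contents come from the example; the triplet representation and function name are our own illustration.

```python
# Example 1's dictionary table for the enumeration-typed country column.
COUNTRY_DICT = {'USA': 1, 'China': 2, 'Germany': 3, 'Japan': 4, 'India': 5}

def encode_predicate(col, op, value, dicts):
    """Rewrite a predicate triplet (column, operator, value) so that
    condition values on dictionary-encoded columns become numeric keys;
    predicates on other columns pass through unchanged."""
    if col in dicts:
        value = dicts[col][value]
    return (col, op, value)
```

After this rewrite, every predicate is purely numeric and can participate in the same domain computations as predicates on native numeric columns.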
3.2 Role of LRP
LRP primarily targets boosting partitioning performance while emphasizing robustness under dynamic loads. It is applicable in the following scenarios:
● Static optimal layout. By utilizing more comprehensive predicate features and refining the table space partitioning strategy, LRP functions as an efficient static partitioning algorithm when the load prediction module is disabled.
● Complementary to re-partitioning. Although LRP creates robust layouts, it does not eliminate the need for re-partitioning operations to maintain system stability during significant performance declines (e.g., an increase in exploratory queries). Instead, LRP aims to reduce the frequency of re-partitioning and can also work collaboratively with existing re-partitioning methodologies [13,31] rather than conflicting with them.
4 Data encoding and query vector extraction
In this section, we fully leverage non-numeric column data’s access properties to generate complete logical predicates as partitioning features, which can then be embedded into unified query vectors for subsequent load prediction.
4.1 Data encoding phase
There are various types of non-numeric columns in a table schema, including date strings, enumeration data, and other fixed-length and variable-length texts. We design multiple encoding functions to convert them into numeric data, facilitating the computation of their column domains based on the encoded data distribution.
Encoding strategy. As shown in Fig.3①, for a non-numeric column:
If the column is of date type, a date formatting function (e.g., timestamp conversion) is applied.
If the column is of enumeration type, or its number of distinct values is less than a constant threshold, we directly utilize a dictionary encoding strategy by creating a finite-sized dictionary table for all distinct values in the column.
If the column has a maximum text length less than a predefined constant (classified as Complex Type-1), a trie-based index tree [32] is constructed, which is suitable for long and non-categorical strings. The encoding process starts at the root node, identifying the encoding value of the i-th letter in a given string from nodes at depth i, ultimately generating a numeric sequence as its encoding key.
Otherwise (classified as Complex Type-2), no encoding is performed.
We determine suitable values for the above constant thresholds through empirical judgment and experimental validation.
Encoding key allocation order. For each encoding column, we avoid allocating continuous dictionary keys to column values in alphabetical order. Instead, our goal is to ensure that the most frequently co-accessed texts are allocated continuous keys, facilitating subsequent domain simplification of non-numeric columns. To achieve this, we first record the co-occurrence frequency of column values referenced by queries, sort them in descending order, and then determine the allocation priority for each column value accordingly. For example, in Fig.2, if the countries ‘USA’ and ‘Germany’ are frequently queried together, we modify the encoding scheme of (‘USA’, ‘China’, ‘Germany’) from (1, 2, 3) to (1, 3, 2).
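A simplified sketch of frequency-ordered key allocation follows. It ranks values by how often queries reference them (the paper's scheme ranks by pairwise co-occurrence, which is a refinement of this); the function name is ours.

```python
from collections import Counter

def allocate_keys(queried_value_lists):
    """Allocate dictionary keys in descending query-reference frequency,
    so frequently co-accessed values tend to receive adjacent keys.
    queried_value_lists: one list of referenced column values per query."""
    freq = Counter()
    for values in queried_value_lists:
        freq.update(values)
    # most_common() is stable for equal counts, keeping first-seen order.
    return {v: i + 1 for i, (v, _) in enumerate(freq.most_common())}
```

Values queried together often end up with consecutive keys, which keeps their combined domain a tight continuous interval after encoding.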
Domain computation. We then compute the column domains for the encoded table dataset, which supports non-numeric column-based node splitting in partition trees (see Section 6).
For numeric and date columns, we can directly use the min/max functions to compute their domain boundaries.
For encoded text columns, we adopt the distinct encoding keys as their list-type domain representation.
For unencoded text columns, whose data distribution cannot be effectively quantified, we use ‘None’ as the domain identifier.
Example 2 Fig.3② illustrates how a simple raw table with 4 rows and 5 columns is encoded. For each non-numeric column, we employ a distinct encoding scheme: converting the date column using a general timestamp function, constructing dictionary tables for the enumeration columns, and building a trie-based tree for the irregular text column. After encoding, the table data consists only of pure numerical values. We then utilize the three domain computation rules to represent the domain of each column, forming the encoded numeric table domain.
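The three domain computation rules above can be condensed into a small helper. This is a minimal sketch; the type labels and function name are our own.

```python
def column_domain(values, col_type):
    """Domain representation per Subsection 4.1:
    min-max pair for numeric/date columns, the sorted distinct encoded
    keys for encoded text columns, and None for unencoded text."""
    if col_type in ('numeric', 'date'):
        return (min(values), max(values))
    if col_type == 'encoded_text':
        return sorted(set(values))   # list-type domain of distinct keys
    return None  # unencoded text: distribution not quantified
```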
4.2 Query encoding phase
Predicate formatting. Given any predicate, we represent it as a triplet consisting of the column, operator, and condition value. To our knowledge, previous studies [6,8–10,14] have predominantly focused on predicate conditions involving comparison operators (e.g., ‘<’, ‘≤’, ‘=’, and ‘≥’) associated with numeric columns. To support more predicate types, we extract and format all predicates in a query from three parts: special operators, query clauses, and logical relationships.
Special operators. To utilize all operators as partition tree split conditions, certain special operators, including set operators and pattern matching operators, must be formatted into unified comparison operators. As shown in Fig.3③, a BETWEEN condition is converted into two range predicates; IN and NOT IN conditions are converted into equality and inequality predicates; and if the filtered column is a common enumeration column (e.g., gender), a LIKE condition is converted into equality predicates by identifying all column values that satisfy the given wildcard.
Query clauses. Fig.3④ designs the formatting rules for three common clauses. In cases involving a Join operation between two tables sharing a common join key, we can distribute all predicate conditions related to the join column to each table to identify potential predicates. For Group By and Order By operations on a column, the column data needs to participate in comparison operations to complete the grouping and sorting of the result set.
Logical relationship (shown in Fig.3⑤). We consider the logical relationships (AND, OR) among single predicates or predicate groups, which is crucial for accurately encoding query vectors and consequently facilitating the search for the queried leaf nodes in partition trees.
Query embedding. For any query, extracting all queried rows from the sampling table so as to calculate its vector representation is expensive; instead, we propose a lightweight query embedding strategy that allows obtaining its vector representation without query pre-execution. This strategy consists of two steps (see Fig.3⑥):
Query Encoding. Using extracted predicates and logical relationships, we can construct a logical tree structure () (see Fig.3⑦) composed of logical nodes () and predicate nodes (). Child nodes (including and sub-trees) sharing the same parent node () imply that they satisfy the corresponding logical relationship (AND, OR). Subsequently, all condition values related to non-numeric columns in are encoded using predefined data encoding schemes.
Logical computation. We leverage the encoded logical tree for efficient vector calculations, which are conducted via a bottom-up traversal, as depicted in Fig.3⑦. When traversing each node level, all sibling nodes along with their parent node are grouped into a subtree, and column domains are computed by performing logical operations among predicates. The process begins with the lowest-level subtree (depicted by dashed triangles), sequentially visits subtrees at higher levels, and cumulatively updates domain values until reaching the root node.
Example 3 As shown in Fig.3⑧, to embed a query, we first extract a 3-level logical tree and then format those predicates with non-numeric condition values or non-comparison operators using the encoding structures (Fig.3①) and predicate conversion rules (Fig.3③ and Fig.3④). Next, logical calculations are performed bottom-up over the tree to obtain the final query vector. Specifically, the subtree at the bottom is executed first over its referenced column. Subsequently, the parent subtree is executed; since one of its columns is not referenced by the left subtree, we select that column's table domain as its replacement before combining. The process then continues at the next level up. For columns not involved in any predicates, their domains are likewise replaced by the corresponding table domains. Finally, we concatenate all column domains to generate the final query vector.
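The bottom-up logical computation can be sketched as interval algebra over per-column domains. This is a minimal sketch under our own assumptions: AND intersects intervals, OR takes the bounding interval, unreferenced columns fall back to the table domain, and contradictory predicates are not handled.

```python
def and_domain(d1, d2):
    """AND: intersect per-column intervals (contradictions not handled)."""
    return (max(d1[0], d2[0]), min(d1[1], d2[1]))

def or_domain(d1, d2):
    """OR: bounding interval covering both operands (a common relaxation)."""
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))

def eval_tree(node, table_domain):
    """Bottom-up evaluation of a logical predicate tree.
    node is ('pred', col, (lo, hi)) or (op, [children]), op in {'AND', 'OR'}.
    Columns a subtree does not reference fall back to the table domain."""
    if node[0] == 'pred':
        _, col, dom = node
        doms = dict(table_domain)
        doms[col] = dom
        return doms
    op, children = node
    combine = and_domain if op == 'AND' else or_domain
    result = eval_tree(children[0], table_domain)
    for child in children[1:]:
        child_doms = eval_tree(child, table_domain)
        result = {c: combine(result[c], child_doms[c]) for c in result}
    return result
```

Replacing an unreferenced column by the table domain makes AND behave as an identity on that column and OR widen it to the full table, matching the replacement rule in Example 3.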
Query vector decoding. To create partitions over the vector predictions, we need to pre-decode them into predicate conditions. Considering the one-to-one mapping relationship between vector dimensions and predicate nodes, we can employ a slot-binding strategy before predicting queries. We first identify boundary values in the query vector with equivalent relationships to their corresponding predicate nodes in the logical tree, performing slot binding. When decoding vectors, we only fill the domain values of the predicted vector into these pre-saved slots to rewrite the logical predicates.
5 Predicting similarity-load
In this section, we initially define a loss function to reflect the prediction effectiveness on robust partitioning. Subsequently, we employ two predictive networks to learn the similarity changes of the given query mappings, generating the predicted load.
5.1 Prediction loss incorporating partition semantics
As described in Subsection 4.1, there are three distinct domain types. To achieve a unified model input, we must process the list-type and None-type domains of non-numeric columns to match the dimensionality of the min-max domain. This ensures consistency in the dimensions of all domains within the query vector.
Vector preprocessing. Before defining the loss function, we process query vectors as follows:
If a column's domain is list-type, intuitively, predicting its changes is challenging due to its larger number of elements compared to the min-max domain. In contrast, predicting only its domain boundary changes can significantly improve accuracy, with minimal errors in the query predicate features of the column. As shown in Eq. (1), we use the min-max values of all dictionary keys obtained after encoding the column, and consider the minimum interval between keys as boundary values to make the column's discrete domain continuous. This is reasonable because these discrete domains are typically derived from query predicates, comprising co-accessed column values, which are more likely to be allocated continuous key values under the key allocation order described in Subsection 4.1.
If a column's domain is None-type, as it does not contain any specific knowledge, we apply a simple numeric encoding.
Loss function design. We do not directly use the mean square error (MSE) between the predicted and real query vectors as the loss function: a small MSE implies that the two vectors are closely matched, but this does not necessarily mean that the partitions constructed on them have similar quality. Instead, we incorporate both the MSE and partition semantic information to accurately estimate the potential partition performance of a prediction. Specifically, LRP tailors a loss function that uses a weighted deviation to account for the impact of deviations in different dimensions on partition quality.
Fig.4 shows all possible intersection positions of the predicted query area (white rectangle) and the real query area (green rectangle) in a 2D table, categorized by the number of edges of the real query area not covered by the predicted one.
As this edge count increases from 0 to 4, the number of queried areas outside the prediction increases. Consequently, each future query has a higher probability of accessing extra partitions in the prediction-based data layout, resulting in a higher loss value. Moreover, a smaller intersection area between the predicted and real query areas should also correspond to a higher loss value. Based on these two observations, the loss is primarily composed of two parts.
The first part indicates the prediction inaccuracy of the covered region, affecting the created partition size. Conversely, the second part represents the uncovered area of the real query, determining the number of partitions accessed in the future. Intuitively, the second part should be assigned a higher weight, as increasing the number of accessed partitions has a greater impact on the overall cost than the size of the accessed partitions. Thus, we assign the two parts complementary weights. The weight is dynamic, influenced by the uncovered edge count and by the ratio of the query density of the queried area to the table's average query density. A higher ratio typically indicates that, under the same uncovered area, more partitions will be accessed in the future. Therefore, we represent the weight as a function of a hyper-parameter and this density ratio, calculated as follows:
In this formula, we represent the query density of the queried area by the proportion of the cumulative query access area within that area. The table's average query density is estimated by the proportion of the cumulative access area of all queries to the total table area.
5.2 Design of load predictor
In this subsection, we first categorize historical query vectors based on their temporal information, and then design suitable prediction networks for each category.
Load classification. We identify query submission times from logs. If the historical load is distributed across continuous time intervals, and there exists a query sequence at each interval that satisfies ε-similarity with adjacent sequences, it is referred to as an ordered load; otherwise, it is unordered.
Prediction dimensionality reduction. To facilitate model convergence, we shift from predicting the changes of each domain boundary in the query to predicting an expansion ratio for each domain radius, halving the model output dimension. During model training, this shift will lead to an increase in one loss component but a decrease in the other. Nevertheless, since the decreasing component plays a dominant role in the loss value while the increasing one has a smaller weight coefficient, the impact on the final loss is limited. Overall, by reducing the prediction dimensionality, this shift also stabilizes the decrease of the dominant component, thereby promoting a greater decrease in total loss on the test set.
Model structure. Next, we introduce the load predictor, as showcased in Fig.5, which fits the query changes in the training samples, taking historical query vectors as input and generating predicted query vectors as output.
Step 1 Normalization process. Due to significant differences in the domain ranges across different columns, we first normalize the structured vector to the [0, 1] interval based on the table domains , as follows:
Step 2 Estimation process. Next, for ordered loads, to learn dependency relationships among continuous sequences , we utilize an LSTM network [17] comprising multiple hidden layers and memory cells, each containing input, forget, and output gates. This enables the network to adaptively store and update long-term dependency information regarding query domain changes. For unordered loads, given the one-to-one mappings , we directly employ a simple MLP network [16] composed of stacked linear layers with a ReLU activation function to fit the query changing trends. Finally, we use a fully-connected (FC) layer to map the obtained prediction features ( or ) to an expansion ratio and apply a Sigmoid activation function to transform the output range to . This is then input into the scale-layer to extend , yielding
where .
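The normalization and scale-layer arithmetic above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the name `rho_max` and the mapping of the Sigmoid output into an expansion ratio in (1, rho_max) are our assumptions, since the exact output range is elided in the text.

```python
import numpy as np

def normalize(vec, domain_min, domain_max):
    """Step 1: min-max normalize a structured query vector to [0, 1]
    using the table's per-column domains."""
    return (vec - domain_min) / (domain_max - domain_min)

def scale_layer(fc_output, radii, rho_max=2.0):
    """Map the FC-layer output through a Sigmoid to (0, 1), rescale it
    to an expansion ratio in (1, rho_max), and extend each domain radius."""
    sig = 1.0 / (1.0 + np.exp(-fc_output))   # Sigmoid -> (0, 1)
    rho = 1.0 + (rho_max - 1.0) * sig        # expansion ratio in (1, rho_max)
    return rho * radii                       # extended domain radii

# usage: a 2-column query vector and its per-column domain radii
norm = normalize(np.array([25.0, 30.0]),
                 np.array([0.0, 10.0]), np.array([100.0, 50.0]))
extended = scale_layer(np.zeros(2), np.array([0.1, 0.2]), rho_max=2.0)
```

Because the ratio is bounded below by 1, a predicted domain can only grow, which matches the intent of covering slightly more than the historical query region.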
Step 3 Model training. The model is trained to aggregate the features of the query vectors with non-linear transformations and iteratively update the network weights to minimize the loss between and . We use L2 regularization [33] to prevent overfitting. Assuming the optimal network weights are , we can obtain using or .
6 Robust partitioning layout construction
In this section, we first present the construction process of a partition tree (a.k.a., logical partition structure) based on obtained query predictions. This process is divided into three phases: parsing, expansion, and optimization. Subsequently, we describe how this tree is integrated into the DBMS.
6.1 Parsing phase: candidate split set generation
To construct a partition tree, we need to pre-generate a candidate split set (denoted as ) before each node split. Starting from the root node, we first obtain the predicate set () decoded from the vector predictions . If , inspired by KD-Tree [4], we proceed to derive the column median condition set () from the node data distribution; otherwise, we set . The two sets are then merged as the split set for the current node, i.e., , where denotes the predicate-based split and denotes the median-based split. At each split, only one element in is selected as the split condition. After splitting this node, we need to reallocate the remaining predicates for each child node, an operation denoted as . Specifically, we filter the feasible predicates for the child nodes, checking whether each predicate can split a given child node into nodes that satisfy the size constraints, and then update the median conditions for each child node.
The use of split conditions. Given a node and the split condition sj(,op,), we denote the split operation as Split(Vi, sj). If the split column is of (encoded) numeric type, we execute the expression ‘’ (where ) to split the node data. Rows that satisfy the condition are directed to the left child node; otherwise, they are directed to the right. If is an unencoded text type, the expression ‘’ represents an invalid split and is treated as a special median split, i.e., we randomly split the node data and evenly distribute it to the two child nodes.
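A minimal sketch of applying a split condition to a node's rows. The function name `split`, the operator table, and the deterministic halving for text columns are illustrative assumptions (the paper distributes text rows randomly; we keep the halves deterministic here for reproducibility):

```python
def split(rows, col, op, value, numeric):
    """Apply split condition s_j = (col, op, value) to a node's rows.
    Rows satisfying the predicate go to the left child; the rest go right.
    For an unencoded text column, the predicate is invalid and is treated
    as a special median split: rows are distributed evenly to the children."""
    if not numeric:
        mid = len(rows) // 2
        return rows[:mid], rows[mid:]
    ops = {'<':  lambda a, b: a < b,  '<=': lambda a, b: a <= b,
           '>':  lambda a, b: a > b,  '>=': lambda a, b: a >= b}
    left  = [r for r in rows if ops[op](r[col], value)]
    right = [r for r in rows if not ops[op](r[col], value)]
    return left, right
```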
6.2 Expansion phase: beam-search tree split
During the entire tree extension, we aim to find an optimal split order by assigning appropriate split conditions to nodes at each depth, as the current depth’s condition selection impacts both the next depth’s decision and the overall split order of the tablespace.
Basic idea. We adopt a refined greedy policy, beam search (BS), which first explores local split condition combinations of multiple tree depths at each step, and then selects the split condition corresponding to the current depth from the combination with the highest skipping benefit . Here, represents the reduction in query cost before and after splitting. This process involves two important parameters: (1) represents the number of split condition combinations explored in each decision-making step; (2) is the number of split conditions in each combination, i.e., the additional explored maximal tree depth at the current node.
Beam search-based tree split. Algorithm 1 provides the execution details of . We first traverse each available leaf node (initially only the root node) of the partition tree in turn (line 4). Then, we generate a dynamically changing during the node split process (lines 2,5,16) and sort by (line 5). Subsequently, the top- items in are selected as the exploration conditions, allowing distinct candidate split paths for (lines 7–13). Each path recursively executes the same beam search subprocess (denoted as ) to return a query cost () of accessing the current child nodes. Here, each will continue to extend more secondary paths (i.e., ) until is executed. Next, we choose the current split condition () from the optimal path among the paths, splitting into child nodes and , and updating the for them (lines 14–17). After splitting, we proceed to the next available leaf node and repeat this process. The algorithm terminates when no more leaf nodes can be split (line 3).
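The beam-search policy can be illustrated with a simplified look-ahead sketch. This omits Algorithm 1's bookkeeping (size constraints, predicate reallocation per child, path bookkeeping); `cost_fn`, `split_fn`, and the toy cost model in the test are our assumptions:

```python
def beam_search_split(node, cands, cost_fn, split_fn, b=3, l=2):
    """Choose a split condition for `node` via beam search: explore the
    top-b candidate conditions with a recursive look-ahead of depth l,
    and keep the condition whose explored path yields the lowest total
    query cost.  cost_fn(node) estimates the cost of reading the node;
    split_fn(node, cond) returns its (left, right) children."""
    if l == 0 or not cands:
        return None, cost_fn(node)
    best_cond, best_cost = None, cost_fn(node)      # "no split" baseline

    def immediate_cost(c):
        left, right = split_fn(node, c)
        return cost_fn(left) + cost_fn(right)

    for c in sorted(cands, key=immediate_cost)[:b]:  # top-b beam
        left, right = split_fn(node, c)
        rest = [x for x in cands if x != c]          # remaining conditions
        _, cl = beam_search_split(left, rest, cost_fn, split_fn, b, l - 1)
        _, cr = beam_search_split(right, rest, cost_fn, split_fn, b, l - 1)
        if cl + cr < best_cost:
            best_cond, best_cost = c, cl + cr
    return best_cond, best_cost
```

With `l = 1` this degenerates to the one-step greedy policy of QdTree/PAW; larger `l` trades construction time for a wider explored solution space, matching the efficiency discussion in Section 7.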
6.3 Optimization phase: node shape refinement via data replication
In horizontal partitioning, row data acts as the minimal construction unit of partition files. This can result in significant query access contention between leaf nodes, especially for dense queries, where different parts of each row are likely to be accessed by different queries. To mitigate the contention, LRP introduces data replication for frequently accessed rows, thereby creating super nodes, which are defined as follows:
Definition 3 (Super Node). Leaf nodes maintain distinct data domains to ensure each row is routed to only one partition. However, when a small batch of rows from one partition is copied into another, this rule is broken, and the modified partition becomes a redundant partition. We refer to its corresponding tree node as a super node, which preserves a primary domain and multiple secondary domains to route non-redundant and redundant data independently.
We denote the replication process of data from node to as , where is the super node and is assigned an additional secondary domain to maintain the redundant data. When more data is replicated from to , any intersecting secondary domains are merged to save storage space for repeated redundant data. When searching for leaf nodes that match a given query, the primary domain continues to function as the main router. However, adjustments are made to super nodes by effectively utilizing secondary domains, thereby minimizing the number of reads required for leaf nodes.
Example 4 Given a two-dimensional table and a decoded query set , Fig.6(a) displays a partition tree is expanded using the split order , where and represent predicate and median splits extracted from and the node dataset, respectively. The final contains six leaf nodes. Specifically in the tablespace, these nodes are instantiated six partitions (see Fig.6(b)). Based on the primary domain of partitions, needs to access . If we replicate the green-shaded data area from to (i.e., ) and add the data domain of as the secondary domain of (see Fig.6(c)), then only needs to access because ’s scanning data in satisfies the secondary domain of . Similarly, when creating , the access plan for can be changed from to .
Replicate node creation. We stipulate that a node will be considered for replication if the ratio of redundant data to the entire node’s data is below the set threshold . Generally, the more data is replicated, the greater the data skipping benefit . However, if replicating an equal amount of data does not yield a proportional increase in (e.g., when the super node serves only a few queries), then this replication operation is not recommended due to its high data maintenance and storage costs with minimal . To address this trade-off, we first define a function to represent the change in the ratio of skipping benefit to redundant data proportion as increases, i.e.,
where and respectively are user-defined weights for and (we set ); is all redundant data when the replication threshold is ; denotes the table data. Since we aim to achieve greater skipping benefits with as low as possible, we can search for the inflection point of the y-curve’s growth as the optimal , i.e., solving . It signifies that when , increasing at the same proportion will result in limited growth in .
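Searching for the inflection point of the benefit curve can be done numerically. The discrete second-difference approach and the function names below are our assumptions; the curve `y` stands in for the user's benefit-to-redundancy ratio function:

```python
import numpy as np

def optimal_threshold(y, thetas):
    """Approximate the inflection point of the benefit curve y(theta):
    the threshold beyond which raising the redundancy ratio yields only
    diminishing growth in skipping benefit.  y''(theta) = 0 is located
    via the first sign change of the discrete second differences."""
    vals = np.array([y(t) for t in thetas])
    d2 = np.diff(vals, n=2)                        # ~ y'' on a uniform grid
    change = np.where(np.diff(np.sign(d2)) != 0)[0]
    return thetas[change[0] + 1] if len(change) else thetas[-1]
```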
Super node selection. We categorize the visited nodes into two groups: replicate nodes () with data to be replicated and candidate nodes () without such redundant data. For each query, after deciding these replicate nodes (satisfying ), we must select which node among the candidate nodes will serve as the super node. Generally, to maximize , the with the fewest query accesses is preferred as the super node to avoid additional reads of redundant data for irrelevant accessing queries. Therefore, for each pair of nodes and , we define a benefit () for to reflect its allocation priority. The arises from the skipped querying node size after replicating, minus the additional read size of redundant data for other irrelevant queries, that is
where is the skipped node for all queries that benefit from the creation of ; indicates the existing redundant data in before allocation; represents the data replicated from to ; represents the number of queries accessing .
Next, we select the top element with the highest benefit, , as the super node (), updating its accordingly, i.e., . Each can be allocated once or multiple times. This allocation process is repeated until no positive benefit can be obtained.
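The greedy allocation loop described above can be sketched as follows, assuming a caller-supplied benefit function; all names are illustrative:

```python
def allocate_super_nodes(benefit, pairs):
    """Greedy allocation: repeatedly pick the (replicate, candidate) pair
    with the highest positive benefit, mark the candidate as a super node,
    and recompute benefits until no positive benefit remains.

    benefit(vr, vc, assigned) -> net benefit of replicating vr's hot data
    into vc, given the replications already assigned."""
    assigned = []
    while True:
        scored = [(benefit(vr, vc, assigned), vr, vc) for vr, vc in pairs]
        best = max(scored, default=(0, None, None))
        if best[0] <= 0:                 # stop once no pair is profitable
            break
        assigned.append((best[1], best[2]))
        pairs = [p for p in pairs if p != (best[1], best[2])]
    return assigned
```

Recomputing benefits inside the loop matters because earlier allocations change the redundant data already held by a candidate, which the benefit formula accounts for.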
6.4 Functionality of the LRP partition tree
The LRP partition tree serves two primary functions: (1) Routing support: Each row in the raw table is processed, matching with leaf nodes via the node metadata of , and subsequently written to the corresponding partition files (a.k.a., physical partitions). (2) Partition filtering: When a new query is submitted, its encoded logical tree is executed across all relevant partition trees. The logical predicates at each level of are examined from bottom to top to identify the satisfied leaf nodes of . This limits the search space for partition files and assists the query optimizer in formulating an efficient plan.
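The two functions of the partition tree can be illustrated with a toy sketch. The class layout and the interval-overlap filter are our assumptions, and the super-node adjustment that further minimizes reads is omitted; only the role of primary versus secondary domains is shown:

```python
class Leaf:
    def __init__(self, pid, domain):
        self.pid = pid
        self.domain = domain        # primary domain: {col: (lo, hi)}
        self.secondary = []         # secondary domains held by a super node

class Inner:
    def __init__(self, col, val, left, right):
        self.col, self.val = col, val          # split predicate: col <= val
        self.left, self.right = left, right

def route(node, row):
    """Routing support: descend the primary split predicates to a leaf,
    returning the partition file id the row is written to."""
    while isinstance(node, Inner):
        node = node.left if row[node.col] <= node.val else node.right
    return node.pid

def overlaps(dom, qlo, qhi):
    return all(dom[c][0] <= qhi[c] and dom[c][1] >= qlo[c] for c in qlo)

def filter_partitions(leaves, qlo, qhi):
    """Partition filtering: a leaf is relevant if the query's per-column
    range intersects its primary domain or any secondary (redundant) one."""
    return [lf.pid for lf in leaves
            if overlaps(lf.domain, qlo, qhi)
            or any(overlaps(d, qlo, qhi) for d in lf.secondary)]
```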
7 Efficiency analysis of LRP design
In this section, we analyze the execution efficiency of four phases within LRP and compare it with two classical partitioning algorithms (QdTree [8], PAW [9]). The results are shown in Tab.2.
Data and query feature extraction: QdTree incurs minimal overhead during predicate feature extraction due to the absence of a query prediction module, thus eliminating the need for query vectorization. In contrast, PAW requires traversing the entire sampling table to pre-execute queries for generating query vectors. LRP, however, only encodes non-numeric columns and performs logical computations, resulting in lower costs compared to PAW.
Query prediction: The training time for LRP’s prediction module depends on the number of epochs, introducing additional costs due to weight updates across stacked network layers. Conversely, PAW is faster as it only necessitates multiple iterations of heuristic determinations to incrementally identify the optimal .
Partition tree construction: The selection of candidate predicates during each tree split process is the primary time bottleneck. Both QdTree and PAW employ a similar greedy strategy. However, LRP explores a larger solution space through beam search in each predicate selection round, potentially leading to exponential growth in time complexity if is not properly set.
Data routing: All methods share the same partition instantiation process. To deploy partitions, all row data must be routed through the partition tree and written to specified block addresses. Here, we primarily consider the volume of data written and network factors, without accounting for finer-grained factors such as cache hit rate.
Overall, both PAW and LRP involve query encoding and prediction, leading to higher execution overhead compared to QdTree. When comparing LRP and PAW, LRP incurs lower costs only during the feature extraction phase, but its overall overhead is greater. However, as illustrated in Tab.2②③, this issue can be partially mitigated by setting appropriate hyperparameters (e.g., keeping as small as possible, removing redundant intermediate network layers to reduce , etc.) or by employing multithreading to build different partition sub-trees in parallel. Furthermore, as shown in Tab.2④, when routing large-scale datasets to create partition files (i.e., ), the time scale differences among the algorithms have a smaller impact on overall system performance.
8 Experiments
In this section, we show the evaluation results of the LRP system from three aspects: (1) We compare the scanning ratios of LRP (including its variants) with two classical partitioning algorithms (QdTree, PAW) and the optimal reference for raw and numeric tables under static loads; (2) We separately evaluate the table scanning ratios of different prediction methods combined with the same or different partitioners under -similarity loads. We also explore the impact of different mixing ratios of exploratory and similarity queries on partitioner performance; (3) We further evaluate the above experimental configurations in a real Spark cluster.
8.1 Experimental setup
We conduct extensive experiments on a Spark cluster with four compute nodes (3.8 GHz CPUs, 32 GB RAM, 1 TB disk), utilizing HDFS [18] as the underlying file system and Parquet files [34] to store partitioned block data. To speed up data routing, a distributed acceleration framework Ray [35] is utilized to fully leverage the computational resources of each machine. The neural networks (prediction module) are trained on a Tesla P40 GPU with a 24 GB frame buffer.
Dataset and static workload. (1) TPC-H benchmark [15] (50 GB data) comprises 8 tables (62 columns in total) and 22 query templates. (2) TPC-DS [36] (62 GB data) is used to simulate the sales operations of a department store. Compared to TPC-H, it offers a more comprehensive schema, including 7 fact tables and 14 dimension tables (a total of 429 columns), as well as 99 query templates. (3) JOB benchmark [37] is a 13 GB real-world IMDB dataset containing 12 tables (134 columns in total) and 33 query templates. (4) ClickBench [38] (20 GB data) is derived from an actual traffic platform, featuring a large table (105 columns, 100M records) and 43 query templates. Here, query templates are used to generate an arbitrary number of static synthetic queries.
Similarity workload. Following PAW [9], we collect 4250 static queries as for TPC-H and JOB, and generate the mappings , named TPC-D and JOB-D, respectively. To be specific, we divide each tablespace into regions and extend each domain of with a random distance threshold , where , , and represents that falls within the th region. For the ordered queries, we set and . For each benchmark, we randomly shuffle -similar queries and split them into training/validation/test sets by 7:2:1.
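The domain-extension step for generating -similar follow-up queries can be sketched as follows. The symmetric per-column extension and the function name are illustrative assumptions:

```python
import random

def extend_query(domains, delta_max, rng=random):
    """Generate a similar follow-up query by extending each column domain
    [lo, hi] of a historical query with a random distance drawn up to the
    region's threshold delta_max."""
    out = {}
    for col, (lo, hi) in domains.items():
        d = rng.uniform(0.0, delta_max)   # random distance within threshold
        out[col] = (lo - d, hi + d)
    return out
```

Seeding the generator (e.g., `random.Random(0)`) makes the synthetic workload reproducible across the training/validation/test splits.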
Evaluation metrics. (1) Access Ratio indicates the ratio of accessed data to the entire table size. (2) Query Latency refers to the end-to-end response time for processing queries. (3) Model Execution Time denotes the time taken to generate logical partition structures, including query parsing, query prediction, and partition tree creation.
Baseline methods.
● QdTree [8] generates the candidate split set using only numeric predicate conditions during the creation of the partition tree. Its split policy consistently selects the predicate condition that maximizes data skipping benefit at each extension step.
● PAW [9] optimizes the division of small nodes in QdTree by merging multiple steps of predicate conditions into a single-step split condition.
● LB-Cost represents a theoretical lower bound cost in an ideal scenario where all accessed data per query is searched and directly written to separate block files.
● is a refined version of PAW [9]’s prediction module. It selects the optimal distance threshold from all possible values for extending domain boundaries in , aiming to minimize the MSE between and .
● is an abbreviation for the independent MLP or LSTM predictor component in LRP, aiming to minimize the access ratio when executing over -based partitions.
● LRP-E and LRP-S are two variants of LRP, each with a component removed to conduct an ablation study. They remove the column encoding and data redundancy modules from LRP, respectively.
8.2 Exp-1: Static scenario experiments
Evaluation on pure numeric tables. We first conduct experiments on pure numeric tables, i.e., by removing all non-numeric column data. Fig.7(a) reveals that LRP-S and LRP-E achieve over 8% and 15.7% reductions in table access ratio compared to PAW across all benchmarks, respectively. Meanwhile, Fig.8 provides detailed comparisons for representative tables in each benchmark. All cases follow the same performance ranking: , where LRP is equivalent to LRP-E since the encoding component is invalid when non-numeric column data is excluded from the raw tables. These results can be explained by three factors: (1) LRP-S outperforms PAW, demonstrating that our beam search policy can identify better split orders among numerous condition combinations; (2) LRP-E outperforms LRP-S, indicating that super nodes help resolve query access contention caused by row-granularity limitations when handling high-density queries on tables like D_wb_sales and D_store_sales; (3) The performance differences between the algorithms are stable and do not vary significantly across different datasets or numerical predicates.
Evaluation on raw tables. When experimenting on raw tables, Fig.7(b) reveals larger performance discrepancies among the algorithms compared to the numeric environments shown in Fig.7(a). Detailed comparisons for specific tables are provided in Fig.9. LRP-S and LRP reduce the data access ratio by 38.9% and 48.8%, respectively, compared to PAW. This reduction is especially pronounced in tables with a high proportion of predicates, such as H_part, D_item, and J_name (average ). This is because when more textual predicates (especially those involving costly join operations) are mixed with numeric predicates, LRP and LRP-S are still able to effectively capture the complete query access patterns. In contrast, QdTree and PAW perform worse due to their incomplete query information, rendering numeric predicate-based splits inefficient. This inefficiency is also reflected in their similar table scanning ratios in cases like H_part, D_customer, and J_name, where a large proportion of text predicates are present. Moreover, the result that LRP outperforms LRP-S indicates that LRP’s allowance for data redundancy among partitions further enhances performance on raw tables. These factors contribute to LRP exhibiting more significant reductions in scanning data compared to the other baselines.
Evaluation on model overhead. Regarding the raw tables, Fig.10(a) evaluates the model execution time for each algorithm and their average data routing time when deploying physical partitions. Despite LRP being 17.87 s slower than PAW in logical partition generation due to its additional data encoding and redundancy modules, their cumulative processing time ( s) is far lower than the data routing time ( h). Thus, this time gap can be considered negligible. Fig.10(b) displays the changes in the data redundancy ratio. We observe that SuperNode minimizes block addressing and scanning by replicating a small portion of data. Specifically, LRP yields an average scanning ratio reduction of 22.7% (i.e., ) over LRP-S, with only a 0.0034 data redundancy ratio. This is because queries with more textual predicates are more likely to access dispersed, smaller amounts of data, thereby promoting the creation of redundant partitions.
Evaluation on model adaptability. We also conduct model sensitivity tests with varying workload configurations. Fig.11(a) varies the number of TPC-H queries from 50 to 1000, increasing the query density and requiring finer partitions. LRP initially achieves a 6.7% performance improvement over PAW, eventually reaching 29%. Fig.11(b) explores the impact of maximum query range: an increasing queried range results in higher access ratios and a larger performance gap between LRP and PAW. However, at 0.3 or above, the quality of all layouts begins to deteriorate until reaching 1, at which point the quality of all layouts becomes consistent, with their queried data being stored in the same number of blocks. Fig.11(c) restricts the maximum number of columns accessed by each query. As shown, the more columns involved in the predicates, the fewer rows on average meet the conditions, resulting in a smaller proportion of data scanned. This does not affect predicate selection during partition tree construction, thus having minimal impact on algorithm performance. Fig.11(d) explores two extreme query distributions. Under uniform distribution, each algorithm effectively partitions rows to accommodate each query, with LRP performing best. When queries are densely distributed, some table areas cannot be effectively partitioned due to contention among queries, reducing the performance of all algorithms. However, LRP still maintains the lowest access ratio, benefiting from data replication.
8.3 Exp-2: -similarity scenario experiments
In addition to the provided and predictors, we establish two additional prediction baselines: N/A, representing no prediction, and OPT, referring to the predicted load satisfying .
Evaluation on prediction approaches. Fig.12 displays the -based layout performance for various prediction methods, along with the performance deviation (shaded areas) across single-table layouts. Regardless of whether the same partitioner (e.g., LRP) or distinct partitioners (i.e., QdTree, PAW, LRP, and LB-Cost) are used to create layouts based on the given predictions, our predictor always outperforms the best alternative, achieving performance improvements of and compared to in unordered and ordered loads, respectively. Additionally, shows narrower shaded boundaries, indicating better optimization for each sub-layout. Moreover, compared to ordered loads, proves less effective in unordered loads with fewer historical queries, while exhibits greater stability.
Case study. Fig.13(a)–Fig.13(c) and Fig.13(d)–Fig.13(f) present 3D visualizations of query prediction cases from TPC-D and JOB-D, respectively, across dimensions , , and . In Fig.13(a) and Fig.13(d), a close-up of a randomly selected query is shown, with the coordinate system adjusted for clarity. Blue and red boxes represent the generated by and , respectively, while the green boxes depict the actual . In Fig.13(a), although ’s aligns more closely with compared to , i.e., , achieves a smaller prediction loss () due to its value being 0. Intuitively, if the red box is used as the basis for constructing partition files, then for queries requiring scanning the green box, only one file needs to be accessed to complete the task. Similarly, in Fig.13(d), the of is 0, which is lower than the 4 of . This suggests that predicts domain values that are slightly larger than the real , making robust partitioning feasible.
Fig.13(b)–Fig.13(c) and Fig.13(e)–Fig.13(f) randomly select six and eight query cases, respectively, with their prediction metrics compared in Tab.3. In all cases, partitions based on ’s predictions consistently yield fewer scanned blocks () and lower data scan ratios (). This can be attributed to ’s smaller (), leading to a smaller value (), irrespective of the value ().
Evaluation on the exploratory queries. Fig.14 shows that LRP mitigates performance degradation when more exploratory queries are mixed into TPC-D and JOB-D test queries. As the random percentage increases from 0% to 85%, the access ratio gap between PAW and QdTree expands from 2.2 to 13.1 in TPC-D, and from 1.4 to 10.8 in JOB-D. Similarly, the access ratio gap between LRP and PAW expands from 1.2 to 9 in TPC-D and from 0.5 to 3.5 in JOB-D. This improvement is attributed to PAW/LRP’s use of data distribution-based median splits, which enhances their layout adaptability to exploratory queries compared to QdTree’s sole use of predicate splits. Moreover, LRP’s consideration of median splits for encoded non-numeric column data further boosts its adaptability.
8.4 Exp-3: Efficiency analysis in Spark cluster
There is a positive correlation between the access ratio and query latency. Reducing the data access ratio typically results in decreased query latency due to fewer accessed partition files, less scanned metadata and column data on disk, and less frequent merging of scan results.
To assess the quality of constructed layouts, we utilize Spark-SQL, averaging five executions for each indicator measurement. Tab.4 presents statistics such as averages, minimums, and percentiles of query latency. Our findings are as follows. (1) Static Scenarios: For both numeric and raw datasets, LRP consistently outperforms other baselines across various statistical indicators. The optimizations in the partition tree and the complete consideration of logical predicates make LRP achieve an average query latency reduction of 15.9% and 47.5% for the two types of datasets, respectively, compared to PAW. LRP exhibits more significant performance improvements in raw tables, similar to the trends observed in Fig.7(b). By effectively leveraging text-type predicates as partitioning features to partition data, LRP accelerates filter operations, especially in queries with a higher proportion of non-numeric columns. (2) Dynamic Scenarios: In TPC-D and JOB-D, LRP consistently achieves greater performance improvements compared to static scenarios, attributable to its robust layout supported by predictive networks with a novel loss function. This function establishes a positive correlation between prediction loss and layout robustness while minimizing prediction loss. Compared to PAW, LRP achieves query response times that are 2.49× faster () on TPC-D and 3.17× faster () on JOB-D.
9 Conclusion and future work
In this paper, we propose an end-to-end LRP system. LRP begins by designing a comprehensive data and query encoding scheme to extract valuable query access patterns across both numeric and non-numeric column data types. It then trains MLP/LSTM networks to minimize loss values by integrating partition semantic information, thereby predicting future query pattern shifts. Finally, it implements beam search-based tree-splitting and node redundancy policies to generate a partition tree, which can materialize a robust data layout tailored to predicted query patterns. Experimental results on Spark demonstrate that our method significantly outperforms existing solutions, achieving a 49.20% reduction in query latency for static queries and a 64.15% reduction in dynamic environments.
However, some aspects of LRP still require further refinement, which we plan to address in future work. (1) Join Optimization: When selecting tree-splitting conditions, we did not fully consider the cost of data shuffling, which often exceeds scanning costs. Therefore, predicate and median conditions related to Join columns should be weighted more heavily to reduce shuffling overhead. (2) Load Balancing: Although table data has been allocated to partition files, the distribution of these files across machines needs further optimization. Future work will focus on optimizing the physical placement of files based on access frequency and other factors to ensure load balancing and maximize the utilization of cluster resources.
[1]
Taylor R W, Sacca D, Wiederhold G. Database partitioning in a cluster of processors. ACM Transactions on Database Systems (TODS), 1985, 10( 1): 29–56
[2]
Copeland G, Alexander W, Boughter E, Keller T. Data placement in bubba. In: Proceedings of 1988 ACM SIGMOD International Conference on Management of Data. 1988, 99−108
[3]
Stöhr T, Märtens H, Rahm E. Multi-dimensional database allocation for parallel data warehouses. In: Proceedings of the 26th International Conference on Very Large Data Bases. 2000, 273−284
[4]
Bentley J L. Multidimensional binary search trees used for associative searching. Communications of the ACM, 1975, 18( 9): 509–517
[5]
Zhan C, Su M, Wei C, Peng X, Lin L, Wang S, Chen Z, Li F, Pan Y, Zheng F, Chai C. AnalyticDB: real-time OLAP database system at Alibaba Cloud. Proceedings of the VLDB Endowment, 2019, 12( 12): 2059–2070
[6]
Papadomanolakis S, Ailamaki A. AutoPart: automating schema design for large scientific databases using data partitioning. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management. 2004, 383−392
[7]
Sun L, Franklin M J, Krishnan S, Xin R S. Fine-grained partitioning for aggressive data skipping. In: Proceedings of 2014 ACM SIGMOD International Conference on Management of Data. 2014, 1115−1126
[8]
Yang Z, Chandramouli B, Wang C, Gehrke J, Li Y, Minhas U F, Larson P Å, Kossmann D, Acharya R. Qd-tree: learning data layouts for big data analytics. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 193−208
[9]
Li Z, Yiu M L, Chan T N. PAW: data partitioning meets workload variance. In: Proceedings of the 38th IEEE International Conference on Data Engineering. 2022, 123−135
[10]
Sun L, Franklin M J, Wang J, Wu E. Skipping-oriented partitioning for columnar layouts. Proceedings of the VLDB Endowment, 2016, 10( 4): 421–432
[11]
Li C, Markl V, Aly A M, Mahmood A R, Hassan M S, Aref W G, Ouzzani M, Elmeleegy H, Qadah T. AQWA: adaptive query workload aware partitioning of big spatial data. Proceedings of the VLDB Endowment, 2015, 8( 13): 2062–2073
[12]
Aly A M, Elmeleegy H, Qi Y, Aref W. Kangaroo: workload-aware processing of range data and range queries in hadoop. In: Proceedings of the 9th ACM International Conference on Web Search and Data Mining. 2016, 397−406
[13]
Lu Y, Shanbhag A, Jindal A, Madden S. AdaptDB: adaptive partitioning for distributed joins. Proceedings of the VLDB Endowment, 2017, 10( 5): 589–600
[14]
Ding J, Minhas U F, Chandramouli B, Wang C, Li Y, Li Y, Kossmann D, Gehrke J, Kraska T. Instance-optimized data layouts for cloud analytics workloads. In: Proceedings of 2021 International Conference on Management of Data. 2021, 418−431
[15]
TPC-H benchmark. See tpc.org/tpch/ website, 1999.
[16]
Rosenblatt F. Principles of neurodynamics: perceptrons and the theory of brain mechanisms. The American Journal of Psychology, 1963, 76( 4): 705–707
[17]
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9( 8): 1735–1780
[18]
Shvachko K, Kuang H, Radia S, Chansler R. The hadoop distributed file system. In: Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST). 2010, 1−10
[19]
Shanbhag A, Jindal A, Madden S, Quiane J, Elmore A J. A robust partitioning scheme for ad-hoc query workloads. In: Proceedings of 2017 Symposium on Cloud Computing. 2017, 229−241
[20]
Huang D, Liu Q, Cui Q, Fang Z, Ma X, Xu F, Shen L, Tang L, Zhou Y, Huang M, Wei W, Liu C, Zhang J, Li J, Wu X, Song L, Sun R, Yu S, Zhao L, Cameron N, Pei L, Tang X. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment, 2020, 13( 12): 3072–3084
[21]
ClickHouse: an open-source columnar database management system. See clickhouse.com/docs/en/observability/managing-data website, 2016
[22]
Dageville B, Cruanes T, Zukowski M, Antonov V, Avanes A, Bock J, Claybaugh J, Engovatov D, Hentschel M, Huang J S, Lee A W, Motivala A, Munir A Q, Pelley S, Povinec P, Rahn G, Triantafyllis S, Unterbrunner P. The snowflake elastic data warehouse. In: Proceedings of 2016 International Conference on Management of Data. 2016, 215−226
[23]
Moerkotte G. Small materialized aggregates: a light weight index structure for data warehousing. In: Proceedings of the 24th International Conference on Very Large Data Bases. 1998, 476−487
[24]
Graefe G. Fast loads and fast queries. In: Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery. 2009, 111−124
[25]
Kang D, Jiang R, Blanas S. Jigsaw: a data storage and query processing engine for irregular table partitioning. In: Proceedings of 2021 International Conference on Management of Data. 2021, 898−911
[26]
han A, Yan X, Tao S, Anerousis N. Workload characterization and pre diction in the cloud: a multiple time series approach. In: Proceedings of 2012 IEEE Network Operations and Management Symposium. 2012, 1287−1294
[27]
Pavlo A, Angulo G, Arulraj J, Lin H, Lin J, Ma L, Menon P, Mowry T C, Perron M, Quah I, Santurkar S, Tomasic A, Toor S, Van Aken D, Wang Z, Wu Y, Xian R, Zhang T. Self-driving database management systems. In: Proceedings of the 8th Biennial Conference on Innovative Data Systems Research. 2017, 1
[28]
Ma L, Van Aken D, Hefny A, Mezerhane G, Pavlo A, Gordon G J. Query-based workload forecasting for self-driving database management systems. In: Proceedings of 2018 International Conference on Management of Data. 2018, 631−645
[29]
Hilprecht B, Binnig C, Röhm U. Learning a partitioning advisor for cloud databases. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 143−157
[30]
Zhou X, Li G, Feng J, Liu L, Guo W. Grep: a graph learning based database partitioning system. Proceedings of the ACM on Management of Data, 2023, 1( 1): 94
[31]
Jindal A, Dittrich J. Relax and let the database do the partitioning online. In: Proceedings of the 5th International Workshop on Business Intelligence for the Real-Time Enterprise. 2011, 65−80
[32]
Wang J, Chai C, Liu J, Li G. Face: a normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment, 2021, 15( 1): 72–84
[33]
Hoerl A E, Kennard R W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 1970, 12( 1): 55–67
[34]
Bertino E, Atzeni P, Tan K L, Chen Y, Tay Y C, Melnik S, Gubarev A, Long J J, Romer G, Shivakumar S, Tolton M, Vassilakis T. Dremel: interactive analysis of web-scale datasets. Proceedings of the VLDB Endowment, 2010, 3( 1–2): 330–339
[35]
Ray: an open source framework to build and scale your ML and Python applications. See docs.ray.io/en/latest/ website, 2017
[36]
TPC-DS benchmark. See www.tpc.org/tpcds/ website, 2005
[37]
JOB benchmark. See developer.imdb.com/non-commercial-datasets/ website, 2016
[38]
ClickBench benchmark. See github.com/ClickHouse/ClickBench website, 2019