1. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
2. Information Department, Children's Hospital of Hebei Province, Shijiazhuang 050031, China
3. Hebei Key Laboratory of Computational Mathematics and Applications, Shijiazhuang 050024, China
gshzhang@hebtu.edu.cn
Show less
History+
Received
Accepted
Published
2023-10-15
Issue Date
Revised Date
2023-12-27
PDF
(532KB)
Abstract
A combinatorial batch code has strong practical motivation in the distributed storage and retrieval of data in a database. In this survey, we give a brief introduction to the combinatorial batch codes and some progress.
Dongdong JIA, Yuebo SHEN, Gengsheng ZHANG.
A survey of the study of combinatorial batch code.
Front. Math. China, 2023, 18(5): 301-312 DOI:10.3868/s140-DDD-023-0024-x
In 2004, Ishai et al. first proposed the concept of batch codes based on the actual background of data storage in [12]. Assuming that there is a large database containing item data, it is now necessary to allocate this item data to servers (different servers can have the same data, and this process can be regarded as encoding). After the data is allocated, when any item data in the item data is needed, it is hoped that each server can read up to item (for load balancing between servers) to recover this item data (this process can be regarded as decoding). The concept of batch code is obtained by abstracting the above problems: the batch code on character set encodes the string into group string (also known as server), and the total length of this group string is , so that for different index metrics , the information at each of the positions corresponding to in can be translated by reading up to messages from each server. In [12], Ishai et al. introduced some important applications of batch code and their connection with practical problems, described the relationship between this type of code and other types, and gave several methods to construct batch code. Among the many encoding and decoding modes of batch code, Ishai et al. gave the simplest coding and decoding mode, that is, it does not need to be encoded when stored, but is stored directly, so the decoding is to read out directly. They call it a replication-based batch code. Because the structure of this batch code has pure combination characteristics, in 2009, Paterson et al. called it combinatorial batch codes in [15], referred to as CBC for short. Since then, people have begun to study CBC from the perspective of pure combination. Here, the finely determined fork of CBC is transcribed as follows.
Definition 1.1 [15] An -CBC is a set system , where is a set of elements, is a collection of subsets of and . For each -subset , there exists a subset , where , such that
where the elements of are called points and the elements of are called intervals.
For the given parameter , the desired CBC should minimize the total storage amount in each server (i.e., the value of N), in order to save as much storage space as possible. CBC that meets this condition is called optimal, and this optimal value is denoted as . Therefore, finding this optimal value is the key to constructing the optimal CBC. Up to now, most of the research results are aimed at the situation of . Obviously, when , it is required to read up to 1 element from each server. At this time, CBC is synxed as -CBC, and the optimal value is s synxed as .
In addition to the above analysis and construction of the combinatorial batch codes from the perspective of saving storage space, we can also consider the optimality of the combinatorial batch codes from another perspective, that is, the so-called storage rate . The existing conclusions have shown that for fixed , when , the optimal storage rate of -CBC is close to . Therefore, the following questions become very meaningful: for fixed and , how to construct -CBC with the largest (at this time, is a function of ). Paterson et al. proposed a special type of CBC in [15], that is, each element of it is stored in servers, that is to say, the storage rate is . Such CBC is called consistent. The study of the optimal properties of this kind of CBC is the given parameter , find the maximum value of that is consistent -CBC. This maximum value is recorded as , and when , it is written as . Consistent CBC reaching this maximum value is called optimal.
In addition, Chen introduced a combinatorial batch codes called ECBC codes in his master’s thesis [11], that is, each the combinatorial batch codes with the same number of elements in the server, and some preliminary research results have been obtained.
In the past three years, we have conducted a preliminary study on the combinatorial batch codes. In this process, we have collected almost all the literature about the combinatorial batch codes so far, and wrote this article on the basis of careful study. For CBC and uniform CBC, we have some major research results are introduced and summarized, hoping to be helpful for subsequent research.
2 Determine the condition for the set system to become the combinatorial batch codes
Generally speaking, it is difficult to directly judge whether a set system meets the conditions of the combinatorial batch codes according to the definition. Therefore, people need to start from the definition and find some conditions to indirectly determine whether a set system is a combinatorial batch code with certain parameters. The research results in this regard mainly focus on two aspects: first, the association matrix of the set system (group mark row, point mark column); second, the dual set system.
Paterson et al. [15] gave a judgment that the matrix is the sufficient and necessary condition of the CBC's association marix: an matrix is the association matrix of -CBC if and only if for any column of , it contains a submatrix, and there is at least one transversal containing ones at this submatrix.
The dual set system of -CBC is denoted as , where . Buitas et al. [7] presented 4 equivalent conditions for determing whether a set system is , i.e., becoming a -CBC dual set system.
(1) is a system, that is, for every element subsystem , there exist of mutually different elements , such that holds for every .
(2) satisfies the k-Hall condition, that is, every subsystem of members of satisfies the inequality .
(3) For any , every element subset of contains at most groups from the system .
(4) For any , every element subset of intersects at least groups from the system .
The later research shows that the above conclusion has become a powerful tool to verify whether a set system is CBC.
For general , Buitas et al. [8] gave a similar conclusion.
For positive integers , if each subset , there are elements such that holds for every , and the multiple of each element of in is at most t, then the set system is a system. If for each subset , satisfying ,it is said that the set system meets the (k,t)-Hall condition. Therefore, the following three equivalent conditions can be used to determine that the set system is a dual set system of -CBC.
(1) is a -system.
(2) satisfies the -Hall condition.
(3) For any , every element subset of contains at most members of .
3 The construction and optimal value of the combinatorial batch codes
For any given parameter several , the problem of determining has not been completely solved. So, in some cases, it is impossible to judge whether the constructed CBC is optimal. There are roughly two ways to determine . Fisrt, the constructed CBC determines the upper bound of , and then determines that the upper bound is also the lower bound of through further discussion, so that the optimal CBC construction of the given parameter is solved together with the optimal value problem. Second, directly determine by discussing the combinatorial properties of the CBC with given parameters.
In 2009, Paterson et al. [15] first conducted a detailed study on the combinatorial batch codes when . For parameters , Paterson et al. gave the optimal value of CBC in a variety of cases.
(1) Two ordinary situations.
When , different points are stored separately in servers to get -CBC. Obviously, it is optimal. When , the optional points are stored in the servers, respectively, and the remaining points are stored once in each server, which means -CBC, this CBC is also optimal. If , then at least one server contains or fewer points. For points in this server, they cannot be read as required, see the following theorem.
Paterson et al. [15] used the -FS diagram to construct the association matrix of CBC to get a -CBC, of which,
where is a positive integer, . When or , there is a similar structure respectively. In this way, the upper bound of is determined, and then they prove that the upper bound is also at the lower bound of at , so there is
You can see that the values in this optimal construction are only 1 more than the value. When take other values, CBC constructed by this method is not sure if it is optimal. For the so-called “large” , Paterson et al. also studied in [15].
(3) The situation when .
For the case of , Paterson et al. construct the group in the dual set system of CBC as all subsets of element set repeat times, and the others are arbitrary element subsets. By k-Hall condition that each element subset can be repeated at most time, so that a . To prove this CBC is optimal, they constructed a association matrix. By calculating the number of 1 in association matrix, it is concluded that when the number of long groups in CBC's dual set system is 0, has a minimum lower bound, and this lower bound is exactly . This also shows that the constructed CBC is optimal.
Subsequently, people studied the situation when . For the “slightly smaller” . When , Ruj et al. [16] and Bhattacharya et al. [2] according to the optimal , the optimal CBC for this “slightly smaller” is constructed in different ways. Bujitas et al. [6] get the optimal value of CBC by direct calculation.
So far, when , the optimal value and optimal construction problems of CBC have been completely solved. Next, we need to study the optimal value and optimal structure of CBC when . This process doesn't seem to be going well.
In 2012, Bhattacharya et al. [2] gave an incomplete conclusion. On the basis of the construction of the optimal , they use binary constant recoding to construct the -CBC when ( is the maximum number of code words of binary constant code with a length of , weight of , and Hamming distance of 4), of which . It also proves that only part of this structure is the optimal CBC, the other part of the CBC can't determine the optimality, but only knows one of their upper bounds.
For smaller values, it is difficult to determine the optimal value of CBC in batches. Brualdi et al. proved the Theorem 3.2 with a new method in [3, 4], and connected CBC with the transverse matrix along this road, using the transverse matrix to obtain the optimal structure of CBC when . In [7], Buitas et al. use the pure combination method to discuss the CBC in detail after the nature of the dual set system, the optimal value of CBC when is obtained, and a construction method is given.
Naturally, according to this idea, the situation should be discussed when , but this discussion is more complicated. Chen et al. [10], Jia et al. [13] only gave the optimal value in the case of and , respectively.
The above two results mainly benefit from the quantitative relationship between and (here ), the k-Hall conditions that need to be met in CBC's dual set system when take smaller values (the method in [13] is not applicable to larger ) and the so-called monotonous nature of CBC' optimal value.
In addition to discussing the optimal CBC along the path of , Bujtas et al. gave another path.
For , is ordinary. For , can be obtained from Theorem 3.3. For , according to Theorem 3.3 and Theorem 3.4, it can be concluded that when , . when , . Therefore, it seems feasible to study along . However, the optimal value of CBC is much more complicated when . In [6], Bujtás et al. analyzed the type of CBC* (k) system (called the type of the set system is , if there is for each , the following conclusion is reached.
Lemma 3.1 [6] If , there is an optimal , for eachoptimal , , its type isor .
In particular, when , Bujtás et al. [6] made some changes to the structure of so that the number of singletons, pairs and triplets group of is the same as , respectively, and there is no pairs group in . The elements in each singletons set belong to a pairs group at most, which simplifies the structure of , and does not change the size of the total storage amount of . Similarly, we can require the optimal to have such a structure. Bujtás et al. proved that if there is a optimal with type , then there is an optimal with type or .
Lemma 3.2 [6] For every integers , , thereexists an optimalof type , .
Obviously, when , it can be seen from the 4-Hall condition that any different two element subsets can be used as a group to get a . Therefore, when , . According to Lemma 3.2, there is an optimal of type . At this time, it is only necessary to determine the maximum number of single point sets in the dual set system under the condition of 4-Hall.
The method of calculating by Bujtás et al. is very unique, and the 4-Hall condition is relatively simple. Calculate in this way obviously doesn't work, because the 5-Hall condition is more complicated than the 4-Hall condition, so you need to get the optimal value of CBC when also requires a new calculation method applicable to 5-Hall conditions.
In addition to some “batch” CBC optimal values identified above, there are also some sporadic results.
Bhattacharya et al. [2] gave a lower bound of the CBC optimal value when .
Theorem 3.12 [2] Letis the smallest integer that meets the following conditions,
Then there is
It can be seen that the optimal value in Theorem 3.4 and Theorem 3.5 is equal to the lower bound in (1), and the value at the right end of the inequality in Theorem 3.5 is only different from the lower bound in (1). Silberstein et al. in [17] are constructed by decomposable cross-sectional design ( is the prime powers). They used the above theorem to calculate in CBC, is equal to the lower bound with (1), so there is . Some people once guessed that the lower bound in (1) was a infimum, but there have been counterexamples to show that the lower bound is not the exact bound. Liu Xian et al. constructed a category through in [14], but its optimality cannot be judged by the above theorem.
Up to a few days ago, almost all people's sunlight was focused on the situation of . For , only Rui et al. [16] and Buitás et al. [5, 8] gave some conclusions about the optimal value of CBC.
There are not many precise conclusions about the optimal value that needs to be determined in the uniform combination batch code, and the optimal uniform combination batch code as a general -CBC needs to be further studied. Paterson et al. gave an upper bound of in [15].
Paterson et al. [15] constructed some uniform CBC, particularly the points stored (i.e., the value of n) is equal to the upper bound of and of Theorem 4.1 when uniform of and of . That is to say, these two uniform CBC is optimal.
From the above conclusion, it can be seen that when , the upper bound in (2) is the supremum. When , the upper bound of can only be obtained by Theorem 4.1 is . In 2014, Balachadran and Bhattacharya improved the upper bound of (2) in [1], and it is concluded that if , then there is . After that, Bujtás et al. [9] further improved and found a better upper bound when , that is, .
Paterson et al. [15] used the probability method to discuss a lower bound of : for the definite integer , there is a uniform -CBC and , where is the normal number that only depends on , that is to say, . Balachandran et al. [1] have reached this conclusion that the lower bound of the (i.e., ) has been improved, and the following theorem is obtained.
Silberstein et al. [17] constructed the uniform -CBC ( is the prime powers), and constructed by a decomposable transverse design -CBC, from the upper bound of (2), it can be seen that these two uniform CBC are optimal. Silberstein et al. [17] constructed uniform -CBC and uniform -CBC according to the transverse design TD. From the conclusion in Theorem 4.5, it can be seen that these two uniform CBC is optimal. When , Bhattacharya et al. constructed -CBC by the original constant code in [2], where .
When , there are few studies on the optimal value of the uniform CBC, and only Ruj et al. gave in [16] some of the following conclusions.
Balachandran N, Bhattacharya S. On an extremal hypergraph problem related to combinatorial batch codes. Discrete Appl Math2014; 162: 373–380
[2]
Bhattacharya S, Ruj S, Roy B. Combinatorial batch codes: a lower bound and optimal constructions. Adv Math Comm2012; 6(2): 165–174
[3]
Brualdi R A, Kiernan K P, Meyer S A, Schroeder M W. Combinatorial batch codes and transversal matroids. Adv Math Comm2010; 4: 419–431
[4]
Brualdi R A, Kiernan K P, Meyer S A, Schroeder M W. Erratum to “Combinatorial batch codes and transversal matroids”. Adv In Math Comm2010; 4(3): 597
[5]
Bujtás C, Tuza Z. Combinatorial batch codes: extremal problems under Hall-type conditions. Electron Notes Discrete Math2011; 38: 201–206
[6]
Bujtás C, Tuza Z. Optimal batch codes: many items or low retrieval requirement. Adv Math Comm2011; 5(3): 529–541
[7]
Bujtás C, Tuza Z. Optimal combinatorial batch codes derived from dual systems. Miskolc Math Notes2011; 12(1): 11–23
[8]
Bujtás C, Tuza Z. Relaxations of Hall’s condition: optimal batch codes with multiple queries. Appl Anal Discrete Math2012; 6(1): 72–81
[9]
Bujtás C, Tuza Z. Turán numbers and batch codes. Discrete Appl Math2015; 186: 45–55
[10]
Chen J F, Zhang S M, Zhang G S. Optimal combinatorial batch code: monotonicity, lower and upper bounds. Sci Sin Math2015; 45(3): 311–320
[11]
ChenS B. A New Combinatorial Batch Code. Master Thesis. Beijing: Beijing Jiaotong University, 2010 (in Chinese)
[12]
IshaiYKushilevitzEOstrovskyRSahaiA. Batch codes and their applications, In: STOC’04 (Proceedings of the 36th Annual ACM Symposium on Theory of Computing). New York, 2004, 262–271
[13]
Jia D D, Zhang G S, Yuan L D. A class of optimal combinatorial batch code. Acta Math Sin, Chin Ser2016; 59(2): 267–278
[14]
Liu X, Zhang S M, Zhang G S. Combinatorial batch codes based on RTD (q−2, q). Adv Math (China)2016; 45(5): 700–710
[15]
Patterson M B, Stinson D R, Wei R. Combinatorial batch codes. Adv Math Comm2009; 3(1): 13–27
[16]
RujSRoyB. More on combinatorial batch codes. arXiv: 0809.3357v1
[17]
Silberstein N, Gál A. Optimal combinatorial batch codes based on block designs. Des Codes and Cryptogr2016; 78: 409–424
RIGHTS & PERMISSIONS
Higher Education Press 2023
AI Summary 中Eng×
Note: Please be aware that the following content is generated by artificial intelligence. This website is not responsible for any consequences arising from the use of this content.