1 Introduction
Submodular maximization has become one of the central problems in optimization and combinatorics, with submodular functions embodying the essence of diminishing returns. The defining property of a submodular function is that the marginal value of an item decreases as more items are added to the set, which makes it a particularly intriguing function to maximize. The submodular maximization problem arises in a variety of applications, including influence maximization [1], machine learning [2,3], sensor placement [4,5], feature selection [6,7], and information gathering [8].
Due to its remarkable significance, submodular maximization has been studied for over forty years, especially in the offline model. For the monotone case, no polynomial-time algorithm can achieve a ratio better than $1-1/e$ under a cardinality constraint [9,10], a tight approximation ratio met by the natural greedy algorithm [11]. When combined with enumeration techniques, the greedy algorithm also achieves a $1-1/e$ approximation ratio for the knapsack constraint [12,13]. The continuous greedy algorithm [14], proposed by Vondrák in 2008, generalizes to various constraints with down-closed properties, including matroid and $d$-knapsack constraints, but necessitates randomization due to its sampling process [15,16]. Recently, Buchbinder and Feldman [17] proposed a deterministic algorithm with a comparable approximation ratio. For the non-monotone case, where randomization is a standard tool, the continuous greedy technique facilitates a 0.401 approximation [18], trailing behind the 0.491 hardness limit under a cardinality constraint but superior to the best deterministic ratios of $1/e$ for the cardinality constraint and $1/4$ for the knapsack constraint [19,20].
However, as the digital era progresses, the real challenge arises when we must deal with massive, constantly evolving datasets. This is where streaming algorithms come into play. Traditional optimization algorithms that work on static, small-scale datasets fall short on large streaming data, which arrives continuously and cannot be stored or processed in its entirety. Streaming algorithms, in contrast, process massive datasets in real time by seeing each piece of data only once or a few times, with only a fraction of the entire data available at any point. Badanidiyuru et al. [21] proposed a deterministic $(1/2-\varepsilon)$-approximation algorithm for monotone submodular maximization with a cardinality constraint in the streaming model, using $O(k\log k/\varepsilon)$ memory and sublinear time. Feldman et al. [22] showed that, even under the cardinality constraint in the streaming model, the approximation ratio for this problem cannot be better than 1/2, regardless of whether the objective function is monotone. For the non-monotone case, Alaluf et al. [23] proposed a 1-pass semi-streaming algorithm framework whose approximation ratio depends on the offline algorithm it invokes as a subroutine. Using the offline continuous greedy algorithm [18], it yields a randomized 0.286-approximation algorithm; using the deterministic algorithm of [19], it yields a deterministic 0.268-approximation algorithm. This illustrates that streaming algorithms are harder to design than offline ones, since only part of the data, rather than the whole picture, is available at any time.
Regarding the knapsack constraint, there are several related works for monotone objective functions. Huang et al. [24,25] proposed a 1-pass streaming algorithm whose approximation ratio was later improved to 2/5 in their subsequent work. Huang and Kakimura [26] proposed a multi-pass streaming algorithm that achieves a $(1/2-\varepsilon)$ approximation. Meanwhile, Yaroslavtsev et al. [27] proposed another multi-pass streaming algorithm that achieves the same approximation ratio with the same number of passes, but with less memory usage. For the $d$-knapsack constraint, deterministic streaming algorithms for monotone submodular maximization have been proposed in [28,29]. When the function is non-monotone, to the best of our knowledge, there is currently no streaming algorithm for either of these two constraints.
The non-monotone submodular maximization problem, unlike its monotone counterpart, does not assume that adding elements to a set necessarily increases the value of the function. Many real-world applications require handling non-monotone objectives due to the complex interdependencies and trade-offs inherent in real systems. For example, in sensor placement, adding more sensors does not always lead to better coverage because of potential interference, and in document summarization, adding sentences to a summary might make it less coherent or relevant. In today’s era, data is generated on an unprecedented scale. This vast influx of data necessitates algorithms capable of processing information in a streaming fashion, where data is observed sequentially and decisions must be made on the fly. Deterministic streaming algorithms address this need by enabling scalable and efficient processing of massive datasets without storing all elements in memory. In summary, the motivation for designing deterministic streaming algorithms for non-monotone submodular maximization lies in addressing the challenges presented by modern data-driven applications.
1.1 Our contribution
In this paper, we present several improved deterministic streaming algorithms for the submodular maximization problem subject to different constraints, with a focus on non-monotone objective functions. To address this setting, we propose a streaming algorithm framework based on an offline algorithm [30], and we modify this framework into different versions of the streaming algorithms that are customized to different constraints.
We propose a 1-pass streaming algorithm for the d-knapsack constraint, which immediately yields a 1-pass streaming algorithm for the knapsack constraint. Additionally, we propose a multi-pass streaming algorithm for the knapsack constraint that achieves an improved approximation ratio. Our main results are as follows:
● For the d-knapsack constraint, we propose a deterministic streaming algorithm that makes one pass over the data set, stores a bounded number of elements, and makes a bounded number of queries per element, where the key parameter is the maximum number of elements the d-knapsack can store. When d = 1, the d-knapsack constraint reduces to the knapsack constraint; as a special case, we therefore obtain a 1-pass streaming algorithm for the knapsack constraint.
● For the knapsack constraint, we propose a multi-pass streaming algorithm with an improved approximation ratio, at the cost of additional passes over the data, while still storing a bounded number of elements and making a bounded number of queries per element.
As far as we know, there is currently no streaming algorithm subject to the d-knapsack constraint when the objective function is non-monotone, even when d = 1. We have developed the first 1-pass streaming algorithm to solve this problem. Moreover, we have introduced another deterministic multi-pass streaming algorithm that can enhance the approximation ratio for the knapsack constraint.
We make a more detailed comparison between our results and previous ones in Table 1.
1.2 Related work
To better illustrate the improvement of our results, this subsection provides a list of results in the literature on non-monotone submodular maximization under a d-knapsack constraint and a knapsack constraint.
For the knapsack constraint, a 1-pass algorithm that considers the maximum-value element together with the greedy solution achieves a constant-factor approximation for monotone submodular maximization [24]. Huang and Kakimura [25] improved this approximation ratio to 2/5 through a more detailed case analysis. However, the current 1-pass results do not match the tight approximation ratio known for the cardinality constraint. Huang and Kakimura [26] proposed a multi-pass streaming algorithm that achieves a $(1/2-\varepsilon)$ approximation for the monotone problem. Meanwhile, Yaroslavtsev et al. [27] proposed another multi-pass streaming algorithm with the same approximation ratio using a thresholding technique. For the non-monotone submodular maximization problem subject to the knapsack constraint, to the best of our knowledge, there is no streaming algorithm in the literature.
For the $d$-knapsack constraint, when $d$ is constant or the width of the constraints is large, a constant approximation is possible [31–33]. Currently, the best-known algorithm is again based on the continuous greedy algorithm and has an approximation ratio of 0.401 [18]. Sun et al. [34] proposed a deterministic algorithm for the offline setting when the width of the knapsack is large enough. To the best of our knowledge, there is no streaming algorithm for the non-monotone submodular maximization problem in the literature. For the monotone case, Yu et al. [29] proposed a deterministic streaming algorithm for the $d$-knapsack constraint with low memory cost and query time per element.
1.3 Organization
In Section 2, we formally introduce the problem of non-monotone submodular maximization under the cardinality constraint, the d-knapsack constraint, and the knapsack constraint. In Section 3, as a warm-up, we propose a deterministic 1-pass streaming algorithm for the cardinality constraint. In Section 4, we propose a deterministic 1-pass streaming algorithm for the d-knapsack constraint, whose memory depends on the maximum number of elements that the knapsack can store. In Section 5, we propose a deterministic multi-pass algorithm for the knapsack constraint. In Section 6, we conclude the paper and list some future directions.
2 Preliminaries
In this section, we state the problems studied in this paper.
Definition 1 (Submodular function). Given a finite ground set $N$ of $n$ elements, a set function $f: 2^N \to \mathbb{R}$ is submodular if for all $S, T \subseteq N$,
$$f(S) + f(T) \ge f(S \cup T) + f(S \cap T).$$
Equivalently, $f$ is submodular if for all $S \subseteq T \subseteq N$ and $u \in N \setminus T$,
$$f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T).$$
For convenience, we use $f(u)$ to denote $f(\{u\})$, $S + u$ to denote $S \cup \{u\}$, $f(u \mid S)$ to denote the marginal value of $u$ with respect to $S$, i.e., $f(S + u) - f(S)$, and $f(T \mid S)$ to denote the marginal value $f(S \cup T) - f(S)$. The function $f$ is non-negative if $f(S) \ge 0$ for all $S \subseteq N$. The function $f$ is monotone if $f(S) \le f(T)$ for all $S \subseteq T \subseteq N$.
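As an illustrative aside (not from the paper), the diminishing-returns property of Definition 1 can be checked on a toy coverage function; the ground set `{'a', 'b', 'c'}` and the covered points below are hypothetical.

```python
# Toy coverage instance (hypothetical): f(S) = number of points covered
# by the sets chosen in S. Coverage functions are classic examples of
# non-negative monotone submodular functions.
COVERAGE = {
    'a': {1, 2, 3},
    'b': {3, 4},
    'c': {4, 5, 6},
}

def f(S):
    """Coverage value of a collection S of set names."""
    covered = set()
    for name in S:
        covered |= COVERAGE[name]
    return len(covered)

def marginal(u, S):
    """Marginal value f(S + u) - f(S)."""
    return f(set(S) | {u}) - f(S)

# Diminishing returns: 'b' helps less once 'a' is chosen, because the
# point 3 is already covered.
assert marginal('b', {'a'}) <= marginal('b', set())
```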
Definition 2 (Cardinality). Given a finite ground set $N$ and an integer $k$, for a set $S \subseteq N$, the cardinality constraint can be defined as $|S| \le k$.
Definition 3 (Knapsack). Given a finite ground set $N$, assume there is a budget $b > 0$, and each element $u \in N$ is associated with a cost $c(u) > 0$. For a set $S \subseteq N$, its cost is $c(S) = \sum_{u \in S} c(u)$. We say $S$ is feasible if $c(S) \le b$. The knapsack constraint can be written as $c(S) \le b$.
Without loss of generality, we can simply assume that $c(u) \le b$ for all $u \in N$, since any element of larger cost can never be part of a feasible solution.
If $c(u) = 1$ for all $u \in N$ and $b = k$, the knapsack constraint reduces to the cardinality constraint.
Definition 4 (d-Knapsack). Given a finite ground set $N$, a non-negative matrix $C \in \mathbb{R}_{\ge 0}^{d \times n}$, and a vector $b \in \mathbb{R}_{> 0}^{d}$, for a set $S \subseteq N$, the $d$-knapsack constraint can be written as $C x_S \le b$, where $x_S$ stands for the characteristic vector of the set $S$.
Without loss of generality, for every $i \in [d]$ and $u \in N$, we assume that the cost of $u$ in dimension $i$ is at most $b_i$. For the sake of simplicity, we can standardize the constrained problem: for each $i \in [d]$, we divide all costs in dimension $i$ and the capacity $b_i$ by $b_i$, so that the $d$-knapsack has capacity 1 in every dimension. The standardized problem has the same optimal solution. In the remainder of the paper, we will only consider the standardized version of the submodular maximization problem subject to a $d$-knapsack constraint.
We will also use, as a parameter, the maximum number of elements the knapsack can store.
When d = 1, the d-knapsack constraint reduces to the knapsack constraint.
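The standardization step above can be sketched in a few lines; the cost matrix and capacities below are a hypothetical instance, not data from the paper.

```python
# Standardize a d-knapsack instance: divide each row of costs by its
# capacity b_i, so every dimension has capacity exactly 1. The instance
# (2 dimensions, 3 elements) is a made-up example.
def standardize(C, b):
    """C: list of d rows of n costs; b: list of d capacities (> 0)."""
    return [[c_ij / b[i] for c_ij in row] for i, row in enumerate(C)]

C = [[2.0, 1.0, 3.0],   # dimension 1, capacity 4
     [1.0, 2.0, 2.0]]   # dimension 2, capacity 2
b = [4.0, 2.0]
C_std = standardize(C, b)

# A set S is feasible iff sum of its normalized costs is <= 1 per row.
def feasible(S, C_std):
    return all(sum(row[j] for j in S) <= 1.0 for row in C_std)

assert feasible({0, 1}, C_std) is False  # dim 2: 0.5 + 1.0 > 1
assert feasible({0}, C_std) is True
```

Feasibility is unchanged by the rescaling, since each row inequality is divided by the same positive number.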
Definition 5 (Constrained submodular maximization). The constrained submodular maximization problem has the form
$$\max \{ f(S) : S \subseteq N, \ S \text{ is feasible} \}.$$
In this paper, the constraint is assumed to be the cardinality constraint or the d-knapsack constraint, respectively. The objective function f is assumed to be non-negative and submodular.
Besides, $f$ is accessed by a value oracle that returns $f(S)$ when $S$ is queried. As query access can be costly, the number of queries is usually considered a performance metric. Space complexity, on the other hand, refers to the amount of memory required to process and make decisions on the data stream. If the length of the stream is $n$ and the constraint parameter is $k$, the algorithm is generally constrained to use memory proportional to $k$, or to the logarithm of $k$, but independent of the stream length $n$. In most cases, the algorithm can make only a small constant number of passes over the stream, sometimes just one.
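The value-oracle model described above can be made concrete with a small wrapper that counts queries; the class name and the toy function are our own illustration, not part of the paper.

```python
class CountingOracle:
    """Wrap a set function to count value-oracle queries, the usual
    query-complexity metric for streaming submodular algorithms.
    An illustrative sketch, not from the paper."""
    def __init__(self, f):
        self._f = f
        self.queries = 0

    def __call__(self, S):
        self.queries += 1
        return self._f(frozenset(S))

oracle = CountingOracle(lambda S: len(S))  # toy modular function
oracle({1, 2})
oracle({1, 2, 3})
assert oracle.queries == 2
```

An algorithm's per-element query cost can then be measured by dividing `oracle.queries` by the stream length.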
3 Warm-up: streaming algorithm for cardinality constraint
In this section, we use a cardinality-constrained streaming algorithm as a warm-up to demonstrate the basic version of our algorithmic framework. Our framework yields streaming algorithms for the non-monotone submodular maximization problem and can be applied to many constraints. To illustrate the framework, we present a 1-pass streaming algorithm for the cardinality constraint in this section. Although its approximation guarantee is slightly worse than the state of the art under the cardinality constraint [23], it adapts well to more complex constraints. In the following section, we use this framework to design a 1-pass streaming algorithm for the d-knapsack constraint. Furthermore, we employ it to create a multi-pass streaming algorithm for the knapsack constraint.
Our algorithm is obtained by adapting the technique from [30] to the streaming model, using the threshold method of the Sieve-Streaming algorithm [21]. First, we assume that we have some knowledge of OPT; we then remove this assumption by estimating OPT based on the maximum value of a single element. Finally, we remove all assumptions and propose a 1-pass streaming algorithm subject to a cardinality constraint. Our algorithmic framework is simpler than the state-of-the-art algorithm proposed in [23].
3.1 Streaming algorithm knowing OPT
Suppose that we have a value $v$ that approximates OPT up to a small multiplicative factor. We construct the algorithm to choose elements by a threshold derived from the value of $v$. The resulting algorithm is depicted as Algorithm 1. It first invokes the thresholding procedure to obtain one solution. Then it runs the thresholding procedure again over the remaining elements to obtain another solution. These two solutions alone cannot guarantee any constant approximation ratio. To remedy this, the algorithm produces a third solution by solving the unconstrained submodular maximization problem over the elements already collected.
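The two-candidate structure just described can be sketched as follows. The threshold `tau` and the choice to run the unconstrained step on the first candidate set are illustrative placeholders, since the paper's exact parameters are elided in this extraction; the unconstrained subroutine shown is the simple deterministic 1/3-approximation double greedy, not the stronger deterministic 1/2-approximation of [19] that the paper invokes.

```python
def threshold_pass(f, stream, k, tau, excluded=frozenset()):
    """Keep elements whose marginal value meets tau, up to k elements.
    (Shown as a separate pass for clarity; Algorithm 1 interleaves both
    candidate sets within a single pass over the stream.)"""
    S = []
    for e in stream:
        if e in excluded or len(S) >= k:
            continue
        if f(set(S) | {e}) - f(set(S)) >= tau:
            S.append(e)
    return set(S)

def double_greedy(f, ground):
    """Simple deterministic double greedy for unconstrained submodular
    maximization (a 1/3-approximation)."""
    X, Y = set(), set(ground)
    for u in list(ground):
        gain_add = f(X | {u}) - f(X)        # gain of adding u to X
        gain_del = f(Y - {u}) - f(Y)        # gain of removing u from Y
        if gain_add >= gain_del:
            X.add(u)
        else:
            Y.discard(u)
    return X  # X == Y at the end

def two_candidate_framework(f, stream, k, tau):
    S1 = threshold_pass(f, stream, k, tau)
    S2 = threshold_pass(f, stream, k, tau, excluded=S1)
    S3 = double_greedy(f, S1)   # unconstrained step on the first candidate
    return max((S1, S2, S3), key=f)

# Toy run on a modular function (illustrative only).
result = two_candidate_framework(lambda S: len(S), [1, 2, 3, 4], k=2, tau=1)
```

On a monotone toy function the third solution coincides with the first candidate; the unconstrained step matters only for genuinely non-monotone objectives.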
Theorem 1 Assuming that the input $v$ is a sufficiently good estimate of OPT, Algorithm 1 satisfies the following properties, where the unconstrained submodular maximization is solved by a deterministic 1/2-approximation algorithm proposed by [19].
● It outputs a set $S$ with $|S| \le k$ whose value is within a constant factor of OPT.
● It does one pass over the data set, stores $O(k)$ elements, and has $O(1)$ query complexity per element.
Proof We prove this theorem by distinguishing two cases according to the sizes of the two candidate sets at the end of the stream. Let set $O$ be the optimal set, and consider each candidate set at the end of its respective iteration.
Case 1 At the end of the algorithm, at least one candidate set has $k$ elements. Without loss of generality, assume it is the first candidate set, and consider its elements in the order in which they were added. Each added element passed the threshold test, so summing the marginal gains yields the claimed bound.
Case 2 Neither candidate set is full at the end of the algorithm. Every element of $O$ missing from both candidate sets was rejected (in line 5) in some iteration, so its marginal value with respect to the corresponding candidate set at that moment was below the threshold; this implies the claimed bound for the first candidate set, and similarly for the second.
The last inequality is due to the submodularity and non-negativity of $f$. We use a 1/2-approximation deterministic unconstrained submodular maximization algorithm [19] to solve the remaining part. If the first candidate set already captures a large fraction of the optimum, the bound follows directly; otherwise, the unconstrained solution provides the required value, where the last inequality is by the assumption on $v$. Then we have the claimed bound in any case. □
3.2 Streaming algorithm knowing the max value
We use the maximum-value element to estimate OPT. By submodularity, OPT is at most $k$ times the maximum singleton value. Consider a geometrically spaced set of guesses between the maximum singleton value and $k$ times that value. At least one of these guesses is a sufficiently good estimate of OPT. We can run Algorithm 1 once for each guess in parallel, producing one candidate solution per guess. As the final output, we return the best solution obtained.
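The guessing step can be sketched as a geometric grid of thresholds; the endpoints $[m, km]$ and the $(1+\varepsilon)$ spacing are the standard Sieve-Streaming-style choice and stand in for the exact grid of Algorithm 2, which is elided in this extraction.

```python
# Build a geometric grid of guesses for OPT from the maximum singleton
# value m: by submodularity, OPT lies in [m, k*m], so some grid point is
# within a (1 + eps) factor of OPT. Endpoints are illustrative.
def threshold_grid(m, k, eps):
    lo, hi = m, k * m
    grid, v = [], lo
    while v <= hi * (1 + eps):
        grid.append(v)
        v *= (1 + eps)
    return grid

grid = threshold_grid(m=1.0, k=8, eps=0.5)
# The grid brackets the whole range [m, k*m].
assert grid[0] <= 1.0 and grid[-1] >= 8.0
```

The grid has $O(\log_{1+\varepsilon} k) = O((\log k)/\varepsilon)$ entries, which is what drives the memory overhead of running one copy of Algorithm 1 per guess.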
Theorem 2 Assuming that the input $m$ equals the maximum singleton value, Algorithm 2 satisfies the following properties, where the unconstrained submodular maximization is solved by a deterministic 1/2-approximation algorithm proposed by [19].
● It outputs a set $S$ with $|S| \le k$ whose value is within a constant factor of OPT.
● It does one pass over the data set, stores $O((k \log k)/\varepsilon)$ elements, and has $O((\log k)/\varepsilon)$ query complexity per element.
Proof Because at least one of the guesses is a sufficiently good estimate of OPT, the approximation ratio of Algorithm 2 is the same as that of Algorithm 1, with the same proof. We only analyze the memory and query complexity of Algorithm 2. The guess set contains $O((\log k)/\varepsilon)$ values, and we keep track of two candidate sets of size at most $k$ for each, bounding the memory by $O((k \log k)/\varepsilon)$. Moreover, each arriving element triggers a constant number of queries per guess, i.e., $O((\log k)/\varepsilon)$ queries per element. □
3.3 1-pass streaming algorithm for cardinality constraint
In order to find the maximum-value element during the stream, we let the guess set grow dynamically with the running maximum singleton value. It can be seen that when a threshold $v$ is instantiated from the set $Q$, every element with a large marginal value relative to $v$ appears on or after the moment $v$ is instantiated.
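Maintaining the guess set dynamically can be sketched as follows; the window $[m, km]$ and the per-guess state are illustrative simplifications (Algorithm 3 additionally widens the window by a factor related to $6k$, a refinement omitted here).

```python
import math

class LazyGuesses:
    """Maintain geometric guesses v = (1 + eps)^i whose exponents lie in
    a window tracking the running max singleton value m. Per-guess state
    is created lazily on first entry and dropped when a guess leaves the
    window. An illustrative sketch, not the paper's exact Algorithm 3."""
    def __init__(self, eps, k):
        self.eps, self.k = eps, k
        self.m = 0.0
        self.instances = {}   # exponent i -> per-guess candidate sets

    def update(self, singleton_value):
        self.m = max(self.m, singleton_value)
        if self.m <= 0:
            return
        base = 1 + self.eps
        lo = math.floor(math.log(self.m, base))
        hi = math.ceil(math.log(self.k * self.m, base))
        # drop stale guesses, then instantiate newly active ones
        self.instances = {i: s for i, s in self.instances.items()
                          if lo <= i <= hi}
        for i in range(lo, hi + 1):
            self.instances.setdefault(i, {'S1': set(), 'S2': set()})

g = LazyGuesses(eps=1.0, k=4)
g.update(1.0)
assert len(g.instances) >= 1
```

A guess instantiated late never misses an important element, because every element with a large marginal value relative to that guess arrives only after the running maximum has grown enough to instantiate it.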
Theorem 3 Algorithm 3 satisfies the following properties, where the unconstrained submodular maximization is solved by a deterministic 1/2-approximation algorithm proposed by [19].
● It outputs a set $S$ with $|S| \le k$ whose value is within a constant factor of OPT.
● It does one pass over the data set, stores $O((k \log k)/\varepsilon)$ elements, and has $O((\log k)/\varepsilon)$ query complexity per element.
Proof For a specific threshold, we maintain two candidate sets; thus, for a set of threshold ranges, we maintain two families of candidate sets. When a threshold $v$ is instantiated from the set $Q$, every element with a large marginal value relative to $v$ appears on or after the moment $v$ is instantiated. To ensure this, we expand the range of the thresholds in Algorithm 3 by a factor of $6k$. Thresholds below this range can be discarded directly, because marginal gains at that scale cannot affect the threshold corresponding to the optimal solution. The remaining proof is similar to that of Theorem 2. □
4 Streaming algorithm for d-knapsack constraint
In this section, we design a 1-pass streaming algorithm for the $d$-knapsack constraint, whose framework is consistent with the one shown in the warm-up section. When dealing with $d$-knapsack constraints, we face the difficulty that the enumeration technique used in the offline setting to handle insufficient capacity is unavailable. To address this issue, we use a technique from [29] for the monotone case, which takes large single elements into account. Additionally, to handle the lack of monotonicity, we use our streaming algorithm framework, which maintains two candidate sets. We again begin by assuming that we know the value of OPT, then remove this assumption by estimating OPT based on the maximum marginal value per unit cost over all single elements, and finally remove all assumptions to obtain a 1-pass streaming algorithm subject to the $d$-knapsack constraint.
4.1 Streaming algorithm knowing OPT
Suppose we have a value $v$ that estimates OPT within a constant factor. We then develop the algorithm to choose elements by density thresholds derived from the value of $v$.
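A hedged sketch of this density-threshold idea, assuming the standardized instance in which every dimension has capacity 1: an element's gain is charged against its largest normalized cost. The acceptance rule and `tau` are illustrative only; the paper's exact threshold and the big-element test of Algorithm 4 are elided in this extraction.

```python
def density_pass(f, stream, costs, tau, excluded=frozenset()):
    """One pass accepting elements whose marginal value per unit of
    largest normalized cost meets tau. costs[e] is a vector of d
    normalized costs in [0, 1]; every dimension has capacity 1."""
    S, load = set(), None
    for e in stream:
        if e in excluded:
            continue
        c = costs[e]
        if load is None:
            load = [0.0] * len(c)
        if any(load[i] + c[i] > 1.0 for i in range(len(c))):
            continue  # element does not fit in some dimension
        gain = f(S | {e}) - f(S)
        if gain >= tau * max(c):   # illustrative density rule
            S.add(e)
            load = [load[i] + c[i] for i in range(len(c))]
    return S

# Toy instance: unit-value elements with 2-dimensional costs.
chosen = density_pass(lambda S: len(S), ['e1', 'e2', 'e3'],
                      {'e1': [0.5, 0.5], 'e2': [0.6, 0.2],
                       'e3': [0.4, 0.4]},
                      tau=1.0)
```

Charging the gain against the largest cost coordinate is a conservative choice: an element accepted under it pays for the capacity it consumes in its worst dimension.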
Theorem 4 Assuming that the input $v$ is a sufficiently good estimate of OPT, Algorithm 4 satisfies the following properties, where the unconstrained submodular maximization is solved by a deterministic 1/2-approximation algorithm proposed by [19].
● It outputs a set $S$ such that $C x_S \le b$ and whose value is within a constant factor of OPT.
● It does one pass over the data set, stores a bounded number of elements, and has bounded query complexity per element.
Proof Similarly, we consider the sizes of the two candidate sets at the end of the algorithm. The algorithm terminates either when it finds a big element, i.e., an element satisfying the condition in line 5, or when it finishes one pass through the data set. Let $O$ be the optimal set, and consider each candidate set at the end of its respective iteration.
1. We first prove that if the algorithm finds a big element, the output obtains a good approximation ratio. Assume $N$ has at least one big element, and consider the first big element that the algorithm finds. The algorithm outputs a solution containing this element and terminates, and by the condition in line 5, the output $S$ of Algorithm 4 satisfies the claimed bound.
2. Otherwise, if $N$ has no big element, we discuss two cases according to the sizes of the two candidate sets at the end of the stream.
Case 1. At the end of the algorithm, at least one candidate set is nearly full in some dimension. Without loss of generality, assume it is the first candidate set, and consider its elements in the order in which they were selected. By the algorithm, each selected element passed the density threshold, which yields the claimed bound.
Case 2. Neither candidate set is nearly full in any dimension at the end of the algorithm.
First, consider the elements of the optimal set rejected by the first candidate set. For each such element, at the iteration of its rejection, there exists a dimension in which its marginal density fell below the threshold: otherwise, since the element is not a big element and $f$ is submodular, it could still have been added to the candidate set, a contradiction. Collecting these rejected elements by dimension and summing the corresponding inequalities bounds their total marginal value with respect to the first candidate set. Similarly, the same bound holds with respect to the second candidate set.
where the third inequality is due to the submodularity and non-negativity of $f$. We bound the value of the remaining term by two cases: if the first candidate set already captures a large fraction of the optimum, the bound follows from the unconstrained submodular maximization algorithm [19]; otherwise, the last inequality follows from the assumption on $v$. Thus, we have the claimed bound in any case. □
4.2 Streaming algorithm knowing the max density
Algorithm 4 requires an estimate of OPT as input, so we must estimate OPT first. Just as we estimated OPT using the maximum value of a single element under the cardinality constraint, here we estimate OPT by the maximum density of a single element.
Lemma 1 Consider the candidate set of guesses built from the maximum density, with the grid size governed by the maximum number of elements that the $d$-knapsack can store. Then at least one guess $v$ is a good estimate of OPT.
Proof Without loss of generality, consider the element attaining the maximum density. On the one hand, OPT is at least the value of this element; on the other hand, by submodularity, OPT is at most the maximum density times the total capacity of the knapsack. Choosing the guess in the grid nearest to OPT completes the proof. □
By Lemma 1, we design the following algorithm requiring the max density as an input.
Theorem 5 Assuming that the input $m$ equals the maximum density, Algorithm 5 satisfies the following properties, where the unconstrained submodular maximization is solved by a deterministic 1/2-approximation algorithm proposed by [19].
● It outputs a set $S$ such that $C x_S \le b$ and whose value is within a constant factor of OPT.
● It does one pass over the data set, stores a bounded number of elements, and has bounded query complexity per element.
Proof By Lemma 1, we choose the guess $v$ nearest to OPT. Then, by Theorem 4, the output set $S$ achieves the claimed bound. Notice that the guess set $Q$ contains logarithmically many values, and for each $v$ the algorithm keeps a bounded number of elements. Thus, Algorithm 5 stores a bounded number of elements in total and makes a bounded number of queries per element. □
4.3 1-pass streaming algorithm for d-knapsack constraint
We modify the estimation candidate set $Q$ so that it grows dynamically. Let $m$ be the current maximum marginal value per unit weight over all single elements. The streaming algorithm runs the thresholded copies in parallel while updating $m$ and the estimation candidate set $Q$. As before, when a threshold $v$ is instantiated from the set $Q$, every element with a large marginal value relative to $v$ appears on or after the moment $v$ is instantiated. Finally, we develop the final 1-pass streaming algorithm and establish the following theorem, whose proof follows the same lines as the proof of Theorem 5.
Theorem 6 Algorithm 6 satisfies the following properties, where the unconstrained submodular maximization is solved by a deterministic 1/2-approximation algorithm proposed by [19].
● It outputs a set $S$ such that $C x_S \le b$ and whose value is within a constant factor of OPT.
● It does one pass over the data set, stores a bounded number of elements, and has bounded query complexity per element.
Proof When a threshold $v$ is instantiated from the set $Q$, every element with a large marginal value relative to $v$ appears on or after the moment $v$ is instantiated. The remaining proof is similar to that of Theorem 5. In the setting of the streaming algorithm, we consider $d$ to be much smaller than the other parameters and treat it as a constant when accounting for memory. □
One immediate corollary is that Algorithm 6 applies when $d = 1$, which is the single knapsack constraint.
Corollary 1 There exists a 1-pass streaming algorithm (Algorithm 6) that uses bounded space and outputs a constant-factor approximation to the submodular maximization problem under a knapsack constraint, with a bounded number of queries per element.
5 Multi-pass streaming algorithm for knapsack
In this section, we develop a multi-pass streaming algorithm for the knapsack constraint using the same algorithmic framework. When $d = 1$, the $d$-knapsack constraint reduces to the knapsack constraint, and as a special case of the $d$-knapsack setting we can apply additional techniques, such as multiple passes, to enhance algorithmic performance. Initially, we outline our offline algorithm, which serves as a foundation for the subsequent adaptation to a streaming algorithm. As mentioned in the previous section, we again cannot use enumeration to deal with insufficient capacity. Our algorithm therefore modifies the technique from the Greedy+Max algorithm [27] for monotone submodular maximization subject to the knapsack constraint. Moreover, we adapt and refine methods from [30] to tackle the lack of monotonicity, which allows us to maintain two candidate sets.
We begin by assuming that we know the maximum value of the elements; we later remove this assumption by making one more pass over the data.
5.1 Offline repeat Greedy+Max algorithm
To begin with, we introduce our adjustments to the Greedy+Max algorithm for the case where the objective function is non-monotone. The greedy part starts from the empty set and, in each iteration, selects an item with the highest marginal density that still fits into the knapsack. Unlike the monotone case, in the non-monotone case we additionally require the selected item's marginal density to be strictly positive. The Max part adds the item with the highest marginal value (instead of density) to each partial solution constructed by the greedy part, and the best solution is then chosen among these augmentations. For the needs of subsequent algorithms, we also output the plain greedy result at the same time. The implementation is given as Algorithm 7.
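The adjusted Greedy+Max procedure described above can be sketched as follows, on a hypothetical modular instance; tie-breaking and the exact output convention of Algorithm 7 are not reproduced.

```python
def greedy_plus_max(f, ground, cost, b):
    """Greedy by marginal density (required to be strictly positive),
    augmenting each greedy prefix with the single best-fitting element
    by marginal *value*. Returns (best augmented set, greedy set)."""
    G, spent = [], 0.0
    best, best_val = set(), f(set())
    while True:
        prefix = set(G)
        # Max step: augment the current prefix with one fitting element.
        for e in ground:
            if e not in prefix and spent + cost[e] <= b:
                cand = prefix | {e}
                if f(cand) > best_val:
                    best, best_val = cand, f(cand)
        # Greedy step: highest positive marginal density that fits.
        pick, pick_density = None, 0.0
        for e in ground:
            if e in prefix or spent + cost[e] > b:
                continue
            gain = f(prefix | {e}) - f(prefix)
            if gain > 0 and gain / cost[e] > pick_density:
                pick, pick_density = e, gain / cost[e]
        if pick is None:
            break
        G.append(pick)
        spent += cost[pick]
    if f(set(G)) > best_val:
        best, best_val = set(G), f(set(G))
    return best, set(G)

# Hypothetical modular instance: values and costs are made up.
value = {'a': 3, 'b': 2, 'c': 1}
cost = {'a': 2, 'b': 1, 'c': 1}
best, greedy = greedy_plus_max(lambda S: sum(value[e] for e in S),
                               ['a', 'b', 'c'], cost, b=2)
```

Here density-greedy prefers the cheap items `b` and `c`, while the augmentation of the empty prefix recovers the large item `a`, illustrating why the Max step helps when a single expensive item is valuable.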
Lemma 2 For the feasible set returned by Algorithm 7, there exists a partial greedy solution whose augmented solution satisfies the stated bound with respect to any feasible set satisfying the cost condition.
Proof First, we introduce the notation used in the proof. Consider the elements selected into the greedy solution in order; for each greedy prefix, consider the element of maximum marginal value with respect to it and the resulting augmented solution; and let the reference item be the item with the largest cost in the comparison set. We consider the last item added to the greedy solution before its cost exceeds the relevant bound.
If the greedy algorithm stops before its cost reaches this point, then some partial greedy solution already satisfies the claimed bound.
Otherwise, consider the cost of the greedy solution just before this item is taken, and let the greedy prefix at this point be the corresponding partial solution.
where the inequality is by submodularity and the last equality is by the definition of marginal density. All items of the comparison set still fit at this point, since the reference item is the item with the largest cost among them. Since the greedy algorithm always selects the item with the largest marginal density, then
Hence,
If the claimed inequality holds at this point, then we finish the proof, because it lower-bounds the value of the augmented solution when the cost of the greedy part reaches this level. Otherwise,
where the second inequality follows from the fact stated above, and the third inequality follows from the greedy procedure in the algorithm. □
The Greedy+Max algorithm by itself cannot achieve a constant approximation ratio, due to the lack of monotonicity. To address this issue, we use a technique from [30]. The resulting algorithm is depicted as Algorithm 8. It first invokes the Greedy+Max algorithm to obtain one solution. Then it runs the Greedy+Max algorithm again over the remaining elements to obtain another solution. Note that, unlike in the offline algorithm framework, because the algorithm may output an augmented solution in the process, we need to account for the final greedy solution separately. These two solutions still cannot guarantee any constant approximation ratio. To remedy this, the algorithm produces a third solution by solving the unconstrained submodular maximization problem over the first greedy solution.
Our algorithm finally returns the best of these three solutions. We will show that this achieves a 1/6 approximation ratio.
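The repeat structure of Algorithm 8 can be sketched as follows, with two simplifications labeled plainly: a plain density greedy stands in for the full Greedy+Max subroutine, and the simple deterministic double greedy (a 1/3-approximation) stands in for the deterministic 1/2-approximation of [19].

```python
def density_greedy(f, ground, cost, b, excluded=frozenset()):
    """Plain greedy by positive marginal density under budget b
    (a stand-in for the full Greedy+Max subroutine)."""
    S, spent = set(), 0.0
    while True:
        pick, best_d = None, 0.0
        for e in ground:
            if e in S or e in excluded or spent + cost[e] > b:
                continue
            gain = f(S | {e}) - f(S)
            if gain > 0 and gain / cost[e] > best_d:
                pick, best_d = e, gain / cost[e]
        if pick is None:
            return S
        S.add(pick)
        spent += cost[pick]

def double_greedy(f, ground):
    """Deterministic 1/3-approximation for unconstrained submodular
    maximization (the paper uses a stronger 1/2-approximation [19])."""
    X, Y = set(), set(ground)
    for u in list(ground):
        if f(X | {u}) - f(X) >= f(Y - {u}) - f(Y):
            X.add(u)
        else:
            Y.discard(u)
    return X

def repeat_greedy(f, ground, cost, b):
    G1 = density_greedy(f, ground, cost, b)               # first run
    G2 = density_greedy(f, ground, cost, b, excluded=G1)  # on the rest
    S3 = double_greedy(f, G1)   # unconstrained step on first solution
    return max((G1, G2, S3), key=f)

# Hypothetical modular instance with unit costs.
value = {'a': 4, 'b': 3, 'c': 2, 'd': 1}
res = repeat_greedy(lambda S: sum(value[e] for e in S),
                    ['a', 'b', 'c', 'd'], {e: 1 for e in value}, b=2)
```

The second run on the remaining elements is what protects against the non-monotone failure mode in which the first solution has collected elements that cancel the optimum.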
Theorem 7 Algorithm 8 achieves a 1/6 approximation ratio and uses $O(n^2)$ queries.
Proof Let $O$ be the optimal set. Consider the first greedy solution output by the offline Greedy+Max phase together with its partial greedy solutions, and denote the greedy result of the second Greedy+Max phase and its partial greedy solutions analogously. By Lemma 2, there exist partial solutions such that
If the first greedy solution already captures a large fraction of the optimum, then by [19] the unconstrained step recovers it. Otherwise,
Thus, we have that
The third and fourth inequalities hold due to submodularity, and the last inequality holds due to non-negativity. Therefore, after balancing the parameters, the approximation ratio of the returned set is at least 1/6.
Finally, since the algorithm from [19] uses $O(n)$ queries and Algorithm 7 makes $O(n^2)$ queries, Algorithm 8 also makes $O(n^2)$ queries in total. □
5.2 Multi-pass streaming algorithm repeat Greedy+Max
We now introduce the multi-pass streaming algorithm, presented as Algorithm 9. We first give the algorithm under the assumption that it is given a parameter $m$, the maximum value of a single element; this assumption can be removed with one additional pass before the algorithm. During the algorithm, we use a multi-pass thresholding technique to simulate the greedy process, and then use one pass to obtain the augmented solution for each partial greedy solution. The final output is the best among all augmented solutions. For the needs of subsequent algorithms, we also output the greedy result at the same time. As discussed above, our technique uses several passes over the data to approximately simulate the execution of the offline Greedy+Max algorithm.
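The pass structure just described can be sketched as a decreasing-threshold loop: each pass accepts elements whose marginal density meets the current threshold, which shrinks by a $(1+\varepsilon)$ factor between passes, so accepted elements are near-greedy choices. The starting threshold, shrink factor, and number of passes below are illustrative, as Algorithm 9's exact parameters are elided in this extraction.

```python
def multipass_threshold_greedy(f, stream, cost, b, m, eps, passes):
    """Simulate density-greedy with decreasing thresholds: each pass
    keeps elements whose marginal density meets tau, then tau shrinks
    by (1 + eps). m is the maximum singleton value."""
    S, spent = set(), 0.0
    tau = m
    for _ in range(passes):
        for e in stream:
            if e in S or spent + cost[e] > b:
                continue
            gain = f(S | {e}) - f(S)
            if gain > 0 and gain / cost[e] >= tau:
                S.add(e)
                spent += cost[e]
        tau /= (1 + eps)
    return S

# Hypothetical instance: the high-density element is taken in the first
# pass, the low-density one only once the threshold has decayed.
value = {'a': 4, 'b': 1}
S = multipass_threshold_greedy(lambda X: sum(value[e] for e in X),
                               ['a', 'b'], {'a': 1, 'b': 1},
                               b=2, m=4.0, eps=1.0, passes=3)
```

Because the threshold drops by at most a $(1+\varepsilon)$ factor per pass, every accepted element has density within a $(1+\varepsilon)$ factor of the best density available at that moment, which is the property the analysis of Lemma 3 relies on.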
Lemma 3 For the feasible set returned by Algorithm 9, there is a partial greedy solution (defined in line 9) that satisfies the stated bound with respect to any feasible set satisfying the cost condition.
Proof As in the proof of Lemma 2, we first state some definitions. Consider the elements selected in order; for each prefix, consider the element of maximum marginal value with respect to it and the resulting augmented solution. Let $O$ be the optimal set, and let the reference item be the item of the largest cost in the comparison set. Consider the last item added to the greedy solution in the thresholding stage before the cost of this solution exceeds the relevant bound.
If the thresholding stage stops before the cost reaches this point, then by the thresholding rule there exists a partial solution such that
where the first inequality is by submodularity and the second inequality is by the threshold at termination. Thus, we obtain the claimed bounds.
Otherwise, consider the cost of the greedy solution just before this item is taken, and let the greedy prefix at this point be the corresponding partial solution,
where the second inequality is by submodularity and the last equality is by the definition of marginal density. All items of the comparison set still fit at this point, since the reference item is the largest among them. In all passes, the thresholding algorithm always selects an element whose marginal density is within a $(1+\varepsilon)$ factor of the highest possible one. Since the greedy algorithm always selects the item with the largest marginal density, then
Hence,
If the claimed inequality holds at this point, then we are done. Otherwise,
where the second inequality follows from the fact stated above and the third inequality follows from the greedy procedure in the algorithm. Combining the cases finishes the proof of the lemma. □
We now present the complete multi-pass streaming algorithm. First, we run one pass to find the maximum value of the elements. Then we apply the streaming algorithm framework for non-monotone submodular maximization, again accounting for the final greedy solution separately. The resulting algorithm is depicted as Algorithm 10. Our algorithm finally returns the best of the three candidate solutions. We will show that this achieves the claimed approximation ratio using a number of passes depending on $\varepsilon$.
Theorem 8 For any fixed $\varepsilon > 0$, Algorithm 10 achieves the claimed approximation ratio, uses a number of passes depending on $\varepsilon$, stores a bounded number of elements, and makes a bounded number of queries per element.
Proof Consider the first greedy solution output by the multi-pass Greedy+Max phase together with its partial greedy solutions, and denote the greedy result of the second phase and its partial greedy solutions analogously. By Lemma 3, there exist partial solutions such that
If the first greedy solution already captures a large fraction of the optimum, the unconstrained step recovers it by [19]. Otherwise,
Thus, we have that
The third and fourth inequalities hold due to submodularity, and the last inequality holds due to non-negativity. Therefore, after balancing the parameters, the returned set achieves the claimed approximation ratio.
Finally, since Algorithm 9 makes several passes, Algorithm 10 makes one additional pass beyond those. The multi-pass streaming algorithm stores a bounded number of elements and makes a bounded number of queries per element. □
6 Conclusions and future work
We have created a streaming algorithm framework for submodular maximization with a general (possibly non-monotone) objective function. This framework enables us to propose 1-pass streaming algorithms that provide improved approximation ratios for non-monotone submodular maximization under a d-knapsack constraint or a knapsack constraint. Additionally, we have developed a multi-pass streaming algorithm for knapsack constraints. All of these algorithms are efficient and deterministic.
For the d-knapsack constraint, the question we are most interested in next is whether the approximation ratio of the streaming algorithm can be further improved. The current analysis of the d-knapsack costs is still somewhat coarse; we believe that a more careful discussion can yield better results. We also believe that if the cost in each dimension can be balanced, the approximation ratio for this problem can be improved by a multi-pass streaming algorithm.
As for the single knapsack constraint, an open question is how to improve the approximation ratio of 1-pass streaming algorithms. Another future research direction is whether increasing the number of feasible candidate solutions maintained will improve the approximation ratio. In theory, the knapsack constraint can yield the same approximation ratio as the cardinality constraint, but this has not been achieved even in the monotone case.
When the objective function is non-monotone, though our algorithms improve the best-known deterministic algorithms, their approximation ratios are still worse than the best randomized algorithms. It is very interesting to fill these gaps.