Deterministic streaming algorithms for non-monotone submodular maximization

Xiaoming SUN , Jialin ZHANG , Shuo ZHANG

Front. Comput. Sci., 2025, 19(6): 196404. DOI: 10.1007/s11704-024-40266-4

Theoretical Computer Science
RESEARCH ARTICLE


Abstract

Submodular maximization is a significant area of combinatorial optimization with many real-world applications. In recent years, streaming algorithms for submodular maximization have gained attention, allowing real-time processing of large data sets by examining each piece of data only once. However, most of the current state-of-the-art algorithms apply only to monotone submodular maximization, and there are still significant gaps between the approximation ratios known for monotone and non-monotone objective functions.

In this paper, we propose a streaming algorithm framework for non-monotone submodular maximization and use it to design deterministic streaming algorithms for the $d$-knapsack constraint and the knapsack constraint. Our 1-pass streaming algorithm for the $d$-knapsack constraint achieves a $\frac{1}{4(d+1)}-\epsilon$ approximation ratio, using $O(\tilde{B}\log\tilde{B}/\epsilon)$ memory and $O(\log\tilde{B}/\epsilon)$ query time per element, where $\tilde{B}=\min(n,b)$ is the maximum number of elements that the knapsack can store. As a special case of the $d$-knapsack constraint, we obtain a 1-pass streaming algorithm with a $1/8-\epsilon$ approximation ratio for the knapsack constraint. To our knowledge, there was previously no streaming algorithm for this constraint when the objective function is non-monotone, even when $d=1$. In addition, we propose a multi-pass streaming algorithm with a $1/6-\epsilon$ approximation ratio, which stores $O(\tilde{B})$ elements.


Keywords

submodular maximization / streaming algorithms / cardinality constraint / knapsack constraint



1 Introduction

Submodular maximization has become one of the central problems in optimization and combinatorics, with submodular functions embodying the essence of diminishing returns. The inherent trait of a submodular function—that the marginal value of an item decreases as one adds more items to the set—makes it a particularly intriguing function to maximize. The submodular maximization problem arises in a variety of applications including influence maximization [1], machine learning [2,3], sensor placement [4,5], feature selection [6,7], and information gathering [8].

Due to its remarkable significance, submodular maximization has been studied for over forty years, especially in the offline model. For the monotone case, no polynomial-time algorithm can achieve a ratio better than $1-1/e$ under a cardinality constraint [9,10], a tight approximation ratio met by the natural greedy algorithm [11]. When combined with enumeration techniques, the greedy algorithm also achieves a $1-1/e$ approximation ratio for the knapsack constraint [12,13]. The continuous greedy algorithm [14], proposed by Vondrák in 2008, generalizes to various down-closed constraints, including matroid and $d$-knapsack constraints, but requires randomization due to its sampling process [15,16]. Recently, Buchbinder and Feldman [17] proposed a deterministic algorithm with an approximation ratio of $1-1/e-\epsilon$. For the non-monotone case, where randomization is a standard tool, the continuous greedy technique yields a 0.401 approximation [18], trailing behind the 0.491 hardness limit under a cardinality constraint but exceeding the deterministic $1/e$ for the cardinality constraint and $1/4$ for the knapsack constraint [19,20].

However, as the digital era progresses, the real challenge arises when we have to deal with massive, constantly evolving datasets. This is where streaming algorithms come into play. Traditional optimization algorithms that work on static, small-scale datasets fall short on large streaming data, which arrives continuously and cannot be stored or processed in its entirety. Streaming algorithms, in contrast, allow massive datasets to be processed in real time by seeing each piece of data only once or a few times, with only a fraction of the entire data available at any point. Badanidiyuru et al. [21] proposed a deterministic $1/2-\epsilon$ approximation algorithm for monotone submodular maximization with a cardinality constraint in the streaming model, using $O(k\log k/\epsilon)$ memory and sublinear time. Feldman et al. [22] showed that even under the cardinality constraint in the streaming model, the approximation ratio for this problem cannot be better than $1/2$, regardless of whether the objective function is monotone. For the non-monotone case, Alaluf et al. [23] proposed a 1-pass semi-streaming algorithm framework with an $\alpha/(1+\alpha)$ approximation ratio given an offline $\alpha$-approximation algorithm: instantiated with the offline continuous greedy algorithm [18] it yields a randomized 0.286-approximation, and with the deterministic algorithm of [19] a deterministic 0.268-approximation. Streaming algorithms are inherently harder than offline ones, since at any time only part of the data, rather than the whole picture, is visible.

Regarding the knapsack constraint, there are several related works for monotone objective functions. Huang et al. [24] proposed a 1-pass streaming algorithm that achieves a $1/3-\epsilon$ approximation, later improved to $2/5-\epsilon$ in subsequent work [25]. Huang and Kakimura [26] proposed a multi-pass streaming algorithm that achieves a $1/2-\epsilon$ approximation. Meanwhile, Yaroslavtsev et al. [27] proposed another multi-pass streaming algorithm that achieves the same approximation ratio and number of passes, but with less memory usage. For the $d$-knapsack constraint, deterministic streaming algorithms with a $\frac{1}{2d+1}-\epsilon$ approximation ratio for monotone submodular maximization have been proposed in [28,29]. When the function is non-monotone, to the best of our knowledge, there was previously no streaming algorithm for either of these two constraints.

The non-monotone submodular maximization problem, unlike its monotone counterpart, does not assume that adding elements to a set necessarily increases the value of the function. Many real-world applications require handling non-monotone objectives because of the complex interdependencies and trade-offs inherent in real systems. For example, in sensor placement, adding more sensors does not always improve coverage due to potential interference, and in document summarization, adding sentences to a summary may make it less coherent or relevant. In today’s era, data is generated on an unprecedented scale. This vast influx of data necessitates algorithms capable of processing information in a streaming fashion, where data is observed sequentially and decisions must be made on the fly. Deterministic streaming algorithms address this need by enabling scalable and efficient processing of massive datasets without storing all elements in memory. In summary, the motivation for designing deterministic streaming algorithms for non-monotone submodular maximization lies in addressing the challenges of modern data-driven applications.

1.1 Our contribution

In this paper, we present several improved deterministic streaming algorithms for the submodular maximization problem subject to different constraints, focusing on non-monotone objective functions. To address the problem, we propose a streaming algorithm framework based on an offline algorithm of [30], and we adapt this framework to obtain streaming algorithms customized to different constraints.

We propose a 1-pass streaming algorithm for the $d$-knapsack constraint, which in particular yields a 1-pass streaming algorithm for the knapsack constraint. Additionally, we propose a multi-pass streaming algorithm for the knapsack constraint that achieves an improved approximation ratio. Our main results are as follows:

● For the $d$-knapsack constraint, we propose a deterministic streaming algorithm with a $\frac{1}{4(d+1)}-\epsilon$ approximation ratio. It makes 1 pass over the data set, stores at most $O(\tilde{B}\log\tilde{B}/\epsilon)$ elements, and makes at most $O(\log\tilde{B}/\epsilon)$ queries per element, where $\tilde{B}=\min(n,b)$ is the maximum number of elements the $d$-knapsack can store. When $d=1$, the $d$-knapsack constraint reduces to the knapsack constraint; as a special case, we obtain a 1-pass streaming algorithm with a $1/8-\epsilon$ approximation ratio for the knapsack constraint.

● For the knapsack constraint, we propose a multi-pass streaming algorithm with a $1/6-\epsilon$ approximation ratio. The algorithm uses $O(\log b/\epsilon)$ passes, stores $O(\tilde{B})$ elements, and makes at most $O(\log b/\epsilon)$ queries per element.

As far as we know, there was previously no streaming algorithm subject to the $d$-knapsack constraint when the objective function is non-monotone, even when $d=1$; we develop the first 1-pass streaming algorithm for this problem. Moreover, we introduce another deterministic multi-pass streaming algorithm that improves the approximation ratio for the knapsack constraint.

We make a more detailed comparison between our results and previous ones in Tab. 1.

1.2 Related work

To better illustrate the improvement of our results, this subsection provides a list of results in the literature on non-monotone submodular maximization under a d-knapsack constraint and a knapsack constraint.

For the knapsack constraint, taking the better of the maximum single element and the greedy solution achieves a $1/3-\epsilon$ approximation for monotone submodular maximization in one pass [24]. Huang and Kakimura [25] improved this approximation ratio to $2/5-\epsilon$ through a more detailed analysis. However, the current 1-pass results do not match the tight approximation ratio known for the cardinality constraint. Huang and Kakimura [26] proposed a multi-pass streaming algorithm achieving $1/2-\epsilon$ for the monotone problem. Meanwhile, Yaroslavtsev et al. [27] proposed another multi-pass streaming algorithm with a $1/2-\epsilon$ approximation ratio using a thresholding technique. For the non-monotone submodular maximization problem subject to the knapsack constraint, to the best of our knowledge, there is no streaming algorithm in the literature.

For the $d$-knapsack constraint, when $d$ is constant or the width of the constraints is large, a constant approximation is possible in the offline setting [31-33]. Currently, the best-known algorithm is again based on the continuous greedy algorithm and has an approximation ratio of 0.401 [18]. Sun et al. [34] proposed a deterministic offline algorithm with a $1/6$ approximation when the width of the knapsack is large enough. To the best of our knowledge, there is no streaming algorithm for the non-monotone submodular maximization problem in the literature. For the monotone case, Yu et al. [29] proposed a deterministic streaming algorithm for the $d$-knapsack constraint with a $\frac{1}{2d+1}-\epsilon$ approximation ratio, whose memory cost is $O(\tilde{B}\log\tilde{B}/\epsilon)$ and whose query time is $O(\log\tilde{B}/\epsilon)$ per element.

1.3 Organization

In Section 2, we formally introduce the problem of non-monotone submodular maximization under the cardinality constraint, the $d$-knapsack constraint, and the knapsack constraint. In Section 3, as a warm-up, we propose a deterministic 1-pass streaming algorithm for the cardinality constraint with a $1/6-\epsilon$ approximation ratio using $O(k\log k/\epsilon)$ memory. In Section 4, we propose a deterministic 1-pass streaming algorithm for the $d$-knapsack constraint with a $\frac{1}{4(d+1)}-\epsilon$ approximation ratio using $O(\tilde{B}\log\tilde{B}/\epsilon)$ memory, where $\tilde{B}=\min(n,b)$ is the maximum number of elements that the knapsack can store. In Section 5, we propose a deterministic multi-pass algorithm for the knapsack constraint with a $1/6-\epsilon$ approximation ratio using $O(\tilde{B})$ memory. In Section 6, we conclude the paper and list some future directions.

2 Preliminaries

In this section, we state the problems studied in this paper.

Definition 1 (Submodular function). Given a finite ground set $N$ of $n$ elements, a set function $f:2^N\to\mathbb{R}$ is submodular if for all $S,T\subseteq N$,

$$f(S)+f(T)\ge f(S\cup T)+f(S\cap T).$$

Equivalently, $f$ is submodular if for all $S\subseteq T\subseteq N$ and $u\in N\setminus T$,

$$f(S\cup\{u\})-f(S)\ge f(T\cup\{u\})-f(T).$$

For convenience, we use $f(u)$ to denote $f(\{u\})$, $f(S+u)$ to denote $f(S\cup\{u\})$, $f(u\mid T)$ to denote the marginal value $f(T+u)-f(T)$ of $u$ with respect to $T$, and $f(S\mid T)$ to denote the marginal value $f(S\cup T)-f(T)$. The function $f$ is non-negative if $f(S)\ge 0$ for all $S\subseteq N$. The function $f$ is monotone if $f(S)\le f(T)$ for all $S\subseteq T\subseteq N$.
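
To make the notation concrete, here is a minimal Python sketch of a non-monotone submodular value oracle: the cut function of a small, arbitrarily chosen graph. The later algorithmic sketches in this paper query such an oracle through `f` and `marginal`; the example graph and helper names are ours, purely for illustration.

```python
# A tiny non-monotone submodular value oracle: the cut function of an
# undirected graph, f(S) = number of edges with exactly one endpoint in S.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

def f(S):
    """Cut value of S."""
    S = set(S)
    return sum(1 for (u, v) in EDGES if (u in S) != (v in S))

def marginal(u, T):
    """Marginal value f(u | T) = f(T + u) - f(T)."""
    return f(set(T) | {u}) - f(T)

# Non-monotonicity: adding vertex 2 to {0, 1} decreases the cut value.
assert f({0, 1}) == 2 and f({0, 1, 2}) == 1
```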

Definition 2 (Cardinality). Given a finite ground set $N$ and an integer $k$. For a set $S\subseteq N$, the cardinality constraint can be defined as $\mathcal{I}=\{S\mid |S|\le k\}$.

Definition 3 (Knapsack). Given a finite ground set $N$, assume there is a budget $b$, and each element $u\in N$ is associated with a cost $c(u)>0$. For a set $S\subseteq N$, its cost is $c(S)=\sum_{u\in S}c(u)$. We say $S$ is feasible if $c(S)\le b$. The knapsack constraint can be written as $\mathcal{I}=\{S\mid c(S)\le b\}$.

Without loss of generality, we can simply assume that every cost satisfies $c(u)\ge 1$.

If $c(u)=1$ for all $u\in N$ and $b=k$, the knapsack constraint reduces to the cardinality constraint.

Definition 4 ($d$-Knapsack). Given a finite ground set $N$, a matrix $C\in(0,\infty)^{d\times n}$, and a vector $b\in(0,\infty)^d$. For a set $S\subseteq N$, the $d$-knapsack constraint can be written as $\mathcal{I}=\{S\mid Cx_S\le b\}$, where $x_S$ stands for the characteristic vector of the set $S$.

Without loss of generality, for $1\le i\le d$ and $e\in N$, we assume that $c_{i,e}\le b_i$. For the sake of simplicity, we standardize the constrained problem: we assume that every cost satisfies $c_{i,e}\ge 1$, and we let $b=\max_{1\le i\le d}b_i$. For $1\le i\le d$ and $e\in N$, we replace each $c_{i,e}$ with $b\,c_{i,e}/b_i$ and each $b_i$ with $b$, so that the $d$-knapsack has equal capacity $b$ in all dimensions. The standardized problem has the same feasible sets, and hence the same optimal solution. In the remainder of the paper, we only consider the standardized version of the submodular maximization problem subject to a $d$-knapsack constraint.
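
As a sanity check of this standardization, the following sketch (a helper of ours, using NumPy) rescales a $d$-knapsack instance so that every dimension has capacity $b=\max_i b_i$; feasibility of every set is preserved because row $i$ is scaled by exactly $b/b_i$.

```python
import numpy as np

def standardize(C, b):
    """Rescale a d-knapsack instance (C, b) to equal capacities.

    Each cost c_{i,e} becomes b_max * c_{i,e} / b_i, where
    b_max = max_i b_i, so C x <= b iff C_std x <= b_max in every row."""
    C = np.asarray(C, dtype=float)
    b = np.asarray(b, dtype=float)
    b_max = b.max()
    C_std = C * (b_max / b)[:, None]   # scale row i by b_max / b_i
    return C_std, b_max
```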

Let $\tilde{B}=\min(|N|,b)$, which is the maximum number of elements the knapsack can store (recall that all costs are at least 1).

When d = 1, the d-knapsack constraint reduces to the knapsack constraint.

Definition 5 (Constrained submodular maximization). The constrained submodular maximization problem has the form

$$\max\{f(S)\mid S\in\mathcal{I}\}.$$

In this paper, the constraint $\mathcal{I}$ is assumed to be the cardinality constraint or the $d$-knapsack constraint. The objective function $f$ is assumed to be non-negative and submodular.

Besides, $f$ is accessed through a value oracle that returns $f(S)$ when queried with $S$. Since oracle queries can be costly, the number of queries is a standard performance metric. Space complexity, on the other hand, refers to the amount of memory required to process and make decisions on the data stream. If the stream has length $n$ and the constraint parameter is $k$, a streaming algorithm is generally restricted to memory proportional to $k$ or to $\log k$, but independent of the stream length $n$. In most cases, the algorithm may make only a small constant number of passes over the stream, sometimes just one.

3 Warm-up: streaming algorithm for cardinality constraint

In this section, we use a cardinality-constrained streaming algorithm as a warm-up to demonstrate the basic version of our algorithm framework, a streaming approach to non-monotone submodular maximization that applies to many constraints. To illustrate the framework, we present a 1-pass streaming algorithm for the cardinality constraint. Although its approximation guarantee under the cardinality constraint is slightly worse than the state of the art [23], the framework adapts well to more complex constraints. In the following section, we use it to design a 1-pass streaming algorithm for the $d$-knapsack constraint, and we then employ it to create a multi-pass streaming algorithm for the knapsack constraint.

Our algorithm adapts the technique of [30] to the streaming model, using the threshold method of the Sieve-Streaming algorithm [21]. First, we assume some knowledge of OPT; we then remove this assumption by estimating OPT from the maximum value of a single element. Finally, we remove all assumptions and propose a 1-pass streaming algorithm subject to a cardinality constraint. Our algorithm framework is simpler than the state-of-the-art algorithm of [23].

3.1 Streaming algorithm knowing OPT

Suppose that we have a value $v$ such that $\alpha\,\mathrm{OPT}\le v\le\mathrm{OPT}$, for some $\alpha\in(0,1]$. We construct the algorithm to select elements by a threshold depending on the value of $v$. The resulting algorithm is depicted as Algorithm 1; a sketch of our reading of it follows below. It first invokes the thresholding procedure to obtain a solution $S_1$. Then it runs the thresholding procedure again over the remaining elements $N\setminus S_1$ to obtain another solution $S_2$. These two solutions alone cannot guarantee any constant approximation ratio. To remedy this, the algorithm produces a third solution $S_3$ by solving the unconstrained submodular maximization problem over $S_1$.
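
The Python sketch below shows how we read this framework, reusing the `f` and `marginal` helpers from Section 2; `unconstrained_max` stands for the deterministic 1/2-approximation of [19], and all names are ours. The paper's Algorithm 1 (not reproduced in this extraction) is authoritative, and details such as tie-breaking may differ.

```python
def threshold_stream(stream, k, v, unconstrained_max):
    """One pass with a known estimate v, alpha*OPT <= v <= OPT.

    Elements whose marginal value to S1 reaches tau/k = v/(6k) go to S1;
    elements rejected by S1 are offered to S2 under the same rule, which
    emulates a second thresholding run over N \\ S1. S3 approximately
    maximizes f over subsets of S1 without any constraint."""
    tau = v / 6.0
    S1, S2 = [], []
    for u in stream:
        if len(S1) < k and marginal(u, S1) >= tau / k:
            S1.append(u)
        elif len(S2) < k and marginal(u, S2) >= tau / k:
            S2.append(u)
    S3 = unconstrained_max(S1)       # deterministic 1/2-approximation [19]
    return max((S1, S2, S3), key=f)
```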

Theorem 1 Assuming that the input $v$ satisfies $\alpha\,\mathrm{OPT}\le v\le\mathrm{OPT}$, Algorithm 1 satisfies the following properties, where the unconstrained submodular maximization is solved by the deterministic 1/2-approximation algorithm of [19].

● It outputs a set $S$ such that $|S|\le k$ and $f(S)\ge\frac{\alpha}{6}\mathrm{OPT}$.

● It makes 1 pass over the data set, stores at most $2k$ elements, and has $O(1)$ query complexity per element.

Proof We prove the theorem by a case analysis on the sizes of the two candidate sets $S_1$ and $S_2$ at the end of the stream. Let $O$ be an optimal set with $f(O)=\mathrm{OPT}$, and let $S_l^j$ be the candidate set at the end of the $j$-th iteration, $j=1,2,\ldots,n$, with $S_l^0=\emptyset$ and $S_l^n=S_l$ for $l=1,2$.

Case 1 At the end of the algorithm, at least one candidate set has $k$ elements. Without loss of generality, let $|S_1|=k$, and for $1\le i\le k$ let $u_i$ be the $i$-th element added to $S_1$. Then
$$f(S_1)=\sum_{i=1}^{k}\big(f(\{u_1,\ldots,u_i\})-f(\{u_1,\ldots,u_{i-1}\})\big)\ge k\cdot\frac{\tau}{k}=\tau=\frac{v}{6}.$$
Thus $f(S)\ge f(S_1)\ge\frac{\alpha}{6}\mathrm{OPT}$.

Case 2 Neither candidate set is full, that is, $|S_1|<k$ and $|S_2|<k$ at the end of the algorithm. For every element $u\in O\setminus S_1$, suppose it was rejected by $S_1$ (in line 5) in iteration $j$. Then $f(u\mid S_1)\le f(u\mid S_1^j)<\frac{\tau}{k}$, which implies $f(O\mid S_1)\le\sum_{u\in O\setminus S_1}f(u\mid S_1)\le\tau$, and further $f(S_1)\ge f(S_1\cup O)-\tau$. Similarly, we have $f((O\setminus S_1)\mid S_2)\le\tau$, which implies $f(S_2)\ge f(S_2\cup(O\setminus S_1))-\tau$. Hence

$$f(S)\ge\frac{1}{2}\big(f(S_1)+f(S_2)\big)\ge\frac{1}{2}\big(f(S_1\cup O)+f(S_2\cup(O\setminus S_1))-2\tau\big)$$
$$=\frac{1}{2}f(S_1\cup O)+\frac{1}{2}f(S_2\cup(O\setminus S_1))+\frac{1}{2}f(S_1\cap O)-\frac{1}{2}f(S_1\cap O)-\tau$$
$$\ge\frac{1}{2}f(O)-\frac{1}{2}f(S_1\cap O)-\tau.$$

The last inequality is due to submodularity and non-negativity of $f$: note that $f(S_1\cup O)+f(S_2\cup(O\setminus S_1))+f(S_1\cap O)\ge f(S_1\cup O)+f(S_2\cup O)\ge f(O)+f(S_1\cup S_2\cup O)\ge f(O)$. We use the deterministic 1/2-approximation unconstrained submodular maximization algorithm of [19] to handle the term $f(S_1\cap O)$. That is, if $f(S_1\cap O)\ge 2\tau$, then $f(S)\ge f(S_3)\ge\tau$. Otherwise, $f(S_1\cap O)<2\tau$, and then

$$f(S)\ge\frac{1}{2}f(O)-\frac{1}{2}f(S_1\cap O)-\tau\ge\frac{1}{2}f(O)-2\tau\ge 3\tau-2\tau=\tau,$$

where the last inequality is by $f(O)\ge v=6\tau$. Hence $f(S)\ge\tau=\frac{v}{6}\ge\frac{\alpha}{6}\mathrm{OPT}$ in either case. □

3.2 Streaming algorithm knowing the max value

We use the maximum singleton value $m=\max_{u\in N}f(u)$ to estimate OPT. By submodularity, we have $m\le\mathrm{OPT}\le km$. Consider the set $Q=\{(1+\epsilon)^i\mid i\in\mathbb{Z},\ m\le(1+\epsilon)^i\le km\}$. At least one of the values $v\in Q$ is a good estimate of OPT, in the sense that $\frac{1}{1+\epsilon}\mathrm{OPT}\le v\le\mathrm{OPT}$. We can run Algorithm 1 once for each value $v\in Q$ in parallel, producing one candidate output per $v\in Q$, and return the best solution obtained.
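
The geometric grid of guesses can be computed as in the small helper below (ours, assuming $m>0$); Algorithm 2 then runs one copy of the thresholding routine from the previous sketch for each $v$ in this grid.

```python
import math

def guess_grid(m, k, eps):
    """Q = {(1+eps)^i : m <= (1+eps)^i <= k*m}. Since m <= OPT <= k*m,
    some v in Q satisfies OPT/(1+eps) <= v <= OPT."""
    lo = math.ceil(math.log(m) / math.log(1 + eps))
    hi = math.floor(math.log(k * m) / math.log(1 + eps))
    return [(1 + eps) ** i for i in range(lo, hi + 1)]
```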

Theorem 2 Assuming that the input $m$ satisfies $m=\max_{u\in N}f(u)$, Algorithm 2 satisfies the following properties, where the unconstrained submodular maximization is solved by the deterministic 1/2-approximation algorithm of [19].

● It outputs a set $S$ such that $|S|\le k$ and $f(S)\ge(\frac{1}{6}-\epsilon)\mathrm{OPT}$.

● It makes 1 pass over the data set, stores at most $O(k\log k/\epsilon)$ elements, and has $O(\log k/\epsilon)$ query complexity per element.

Proof Since at least one value $v\in Q$ estimates OPT well, i.e., $\frac{1}{1+\epsilon}\mathrm{OPT}\le v\le\mathrm{OPT}$, the approximation ratio of Algorithm 2 follows from Theorem 1 and its proof. We only analyze the memory and the query complexity of Algorithm 2. Since $|Q|=O(\log k/\epsilon)$, we keep track of $O(\log k/\epsilon)$ sets $S_l^v$ ($l=1,2,3$) of size at most $k$ each, bounding the memory by $O(k\log k/\epsilon)$. Moreover, the query time is $O(\log k/\epsilon)$ per element. □

3.3 1-pass streaming algorithm for cardinality constraint

In order to find the maximum singleton value $m=\max_{u\in N}f(u)$ during the stream, we modify the set into $Q=\{(1+\epsilon)^i\mid i\in\mathbb{Z},\ m\le(1+\epsilon)^i\le 6km\}$, where $m$ is now the maximum value seen so far. With this choice, when a threshold $v$ is instantiated from $Q$, every element with marginal value at least $\frac{v}{6k}$ to $S^v$ appears on or after the moment $v$ is instantiated.
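
Concretely, the window of active exponents can be maintained as follows (a sketch of ours; the paper's Algorithm 3 is authoritative). Whenever the running maximum $m$ grows, exponents below the new window are discarded along with their candidate sets, and new exponents enter with empty candidate sets.

```python
import math

def active_exponents(m, k, eps):
    """Exponents i with m <= (1+eps)^i <= 6*k*m, assuming m > 0."""
    lo = math.ceil(math.log(m) / math.log(1 + eps))
    hi = math.floor(math.log(6 * k * m) / math.log(1 + eps))
    return range(lo, hi + 1)

# Inside the single pass, after updating m = max(m, f({u})): drop the
# candidate pair (S1, S2) of every exponent below the window, create
# empty pairs for newly active exponents, and offer u to each pair at
# threshold (1+eps)^i / (6k), exactly as in the fixed-v sketch above.
```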

Theorem 3 Algorithm 3 satisfies the following properties, where the unconstrained submodular maximization is solved by the deterministic 1/2-approximation algorithm of [19].

● It outputs a set $S$ such that $|S|\le k$ and $f(S)\ge(\frac{1}{6}-\epsilon)\mathrm{OPT}$.

● It makes 1 pass over the data set, stores at most $O(k\log k/\epsilon)$ elements, and has $O(\log k/\epsilon)$ query complexity per element.

Proof For a specific threshold we maintain two candidate sets, so for the set of active thresholds we maintain two collections of candidate sets. When a threshold $v$ is instantiated from $Q$, every element with marginal value at least $\frac{v}{6k}$ to $S^v$ appears on or after $v$ is instantiated. To ensure this, we expand the upper end of the threshold range in Algorithm 3 by a factor of $6k$. Thresholds that fall below this range can be discarded directly, because the marginal gains available there are too small to come from the threshold corresponding to the optimal solution. The remaining proof is similar to Theorem 2. □

4 Streaming algorithm for d-knapsack constraint

In this section, we design a 1-pass streaming algorithm for the $d$-knapsack constraint, whose framework is consistent with that of the warm-up section. Under $d$-knapsack constraints, we face the difficulty that the enumeration technique used in the offline setting cannot be applied, due to the limited memory of the streaming model. To address this, we adopt a technique from [29] for the monotone case that handles large single elements separately. To handle the lack of monotonicity, we again use the streaming framework that maintains two candidate sets. We first describe the algorithm assuming the value of OPT is known; we then remove this assumption by estimating OPT from the maximum marginal value per unit cost over single elements, and finally we obtain a 1-pass streaming algorithm subject to the $d$-knapsack constraint.

4.1 Streaming algorithm knowing OPT

Suppose we have a value $v$ such that $\alpha\,\mathrm{OPT}\le v\le\mathrm{OPT}$, for some $\alpha\in(0,1]$. We then develop the algorithm to select elements by a threshold depending on the value of $v$; a sketch is given below.
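
The following Python sketch shows our reading of this thresholding rule together with the big-element rule analyzed in the proof of Theorem 4; `C[i][u]` denotes the standardized cost of element `u` in dimension `i`, the oracle helpers are from Section 2, and all names are ours. The paper's Algorithm 4 pseudocode is authoritative.

```python
def dknapsack_stream(stream, C, b, d, v, unconstrained_max):
    """One pass for the d-knapsack constraint with alpha*OPT <= v <= OPT.

    tau = v / (4(d+1)). A 'big' element (cost >= b/2 and density >= 2tau/b
    in some dimension) is returned alone; otherwise elements are admitted
    by marginal density at least 2tau/b in every dimension, first into S1
    and, if rejected there, into S2."""
    tau = v / (4.0 * (d + 1))
    S1, S2 = [], []
    load = [[0.0] * d, [0.0] * d]          # per-dimension loads of S1, S2
    for u in stream:
        if any(C[i][u] >= b / 2 and f({u}) >= C[i][u] * 2 * tau / b
               for i in range(d)):
            return [u]                      # big element: output and stop
        for S, ld in ((S1, load[0]), (S2, load[1])):
            fits = all(ld[i] + C[i][u] <= b for i in range(d))
            dense = all(marginal(u, S) >= 2 * tau / b * C[i][u]
                        for i in range(d))
            if fits and dense:
                S.append(u)
                for i in range(d):
                    ld[i] += C[i][u]
                break                       # accepted; next element
    S3 = unconstrained_max(S1)
    return max((S1, S2, S3), key=f)
```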

Theorem 4 Assuming that the input $v$ satisfies $\alpha\,\mathrm{OPT}\le v\le\mathrm{OPT}$, Algorithm 4 satisfies the following properties, where the unconstrained submodular maximization is solved by the deterministic 1/2-approximation algorithm of [19].

● It outputs a set $S$ such that $Cx_S\le b$ and $f(S)\ge\frac{\alpha}{4(d+1)}\mathrm{OPT}$.

● It makes 1 pass over the data set, stores at most $2\tilde{B}$ elements, and has $O(d)$ query complexity per element.

Proof As before, we consider the sizes of the two candidate sets at the end of the algorithm. The algorithm terminates either when it finds an element $u\in N$ such that $c_{i,u}\ge\frac{b}{2}$ and $f(u)\ge c_{i,u}\cdot\frac{2\tau}{b}$ for some $1\le i\le d$, or when it finishes one pass through the data set. We call an element $u\in N$ a big element if it satisfies the condition in line 5.

Let $O$ be an optimal set with $f(O)=\mathrm{OPT}$, and let $S_l^j$ be the candidate set at the end of the $j$-th iteration, $j=1,2,\ldots,n$, with $S_l^0=\emptyset$ and $S_l^n=S_l$ for $l=1,2$.

1. We first prove that if the algorithm finds a big element, the set $S=\{u\}$ already achieves a good approximation ratio. Assume $N$ has at least one big element, and let $u$ be the first big element that the algorithm finds. Then the algorithm outputs $S=\{u\}$ and terminates. By line 5, we have $f(S)=f(u)\ge\frac{2\tau}{b}\cdot\frac{b}{2}=\tau$. Thus the output $S$ of Algorithm 4 satisfies $f(S)\ge\tau=\frac{v}{4(d+1)}\ge\frac{\alpha}{4(d+1)}\mathrm{OPT}$.

2. Otherwise, if $N$ has no big element, we distinguish two cases according to the sizes of the candidate sets $S_1$ and $S_2$ at the end of the stream.

Case 1. At the end of the algorithm, at least one candidate set satisfies $\sum_{u\in S_l}c_{i,u}\ge\frac{b}{2}$ for some $1\le i\le d$, $l\in\{1,2\}$.

Without loss of generality, let $l=1$, and assume the elements of $S_1$ were selected in the order $u_1,u_2,\ldots,u_{|S_1|}$. We have $f(S)\ge f(S_1)=\sum_{j=1}^{|S_1|}\big(f(\{u_1,\ldots,u_j\})-f(\{u_1,\ldots,u_{j-1}\})\big)$. By the algorithm, $f(\{u_1,\ldots,u_j\})-f(\{u_1,\ldots,u_{j-1}\})\ge\frac{2\tau}{b}c_{i,u_j}$ for all $1\le i\le d$. Then, for the index $i$ with $\sum_{u_j\in S_1}c_{i,u_j}\ge\frac{b}{2}$, we obtain $f(S)\ge\sum_{j=1}^{|S_1|}\frac{2\tau}{b}c_{i,u_j}\ge\tau$.

Case 2. Both candidate sets satisfy $\sum_{u\in S_l}c_{i,u}<\frac{b}{2}$ for all $1\le i\le d$ at the end of the algorithm, $l\in\{1,2\}$.

First, consider $S_1$. For each element $e\in O\setminus S_1$, denote it by $e_j$ if it was rejected in iteration $j$. There exists an index $\mu(e_j)$ with $1\le\mu(e_j)\le d$ such that $\frac{f(e_j\mid S_1)}{c_{\mu(e_j),e_j}}\le\frac{f(e_j\mid S_1^j)}{c_{\mu(e_j),e_j}}<\frac{2\tau}{b}$. Indeed, suppose toward a contradiction that $\frac{f(e_j\mid S_1^j)}{c_{i,e_j}}\ge\frac{2\tau}{b}$ for all $i$. Since $e_j$ is not a big element and $f$ is submodular, we have $c_{i,e_j}<b/2$ for all $1\le i\le d$; then $e_j$ could have been added to $S_1$, a contradiction.

Let $Y_i$ be the set of elements $e_j\in O\setminus S_1$ with $\mu(e_j)=i$, for $1\le i\le d$. Then $O\setminus S_1=\bigcup_{1\le i\le d}Y_i$. We have $f(Y_i\mid S_1)\le\frac{2\tau}{b}\sum_{e_j\in Y_i}c_{\mu(e_j),e_j}<2\tau$, which implies $f(O\mid S_1)\le\sum_{i=1}^{d}f(Y_i\mid S_1)<2d\tau$, and further $f(S_1)\ge f(S_1\cup O)-2d\tau$.

Similarly, we have $f((O\setminus S_1)\mid S_2)\le 2d\tau$, which implies $f(S_2)\ge f(S_2\cup(O\setminus S_1))-2d\tau$. Hence

$$f(S)\ge\frac{1}{2}\big(f(S_1)+f(S_2)\big)\ge\frac{1}{2}f(S_1\cup O)+\frac{1}{2}f(S_2\cup(O\setminus S_1))-2d\tau$$
$$=\frac{1}{2}f(S_1\cup O)+\frac{1}{2}f(S_2\cup(O\setminus S_1))+\frac{1}{2}f(S_1\cap O)-\frac{1}{2}f(S_1\cap O)-2d\tau$$
$$\ge\frac{1}{2}f(O)-\frac{1}{2}f(S_1\cap O)-2d\tau,$$

where the last inequality is due to submodularity and non-negativity of $f$: note that $f(S_1\cup O)+f(S_2\cup(O\setminus S_1))+f(S_1\cap O)\ge f(S_1\cup O)+f(S_2\cup O)\ge f(O)+f(S_1\cup S_2\cup O)\ge f(O)$. We bound the value of $f(S_1\cap O)$ by two cases. If $f(S_1\cap O)\ge\frac{v}{2}-2d\tau$, then by the unconstrained submodular maximization algorithm [19] we have $f(S)\ge f(S_3)\ge\frac{v}{4}-d\tau\ge\tau$. Otherwise, $f(S_1\cap O)<\frac{v}{2}-2d\tau$, and then

$$f(S)\ge\frac{1}{2}f(O)-\frac{1}{2}f(S_1\cap O)-2d\tau\ge\frac{1}{2}f(O)-\frac{v}{4}+d\tau-2d\tau\ge 2(d+1)\tau-(d+1)\tau-d\tau=\tau,$$

where the last inequality is by $f(O)\ge v=4(d+1)\tau$. Thus, we have $f(S)\ge\frac{\alpha}{4(d+1)}\mathrm{OPT}$ in every case. □

4.2 Streaming algorithm knowing the max density

Algorithm 4 requires (an estimate of) OPT as an input, so we must estimate OPT first. As in the cardinality case, where we estimated OPT using the maximum singleton value, here we estimate OPT by the maximum density of a single element.

Lemma 1 Let $Q=\{(1+\epsilon)^i\mid i\in\mathbb{Z},\ m/(1+\epsilon)\le(1+\epsilon)^i\le\tilde{B}m\}$, where $\tilde{B}=\min\{n,b\}$ is the maximum number of elements that the $d$-knapsack can store. Then there is at least one $v\in Q$ such that $(1-\epsilon)\mathrm{OPT}\le v\le\mathrm{OPT}$.

Proof Without loss of generality, let the maximum density $m=\max_{1\le i\le d,1\le j\le n}\frac{f(u_j)}{c_{i,u_j}}$ be attained at some element $u_j$ and dimension $i$. Since $c_{i,u_j}\ge 1$, we have $\mathrm{OPT}\ge f(u_j)=m\,c_{i,u_j}\ge m$. On the other side, we have
$$\mathrm{OPT}=\sum_{i=1}^{|O|}\big(f(\{u_1,\ldots,u_i\})-f(\{u_1,\ldots,u_{i-1}\})\big)\le\sum_{i=1}^{|O|}f(u_i)\le m\sum_{i=1}^{|O|}c_{1,u_i}\le\tilde{B}m.$$

Let $v=(1+\epsilon)^{\lfloor\log_{1+\epsilon}\mathrm{OPT}\rfloor}$. We obtain $\frac{m}{1+\epsilon}\le(1-\epsilon)\mathrm{OPT}\le v\le\mathrm{OPT}\le\tilde{B}m$. □

By Lemma 1, we design the following algorithm requiring the max density as an input.

Theorem 5 Assuming that the input $m$ satisfies $m=\max_{1\le i\le d,1\le j\le n}f(u_j)/c_{i,u_j}$, Algorithm 5 satisfies the following properties, where the unconstrained submodular maximization is solved by the deterministic 1/2-approximation algorithm of [19].

● It outputs a set $S$ such that $Cx_S\le b$ and $f(S)\ge(\frac{1}{4(d+1)}-\epsilon)\mathrm{OPT}$.

● It makes 1 pass over the data set, stores at most $O(\tilde{B}\log\tilde{B}/\epsilon)$ elements, and has $O(\log\tilde{B}/\epsilon)$ query complexity per element.

Proof By Lemma 1, we can choose $v\in Q$ such that $(1-\epsilon)\mathrm{OPT}\le v\le\mathrm{OPT}$. Then by Theorem 4, the output set $S$ satisfies $f(S)\ge(\frac{1}{4(d+1)}-\epsilon)\mathrm{OPT}$.

Notice that there are $O(\log\tilde{B}/\epsilon)$ values in $Q$, and for each $v$ there are at most $\tilde{B}$ elements in each candidate set $S_l^v$. Thus, Algorithm 5 stores at most $O(\tilde{B}\log\tilde{B}/\epsilon)$ elements and has $O(\log\tilde{B}/\epsilon)$ query complexity per element. □

4.3 1-pass streaming algorithm for d-knapsack constraint

We modify the estimation candidate set into $Q=\{(1+\epsilon)^k\mid k\in\mathbb{Z},\ m/(1+\epsilon)\le(1+\epsilon)^k\le 2(d+1)bm\}$, where $m$ is now the current maximum marginal value per unit weight over all single elements. The streaming algorithm runs the threshold algorithms in parallel while updating $m$ and the candidate set $Q$. When a threshold $v$ is instantiated from $Q$, every element with marginal value at least $\frac{v}{4(d+1)}$ to $S^v$ appears on or after $v$ is instantiated. This yields the final 1-pass streaming algorithm and the following theorem, whose proof follows the same lines as that of Theorem 5.

Theorem 6 Algorithm 6 satisfies the following properties, where the unconstrained submodular maximization is solved by the deterministic 1/2-approximation algorithm of [19].

● It outputs a set $S$ such that $Cx_S\le b$ and $f(S)\ge(\frac{1}{4(d+1)}-\epsilon)\mathrm{OPT}$.

● It makes 1 pass over the data set, stores at most $O(\tilde{B}\log\tilde{B}/\epsilon)$ elements, and has $O(\log\tilde{B}/\epsilon)$ query complexity per element.

Proof When a threshold $v$ is instantiated from $Q$, every element with marginal value at least $\frac{v}{4(d+1)}$ to $S^v$ appears on or after $v$ is instantiated. The remaining proof is similar to Theorem 5. In the streaming setting, we consider $d$ to be much smaller than $b$ and treat it as a constant when accounting for memory. □

An immediate corollary is that Algorithm 6 achieves a $\frac{1}{8}-\epsilon$ approximation when $d=1$, i.e., under a single knapsack constraint.

Corollary 1 Let $\tilde{B}=\min(n,b)$. There exists a 1-pass streaming algorithm (Algorithm 6) that uses $O(\tilde{B}\log\tilde{B}/\epsilon)$ space and outputs a $(\frac{1}{8}-\epsilon)$-approximation for the submodular maximization problem under a knapsack constraint, with $O(\log\tilde{B}/\epsilon)$ queries per element.

5 Multi-pass streaming algorithm for knapsack

In this section, we develop a multi-pass streaming algorithm for the knapsack constraint using the same algorithmic framework. When $d=1$, the $d$-knapsack constraint reduces to the knapsack constraint, and as a special case of the $d$-knapsack we can apply additional techniques, such as multi-pass techniques, to improve performance. We first outline our offline algorithm, which serves as the foundation for the subsequent adaptation to a streaming algorithm. As in the previous section, we cannot use the enumeration technique to handle large elements due to insufficient memory. Our algorithm adapts the technique of the Greedy+Max algorithm [27] for monotone submodular maximization subject to a knapsack constraint. Moreover, we adapt and refine methods from [30] to handle the lack of monotonicity, which allows us to maintain two candidate sets.

We begin by assuming that the maximum value of the elements is known; we later remove this assumption by making one more pass over the data.

5.1 Offline repeat Greedy+Max algorithm

To begin with, we describe our adjustments to the Greedy+Max algorithm for non-monotone objective functions in the offline setting. The greedy part starts with an empty set $G$. In each iteration, it selects an item $u$ that has the highest marginal density and still fits into the knapsack; unlike the monotone case, we additionally require the marginal density of $u$ to be strictly positive. The Max part augments each partial solution constructed by Greedy with the item of highest marginal value (instead of density), and the best of these augmentations is chosen. For the needs of subsequent algorithms, we also output the greedy result $G$ itself. The implementation is given as Algorithm 7; a sketch of our reading follows.
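
The sketch below captures how we read this adjusted Greedy+Max (the paper's Algorithm 7); `c` is the cost function, the helpers `f` and `marginal` are from Section 2, and the structure is an assumption of ours rather than the authors' exact pseudocode. It returns both the best augmented solution and the greedy set $G$, as the text requires.

```python
def greedy_plus_max(ground, c, b):
    """Greedy by marginal density (required positive), plus a 'Max'
    augmentation of every partial greedy solution by the single item of
    highest marginal value that still fits; returns (best, G)."""
    G, best = [], []
    remaining = set(ground)
    while True:
        budget = b - sum(c(e) for e in G)
        fit = [u for u in remaining if c(u) <= budget]
        if fit:                             # augment G_{i-1} with max value
            a = max(fit, key=lambda u: marginal(u, G))
            if f(G + [a]) > f(best):
                best = G + [a]
        good = [u for u in fit if marginal(u, G) > 0]
        if not good:
            break
        u = max(good, key=lambda x: marginal(x, G) / c(x))
        G.append(u)                         # greedy step: best density
        remaining.remove(u)
    if f(G) > f(best):
        best = G
    return best, G
```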

Lemma 2 For each $i$, let $G_i$ be the set of the first $i$ elements selected in the construction of $G$. For the feasible set $A$ returned by Algorithm 7 and any feasible set $Y$ with $c(Y)\le b$, there exists a partial greedy solution $G_{i-1}$ satisfying $f(A)\ge\frac{1}{2}f(G_{i-1}\cup Y)$.

Proof First, we fix some notation for the proof. Let $u_i$ be the $i$-th element selected into $G$. Let $a_i$ be the element with the maximum marginal value with respect to $G_{i-1}$, and let $A_i$ be the augmented solution generated from $G_{i-1}$. Let $y_1$ be the item with the largest cost in $Y$. We consider the last item added by the greedy procedure before its cost exceeds $b-c(y_1)$.

If the greedy algorithm stops before the cost reaches this point, then there exists a $j$ such that $f(G_j\cup Y)\le f(G_j)+\sum_{e\in Y\setminus G_j}f(e\mid G_j)\le f(G_j)$, since every element of $Y\setminus G_j$ still fits but was not selected, so its marginal value is non-positive.

Otherwise, we define $c'\in[0,\,b-c(y_1)]$ so that $b-c(y_1)-c'$ is the cost of the greedy solution before this item is taken. Let $G_{i-1}$ be the greedy solution at this point, so that $c(G_{i-1})\le b-c(y_1)<c(G_i)$. Then

$$f(G_{i-1}\cup Y)=f\big((G_{i-1}\cup y_1)\cup(Y\setminus y_1)\big)=f(G_{i-1}\cup y_1)+f\big(Y\setminus(y_1\cup G_{i-1})\mid G_{i-1}\cup y_1\big)$$
$$\le f(G_{i-1}\cup a_i)+\sum_{e\in Y\setminus(y_1\cup G_{i-1})}f(e\mid G_{i-1}\cup y_1)=f(A_i)+\sum_{e\in Y\setminus(y_1\cup G_{i-1})}c(e)\,\frac{f(e\mid G_{i-1}\cup y_1)}{c(e)},$$

where the inequality is by submodularity and the choice of $a_i$, and the last equality is by the definition of marginal density. All items in $Y\setminus(y_1\cup G_{i-1})$ still fit together with $G_{i-1}$, since $y_1$ is the largest-cost item in $Y$. Since the greedy algorithm always selects the item with the largest marginal density,

$$\max_{e\in Y\setminus(y_1\cup G_{i-1})}\frac{f(e\mid G_{i-1}\cup y_1)}{c(e)}\le\max_{e\in Y\setminus(y_1\cup G_{i-1})}\frac{f(e\mid G_{i-1})}{c(e)}\le\frac{f(u_i\mid G_{i-1})}{c(u_i)}.$$

Hence,

$$f(G_{i-1}\cup Y)\le f(A_i)+\frac{f(u_i\mid G_{i-1})}{c(u_i)}\sum_{e\in Y\setminus(y_1\cup G_{i-1})}c(e).$$

If $f(A_i)\ge\frac{1}{2}f(G_{i-1}\cup Y)$, then we are done, because $f(A_i)$ is a lower bound on the value of the augmented solution when the cost of the greedy part is $b-c(y_1)-c'$. Otherwise,

$$\frac{1}{2}f(G_{i-1}\cup Y)\le\frac{f(u_i\mid G_{i-1})}{c(u_i)}\sum_{e\in Y\setminus(y_1\cup G_{i-1})}c(e)\le\frac{f(u_i\mid G_{i-1})}{c(u_i)}\big(b-c(y_1)\big)\le\sum_{j=1}^{i}f(u_j\mid G_{j-1})\le f(G_i),$$

where the second inequality follows from the fact that $y_1\in Y$, and the third inequality follows from the greedy selection rule of the algorithm. □

Due to the lack of monotonicity, the Greedy+Max algorithm by itself cannot guarantee any constant approximation ratio. To address this, we use a technique from [30]. The resulting algorithm is depicted as Algorithm 8. It first invokes the Greedy+Max algorithm to obtain a solution $S_1$. Then it runs the Greedy+Max algorithm again over the remaining elements $N\setminus G$ to obtain another solution $S_2$. Note that, unlike in the basic offline framework, the first phase may output an augmented solution, so it is the final greedy solution $G$ (rather than $S_1$) that we remove from the ground set. These two solutions still cannot guarantee any constant approximation ratio. To remedy this, the algorithm produces a third solution $S_3$ by solving the unconstrained submodular maximization problem over $G$.

Our algorithm finally returns the best solution among $S_1$, $S_2$, and $S_3$; a short sketch follows below. We will show that this achieves a $1/6$ approximation ratio.
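
Putting the pieces together, our reading of Algorithm 8 is the short driver below; `unconstrained_max` again stands for the deterministic 1/2-approximation of [19], and `greedy_plus_max` is the sketch above.

```python
def repeat_greedy_plus_max(ground, c, b, unconstrained_max):
    """Offline 1/6-approximation sketch: Greedy+Max, Greedy+Max on N \\ G,
    and unconstrained maximization over the first greedy set G."""
    S1, G = greedy_plus_max(ground, c, b)
    S2, _ = greedy_plus_max(set(ground) - set(G), c, b)
    S3 = unconstrained_max(G)
    return max((S1, S2, S3), key=f)
```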

Theorem 7 Algorithm 8 achieves a $1/6$ approximation ratio and uses $O(n^2)$ queries.

Proof Let $O$ be an optimal set with $f(O)=\mathrm{OPT}$, let $G$ be the first greedy solution output by the offline Greedy+Max phase, and let $G_i$ be the partial greedy solution consisting of the first $i$ elements of $G$. For convenience, denote the greedy result of the second Greedy+Max phase by $T$ and its partial greedy solutions by $T_j$. By Lemma 2, there exist $i,j$ such that

$$f(S_1)\ge\frac{1}{2}f(G_{i-1}\cup O),\qquad f(S_2)\ge\frac{1}{2}f(T_{j-1}\cup(O\setminus G)).$$

If $f(G\cap O)\ge\delta f(O)$, then by [19] we have $f(S_3)\ge\frac{1}{2}f(G\cap O)\ge\frac{\delta}{2}f(O)$. If $f(G\cap O)\le\delta f(O)$, then

$$f(S_1)\ge\frac{1}{2}\big(f(G_{i-1}\cup O)+f(G\cap O)-\delta f(O)\big).$$

Thus, we have that

$$\max\{f(S_1),f(S_2)\}\ge\frac{1}{2}\big(f(S_1)+f(S_2)\big)\ge\frac{1}{4}f(G_{i-1}\cup O)+\frac{1}{4}f(G\cap O)+\frac{1}{4}f(T_{j-1}\cup(O\setminus G))-\frac{\delta}{4}f(O)$$
$$\ge\frac{1}{4}f(G_{i-1}\cup T_{j-1}\cup O)+\frac{1}{4}f(O\setminus G)+\frac{1}{4}f(G\cap O)-\frac{\delta}{4}f(O)$$
$$\ge\frac{1}{4}f(G_{i-1}\cup T_{j-1}\cup O)+\frac{1}{4}f(O)-\frac{\delta}{4}f(O)\ge\frac{1-\delta}{4}f(O).$$

The third and fourth inequalities hold by submodularity, and the last by non-negativity. Therefore, the approximation ratio of the returned set is at least $\min\{\delta/2,(1-\delta)/4\}$. Setting $\delta=1/3$ yields an approximation ratio of at least $1/6$.

Finally, since the algorithm from [19] uses $O(|G|^2)$ queries and Algorithm 7 makes $O(n^2)$ queries, Algorithm 8 makes $O(n^2)$ queries in total. □

5.2 Multi-pass streaming algorithm repeat Greedy+Max

We now introduce the multi-pass streaming algorithm, presented as Algorithm 9; a sketch of our reading follows below. We first describe it under the assumption that a parameter $m$, the maximum value of a single element, is given; this assumption can be removed with one additional pass before the algorithm. The algorithm uses a multi-pass thresholding technique to simulate the greedy process, and then uses one more pass to obtain the augmented solution for each partial greedy solution. The final output is the best among all augmented solutions. For the needs of subsequent algorithms, we also output the greedy result $G$. As discussed in the description of our techniques, this uses $O(\log b/\epsilon)$ passes over the data to approximately simulate the execution of the offline Greedy+Max.
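
The sketch below shows one way to realize this phase, under our reading of the text; `stream_pass()` replays the data stream once per call, and we start the threshold near $m$ (with costs at least 1, single-element densities are at most $m$). This is an assumption-laden sketch, not the authors' pseudocode: Algorithm 9 is authoritative, and for brevity our augmentation pass re-evaluates every greedy prefix per element, which is query-heavier than needed.

```python
def multipass_greedy_plus_max(stream_pass, c, b, m, eps):
    """Simulate greedy with a geometrically decreasing density threshold
    (O(log b / eps) passes), then augment every greedy prefix G_i with
    one max-value item in a final pass; returns (best, G). Assumes m > 0."""
    G, theta = [], m
    while theta >= m / b:                   # one pass per threshold level
        for u in stream_pass():
            if u in G:
                continue
            spent = sum(c(e) for e in G)
            if c(u) + spent <= b and marginal(u, G) >= theta * c(u):
                G.append(u)
        theta /= (1 + eps)
    best = list(G)                          # the greedy set itself counts
    for u in stream_pass():                 # augmentation pass
        for i in range(len(G) + 1):
            Gi = G[:i]
            if u not in Gi and c(u) + sum(c(e) for e in Gi) <= b:
                cand = Gi + [u]
                if f(cand) > f(best):
                    best = cand
    return best, G
```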

Lemma 3 For the feasible set $A$ returned by Algorithm 9 and any feasible set $Y$ with $c(Y)\le b$, there is a partial greedy solution $G_{i-1}$ (defined in line 9) satisfying $f(A)\ge(\frac{1}{2}-\epsilon)f(G_{i-1}\cup Y)$.

Proof As in the proof of Lemma 2, we first fix some notation. Let $u_i$ be the $i$-th element selected. Let $a_i$ be the element with the maximum marginal value with respect to $G_{i-1}$, and let $A_i=G_{i-1}\cup a_i$ be the augmented solution generated from $G_{i-1}$. Let $O$ be an optimal set with $f(O)=\mathrm{OPT}$, and let $y_1$ be the item of largest cost in $Y$. Consider the last item added by the greedy solution in the thresholding stage before the cost of this solution exceeds $b-c(y_1)$.

If the thresholding stage stops before the cost reaches this point, then by the thresholding rule there exists a $j$ such that

$$f(G_j\cup Y)\le f(G_j)+\sum_{e\in Y\setminus G_j}f(e\mid G_j)\le f(G_j)+\frac{(1+\epsilon)m}{b}\,c(Y\setminus G_j)\le f(G_j)+(1+\epsilon)m,$$

where the first inequality is by submodularity and the second is by the value of the threshold at termination. Thus, we have $f(G_j\cup Y)\le f(G_j)+(1+\epsilon)f(A_1)$ and $f(A)\ge(\frac{1}{2}-\epsilon)f(G_j\cup Y)$.

Otherwise, we define $c'$ so that $b-c(y_1)-c'$ is the cost of the greedy solution before this item is taken. Let $G_{i-1}$ be the greedy solution at this point, so that $c(G_{i-1})\le b-c(y_1)<c(G_i)$. Then

$$f(G_{i-1}\cup Y)=f\big((G_{i-1}\cup y_1)\cup(Y\setminus y_1)\big)=f(G_{i-1}\cup y_1)+f\big(Y\setminus(y_1\cup G_{i-1})\mid G_{i-1}\cup y_1\big)$$
$$\le f(G_{i-1}\cup a_i)+\sum_{e\in Y\setminus(y_1\cup G_{i-1})}f(e\mid G_{i-1}\cup y_1)=f(A_i)+\sum_{e\in Y\setminus(y_1\cup G_{i-1})}c(e)\,\frac{f(e\mid G_{i-1}\cup y_1)}{c(e)},$$

where the inequality is by submodularity and the last equality is by the definition of marginal density. All items in $Y\setminus(y_1\cup G_{i-1})$ still fit, since $y_1$ is the largest-cost item in $Y$. Across all passes, the thresholding algorithm always selects an element whose marginal density is within a $\frac{1}{1+\epsilon}$ factor of the highest possible marginal density, so

$$\max_{e\in Y\setminus(y_1\cup G_{i-1})}\frac{f(e\mid G_{i-1}\cup y_1)}{c(e)}\le\max_{e\in Y\setminus(y_1\cup G_{i-1})}\frac{f(e\mid G_{i-1})}{c(e)}\le(1+\epsilon)\frac{f(u_i\mid G_{i-1})}{c(u_i)}.$$

Hence,

$$f(G_{i-1}\cup Y)\le f(A_i)+(1+\epsilon)\frac{f(u_i\mid G_{i-1})}{c(u_i)}\sum_{e\in Y\setminus(y_1\cup G_{i-1})}c(e).$$

If $f(A_i)\ge\frac{1}{2}f(G_{i-1}\cup Y)$, then we have $f(A)\ge f(A_i)\ge\frac{1}{2}f(G_{i-1}\cup Y)$. Otherwise,

$$\frac{1}{2}f(G_{i-1}\cup Y)\le(1+\epsilon)\frac{f(u_i\mid G_{i-1})}{c(u_i)}\sum_{e\in Y\setminus(y_1\cup G_{i-1})}c(e)\le(1+\epsilon)\frac{f(u_i\mid G_{i-1})}{c(u_i)}\big(b-c(y_1)\big)\le(1+\epsilon)^2\sum_{j=1}^{i}f(u_j\mid G_{j-1})\le(1+\epsilon)^2 f(G_i),$$

where the second inequality follows from the fact that $y_1\in Y$ and the third from the greedy procedure in the algorithm. Since $\frac{1}{2(1+\epsilon)^2}\ge\frac{1}{2}-\epsilon$ for every $\epsilon>0$, we finish the proof of the lemma with $f(A)\ge f(G_i)\ge(\frac{1}{2}-\epsilon)f(G_{i-1}\cup Y)$. □

We now present the complete multi-pass streaming algorithm. First, we run one pass to find the maximum value of the elements, $m=\max_{e\in N}f(e)$. Then we apply the streaming algorithm framework for non-monotone submodular maximization, again removing the final greedy solution $G$ from the ground set before the second phase. The resulting algorithm is depicted as Algorithm 10. It finally returns the best solution among $S_1$, $S_2$, and $S_3$. We will show that this achieves a $1/6-\epsilon$ approximation ratio using $O(\log b/\epsilon)$ passes.

Theorem 8 For any fixed $\epsilon>0$, Algorithm 10 achieves a $1/6-\epsilon$ approximation ratio, uses $O(\log b/\epsilon)$ passes, stores at most $2\tilde{B}$ elements, and makes at most $O(\log b/\epsilon)$ queries per element.

Proof Let $O\in\arg\max\{f(S)\mid c(S)\le b\}$. Let $G$ be the first greedy solution output by the first Greedy+Max phase, and let $G_i$ be the partial greedy solution consisting of the first $i$ elements of $G$. For convenience, denote the greedy result of the second Greedy+Max phase by $T$ and its partial greedy solutions by $T_j$. By Lemma 3, there exist $i,j$ such that

$$f(S_1)\ge\Big(\frac{1}{2}-\epsilon\Big)f(G_{i-1}\cup O),\qquad f(S_2)\ge\Big(\frac{1}{2}-\epsilon\Big)f(T_{j-1}\cup(O\setminus G)).$$

If $f(G\cap O)\ge\delta f(O)$, then $f(S_3)\ge\frac{1}{2}f(G\cap O)\ge\frac{\delta}{2}f(O)$ by [19]. If $f(G\cap O)\le\delta f(O)$, then

$$f(S_1)\ge\Big(\frac{1}{2}-\epsilon\Big)\big(f(G_{i-1}\cup O)+f(G\cap O)-\delta f(O)\big).$$

Thus, we have that

$$\max\{f(S_1),f(S_2)\}\ge\frac{1}{2}\big(f(S_1)+f(S_2)\big)\ge\frac{1-\epsilon}{4}f(G_{i-1}\cup O)+\frac{1-\epsilon}{4}f(G\cap O)+\frac{1-\epsilon}{4}f(T_{j-1}\cup(O\setminus G))-\frac{(1-\epsilon)\delta}{4}f(O)$$
$$\ge\frac{1-\epsilon}{4}f(G_{i-1}\cup T_{j-1}\cup O)+\frac{1-\epsilon}{4}f(O\setminus G)+\frac{1-\epsilon}{4}f(G\cap O)-\frac{(1-\epsilon)\delta}{4}f(O)$$
$$\ge\frac{1-\epsilon}{4}\big(f(G_{i-1}\cup T_{j-1}\cup O)+f(O)-\delta f(O)\big)\ge\frac{(1-\epsilon)(1-\delta)}{4}f(O).$$

The third and fourth inequalities hold by submodularity, and the last by non-negativity. Therefore, the approximation ratio of the returned set is at least $\min\{\delta/2,(1-\epsilon)(1-\delta)/4\}$. Setting $\delta=(1-\epsilon)/(3-\epsilon)$, the approximation ratio is at least $(1-\epsilon)/6$.

Finally, since Algorithm 9 makes $O(\log b/\epsilon)$ passes, Algorithm 10 also makes $O(\log b/\epsilon)$ passes in total. At most $O(\tilde{B})$ elements are stored by the multi-pass streaming algorithm, and it makes at most $O(\log b/\epsilon)$ queries per element. □

6 Conclusions and future work

We have created a streaming algorithm framework for submodular maximization with a general (possibly non-monotone) objective function. This framework enables us to propose 1-pass streaming algorithms that provide improved approximation ratios for non-monotone submodular maximization under a d-knapsack constraint or a knapsack constraint. Additionally, we have developed a multi-pass streaming algorithm for knapsack constraints. All of these algorithms are efficient and deterministic.

For the $d$-knapsack constraint, the next question we are most interested in is whether the approximation ratio of the streaming algorithm can be further improved. The current analysis of the $d$-knapsack costs is still somewhat coarse, and we believe that a more detailed case analysis can yield better results. We also believe that if the costs across dimensions can be balanced, the approximation ratio for this problem can be improved by a multi-pass streaming algorithm.

As for the single knapsack constraint, it remains open how to improve the approximation ratio of 1-pass streaming algorithms. Another future research direction is whether maintaining more candidate feasible solutions improves the approximation ratio. In principle, the knapsack constraint could admit the same approximation ratio as the cardinality constraint, but this has not been achieved even in the monotone case.

When the objective function is non-monotone, though our algorithms improve the best-known deterministic algorithms, their approximation ratios are still worse than the best randomized algorithms. It is very interesting to fill these gaps.

References

[1] Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 137−146

[2] Das A, Kempe D. Algorithms for subset selection in linear regression. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing. 2008, 45−54

[3] Khanna R, Elenberg E R, Dimakis A G, Negahban S N, Ghosh J. Scalable greedy feature selection via weak submodularity. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. 2017, 1560−1568

[4] Iyer R, Bilmes J. Algorithms for approximate minimization of the difference between submodular functions, with applications. In: Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence. 2012, 407−417

[5] Iyer R, Bilmes J. Submodular optimization with submodular cover and submodular knapsack constraints. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013, 2436−2444

[6] Jin L B, Zhang L, Zhao L. Max-difference maximization criterion: a feature selection method for text categorization. Frontiers of Computer Science, 2023, 17(1): 171337

[7] Liang J, Zhang Y, Chen K, Qu B, Yu K, Yue C, Suganthan P N. An evolutionary multiobjective method based on dominance and decomposition for feature selection in classification. Science China Information Sciences, 2024, 67(2): 120101

[8] Krause A, Guestrin C. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology, 2011, 2(4): 32

[9] Nemhauser G L, Wolsey L A. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 1978, 3(3): 177−188

[10] Feige U. A threshold of ln n for approximating set cover. Journal of the ACM, 1998, 45(4): 634−652

[11] Nemhauser G L, Wolsey L A, Fisher M L. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 1978, 14(1): 265−294

[12] Khuller S, Moss A, Naor J. The budgeted maximum coverage problem. Information Processing Letters, 1999, 70(1): 39−45

[13] Sviridenko M. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 2004, 32(1): 41−43

[14] Vondrák J. Optimal approximation for the submodular welfare problem in the value oracle model. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing. 2008, 67−74

[15] Kulik A, Shachnai H, Tamir T. Maximizing submodular set functions subject to multiple linear constraints. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms. 2009, 545−554

[16] Feldman M, Naor J, Schwartz R. A unified continuous greedy algorithm for submodular maximization. In: Proceedings of the 52nd IEEE Annual Symposium on Foundations of Computer Science. 2011, 570−579

[17] Buchbinder N, Feldman M. Deterministic algorithm and faster algorithm for submodular maximization subject to a matroid constraint. In: Proceedings of the 65th IEEE Annual Symposium on Foundations of Computer Science. 2024

[18] Buchbinder N, Feldman M. Constrained submodular maximization via new bounds for DR-submodular functions. In: Proceedings of the 56th Annual ACM Symposium on Theory of Computing. 2024, 1820−1831

[19] Buchbinder N, Feldman M. Deterministic algorithms for submodular maximization problems. ACM Transactions on Algorithms, 2018, 14(3): 32

[20] Sun X, Zhang J, Zhang S, Zhang Z. Improved deterministic algorithms for non-monotone submodular maximization. In: Proceedings of Computing and Combinatorics: the 28th International Conference. 2022, 496−507

[21] Badanidiyuru A, Mirzasoleiman B, Karbasi A, Krause A. Streaming submodular maximization: massive data summarization on the fly. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, 671−680

[22] Feldman M, Norouzi-Fard A, Svensson O, Zenklusen R. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. 2020, 1363−1374

[23] Alaluf N, Ene A, Feldman M, Nguyen H L, Suh A. An optimal streaming algorithm for submodular maximization with a cardinality constraint. Mathematics of Operations Research, 2022, 47(4): 2667−2690

[24] Huang C C, Kakimura N, Yoshida Y. Streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica, 2020, 82(4): 1006−1032

[25] Huang C C, Kakimura N. Improved streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica, 2021, 83(3): 879−902

[26] Huang C C, Kakimura N. Multi-pass streaming algorithms for monotone submodular function maximization. Theory of Computing Systems, 2022, 66(1): 354−394

[27] Yaroslavtsev G, Zhou S, Avdiukhin D. “Bring your own greedy”+Max: near-optimal 1/2-approximations for submodular knapsack. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. 2020, 3263−3274

[28] Kumar R, Moseley B, Vassilvitskii S, Vattani A. Fast greedy algorithms in MapReduce and streaming. ACM Transactions on Parallel Computing, 2015, 2(3): 14

[29] Yu Q, Xu E L, Cui S. Submodular maximization with multi-knapsack constraints and its applications in scientific literature recommendations. In: Proceedings of 2016 IEEE Global Conference on Signal and Information Processing. 2016, 1295−1299

[30] Gupta A, Roth A, Schoenebeck G, Talwar K. Constrained non-monotone submodular maximization: offline and secretary algorithms. In: Proceedings of the 6th International Workshop on Internet and Network Economics. 2010, 246−257

[31] Lee J, Mirrokni V S, Nagarajan V, Sviridenko M. Maximizing nonmonotone submodular functions under matroid or knapsack constraints. SIAM Journal on Discrete Mathematics, 2010, 23(4): 2053−2078

[32] Fadaei S, Fazli M A, Safari M A. Maximizing non-monotone submodular set functions subject to different constraints: combined algorithms. Operations Research Letters, 2011, 39(6): 447−451

[33] Buchbinder N, Feldman M. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research, 2019, 44(3): 988−1005

[34] Sun X, Zhang J, Zhang S, Zhang Z. Improved deterministic algorithms for non-monotone submodular maximization. Theoretical Computer Science, 2024, 984: 114293
