Correlation Robust Stochastic Optimization
We consider a robust model proposed by Scarf (1958) for stochastic optimization when only the marginal probabilities of (binary) random variables are given and the correlation between the random variables is unknown. In the robust model, the objective is to minimize the expected cost against the worst possible joint distribution with those marginals. We introduce the concept of correlation gap to compare this model to the stochastic optimization model that ignores correlations and minimizes the expected cost under an independent Bernoulli distribution. We identify a class of functions, using concepts of summable cost-sharing schemes from game theory, for which the correlation gap is well bounded and the robust model can be approximated closely by the independent distribution model. As a result, we derive efficient approximation factors for many popular cost functions, such as submodular functions, facility location, and Steiner tree. As a byproduct, our analysis also yields some new results in the areas of social welfare maximization and existence of Walrasian equilibria, which may be of independent interest.
Stochastic optimization models decision making under uncertain or unknown problem data. We consider stochastic optimization problems in which the uncertain variable is the “demand” set. For example, in stochastic network design problems, the random variable is the subset of source-destination pairs to be connected; in the stochastic facility location problem, the random variable is the subset of potential clients that will have a demand; and in the stochastic set cover problem, it is the subset of elements that need to be covered. In general, such a stochastic program can be expressed as
$$\min_{x \in C} \; \mathbb{E}_{S \sim \mathcal{D}}\left[ h(x, S) \right], \tag{1}$$
where $x$ is the decision variable, which lies in a constrained set $C$, and the random subset $S \subseteq V$ cannot be observed before the decision $x$ is made. $h(x, S)$ is the cost function, which depends on both the decision $x$ and the outcome $S$. The objective of stochastic programming is to minimize the expected cost, which depends on the joint distribution $\mathcal{D}$ of items in $S$.
In stochastic optimization, it is typically assumed that the distribution of the random variable is either known or can be sampled from [1, 3, 14]. In this model, sample average approximation (SAA) has been used to give approximation algorithms for many two-stage stochastic discrete optimization problems, including stochastic set cover, uncapacitated facility location, and the Steiner tree problem. Those models are suitable when one has access to a large amount of reliable, time-invariant statistical information. In this paper, we study the problem when only part of the distribution (the marginals) is known. When only the marginal probability of each element is available, a common heuristic is to assume that the distribution of the random set is a product distribution. In other words, each element is assumed to appear in the random set independently with its given marginal probability. For example, see [8, 9]. However, conventional wisdom holds that ignoring correlations can have catastrophic consequences. Examples can be constructed in which the solution optimized against the independent distribution performs very poorly once certain correlations are introduced.
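To address such problems, Scarf (1958) proposed a correlation-robust or distributionally-robust stochastic model, which minimizes the expected cost over distributions having a fixed marginal probability $p_i$ for each $i \in V$, but with any possible correlations. For a problem instance $(h, V, \{p_i\})$, we wish to find
$$\min_{x \in C} \; g(x), \tag{2}$$
where $g(x)$ is the expected cost under the worst-case distribution when decision $x$ has been made, given by
$$g(x) = \max_{\mathcal{D}} \left\{ \mathbb{E}_{S \sim \mathcal{D}}\left[ h(x, S) \right] \;:\; \Pr_{S \sim \mathcal{D}}[\, i \in S \,] = p_i \ \text{ for all } i \in V \right\}. \tag{3}$$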
We believe this is a very useful model because it takes advantage of the stochasticity of the input and, at the same time, efficiently utilizes the available information. On the other hand, it defines an exponential-size linear program, which makes the problem potentially difficult to solve. A common strategy for such linear programs is to solve the corresponding dual LP, which has an exponential number of constraints, using a separating hyperplane approach. However, for the above model, approximating the separating hyperplane problem can be shown to be harder than the max-cut problem even for the special case when the function $h(x, S)$ is submodular in $S$.
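For intuition about the robust model, the smallest nontrivial case can be solved by hand. The sketch below is an illustration only, not the paper's algorithm: with two items, every joint law with the given marginals is determined by the single number t = P(both items appear), the expected cost is linear in t, and so the worst case sits at an endpoint of the feasible interval for t. The function names and the toy cost function `cover` are hypothetical.

```python
def worst_case_two_items(f, p1, p2):
    """Worst-case expected cost over joint laws of two binary items.

    f maps a frozenset of item labels (1 and 2) to a cost. Each joint law with
    marginals p1, p2 is fixed by t = P(both appear); the expected cost is linear
    in t, so it is maximized at an endpoint of the feasible interval.
    """
    lo = max(0.0, p1 + p2 - 1.0)   # smallest feasible P(both)
    hi = min(p1, p2)               # largest feasible P(both)

    def expected(t):
        return (
            (1.0 - p1 - p2 + t) * f(frozenset())
            + (p1 - t) * f(frozenset({1}))
            + (p2 - t) * f(frozenset({2}))
            + t * f(frozenset({1, 2}))
        )

    return max(expected(lo), expected(hi))


def independent_two_items(f, p1, p2):
    """Expected cost when the two items appear independently."""
    return (
        (1 - p1) * (1 - p2) * f(frozenset())
        + p1 * (1 - p2) * f(frozenset({1}))
        + (1 - p1) * p2 * f(frozenset({2}))
        + p1 * p2 * f(frozenset({1, 2}))
    )


# Toy cost (an assumption for illustration): pay 1 if any item shows up.
cover = lambda s: 1.0 if s else 0.0
robust = worst_case_two_items(cover, 0.5, 0.5)
indep = independent_two_items(cover, 0.5, 0.5)
```

For larger ground sets the number of joint laws grows exponentially, which is the difficulty described above; this two-item shortcut does not generalize directly.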
A natural question is how much risk is involved in simply ignoring the correlations and minimizing the expected cost under the independent distribution instead of the worst case distribution, or, in other words, how well the stochastic optimization model with independent distribution approximates the correlation robust model. The focus of this paper is to study this correlation gap. For a particular problem instance and a decision $x$, we define the correlation gap as the ratio between the expected cost under the worst case distribution and that under the independent distribution. Correlation gap has many interesting implications for stochastic optimization problems. A small upper bound on the correlation gap allows relaxation of the stochastic optimization problem under any distribution, including the worst case distribution model (2), to the product distribution case, which is often more efficient to solve either by sampling or by other algorithmic techniques [8, 9]. Further, in many real data collection scenarios, practical constraints can make it very difficult (or costly) to learn the complete information about correlations in data. In those cases, the correlation gap can provide a guideline for deciding how important it is to spend resources on learning these correlations. In other words, it measures the “value of correlations” in the statistical data. Our main result is to characterize a wide class of functions for which the correlation gap can be well bounded. We also provide counter-examples showing large correlation gap for various other classes of functions.
Below, we summarize our key results:
A class of functions with bounded correlation gap: For functions that are non-decreasing in $S$ and have a cross-monotone, $\beta$-budget balanced, (weak) $\eta$-summable cost-sharing scheme, we show that the correlation gap is upper bounded by $\eta\beta\frac{e}{e-1}$. This gives correlation gap bounds (and matching approximation factors for the robust model) of $\frac{e}{e-1}$ for submodular functions, $O(\log n)$ for facility location, and $O(\log^2 n)$ for Steiner forest, where $n = |V|$ is the size of the ground set.
Hardness results: We show examples with correlation gap of $\Omega(2^n)$ for functions supermodular in $S$, $\Omega(\sqrt{n}\,\log\log n / \log n)$ for monotone subadditive functions in $S$, and $\frac{e}{e-1}$ for submodular functions. These examples also prove corresponding lower bounds on the approximation factors that can be achieved by substituting the independent distribution for the robust model.
Polynomial-time algorithm for supermodular functions: We analytically characterize the worst case distribution when the function $h(x, S)$ is supermodular in $S$, and consequently give a polynomial-time algorithm for the correlation robust model provided $h(x, S)$ is convex in $x$.
New results for welfare maximization problems: As a byproduct, our result provides an approximation algorithm for the well-studied problem of social welfare maximization in combinatorial auctions, when the utility functions are identical and admit an $(\eta, \beta)$-cost-sharing scheme. Notably, for identical submodular utility functions, the resulting factor matches the best approximation factor (Vondrak, 2008) for this case.
We also provide a simple counterexample to the conjecture by Bikhchandani that markets in which buyers have identical submodular utilities admit a Walrasian price equilibrium.
The rest of the paper is organized as follows. To begin, Section 2 will provide a mathematical definition of correlation gap, and examples showing large correlation gap for certain classes of cost functions. In Section 3, we present our main technical theorem that upper bounds the correlation gap for a wide class of cost functions, and discuss its implications on various stochastic optimization problems and the welfare maximization problem. The proof of this theorem is presented in Section 4. Finally, in Section 5, we end with a direct solution of correlation robust model for supermodular functions.
2 Correlation Gap
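For a problem instance $(h, V, \{p_i\})$ and a given decision $x$, we define the correlation gap $\kappa(x)$ as the ratio between the expected cost under the worst case distribution and that under the independent distribution, i.e.,
$$\kappa(x) = \frac{\mathbb{E}_{S \sim \mathcal{D}^R(x)}\left[ h(x, S) \right]}{\mathbb{E}_{S \sim \mathcal{D}^I}\left[ h(x, S) \right]},$$
where $\mathcal{D}^I$ is the independent Bernoulli distribution (also called product distribution) with marginals $\{p_i\}$, and $\mathcal{D}^R(x)$ is the worst-case distribution (as given by (3)).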
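Suppose that for some particular cost function $h$, the correlation gap can be bounded above by $\bar{\kappa}$ for all $x \in C$. Then it is not difficult to show that the decision obtained assuming the independent distribution gives a $\bar{\kappa}$-approximate solution to the corresponding robust optimization problem. More precisely, let $x_I$ be the optimal solution to the stochastic optimization problem (1) with the independent Bernoulli distribution $\mathcal{D}^I$, and let $x_R$ be the optimal solution to the correlation robust problem (2), whose objective $g(x)$ is the worst-case expected cost (3). Then,
$$\mathbb{E}_{S \sim \mathcal{D}^I}\left[ h(x_I, S) \right] \le \mathbb{E}_{S \sim \mathcal{D}^I}\left[ h(x_R, S) \right] \le g(x_R).$$
Using the bound on the correlation gap at $x_I$, this implies
$$g(x_I) \le \bar{\kappa}\, \mathbb{E}_{S \sim \mathcal{D}^I}\left[ h(x_I, S) \right] \le \bar{\kappa}\, g(x_R).$$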
Unfortunately, for general cost functions, the correlation gap and hence the corresponding approximation factor can be large in order of , as demonstrated by the following examples.
Example 1. (Minimum cost flow: correlation gap for supermodular functions)
(Sketch) Consider a two-stage minimum cost flow problem as in Figure 1. There is a single source , and sinks . Each sink has a probability to request a demand, and then a unit flow has to be sent from to . Each arc has a fixed capacity , but the capacity of arc needs to be purchased at a cost in the first stage, and a higher cost in the second stage after the set of demand requests is revealed. , are given as
Given the first stage decision , the cost of edges that need to be bought in the second stage to serve a set of requests is given by: . It is easy to check that is supermodular in for any given , i.e. for any . The objective is to minimize the total expected cost . If the decision maker assumes independent demands from the sinks, then minimizes the expected cost, and the expected cost is ; however, for the worst case distribution the expected cost of this decision will be (when and all other scenarios have zero probability). Hence, the correlation gap at is exponentially high. A risk-averse strategy is to use the robust solution , which leads to a cost . Thus, approximation ratio . ∎
Example 2. (Stochastic set cover: correlation gap for subadditive functions)
(Sketch) Consider a set cover problem with elements . Each item has a marginal probability of to appear in the random set . The covering sets are defined as follows. Consider a partition of into sets each containing elements. The covering sets are all the sets in the Cartesian product . Each set has unit cost. Then, the cost of covering a set is given by the subadditive function
The worst case distribution with marginal probabilities is one where probabilities for , , and otherwise. The expected value of under this distribution is . For independent distribution, , where are independent -binomially distributed random variables.
As approaches , since the expected value of remains fixed at , the Binomial(, ) distribution approaches the Poisson distribution with expected value . Using known results on maxima of independent Poisson random variables, it can be shown that for large , the expected value of the maximum of i.i.d. Poisson random variables is bounded by (refer to Appendix A for a detailed proof). This implies that is bounded by for large . So the correlation gap is at least .
To obtain an approximation lower bound for a two-stage stochastic set cover instance, extend the above instance as follows. For ease of notation, let , where is a constant such that . Let the first stage cost of a covering set be for some small , and the second stage cost be . For a given first stage cover , let be the set of elements covered by , then . Using the above analysis for function , the optimal solution for the independent distribution will be to buy no (or very few) sets in the first stage, giving for the independent distribution, but cost for the worst case distribution. On the other hand, the optimal robust solution considering the worst case distribution is to cover all the elements in the first stage, giving cost in the worst case. Thus, approximation ratio .
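The single-stage gap in Example 2 can be checked numerically for small sizes. The sketch below rests on a reading of the example that should be treated as an assumption: the ground set is split into m blocks of m elements, each element has marginal probability 1/m, the cost of a set is the largest number of its elements in any one block, and the worst case law picks one whole block with probability 1/m each (expected cost m). Under independence the cost is the maximum of m i.i.d. Binomial(m, 1/m) variables, whose expectation is computed exactly here. Function names are hypothetical.

```python
from math import comb


def expected_max_binomial(m):
    """Exact E[max of m i.i.d. Binomial(m, 1/m) variables]."""
    q = 1.0 / m
    pmf = [comb(m, j) * q**j * (1 - q) ** (m - j) for j in range(m + 1)]
    expected, cdf = 0.0, 0.0
    for t in range(1, m + 1):
        cdf += pmf[t - 1]            # P(X <= t - 1) for a single block
        expected += 1.0 - cdf**m     # P(max over blocks >= t)
    return expected


def example2_gap(m):
    """Worst-case cost m divided by the independent-case expected cost."""
    worst = float(m)   # one full block of m elements appears, each block w.p. 1/m
    return worst / expected_max_binomial(m)


gap_small = example2_gap(4)    # ground set of 16 elements
gap_large = example2_gap(16)   # ground set of 256 elements
```

The ratio grows with m, in line with the growth described in the example; the code does not attempt to verify the asymptotic rate.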
These examples indicate that using independent distribution may not always give a good approximation to the robust model. However, below we identify a wide class of functions for which correlations may be ignored to get efficient solutions for stochastic optimization problems.
3 A class of functions with low correlation gap
A key contribution of our paper is to identify a class of cost functions for which the correlation gap is well bounded. Notably, many popular cost functions, including submodular functions, facility location, and Steiner forest, belong to this class, which leads to efficient approximations for these problems.
We derive our characterization using concepts of cost-sharing. A cost-sharing scheme is a function defining how to share the cost of a service among the serviced customers. We consider the class of cost functions $h(x, S)$ such that for every feasible $x$, there exists some cost-sharing scheme for allocating the cost among members of the set $S$ with (a) $\beta$-budget balance, (b) weak cross-monotonicity, and (c) weak $\eta$-summability. Below we precisely state these properties. Since we assume that $x$ can take any fixed value, we will abbreviate $h(x, S)$ as $f(S)$ for simplicity when clear from the context.
A cost-sharing scheme is cross-monotonic if it satisfies the property that everyone is better off when the set of people who receive the service expands. Roughgarden et al. introduced an additional property of summability for cost-sharing schemes. Here, we define slightly weaker versions of these properties by requiring them to hold only for a given ordering on a subset of $V$. More precisely, we define a cost-sharing scheme as a function $\chi(i, S, \sigma_S)$ that, for each element $i \in S$ and ordering $\sigma_S$ on $S$, specifies the share of $i$ in $S$. The three properties of budget-balance, weak cross-monotonicity and weak summability are now stated as follows:
$\beta$-budget balance: For all $S \subseteq V$ and orderings $\sigma_S$ on $S$:
$$\frac{f(S)}{\beta} \le \sum_{i \in S} \chi(i, S, \sigma_S) \le f(S).$$
Cross-monotonicity: For all $i \in S$, $S \subseteq T$, $\sigma_S \subseteq \sigma_T$:
$$\chi(i, S, \sigma_S) \ge \chi(i, T, \sigma_T).$$
Here $\sigma_S \subseteq \sigma_T$ means that the ordering $\sigma_S$ is the restriction of the ordering $\sigma_T$ to the subset $S$.
Weak $\eta$-summability: For all $S \subseteq V$ and orderings $\sigma_S$:
$$\sum_{\ell=1}^{|S|} \chi(i_\ell, S_\ell, \sigma_{S_\ell}) \le \eta\, f(S),$$
where $i_\ell$ is the $\ell$-th element and $S_\ell$ is the set of the first $\ell$ members of $S$ according to the ordering $\sigma_S$, and $\sigma_{S_\ell}$ is the restriction of $\sigma_S$ to $S_\ell$. Note that this is a weaker requirement than the conventional definition of summability, where a single cost-sharing function must satisfy the given inequality for all orderings on the ground set $V$.
We emphasize that any cost-sharing scheme satisfying the conventional definitions of budget balance, cross-monotonicity and summability (as in [10, 11]) always satisfies the above weaker conditions. However, this relaxation to weak conditions can give significant savings in approximation factors in some cases. For example, submodular functions satisfy the above weak conditions with $\eta = 1$ and $\beta = 1$ for the incremental cost-sharing scheme:
$$\chi(i_\ell, S, \sigma_S) = f(S_\ell) - f(S_{\ell-1}),$$
where $S_\ell$ is the set of the first $\ell$ members of $S$ according to the ordering $\sigma_S$.
On the other hand, for the conventional definition of summability, a lower bound of was shown for submodular functions in .
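The incremental scheme for submodular functions is easy to experiment with. The sketch below is an illustration under assumptions: it takes the incremental share of the element in position l of an ordering to be the marginal cost f(first l elements) minus f(first l-1 elements), and uses a made-up coverage function `coverage` (with cost 0 for the empty set) as the submodular cost. It then compares the shares on a set with those on a subset ordered by the restricted ordering.

```python
def incremental_shares(f, order):
    """Incremental cost shares along `order`, a list of distinct elements.

    The element in position l pays f(first l elements) - f(first l-1 elements).
    """
    shares, prefix, prev = {}, [], f(frozenset())
    for item in order:
        prefix.append(item)
        cur = f(frozenset(prefix))
        shares[item] = cur - prev
        prev = cur
    return shares


# Hypothetical coverage instance: each customer needs a few resources,
# and the cost of a set of customers is the number of distinct resources.
NEEDS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}


def coverage(s):
    covered = set()
    for i in s:
        covered |= NEEDS[i]
    return float(len(covered))


big = incremental_shares(coverage, ["a", "b", "c"])
small = incremental_shares(coverage, ["a", "c"])   # same ordering restricted to {a, c}
```

On this instance the shares of the full set add up to its cost, and no member of the smaller set pays less than it does in the larger one, which is what budget balance with factor 1 and weak cross-monotonicity ask for.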
Let us call a cost-sharing scheme satisfying the above three properties an $(\eta, \beta)$-cost-sharing scheme. Also, we say that a function $h(x, S)$ is non-decreasing in $S$ if for every $x$ and every $S \subseteq T \subseteq V$, $h(x, S) \le h(x, T)$. Our main result is the following theorem, which we will prove in the next section:
Theorem 1. For any instance $(h, V, \{p_i\})$, if for all feasible $x$, the cost function $h(x, S)$ is non-decreasing in $S$ and has an $(\eta, \beta)$-cost-sharing scheme for elements in $S$, then the correlation gap is bounded as $\kappa(x) \le \eta\beta\frac{e}{e-1}$.
As described in Section 2, this gives the following corollary for approximating the correlation robust model:
For instances as defined in Theorem 1, an approximate solution for correlation robust optimization problem can be constructed by solving the corresponding stochastic optimization problem under independent distribution.
Further, it is easy to show that for these functions, the variance under the independent distribution is bounded by , where . Thus, if the cost function is convex in the decision variable, these stochastic optimization problems may be solved efficiently using the sample average approximation (SAA) method. For specific problems, the structural simplicity provided by the independent distribution may even eliminate the need for sample average approximation.
Before moving on to the proof of Theorem 1, let us briefly discuss its implications for various stochastic optimization problems, and for a seemingly unrelated problem of welfare maximization in combinatorial auctions:
3.1 Stochastic optimization with submodular functions
A function $f: 2^V \to \mathbb{R}$ is submodular if $f(S \cup \{i\}) - f(S) \ge f(T \cup \{i\}) - f(T)$ for all $S \subseteq T \subseteq V$ and $i \in V \setminus T$. These cost functions are characterized by diminishing marginal costs, which is common for resource allocation problems where a resource can be shared by multiple users, so that the marginal cost decreases as the number of users increases. As discussed earlier, for submodular functions $\eta = \beta = 1$. Therefore, Theorem 1 directly leads to the following corollary:
If the cost function $h(x, S)$ is non-decreasing and submodular in $S$ for all feasible $x$, then for any instance $(h, V, \{p_i\})$, the correlation gap is bounded by the constant $\frac{e}{e-1}$.
The next example shows the bound is tight for submodular functions.
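Example 3. (Tightness) Let $V = \{1, 2, \dots, n\}$, and define $f(S) = 1$ if $S \neq \emptyset$, and $f(\emptyset) = 0$. Let each item have probability $p_i = 1/n$. Then the worst case distribution is $\Pr(\{i\}) = 1/n$ for each $i \in V$, with expected value $1$. The independent distribution has an expected cost $1 - (1 - 1/n)^n \to 1 - 1/e$ as $n \to \infty$. ∎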
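The convergence in Example 3 can be observed numerically. This short sketch assumes the instance as read from the example: cost 1 for any nonempty set, n items with marginal 1/n, worst-case expected cost 1, and independent expected cost equal to the probability that at least one item appears. The function name is hypothetical.

```python
import math


def tightness_gap(n):
    """Correlation gap of the 'any item costs 1' instance with n items."""
    worst = 1.0                                   # singleton {i} with probability 1/n each
    independent = 1.0 - (1.0 - 1.0 / n) ** n      # P(S is nonempty) under independence
    return worst / independent


limit = math.e / (math.e - 1.0)
gaps = [tightness_gap(n) for n in (1, 2, 10, 100, 10**6)]
```

The computed gaps increase with n and approach e/(e-1) from below, consistent with the bound being tight only in the limit.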
3.2 Stochastic Uncapacitated Facility Location (SUFL)
In the two-stage stochastic facility location problem, any facility can be bought at a low cost in the first stage, and at a higher cost in the second stage, that is, after the random set of cities to be served is revealed. The decision maker’s problem is to decide , the facilities to be built in the first stage, so that the total expected cost of facility location is minimized (see the literature on two-stage stochastic facility location for further details on the problem definition).
Given a first stage decision , the cost function , where is the cost of deterministic UFL for set of customers and set of facilities such that the facilities already bought in the first stage are available freely at no cost, while any other facility costs . For this deterministic UFL cost function there exists a cross-monotonic, $3$-budget balanced, $O(\log n)$-summable cost-sharing scheme. Therefore, using Theorem 1, we get the following bound on the correlation gap:
The correlation gap for stochastic uncapacitated facility location is bounded by $O(\log n)$, where $n = |V|$ is the number of cities to be served.
This observation reduces our robust facility location problem to the well-studied stochastic UFL problem under a known (independent Bernoulli) distribution at the expense of an $O(\log n)$ approximation factor.
3.3 Stochastic Steiner Tree (SST)
In the two-stage stochastic Steiner tree problem, we are given a graph . An edge can be bought at cost in the first stage. The random set of terminals to be connected is revealed in the second stage. More edges may be bought at a higher cost in the second stage after observing the actual set of terminals. Here, the decision variable is the set of edges to be bought in the first stage, and cost function , where is the Steiner tree cost function for set given that the edges in are already bought. Since an $O(\log^2 n)$-summable, $2$-budget balanced cost-sharing method is known for this cost function [12, 4], we can conclude:
The correlation gap for stochastic Steiner tree is bounded by $O(\log^2 n)$, where $n = |V|$ is the number of terminals to be connected.
This observation reduces our robust problem to the well-studied SST problem under a known (independent Bernoulli) distribution at the expense of an $O(\log^2 n)$-approximation factor.
3.4 Welfare Maximization Problem
Finally, Theorem 1 extends some existing results for social welfare maximization in combinatorial auctions.
Consider the problem of maximizing total utility achieved by partitioning goods among players each with utility function for subset of goods
Observe that on relaxing the integrality constraints on and scaling it by , the above problem reduces to that of finding the worst-case distribution (i.e. one that maximizes expected value of function ) such that the marginal probability of each element is . Therefore:
Consequently, the correlation gap bound in Theorem 1 leads to the following corollary for welfare maximization problems:
For welfare maximization problems with goods and players with identical utility functions , the randomized algorithm that assigns goods independently to each of the players with probability gives approximation to the optimal partition; given that function is non-decreasing and admits an -cost-sharing scheme.
Since $\eta = \beta = 1$ for submodular functions, the above result matches the approximation factor provided by Vondrak for this problem in the case of identical monotone submodular functions.
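The randomized assignment in the corollary can be compared with the optimum on toy markets. The sketch below is a brute-force illustration under assumptions: "assigning goods independently with equal probability" is read as sending each good to a uniformly random player, so every complete assignment is equally likely and the expected welfare is the plain average over assignments. The utility `unit_demand` and all function names are hypothetical.

```python
from itertools import product


def welfare(f, assignment, players):
    """Total utility when good j goes to player assignment[j]."""
    return sum(
        f(frozenset(j for j, owner in enumerate(assignment) if owner == k))
        for k in range(players)
    )


def optimal_and_random_welfare(f, goods, players):
    """Brute-force optimum and exact expected welfare of the uniform random assignment."""
    values = [welfare(f, a, players) for a in product(range(players), repeat=goods)]
    return max(values), sum(values) / len(values)


# Identical submodular utility (assumed for illustration): a player values any
# nonempty bundle at 1.
unit_demand = lambda s: 1.0 if s else 0.0
best, expected = optimal_and_random_welfare(unit_demand, goods=2, players=2)
fraction = expected / best
```

The enumeration visits players^goods assignments, so it is meant only for very small instances.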
The reader may observe that even though approximating the worst case distribution directly provides a matching approximation for the corresponding welfare maximization problem, the converse is not true. In addition to having uniform probabilities, solutions for welfare maximization approximate the integer program (5), whereas the worst case distribution requires solving the corresponding LP relaxation. The latter is a strictly harder problem unless the integrality gap is zero. A notable example is the above-mentioned case of identical submodular functions. This case was studied by Bikhchandani in the context of Walrasian equilibria, who conjectured a zero integrality gap for this problem, implying the existence of Walrasian equilibria. However, in Appendix C, we show a simple counter-example with non-zero integrality gap for this problem. As a byproduct, this counter-example proves that even for identical submodular valuation functions, Walrasian equilibria may not exist.
4 Proof of Theorem 1
For a problem instance and fixed , use and to denote the expected cost of worst-case distribution and independent Bernoulli distribution respectively. In this section, we prove our main technical result that the correlation gap
when is non-decreasing and admits cost-sharing in . As before, we will abbreviate as for simplicity.
The proof is structured as follows. We first focus on special instances of the problem in which all $p_i$’s are equal to $1/K$ for some integer $K$, and the worst case distribution is a “$K$-partition-type” distribution. That is, the worst case distribution divides the elements of $V$ into $K$ disjoint sets $\{A_1, \dots, A_K\}$, and each $A_k$ occurs with probability $1/K$. Observe that for such instances, the expected value under the worst case distribution is $\frac{1}{K}\sum_{k=1}^{K} f(A_k)$. In Lemma 1, we show that for such “nice” instances the correlation gap satisfies the bound required by the theorem. Then, we use a “split” operation to reduce any given instance of our problem to a nice instance such that the reduction can only increase the correlation gap. This shows that the bound for nice instances is an upper bound for any instance of the problem, thus concluding the proof of the theorem.
Lemma 1. For instances such that (a) $f$ is non-decreasing and admits an $(\eta, \beta)$-cost-sharing scheme, (b) the marginal probabilities $p_i$ are all equal to $1/K$ for some integer $K$, and (c) the worst case distribution is a $K$-partition-type distribution, the correlation gap is bounded as:
Proof. Let the optimal -partition corresponding to the worst case distribution be . Assume w.l.o.g. that . Fix an order on elements of such that for all , the elements in come before . For every set , let be the restriction of ordering to the elements of set . Let be the cost-sharing scheme for function , as per the assumptions of the lemma. Then by weak -summability of :
where the expected value is taken over independent distribution.
Denote . Let . We will show that
Recursively using this inequality will prove the result. To prove this inequality, denote , , for any . Since elements in come after the elements in in ordering , note that for any , , and for , .
Since , using cross-monotonicity of , the second term above can be bounded as:
Because and are mutually independent, for any fixed , each will have the same conditional probability of appearing in . Therefore,
Again, using independence and cross-monotonicity, analyze the first term in the right hand side of (4),
The last inequality follows from monotonicity of . Expanding the above recursive inequality for , , , we get
Since is decreasing in , and by simple arithmetic one can show
By definition of , this gives:
Next, we reduce a general problem instance to an instance satisfying the properties required in Lemma 1.
We use the following split operation.
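Given a problem instance $(f, V, \{p_i\})$ and integers $\{n_i \ge 1,\ i \in V\}$, define a new instance $(f', V', \{p'_j\})$ as follows: split each item $i \in V$ into $n_i$ copies $C^i_1, \dots, C^i_{n_i}$, and assign a marginal probability of $p_i / n_i$ to each copy. Let $V'$ denote the new ground set containing all the duplicates. Define the new cost function $f'$ as:
$$f'(S') = f(\Pi(S')) \quad \text{for all } S' \subseteq V',$$
where $\Pi(S') \subseteq V$ is the original subset of elements whose duplicates appear in $S'$, i.e. $\Pi(S') = \{\, i \in V : C^i_j \in S' \text{ for some } j \,\}$.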
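The split operation has the following properties. Their proofs are given in Appendix B.
Property 1. If the original cost function is non-decreasing in the demand set, then so is the new cost function obtained by splitting.
Property 2. If the original cost function is non-decreasing in the demand set, then splitting does not change the worst case expected value; that is, the worst case expected cost of the new instance equals that of the original instance.
Property 3. If the original cost function is non-decreasing in the demand set, then splitting can only decrease the expected value over the independent distribution; that is, the expected cost of the new instance under its independent distribution is at most that of the original instance under its independent distribution.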
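The effect of splitting on the independent expectation can be seen on a tiny instance. The sketch below is an illustration under assumptions: it reads the split operation as replacing item i by a chosen number of duplicates that share its marginal equally, with the new cost of a set of duplicates defined as the old cost of the originals represented in it. The names `split_instance`, `expected_independent`, and the toy cost are hypothetical.

```python
from itertools import combinations


def expected_independent(f, p):
    """E[f(S)] when each item i appears independently with probability p[i]."""
    items = list(p)
    total = 0.0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            prob = 1.0
            for i in items:
                prob *= p[i] if i in s else 1.0 - p[i]
            total += prob * f(s)
    return total


def split_instance(f, p, copies):
    """Split item i into copies[i] duplicates (i, 0), ..., (i, copies[i] - 1).

    Each duplicate gets marginal p[i] / copies[i]; the new cost of a set of
    duplicates is the old cost of the set of originals with a duplicate in it.
    """
    new_p = {(i, c): p[i] / copies[i] for i in p for c in range(copies[i])}

    def new_f(s):
        return f(frozenset(i for (i, _copy) in s))

    return new_f, new_p


# Toy non-decreasing cost (assumed): pay 1 if anything shows up.
any_item = lambda s: 1.0 if s else 0.0
p = {"a": 0.5}
before = expected_independent(any_item, p)
split_f, split_p = split_instance(any_item, p, {"a": 2})
after = expected_independent(split_f, split_p)
```

Here the independent expectation drops from 0.5 to 0.4375 after splitting, while the worst case stays at 0.5 (the two duplicates can be made mutually exclusive), so the gap of the split instance is larger.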
The remainder of the proof uses these properties of the split operation to reduce any given instance to a “nice” instance so that Lemma 1 can be invoked to prove the correlation gap bound.
Proof of Theorem 1. Suppose that the worst case distribution for instance is not a partition-type distribution. Then, split any element that appears in two different sets. Simultaneously, split the distribution by assigning probability to each set that contains exactly one copy of . Repeat until the distribution becomes a partition. Since each new set in the new distribution contains exactly one copy of , by definition of function , this splitting does not change the expected function value. By Property 2 of the split operation, the worst case expected values for the two instances (before and after splitting) must be the same, so this partition forms a worst case distribution for the new instance. Then, we further split each element (and simultaneously the distribution) until the marginal probability of each new element is $1/K$ for some large enough integer $K$.
By Properties 2 and 3 of the split operation, the correlation gap can only become larger on splitting. So, we can focus on proving the correlation gap bound for the new instance. Now, let us consider the remaining condition (a) of Lemma 1. By Property 1, the cost function obtained by splitting is non-decreasing. Given the original cost-sharing method for , we show that there exists a cost-sharing method for the new instance such that is (1) -budget balanced, (2) weak -summable, and (3) cross-monotone in the following weaker sense. is cross-monotone for any such that respect the partial order of elements, and is a partial-prefix of , that is, for some , , and . The construction of this cost-sharing scheme is given in the appendix, Lemma 3.
Thus, all the conditions in Lemma 1 are satisfied by the new instance except for cross-monotonicity. The weaker cross-monotonicity that the new instance satisfies is actually sufficient to prove Lemma 1. To see this, observe that cross-monotonicity is used only in two steps of the proof of Lemma 1, and at both of these places the required prefix condition is satisfied. Thus, Lemma 1 can be invoked to bound the correlation gap for the new instance, thereby completing the proof. ∎
5 Supermodular functions
Finally, we directly consider the correlation robust model for cost functions that are supermodular in the demand set. As shown in Section 2, the correlation gap for these cost functions can be exponentially high, so the independent distribution does not give a good approximation to the worst case distribution. However, it is easy to characterize the worst case distribution and directly solve the correlation robust model in this case.
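Given that the function $f(S)$ is supermodular, the worst case distribution over $S$ has the following closed form:
$$\Pr(S) = \begin{cases} p_n & \text{if } S = S_n, \\ p_i - p_{i+1} & \text{if } S = S_i,\ 1 \le i \le n-1, \\ 1 - p_1 & \text{if } S = \emptyset, \\ 0 & \text{otherwise,} \end{cases}$$
where $n = |V|$; $i$ is the $i$-th member of $V$ and $S_i$ is the set of the first $i$ members of $V$, both with respect to a specific ordering over $V$ such that $p_1 \ge p_2 \ge \dots \ge p_n$.
For cost functions $h(x, S)$ that are supermodular in $S$ for any feasible $x$, the robust optimization problem is simply formulated as:
$$\min_{x \in C} \; (1 - p_1)\, h(x, \emptyset) + \sum_{i=1}^{n-1} (p_i - p_{i+1})\, h(x, S_i) + p_n\, h(x, S_n).$$
Thus, if $h(x, S)$ is convex in $x$ and $C$ is a convex set, then it is a convex optimization problem and can be solved efficiently.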
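The nested structure of the worst case law for supermodular costs is simple to construct. The sketch below is an illustration under assumptions: it sorts items by decreasing marginal, puts probability on the nested prefixes of that order so that the marginals are matched, and evaluates the resulting expected cost for a fixed decision. The function names and the toy cost are hypothetical, and no check of supermodularity is performed.

```python
def supermodular_worst_case(p):
    """Nested law as a list of (set, probability) pairs.

    Items are sorted by decreasing marginal. The set of the i most likely items
    gets probability p_i - p_{i+1} (with p_{n+1} = 0); the empty set gets 1 - p_1.
    """
    order = sorted(p, key=p.get, reverse=True)
    probs = [p[i] for i in order] + [0.0]
    law = [(frozenset(), 1.0 - probs[0])]
    for i in range(len(order)):
        law.append((frozenset(order[: i + 1]), probs[i] - probs[i + 1]))
    return law


def worst_case_cost(f, p):
    """Expected cost of f under the nested law built from marginals p."""
    return sum(q * f(s) for s, q in supermodular_worst_case(p))


# Toy supermodular cost (assumed): |S|^2.
square = lambda s: float(len(s) ** 2)
marginals = {"a": 0.5, "b": 0.25}
law = supermodular_worst_case(marginals)
cost = worst_case_cost(square, marginals)
```

For a parametrized cost one would evaluate this sum at each candidate decision and minimize over the feasible set; the sum has only n + 1 terms, which is why the robust problem stays tractable in this case.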
The authors would like to thank Ashish Goel and Mukund Sundarajan for many useful insights on the problem.
Appendix A Maximum of Poisson Random Variables
In this section, we show that the expected value of the maximum of a set of independent identically distributed Poisson random variables can be bounded as for large .
Let denote the mean, and denote the distribution of i.i.d. Poisson variables . Define . Also define the continuous extension of :
Note that for any non-negative integer . Let be defined by . Define the continuous function . Then, in , it is shown that for large , .
We use these asymptotic results to derive a bound on expectation of for large .
Next, we show that the integral term on the right hand side is bounded by a constant for large . Substituting in the integration on the right hand side, we get
denotes the derivative of function . The last step follows because for large enough (i.e. if ). Further, since is a decreasing function in , it follows that: