# Testing submodularity and other properties of valuation functions

Eric Blais
School of Computer Science
University of Waterloo
eric.blais@uwaterloo.ca
Abhinav Bommireddi
School of Computer Science
University of Waterloo
vabommir@uwaterloo.ca
###### Abstract

We show that for any constant $\epsilon > 0$ and $p \ge 1$, it is possible to distinguish functions $f\colon\{0,1\}^n \to [0,1]$ that are submodular from those that are $\epsilon$-far from every submodular function in $\ell_p$ distance with a constant number of queries.

More generally, we extend the testing-by-implicit-learning framework of Diakonikolas et al. (2007) to show that every property of real-valued functions that is well-approximated in $\ell_2$ distance by a class of $k$-juntas for some constant $k$ can be tested in the $\ell_p$-testing model with a constant number of queries. This result, combined with a recent junta theorem of Feldman and Vondrák (2016), yields the constant-query testability of submodularity. It also yields constant-query testing algorithms for a variety of other natural properties of valuation functions, including fractionally subadditive (XOS) functions, OXS functions, unit demand functions, coverage functions, and self-bounding functions.

## 1 Introduction

Property testing is concerned with approximate decision problems of the following form: given oracle access to some function $f$ and some fixed property $\mathcal{P}$ of such functions, how many oracle calls (or queries) to $f$ does a bounded-error randomized algorithm need to distinguish the case where $f$ has the property from the case where $f$ is $\epsilon$-far (under some appropriately defined metric) from having the property? Remarkably, many natural properties of functions can be tested with a number of queries that is independent of the size of the function’s domain. For example, for any constant $\epsilon > 0$ and constant $k$, a constant number of queries suffices to test whether a Boolean function is linear [7]; a polynomial of degree at most $k$ [27]; a $k$-junta [18, 5]; a monomial [26]; computable by a Boolean circuit of size $k$ [10]; or a linear threshold function [24].

In this work, we consider the problem of testing properties of bounded real-valued functions over the Boolean hypercube. In particular, are there natural examples of such properties that are testable with a constant number of queries? This question is best considered in the $\ell_p$ testing framework introduced by Berman, Raskhodnikova, and Yaroslavtsev [4]. In this setting, the distance between a function $f\colon\{0,1\}^n\to[0,1]$ and some property $\mathcal{P}$ of these functions is $\mathrm{dist}_p(f,\mathcal{P}) = \inf_{g\in\mathcal{P}} \mathbb{E}_x\big[|f(x)-g(x)|^p\big]^{1/p}$.

### 1.1 Testing properties of valuation functions

Natural properties of bounded real-valued Boolean functions have been studied extensively in the context of valuation functions in algorithmic game theory. For a sequence of $n$ goods labeled with the indices $1, 2, \ldots, n$, we can encode the value of each subset of these goods to some agent with a function $f\colon\{0,1\}^n\to[0,1]$ by setting $f(x)$ to be the (possibly normalized) value of the subset $\{i\in[n] : x_i = 1\}$ to the agent. Such a valuation function is

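• additive if there are weights $w_1,\ldots,w_n$ such that $f(x) = \sum_{i=1}^n w_i x_i$;

• a coverage function if there exists a universe $U$, non-negative weights $\{w_u\}_{u\in U}$, and subsets $A_1,\ldots,A_n\subseteq U$ such that $f(x) = \sum_{u\in\bigcup_{i:x_i=1}A_i} w_u$;

• unit demand if there are weights $w_1,\ldots,w_n$ such that $f(x) = \max_{i:x_i=1} w_i$;

• OXS if there are unit demand functions $f_1,\ldots,f_m$ such that $f(x) = \max \sum_{j=1}^m f_j(x^{(j)})$, where the maximum is taken over all $x^{(1)},\ldots,x^{(m)}\in\{0,1\}^n$ such that for every $i\in[n]$, $\sum_{j=1}^m x^{(j)}_i = x_i$;

• gross substitutes if for any price vectors $\rho\le\sigma$ in $\mathbb{R}^n$ and any $x,y\in\{0,1\}^n$ that maximize $f(x)-\langle\rho,x\rangle$ and $f(y)-\langle\sigma,y\rangle$, respectively, every $i$ for which $x_i = 1$ and $\rho_i = \sigma_i$ also satisfies $y_i = 1$;

• submodular if $f(x)+f(y)\ge f(x\wedge y)+f(x\vee y)$ for every $x,y\in\{0,1\}^n$, where $\wedge$ and $\vee$ are the bitwise AND and OR operations;

• fractionally subadditive (XOS) if there are non-negative real-valued weights $w_{j,i}$, for $j\in[m]$ and $i\in[n]$, such that $f(x) = \max_{j\in[m]}\sum_{i=1}^n w_{j,i}x_i$;

• self-bounding if $\sum_{i=1}^n\big(f(x)-\min_{x_i}f(x)\big)\le f(x)$ for every $x$, where $\min_{x_i}f(x) = \min\{f(x), f(x\oplus e_i)\}$ (with $e_i$ the $i$th standard basis vector) and $\oplus$ is the bitwise XOR operator; and

• subadditive if $f(x\vee y)\le f(x)+f(y)$ for every $x,y\in\{0,1\}^n$.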

Each of these properties enforces some structure on valuation functions, and much work has been devoted to better understanding these structures (and their algorithmic implications) by studying the properties through the lenses of learning theory [3, 2, 15], optimization [13, 14], approximation [17, 16], and sketching [1]. The problem of testing whether an unknown valuation function satisfies one of these properties offers another angle from which we can learn more about the structure imposed on the functions that satisfy these properties.

Indeed, there have already been some recent developments in the study of testing these properties. Notably, Seshadhri and Vondrák [29] initiated the study of testing submodularity for functions over the hypercube and showed that in the setting where we measure the distance to submodularity in terms of Hamming distance (rather than $\ell_p$ distance), submodularity can be tested with $(1/\epsilon)^{O(\sqrt{n}\log n)}$ queries and that it cannot be tested with a number of queries that is independent of $n$. Subsequently, Feldman and Vondrák [17] showed that in the $\ell_p$ testing framework, we can do much better: submodularity can be tested in this model with a number of queries that is only logarithmic in $n$. Our first result shows that, in fact, for any value of $p\ge1$, it is possible to test submodularity in the $\ell_p$ setting with a number of queries that is completely independent of $n$.

###### Theorem 1.1.

For any $p\ge1$ and any $\epsilon>0$, there is an $\epsilon$-tester for submodularity in the $\ell_p$ testing model with query complexity .

Another property that has been considered in the (standard Hamming distance) testing model is that of being a coverage function. Chakrabarty and Huang [8] showed that for constant values of $\epsilon$, queries suffice to $\epsilon$-test whether a function is a coverage function on some universe of size . Note that, unlike in the learning and approximation settings, bounds on the number of queries required to test some property $\mathcal{P}$ do not imply anything about the number of queries required to test properties $\mathcal{P}'\subseteq\mathcal{P}$, so even though coverage functions are submodular, results on testing submodularity do not imply any bounds on the query complexity for testing coverage functions. Nonetheless, our next result shows that this property, along with most of the other properties of valuation functions listed above, can also be tested with a number of queries that is independent of $n$.

###### Theorem 1.2.

For any $p\ge1$ and any $\epsilon>0$, there are $\epsilon$-testers in the $\ell_p$ testing model for

• additive functions, coverage functions, unit demand functions, OXS functions, and gross substitutes functions that each have query complexity ; and

• fractionally subadditive (XOS) functions and self-bounding functions that have query complexity .

Theorems 1.1 and 1.2 are both special cases of a general testing result that we obtain by extending the technique of testing by implicit learning of Diakonikolas et al. [10]. We describe this general result in more detail below.

### 1.2 Testing real-valued functions by implicit learning

There is a strong connection between property testing and learning theory that goes back to the seminal work of Goldreich, Goldwasser, and Ron [21]. As they first observed, any proper learning algorithm for the class of functions that have some property $\mathcal{P}$ can also be used to test $\mathcal{P}$: run the learning algorithm, and verify whether the resulting hypothesis function is close to the tested function or not. This approach yields good bounds on the number of queries required to test many properties of functions, but, as simple information-theoretic arguments show, it cannot yield query complexity bounds that are smaller than $\Omega(\log n)$ for almost all natural properties of functions over $\{0,1\}^n$.

Diakonikolas et al. [10] bypassed this limitation for the special case when every function that has some property is close to a junta. A function $f$ is a $k$-junta if there is a set $J\subseteq[n]$ of cardinality $k$ such that the value of $f$ on any input $x$ is completely determined by the values $x_i$ for each $i\in J$. Every $k$-junta has corresponding “core” functions $\{0,1\}^k\to[0,1]$ that define its value based on the values of the relevant coordinates of its input. Diakonikolas et al.’s key insight is that for testing properties whose functions are (very) close to juntas, it suffices to learn the core of the input function, without having to identify the relevant coordinates themselves.

The wide applicability of the testing-by-implicit-learning methodology is due to the fact that for many natural properties of Boolean functions, the functions that have these properties must necessarily be close to juntas under the Hamming distance. The starting point for the current research is a recent breakthrough of Feldman and Vondrák, who showed that a similar junta theorem holds for many properties of real-valued functions when closeness is measured according to $\ell_2$ distance.

###### Feldman–Vondrák junta theorem.

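Fix any $\epsilon\in(0,1)$. For every $f\colon\{0,1\}^n\to[0,1]$,

• if $f$ is submodular then there exists a submodular function $g$ that is an $O\big(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\big)$-junta such that $\|f-g\|_2\le\epsilon$; and

• if $f$ is self-bounding then there exists a self-bounding function $g$ that is a $2^{O(1/\epsilon^2)}$-junta such that $\|f-g\|_2\le\epsilon$.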

The logarithmic dependence on $n$ for the problem of testing submodularity in the $\ell_p$ testing model [17] follows directly from Feldman and Vondrák’s junta theorem and the (standard) testing-by-proper-learning connection. This junta theorem also suggests a natural approach for obtaining a constant query complexity for the same problem by combining it with a testing-by-implicit-learning algorithm. In order to implement this approach, however, new testing-by-implicit-learning techniques are required to overcome two obstacles.

The first obstacle is that all existing testing-by-implicit-learning algorithms [10, 11, 9, 22] are designed for properties that contain functions which are close to juntas in Hamming distance, not $\ell_2$ distance. This is a stronger condition, and it enables the analysis of these algorithms to assume that with large probability, when $f$ is very close to a $k$-junta $g$, the queries $x$ made by the algorithm all satisfy $f(x)=g(x)$. In the $\ell_2$ distance model, however, we can have a function $f$ that is extremely close to a $k$-junta $g$ but still has $f(x)\neq g(x)$ for many (or even all!) inputs $x$.

The second (related) obstacle that we encounter when considering submodular functions is that current testing-by-implicit-learning algorithms only work in the regime where the functions in $\mathcal{P}$ are -close to -juntas for some . (See, for example, the discussion in §2.5 of [28].) This condition is satisfied by the properties of Boolean functions that have been studied previously, but the bounds in the Feldman–Vondrák junta theorem do not satisfy this requirement.

We give a new algorithm for testing-by-implicit-learning that overcomes both of these obstacles. As a result, we obtain the following general theorem.

###### Theorem 1.3.

For any and any property of functions mapping , if is such that for every function , there is a -junta that satisfies , then there is an -tester for in the testing model with query complexity .

Theorems 1.1 and 1.2 are both obtained directly from Theorem 1.3, the Feldman–Vondrák junta theorem and Fact 2.1.

### 1.3 Overview of the proofs

#### The algorithm.

The current testing-by-implicit-learning algorithms proceed in two main stages. In the first stage, the coordinates in $[n]$ are randomly partitioned into parts, and an influence test is used to identify the (at most $k$) parts that contain relevant variables of an unknown input function $f$ that is very close to being a $k$-junta. In the second stage, inputs $x$ are drawn at random according to some distribution, the value $f(x)$ is observed, and the value of the relevant coordinate in each of the parts identified in the first stage is determined using more calls to the influence test.

The Implicit Learning Tester algorithm that we introduce in this paper reverses the order of the two main stages. In the first stage, it draws a sequence of inputs $x^{(1)},\ldots,x^{(q)}$ at random and queries the value of $f$ on each of them. It also uses these inputs to partition the coordinates in $[n]$ into random parts according to the values that each coordinate takes on the queries. In the second stage, the algorithm uses an influence estimator to identify the parts that contain the relevant coordinates of a $k$-junta that is close to $f$ and, since all the coordinates in a common part have the same value on each of the queries, learns the value of the relevant coordinates on each of these initial queries. The algorithm then checks whether the core function thus learned is consistent with those of functions in the property being tested.
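The first stage described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' pseudocode: the names `draw_queries_and_partition` and `part_values`, the 0-based coordinates, and the choice to make each part exactly the set of coordinates sharing one bit pattern across the $q$ queries are all hypothetical. What it shows is why the second stage can read off the value of a relevant coordinate on every query once its part is known: all coordinates in a part agree on every query.

```python
import random
from collections import defaultdict


def draw_queries_and_partition(f, n, q, rng=random):
    """Hypothetical sketch of the first stage of an implicit-learning tester.

    Draws q uniform inputs in {0,1}^n, queries f on each one, and groups the
    coordinates 0..n-1 by their column pattern (x^(1)_i, ..., x^(q)_i).
    """
    queries = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(q)]
    values = [f(x) for x in queries]  # the only q oracle calls made here
    parts = defaultdict(list)
    for i in range(n):
        pattern = tuple(x[i] for x in queries)
        parts[pattern].append(i)  # coordinates with identical patterns share a part
    return queries, values, list(parts.values())


def part_values(queries, part):
    """Value of (any coordinate of) `part` on each query; well defined by construction."""
    i = part[0]
    return [x[i] for x in queries]
```

For example, with `f = lambda x: 0.5 * x[0] + 0.5 * x[3]`, `n = 20`, and `q = 5`, the call returns five query points, their five values, and at most $2^5$ parts; the parts containing coordinates 0 and 3 then give, via `part_values`, the bits that the core of this 2-junta sees on each query.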

The main advantage of the Implicit Learning Tester algorithm is that its analysis does not require the assumption that our samples are exactly consistent with those of an actual $k$-junta (instead of those of a function that is only promised to be close to a $k$-junta). This feature enables us to overcome the obstacles listed in the previous section, at the cost of adding a few complications to the analysis, as described below.

#### The analysis.

There are two main technical ingredients in the analysis of the algorithm. The first, established in Lemma 3.1, is used to show that when $f$ is close to a $k$-junta in $\ell_2$ distance, the search procedure identifies parts that contain the relevant coordinates of some $k$-junta that is close to $f$. (Note that the search is not guaranteed to find the parts that contain the relevant coordinates of the $k$-junta that is closest to $f$, but it suffices to find those of any close $k$-junta.)

The second technical ingredient addresses the fact that by drawing the samples first and then using these samples to provide the initial partition of the coordinates in $[n]$, we no longer obtain uniformly random samples of the core of the input function $f$. Nonetheless, in Lemma 3.2, we show that when $f$ is close to a $k$-junta, the distribution of these samples on the core function still enables us to accurately estimate the distance of $f$ to the core functions of any other $k$-junta.

### 1.4 Discussion and open problems

Theorems 1.1–1.3 raise a number of intriguing questions. The most obvious question left open is whether we can also test subadditivity of real-valued functions with a constant number of queries: subadditive functions need not be close to juntas, so such a result would appear to require a different technique.

It is also useful to compare our bounds for submodularity testing with those for testing monotonicity: in the Hamming distance testing model, Seshadhri and Vondrák [29] showed that the query complexity for testing submodularity is at least as large as that for testing monotonicity. However, the best current bounds for testing monotonicity in the $\ell_p$ testing model have a linear dependence on $n$ [4]. Is it also possible to test monotonicity with a constant number of queries? Or is testing submodularity strictly easier than testing monotonicity in the $\ell_p$ testing setting?

## 2 Preliminaries

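Let $\mathcal{F}_n$ denote the set of functions mapping $\{0,1\}^n$ to $[0,1]$. For any $J\subseteq[n]$ with complement $\bar J = [n]\setminus J$, when $x\in\{0,1\}^J$ and $y\in\{0,1\}^{\bar J}$, we write $f(x,y)$ to denote the value $f(z)$ for the input $z\in\{0,1\}^n$ that satisfies $z_i = x_i$ for each $i\in J$ and $z_i = y_i$ otherwise.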

We use the standard definitions and notation for the Fourier analysis of functions $f\colon\{0,1\}^n\to\mathbb{R}$. For a complete introduction to the topic, see [25]. Throughout the paper, unless otherwise specified, all probabilities and expectations are over the uniform distribution on the random variable’s domain.

### 2.1 Property testing

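A property $\mathcal{P}$ of functions mapping $\{0,1\}^n$ to $[0,1]$ is a subset of these functions that is invariant under relabeling of the coordinates. The Hamming distance between $f,g\colon\{0,1\}^n\to[0,1]$ is $\mathrm{dist}_0(f,g) = \Pr_x[f(x)\neq g(x)]$ and the Hamming distance between $f$ and a property $\mathcal{P}$ is $\mathrm{dist}_0(f,\mathcal{P}) = \inf_{g\in\mathcal{P}}\mathrm{dist}_0(f,g)$. For $p\ge1$, the $\ell_p$ distance between $f$ and $g$ is $\mathrm{dist}_p(f,g) = \|f-g\|_p = \mathbb{E}_x\big[|f(x)-g(x)|^p\big]^{1/p}$ and the $\ell_p$ distance between $f$ and $\mathcal{P}$ is $\mathrm{dist}_p(f,\mathcal{P}) = \inf_{g\in\mathcal{P}}\mathrm{dist}_p(f,g)$.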

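Given $\epsilon>0$, an $\epsilon$-tester in the Hamming testing model (resp., $\ell_p$ testing model) for some property $\mathcal{P}$ is a randomized algorithm that (i) accepts every function $f\in\mathcal{P}$ with probability at least $2/3$ and (ii) rejects every function $f$ that satisfies $\mathrm{dist}_0(f,\mathcal{P})\ge\epsilon$ (resp., $\mathrm{dist}_p(f,\mathcal{P})\ge\epsilon$) with probability at least $2/3$. An $\epsilon$-tester for $\mathcal{P}$ is an $\epsilon'$-tolerant tester, for some $\epsilon'<\epsilon$, if it additionally accepts every function $f$ that satisfies $\mathrm{dist}_0(f,\mathcal{P})\le\epsilon'$ (resp., $\mathrm{dist}_p(f,\mathcal{P})\le\epsilon'$) with probability at least $2/3$.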

Our proofs of Theorems 1.1–1.3 are established in the $\ell_2$ testing model. The results for general $\ell_p$ testing models are obtained from the following elementary relation between the query complexities of testing any property in different testing models.

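###### Fact 2.1 (cf. Fact 5.2 in [4]).

For any property $\mathcal{P}$, any $1\le p\le q$, and any $\epsilon\in(0,1)$, the number of queries $Q_p(\mathcal{P},\epsilon)$ required to $\epsilon$-test $\mathcal{P}$ in the $\ell_p$ testing model satisfies

$$Q_p(\mathcal{P},\epsilon)\le Q_q(\mathcal{P},\epsilon)\le Q_p\big(\mathcal{P},\epsilon^{q/p}\big).$$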
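Relations of the kind stated in Fact 2.1 follow from comparing $\ell_p$ norms of the difference $h = f - g$ of two functions with values in $[0,1]$, so that $h$ takes values in $[-1,1]$. The standard comparison, for $1\le p\le q$ under the uniform distribution, is $\|h\|_p\le\|h\|_q\le\|h\|_p^{p/q}$: the first inequality is Jensen's inequality, and the second holds because $|h|^q\le|h|^p$ pointwise. The Python sketch below is only a numerical sanity check of these two inequalities on random functions over a tiny hypercube, not a proof; the helper names are our own.

```python
import itertools
import random


def lp_norm(values, p):
    """(E|h(x)|^p)^(1/p) under the uniform distribution over the listed points."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def norms_compare(n, p, q, rng):
    """Check ||h||_p <= ||h||_q <= ||h||_p^(p/q) for h = f - g with f, g: {0,1}^n -> [0,1]."""
    points = list(itertools.product((0, 1), repeat=n))
    h = [rng.random() - rng.random() for _ in points]  # f(x) - g(x), in [-1, 1]
    a, b = lp_norm(h, p), lp_norm(h, q)
    slack = 1e-12  # tolerance for floating-point rounding
    return a <= b + slack and b <= a ** (p / q) + slack
```

Read in terms of testing: if $f$ is $\epsilon$-far from $\mathcal{P}$ in $\ell_q$ distance, then every $g\in\mathcal{P}$ has $\|f-g\|_p\ge\|f-g\|_q^{q/p}\ge\epsilon^{q/p}$, which is why an $\ell_p$ tester run with parameter $\epsilon^{q/p}$ also works in the $\ell_q$ model.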

Theorem 1.2 also relies on the following hierarchy of properties. (See, e.g., [23].)

###### Lemma 2.2.

The properties of defined in the introduction satisfy the inclusion hierarchy

### 2.2 Juntas

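The function $f\colon\{0,1\}^n\to[0,1]$ is a junta on the set $J\subseteq[n]$ if for every $x,y\in\{0,1\}^n$ that satisfy $x_i = y_i$ for every $i\in J$, we have $f(x)=f(y)$. The function $f$ is a $k$-junta if it is a junta on some set of cardinality $k$. The function $f_{\mathrm{core}}\colon\{0,1\}^k\to[0,1]$ is a core function of the $k$-junta $f$ if there is a projection $\pi\colon\{0,1\}^n\to\{0,1\}^k$ defined by setting $\pi(x) = (x_{i_1},\ldots,x_{i_k})$ for some distinct $i_1,\ldots,i_k\in[n]$ such that for every $x\in\{0,1\}^n$, $f(x) = f_{\mathrm{core}}(\pi(x))$.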

###### Definition 2.3.

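For any function $f\colon\{0,1\}^n\to[0,1]$ and set $J\subseteq[n]$, the $J$-junta projection of $f$ is the function $f_J\colon\{0,1\}^n\to[0,1]$ defined by setting $f_J(x,y) = \mathbb{E}_{z\in\{0,1\}^{\bar J}}[f(x,z)]$ for every $x\in\{0,1\}^J$ and $y\in\{0,1\}^{\bar J}$.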

A basic fact that we will require is that $f_J$ is the $J$-junta that is closest to $f$ under the $\ell_2$ metric.

###### Proposition 2.4.

For every $f\colon\{0,1\}^n\to[0,1]$ and $J\subseteq[n]$, if $g$ is a $J$-junta, then $\|f-f_J\|_2\le\|f-g\|_2$.

###### Proof.

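By applying the identity $f-g = (f-f_J)+(f_J-g)$ and expanding $\|f-g\|_2^2 = \mathbb{E}[(f-g)^2]$, we obtain

$$\|f-g\|_2^2 = \|f-f_J\|_2^2+\|f_J-g\|_2^2+2\,\mathbb{E}_x\Big[\mathbb{E}_y\big[(f(x,y)-f_J(x,y))(f_J(x,y)-g(x,y))\big]\Big],$$

where $x$ and $y$ are uniform over $\{0,1\}^J$ and $\{0,1\}^{\bar J}$, respectively.

Since $g$ is a $J$-junta, it does not depend on $y$, and neither does $f_J$. So for each fixed $x$, the factor $f_J(x,y)-g(x,y)$ is constant in $y$, while $\mathbb{E}_y[f(x,y)-f_J(x,y)] = 0$ by the definition of $f_J$; hence the last term equals $0$. Therefore, $\|f-g\|_2^2 = \|f-f_J\|_2^2+\|f_J-g\|_2^2\ge\|f-f_J\|_2^2$ and the claim follows. ∎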
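Proposition 2.4 can be checked numerically on a small cube. The sketch below, with hypothetical helper names and 0-based coordinates, builds the $J$-junta projection by averaging $f$ over all inputs that agree on $J$, draws a random function $g$ that depends only on the coordinates in $J$, and compares both sides of the Pythagorean identity from the proof. Functions are stored as dictionaries from input tuples to values, which is only practical for small $n$.

```python
import itertools
import random


def cube(n):
    """All points of {0,1}^n as tuples."""
    return list(itertools.product((0, 1), repeat=n))


def junta_projection(f, n, J):
    """f_J(z): average of f over all inputs that agree with z on the coordinates in J."""
    sums, counts = {}, {}
    for z in cube(n):
        key = tuple(z[i] for i in J)
        sums[key] = sums.get(key, 0.0) + f[z]
        counts[key] = counts.get(key, 0) + 1
    proj = {}
    for z in cube(n):
        key = tuple(z[i] for i in J)
        proj[z] = sums[key] / counts[key]
    return proj


def random_junta(n, J, rng):
    """A random function {0,1}^n -> [0,1] whose value depends only on the coordinates in J."""
    table, g = {}, {}
    for z in cube(n):
        key = tuple(z[i] for i in J)
        if key not in table:
            table[key] = rng.random()
        g[z] = table[key]
    return g


def sq_dist(f, g):
    """||f - g||_2^2 under the uniform distribution on the common domain."""
    return sum((f[z] - g[z]) ** 2 for z in f) / len(f)
```

With `n = 4`, `J = [0, 2]`, a random `f`, and `g = random_junta(n, J, rng)`, the quantity `sq_dist(f, g)` matches `sq_dist(f, fJ) + sq_dist(fJ, g)` up to rounding, and projecting a junta on `J` leaves it unchanged.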

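The property $\mathcal{P}$ is a property of $k$-juntas if every function $f\in\mathcal{P}$ is a $k$-junta. The core property of a property $\mathcal{P}$ of $k$-juntas is the property $\mathcal{P}_{\mathrm{core}}$ of functions mapping $\{0,1\}^k$ to $[0,1]$ defined by $\mathcal{P}_{\mathrm{core}} = \{f_{\mathrm{core}} : f\in\mathcal{P} \text{ and } f_{\mathrm{core}} \text{ is a core function of } f\}$. For any $\delta>0$, the $\delta$-discretized approximation of a function $f$ is the function $f^{(\delta)}$ obtained by rounding the value $f(x)$ for each $x$ to the nearest multiple of $\delta$. The $\delta$-discretized approximation of a property $\mathcal{P}$ is the property $\mathcal{P}^{(\delta)} = \{f^{(\delta)} : f\in\mathcal{P}\}$.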
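Rounding to a grid moves every value by at most $\delta/2$, so a $\delta$-discretized approximation is within $\delta/2$ of the original function in every $\ell_p$ distance, and a discretized property of functions on $\{0,1\}^k$ is a finite set. The sketch below makes both points concrete under a simplifying assumption of ours, namely $\delta = 1/M$ for a positive integer $M$ (so rounded values stay in $[0,1]$); the function names are hypothetical. The count $(M+1)^{2^k}$ is the kind of finiteness that makes a union bound over a discretized core property possible.

```python
import itertools


def discretize(value, M):
    """Round a value in [0, 1] to the nearest multiple of delta = 1/M (M a positive integer)."""
    return round(value * M) / M


def grid_function_count(k, M):
    """Number of functions {0,1}^k -> {0, 1/M, ..., 1}."""
    return (M + 1) ** (2 ** k)


def enumerate_grid_functions(k, M):
    """Yield every function {0,1}^k -> {0, 1/M, ..., 1} as a dict (brute force, tiny k only)."""
    inputs = list(itertools.product((0, 1), repeat=k))
    for outputs in itertools.product(range(M + 1), repeat=len(inputs)):
        yield {x: o / M for x, o in zip(inputs, outputs)}
```

Python's `round` breaks exact ties toward the even integer; either choice at a tie still moves the value by exactly $\delta/2$, so the bound is unaffected.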

### 2.3 Influence

The notion of influence of coordinates in functions over the Boolean hypercube plays a central role in both our algorithm and its analysis. Informally, the influence of a set of coordinates measures how much re-randomizing these coordinates affects the value of the function. This notion is made precise as follows.

###### Definition 2.5.

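The influence of a set $S\subseteq[n]$ of coordinates in the function $f\colon\{0,1\}^n\to[0,1]$ is

$$\mathrm{Inf}_f(S) := \mathbb{E}_{x\in\{0,1\}^{\bar S}}\Big[\operatorname{Var}_{y\in\{0,1\}^S} f(x,y)\Big] = \frac12\,\mathbb{E}_{x\in\{0,1\}^{\bar S}}\Big[\mathbb{E}_{y,y'\in\{0,1\}^S}\big[(f(x,y)-f(x,y'))^2\big]\Big].$$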
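For small $n$, the two expressions in Definition 2.5 can be evaluated exactly by enumeration, together with the standard Fourier expression $\mathrm{Inf}_f(S) = \sum_{T\cap S\neq\emptyset}\hat f(T)^2$, where $\hat f(T) = \mathbb{E}_x\big[f(x)(-1)^{\sum_{i\in T}x_i}\big]$. The sketch below (hypothetical helper names, 0-based coordinates, functions stored as dictionaries over $\{0,1\}^n$) is a brute-force check that the three quantities agree.

```python
import itertools
import random


def groups_by_complement(f, n, S):
    """Group the values of f by the bits outside S, i.e. by x in {0,1}^(complement of S)."""
    groups = {}
    for z, v in f.items():
        key = tuple(z[i] for i in range(n) if i not in S)
        groups.setdefault(key, []).append(v)
    return list(groups.values())


def inf_variance(f, n, S):
    """E_x [ Var_y f(x, y) ]; every group has 2^|S| members, so groups are weighted equally."""
    groups = groups_by_complement(f, n, set(S))
    total = 0.0
    for vals in groups:
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals) / len(vals)
    return total / len(groups)


def inf_pairs(f, n, S):
    """(1/2) E_x E_{y, y'} (f(x, y) - f(x, y'))^2, averaging over all ordered pairs (y, y')."""
    groups = groups_by_complement(f, n, set(S))
    total = 0.0
    for vals in groups:
        total += 0.5 * sum((a - b) ** 2 for a in vals for b in vals) / len(vals) ** 2
    return total / len(groups)


def inf_fourier(f, n, S):
    """Sum of squared Fourier coefficients hat f(T) over all T that intersect S."""
    S = set(S)
    total = 0.0
    for T in itertools.product((0, 1), repeat=n):  # indicator vector of T
        if not any(T[i] for i in S):
            continue
        coeff = sum(v * (-1) ** sum(z[i] * T[i] for i in range(n)) for z, v in f.items())
        total += (coeff / len(f)) ** 2
    return total
```

Taking $S = [n]$ recovers $\operatorname{Var}[f]$ from all three functions (the Fourier version by Parseval's identity), and $S=\emptyset$ gives $0$.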

Our proofs make use of a few standard facts regarding the influence of sets of coordinates in $f$.

###### Fact 2.6.

The influence of $S\subseteq[n]$ in $f$ is $\mathrm{Inf}_f(S) = \sum_{T\subseteq[n]:\,T\cap S\neq\emptyset}\hat f(T)^2$.

###### Fact 2.7.

For every and , we have

###### Fact 2.8.

For every $f\colon\{0,1\}^n\to[0,1]$ and $S\subseteq[n]$, we have $\mathrm{Inf}_f(S) = \|f-f_{\bar S}\|_2^2$.

###### Proposition 2.9.

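Fix $\epsilon>0$, and let $f,g\colon\{0,1\}^n\to[0,1]$ satisfy $\|f-g\|_2\le\epsilon$. Then for any set $S\subseteq[n]$, $\big|\mathrm{Inf}_f(S)^{1/2}-\mathrm{Inf}_g(S)^{1/2}\big|\le\epsilon$.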

###### Proof.

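By Fact 2.8, we have $\mathrm{Inf}_f(S) = \|f-f_{\bar S}\|_2^2$ and $\mathrm{Inf}_g(S) = \|g-g_{\bar S}\|_2^2$. By Proposition 2.4, we also have that $\|g-g_{\bar S}\|_2\le\|g-f_{\bar S}\|_2$. Combining these observations with the triangle inequality, we obtain

$$\mathrm{Inf}_g(S)^{1/2} = \|g-g_{\bar S}\|_2\le\|g-f_{\bar S}\|_2\le\|g-f\|_2+\|f-f_{\bar S}\|_2\le\epsilon+\mathrm{Inf}_f(S)^{1/2}.$$

Hence $\mathrm{Inf}_g(S)^{1/2}-\mathrm{Inf}_f(S)^{1/2}\le\epsilon$ and, similarly, $\mathrm{Inf}_f(S)^{1/2}-\mathrm{Inf}_g(S)^{1/2}\le\epsilon$ as well. ∎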
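Proposition 2.9, and the identity $\mathrm{Inf}_f(S) = \|f-f_{\bar S}\|_2^2$ that underlies its proof, can be sanity-checked on a small cube. The sketch below uses hypothetical helper names: it computes the influence from the pairwise form of Definition 2.5, compares it with the squared $\ell_2$ distance to the function obtained by averaging out the coordinates in $S$, and checks that square roots of influences move by at most $\|f-g\|_2$ when $f$ is perturbed to $g$.

```python
import itertools
import math
import random


def cube(n):
    return list(itertools.product((0, 1), repeat=n))


def project_out(f, n, S):
    """f_{complement of S}: average f over the coordinates in S, keeping the others fixed."""
    keep = [i for i in range(n) if i not in set(S)]
    sums, counts = {}, {}
    for z in cube(n):
        key = tuple(z[i] for i in keep)
        sums[key] = sums.get(key, 0.0) + f[z]
        counts[key] = counts.get(key, 0) + 1
    proj = {}
    for z in cube(n):
        key = tuple(z[i] for i in keep)
        proj[z] = sums[key] / counts[key]
    return proj


def influence(f, n, S):
    """Inf_f(S) = (1/2) E_x E_{y,y'} (f(x,y) - f(x,y'))^2, by brute-force enumeration."""
    keep = [i for i in range(n) if i not in set(S)]
    groups = {}
    for z in cube(n):
        groups.setdefault(tuple(z[i] for i in keep), []).append(f[z])
    total = sum(
        0.5 * sum((a - b) ** 2 for a in vals for b in vals) / len(vals) ** 2
        for vals in groups.values()
    )
    return total / len(groups)


def l2(f, g):
    """||f - g||_2 under the uniform distribution."""
    return math.sqrt(sum((f[z] - g[z]) ** 2 for z in f) / len(f))
```

A typical check perturbs a random `f` by at most $0.1$ per point (clipping to $[0,1]$) and compares `abs(sqrt(influence(f, n, S)) - sqrt(influence(g, n, S)))` with `l2(f, g)`.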

###### Proposition 2.10.

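There is an algorithm EstimateInf such that for every $f\colon\{0,1\}^n\to[0,1]$, $S\subseteq[n]$, $m\ge1$, and $t>0$, it makes $O(m)$ queries to $f$ and returns an estimate $\textsc{EstimateInf}(f,S,m)$ of the influence of $S$ in $f$ that satisfies

$$\Pr\big[|\mathrm{Inf}_f(S)-\textsc{EstimateInf}(f,S,m)|\ge t\big]\le 2e^{-2mt^2}.$$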
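One natural way to get an estimator with this kind of guarantee, sketched below as an assumption rather than as the paper's exact procedure, is to average $m$ independent samples of $\frac12(f(x,y)-f(x,y'))^2$ with $x$, $y$, $y'$ uniform. By Definition 2.5 each sample has expectation $\mathrm{Inf}_f(S)$, and each sample lies in $[0,\frac12]$ because $f$ takes values in $[0,1]$, so Hoeffding's inequality gives a tail bound of $2e^{-8mt^2}$, which is at most $2e^{-2mt^2}$. The sketch makes $2m$ queries; the name `estimate_inf` and the oracle interface (a Python callable on 0/1 tuples) are hypothetical.

```python
import random


def estimate_inf(f, n, S, m, rng=random):
    """Hypothetical EstimateInf: unbiased estimate of Inf_f(S) from 2m oracle queries.

    Each sample draws a uniform z in {0,1}^n, re-randomizes the coordinates in S
    to get z', and records (f(z) - f(z'))^2 / 2, whose mean is Inf_f(S).
    """
    total = 0.0
    for _ in range(m):
        z = [rng.randint(0, 1) for _ in range(n)]
        z_prime = list(z)
        for i in S:
            z_prime[i] = rng.randint(0, 1)  # fresh, independent bits on S
        total += 0.5 * (f(tuple(z)) - f(tuple(z_prime))) ** 2
    return total / m
```

For the dictator $f(z) = z_1$ (coordinate `0` in the code), the influence of that coordinate is $1/4$ and the estimate concentrates around it, while any set of other coordinates gets an estimate of exactly $0$.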

We also use the following key lemma from [6].

###### Lemma 2.11 (Lemma 2.3 in [6]).

Let be a function that is -far from -juntas and be a random partition of into parts. Then with probability at least , for any union of parts from .

For the reader’s convenience, we include the proof of Lemma 2.11 in Appendix A; though the original lemma in [6] was only for Boolean-valued functions, the proof remains essentially unchanged.

## 3 Testing by implicit learning

The proof of Theorem 1.3 is established by analyzing the Implicit Learning Tester algorithm.

### 3.1 Proof of Theorem 1.3

The analysis of the Implicit Learning Tester relies on two technical lemmas. The first shows that when the input function is close to a $k$-junta, then with reasonably large probability, the function is close to a junta on the set of parts that is identified by the algorithm.

###### Lemma 3.1.

For any $\varepsilon>0$, if the function $f$ is $\varepsilon$-close to a $k$-junta and every call to EstimateInf returns an influence estimate with additive error at most , then the set $S_B$ obtained by the Implicit Learning Tester satisfies

The second lemma shows that the estimate in Step 20 provides a good estimate of the distance between and the functions in .

###### Lemma 3.2.

Fix . Let be a function that satisfies for some function that is a junta on , . Then for every , the mapping defined in the Implicit Learning Tester and the function satisfy

$$\left|\left(\frac1q\sum_{i=1}^q\big(f(x^{(i)})-h(x^{(i)})\big)^2\right)^{1/2}-\mathrm{dist}_2(g,h)\right|\le3\varepsilon$$

except with probability at most .
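The statistic in Lemma 3.2 is an empirical root-mean-square difference over the $q$ initial queries. The sketch below computes it for independent uniform samples and compares it with the exact $\ell_2$ distance on a small cube. It only illustrates the statistic, under an assumption that differs from the lemma's setting: here the samples are uniform, whereas the lemma has to handle the non-uniform distribution that the tester induces on the core, and it compares with $\mathrm{dist}_2(g,h)$ for a nearby junta $g$ rather than with $\mathrm{dist}_2(f,h)$. The function names are hypothetical.

```python
import itertools
import math
import random


def empirical_l2(f, h, samples):
    """(1/q * sum_i (f(x_i) - h(x_i))^2)^(1/2) over the q given samples."""
    return math.sqrt(sum((f(x) - h(x)) ** 2 for x in samples) / len(samples))


def exact_l2(f, h, n):
    """dist_2(f, h) under the uniform distribution on {0,1}^n, by enumeration."""
    points = list(itertools.product((0, 1), repeat=n))
    return math.sqrt(sum((f(x) - h(x)) ** 2 for x in points) / len(points))
```

For instance, with $f(x) = (x_1+x_2)/2$ and $h(x) = x_3/2$ on $\{0,1\}^6$, the exact distance is $1/2$, and $20{,}000$ uniform samples typically give an empirical value within a few thousandths of it.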

The proofs of these lemmas are presented in Sections 3.2 and 3.3. We now show how they are used to complete the proof of Theorem 1.3.

As a first observation, we note that by Hoeffding’s inequality and the union bound, all of the calls to EstimateInf have additive error at most except with probability at most . In the following, we assume that this condition holds and show how, when it does, the algorithm correctly accepts or rejects $f$ with probability at least .

###### Claim 3.3 (Completeness).

When is -close to the property of -juntas, the Implicit Learning Tester accepts with probability at least .

###### Proof.

First, by Lemma 3.1, the probability that $f$ is rejected in Step 17 is at most . In the rest of the proof, we will show that except with probability at most , there is a function for which the algorithm accepts in Step 20.

Let be a function that satisfies . Without loss of generality, we can assume that is a junta on . Let be the set of the junta variables of that are contained in the final parts selected by the algorithm. Again without loss of generality (by relabeling the input variables once again if necessary), we can assume that for some , and , for .

Define to be the mapping defined by where are representative coordinates from the remaining parts for which .

Let be the core of corresponding to the projection , and let be the discretized approximation to . Define . By our choice of , we have . In order to invoke Lemma 3.2, we now want to bound .

Let , be the discretized approximation of . Then and the triangle inequality implies that

$$\mathrm{dist}_2(f,h^*)\le\mathrm{dist}_2(f,g)+\mathrm{dist}_2(g,h^*)\le\frac{2\epsilon}{10^6}$$

and that

$$\mathrm{dist}_2(g,h)\le\mathrm{dist}_2(g,h^*)+\mathrm{dist}_2(h^*,h)\le\mathrm{dist}_2(h^*,h)+\frac{\epsilon}{10^6}.$$

Furthermore, since ,

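$$\begin{aligned}\mathrm{dist}_2(h^*,h) &= \mathbb{E}_x\Big[\big(h^*_{\mathrm{core}}(x_1,\ldots,x_k)-h^*_{\mathrm{core}}(x_1,\ldots,x_j,x_{i_1},\ldots,x_{i_{k-j}})\big)^2\Big]^{1/2}\\ &= \big(2\,\mathrm{Inf}_{h^*_{\mathrm{core}}}([k]\setminus[j])\big)^{1/2} = \big(2\,\mathrm{Inf}_{h^*}([n]\setminus[j])\big)^{1/2}.\end{aligned}$$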

By Proposition 2.9 and Lemma 3.1, except with probability at most ,

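$$\mathrm{Inf}_{h^*}([n]\setminus[j])^{1/2}\le\mathrm{Inf}_f([n]\setminus[j])^{1/2}+\mathrm{dist}_2(f,h^*)\le\mathrm{Inf}_f([n]\setminus S_B)^{1/2}+\frac{2\epsilon}{10^6}\le\frac{12\epsilon}{10^6}$$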

and the distance between and is bounded by . When this bound holds, by Lemma 3.2 with , the algorithm accepts for this except with probability at most . ∎

###### Claim 3.4 (Soundness I).

If is -far from being a -junta, then the Implicit Learning Tester rejects with probability at least .

###### Proof.

The initial partition is a random partition of with more than parts so, by Lemma 2.11, with probability at least , for any union of at most of these parts we have . When this is the case, the inclusion and the fact that is the complement of the union of some set of parts in the random partition imply that

$$\mathrm{Inf}_f([n]\setminus S_B)\ge\mathrm{Inf}_f(L_0)\ge\frac{\epsilon^2}{400}$$

and, under the assumed accuracy of EstimateInf calls, the algorithm rejects in Step 17. ∎

###### Claim 3.5 (Soundness II).

If is -close to a -junta, but is -far from , then the Implicit Learning Tester rejects with probability at least .

###### Proof.

Let be any -junta that satisfies . For any and any injective mapping , the function is in and so by the triangle inequality,

$$\mathrm{dist}_2\big(f,\mathcal{F}^{(\epsilon/1000)}\big)\ge\mathrm{dist}_2(f,\mathcal{F})-\frac{\epsilon}{1000}$$

and

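$$\mathrm{dist}_2(g,h)\ge\mathrm{dist}_2(f,h)-\mathrm{dist}_2(f,g)\ge\frac{99}{100}\epsilon-\frac{\epsilon}{1000}-\frac{\epsilon}{100}\ge\frac{97}{100}\epsilon.$$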

Then, by Lemma 3.2 and the union bound over all functions in , with probability at least , the condition in Step 20 is never satisfied and the algorithm rejects. ∎

To complete the proof of Theorem 1.3 in the case where $p=2$, consider now any property $\mathcal{P}$ that contains only functions which are -close to some $k$-junta. Let $\mathcal{P}'$ be the property that includes all $k$-juntas that are -close to $\mathcal{P}$. Claim 3.3 shows that the Implicit Learning Tester accepts every function in $\mathcal{P}$ with the desired probability, and Claims 3.4 and 3.5 show that it rejects all functions that are $\epsilon$-far from $\mathcal{P}$. We also note that the query complexity of the algorithm is at most , as claimed. Finally, the general result for testing when $p\neq2$ follows from Fact 2.1.

### 3.2 Proof of Lemma 3.1

Let $f$ be any function $\varepsilon$-close to a $k$-junta and assume without loss of generality (by relabeling the input variables if necessary) that $f$ is close to a junta on $[k]$. The definition of the partition in Step 3 means that it is a random partition of $[n]$. So by the union bound, the probability that any two of the coordinates in $[k]$ land in the same part is at most .

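For each $\ell\in\{0,1,\ldots,r\}$, let $L_\ell$ denote the set of variables that have been “eliminated” after $\ell$ iterations of the loop. Then $L_0\subseteq L_1\subseteq\cdots\subseteq L_r = [n]\setminus S_B$ and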

$$\mathrm{Inf}_f([n]\setminus S_B) = \mathrm{Inf}_f(L_0)+\sum_{\ell=1}^{r}\big(\mathrm{Inf}_f(L_\ell)-\mathrm{Inf}_f(L_{\ell-1})\big). \tag{1}$$

We bound both terms on the right-hand side of the expression separately.

By Proposition 2.9, we have $\mathrm{Inf}_f([n]\setminus[k])\le\varepsilon^2$ and so by the monotonicity of influence there is a choice of of size for which . The guaranteed accuracy of EstimateInf then implies that

$$\mathrm{Inf}_f(L_0)\le\Big(1+\frac{2}{100k^2}\Big)\varepsilon^2. \tag{2}$$

Define $E\subseteq[r]$ to be the set of rounds for which the algorithm eliminated at least one of the coordinates in $[k]$. By this definition, each $\ell\in[r]\setminus E$ satisfies $(L_\ell\setminus L_{\ell-1})\cap[k]=\emptyset$ and

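$$\sum_{\ell\in[r]\setminus E}\big(\mathrm{Inf}_f(L_\ell)-\mathrm{Inf}_f(L_{\ell-1})\big) = \sum_{\ell\in[r]\setminus E}\ \sum_{T:\,T\cap L_\ell\neq\emptyset,\ T\cap L_{\ell-1}=\emptyset}\hat f(T)^2\le\sum_{T:\,T\cap([n]\setminus[k])\neq\emptyset}\hat f(T)^2\le\mathrm{Inf}_f([n]\setminus[k])\le\varepsilon^2. \tag{3}$$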

For each $\ell\in E$, define to be the set of coordinates in the parts that contain a coordinate in $[k]$ that was eliminated in the $\ell$th iteration of the loop. Let also be the coordinates in the parts that were kept instead. Then the guaranteed accuracy of EstimateInf and the choice of imply that

 Inff(Lℓ)≤Inff(