# Quantum Algorithms for Learning and Testing Juntas

Chicago, IL 60603
Rocco A. Servedio Department of Computer Science
Columbia University
New York, NY 10027
July 20, 2007
###### Abstract

In this article we develop quantum algorithms for learning and testing juntas, i.e. Boolean functions which depend only on an unknown set of $k$ out of $n$ input variables. Our aim is to develop efficient algorithms:

• whose sample complexity has no dependence on $n$, the dimension of the domain the Boolean functions are defined over;

• with no access to any classical or quantum membership (“black-box”) queries. Instead, our algorithms use only classical examples generated uniformly at random and fixed quantum superpositions of such classical examples;

• which require only a few quantum examples but possibly many classical random examples (which are considered quite “cheap” relative to quantum examples).

Our quantum algorithms are based on a subroutine FS which enables sampling according to the Fourier spectrum of $f$; the FS subroutine was used in earlier work of Bshouty and Jackson on quantum learning. Our results are as follows:

• We give an algorithm for testing $k$-juntas to accuracy $\epsilon$ that uses $O(k/\epsilon)$ quantum examples. This improves on the number of examples used by the best known classical algorithm.

• We establish the following lower bound: any FS-based $k$-junta testing algorithm requires $\Omega(\sqrt{k})$ queries.

• We give an algorithm for learning $k$-juntas to accuracy $\epsilon$ that uses quantum examples and random examples. We show that this learning algorithm is close to optimal by giving a related lower bound.

juntas, quantum query algorithms, quantum property testing, computational learning theory, quantum computation, lower bounds
###### pacs:
03.67.-a, 03.67.Lx
Work done while at the Department of Mathematics, Columbia University, New York, NY 10027.

Supported in part by NSF award CCF-0347282, by NSF award CCF-0523664, and by a Sloan Foundation Fellowship.

## I Introduction

### I.1 Motivation

The field of computational learning theory deals with the abilities and limitations of algorithms that learn functions from data. Many models of how learning algorithms access data have been considered in the literature. Among these, two of the most prominent are via membership queries and via random examples. Membership queries are “black-box” queries; in a membership query, a learning algorithm submits an input $x$ to an oracle and receives the value of $f(x)$. In models of learning from random examples, each time the learning algorithm queries the oracle it receives a labeled example $(x, f(x))$ where $x$ is independently drawn from some fixed probability distribution over the space of all possible examples. (We give precise definitions of these, and all the learning models we consider, in Section II.)

In recent years a number of researchers have considered quantum variants of well-studied models in computational learning theory, see e.g. AKMPY (); AS05 (); BSHJA (); C06 (); HMPPR (); IKRY (); RSSG (). As we describe in Section II, models of learning from quantum membership queries and from fixed quantum superpositions of labeled examples (we refer to these as quantum examples) have been considered; such oracles have been studied in the context of quantum property testing as well BFNR (); FMSS (); MN (). One common theme in the existing literature on quantum computational learning and testing is that these works study algorithms whose only access to the function is via some form of quantum oracle such as the quantum membership oracle or quantum example oracles mentioned above. For instance, BSHJA () modifies the classical Harmonic Sieve algorithm of JACKSON () so that it uses only uniform quantum examples to learn DNF formulas. BFNR () considers the problem of quantum property testing using quantum membership queries to give an exponential separation between classical and quantum testers for certain concept classes. AS05 () studies the information-theoretic requirements of exact learning using quantum membership queries and Probably Approximately Correct (PAC) learning using quantum examples. Many other articles such as RSSG (); AKMPY (); HMPPR () could further extend this list.

As the problem of building large scale quantum computers remains a major challenge, it is natural to question the technical feasibility of large scale implementation of the quantum oracles considered in the literature. It is desirable to minimize the number of quantum (as opposed to classical) oracle queries or examples required by quantum algorithms. Thus motivated, in this paper we are interested in designing testing and learning algorithms with access to both quantum and classical sources of information (with the goal of minimizing the quantum resources required).

### I.2 Our results

All of our positive results are based on a quantum subroutine due to BSHJA (), which we will refer to as an FS (Fourier Sample) oracle call. As explained in Section II, a call to the FS oracle yields a subset of $\{1, \dots, n\}$ (this set should be viewed as a subset of the input variables $x_1, \dots, x_n$ of $f$) drawn according to the Fourier spectrum of the Boolean function $f$. As demonstrated by BSHJA (), such an oracle can be implemented using uniform quantum examples from a uniform distribution quantum example oracle. In fact, all of our algorithms will be purely classical apart from their use of the FS oracle. Thus, all of our algorithms can be implemented within the (uniform distribution) quantum PAC model first proposed by BSHJA (). This model is a natural quantum extension of the classical PAC model introduced by Valiant Val84 (), as described in Section II. We emphasize that no membership queries, classical or quantum, are used in our algorithms, only uniform quantum superpositions of labeled examples, and we recall that such uniform quantum examples cannot efficiently simulate even classical membership queries in general (see BSHJA ()).

Our approach of focusing only on the FS oracle allows us to abstract away from the intricacies of quantum computation, and renders our results useful in any setting in which an FS oracle can be provided to the user. In fact, learning and testing with FS oracle queries may be regarded as a new distinct model (which may possibly be weaker than the uniform distribution quantum example model).

We are primarily interested in the information theoretic requirements (i.e. the number of oracle calls needed) of the learning and testing problems that we discuss. We give upper and lower bounds for a range of learning and testing problems related to $k$-juntas; these are Boolean functions that depend only on (an unknown subset of) at most $k$ of the $n$ input variables $x_1, \dots, x_n$. Juntas have been the subject of intensive research in learning theory and property testing in recent years, see e.g. AR (); AR2 (); Blum (); CG04 (); FKRSS (); LMMV (); MOS04 ().

Our first result, in Section III, is a $k$-junta testing algorithm which uses $O(k/\epsilon)$ FS oracle calls. Our algorithm uses fewer queries than the best known classical junta testing algorithm due to Fischer et al. FKRSS (), which uses membership queries. However, since the best lower bound known for classical membership query based junta testing (due to Chockler and Gutfreund CG04 ()) is , our result does not rule out the possibility that there might exist a classical membership query algorithm with the same query complexity.

To complement our FS based testing algorithm, we establish a new lower bound: any $k$-junta testing algorithm that uses only a FS oracle requires $\Omega(\sqrt{k})$ calls to the FS oracle. This shows that our testing algorithm is not too far from optimal.

Finally, we consider algorithms that can both make FS queries and also access classical random examples. In Section IV we give an algorithm for learning $k$-juntas over $\{-1,1\}^n$ that uses FS queries and random examples. Since any classical learning algorithm requires examples (even if it is allowed to use membership queries), this result illustrates that it is possible to reduce the classical query complexity substantially (in particular, to eliminate the dependence on $n$) if the learning algorithm is also permitted to have some very limited quantum information. Moreover, most of the resource consumption of our algorithm comes from classical random examples, which are considered quite “cheap” relative to quantum examples. From another perspective, our result shows that for learning $k$-juntas, almost all the quantum examples used by the algorithm of Bshouty and Jackson BSHJA () can in fact be converted into ordinary classical random examples. We show that our algorithm is close to best possible by giving a nearly matching lower bound.

### I.3 Organization

In Section II we describe the models and problems we will consider and present some useful preliminaries from Fourier analysis and probability. Section III gives our results on testing juntas and Section IV gives our results on learning juntas.

## II Preliminaries

### II.1 The problems and the models

In keeping with standard terminology in learning theory, a concept over $\{-1,1\}^n$ is a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$, where $-1$ stands for True and $1$ stands for False. A concept class $\mathfrak{C} = \cup_{n \geq 1} C_n$ is a set of concepts where $C_n$ consists of those concepts in $\mathfrak{C}$ whose domain is $\{-1,1\}^n$. For ease of notation throughout the paper we will omit the subscript in $C_n$ and simply write $C$ to denote a collection of concepts over $\{-1,1\}^n$.

The concept class we will chiefly be interested in is the class of $k$-juntas. A Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ is a $k$-junta if $f$ depends only on $k$ out of its $n$ input variables.

#### II.1.1 The problems

We are interested in the following computational problems:

PAC Learning under the uniform distribution:

Given any target concept $f \in C$, an $(\epsilon, \delta)$-learning algorithm for concept class $C$ under the uniform distribution outputs a hypothesis function $h$ which, with probability at least $1 - \delta$, agrees with $f$ on at least a $1 - \epsilon$ fraction of the inputs in $\{-1,1\}^n$. This is a widely studied framework in the learning theory literature both in classical (see for instance KM (); JACKSON ()) and in quantum (see BSHJA ()) versions.

Property testing:

Let $f$ be any Boolean function $f: \{-1,1\}^n \to \{-1,1\}$. A property testing algorithm for concept class $C$ is an algorithm which, given access to $f$, behaves as follows:

• If $f \in C$ then the algorithm outputs Accept with probability at least ;

• If $f$ is $\epsilon$-far from any concept in $C$ (i.e. for every concept $g \in C$, $f$ and $g$ differ on at least an $\epsilon$ fraction of all inputs), then the algorithm outputs Reject with probability at least .

The notion of property testing was first developed by GGR () and RS96 (). Quantum property testing was first studied by Buhrman et al. BFNR (), who first gave an example of an exponential separation between the query complexity of classical and quantum testers for a particular concept class.

Note that a learning or testing algorithm for $C$ “knows” the class $C$ but does not know the identity of the concept $f$. While our primary concern is the number of oracle calls that our algorithms use, we are also interested in time efficient algorithms for testing and learning; for the concept class of $k$-juntas, these are algorithms running in poly time steps.

#### II.1.2 Classical oracles

In order for learning and testing algorithms to gather information about the unknown concept $f$, they need an information source called an oracle. The number of times an oracle is queried by an algorithm is referred to as the query complexity. Some of our algorithms will be allowed access to more than one type of oracle.

In this paper we will consider the following types of oracles that provide classical information:

Membership oracle $MQ(f)$:

For $f$ a Boolean function, a membership oracle $MQ(f)$ is an oracle which, when queried with input $x$, outputs the label $f(x)$ assigned by $f$ to $x$.

Uniform random example oracle $EX(f)$:

A query $EX(f)$ of the random example oracle returns an ordered pair $(x, f(x))$ where $x$ is drawn uniformly at random from the set $\{-1,1\}^n$ of all possible inputs.

Clearly a single call to an $MQ(f)$ oracle can simulate the random example oracle $EX(f)$. Indeed $EX(f)$ oracle queries are considered “cheap” compared to membership queries. For example, in many settings it is possible to obtain random labeled examples but impossible to obtain the label of a particular desired example (consider prediction problems dealing with phenomena such as weather or financial markets). We note that the set of concept classes that are known to be efficiently PAC learnable from uniform random examples only is rather limited, see e.g. KL (); MANSOUR (). In contrast, there are known efficient algorithms that use membership queries to learn important function classes such as DNF (Disjunctive Normal Form) formulas JACKSON ().

#### II.1.3 Quantum oracles

We will consider the following quantum oracles, which are the natural quantum generalizations of membership queries and uniform random examples respectively.

Quantum membership oracle $QMQ(f)$:

The quantum membership oracle $QMQ(f)$ is the quantum oracle whose query acts on the computational basis states as follows:

$$QMQ(f): |x, b\rangle \mapsto |x, b \cdot f(x)\rangle, \quad \text{where } x \in \{-1,1\}^n \text{ and } b \in \{-1,1\}.$$

Uniform quantum examples $QEX(f)$:

The uniform quantum example oracle $QEX(f)$ is the quantum oracle whose query acts on the computational basis state $|1^n, 1\rangle$ as follows:

$$QEX(f): |1^n, 1\rangle \mapsto \sum_{x \in \{-1,1\}^n} \frac{1}{2^{n/2}} |x, f(x)\rangle.$$

The action of a $QEX(f)$ query is undefined on other basis states, and an algorithm may only invoke the $QEX(f)$ query on the basis state $|1^n, 1\rangle$.

It is clear that a $QMQ$ oracle can simulate a $QEX$ oracle or an $MQ$ oracle, and a $QEX$ oracle can simulate an $EX$ oracle.

The model of PAC learning with a uniform quantum example oracle $QEX$ was introduced by Bshouty and Jackson in BSHJA (). Several researchers have also studied learning from a more powerful $QMQ$ oracle, see e.g. AKMPY (); AS05 (); IKRY (); RSSG (). Turning to property testing, we are not aware of prior work on quantum testing using only the $QEX$ oracle; instead researchers have considered quantum testing algorithms that use the more powerful $QMQ$ oracle, see e.g. BFNR (); FMSS (); MN ().

### II.2 Harmonic analysis of functions over $\{-1,1\}^n$

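We will make use of the Fourier expansion of real valued functions over $\{-1,1\}^n$. We write $[n]$ to denote the set of variables $\{x_1, \dots, x_n\}$.

Consider the set of real valued functions over $\{-1,1\}^n$ endowed with the inner product

$$\langle f, g \rangle = \mathbf{E}[fg] = \frac{1}{2^n} \sum_x f(x) g(x)$$

and induced norm $\|f\| = \sqrt{\langle f, f \rangle}$. For each $S \subseteq [n]$, let $\chi_S$ be the parity function $\chi_S(x) = \prod_{x_i \in S} x_i$. It is a well known fact that the $2^n$ functions $\{\chi_S\}_{S \subseteq [n]}$ form an orthonormal basis for the vector space of real valued functions over $\{-1,1\}^n$ with the above inner product. Consequently, every $f: \{-1,1\}^n \to \mathbb{R}$ can be expressed uniquely as:

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x)$$

which we refer to as the Fourier expansion or Fourier transform of $f$. Alternatively, the values $\{\hat{f}(S) : S \subseteq [n]\}$ are called the Fourier coefficients or the Fourier spectrum of $f$.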

Parseval’s Identity, which is an easy consequence of orthonormality of the basis functions, relates the values of the coefficients to the values of the function:

###### Lemma II.1 (Parseval’s Identity)

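For any $f: \{-1,1\}^n \to \mathbb{R}$, we have $\sum_{S \subseteq [n]} \hat{f}(S)^2 = \mathbf{E}[f^2]$. Thus for a Boolean valued function $f$, $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$.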
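For small $n$ the Fourier expansion and Parseval's Identity can be checked by brute force. The following Python sketch is only an illustration and is not part of the paper's algorithms; the helper name `fourier_coefficients`, the 0-based variable indices, and the choice of the 3-bit majority function as the example are all assumptions made here.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {-1,1}^n -> R, with S a frozenset of indices."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = 0
            for x in points:
                chi = 1  # parity function chi_S(x) = product of x_i for i in S
                for i in subset:
                    chi *= x[i]
                total += f(x) * chi
            # f_hat(S) = <f, chi_S> = E_x[f(x) * chi_S(x)]
            coeffs[frozenset(subset)] = total / len(points)
    return coeffs


def majority3(x):
    # Hypothetical example function: majority vote of three +/-1 inputs.
    return 1 if sum(x) > 0 else -1


coeffs = fourier_coefficients(majority3, 3)
# Parseval: the squared coefficients of a Boolean-valued function sum to 1.
parseval_sum = sum(c * c for c in coeffs.values())
```

The enumeration takes time exponential in $n$, so it is useful only as a sanity check on toy functions.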

We will use the following simple and well-known fact:

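###### Fact II.2 (See KM ())

For any $f: \{-1,1\}^n \to \{-1,1\}$ and any $g: \{-1,1\}^n \to \mathbb{R}$, we have

$$\Pr_x[f(x) \neq \mathrm{sgn}(g(x))] \leq \mathbf{E}_x[(f(x) - g(x))^2] = \sum_{S \subseteq [n]} |\hat{f}(S) - \hat{g}(S)|^2.$$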

Recall that the influence of a variable $x_i$ on a Boolean function $f$ is the probability (taken over a uniform random input $x$ for $f$) that $f$ changes its value when the $i$-th bit of $x$ is flipped, i.e.

$$\mathrm{Inf}_i(f) = \Pr_x[f(x_{i \leftarrow -1}) \neq f(x_{i \leftarrow 1})].$$
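The influence can be computed directly from its definition for small $n$. The sketch below does so and also compares it numerically with the total Fourier weight on sets containing the variable, using the standard identity $\mathrm{Inf}_i(f) = \sum_{S \ni x_i} \hat{f}(S)^2$ for Boolean-valued $f$. The function names and the 3-variable `selector` example are hypothetical and chosen here only for illustration.

```python
from itertools import combinations, product
from math import prod


def influence(f, n, i):
    """Fraction of inputs x on which flipping coordinate i changes f(x)."""
    changed = 0
    for x in product((-1, 1), repeat=n):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            changed += 1
    return changed / 2 ** n


def fourier_weight_on(f, n, i):
    """Sum of squared Fourier coefficients over all sets S containing index i."""
    points = list(product((-1, 1), repeat=n))
    weight = 0.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if i in subset:
                coeff = sum(f(x) * prod(x[j] for j in subset) for x in points) / len(points)
                weight += coeff ** 2
    return weight


def selector(x):
    # Hypothetical example: x[1] selects between x[0] and x[2].
    return x[0] if x[1] == 1 else x[2]


influences = [influence(selector, 3, i) for i in range(3)]
weights = [fourier_weight_on(selector, 3, i) for i in range(3)]
```

For this example both lists agree, which is what the identity predicts.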

It is well known (see e.g. KKL ()) that $\mathrm{Inf}_i(f) = \sum_{S \ni x_i} \hat{f}(S)^2$.

###### Fact II.3 (Data Processing Inequality)

Let $X_1, X_2$ be two random variables over the same domain. For any (possibly randomized) algorithm $A$, one has that

$$\|A(X_1) - A(X_2)\|_1 \leq \|X_1 - X_2\|_1.$$

Let $X_1, X_2$ be random variables corresponding to sequences of draws taken from two different distributions over the same domain. By the above inequality, if $\|X_1 - X_2\|_1$ is known to be small, then the probability of success must be small for any algorithm designed to distinguish whether the draws are made according to $X_1$ or $X_2$.

We will also use standard Chernoff bounds on tails of sums of independent random variables:

Let be i.i.d. random variables with mean taking values in the range . Then for all we have .

### II.4 The Fourier sampling oracle: FS

###### Definition II.5

Let $f: \{-1,1\}^n \to \{-1,1\}$ be a Boolean function. The Fourier sampling oracle $FS(f)$ is the classical oracle which, at each invocation, returns each subset of variables $S \subseteq \{x_1, \dots, x_n\}$ with probability $\hat{f}(S)^2$, where $\hat{f}(S)$ denotes the Fourier coefficient corresponding to $\chi_S$ as defined in Section II.2.

This oracle will play an important role in our algorithms. Note that by Parseval’s Identity we have $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$ so the probability distribution over sets $S$ indeed has total weight 1.
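To make the definition concrete, the following sketch simulates an FS oracle classically by computing the whole spectrum by brute force and then sampling from it. This is not the quantum QSAMP procedure and takes time exponential in $n$; it is only a stand-in for experimenting with FS-based algorithms on toy functions. The names `fourier_spectrum` and `make_fs_oracle`, the 0-based indices, and the example parity function are assumptions introduced here.

```python
import random
from itertools import combinations, product
from math import prod


def fourier_spectrum(f, n):
    """Map each set S with f_hat(S) != 0 to its weight f_hat(S)^2."""
    points = list(product((-1, 1), repeat=n))
    spectrum = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            coeff = sum(f(x) * prod(x[i] for i in subset) for x in points) / len(points)
            if coeff != 0:
                spectrum[frozenset(subset)] = coeff ** 2
    return spectrum


def make_fs_oracle(f, n, rng):
    """Return a zero-argument function that samples S with probability f_hat(S)^2."""
    spectrum = fourier_spectrum(f, n)
    subsets = list(spectrum)
    weights = [spectrum[s] for s in subsets]

    def fs():
        return rng.choices(subsets, weights=weights, k=1)[0]

    return fs


def parity_02(x):
    # Hypothetical example: parity of variables 0 and 2 among n = 4 variables.
    return x[0] * x[2]


fs = make_fs_oracle(parity_02, 4, random.Random(0))
draws = [fs() for _ in range(20)]
```

Because a parity function has a single nonzero Fourier coefficient, every draw in this example returns the same set of variables.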

In BSHJA () Bshouty and Jackson describe a simple constant-size quantum network QSAMP, which has its roots in an idea from BV97 (). QSAMP allows sampling from the Fourier spectrum of a Boolean function using $QEX$ oracle queries:

###### Fact II.6 (See BSHJA ())

For any Boolean function $f$, it is possible to simulate a draw from the $FS(f)$ oracle with probability using queries to $QEX(f)$.

All the algorithms we describe are actually classical algorithms that make FS queries.

## III Testing juntas

Fischer et al. FKRSS () studied the problem of testing juntas given black-box access (i.e., classical membership query access) to the unknown function $f$ using harmonic analysis and probabilistic methods. They gave several different algorithms with query complexity independent of $n$, the most efficient of which yields the following:

###### Theorem III.1 (See FKRSS (), Theorem 6)

There is an algorithm that tests whether an unknown $f$ is a $k$-junta using membership queries.

Fischer et al. also gave a lower bound on the number of queries required for testing juntas, which was subsequently improved by Chockler et al. to the following:

###### Theorem III.2 (See CG04 ())

Any algorithm that tests whether $f$ is a $k$-junta or is $\epsilon$-far from every $k$-junta must use membership queries.

We emphasize that both of these results concern algorithms with classical membership query access.

### III.1 A testing algorithm using $O(k/\epsilon)$ FS oracle calls

In this section we describe a new testing algorithm that uses the FS oracle and prove the following theorem about its performance:

###### Theorem III.3

There is an algorithm that tests the property of being a $k$-junta using $O(k/\epsilon)$ calls to the FS oracle.

As described in Section II, the algorithm can thus be implemented using $O(k/\epsilon)$ uniform quantum examples from $QEX(f)$.

Proof: Consider the following algorithm $A$ which has FS oracle access to an unknown function $f$. Algorithm $A$ first makes $O(k/\epsilon)$ calls to the FS oracle; let $W$ denote the union of all the sets of variables received as responses to these oracle calls. Algorithm $A$ then outputs “Accept” if $|W| \leq k$ and outputs “Reject” if $|W| > k$.

It is clear that if $f$ is a $k$-junta then the algorithm outputs “Accept” with probability 1. To prove correctness of the test it suffices to show that if $f$ is $\epsilon$-far from any $k$-junta then the algorithm outputs “Reject” with sufficiently high probability.

The argument is similar to the standard analysis of the coupon collector’s problem. Let us view the set of variables collected so far as growing incrementally step by step as successive calls to the FS oracle are performed.

Let $X_i$ be a random variable which denotes the number of FS queries that take place starting immediately after the $(i-1)$-st new variable is added to the collected set, up through the draw when the $i$-th new variable is added. If the $(i-1)$-st and $i$-th new variables are obtained in the same draw then $X_i = 0$. (For example, if the first three queries to the FS oracle return $\{x_1, x_2, x_3\}$, $\{x_1, x_2\}$, $\{x_1, x_6, x_7\}$, then we would have $X_1 = 1$, $X_2 = 0$, $X_3 = 0$, $X_4 = 2$, $X_5 = 0$.)

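Since $f$ is $\epsilon$-far from any $k$-junta, we know that for any set $T$ of $k$ variables, it must be the case that

$$\sum_{S \subseteq T} \hat{f}(S)^2 \leq 1 - \epsilon$$

(since otherwise if we set $g = \sum_{S \subseteq T} \hat{f}(S) \chi_S$ and $h = \mathrm{sgn}(g)$ and use Fact II.2, we would have

$$\Pr_x[f(x) \neq h(x)] \leq \mathbf{E}_x[(f(x) - g(x))^2] = \sum_{S \not\subseteq T} \hat{f}(S)^2 < \epsilon$$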

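which contradicts the fact that $f$ is $\epsilon$-far from any $k$-junta, because $h$ depends only on the variables in $T$). It follows that for each $i \leq k+1$, if at the current stage of the construction at most $k$ variables have been collected, then the probability that the next FS query yields a new variable outside of the collected set is at least $\epsilon$. Consequently we have $\mathbf{E}[X_i] \leq 1/\epsilon$ for each $1 \leq i \leq k+1$, and hence

$$\mathbf{E}[X_1 + \cdots + X_{k+1}] \leq \frac{k+1}{\epsilon}.$$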

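By Markov’s inequality, for any constant $c > 1$ the probability that $X_1 + \cdots + X_{k+1} \leq c(k+1)/\epsilon$ is at least $1 - 1/c$, and therefore with probability at least $1 - 1/c$ it will be the case after $c(k+1)/\epsilon$ draws that more than $k$ variables have been collected, and the algorithm will consequently output “Reject.”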

Note that the $O(k/\epsilon)$ uniform quantum examples required by the algorithm improve on the query complexity of the best known classical algorithm. However, our result does not conclusively show that $QEX$ queries are more powerful than classical membership queries for this problem, since it is conceivable that there could exist an as yet undiscovered classical membership query algorithm with the same query complexity.
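The tester described in the proof of Theorem III.3 can be sketched in a few lines once an FS oracle is available. In the sketch below the oracle is a brute-force classical simulation (exponential in $n$, not the quantum implementation), and the number of draws $\lceil c(k+1)/\epsilon \rceil$ with the constant `c = 3` is an assumption chosen here for illustration; the paper's exact constant is not reproduced. The names `make_fs_oracle`, `junta_tester`, and `parity3` are hypothetical.

```python
import math
import random
from itertools import combinations, product


def make_fs_oracle(f, n, rng):
    """Brute-force classical stand-in for FS(f): samples S with probability f_hat(S)^2."""
    points = list(product((-1, 1), repeat=n))
    subsets, weights = [], []
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            coeff = sum(f(x) * math.prod(x[i] for i in subset) for x in points) / len(points)
            if coeff != 0:
                subsets.append(frozenset(subset))
                weights.append(coeff ** 2)
    return lambda: rng.choices(subsets, weights=weights, k=1)[0]


def junta_tester(fs, k, epsilon, c=3):
    """Accept iff at most k distinct variables appear in ceil(c*(k+1)/epsilon) FS draws."""
    num_calls = math.ceil(c * (k + 1) / epsilon)
    seen = set()
    for _ in range(num_calls):
        seen |= fs()  # union of all variable sets returned so far
        if len(seen) > k:
            return "Reject"  # stopping early does not change the final answer
    return "Accept"


def parity3(x):
    # Hypothetical example: parity of the first three of n = 5 variables.
    return x[0] * x[1] * x[2]


fs = make_fs_oracle(parity3, 5, random.Random(0))
verdict_k3 = junta_tester(fs, k=3, epsilon=0.1)  # parity3 is a 3-junta
verdict_k2 = junta_tester(fs, k=2, epsilon=0.1)  # parity3 is far from every 2-junta
```

A $k$-junta is always accepted because every set with nonzero Fourier weight lies inside its relevant variables, which matches the one-sided error noted in the proof.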

### III.2 Lower bounds for testing with a FS oracle

#### III.2.1 A first approach

As a first attempt to obtain a lower bound on the number of FS oracle calls required to test -juntas, it is natural to consider the approach of Chockler et al. from CG04 (). To prove Theorem III.2, Chockler et al. show that any classical algorithm which can successfully distinguish between the following two probability distributions over black-box functions must use queries:

• Scenario I: The distribution $D^{(0)}_{k,n}$ is uniform over the set of all Boolean functions over $n$ variables which do not depend on variables $x_{k+2}, \dots, x_n$.

• Scenario II: The second distribution is defined as follows: to draw a function $f$ from this distribution, first an index $i$ is chosen uniformly from $\{1, \dots, k+1\}$, and then $f$ is chosen uniformly from among those functions that do not depend on variables $x_{k+2}, \dots, x_n$ or on variable $x_i$.

The following observation shows that this approach will not yield a strong lower bound for algorithms that have access to a FS oracle:

###### Observation III.4

With queries to a FS oracle, it is possible to determine w.h.p. whether a function is drawn from Scenario I or Scenario II.

Proof: It is easy to see that a function drawn from Scenario I is simply a random function on the first $k+1$ variables. The Fourier spectrum of random Boolean functions is studied in OS03 (), where it is shown that sums of squares of Fourier coefficients of random Boolean functions are tightly concentrated around their expected value. In particular, Proposition 6 of OS03 () directly implies a bound, for any fixed variable $x_i$ with $i \leq k+1$, on:

$$\Pr_{f \leftarrow D^{(0)}_{k,n}}\left[\sum_{S \ni x_i} \hat{f}(S)^2 > \frac{1}{3}\right]$$

Thus with overwhelmingly high probability, if $f$ is drawn from Scenario I then each FS query will “expose” variable $x_i$ with probability at least $1/3$. It follows that after $O(\log k)$ queries all $k+1$ variables will have been exposed; so by making $O(\log k)$ FS queries and simply checking whether or not $k+1$ variables have been exposed, one can determine w.h.p. whether $f$ is drawn from Scenario I or Scenario II.
Thus we must adopt a more sophisticated approach to prove a strong lower bound on FS oracle algorithms.

#### III.2.2 An $\Omega(\sqrt{k})$ lower bound for FS oracle algorithms

Our main result in this section is the following theorem:

###### Theorem III.5

Any algorithm that has FS oracle access to an unknown $f$ must use $\Omega(\sqrt{k})$ oracle calls to test whether $f$ is a $k$-junta.

Proof: Let $k$ be such that $k = r + 2^{r-1}$ for some positive integer $r$. We let $R$ denote $2^r$. The addressing function on $r + R$ variables has $r$ “addressing variables,” which we shall denote $x_1, \dots, x_r$, and $R$ “addressee variables” which we denote $z_0, \dots, z_{R-1}$. The output of the function is the value of variable $z_x$ where the “address” $x$ is the element of $\{0, \dots, R-1\}$ whose binary representation is given by $x_1 \dots x_r$. Figure 1 depicts a decision tree that computes the addressing function in the case . Formally, the Addressing function is defined as follows:

$$\mathrm{Addressing}(x_1, x_2, \dots, x_r, z_0, z_1, \dots, z_{R-1}) = z_x, \quad \text{where } x = \left(\frac{1 - x_1}{2}\right) \circ \left(\frac{1 - x_2}{2}\right) \circ \cdots \circ \left(\frac{1 - x_r}{2}\right)$$

in binary form and $\circ$ is binary concatenation.

Intuitively, the Addressing function will be useful for us because, as we will see, the Fourier spectrum is “spread out” over the $R$ addressee variables; this will make it difficult to distinguish the Addressing function (which is not a $k$-junta since $r + R > k$ and as we shall see is in fact far from every $k$-junta) from a variant which is a $k$-junta.
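The claim that the Fourier weight of the Addressing function is spread evenly over the addressee variables can be checked numerically for a small case. The sketch below uses $r = 2$ (so $R = 4$ and six variables in total) and 0-based indices, with the first $r$ coordinates playing the role of the addressing variables; these choices, and the names `addressing` and `nonzero`, are assumptions made here for illustration.

```python
from itertools import combinations, product
from math import prod


def addressing(x, z):
    """x: r values in {-1,1} (the address bits); z: 2**r values in {-1,1}."""
    address = 0
    for bit in x:
        # (1 - bit) // 2 maps +1 -> 0 and -1 -> 1; x[0] is the most significant bit.
        address = 2 * address + (1 - bit) // 2
    return z[address]


r = 2
R = 2 ** r
n = r + R


def f(point):
    return addressing(point[:r], point[r:])


points = list(product((-1, 1), repeat=n))
nonzero = {}
for size in range(n + 1):
    for subset in combinations(range(n), size):
        coeff = sum(f(p) * prod(p[i] for i in subset) for p in points) / len(points)
        if coeff != 0:
            nonzero[subset] = coeff
```

In this small case every nonzero coefficient has magnitude $1/R$ and its set contains exactly one addressee variable, so an FS draw reveals each addressee variable with the same probability $1/R$.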

Let be the variables that our Boolean functions are defined over. We now define two distributions , over functions on these variables.

The distribution is defined as follows: to make a draw from ,

1. First uniformly choose a subset of variables from ;

2. Next, replace the variables in the function

with the variables in (choosing the variables from in a uniformly random order). Return the resulting function.

Note that step (2) in the description of making a draw from above corresponds to placing the variables in uniformly at the leaves of the decision tree for Addressing (see Figure 1).

Equivalently, if we write to denote the following function over variables

a draw from is a function chosen uniformly at random from the set where ranges over all permutations of

It is clear that every function in (the support of ) depends on variables and thus is not a -junta. In fact, every function in is far from being a -junta:

###### Lemma III.6

Every that has nonzero probability under is -far from any -junta.

Proof: Fix any such and let be any -junta. It is clear that at least of the “addressee” variables of are not relevant variables for . For a fraction of all inputs to , the value of is determined by one of these addressee variables; on such inputs the error rate of relative to will be precisely

Fix any function in . We now give an expression for the Fourier representation of . The expression is obtained by viewing as a sum of subfunctions, one for each leaf of the decision tree, where each subfunction takes the appropriate nonzero value on inputs which reach the corresponding leaf and takes value 0 on all other inputs:

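$$f_\tau(x_1, \dots, x_r, y_0, \dots, y_{n-r-1}) = \sum_{i = i_1 i_2 \dots i_r = 0}^{R-1} y_{\tau(i)} \left(\frac{1 + (-1)^{i_1} x_1}{2}\right) \left(\frac{1 + (-1)^{i_2} x_2}{2}\right) \cdots \left(\frac{1 + (-1)^{i_r} x_r}{2}\right) \tag{III.2}$$

$$= \frac{1}{2^r} \sum_{i=0}^{R-1} \sum_{X \subseteq \{x_1, \dots, x_r\}} (-1)^{\left(\sum_{x_j \in X} i_j\right)} y_{\tau(i)} \chi_X. \tag{III.3}$$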

Note that whenever , the sum on the RHS of Equation (III.2) has precisely one non-zero term which is . This is because the rest of the terms are annihilated since in each of these terms there is some index such that which makes . Consequently this sum gives rise to exactly the Addressing function in Equation (III.1) which is defined as and consequently the equality in Equation (III.2) follows. Equation (III.3) follows easily from rearranging (III.2).

Now we turn to

The distribution is defined as follows: to make a draw from ,

1. First uniformly choose a subset of variables from ;

2. Next, replace the variables in the function

with the variables in (choosing the variables from in a uniformly random order).

3. Finally, for each do the following: if variable was used to replace variable in the previous step, let be a fresh uniform random value and replace variable with . Return the resulting function.

Observe that for any integer with binary expansion , we have that the binary expansion of is . Thus steps (2) and (3) in the description of making a draw from may be restated as follows in terms of the decision tree representation for Addressing:

• Place the variables randomly among the leaves of the decision tree with index less than .

• For each variable placed at the leaf with index above, throw a valued coin and place at the antipodal leaf location with index: .

Equivalently, if we write to denote the following function over variables

a draw from is a function chosen uniformly at random from the set where ranges over all permutations of and ranges over all of . It is clear that every function in depends on at most variables, and thus is indeed a -junta.

By considering the contribution to the Fourier spectrum from each pair of leaves of the decision tree, we obtain the following expression for the Fourier expansion of each function in the support of :

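$$g_{\tau,s}(x_1, \dots, x_r, y_0, \dots, y_{n-r-1}) = \sum_{i = i_1 i_2 \dots i_r = 0}^{R/2-1} y_{\tau(i)} \left(\frac{1 + (-1)^{i_1} x_1}{2}\right) \cdots \left(\frac{1 + (-1)^{i_r} x_r}{2}\right) + \sum_{i=0}^{R/2-1} s_i y_{\tau(i)} \left(\frac{1 + (-1)^{\overline{i_1}} x_1}{2}\right) \left(\frac{1 + (-1)^{\overline{i_2}} x_2}{2}\right) \cdots \left(\frac{1 + (-1)^{\overline{i_r}} x_r}{2}\right) \tag{III.5}$$

$$= \frac{1}{2^r} \sum_{i=0}^{R/2-1} \sum_{X \subseteq \{x_1, \dots, x_r\}} (-1)^{\left(\sum_{x_j \in X} i_j\right)} \left(1 + (-1)^{|X|} s_i\right) y_{\tau(i)} \chi_X. \tag{III.6}$$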

Just as in the Equation (III.2), whenever , the sum on the RHS of Equation (III.5) has precisely one non-zero term which is if and if . Therefore this sum gives rise to exactly the Addressing function in Equation (III.4) which is defined as and consequently the equality in Equation (III.5) follows.

It follows that for each in the support of and for any fixed , all elements of the set will have the same parity. Moreover, when draws from are considered, for every distinct this odd/even parity is independent and uniformly random.

Now we are ready to prove Theorem III.5. Recall that a FS oracle query returns with probability for every subset of input variables to the function. Considering the equations (III.3) and (III.6), for any in or its FS oracle will return a pair of the form .

Let us define a set of “typical” outcomes from FS oracle queries. Fix any , and let denote the set of all sequences of length which have the property that no occurs more than once among .

Note that for any fixed $f_\tau$, every non-zero Fourier coefficient satisfies $|\hat{f_\tau}(S)| = 1/R$ due to Equation (III.3). Therefore after $f_\tau$ is drawn, for any fixed $y_j$ the probability of receiving a response of the form $(X, y_j)$ with $X \subseteq \{x_1, \dots, x_r\}$ as the outcome of a FS query is either

$0$,

if $f_\tau$ is not a function of $y_j$; or

$1/R$,

if it is. This is because each of the $R$ responses of this form occurs with probability $1/R^2$.

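Similarly, for any fixed $g_{\tau,s}$, every non-zero Fourier coefficient satisfies $|\hat{g_{\tau,s}}(S)| = 2/R$ due to Equation (III.6). Therefore after $g_{\tau,s}$ is drawn, for any fixed $y_j$ the probability of receiving a response of the form $(X, y_j)$ with $X \subseteq \{x_1, \dots, x_r\}$ as the outcome of a FS query is either

$0$,

if $g_{\tau,s}$ is not a function of $y_j$; or

$2/R$,

if it is. This is because each of the $R/2$ responses of this form with non-zero coefficient occurs with probability $4/R^2$.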

Now let us consider the probability of obtaining a sequence from under each scenario.

• If the function is drawn from : the probability is at least

• If the function is from : the probability is at least

Now the crucial observation is that whether the function is drawn from or from , each sequence in is equiprobable by symmetry in the construction. To see this, simply consider the probability of receiving a fixed for some new in the next FS query of an unknown function drawn from either one of these distributions. Using the above calculations for , one can directly calculate that these probabilities are equal in either scenario. Alternatively, for a function drawn from one can observe that since each successive is “new”, a fresh random bit determines whether the support is an with odd or even; once this is determined, the choice of is uniform from all subsets with the correct parity. Thus the overall draw of is uniform over all ’s. Considering that the subset of relevant variables is uniformly chosen from , this gives the equality of the probabilities for each with a new when the function is drawn from . The argument for the case of is clear.

Consequently the statistical difference between the distributions corresponding to the sequence of outcomes of the FS oracle calls under the two distributions is at most . Now Fact II.3 implies that no algorithm making only oracle calls can distinguish between these two scenarios with high probability. This gives us the result, and concludes the proof of Theorem III.5.

Intuitively, under either distribution on functions, each element of a sequence of FS oracle calls will “look like” a uniform random draw from subsets of and from where and are independent. Note that this argument breaks down at . This is because if the algorithm queried the FS oracle times it will start to see some ’s more than once with constant probability (again by the birthday paradox). But when the functions are drawn from the corresponding ’s will always have a fixed parity for a given whereas for functions drawn from the parity will be random each time. This will provide the algorithm with sufficient evidence to distinguish with constant probability between these two scenarios.

## IV Learning juntas

### IV.1 Known results

The problem of learning an unknown -junta has been well studied in the computational learning theory literature, see e.g. MOS04 (); AR (); Blum (). The following classical lower bound will be a yardstick against which we will measure our results.

###### Lemma IV.1

Any classical membership query algorithm for learning -juntas to accuracy must use membership queries.

Proof: Consider the restricted problem of learning an unknown function which is simply a single Boolean variable from . Since any two variables disagree on half of all inputs, any -learning algorithm can be easily modified into an algorithm that exactly learns an unknown variable with no more queries. It is well known that any set of concepts requires queries for any exact learning algorithm that uses membership queries only, see e.g. BCG+96 (). This gives the lower bound.

For the lower bound, we may suppose that the algorithm “knows” that the junta has relevant variables . Even in this case, if fewer than membership queries are made the learner will have no information about at least of the function’s output values. A straightforward application of the Chernoff bound shows that it is very unlikely for such a learner’s hypothesis to be -accurate, if the target junta is a uniform random function over the relevant variables. This establishes the result.

Learning juntas from uniform random examples is a seemingly difficult computational problem. Simple algorithms based on exhaustive search can learn from examples but require runtime. The fastest known algorithm in this setting, due to Mossel et al., uses examples and runs in time, where is the matrix multiplication exponent MOS04 ().

Bshouty and Jackson BSHJA () gave an algorithm using uniform quantum examples from the oracle to learn general DNF formulas. Their algorithm uses calls to the oracle to learn an -term DNF over variables to accuracy . Since any -junta is expressible as a DNF with at most terms, their result immediately yields the following statement.

###### Theorem IV.2 (See BSHJA ())

There exists an -learning quantum algorithm for -juntas using quantum examples under the uniform distribution quantum PAC model.

Note that BSHJA () did not try to optimize the quantum query complexity of their algorithms in the special case of learning juntas. In contrast, our goal is to obtain a more efficient algorithm for juntas.

The lower bound of (AS05, , Observation 6.3) for learning with quantum membership queries for an arbitrary concept class can be rephrased for the purpose of learning -juntas as follows.

###### Fact IV.3 (See AS05 ())

Any algorithm for learning -juntas to accuracy with quantum membership queries must use queries.

Proof: Since we are proving a lower bound we may assume that the algorithm is told in advance that the junta depends on variables . Consequently we may assume that the algorithm makes all its queries with nonzero amplitude only on inputs of the form . Now (AS05, , Observation 6.3) states that any quantum algorithm which makes queries only over a shattered set (as is the set of inputs for the class of -juntas) must make at least VC-DIM()/100 queries to learn with error rate at most ; here VC-DIM() is the Vapnik-Chervonenkis dimension of concept class . Since the VC dimension of the class of all Boolean functions over variables is , the result follows.

This shows that a quantum membership oracle cannot provide sufficient information to learn a -junta using queries to high accuracy. It is worth noting that there are other similar learning problems known where an -query algorithm can exactly identify a target concept whose description length is bits. For instance, a single FS oracle call (which can be implemented by a single query) can potentially give up to bits of information; if the concept class is the class of all parity functions over the first variables, then any concept in the class can be exactly learned by a single FS oracle call.
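The parity example can be illustrated with a classical simulation of Fourier sampling on a small domain. This is only a sketch: it computes the Fourier weights by brute force instead of using quantum examples, functions are assumed to map 0/1 tuples to ±1, and the names `fourier_weights`, `fs_sample`, and `parity_02` are hypothetical.

```python
import random
from itertools import product


def fourier_weights(f, n):
    """Return {S: fhat(S)^2}, with S encoded as a 0/1 indicator tuple."""
    points = list(product([0, 1], repeat=n))
    weights = {}
    for s in points:
        # fhat(S) is the average of f(x) * (-1)^{sum of x_i for i in S}.
        total = sum(
            f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x)) for x in points
        )
        weights[s] = (total / len(points)) ** 2
    return weights


def fs_sample(weights, rng):
    """Draw one set S with probability fhat(S)^2 (classical stand-in for FS)."""
    sets = list(weights)
    return rng.choices(sets, weights=[weights[s] for s in sets], k=1)[0]


def parity_02(x):
    """Parity of variables 0 and 2, written in +1/-1 form."""
    return (-1) ** (x[0] + x[2])


weights = fourier_weights(parity_02, 4)
sample = fs_sample(weights, random.Random(0))
```

For a parity function all of the Fourier weight sits on a single set, so every sample equals the indicator of the parity's variables, here `(1, 0, 1, 0)`.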

Note that all the results we have discussed in this subsection concern algorithms with access to only one type of oracle; this is in contrast with the algorithm we present in the next section.

### IV.2 A new learning algorithm

The motivating question for this section is: “Is it possible to reduce the classical query/sample complexity drastically for the problem of junta learning if the learning algorithm is also permitted to have very limited quantum information?” We will give an affirmative answer to this question by describing a new algorithm that uses both FS queries (i.e. quantum examples) and classical uniform random examples.

###### Lemma IV.4

Let be a function whose value depends on the set of variables . Then there is an algorithm querying the FS oracle times which w.h.p. outputs a list of variables such that

• the list contains all the variables for which ; and

• all the variables in the list have non-zero influence: .

Proof: The algorithm simply queries the FS oracle many times and outputs the union of all the sets of variables received as responses to these queries.

If then the probability that never occurs in any response obtained from the FS oracle calls is at most The union bound now yields that with probability at least , every with is output by the algorithm.
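The procedure in this proof can be sketched classically on a small example. The sketch relies on the standard fact that a single Fourier sample contains variable i with probability equal to the influence of i, so a variable is missed by all of m independent samples with probability (1 − Inf_i)^m. The brute-force simulation of the FS oracle, the ±1 encoding, and the names `fourier_weights`, `fs_relevant_variables`, and `and_02` are assumptions made for illustration.

```python
import random
from itertools import product


def fourier_weights(f, n):
    """Return {S: fhat(S)^2}, with S encoded as a 0/1 indicator tuple."""
    points = list(product([0, 1], repeat=n))
    weights = {}
    for s in points:
        total = sum(
            f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x)) for x in points
        )
        weights[s] = (total / len(points)) ** 2
    return weights


def fs_relevant_variables(f, n, num_queries, rng):
    """Union of the variable sets returned by num_queries simulated FS calls."""
    weights = fourier_weights(f, n)
    sets = list(weights)
    probs = [weights[s] for s in sets]
    found = set()
    for s in rng.choices(sets, weights=probs, k=num_queries):
        found |= {i for i, bit in enumerate(s) if bit}
    return found


def and_02(x):
    """AND of variables 0 and 2 in +1/-1 form (-1 means 'true')."""
    return -1 if (x[0] == 1 and x[2] == 1) else 1


found = fs_relevant_variables(and_02, 4, 200, random.Random(0))
```

Only sets with nonzero Fourier weight are ever drawn, so the output never contains an irrelevant variable; in this example each relevant variable has influence 1/2, so the chance of missing one in 200 samples is negligible.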

###### Theorem IV.5

There is an efficient algorithm -learning -juntas with queries of the FS oracle and random examples.

Proof: We claim Algorithm 1 satisfies these requirements.

Assume we are given a Boolean function whose value depends on the set of variables with . By Lemma IV.4, queries of the FS oracle will reveal all variables with influence at least with high probability during Stage 1.

Assuming the algorithm of Lemma IV.4 was successful, we group the variables as follows:

Group Description
The set of variables encountered in Stage 1.
The set of relevant variables .
The remaining variables the function does not depend on.

Note that by Lemma IV.4 and by the assumption that is a -junta.

We reorder the variables of so that the new order is for notational simplicity, i.e. is now considered to be over . We will denote an assignment to these variables by .

In Stage 2 the algorithm draws random examples until at least a fraction of all assignments to the variables in are observed. Let us denote this set of observed assignments by , and for every , let us denote the first example drawn in Stage 2 for which by . At the end of the algorithm, the following hypothesis is produced as the output:

 H(a,∗,∗)={