Faster Private Release of Marginals on Small Databases
Harvard University, School of Engineering and Applied Sciences. Email: {karthe, jthaler, jullman, atw12}@seas.harvard.edu. Karthekeyan Chandrasekaran is supported by a Simons Fellowship. Justin Thaler is supported by an NSF Graduate Research Fellowship and NSF grants CNS1011840 and CCF0915922. Jonathan Ullman is supported by NSF grant CNS1237235 and a Siebel Scholarship. Andrew Wan is supported by NSF grant CCF0964401 and NSFC grant 61250110218.
Abstract
We study the problem of answering way marginal queries on a database , while preserving differential privacy. The answer to a way marginal query is the fraction of the database’s records with a given value in each of a given set of up to columns. Marginal queries enable a rich class of statistical analyses on a dataset, and designing efficient algorithms for privately answering marginal queries has been identified as an important open problem in private data analysis. For any , we give a differentially private online algorithm that runs in time
per query and answers any (adaptively chosen) sequence of way marginal queries with error at most on every query, provided . To the best of our knowledge, this is the first algorithm capable of privately answering marginal queries with a nontrivial worst-case accuracy guarantee for databases containing records in time . Our algorithm runs the private multiplicative weights algorithm (Hardt and Rothblum, FOCS ’10) on a new approximate polynomial representation of the database.
We derive our representation for the database by approximating the OR function restricted to low Hamming weight inputs using low-degree polynomials with coefficients of bounded norm. In doing so, we show new upper and lower bounds on the degree of such polynomials, which may be of independent approximation-theoretic interest. First, we construct a polynomial that approximates the variate function on inputs of Hamming weight at most such that the degree of the polynomial is at most and the norm of its coefficient vector is . Then we show the following lower bound that exhibits the tightness of our approach: for any , any polynomial whose coefficient vector has norm that pointwise approximates the variate OR function on all inputs of Hamming weight at most must have degree .
1 Introduction
Consider a database in which each of the rows corresponds to an individual’s record, and each record consists of binary attributes. The goal of privacy-preserving data analysis is to enable rich statistical analyses on the database while protecting the privacy of the individuals. In this work, we seek to achieve differential privacy [DMNS06], which guarantees that no individual’s data has a significant influence on the information released about the database.
One of the most important classes of statistics on a dataset is its marginals. A marginal query is specified by a set and a pattern . The query asks, “What fraction of the individual records in has each of the attributes set to ?” A major open problem in privacy-preserving data analysis is to efficiently release a differentially private summary of the database that enables analysts to answer each of the marginal queries. A natural subclass of marginals are way marginals, the subset of marginals specified by sets such that .
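To make the definition concrete, the following Python sketch evaluates a (non-private) marginal query on a toy binary database. Representing the pattern as a dictionary from column index to required bit, and the names `marginal_query` and `toy_db`, are illustrative assumptions rather than notation from this paper.

```python
from typing import Dict, Sequence


def marginal_query(database: Sequence[Sequence[int]], pattern: Dict[int, int]) -> float:
    """Return the fraction of rows whose attribute i equals pattern[i] for every i in pattern."""
    if not database:
        raise ValueError("database must contain at least one row")
    matches = sum(
        1
        for row in database
        if all(row[i] == bit for i, bit in pattern.items())
    )
    return matches / len(database)


# Hypothetical toy database: 4 records, 3 binary attributes each.
toy_db = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

# A marginal on two columns: attribute 0 must be 1 and attribute 1 must be 0.
two_way = marginal_query(toy_db, {0: 1, 1: 0})
```

The number of keys in `pattern` plays the role of the size of the attribute set, so restricting to patterns with few keys corresponds to the subclass of marginals described above.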
Privately answering marginal queries is a special case of the more general problem of privately answering counting queries on the database, which are queries of the form, “What fraction of individual records in satisfy some property ?” Early work in differential privacy [DN03, BDMN05, DMNS06] showed how to privately answer any set of counting queries approximately by perturbing the answers with appropriately calibrated noise, ensuring good accuracy (say, within of the true answer) provided .
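As one standard instance of answering counting queries by adding calibrated noise, the sketch below applies Laplace noise with basic composition across the queries. This is an assumed illustration of the general idea, not the exact mechanism or accuracy bound of the cited works; the function names and the parameters `n` and `epsilon` are hypothetical.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_answers(
    true_answers: Sequence[float], n: int, epsilon: float, rng: random.Random
) -> List[float]:
    """Perturb each counting-query answer with independent Laplace noise.

    Changing one of the n rows moves each fractional count by at most 1/n,
    so the vector of q answers has L1 sensitivity at most q/n. Noise of scale
    q / (epsilon * n) per answer is the usual Laplace-mechanism calibration.
    """
    if n <= 0 or epsilon <= 0:
        raise ValueError("n and epsilon must be positive")
    if not true_answers:
        return []
    scale = len(true_answers) / (epsilon * n)
    return [a + laplace_noise(scale, rng) for a in true_answers]
```

Because the noise scale grows linearly with the number of queries in this simple calibration, the answers are only useful when the database is large relative to the query set, which is the limitation the next paragraph addresses.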
However, in many settings data is difficult or expensive to obtain, and the requirement that is too restrictive. For instance, if the query set includes all way marginal queries then , and it may be impractical to collect enough data to ensure , even for moderate values of . Fortunately, a remarkable line of work initiated by Blum et al. [BLR08] and continuing with [DNR09, DRV10, RR10, HR10, HLM12, GRU12, JT12], has shown how to privately release approximate answers to any set of counting queries, even when is exponentially larger than . For example, the online private multiplicative weights algorithm of Hardt and Rothblum [HR10] gives accurate answers to any (possibly adaptively chosen) sequence of queries provided . Hence, if the sequence consists of all way marginal queries, then the algorithm will give accurate answers provided . Unfortunately, all of these algorithms have running time at least per query, even in the simplest setting where is the set of way marginals.
Given this state of affairs, it is natural to seek efficient algorithms capable of privately releasing approximate answers to marginal queries even when . The most efficient algorithm known for this problem, due to Thaler, Ullman, and Vadhan [TUV12] (building on the work of Hardt, Rothblum, and Servedio [HRS12]) runs in time and releases a summary from which an analyst can compute the answer to any way marginal query in time .
Even though can be much smaller than , a major drawback of this algorithm and other efficient algorithms for releasing marginals (e.g. [GHRU11, CKKL12, HRS12, FK13, DNT13]) is that the database still must be significantly larger than , which we know would suffice for inefficient algorithms. Recent experimental work of Hardt, Ligett, and McSherry [HLM12] demonstrates that for some databases of interest, even the time private multiplicative weights algorithm is practical, and also shows that more efficient algorithms based on adding independent noise do not provide good accuracy for these databases. Motivated by these findings, we believe that an important approach to designing practical algorithms is to achieve a minimum database size comparable to that of private multiplicative weights, and seek to optimize the running time of the algorithm as much as possible. In this paper we give the first algorithms for privately answering marginal queries for this parameter regime.
1.1 Our Results
In this paper we give faster algorithms for privately answering marginal queries on databases of size , which is nearly the smallest a database can be while admitting any differentially private approximation to marginal queries [BUV13].
Theorem 1.1.
There exists a constant such that for every , , and every , there is an differentially private online algorithm that, on input a database , runs in time
per query and answers any sequence of (possibly adaptively chosen) way marginal queries on up to an additive error of at most on every query with probability at least , provided that .
When is much smaller than , it may be useful to view our algorithm as an offline algorithm for releasing answers to all way marginal queries. This offline algorithm can be obtained simply by requesting answers to each of the distinct way marginal queries from the online mechanism. In this case we obtain the following corollary.
Corollary 1.2.
There exists a constant such that for every , , and every , there is an differentially private offline algorithm that, on input a database , runs in time
and, with probability at least , releases answers to every way marginal query on up to an additive error of at most , provided that .
Here , and the number of way marginals on is bounded by a polynomial in this quantity. See Table 1 for a comparison of relevant results on privately answering marginal queries.
Remarks.

When , the minimum database size requirement can be improved to , but we have stated the theorems with a weaker bound for simplicity. (Here is a universal constant and the is with respect to .)

Our algorithm can be modified so that instead of releasing approximate answers to each way marginal explicitly, it releases a summary of the database of size from which an analyst can compute an approximate answer to any way marginal in time .
A key ingredient in our algorithm is a new approximate representation of the database using polynomial approximations to the variate OR function restricted to inputs of Hamming weight at most . For any such polynomial, the degree determines the runtime of our algorithm, while the weight of the coefficient vector determines the minimum required database size. Although low-degree, low-weight polynomial approximations to the OR function have been studied in the context of approximation theory and learning theory [STT12], our setting requires an approximation only over a restricted subset of the inputs. When the polynomial needs to approximate the OR function only on a subset of the inputs, is it possible to reduce the degree and weight (in comparison to [STT12]) of the polynomial?
Our main technical contribution addresses this variant of the polynomial approximation problem. We believe that our construction of such polynomials (Theorem 1.3) as well as the lower bound (Theorem 1.4) could be of independent approximation-theoretic interest. The following theorem shows a construction of polynomials that achieve better degree and weight in comparison to [STT12] for small values of . Let denote the OR function on variables with the convention that is TRUE, and for any vector , let denote the number of coordinates of equal to .
Theorem 1.3.
Let . For some constant , there exists a polynomial such that

for every ,

the weight of the coefficient vector of is at most , and

the degree of is at most
The degree bound of in the above theorem follows directly from techniques developed in [STT12], while the degree bound of requires additional insight. We also show a lower bound to exhibit the tightness of our construction.
Theorem 1.4.
Let , and let be a real variate polynomial satisfying for all with . If the weight of the coefficient vector of is , then the degree of is at least .
We note that our algorithmic approach for designing efficient private data release algorithms would work equally well if we have any small set of functions whose low-weight linear combinations approximate disjunctions restricted to inputs of Hamming weight at most . Our lower bound limits the applicability of our approach if we choose to use low-degree monomials as the set of functions. We observe that this also rules out several natural candidates that can themselves be computed exactly by a low-weight polynomial of low degree (e.g., the set of small-width conjunctions). There is some additional evidence from prior work that low-degree monomials may be the optimal choice: if we only care about the size of the set of functions used to approximate disjunctions on inputs of Hamming weight at most , then prior work shows that low-degree monomials are indeed optimal [She11] (see also Section 5 in the full version of [TUV12]). It remains an interesting open question to determine whether this optimality still holds when we restrict the weight of the linear combinations used in the approximations to be .
1.2 Techniques
For notational convenience, we focus on monotone way disjunction queries. However, our results extend straightforwardly to general nonmonotone way marginal queries via simple transformations on the database and queries. A monotone way disjunction is specified by a set of size and asks what fraction of records in have at least one of the attributes in set to .
Following the approach introduced by Gupta et al. [GHRU11] and developed into a general theory in [HRS12], we view the problem of releasing answers to conjunction queries as a learning problem. That is, we view the database as specifying a function , in which each input vector is interpreted as the indicator vector of a set , with iff , and equals the evaluation of the conjunction query specified by on the database . Then, our goal is to privately learn to approximate the function ; this is accomplished in [HRS12] by approximating succinctly with polynomials and learning the polynomial privately. Polynomial approximation is central to our approach as well, as we explain below.
We begin with a description of how the parameters of the online learning algorithm determine the parameters of the online differentially private learning algorithm. We consider the “IDC framework” [GRU12]—which captures the private multiplicative weights algorithm [HR10] among others [RR10, GRU12, JT12]—for deriving differentially private online algorithms from any online learning algorithm that may not necessarily be privacy preserving.
Informally, an online learning algorithm is one that takes a (possibly adaptively chosen) sequence of inputs and returns answers to each, representing “guesses” about the values for the unknown function . After making each guess , the learner is given some information about the value of . The quantities of interest are the running time required by the online learner to produce each guess and the number of “mistakes” made by the learner, which is the number of rounds in which is “far” from . Ultimately, for the differentially private algorithm derived in the IDC framework, the notion of far will correspond to the accuracy, the per query running time will essentially be equal to the running time of the online learning algorithm, and the minimum database size required by the private algorithm will be proportional to the square root of the number of mistakes.
We next describe the well-known technique of deriving faster online learning algorithms that commit fewer mistakes using polynomial approximations to the target function. Indeed, it is well known that if can be approximated to high accuracy by a variate polynomial of degree and weight at most , where the weight is defined to be the sum of the absolute values of the coefficients, then there is an online learning algorithm that runs in time and makes mistakes. Thus, if , the running time of such an online learning algorithm will be significantly less than and the number of mistakes (and thus the minimum database size of the resulting private algorithm) will only blow up by a factor of .
Consequently, our goal boils down to constructing the best possible polynomial representation for any database – one with low degree and low weight such that is small for all vectors corresponding to monotone way disjunction queries. To accomplish this goal, it is sufficient to construct a low-degree, low-weight polynomial that can approximate the variate OR function on inputs of Hamming weight at most (i.e., those that have in at most indices). Such problems are well studied in the approximation-theory literature; however, our variant requires polynomials to be accurate only on a restricted subset of inputs. In fact, the existence of a polynomial with degree and weight that approximates the variate OR function on all inputs follows from the work of Servedio et al. [STT12]. We improve these bounds for small values of by constructing an approximating polynomial that has degree and weight .
We also prove a new approximation-theoretic lower bound for polynomials that seek to approximate a target function for a restricted subset of inputs. Specifically, we show that for any , any polynomial of weight that satisfies for all inputs of Hamming weight at most must have degree . We prove our lower bound by expressing the problem of constructing such a low-weight, low-degree polynomial as a linear program, and exhibiting an explicit solution to the dual of this linear program. Our proof is inspired by recent work of Sherstov [She09, She11, She12b] and Bun and Thaler [BT13].
1.3 Related Work
Other Results on Privately Releasing Marginals.
In work subsequent to our result, Dwork et al. [DNT13] show how to privately release marginals in a very different parameter regime. Their algorithm is faster than ours, running in time , and has better dependence on the error parameter. However, their algorithm requires that the database size is for answering with error . This size is comparable to the optimal only when . In contrast, our algorithm has nearly optimal minimum database size for every choice of .
While we have focused on accurately answering every way marginal query, or more generally every query in a sequence of marginal queries, several other works have considered more relaxed notions of accuracy. These works show how to efficiently release a summary of the database from which an analyst can efficiently compute an approximate answer to marginal queries, with the guarantee that the average error of a marginal query is at most , when the query is chosen from a particular distribution. In particular, Feldman and Kothari [FK13] achieve small average error over the uniform distribution with running time and database size ; Gupta et al. [GHRU11] achieve small average error over any product distribution with running time and minimum database size ; finally Hardt et al. [HRS12] show how to achieve small average error over arbitrary distributions with running time and minimum database size . All of these results are based on the approach of learning the function .
Several works have also considered information theoretic bounds on the minimum database size required to answer way marginals. Kasiviswanathan et al. [KRSU10] showed that is necessary to answer all way marginals with error . De [De12] extended this result to hold even when accuracy can be violated for a constant fraction of way marginals. In our regime, where , their results do not give a nontrivial lower bound. In forthcoming work, Bun, Ullman, and Vadhan [BUV13] have proven a lower bound of , which is nearly optimal for .
Relationship with Hardness Results for Differential Privacy.
Ullman [Ull13] (building on the results of Dwork et al. [DNR09]), showed that any time differentially private algorithm that answers arbitrary counting queries can only give accurate answers if , assuming the existence of exponentially hard one-way functions. Our algorithms have running time and are accurate when , and thus show a separation between answering marginal queries and answering arbitrary counting queries.
When viewed as an offline algorithm for answering all way marginals, our algorithm will return a list of values containing answers to each way marginal query. It would in some cases be more attractive if we could return a synthetic database, which is a new database whose rows are “fake”, but such that approximately preserves many of the statistical properties of the database (e.g., all the marginals). Some of the previous work on counting query release has provided synthetic data [BCD07, BLR08, DNR09, DRV10, HLM12].
Unfortunately, Ullman and Vadhan [UV11] (building on [DNR09]) have shown that no differentially private sanitizer with running time can take a database and output a private synthetic database , all of whose way marginals are approximately equal to those of , assuming the existence of one-way functions. They also showed that under certain strong cryptographic assumptions, there is no differentially private sanitizer with running time that can output a private synthetic database, all of whose way marginals are approximately equal to those of . Our algorithms indeed achieve this running time and accuracy guarantee when releasing way marginals for constant , and thus it may be inherent that our algorithms do not generate synthetic data.
Relationship with Results in Approximation Theory.
Servedio et al. [STT12] focused on developing low-weight, low-degree polynomial threshold functions (PTFs) for decision lists, motivated by applications in computational learning theory. As an intermediate step in their PTF constructions, they constructed low-weight, low-degree polynomials that approximate the OR function on all Boolean inputs. Our construction of lower-weight, lower-degree polynomials that approximate the OR function on low Hamming weight inputs is inspired by and builds on Servedio et al.’s construction of approximations that are accurate on all Boolean inputs.
The proof of our lower bound is inspired by recent work that has established new approximate degree lower bounds via the construction of dual solutions to certain linear programs. In particular, Sherstov [She09] showed that approximate degree and PTF degree behave roughly multiplicatively under function composition, while Bun and Thaler [BT13] gave a refinement of Sherstov’s method in order to resolve the approximate degree of the two-level AND-OR tree, and also gave an explicit dual witness for the approximate degree of any symmetric Boolean function. We extend these lower bounds along two directions: (1) we show degree lower bounds that take into account the weight of the coefficient vector of the approximating polynomial, and (2) our lower bounds hold even when we only require the approximation to be accurate on inputs of low Hamming weight, while prior work only considered approximations that are accurate on all Boolean inputs.
Some prior work has studied the degree of polynomials that pointwise approximate partial Boolean functions [She12b, She12a]. Here, a function is said to be partial if its domain is a strict subset of , and a polynomial is said to approximate if

for all , and

for all .
In contrast, our lower bounds apply even in the absence of Condition 2, i.e., when is allowed to take arbitrary values on inputs in .
Finally, while our motivation is private data release, our approximation-theoretic results are similar in spirit to recent work of Long and Servedio [LS13], who are motivated by applications in computational learning theory. Long and Servedio consider halfspaces defined on inputs of small Hamming weight, and (using techniques very different from ours) give upper and lower bounds on the weight of these halfspaces when represented as linear threshold functions.
Organization. In Section 3, we describe our private online algorithm and show that it yields the claimed accuracy given the existence of sufficiently low-weight polynomials that approximate the variate OR function on inputs of low Hamming weight. The results of this section are a combination of known techniques in differential privacy [RR10, HR10, GRU12] and learning theory (see e.g., [KS04]). Readers familiar with these literatures may prefer to skip Section 3 on first reading. In Section 4, we give our polynomial approximations to the OR function, both on low Hamming weight Boolean inputs and on all Boolean inputs. Finally, in Section 5, we state and prove our lower bounds for polynomial approximations to the OR function on restricted inputs.
2 Preliminaries
2.1 Differentially Private Sanitizers
Let a database be a collection of rows from a data universe . We say that two databases are adjacent if they differ only on a single row, and we denote this by .
Let be an algorithm that takes a database as input and outputs some data structure in . We are interested in algorithms that satisfy differential privacy.
Definition 2.1 (Differential Privacy [DMNS06]).
An algorithm is differentially private if for every two adjacent databases and every subset ,
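In its standard formulation, the condition is the following inequality. The symbol names used here ($\mathcal{A}$ for the algorithm, $D \sim D'$ for adjacent databases, $S$ for the subset of outputs, and $\varepsilon, \delta$ for the privacy parameters) are assumptions and may differ from the notation used elsewhere in this paper.

```latex
\[
  \Pr\bigl[\mathcal{A}(D) \in S\bigr]
  \;\le\;
  e^{\varepsilon} \cdot \Pr\bigl[\mathcal{A}(D') \in S\bigr] + \delta ,
\]
% where the probability is taken over the internal randomness of \mathcal{A}.
```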
Since a sanitizer that always outputs satisfies Definition 2.1, we focus on sanitizers that are accurate. In particular, we are interested in sanitizers that give accurate answers to counting queries. A counting query is defined by a boolean predicate . Abusing notation, we define the evaluation of the query on a database to be Note that the value of a counting query is in . We use to denote a set of counting queries.
For the purposes of this work, we assume that the range of is simply . That is, outputs a list of real numbers representing answers to each of the specified queries.
Definition 2.2 (Accuracy).
The output of , , is accurate for the query set if
A sanitizer is accurate for the query set if for every database , outputs such that with probability at least , the output is accurate for , where the probability is taken over the coins of .
We remark that the definitions of both differential privacy and accuracy extend straightforwardly to the online setting. Here the algorithm receives a sequence of (possibly adaptively chosen) queries from and must give an answer to each before seeing the rest of the sequence. Here we require that with probability at least , every answer output by the algorithm is within of the true answer on . See e.g., [HR10] for a detailed treatment of the online setting.
2.2 Query Function Families
Given a set of queries of interest, (e.g., all marginal queries), we think of the database as specifying a function mapping queries to their answers . We now describe this transformation more formally:
Definition 2.3 (Function Family).
Let be a set of counting queries on a data universe , where each query is indexed by an bit string. We define the index set of to be the set .
We define the function family as follows: For every possible database row , the function is defined as . Given a database we define the function where . When is clear from context we will drop the subscript and simply write , , and .
When is the set of all monotone way disjunctions on a database , the queries are defined by sets , . In this case, we represent each query by the bit indicator vector of the set , where if and only if . Thus, has at most entries that are . Hence, we can take and .
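The indicator-vector view of disjunction queries can be sketched in Python as follows. The 0/1 output encoding and the names `indicator`, `disjunction_on_row`, and `disjunction_on_database` are assumptions made for illustration; the paper's own convention for the OR function may use a different encoding of TRUE and FALSE.

```python
from typing import List, Sequence, Set


def indicator(attribute_set: Set[int], d: int) -> List[int]:
    """Encode a set of attribute indices as a length-d 0/1 indicator vector."""
    return [1 if i in attribute_set else 0 for i in range(d)]


def disjunction_on_row(x: Sequence[int], y: Sequence[int]) -> int:
    """Row-level function: 1 if row x has a 1 in some attribute selected by y, else 0."""
    return int(any(xi and yi for xi, yi in zip(x, y)))


def disjunction_on_database(db: Sequence[Sequence[int]], y: Sequence[int]) -> float:
    """Database-level function: the average of the row-level functions over all rows."""
    return sum(disjunction_on_row(x, y) for x in db) / len(db)


# Hypothetical toy database with 4 rows and d = 3 attributes.
db = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
answer_single = disjunction_on_database(db, indicator({1}, 3))
answer_pair = disjunction_on_database(db, indicator({0, 2}, 3))
```

In this encoding a query on a set of bounded size is exactly an input of bounded Hamming weight, which is why approximating the row-level function only on such inputs suffices.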
2.3 Low-Weight Polynomial Approximations
Given an variate real polynomial ,
we define the degree, weight and nonconstant weight of the polynomial as follows:
We use to denote and .
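The three quantities can be illustrated with a small Python sketch. It assumes a multilinear polynomial stored as a dictionary from monomials (frozensets of variable indices) to coefficients, takes the weight to be the sum of the absolute values of the coefficients, and takes the non-constant weight to be the same sum with the constant term excluded; the representation and function names are assumptions for illustration.

```python
from math import prod
from typing import Dict, FrozenSet, Sequence

Polynomial = Dict[FrozenSet[int], float]


def degree(p: Polynomial) -> int:
    """Largest monomial size among terms with a nonzero coefficient."""
    return max((len(s) for s, c in p.items() if c != 0), default=0)


def weight(p: Polynomial) -> float:
    """Sum of the absolute values of all coefficients."""
    return sum(abs(c) for c in p.values())


def nonconstant_weight(p: Polynomial) -> float:
    """Weight with the constant term (the empty monomial) left out."""
    return sum(abs(c) for s, c in p.items() if len(s) > 0)


def evaluate(p: Polynomial, y: Sequence[float]) -> float:
    """Evaluate p at the point y; the empty product is 1, giving the constant term."""
    return sum(c * prod(y[i] for i in s) for s, c in p.items())


# Hypothetical example: p(y) = 0.5 - y_0 + 2 * y_0 * y_1.
example = {frozenset(): 0.5, frozenset({0}): -1.0, frozenset({0, 1}): 2.0}
```

For `example`, the degree is 2, the weight is 3.5, and the non-constant weight is 3.0.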
We will attempt to approximate the functions on all the indices in by a family of polynomials with low degree and low weight. Formally and more generally:
Definition 2.4 (Restricted Approximation by Polynomials).
Given a function , where , and a subset , we denote the restriction of to by . Given an variate real polynomial , we say that is a approximation to the restriction , if . Notice there is no restriction whatsoever placed on for .
Given a family of variate functions , where , a set we use to denote the family of restricted functions. Given a family of variate real polynomials, we say that the family is a approximation to the family if for every , there exists that is a approximation to .
Let denote the set of inputs of Hamming weight at most . We view the variate OR function, as mapping inputs from to , with the convention that is TRUE and is FALSE. Let denote the family of all variate real polynomials of degree and weight . For the upper bound, we will show that for certain small values of and , the family is a approximation to the family of all disjunctions restricted to .
Fact 2.5.
If is the set of all monotone way disjunctions on a database , is its function family, and is its index set, then is a approximation to the restriction if and only if there is a degree polynomial of weight that approximates
The fact follows easily by observing that for any , ,
For the lower bound, we will show that any collection of polynomials with small weight that is a approximation to the family of disjunctions restricted to should have large degree. We need the following definitions:
Definition 2.6 (Approximate Degree).
Given a function , where , the approximate degree of is
Analogously, the approximate degree of is
It is clear that .
We let denote the degree nonconstant margin weight of , defined to be:
The above definitions extend naturally to the restricted function .
Our definition of nonconstant margin weight is closely related to the well-studied notion of the degree polynomial threshold function (PTF) weight of (see e.g., [She11]), which is defined as , where the minimum is taken over all degree polynomials with integer coefficients, such that for all . Often, when studying PTF weight, the requirement that have integer coefficients is used only to ensure that has nontrivial margin, i.e. that for all ; this is precisely the requirement captured in our definition of nonconstant margin weight. We choose to work with margin weight because it is a cleaner quantity to analyze using linear programming duality; PTF weight can also be studied using LP duality, but the integrality constraints on the coefficients of introduce an integrality gap that causes some loss in the analysis (see e.g., Sherstov [She11, Theorem 3.4] and Klauck [Kla11, Section 4.3]).
3 From Low-Weight Approximations to Private Data Release
In this section we show that low-weight polynomial approximations imply data release algorithms that provide approximate answers even on small databases. The main goal of this section is to prove the following theorem.
Theorem 3.1.
Given , and a family of linear queries with index set . Suppose for some , , the family of polynomials approximates the function family . Then there exists an differentially private online algorithm that is accurate for any sequence of (possibly adaptively chosen) queries from on a database , provided
The private algorithm has running time .
We note that the theorem can be assembled from known techniques in the design and analysis of differentially private algorithms and online learning algorithms. We include the proof here for the sake of completeness since, to our knowledge, it does not explicitly appear in the privacy literature.
We construct and analyze the algorithm in two steps. First, we use standard arguments to show that the nonprivate multiplicative weights algorithm can be used to construct a suitable online learning algorithm for whenever can be approximated by a low-weight, low-degree polynomial. Here, a suitable online learning algorithm is one that fits into the IDC framework of Gupta et al. [GRU12]. We then apply the generic conversion from IDCs to differentially private online algorithms [RR10, HR10, GRU12] to obtain our algorithm.
3.1 IDCs
We start by providing the relevant background on the iterative database construction framework. An IDC will maintain a sequence of functions that give increasingly good approximations to the . In our case, these functions will be low-degree polynomials. Moreover, the mechanism produces the next approximation in the sequence by considering only one query that “distinguishes” the real database in the sense that is large.
Definition 3.2 (IDC [RR10, HR10, GRU12]).
Let be a family of counting queries indexed by bit strings. Let be an algorithm mapping a function , a query , and a real number to a new function . Let be a database and be a parameter. Consider the following game with an adversary. Let be some function. In each round :

The adversary chooses a query (possibly depending on ).

If , then we say that the algorithm has made a mistake.

If the algorithm made a mistake, then it receives a value such that and computes a new function . Otherwise let .
If the number of rounds in which the algorithm makes a mistake is at most for every adversary, then is a iterative database construction for with mistake bound .
Theorem 3.3 (Variant of [GRU12]).
For any , and any family of queries , if there is an iterative database construction for with mistake bound , then there is an differentially private online algorithm that is accurate for any sequence of (possibly adaptively chosen) queries from on a database , so long as
Moreover, if the iterative database construction, , runs in time , then the private algorithm has running time per query.
The IDC we will use is specified in Algorithm 1. The IDC will use approximations in the form of low-degree polynomials of low weight, and thus we need to specify how to represent such a function. Specifically, we will represent a polynomial as a vector of length with only nonnegative entries. For each coefficient , the vector will have two components . (Recall that .) Intuitively these two entries represent the positive part and negative part of the coefficient of . There will also be an additional entry that is used to ensure that the norm of the vector is exactly . Given a polynomial with coefficients , we can construct this vector by setting
and choosing so that . Observe that can always be set appropriately since the weight of is at most .
Similarly, we want to associate queries with vectors so that we can replace the evaluation of the polynomial on a query with the inner product . We can do so by defining the vector of length such that , and .
Fact 3.4.
For every variate polynomial of degree at most and weight at most , and every query , .
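The encoding of polynomials and queries as vectors can be sketched in Python as follows. This is a sketch under stated assumptions: coefficients are scaled by the weight bound `W`, the extra slack entry makes the entries sum to exactly 1, and the inner product is rescaled by `W` to recover the polynomial's value. These normalization choices and all names below are assumptions for illustration and may differ from the exact constants in the text.

```python
from itertools import combinations
from math import prod
from typing import Dict, FrozenSet, List, Sequence

Polynomial = Dict[FrozenSet[int], float]


def monomials(d: int, t: int) -> List[FrozenSet[int]]:
    """All multilinear monomials on d variables of degree at most t, in a fixed order."""
    return [frozenset(c) for r in range(t + 1) for c in combinations(range(d), r)]


def poly_to_vector(p: Polynomial, d: int, t: int, W: float) -> List[float]:
    """Nonnegative vector: (positive part, negative part) per coefficient, plus one slack entry."""
    vec: List[float] = []
    for s in monomials(d, t):
        c = p.get(s, 0.0)
        vec.append(max(c, 0.0) / W)   # positive part of the coefficient
        vec.append(max(-c, 0.0) / W)  # negative part of the coefficient
    slack = 1.0 - sum(vec)
    if slack < -1e-12:
        raise ValueError("polynomial weight exceeds the bound W")
    vec.append(max(slack, 0.0))       # slack entry so that the entries sum to 1
    return vec


def query_to_vector(y: Sequence[float], d: int, t: int) -> List[float]:
    """Signed monomial values at y, aligned with poly_to_vector; the slack slot gets 0."""
    vec: List[float] = []
    for s in monomials(d, t):
        m = prod(y[i] for i in s)
        vec.extend([m, -m])
    vec.append(0.0)
    return vec


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Hypothetical example: p(y) = 0.5 - y_0 + 2 * y_0 * y_1, with weight 3.5 <= W = 4.
p = {frozenset(): 0.5, frozenset({0}): -1.0, frozenset({0, 1}): 2.0}
p_vec = poly_to_vector(p, d=2, t=2, W=4.0)
y_vec = query_to_vector((1, 1), d=2, t=2)
recovered = 4.0 * inner(p_vec, y_vec)  # equals p(1, 1) under these assumptions
```

Splitting each coefficient into two nonnegative entries is what lets the polynomial be treated as a probability vector, which is the form multiplicative-weights-style updates operate on.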
We summarize the properties of the multiplicative weights algorithm in the following theorem:
Theorem 3.5.
For any , and any family of linear queries if approximates the restriction then Algorithm 1 is an iterative database construction for with mistake bound for
Moreover, runs in time .
Proof.
Let be any database. For every round in which makes a mistake, we consider the tuple representing the information used to update the approximation in round . In order to bound the number of mistakes, it will be sufficient to show that after , the vector is such that
That is, after making mistakes represents a polynomial that approximates on every query, and thus there can be no more than mistakes.
First, we note that there always exists a polynomial such that
(1) 
The assumption of our theorem is that for every , there exists such that
Thus, since , the polynomial will satisfy (1). Note that , thus if we represent as a vector,
Given the existence of , we will define a potential function capturing how far is from . Specifically, we define
to be the KL divergence between and the current approximation . Note that the sum iterates over all indices in . We have the following fact about KL divergence.
Fact 3.6.
For all : , and .
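Two standard properties of KL divergence that a potential argument of this kind typically relies on are non-negativity and the fact that the divergence from any distribution to the uniform distribution on N points is at most log N. The Python sketch below computes the divergence and checks both properties on a small example; the function name and the example vectors are assumptions for illustration.

```python
from math import log
from typing import Sequence


def kl_divergence(u: Sequence[float], v: Sequence[float]) -> float:
    """KL(u || v) = sum_i u_i * log(u_i / v_i), with the convention 0 * log(0 / v_i) = 0."""
    if len(u) != len(v):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for ui, vi in zip(u, v):
        if ui > 0:
            if vi <= 0:
                raise ValueError("KL divergence is infinite when v_i = 0 < u_i")
            total += ui * log(ui / vi)
    return total


# Hypothetical target distribution and a uniform starting point on 4 coordinates.
target = [0.5, 0.25, 0.25, 0.0]
uniform = [0.25] * 4
initial_potential = kl_divergence(target, uniform)
```

Starting the approximation at the uniform vector therefore bounds the initial potential by the logarithm of the vector length, which is what limits the number of mistakes once each mistake is shown to decrease the potential by a fixed amount.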
We will argue that after each mistake the potential drops by at least . Note that the potential only changes in rounds where a mistake was made. Because the potential begins at , and must always be nonnegative, we know that there can be at most mistakes before the algorithm outputs (a vector representation of) a polynomial that approximates on .
The following lemma is standard in the analysis of multiplicative-weights-based algorithms.