# Amplification and Derandomization Without Slowdown

Ofer Grossman (ogrossma@mit.edu), Department of Mathematics, MIT.

Dana Moshkovitz (dmoshkov@csail.mit.edu), Department of Electrical Engineering and Computer Science, MIT. This material is based upon work supported by the National Science Foundation under grant numbers 1218547 and 1452302.
###### Abstract

We present techniques for decreasing the error probability of randomized algorithms and for converting randomized algorithms to deterministic (non-uniform) algorithms. Unlike most existing techniques that involve repetition of the randomized algorithm and hence a slowdown, our techniques produce algorithms with a similar run-time to the original randomized algorithms. The amplification technique is related to a certain stochastic multi-armed bandit problem. The derandomization technique – which is the main contribution of this work – points to an intriguing connection between derandomization and sketching/sparsification.

We demonstrate the techniques by showing the following applications:

1. Dense Max-Cut: A Las Vegas algorithm that given a -dense that has a cut containing fraction of the edges, finds a cut that contains fraction of the edges. The algorithm runs in time and has error probability exponentially small in . It also implies a deterministic non-uniform algorithm with the same run-time (note that the input size is ).

2. Approximate Clique: A Las Vegas algorithm that given a graph that contains a clique on vertices, and given , finds a set on vertices of density at least . The algorithm runs in time and has error probability exponentially small in . We also show a deterministic non-uniform algorithm with the same run-time.

3. Free Games: A Las Vegas algorithm and a non-uniform deterministic algorithm that given a free game (constraint satisfaction problem on a dense bipartite graph) with value at least and given , find a labeling of value at least . The error probability of the randomized algorithm is exponentially small in the number of vertices and labels. The run-time of the algorithms is similar to that of algorithms with constant error probability.

4. From List Decoding To Unique Decoding For Reed-Muller Code: A randomized algorithm with error probability exponentially small in the input size that given a word and finds a short list such that every low degree polynomial that is -close to is -close to one of the words in the list. The algorithm runs in nearly linear time in the input size, and implies a deterministic non-uniform algorithm with similar run-time. The run-time of our algorithms compares with that of the most efficient algebraic algorithms, but our algorithms are combinatorial and much simpler.

## 1 Introduction

### 1.1 Amplification

Given a randomized algorithm that runs in time and has error probability , can we find a randomized algorithm that runs in similar time and has a substantially smaller error probability ? One can achieve such a low error probability by repeating the algorithm times. However, the resulting algorithm is slower by a factor of than the original algorithm. This is a significant slowdown when is large (for instance, when equals the input size , or equals for some constant ). In this work we show that in many situations one can decrease the error probability of the algorithm to without any substantial slowdown. These situations occur when there is an additional randomized algorithm, more efficient than the overall algorithm, for evaluating the quality of the algorithm's random choices.

We show how to capitalize on the existence of such a testing algorithm using an algorithm for a stochastic multi-armed bandit problem that we define. In this problem, which we call the biased coin problem, there is a large pile of coins, and fraction of the coins are biased, meaning that they fall on heads with high probability. The coins are unmarked, and the only way to discover information about a coin is to toss it. The task is to find one biased coin with very high probability using as few coin tosses as possible. The analogy between the biased coin problem and amplification is that the coins represent possible random choices of the algorithm, many of which are good. The task is to find one choice that is good with very high probability. Tossing a coin corresponds to testing the random choice of the algorithm.

### 1.2 Derandomization

What speed-up does randomization buy? Impagliazzo and Wigderson [22] showed that, under plausible hardness assumptions, randomness can only speed up a polynomial-time computation by a polynomial factor. Their deterministic algorithm, which invokes the randomized algorithm on randomness strings generated by enumerating over all possible seeds of a pseudorandom generator, slows down the run-time by at least a linear factor. To avoid the reliance on unproven assumptions, researchers typically use properties of the concrete randomized algorithms they wish to derandomize and design (or use off-the-shelf) pseudorandom generators for them (e.g., pairwise independent, -biased sets, -wise-independent and almost -wise independent; see, e.g., [25, 29, 6]). Here too derandomization slows down the run-time by at least a linear factor.

A different derandomization method by Adleman [2] uses amplification for derandomization and does not rely on any unproven assumptions. It generates a non-uniform deterministic algorithm by first decreasing the error probability of the randomized algorithm below where is the input size. Then, there must exist a randomness string that works for all inputs, and this randomness string can be hard-wired to a non-uniform algorithm. Due to the slowdown in amplification discussed in the previous sub-section, this technique too slows down the run-time by a linear factor in .

There is a general derandomization method that typically does not increase the run-time significantly, namely the method of conditional expectations. It is used, for instance, for finding an assignment that satisfies fraction of the clauses in a 3Sat formula. However, this method works only in very special cases. In this work we are interested in derandomizing algorithms without slowing down the run-time significantly, in cases where the method of conditional expectations does not apply.

Our derandomization method builds on Adleman’s technique but avoids its slowdown, by using a new connection to sketching and sparsification. Briefly, the connection is as follows: consider a verifier that given an input and a randomness string for the randomized algorithm tests whether the outcome of the randomized algorithm is correct. If the verifier can perform its test with only a size- sketch of the input (we call such a verifier an oblivious verifier), then Adleman’s union bound can be performed over representative inputs, rather than over inputs. This means that it suffices to amplify the error probability below . This saving, together with the amplification technique discussed in the previous sub-section, allows us to derandomize without slowdown.

### 1.3 Context

The main existing approach to derandomization – the one based on pseudorandom generators – focuses on shrinking the number of random strings. This is possible since the algorithm is limited (by its run-time or by the simple way it uses the randomness) and cannot distinguish the set of all randomness strings from a small subset of it (pseudorandom strings). In contrast, our approach focuses on shrinking the number of inputs one needs to argue about. We show that it’s enough that a randomness string leads to a correct output for all sketches.

Crucially, we do not argue that the algorithm doesn’t make use of its entire input, or that inputs with the same sketch are indistinguishable, or that inputs with the same sketch are not distinguished by the algorithm. The algorithms we consider depend on all of their input. Our argument relies on the existence of a verifier aimed at certifying that randomness is good for an input, and which doesn’t distinguish between inputs with the same sketch. Surprisingly, we are able to devise sketches and design such oblivious verifiers for many natural algorithms.

The sketch that the oblivious verifier uses can be hard to compute, and it may reveal a correct output to the verifier. The verifier need not be (and generally will not be) efficient. The only requirements are that the number of bits in the sketch is small and that the verifier is deterministic (though the construction of the sketch can be probabilistic: we only need the existence of a sketch). Our applications include problems on dense graphs where sketching can be done using uniform samples. We hope that the large body of work on sparsification and sketching (see, e.g., [23, 9, 35] and the many works that followed them) could be used for more sophisticated applications of our methods.

### 1.4 Non-Uniform Algorithms, Preprocessing and Amortization

Our derandomization produces non-uniform algorithms, i.e., algorithms that are designed with a specific input size in mind. The knowledge of the input size is manifested by an “advice” string that depends on the input size. The size of the advice counts toward the run-time of the algorithm (so, for instance, advice that consists of the output for each possible input leads to an exponential-time algorithm). Equivalently, non-uniform algorithms are described as sequences of circuits, one for each input size. Sorting networks are an instance of non-uniform algorithms.

In some cases non-uniform algorithms imply uniform algorithms with the same asymptotic run-time. This is the case with Matrix-Multiplication and Minimum-Spanning-Tree [30]. More generally, whenever a problem on inputs of size can be reduced to the same problem on inputs of size each for, say, , a non-uniform algorithm for the problem implies a uniform algorithm. The uniform algorithm uses exhaustive search to find the advice for inputs of size (checking all possible advice strings and, for each, all possible inputs). It then uses the reduction to find the sub-problems and the non-uniform algorithm to solve the sub-problems.

Even when a reduction of this sort does not exist, one can either designate the search for a good advice as a preprocessing phase after which the algorithm is correct on all inputs, or amortize the cost of searching for a good advice across inputs. If the space of possible advice strings contains possibilities (where can be as small as if the space is the set of possible outputs of a pseudorandom generator), and one can amortize the cost over inputs, then one obtains the desired run-time uniformly, amortized.

### 1.5 Applications

We demonstrate our techniques with applications for Max-Cut on dense graphs, (approximate) Clique on graphs that contain large cliques, free games (constraint satisfaction problems on dense bipartite graphs), and reducing the Reed-Muller list decoding problem to its unique decoding problem. All our algorithms run in nearly linear time in their input size, and each of them beats the current state-of-the-art algorithms in one aspect or another. The biggest improvement is in the algorithm for free games, which is orders of magnitude more efficient than the best deterministic algorithms. The algorithm for Max-Cut can efficiently handle sparser graphs than the best deterministic algorithm; the algorithm for (approximate) Clique can efficiently handle smaller cliques than the best deterministic algorithm; and the algorithm for the Reed-Muller code achieves a run-time similar to that of sophisticated algebraic algorithms despite being much simpler. In general, our focus is on demonstrating the utility and versatility of the techniques and not on obtaining the most efficient algorithm for each problem. In the open problems section we point to several aspects where we leave room for improvement.

#### 1.5.1 Max Cut on Dense Graphs

Given a graph , a cut in the graph is defined by . The value of the cut is the fraction of edges such that and . We say that a graph is -dense if it contains edges. For simplicity we assume that the graph is regular, so every vertex has degree . Given a regular -dense graph that has a cut of value at least for , we’d like to find a cut of value roughly . Understanding this problem on general (non-dense) graphs is an important open problem: (a weak version of) the Unique Games Conjecture [24]. However, for dense graphs, it is possible to construct a cut of value efficiently [10, 7, 18, 28]. The best randomized algorithms are an algorithm of Mathieu and Schudy [28] that runs in time and an algorithm of Goldreich, Goldwasser and Ron [18] that runs in time (note that the algorithm of [18] runs in sub-linear time, which is possible since it is an Atlantic City algorithm). Both algorithms have constant error probability. We obtain a Las Vegas algorithm with exponentially small error probability, and deduce a deterministic non-uniform algorithm. This is the simplest application of our techniques. It uses the biased coin algorithm, but does not require any sketches.
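To make the value of a cut concrete, here is a minimal Python sketch. It assumes the graph is given as a list of undirected edges and the cut as the set of vertices on one side. The names `cut_value`, `edges` and `side` are ours, not the paper's. The function counts the edges that have exactly one endpoint on the chosen side.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def cut_value(edges: Iterable[Edge], side: Set[Hashable]) -> float:
    """Fraction of undirected edges with exactly one endpoint in `side`."""
    edge_list = list(edges)
    if not edge_list:
        return 0.0  # convention for the empty graph
    # An edge crosses the cut iff its endpoints lie on different sides.
    crossing = sum(1 for u, v in edge_list if (u in side) != (v in side))
    return crossing / len(edge_list)
```

For example, on the 4-cycle 0-1-2-3-0 the side {0, 2} cuts all four edges, so its value is 1.0.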

###### Theorem 1.1.

There is a Las Vegas algorithm that given a -dense graph that has a cut of value at least for , and given , finds a cut of value at least , except with probability exponentially small in . The algorithm runs in time . It also implies a non-uniform deterministic algorithm with the same run-time.

Note that run-time is necessary for a deterministic algorithm, since the input size is . A deterministic -time algorithm follows from a recent deterministic version of the Frieze-Kannan regularity lemma [34, 16, 12, 11, 5]; however, the term in the exponent hides large constant exponents. Therefore, our algorithm efficiently handles graphs that are sparser than those handled efficiently by the existing deterministic algorithm.

#### 1.5.2 Approximate Clique

The input is and an undirected graph for which there exists a set , , that spans a clique. The goal is to find a set , , whose edge density is at least , i.e., if is the set of edges whose endpoints are in , then . Goldreich, Goldwasser and Ron [18] gave a randomized time algorithm for this problem with constant error probability (note that this is a sub-linear time algorithm, which is possible since it is an Atlantic City algorithm). A deterministic time algorithm with worse dependence on and follows from a deterministic version of the Frieze-Kannan regularity lemma [34, 16, 12, 11, 5]. We obtain a randomized algorithm with exponentially small error probability in , and use sketching to obtain a non-uniform deterministic algorithm. Our algorithms have better dependence on and than the existing deterministic algorithm. They can therefore efficiently handle graphs with smaller cliques, and they output denser sets.

###### Theorem 1.2.

The following hold:

1. There is a Las Vegas algorithm that given , and a graph with a clique on vertices, finds a set of vertices and density at least , except with probability exponentially small in . The algorithm runs in time .

2. There is a deterministic non-uniform algorithm that given , and a graph with a clique on vertices, finds a set of vertices and density at least . The algorithm runs in time .

The sketch for approximate clique consists of all the edges that touch a small random set of vertices. We show that such a sketch suffices to estimate the density of the sets considered by the algorithm and the quality of the random samples of the algorithm.
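The kind of sketch described above can be illustrated with a short Python sketch. It assumes an adjacency-set representation. The estimator `estimate_density`, which only looks at pairs with at least one endpoint in the sampled set, is our own illustrative choice; it is not necessarily the estimator used in the analysis. All function names are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, Optional, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def edge_density(graph: Graph, s: Set[int]) -> float:
    """Exact density |E(S)| / C(|S|, 2) of the subgraph induced by s."""
    pairs = list(combinations(sorted(s), 2))
    if not pairs:
        return 1.0  # convention: sets with fewer than 2 vertices count as dense
    return sum(1 for u, v in pairs if v in graph[u]) / len(pairs)


def make_sketch(graph: Graph, r: Set[int]) -> Dict[int, Set[int]]:
    """Keep only the edges touching the sampled vertex set r."""
    return {u: set(graph[u]) for u in r}


def estimate_density(sketch: Dict[int, Set[int]], s: Set[int]) -> Optional[float]:
    """Estimate the density of s from pairs (u, v), u in s and r, v in s."""
    total = hits = 0
    for u in s:
        if u not in sketch:
            continue  # only sampled vertices have their edges in the sketch
        for v in s:
            if v != u:
                total += 1
                hits += 1 if v in sketch[u] else 0
    return hits / total if total else None  # None: s misses the sample
```

On a planted clique, every pair seen through the sketch is an edge, so the estimate is exactly 1.0 whenever the sample meets the clique.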

#### 1.5.3 Free Games

A free game is defined by a complete bipartite graph , a finite alphabet and constraints for all . For simplicity we assume . A labeling to the vertices is given by , . The value achieved by , denoted , is the fraction of edges that are satisfied by , , where an edge is satisfied by , if . The value of the instance, denoted , is the maximum over all labelings , , of . Given a game with value , the task is to find a labeling to the vertices , , that satisfies at least fraction of the edges.
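The value of a labeling, and the value of a game, can be written out directly from the definitions above. The following minimal Python sketch represents each constraint as a set of allowed label pairs keyed by the edge `(x, y)`. The brute-force maximization is exponential and is meant only for tiny examples. The function names are ours.

```python
from itertools import product


def labeling_value(xs, ys, constraints, f, g) -> float:
    """Fraction of edges (x, y) of the complete bipartite graph whose constraint
    set contains the label pair (f[x], g[y])."""
    satisfied = sum(1 for x in xs for y in ys if (f[x], g[y]) in constraints[(x, y)])
    return satisfied / (len(xs) * len(ys))


def game_value_brute_force(xs, ys, alphabet, constraints) -> float:
    """Maximum of labeling_value over all labelings (exponential time)."""
    best = 0.0
    for fx in product(alphabet, repeat=len(xs)):
        f = dict(zip(xs, fx))
        for gy in product(alphabet, repeat=len(ys)):
            g = dict(zip(ys, gy))
            best = max(best, labeling_value(xs, ys, constraints, f, g))
    return best
```

For example, with equality constraints on three edges of a 2-by-2 game and an inequality constraint on the fourth, the constraints form an inconsistent cycle, so the game value is 3/4.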

Free games have been researched in the context of one round two prover games (see [14] and many subsequent works on parallel repetition of free games) and two prover AM [1]. They unify a large family of problems on dense bipartite graphs obtained by considering different constraints. For instance, for Max-2Sat we have , and contains all that satisfy where is either or and is either or . Note that on a small fraction of the edges the constraints can be “always satisfied”, so one can optimize over any dense graph, not just over the complete graph (the density of the graph is crucial: if fewer than of the edges have non-trivial constraints, then any labeling satisfies fraction of the edges).

There are randomized algorithms for free games that have constant error probability [7, 4, 8, 1], as well as a derandomization that incurs a polynomial slowdown [7]. In addition, deterministic algorithms for free games of value are known [27]. We show a randomized algorithm with exponentially small error probability in and a non-uniform deterministic algorithm whose running time is similar to that of the randomized algorithms with constant error probability.

###### Theorem 1.3.

The following hold:

1. There is a Las Vegas algorithm that given a free game with vertex sets , alphabet , and , and given , finds a labeling to the vertices that satisfies fraction of the edges, except with probability exponentially small in . The algorithm runs in time .

2. There is a deterministic non-uniform algorithm that given a free game with vertex sets , alphabet , and , and given , finds a labeling to the vertices that satisfies fraction of the edges. The algorithm runs in time .

The sketch of a free game consists of the restriction of the game to a small random subset of . We show that the sketch suffices to estimate the value of the labelings considered by the algorithm and the random samples the algorithm makes.

#### 1.5.4 From List Decoding to Unique Decoding of Reed-Muller Code

###### Definition 1.4 (Reed-Muller code).

The Reed-Muller code defined by a finite field and natural numbers and consists of all -variate polynomials of degree at most over .

Let . In the list decoding to unique decoding problem for the Reed-Muller code, the input is a function and the goal is to output a list of functions , such that for every -variate polynomial of degree at most over that agrees with on at least fraction of the points , there exists that agrees with on at least fraction of the points .
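The agreement between two functions, which is the quantity both thresholds in this definition refer to, can be computed by enumerating all points. This minimal Python sketch assumes a prime field, with elements represented as integers modulo `p`; general finite fields would need a different element representation. The name `agreement` is ours.

```python
from itertools import product
from typing import Callable, Tuple

Point = Tuple[int, ...]


def agreement(f: Callable[[Point], int], g: Callable[[Point], int], p: int, m: int) -> float:
    """Fraction of points z in F_p^m (p prime, elements 0..p-1) with f(z) = g(z)."""
    points = list(product(range(p), repeat=m))
    return sum(1 for z in points if f(z) % p == g(z) % p) / len(points)
```

For instance, over F_5 with m = 2, a word that equals the polynomial x + y except on the line x = 0 agrees with it on 20 of the 25 points, i.e., on a 0.8 fraction.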

There are randomized, self-correction-based algorithms for this problem (see [33] and the references there). There are also deterministic list decoding algorithms for the Reed-Solomon and Reed-Muller codes that can solve this problem: the algorithms of Sudan [32] and Guruswami-Sudan [20] run in large polynomial time, as they involve solving a system of linear equations and factoring polynomials. There are efficient implementations of these algorithms that run in time (see [3] and the references there), but they involve deeper algebraic techniques. Our contribution consists of simple combinatorial algorithms, randomized and deterministic, with nearly-linear run-time. This application too relies on the biased coin algorithm but does not require sketching.

###### Theorem 1.5.

Let be a finite field, let and be natural numbers and let , such that , and . There is a randomized algorithm that given , finds a list of functions , such that for every -variate polynomial of degree at most over that agrees with on at least fraction of the points , there exists that agrees with on at least fraction of the points . The algorithm has error probability exponentially small in and it runs in time . It implies a deterministic non-uniform algorithm with the same run-time.

Note that the standard choice of parameters for the Reed-Muller code has , and in this case our algorithms run in nearly linear time .

### 1.6 Previous Work

The biased coin problem introduced in Sub-section 1.1 is related to the stochastic multi-armed bandit problem studied in [13, 26]; however, in the latter there might be only one biased coin, whereas in our problem we are guaranteed that a constant fraction of the coins are biased. This makes a big difference in the algorithms one would consider for each problem and in their performance. In the setup considered by [13, 26] one has to toss all coins, and the algorithms focus on which coins to eliminate. In our setup it is likely that we find a biased coin quickly, and the focus is on certifying bias. In [13, 26] an lower bound is proved for the number of coin tosses needed to find a biased coin with probability , whereas we present an upper bound for the case of a constant fraction of biased coins.

The connection that we make between derandomization and sketching adds to a long list of connections that have been identified over the years between derandomization, compression, learning and circuit lower bounds, e.g., circuit lower bounds can be used for pseudorandom generators and derandomization [22]; learning goes hand in hand with compression, and can be used to prove circuit lower bounds [15]; simplification under random restrictions can be used to prove circuit lower bounds [31] and construct pseudorandom generators [21]. Sparsification of the distinguisher of a pseudorandom generator (e.g., for simple distinguishers like DNFs) can lead to more efficient pseudorandom generators and derandomizations [19]. Our connection differs from all those connections. In particular, previous connections are based on pseudorandom generators, while our approach is dual and focuses on shrinking the number of inputs.

The idea of saving in a union bound by only considering representatives is an old idea with countless appearances in math and theoretical computer science, including derandomization (one example comes from the notion of an -net and its many uses; another example is [19] we mentioned above). Our contribution is in the formulation of an oblivious verifier and in designing sketches and oblivious verifiers.

Our applications have Atlantic City algorithms that run in sub-linear time and have a constant error probability. There are works that aim to derandomize sub-linear time algorithms. Most notably, as mentioned before, there is a deterministic version of the Frieze-Kannan regularity lemma [34, 16, 12, 11, 5], which is relevant to some of our applications but not to others. Another work is [36] that generates deterministic average case algorithms for decision problems with certain sub-linear run time while incurring a slowdown.

## 2 Preliminaries

### 2.1 Conventions and Inequalities

###### Lemma 2.1 (Chernoff bounds).

Let be i.i.d. random variables taking values in . Let . Then,

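$$\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i]+\varepsilon\right]\le e^{-2\varepsilon^2 n},\qquad \Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \le \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i]-\varepsilon\right]\le e^{-2\varepsilon^2 n}.$$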

The same inequalities hold for random variables taking values in (Hoeffding bound). The multiplicative version of the Chernoff bound states:

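$$\Pr\left[\sum_{i=1}^{n} X_i \ge (1+\varepsilon)\sum_{i=1}^{n}\mathbb{E}[X_i]\right]\le e^{-\varepsilon^2\sum_{i=1}^{n}\mathbb{E}[X_i]/3},\qquad \Pr\left[\sum_{i=1}^{n} X_i \le (1-\varepsilon)\sum_{i=1}^{n}\mathbb{E}[X_i]\right]\le e^{-\varepsilon^2\sum_{i=1}^{n}\mathbb{E}[X_i]/2}.$$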
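A common use of the additive bound is to size a sample. Requiring $e^{-2\varepsilon^2 n}\le\delta$ gives $n \ge \ln(1/\delta)/(2\varepsilon^2)$ samples. That many samples suffice to estimate a mean to within $+\varepsilon$ (or $-\varepsilon$) with failure probability at most $\delta$ for each side. The helper below is a small illustration; `hoeffding_samples`, `eps` and `delta` are our own names.

```python
import math


def hoeffding_samples(eps: float, delta: float) -> int:
    """Smallest n with exp(-2 * eps**2 * n) <= delta, i.e. n >= ln(1/delta) / (2 eps^2)."""
    if not (eps > 0 and 0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return math.ceil(math.log(1 / delta) / (2 * eps ** 2))
```

For example, `hoeffding_samples(0.1, 0.01)` returns 231. Note that the sample size grows only logarithmically in $1/\delta$.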

When we say that a quantity is exponentially small in we mean that it is of the form . We use to mean .

### 2.2 Non-Uniform and Randomized Algorithms

###### Definition 2.2 (Non-uniform algorithm).

A non-uniform algorithm that runs in time is given by a sequence of Boolean circuits, where for every , the circuit gets an input of size and satisfies .

Alternatively, a non-uniform algorithm that runs in time on input of size is given an advice string of size at most (note that depends on but not on the input!). The algorithm runs in time given the input and the advice.

The class of all languages that have non-uniform polynomial time algorithms is called P/poly.

There are two main types of randomized algorithms. The stronger type is Las Vegas algorithms, which may fail to return a correct output with small probability but report their failure when they do. Atlantic City algorithms simply return an incorrect output a small fraction of the time.

###### Definition 2.3 (Las Vegas algorithm).

A Las Vegas algorithm that runs in time on input of size is a randomized algorithm that always runs in time at most , but may, with a small probability, return . In any other case, the algorithm returns a correct output.

The probability that a Las Vegas algorithm returns is called its error probability. In any other case we say that the algorithm succeeds.

###### Definition 2.4 (Atlantic City algorithm).

An Atlantic City algorithm that runs in time on input of size is a randomized algorithm that always runs in time at most , but may, with a small probability, return an incorrect output.

The probability that an Atlantic City algorithm returns an incorrect output is called its error probability. In any other case we say that the algorithm succeeds.

Note that Las Vegas algorithms are a special case of Atlantic City algorithms. Atlantic City algorithms that solve decision problems return the same output the majority of the time. For search problems we have the following notion:

###### Definition 2.5 (Pseudo-deterministic algorithm, [17]).

A Pseudo-deterministic algorithm is an Atlantic City algorithm that returns the same output except with a small probability, called its error probability.

## 3 Derandomization by Oblivious Verification

In this section we develop a technique for converting randomized algorithms to deterministic non-uniform algorithms. The derandomization technique is based on the notion of “oblivious verifiers”, which are verifiers that deterministically test the randomness of an algorithm while accessing only a sketch (compressed version) of the input to the algorithm. If the verifier accepts, the algorithm necessarily succeeds on the input and the randomness. In contrast, the verifier is allowed to reject randomness strings on which the randomized algorithm works correctly, as long as it does not do so for too many randomness strings.

###### Definition 3.1 (Oblivious verifier).

Suppose that is a randomized algorithm that on input uses random bits. Let and . An -oblivious verifier for is a deterministic procedure that gets as input , a sketch and , either accepts or rejects, and satisfies the following:

• Every has a sketch .

• For every and its sketch , for every , if the verifier accepts on input and , then succeeds on and .

• For every and its sketch , the probability over that the verifier rejects is at most .

Note that of the oblivious verifier may be somewhat larger than the error probability of the algorithm , but hopefully not much larger. We do not limit the run-time of the verifier, but the verifier has to be deterministic. Indeed, the oblivious verifiers we design run in deterministic exponential time. We do not limit the time for computing the sketch from the input either. Indeed, we use the probabilistic method in the design of our sketches. Crucially, the sketch depends on the input , but is independent of .

Our derandomization theorem shows how to transform a randomized algorithm with an oblivious verifier into a deterministic (non-uniform) algorithm whose run-time is not much larger than the run-time of the randomized algorithm. The idea is as follows. An oblivious verifier allows us to partition the inputs so that inputs with the same sketch are bundled together, which effectively shrinks the number of inputs. This allows us to apply a union bound, just like in Adleman’s proof [2], but over far fewer inputs. The union bound shows that there must exist a randomness string for (a suitable repetition of) the randomized algorithm that works for all inputs.

###### Theorem 3.2 (Derandomizing by verifying from a sketch).

For every , if a problem has a Las Vegas algorithm that runs in time and a corresponding -oblivious verifier for , then the problem has a non-uniform deterministic algorithm that runs in time and always outputs the correct answer.

###### Proof.

Consider the randomized algorithm that runs the given randomized algorithm on its input for times independently, and succeeds if any of the runs succeeds. Its run-time is . For any input, the probability that the oblivious verifier rejects all of the runs is less than . By a union bound over the possible sketches, the probability that the oblivious verifier rejects for any of the sketches is less than . Hence, there exists a randomness string that the oblivious verifier accepts no matter what the sketch is. On this randomness string the algorithm has to be correct no matter what the input is. The deterministic non-uniform algorithm invokes the repeated randomized algorithm on this randomness string. ∎
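The counting in the proof can be made concrete. Write `s` for the sketch length in bits and `delta` for the probability that the oblivious verifier rejects a single run (both names are ours). Then $t$ independent runs are all rejected with probability $\delta^t$. The union bound over $2^s$ sketches therefore succeeds once $2^s\delta^t<1$, i.e., once $t\log_2(1/\delta)>s$. The helper below computes the smallest such $t$.

```python
import math


def repetitions_needed(sketch_bits: int, delta: float) -> int:
    """Smallest t with 2**sketch_bits * delta**t < 1, i.e. t * log2(1/delta) > sketch_bits."""
    if not 0 < delta < 1:
        raise ValueError("need 0 < delta < 1")
    per_run = math.log2(1 / delta)  # bits of confidence gained per independent run
    return math.floor(sketch_bits / per_run) + 1
```

For example, a 100-bit sketch with `delta = 1/4` needs 51 repetitions. With the trivial sketch (the whole input), the count grows linearly with the input length instead, which is the slowdown in Adleman's argument.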

Adleman’s theorem can be seen as a special case of Theorem 3.2, in which the sketch size is the trivial , the oblivious verifier runs the algorithm on the input and randomness and accepts if the algorithm succeeds, and the randomized algorithm has error probability .

The reason that we require that the algorithm is a Las Vegas algorithm in Theorem 3.2 is that it allows us to repeat the algorithm and combine the answers from all invocations. Combining is possible by other means as well. E.g., for randomized algorithms that solve decision problems or for pseudo-deterministic algorithms (algorithms that typically return the same answer) one can combine by taking majority. For algorithms that return a list, one can combine the lists.

The derandomization technique assumes that the error probability of the algorithm is sufficiently low. To complement it, in Section 4 we develop an amplification technique to decrease the error probability. Interestingly, our applications are such that the error probability can be decreased without a substantial slowdown to a point at which our derandomization technique kicks in, but we do not know how to decrease the error probability sufficiently for Adleman’s original proof to work without slowing down the algorithm significantly.

## 4 Amplification by Finding a Biased Coin

In this section we develop a technique that will allow us to significantly decrease the error probability of randomized algorithms without substantially slowing down the algorithms. The technique works by testing the random choices made by the algorithm and quickly discarding undesirable choices. It requires the ability to quickly estimate the desirability of random choices. The technique is based on a solution to the following problem.

###### Definition 4.1 (Biased coin problem).

Let . In the biased coin problem one has a source of coins. Each coin has a bias, which is the probability that the coin falls on “heads”. The bias of a coin is unknown, and one can only toss coins and observe the outcome. It is known that at least fraction of the coins have bias at least (this fraction can be replaced with any constant larger than ). Given , the task is to find a coin of bias at least with probability at least using as few coin tosses as possible.

A similar problem was studied in the setup of multi-armed bandit problems [13, 26], however in that setup there might be only one coin with large bias, as opposed to a constant fraction of coins as in our setup. In the former setup, many more coin tosses might be needed (an lower bound is proved in [26]).

The analogy between the biased coin problem and amplification is as follows: a coin corresponds to a random choice of the algorithm. Its bias corresponds to how desirable the random choice is. The assumption is that a constant fraction of the random choices are very desirable. The task is to find one desirable random choice with a very high probability. Tossing a coin corresponds to testing the random choice. The coin falls on heads in proportion to the quality of the random choice.

Interestingly, if we knew that all coins have bias either at least or at most , it would be possible to solve the biased coin problem using only coin tosses. The algorithm is described in Figure 1. It tosses a random coin a small number of times and expects to witness about fraction heads. If it does, it doubles the number of tosses and tries again, until its confidence in the bias is sufficiently large. If the fraction of heads is too small, it restarts with a new coin. The algorithm has two parameters: , which determines the initial number of tosses, and , which determines the final number of tosses.
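The doubling strategy can be simulated in a few lines. This is a minimal Python sketch under our own reading of the description: each phase uses fresh tosses, the phase lengths are `k0`, `2*k0`, `4*k0`, and so on, and a single fixed `threshold` on the fraction of heads is used. The names `toss`, `threshold`, `k0` and `k_final` are hypothetical stand-ins for the parameters of Figure 1.

```python
import random
from typing import Callable


def find_biased_coin_doubling(toss: Callable[[int], bool], num_coins: int,
                              threshold: float, k0: int, k_final: int,
                              rng: random.Random) -> int:
    """Pick a random coin and toss it in phases of k0, 2*k0, 4*k0, ... tosses.
    Restart with a fresh coin whenever a phase shows too few heads; accept once
    a phase of at least k_final tosses passes."""
    while True:
        coin = rng.randrange(num_coins)
        batch = k0
        while True:
            heads = sum(1 for _ in range(batch) if toss(coin))
            if heads < threshold * batch:
                break  # too few heads in this phase: discard the coin, restart
            if batch >= k_final:
                return coin  # survived every phase up to the final one
            batch *= 2  # double the number of tosses for the next phase
```

In the promise setting (every bias is either high or low), a coin with low bias is very unlikely to survive several phases. A surviving coin is therefore almost surely a biased one.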

The probability that the algorithm restarts at the ’th phase is exponentially small in for : either the coin had bias at least , and then there’s an exponentially small probability in that there were less than heads, or the coin had bias at most , and then there is probability exponentially small in that the coin had at least fraction heads in all the previous phases (whereas if this is phase , then the probability that a coin with bias less than was picked in this case is constant, i.e., exponentially small in ). Moreover, the number of coin tosses up to this step is at most . Hence, we maintain a linear relation (up to factor) between the number of coin tosses and the exponent of the probability. To get the error probability down to we only need coin tosses.
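The linear relation can be made concrete under assumptions of our own, since the exact constants of Figure 1 are not recoverable here: phase $j$ makes $2^j$ tosses for $j = i_0, i_0+1, \ldots$, the two possible bias levels are $\Delta$ apart, and the restart threshold sits halfway between them. A restart at phase $i > i_0$ means either that a bad coin passed phase $i-1$ or that a good coin failed phase $i$, so the one-sided Hoeffding bound gives the following (here $\Delta$ and the phase indexing are our notation):

```latex
\begin{align*}
\text{tosses through phase } i
  &= \sum_{j=i_0}^{i} 2^j = 2^{i+1} - 2^{i_0} < 2^{i+1},\\
\Pr[\text{phase } j \text{ misjudges the coin}]
  &\le \exp\!\bigl(-2 \cdot 2^{j} (\Delta/2)^2\bigr) = \exp\!\bigl(-2^{j-1}\Delta^2\bigr),\\
\Pr[\text{restart at phase } i > i_0]
  &\le \exp\!\bigl(-2^{i-2}\Delta^2\bigr),\\
\frac{\text{tosses}}{\text{exponent}}
  &< \frac{2^{i+1}}{2^{i-2}\Delta^2} = \frac{8}{\Delta^2}.
\end{align*}
```

The ratio does not depend on the phase $i$, which is the linear relation described above.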

Counter-intuitively, adding coins of bias between and (all acceptable outcomes of the algorithm) derails the algorithm we outlined above, as well as other algorithms. If one fixes a threshold like for the fraction of heads one expects to witness, there might be a coin whose bias is around the threshold; we might toss this coin many times and then decide to restart with a new coin. One can also consider a competition-style algorithm like the ones studied in [13, 26], where one tries several coins each time and keeps the ones that fall on heads most often. Such an algorithm may require coin tosses, since coins can lose any short competition to coins with slightly smaller bias; such coins can in turn lose to coins with slightly smaller bias, and so on, until we may end up with a coin of bias smaller than .

There is, however, an algorithm that uses only coin tosses. It lowers the threshold on the fraction of heads one expects to witness as the number of tosses already made for the current coin grows: if the coin was already tossed many times, the deviation of the number of heads from would have to be large for us to decide to restart with a new coin. The algorithm is described in Figure 2.
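The idea of a threshold that drops as a coin accumulates tosses can be sketched in Python. The deviation parameter of Figure 2 is elided here, so the schedule eps/(2t^2) for the t-th phase is our own hypothetical choice: it shrinks polynomially, and its total stays below eps, so the threshold never drops below base - eps. The names `find_biased_coin`, `deviation`, `base` and `eps` are illustrative.

```python
import random
from typing import Callable, List, Optional

Coin = Callable[[], bool]  # one call is one toss; True means heads


def deviation(t: int, eps: float) -> float:
    """Hypothetical extra slack granted in the t-th phase (t = 1, 2, ...).

    The values sum to eps * pi**2 / 12 < eps, so the threshold never
    drops below base - eps.
    """
    return eps / (2 * t * t)


def find_biased_coin(coins: List[Coin], base: float, eps: float, i0: int,
                     i_final: int, rng: random.Random,
                     max_restarts: int = 10_000) -> Optional[int]:
    """Sketch: the restart threshold drops as a coin accumulates tosses."""
    for _ in range(max_restarts):
        idx = rng.randrange(len(coins))
        threshold = base
        survived = True
        for t, j in enumerate(range(i0, i_final + 1), start=1):
            threshold -= deviation(t, eps)   # long-lived coins get more slack
            n = 2 ** j
            heads = sum(coins[idx]() for _ in range(n))
            if heads < threshold * n:
                survived = False
                break
        if survived:
            return idx
    return None  # safety cap for the sketch


def make_coin(bias: float, rng: random.Random) -> Coin:
    return lambda: rng.random() < bias


rng = random.Random(1)
# Good coins, borderline coins just below the thresholds, and bad coins.
biases = [0.92] * 10 + [0.8] * 5 + [0.3] * 5
coins = [make_coin(b, rng) for b in biases]
chosen = find_biased_coin(coins, base=0.9, eps=0.1, i0=4, i_final=9, rng=rng)
```

A coin that has survived many tosses only restarts if its heads fraction falls below a threshold that has already been lowered, so a borderline coin is far less likely to be discarded late after consuming many tosses.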

Note that the deviation parameter is chosen so that for all .

###### Lemma 4.2.

Within coin tosses, Find-Biased-Coin outputs a coin of bias at least except with probability .

###### Proof.

Suppose that the algorithm restarts at phase . The number of coin tosses made by this point since the previous restart (if any) is . Moreover, if the coin had bias smaller than , then, if , by a Chernoff bound, the probability the coin passed the previous test, where it was supposed to have at least fraction of heads, is at most . If , there is probability less than that the coin was picked. If the coin had bias at least , then by the Chernoff bound, the probability it failed the current test, where it is supposed to have at least fraction of heads, is at most . In any case, the ratio between the number of coin tosses and the exponent of the probability is . The value of is chosen so that the error probability in the last iteration is . By the choice of , the coin tosses to exponent ratio is . Therefore, the number of coin tosses one needs in order to reach error probability is . ∎

### 4.1 Extensions

In the sequel, we’ll use the biased coin algorithm in a more general setting, and in this section we develop the appropriate machinery. In the general setting, coins are divided into groups, and rather than tossing coins directly, we simulate the tosses. The simulation may fail or may produce results that are inconsistent with the true bias of the coin. Some of the coins may be faulty, and their tossing may fail arbitrarily. For other coins, the probability that tossing fails is small. For any coin, the probability that the coin toss does not fail and is inconsistent with the true bias is small. The coins are partitioned into groups of size each. The bias of a group is the maximum bias among non-faulty coins in the group, and is if all the coins in the group are faulty. At least fraction of the groups have bias at least . The task is to find a group of coins of bias at least .

The formal requirements from a simulated coin toss are as follows:

###### Definition 4.3.

Simulated coin tosses satisfy the following:

• For any coin, when tossing the coin times, the following event has probability exponentially small in : the tosses do not fail, yet the fraction of tosses that fall on heads deviates from the true bias by more than an additive for as in Figure 3.

• For non-faulty coins, the probability that tossing the coin fails is exponentially small in .

Note that the simulation has to be very accurate, since the deviation is sub-constant. We describe a modified biased coin algorithm in Figure 3.
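The group variant with simulated tosses can be sketched in Python as follows. The assumptions are ours: a simulated coin is a callable that, asked for n tosses, returns the fraction of heads or None when the simulation fails; a group passes a phase if some coin reports a fraction at least the current threshold; failed simulations never count as passing; and the threshold schedule eps/(2t^2) is a hypothetical choice. The function and parameter names are illustrative, not those of Figure 3.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A simulated coin: asked for n tosses, it reports the fraction of heads,
# or None when the simulation fails.
SimCoin = Callable[[int], Optional[float]]


def find_biased_coin_in_group(groups: Sequence[Sequence[SimCoin]],
                              base: float, eps: float, i0: int, i_final: int,
                              rng: random.Random,
                              max_restarts: int = 10_000
                              ) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch; returns (group index, coin index) or None."""
    for _ in range(max_restarts):
        g = rng.randrange(len(groups))          # pick a random group
        threshold = base
        winner: Optional[int] = None
        for t, j in enumerate(range(i0, i_final + 1), start=1):
            threshold -= eps / (2 * t * t)      # hypothetical schedule
            winner = None
            for c, coin in enumerate(groups[g]):
                frac = coin(2 ** j)
                # Failed simulations (None) never count as passing.
                if frac is not None and frac >= threshold:
                    winner = c
                    break
            if winner is None:                  # no coin passed: restart
                break
        if winner is not None:
            return g, winner
    return None


def honest_coin(bias: float, rng: random.Random) -> SimCoin:
    def toss(n: int) -> Optional[float]:
        return sum(rng.random() < bias for _ in range(n)) / n
    return toss


def faulty_coin(n: int) -> Optional[float]:
    return None  # a faulty coin whose simulation always fails


rng = random.Random(2)
groups: List[List[SimCoin]] = (
    [[faulty_coin, honest_coin(0.3, rng), honest_coin(0.93, rng)]
     for _ in range(5)]
    + [[faulty_coin, faulty_coin, faulty_coin] for _ in range(5)])
result = find_biased_coin_in_group(groups, base=0.9, eps=0.1, i0=4,
                                   i_final=9, rng=rng)
```

Here half of the groups consist only of faulty coins (group bias 0), and the sketch ends up returning the good coin of one of the other groups.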

###### Lemma 4.4 (Generalized biased coin).

If Find-Biased-Coin-in-Group restarts at a certain phase, then either in this phase or in the previous one, the reported fraction of heads deviates by more than from the true bias for one of the non-faulty coins in the group, or it is the first phase and a group of bias at most was picked.

As a result, within coin tosses Find-Biased-Coin-in-Group outputs a coin of bias at least except with probability .

###### Proof.

Suppose that the algorithm restarts at phase . The number of coin tosses made by this point is .

Suppose that the group had bias smaller than . If , the probability that the coins passed the previous test, where at least one of them was supposed to have at least fraction of heads, is (where of the deviation and of the error probability can be attributed to the simulation). Note that this probability is exponentially small in when is sufficiently smaller than (here we use the choice of ). If , the probability that a group of bias smaller than was picked is less than .

On the other hand, if the group has bias at least , then the probability it failed the current test, where one of the coins is supposed to have at least fraction of heads, is at most (again, of the deviation and of the error probability can be attributed to the simulation).

In any case, the ratio between the number of coin tosses and the exponent of the probability is . The value of is set so the error probability in the last iteration is . By the choice of , the coin tosses to exponent ratio is . Therefore, the number of coin tosses one needs in order to reach error probability is . ∎

## 5 Max Cut on Dense Graphs

In this section we show the application to Max-Cut on dense graphs. This is our simplest example. It relies on the biased coin algorithm, and does not require any sketches.

### 5.1 A Simple Randomized Algorithm

First we describe a simple randomized algorithm for dense Max-Cut based on the sampling idea of Fernandez de la Vega [10] and Arora, Karger and Karpinski [7]. We remark that Mathieu and Schudy [28] have similar, but more efficient, randomized algorithms; however, for the sake of simplicity, we stick to the simplest algorithm with the easiest analysis.

The main idea of the algorithm is as follows. We sample a small and enumerate over all possible -cuts . Each -cut induces a cut as follows.

###### Definition 5.1 (Induced cut).

Let . Let and . We define as follows: for every let if the fraction of edges with is larger than the fraction of edges with .
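We read Definition 5.1 as follows: each vertex joins the side opposite to the majority of its neighbors in the sample, so that most of its edges into the sample are cut. A minimal Python sketch under that reading; the dictionary encoding, the tie-breaking rule, and the names `induced_side` and `induced_cut` are our own assumptions.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def induced_side(v: int, adj: Graph, sample_side: Dict[int, int]) -> int:
    """Side (0 or 1) of v under the cut induced by a cut of the sample S.

    `sample_side` maps each sampled vertex to its side. Scanning it costs
    O(|S|) set lookups, independent of the degree of v.
    """
    ones = sum(1 for u, s in sample_side.items() if s == 1 and u in adj[v])
    zeros = sum(1 for u, s in sample_side.items() if s == 0 and u in adj[v])
    # v joins the side opposite to the majority of its sampled neighbours;
    # ties go to side 1, an arbitrary choice made for this sketch.
    return 0 if ones > zeros else 1


def induced_cut(adj: Graph, sample_side: Dict[int, int]) -> Dict[int, int]:
    return {v: induced_side(v, adj, sample_side) for v in adj}


# Complete bipartite graph K_{3,3} with parts {0, 1, 2} and {3, 4, 5}.
adj = {v: ({3, 4, 5} if v < 3 else {0, 1, 2}) for v in range(6)}
cut = induced_cut(adj, {0: 0, 3: 1})   # sample S = {0, 3}, split apart
```

With the two sampled vertices on opposite sides, the induced cut recovers the bipartition and cuts all nine edges.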

We will argue below that if there is a cut in with value at least and is the restriction of that cut to , then the induced cut is likely to approximately achieve the optimal value. Note that we rely on density to ensure that the edges touching the small sample span most of the vertices in the graph.

###### Lemma 5.2 (Sampling).

Let be a regular -dense graph that has a cut of value at least for . Then for and for a uniform ,

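$$|S| = \max\left\{\left\lceil \frac{\log(2/\zeta^2)}{\zeta^2} \right\rceil, \left\lceil \frac{2\log(2/\zeta^2)}{\gamma^2} \right\rceil\right\},$$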

with probability at least , there exists such that the value of the cut is at least .

###### Proof.

Let be the optimal cut in . Let be the restriction of to . Denote by the set of all such that at least fraction of the edges that touch contribute to the value of . Note that . By -density and regularity, the degree of all vertices is . By a Chernoff bound, except with probability over , at least of the vertices in are neighbors of . The sample of ’s neighbors is uniform and hence by another Chernoff bound, except with probability over , the vertex is assigned by to the same side as assigns it. Therefore, except with probability over the random choice of , at least fraction of the vertices are assigned by the same as . This means that at least fraction of the vertices are assigned by the same as . Therefore, the fraction of edges that: (i) contribute to the value of , and (ii) have both their endpoints assigned by the same as , is at least . ∎
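The sample size in Lemma 5.2 is a closed-form expression in ζ and γ alone, independent of the number of vertices. A small Python helper evaluates it; reading `log` as the natural logarithm is our assumption, and the name `sample_size` is illustrative.

```python
import math


def sample_size(zeta: float, gamma: float) -> int:
    """|S| from Lemma 5.2, reading `log` as the natural logarithm."""
    t = math.log(2 / zeta ** 2)
    return max(math.ceil(t / zeta ** 2), math.ceil(2 * t / gamma ** 2))
```

For example, `sample_size(0.1, 0.5)` evaluates to 530 (the first term dominates), while `sample_size(0.5, 0.1)` evaluates to 416 (the second term dominates).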

### 5.2 A Randomized Algorithm With Exponentially Small Error Probability

We describe an analogy between finding a cut of high value and finding a biased coin. We think of sampling as picking a group of coins, and picking as picking a coin in the group. The bias of the coin is the value of the cut . Therefore a biased coin directly corresponds to a desirable cut. One tosses a coin by picking an edge uniformly at random, computing whether and whether , and checking whether the edge contributes to the value of the cut. Note that deciding whether a vertex belongs to takes time . The coin toss algorithm is described in Figure 4. The algorithm based on finding a biased coin is described in Figure 5.
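The coin toss described above can be sketched in Python: a coin is a cut of the sample, and a toss picks a uniform random edge and reports heads if the induced cut separates its endpoints, at a cost of O(|S|) per endpoint. We also enumerate the coins of a group as all cuts of the sample. The names `coins_of_group` and `maxcut_coin_toss`, and the tie-breaking in `induced_side`, are hypothetical, not taken from Figures 4 and 5.

```python
import itertools
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]


def induced_side(v: int, adj: Graph, sample_side: Dict[int, int]) -> int:
    """Side of v under the induced cut (opposite to its sampled majority)."""
    ones = sum(1 for u, s in sample_side.items() if s == 1 and u in adj[v])
    zeros = sum(1 for u, s in sample_side.items() if s == 0 and u in adj[v])
    return 0 if ones > zeros else 1


def coins_of_group(sample: List[int]) -> List[Dict[int, int]]:
    """All 2**|S| cuts of the sample S: one coin per cut."""
    return [dict(zip(sample, sides))
            for sides in itertools.product((0, 1), repeat=len(sample))]


def maxcut_coin_toss(edges: List[Tuple[int, int]], adj: Graph,
                     sample_side: Dict[int, int], rng: random.Random) -> bool:
    """Heads iff a uniformly random edge crosses the induced cut."""
    u, v = rng.choice(edges)
    return induced_side(u, adj, sample_side) != induced_side(v, adj, sample_side)


adj = {v: ({3, 4, 5} if v < 3 else {0, 1, 2}) for v in range(6)}  # K_{3,3}
edges = [(u, v) for u in adj for v in adj[u] if u < v]
rng = random.Random(3)
coins = coins_of_group([0, 3])
heads = {tuple(c.values()): sum(maxcut_coin_toss(edges, adj, c, rng)
                                for _ in range(100))
         for c in coins}
```

In this toy graph, the two coins that split the sample have bias 1 and always land on heads, while the two coins that put both sampled vertices on one side have bias 0.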

This proves Theorem 1.1, which is repeated below for convenience. Note that when the error probability is a sufficiently small exponential in , a union bound over all inputs shows that there exists a randomness string on which the algorithm succeeds, no matter what the input is.

###### Theorem 5.3.

There is a Las Vegas algorithm that given a -dense graph that has a cut of value at least for , and given , finds a cut of value at least , except with probability exponentially small in . The algorithm runs in time . It also implies a non-uniform deterministic algorithm with the same run-time.

## 6 Approximate Clique

### 6.1 An Algorithm With Constant Error Probability

In this section we describe a randomized algorithm with constant error probability for finding an approximate clique in a graph that has a large clique. The algorithm is a simplified version of an algorithm and analysis by Goldreich, Goldwasser and Ron [18]. We rely on the algorithm and the analysis when we design a randomized algorithm with error probability and again when we design a deterministic algorithm.

The main idea of the algorithm is as follows. We first find a small random subset of the large clique by sampling vertices from and enumerating over all possibilities for . The intuition is that now we would like to find other vertices that are part of the large clique . A natural test for whether a vertex is in the clique is whether is connected to all the vertices in . This, however, is not a sound test, since there might be many vertices that neighbor all of the sampled clique vertices but do not neighbor one another. A better test checks whether neighbors all of , as well as many of the vertices that neighbor all of . Vertices that neighbor all of are likely to neighbor almost all of the clique.

The algorithm is described in Figure 6. It picks at random, considers all possible sub-cliques , , computes the set of vertices that neighbor all of , computes for every vertex in the fraction of vertices in that neighbor it, and considers the set of vertices in with largest fractions of neighbors. The algorithm outputs a sufficiently dense set among all sets , if one exists.
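A minimal Python sketch of the constant-error algorithm as described above. The exact parameters of Figure 6 are not recoverable here, so several choices are our own assumptions: the top ρn vertices are taken from the common neighborhood of U; the density of a set is the average, over its members, of the fraction of the set that a member neighbors; and sub-cliques with too few common neighbors are skipped rather than padded with dummy vertices. All names are hypothetical. The sketch enumerates all subsets of the sample, so it is only meant for tiny inputs.

```python
import itertools
import random
from typing import Dict, List, Optional, Set

Graph = Dict[int, Set[int]]  # adjacency sets, no self-loops


def density(adj: Graph, c: List[int]) -> float:
    """Average over v in C of the fraction of C that neighbours v."""
    members = set(c)
    return sum(len(adj[v] & members) for v in c) / (len(c) ** 2)


def find_approx_clique_const_error(adj: Graph, rho: float, sample_size: int,
                                   target: float,
                                   rng: random.Random) -> Optional[List[int]]:
    k = int(rho * len(adj))                       # size of the output set
    s = rng.sample(sorted(adj), sample_size)      # the random sample S
    for r in range(len(s) + 1):
        for u in itertools.combinations(s, r):
            # Only sub-cliques U of the sample are considered.
            if any(b not in adj[a] for a, b in itertools.combinations(u, 2)):
                continue
            common = [v for v in adj if all(v in adj[x] for x in u)]
            if len(common) < k:
                continue   # this sketch skips U instead of padding
            common_set = set(common)
            # Keep the k common neighbours with the most neighbours
            # inside the common neighbourhood of U.
            ranked = sorted(common, key=lambda v: len(adj[v] & common_set),
                            reverse=True)
            candidate = ranked[:k]
            if density(adj, candidate) >= target:
                return candidate
    return None


# A clique on {0, ..., 5} plus a path 6-7-8-9-10-11.
adj = {v: set() for v in range(12)}
for a, b in itertools.combinations(range(6), 2):
    adj[a].add(b)
    adj[b].add(a)
for a in range(6, 11):
    adj[a].add(a + 1)
    adj[a + 1].add(a)
found = find_approx_clique_const_error(adj, rho=0.5, sample_size=4,
                                       target=0.8, rng=random.Random(4))
```

In this toy graph, the set of top-ranked vertices is already the planted clique, whose density under the measure above is 5/6.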

The algorithm runs in time . Next we analyze the probability it is correct. By a Chernoff bound, we have , except with probability . Pick a uniformly random order on the vertices. Let us focus on the event and that is the first elements in according to the random order. Note that the elements of are uniformly and independently distributed in . Let contain all the vertices that neighbor all of .

###### Lemma 6.1.

With probability over the choice of , the fraction of that neighbor less than fraction of is at most .

###### Proof.

Consider that has less than neighbors in . For to be in the set must miss all of the non-neighbors of . Since is a uniform sample of , this happens with probability . The lemma follows. ∎
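The per-vertex estimate in the proof of Lemma 6.1 can be spelled out as follows. We write $K$ for the clique, $N(v)$ for the neighborhood of $v$, and $\beta$ for the elided fraction of $K$ that $v$ fails to neighbor (all three are our notation). We also use that the elements of $U$ are independent and uniform in $K$:

```latex
\Pr_U\bigl[U \subseteq N(v)\bigr]
  = \Bigl(\Pr_{u \sim K}\bigl[u \in N(v)\bigr]\Bigr)^{|U|}
  \le (1-\beta)^{|U|}
  \le e^{-\beta |U|}.
```

Summing this bound over the vertices in question bounds the expected number of them that neighbor all of $U$. Applying Markov's inequality to that expectation is one natural way to complete the argument.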

Let us focus on for which the fraction of that neighbor less than fraction of is at most . Lemma 6.1 guarantees that such a , which we call good, is picked with constant probability. Next we show that an average vertex in neighbors most of .

###### Lemma 6.2 (Density for C).

For good , the average number of neighbors a vertex has in is at least .

###### Proof.

Since is good, more than fraction of neighbor at least fraction of . Hence, the average fraction of neighbors a uniform vertex in has is at least (using ). ∎

We can now prove the correctness of Find-Approximate-Clique-Constant-Error.

###### Lemma 6.3.

With probability at least , Find-Approximate-Clique-Constant-Error, when invoked on , and a graph with a clique on vertices, picks such that , and returns a set of density at least .

###### Proof.

For good , by Lemma 6.2, . Since takes the vertices with largest and , we have . Therefore, the density within is at least , and so is the density of the set returned by the algorithm. ∎

Next we show how to transform the randomized algorithm with constant error probability from Section 6.1 into an algorithm with error probability that is exponentially small in without increasing the run-time by more than poly-logarithmic factors. The algorithm applies the biased coin algorithm from Section 4.

### 6.2 Finding an Approximate Clique as Finding a Biased Coin

The analogy between finding a biased coin and finding an approximate clique is as follows: Picking picks a group of coins. There is a coin for every , . The coin is faulty if . A coin corresponds to the set of the vertices in with largest number of neighbors in (when the coin is faulty, pad the set with dummy vertices with neighbors). The bias of the coin is the expectation, over the choice of a random vertex , of the fraction of vertices in that neighbor . With at least probability, one of the coins in the group – the one associated with a good in the sense of Section 6.1 – has . Moreover, any with corresponds to a set of density at least .

Had we found the vertices in each , we could have tossed a coin by picking a vertex at random from and a vertex at random from and letting the coin fall on heads if there is an edge between the two vertices. Unfortunately, finding the vertices in may take time, so we refrain from doing that. We settle for a simulated toss: with high probability, the coin falls on heads with probability close to its bias. In Section 4.1 we extended the biased coin algorithm to simulated tosses. In Figure 7 we describe the algorithm for tossing a coin enough times so the probability of -deviation from the true bias is exponentially small in (the number of coin tosses is implicit). The algorithm runs in time .

In the next lemma we prove that is likely to approximate well. For future use we phrase a more general statement than we need here, addressing that defines a slightly faulty coin as well.

###### Lemma 6.4.

Assume that , where and . For a uniform , except with probability exponentially small in ,

$$\left|\mathrm{bias}_{V'}(U') - \mathrm{bias}(U')\right| \le 3\gamma + 2\gamma'.$$
###### Proof.

By a multiplicative Chernoff bound, except with probability exponentially small in , there are vertices in . Let us focus on this event.

By a Hoeffding bound, except with probability exponentially small in , we have

$$\left|\frac{1}{|V' \cap S_{U'}|}\sum_{v \in V' \cap S_{U'}} f_v - \frac{1}{|S_{U'}|}\sum_{v \in S_{U'}} f_v\right| \le \gamma.$$

Hence,

$$\left|\frac{1}{\rho |V'|}\sum_{v \in V' \cap S_{U'}} f_v - \frac{1}{|S_{U'}|}\sum_{v \in S_{U'}} f_v\right| \le 3\gamma + 2\gamma'.$$

The lemma follows. ∎
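The two averages compared in the proof of Lemma 6.4 can be computed directly when everything is known explicitly. A minimal Python sketch, assuming that the values f_v and the set S_{U'} are given (in the algorithm they are not, which is why only a sample V' is used); the function names and the uniform random stand-in values for f_v are hypothetical. The sample V' is drawn uniformly without repetition.

```python
import math
import random
from typing import Dict, Sequence, Set


def estimated_bias(f: Dict[int, float], s_u: Set[int],
                   v_sample: Sequence[int], rho: float) -> float:
    """(1 / (rho |V'|)) * sum of f_v over v in V' ∩ S_{U'}."""
    total = sum(f[v] for v in v_sample if v in s_u)
    return total / (rho * len(v_sample))


def true_bias(f: Dict[int, float], s_u: Set[int]) -> float:
    """(1 / |S_{U'}|) * sum of f_v over v in S_{U'}."""
    return sum(f[v] for v in s_u) / len(s_u)


rng = random.Random(5)
n, rho = 10_000, 0.2
f = {v: rng.random() for v in range(n)}       # stand-in values of f_v
s_u = set(range(int(rho * n)))                # |S_{U'}| = rho * n
v_sample = rng.sample(range(n), 2_000)        # uniform V' (no repetition)
gap = abs(estimated_bias(f, s_u, v_sample, rho) - true_bias(f, s_u))
```

When V' is the whole vertex set and |S_{U'}| = ρn, the two expressions coincide exactly; for a uniform sample they differ by a small amount. This mirrors the two sources of error in the proof: the number of sampled vertices in S_{U'} and the average of f_v over them.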

The algorithm for finding an approximate clique using Clique-Coin-Toss is described in Figure 8. Note that the coin tossing algorithm satisfies the conditions of simulated tossing (Definition 4.3). Lemma 6.1 and Lemma 6.2 ensure that with constant probability over the choice of , for as specified in Section 6.1, we have for one of the . Moreover, a coin with bias at least yields a set which is at least -dense, and this set can be computed in time. Therefore, the algorithm in Figure 8 gives an algorithm for finding an approximate clique that errs with probability exponentially small in and runs in time . This proves the first part of Theorem 1.2, which is repeated below for convenience (note that in Theorem 1.2 is replaced with here).

###### Theorem 6.5.

There is a Las Vegas algorithm that given a graph with a clique on vertices and given , finds a set of vertices and density , except with probability exponentially small in . The algorithm runs in time .

The remainder of the section constructs an oblivious verifier for Find-Approximate-Clique and uses it to prove the second part of Theorem 1.2 (a deterministic algorithm). First we describe the sketch and its properties, then we devise an oblivious verifier for Clique-Coin-Toss, and finally we describe the verifier for Find-Approximate-Clique.

### 6.3 A Sketch for Approximate Clique

The sketch for a given contains, for some carefully chosen set of vertices, the bipartite graph that contains all the edges of that at least one of their endpoints falls in . The set is chosen so it allows the verifier to estimate the ’s corresponding to different sets . Note that the size of the sketch is .

Let . For every we denote by the fraction of vertices in that neighbor . For , let denote the elements with largest (pad with dummy vertices with neighbors if needed). Let . For let denote the fraction of vertices that neighbor among all vertices in . For , let be the vertices with largest (pad with dummy vertices with neighbors if needed). Let .
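The sketch itself is a simple data structure. A minimal Python sketch, where `R` is our hypothetical name for the elided set of carefully chosen vertices and all function names are illustrative: storing the full neighborhood of each vertex of `R` records exactly the edges with at least one endpoint in `R`. This is enough to compute, for a vertex of `R`, what fraction of any given vertex set it neighbors.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def build_sketch(adj: Graph, r: Set[int]) -> Dict[int, Set[int]]:
    """Bipartite sketch: every edge with at least one endpoint in R.

    Stored as the full neighbourhood of each x in R, so it holds at most
    |R| * n vertex ids.
    """
    return {x: set(adj[x]) for x in r}


def sketch_edges(sketch: Dict[int, Set[int]]) -> Set[Tuple[int, int]]:
    """The edges recorded by the sketch, each as a sorted pair."""
    return {tuple(sorted((x, y))) for x, nbrs in sketch.items() for y in nbrs}


def fraction_neighbouring(sketch: Dict[int, Set[int]], x: int,
                          t: Set[int]) -> float:
    """Fraction of the set T that neighbours x (x in R), from the sketch."""
    return len(sketch[x] & t) / len(t)


# Path 0-1-2-3 with R = {1}.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
sk = build_sketch(adj, {1})
```

In the path example, the sketch keeps the two edges at vertex 1 and discards the edge between 2 and 3, which has no endpoint in `R`.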

In the lemma we use from Find-Approximate-Clique in Figure 8.

###### Lemma 6.6 (Sketch).

There exists , , such that for every , ,

1. If , then , whereas if , then .

2. Suppose that . Then, for every , we have .

3. Suppose that . Then, .

###### Proof.

Pick uniformly at random. Let , . By a multiplicative Chernoff bound, if , then , except with probability exponentially small in . If , then except with probability exponentially small in