
# An Improved BKW Algorithm for LWE with Applications to Cryptography and Lattices

## Abstract

In this paper, we study the Learning With Errors ($\mathsf{LWE}$) problem and its binary variant, where secrets and errors are binary or taken in a small interval. We introduce a new variant of the Blum, Kalai and Wasserman algorithm, relying on a quantization step that generalizes and fine-tunes modulus switching. In general this new technique yields a significant gain in the constant in front of the exponent in the overall complexity. We illustrate this by solving within half a day an $\mathsf{LWE}$ instance with dimension $n = 128$, modulus $q = n^2$, Gaussian noise $\alpha = 1/(\sqrt{n/\pi}\log^2 n)$ and binary secret, using $2^{28}$ samples, while the previous best result based on BKW claims a time complexity of $2^{74}$ with $2^{60}$ samples for the same parameters.
We then introduce variants of $\mathsf{BDD}$, $\mathsf{GapSVP}$ and $\mathsf{UniqueSVP}$, where the target point is required to lie in the fundamental parallelepiped, and show how the previous algorithm is able to solve these variants in subexponential time. Moreover, we also show how the previous algorithm can be used to solve the $\mathsf{BinaryLWE}$ problem with $n$ samples in subexponential time $2^{(\ln 2/2+o(1))n/\log\log n}$. This analysis does not require any heuristic assumption, contrary to other algebraic approaches; instead, it uses a variant of an idea by Lyubashevsky to generate many samples from a small number of samples. This makes it possible to asymptotically and heuristically break the $\mathsf{NTRU}$ cryptosystem in subexponential time (without contradicting its security assumption). We are also able to solve subset sum problems in subexponential time for density $o(1)$, which is of independent interest: for such density, the previous best algorithm requires exponential time. As a direct application, we can break in subexponential time the parameters of a cryptosystem based on this problem proposed at TCC 2010.


## 1 Introduction

The Learning With Errors ($\mathsf{LWE}$) problem has been an important problem in cryptography since its introduction by Regev in [46]. Many cryptosystems have been proven secure assuming the hardness of this problem, including Fully Homomorphic Encryption schemes [21, 14]. The decision version of the problem can be described as follows: given samples of the form $(\vec{a}, b) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$, where the $\vec{a}$ are uniformly distributed in $\mathbb{Z}_q^n$, distinguish whether $b$ is uniformly chosen in $\mathbb{Z}_q$ or is equal to $\langle\vec{a},\vec{s}\rangle + e$ for a fixed secret $\vec{s} \in \mathbb{Z}_q^n$ and a noise value $e \in \mathbb{Z}_q$ chosen according to some probability distribution. Typically, the noise is sampled from some distribution concentrated on small numbers, such as a discrete Gaussian distribution with standard deviation $\alpha q$ for $\alpha = o(1)$. In the search version of the problem, the goal is to recover $\vec{s}$ given the promise that the sample instances come from the latter distribution. Initially, Regev showed that if $\alpha q \geq 2\sqrt{n}$, solving $\mathsf{LWE}$ on average is at least as hard as approximating lattice problems in the worst case to within $\tilde{O}(n/\alpha)$ factors with a quantum algorithm. Peikert shows a classical reduction when the modulus is large in [44]. Finally, in [13], Brakerski et al. prove that solving $\mathsf{LWE}$ instances with polynomial-size modulus in polynomial time implies an efficient solution to $\mathsf{GapSVP}$.

There are basically three approaches to solving $\mathsf{LWE}$: the first relies on lattice reduction techniques such as the LLL [32] algorithm and further improvements [15], as exposed in [34, 35]; the second uses combinatorial techniques [12, 47]; and the third uses algebraic techniques [9]. According to Regev in [1], the best known algorithm to solve $\mathsf{LWE}$ is the algorithm by Blum, Kalai and Wasserman in [12], originally proposed to solve the Learning Parities with Noise ($\mathsf{LPN}$) problem, which can be viewed as a special case of $\mathsf{LWE}$ where $q = 2$. The time and memory requirements of this algorithm are both exponential for $\mathsf{LWE}$ and subexponential for $\mathsf{LPN}$, in $2^{O(n/\log n)}$. During the first stage of the algorithm, the dimension of $\vec{a}$ is reduced, at the cost of a (controlled) decrease of the bias of $b$. During the second stage, the algorithm distinguishes between $\mathsf{LWE}$ and uniform by evaluating the bias.

Since the introduction of $\mathsf{LWE}$, some variants of the problem have been proposed in order to build more efficient cryptosystems. Some of the most interesting variants are $\mathsf{Ring\text{-}LWE}$ by Lyubashevsky, Peikert and Regev in [38], which aims to reduce the size of the public key using cyclic samples; and the cryptosystem by Döttling and Müller-Quade [18], which uses short secret and error. In 2013, Micciancio and Peikert [40] as well as Brakerski et al. [13] proposed a binary version of the $\mathsf{LWE}$ problem and obtained a hardness result.

Related Work. Albrecht et al. have presented an analysis of the BKW algorithm as applied to $\mathsf{LWE}$ in [4, 5]. The algorithm has recently been revisited by Duc et al., who use a multi-dimensional FFT in the second stage of the algorithm [19]. However, the main bottleneck is the first BKW reduction stage, and since the proposed algorithms do not improve this stage, the overall asymptotic complexity is unchanged.

In the case of the $\mathsf{BinaryLWE}$ variant, where the error and secret are binary (or sufficiently small), Micciancio and Peikert show that solving this problem using $n(1+\Omega(1/\log n))$ samples is at least as hard as approximating lattice problems in the worst case in dimension $\Theta(n/\log n)$ with approximation factor $\tilde{O}(\sqrt{n}\,q)$. We show in Appendix B that existing lattice reduction techniques require exponential time. Arora and Ge describe a $2^{\tilde{O}((\alpha q)^2)}$-time algorithm when $q > n$ to solve the $\mathsf{LWE}$ problem [9]. This leads to a subexponential time algorithm when the error magnitude $\alpha q$ is less than $\sqrt{n}$. The idea is to transform this system into a noise-free polynomial system and then use root finding algorithms for multivariate polynomials to solve it, using either relinearization in [9] or Gröbner bases in [3]. In this last work, Albrecht et al. present an algorithm whose time complexity is subexponential when the number of samples is slightly super-linear, where $\omega$ is the linear algebra constant, under some assumption on the regularity of the polynomial system of equations; when the number of samples is linear, the complexity becomes exponential.

Contribution. Our first contribution is to present in a unified framework the BKW algorithm and all its previous improvements in the binary case [33, 28, 11, 25] and in the general case [5]. We introduce a new quantization step, which generalizes modulus switching [5]. This yields a significant decrease in the constant in the exponent of the complexity for $\mathsf{LWE}$. Moreover, our proof does not require Gaussian noise, and does not rely on unproven independence assumptions. Our algorithm is also able to tackle problems with larger noise.

We then introduce generalizations of the $\mathsf{BDD}$, $\mathsf{UniqueSVP}$ and $\mathsf{GapSVP}$ problems, and prove a reduction from these variants to $\mathsf{LWE}$. When particular parameters are set, these variants impose that the lattice point of interest (the point of the lattice that the problem essentially asks to locate: for instance, in the case of $\mathsf{BDD}$, the point of the lattice closest to the target point) lie in the fundamental parallelepiped; or more generally, we ask that the coordinates of this point relative to the basis defined by the input matrix have small infinity norm, bounded by some value $B$. For small $B$, our main algorithm yields a subexponential-time algorithm for these variants of $\mathsf{BDD}$, $\mathsf{UniqueSVP}$ and $\mathsf{GapSVP}$.

Through a reduction to our variant of $\mathsf{BDD}$, we are then able to solve the subset-sum problem in subexponential time when the density is $o(1)$. This is of independent interest, as existing techniques for such density, based on lattice reduction, require exponential time. As a consequence, the cryptosystems of Lyubashevsky, Palacio and Segev at TCC 2010 [37] can be broken in subexponential time.

As another application of our main algorithm, we show that $\mathsf{BinaryLWE}$ with reasonable noise can be solved in subexponential time instead of exponential time; and the same complexity holds for secrets of somewhat larger size. As a consequence, we can heuristically recover the secret polynomials of the $\mathsf{NTRU}$ problem in subexponential time (without contradicting its security assumption). The heuristic assumption comes from the fact that the samples are not random, since they are rotations of each other: the heuristic assumption is that this does not significantly hinder BKW-type algorithms. Note that there is a large value hidden in the $o(1)$ term, so that our algorithm does not yield practical attacks for recommended parameters.

## 2 Preliminaries

We identify any element of $\mathbb{Z}_q$ with the smallest element of its equivalence class in absolute value, taking the positive one in case of a tie. Any vector $\vec{x}$ has a Euclidean norm $\|\vec{x}\| = (\sum_i x_i^2)^{1/2}$ and an infinity norm $\|\vec{x}\|_\infty = \max_i |x_i|$. A matrix $B$ can be Gram-Schmidt orthogonalized into $B^*$, and its norm $\|B\|$ is the maximum of the norms of its columns. We denote by $(\vec{x}\,\|\,\vec{y})$ the vector obtained as the concatenation of the vectors $\vec{x}$ and $\vec{y}$. Let $I_n$ be the identity matrix; we denote by $\ln$ the natural logarithm and by $\log$ the binary logarithm. A lattice $\Lambda$ is the set of all integer linear combinations $\sum_i \lambda_i \vec{b}_i$ (where $\lambda_i \in \mathbb{Z}$) of a set of linearly independent vectors $(\vec{b}_i)$, called a basis of the lattice. If $B$ is the basis matrix, lattice vectors can be written as $B\vec{\lambda}$ for $\vec{\lambda} \in \mathbb{Z}^n$. Its dual $\Lambda^*$ is the set of $\vec{x}$ such that $\langle\vec{x},\vec{y}\rangle \in \mathbb{Z}$ for all $\vec{y} \in \Lambda$. We have $(\Lambda^*)^* = \Lambda$. We borrow Bleichenbacher's definition of bias [42].

###### Definition 1.

The bias of a probability distribution $\phi$ over $\mathbb{Z}_q$ is

$$E_{x\sim\phi}\left[\exp(2i\pi x/q)\right].$$

This definition extends the usual definition of the bias of a coin in $\{0,1\}$: it preserves the fact that any distribution with bias $\epsilon$ can be distinguished from uniform with constant probability using $O(1/\epsilon^2)$ samples, as a consequence of Hoeffding's inequality; moreover, the bias of the sum of two independent variables is still the product of their biases. We also have the following simple lemma:
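To make the definition concrete, here is a minimal Python sketch (ours, not part of the paper) that estimates the bias of an empirical sample from $\mathbb{Z}_q$; the parameters $q$, $m$ and the noise width are arbitrary toy choices. The noisy estimate matches the prediction of Lemma 1 below.

```python
import cmath
import random

def empirical_bias(samples, q):
    """Empirical estimate of E[exp(2*i*pi*x/q)] over a list of values in Z_q."""
    return sum(cmath.exp(2j * cmath.pi * x / q) for x in samples) / len(samples)

q, m = 257, 20000
uniform = [random.randrange(q) for _ in range(m)]
noisy = [round(random.gauss(0, 8)) % q for _ in range(m)]   # small Gaussian noise mod q

print(abs(empirical_bias(uniform, q)))   # ~0, up to O(1/sqrt(m)) fluctuations
print(abs(empirical_bias(noisy, q)))     # ~exp(-2*pi^2*8^2/q^2) ~ 0.98
```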

###### Lemma 1.

The bias of the Gaussian distribution of mean $0$ and standard deviation $\sigma$ is $\exp(-2\pi^2\sigma^2/q^2)$.

###### Proof.

The bias is the value of the Fourier transform of the Gaussian density at $1/q$. ∎
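For completeness, the computation behind this one-line proof is the standard Gaussian integral (our rendering; the bias is the characteristic function evaluated at $2\pi/q$):

```latex
\mathbb{E}_{x\sim\mathcal{N}(0,\sigma^{2})}\!\left[e^{2i\pi x/q}\right]
  = \int_{-\infty}^{+\infty}\frac{1}{\sigma\sqrt{2\pi}}\,
    e^{-x^{2}/(2\sigma^{2})}\,e^{2i\pi x/q}\,\mathrm{d}x
  = e^{-2\pi^{2}\sigma^{2}/q^{2}},
```

where one completes the square $-\frac{x^2}{2\sigma^2}+\frac{2i\pi x}{q} = -\frac{(x-2i\pi\sigma^2/q)^2}{2\sigma^2}-\frac{2\pi^2\sigma^2}{q^2}$ and integrates the shifted Gaussian to $1$.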

We introduce a non-standard definition for the $\mathsf{LWE}$ problem. However, as a consequence of Lemma 1, this new definition naturally extends the usual Gaussian case (as well as its standard extensions such as the bounded noise variant [13, Definition 2.14]), and it will prove easier to work with. The reader can consider the distortion parameter $\epsilon = 0$, as is the case in other papers, and a Gaussian noise of standard deviation $\sigma = \alpha q/(\pi\sqrt{2})$.

###### Definition 2.

Let $n \geq 1$ and $q \geq 2$ be integers. Given parameters $\alpha \geq 0$ and $\epsilon \geq 0$, the distribution $\mathsf{LWE}_{\alpha,\epsilon}$ is, for some $\vec{s} \in \mathbb{Z}_q^n$, a distribution on pairs $(\vec{a}, b) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$ such that $\vec{a}$ is sampled uniformly, and for all $\vec{a}$,

$$\left|\,E\big[\exp\big(2i\pi(\langle\vec{a},\vec{s}\rangle - b)/q\big)\,\big|\,\vec{a}\big]\exp(\alpha'^2) - 1\,\right| \leq \epsilon$$

for some universal $\alpha' \leq \alpha$.

For convenience, we define $\mathsf{LWE}_\alpha = \mathsf{LWE}_{\alpha,0}$. In the remainder, $\alpha$ is called the noise parameter, and $\epsilon$ the distortion parameter. Also, we say that a distribution has noise distribution $\phi$ if $b - \langle\vec{a},\vec{s}\rangle$ is distributed according to $\phi$.

###### Definition 3.

The decision $\mathsf{LWE}$ problem is to distinguish a $\mathsf{LWE}$ distribution from the uniform distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_q$. The search $\mathsf{LWE}$ problem is, given samples from a $\mathsf{LWE}$ distribution, to find $\vec{s}$.

###### Definition 4.

The real $\lambda_k(\Lambda)$ is the radius of the smallest ball, centered at $\vec{0}$, such that it contains $k$ vectors of the lattice $\Lambda$ which are linearly independent.

We define $\rho_\sigma(\vec{x}) = \exp(-\pi\|\vec{x}\|^2/\sigma^2)$ and $\rho_\sigma(S) = \sum_{\vec{x}\in S}\rho_\sigma(\vec{x})$ (and similarly for other functions). The discrete Gaussian distribution over a set $S$ and of parameter $\sigma$ is such that the probability of drawing $\vec{x} \in S$ is equal to $\rho_\sigma(\vec{x})/\rho_\sigma(S)$. To simplify notation, we will denote by $D_\sigma$ the distribution $D_{\mathbb{Z}^n,\sigma}$.

###### Definition 5.

The smoothing parameter $\eta_\epsilon(\Lambda)$ of the lattice $\Lambda$ is the smallest $\sigma$ such that $\rho_{1/\sigma}(\Lambda^*\setminus\{\vec{0}\}) \leq \epsilon$.

Now, we will generalize the BDD, UniqueSVP and GapSVP problems by using another parameter $B$ that bounds the coordinates of the target lattice vector with respect to the given basis. For sufficiently large $B$, we recover the usual definitions if the input matrix is reduced.

###### Definition 6.

The $\mathsf{BDD}_{B,\beta}$ (resp. $\mathsf{BDD}_\beta$) problem is, given a basis $A$ of the lattice $\Lambda$, and a point $\vec{t}$ such that $\vec{t} = A\vec{x} + \vec{e}$ with $\|\vec{e}\| \leq \beta$ and $\|\vec{x}\|_\infty \leq B$ (resp. $\vec{x} \in \mathbb{Z}^n$ arbitrary), to find $A\vec{x}$.

###### Definition 7.

The $\mathsf{UniqueSVP}_{B,\gamma}$ (resp. $\mathsf{UniqueSVP}_\gamma$) problem is, given a basis $A$ of the lattice $\Lambda$ such that $\lambda_2(\Lambda) \geq \gamma\lambda_1(\Lambda)$ and there exists $\vec{x}$ with $\|A\vec{x}\| = \lambda_1(\Lambda)$ and $\|\vec{x}\|_\infty \leq B$ (resp. $\vec{x}$ arbitrary), to find $A\vec{x}$.

###### Definition 8.

The $\mathsf{GapSVP}_{B,\gamma}$ (resp. $\mathsf{GapSVP}_\gamma$) problem is, given a basis $A$ of the lattice $\Lambda$, to distinguish between $\lambda_1(\Lambda) \geq \gamma$ and the case where there exists $\vec{x}$ such that $\|A\vec{x}\| \leq 1$ (resp. and $\|\vec{x}\|_\infty \leq B$).

###### Definition 9.

Given two probability distributions $P$ and $Q$ on a finite set $S$, the Kullback-Leibler (or $\mathsf{KL}$) divergence between $P$ and $Q$ is

$$D_{\mathrm{KL}}(P\|Q) = \sum_{x\in S}\ln\left(\frac{P(x)}{Q(x)}\right)P(x)$$

with the convention that $\ln(x/0) = +\infty$ if $x > 0$.

The following two lemmata are proven in [45]:

###### Lemma 2.

Let $P$ and $Q$ be two distributions over $S$ such that, for all $x \in S$, $P(x) = Q(x)(1+\delta(x))$ with $|\delta(x)| \leq 1/2$. Then:

$$D_{\mathrm{KL}}(P\|Q) \leq 2\sum_{x\in S}\delta(x)^2 P(x).$$
###### Lemma 3.

Let $\mathcal{A}$ be an algorithm which takes as input $m$ samples of $S$ and outputs a bit. Let $x$ (resp. $y$) be the probability that it returns $1$ when the input is sampled from $P$ (resp. $Q$). Then:

$$|x-y| \leq \sqrt{m\,D_{\mathrm{KL}}(P\|Q)/2}.$$

Finally, we say that an algorithm has a negligible probability of failure if its probability of failure is $2^{-\Omega(n)}$.

### 2.1 Secret-Error Switching

At a small cost in samples, it is possible to reduce any $\mathsf{LWE}$ distribution to an instance where the secret follows the rounded error distribution, i.e. the distribution of the error rounded to the nearest integer modulo $q$ [7, 13].

###### Theorem 2.1.

Given an oracle that solves $\mathsf{LWE}$ with $m$ samples in time $t$ when the secret comes from the rounded error distribution, it is possible to solve $\mathsf{LWE}$ with $m + O(n)$ samples with the same error distribution (and any distribution on the secret) in time $t + O((m+n)n^2)$, with negligible probability of failure.

Furthermore, if $q$ is prime, we lose only $n$ samples, with probability of failure bounded by $2/q$.

###### Proof.

First, select an invertible matrix $A$ from the vectorial part of $O(n)$ samples, in time $O(n^3)$ [13, Claim 2.13].

Let $\vec{b}$ be the vector of the corresponding rounded noisy dot products. Let $\vec{s}$ be the secret and $\vec{e}$ such that $\vec{b} = A\vec{s} + \vec{e}$. Then the subsequent samples are transformed in the following way. For each new sample $(\vec{a}', b')$ with $b' = \langle\vec{a}',\vec{s}\rangle + e'$, we give the sample $(-{}^t\!A^{-1}\vec{a}',\; b' - \langle{}^t\!A^{-1}\vec{a}',\vec{b}\rangle)$ to our oracle.

Clearly, the vectorial part of the new samples remains uniform, and since

$$b' - \langle{}^t\!A^{-1}\vec{a}',\vec{b}\rangle = \langle-{}^t\!A^{-1}\vec{a}',\vec{b}-A\vec{s}\rangle + b' - \langle\vec{a}',\vec{s}\rangle = \langle-{}^t\!A^{-1}\vec{a}',\vec{e}\rangle + e',$$

the new errors follow the same distribution as the original, and the new secret is $\vec{e}$. Hence the oracle outputs $\vec{e}$ in time $t$, and we can recover $\vec{s}$ as $A^{-1}(\vec{b}-\vec{e})$.

If $q$ is prime, the probability that the first $n$ samples are in some hyperplane is bounded by $\sum_{i=1}^{n} q^{-i} < 2/q$. ∎
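The following numpy sketch illustrates the transformation on toy parameters (our own code: the oracle is elided, `q` is taken prime so a batch of samples is invertible with probability close to $1$, and the error is a rounded Gaussian):

```python
import numpy as np

q, n = 97, 4
rng = np.random.default_rng(1)
s = rng.integers(0, q, n)                    # secret, arbitrary distribution

def sample():
    """Toy LWE sample (a, <a,s> + e mod q) with small rounded-Gaussian error."""
    a = rng.integers(0, q, n)
    e = int(np.rint(rng.normal(0, 2)))
    return a, (int(a @ s) + e) % q

def inv_mod(A, q):
    """Inverse of an integer matrix over Z_q (q prime), via the adjugate."""
    det = int(np.rint(np.linalg.det(A)))
    adj = np.rint(np.linalg.inv(A) * np.linalg.det(A)).astype(int)
    return (pow(det % q, -1, q) * adj) % q

# Draw n samples until the vectorial parts form an invertible matrix A,
# so that b = A s + e (mod q); the oracle's new secret will be e.
while True:
    pairs = [sample() for _ in range(n)]
    A = np.array([a for a, _ in pairs])
    if int(np.rint(np.linalg.det(A))) % q != 0:
        break
b = np.array([bb for _, bb in pairs])
C = inv_mod(A, q).T                          # C = transpose(A^{-1}) mod q

def switch(a_new, b_new):
    """Map a fresh sample to one whose secret is e = b - A s (mod q)."""
    c = (C @ a_new) % q
    return (-c) % q, (b_new - int(c @ b)) % q

# Sanity check: the transformed sample has small error w.r.t. the secret e.
e = (b - A @ s) % q
a2, b2 = switch(*sample())
print((b2 - int(a2 @ e)) % q)                # close to 0 modulo q
```

The printed value is exactly the error $e'$ of the fresh sample, confirming that the transformed pair is an $\mathsf{LWE}$ sample with secret $\vec{e}$.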

### 2.2 Low dimension algorithms

Our main algorithm will return samples from an $\mathsf{LWE}$ distribution in a smaller dimension, with a decreased bias. We describe two fast algorithms for the case where the dimension is small enough.

###### Theorem 2.2.

If $n = 0$ and $m \geq 8\ln(2/\delta)/\epsilon^2$, with $\epsilon$ smaller than the real part of the bias, the decision problem can be solved with advantage $1-\delta$ in time $O(m)$.

###### Proof.

The algorithm Distinguish computes $S = \frac{1}{m}\sum_{j=1}^{m}\cos(2\pi b_j/q)$ and returns the boolean $S \geq \epsilon/2$. If we have a uniform distribution then the expectation of $S$ is $0$, else it is larger than $\epsilon$. The Hoeffding inequality shows that the probability of $|S - E[S]| \geq \epsilon/2$ is at most $2\exp(-m\epsilon^2/8)$, which gives the result. ∎
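In Python, the distinguisher is a few lines (our sketch; `eps` is the assumed lower bound on the real part of the bias):

```python
import math

def distinguish(bs, q, eps):
    """Distinguish n = 0 LWE samples from uniform, as in Theorem 2.2.

    bs:  the m values b in Z_q (there is no vectorial part left),
    eps: a lower bound on the real part of the bias.
    Returns True when the samples look like LWE rather than uniform.
    """
    s = sum(math.cos(2 * math.pi * b / q) for b in bs) / len(bs)
    return s >= eps / 2
```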

###### Lemma 4.

For all $\vec{s} \neq \vec{0}$, if $\vec{a}$ is sampled uniformly in $\mathbb{Z}_q^n$, then $E[\exp(2i\pi\langle\vec{a},\vec{s}\rangle/q)] = 0$.

###### Proof.

Let $k = \gcd(s_1,\ldots,s_n,q)$; since $\vec{s} \neq \vec{0}$, we have $k < q$. Multiplication by $s_j$ in $\mathbb{Z}_q$ is a group morphism, and a group morphism maps the uniform distribution to the uniform distribution on its image; therefore each $a_j s_j$ is uniform over $\gcd(s_j,q)\mathbb{Z}/q\mathbb{Z}$. Thus, $\langle\vec{a},\vec{s}\rangle$ is distributed uniformly over $k\mathbb{Z}/q\mathbb{Z}$, so

$$E[\exp(2i\pi\langle\vec{a},\vec{s}\rangle/q)] = \frac{k}{q}\sum_{j=0}^{q/k-1}\exp(2i\pi jk/q) = 0. \qquad ∎$$
###### Theorem 2.3.

The algorithm FindSecret, when given $m$ samples from a $\mathsf{LWE}$ problem with bias whose real part is greater than $\epsilon > 0$, returns the correct secret in time $O(m + q^n n\log q)$, except with probability at most $2q^n\exp(-m\epsilon^2/8)$.

###### Proof.

The algorithm FindSecret computes $f(\vec{s}') = \sum_{j=1}^{m}\exp(2i\pi(b_j - \langle\vec{a}_j,\vec{s}'\rangle)/q)$ for all $\vec{s}' \in \mathbb{Z}_q^n$ and returns the $\vec{s}'$ maximizing $\Re f(\vec{s}')$. The fast Fourier transform needs $O(q^n n\log q)$ operations on numbers of bit size $O(\log(mq))$. The Hoeffding inequality shows that the difference between $\Re f(\vec{s}')$ and its expectation is at most $m\epsilon/2$, except with probability at most $2\exp(-m\epsilon^2/8)$. This holds for all $\vec{s}'$, except with probability at most $2q^n\exp(-m\epsilon^2/8)$, using the union bound. Then $\Re f(\vec{s}) > m\epsilon/2$ and, by Lemma 4, $\Re f(\vec{s}') < m\epsilon/2$ for all $\vec{s}' \neq \vec{s}$, so the algorithm returns $\vec{s}$. ∎
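A compact sketch of FindSecret using numpy's multidimensional FFT (ours; practical only while $q^n$ fits in memory, which is exactly the small-dimension regime targeted here):

```python
import numpy as np

def find_secret(samples, q, n):
    """Sketch of FindSecret: maximize Re f(s') over all s' in Z_q^n.

    f(s') = sum_j exp(2*i*pi*(b_j - <a_j, s'>)/q) is obtained for all s'
    at once as the n-dimensional DFT of the array g built below.
    """
    g = np.zeros((q,) * n, dtype=complex)
    for a, b in samples:
        g[tuple(a)] += np.exp(2j * np.pi * b / q)
    f = np.fft.fftn(g)   # f[s'] = sum_x g[x] * exp(-2*i*pi*<x, s'>/q)
    return np.unravel_index(np.argmax(f.real), f.shape)

# Toy check: q^n is tiny here, so the FFT table easily fits in memory.
q, n, m = 11, 2, 2000
rng = np.random.default_rng(0)
s = rng.integers(0, q, n)
A = rng.integers(0, q, (m, n))
e = np.rint(rng.normal(0, 1, m)).astype(int)     # rounded Gaussian noise
samples = list(zip(A, (A @ s + e) % q))
assert tuple(find_secret(samples, q, n)) == tuple(s)
```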

## 3 Main algorithm

In this section, we present our main algorithm, prove its asymptotic complexity, and present practical results in dimension $128$.

### 3.1 Rationale

A natural idea in order to distinguish between an instance of $\mathsf{LWE}$ (or $\mathsf{LPN}$) and a uniform distribution is to select some samples that add up to zero on the vectorial part, yielding a new sample of the form $(\vec{0}, e)$. It is then enough to distinguish between $e$ and a uniform variable. However, if $\epsilon$ is the bias of the error in the original samples and $k$ samples are added together, the new error has bias $\epsilon^k$, hence roughly $\epsilon^{-2k}$ samples are necessary to distinguish it from uniform. Thus it is crucial that $k$ be as small as possible.

The idea of the algorithm by Blum, Kalai and Wasserman (BKW) is to perform "blockwise" Gaussian elimination. The $n$ coordinates are divided into $a$ blocks. Then, samples that are equal on the first block are subtracted together to produce new samples that are zero on the first block. This process is iterated over each consecutive block. Eventually samples of the form $(\vec{0}, e)$ are obtained.

Each of these samples ultimately results from the addition of $2^a$ starting samples, so the final bias $\epsilon^{2^a}$ should not be too small for the algorithm to make sense. On the other hand, roughly $q^{n/a}$ data are clearly required at each step in order to generate enough collisions on the consecutive coordinates of a block. This naturally results in a complexity roughly $2^{\Theta(n/\log n)}$ in the original algorithm for $\mathsf{LPN}$. This algorithm was later adapted to $\mathsf{LWE}$ in [4], and then improved in [5].
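One reduction step can be sketched as follows (our illustration, not the paper's pseudocode); note that each sample is consumed at most once, a point we return to below:

```python
import numpy as np

def bkw_step(samples, q, lo, hi):
    """One blockwise elimination step of BKW on coordinates [lo, hi).

    Two samples whose vectorial parts agree on the block are subtracted,
    producing a sample that is zero there. Each sample is consumed at
    most once, so the output samples are independent.
    """
    buckets, out = {}, []
    for a, b in samples:
        key = tuple(int(x) for x in a[lo:hi])
        if key in buckets:
            a2, b2 = buckets.pop(key)        # pair found: consume both
            out.append(((a - a2) % q, (b - b2) % q))
        else:
            buckets[key] = (a, b)
    return out
```

Iterating this over consecutive blocks leaves samples of the form $(\vec{0}, e)$, where $e$ is a sum of $2^a$ original errors.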

The idea of the latter improvement is to use so-called "lazy modulus switching". Instead of finding two vectors that are equal on a given block in order to generate a new vector that is zero on the block, one uses vectors that are merely close to each other. This may be seen as performing addition modulo $p$ instead of modulo $q$ for some $p < q$, by rounding every value to the nearest value in $\frac{q}{p}\mathbb{Z}$. Thus at each step of the algorithm, instead of generating vectors that are zero on each block, small vectors are produced. This introduces a new "rounding" error term, but essentially reduces the data requirement from roughly $q^{n/a}$ to $p^{n/a}$. Balancing the new error term with this decrease in complexity results in a significant improvement.

However, it may be observed that this rounding error is much more costly for the first few blocks than for the last ones. Indeed, samples produced after, say, one reduction step are bound to be added together many times to yield the final samples, resulting in a corresponding blowup of the rounding error. By contrast, later terms will undergo fewer additions. Thus it makes sense to allow for progressively coarser approximations (i.e. a decreasing modulus) at each step. On the other hand, to maintain comparable data requirements to find collisions on each block, the decrease in modulus is compensated by progressively longer blocks.

What we propose here is a more general view of the BKW algorithm that allows for this improvement, while giving a clear view of the different complexity costs incurred by various choices of parameters. Balancing these terms is the key to finding an optimal complexity. We forgo the "modulus switching" point of view entirely, while retaining its core ideas. The resulting algorithm generalizes several variants of BKW, and will later be applied in a variety of settings.

Also, each time we combine two samples, we never use these two samples again, so that the combined samples are independent. Previous works reused one of the two samples repeatedly, so that independence could only be attained by repeating the entire algorithm for each sample needed by the distinguisher, as was done in [12].

### 3.2 Quantization

The goal of quantization is to associate with each point a center from a small set, such that the expected distance between a point and its center is small. We will then be able to produce small vectors by subtracting vectors associated with the same center.

Modulus switching amounts to a simple quantizer which rounds every coordinate to the nearest multiple of some constant. Our proven algorithm uses a similar quantizer, except that the constant depends on the index of the coordinate.

It is possible to decrease the average distance from a point to its center by a constant factor for large moduli [24], but doing so would complicate our proof without improving the leading term of the complexity. When the modulus is small, it might be worthwhile to use error-correcting codes as in [25].
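As an illustration (our own code, not the paper's), the quantizer just described can be sketched as follows; modulus switching is the special case where all step sizes are equal:

```python
import numpy as np

def quantize(v, steps):
    """Round coordinate i of v to the nearest multiple of steps[i].

    With all steps equal, this is exactly rounding to a coarser modulus
    (modulus switching); letting steps[i] vary with the coordinate index
    is the fine-tuning used by our proven algorithm.
    """
    steps = np.asarray(steps, dtype=float)
    return np.rint(np.asarray(v, dtype=float) / steps) * steps

# Two vectors sharing a center differ by at most steps[i]/2 per coordinate.
v1, v2 = np.array([13, 40, 7]), np.array([12, 41, 6])
print(quantize(v1, [4, 8, 16]))   # [12. 40.  0.]
print(quantize(v2, [4, 8, 16]))   # [12. 40.  0.]
```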

### 3.3 Main Algorithm

Let us denote by $L_0$ the set of starting samples, and by $L_i$ the sample list after $i$ reduction steps. The numbers $0 = d_0 \leq d_1 \leq \cdots \leq d_a = n$ partition the coordinates of sample vectors into $a$ buckets, bucket $i$ consisting of the coordinates from $d_{i-1}+1$ to $d_i$. Let $(p_1,\ldots,p_a)$ be the vector of quantization coefficients associated to each bucket.
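Combining the quantizer with the pairing step, one reduction step of the main algorithm may be sketched as follows (again our own illustrative code; wrap-around modulo $q$ is ignored for simplicity):

```python
import numpy as np

def reduction_step(samples, q, lo, hi, step):
    """Quantized reduction step on the bucket of coordinates [lo, hi).

    Samples are grouped by the quantization center of their current
    bucket (the step size may differ from bucket to bucket); pairs
    sharing a center are subtracted, so the output bucket has
    coordinates of magnitude about step/2 instead of exactly zero.
    """
    buckets, out = {}, []
    for a, b in samples:
        key = tuple(np.rint(np.asarray(a[lo:hi], dtype=float) / step).astype(int))
        if key in buckets:
            a2, b2 = buckets.pop(key)
            out.append(((a - a2) % q, (b - b2) % q))
        else:
            buckets[key] = (a, b)
    return out
```

Choosing a larger `step` shrinks the number of buckets (hence the data requirement) at the price of a larger rounding error, which is exactly the trade-off balanced in the analysis below.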