# Markov chain Monte Carlo Methods For Lattice Gaussian Sampling: Lattice Reduction and Decoding Optimization

Zheng Wang, Yang Huang, and Shanxiang Lyu

This work was supported in part by the National Natural Science Foundation of China under Grant 61801216, and in part by the Natural Science Foundation of Jiangsu Province under Grant SBK2018042902. Z. Wang and Y. Huang are with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China; S. Lyu is with the Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, United Kingdom (e-mail: z.wang@ieee.org, yang.huang.ceie@nuaa.edu.cn, s.lyu14@imperial.ac.uk).

###### Abstract

Sampling from the lattice Gaussian distribution has emerged as an important problem in coding, decoding and cryptography. In this paper, the lattice reduction technique is applied to the Gibbs sampler for lattice Gaussian sampling. First, with respect to the lattice Gaussian distribution, the convergence rate of systematic scan Gibbs sampling is derived, and we show that it is characterized by the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation among the random variables being sampled. Therefore, lattice reduction is applied to formulate an equivalent lattice Gaussian distribution with less correlated variables, which leads to better Markov mixing through an enhanced convergence rate. Then, we extend the proposed lattice-reduction-aided Gibbs sampling to lattice decoding, where the choice of the standard deviation for the sampling is fully investigated. A customized choice based on the Euclidean distance is given for each specific decoding case, resulting in a better trade-off between Markov mixing and sampler decoding. Moreover, based on it, a startup mechanism is also proposed for Gibbs sampler decoding, which reduces the decoding complexity without performance loss. Simulation results for large-scale MIMO detection are presented to confirm the performance gain and complexity reduction.

Keywords: Lattice Gaussian sampling, Markov chain Monte Carlo, Gibbs sampling, lattice decoding, large-scale MIMO detection.

## I Introduction

Lattice Gaussian sampling has drawn considerable attention in various research fields. In mathematics, Banaszczyk was the first to apply it to prove the transference theorems for lattices [1]. In coding, the lattice Gaussian distribution was employed to obtain the full shaping gain for lattice coding [2, 3, 4], and to achieve the capacity of the Gaussian channel [5]. It was also used to achieve information-theoretic security in the Gaussian wiretap channel [6, 7, 8] and in the bidirectional relay channel [9]. In cryptography, the lattice Gaussian distribution has become a central tool in the construction of many primitives [10, 11, 12]. Specifically, lattice Gaussian sampling lies at the core of signature schemes in the Gentry, Peikert and Vaikuntanathan (GPV) paradigm [13]. In decoding, lattice Gaussian sampling with a suitable variance allows one to solve the closest vector problem (CVP) and the shortest vector problem (SVP) [14, 15]. From the viewpoint of lattice decoding, the optimal maximum likelihood (ML) detection in multiple-input multiple-output (MIMO) systems corresponds to solving the CVP [16].

However, in sharp contrast to the continuous Gaussian density, it is by no means trivial to sample even from a low-dimensional discrete Gaussian distribution. Efficient sampling schemes do exist, but they only work for a few special lattices [5, 17]. As the default sampling algorithm for general lattices, Klein’s algorithm [18] only works when the standard deviation $\sigma=\omega(\sqrt{\log n})\cdot\max_{1\le i\le n}\|\hat{\mathbf{b}}_i\|$ [13], where $\omega(\log n)$ is a superlogarithmic function, $n$ denotes the lattice dimension and the $\hat{\mathbf{b}}_i$’s are the Gram-Schmidt vectors of the lattice basis $\mathbf{B}$. Unfortunately, such a requirement on $\sigma$ tends to be excessively large, rendering Klein’s algorithm inapplicable to many scenarios of interest.

In order to sample from a target lattice Gaussian distribution with arbitrary $\sigma>0$, Markov chain Monte Carlo (MCMC) methods were introduced in [19, 20]. In principle, an MCMC method randomly generates the next Markov state conditioned on the previous one; after the burn-in time, which is normally measured by the mixing time, the Markov chain reaches its stationary distribution, from which samples of the target distribution can be obtained [21]. As a basic MCMC method, Gibbs sampling, which employs univariate conditional sampling to build the Markov chain, was introduced to lattice Gaussian sampling by showing its ergodicity [22]. In [23], the symmetric Metropolis-within-Gibbs (SMWG) algorithm was proposed for lattice Gaussian sampling to achieve exponential convergence. Moreover, the Markov chain induced by random scan Gibbs sampling for the lattice Gaussian distribution was shown to be geometrically ergodic [24], which means that it converges exponentially fast to the stationary distribution.

On the other hand, with the growing number of antennas, the large-scale MIMO system has become a promising extension of MIMO, which boosts the network capacity on a much greater scale. The dramatically increased system size poses a pressing challenge for signal detection and has attracted considerable research attention [25, 26, 27, 28, 29]. Thanks to the convergence theorem of MCMC, Gibbs sampling over a finite state space is naturally geometrically ergodic, so it has already been adapted to MIMO detection to solve the CVP [30, 31, 32, 33, 34]. Meanwhile, Gibbs sampling has also been introduced into soft-output decoding in MIMO systems, where the extrinsic information calculated by an a posteriori probability (APP) detector is used to produce soft outputs [35, 36]. In [37], an investigation of Gibbs-based MCMC receivers in different communication channels is given as well.

However, in those works, the choice of the standard deviation $\sigma$ (also referred to as “temperature”) for Gibbs sampling decoding has not been fully investigated. A common choice comes from statistics by matching $\sigma$ to the noise variance, which severely suffers from the stalling problem in the high signal-to-noise ratio (SNR) regime. Although Hassibi et al. suggested a lower bound on how $\sigma$ should scale instead [30], this choice fails to exploit the decoding potential of each specific case.

Meanwhile, another very important point has been ignored for years. Specifically, Gibbs sampler decoding, as an advanced decoder, is not necessary for all decoding cases, since the optimal solution may be directly obtained by suboptimal decoding schemes, especially at high SNRs. This indicates that substantial computational complexity can be saved without any performance loss. In [31, 38], two stopping criteria were given for mixed-Gibbs sampler decoding schemes, but they only work for the proposed multiple restart strategies by terminating the trapped Markov chains.

In this paper, we advance the state of the art of MCMC-based lattice Gaussian sampling on several fronts. First of all, in order to enhance the convergence performance of Gibbs sampling, the lattice-reduction-aided Gibbs sampling algorithm is proposed for lattice Gaussian sampling. In particular, a comprehensive analysis of the convergence rate of the Markov chain induced by systematic scan Gibbs sampling is presented, and we show that the convergence is essentially dominated by the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation between the multiple random variables. Hence, by lattice reduction, an equivalent lattice Gaussian distribution can be established with significantly reduced HGR maximal correlation, thus leading to improved convergence performance. Note that the block strategy, which improves the convergence by sampling multiple variables within one Markov move [22], also works in the proposed lattice-reduction-aided Gibbs sampling.

We then extend the lattice-reduction-aided Gibbs sampling algorithm to lattice decoding. The investigation starts from optimizing the sampling probability of the target decoding point, which leads to a better trade-off between Markov mixing and sampling decoding. Specifically, we show that the choice of the standard deviation $\sigma$ heavily depends on the distance from the query point to the lattice. This not only effectively avoids the stalling problem, but also provides a preferable choice of $\sigma$ for each specific decoding case. Furthermore, to approximate this choice well, the initial starting point of the Markov chain should be carefully chosen, which is in accordance with geometric ergodicity, since the initial starting point also has a significant impact on the convergence behaviour. Then, based on the initial starting point, we adopt the correct decoding radius from bounded distance decoding (BDD) to build a startup mechanism, which decides whether to invoke the Gibbs sampler or not. Meanwhile, the demand for a high-quality initial starting point can also be met through the use of lattice reduction. In short, the proposed Gibbs sampler decoding achieves better decoding performance at a lower complexity cost.

It should be noted that, compared to the lattice Gaussian distribution, the discrete Gaussian distribution designed for MIMO detection entails a finite state space (i.e., based on the QAM constellation). After the nonlinear transformation of lattice reduction $\mathbf{z}=\mathbf{U}^{-1}\mathbf{x}$ ($\mathbf{U}$ is a unimodular matrix with $\det(\mathbf{U})=\pm 1$), the state space of $\mathbf{z}$ turns out to be computationally expensive to obtain. For non-Gibbs sampling based detectors [39], suboptimal remedies can be carried out to restrict the solution to the original constellation set in the end. However, for Gibbs sampler decoding, a Markov chain with an unbounded or approximate state space of $\mathbf{z}$ tends to be unreasonably wild, which most likely results in an invalid Markov mixing. Such a problem does not exist in the lattice decoding paradigm, since $\mathbf{x}$ and $\mathbf{z}$ share the same state space $\mathbb{Z}^n$. For this reason, lattice reduction is not recommended to be directly applied in the Markov mixing of Gibbs sampling for MIMO detection. Nevertheless, the aforementioned analysis results from lattice decoding are still applicable to MIMO detection, by simply removing lattice reduction from the Markov mixing. Additionally, besides MIMO detection, the sampler decoding strategy can also be extended to signal processing as a useful signal estimator or detector [40, 41, 42, 43, 44].

The rest of this paper is organized as follows. Section II introduces the background of the lattice Gaussian distribution and briefly reviews the basics of Gibbs sampling as well as lattice reduction. In Section III, the convergence rate of systematic scan Gibbs sampling is derived, which is essentially determined by the HGR maximal correlation among the random variables. Based on it, the lattice-reduction-aided Gibbs sampling algorithm is proposed in Section IV for better Markov mixing performance. Section V extends the lattice-reduction-aided Gibbs sampling to lattice decoding. The choice of the standard deviation is studied in full detail, and the startup mechanism of Gibbs sampling based on the correct decoding radius is established. Simulation results for large-scale MIMO detection are presented in Section VI. Finally, Section VII concludes the paper.

Notation: Matrices and column vectors are denoted by upper and lowercase boldface letters, and the transpose, inverse and pseudoinverse of a matrix $\mathbf{B}$ by $\mathbf{B}^T$, $\mathbf{B}^{-1}$ and $\mathbf{B}^{\dagger}$, respectively. We use $\mathbf{b}_i$ for the $i$th column of the matrix $\mathbf{B}$, $\hat{\mathbf{b}}_i$ for the $i$th Gram-Schmidt vector of the matrix $\mathbf{B}$, and $b_{i,j}$ for the entry in the $i$th row and $j$th column of the matrix $\mathbf{B}$. In addition, in this paper, the computational complexity is measured by the number of arithmetic operations (additions, multiplications, comparisons, etc.). Finally, $L_0^2(\pi)$ denotes the set of all mean zero and finite variance functions $h(\cdot)$ with respect to the target distribution $\pi(\cdot)$, i.e., $E_\pi[h(\mathbf{x})]=0$ and $\mathrm{var}_\pi[h(\mathbf{x})]<\infty$.

## II Preliminaries

In this section, we introduce the background and mathematical tools needed to describe and analyze the following lattice-reduction-aided Gibbs sampling.

### II-A Lattice Gaussian Distribution

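Let $\mathbf{B}=[\mathbf{b}_1,\ldots,\mathbf{b}_n]\subset\mathbb{R}^n$ consist of $n$ linearly independent vectors. The $n$-dimensional lattice $\Lambda$ generated by $\mathbf{B}$ is defined by

$$\Lambda=\mathcal{L}(\mathbf{B})=\{\mathbf{B}\mathbf{x}:\mathbf{x}\in\mathbb{Z}^n\}, \tag{1}$$

where $\mathbf{B}$ is called the lattice basis. We define the Gaussian function centered at $\mathbf{c}\in\mathbb{R}^n$ for standard deviation $\sigma>0$ as

$$\rho_{\sigma,\mathbf{c}}(\mathbf{z})=e^{-\frac{\|\mathbf{z}-\mathbf{c}\|^2}{2\sigma^2}}, \tag{2}$$

for all $\mathbf{z}\in\mathbb{R}^n$. When $\mathbf{c}$ or $\sigma$ are not specified, we assume that they are $\mathbf{0}$ and $1$, respectively. Then, the discrete Gaussian distribution over $\Lambda$ is defined as

$$D_{\Lambda,\sigma,\mathbf{c}}(\mathbf{x})=\frac{\rho_{\sigma,\mathbf{c}}(\mathbf{B}\mathbf{x})}{\rho_{\sigma,\mathbf{c}}(\Lambda)}=\frac{e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}}{\sum_{\mathbf{x}\in\mathbb{Z}^n}e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}} \tag{3}$$

for all $\mathbf{x}\in\mathbb{Z}^n$, where $\rho_{\sigma,\mathbf{c}}(\Lambda)\triangleq\sum_{\mathbf{B}\mathbf{x}\in\Lambda}\rho_{\sigma,\mathbf{c}}(\mathbf{B}\mathbf{x})$ is just a scaling to obtain a probability distribution. We remark that this definition differs slightly from the one in [10], where $\sigma$ is scaled by a constant factor $\sqrt{2\pi}$ (i.e., $s=\sqrt{2\pi}\sigma$). Fig. 1 illustrates the discrete Gaussian distribution over $\mathbb{Z}^2$. As can be seen clearly, it resembles a continuous Gaussian distribution, but is only defined over a lattice. In fact, discrete and continuous Gaussian distributions share similar properties if the flatness factor is small [7].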
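As a minimal numerical sketch of definition (3), the following Python code evaluates a truncated version of the discrete Gaussian distribution on a small box of integer coefficient vectors. The helper names (`rho`, `lattice_point`, `truncated_lattice_gaussian`), the row-list matrix layout and the truncation parameter `radius` are assumptions made for illustration. The true distribution has infinite support, so the truncation is only reasonable when the box is wide compared with $\sigma$.

```python
import math
from itertools import product

def rho(v, c, sigma):
    """Gaussian function rho_{sigma,c}(v) = exp(-||v - c||^2 / (2 sigma^2)) from (2)."""
    dist2 = sum((vi - ci) ** 2 for vi, ci in zip(v, c))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

def lattice_point(B, x):
    """Return the lattice point Bx, with B stored as a list of rows."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in B]

def truncated_lattice_gaussian(B, sigma, c, radius):
    """Approximate D_{Lambda,sigma,c}(x) in (3) on the box x in [-radius, radius]^n."""
    n = len(B[0])
    weights = {
        x: rho(lattice_point(B, x), c, sigma)
        for x in product(range(-radius, radius + 1), repeat=n)
    }
    total = sum(weights.values())  # truncated stand-in for rho_{sigma,c}(Lambda)
    return {x: w / total for x, w in weights.items()}

# Example: the integer lattice Z^2 with sigma = 2 and c = 0.
B = [[1.0, 0.0], [0.0, 1.0]]
pmf = truncated_lattice_gaussian(B, sigma=2.0, c=[0.0, 0.0], radius=10)
mode = max(pmf, key=pmf.get)  # coefficient vector with the largest probability
```

In this example the probabilities decay with the squared distance from $\mathbf{c}$, so neighbouring points differ by the factor $e^{-1/(2\sigma^2)}$ per unit of squared distance.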

### II-B Sampler Decoding

Consider the decoding of an $n\times n$ real-valued system. The extension to the complex-valued system is straightforward [45, 46]. Let $\mathbf{x}\in\mathbb{Z}^n$ denote the transmitted signal. The corresponding received signal $\mathbf{c}$ is given by

$$\mathbf{c}=\mathbf{B}\mathbf{x}+\mathbf{w}, \tag{4}$$

where $\mathbf{w}$ is the noise vector with zero mean and variance $\sigma_w^2$, and $\mathbf{B}$ is an $n\times n$ full column-rank matrix of channel coefficients. Typically, the conventional maximum likelihood (ML) decoding reads

$$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}\in\mathbb{Z}^n}\|\mathbf{c}-\mathbf{B}\mathbf{x}\|^2, \tag{5}$$

where $\|\cdot\|$ denotes the Euclidean norm. Clearly, ML decoding corresponds to the CVP. If the received signal $\mathbf{c}$ is the origin, then ML decoding reduces to the SVP.

Intuitively, the CVP given in (5) can be solved by lattice Gaussian sampling. Since the distribution is centered at the query point $\mathbf{c}$, the closest lattice point to $\mathbf{c}$ is assigned the largest sampling probability. Therefore, by drawing multiple samples, the closest lattice point is most likely to be returned. It has been demonstrated that lattice Gaussian sampling is equivalent to CVP via a polynomial-time dimension-preserving reduction [47]. More specifically, in [18], Klein introduced a lattice decoding algorithm which performs the sampling from a Gaussian-like distribution, and it was further improved in [45, 48]. Aggarwal et al. used lattice Gaussian sampling to solve CVP and SVP with $2^{n+o(n)}$ space and time complexities [14, 15]. Furthermore, only polynomial space complexity is required by the independent MHK sampling algorithm, whose CVP complexity is given in [20].

Decoding by sampling has promising advantages. Firstly, sampling has the potential to be efficiently implemented, thus providing a prospective decoding method especially for high-dimensional systems. Secondly, the standard deviation $\sigma$ of the discrete Gaussian distribution can be optimized to improve the sampling probability of the target point, which leads to better decoding performance. Thirdly, by adjusting the sample size, sampler decoding enjoys a flexible trade-off between performance and complexity. However, the main difficulty of sampler decoding lies in how to perform the sampling over the target lattice Gaussian distribution.

### II-C Gibbs Sampling

As a foremost sampling scheme in MCMC, Gibbs sampling tackles the sampling from a complicated joint distribution through conditional sampling of its individual components. Typically, for lattice Gaussian sampling, each coordinate of $\mathbf{x}$ is sampled from the following 1-dimensional conditional distribution

$$P_i(x_i|\mathbf{x}_{[-i]})=D_{\Lambda,\sigma,\mathbf{c}}(x_i|\mathbf{x}_{[-i]})=\frac{e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}}{\sum_{x_i\in\mathbb{Z}}e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}}. \tag{6}$$

Here $i$, with $1\le i\le n$, denotes the coordinate index of $\mathbf{x}$, and $\mathbf{x}_{[-i]}\triangleq[x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n]^T$. During this univariate sampling, the other variables contained in $\mathbf{x}_{[-i]}$ are left unchanged. By repeating such a procedure with a certain scan scheme, a Markov chain is established.

In particular, there are various scan schemes for carrying out the component updates. In contrast to random scan, which randomly selects the coordinate of $\mathbf{x}$ to update, systematic scan performs the updates in a sequential order from $x_n$ to $x_1$, thus completing an iteration during each Markov move. Compared to random scan, systematic scan is preferable in lattice decoding due to its fixed update order. In fact, the mixing times of these two scan schemes do not differ by more than a polynomial factor [49].
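The coordinate update in (6) and one systematic scan move can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the row-list layout of the basis (columns are the basis vectors), the update order $x_n,\ldots,x_1$ and the truncation of the integer support to a `window` around the conditional mean are all assumptions. It uses the fact that, as a function of $x_i$ alone, $\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2$ is a quadratic in $x_i$, so the conditional distribution is a 1-dimensional discrete Gaussian.

```python
import math
import random

def sample_coordinate(B, x, i, c, sigma, rng, window=10):
    """Draw x_i from the conditional distribution (6), truncated to a window.

    As a function of x_i alone, ||Bx - c||^2 = ||b_i||^2 (x_i - mean)^2 + const,
    so the conditional is a 1-D discrete Gaussian centred at `mean`.
    """
    rows = range(len(B))
    # Residual of c after removing the contribution of all other coordinates.
    r = [c[k] - sum(B[k][j] * x[j] for j in range(len(x)) if j != i) for k in rows]
    b_i = [B[k][i] for k in rows]
    norm2 = sum(v * v for v in b_i)
    mean = sum(bv * rv for bv, rv in zip(b_i, r)) / norm2
    center = round(mean)
    support = list(range(center - window, center + window + 1))
    weights = [math.exp(-norm2 * (v - mean) ** 2 / (2.0 * sigma ** 2)) for v in support]
    return rng.choices(support, weights=weights, k=1)[0]

def systematic_scan_sweep(B, x, c, sigma, rng):
    """One Markov move: update the coordinates in the fixed order x_n, ..., x_1."""
    x = list(x)
    for i in reversed(range(len(x))):
        x[i] = sample_coordinate(B, x, i, c, sigma, rng)
    return x

rng = random.Random(0)
B = [[2.0, 1.0], [0.0, 2.0]]   # columns are the basis vectors b_1, b_2
c = [3.1, 3.9]                 # query point
state = [0, 0]
for _ in range(50):
    state = systematic_scan_sweep(B, state, c, sigma=0.5, rng=rng)
```

With a very small $\sigma$ each update concentrates on the integer nearest to the conditional mean, which illustrates the "cold" chain behaviour; larger $\sigma$ spreads the conditional over more candidates.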

###### Theorem 1 ([24]).

Given the invariant lattice Gaussian distribution $D_{\Lambda,\sigma,\mathbf{c}}$, the Markov chain induced by the random scan Gibbs algorithm is geometrically ergodic:

$$\|P^t(\mathbf{x},\cdot)-D_{\Lambda,\sigma,\mathbf{c}}\|_{TV}\le C(\mathbf{x})\varrho^t \tag{7}$$

with convergence rate $0<\varrho<1$ and $C(\mathbf{x})<\infty$ for all $\mathbf{x}\in\mathbb{Z}^n$.

Here, $C(\mathbf{x})$ is parameterized by the initial Markov state $\mathbf{x}$, which also plays an important role in the Markov mixing (once $C(\mathbf{x})$ is a constant for all $\mathbf{x}$, the Markov chain is referred to as uniformly ergodic; more details can be found in [50]). $\|\cdot\|_{TV}$ represents the total variation distance, and $P^t(\mathbf{x},\cdot)$ denotes the row of the transition matrix after $t$ Markov moves corresponding to the initial state $\mathbf{x}$. It should also be pointed out that in MCMC the complexity of each Markov move is often insignificant, whereas the required mixing time and the convergence rate are more critical.

### II-D Lattice Reduction Technique

Lattice reduction techniques have a long tradition in the field of number theory. In 1982, the celebrated LLL algorithm was proposed as a powerful lattice reduction criterion for arbitrary lattices. Specifically, a basis $\mathbf{B}$ is said to be LLL-reduced if it satisfies the following two conditions (other lattice reduction schemes such as Korkin-Zolotarev (KZ) reduction and Seysen reduction also exist but are out of the scope of this work; see [51, 52] for more details):

• $|\mu_{i,j}|\le\frac{1}{2}$,  for $1\le j<i\le n$;

• $\delta\|\hat{\mathbf{b}}_{i-1}\|^2\le\|\mu_{i,i-1}\hat{\mathbf{b}}_{i-1}+\hat{\mathbf{b}}_i\|^2$,  for $1<i\le n$.

The first clause is called the size reduction condition with $\mu_{i,j}=\frac{\langle\mathbf{b}_i,\hat{\mathbf{b}}_j\rangle}{\langle\hat{\mathbf{b}}_j,\hat{\mathbf{b}}_j\rangle}$, while the second is known as the Lovász condition. If the Lovász condition is violated, the basis vectors $\mathbf{b}_i$ and $\mathbf{b}_{i-1}$ are swapped; otherwise, size reduction is carried out. If only the size reduction condition is satisfied, then the basis is called size-reduced. The parameter $1/4<\delta<1$ controls both the convergence speed of the reduction and the degree of orthogonality of the reduced basis [53].
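The two conditions above can be checked numerically, and a textbook LLL loop can be sketched in a few lines. The code below is a sketch under stated assumptions rather than the reduction routine used in the paper: the function names are hypothetical, the basis is stored as a list of basis vectors, the Gram-Schmidt data are recomputed from scratch for clarity rather than efficiency, and the loop follows the common textbook ordering (size-reduce $\mathbf{b}_k$, then test the Lovász condition).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return Gram-Schmidt vectors b_hat and mu[i][j] = <b_i, b_hat_j> / <b_hat_j, b_hat_j>."""
    b_hat, mu = [], [[0.0] * len(vectors) for _ in vectors]
    for i, b in enumerate(vectors):
        v = [float(t) for t in b]
        for j in range(i):
            mu[i][j] = dot(b, b_hat[j]) / dot(b_hat[j], b_hat[j])
            v = [vk - mu[i][j] * hk for vk, hk in zip(v, b_hat[j])]
        b_hat.append(v)
    return b_hat, mu

def is_lll_reduced(vectors, delta=0.75, tol=1e-9):
    """Check the size reduction condition and the Lovasz condition."""
    b_hat, mu = gram_schmidt(vectors)
    n = len(vectors)
    size_ok = all(abs(mu[i][j]) <= 0.5 + tol for i in range(n) for j in range(i))
    lovasz_ok = all(
        delta * dot(b_hat[i - 1], b_hat[i - 1])
        <= mu[i][i - 1] ** 2 * dot(b_hat[i - 1], b_hat[i - 1]) + dot(b_hat[i], b_hat[i]) + tol
        for i in range(1, n)
    )
    return size_ok and lovasz_ok

def lll_reduce(vectors, delta=0.75):
    """Textbook LLL: size-reduce b_k, then swap with b_{k-1} if the Lovasz condition fails."""
    b = [[float(t) for t in v] for v in vectors]
    n, k = len(b), 1
    while k < n:
        for j in range(k - 1, -1, -1):          # size reduction of b_k against b_j
            _, mu = gram_schmidt(b)
            q = round(mu[k][j])
            if q:
                b[k] = [bk - q * bj for bk, bj in zip(b[k], b[j])]
        b_hat, mu = gram_schmidt(b)
        if dot(b_hat[k], b_hat[k]) >= (delta - mu[k][k - 1] ** 2) * dot(b_hat[k - 1], b_hat[k - 1]):
            k += 1                               # Lovasz condition holds
        else:
            b[k], b[k - 1] = b[k - 1], b[k]      # swap and step back
            k = max(k - 1, 1)
    return b

basis = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
reduced = lll_reduce(basis)
```

Only integer multiples of basis vectors are subtracted and only swaps are applied, so the reduced vectors generate the same lattice as the input.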

After LLL reduction, the lattice basis consists of vectors that are relatively short and close to orthogonal to each other. More precisely, LLL reduction is able to yield a lattice vector within a guaranteed approximation factor of the shortest vector in the lattice with average polynomial complexity [54]. Inspired by this, lattice-reduction-aided decoding has emerged as a powerful decoding strategy in various research fields. In MIMO detection, it has been demonstrated that LLL reduction based minimum mean square error (MMSE) detection not only attains the full receive diversity [55], but also facilitates diversity-multiplexing trade-off (DMT) optimal decoding [56]. Meanwhile, LLL reduction can be efficiently realized by effective LLL reduction with polynomial complexity [57]. Nevertheless, the performance gap between the optimal ML decoding and lattice-reduction-aided decoding is still substantial, especially in high-dimensional systems.

### II-E HGR Maximal Correlation

For decades, the measurement of Hirschfeld-Gebelein-Rényi (HGR) maximal correlation has found numerous interesting applications in the field of information theory [58, 59, 60].

###### Definition 1 ([61]).

For any two random variables $\xi$ and $\eta$, their maximal correlation is defined as

$$\gamma(\xi,\eta)=\sup_{f,g}\ \mathrm{corr}(f(\xi),g(\eta)), \tag{8}$$

where the supremum is taken over all Borel functions $f$ and $g$ with $0<\mathrm{var}[f(\xi)]<\infty$ and $0<\mathrm{var}[g(\eta)]<\infty$.

More specifically, with $E[f(\xi)]=E[g(\eta)]=0$ and $\mathrm{var}[f(\xi)]=\mathrm{var}[g(\eta)]=1$, the HGR maximal correlation can be rewritten as

$$\gamma(\xi,\eta)=\sup_{f,g}\ E[f(\xi)g(\eta)]. \tag{9}$$

Then, by a simple application of the Cauchy-Schwarz inequality, Rényi showed the following one-function alternate characterization of $\gamma$ [60]:

$$\gamma^2(\xi,\eta)=\sup_{f(\xi):\,E(f)=0,\ \mathrm{var}(f)=1}E\big[E^2[f(\xi)|\eta]\big]. \tag{10}$$

Theoretically, the HGR maximal correlation is an elegant generalization of the well-known Pearson correlation coefficient, and serves as a normalized measure of the dependence between two random variables. Although the Pearson correlation is analytically simple to evaluate in theory and computationally tractable to implement in practice, it only measures the linear relationship between $\xi$ and $\eta$ rather than capturing true statistical dependence. Unlike the Pearson correlation coefficient, $\gamma(\xi,\eta)$ is defined whenever both $\xi$ and $\eta$ are non-degenerate; it takes values in the interval $[0,1]$ and vanishes if and only if $\xi$ and $\eta$ are independent.
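For two random variables with finite alphabets, the maximal correlation can be computed numerically. The sketch below relies on a standard characterization that is not stated in this section and is therefore an assumption here: $\gamma(\xi,\eta)$ equals the second largest singular value of the matrix with entries $P(\xi=a,\eta=b)/\sqrt{P(\xi=a)P(\eta=b)}$. The function names and the simple power iteration (used to avoid external libraries) are illustrative choices.

```python
import math

def hgr_maximal_correlation(joint, iters=500):
    """HGR maximal correlation of two finite random variables.

    joint[i][j] = P(xi takes its i-th value, eta takes its j-th value);
    all marginal probabilities are assumed to be strictly positive.
    """
    rows, cols = range(len(joint)), range(len(joint[0]))
    p_xi = [sum(joint[i][j] for j in cols) for i in rows]
    p_eta = [sum(joint[i][j] for i in rows) for j in cols]
    q = [[joint[i][j] / math.sqrt(p_xi[i] * p_eta[j]) for j in cols] for i in rows]
    # Q^T Q has top eigenvalue 1 with eigenvector sqrt(p_eta); deflate it.
    top = [math.sqrt(p) for p in p_eta]
    m = [[sum(q[i][a] * q[i][b] for i in rows) - top[a] * top[b] for b in cols] for a in cols]
    u = [float(a + 1) for a in cols]
    for _ in range(iters):                      # power iteration on the deflated matrix
        w = [sum(m[a][b] * u[b] for b in cols) for a in cols]
        norm = math.sqrt(sum(t * t for t in w))
        if norm < 1e-12:
            return 0.0                          # nothing left after deflation: independence
        u = [t / norm for t in w]
    rayleigh = sum(u[a] * sum(m[a][b] * u[b] for b in cols) for a in cols)
    return math.sqrt(max(rayleigh, 0.0))

def pearson(joint, xs, ys):
    """Pearson correlation for value lists xs (rows) and ys (columns)."""
    pairs = [(joint[i][j], xs[i], ys[j]) for i in range(len(xs)) for j in range(len(ys))]
    ex = sum(p * x for p, x, _ in pairs)
    ey = sum(p * y for p, _, y in pairs)
    cov = sum(p * (x - ex) * (y - ey) for p, x, y in pairs)
    vx = sum(p * (x - ex) ** 2 for p, x, _ in pairs)
    vy = sum(p * (y - ey) ** 2 for p, _, y in pairs)
    return cov / math.sqrt(vx * vy)

# eta = xi^2 with xi uniform on {-1, 0, 1}: fully dependent but linearly uncorrelated.
square = [[0.0, 1 / 3], [1 / 3, 0.0], [0.0, 1 / 3]]
gamma_square = hgr_maximal_correlation(square)
rho_square = pearson(square, xs=[-1, 0, 1], ys=[0, 1])
```

The example illustrates the contrast drawn above: the Pearson coefficient vanishes although $\eta$ is a deterministic function of $\xi$, whereas the maximal correlation detects the dependence.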

## III Convergence Analysis

In this section, the convergence analysis of systematic scan Gibbs sampling for lattice Gaussian sampling is presented, where its convergence rate is derived by means of HGR maximal correlation.

### III-A Systematic Scan Gibbs Sampling

To start with, by induction, the transition probability of the systematic scan Gibbs sampling can be expressed as

$$P(\mathbf{X}_t=\mathbf{x},\mathbf{X}_{t+1}=\mathbf{y})=\prod_{i=1}^{n}P_{n-i+1}\big(x^{t+1}_{n-i+1}\,\big|\,\mathbf{x}^{t}_{[-(n-i+1)]}\big). \tag{11}$$

Clearly, for a given standard deviation $\sigma>0$ and full rank lattice basis $\mathbf{B}$, it is easy to verify that each random variable $x_i$ is sampled with variance

$$\mathrm{var}[x_i|\mathbf{x}_{[-i]}]=\kappa_i>0. \tag{12}$$

Therefore, every candidate value of $x_i\in\mathbb{Z}$ can theoretically be sampled, indicating an irreducible chain. In principle, irreducibility prevents the random variables from being totally dependent, since all the components of the Markov state $\mathbf{X}_{t+1}$ may differ from those of $\mathbf{X}_t$.

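For the sake of convergence analysis, we reduce the systematic scan Gibbs sampling to a simple version which only consists of two nominal components, $\mathbf{x}_1\in\mathbb{Z}^m$ and $\mathbf{x}_2\in\mathbb{Z}^{n-m}$. In particular, similar to (6), during a Markov move, $\mathbf{x}_2^{t+1}$ and $\mathbf{x}_1^{t+1}$ are iteratively generated by

$$\mathbf{x}_2^{t+1}\sim P_{\mathbf{x}_2}(\mathbf{x}_2|\mathbf{x}_1^{t})=\frac{e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}}{\sum_{\mathbf{x}_2\in\mathbb{Z}^{n-m}}e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}} \tag{13}$$

and

$$\mathbf{x}_1^{t+1}\sim P_{\mathbf{x}_1}(\mathbf{x}_1|\mathbf{x}_2^{t+1})=\frac{e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}}{\sum_{\mathbf{x}_1\in\mathbb{Z}^{m}}e^{-\frac{1}{2\sigma^2}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2}}. \tag{14}$$

In contrast to the conventional data augmentation scheme in MCMC, sampling over the subvectors $\mathbf{x}_1$ and $\mathbf{x}_2$ can be conducted via blocked sampling [62], which enables faster convergence by taking multiple sampling elements into account (see [22] for an efficient blocked strategy of the Gibbs sampler for lattice Gaussian sampling). We also claim that the following convergence analysis with respect to such a simplification can be easily adapted to general cases.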

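Through the simplification, the above Markov chain still attains $D_{\Lambda,\sigma,\mathbf{c}}$ as the invariant distribution, while its transition probability, following the update order in (13) and (14), becomes

$$P(\mathbf{X}_t=\mathbf{x},\mathbf{X}_{t+1}=\mathbf{y})=P_{\mathbf{x}_2}(\mathbf{x}_2^{t+1}|\mathbf{x}_1^{t})\cdot P_{\mathbf{x}_1}(\mathbf{x}_1^{t+1}|\mathbf{x}_2^{t+1}). \tag{15}$$

Looking into this simplified Gibbs sampler, the marginal chains $\{\mathbf{x}_1^t\}$ and $\{\mathbf{x}_2^t\}$ with respect to $\mathbf{x}_1$ and $\mathbf{x}_2$ also function as valid Markov chains. Most importantly, these marginal chains experience the same mixing performance as the original chain, with convergence rate [63, 64]

$$\varrho=\varrho_1=\varrho_2, \tag{16}$$

which implies that we can obtain the convergence rate of the joint chain by only focusing on one of its marginal chains. Furthermore, because $\mathbf{x}_1^{t}$ and $\mathbf{x}_1^{t+1}$ are conditionally independent for a given $\mathbf{x}_2^{t+1}$, the detailed balance condition is satisfied by

$$\begin{aligned}
\pi'(\mathbf{x}_1^{t})P(\mathbf{x}_1^{t},\mathbf{x}_1^{t+1})
&=\pi'(\mathbf{x}_1^{t})\sum_{\mathbf{x}_2^{t+1}}\pi(\mathbf{x}_2^{t+1}|\mathbf{x}_1^{t})\,\pi(\mathbf{x}_1^{t+1}|\mathbf{x}_2^{t+1})\\
&=\sum_{\mathbf{x}_2^{t+1}}\pi(\mathbf{x}_1^{t},\mathbf{x}_2^{t+1})\,\pi(\mathbf{x}_1^{t+1}|\mathbf{x}_2^{t+1})\\
&=\sum_{\mathbf{x}_2^{t+1}}\pi(\mathbf{x}_1^{t}|\mathbf{x}_2^{t+1})\,\pi(\mathbf{x}_1^{t+1},\mathbf{x}_2^{t+1})\\
&=\pi'(\mathbf{x}_1^{t+1})\sum_{\mathbf{x}_2^{t+1}}\pi(\mathbf{x}_2^{t+1}|\mathbf{x}_1^{t+1})\,\pi(\mathbf{x}_1^{t}|\mathbf{x}_2^{t+1})\\
&=\pi'(\mathbf{x}_1^{t+1})P(\mathbf{x}_1^{t+1},\mathbf{x}_1^{t}),
\end{aligned} \tag{17}$$

where $\pi(\cdot,\cdot)$ denotes the joint distribution and $\pi'(\cdot)$ the marginal distribution of $\mathbf{x}_1$. This indicates that the marginal chain is reversible. Inspired by it, the following convergence analysis takes place in the marginal Markov chain $\{\mathbf{x}_1^t\}$ with target distribution $\pi'$ for simplicity (the same result can be obtained with respect to the marginal Markov chain $\{\mathbf{x}_2^t\}$).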

### III-B Convergence Analysis

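Typically, given the transition probability $P(\mathbf{X}_t,\mathbf{X}_{t+1})$, the forward operator $F$ of the Markov chain is defined as [65]

$$Fh(\mathbf{X}_t)\triangleq\sum_{\mathbf{X}_{t+1}\in\Omega}h(\mathbf{X}_{t+1})P(\mathbf{X}_t,\mathbf{X}_{t+1})=E[h(\mathbf{X}_{t+1})|\mathbf{X}_t] \tag{18}$$

with induced operator norm

$$\|F\|=\sup_{h\in L_0^2(\pi),\ \mathrm{var}(h)=1}\|Fh\|. \tag{19}$$

Here, $L^2(\pi)$ is the Hilbert space of square integrable functions with respect to $\pi$, so that $L_0^2(\pi)$ denotes the subspace of $L^2(\pi)$ consisting of functions with zero mean relative to $\pi$. More precisely, for $h(\cdot),g(\cdot)\in L_0^2(\pi)$, the inner product defined on this space is

$$\langle h(\mathbf{x}),g(\mathbf{x})\rangle=E[h(\mathbf{x})g(\mathbf{x})] \tag{20}$$

with variance

$$\mathrm{var}_\pi[h(\mathbf{x})]=\langle h(\mathbf{x}),h(\mathbf{x})\rangle=\|h(\mathbf{x})\|^2. \tag{21}$$
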
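###### Theorem 2.

Given the invariant lattice Gaussian distribution $D_{\Lambda,\sigma,\mathbf{c}}$, the Markov chain induced by the systematic scan Gibbs algorithm is geometrically ergodic:

$$\|P^t(\mathbf{x},\cdot)-D_{\Lambda,\sigma,\mathbf{c}}\|_{TV}\le C(\mathbf{x})\varrho^t \tag{22}$$

with convergence rate

$$\varrho=\gamma^2(\mathbf{x}_1,\mathbf{x}_2)<1. \tag{23}$$
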
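###### Proof.

First of all, regarding the marginal Markov chain $\{\mathbf{x}_1^t\}$, the spectral radius of its forward operator $F_1$ is closely related to its norm as [66]

$$\mathrm{spec}(F_1)=\lim_{t\to\infty}\|F_1^t\|^{1/t}. \tag{24}$$

Meanwhile, the reversibility of the marginal chain corresponds to a self-adjoint operator $F_1$ with [67]

$$\|F_1^t\|=\|F_1\|^t, \tag{25}$$

so that

$$\mathrm{spec}(F_1)=\|F_1\|. \tag{26}$$

Subsequently, according to (19) and (26), the spectral radius of the forward operator $F_1$ is derived as

$$\begin{aligned}
\mathrm{spec}(F_1)&=\|F_1\|=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}\|F_1h\|\\
&=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}\big\{\mathrm{var}\big[E[h(\mathbf{x}_1^{t+1})|\mathbf{x}_1^{t}]\big]\big\}^{\frac{1}{2}}\\
&=\gamma(\mathbf{x}_1^{t},\mathbf{x}_1^{t+1}).
\end{aligned} \tag{27}$$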

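With respect to $\gamma(\mathbf{x}_1^{t},\mathbf{x}_1^{t+1})$, on one hand, it follows that

$$\begin{aligned}
\gamma(\mathbf{x}_1^{t},\mathbf{x}_1^{t+1})
&=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}\mathrm{var}\big[E\big[E[h(\mathbf{x}_1^{t+1})|\mathbf{x}_2^{t+1}]\,\big|\,\mathbf{x}_1^{t}\big]\big]\\
&\le\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}\mathrm{var}\big[E[h(\mathbf{x}_1^{t+1})|\mathbf{x}_2^{t+1}]\big]\\
&=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}E\big[E^2[h(\mathbf{x}_1)|\mathbf{x}_2]\big]\\
&=\gamma^2(\mathbf{x}_1,\mathbf{x}_2).
\end{aligned} \tag{28}$$

On the other hand, we have

$$\begin{aligned}
\gamma(\mathbf{x}_1^{t},\mathbf{x}_1^{t+1})
&\ge\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}\mathrm{corr}\big[h(\mathbf{x}_1^{t}),h(\mathbf{x}_1^{t+1})\big]\\
&=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}E\big[h(\mathbf{x}_1^{t})h(\mathbf{x}_1^{t+1})\big]\\
&=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}E\big[E[h(\mathbf{x}_1^{t})h(\mathbf{x}_1^{t+1})|\mathbf{x}_2^{t+1}]\big]\\
&=\sup_{h\in L_0^2(\pi'),\ \mathrm{var}(h)=1}E\big[E^2[h(\mathbf{x}_1)|\mathbf{x}_2]\big]\\
&=\gamma^2(\mathbf{x}_1,\mathbf{x}_2).
\end{aligned} \tag{29}$$

Therefore, according to (28) and (29), we get

$$\mathrm{spec}(F_1)=\gamma(\mathbf{x}_1^{t},\mathbf{x}_1^{t+1})=\gamma^2(\mathbf{x}_1,\mathbf{x}_2)<1, \tag{30}$$

where the inequality holds because $\mathbf{x}_1$ and $\mathbf{x}_2$ remain random with respect to each other by construction, i.e., they are not totally dependent.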

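Next, by invoking the following Lemma from [68], the marginal chain turns out to be geometrically ergodic with convergence rate

$$\varrho_1=\mathrm{spec}(F_1). \tag{31}$$

###### Lemma 1 ([68]).

Given the invariant distribution $\pi$, a reversible, irreducible and aperiodic Markov chain with spectral gap $\gamma=1-\mathrm{spec}(F)>0$ converges exponentially as

$$\|P^t(\mathbf{x},\cdot)-\pi(\cdot)\|_{TV}\le C(\mathbf{x})(1-\gamma)^t. \tag{32}$$

Note that in Lemma 1 the symbol $\gamma$ denotes the spectral gap rather than the HGR maximal correlation. Hence, from (16) and (31), the original Markov chain is geometrically ergodic with exponential convergence rate

$$\varrho=\gamma^2(\mathbf{x}_1,\mathbf{x}_2)<1, \tag{33}$$

completing the proof. ∎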

Clearly, $\gamma(\mathbf{x}_1,\mathbf{x}_2)$ measures the dependence between $\mathbf{x}_1$ and $\mathbf{x}_2$, where $\gamma(\mathbf{x}_1,\mathbf{x}_2)=0$ if and only if $\mathbf{x}_1$ and $\mathbf{x}_2$ are independent of each other. On the other hand, a high correlation between $\mathbf{x}_1$ and $\mathbf{x}_2$ gives rise to a larger value of $\gamma(\mathbf{x}_1,\mathbf{x}_2)$ approaching $1$. It should be noted that such a result can be generalized to more than two components, where a weaker correlation among the components is also a sufficient condition for a small value of $\varrho$.

###### Remark 1.

The convergence rate of systematic scan Gibbs sampling for the lattice Gaussian distribution is dominated by the HGR maximal correlation among the random variables $x_i$’s, where the optimal convergence happens when the $x_i$’s are independent of each other.
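The statement of Theorem 2 can be illustrated on a toy two-component distribution with a finite state space. The sketch below is not the lattice Gaussian case: the joint distribution, the function names and the number of steps are assumptions chosen so that the quantities are easy to verify by hand. For the binary pair used here the maximal correlation coincides with the absolute Pearson correlation, $\gamma=0.6$, so the total variation distance of the marginal chain should shrink by the factor $\gamma^2=0.36$ per Markov move.

```python
def marginal_chain(joint):
    """Transition matrix of the x1-marginal chain of the two-component Gibbs sampler.

    P1[a][b] = sum_y pi(y | a) * pi(b | y), i.e. draw x2 given x1, then x1 given x2.
    Also returns the stationary marginal distribution of x1.
    """
    rows, cols = range(len(joint)), range(len(joint[0]))
    p1 = [sum(joint[a][y] for y in cols) for a in rows]
    p2 = [sum(joint[a][y] for a in rows) for y in cols]
    P1 = [
        [sum((joint[a][y] / p1[a]) * (joint[b][y] / p2[y]) for y in cols) for b in rows]
        for a in rows
    ]
    return P1, p1

def tv_distances(P, stationary, start, steps):
    """Total variation distance to the stationary distribution after 1..steps moves."""
    size = range(len(P))
    dist = [1.0 if s == start else 0.0 for s in size]
    out = []
    for _ in range(steps):
        dist = [sum(dist[a] * P[a][b] for a in size) for b in size]
        out.append(0.5 * sum(abs(d - p) for d, p in zip(dist, stationary)))
    return out

joint = [[0.4, 0.1], [0.1, 0.4]]          # toy joint distribution of (x1, x2)
P1, pi1 = marginal_chain(joint)
tv = tv_distances(P1, pi1, start=0, steps=6)
ratios = [tv[t + 1] / tv[t] for t in range(len(tv) - 1)]  # per-move contraction factors
```

In this two-state example every entry of `ratios` equals the second eigenvalue of the marginal chain, which matches $\gamma^2(\mathbf{x}_1,\mathbf{x}_2)$ as predicted by (23).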

## IV Lattice-Reduction-Aided Gibbs Sampling

From the convergence analysis, in order to achieve efficient Markov mixing, a smaller HGR maximal correlation $\gamma$ is strongly desired. However, it is hard to calculate $\gamma$ explicitly in practice. Regarding the lattice Gaussian distribution shown in (3), it is clear that the correlation among the elements of $\mathbf{x}$ is determined by the matrix $\mathbf{B}$, i.e., the more orthogonal $\mathbf{B}$ is, the less correlated the components of $\mathbf{x}$ are. For this reason, we attempt to use the orthogonality defect of $\mathbf{B}$ to partially characterize $\gamma$.

Specifically, the orthogonality defect of a matrix $\mathbf{B}$ is defined as [54]

$$\xi(\mathbf{B})=\frac{\prod_{i=1}^{n}\|\mathbf{b}_i\|}{|\det(\mathbf{B})|}, \tag{34}$$

where $\det(\cdot)$ represents the determinant of a square matrix. According to the Hadamard inequality, the orthogonality defect is lower bounded by $\xi(\mathbf{B})\ge 1$, where equality holds if and only if the vectors in $\mathbf{B}$ are mutually orthogonal. Consequently, we can easily arrive at the following Lemma, whose proof is omitted here due to its simplicity.
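The orthogonality defect in (34) is straightforward to evaluate numerically. The following sketch uses hypothetical function names, stores the matrix as a list of rows whose columns are the basis vectors, and computes the determinant by Gaussian elimination to stay free of external libraries; it is meant as an illustration of the definition only.

```python
import math

def abs_determinant(B):
    """|det(B)| via Gaussian elimination with partial pivoting."""
    a = [[float(t) for t in row] for row in B]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0                       # singular matrix
        a[col], a[pivot] = a[pivot], a[col]  # a row swap only changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return abs(det)

def orthogonality_defect(B):
    """xi(B) = prod_i ||b_i|| / |det(B)|, with the basis vectors b_i as columns of B."""
    n = len(B)
    norms = [math.sqrt(sum(B[r][i] ** 2 for r in range(n))) for i in range(n)]
    return math.prod(norms) / abs_determinant(B)

orthogonal = [[1.0, 0.0], [0.0, 1.0]]   # xi = 1: mutually orthogonal columns
skewed = [[1.0, 4.0], [0.0, 1.0]]       # same lattice Z^2, columns (1,0) and (4,1)
xi_orthogonal = orthogonality_defect(orthogonal)
xi_skewed = orthogonality_defect(skewed)
```

Both matrices generate $\mathbb{Z}^2$, yet the skewed basis has a defect of $\sqrt{17}$, which illustrates that the defect is a property of the basis and not of the lattice.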

###### Lemma 2.

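If the full rank matrix $\mathbf{B}$ is an orthogonal matrix with $\xi(\mathbf{B})=1$, then the HGR maximal correlation among the components of $\mathbf{x}$ vanishes, and samples from the lattice Gaussian distribution can be immediately obtained by systematic scan Gibbs sampling with convergence rate

$$\varrho=0. \tag{35}$$

Clearly, a smaller value of $\xi(\mathbf{B})$ is highly desirable for fast mixing. However, for a given lattice basis $\mathbf{B}$, any attempt to reduce $\xi(\mathbf{B})$ directly in order to obtain a small $\gamma$ is impossible. Nevertheless, an alternative way can still be carried out by resorting to the lattice reduction technique [54], which transforms the lattice Gaussian distribution in (3) into an equivalent one:

$$\pi(\mathbf{z})=\frac{e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}}{\sum_{\mathbf{z}\in\mathbb{Z}^n}e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}}, \tag{36}$$

where $\bar{\mathbf{B}}=\mathbf{B}\mathbf{U}$, $\mathbf{z}=\mathbf{U}^{-1}\mathbf{x}$, and $\mathbf{U}$ is a unimodular matrix with $\det(\mathbf{U})=\pm 1$.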

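Undoubtedly, $\mathbf{B}\mathbf{x}$ and $\bar{\mathbf{B}}\mathbf{z}$ describe the same lattice point in the space. Therefore, the target distribution remains essentially unchanged during this transformation but is parameterized by $\mathbf{z}$, where there is a one-to-one correspondence between $\mathbf{x}$ and $\mathbf{z}$. Then, the conditional sampling probability of Gibbs sampling shown in (14) becomes

$$P_{\mathbf{z}_1}(\mathbf{z}_1|\mathbf{z}_2)=\frac{e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}}{\sum_{\mathbf{z}_1\in\mathbb{Z}^{m}}e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}} \tag{37}$$

with $\mathbf{z}_1\in\mathbb{Z}^m$ and $\mathbf{z}_2\in\mathbb{Z}^{n-m}$, and can be further generalized to

$$P_i(z_i|\mathbf{z}_{[-i]})=\frac{e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}}{\sum_{z_i\in\mathbb{Z}}e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}}. \tag{38}$$

In particular, as shown in Fig. 2, given the target distribution $D_{\Lambda,\sigma,\mathbf{c}}$, the proposed lattice-reduction-aided Gibbs sampling consists of the following three steps:

1) Generate the equivalent lattice Gaussian distribution $\pi(\mathbf{z})$ by LLL reduction.

2) Perform the Gibbs sampling over $\pi(\mathbf{z})$.

3) Collect samples of $\mathbf{z}$ after the Markov mixing and output samples of $\mathbf{x}$ by $\mathbf{x}=\mathbf{U}\mathbf{z}$.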
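The equivalence behind steps 1) and 3) can be checked on a small example. In the sketch below the unimodular matrix `U` is hand-picked for illustration instead of being produced by LLL reduction, and the helper names, the query point and the value of $\sigma$ are assumptions. The code verifies that the unnormalized weight of $\mathbf{z}$ under the reduced basis equals the weight of $\mathbf{x}=\mathbf{U}\mathbf{z}$ under the original basis, so samples of $\mathbf{z}$ can be mapped back without changing the distribution.

```python
import math
from itertools import product

def matmul(A, M):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
            for i in range(len(A))]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

B = [[1.0, 4.0], [0.0, 1.0]]   # original basis, columns (1,0) and (4,1)
U = [[1, -4], [0, 1]]          # hypothetical unimodular matrix with det(U) = 1
B_bar = matmul(B, U)           # reduced basis of the same lattice (step 1)
c, sigma = [0.3, -0.2], 1.0    # assumed query point and standard deviation

def weight(basis, coeffs):
    """Unnormalized probability exp(-||basis * coeffs - c||^2 / (2 sigma^2))."""
    point = matvec(basis, coeffs)
    dist2 = sum((p - ci) ** 2 for p, ci in zip(point, c))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

# Step 3: every sample z maps to x = U z, which has exactly the same weight.
max_gap = max(
    abs(weight(B_bar, z) - weight(B, matvec(U, z)))
    for z in product(range(-3, 4), repeat=2)
)
```

Because the map $\mathbf{z}\mapsto\mathbf{U}\mathbf{z}$ is a bijection of $\mathbb{Z}^n$, the normalizing sums also agree, which is why only the Markov mixing, and not the target distribution, is affected by the change of basis.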

Similarly, it is straightforward to verify that Gibbs sampling with respect to the converted lattice Gaussian distribution is also geometrically ergodic.

###### Theorem 3.

Given $\pi(\mathbf{z})$, the Markov chain induced by Gibbs sampling converges exponentially fast:

$$\|P^t(\mathbf{z},\cdot)-\pi(\cdot)\|_{TV}\le C(\mathbf{z})\varrho^t, \tag{39}$$

where $\varrho=\gamma^2(\mathbf{z}_1,\mathbf{z}_2)<1$.

Remarkably, such a slight change by replacing $\mathbf{B}$ with $\bar{\mathbf{B}}$ introduces a significant benefit: compared to $\mathbf{B}$, the orthogonality of the matrix $\bar{\mathbf{B}}$ is greatly improved by lattice reduction. More specifically, it has been demonstrated that after LLL reduction, the orthogonality defect of the reduced basis is upper bounded by [54]

$$\xi(\bar{\mathbf{B}})\le\beta^{\frac{n(n-1)}{4}} \tag{40}$$

with $\beta=(\delta-1/4)^{-1}$, which guarantees a bounded orthogonality defect regardless of that of the original basis. Therefore, a smaller HGR maximal correlation among the components of $\mathbf{z}$ than among those of $\mathbf{x}$ is most likely to be achieved, thus leading to a better convergence rate by Theorem 2.

###### Remark 2.

With respect to sampling from the lattice Gaussian distribution $D_{\Lambda,\sigma,\mathbf{c}}$, the use of lattice reduction is capable of achieving less correlated random variables $z_i$’s than $x_i$’s, which leads to more efficient Markov mixing.

To summarize, the proposed lattice-reduction-aided Gibbs sampling algorithm is presented in Algorithm 2.

## V Lattice-Reduction-Aided Gibbs Sampling Algorithm for Lattice Decoding

In this section, we extend the proposed lattice-reduction-aided Gibbs sampling to lattice decoding. Theoretically, when an MCMC method is applied for sampler decoding, its decoding performance can be evaluated by the CVP decoding complexity (i.e., the number of Markov moves $t$), which is defined by [20]

$$C_{cvp}\triangleq\frac{t_{mix}}{D_{\Lambda,\sigma,\mathbf{c}}(\mathbf{x}_{cvp})}. \tag{41}$$

Here, the mixing time $t_{mix}$ serves as a pick-up gap to guarantee i.i.d. samples, because consecutive samples from the stationary distribution tend to be correlated with each other. Besides, $D_{\Lambda,\sigma,\mathbf{c}}(\mathbf{x}_{cvp})$ denotes the sampling probability of the target CVP point. Therefore, in order to strengthen the decoding performance, one can either reduce the mixing time $t_{mix}$ (e.g., use LLL reduction to boost the convergence), or improve the sampling probability $D_{\Lambda,\sigma,\mathbf{c}}(\mathbf{x}_{cvp})$, which will be studied in the following.

### V-A Choice of the Sampling Deviation $\sigma$

From the point of view of simulated annealing in statistics, $\sigma$ functions as a “temperature” that guides the Markov mixing, and it also has an impact on $t_{mix}$. Given the lattice Gaussian distribution shown in (36), although a small $\sigma$ corresponds to a relatively large decoding sampling probability $\pi(\mathbf{z}_{cvp})$, it also incurs a “cold” Markov chain which tends to be trapped in a frozen state, and vice versa [69] (this is in accordance with the result for the independent MHK sampling algorithm for the lattice Gaussian distribution, where the exact convergence rate as well as the mixing time can be estimated [19, 20]). However, since $t_{mix}$ for Gibbs sampling is hard to obtain at the current stage, a feasible compromise for balancing this inherent trade-off is to ensure a reliable sampling probability $\pi(\mathbf{z}_{cvp})$ given a moderate $\sigma$.

In particular, with respect to any to be sampled, we firstly extract from the denominator of as

 π(z) =e−12σ2∥¯¯¯¯Bz−c∥2∑z∈Zne−12σ2∥¯¯¯¯Bz−c∥2 (a)≥e−12σ2∥¯¯¯¯Bz−c∥2∑z∈Zne−12σ2∥¯¯¯¯Bz∥2 (b)≥e−12σ2∥¯¯¯¯Bz−c∥2(√2πσ)n∑z∈Zne−π∥¯¯¯¯Bz∥2 =f(σ)⋅c  for √2πσ≥1 (42)

where

$$c \triangleq \frac{1}{\sum_{\mathbf{z}\in\mathbb{Z}^n} e^{-\pi\|\bar{\mathbf{B}}\mathbf{z}\|^2}} \tag{43}$$

is a constant and

$$f(\sigma) \triangleq \frac{e^{-\frac{1}{2\sigma^2}\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}}{(\sqrt{2\pi}\sigma)^n} \tag{44}$$

is parameterized by $\sigma$. Here, $(a)$ and $(b)$ respectively follow from the lattice-theoretic facts ([1, Lemma 1.4]) that

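$$\sum_{\mathbf{v}\in\Lambda} e^{-\frac{1}{2\sigma^2}\|\mathbf{v}-\mathbf{c}\|^2} \leq \sum_{\mathbf{v}\in\Lambda} e^{-\frac{1}{2\sigma^2}\|\mathbf{v}\|^2} \tag{45}$$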

and

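$$\sum_{\mathbf{v}\in\Lambda} e^{-\pi s^{-1}\|\mathbf{v}\|^2} \leq s^{\frac{n}{2}} \cdot \sum_{\mathbf{v}\in\Lambda} e^{-\pi\|\mathbf{v}\|^2}, \quad \text{for } s \geq 1. \tag{46}$$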

From (42), with $s = 2\pi\sigma^2$ in (46), the sampling probability $\pi(\mathbf{z})$ for any specific $\mathbf{z}$ is lower bounded, up to the constant $c$, by the function $f(\sigma)$. Furthermore, the derivative of $f(\sigma)$ with respect to $\sigma$ is derived as follows

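$$\frac{\partial f(\sigma)}{\partial \sigma} = \frac{\left(\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2 - n\sigma^2\right)\exp\!\left(-\frac{\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2}{2\sigma^2}\right)}{\sigma^{n+3}(\sqrt{2\pi})^n}. \tag{47}$$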

Subsequently, setting the above derivative to zero and accounting for the constraint $\sqrt{2\pi}\sigma \geq 1$, the optimized $\sigma$ that maximizes $f(\sigma)$ is obtained as

$$\sigma = \max\left\{\frac{\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|}{\sqrt{n}}, \frac{1}{\sqrt{2\pi}}\right\}, \tag{48}$$

which implies that $\sigma$ should vary with the distance $\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|$ for a large lower bound of $\pi(\mathbf{z})$.
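The closed form in (48) can be checked numerically. The sketch below evaluates $\log f(\sigma)$ from (44) on a grid restricted to $\sigma \geq 1/\sqrt{2\pi}$ and compares the grid maximizer with (48). The distance values, the dimension, and all function names are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

# Constraint sqrt(2*pi)*sigma >= 1 required by step (b) in (42).
SIGMA_FLOOR = 1.0 / np.sqrt(2.0 * np.pi)


def log_f(sigma, dist, n):
    """Logarithm of f(sigma) in (44), with dist = ||B_bar z - c||."""
    return -dist**2 / (2.0 * sigma**2) - n * np.log(np.sqrt(2.0 * np.pi) * sigma)


def sigma_closed_form(dist, n):
    """Closed-form maximizer (48)."""
    return max(dist / np.sqrt(n), SIGMA_FLOOR)


def sigma_grid_search(dist, n, sigma_max=5.0, num=200001):
    """Brute-force maximizer of log f over [SIGMA_FLOOR, sigma_max]."""
    grid = np.linspace(SIGMA_FLOOR, sigma_max, num)
    return grid[np.argmax(log_f(grid, dist, n))]


if __name__ == "__main__":
    n = 16
    for dist in (3.0, 0.5):  # a far and a near query point (hypothetical)
        print(dist, sigma_closed_form(dist, n), sigma_grid_search(dist, n))
```

For the larger distance the interior stationary point $\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|/\sqrt{n}$ is selected, whereas for the smaller one the floor $1/\sqrt{2\pi}$ becomes active because $f(\sigma)$ is decreasing over the whole admissible range.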

Clearly, the existence of the lower bound for $\pi(\mathbf{z})$ guarantees a reliable sampling probability of $\mathbf{z}$, which can be further optimized by a careful selection of $\sigma$. Meanwhile, the requirement $\sigma \geq 1/\sqrt{2\pi}$ serves as a baseline to ensure that the Markov chain evolves dynamically, even though the sampling probability for $\sigma$ below $1/\sqrt{2\pi}$ seems rather attractive.

Hence, for the target point $\mathbf{z}_{\text{cvp}}$ of lattice decoding, the choice of $\sigma$ according to (48) turns out to be

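$$\sigma_{\text{cvp}} = \max\left\{\frac{\|\bar{\mathbf{B}}\mathbf{z}_{\text{cvp}}-\mathbf{c}\|}{\sqrt{n}}, \frac{1}{\sqrt{2\pi}}\right\}. \tag{49}$$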

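Generally speaking, with respect to different configurations of the lattice basis and the query point $\mathbf{c}$, such a flexible setting of $\sigma$ is more beneficial to sampler decoding because it provides a case-specific rather than a statistical choice. For a small value of $\|\bar{\mathbf{B}}\mathbf{z}_{\text{cvp}}-\mathbf{c}\|$, $\sigma_{\text{cvp}}$ tends to get smaller since $\mathbf{c}$ lies close to the lattice, and vice versa, thus adaptively guiding the choice of $\sigma$ for each $\mathbf{c}$.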

Unfortunately, it is impossible to know $\mathbf{z}_{\text{cvp}}$ in advance in order to compute $\sigma_{\text{cvp}}$. Therefore, in practice, the initial starting point $\mathbf{z}^0$ can be applied as an approximation. Clearly, the closer $\mathbf{z}^0$ is to $\mathbf{z}_{\text{cvp}}$, the more accurate the selected $\sigma$ is. This essentially poses a stringent requirement on the selection of $\mathbf{z}^0$. Fortunately, thanks to lattice reduction, the required high-quality initial starting point in lattice-reduction-aided Gibbs sampling can be guaranteed. In this paper, the classic Babai’s nearest plane algorithm (also known as successive interference cancellation (SIC) in MIMO detection) is utilized by

$$\mathbf{z}^0 = \mathbf{z}_{\text{lll-sic}}, \tag{50}$$

where the decoding of $\mathbf{z}_{\text{lll-sic}}$ can be executed during the transformation from $\mathbf{B}$ to $\bar{\mathbf{B}}$. To summarize, we reformulate the proposed standard deviation as

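$$\sigma_{\text{distance}} = \max\left\{\frac{\|\bar{\mathbf{B}}\mathbf{z}_{\text{lll-sic}}-\mathbf{c}\|}{\sqrt{n}}, \frac{1}{\sqrt{2\pi}}\right\}. \tag{51}$$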

Again, we emphasize that other decoding schemes are also applicable to output $\mathbf{z}^0$, while the decoding performance improves with the accuracy of the approximation.
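As an illustration of (50) and (51), the following sketch runs Babai's nearest plane (SIC) on a basis that is assumed to be already reduced, and then evaluates the distance-based standard deviation. The LLL step itself is not implemented here; the function names and the small example basis are hypothetical.

```python
import numpy as np

SIGMA_FLOOR = 1.0 / np.sqrt(2.0 * np.pi)


def babai_nearest_plane(B_red, c):
    """SIC / Babai nearest plane: integer vector z such that B_red @ z is close to c."""
    Q, R = np.linalg.qr(B_red)          # B_red = Q R with R upper triangular
    y = Q.T @ c
    n = B_red.shape[1]
    z = np.zeros(n, dtype=int)
    for i in range(n - 1, -1, -1):      # back-substitution with rounding
        residual = y[i] - R[i, i + 1:] @ z[i + 1:]
        z[i] = int(np.rint(residual / R[i, i]))
    return z


def sigma_distance(B_red, c):
    """Evaluate (51) using the SIC output as the starting point z0."""
    z0 = babai_nearest_plane(B_red, c)
    n = B_red.shape[1]
    dist = np.linalg.norm(B_red @ z0 - c)
    return max(dist / np.sqrt(n), SIGMA_FLOOR), z0


if __name__ == "__main__":
    B_red = np.array([[2.0, 1.0], [0.0, 2.0]])   # stand-in for an LLL-reduced basis
    c = np.array([5.9, -1.1])                    # hypothetical query point
    sigma, z0 = sigma_distance(B_red, c)
    print(sigma, z0)
```

Any other suboptimal decoder could replace `babai_nearest_plane` in this sketch, since only the residual distance of its output enters (51).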

Besides the sampling probability, the initial starting point $\mathbf{z}^0$ also plays an important role in the Markov mixing. More specifically, for the small set and , a geometrically ergodic Markov chain converges exponentially to the stationary distribution as [70]

 (52)

where , , and . From (52), starting the Markov chain with $\mathbf{z}^0$ as close as possible to the center of the lattice Gaussian distribution (i.e., the query point $\mathbf{c}$) is a judicious choice for efficient mixing, which is in accordance with our suggestion.

On the other hand, since $\mathbf{c}$ in (5) entails additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_w^2$, it follows that

$$\|\bar{\mathbf{B}}\mathbf{z}-\mathbf{c}\|^2 = \|\mathbf{B}\mathbf{x}-\mathbf{c}\|^2 \approx n\sigma_w^2 \tag{53}$$

by the law of large numbers. Then, by simply substituting (53) into (48), the choice of $\sigma$ can be obtained in a statistical way, that is,

$$\sigma_{\text{statistic}} = \max\left\{\sigma_w, \frac{1}{\sqrt{2\pi}}\right\}. \tag{54}$$

Interestingly, we point out that $\sigma = \sigma_w$ is just the conventional wisdom that is widely accepted by related works. However, compared to $\sigma_{\text{statistic}}$, it severely suffers from the stalling problem, as $\sigma_w$ shrinks rapidly with the increase of SNR. Therefore, the lower bound $1/\sqrt{2\pi}$ serves as a necessary complement to activate the sampling away from the frozen status. Note that the consistency between the choices of $\sigma_{\text{distance}}$ and $\sigma_{\text{statistic}}$ suggests that our analysis based on the sampling probability is tight enough, and we then advance it to more specific cases.
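The approximation (53) and the floor in (54) can be illustrated with a short numerical sketch. It draws a Gaussian noise vector, compares $\|\mathbf{w}\|^2$ with $n\sigma_w^2$, and shows how (54) stops following $\sigma_w$ once the noise level drops below $1/\sqrt{2\pi}$. The noise levels, the dimension, and the function names are assumptions chosen for illustration only.

```python
import numpy as np

SIGMA_FLOOR = 1.0 / np.sqrt(2.0 * np.pi)


def sigma_statistic(sigma_w):
    """Statistical choice (54): noise standard deviation with a floor."""
    return max(sigma_w, SIGMA_FLOOR)


def empirical_noise_ratio(n, sigma_w, seed=0):
    """Return ||w||^2 / (n * sigma_w^2) for one AWGN draw; close to 1 for large n."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma_w, size=n)
    return float(w @ w) / (n * sigma_w**2)


if __name__ == "__main__":
    print(empirical_noise_ratio(n=10000, sigma_w=0.3))
    for sigma_w in (2.0, 0.5, 0.1, 0.01):   # decreasing noise, i.e., increasing SNR
        print(sigma_w, sigma_statistic(sigma_w))
```

For the two smallest noise levels the returned value stays at the floor, which is the mechanism that keeps the chain away from the frozen status at high SNR.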

### V-B Startup Mechanism based on Correct Decoding Radius R

The use of the initial starting point $\mathbf{z}^0$ raises a natural question: is Gibbs sampling necessary in every decoding case? In what follows, we try to answer this question from the perspective of the correct decoding radius of bounded distance decoding (BDD).

Theoretically, BDD aims to solve the decoding problem when the query point lies within a certain distance of the lattice, which corresponds to a restricted variant of CVP. In BDD, the concept of the correct decoding radius $R$ was proposed to serve as a benchmark for evaluating the decoding performance [71]. More specifically, CVP is guaranteed to be solved if the distance between the query point $\mathbf{c}$ and the lattice (i.e., $\min_{\mathbf{x}\in\mathbb{Z}^n}\|\mathbf{B}\mathbf{x}-\mathbf{c}\|$) is less than $R$. As for Babai’s nearest plane algorithm, its correct decoding radius is given by [71]

 (55)

Here, we highlight the significance of LLL reduction again, as it greatly increases the correct decoding radius compared to that of the unreduced basis. Furthermore, it has been shown in [71] that $R_{\text{lll-sic}}$ is lower bounded as

$$R_{\text{lll-sic}} \geq \frac{1}{2\sqrt{n}\,\beta^{\frac{n-1}{4}}}\lambda_1(\mathbf{B}), \tag{56}$$

where $\beta$ is the constant determined by the LLL reduction parameter and $\lambda_1(\mathbf{B})$ denotes the minimum distance of the lattice. Therefore, for the sake of decoding efficiency, the correct decoding radius can be applied as a theoretical criterion to decide whether or not to invoke Gibbs sampling. This means that substantial decoding complexity can be saved without performance loss.
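The decision rule suggested by this discussion can be sketched as follows. The sketch assumes the standard characterization of the correct decoding radius of Babai's nearest plane algorithm as half the smallest Gram-Schmidt norm of the basis, which equals $\frac{1}{2}\min_i |r_{i,i}|$ for the QR factor. If the residual distance of the SIC output is below this radius, the SIC output is already the closest lattice point and Gibbs sampling can be skipped. The function names and the example values are hypothetical, and the basis is assumed to be already LLL-reduced.

```python
import numpy as np


def sic_decoding_radius(B_red):
    """Half the smallest Gram-Schmidt norm, read from the diagonal of the QR factor R."""
    R = np.linalg.qr(B_red, mode="r")
    return 0.5 * np.min(np.abs(np.diag(R)))


def needs_gibbs(B_red, c, z_sic):
    """Return True if the SIC output z_sic is not certified by the decoding radius."""
    dist = np.linalg.norm(B_red @ z_sic - c)
    return bool(dist >= sic_decoding_radius(B_red))


if __name__ == "__main__":
    B_red = np.array([[2.0, 1.0], [0.0, 2.0]])   # stand-in for an LLL-reduced basis
    z_sic = np.array([3, -1])                    # SIC output for both queries below
    print(needs_gibbs(B_red, np.array([5.1, -2.1]), z_sic))  # query close to the lattice
    print(needs_gibbs(B_red, np.array([5.9, -1.1]), z_sic))  # query far from the lattice
```

In the second query the lattice point $\bar{\mathbf{B}}[3, 0]^T = [6, 0]^T$ is closer to $\mathbf{c}$ than the SIC output, so falling back to Gibbs sampling is warranted in that case.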

In particular, let