Statistical Mechanics of MAP Estimation: General Replica Ansatz


Abstract

The large-system performance of maximum-a-posteriori estimation is studied considering a general distortion function when the observation vector is received through a linear system with additive white Gaussian noise. The analysis considers the system matrix to be chosen from the large class of rotationally invariant random matrices. We take a statistical mechanical approach by introducing a spin glass corresponding to the estimator, and employing the replica method for the large-system analysis. In contrast to earlier replica based studies, our analysis evaluates the general replica ansatz of the corresponding spin glass and determines the asymptotic distortion of the estimator for any structure of the replica correlation matrix. Consequently, the replica symmetric as well as the replica symmetry breaking ansatz with an arbitrary number of breaking steps is deduced from the given general replica ansatz. The generality of our distortion function lets us derive a more general form of the maximum-a-posteriori decoupling principle. Based on the general replica ansatz, we show that for any structure of the replica correlation matrix, the vector-valued system decouples into a bank of equivalent decoupled linear systems followed by maximum-a-posteriori estimators. The structure of the decoupled linear system is further studied under both the replica symmetry and the replica symmetry breaking assumptions. For any number of breaking steps, the decoupled system is found to be an additive system with a noise term given as the sum of an independent Gaussian random variable and correlated impairment terms. The general decoupling property of the maximum-a-posteriori estimator leads to the idea of a replica simulator which represents the replica ansatz through the state evolution of a transition system described by its corresponding decoupled system. As an application of our study, we investigate large compressive sensing systems by considering norm minimization recovery schemes. Our numerical investigations show that the replica symmetric ansatz for norm recovery fails to give an accurate approximation of the mean square error as the compression rate grows, and therefore, the replica symmetry breaking ansätze are needed in order to assess the performance precisely.


1 Introduction

Consider a vector-valued system specified by

where the source vector , with components in a support set , is measured by the random system matrix , with , and corrupted by the zero-mean Gaussian noise vector , with variance , i.e., . The source vector can be estimated from the observation vector using a maximum-a-posteriori (MAP) estimator. For a given system matrix , the estimator maps the observation vector to the estimated vector via the estimation function defined as

for some “utility function” and estimation parameter . In , denotes the Euclidean norm, and it is assumed that the minimum is not degenerate so that is well-defined, at least for almost all and . In order to analyze the performance of the system in the large-system limit, i.e., , one considers a general distortion function . For some choices of , the distortion function determines the distance between the source and the estimated vector, e.g. ; however, in general, it can take other forms. The asymptotic distortion

then, expresses the large-system performance with respect to the distortion function . The performance analysis of the estimator requires to be explicitly computed, and then, substituted into the distortion function. This task, however, is not trivial for many choices of the utility function and the source support , and becomes infeasible as grows large. As basic analytic tools fail, we take a statistical mechanical approach and investigate the large-system performance by studying the macroscopic parameters of a corresponding spin glass. This approach enables us to use the replica method which has been developed in the context of statistical mechanics.

1.1 Corresponding Spin Glass

Consider a thermodynamic system which consists of particles with each having a microscopic parameter . The vector , collecting the microscopic parameters, then represents the microscopic state of the system and is called the “microstate”. The main goal of statistical mechanics is to extract the “macroscopic parameters” of the system, such as energy and entropy, through the analysis of the microstate in the thermodynamic limit, i.e., . Due to the large dimension of the system, statistical mechanics proposes a stochastic approach in which the microstate is supposed to be randomly distributed over the support according to some distribution . For this system, the Hamiltonian assigns to each realization of the microstate a non-negative energy level, and denotes the system’s entropy. The “free energy” of the thermodynamic system at the inverse temperature is then defined as

The second law of thermodynamics states that, at thermal equilibrium, the microstate takes the distribution which minimizes the free energy. Thus, the microstate’s distribution at thermal equilibrium reads

where is a normalization factor referred to as the “partition function”, and the superscript indicates the distribution’s dependence on the inverse temperature. The distribution in is known as the “Boltzmann-Gibbs distribution” and covers many distributions on by specifying and correspondingly. Substituting the Boltzmann-Gibbs distribution in , the free energy at thermal equilibrium and inverse temperature reads

The average energy and entropy of the system at thermal equilibrium are then determined by taking expectation over the distribution in , i.e.,

which can be calculated in terms of the free energy via
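For reference, one standard form of these relations (a hedged reconstruction, assuming the equilibrium free energy is normalized as $F(\beta) = -\beta^{-1}\log Z(\beta)$; the document's own normalization may differ) is

\begin{align}
  E(\beta) &= \frac{\partial}{\partial \beta}\bigl[\beta F(\beta)\bigr],
  &
  S(\beta) &= \beta^{2}\,\frac{\partial F(\beta)}{\partial \beta}
            = \beta\bigl[E(\beta) - F(\beta)\bigr].
\end{align}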

In spin glasses [1], the Hamiltonian assigns the energy levels randomly using some randomizer resulting from random interaction coefficients. In fact, each realization of specifies a thermodynamic system represented by the deterministic Hamiltonian . In statistical mechanics, is known to have “quenched” randomness while the microstate is an “annealed” random variable. The analysis of spin glasses takes similar steps as above considering a given realization of the randomizer, and therefore, as the system converges to its thermal equilibrium at the inverse temperature , the microstate’s conditional distribution given , i.e., , is a Boltzmann-Gibbs distribution specified by . Consequently, the free energy reads

where is the partition function with respect to the Hamiltonian . Here, the free energy, as well as other macroscopic parameters of the system, is random; however, the physical intuition behind the analyses suggests that these random macroscopic parameters converge to deterministic values in the thermodynamic limit. This property is known as the “self averaging property” and has been rigorously justified for some particular classes of Hamiltonians, e.g., [2]. Nevertheless, in cases where a mathematical proof is still lacking, the property is supposed to hold during the analysis. According to the self averaging property, the free energy of spin glasses converges to its expectation in the thermodynamic limit.

As mentioned earlier, the estimator in can be investigated using a corresponding spin glass. To see that, consider a spin glass whose microstate is taken from , and whose Hamiltonian is defined as

Here, the system matrix and the observation vector are considered to be the randomizers of the spin glass. In this case, given and , the conditional distribution of the microstate is given by

Taking the limit when and using the Laplace method of integration [6], the zero-temperature distribution, under the assumption that the minimizer is unique, reduces to

where denotes the indicator function, and is defined as in . indicates that the microstate of the spin glass converges to the estimated vector of the estimator, i.e., , in the zero temperature limit. Invoking this connection, we study the corresponding spin glass instead of the estimator. We represent the input-output distortion of the system regarding a general distortion function as a macroscopic parameter of the spin glass. Consequently, the replica method developed in statistical mechanics is employed to determine the defined macroscopic parameter of the corresponding spin glass. The replica method is a generally nonrigorous but effective method developed in the physics literature to study spin glasses. Although the method lacks rigorous mathematical proof in some particular parts, it has been widely accepted as an analysis tool and utilized to investigate a variety of problems in applied mathematics, information processing, and coding [7].
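As a toy illustration of this zero-temperature concentration (not the paper's setting: the one-dimensional Hamiltonian, its parameters, and the discretized support below are arbitrary choices made only for the sketch), the Boltzmann-Gibbs average approaches the minimizer as the inverse temperature grows:

import numpy as np

# Toy illustration: for a one-dimensional "Hamiltonian"
# E(v) = (y - a*v)**2 / 2 + lam * |v|, the Boltzmann-Gibbs weights
# exp(-beta * E(v)) concentrate on the minimizer as beta grows,
# mimicking the zero-temperature limit that recovers the MAP-type estimate.
a, lam, y = 1.0, 0.5, 0.8
grid = np.linspace(-3, 3, 2001)                        # discretized support
energy = 0.5 * (y - a * grid) ** 2 + lam * np.abs(grid)

for beta in (1.0, 10.0, 100.0, 1000.0):
    weights = np.exp(-beta * (energy - energy.min()))  # shift for numerical stability
    gibbs = weights / weights.sum()
    mean_v = float(gibbs @ grid)                       # Gibbs average of the microstate
    print(f"beta={beta:7.1f}  Gibbs mean={mean_v:+.4f}  "
          f"argmin={grid[np.argmin(energy)]:+.4f}")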

The use of the replica method for studying multiuser estimators goes back to [11] where Tanaka determined the asymptotic spectral efficiency of estimators by employing the replica method. The study demonstrated interesting large-system properties of multiuser estimators, and consequently, the statistical mechanical approach received more attention in the context of multiuser systems. This approach was then employed in the literature to study multiple estimation problems in large vector-valued linear systems, e.g. [12]. The method was also utilized to analyze the asymptotic properties of systems in [15] considering an approach similar to [11]. Regarding multiuser estimators, the earlier studies mainly considered the cases in which the entries of the source vector are binary or Gaussian random variables. The results were later extended to a general source distribution in [14]. The statistical mechanical approach was further employed to address mathematically similar problems in vector precoding, compressive sensing and the analysis of superposition codes [16], to name just a few examples. Despite the fact that the replica method lacks mathematical rigor, a body of work, such as [19], has shown the validity of several replica-based results in the literature, e.g., Tanaka’s formula in [11], using alternative rigorous approaches. We discuss these rigorous results in more detail later by invoking the literature of compressive sensing.

1.2 Decoupling Principle

Considering the estimator defined in , the entries of the estimated vector are correlated in general, since the system matrix couples the entries of linearly, and the estimator performs several nonlinear operations on . In the large-system performance analysis, the marginal joint distribution of two corresponding input-output entries and , , is of interest. To clarify our point, consider the case in which a linear estimator is employed instead of , i.e., . Denote the matrices and as and , respectively, with and being vectors for . Therefore, is written as

Here, the right hand side of can be interpreted as the linear estimation of a single-user system indexed by in which the symbol is corrupted by an additive impairment given by the last two summands in the right hand side of . The impairment term is not necessarily independent and Gaussian. For some classes of matrix ensembles, and under a set of assumptions, it is shown that the dependency of the derived single-user systems on the index vanishes, and the distribution of the impairment terms converges to a Gaussian distribution in the large-system limit [26]. As a result, one can consider the vector-valued system described by followed by the linear estimator to be a set of additive scalar systems with Gaussian noise employed in parallel. In other words, the vector system can be considered to “decouple” into a set of similar scalar systems. Each of them relates an input entry to its corresponding estimated one . This asymptotic property of the estimator is referred to as the “decoupling property” and can be investigated through the large-system performance analysis.
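A small Monte Carlo check of this picture (illustrative only: the i.i.d. Gaussian matrix, the matched-filter estimator, the BPSK-like source and the dimensions below are assumptions of the sketch, not the paper's general setting) is:

import numpy as np

# For the matched-filter output z = A^T y, the k-th entry splits into a scaled
# source term plus an impairment (interference from other entries plus filtered
# noise).  Empirically the impairment looks increasingly Gaussian as dimensions grow;
# an excess kurtosis close to zero is consistent with that.
rng = np.random.default_rng(1)
n, m, sigma2, trials = 400, 200, 0.1, 1000
k, samples = 0, []
for _ in range(trials):
    x = rng.choice([-1.0, 1.0], size=n)                # BPSK-like source
    A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    w = rng.normal(scale=np.sqrt(sigma2), size=m)
    y = A @ x + w
    z_k = A[:, k] @ y                                  # matched-filter output for entry k
    gain = A[:, k] @ A[:, k]                           # per-entry gain, close to 1
    samples.append(z_k - gain * x[k])                  # impairment term
samples = np.array(samples)
print("impairment mean/var:", samples.mean(), samples.var())
print("excess kurtosis (about 0 if Gaussian):",
      ((samples - samples.mean()) ** 4).mean() / samples.var() ** 2 - 3)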

The decoupling property was first studied for linear estimators. Tse and Hanly noticed this property while they were determining the multiuser efficiency of several linear multiuser estimators in the large-system limit [27]. They showed that for an system matrix, the effect of impairment is similar to the effect of some modified Gaussian noise when the dimension tends to infinity. This asymptotic property was then investigated further by studying the asymptotics of different linear receivers and their large-system distributions [28]. In an independent work, Verdú and Shamai also studied the linear estimator and showed that the conditional output distribution is asymptotically Gaussian [30]. In [31], the authors studied the asymptotics of the impairment term when a family of linear estimators is employed and proved that it converges in distribution to a Gaussian random variable. The latter result was further extended to a larger class of linear estimators in [26].

Regarding linear estimators, the main analytical tool is random matrix theory [32]. In fact, invoking properties of large random matrices and the central limit theorem, the decoupling property is rigorously proved, e.g. [34]. These tools, however, fail for nonlinear estimators as the source symbol and impairment term do not decouple linearly due to nonlinear operations at the estimators. In [36], Müller and Gerstacker employed the replica method and studied the capacity loss due to the separation of detection and decoding. The authors showed that the additive decoupling of the spectral efficiency, reported in [35] for Gaussian inputs, also holds for binary inputs. As a result, it was conjectured that regardless of input distribution and linearity, the spectral efficiency always decouples in an additive form [37]. In [14], Guo and Verdú justified this conjecture for a family of nonlinear estimators, and showed that for an system matrix, the estimator decouples into a bank of single-user estimators under the ansatz. In [38], Rangan et al. studied the asymptotic performance of a class of estimators. Using standard large deviation arguments, the authors represented the estimator as the limit of an indexed sequence of estimators. Consequently, they determined the estimator’s asymptotics employing the results from [14] and justified the decoupling property of estimators under the ansatz for an .

Regarding the decoupling property of estimators, there are still two main issues which need further investigation:

The first issue was partially addressed in [39] where, under the assumption, the authors studied the asymptotics of a estimator employed to recover the support of a source vector from observations received through noisy sparse random measurements. They considered a model in which a sparse Gaussian source vector is first randomly measured by a square matrix , and then, the measurements are sparsely sampled by a diagonal matrix whose non-zero entries are Bernoulli random variables. For this setup, the input-output information rate and support recovery error rate were investigated by considering the measuring matrix to belong to a larger set of matrix ensembles. These results, moreover, could address the decoupling property of the considered setting. Although the class of system matrices is broadened in [39], it cannot be considered as a complete generalization of the property presented in [14] and [38], since it is restricted to cases with a sparse Gaussian source and loading factors less than one, i.e., in . Vehkaperä et al. also investigated the first issue for a similar formulation in compressive sensing [40]. In fact, the authors considered a linear sensing model as in for the class of rotationally invariant random matrices and under the ansatz determined the asymptotic for the least-square recovery schemes which can be equivalently represented by the formulation in . The large-system results in [40], however, did not address the asymptotic marginal joint input-output distribution, and the emphasis was on the . Regarding the second issue, the estimator has not yet been investigated under ansätze in the literature. Nevertheless, the necessity of such investigations was mentioned for various similar settings in the literature; see for example [41]. In [41], the performances of detectors were investigated by studying both the and one-step ansätze, and the impact of symmetry breaking on the results for low noise scenarios was discussed. The authors in [43] further studied the performance of vector precoding under both and and showed that the analysis under yields a significantly loose bound on the true performance. The replica ansatz with one step of , however, was shown to lead to a tighter bound consistent with rigorous performance bounds available in the literature. A similar observation was recently reported for the problem of least-square-error precoding in [44]. The replica analyses of compressive sensing in [42], moreover, discussed the necessity of investigating the performance of minimization recovery schemes under ansätze for some choices of .

1.3 Compressive Sensing

The estimation of a source vector from a set of noisy linear observations arises in several applications, such as and sampling systems. To address one, we consider large compressive sensing systems and employ our asymptotic results to analyze the large-system performance [47]. In the context of compressive sensing, represents a noisy sampling system in which the source vector is sampled linearly via and corrupted by the additive noise . In the “noise-free” case, i.e. , the source vector is exactly recovered from the observation vector , if the number of observations is as large as the source length and the sampling matrix is full rank. As the number of observations reduces, the number of possible solutions to the exact reconstruction problem may increase depending on the source support , and therefore, the source vector recovered from the observation vector is not necessarily unique. In this case, one needs to enforce some extra constraints on the properties of the source vector in order to recover it uniquely among all possible solutions. In compressive sensing, the source vector is supposed to be sparse, i.e., a certain fraction of its entries are zero. This property of the source imposes an extra constraint on the solution which allows for exact recovery in cases with . In fact, in this case, one should find a solution to over

where denotes the “ norm” and is defined as , and is the source’s sparsity factor defined as the fraction of non-zero entries. Depending on and , the latter problem can have a unique solution even for [50]. Searching for this solution optimally over , however, is an NP-hard problem and therefore intractable. The main goal in noise-free compressive sensing is to study feasible reconstruction schemes and derive tight bounds on the sufficient compression rate, i.e., , for exact source recovery via these schemes.

In noisy sampling systems, exact recovery is only possible for some particular choices of . Nevertheless, considering either cases in which exact recovery is not possible or choices of for which the source vector can be exactly recovered from noisy observations, the recovery approaches in these sensing systems need to take the impact of noise into account. The classical strategy in this case is to find a vector in such that the recovery distortion is small. Consequently, a recovery scheme for noisy sensing system based on the norm is given by

which is the estimator defined in with . It is straightforward to show that for , i.e., zero noise variance, reduces to the optimal noise-free recovery scheme as . Similar to the noise-free case, the scheme in results in a non-convex optimization problem, and therefore, is computationally infeasible. Alternatively, a computationally feasible scheme is obtained by replacing the $\ell_0$ norm in the cost function with the $\ell_1$ norm. The resulting recovery scheme is known as LASSO [53] or basis pursuit denoising [54]. Based on these formulations, several iterative and greedy algorithms have been introduced for recovery, taking into account the sparsity pattern and the properties of the sampling matrix [55]. The main body of work in noisy compressive sensing then investigates the trade-off between the compression rate and the recovery distortion.
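For concreteness, a minimal sketch of such a computationally feasible scheme, an iterative soft-thresholding (ISTA) solver for the $\ell_1$-regularized problem, is given below; the dimensions, noise level and regularization weight are illustrative choices, not the paper's parameters:

import numpy as np

# Minimal ISTA sketch for the LASSO-type problem
#   min_x 0.5*||y - A x||_2^2 + lam*||x||_1
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, iters=500):
    step = 1.0 / np.linalg.norm(A, 2) ** 2             # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x

rng = np.random.default_rng(2)
n, m, k = 256, 128, 16                                 # source length, observations, sparsity
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
y = A @ x0 + 0.05 * rng.normal(size=m)
x_hat = ista(A, y, lam=0.02)
print("per-entry MSE:", np.mean((x_hat - x0) ** 2))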

For large compressive sensing systems, it is common to consider a random sensing matrix, since for these matrices, properties such as the restricted isometry property are shown to hold with high probability [58]. In this case, the performance of a reconstruction scheme is analyzed by determining the considered performance metric, e.g., and probability of exact recovery in the noisy and noise-free case, respectively, for a given realization of the sensing matrix. The average performance is then calculated by taking the expectation over the matrix distribution. Comparing with , one can utilize the formulation, illustrated at the beginning of this section, to study the large-system performance of several reconstruction schemes. This similarity was considered in a series of papers, e.g., [17], and therefore, earlier replica results were employed to study compressive sensing systems. The extension of analyses from the context of multiuser estimation had the disadvantage that the assumed sampling settings were limited to those setups which are consistent with the estimation problems in the literature. Compressive sensing systems, however, might require a wider set of assumptions, and thus, a large class of settings could not be addressed by earlier investigations. As a result, a body of work deviated from this approach and applied the replica method directly to the compressive sensing problem; see for example [39].

Although in general the replica method is considered to be mathematically non-rigorous, several recent studies have justified the validity of the replica results in the context of compressive sensing by using alternative tools for analysis. A widely investigated approach is based on the asymptotic analysis of algorithms. In the context of compressive sensing, the algorithms were initially introduced to iteratively address the convex reconstruction schemes based on norm minimization, such as LASSO and basis pursuit, with low computational complexity [62]. The proposed approach was later employed to extend the algorithm to a large variety of estimation problems including and estimation; see for example [64]. The primary numerical investigations of demonstrated that for large sensing systems the sparsity-compression rate tradeoff of these iterative algorithms, as well as the compression rate-distortion tradeoff in noisy cases, is described by the fixed points of the “state evolution” and recovers the asymptotics of convex reconstruction schemes [63]. This observation was then rigorously justified for sub-Gaussian sensing matrices in [66] by using the conditioning technique developed in [67]. The study was recently extended to cases with rotationally invariant system matrices in [68]. The investigations in [70] moreover showed that using algorithms for spatially coupled measurements, the fundamental limits on the required compression rate [72] can be achieved in the asymptotic regime. The methodology proposed by algorithms and their state evolution also provided a justification for the validity of several earlier studies based on the replica method. In fact, the results given by the replica method were recovered through the state evolution of the corresponding algorithms. Invoking this approach along with other analytical tools, the recent study in [23] further confirmed the validity of the replica prediction for the asymptotic and mutual information of the linear system in with Gaussian measurements. Similar results were demonstrated in [21] using a different approach.
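A hedged sketch of such a state evolution recursion is given below, written for a soft-thresholding denoiser, a Bernoulli-Gaussian source and an i.i.d. Gaussian sensing matrix; all of these are assumptions of the sketch (the rotationally invariant case uses a modified recursion), and the parameter values are illustrative only:

import numpy as np

# State evolution for an AMP-style algorithm with soft-thresholding denoiser:
#   tau2_{t+1} = sigma2 + (1/delta) * E[ (eta(X + tau_t*Z; alpha*tau_t) - X)^2 ]
# delta = m/n is the compression rate, rho the sparsity, sigma2 the noise level.
rng = np.random.default_rng(3)
delta, rho, sigma2, alpha = 0.5, 0.1, 0.01, 1.5
mc = 200_000                                           # Monte Carlo samples for the expectation

def sample_source(size):
    # Bernoulli-Gaussian source: nonzero with probability rho
    return rng.normal(size=size) * (rng.random(size) < rho)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

tau2 = sigma2 + sample_source(mc).var() / delta        # initialization
for t in range(30):
    x = sample_source(mc)
    z = rng.normal(size=mc)
    mse = np.mean((soft_threshold(x + np.sqrt(tau2) * z, alpha * np.sqrt(tau2)) - x) ** 2)
    tau2 = sigma2 + mse / delta                        # state evolution update
print("fixed-point effective noise tau^2:", tau2, " per-entry MSE:", mse)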

1.4 Contribution and Outline

In this paper, we determine the asymptotic distortion for a general distortion function for cases where the estimator is employed to estimate the source vector from the observation given in . We represent the asymptotic distortion in as a macroscopic parameter of a corresponding spin glass and study the spin glass via the replica method. The general replica ansatz is then given for an arbitrary replica correlation matrix, and its special cases are studied considering the and assumptions. The asymptotic distortion is determined for rotationally invariant random system matrices invoking results for asymptotics of spherical integrals [74]. Using our asymptotic results, we derive a more general form of the decoupling principle by restricting the distortion function to be of a special form and employing the moment method [76]. We show that the vector-valued system in estimated by decouples into a bank of similar noisy single-user systems followed by single-user estimators. This result holds for any replica correlation matrix; however, the structure of the decoupled single-user system depends on the supposed structure of the correlation matrix. Under the assumption with steps of breaking (), the noisy single-user system is given in the form of an input term plus an impairment term. The impairment term, moreover, is expressed as the sum of an independent Gaussian random variable and correlated interference terms. By reducing the assumption to , the result reduces to the formerly studied decoupling principle of the estimators [38] for rotationally invariant system matrix ensembles. In fact, our investigations bring the previous results regarding the decoupling principle, together with a new set of setups, under a single umbrella given by a more general form of the decoupling principle. More precisely, we extend the scope of the decoupling principle to

  • the systems whose measuring matrix belongs to the class of rotationally invariant random matrices, and

  • the replica ansatz with general replica correlations which include the and ansätze.

To address a particular application, we study the large-system performance of a compressive sensing system under the minimization recovery schemes. We address the linear reconstruction, as well as the LASSO and the norm scheme considering both the sparse Gaussian and finite alphabet sources. Our general setting allows us to investigate the asymptotic performance with respect to different metrics and for multiple sensing matrices such as random and projector. The numerical investigations show that the ansatz becomes unstable for some regimes of system parameters and predicts the performance of minimization recovery loosely within a large range of compression rates. This observation agrees with the earlier discussions on the necessity of investigations reported in [42]. We therefore study the performance under and discuss the impact of the symmetry breaking. Throughout the numerical investigations, it is demonstrated that the performance enhancement obtained via random orthogonal measurements, reported in [40], also holds for sparse finite alphabet sources in which sensing via random projector matrices results in phase transitions at higher rates.

The rest of the manuscript is organized as follows. In Section 2, the problem is formulated. We illustrate our statistical mechanical approach in Section 3 and briefly explain the replica method. The general replica ansatz, as well as the general decoupling principle, is given in Section 4. The ansatz under the and assumptions is expressed in Sections 5 and 6. Based on the decoupled system, we propose the idea of a replica simulator in Section 7 and describe the given ansätze in terms of the corresponding decoupled systems. To address an application of our study, we consider large compressive sensing systems in Section 8 and discuss several examples. The numerical investigations of the examples are then given in Section 9. Finally, we conclude the manuscript in Section 10.

1.5 Notations

Throughout the manuscript, we represent scalars, vectors and matrices with lower case, bold lower case, and bold upper case letters, respectively. The set of real numbers is denoted by , and and indicate the transpose and the Hermitian of . is the identity matrix and is an matrix with all entries equal to . For a random variable , represents either the or , and represents the . and represent the set of integer and real numbers and the superscript , e.g. , indicates the corresponding subset of all non-negative numbers. For the sake of compactness, the set of integers is abbreviated as , the zero-mean and unit-variance Gaussian is denoted by , and the Gaussian averaging is expressed as

Moreover, in many cases, we drop the set on which a sum, minimization or an integral is taken. Whenever needed, we consider the entries of to be discrete random variables, namely the support to be discrete. The results of this paper, however, hold in full generality and extend to continuous distributions as well.

1.6 Announcements

Some of the results of this manuscript were presented at the IEEE Information Theory Workshop [78] and the Information Theory and Applications Workshop [79]. Even though the results have a mathematical flavor, the stress is not on investigating the rigor of the available tools such as the replica method, but rather on employing them to derive formulas which can be used in different problems.

2 Problem Formulation

Consider the vector-valued linear system described by . Let the system satisfy the following properties.

  1. is an random vector with each entry being distributed with over .

  2. is randomly generated over with from rotationally invariant random ensembles. The random matrix is said to be rotationally invariant when its Gramian, i.e., , has the eigendecomposition

    with being an orthogonal Haar distributed matrix and being a diagonal matrix. For a given , we denote the empirical distribution of ’s eigenvalues (cumulative density of states) with and define it as

    where for denotes the th diagonal entry of . We assume that converges, as , to a deterministic limit; a sampling sketch for such ensembles is given after this list.

  3. is a real zero-mean Gaussian random vector in which the variance of each entry is .

  4. The number of observations is a deterministic function of the transmission length , such that

    For the sake of compactness, we drop the explicit dependence of on .

  5. , and are independent.
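As referenced in property 2 above, the following is one way to sample from a rotationally invariant ensemble; the square dimensions and the uniform eigenvalue profile are illustrative assumptions made only for this sketch:

import numpy as np

# The Gramian A^T A = U diag(d) U^T with U Haar distributed, so one way to draw
# such an A is A = V diag(sqrt(d)) U^T with independent Haar factors U, V and an
# eigenvalue profile d of one's choice (here uniform on [0, 2], purely illustrative).
def haar_orthogonal(n, rng):
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))                     # sign fix yields the Haar measure

rng = np.random.default_rng(4)
n = 300
d = rng.uniform(0.0, 2.0, size=n)                      # chosen eigenvalue profile of A^T A
U, V = haar_orthogonal(n, rng), haar_orthogonal(n, rng)
A = V @ np.diag(np.sqrt(d)) @ U.T

eigs = np.linalg.eigvalsh(A.T @ A)                     # should reproduce d (up to ordering)
print("max eigenvalue mismatch:", np.max(np.abs(np.sort(eigs) - np.sort(d))))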

The source vector is reconstructed from the observation vector with a system matrix that is known at the estimator. Thus, for a given , the source vector is recovered by where is given in . Here, the non-negative scalar is the estimation parameter and the non-negative cost function is referred to as the “utility function”. The utility function is supposed to decouple, which means that it is defined for arguments of any length, i.e., for any positive integer , and

In order to use the estimator in , one needs to guarantee the uniqueness of the estimation output. Therefore, we impose the following constraint on our problem.

  1. For a given observation vector , the objective function in has a unique minimizer over the support .

2.1 Estimator

The estimator in can be considered as the optimal estimator in the sense that it minimizes the reconstruction’s error probability postulating a source prior distribution proportional to and a noise variance . To clarify the argument, assume is a finite set. In this case, we can define the reconstruction’s error probability as

for some estimator . In order to minimize , is chosen such that the posterior distribution over the input support is maximized, i.e.,

where comes from the independence of and . Here, , and is the prior distribution of the source vector. Now, let the estimator postulate the noise variance to be and the prior to be

for some non-negative function . Substituting into , the estimator reduces to defined in . The estimator in models several particular reconstruction schemes in compressive sensing. We address some of these schemes later on in Section 8.
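To make this reduction explicit, a hedged sketch of the argument is given below; the symbols $u$, $\lambda$, $A$ and the support $\mathbb{X}$ are illustrative stand-ins for the paper's utility function, estimation parameter, system matrix and source support, and the postulated prior is taken proportional to $e^{-u(v)}$:

\begin{align}
  \hat{x}(y)
  &= \operatorname*{arg\,max}_{v \in \mathbb{X}^{n}} \; q(y \mid v)\, q(v)
   = \operatorname*{arg\,max}_{v \in \mathbb{X}^{n}}
     \exp\!\Bigl[-\tfrac{1}{2\lambda}\lVert y - A v\rVert^{2} - u(v)\Bigr] \notag\\
  &= \operatorname*{arg\,min}_{v \in \mathbb{X}^{n}}
     \Bigl[\tfrac{1}{2}\lVert y - A v\rVert^{2} + \lambda\, u(v)\Bigr].
\end{align}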

2.2 Asymptotic Distortion and Conditional Distribution

In many applications, the distortion is given in terms of the average , while in some others the average symbol error rate is considered. In fact, the former takes the norm as the distortion function, and the latter considers the norm. The distortion function, however, can be of some other form in general. Here, we study the asymptotic performance by considering a general distortion function which determines the imperfection level of the estimation. Thus, we consider a distortion function which

The term “average distortion” usually refers to the case when the averaging weights are uniform. It means that each tuple of source-estimated entries is weighted equally when the distortion is averaged over all the entries’ tuples. It is, however, possible to average the distortion with a non-uniform set of weights. In the following, we define the average distortion for a class of binary weights which includes the case of uniform averaging as well.

Definition ? is moreover utilized to investigate the asymptotic conditional distribution of the estimator which plays a key role in studying the decoupling principle. For further convenience, we define the asymptotic conditional distribution of the estimator as follows.

We study the asymptotic distortion over the limit of a desired index set and distortion function by defining it as a macroscopic parameter of the corresponding spin glass and employing the replica method to evaluate it. Using the result for the asymptotic distortion, we determine then the asymptotic conditional distribution and investigate the decoupling property of the estimator.

3 Statistical Mechanical Approach

The Hamiltonian in introduces a spin glass which corresponds to the estimator. The spin glass at zero temperature describes the asymptotics of the estimator. For further convenience, we formally define the “corresponding spin glass” as follows.

For the corresponding spin glass, at the inverse temperature , the following properties are directly concluded.

  • The conditional distribution of the microstate reads

    with being the partition function

  • The normalized free energy is given by

    where the expectation is taken over the quenched randomizers.

  • The entropy of the spin glass is determined as

Regarding the estimator, one can represent the asymptotic distortion as a macroscopic parameter of the corresponding spin glass. More precisely, using Definition ?, the asymptotic distortion reads

where indicates the expectation over with respect to the conditional Boltzmann-Gibbs distribution defined in . In fact, by introducing the macroscopic parameter at the temperature as

the asymptotic distortion can be interpreted as the macroscopic parameter at zero temperature. Here, we take a well-known strategy in statistical mechanics which modifies the partition function to

In this case, the expectation in is taken as

The macroscopic parameter defined in is random, i.e., it depends on the quenched randomizers. As discussed in Section 1.1, under the self averaging property, the macroscopic parameter is supposed to converge in the large-system limit to its expected value over the quenched random variables. For the corresponding spin glass defined in Definition ?, the self averaging property has not been rigorously justified, and the proof requires further mathematical investigations as in [4]. However, as it is widely accepted in the literature, we assume that the property holds at least for the setting specified here. Therefore, we state the following assumption.

Using the self averaging property of the system, the asymptotic distortion is written as

Evaluation of , as well as the normalized free energy defined in , confronts the nontrivial problem of determining the logarithmic expectation. The task can be bypassed by using the Riesz equality [80] which for a given random variable states that
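In the form commonly invoked in replica analyses, stated here for a non-negative random variable $Z$ (the notation is ours), the identity reads

\begin{equation}
  \mathbb{E}\log Z \;=\; \lim_{m \downarrow 0} \frac{\log \mathbb{E}\, Z^{m}}{m}.
\end{equation}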

Using the Riesz equality, the asymptotic distortion can be finally written as

expresses the asymptotic distortion in terms of the moments of the modified partition function; however, it does not yet simplify the problem. In fact, one faces two main difficulties when calculating the right hand side of :

Here is where the replica method plays its role. The replica method suggests determining the moment for an arbitrary non-negative integer as an analytic function in , and then assuming the following two statements:

  1. The moment function analytically continues from the set of integer numbers onto the real axis (at least for some in a right neighborhood of ) which means that an analytic expression found for integer moments directly extends to all (or some) real moments. Under this assumption, the expression determined for integer moments can be replaced in , and the limit with respect to can be taken when . This assumption is the main part where the replica method lacks rigor and is known as the “replica continuity”.

  2. In , the limits with respect to and exchange. We refer to this assumption as the “limit exchange”.

In order to employ the replica method, we need to suppose the validity of the above two statements; therefore, we state the following assumption.

By means of Assumption ?, the calculation of asymptotic distortion reduces to the evaluation of integer moments of the modified partition function which is written as

Here, we refer to for as the replicas, and define as the set of replicas. After taking the expectation with respect to and , it is further observed that, in the large limit, the expectation with respect to can be dropped due to the law of large numbers. By inserting the final expression for in and taking the limits, the asymptotic distortion is determined as in Proposition ? given below.

4 Main Results

Proposition ? states the general replica ansatz. The term “general” is emphasized here, in order to indicate that no further assumption needs to be considered for derivation. Using Proposition ? along with results in the classical moment problem [77], a general form of the decoupling principle is justified for the estimator.

Before stating the general replica ansatz, let us define the -transform of a probability distribution. Considering a random variable , the corresponding Stieltjes transform over the upper complex half plane is defined as

Denoting the inverse with respect to composition by , the -transform is given by

such that .
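In the standard free-probability convention, writing $\mathrm{G}_{X}$ for the Stieltjes transform and $\mathrm{R}_{X}$ for the $\mathrm{R}$-transform of the distribution of $X$ (the symbols are ours), these definitions are commonly stated as

\begin{align}
  \mathrm{G}_{X}(s) &= \mathbb{E}\Bigl[\frac{1}{X - s}\Bigr], \qquad s \in \mathbb{C}^{+}, \\
  \mathrm{R}_{X}(\omega) &= \mathrm{G}_{X}^{-1}(-\omega) - \frac{1}{\omega},
  \qquad\text{with}\qquad \lim_{\omega \to 0} \mathrm{R}_{X}(\omega) = \mathbb{E}[X].
\end{align}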

The definition can also be extended to matrix arguments. Assume the matrix to have the decomposition where is a diagonal matrix whose nonzero entries represent the eigenvalues of , i.e., , and is the matrix of eigenvectors. is then an matrix defined as

4.1 General Replica Ansatz

Proposition ? expresses the macroscopic parameters of the corresponding spin glass, including the asymptotic distortion, in terms of the parameters of a new spin glass of finite dimension. It is important to note that the new spin glass, referred to as “spin glass of replicas”, is different from the corresponding spin glass defined in Definition ?. In fact, the spin glass of replicas is the projection of the corresponding spin glass on the reduced support with indicating the number of replicas. The macroscopic parameters of the spin glass of replicas can therefore readily be determined.

Considering Definition ?, the evaluation of the system parameters such as the replicas’ average distortion or the normalized free energy needs the replica correlation matrix to be explicitly calculated first. In fact, describes a fixed point equation in terms of when one writes out the expectation using the conditional distribution in . The solution can then be substituted in the distribution and the parameters of the system can be calculated via -. The fixed point equation, however, may have several solutions and thus result in multiple outputs for the system. Nevertheless, we express the asymptotic distortion of the estimator in terms of a single output of the spin glass of replicas for which the limits exist and the free energy is minimized.

Proof. The proof is given in Appendix Section 12. However, we explain briefly the strategy in the following.
Starting from , the expectation with respect to the noise term is straightforwardly taken. Using the results from [75], the expectation with respect to the system matrix is further taken as discussed in Appendix Section 15. Then, by considering the following variable exchange,

is determined in terms of the replica correlation matrix . Finally, by employing the law of large numbers, the th moment of the partition function is given as

where with the integral being taken over , tends to zero as and is given by . Moreover, denotes the non-normalized probability weight of the vectors of replicas with the same correlation matrix and is explicitly determined in in Appendix Section 12, and reads

for some diagonal matrix with denoting the trace operator, and being the -transform with respect to . In , the term is a probability measure which satisfies the large deviations property. Using results from large deviations [81], the integral in for large values of can be written as the integrand at the saddle point multiplied by some bounded coefficient which results in

with denoting the asymptotic equivalence in exponential scale. Consequently, by substituting in , and exchanging the limits with respect to and , as suggested in Assumption ?, the asymptotic distortion is found as in Proposition ? where determines the saddle point of the integrand function in . The free energy is further determined as in by substituting in . Finally, by noting that the free energy is minimized at equilibrium, the proof is concluded.

Proposition ? introduces a feasible way to determine the asymptotics of the estimator; its validity depends only on Assumptions ? and ? and has no further restriction. To pursue the analysis, one needs to solve the fixed point equation for the replica correlation matrix and calculate the parameters of the spin glass of replicas explicitly. The direct approach to find , however, raises both complexity and analyticity issues. In fact, finding the saddle point by searching over the set of all possible choices of is a hard task; moreover, several solutions may not be of use since they do not lead to analytic and in , and thus, they cannot be continued analytically to the real axis via Assumption ?.

To overcome both issues, the approach is to restrict the search to a parameterized set of replica correlation matrices and find the solution within this set. Clearly, the asymptotics found via this approach may fail as several other solutions are missed by the restriction. The result, in this case, becomes more trustworthy by extending the restricted set of replica correlation matrices. Several procedures of restriction can be considered. The procedures introduced in the literature are roughly divided into and schemes. The former considers the replicas to interact symmetrically while the latter recursively breaks this symmetry in a systematic manner. In fact, the scheme was first introduced due to some symmetric properties observed in the analysis of spin glasses [82]. The properties, however, do not force the correlation matrix to have a symmetric structure, and later several examples were found showing that leads to wrong conclusions. For these examples, the scheme was further considered as an extension of the symmetric structure of the correlation matrix to a larger set. We consider both the and schemes in this manuscript; however, before pursuing our study, let us first investigate the general decoupling property of the estimator which can be concluded from Proposition ?.

4.2 General Decoupling Property of Estimator

Regardless of any restriction on , the general ansatz leads to the decoupling property of the estimator. In fact by using Proposition ?, it can be shown that for almost any tuple of input-output entries, the marginal joint distribution converges to a deterministic joint distribution which does not depend on the entries’ index. The explicit term for the joint distribution, however, depends on the assumptions imposed on the correlation matrix.

Proof. The proof follows from Proposition ? and the moment method [76]. From the classical moment problem, we know that the joint distribution of a tuple of random variables is uniquely specified by the sequence of integer joint moments, if the joint moments are uniformly bounded. More precisely, by defining the -joint moment of the tuple as

the sequence of for is uniquely mapped to the probability distribution , if is uniformly bounded for all integers and . Consequently, one can infer that any two tuples of the random variables and with the same sequences of the joint moments are identical in distribution.
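For concreteness, the joint moments in question are of the standard form (written here for a source entry $x$ and its estimate $\hat{x}$; the symbol $M_{j,k}$ is ours)

\begin{equation}
  M_{j,k} \;=\; \mathbb{E}\bigl[x^{j}\,\hat{x}^{k}\bigr], \qquad j,k \in \{0,1,2,\dots\},
\end{equation}

and uniform boundedness of this family guarantees, by the classical moment problem, that it determines the joint distribution uniquely.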

To determine the joint moment of input and output entries, consider the distortion function

in Proposition ?, and evaluate the asymptotic distortion over the limit of for some in a right neighborhood of zero. The -joint moment of is then determined by taking the limit .

Substituting the distortion function and the index set in Proposition ?, it is clear that the asymptotic distortion is independent of and , and therefore, the limit with respect to exists and is independent of as well. Noting that the evaluated moments are uniformly bounded, it is inferred that the asymptotic joint distribution of is uniquely specified and does not depend on the index . Finally, by using the fact that the source vector is and the distribution of the entry is independent of the index, we conclude that the asymptotic conditional distribution is independent of . The exact expression for is then found by determining the solution to the fixed point equation and determining the joint moments.

Proposition ? is a generalized form of the decoupling principle for the estimators studied in [38]. In fact, Proposition ? indicates that a vector system followed by a estimator always decouples into a bank of identical single-user systems regardless of any restriction on the replica correlation matrix.

4.3 Consistency Test

If one restricts the replica correlation matrix to be of a special form, the asymptotics determined under the assumed structure do not necessarily approximate the true asymptotics accurately. Several methods were introduced in the literature to check the consistency of the solution. A primary method is based on calculating the entropy of the corresponding spin glass at zero temperature. As the temperature tends to zero, the distribution of the microstate tends to an indicator function at the point of the estimated vector, and consequently, the entropy of the corresponding spin glass converges to zero. One consistency check is therefore the zero temperature entropy of a given solution.

Several works invoked this consistency test and showed that for the settings in which the ansatz fails to give a tight bound on the exact solution, the zero temperature entropy determined from the ansatz does not converge to zero; see for example [43]. This observation illustrates the invalidity of the assumption and hints at ansätze giving better bounds on the true solution. Inspired by the aforementioned results, we evaluate the zero temperature entropy of the corresponding spin glass as a measure of consistency.

In order to determine the zero temperature entropy, we invoke which determines the entropy in terms of the free energy at inverse temperature . Considering the free energy of the corresponding spin glass as given in Proposition ?, the entropy reads

where denotes the normalized entropy of the spin glass of replicas. As determines the entropy of a thermodynamic system, for any we have

and therefore, the zero temperature entropy is given by

which obviously depends on the structure of the replica correlation matrix. In [43], the authors determined the zero temperature entropy for the spin glass which corresponds to the vector precoding problem considering the RS and one-step RSB assumptions, and observed that it takes the same form under both assumptions. They then conjectured that the zero temperature entropy is of a similar form for the general structure regardless of the number of breaking steps. Using , we later show that the conjecture in [43] is true.

5 RS Ansatz and RS Decoupling Principle

The most elementary structure which can be imposed on the replica correlation matrix is . Here, one assumes the correlation matrix to be of a symmetric form which means that the replicas of the spin glass defined in Definition ? are invariant under any permutation of indices. Using the definition of the replica correlation matrix as given in , it consequently reads that

Considering , Assumption ? supposes and . Substituting in Definition ?, the spin glass of replicas is then specified by the scalars and . The scalars are moreover related via a set of saddle point equations which are obtained from . Finally, using Proposition ?, the asymptotics of the system are found.

Proof. See Appendix Section 13.

The asymptotic distortion under the ansatz is equivalent to the average distortion of a scalar channel followed by a single-user estimator as shown in Figure 1. In this block diagram, the single-user estimator maximizes the posterior probability over a postulated scalar channel. We refer to this estimator as the “decoupled estimator” and define it as follows.

Figure 1: The decoupled single-user system under the RS ansatz.

Using the definition of the decoupled estimator, the decoupled system is defined next.

Using Proposition ?, the equivalence in the asymptotic distortion can be extended to the asymptotic conditional distribution as well. In fact, by considering the decoupling principle, Definition ? describes the structure of the decoupled single-user system under the assumption.

Proof. Using Proposition ?, for any two different indices we have

at the mass point . Therefore for any index , we have

Consequently, the asymptotic -joint moment of under the assumption is determined by letting and the distortion function in the ansatz

and determining the asymptotic distortion. Substituting in Proposition ?, the asymptotic joint moment reads

where is defined in . Considering Definition ? and assuming , describes the -joint moment of as well. Noting that is uniformly bounded for any pair of integers and , it is concluded that the asymptotic joint distribution of and the joint distribution of are equivalent.

Proposition ? gives a more general form of the decoupling principles investigated in [38] and [39]. In fact, by restricting the system matrix and source distribution as in [38] and [39], one can recover the formerly studied decoupling principles.

RS Zero Temperature

To have a basic measure of the ansatz’s consistency, we evaluate the zero temperature entropy under the assumption following the discussion in Section 4.3. Substituting in and taking the same steps as in Appendix Section 13, the zero temperature entropy is determined as

where the function is defined as

Taking the derivative first and then the limit, it finally reads

We later on see that the zero temperature entropy takes the same form under the assumptions.

6 RSB Ansätze and RSB Decoupling Principle

In [83], Parisi introduced a breaking scheme to broaden the restricted set of correlation matrices. The scheme recursively extends the set of matrices to larger sets. The breaking scheme was then employed to broaden the structure of the correlation matrices, and therefore, the obtained structure was identified as the broken, or RSB, structure. The key feature of Parisi’s breaking scheme is that, by starting from the structure, the new structure after breaking can be reduced to the structure before breaking. Thus, the set of fixed point solutions found by assuming the broken structure includes the solutions of the previous structure as well.

By choosing to be an correlation matrix in Definition ?, the matrix finds the structure with one step of breaking (1RSB). The steps of breaking can be further increased recursively by inserting in the next breaking scheme, determining the new correlation matrix , and repeating the procedure. We start with the 1RSB correlation matrix, and then extend the result to higher ansätze with more steps of breaking.

Regarding Parisi’s breaking scheme, Assumption ? considers by letting have the structure with parameters and , and . Here, the 1RSB structure reduces to by setting . Therefore, the set of 1RSB correlation matrices contains the one considered in Assumption ?.

Proof. See Appendix Section 14.

Similar to our approach under the ansatz, we employ Proposition ? to introduce the decoupled 1RSB single-user system which describes the statistical behavior of the estimator’s input-output entries under the 1RSB assumption. The decoupled system under 1RSB differs from its RS counterpart by a new impairment term which is correlated with the source and noise symbols through a joint distribution. The impairment term intuitively plays the role of a correction factor which compensates the possible inaccuracy of the ansatz. The decoupled estimator, however, follows the same structure as for .

Figure 2: The decoupled scalar system under the 1RSB ansatz.

Proof. The proof takes exactly the same steps as for the decoupling principle, using Proposition ?.

The 1RSB decoupled system, in general, provides a more accurate approximation of the estimator’s asymptotics by searching over a larger set of solutions which includes the ansatz. To investigate the latter statement, consider the case of . In this case, the 1RSB structure reduces to . Setting in Proposition ?, becomes zero, and consequently . Moreover, the fixed point equations in hold for any choice of , and the scalars and couple through the same set of equations as in the ansatz. The zero temperature free energy of the system, furthermore, reduces to its form under the assumption of . Denoting the parameters of the ansatz by , it is then concluded that is a solution to the 1RSB fixed point equations whenever a stable solution to the fixed point exists. This solution, however, does not necessarily give the 1RSB ansatz, since the stable solution to the 1RSB fixed point equations with minimum free energy may occur at some other point. We investigate the impact of replica symmetry breaking for some examples later through numerical results.

Parisi’s breaking scheme can be employed to extend the structure of the correlation matrix to a structure with more steps of breaking by recursively repeating the scheme. In fact, one can start from an RS-structured correlation matrix and employ the breaking scheme for steps to determine the correlation matrix . In this case, the replica correlation matrix is referred to as the correlation matrix.

Considering the correlation matrix in Proposition ? to be of the form indicated in Assumption ?, the previous ansätze are extended to a more general ansatz which can reduce to the 1RSB as well as the ansatz. Proposition ? expresses the replica ansatz under the assumption.

Proof. See Appendix Section 15.

One can simply observe that Proposition ? reduces to Propositions ? and ? by letting and for , respectively. The ansatz, moreover, extends the corresponding decoupled single-user system of the estimator considering the general decoupling principle investigated in Proposition ?. By taking the same steps as in Proposition ?, the decoupled single-user system is found which represents the extended version of the 1RSB system with additive impairment taps. In fact, considering the impairment terms to intuitively play the role of correction factors, the ansatz takes more steps of correction into account. The decoupled estimator, moreover, remains unchanged.

Figure 3: The decoupled scalar system under the RSB ansatz.

Proof. Using Proposition ?, the proof takes the same steps as for Proposition ?.

RSB Zero Temperature

In Appendix Section 15, it is shown that, under the assumption on the replica correlation matrix, the free energy of the corresponding spin glass at the inverse temperature reads

Here, denotes the normalized free energy of the spin glass of replicas defined in in the limit , and the function is defined as

Following the discussion in Section 4.3, the entropy at the zero temperature reads

which reduces to

justifies the conjecture in [43] and states that the zero temperature entropy under any number of breaking steps, including the case, is of a similar form and only depends on the scalar . In fact, the Hamiltonian in reduces to the one considered in vector precoding by considering to be the deterministic vector of zeros, , and . Substituting in , the zero temperature entropy reduces to the one determined in [43] within a factor of . The factor comes from the difference in the prior assumption on the support of the microstate.

7 Replica Simulator: Characterization via the Single-User Representation

The general decoupling principle determines an equivalent single-user system which describes the input-output statistics of the estimator under the ansatz. In order to specify the exact parameters of the decoupled single-user system, the set of fixed point equations needs to be solved explicitly. In this section, we propose an alternative approach which describes an ansatz in terms of the corresponding decoupled system’s input-output statistics. We define the exact form of the decoupled system as the “steady state” of a transition system named the “replica simulator”. The proposed approach enables us to investigate the properties of the RS and RSB ansätze by studying the replica simulator. In order to clarify the idea of the replica simulator, let us define a set of input-output statistics for the decoupled system.

Invoking Definition ?, the ansatz can be completely represented in terms of the input-output statistics of the decoupled system. In fact, by means of Definition ?, the fixed point equations in - can be expressed as

for ; moreover, the factor is given as

which reduces to

with indicating the decoupled posterior distribution defined in Definition ?. The second term on the right hand side of is an extended form of the likelihood ratio. By defining

reads

and for are determined by

where are recursively defined as

The fixed point in is therefore rewritten accordingly.

The above alternative representation of the ansatz leads us to a new interpretation. In fact, one can define a transition system in which the vector of replica parameters denotes the state, and the decoupled single-user system defines the transition rule [84]. We refer to this transition system as the “replica simulator”, and define it formally as follows.

Figure 4: Replica Simulator of breaking order b

The structure of the replica simulator is illustrated in Figure 4. For the replica simulator of breaking order , a sequence of states is considered to be a “process”, if for

The state is then called the “steady state”, if setting results in . Regarding Proposition ?, the ansatz is in fact the steady state of the replica simulator which minimizes the free energy function. Our conclusion also extends to the case, if we set .

Considering Definition ?, as well as the above discussions, the ansatz can be numerically investigated using the methods developed in the literature of transition systems. This approach may simplify the numerical analysis; it does not, however, reduce the computational complexity. In fact, assuming that one realizes the decoupled system for any desired state vector denoted in via some method of realization, e.g., Monte Carlo simulation, the ansatz can be found by means of an iterative algorithm designed to find the steady state of a transition system. The latter statement is clarified in Scheme ?.

In Scheme ?, step A can be realized via different methods. One may determine the input-output distribution of the single-user system analytically, or simulate the system by generating impairment and source samples numerically via the Monte Carlo technique. Another degree of freedom is in step B, where different mapping rules with different convergence speeds can be employed. For algorithms designed based on Scheme ?, the computational complexity depends on the realization method, while the convergence speed is mainly governed by the chosen mapping rule .
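
As a minimal sketch of such an iterative algorithm (not the paper’s prescribed procedure), the following Python snippet searches for the steady state of a generic transition system. The callables realize and update are hypothetical placeholders for step A (realizing the decoupled system, e.g., by Monte Carlo) and step B (the mapping rule), respectively.

```python
import numpy as np

def find_steady_state(realize, update, state0, tol=1e-6, max_iter=500):
    """Generic fixed-point search for the steady state of a transition system.

    realize(state) -> input-output statistics of the decoupled single-user
                      system at the current state (step A, e.g., Monte Carlo),
    update(stats)  -> next state vector (step B, the mapping rule).
    Both callables are placeholders; neither is prescribed by the scheme.
    """
    state = np.asarray(state0, dtype=float)
    for _ in range(max_iter):
        stats = realize(state)                        # step A
        new_state = np.asarray(update(stats), float)  # step B
        if np.max(np.abs(new_state - state)) < tol:
            return new_state                          # steady state reached
        state = new_state
    return state  # last iterate if convergence was not reached
```

A damped mapping rule, e.g., mixing the new state with the previous one, is one common way to improve the convergence of such an iteration.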

The replica simulator introduces a systematic approach for investigating the replica ansätze based on the decoupling principle. Moreover, it gives an intuition about the impact of symmetry breaking. To clarify the latter statement, let us consider an example.

Example.

(RS vs. 1RSB ansatz) Let ; thus, the fixed point equations read

The equations under the 1RSB assumption are moreover given by

and is determined through the fixed point equation

where and denote the mutual information and the Kullback-Leibler divergence, respectively. Assuming the system matrix to be and setting to be independent of and , the right hand side of tends to zero, and therefore, the solutions and follow. Consequently, becomes ineffective, and the fixed point equations reduce to . The latter observation can be interpreted in terms of the “state evolution” of the replica simulator. More precisely, assume that the initial state of the replica simulator with breaking order one is chosen such that, in the corresponding decoupled system, is sufficiently correlated with the source and noise symbols. In this case, assuming the mapping rule to be convergent, the correlation reduces in each iteration of Scheme ?, and thus, at the steady state, becomes independent of and .

The above discussion can be extended to replica simulators with larger breaking orders. Moreover, further properties of the ansätze could be studied using methods developed in the literature of transition systems. We leave these further investigations as possible future work.

8 Large Compressive Sensing Systems

Considering the setting represented in Section 2, a large compressive sensing system can be studied through our results by restricting the source’s prior distribution to be of the form

In the large-system limit, the source vector distributed as in has entries equal to zero while the remaining entries are distributed with . In this case, is a sparse vector, and thus, represents a large compressive sensing system with the sensing matrix .
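
For illustration, the following Python sketch draws such a sparse source and the corresponding noisy linear measurements. The standard Gaussian choice for the distribution of the nonzero entries and the 1/sqrt(n) normalization of the i.i.d. Gaussian sensing matrix are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 2000, 500      # source dimension and number of measurements
rho = 0.1             # fraction of nonzero entries (sparsity level)
sigma = 0.05          # noise standard deviation

# Bernoulli-Gaussian source: a fraction rho of the entries is nonzero,
# drawn here (as one possible choice of the nonzero distribution) from
# a standard Gaussian; the remaining entries are exactly zero.
support = rng.random(n) < rho
x = np.where(support, rng.standard_normal(n), 0.0)

# i.i.d. Gaussian sensing matrix, one member of the rotationally
# invariant ensemble considered in the paper (normalization assumed).
A = rng.standard_normal((k, n)) / np.sqrt(n)

# Noisy compressive measurements y = A x + noise.
y = A @ x + sigma * rng.standard_normal(k)
```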

Considering the prior as in , different recovery schemes are then investigated by restricting the prior setup of the system correspondingly. In this section, we study the asymptotics of several recovery schemes using our decoupling principle for both continuous and finite-alphabet sources.

8.1 Continuous Sources

Assuming , describes a continuous random variable multiplied by an -Bernoulli random variable. In this case, by varying the utility function , different reconstruction schemes are considered. Here, we address the linear, LASSO and norm recovery schemes. The results can, however, be employed to investigate a general norm recovery scheme [61].
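
As one concrete way to realize the LASSO scheme mentioned above, the following sketch solves the l1-regularized least-squares program with ISTA. The paper does not prescribe a particular solver; ISTA is used here merely as a standard choice.

```python
import numpy as np

def soft_threshold(v, t):
    """Entry-wise soft-thresholding, the proximal map of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """
    Iterative shrinkage-thresholding (ISTA) for the LASSO program
        minimize  0.5 * ||y - A x||^2 + lam * ||x||_1 .
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Applied to the pair (A, y) generated in the previous sketch, ista(A, y, lam=0.05) returns a sparse estimate of the source; the regularization weight lam plays a role analogous to the estimation parameter of the estimator.
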
Example . (linear recovery scheme) The estimator reduces to the linear recovery scheme when the utility function is set to be

In fact, in this case, the estimator postulates the prior distribution to be a zero-mean, unit-variance Gaussian and performs considerably inefficiently when the source is sparse. Using the decoupling principle, we conclude that in the large-system limit the source entry and the estimated entry , for any , converge in probability to a sparse random variable distributed as in and the estimated symbol