# On sensing capacity of sensor networks for the class of linear observation, fixed SNR models

Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama. The authors are with the Department of Electrical and Computer Engineering at Boston University, Boston, MA 02215. They can be reached at {shuchin, mqzhao, srv}@bu.edu
###### Abstract

In this paper we address the problem of finding the sensing capacity of sensor networks for a class of linear observation models and a fixed SNR regime. Sensing capacity is defined as the maximum number of signal dimensions reliably identified per sensor observation. In this context sparsity of the phenomenon is a key feature that determines sensing capacity. Setting aside the SNR of the environment, the effect of sparsity on the number of measurements required for accurate reconstruction of a sparse phenomenon has been widely studied under compressed sensing. Nevertheless, the development there was motivated from an algorithmic perspective. In this paper our aim is to derive these bounds in an information theoretic set-up and thus provide algorithm independent conditions for reliable reconstruction of sparse signals. In this direction we first generalize Fano's inequality and provide lower bounds to the probability of error in reconstruction subject to an arbitrary distortion criterion. Using these lower bounds on the probability of error, we derive upper bounds to sensing capacity and show that in the fixed SNR regime sensing capacity goes to zero as sparsity goes to zero. This means that disproportionately more sensors are required to monitor very sparse events. We derive lower (achievable) bounds to sensing capacity by deriving upper bounds to the probability of error via adaptation to a max-likelihood detection set-up under a given distortion criterion. These lower bounds to sensing capacity exhibit similar behavior, though there is an SNR gap between the upper and lower bounds. Subsequently, we show the effect of correlation in sensing across sensors and across sensing modalities on sensing capacity, for various degrees and models of correlation. Our next main contribution is to show the effect of sensing diversity on sensing capacity, an effect that has not been considered before.
Sensing diversity is related to the effective coverage of a sensor with respect to the field. In this direction we show the following results: (a) sensing capacity goes down as sensing diversity per sensor goes down; (b) random sampling (coverage) of the field by sensors is better than contiguous location sampling (coverage). In essence, the bounds and results presented in this paper serve as guidelines for designing efficient sensor network architectures.

## I Introduction

In this paper we study fundamental limits to the performance of sensor networks for a class of linear sensing models under a fixed SNR regime. Fixed SNR is an important and necessary ingredient for sensor network applications where the observations are inevitably corrupted by external noise and clutter. In addition we are motivated by sensor network applications where the underlying phenomenon exhibits sparsity. Sparsity is manifested in many applications for which sensor networks are deployed, e.g. localization of a few targets in a large region; search for targets among a large number of sites, e.g. land mine detection; estimation of temperature variation, for which a few spline coefficients may suffice to represent the field, i.e. the phenomenon is sparse under a suitable transformation. More recent applications such as that considered in [1] also involve imaging a sparse scattering medium.

The motivation for considering linear sensing models comes from the fact that in most cases the observation at a sensor is a superposition of signals that emanate from different sources, locations, etc. For example, in seismic and underground borehole sonic applications, each sensor receives a signal that is a superposition of signals arriving from various point/extended sources located at different places. In radar applications [1, 2], under a far field assumption the observation system is linear and can be expressed as a matrix of steering vectors. In this case direction becomes the variable space and one looks for strategies to optimally search using many such radars. Statistical modulation of gain factors in different directions is feasible in these scenarios and is usually done to control the statistics of the backscattered data. In other scenarios the scattering medium itself induces random gain factors in different directions.

In relation to signal sparsity, compressive sampling [3, 4] has been shown to be very promising in terms of acquiring minimal information, expressed as a minimal number of random projections, that suffices for adequate reconstruction of sparse signals. Thus in this case too the observation model is linear. In [5] this set-up was used in a sensor network application to realize an efficient sensing and information distribution system by combining it with ideas from linear network coding. It was also used in [6] to build a wireless sensor network architecture using a distributed source-channel matched communication scheme.

For applications related to wireless sensor networks where power limited sensors are deployed, it becomes necessary to compress the data at each sensor. For example, consider a parking surveillance system where a network of wireless low resolution cameras is deployed [7]. Having each camera take several snapshots in space and transmit all of them to a base station would overwhelm the wireless link to the base station. Instead, the transmission overhead is significantly reduced by sending a weighted sum of the observations. An illustration is shown in figure 1. A similar set-up was also considered in [8] for a robotic exploration scenario.

Motivated by the scenarios considered above, we start with sensing (observation) models where at a sensor the information about the signal is acquired as a projection of the signal onto a weight vector. Under this class of observation model, the sensing model is linear and is essentially a matrix, chosen from some appropriate class particular to the application. In this work we consider a fixed SNR model (see also [9]) where the observations at the sensors for the signal X are given by,

$$Y = \sqrt{\mathrm{SNR}}\, G X + N \tag{1}$$

where each row of the matrix G is restricted to have unit norm and where N is the noise vector with unit noise power in each dimension. It is important to consider the fixed SNR scenario, particularly for applications related to sensor networks. Practically, each sensor is power limited. In an active sensing scenario the sensors distribute this power to sense different modalities, or to look (beamform) in various directions. Thus we restrict the norm of each row of G to be unity and then scale the system model appropriately by √SNR. For a networked setting we assume that the observations made at the sensors are available for processing at a centralized location or node. In case this is infeasible or costly, information can be exchanged or aggregated at each sensor using distributed consensus type algorithms, such as that studied in [10].
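The fixed SNR observation model of equation (1) can be sketched as follows; the dimensions n and m, the SNR value, and the use of a Gaussian ensemble for G are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, SNR = 100, 40, 10.0   # signal dimension, number of sensors, fixed SNR (assumed)

# Sensing matrix drawn from a Gaussian ensemble, each row normalized to unit norm
G = rng.standard_normal((m, n))
G /= np.linalg.norm(G, axis=1, keepdims=True)

X = rng.standard_normal(n)          # placeholder signal
N = rng.standard_normal(m)          # unit-variance noise in each dimension
Y = np.sqrt(SNR) * G @ X + N        # observations, equation (1)

# Row normalization fixes the per-sensor signal power regardless of n
assert np.allclose(np.linalg.norm(G, axis=1), 1.0)
```

The row normalization step is what makes the SNR "fixed": each sensor's signal power stays bounded no matter how large n grows.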

In order to utilize information theoretic ideas and tools, we adopt a Bayesian perspective and assume a prior distribution on X. Another motivation for considering a Bayesian set-up is that one can potentially model classification/detection scenarios where prior information is usually available and useful. Note that under some technical conditions it can be shown that a lower bound to the Bayesian error is also a lower bound to the worst case probability of error for the parametric set-up. Therefore the lower bounds presented in this paper also provide lower bounds for the parameter estimation problem.

In this paper we capture the system performance by evaluating asymptotic upper and lower bounds to the ratio n/m such that reconstruction to within a distortion level is feasible. We call the ratio n/m the sensing capacity C: the number of signal dimensions reliably identified per projection (sensor). This term was coined in [11] in the context of sensor networks for discrete applications. Alternatively, bounds to C can be interpreted as providing scaling laws for the minimal number of sensors/projections required for reliable monitoring/signal reconstruction.

For a signal sparsity level of k, a different ratio such as k/m also seems a reasonable choice, but in most cases k is unknown and needs to be determined, e.g., target density, or sparsest signal reconstruction. Here it is important to penalize false alarms and misclassification costs. Furthermore, n and m are known and part of the problem specification, while signal complexity is governed by k, and one of our goals is to understand performance as a function of signal complexity. In this paper we show that sensing capacity is also a function of signal sparsity apart from SNR.

The upper bounds to C are derived by finding lower bounds to the probability of error in reconstruction subject to a distortion criterion, which apply to any algorithm used for reconstruction. The achievable (lower) bounds to C are derived by upper bounding the probability of error in a max-likelihood detection set-up over the set of rate distortion quantization points. Since most of the development for these classes of problems has been algorithmic [3, 9], our motivation for the above development is driven by the need to find fundamental, algorithm independent bounds for these classes of problems. In particular, under an i.i.d. model on the components of X that models a priori information, e.g. sparsity of X, and letting X̂(Y) denote the reconstruction of X from Y, we show that,

$$\Pr\left(\frac{1}{n}\, d(\hat{X}(Y), X) \geq d_0\right) \geq \frac{R_X(d_0) - K(d_0,n) - \frac{1}{n} I(X;Y|G)}{R_X(d_0)} - o(1) \tag{2}$$

for an appropriate distortion measure d, where R_X(d_0) is the corresponding scalar rate distortion function and K(d_0,n) is bounded by a constant that depends on the number of neighbors of a quantization point in an optimal n-dimensional rate distortion mapping.

Next, we consider the effect of the structure of G on the performance. Using the result on the lower bound on the probability of error given by equation (2), a necessary condition is immediately identified in order that reconstruction to within an average distortion level d_0 be feasible, namely that (1/n) I(X;Y|G) ≥ R_X(d_0) − K(d_0,n). For a fixed prior on X the performance is then determined by the mutual information term, which in turn depends on G. This motivates us to consider the effect of the structure of G on the performance, and by evaluating I(X;Y|G) for various ensembles of G we quantify the performance of many different scenarios that restrict the choice of G for sensing. In the case when G is chosen independently of X and randomly from an ensemble of matrices (to be specified later in the problem set-up), we have

 I(X;Y,G) = I(X;G)=0+I(X;Y|G) (3) = I(X;Y)+I(X;G|Y) (4) ⇒I(X;Y|G) = I(X;Y)+I(X;G|Y) (5)

This way of expanding allows us to isolate the effect of the structure of the sensing matrix G on the performance, which in principle influences bounds on C through the change in mutual information as captured via equations (3)-(5) and as applied to satisfy the necessary condition prescribed by the lower bound in equation (2).
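The chain rule identity in equations (3)-(5) can be checked numerically on a toy discrete model; the alphabet sizes, the priors, and the noisy channel below are illustrative assumptions chosen only so that X and G are independent.

```python
import numpy as np

def mutual_info(pab):
    """I(A;B) in bits from a joint probability table p[a, b]."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float((pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])).sum())

# Toy joint distribution p(x, g, y): X and G independent, Y a noisy function of (X, G)
px, pg, eps = np.array([0.7, 0.3]), np.array([0.5, 0.5]), 0.1
p = np.zeros((2, 2, 2))
for x in range(2):
    for g in range(2):
        y = x & g                                 # noiseless output
        p[x, g, y] += px[x] * pg[g] * (1 - eps)   # observed correctly
        p[x, g, 1 - y] += px[x] * pg[g] * eps     # flipped by noise

# I(X;Y|G) = sum_g p(g) * I(X;Y | G=g)
i_xy_given_g = sum(p[:, g, :].sum() * mutual_info(p[:, g, :] / p[:, g, :].sum())
                   for g in range(2))
i_xy = mutual_info(p.sum(axis=1))                 # I(X;Y)
# I(X;G|Y) = sum_y p(y) * I(X;G | Y=y)
i_xg_given_y = sum(p[:, :, y].sum() * mutual_info(p[:, :, y] / p[:, :, y].sum())
                   for y in range(2))

# Equation (5): I(X;Y|G) = I(X;Y) + I(X;G|Y) when X is independent of G
assert abs(i_xy_given_g - (i_xy + i_xg_given_y)) < 1e-9
```

The identity holds exactly whenever I(X;G) = 0, which is why conditioning on a randomly drawn G can only increase the mutual information relative to I(X;Y).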

Using the above idea, in this paper we will show the effect of sensing diversity on the performance, a concept explained next. Under the sensing model prescribed above, each component of a sensor's projection vector can be seen as contributing towards diversity in sensing. The total number of non-zero components in the projection vector is called the sensing diversity. This terminology is analogous to that used for MIMO systems in the context of communications. As will be shown later on, the loss in sensing capacity is not very significant at reasonable levels of sensing diversity (with randomization in sampling per sensor). In fact there is a saturation effect that comes into play, which implies that most of the gains can be obtained at a diversity factor well below full diversity. Now if one considers the noiseless case, i.e. SNR → ∞, then it was shown in [3] that for some sparsity level, as a function of n and the coherence of the sensing matrix, the ℓ1 optimization problem:

$$\min \|X\|_1 \quad \text{subject to} \quad Y = GX$$

yields the exact solution. To this end note that if X is sparse then solving the above system is computationally faster, as shown in [12].

There are other types of modalities that arise in the context of resource constrained sensor networks. As an example consider the application in [7], where each camera may be physically restricted to sample contiguous locations in space, or under limited memory is restricted to sample a few locations, possibly at random. This motivates us to consider other structures on G under such modalities of operation. In this paper we will contrast random sampling and contiguous sampling and show that random sampling is better than contiguous sampling. In such scenarios it becomes important to address a coverage question, and in some cases this may lead to poor performance. In highly resource constrained scenarios randomization in the elements of G is not feasible. In this direction we also consider a {0,1} ensemble of matrices, with and without randomization in the locations of the non-zero entries in each row. To facilitate the reading of the paper we itemize the organization as follows.

• We present the problem set-up in section II where we make precise the signal models and the ensembles of sensing matrices that will be considered in relation to different sensor networking scenarios.

• In section III we present the lower bounds to the probability of error in reconstruction subject to an average distortion criterion. The development is fairly general and self-contained.

• In section IV we present a constructive upper bound to the probability of error in reconstruction subject to an average distortion criterion. The development there is particular to the fixed SNR linear sensing model that is the subject of the present paper, though the ideas are in general applicable to other sensing models and other classes of distortion measures.

• Once we establish the upper and lower bounds, we will use the results to obtain upper and lower bounds to sensing capacity for the fixed SNR linear sensing models, in sections V and VI. In these sections we will consider the full diversity Gaussian ensemble for sensing matrix. The motivation to consider this model is that the mutual information and moment generating functions are easier to evaluate for the Gaussian ensemble. This is thus useful to gain initial insights into the tradeoffs of signal sparsity and SNR.

• Since the bounds to sensing capacity can be interpreted as providing bounds on the number of projections/sensors required for reliable monitoring, in section VII we compare the scaling implied by the bounds to sensing capacity with that obtained in [9] in the context of the complexity penalized regularization framework.

• In section VIII we consider the effect of the structure of the sensing matrix on sensing capacity. The section is divided into several subsections. We begin by considering the effect of sensing diversity on sensing capacity. Following that we consider the effect of correlation in the columns of G on achievable sensing capacity. Then we consider a very general case of a deterministic sensing matrix, and by upper bounding the mutual information we comment on the performance of various types of sensing architectures of interest.

• In section IX we consider the {0,1} ensemble for sensing matrices and provide upper bounds to sensing capacity for various modalities in sensing.

• In section X we give an example of how our methods can be extended to handle cases when one is interested in reconstruction of functions of X rather than X itself. In this direction we consider the case of recovery of sign patterns of X.

## II Problem Set-up

Assume that the underlying signal X lies in an n-dimensional space X^n, where the alphabet X can be discrete or continuous. The discrete case models scenarios of detection or classification and the continuous case models scenarios of estimation.

#### Fixed SNR model

: The observation model for the sensors is a linear observation model and is given by,

$$Y = \sqrt{\mathrm{SNR}}\, G X + N \tag{6}$$

which is the fixed SNR model described in the introduction. The matrix G is a random matrix selected from an ensemble which we will state subsequently. Each row of G is restricted to have unit norm. The noise vector N is i.i.d. Gaussian with unit variance in each dimension.

### II-A Discussion about the fixed SNR model

At this point it is important to bring out a distinction of the assumption, and subsequent analysis, of a fixed SNR model in contrast to similar scenarios considered in an albeit high SNR setting. The observation model of equation (1) studied in this paper is related to a class of problems that have been central in statistics. In particular it is related to the problem of regression for model order selection. In this context subsets of columns of the sensing matrix form a model for signal representation which needs to be estimated from the given set of observations. Nature selects this subset in a weighted/non-weighted way as modeled by X. The task is then to estimate this model order and thus X. In other words the estimate of X in most cases is also linked to the estimate of the model order, under some mild assumptions on G. Several representative papers in this direction are [13, 14, 15], which consider the performance of several (signal) complexity penalized estimators in both parametric and non-parametric frameworks. One of the key differences to note here is that the analysis of these algorithms is done for the case when SNR → ∞, i.e. in the limit of high SNR, which is reflected by taking the additive noise variance to zero or not considering the noise at all. However a fixed SNR is an important and necessary ingredient for applications related to sensor networks, and therefore we will not pursue a high SNR development here. Nevertheless the results obtained are directly applicable to such scenarios.

In the next section we first outline the prior distribution(s) on X that reflect the sparsity of the signal, and the model for realizing sensing diversity in the sensing matrix G. Then we outline the choices of ensembles for the sensing matrix G. In the following, N(m, σ²) denotes the Gaussian distribution with mean m and variance σ².

### II-B Generative models of signal sparsity and sensing diversity

#### Signal sparsity

In a Bayesian set-up we model the sparsity of the phenomenon by assuming a mixture distribution on the signals. In particular the n-dimensional vector X is a sequence drawn i.i.d. from the mixture distribution

$$P_X = \alpha\, N(m_1, \sigma_1^2) + (1-\alpha)\, N(m_0, \sigma_0^2)$$

where α ∈ (0,1). In this paper we consider two cases.

1. Discrete Case: m_1 = 1, m_0 = 0 and σ_1² = σ_0² = 0. This means that X is a Bernoulli(α) sequence over {0,1}. This models the discrete case for addressing problems of target localization, search, etc.

2. Continuous Case: m_1 = m_0 = 0 but σ_1² = 1 and σ_0² = 0. This models the continuous case.

In this context we call α the sparsity ratio, which is held fixed for all values of n. Under the above model, on average the signal will be k-sparse with k = nα. Note that k/n → α as n → ∞.
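Sampling from the mixture prior above can be sketched as follows; the dimension n and the sparsity ratio α are assumed values, and the parameter choices for the two cases follow the (reconstructed) settings listed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 10_000, 0.05   # dimension and sparsity ratio (assumed values)

# Discrete case: X_i ~ Bernoulli(alpha) over {0, 1}
x_disc = (rng.random(n) < alpha).astype(float)

# Continuous case: X_i = 0 w.p. (1 - alpha), X_i ~ N(0, 1) w.p. alpha
support = rng.random(n) < alpha
x_cont = np.where(support, rng.standard_normal(n), 0.0)

# On average the signal is k-sparse with k = n * alpha = 500
print(x_disc.sum(), np.count_nonzero(x_cont))
```

Both draws concentrate around nα non-zero entries, which is the sense in which the sparsity ratio is "held fixed for all n".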

#### Sensing diversity and ensemble for G

In connection to the model for diversity, the sensing matrix G is a random matrix such that for each row i the entries G_ij are distributed i.i.d. according to a mixture distribution, P_G = γ P_g + (1−γ) δ_0, where δ_0 denotes a point mass at zero. We consider three cases:

1. Gaussian ensemble: the non-zero entries are drawn from N(0,1), i.e. P_g = N(0,1).

2. Deterministic G: The matrix G is deterministic.

3. {0,1} ensemble: the non-zero entries are equal to 1.

The matrix G is then normalized so that each row has unit norm. In this context we call γ the (sensing) diversity ratio. Under the above model, on average each sensor will have a diversity of nγ. Note that the per-row diversity ratio converges to γ as n → ∞. Given the set-up described above, the problem is to find upper and lower bounds to

$$C(d_0) = \limsup\left\{\frac{n}{m} : \Pr\left(\frac{1}{n}\, d(\hat{X}(Y), X) > d_0\right) \to 0\right\}$$

where X̂(Y) is the reconstruction of X from the observation Y, and d(·,·) is a distortion measure defined on X^n × X^n. In this paper we consider the Hamming distortion measure for discrete X and the squared distortion measure for continuous X. Under this set-up we exhibit the following main results:

1. Sensing capacity is a function not only of SNR but also of signal sparsity and sensing diversity.

2. For a fixed SNR sensing capacity goes to zero as sparsity goes to zero.

3. Low diversity implies low sensing capacity.

4. Correlation across the columns and across the rows of G leads to a decrease in sensing capacity.

5. For the {0,1} ensemble of sensing matrices, sensing capacity under random sampling is higher than under contiguous sampling.

In the next section we provide asymptotic lower bounds on the probability of error in reconstruction subject to a distortion criterion. Following that we provide a constructive upper bound to the probability of error. We then use these results to evaluate upper and lower bounds to sensing capacity. In the following we will use X and X^n interchangeably.
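The diversity-limited, row-normalized sensing ensemble of this section can be sketched as follows; the dimensions and the diversity ratio γ are assumed values, and the Gaussian ensemble is used for the non-zero entries.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, gamma = 50, 200, 0.3   # sensors, signal dimension, diversity ratio (assumed)

# Each entry is non-zero w.p. gamma; non-zero entries drawn N(0, 1) (Gaussian ensemble)
mask = rng.random((m, n)) < gamma
G = np.where(mask, rng.standard_normal((m, n)), 0.0)

# Normalize each row to unit norm (guarding against an unlikely all-zero row)
norms = np.linalg.norm(G, axis=1, keepdims=True)
norms[norms == 0] = 1.0
G /= norms

# Average per-row diversity concentrates around n * gamma = 60
print(mask.sum(axis=1).mean())
```

Setting γ = 1 recovers the full diversity Gaussian ensemble used in sections V and VI, while small γ models sensors with limited coverage.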

## III Bounds to the performance of estimation algorithms: lower bounds

###### Lemma III.1

Given observation(s) Y for the sequence of random variables X^n drawn i.i.d. according to P_X, let X̂^n(Y) be the reconstruction of X^n from Y. Then, for a given distortion measure d(·,·),

$$\Pr\left(\frac{1}{n}\, d(\hat{X}^n(Y), X^n) \geq d_0\right) \geq \frac{R_X(d_0) - K(d_0,n) - \frac{1}{n} I(X^n;Y)}{R_X(d_0)} - o(1)$$

where K(d_0,n) is bounded by a constant and R_X(d_0) is the corresponding (scalar) rate distortion function for P_X.

{proof}

See Appendix. Essentially, K(d_0,n) depends on the number of neighbors of a quantization point in an optimal n-dimensional rate-distortion mapping. NOTE: The assumption of a scalar valued process in lemma III.1 is made for the sake of simplicity. The results are easily generalizable and can be extended to the case of vector valued processes.

For the simpler case of discrete parameter space, the lower bound to the minimax error in a parameter estimation framework is related to the Bayesian error as follows,

$$\min_{\hat{X}(Y)} \max_{X \in \Theta} \Pr\left(\frac{1}{n} d(X, \hat{X}(Y)) \geq d_0\right) = \min_{\hat{X}(Y)} \max_{P_\Theta \in \mathcal{P}_\Theta} \sum_{X \in \Theta} P_\Theta(X) \Pr\left(\frac{1}{n} d(X, \hat{X}(Y)) \geq d_0\right) \tag{7}$$
$$\geq \min_{\hat{X}(Y)} \sum_{X \in \Theta} \pi(X) \Pr\left(\frac{1}{n} d(X, \hat{X}(Y)) \geq d_0\right) \tag{8}$$

where Θ is the parameter space, P_Θ is the class of probability measures over Θ, and π is any particular distribution. The above result holds true for the case of a continuous parameter space under some mild technical conditions. Thus a lower bound to the probability of error as derived in this paper also puts a lower bound on the probability of error for the parametric set-up. In our set-up we will choose π as a probability distribution that appropriately models the a priori information on X, e.g. signal sparsity. For modeling simple priors such as sparsity on X, one can choose distributions that asymptotically put most of the mass uniformly over the relevant subset of Θ; this is a key ingredient in the realization of the lower bound on the probability of error derived in this paper.

We have the following corollary that follows from lemma III.1.

###### Corollary III.1

Let X^n be an i.i.d. sequence where each X_i is drawn according to some distribution P_X over an alphabet X, where |X| is finite. Given an observation Y about X^n we have,

$$\Pr(X^n \neq \hat{X}^n(Y)) \geq \frac{H(X) - \frac{1}{n} I(X^n;Y) - \frac{1}{n}}{H(X) + o(1)} - o(1)$$

### III-A Tighter bounds for discrete X under Hamming distortion

The results in the previous section can be stated for any finite n, without resorting to the use of the AEP, for the case of discrete alphabets with Hamming distortion as the distortion measure and for certain values of the average distortion constraint d_0. We have the following lemma.

###### Lemma III.2

Given observation(s) Y for the sequence of random variables X^n drawn i.i.d. according to P_X. Then for the Hamming distortion measure d_H and for distortion levels d_0 ≤ 1 − 1/|X|,

$$\Pr\left(\frac{1}{n}\, d_H(X^n, \hat{X}^n(Y)) \geq d_0\right) \geq \frac{n R_X(d_0) - I(X^n;Y) - 1}{n \log(|\mathcal{X}|) - n\left(h(d_0) + d_0 \log(|\mathcal{X}|-1)\right)}$$
{proof}

See Appendix.

### III-B Comment on the proof technique

The proof of lemma III.1 closely follows the proof of Fano's inequality [16]: we start with a distortion error event based on d(X̂(Y), X) and then evaluate the conditional entropy of a rate-distortion mapping conditioned on the error event and the observation Y. To bound K(d_0,n), we use results in [17] for the case of the squared distortion measure.

In relation to the lower bounds presented in this paper for the probability of reconstruction subject to an average distortion level, one such development was considered in [18] in the context of a non-parametric regression type problem. Let θ be an element of a metric space (Θ, d). One is given covariates X_1, …, X_m, random or non-random, and the responses Y_1, …, Y_m to these covariates under θ. One is also given the set of conditional pdfs P_θ(Y|X), where the notation means that the pdfs are parametrized by θ. The task is to find a lower bound on the minimax reconstruction distortion, under the measure d, in reconstruction of θ given X and Y. In our case one can identify θ with the signal X, and d with the squared metric. For such a set-up, lower bounds on the asymptotic minimax expected distortion in reconstruction (not the probability of such an event) were derived in [18] using a variation of Fano's bound (see [19]) under a suitable choice of worst case quantization for the parameter space metrized with the distance d.

Our derivation has a flavor of this method in terms of identifying the right quantization, namely the rate distortion quantization for a given level of average distortion in a Bayesian setting. Although we evaluate lower bounds to the probability of error and not the expected distortion itself, the lower bound on the expected distortion in reconstruction follows immediately. Moreover our method works for any distortion metric d, though in this paper we restrict ourselves to cases of interest particular to sensor network applications.

## IV Constructive upper bound to the probability of error

In this section we will provide a constructive upper bound to the probability of error in reconstruction subject to an average squared distortion level. Unlike the lower bounds, the upper bounds here are provided for the particular observation model of equation (6). This could potentially be generalized, but we will keep our focus on the problem at hand.

To this end, given n and d_0, assume that we are given the functional mapping f that corresponds to the minimal cover at average distortion level d_0 as given by lemma XI.2. Upon receiving the observation Y, the aim is to map it to the corresponding index i, i.e. we want to detect which distortion ball B_i the true signal belongs to. Clearly if X is not typical there is an error. From lemma XI.1, the probability of this event can be bounded by an arbitrary ε > 0 for large enough n, so we will not worry about this a-typical event in the following.

Since all the sequences in the typical set are equiprobable, we convert the problem to a max-likelihood detection set-up over the set of rate-distortion quantization points given by the minimal cover, as follows. Given G and the rate distortion points corresponding to the functional mapping f, we enumerate the set of points √SNR·G f(X). Then given the observation Y we map Y to the nearest such point in Euclidean distance. We then ask the following probability,

$$\Pr\left(\sqrt{\mathrm{SNR}}\, G f(X) \to \sqrt{\mathrm{SNR}}\, G f(X') \,\middle|\, G,\; X \in B_i,\; X' \in B_j : \frac{1}{n} d_{set}(B_i, B_j) \geq 2 d_0\right)$$

that is, we ask for the probability that in the typical max-likelihood detection set-up we map signals from distortion ball B_i to signals in distortion ball B_j that is at an average set distance of at least 2d_0 from B_i. For the sake of brevity we denote the above probability by P_e(pair), to reflect that it is a pairwise error probability. Since the noise is additive Gaussian we have

$$P_e(\mathrm{pair}) = \Pr\left(N^T G(X - X') \geq \frac{\sqrt{\mathrm{SNR}}}{2}\, \|G(X - X')\|^2 : X \in B_i,\, X' \in B_j\right)$$
$$P_e(\mathrm{pair}) = \Pr\left(\frac{N^T G(X - X')}{\|G(X - X')\|} \geq \frac{\sqrt{\mathrm{SNR}}}{2}\, \|G(X - X')\| : X \in B_i,\, X' \in B_j\right)$$

Since the noise is AWGN with unit variance in each dimension, its projection onto the unit vector G(X−X')/‖G(X−X')‖ is also Gaussian with unit variance. Thus we have

$$P_e(\mathrm{pair}) = \Pr\left(N \geq \frac{\sqrt{\mathrm{SNR}}}{2}\, \|G(X - X')\| : X \in B_i,\, X' \in B_j\right)$$

By a standard approximation to the Q (error) function, we have that,

$$P_e\left(f(X) \to f(X') \,\middle|\, X \in B_i,\, X' \in B_j,\, G : \frac{1}{n} d_{set}(B_i,B_j) \geq 2d_0\right) \leq \exp\left\{-\frac{\mathrm{SNR}\,\|G(X - X')\|^2}{4}\right\}$$

In the worst case we have the following bound,

$$P_e\left(f(X) \to f(X') \,\middle|\, X \in B_i,\, X' \in B_j,\, G : \frac{1}{n} d_{set}(B_i,B_j) \geq 2d_0\right) \leq \exp\left\{-\min_{X \in B_i,\, X' \in B_j} \frac{\mathrm{SNR}\,\|G(X - X')\|^2}{4}\right\}$$

Now note that from the above construction, the average distortion in reconstruction of X is bounded by 2d_0 if the distortion metric obeys the triangle inequality. To evaluate the total probability of error we apply the union bound over the pairs of rate-distortion quantization points, yielding a sum of the pairwise error probabilities above.
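The Gaussian tail step behind the pairwise error bound can be checked by simulation. A minimal sketch, assuming illustrative dimensions and a deliberately small SNR so the tail event is visible; it uses the standard Chernoff bound Q(a) ≤ exp(−a²/2), which gives exponent SNR‖G(X−X')‖²/8 (a slightly different constant than the /4 stated in the text).

```python
import numpy as np

rng = np.random.default_rng(3)
SNR, trials = 0.4, 200_000   # small SNR so the tail event occurs in simulation

# A fixed pair of candidate signals and a row-normalized sensing matrix (assumed sizes)
m, n = 20, 50
G = rng.standard_normal((m, n))
G /= np.linalg.norm(G, axis=1, keepdims=True)
x, xp = rng.standard_normal(n), rng.standard_normal(n)

r = np.linalg.norm(G @ (x - xp))       # ||G(X - X')||
a = 0.5 * np.sqrt(SNR) * r             # threshold for the scalar projected noise N

# Monte Carlo estimate of the pairwise error P(N >= a) for N ~ N(0, 1)
N = rng.standard_normal(trials)
p_mc = (N >= a).mean()

# Chernoff bound: Q(a) <= exp(-a^2 / 2) = exp(-SNR * ||G(X - X')||^2 / 8)
bound = np.exp(-SNR * r**2 / 8)
assert p_mc <= bound
```

The simulated tail probability always sits below the exponential bound, which is all the achievability argument needs before the union bound is applied.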

We will use this general form and apply it to particular cases of ensembles of the sensing matrix G. In the following sections we begin by providing upper and lower bounds to the sensing capacity for the Gaussian ensemble with full diversity.

## V Sensing Capacity: Upper bounds, Gaussian ensemble

### V-A Discrete X, full diversity, Gaussian ensemble

For this case we have the following main lemma.

###### Lemma V.1

Given X^n drawn i.i.d. Bernoulli(α) and G chosen from the Gaussian ensemble. Then, with the distortion measure taken as the Hamming distortion, for full diversity and for admissible distortion levels d_0, the sensing capacity is upper bounded by

$$C(d_0) \leq \frac{\frac{1}{2}\log(1 + \alpha\, \mathrm{SNR})}{R_X(d_0)}$$
{proof}

From lemma III.2, the probability of error is bounded away from zero unless the numerator in the lower bound is negative; this implies a necessary condition on the ratio n/m for any reconstruction algorithm.

Since G is random we take the expectation over G. It can be shown that the mutual information satisfies

$$E_G\, I(X^n;Y|G) \leq \max_{P_X : \frac{1}{n}\sum_i E[X_i^2] \leq \alpha}\; \frac{1}{2}\, E_G \log\det\left(I_{m\times m} + \mathrm{SNR}\; G\, E[XX^T]\, G^T\right)$$

where the maximum is achieved by an i.i.d. N(0, α) input, and the determinant is governed by the singular values of G. Since the rows of G have unit norm, the squared singular values of G sum to m. Hence by concavity of log det the right hand side is at most (m/2) log(1 + α SNR). Thus the result follows.
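A quick numerical evaluation of the upper bound of lemma V.1 illustrates the sparsity behavior claimed in the abstract; it assumes the Bernoulli rate distortion function R_X(d_0) = H(α) − H(d_0) and scales d_0 with α so that R_X(d_0) stays positive — the SNR and the d_0 = α/10 scaling are illustrative assumptions.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def capacity_upper(alpha, snr, d0):
    # Lemma V.1: C(d0) <= 0.5*log2(1 + alpha*SNR) / R_X(d0), R_X(d0) = H(alpha) - H(d0)
    return 0.5 * np.log2(1 + alpha * snr) / (h2(alpha) - h2(d0))

snr = 10.0
# Scale the distortion with sparsity (d0 = alpha / 10) so the denominator stays positive
caps = [capacity_upper(a, snr, a / 10) for a in (0.1, 0.01, 1e-3, 1e-4)]

# Sensing capacity decreases as the phenomenon becomes sparser
assert all(c1 > c2 for c1, c2 in zip(caps, caps[1:]))
```

The monotone decrease of the bound with α is the quantitative form of the statement that disproportionately more sensors are needed to monitor very sparse events.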

### V-B Continuous X, full diversity, Gaussian ensemble

###### Lemma V.2

Given X^n drawn i.i.d. according to the Gaussian mixture prior and G chosen from the Gaussian ensemble. Then, for the squared distortion measure and full diversity, the sensing capacity obeys,

$$C(d_0) \leq \frac{\frac{1}{2}\log(1 + \alpha\, \mathrm{SNR})}{H(\alpha) + \frac{\alpha}{2}\log\frac{\alpha}{2 d_0}}$$
{proof}

From lemma V.1 we have that E_G I(X^n;Y|G) ≤ (m/2) log(1 + α SNR). In order that the probability of error not be bounded away from zero, from lemma III.1 it follows that asymptotically

$$\frac{n}{m} \leq \frac{E_G\, I(X;Y|G)/m}{R_X(d_0) - K(d_0,n)}$$

It can be shown that K(d_0,n) is very small for large enough n, see e.g. [17]. The lemma then follows by plugging in the results from section XI-C.

It can easily be seen that as α → 0 the sensing capacity goes to zero. We illustrate this by plotting the upper bounds in figure 2 for the discrete case. We will revisit this phenomenon in section VII in relation to the bounds derived in [9] in the context of compressed sensing.

## VI Sensing Capacity: Lower bounds, Gaussian ensemble

### VI-A Discrete alphabet, full diversity

The discrete X with Hamming distortion is a special case where we can provide tighter achievable bounds. The proof follows the development in section IV, noting that for the discrete case one can choose the discrete set of typical points instead of the distortion balls. We have the following lemma.

###### Lemma VI.1

Given X ∈ X^n with |X| finite, Hamming distortion, and G chosen from a Gaussian ensemble. Then for admissible distortion levels d_0, a sensing capacity of

$$C(d_0) = \frac{\frac{1}{2}\log\left(1 + \frac{\mathrm{SNR}\, d_0}{2}\right)}{H(X) - d_0 \log(|\mathcal{X}|-1) - d_0 \log\frac{1}{d_0}}$$

is achievable, in that the probability of error goes to zero exponentially for choices of n/m below this ratio, for any ε > 0.

{proof}

We have

$$\Pr\left(\frac{1}{n}\, d(X, \hat{X}(Y)) \geq d_0 \,\middle|\, G\right) \leq \exp\left\{-\frac{\mathrm{SNR}\,\|G(X - X')\|^2}{4}\right\}\, 2^{\,nH(X) - n d_0 \log(|\mathcal{X}|-1) - \log\binom{n}{n d_0}}$$

where we have applied the union bound over all the typical sequences that are outside the Hamming distortion ball of radius n d_0. Taking the expectation with respect to G we get,

$$\Pr\left(\frac{1}{n}\, d(X, \hat{X}(Y)) \geq d_0\right) \leq E_G \exp\left\{-\frac{\mathrm{SNR}\,\|G(X - X')\|^2}{4}\right\}\, 2^{\,nH(X) - n d_0 \log(|\mathcal{X}|-1) - \log\binom{n}{n d_0}}$$

Now note that since G is a Gaussian random matrix where each row has unit norm, ‖G(X−X')‖² is a sum of m independent random variables, each with mean ‖X−X'‖²/n. Thus from the moment generating function of the chi-squared random variable we get that,

$$\Pr\left(\frac{1}{n}\, d(X, \hat{X}(Y)) \geq d_0\right) \leq \left(\frac{1}{1 + \frac{\mathrm{SNR}\,\|X - X'\|^2}{2n}}\right)^{m/2} 2^{\,nH(X) - n d_0 \log(|\mathcal{X}|-1) - \log\binom{n}{n d_0}}$$

This implies,

$$\Pr\left(\frac{1}{n}\, d(X, \hat{X}(Y)) \geq d_0\right) \leq 2^{-\frac{m}{2}\log\left(1 + \frac{\mathrm{SNR}\, d_0}{2}\right)}\, 2^{\,nH(X) - n d_0 \log(|\mathcal{X}|-1) - \log\binom{n}{n d_0}}$$

Now note that for (1/n) d_H(X, X') ≥ d_0 we have (1/n)‖X − X'‖² ≥ d_0. Then from the above one can see that the probability of error goes to zero if,

$$\frac{n}{m} < \frac{\frac{1}{2}\log\left(1 + \frac{\mathrm{SNR}\, d_0}{2}\right)}{H(X) - d_0 \log(|\mathcal{X}|-1) - d_0 \log\frac{1}{d_0}}$$

Thus a sensing capacity of

$$C(d_0) = \frac{\frac{1}{2}\log\left(1 + \frac{\mathrm{SNR}\, d_0}{2}\right)}{H(X) - d_0 \log(|\mathcal{X}|-1) - d_0 \log\frac{1}{d_0}}$$

is achievable, in that the probability of error goes to zero exponentially for choices of n/m below this ratio, for any ε > 0.
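For the binary case (|X| = 2, so the log(|X|−1) term vanishes), the achievable expression above can be compared numerically against the upper bound of lemma V.1; the parameter values and the use of R_X(d_0) = H(α) − H(d_0) for the Bernoulli prior are illustrative assumptions.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, snr, d0 = 0.1, 10.0, 0.01   # sparsity, SNR, Hamming distortion (assumed)

# Upper bound (lemma V.1): 0.5*log2(1 + alpha*SNR) / R_X(d0), R_X(d0) = H(alpha) - H(d0)
c_upper = 0.5 * np.log2(1 + alpha * snr) / (h2(alpha) - h2(d0))

# Achievable bound (lemma VI.1), binary alphabet: H(X) = H(alpha), log(|X|-1) = 0
c_lower = 0.5 * np.log2(1 + snr * d0 / 2) / (h2(alpha) - d0 * np.log2(1 / d0))

# The achievable rate sits below the upper bound -- the SNR gap noted in the abstract
assert 0 < c_lower < c_upper
```

The gap between the two values at the same (α, SNR, d_0) is the SNR gap between the upper and lower bounds mentioned in the abstract.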

### VI-B Continuous X, full diversity

###### Lemma VI.2

[Weak Achievability] For X drawn i.i.d. according to the Gaussian mixture prior, G chosen from the Gaussian ensemble and the squared distortion measure, a sensing capacity of

 C(2d_0)=\frac{\frac{1}{2}\log(1+d_0\,SNR)}{R_X(d_0)-K(n,d_0)}

is achievable, in that the probability of error goes down to zero exponentially with n for n/m \le C(2d_0)-\epsilon for some arbitrary \epsilon > 0.

{proof}

For this case we invoke the construction as outlined in section IV. Note that the result is a little weaker in that guarantees are only provided for reconstruction to within twice the target distortion, but one can appropriately modify the rate distortion codebook to get the desired average distortion level. Proceeding as in the discrete case, applying the union bound over the rate distortion codebook and taking the expectation over G, one obtains that for

 \frac{n}{m}<\frac{\frac{1}{2}\log(1+d_0\,SNR)}{R_X(d_0)-K(n,d_0)}

the probability of error goes to zero exponentially. This means that a sensing capacity of

 C(2d_0)=\frac{\frac{1}{2}\log(1+d_0\,SNR)}{R_X(d_0)-K(n,d_0)}

is achievable, in that the probability of error goes down to zero exponentially with n for n/m \le C(2d_0)-\epsilon for some arbitrary \epsilon > 0.

A plot of the upper and lower bounds is shown in figure 3.

## VII Comparison with existing bounds

Note that the results in this paper are stated for distortion levels that scale with the sparsity, both for the discrete case and for the continuous case. This is because one must consider stricter average distortion measures as the phenomenon becomes sparser. To bring out this point concretely, and for purposes of comparison with existing bounds, we consider the result obtained in [5] based on the optimal complexity regularized estimation framework. They show that the expected mean squared error in reconstruction is upper bounded by,

 \mathbb{E}\left[\frac{1}{n}\|X-\hat{X}\|^2\right]\le \frac{C_1C_2\,k\log n}{m} (9)

where C_1 and C_2 are constants under normalization of the signal and the noise power and depend on the number of quantization levels, [9]. To this end consider an extremely sparse case, i.e. \alpha \to 0 as n \to \infty. Then the average distortion metric in equation 9 does not adequately capture the performance, as one can always declare the all-zeros vector to be the estimate, and the distortion is then upper bounded by \alpha. The right comparison is to evaluate the average distortion per number of non-zero elements, \frac{1}{\alpha n}\|X-\hat{X}\|^2. Using this as the performance metric we have from equation 9,

 \mathbb{E}\left[\frac{1}{\alpha n}\|X-\hat{X}\|^2\right]\le \frac{C_1C_2\,n\log n}{m} (10)

When \alpha is small, the average number of projections required such that the per-non-zero-element distortion is bounded by a constant scales as O(n\log n). This is indeed consistent with our results, in that the sensing capacity goes down to zero as \alpha \to 0.

Now consider the case when X is sparse, i.e. \alpha is small but not vanishing. From the results on achievable sensing capacity we have that

 \Pr\left(\frac{1}{n}\|X-\hat{X}\|^2\ge d_0\right)\le 2^{-\frac{m}{2}\log(1+d_0\,SNR/2)+n(R_X(d_0)-K(n,d_0))}

In order to compare the results we fix a performance guarantee of \epsilon on the probability of error for a given d_0; then for the minimal number of projections required we have,

 m\ge \frac{2\left(\log(1/\epsilon)+n\left(R_X(d_0)-K(n,d_0)\right)\right)}{\log(1+d_0\,SNR/2)}

from our results. From results in [9] it follows that,

 m\ge \frac{C_1C_2\,\alpha n\log n}{d_0\,\epsilon}

For the special case of a binary alphabet we have the following scaling orders for the number of projections in the two cases: from achievable sensing capacity we have m = O(n), and from the results in [9] we have m = O(n\log n). A plot of these orders as a function of n for a fixed \alpha is shown in figure 4.
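The contrast in scaling can be sketched numerically. In the snippet below (illustrative only; the rate-distortion gap R_X(d_0)-K(n,d_0), the constant C_1C_2, and the values of SNR, \alpha, d_0, \epsilon are all placeholder assumptions), the capacity-based bound grows linearly in n while the bound from [9] grows as n\log n:

```python
import math

def m_capacity(n, rate_gap, d0, snr, eps):
    """m >= 2(log(1/eps) + n*rate_gap) / log(1 + d0*SNR/2): scales as O(n)."""
    return 2 * (math.log2(1 / eps) + n * rate_gap) / math.log2(1 + d0 * snr / 2)

def m_complexity(n, c12, alpha, d0, eps):
    """m >= C1*C2*alpha*n*log(n) / (d0*eps), from [9]: scales as O(n log n)."""
    return c12 * alpha * n * math.log2(n) / (d0 * eps)

for n in (100, 1000, 10000):
    print(n, round(m_capacity(n, 0.3, 0.1, 10.0, 0.01)),
          round(m_complexity(n, 1.0, 0.1, 0.1, 0.01)))
```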

## VIII Effect of structure of G

In this section we will show the effect of the structure of G on sensing capacity. This section is divided into several subsections and the discussion is self-contained. In section VIII-A we will show that for the Gaussian ensemble the sensing capacity reduces when diversity is low. Following that, in section VIII-B we will show the effect of correlation across columns in the sensing matrix for the Gaussian ensemble on achievable sensing capacity. In section VIII-C we will present a general result for a generic sensing matrix, which will subsequently be used to highlight the effect of structures such as that induced via random filtering using an FIR filter with/without downsampling, as considered in [20].

### VIII-A Effect of sensing diversity, Gaussian ensemble

In order to show the effect of sensing diversity we evaluate the mutual information using the intuition described in the introduction. To this end we have the following lemma.

###### Lemma VIII.1

For a diversity ratio of \beta, with l = \beta n as the average diversity per sensor and an average sparsity level of \alpha, with k = \alpha n, we have

 \mathbb{E}_G\,I(X;Y|G)\le \frac{m}{2}\,\mathbb{E}_j\left[\log\left(\frac{SNR\,j}{l}+1\right)\right], (11)

where the expectation is evaluated over the distribution

 \Pr(j)=\frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}
{proof}

See Appendix.

In the above lemma j plays the role of the number of overlaps between the projection vector and the sparse signal. As the diversity reduces this overlap reduces, and the mutual information decreases. We illustrate this by considering the extreme case of a very small diversity l as n \to \infty. The effect is illustrated in figure 5. Thus low sensing diversity implies low sensing capacity.
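The behavior of the bound in lemma VIII.1 can be sketched numerically, assuming the bound reads (m/2)\,\mathbb{E}_j[\log_2(1+SNR\,j/l)] with j hypergeometric as above (the specific parameter values below are assumptions):

```python
import math

def mi_bound(n, m, k, l, snr):
    """Upper bound (m/2) * E_j[log2(1 + SNR*j/l)], where j is the
    hypergeometric overlap between the l sensed positions of a row of G
    and the k non-zero positions of the sparse signal."""
    total = math.comb(n, l)
    ej = sum(
        math.comb(k, j) * math.comb(n - k, l - j) / total * math.log2(1 + snr * j / l)
        for j in range(max(0, l - (n - k)), min(k, l) + 1)
    )
    return m / 2 * ej

n, m, k, snr = 100, 20, 10, 10.0
for l in (100, 50, 10, 2):   # shrinking diversity per sensor
    print(l, mi_bound(n, m, k, l, snr))
```

The bound decreases monotonically as the diversity l per sensor shrinks, matching the discussion above.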

### VIII-B Effect of correlation in G on achievable sensing capacity

In this section we will show that correlation in the sensing matrix reduces achievable capacity. Correlation in G can arise due to many physical reasons, such as correlated scattering, or correlation of gains across sensing modalities arising from the physical construction of the sensor. Naturally there can be direct relations between various phenomena that lead to such correlation. This is captured by assuming that there is correlation across the columns of G. Consider the upper bound to the probability of error as derived in section IV. In that bound, the term

 SNR\,\|G(X-X')\|^2=SNR\sum_{i=1}^{m}\Big|\sum_{j=1}^{n}G_{ij}(X_j-X'_j)\Big|^2

appears, where for each i the inner sums are independent Gaussian random variables with zero mean and variance \Delta^T\Sigma_{G_i}\Delta, where \Delta = X-X' and \Sigma_{G_i} is the covariance matrix (symmetric and positive semi-definite) of the i-th row of G. By construction the rows are normalized, and note that in the worst case,

 \min_{\Delta}\,\Delta^T\tilde{\Sigma}_{G_i}\Delta=\lambda_{\min}\,\Delta^T\Delta

where \lambda_{\min} is the minimum eigenvalue of the normalized covariance matrix \tilde{\Sigma}_{G_i}. Proceeding in a manner similar to that in the proof of lemma VI.2, with SNR replaced by \lambda_{\min}SNR, one can see that achievable sensing capacity falls in general, since \lambda_{\min}\le 1 as compared to the case when the elements of G are uncorrelated, in which case \lambda_{\min}=1.

### VIII-C Deterministic G

In this section we will consider deterministic sensing matrices and provide upper bounds to sensing capacity for the general case. To this end denote the rows of G as G_1,\dots,G_m. Let the cross-correlations of successive rows be denoted as:

 r_i=\frac{G_i^TG_{i+1}}{G_i^TG_i}

As before, to ensure that the SNR is fixed we impose G_i^TG_i = 1 for all i. Then we have the following result:

###### Lemma VIII.2

For the generative models for the signal as outlined in the problem set-up, an upper bound for the sensing capacity for a deterministic sensing matrix is given by:

 C(d_0)\le \frac{\sum_{i=1}^{m-1}\log\left(1+\alpha SNR(1-r_i)+\frac{r_i\,\alpha SNR}{\alpha SNR+1}\left(1+\alpha SNR(1-r_i)\right)\right)}{R_X(d_0)-K(n,d_0)} (12)
{proof}

We will evaluate I(X;Y|G) via the straightforward method,

 I(X;Y|G)=h(Y|G)−h(Y|G,X)

Note that h(Y|G,X)=h(N). Note also that h(Y|G)\le h(Y^*), where Y^* is the Gaussian random vector obtained via Y^*=GX^*+N, with X^* a Gaussian random vector with i.i.d. components having the same covariance as X under the generative model(s). We will now upper bound the entropy of Y^* via,

 h(Y)\le h(Y^*)\le h(Y^*_1)+\sum_{i=1}^{m-1}h(Y^*_{i+1}\mid Y^*_i)\le h(Y^*_1)+\sum_{i=1}^{m-1}h(Y^*_{i+1}-\eta_iY^*_i)

where \eta_iY^*_i is the best MMSE estimate of Y^*_{i+1} from Y^*_i. The MMSE estimate of Y^*_{i+1} from Y^*_i is given by,

 \hat{Y}^*_{i+1}=\frac{\Sigma_{Y^*_iY^*_{i+1}}}{\Sigma_{Y^*_iY^*_i}}\,Y^*_i

and \eta_i=\frac{r_i\alpha SNR}{\alpha SNR+1}. The result then follows by evaluating the MMSE error, given by,

 \mathbb{E}(Y^*_{i+1}-\hat{Y}^*_{i+1})^2=\mathbb{E}\left(Y^*_{i+1}-\frac{r_i\alpha SNR}{\alpha SNR+1}Y^*_i\right)^2
 =\alpha SNR+1+\frac{(r_i\alpha SNR)^2}{\alpha SNR+1}-\frac{2(r_i\alpha SNR)^2}{\alpha SNR+1}=1+\alpha SNR(1-r_i)+\frac{r_i\alpha SNR}{\alpha SNR+1}\left(1+(1-r_i)\alpha SNR\right)

Plugging in the quantities the result follows.
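The algebraic identity in the last step can be verified numerically; the sketch below checks that the direct form \alpha SNR+1-(r_i\alpha SNR)^2/(\alpha SNR+1) of the MMSE error matches the factored form above, and that the error decreases as the cross-correlation r_i grows:

```python
def mmse_direct(a, r):
    """MMSE error a + 1 - (r*a)^2/(a+1), with a standing for alpha*SNR."""
    return a + 1 - (r * a) ** 2 / (a + 1)

def mmse_factored(a, r):
    """Factored form appearing in the proof above."""
    return 1 + a * (1 - r) + (r * a / (a + 1)) * (1 + (1 - r) * a)

for a in (0.5, 1.0, 10.0):
    for r in (0.0, 0.3, 0.9, 1.0):
        assert abs(mmse_direct(a, r) - mmse_factored(a, r)) < 1e-12
    # Larger cross-correlation -> smaller conditional entropy of Y*_{i+1}.
    assert mmse_direct(a, 0.9) < mmse_direct(a, 0.0)
```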

Let us see the implications of the above result for one particular type of sensing matrix architecture, induced via random filtering and downsampling, as considered in [20]. The output of a filter of length L can be modeled via multiplication of X by a Toeplitz matrix (with a banded structure). The overlap between successive rows of the matrix is L-1 in this case, implying a large cross-correlation r_i. From lemma VIII.2 it follows that larger cross-correlation in the rows implies poorer sensing capacity. Also note that for a filtering architecture one has to address a coverage issue, wherein the m rows together must cover all n positions. This implies that the filter length has to be sufficiently large, which in turn implies that the cross-correlation is also large.

Indeed, randomizing each row will lead to low cross-correlation (in an expected sense), but the coverage issue still needs to be addressed. On the other hand, one can subsample the output signal by some factor so as to reduce the cross-correlation while still ensuring coverage. In this case the matrix becomes almost upper triangular and there is a significant loss of sensing diversity. A loose tradeoff between the filter length and the downsampling factor immediately follows from lemma VIII.2, where the cross-correlation decreases as the downsampling factor grows.
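This trade-off can be made concrete with a small sketch (illustrative only; the moving-average filter and the sizes are assumptions). Successive rows of the banded Toeplitz matrix are copies of the filter shifted by the downsampling factor s, so for an all-ones filter of length L the cross-correlation is exactly (L-s)/L:

```python
def row_cross_correlation(filt, shift, n):
    """r_i = <G_i, G_{i+1}> / <G_i, G_i> for two successive Toeplitz rows,
    where the second row is the filter shifted right by `shift` positions."""
    row1 = filt + [0.0] * (n - len(filt))
    row2 = [0.0] * shift + filt + [0.0] * (n - len(filt) - shift)
    num = sum(a * b for a, b in zip(row1, row2))
    den = sum(a * a for a in row1)
    return num / den

L, n = 32, 128
filt = [1.0] * L                  # moving-average filter of length L
for s in (1, 2, 8, 32):           # larger downsampling factor s -> smaller r_i
    print(s, row_cross_correlation(filt, s, n))
```

At s = 1 (no downsampling) the cross-correlation is near one, while downsampling by the full filter length drives it to zero at the cost of coverage and diversity.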

## IX Upper bounds on Sensing Capacity for the {0,1} ensemble

The main motivation for considering this ensemble comes from scenarios where randomization in the elements of G is not feasible, e.g. field estimation from smoothed data. In this case each sensor measures a superposition of the signals that are in its sensing range. This leads us to consider other types of modalities, e.g. contiguous sampling of X by each sensor versus random sampling. An illustration of the two types of sampling is shown in figure 6. We reveal the following contrast between the two cases for the same per-row sampling weight.

###### Lemma IX.1

Random Sampling: For the \{0,1\} ensemble of sensing matrices, consider the case when each row of G has \beta n ones randomly placed among the n positions. Then for discrete X drawn i.i.d. Bernoulli(\alpha) and for d_0 < \alpha,

 C_{rand}(d_0)\le \frac{H(J)}{h_2(\alpha)-h_2(d_0)}

where h_2(\cdot) is the binary entropy function and J is a random variable with distribution given by

 \Pr(J=j)=\frac{\binom{\alpha n}{j}\binom{n(1-\alpha)}{\beta n-j}}{\binom{n}{\beta n}}
{proof}

See Appendix.

###### Lemma IX.2

Contiguous Sampling: For the \{0,1\} ensemble of sensing matrices, consider the case where each row of G has \beta n consecutive ones, randomly placed with wrap-around. Then for discrete X drawn i.i.d. Bernoulli(\alpha) and d_0 < \alpha,

 C_{contg.}(d_0)\le \frac{h_2(\alpha+\beta)}{h_2(\alpha)-h_2(d_0)}
{proof}

See Appendix.

As can be seen, the upper bound for contiguous sampling is smaller than that for random sampling. Thus randomization in G performs better. The difference is shown in figure 7 for a low sparsity scenario. The proofs of lemmas IX.1 and IX.2 follow from the upper bounds to the mutual information terms provided in section XII, combined with the necessary conditions for the probability of error to be bounded away from zero.
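The gap between the two numerators can be checked numerically (a sketch; the values of n, \alpha, \beta are assumptions): H(J) for random placement exceeds h_2(\alpha+\beta) for contiguous placement in the sparse regime:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def H_J(n, alpha, beta):
    """Entropy of the hypergeometric overlap J of lemma IX.1."""
    k, l = round(alpha * n), round(beta * n)
    total = math.comb(n, l)
    ent = 0.0
    for j in range(max(0, l - (n - k)), min(k, l) + 1):
        p = math.comb(k, j) * math.comb(n - k, l - j) / total
        if p > 0:
            ent -= p * math.log2(p)
    return ent

n, alpha, beta = 200, 0.05, 0.1
print(H_J(n, alpha, beta), h2(alpha + beta))  # random vs contiguous numerator
```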

## X Estimation of functions of X

The analysis of lower bounds to the probability of error presented in this paper extends in a straightforward way to estimation of functions of X. In this section we will consider one such scenario that has received attention in relation to problems arising in physics. The discussion below will reveal the power of the method presented in this work, which is capable of handling more complicated cases and scenarios, though the computation of the terms involved in the analysis may become hard.

### X-A Detecting the sign pattern of X

Of particular interest is to estimate the sign pattern of the underlying signal X. To this end define a new random variable