Distributed source coding in dense sensor networks


Akshay Kashyap, Dept. of ECE, UIUC, Urbana, IL. Tel: 217-766-2537, Email: kashyap@uiuc.edu.
Luis Alfonso Lastras-Montaño, IBM T.J. Watson Research Center, Yorktown Heights, NY. Email: lastrasl@us.ibm.com.
Cathy Xia and Zhen Liu, IBM T.J. Watson Research Center, Hawthorne, NY. Emails: cathyx@us.ibm.com, zhenl@us.ibm.com.

Abstract

We study the problem of the reconstruction of a Gaussian field defined in using sensors deployed at regular intervals. The goal is to quantify the total data rate required for the reconstruction of the field with a given mean square distortion. We consider a class of two-stage mechanisms which a) send information to allow the reconstruction of the sensor’s samples within sufficient accuracy, and then b) use these reconstructions to estimate the entire field. To implement the first stage, the heavy correlation between the sensor samples suggests the use of distributed coding schemes to reduce the total rate. We demonstrate the existence of a distributed block coding scheme that achieves, for a given fidelity criterion for the reconstruction of the field, a total information rate that is bounded by a constant, independent of the number of sensors. The constant in general depends on the autocorrelation function of the field and the desired distortion criterion for the sensor samples. We then describe a scheme which can be implemented using only scalar quantizers at the sensors, without any use of distributed source coding, and which also achieves a total information rate that is a constant, independent of the number of sensors. While this scheme operates at a rate that is greater than the rate achievable through distributed coding and entails greater delay in reconstruction, its simplicity makes it attractive for implementation in sensor networks.

1 Introduction

In this paper, we consider a sensor network deployed for the purpose of sampling and reconstructing a spatially varying random process. For the sake of concreteness, let us assume that the area of interest is represented by the line segment , and that for each , the value of the random process is . For example, may denote the value of some environmental variable, such as temperature, at point .

A sensor network, for the purpose of this paper, is a system of sensing devices (sensors) capable of

  1. taking measurements from the environment that they are deployed in, and

  2. communicating the sensed data to a fusion center for processing.

The task of the fusion center is to obtain a reconstruction of the spatially varying process, while meeting some distortion criteria.

There has been great interest recently in performing such sensing tasks with small, low-power sensing devices, deployed in large numbers in the region of interest [1][2][3][4]. This interest is motivated by the commercial availability of increasingly small and low-cost sensors which have a wide array of sensing and communication functions built in (see, for example, [5]), and yet must operate with small, difficult-to-replace batteries.

Compression of the sensed data is of vital importance in a sensor network. Sensors in a wireless sensor network operate under severe power constraints, and communication is a power-intensive operation. The rate at which sensors must transmit data to the fusion center in order to enable a satisfactory reconstruction is therefore a key quantity of interest. Further, in any communication scheme in which there is an upper bound (independent of the number of sensors) on the amount of data that the fusion center can receive per unit time, there is another obvious reason why the compressibility of sensor data is important: the average rate that can be guaranteed between any sensor and the fusion center varies inversely with the number of sensors. Therefore, any scheme in which the per-sensor rate decreases slower than inversely with the number of sensors will build backlogs of data at sensors for a large enough number of sensors.

Environmental variables typically vary slowly as a function of space and it is reasonable to assume that samples at locations close to each other will be highly correlated. The theory of distributed source coding ([6][7][8]) shows that if the sensors have knowledge of this correlation, then it is possible to reduce the data-rate at which the sensors need to communicate, while still maintaining the property that the information conveyed by each sensor depends only on that sensor’s measurements. Research on practical techniques ([9][10][11][12][13]) for implementing distributed source coding typically focuses on two correlated sources, with good solutions for the many sources problem still to be developed. Thus, in our work, we attack the problem at hand using the available theoretical tools which have their origins in [6].

This approach has been taken earlier in [1] and [2], which investigate whether it is possible to use such distributed coding schemes to reduce the per-sensor data rate by deploying a large number of sensors at closely spaced locations in the area of interest. In particular, it is investigated whether it is possible to construct coding schemes in which the per-sensor rate decreases inversely with the number of sensors. The conclusion of [1], however, is that if the sensors quantize the samples using scalar quantizers, and then encode them, the sum of the data rates of all sensors increases as the number of sensors increases (even with distributed coding), and therefore the per-sensor rate cannot be traded off with the number of sensors in the manner described above.

Later, though, it was demonstrated in [14] that there exists a distributed coding scheme which achieves a sum rate that is a constant independent of the number of sensors used (so long as there is a large enough number of sensors). The per-sensor rate of such a scheme therefore decreases inversely with the number of sensors, which is the trade-off of sensor number with per-sensor rate that was desired, but shown unachievable with scalar quantization, in [1]. Results similar to those of [14] for the case when a field of infinite size is sampled densely have since appeared in [3]. However, a question that still appears to be unresolved is whether it is possible to achieve a per-sensor rate that varies inversely with the number of sensors using a simple sensing (sampling, coding, and reconstruction) scheme.

This paper is an expanded version of [14]. We describe the distributed coding scheme of [14] in detail, and then study another sampling and coding scheme which achieves the desired decrease of per-sensor rate with the number of sensors. The two main properties of this scheme are that (1) it does not make use of distributed coding and therefore does not require the sensors to have any knowledge of the correlation structure of the spatial variable of interest, and (2) it can in fact be implemented using only scalar quantizers at the sensors for the purpose of coding the samples. The scheme utilizes the fact that the sensors are synchronized, which is already assumed in the models of [1][2][3], and is easily achievable in practice. Since scalar quantizers are easily implementable in sensors with very low complexity, this paper shows that it is possible to achieve per-sensor rates that decrease inversely with the number of sensors with simple, practical schemes.

A brief outline of this paper is as follows: We pose the problem formally and establish notation in Section 1.1. We study the achievability of the above tradeoff with a distributed coding scheme in Section 2, and compare the rate of this coding scheme with that of a reference centralized coding scheme in Section 3. We describe the simple coding scheme mentioned above in Section 4. Some numerical results are presented in Section 5. We make some concluding remarks in Section 6.

1.1 Problem statement

1.1.1 Model for the spatial process

We take a discrete time model, and assume that the spatial process of interest is modeled by a (spatially) stationary, real-valued Gaussian random process, at each time , where is the space variable. The focus of this paper is the sampling and reconstruction of a finite section of the process, which we assume without loss of generality to be the interval . We follow conventional usage in referring to the spatial process as the field at time .

We assume that the field at time is independent of the field for any , and has identical statistics at all times. (In what follows, we omit the time index when we can do so without any ambiguity.) For simplicity, we assume that is centered, , and that the variance of is unity, for all . The autocorrelation function of the field is denoted as

Following common usage, we sometimes refer to as the correlation structure of the field. Clearly, , and for any . We need only mild assumptions on the field :

  1. We assume that is mean-square continuous, which is equivalent to the continuity of at (see, for example, [15]).

  2. We assume that there is a neighborhood of in which is non-increasing.

Note that all results in this paper extend to fields in higher dimensions. We restrict the exposition to one-dimensional fields for clarity and to avoid the tedious notation required for higher dimensional fields.

1.1.2 Assumptions on the sensor network

We assume that sensors are placed at regular intervals in the segment , with sensor being placed at for . Sensors are assumed to be synchronized, and at each time , sensor can observe the value of the field at its location, for each . Sensor encodes a block of observations, into an index chosen from the set , where is the rate of sensor , which we state in the units of nats per discrete time unit. We assume that the blocklength is the same at all sensors. The messages of the sensors are assumed to be communicated to the fusion center over a shared, rate constrained, noiseless channel. The fusion center then uses the received data to produce a reconstruction of the field.

A coding scheme is a specification of the sampling and encoding method used at all sensors, as well as the reconstruction method used at the fusion center.
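
As a minimal sketch of the sampling model of Sections 1.1.1 and 1.1.2, the following Python snippet draws independent snapshots of a stationary, zero-mean, unit-variance Gaussian field at regularly spaced sensor locations. The exponential autocorrelation, the sensor grid `i/n` on the unit interval, and all function names are assumptions made purely for illustration; they are not specified by the text above.

```python
import numpy as np

def rho(tau):
    """Hypothetical autocorrelation (assumption): exponential decay with rho(0) = 1."""
    return np.exp(-np.abs(tau))

def sensor_locations(n):
    """Assumed regular grid of n sensor positions, i/n for i = 0, ..., n-1."""
    return np.arange(n) / n

def sensor_covariance(n):
    """Covariance of the n sensor samples: entry (i, j) equals rho(x_i - x_j)."""
    x = sensor_locations(n)
    return rho(x[:, None] - x[None, :])

def draw_snapshots(n, num_times, seed=0):
    """Draw num_times snapshots (one per row), independent across time."""
    rng = np.random.default_rng(seed)
    cov = sensor_covariance(n)
    return rng.multivariate_normal(np.zeros(n), cov, size=num_times)

snapshots = draw_snapshots(n=8, num_times=5)
```

Each row of `snapshots` plays the role of the vector of sensor observations at one time instant; a block of rows corresponds to one coding block.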

1.1.3 Error criterion

We refer to as the mean square error (MSE) of the reconstruction of the field at point and time . We measure the error in the reconstruction as the average (over a blocklength) integrated MSE, which is defined as

(1)

We study coding schemes in which, for all large enough blocklengths and a specified positive constant , the fusion center is able to reconstruct the field with an integrated MSE of less than , that is, schemes for which

(2)

1.1.4 Sum rate

In this paper, we describe coding schemes in which for any given value of in (2), the sum rate, , is bounded above by some constant independent of the number of sensors. The bound may in general depend on . This allows the per-sensor rate to be traded off with the number of sensors, so that for all large enough, the rate of each sensor is no more than a constant multiple of .

1.2 Contributions

Our main contributions are:

  1. We prove the existence of a distributed coding scheme in which, under the assumption that the correlation structure is known at each sensor, a sum rate that is independent of the number of sensors can be achieved.

  2. We design a simple coding scheme which can be implemented using scalar quantization at sensors, which does not require the sensors to have any information about the correlation structure, and which makes use of the fact that the sensors are synchronized to achieve a sum rate that is a constant independent of .

The latter scheme has the advantage of being simple enough to be implementable even with extremely resource-constrained sensors. However, the sum-rate achievable through this scheme is in general greater than the sum-rate achievable through distributed coding. Also, unlike distributed coding, this scheme entails a delay that increases with the number of sensors in the network.

2 Distributed coding

In this section we describe a distributed coding scheme which achieves the desired scaling.

2.1 Encoding and decoding

The scheme consists of encoders, , where is the encoder at sensor , and decoders, at the fusion center. For each , the rate of is assumed to be , and maps the block

of samples to an index chosen from , which is then communicated to the fusion center. While the output of encoder may not depend on the realizations of the observations at any other sensor , it is assumed that all sensors have knowledge of the statistics of the field (in particular, the function is assumed known at each sensor; in practice, the sensors need only know the vector ) and utilize this information to compress their samples. The decoders may use the messages received from all encoders to produce their reconstruction:

where is shorthand for , for and similarly for .

2.2 Reconstructing the continuous field

The reconstruction of the field for those values of where there are no sensors is done in a two-step fashion as follows. In the first step, the estimates of sensor samples are obtained as described above. Then, the value of the field between sensor locations is found by interpolation.

The interpolation for is based on the minimum MSE estimator for given the value of the sample closest to . Formally, for any , define if as the location of the sample closest to . Then, given , the minimum MSE estimate for is given by . The reconstruction of the field at the fusion center is obtained by replacing in this estimate with the quantized version ,

(3)

While this two-step reconstruction procedure is not optimal in general, it suffices for our purposes.
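
The interpolation step can be sketched as follows. For a zero-mean, unit-variance, jointly Gaussian pair, the minimum MSE estimate of the field at a point given a single sample is the sample scaled by the autocorrelation at their separation; the code applies this rule using the decoded sample nearest to each query point. The autocorrelation, the sensor grid, and the numerical sample values are illustrative assumptions.

```python
import numpy as np

def rho(tau):
    """Hypothetical autocorrelation (assumption): exponential decay with rho(0) = 1."""
    return np.exp(-np.abs(tau))

def interpolate_field(s, locations, sample_estimates):
    """Reconstruct the field at points s from the decoded sample nearest to each point."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    locations = np.asarray(locations, dtype=float)
    estimates = np.asarray(sample_estimates, dtype=float)
    # Index of the sensor closest to each query point.
    nearest = np.abs(s[:, None] - locations[None, :]).argmin(axis=1)
    # Scale the nearest decoded sample by the correlation at that separation.
    return rho(s - locations[nearest]) * estimates[nearest]

locations = np.arange(4) / 4                  # assumed sensor grid
estimates = np.array([0.3, -1.2, 0.8, 0.1])   # made-up decoded samples
field_hat = interpolate_field([0.0, 0.26, 0.9], locations, estimates)
```

At a sensor location the separation is zero, so the reconstruction coincides with the decoded sample itself.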

2.3 Error analysis

Define

(4)

Using the upper bound found in equation (23) (Appendix A) on the error of the coding scheme described above, we see that is met if , where

(5)

given that is large enough so that . It is easy to see that approaches from below as .

2.4 Sum rate

We now study the sum rate of the distributed coding scheme discussed above. We begin with finding the encoding rates required for achieving

(6)

for some constant .

The rate region is defined as the set of all tuples of rates for which there exist encoders and decoders , for , such that (6) can be met. If a rate vector belongs to the rate region, we say that the corresponding set of rates is achievable.

The rate-distortion problem in (6) is a Gaussian version of the Slepian-Wolf distributed coding problem [6]. Until recently, the rate region for this problem was not known for even two sources. An achievable region for two discrete sources first appeared in [16], and was extended to continuous sources in [7]. The extension to a general number of Gaussian sources appears in [17]. The two-source Gaussian distributed source coding problem was recently solved in [8], where the achievable region of [16] was found to be tight. The rate region is still not known for more than two sources. We use the achievable region found in [17].

Though the result is stated in [17] for individual distortion constraints on the sources, the extension to a more general distortion constraint is straightforward. We state the achievable region for distributed source coding in the form most useful to us in Theorem 1 below. In the statement of the theorem, we use to denote a Markov-chain relationship between random variables and , that is, conditioned on , is independent of . Also, for any , denotes the vector of those sources the indexes of which lie in the set and denotes the complement of the set .

Theorem 1

, where is the set of tuples of rates for which there exists a vector of random variables that satisfies the following conditions.

  1. ,    .

  2. ,    .

  3. such that

    (7)

Note that each of the rate-constraints in Theorem 1 forms some part of the boundary of the achievable region (see, for example, [17]). In particular, the constraint on the sum rate is not implied by any other set of constraints.

Constructing a vector satisfying the conditions of Theorem 1 corresponds to the usual construction of a forward channel for proving achievability in a rate-distortion problem. For each , can be thought of as the encoding of .

We now construct a that would suffice for our purposes. Consider a random vector that is independent of , and has a Gaussian distribution with mean and covariance matrix , where is the identity matrix. Then satisfies the Markov chain constraints of Theorem 1. To find a good bound on the sum rate, we now find a lower bound on the variance for which there exists an estimator which satisfies condition (7). Since is jointly Gaussian with , the estimator which minimizes the MSE in (7) is the linear estimator,

(8)

where and . Let be the largest value of for which the MSE achieved by this estimator satisfies (7). We prove below that for large enough , grows at least linearly with .
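
To make the forward-channel construction concrete, the sketch below models the auxiliary vector as the sensor samples plus independent white Gaussian noise and evaluates the linear MMSE estimator and its average per-sample MSE. The function name and the covariance used in the example are assumptions for illustration; only the standard Gaussian estimation formulas are relied upon.

```python
import numpy as np

def lmmse_forward_channel(cov, noise_var):
    """Linear MMSE estimate of X from U = X + Z, with Z ~ N(0, noise_var * I).

    Returns the gain matrix G (so that the estimate is G @ U) and the
    average per-sample mean square error.
    """
    n = cov.shape[0]
    # G = cov (cov + noise_var I)^{-1}; both matrices are symmetric,
    # so G is the transpose of solve(cov + noise_var I, cov).
    gain = np.linalg.solve(cov + noise_var * np.eye(n), cov).T
    # Error covariance of the estimate: cov - G cov.
    err_cov = cov - gain @ cov
    return gain, float(np.trace(err_cov) / n)

# Example with uncorrelated unit-variance samples (an illustrative choice).
gain, mse = lmmse_forward_channel(np.eye(3), noise_var=1.0)
```

For correlated samples the same noise variance yields a smaller error than in the uncorrelated example, which is what allows the noise variance to grow with the number of sensors.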

Lemma 1

Let be a symmetric autocorrelation function such that and a threshold exists for which

  1. if and

  2. the inequality holds.

Then

Note: The second condition can be met for all since as .

Proof: We call a value of allowable if the expected reconstruction error in (7), with , is less than . We find the largest for the error criterion: for each , which is more stringent than the average error requirement of (7).

Let us consider the estimation of . Since is the best linear estimate of from the data , any other linear estimator cannot result in a smaller expected MSE. We take advantage of this observation and choose a linear estimator that although suboptimal, is simple to analyze and yet suffices to establish the lemma.

Our estimator for shall be the scaled average , where is a parameter to be optimized shortly. To estimate for , simply substitute the samples used with those whose indexes lie in the set (or, for samples at the right edge of the interval , ; this does not lead to any change in what follows because of the stationarity of the field).

It is not difficult to see that

(10)

where we have used the inequality for and the fact that the greatest integer not greater than is at least . The value of that makes the bracketed expression in (10) smallest is equal to (we do not optimize the entire expression for simplicity). Substitution of this value yields

Now let be sufficiently small so that , and let be sufficiently large so that . We can always do this since only depends on and on the autocorrelation function. Now suppose that , then

The above implies that for sufficiently large, . Taking the liminf, we obtain that for all sufficiently small ,

Since can be arbitrarily small, we obtain the desired conclusion.

The purpose of this Lemma is only to establish that grows at least linearly with . The constants presented were chosen for simplicity of presentation.

The following is our main result on the rate of distributed coding:

Proposition 1

The sum rate of the distributed coding scheme described above is bounded above by a constant, independent of .

Proof: Consider a vector Gaussian channel with input and output , , where is as above, and where the power constraint on the input is given by . Since is distributed , the capacity of this channel,

is equal to (see, for example, [18]).

Let be any number smaller than . We know from Section 2.3 that there is an such that for , . Further, from Lemma 1, we know that there exists some and a constant such that for , . Clearly, is a non-decreasing function of , and therefore for , . It then follows that for ,

Then, using the inequality , and using the result of Theorem 1 to substitute for , we see that

is achievable.

The constants in Proposition 1 have been chosen for simplicity. In general, the rates achievable by distributed coding are smaller than the bound found in Proposition 1.
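
The boundedness of the sum rate can be checked numerically. The sketch below assumes that the sum-rate constraint takes the Gaussian mutual information form for the forward channel (samples plus independent white noise), and that the noise variance grows linearly in the number of sensors with a hypothetical slope `c`. The autocorrelation, the sensor grid, and the value of `c` are illustrative assumptions. Since log(1 + x) is at most x and the trace of the covariance equals the number of sensors, the rate never exceeds n / (2 * noise_var), which equals 1 / (2c) here.

```python
import numpy as np

def rho(tau):
    """Hypothetical autocorrelation (assumption): exponential decay with rho(0) = 1."""
    return np.exp(-np.abs(tau))

def sensor_covariance(n):
    """Covariance of samples taken on the assumed grid i/n, i = 0, ..., n-1."""
    x = np.arange(n) / n
    return rho(x[:, None] - x[None, :])

def gaussian_sum_rate(cov, noise_var):
    """Mutual information (nats) between X ~ N(0, cov) and X + Z, Z ~ N(0, noise_var I)."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + cov / noise_var)
    return 0.5 * logdet

c = 0.05  # hypothetical slope of the noise variance in n
results = []
for n in (10, 40, 160):
    noise_var = c * n
    rate = gaussian_sum_rate(sensor_covariance(n), noise_var)
    bound = n / (2.0 * noise_var)  # equals 1 / (2c), independent of n
    results.append((n, rate, bound))
    print(n, rate, bound)
```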

3 Comparison with a reference scheme

In this section, we compare the rate of the distributed coding scheme discussed in Section 2 with a reference scheme, which for reasons that will become apparent below, we call centralized coding.

The scheme consists of one centralized encoder , which has access to samples taken at all sensors at times , and decoders, at the fusion center. The encoder maps the samples of the sensors, , into an index chosen from the set , where is the rate of the centralized scheme, and communicates this index to the fusion center. The decoder at the fusion center reconstructs the samples from sensor from the messages received from the centralized encoder,

for .

At the fusion center, the reconstruction of the field is obtained in the same two-step manner described in Section 2.2: the fusion center constructs estimates of the samples , for from the message received from the centralized encoder, and then interpolates between samples using (3).

Let be the smallest rate for which there exists an encoder and decoders such that the integrated MSE (1) achieved by the above scheme satisfies the constraint (2). Then, it is clear that is a lower bound on the rates of all schemes which use the two-step reconstruction procedure of Section 2.2. In this section we bound the excess rate of the distributed coding scheme of Section 2 over the rate of the centralized scheme.

3.1 Error analysis

Using the lower bound in Appendix A, equation (24), on the error (1) in terms of of (4) we conclude that for large enough, if , then , where

Note that approaches from above as .

3.2 Bounding the rate loss

Now, consider

(11)

From Section 3.1, it is clear that the rate of the centralized coding scheme, satisfies, for any ,

We now use techniques similar to those in [19] to bound the redundancy of distributed coding over the rate of joint coding. Let be as in Proposition 1. Expanding in two ways, we get , so that

Since , we have . Subject to the constraint in (11), is upper bounded by the capacity of a parallel Gaussian channel, with noise and input , the power constraint on which is given by . The capacity of this channel is [18] , and therefore from (3.2) and the definition (11) of as the rate-distortion achieving random vector, we get

where the second inequality follows because . From Section 3.1, we know that for any , there is a large enough so that for all , , and we can choose the variance of the entries of to be at least , where is as in Lemma 1, while still ensuring that meets the requirements on the auxiliary random variable of Theorem 1. Therefore, substituting for , and using Lemma 1 and the result of Section 3.1 we get that for any , there is an large enough so that for all ,

(13)

We conclude that the rate of the distributed coding scheme of Section 2 is no more than a constant (independent of ) more than the rate of a centralized coding scheme with the same reconstruction procedure. Again, the constant in (13) has been chosen for simplicity of presentation and is in general much larger than the actual excess of the rate of the distributed coding scheme (see Section 5).

4 Point-to-point coding

The distributed coding scheme studied in Section 2 shows that the tradeoff of sensor numbers to sensor accuracy is achievable. However, it may not be feasible to implement complicated distributed coding schemes in simple sensors. In this section we show that if the sensors are synchronized and if a delay that increases linearly with the number of sensors is tolerable, then the desired tradeoff can be achieved by a simple scheme in which encoding can be performed at sensors without any knowledge of the correlation structure of the field.

In this scheme, we partition the interval into equal sized sub-intervals, ,,. We specify later, but assume that sensors are placed uniformly in . We assume that divides for simplicity (so that there are an integer number, , of samples in each interval).

Since the somewhat involved notation may obscure the simple idea behind the scheme, we explain it before describing the scheme in detail. We consider time in blocks of duration units each. The scheme operates overall with a blocklength of , that is, blocks, for some integer . Each sensor is active exactly once in any time interval that is units in duration. A sensor samples the field at its location only at those times when it is active. Each sensor uses a point-to-point code of blocklength and rate nats per active time unit. The code is chosen appropriately so as to meet the distortion constraint. However, since the sensor is active only in out of time units, the rate of the code per time-step is only nats. We show below that the desired distortion can be achieved with a rate that is independent of and therefore the desired scaling can be achieved by the above scheme.

We now describe the scheme in detail. Consider the time instants . Each sensor uses a code of blocklength , which is constructed from a code of blocklength , as follows. For each in and each in , sensor (which is the -th sensor from the left in the sub-interval , and is at location ) samples the field only at times . It uses a code of rate , to be specified below, to map the samples to an element of the set . The rate per-time unit of each sensor is therefore nats.

The fusion center consists of decoders, one for each sensor. Decoder constructs estimates of the samples encoded by sensor using only messages received from sensor . Then, for each time in , the fusion center has reconstructions

that is, one reconstruction for each sub-interval.

For any , we denote the location of the (unique) sensor active within the interval to which belongs by . For each time instant , the fusion center reconstructs the field for as

where is the decoded sample at the fusion center of the sensor at at time .
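
The activation pattern can be illustrated with a short sketch. It assumes a round-robin schedule in which, at every time step, the sensor occupying the same position within each sub-interval is active; the function name and the zero-based indexing are our own, and the exact activation times used in the scheme may differ from this assumption.

```python
def active_sensors(t, n, k):
    """Indices of the sensors active at time t under an assumed round-robin schedule.

    n is the number of sensors and k the number of sub-interval, with k dividing n.
    Sensor j * (n // k) + m is the m-th sensor (zero-based) in sub-interval j.
    """
    if n % k != 0:
        raise ValueError("k must divide n")
    per_interval = n // k
    slot = t % per_interval  # position within each sub-interval that is active now
    return [j * per_interval + slot for j in range(k)]

# Example: 12 sensors, 3 sub-intervals, so each sensor is active once every 4 steps.
schedule = [active_sensors(t, 12, 3) for t in range(8)]
```

Under this schedule the fusion center receives exactly one sample per sub-interval at every time step, while each individual sensor transmits only once per period.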

We show in Appendix B that

(14)
(15)

where, with some abuse of notation, we use to denote the set of time steps in which sensor is active. Note that the cardinality of is for each .

We now choose large enough so that and choose

(16)

The -blocklength code used at sensor for the times that it is active is a code that achieves the rate-distortion bound for the distortion constraint

as . It is well known that the rate of this code is nats per time step. It is clear from (15) and (16) that this scheme achieves the required distortion. Since the rate of each sensor in the overall scheme is nats per time step we have therefore constructed a scheme in which the bit rate of each sensor is

(17)

nats per time step. We can now choose to minimize the sum-rate .
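
The rate accounting above can be sketched numerically. The code uses the standard rate-distortion function of a unit-variance Gaussian source, and treats the per-sample distortion target as a free parameter rather than the specific value chosen in (16). The function names and the numbers in the example are illustrative assumptions.

```python
import math

def gaussian_rate_distortion(distortion, variance=1.0):
    """Rate-distortion function of a Gaussian source, in nats per sample."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log(variance / distortion))

def point_to_point_rates(n, k, sample_distortion):
    """Per-sensor and sum rates (nats per time step) for n sensors and k sub-intervals."""
    active_rate = gaussian_rate_distortion(sample_distortion)
    # Each sensor is active once every n / k time steps.
    per_sensor = active_rate * k / n
    # Summing over all n sensors removes the dependence on n.
    sum_rate = n * per_sensor
    return per_sensor, sum_rate

per_sensor, sum_rate = point_to_point_rates(n=100, k=5, sample_distortion=0.1)
```

Increasing the number of sensors with the number of sub-intervals held fixed lowers the per-sensor rate in proportion and leaves the sum rate unchanged.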

Further, it is well known (see [20, Section 5.1]) that using scalar quantization, each sensor can achieve distortion at rate , where is a small constant. For example, for Max-Lloyd quantizers (see [20, Section 5.1]), is less than bit.
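
As an illustration of the scalar quantizers referred to above, the following sketch designs a fixed-rate quantizer for a unit-variance Gaussian sample by running the Lloyd iteration on simulated data, and compares its fixed rate with the Shannon rate at the same distortion. The training-based design, the sample size, the number of levels, and all names are assumptions; this is not the construction of [20].

```python
import numpy as np

def lloyd_max(samples, levels, iters=50):
    """Design a scalar quantizer by Lloyd iterations on training samples.

    Returns the sorted codebook and the empirical mean square error.
    """
    samples = np.asarray(samples, dtype=float)
    # Initialize the codebook at evenly spaced quantiles of the data.
    codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Nearest-neighbor cells are delimited by midpoints between codewords.
        thresholds = 0.5 * (codebook[:-1] + codebook[1:])
        cells = np.searchsorted(thresholds, samples)
        # Move each codeword to the centroid of its cell.
        for j in range(levels):
            members = samples[cells == j]
            if members.size > 0:
                codebook[j] = members.mean()
    thresholds = 0.5 * (codebook[:-1] + codebook[1:])
    cells = np.searchsorted(thresholds, samples)
    mse = float(np.mean((samples - codebook[cells]) ** 2))
    return codebook, mse

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
codebook, mse = lloyd_max(samples, levels=4)
fixed_rate_nats = np.log(4)                  # rate of a 4-level fixed-rate quantizer
shannon_rate_nats = 0.5 * np.log(1.0 / mse)  # Gaussian R(D) at the same distortion
```

The difference between `fixed_rate_nats` and `shannon_rate_nats` is the rate penalty of this particular fixed-rate design; entropy coding of the quantizer output would reduce it.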

Therefore, we conclude that it is indeed possible to achieve the desired tradeoff between sensor numbers and the per-sensor rate even when the sensors encode their measurements using appropriate scalar quantizers, given that we also make use of the synchronization between sensors to activate sensors appropriately. This is in contrast to the conclusions of [1], where full use of synchronization is not made, and therefore it is found that the above tradeoff is not achievable with scalar quantization.

5 Numerical examples

In this section we give numerical examples of the rates of the coding schemes discussed in Section 2, Section 3 and Section 4. The two fields we consider as examples are (1) a (spatially) band-limited Gaussian field, for which , where , and (2) a Gauss-Markov field, for which .

For these fields, we numerically find the largest value of the variance of for which the error for the estimator in (8) is no more than the distortion of (5), with . The resulting values are shown in Figure 1. We see that for large values of , is indeed approximately linear in .
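
A search of this kind can be sketched as follows. Because the MSE of the linear estimator increases with the noise variance, the largest admissible noise variance can be found by bisection. The autocorrelation, the sensor grid, the distortion target, and the function names are assumptions for illustration and do not reproduce the settings behind the figures.

```python
import numpy as np

def rho(tau):
    """Hypothetical autocorrelation (assumption): exponential decay with rho(0) = 1."""
    return np.exp(-np.abs(tau))

def sensor_covariance(n):
    """Covariance of samples taken on the assumed grid i/n, i = 0, ..., n-1."""
    x = np.arange(n) / n
    return rho(x[:, None] - x[None, :])

def average_mse(cov, noise_var):
    """Average per-sample MSE of the linear MMSE estimate of X from X + Z."""
    n = cov.shape[0]
    solved = np.linalg.solve(cov + noise_var * np.eye(n), cov)
    return float(np.trace(cov - cov @ solved) / n)

def largest_noise_var(cov, target, iters=200):
    """Largest noise variance whose average MSE does not exceed target (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Grow the upper end until it violates the target.
    while average_mse(cov, hi) <= target:
        hi *= 2.0
    # Bisection: lo always satisfies the target, hi always violates it.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if average_mse(cov, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

for n in (16, 32, 64):
    print(n, largest_noise_var(sensor_covariance(n), target=0.05))
```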

Figure 1: Linear increase of for large : (left) and (right). .

We compute the achievable sum rate of the distributed source coding scheme, which is equal to from Theorem 1, with the found above as the variance of the entries of . These rates are shown in Figure 2. For reference, we also show the lower bound on the rate of the centralized coding scheme computed in Section 3.

Figure 2: Rates of joint and distributed coding (in nats per snapshot) vs. number of sensors : (left) and (right). .

In comparison, on minimizing the rate (17) of the point-to-point coding scheme of Section 4, we find that the best sum rate for is nats for intervals, and that the best sum rate for is nats with intervals, which is significantly greater than the sum-rate of the distributed coding scheme found above. However, part of the reason for the large sum-rate of the point-to-point coding scheme is that our analysis exaggerates an edge-effect for the sake of simplicity: In Section 4 we estimated the value of the field at point at time using the sample that the fusion center has at time from the sub-interval that lies in. We could instead have used the sample closest to that is available at the fusion center at time , similar to what is done in Section 2 and Section 3. However, this would have meant dealing with the first and the last sub-interval differently, and therefore we did not follow the analysis outlined above. Without this edge effect, the rates of the point-to-point coding scheme are approximately half the rates found above, which are still considerably larger than the sum-rates of the distributed coding scheme.

6 Conclusions

We have studied the sum rate of distributed coding for the reconstruction of a random field using a dense sensor network. We have shown the existence of a distributed coding scheme which achieves a sum rate that is a constant independent of the number of sensors. Such a scheme is interesting because it allows us to achieve a per-sensor rate that decreases inversely with the number of sensors, and therefore to achieve small per-sensor rates using a large number of sensors.

In obtaining bounds on the sum rate of distributed coding, we made full use of the heavy correlation between samples of the field taken at positions that are close together. When the number of sensors is large, the redundancy in their data can be utilized by coding more and more coarsely: this corresponds to more noisy samples, and is manifested in the growth of the noise in the forward channel in Section 2. We believe that this technique of bounding the sum rate is of independent interest.

We have also shown that, contrary to what has been suggested in [1] and [3], it is indeed possible to design a scheme that achieves a constant sum rate with sensors that use only scalar quantizers, even without the use of distributed coding. This scheme, however, requires that we make appropriate use of the synchronization between the sensors, results in a delay in reconstruction which increases linearly with the number of sensors, and achieves rates that may be significantly higher than the rates achieved by distributed coding. The scheme is nevertheless interesting because its low complexity makes it easy to implement.

Acknowledgement

The first author thanks Prof. R. Srikant for many insightful comments on this work, and for his encouragement to work on this paper while the first author was at UIUC.

Appendix A Bounds on for the schemes in Section 2 and Section 3

We can write the error in reconstruction at any as

(18)

where and . Note that in the schemes described in Section 2 and Section 3, the encodings of all samples are used to obtain the estimate , and therefore is in general not independent of , for . As a result, and are in general not independent. In this appendix, we find upper and lower bounds on that hold for the schemes of Section 2 and Section 3.

Using the Cauchy-Schwarz inequality (for any two appropriately integrable random variables and , ), it is easy to see that

(19)
(20)
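
For reference, the way the Cauchy-Schwarz inequality produces two-sided bounds on the second moment of a sum can be written out as follows. The symbols A and B are placeholders for two square-integrable error terms and are our own notation; whether the bounds above take exactly this form is an assumption.

```latex
\begin{align*}
E\left[(A+B)^2\right] &= E[A^2] + 2\,E[AB] + E[B^2], \\
\left|E[AB]\right| &\le \sqrt{E[A^2]\,E[B^2]}, \\
\left(\sqrt{E[A^2]} - \sqrt{E[B^2]}\right)^2
  \;\le\; E\left[(A+B)^2\right]
  &\le\; \left(\sqrt{E[A^2]} + \sqrt{E[B^2]}\right)^2 .
\end{align*}
```

The upper and lower bounds follow by substituting the second line, with a plus or a minus sign respectively, into the first. No independence between A and B is needed.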

Now, note that . Therefore,

For large enough so that both and lie in the interval around in which is non-increasing (so that for , which holds because the function is decreasing in ), we get that