Lossy Computing of Correlated Sources with Fractional Sampling


Abstract

This paper considers the problem of lossy compression for the computation of a function of two correlated sources, both of which are observed at the encoder. Due to the presence of observation costs, the encoder is allowed to observe only subsets of the samples from both sources, with a fraction of such sample pairs possibly overlapping. The rate-distortion function is characterized for memoryless sources, and then specialized to Gaussian and binary sources for selected functions and with quadratic and Hamming distortion metrics, respectively. The optimal measurement overlap fraction is shown to depend on the function to be computed by the decoder, on the source statistics, including the correlation, and on the link rate. Special cases are discussed in which the optimal overlap fraction is the maximum or minimum possible value given the sampling budget, illustrating non-trivial performance trade-offs in the design of the sampling strategy. Finally, the analysis is extended to the multi-hop set-up with jointly Gaussian sources, where each encoder can observe only one of the sources.

1 Introduction

A battery-limited wireless sensor node consumes energy in both its channel coding and its source coding components. The energy expenditure of the channel coding component is due to the power amplifier and to processing steps related to communication; the source coding component, instead, consumes energy in the process of digitizing the information sources of interest through a cascade of acquisition, sampling, quantization and compression. It is also known that the overall energy spent for compression is generally comparable to that used for communication and that a joint design of compression and transmission is critical to improve the energy efficiency [1], [2]. We refer to the energy associated with the source coding component, i.e., measurement and compression of the sources, as “sensing energy”.

A reasonable and analytically tractable model for the sensing energy is obtained by assuming that the sensing cost is proportional to the number of source samples measured and compressed.1 In our previous work [5], in the presence of a constant per-sample sensing energy, we investigated the problem of minimizing the distortion in the reconstruction of independent Gaussian sources measured by a single integrated sensor under energy constraints on the channel and source coding components. Reference [5] reveals that, similarly to the channel coding counterpart set-up in [6], it is generally optimal to measure and process only a fraction of the source samples. We observe that this principle also underlies the compressive sensing framework [7]. In this work, instead, we consider a set-up with functional reconstruction requirements on correlated measured sources, as explained next.

Consider an encoder endowed with an integrated sensor that is able to measure two correlated discrete memoryless source sequences $S_1^n$ and $S_2^n$ through two different sensor interfaces, as shown in Figure 1. Following [5], we assume that measuring each sample of source $S_i$, $i=1,2$, entails a constant sensing energy cost per source sample. For simplicity, instead of having a total sensing energy budget for all the sources as in [5], we assume that the integrated sensor has a separate sensing energy budget (and thus a separate sampling budget) for either source. That is, the encoder can only measure a fraction $\theta_i$ of the samples from source $S_i$, $i=1,2$, with $0\le\theta_i\le1$. The encoder compresses the measured samples to $nR$ bits, where $R$ is the communication rate in bits per source sample. Based on the received bits, the decoder reconstructs a lossy version of a target function $T^n$ of the source sequences $S_1^n$ and $S_2^n$, which is calculated symbol-by-symbol as $T_j=f(S_{1j},S_{2j})$, $j=1,\ldots,n$. We refer to the above problem as lossy computing with fractional sampling. In Section VI, we will also consider the problem of multi-hop lossy computing with fractional sampling, which, as shown in Figure 2, differs from the integrated-sensor (point-to-point) problem in that sources $S_1$ and $S_2$ are measured by two distributed sensors connected by a finite-capacity link.

Figure 1: The encoder measures correlated sources $S_1$ and $S_2$ for a fraction of time $\theta_1$ and $\theta_2$, respectively, and the decoder estimates a function $T^n=f^n(S_1^n,S_2^n)$.
Figure 2: The multi-hop setup studied in Section VI: Encoder 1 and Encoder 2 measure correlated sources $S_1$ and $S_2$ for a fraction of time $\theta_1$ and $\theta_2$, respectively, and the decoder estimates a function $T^n=f^n(S_1^n,S_2^n)$.

A key aspect of the problem of lossy computing with fractional sampling is that the encoder is allowed to choose which samples to measure given the sampling budget $(\theta_1,\theta_2)$. To fix ideas, assume that we have $\theta_1=\theta_2=1/2$, so that only half of the samples can be observed from each source. As two extreme strategies, the encoder can either measure the same samples from both sources, say $S_{1j}$ and $S_{2j}$ for $j=1,\ldots,n/2$, or it can measure the first source for the first half of the samples, namely $S_{1j}$ for $j=1,\ldots,n/2$, and the second source for the remaining samples, namely $S_{2j}$ for $j=n/2+1,\ldots,n$. With the first sampling strategy, the encoder is able to directly calculate the desired function $T_j=f(S_{1j},S_{2j})$ for $j=1,\ldots,n/2$, while having no information (besides the prior distribution) about $T_j$ for the remaining samples. With the second strategy, instead, the encoder collects partial information about $T_j$ at all times in the form of samples from either source $S_1$ or source $S_2$.
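As a quick numerical illustration of this trade-off, consider the particular choice of zero-mean, unit-variance, jointly Gaussian sources with correlation coefficient $\rho$ and the sum function $T=S_1+S_2$ (both studied in Section IV), with unlimited rate, so that only the error due to unobserved samples remains. The following minimal Monte Carlo sketch, whose function names and parameters are purely illustrative, compares the two extreme strategies.

```python
import numpy as np

# Minimal sketch: infinite-rate comparison of the two extreme sampling
# strategies for T = S1 + S2, with (S1, S2) zero-mean, unit-variance,
# jointly Gaussian with correlation rho. Sampling budget theta1 = theta2 = 1/2.
rng = np.random.default_rng(0)
n, rho = 200_000, 0.5

cov = np.array([[1.0, rho], [rho, 1.0]])
s = rng.multivariate_normal([0.0, 0.0], cov, size=n)
s1, s2 = s[:, 0], s[:, 1]
t = s1 + s2

# Strategy A (full overlap): half the samples are known exactly (zero error),
# the other half are unobserved, so the best estimate is E[T] = 0.
mse_overlap = 0.5 * 0.0 + 0.5 * np.mean(t**2)

# Strategy B (no overlap): every sample is estimated from exactly one source,
# using the MMSE estimate E[T | S_i] = (1 + rho) * S_i.
err1 = t - (1 + rho) * s1
err2 = t - (1 + rho) * s2
mse_no_overlap = 0.5 * np.mean(err1**2) + 0.5 * np.mean(err2**2)

print(f"full overlap : {mse_overlap:.3f}  (theory: {1 + rho:.3f})")
print(f"no overlap   : {mse_no_overlap:.3f}  (theory: {1 - rho**2:.3f})")
```

For $\rho=0.5$, the no-overlap strategy is preferable ($1-\rho^2=0.75$ versus $1+\rho=1.5$), while the comparison is reversed for negative $\rho$, anticipating the conclusions of Section IV.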

Relating the discussed fractional sampling model to prior literature, we observe that, with full sampling of both sources, i.e., $\theta_1=\theta_2=1$, the encoder can directly calculate the function $T^n$ and the problem at hand reduces to the standard rate-distortion set-up (see, e.g., [3]). Instead, if the encoder can only measure one of the two sources, i.e., $(\theta_1,\theta_2)=(1,0)$ or $(\theta_1,\theta_2)=(0,1)$, the problem at hand becomes a special case of the indirect source coding set-up introduced in [8]. The model of fractional sampling is also related to that of compression with actions of [9], in which the decoder obtains side information by taking cost-constrained actions based on the message received from the encoder. Finally, various recent information-theoretic results on the functional reconstruction problem without sampling constraints can be found in [10] (see also references therein).

The main contributions and the organization of the paper are as follows. We formulate the problem of lossy computing with fractional sampling of correlated sources for the set-up in Figure 1 in Section II. After providing general expressions for the distortion-rate and rate-distortion functions in Section III, we specialize them to Gaussian sources with a weighted sum function in Section IV and to binary sources with arbitrary functions in Section V. As a result, various conclusions are drawn regarding the conditions under which the optimal sampling strategy prescribes the maximum or the minimum possible overlap between the samples measured from the two sources. In Section VI, we extend the analysis to the multi-hop set-up of Figure 2, in which sources $S_1$ and $S_2$ are measured by different encoders connected by a finite-capacity link.

2 System Model

In this section, we formally introduce the system model of interest for the point-to-point set-up of Figure 1. The multi-hop model will be introduced in Section VI. As shown in Figure 1, the encoder has access to two discrete memoryless source sequences $S_1^n$ and $S_2^n$, which consist of independent and identically distributed (i.i.d.) samples $S_{1j}$ and $S_{2j}$ with $S_{1j}\in\mathcal{S}_1$ and $S_{2j}\in\mathcal{S}_2$, $j=1,\ldots,n$, where $\mathcal{S}_1$ and $\mathcal{S}_2$ are the alphabet sets for $S_1$ and $S_2$, respectively. All alphabets are assumed to be finite unless otherwise stated. Due to the presence of observation costs, we assume the encoder can only sample a fraction $\theta_i$ of the samples for source $S_i$, with $0\le\theta_i\le1$ for $i=1,2$, where the sampled positions are determined prior to the observation of $(S_1^n,S_2^n)$. Given the i.i.d. nature of the sources, without loss of generality, we assume that the encoder measures the first $\theta_1$-fraction of samples of source $S_1$ and measures a $\theta_2$-fraction of samples of $S_2$ starting from sample $(\theta_1-\theta_{12})n+1$, as shown in Figure 3. The samples measured at the encoder from the two sources thus overlap for a fraction $\theta_{12}$, with $\theta_{12}$ satisfying

$$(\theta_1+\theta_2-1)^+\;\le\;\theta_{12}\;\le\;\min(\theta_1,\theta_2), \qquad\qquad (1)$$

with $0\le\theta_1\le1$ and $0\le\theta_2\le1$, where $(x)^+$ denotes $\max(x,0)$. We refer to the triple $(\theta_1,\theta_2,\theta_{12})$ as a sampling profile, and to the pair $(\theta_1,\theta_2)$ as the sampling budget.

Figure 3: Sampling profile $(\theta_1,\theta_2,\theta_{12})$ at the encoder: a fraction, $\theta_1-\theta_{12}$, of samples is measured only from source $S_1$; a fraction, $\theta_{12}$, of samples is measured from both sources; a fraction, $\theta_2-\theta_{12}$, of samples is measured only from source $S_2$; and the remaining fraction, $1+\theta_{12}-\theta_1-\theta_2$, of samples is not measured for either source ($0\le\theta_1,\theta_2\le1$, and $\theta_{12}$ as in (Equation 1)).

The decoder wishes to estimate a function $T^n=f^n(S_1^n,S_2^n)$, where $T_j=f(S_{1j},S_{2j})$ for $j=1,\ldots,n$. We let $d:\mathcal{T}\times\hat{\mathcal{T}}\to[0,\infty)$ be a distortion measure, where $\mathcal{T}$ and $\hat{\mathcal{T}}$ are the alphabet sets of the variables $T$ and $\hat T$, respectively. We assume, without loss of generality, that for each $t\in\mathcal{T}$ there exists a $\hat t\in\hat{\mathcal{T}}$ such that $d(t,\hat t)=0$. The link between the encoder and the decoder can support a rate of $R$ bits/sample. Formal definitions follow.

Definition 1: A code for the problem of lossy computing of two memoryless sources with fractional sampling consists of an encoder, which maps the measured $\theta_1$-fraction of source $S_1$ and the measured $\theta_2$-fraction of source $S_2$ into a message of rate $R$ bits per source sample (where the normalization is with respect to the overall number of samples $n$); and a decoder, which maps the message from the encoder into an estimate $\hat T^n$, such that the average distortion constraint is satisfied, i.e.,

$$\frac{1}{n}\sum_{j=1}^{n}\mathrm{E}\big[d(T_j,\hat T_j)\big]\;\le\;D. \qquad\qquad (2)$$

Definition 2: Given any sampling profile $(\theta_1,\theta_2,\theta_{12})$, a tuple $(R,D)$ is said to be achievable if, for any $\epsilon>0$ and sufficiently large $n$, there exists a code as in Definition 1 with rate $R$ and distortion $D+\epsilon$. The distortion-rate function for sampling profile $(\theta_1,\theta_2,\theta_{12})$ is defined as the infimum of all distortions $D$ such that $(R,D)$ is achievable, and the minimum achievable distortion for the same sampling profile is defined as the limit of the distortion-rate function as $R\to\infty$.

Definition 3: The distortion-rate function with sampling budget $(\theta_1,\theta_2)$ is defined as the minimum of the distortion-rate function over all sampling profiles, where the minimum is taken over all $\theta_{12}$ satisfying (Equation 1). Moreover, the minimum achievable distortion for the same sampling budget is defined as the corresponding limit as $R\to\infty$.

Similar definitions are used for the rate-distortion function. Specifically, the rate-distortion function for sampling profile $(\theta_1,\theta_2,\theta_{12})$ is defined as the infimum of all rates $R$ such that $(R,D)$ is achievable, and the rate-distortion function with sampling budget $(\theta_1,\theta_2)$ is defined as the minimum of the former over all $\theta_{12}$ satisfying (Equation 1).

In most of the paper, we consider the average distortion criterion (2). Following standard considerations, the results presented herein hold also under the definition of distortion whereby the probability that the distortion level $D$ is exceeded by an arbitrarily small amount vanishes as the block length grows large (i.e., as $n\to\infty$) [11]. Moreover, the worst-case average per-sample criterion is briefly considered in the Appendix.

3 Rate-Distortion Trade-Off with Fractional Sampling

In this section, we characterize the distortion-rate functions for a given sampling profile and for a given sampling budget defined above, as well as their rate-distortion counterparts. To elaborate, we first provide some standard definitions.

Definition 4: The standard distortion-rate function for source $T$ is defined as $D_T(R)=\min_{p(\hat t|t):\,I(T;\hat T)\le R}\mathrm{E}[d(T,\hat T)]$ [3]. Moreover, the indirect distortion-rate function for the compression of source $T$ when only source $S_i$ is observed at the encoder is $D_{T|S_i}(R)=\min_{p(\hat t|s_i):\,I(S_i;\hat T)\le R}\mathrm{E}[d(T,\hat T)]$. The distortion $D_{T|S_i}(\infty)$ is defined as $\lim_{R\to\infty}D_{T|S_i}(R)=\min_{\hat t(\cdot)}\mathrm{E}[d(T,\hat t(S_i))]$, for $i=1,2$, with the minimum taken over all functions $\hat t:\mathcal{S}_i\to\hat{\mathcal{T}}$. Finally, the distortion attainable with no observation is $\min_{\hat t\in\hat{\mathcal{T}}}\mathrm{E}[d(T,\hat t)]$.

We similarly define the corresponding rate-distortion functions $R_T(D)$ and $R_{T|S_i}(D)$, $i=1,2$.

The rate-distortion function ( ?), and the corresponding distortion-rate function ( ?), can be obtained by noting that the rate-distortion problem with fractional sampling at hand can in fact be viewed as a special case of the conditional rate-distortion problem [12]. To this end, let $Q$ be a time-sharing random variable independent of $S_1$ and $S_2$ and distributed as $\Pr[Q=1]=\theta_1-\theta_{12}$, $\Pr[Q=12]=\theta_{12}$, $\Pr[Q=2]=\theta_2-\theta_{12}$, $\Pr[Q=0]=1-\theta_1-\theta_2+\theta_{12}$. Also, let $X=S_1$ if $Q=1$, $X=(S_1,S_2)$ if $Q=12$, $X=S_2$ if $Q=2$, and $X=\emptyset$ if $Q=0$. For any given sampling profile $(\theta_1,\theta_2,\theta_{12})$, the rate-distortion problem at hand reduces to a standard Wyner-Ziv problem with $X$ as the source available at the encoder and $Q$ as side information available both at the encoder and the decoder. Hence, the rate-distortion function in (5) is given as $\min I(X;\hat T\,|\,Q)$, where the minimum is taken over the set of all conditional distributions $p(\hat t\,|\,x,q)$ for which the joint distribution satisfies the expected distortion constraint $\mathrm{E}[d(T,\hat T)]\le D$. This expression can be easily evaluated to ( ?).

Note that the number of samples in each fraction in Figure 3 grows to infinity as $n\to\infty$ as long as its corresponding fraction (i.e., $\theta_1-\theta_{12}$, $\theta_{12}$, $\theta_2-\theta_{12}$, or $1-\theta_1-\theta_2+\theta_{12}$) is non-zero. Therefore, these fractions can be considered separately without loss of optimality.

Note that in Lemma ?, rate $R_i$ is assigned for the description of the $(\theta_i-\theta_{12})$-fraction of samples in which only source $S_i$ is measured, $i=1,2$, while rate $R_{12}$ is assigned for the description of the $\theta_{12}$-fraction of samples in which both sources are measured (recall Figure 3). Distortions $D_1$, $D_2$ and $D_{12}$ are the weighted average per-sample distortions in the reconstruction of $T$ at the decoder for the corresponding fractions of samples.
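In this notation, and under the assumption that the per-fraction rates $R_1$, $R_{12}$, $R_2$ are measured in bits per sample of the corresponding fraction, the resulting characterization can be sketched as

$$D(R)\;=\;\min\Big[(\theta_1-\theta_{12})\,D_{T|S_1}(R_1)+\theta_{12}\,D_T(R_{12})+(\theta_2-\theta_{12})\,D_{T|S_2}(R_2)+(1-\theta_1-\theta_2+\theta_{12})\,D_0\Big],$$

where the minimum is over non-negative $(R_1,R_{12},R_2)$ with $(\theta_1-\theta_{12})R_1+\theta_{12}R_{12}+(\theta_2-\theta_{12})R_2\le R$, and $D_0$ denotes the minimum expected distortion attainable with no observation (Definition 4). This form is a sketch consistent with the discussion above, not a verbatim restatement of Lemma ?.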

It follows by considering the monotonicity of ( ?) with respect to .

It follows from the operational definitions given in Definition 1, similarly to [3].

4 Gaussian Sources

In this section, we focus on the case in which sources $S_1$ and $S_2$ are jointly Gaussian, zero-mean, unit-variance and correlated with coefficient $\rho$, with $|\rho|\le1$. The decoder wishes to compute a weighted sum function $T=aS_1+bS_2$, with weights $a,b\in\mathbb{R}$, under the mean square error (MSE) distortion measure $d(t,\hat t)=(t-\hat t)^2$. In the following, we study two specific choices for the weights $a$ and $b$, resulting in the functions $T=S_1$ and $T=S_1+S_2$, respectively. These two cases are selected in order to illustrate the impact of the choice of the function on the optimal sampling strategy. The discussion can be extended with appropriate modifications to arbitrary choices of the weights $(a,b)$.

4.1 Computation of $T=S_1$

See Appendix Section 8.

Proposition ? confirms the intuition that, if the receiver is interested in source 1 only, i.e., $T=S_1$, the encoder should keep the fraction of time $\theta_{12}$ during which both sources are measured simultaneously as small as possible. Moreover, if $R$ is below a threshold $R_{\mathrm{th}}$, the entire rate is used to describe only the $\theta_1$-fraction of samples measured from source $S_1$; otherwise, both the $\theta_1$-fraction of source $S_1$ and the non-overlapping $(\theta_2-\theta_{12})$-fraction of source $S_2$ are described at positive rates. Using a variant of the classic reverse water-filling solution [3], the threshold value of the rate, below which only the independent source with the larger variance, namely the $\theta_1$-fraction, is described, can be obtained in closed form (see the sketch below). This threshold only depends on the sampling fraction of the source with the larger variance, $\theta_1$, and on the ratio of the variances of the $\theta_1$-fraction and the $(\theta_2-\theta_{12})$-fraction. The reader is referred to Appendix A and to [5] for more details on how rate $R$ is optimally allocated between the two fractions of source samples.
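A sketch of how the threshold arises: in the reverse water-filling argument, the $\theta_1$-fraction is a direct observation of $T=S_1$, with variance $1$, while the non-overlapping $(\theta_2-\theta_{12})$-fraction only reveals the component of $S_1$ that is linearly predictable from $S_2$, with variance $\rho^2$. With a common water level $\lambda$, the total rate reads

$$R\;=\;\frac{\theta_1}{2}\log_2\frac{1}{\lambda}\;+\;\frac{\theta_2-\theta_{12}}{2}\left[\log_2\frac{\rho^2}{\lambda}\right]^+,$$

so the second fraction is described at positive rate only when $\lambda<\rho^2$, i.e., when $R>R_{\mathrm{th}}=\frac{\theta_1}{2}\log_2\frac{1}{\rho^2}$, which depends only on $\theta_1$ and on the variance ratio $1/\rho^2$.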

4.2 Computation of $T=S_1+S_2$

We now consider the case in which the desired function is $T=S_1+S_2$. Note that $T$ is a Gaussian random variable with zero mean and variance $2(1+\rho)$, and that $T$ and $S_1$ (or $S_2$) are jointly Gaussian with correlation coefficient $\sqrt{(1+\rho)/2}$. Moreover, since $T$ has zero variance for $\rho=-1$, it is enough to focus on the interval $\rho\in(-1,1]$. Finally, we recall the distortion-rate function $D_T(R)$ for $T$ [3] and the indirect distortion-rate function $D_{T|S_i}(R)$, $i=1,2$ [13], which, for the sources at hand, evaluate as sketched below.
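For the sources at hand, these standard expressions can be sketched as follows. Since $\mathrm{E}[T\,|\,S_i]=(1+\rho)S_i$ and $\mathrm{Var}(T\,|\,S_i)=2(1+\rho)-(1+\rho)^2=1-\rho^2$, one has

$$D_T(R)\;=\;2(1+\rho)\,2^{-2R},\qquad D_{T|S_i}(R)\;=\;(1-\rho^2)+(1+\rho)^2\,2^{-2R},\qquad i=1,2,$$

for $R\ge0$, where the indirect expression follows by adding the MMSE $1-\rho^2$ to the distortion incurred in describing the estimate $\mathrm{E}[T|S_i]$, whose variance is $(1+\rho)^2$.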

Given sampling budget $(\theta_1,\theta_2)$, the distortion-rate function for computing $T=S_1+S_2$ is given by the expression in ( ?), where the minimization is taken over all $\theta_{12}$ satisfying (Equation 1) and over all rate allocations across the measured fractions satisfying the total rate budget $R$.

This proposition follows from Lemma ? using arguments similar to the ones in Appendix Section 8.

In order to obtain further analytical insight into ( ?) and the optimal sampling strategy, we now consider some special cases of interest.

This proposition is easily obtained from Lemma ?. It says that, if the sources $(S_1,S_2)$ have positive correlation, i.e., $\rho>0$, and there are no rate limitations ($R\to\infty$), the MSE distortion increases linearly with $\theta_{12}$, and it is thus optimal to set $\theta_{12}$ to the smallest possible value $(\theta_1+\theta_2-1)^+$. In contrast, if $\rho<0$, the MSE distortion decreases linearly with $\theta_{12}$, and thus the optimal $\theta_{12}$ is the largest possible value, $\min(\theta_1,\theta_2)$. This shows the relevance of the source correlation in designing the optimal sampling strategy.
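A sketch of the underlying calculation, using the per-fraction minimum distortions for $T=S_1+S_2$ (zero for the overlapping fraction, $\mathrm{Var}(T|S_i)=1-\rho^2$ for the fractions in which a single source is measured, and $\mathrm{Var}(T)=2(1+\rho)$ for the unmeasured fraction):

$$D(\infty)\;=\;(\theta_1+\theta_2-2\theta_{12})(1-\rho^2)\;+\;(1-\theta_1-\theta_2+\theta_{12})\cdot2(1+\rho),\qquad\frac{\partial D(\infty)}{\partial\theta_{12}}\;=\;2\rho(1+\rho),$$

which is positive for $\rho>0$ and negative for $-1<\rho<0$, in agreement with the statement above.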

The general conclusions about the optimal sampling strategies discussed above for infinite rate can be extended to finite rates in certain regimes. Specifically, Proposition ? below states that if $\rho\le0$, then, just as in the infinite-rate case of Proposition ?, the encoder should set $\theta_{12}$ to be as large as possible, i.e., $\min(\theta_1,\theta_2)$, irrespective of the value of $R$. Furthermore, Proposition ? below demonstrates that, for sufficiently small rates, the optimal overlap tends to be maximum, i.e., $\theta_{12}^*=\min(\theta_1,\theta_2)$, for a larger range of correlation coefficients than $\rho\le0$. This is mainly because, when the rate $R$ is small enough, it is generally more efficient to use the available rate to describe $T$ directly during the overlapping $\theta_{12}$-fraction, rather than to describe $T$ indirectly based on observations of $S_1$ or $S_2$ alone.

The proof is obtained by solving ( ?) for .

Given , for any feasible satisfying (Equation 1), we always have . In this case, for any given , applying the standard Lagrangian method to ( ?), we obtain . Substituting into ( ?) and considering the monotonicity of function with respect to , we can show that the optimal overlap fraction is given by , leading to the distortion-rate function as stated in the proposition.

4.3 Numerical Results

In this subsection, we numerically evaluate the distortion-rate function with fractional sampling for the computation of the function $T=S_1+S_2$. Figure 4 and Figure 5 show the minimum MSE distortion and the optimal overlap fraction $\theta_{12}^*$ versus the rate $R$, respectively, for $\rho\in\{-0.5,0,0.5\}$ and sampling budget $(\theta_1,\theta_2)=(0.5,0.75)$. The curves are obtained by numerically solving the optimization in ( ?). It can be seen from Figure 5 that, as predicted by Proposition ?, the optimal overlap fraction is equal to the maximum possible fraction $\min(\theta_1,\theta_2)=0.5$ for $\rho=-0.5$ and $\rho=0$. Moreover, for $\rho=0.5$ and sufficiently small rates $R$, as described in Proposition ?, the optimal overlap fraction equals the maximum overlap $0.5$. However, as $R$ increases, $\theta_{12}^*$ drops to the minimum value $(\theta_1+\theta_2-1)^+=0.25$, which is consistent with Proposition ?.
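Curves of this kind can be reproduced with the simple numerical sketch below. The per-fraction decomposition it assumes (direct description of $T$ on the overlapping fraction, indirect descriptions based on $S_1$ or $S_2$ alone on the remaining measured fractions, and a shared rate budget) follows the discussion of Section III but is an illustrative assumption rather than a transcription of ( ?).

```python
import numpy as np
from itertools import product

def D_direct(r, rho):      # distortion-rate for T = S1 + S2 observed directly
    return 2 * (1 + rho) * 2.0 ** (-2 * r)

def D_indirect(r, rho):    # indirect (remote) distortion-rate when only S_i is seen
    return (1 - rho**2) + (1 + rho) ** 2 * 2.0 ** (-2 * r)

def distortion(R, rho, th1=0.5, th2=0.75, grid=60):
    """Grid-search sketch of the distortion-rate optimization over the
    overlap fraction and the rate split across the three measured fractions."""
    best, best_th12 = np.inf, None
    th12_min, th12_max = max(0.0, th1 + th2 - 1), min(th1, th2)
    for th12 in np.linspace(th12_min, th12_max, 11):
        f1, f12, f2 = th1 - th12, th12, th2 - th12
        f0 = 1 - th1 - th2 + th12                      # unmeasured fraction
        # split the total rate budget R (bits per overall sample) across fractions
        for a, b in product(np.linspace(0, 1, grid), repeat=2):
            if a + b > 1:
                continue
            r1, r12, r2 = a * R, b * R, (1 - a - b) * R
            d = (f1 * D_indirect(r1 / f1, rho) if f1 > 0 else 0) \
              + (f12 * D_direct(r12 / f12, rho) if f12 > 0 else 0) \
              + (f2 * D_indirect(r2 / f2, rho) if f2 > 0 else 0) \
              + f0 * 2 * (1 + rho)
            if d < best:
                best, best_th12 = d, th12
    return best, best_th12

for rho in (-0.5, 0.0, 0.5):
    d, th12 = distortion(R=1.0, rho=rho)
    print(f"rho={rho:+.1f}:  D≈{d:.3f}  optimal overlap ≈ {th12:.2f}")
```

With the budget $(\theta_1,\theta_2)=(0.5,0.75)$ used in the figures, the overlap fraction is constrained to the interval $[0.25,0.5]$.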

Figure 4: Distortion-rate function for computing $T=S_1+S_2$, with $(S_1,S_2)$ jointly Gaussian with correlation coefficient $\rho=-0.5,0,0.5$, respectively. The sampling budget is $(\theta_1,\theta_2)=(0.5,0.75)$.
Figure 5: Optimal overlap fraction $\theta_{12}^*$ that minimizes the average expected distortion as a function of rate $R$ for computing $T=S_1+S_2$, with $(S_1,S_2)$ jointly Gaussian with correlation coefficient $\rho=-0.5,0,0.5$, respectively. The sampling budget is $(\theta_1,\theta_2)=(0.5,0.75)$.

5 Binary Sources

In this section, we consider binary sources, so that $\mathcal{S}_1=\mathcal{S}_2=\{0,1\}$, and $(S_1,S_2)$ is a doubly symmetric binary source (DSBS), i.e., we have $\Pr[S_1=0,S_2=0]=\Pr[S_1=1,S_2=1]=(1-p)/2$ and $\Pr[S_1=0,S_2=1]=\Pr[S_1=1,S_2=0]=p/2$, where $0\le p\le1/2$. In other words, $S_2$ is the output of a binary symmetric channel with crossover probability $p$ corresponding to the input $S_1$. We take the Hamming distortion as the distortion measure, i.e., $d(t,\hat t)=1$ if $t\neq\hat t$ and $d(t,\hat t)=0$ otherwise. Since all non-trivial binary functions are equivalent, up to relabeling, to either the exclusive OR or the AND [14], it suffices to consider only these two options for the function $f$: (i) the exclusive OR, or binary sum, i.e., $T=S_1\oplus S_2$; (ii) the AND, or binary product, i.e., $T=S_1\otimes S_2$. In the following, we focus on deriving the rate-distortion function for convenience, since in general it takes a simpler analytical form as compared to the distortion-rate function.

5.1 Computation of $T=S_1\oplus S_2$

Since $T=S_1\oplus S_2$ is a Bernoulli($p$) random variable independent of $S_1$ and of $S_2$, the observation of either $S_1$ or $S_2$ alone is not useful for computing $T$. Thus, one should choose the overlap fraction to be as large as possible, i.e., $\theta_{12}^*=\min(\theta_1,\theta_2)$. The rate-distortion function ( ?) then follows immediately from the rate-distortion function of the binary random variable $T$ [3].
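For reference, the rate-distortion function of the Bernoulli($p$) variable $T$ under Hamming distortion is $R_T(D)=h_2(p)-h_2(D)$ for $0\le D\le p$ and $R_T(D)=0$ otherwise, where $h_2(\cdot)$ is the binary entropy function [3]. A sketch of the resulting expression with fractional sampling, assuming $\theta_{12}=\min(\theta_1,\theta_2)$ and that each sample in the remaining $(1-\theta_{12})$-fraction is estimated as $\hat T=0$, contributing distortion $p$:

$$R_{\theta_1,\theta_2}(D)\;=\;\theta_{12}\Big[h_2(p)-h_2\Big(\frac{D-(1-\theta_{12})\,p}{\theta_{12}}\Big)\Big],\qquad(1-\theta_{12})\,p\;\le\;D\;\le\;p.$$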

5.2 Computation of $T=S_1\otimes S_2$

In this subsection, we focus on the binary product $T=S_1\otimes S_2$, which is Bernoulli distributed with $\Pr[T=1]=(1-p)/2$. For convenience, we start by finding the minimum possible distortion at the decoder given the sampling budget $(\theta_1,\theta_2)$, i.e., the quantity defined in Lemma ?, and the minimum required rate to achieve it. Then, we proceed to derive the rate-distortion function. For a given sampling budget $(\theta_1,\theta_2)$, the minimum achievable distortion for computing $T=S_1\otimes S_2$ is given by the expression in ( ?), where the optimal overlap fraction $\theta_{12}^*$ equals $(\theta_1+\theta_2-1)^+$ in one regime of the parameters and $\min(\theta_1,\theta_2)$ in the complementary regime. Moreover, this distortion can be achieved as long as the rate $R$ exceeds a finite threshold.
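The per-fraction distortion floors that enter this result can be checked with the brute-force sketch below, which enumerates deterministic estimators (sufficient for minimizing expected Hamming distortion) under the DSBS model; the helper names are illustrative.

```python
from itertools import product

def dsbs_pmf(p):
    """Joint pmf of a doubly symmetric binary source with Pr[S1 != S2] = p."""
    return {(0, 0): (1 - p) / 2, (1, 1): (1 - p) / 2,
            (0, 1): p / 2, (1, 0): p / 2}

def min_distortion(p, observed):
    """Minimum Hamming distortion for T = S1 AND S2 when `observed` in
    {'both', 's1', 's2', 'none'} is available to the estimator."""
    pmf = dsbs_pmf(p)
    keys = {'both': lambda s: s, 's1': lambda s: s[0],
            's2': lambda s: s[1], 'none': lambda s: 0}[observed]
    contexts = sorted({keys(s) for s in pmf})
    best = float('inf')
    # enumerate all deterministic estimators: context -> {0, 1}
    for guesses in product((0, 1), repeat=len(contexts)):
        rule = dict(zip(contexts, guesses))
        err = sum(prob for s, prob in pmf.items()
                  if rule[keys(s)] != (s[0] & s[1]))
        best = min(best, err)
    return best

p = 0.2
for obs in ('both', 's1', 's2', 'none'):
    print(obs, min_distortion(p, obs))
```

For $p=0.2$ the script returns $0$, $p/2=0.1$, $p/2=0.1$ and $(1-p)/2=0.4$ for the four observation patterns, respectively.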

See Appendix Section 9.

The results in Proposition ? can be seen as the counterpart of Proposition ? for binary sources. In fact, they show that, for sufficiently large $R$, if $p<1/3$, the average Hamming distortion increases linearly with $\theta_{12}$ and thus we should set $\theta_{12}$ to the smallest possible value $(\theta_1+\theta_2-1)^+$; instead, if $p>1/3$, the optimal value of $\theta_{12}$ is the largest possible, namely, $\min(\theta_1,\theta_2)$.
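A sketch of the underlying comparison, using the per-fraction floors just mentioned ($0$ for the overlapping fraction, $p/2$ when a single source is measured, and $(1-p)/2$ with no measurement):

$$D(\infty)\;=\;(\theta_1+\theta_2-2\theta_{12})\,\frac{p}{2}\;+\;(1-\theta_1-\theta_2+\theta_{12})\,\frac{1-p}{2},\qquad\frac{\partial D(\infty)}{\partial\theta_{12}}\;=\;\frac{1-3p}{2},$$

so the distortion floor increases with $\theta_{12}$ for $p<1/3$ and decreases with $\theta_{12}$ for $p>1/3$.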

Before we proceed to investigate the general rate-distortion function with a sampling budget, we first derive the indirect rate-distortion function $R_{T|S_1}(D)$ for the case in which only $S_1$ is observed at the encoder.

See Appendix Section 10.

By symmetry, the indirect rate-distortion function $R_{T|S_2}(D)$ for $T$ when only $S_2$ is observed at the encoder is also given by Lemma ?. The rate-distortion function for $T$ itself is obtained from standard results [3] as $R_T(D)=h_2\big(\tfrac{1-p}{2}\big)-h_2(D)$ if $0\le D\le\tfrac{1-p}{2}$, and $R_T(D)=0$ if $D>\tfrac{1-p}{2}$, where $h_2(\cdot)$ denotes the binary entropy function.

For a given sampling budget $(\theta_1,\theta_2)$, the rate-distortion function for computing $T=S_1\otimes S_2$ is given by the expression in ( ?), where the minimum achievable distortion is as given in Proposition ?, and the minimization is taken over all choices of $\theta_{12}$ and of the per-fraction distortions such that (Equation 1) is satisfied and the weighted per-fraction distortions meet the target distortion $D$.

See Appendix Section 11.

5.3 Numerical Results

In this subsection, we numerically evaluate the distortion-rate function for the computation of the function $T=S_1\otimes S_2$. Figure 6 and Figure 7 plot the minimum average Hamming distortion and the optimal overlap fraction $\theta_{12}^*$ for $p=0.1,0.2,0.4$ and sampling budget $(\theta_1,\theta_2)=(0.5,0.75)$. In Figure 6, as predicted by Proposition ?, the minimum rate that achieves the distortion floor of Proposition ? can be read off as the rate at which each curve flattens, for $p=0.1,0.2,0.4$, respectively. It can be observed from Figure 7 that, for $p=0.4$, the optimal overlap fraction is equal to the maximum possible value $\min(\theta_1,\theta_2)=0.5$ for any $R$. However, for the smaller probabilities $p=0.1$ and $p=0.2$, the optimal overlap fraction equals the maximum possible value only for sufficiently small rates and then drops to the minimum possible value $(\theta_1+\theta_2-1)^+=0.25$ once $R$ gets larger. Moreover, the smaller the probability $p$ is, the larger the range of rates over which the optimal overlap fraction is $0.25$.

Figure 6: Distortion-rate function for computing $T=S_1\otimes S_2$, with $(S_1,S_2)$ doubly symmetric binary with probability $\Pr[S_1\neq S_2]$ equal to $p=0.1,0.2,0.4$, respectively. The sampling budget is $(\theta_1,\theta_2)=(0.5,0.75)$.

We note that, with a larger $p$, it is easier to describe $T$ directly, since $\Pr[T=1]=(1-p)/2$ moves further from $1/2$, but the indirect description of $T$ based on $S_1$ or $S_2$ becomes more difficult, since $T$ becomes less correlated with $S_1$ or $S_2$.3 This explains why the optimal overlap fraction should be chosen as the maximum possible value when $p$ is larger than $1/3$ (see the curve $p=0.4$). In this sense, the regime $p>1/3$ may be considered as the binary counterpart of the regime $\rho\le0$ for the Gaussian sum case in Section IV-B. For probabilities $p<1/3$, the numerical results above imply that the optimal overlap depends on the link rate $R$. Similar to the Gaussian sum case with $\rho>0$ (Proposition ?), when $R$ is sufficiently small, it remains optimal to choose the overlap fraction to be the maximum possible; however, as $R$ grows sufficiently large, it becomes more advantageous to have the overlap fraction as small as possible, which is consistent with Proposition ?.

Figure 7: Optimal overlap fraction $\theta_{12}^*$ that minimizes the average expected distortion as a function of $R$ for computing $T=S_1\otimes S_2$, with $(S_1,S_2)$ doubly symmetric binary with probability $\Pr[S_1\neq S_2]$ equal to $p=0.1,0.2,0.4$, respectively. The sampling budget is $(\theta_1,\theta_2)=(0.5,0.75)$.

6 Multi-Hop Lossy Computing with Fractional Sampling

In this section, we extend the analysis of lossy computing with fractional sampling from the point-to-point setup to the multi-hop setup depicted in Figure 2. We assume that Encoder $i$ can only sample a fraction $\theta_i$ of the samples for source $S_i$, with $0\le\theta_i\le1$, for $i=1,2$. Moreover, the encoders make decisions on which samples to sense independently and based only on the statistics of the sources. In particular, to ensure causality, Encoder 2 is not allowed to observe the message from Encoder 1 before making a decision on which samples to measure, as instead assumed in [15] for a related set-up. Under this assumption, the sampled fractions can overlap for a fraction $\theta_{12}$, with $\theta_{12}$ satisfying (Equation 1). Similar to the point-to-point setup of Figure 1, it is without loss of generality to assume that the sampling profile is as shown in Figure 3. The links between Encoder 1 and Encoder 2 and between Encoder 2 and the decoder can support rates of $R_1$ and $R_2$ bits/sample, respectively. As above, the goal is to estimate a function $T^n=f^n(S_1^n,S_2^n)$ at the decoder. It is observed that, if $R_1$ is unbounded, the scenario reduces to the point-to-point system studied in the previous sections.

Definition 5: A code for the problem of multi-hop lossy computing of two memoryless sources with fractional sampling consists of an encoder (Encoder 1), which maps its measured samples of $S_1$ into a message of rate $R_1$ bits per source sample; an encoder (Encoder 2), which maps its measured samples of $S_2$, along with the message received from Encoder 1, into a message of rate $R_2$ bits per source sample; and a decoder, which maps the message received from Encoder 2 into an estimate $\hat T^n$, such that the distortion constraint is satisfied as in (Equation 2). It is thus assumed that Encoder 1 operates on the measurements of $S_1$ only, while Encoder 2 operates on the measurements of $S_2$ as well as on the index received from Encoder 1.

The distortion-rate functions for a given sampling profile and for a given sampling budget are defined in a manner similar to the point-to-point setup of Section II, with the rate pair $(R_1,R_2)$ in place of the single rate $R$. In the remainder of this section, we focus on the specific case in which sources $S_1$ and $S_2$ are jointly Gaussian and the decoder wishes to compute the sum $T=S_1+S_2$. The other cases studied in Section IV and Section V can be investigated similarly.

6.1 Lower Bounds on the Achievable Distortion

Two lower bounds on the achievable distortion for the Gaussian multi-hop lossy computing problem discussed above can be derived based on cut-set arguments [16]. Specifically, the first cut is around Encoder 1 and the second cut is around the decoder. These two cuts induce the following two subproblems of the original problem in Figure 2: 1) for the cut around Encoder 1, the problem is equivalent to point-to-point lossy computing with side information, in which the encoder and the decoder can each measure only a fraction of the samples from their respective sources; 2) for the cut around the decoder, the problem reduces to the point-to-point source coding problem investigated in Section IV-B, leading to a lower bound given by ( ?) of Proposition ? with $R$ replaced by $R_2$.
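In sketch form, denoting by $D_{\mathrm{SI}}(R_1)$ the distortion-rate function of the side-information problem of Figure 8 and by $D_{\mathrm{p2p}}(R_2)$ the point-to-point distortion-rate function of Section IV-B, both for the given sampling budget (the two symbols are introduced here only for illustration), the two cuts yield

$$D\;\ge\;\max\big\{D_{\mathrm{SI}}(R_1),\;D_{\mathrm{p2p}}(R_2)\big\}.$$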

In the following, we study the first subproblem identified above, namely the problem of lossy computing with side information at the decoder and fractional sampling, as shown in Figure 8. To elaborate, the encoder measures a fraction $\theta_1$ of the samples of $S_1$ and describes them to the decoder using rate $R_1$. At the same time, the decoder measures a fraction $\theta_2$ of the samples of the correlated source $S_2$, which overlaps with the encoder's measurements for a fraction $\theta_{12}$ of the samples. Based on the description received from the encoder and on its own measurements, the decoder forms the estimate $\hat T^n$.

Figure 8: Lossy computing with side information at the decoder and fractional sampling: The encoder measures source $S_1$ for a fraction of time $\theta_1$, and the decoder measures source $S_2$ for a fraction of time $\theta_2$ and estimates a function $T^n=f^n(S_1^n,S_2^n)$.

For the problem of lossy computing with side information at the decoder and fractional sampling, given the sampling budget $(\theta_1,\theta_2)$ and the rate $R_1$, the distortion-rate function can be obtained as follows.

  1. For $\rho\ge0$, the distortion-rate function is given by the expression in ( ?),

     where $\theta_{12}^*=(\theta_1+\theta_2-1)^+$ is the optimal overlap fraction;

  2. For $\rho<0$, the distortion-rate function is given by the expression in ( ?),

     where $\theta_{12}^*=\min(\theta_1,\theta_2)$ is the optimal overlap fraction.

See Appendix Section 12.

Proposition ? states that, in the setting of Figure 8, it is optimal to have the overlap fraction $\theta_{12}$ as small as possible if the two sources are positively correlated, and as large as possible if they are negatively correlated. This result is consistent with, and follows from considerations similar to, Propositions ?, ? and ?.

6.2 Upper Bounds on the Achievable Distortion

In this subsection, we first propose a specific strategy, thus providing an upper bound on the achievable distortion. The derived upper bound is then compared to the lower bounds in Proposition ? and Proposition ? through numerical examples.

In the proposed strategy, we treat the $(\theta_1-\theta_{12})$-fraction of samples measured only at Encoder 1, the overlapping $\theta_{12}$-fraction measured by both encoders, and the $(\theta_2-\theta_{12})$-fraction measured only at Encoder 2 separately in terms of encoding and decoding. In particular, on the link between the two encoders, which is of rate $R_1$, we assign one portion of the rate to the encoded version of the $(\theta_1-\theta_{12})$-fraction of samples and the remaining portion to the $\theta_{12}$-fraction. Moreover, on the link between Encoder 2 and the decoder, which is of rate $R_2$, we allocate separate portions of the rate to forward the encoded version of the $(\theta_1-\theta_{12})$-fraction, to the $\theta_{12}$-fraction and to the $(\theta_2-\theta_{12})$-fraction. By definition, the rates assigned to the fractions on each link must sum to at most that link's rate; these conditions are sketched below.
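Writing, for concreteness, $r_1$ and $r_{12}$ for the rates assigned on the first link to the $(\theta_1-\theta_{12})$- and $\theta_{12}$-fractions, and $r_1'$, $r_{12}'$ and $r_2'$ for the rates assigned on the second link to the three measured fractions (the symbols are illustrative), the conditions read

$$r_1+r_{12}\;\le\;R_1,\qquad r_1'+r_{12}'+r_2'\;\le\;R_2.$$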

We now specify the source coding strategy used for the different fractions of samples and discuss the resulting average distortions. For the $(\theta_1-\theta_{12})$-fraction measured only by Encoder 1, the available end-to-end rate is determined by the rates assigned to this fraction on the two links; Encoder 1 thus compresses this fraction of samples of $S_1$ at the corresponding rate, in bits per source sample of the fraction, using a standard indirect rate-distortion optimal code, leading to the average distortion $D_{T|S_1}(\cdot)$ evaluated at this rate [13]. Similarly, for the $(\theta_2-\theta_{12})$-fraction of samples measured only by Encoder 2, Encoder 2 can employ the rate assigned to this fraction on the second link with a standard indirect rate-distortion optimal code, leading to the corresponding average distortion $D_{T|S_2}(\cdot)$ [13]. For the $(1-\theta_1-\theta_2+\theta_{12})$-fraction measured by neither node, the average distortion at the decoder is equal to the variance of $T$, namely $2(1+\rho)$. For the $\theta_{12}$-fraction measured by both nodes, the setup at hand reduces to the multi-hop source coding problem investigated in [17], with the average link rates over the two links given by the rates assigned to this fraction. Among the class of achievable schemes considered in [17], under the assumption of unit-variance sources, the so-called "re-compress" scheme is optimal. Therefore, we adopt the "re-compress" scheme for the $\theta_{12}$-fraction at hand, which leads to the corresponding average distortion given in [17].

Applying the source coding strategy described above, an achievable distortion at the decoder is given by summing the contributions of the different fractions of samples with the appropriate weights, namely the per-fraction average distortions listed above weighted by $\theta_1-\theta_{12}$, $\theta_{12}$, $\theta_2-\theta_{12}$ and $1-\theta_1-\theta_2+\theta_{12}$, respectively.