Incremental Refinement using a Gaussian Test Channel
Abstract
The additive rate-distortion function (ARDF) was developed in order to universally bound the rate loss in the Wyner-Ziv problem, and has since been instrumental in, e.g., bounding the rate loss in successive refinements, universal quantization, and other multi-terminal source coding settings. The ARDF is defined as the minimum mutual information over an additive test channel followed by estimation. In the limit of high resolution, the ARDF coincides with the true RDF for many sources and fidelity criteria. In the other extreme, i.e., the limit of low resolution, the behavior of the ARDF has not previously been rigorously addressed.
In this work, we consider the special case of quadratic distortion where the noise in the test channel is Gaussian distributed. We first establish a link to the I-MMSE relation of Guo et al. and use this to show that, for any source, the slope of the ARDF near zero rate converges to the slope of the Gaussian RDF near zero rate. We then consider the multiplicative rate loss of the ARDF, and show that for bursty sources it may be unbounded, contrary to the additive rate loss, which is upper bounded by 1/2 bit for all sources. We finally show that unconditional incremental refinement, i.e., where each refinement is encoded independently of the other refinements, is ARDF optimal in the limit of low resolution, independently of the source distribution. Our results also reveal under which conditions linear estimation is ARDF optimal in the low rate regime.
I Introduction
Shannon’s rate-distortion function (RDF) for a source $X$ and distortion measure $d(\cdot,\cdot)$ is given by
$$R(D) \triangleq \inf I(X;Y), \tag{1}$$
where the infimum is over all reconstructions $Y$ such that the expected distortion satisfies $E[d(X,Y)] \leq D$. Even though (1) may appear simple, it is well known that it is generally very hard to compute explicitly. In fact, there exist only very few cases where (1) is known in closed form, e.g., Gaussian sources under MSE distortion and binary sources under Hamming distortion. In the information-theoretic literature, several methods have been proposed to approximate the RDF, e.g., iterative numerical solutions, high-resolution source coding, and (universal) bounds. In the first case, the Arimoto-Blahut algorithm numerically obtains the rate-distortion function for arbitrary sources with finite input/output alphabets and single-letter distortion measures [1]. In the second case, for continuous-alphabet sources, it was shown by Linder and Zamir that the Shannon lower bound (SLB) is asymptotically tight for norm-based distortion metrics [2]. Thus, at asymptotically high coding rates, the RDF can be approximated by simple formulas. In the third case, alternative RDFs, which are easier to compute and analyze, are used to bound the true RDF. For example, at general resolution and for difference distortion measures, the SLB provides a lower bound to the true RDF for many sources. On the other hand, Zamir presented in [3] an additive RDF (ARDF), which consists of an additive test channel followed by estimation. The ARDF has been shown to be a convenient tool for upper bounding the rate loss in many source coding problems. In particular, it was shown in [3] that the additive rate loss in the Wyner-Ziv problem is at most 1/2 bit for all sources. Similarly, it was shown by Lastras and Berger in [4] that the additive rate loss in the successive refinement problem is at most 1/2 bit per stage. The ARDF has also been successfully applied to upper bound the rate loss in other multi-terminal problems, cf. [3, 5].
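The iterative numerical approach mentioned above can be illustrated with a minimal sketch of a Blahut-Arimoto style iteration for one point of the RDF of a finite-alphabet source. This is not taken from [1]; the function name `blahut_arimoto_rd`, the slope parameter `beta`, and the fixed iteration count are assumptions made for illustration only.

```python
import math

def blahut_arimoto_rd(p_x, dist, beta, n_iter=500):
    """One point (rate in bits, distortion) of R(D) at Lagrangian slope -beta (nats)."""
    n_x, n_y = len(p_x), len(dist[0])
    q_y = [1.0 / n_y] * n_y  # reconstruction marginal, initialised uniform
    for _ in range(n_iter):
        # Step 1: test channel Q(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        cond = []
        for x in range(n_x):
            w = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            total = sum(w)
            cond.append([v / total for v in w])
        # Step 2: marginal induced by the source and the current test channel.
        q_y = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(p_x[x] * cond[x][y] * dist[x][y]
                     for x in range(n_x) for y in range(n_y))
    rate = sum(p_x[x] * cond[x][y] * math.log2(cond[x][y] / q_y[y])
               for x in range(n_x) for y in range(n_y) if cond[x][y] > 0.0)
    return rate, distortion

def binary_entropy(p):
    """Binary entropy function in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

hamming = [[0.0, 1.0], [1.0, 0.0]]
rate, dist_val = blahut_arimoto_rd([0.5, 0.5], hamming, beta=2.0)
# For a binary source with Hamming distortion the two printed values should agree.
print(rate, 1.0 - binary_entropy(dist_val))
```

Sweeping `beta` traces out the curve point by point; the binary Hamming case is used here only because its closed form is one of the few known ones, which makes the sketch easy to check.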
In the limit of high resolution, the ARDF coincides with the true RDF for many sources and fidelity criteria [2]. In the other extreme, i.e., in the limit of low resolution, the behavior of the ARDF has not been rigorously addressed. There has, however, been great interest in the counterpart of low-resolution source coding, i.e., communication at low SNR, e.g., ultra-wideband communication [6]. A motivating factor for considering the low-SNR regime in communications is that the absolute value of the slope of the capacity-cost function is large (and therefore small for the cost-capacity function), which indicates that one gets the most channel capacity per unit cost at low SNR, as was shown by Verdú [7]. Interestingly, Verdú also showed that, for rate-distortion, the most cost-effective operating point in terms of bits per unit distortion is near zero rate [7]. This follows since the absolute value of the slope of the RDF is minimized when the distortion approaches its maximum.
In this paper, we are interested in analyzing the ARDF at low resolutions. We consider the special case of the ARDF where the test channel’s noise is Gaussian and the distortion measure is the MSE. We establish a link to the mutual information and minimum mean-square error (I-MMSE) relation of Guo et al. [8] and use this to show that, for any source, the slope of the ARDF near zero rate converges to the slope of the Gaussian RDF near zero rate. We then consider the multiplicative rate loss of this ARDF and show that for bursty sources it may be unbounded. We also show that unconditional incremental refinement, i.e., where each refinement is encoded independently of the other refinements, is ARDF optimal in the limit of low resolution, independently of the source distribution. In particular, let an arbitrarily distributed source be encoded into representations where are mutually independent, Gaussian distributed, and independent of . Then we show that at low rates. Moreover, the joint reconstruction follows by simple linear estimation of from . If side information , where is independent of , but arbitrarily jointly distributed with , is available both at the encoder and decoder, we show that . In this case, however, the best conditional estimator is generally not linear. We provide the exact conditions for ARDF optimality of linear estimation in the low rate regime.
II Background
In this section, we present two existing concepts that we will need in the sequel, i.e., the additive RDF and the I-MMSE relation.
II-A The Additive Rate-Distortion Function
The additive (noise) RDF, as defined by Zamir in [3], describes the best rate-distortion performance achievable for any additive noise followed by optimum estimation, including the possibility of time sharing (convexification). In the current paper, we restrict attention to Gaussian noise, MMSE estimation (MSE distortion), and no time sharing, so we take the liberty of using the notation additive RDF, $R_{\mathrm{add}}(D)$, for this special case (i.e., no minimization over free parameters). Specifically, let $\mathrm{var}(X|Y)$ denote the minimum possible MSE in estimating $X$ from $Y$, i.e.,
$$\mathrm{var}(X|Y) \triangleq E\big[(X - E[X|Y])^2\big]. \tag{2}$$
Moreover, let the additive noise $N$ be zero-mean Gaussian distributed with variance $\sigma_N^2$ and independent of $X$. Then,
$$R_{\mathrm{add}}(D) \triangleq I(X; X+N), \tag{3}$$
where the noise variance $\sigma_N^2$ is chosen such that $\mathrm{var}(X|X+N) = D$.
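As a sanity check of the definition above, the following sketch evaluates the Gaussian-noise test channel for the one case where everything is available in closed form, namely a Gaussian source. The function names and the symbols `var_x`, `var_n` are assumptions introduced here for illustration; for a Gaussian source the additive RDF and the Shannon RDF should coincide.

```python
import math

def noise_var_for_distortion(var_x, d):
    """Noise variance of Y = X + N that yields MMSE d for Gaussian X (0 < d < var_x)."""
    return var_x * d / (var_x - d)

def gaussian_test_channel(var_x, var_n):
    """Return (rate in bits, MMSE) for Gaussian X and independent Gaussian N."""
    rate = 0.5 * math.log2(1.0 + var_x / var_n)
    mmse = var_x * var_n / (var_x + var_n)
    return rate, mmse

def gaussian_rdf(var_x, d):
    """Shannon RDF (bits) of a Gaussian source under MSE distortion."""
    return max(0.0, 0.5 * math.log2(var_x / d))

var_x, target_d = 2.0, 0.5
var_n = noise_var_for_distortion(var_x, target_d)
rate, mmse = gaussian_test_channel(var_x, var_n)
# The test-channel rate and the Gaussian RDF at the same distortion should match.
print(rate, mmse, gaussian_rdf(var_x, target_d))
```

For non-Gaussian sources neither the mutual information nor the MMSE has such a closed form in general, which is why the low-rate analysis relies on the I-MMSE relation instead.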
II-B The I-MMSE Relation
Using an incremental Gaussian channel, Guo et al. [8] were able to establish an explicit connection between information theory and estimation theory. For future reference, we include this result below:
Theorem 1 ([8]).
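Let $N$ be zero-mean Gaussian of unit variance, independent of $X$, and let $X$ have an arbitrary distribution that satisfies $E[X^2] < \infty$. Then, with mutual information measured in nats,
$$\frac{d}{d\gamma} I\big(X; \sqrt{\gamma}X + N\big) = \frac{1}{2}\,\mathrm{mmse}(\gamma), \tag{4}$$
where
$$\mathrm{mmse}(\gamma) \triangleq E\Big[\big(X - E[X \,|\, \sqrt{\gamma}X + N]\big)^2\Big]. \tag{5}$$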
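The I-MMSE relation of [8] states that the derivative of the mutual information with respect to the SNR equals half the MMSE (in nats). The sketch below checks this numerically for an equiprobable binary source X = ±1 observed through Y = sqrt(gamma) X + N with unit-variance Gaussian N. The binary source, the integration grid, and all function names are assumptions chosen for illustration, and the conditional mean tanh(sqrt(gamma) y) is specific to this source.

```python
import math

def _phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _integrate(f, lo=-12.0, hi=12.0, n=4800):
    """Composite trapezoidal rule on [lo, hi]; adequate while sqrt(gamma) is small vs. hi."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def _density(y, a):
    """Density of Y = a X + N for X = +/-1 equiprobable."""
    return 0.5 * (_phi(y - a) + _phi(y + a))

def mutual_info_binary(gamma):
    """I(X; sqrt(gamma) X + N) in nats, computed as h(Y) - h(N)."""
    a = math.sqrt(gamma)
    h_y = _integrate(lambda y: -_density(y, a) * math.log(_density(y, a)))
    return h_y - 0.5 * math.log(2.0 * math.pi * math.e)

def mmse_binary(gamma):
    """MMSE of X given Y; here E[X | Y = y] = tanh(sqrt(gamma) * y)."""
    a = math.sqrt(gamma)
    return 1.0 - _integrate(lambda y: _density(y, a) * math.tanh(a * y) ** 2)

gamma, eps = 0.5, 1e-4
derivative = (mutual_info_binary(gamma + eps) - mutual_info_binary(gamma - eps)) / (2 * eps)
# The two printed numbers should agree up to numerical error.
print(derivative, 0.5 * mmse_binary(gamma))
```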
III Incremental Refinements
III-A The Slope of the ARDF
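We will show that the slope of $R_{\mathrm{add}}(D)$ at $D = \sigma_X^2$ for a source $X$ with variance $\sigma_X^2$ is independent of the distribution of $X$. In fact, the slope is identical to the slope of the RDF of a Gaussian source with variance $\sigma_X^2$. This is interesting since the RDF of any zero-mean source with variance $\sigma_X^2$ meets the Gaussian RDF at $D = \sigma_X^2$. Thus, since the Gaussian RDF can be obtained by linear estimation, it follows that $R_{\mathrm{add}}(D)$ can also be obtained by linear estimation near $D = \sigma_X^2$.
Lemma 1.
Let $Y = \sqrt{\gamma}X + N$ where $\gamma > 0$, $X$ is arbitrarily distributed with variance $\sigma_X^2$ and $N$ is Gaussian distributed according to $\mathcal{N}(0,1)$. Moreover, let $R_{\mathrm{add}}(D)$ be the additive RDF. Then, with rates measured in nats,
$$\lim_{D \to \sigma_X^2} \frac{d}{dD} R_{\mathrm{add}}(D) = -\frac{1}{2\sigma_X^2}, \tag{6}$$
irrespective of the distribution on $X$.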
Remark 1.
Interestingly, it was shown by Marco and Neuhoff [9] that in the quadratic memoryless Gaussian case, the operational rate-distortion function of the scalar uniform quantizer (followed by entropy coding) has the same slope as (6). Thus, in this particular case, the optimal scalar quantizer is as good as any vector quantizer.
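To illustrate the claim that the slope of the ARDF at zero rate does not depend on the source distribution, the sketch below evaluates the Gaussian-noise test channel for a decidedly non-Gaussian source, X = ±sigma with equal probability, and compares the secant slope from the zero-rate point with the slope of the Gaussian RDF at maximum distortion, which is -1/(2 sigma^2) in nats. The binary source, the numerical integration, and all names are assumptions made for illustration.

```python
import math

def _phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _integrate(f, lo=-12.0, hi=12.0, n=4800):
    """Composite trapezoidal rule; adequate for small received amplitudes."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def ardf_point_binary(sigma, gamma):
    """(rate in nats, MMSE) for X = +/-sigma equiprobable and Y = sqrt(gamma) X + N."""
    a = math.sqrt(gamma) * sigma  # received amplitude
    dens = lambda y: 0.5 * (_phi(y - a) + _phi(y + a))
    rate = _integrate(lambda y: -dens(y) * math.log(dens(y))) \
        - 0.5 * math.log(2.0 * math.pi * math.e)
    # E[X | Y = y] = sigma * tanh(a * y) for this source.
    mmse = sigma ** 2 * (1.0 - _integrate(lambda y: dens(y) * math.tanh(a * y) ** 2))
    return rate, mmse

def secant_slope_at_zero_rate(sigma, gamma):
    """Slope of the line from (D, R) = (sigma^2, 0) to the point reached at SNR gamma."""
    rate, mmse = ardf_point_binary(sigma, gamma)
    return rate / (mmse - sigma ** 2)

sigma = 2.0
for gamma in (1e-1, 1e-2, 1e-3):
    # The secant slope should approach -1 / (2 * sigma**2) as gamma decreases.
    print(gamma, secant_slope_at_zero_rate(sigma, gamma), -1.0 / (2.0 * sigma ** 2))
```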
III-B Multiplicative Rate Loss in the Low Rate Regime
Recall that in e.g., the successive refinement problem, the additive rate loss is no more than 0.5 bits per stage. We will now show that the multiplicative rate loss may be unbounded.
Let be a Gaussian mixture source with a density given by , where . The variance of is . The components contribution can be parametrized by as follows: . It will be convenient to let and . Moreover, we shall assume that . Notice that as we have that , and .
At this point, let with probability and with probability , and let be an indicator of the two components, i.e., , if , and , if . The RDF, conditional on the indicator , is given by
Thus, the slope of w.r.t. is given by
(7) |
which tends to zero as and . It follows from this fact and from Lemma 1 that the ratio of the slope of the ARDF to the slope of the conditional RDF grows without bound as . Moreover, as , , which implies that it becomes increasingly easy for the uninformed encoder/decoder to guess the correct component of the source. Thus, the conditional RDF converges towards the true RDF , from which it follows that the ratio . (A rigorous proof of this convergence is omitted due to space considerations.)
III-C Unconditional Incremental Refinements
We will now show that unconditional incremental refinement, i.e., where each refinement is encoded independently of the other refinements, is ARDF optimal in the limit of low resolution, independently of the source distribution. This result is not only of theoretical value but is also useful in practice, since conditional source coding is generally more complicated than unconditional source coding. Indeed, creating descriptions that are individually optimal and at the same time jointly optimal is a long-standing problem in information theory, where it is known as the multiple descriptions problem [10].
Lemma 2.
Let be arbitrarily distributed with variance , and let be a sequence of zero-mean mutually independent Gaussian sources each with variance . Then
Lemma 3.
Let , where . Moreover, let be arbitrarily distributed with variance and let be zero-mean unit-variance i.i.d. Gaussian distributed. Then
and
(8) |
To illustrate the importance of Lemma 3, let us consider the situation of a zero-mean unit-variance memoryless Gaussian source , which is to be encoded successively in stages. In stage , descriptions , are constructed unconditionally of each other. Thus, for the same coding rate (at each stage), the joint distortion in the th stage is worse than if only a single joint description within each stage had been created. In fact, in the symmetric case where all individual descriptions within stage have the same distortion and rate , it can be shown that the joint distortion of the th stage is given by
(9) |
and the sum-rate at stage is given by
(10) |
where . Since the Gaussian source is successively refinable, using conditional refinements will achieve the true RDF given by , where is given by (9). On the other hand, the rate required when unconditional coding is used is given by (10). For comparison, we have illustrated the performance of unconditional and conditional coding when the source is encoded into descriptions per stage, for the case of and increments (stages), respectively, see Fig. 1. In this example, and . Notice that when using smaller increments, i.e., when as compared to when , the resulting rate loss due to using unconditional coding is significantly reduced.
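The effect described above can be reproduced in a simplified single-stage setting. The sketch below assumes a Gaussian source of variance `var_x` and `k` unconditional descriptions of the form Y_i = sqrt(gamma) X + N_i with i.i.d. unit-variance Gaussian noises; this parametrization and the function names are illustrative assumptions and are not the expressions (9) and (10). For a Gaussian source the joint mutual information equals the Gaussian RDF at the joint MMSE, so the ratio printed below is the multiplicative rate loss of unconditional coding.

```python
import math

def joint_rate_bits(k, gamma, var_x=1.0):
    """I(X; Y_1, ..., Y_k) in bits: the k observations act as one channel of SNR k*gamma*var_x."""
    return 0.5 * math.log2(1.0 + k * gamma * var_x)

def sum_of_individual_rates_bits(k, gamma, var_x=1.0):
    """Sum over i of I(X; Y_i) in bits: the cost of coding each description on its own."""
    return k * 0.5 * math.log2(1.0 + gamma * var_x)

def joint_mmse(k, gamma, var_x=1.0):
    """MMSE of X from all k observations (linear estimation is optimal here)."""
    return var_x / (1.0 + k * gamma * var_x)

def rate_ratio(k, gamma, var_x=1.0):
    """Unconditional sum-rate divided by the jointly optimal rate."""
    return sum_of_individual_rates_bits(k, gamma, var_x) / joint_rate_bits(k, gamma, var_x)

k = 4
for gamma in (1.0, 1e-1, 1e-2, 1e-3):
    # The ratio should decrease towards 1 as the per-description SNR shrinks.
    print(gamma, joint_mmse(k, gamma), rate_ratio(k, gamma))
```

Smaller increments (smaller `gamma`) bring the ratio closer to one, which matches the qualitative behavior described for the multi-stage example.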
III-D Unconditional Incremental Refinements (Side Information)
The case of additional side information available at the encoder and the decoder was not considered by Guo et al. in [8]. Below we generalize Theorem 1 to include side information:
Lemma 4.
Let where and is arbitrarily distributed, independent of and of variance . Let be arbitrarily distributed and correlated with but independent of . Then
(11) |
Corollary 1.
Let , where and . Let be arbitrarily distributed with variance and let be zero-mean unit-variance i.i.d. Gaussian distributed. Let be arbitrarily distributed and correlated with but independent of . Then
III-E Conditions for Optimality of Linear Estimation
It was recently shown by Akyol et al. [11], that for an arbitrarily distributed source , contaminated by Gaussian noise , the MMSE estimator of given , converges in probability to a linear estimator, in the limit where . Contrary to this result, we show that the conditional MMSE estimator with side information , where is independent of but is arbitrarily correlated with is generally not linear.
Lemma 5.
Let where , is arbitrarily distributed with variance and is Gaussian distributed according to . Moreover, let be arbitrarily distributed, independent of but arbitrarily correlated with . Then the conditional MMSE estimator is linear if and only if
(12) |
where
and
In the case where are jointly Gaussian, it is easy to show that (12) is satisfied and, thus, the MMSE estimator is trivially linear in both and .
Acknowledgment
The authors would like to thank Uri Erez who initially proposed the idea of incremental refinements in the context of multiple descriptions with feedback.
Proof:
The additive RDF is defined parametrically as , by , which implies that
(13) |
From the derivative of a composite function, it follows that
(14) |
We know that can be expanded as [8]
(15) |
and that
(16) |
It follows from (15) that
(17) |
From [8], . Moreover, since and since implies , we have that the slope of with respect to at is
(18) |
∎
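The chain of steps in this proof can be written out as follows. This is a sketch in nats under the assumptions $Y = \sqrt{\gamma}X + N$, $N \sim \mathcal{N}(0,1)$, $\mathrm{Var}(X) = \sigma_X^2$, and that the low-SNR expansion of the MMSE from [8] applies; the symbols $R(\gamma)$, $D(\gamma)$ and $R_G$ are introduced here for illustration, and the block (which uses `amsmath`) is not a reproduction of the numbered equations of the proof.

```latex
% Parametric description of the additive RDF through the SNR gamma.
\begin{align*}
R(\gamma) &= I\big(X;\sqrt{\gamma}X+N\big), \qquad D(\gamma) = \mathrm{mmse}(\gamma), \\
% Derivative of a composite function.
\frac{dR}{dD} &= \frac{dR/d\gamma}{dD/d\gamma}, \\
% I-MMSE relation evaluated at gamma -> 0, where mmse(0) = sigma_X^2.
\frac{dR}{d\gamma} &= \tfrac{1}{2}\,\mathrm{mmse}(\gamma)
  \;\xrightarrow[\gamma\to 0]{}\; \tfrac{1}{2}\sigma_X^2, \\
% Low-SNR expansion of the MMSE.
\mathrm{mmse}(\gamma) &= \sigma_X^2 - \gamma\,\sigma_X^4 + o(\gamma)
  \;\Longrightarrow\; \frac{dD}{d\gamma}\bigg|_{\gamma=0} = -\sigma_X^4, \\
% Combine; gamma -> 0 corresponds to D -> sigma_X^2.
\lim_{D\to\sigma_X^2}\frac{dR}{dD} &= \frac{\sigma_X^2/2}{-\sigma_X^4}
  = -\frac{1}{2\sigma_X^2}, \\
% Gaussian RDF for comparison.
R_G(D) &= \tfrac{1}{2}\ln\frac{\sigma_X^2}{D}
  \;\Longrightarrow\; \frac{dR_G}{dD}\bigg|_{D=\sigma_X^2} = -\frac{1}{2\sigma_X^2}.
\end{align*}
```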
Proof:
Let and let be the DFT of , i.e.,
(19) |
The DC term is given by . The other terms, i.e., , are AC terms and do not contain (since is DC). The AC terms are orthogonal to the DC component of the noise, i.e., , and since the Gaussianity of the noise implies independence, we are left with only the DC term. Since the ’s are mutually independent, the resulting sum-noise component of the DC term has variance . Thus, the DC term is equivalent to , where is distributed as . This shows that . The lemma is proved. ∎
Proof:
From Lemma 2, it is clear that
(20) |
To get to the standard form with unit-variance noise, we may scale both and by without affecting their mutual information, i.e.,
(21) |
At this point we use that [8]
(22) |
where and . This proves the first part of the lemma. By using well-known linear estimation theory, it is easy to show that
(23) |
where denotes the MSE due to estimating from using linear estimation. We now invoke the fact that linear estimation is optimal in the limit and re-order the terms in (23) to get (8). ∎
Proof:
We will extend the proof technique used in [8, Lemma 1] to allow for arbitrary conditional distributions. To do this, we make use of the fact that forms a Markov chain (in that order), which will allow us to simplify the decomposition of their joint distribution.
Let denote expectation with respect to . We first expand the conditional mutual information in terms of the divergence, i.e.,
(24) |
where can be chosen arbitrarily as long as and are both well-defined. Let .
The first term in (24) is the divergence between two Gaussian distributions, since is Gaussian distributed and is Gaussian since a linear combination of Gaussians remains Gaussian. In this case we have [8]
(25) |
where we used that .
We now look at the second expression in (24) and use the Markov condition to get to . With this, we may adapt the proof technique of [8] to obtain:
where follows by using a series expansion of in terms of . We have thus established that the second term of (24) goes to zero (as a function of ) faster than the first term. Thus, the first term dominates the conditional mutual information for small . This completes the proof. ∎
Proof:
We first consider the unconditional case, where . Let us assume that . Recall that , where and . For small , the optimal estimator is linear, and we have that
(26) |
where is the Wiener coefficient given by . From (16), we know that the MMSE behaves as:
(27) |
On the other hand, in the conditional case with side information , for each the source has mean and variance . Using this in (26), and fixing , leads to
(28) |
where the Wiener coefficient depends on , i.e., . Using (16) for a fixed yields
(29) |
Taking the average over results in
(30) |
where . By Jensen’s inequality, it follows that
(31) |
with equality if and only if the conditional variance is independent of the realization of . Thus, comparing (30) to (27) shows that the linear estimator is generally not optimal.
∎
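The role of Jensen's inequality in this argument can be illustrated numerically. The sketch below makes a simplifying assumption that is not part of the lemma: given Z = z, the source is Gaussian with conditional variance `cond_vars[z]` (with probabilities `probs`), and its conditional mean has been subtracted. It compares the average MSE when the Wiener gain is matched to each realization of Z with the best MSE obtainable when a single gain is shared by all realizations; all names are hypothetical.

```python
def mean(values, probs):
    """Expectation of a finite-valued random variable."""
    return sum(p * v for v, p in zip(values, probs))

def per_z_wiener_mse(cond_vars, probs, gamma):
    """Average MSE when the Wiener gain is matched to each conditional variance."""
    return mean([v / (1.0 + gamma * v) for v in cond_vars], probs)

def common_gain_mse(cond_vars, probs, gamma):
    """Best MSE when one gain is shared by all realizations of Z."""
    v_bar = mean(cond_vars, probs)
    return v_bar / (1.0 + gamma * v_bar)

cond_vars, probs, gamma = [0.5, 3.5], [0.5, 0.5], 0.01
gap = common_gain_mse(cond_vars, probs, gamma) - per_z_wiener_mse(cond_vars, probs, gamma)
spread = mean([v * v for v in cond_vars], probs) - mean(cond_vars, probs) ** 2
# To first order in gamma the gap is gamma times the variance of the conditional variance.
print(gap, gamma * spread)
```

The gap vanishes when all conditional variances coincide, which is the equality condition of Jensen's inequality used in the proof.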
References
- [1] R. E. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Trans. Inform. Theory, vol. 18, no. 4, pp. 460 – 473, July 1972.
- [2] T. Linder and R. Zamir, “On the asymptotic tightness of the Shannon lower bound,” IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 2026 – 2031, November 1994.
- [3] R. Zamir, “The rate loss in the Wyner-Ziv problem,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 2073 – 2084, November 1996.
- [4] L. Lastras and T. Berger, “All sources are nearly successively refinable,” IEEE Trans. Inform. Theory, vol. 47, no. 3, pp. 918 – 926, 2001.
- [5] R. Zamir and T. Berger, “Multiterminal source coding with high resolution,” IEEE Trans. Inform. Theory, vol. 45, no. 1, pp. 106 – 117, January 1999.
- [6] A. Lapidoth and I. E. Telatar, “On wide-band broadcast channels,” IEEE Trans. Inform. Theory, vol. 49, no. 12, pp. 3250 – 3258, December 2003.
- [7] S. Verdú, “On channel capacity per unit cost,” IEEE Trans. Inform. Theory, vol. 36, no. 5, pp. 1019 – 1030, September 1990.
- [8] D. Guo, S. Shamai, and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261 – 1282, April 2005.
- [9] D. Marco and D. L. Neuhoff, “Low-resolution scalar quantization for Gaussian sources and squared error,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1689 – 1697, April 2006.
- [10] A. A. El Gamal and T. M. Cover, “Achievable rates for multiple descriptions,” IEEE Trans. Inform. Theory, vol. IT-28, no. 6, pp. 851 – 857, November 1982.
- [11] E. Akyol, K. Viswanatha, and K. Rose, “On conditions for linearity of optimal estimation,” in IEEE Information Theory Workshop, Dublin, 2010.