Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources

Photios A. Stavrou, Jan Østergaard and Charalambos D. Charalambous

Part of this work was presented at the Symposium on Information Theory and Signal Processing in the Benelux, Delft, Netherlands [1] and at the IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan [2]. This work has received funding from VILLUM FONDEN Young Investigator Programme, under grant agreement No. 10095. P. A. Stavrou is with the Department of Information Science and Engineering, KTH Royal Institute of Technology, Sweden (email: fstavrou@kth.se). J. Østergaard is with the Department of Electronic Systems, Aalborg University, Denmark (email: jo@es.aau.dk). C. D. Charalambous is with the Department of Electrical and Computer Engineering, University of Cyprus, Cyprus (email: chadcha@uc.ac.cy).

Abstract

We deal with zero-delay source coding of a vector-valued Gauss-Markov source subject to a mean-squared error (MSE) fidelity criterion characterized by the operational zero-delay vector-valued Gaussian rate distortion function (RDF). We address this problem by considering the nonanticipative RDF (NRDF), which is a lower bound to the causal optimal performance theoretically attainable (OPTA) function (or simply causal RDF) and to the operational zero-delay RDF. We recall the realization that corresponds to the optimal “test-channel” of the Gaussian NRDF when considering a vector Gauss-Markov source subject to an MSE distortion in the finite time horizon. Then, we introduce sufficient conditions to show existence of a solution for this problem in the infinite time horizon (or asymptotic regime). For the asymptotic regime, we use the asymptotic characterization of the Gaussian NRDF to provide a new equivalent realization scheme with feedback, which is characterized by a resource allocation (reverse-waterfilling) problem across the dimension of the vector source. We leverage the new realization to derive a predictive coding scheme via lattice quantization with subtractive dither and joint memoryless entropy coding. This coding scheme offers an upper bound to the operational zero-delay vector-valued Gaussian RDF. When we use scalar quantization, then for $r$ active dimensions of the vector Gauss-Markov source the gap between the obtained lower and theoretical upper bounds is less than or equal to $0.254r + 1$ bits/vector. We further show that, when we use vector quantization and assume infinite-dimensional Gauss-Markov sources, this gap can be made negligible, i.e., the Gaussian NRDF approximates the operational zero-delay Gaussian RDF. We also extend our results to vector-valued Gaussian sources of any finite memory under mild conditions. Our theoretical framework is demonstrated with illustrative numerical experiments.

I Introduction

Rate distortion theory describes the fundamental trade-off between the desired bitrate and the associated achievable distortion, or vice versa, for a specific source and distortion measure [3]. Source coders and decoders that are able to get very close to the fundamental rate-distortion limits are generally computationally expensive and non-causal, and tend to impose long delays on the end-to-end processing of information. When source coding is to be part of a bigger infrastructure such as distributed data processing over sensor networks, networked control systems, etc., there will often be strict requirements on the tolerable delay and system complexity. This necessitates real-time communication between the systems involved, in which delays play a critical role in the performance or even the stability of these systems.

To achieve near instantaneous encoding and decoding, it is necessary that the source encoder and decoder are causal [4]. Unfortunately, causality comes with a price. In particular, it was shown in [5] that imposing causality on the coder results in an increase in the bitrate due to the quantizer’s space-filling loss and reduced de-noising capabilities due to causal filtering at the decoder. If zero-delay is furthermore imposed, there will be an additional increase in the bitrate due to having a finite (and often small) alphabet in the entropy coder [5].

In applications where both instantaneous encoding and decoding are required, it is common to use the term zero-delay source coding [6]. Zero-delay source coding is particularly relevant for networked control systems, where an unstable plant is to be stabilized via a communication channel. At each time step, the feedback signal of the plant needs to be encoded, transmitted over a channel, decoded, and reproduced at the controller’s side. Some indicative works on zero-delay source coding for control systems can be found, for instance, in [7, 8, 9, 10, 11, 12, 13, 14].

In the field of information theory, there is a tradition of establishing achievability of a certain rate-distortion performance by showing a construction based on random codebooks, which requires asymptotically large source vector dimensions [15]. However, in the case of zero-delay source coding, the random coding based technique is often not applicable. Indeed, the optimal rate-distortion performance for zero-delay source coding, hereinafter called zero-delay rate distortion, is hard to establish and is, for example, not known for the case of general Gaussian sources subject to a mean squared error (MSE) distortion, whereas the non-causal classical rate distortion function (RDF) is, in general, known. To overcome the computational complexity of the zero-delay RDF, attention has turned to variants of the classical RDF that are lower bounds to the zero-delay RDF. One such variant is the so-called nonanticipative RDF (NRDF), also found as nonanticipatory $\epsilon$-entropy and sequential RDF in the literature.

The NRDF was first introduced in [16] and extensively analysed for Gauss-Markov sources in [17]. In [17, Theorem 5], the authors derived a partial characterization (because certain parameters are not found) of the NRDF for time-varying vector-valued Gauss-Markov sources with square-error distortion function, by providing a parametric realization of the test channel conditional distribution of the reproduction process that is first-order Markov with respect to source symbols and depends only on the previous reproduction symbol. Moreover, in [17, Examples 1, 2] the authors derived the complete characterization of the NRDF for time-varying and stationary scalar-valued Gaussian first-order autoregressive (AR(1)) processes with pointwise or per-letter mean squared-error (MSE) distortion fidelity, and gave the expression of the finite-time NRDF in terms of a reverse-waterfilling at each time instant together with the corresponding expression in the asymptotic regime. Tatikonda et al. in [8] leveraged the results of [17, Examples 1, 2] and applied the asymptotic NRDF to compute the Gaussian NRDF for time-invariant scalar-valued Gaussian sources with an asymptotic distortion constraint. In addition, they gave a parametric expression of the NRDF for time-invariant vector-valued Gauss-Markov sources that is described by a reverse-waterfilling algorithm, which is unfortunately suboptimal (the suboptimality is demonstrated via a counterexample in [18]). It should be noted that in [8] and also [18] the authors do not attempt to identify the parameters of the realization given in [17, Theorem 5]. Derpich and Østergaard in [5] considered variants of the RDF for stationary scalar-valued Gaussian autoregressive models of any order with pointwise distortion fidelity and computed the asymptotic expression of the Gaussian NRDF for stationary scalar-valued Gaussian sources, which was first derived in [17, Equation (1.43)] using alternative methods. Tanaka et al. in [19] revisited the finite-time NRDF for vector-valued Gauss-Markov sources with pointwise distortion fidelity following the line of work of [8] and showed that the resulting optimization problem is semidefinite representable and can thus be solved numerically. However, none of the previously discussed works, i.e., [8, 5, 19, 18], provides a realization of the test channel conditional distribution that achieves the NRDF or attempts to identify the parameters in the realization given in [17, Theorem 5]. Stavrou et al. in [20] considered the NRDF of the time-varying vector-valued Gauss-Markov source under a total distortion fidelity, and gave a sub-optimal realization of the design coefficient in the parametric realization given in [17, Theorem 5]. Further, in [20, Theorem 2] the computation of the finite-time NRDF via a dynamic reverse-waterfilling algorithm is not optimal (this is explained in [21]). Recently, in [22] the authors computed the finite-time NRDF for vector-valued Gauss-Markov sources subject to a total and per-letter distortion constraint using convex optimization techniques, and gave a parametric solution via a dynamic reverse-waterfilling algorithm that identifies the parametric realization given in [17, Theorem 5]. The results obtained in [22] did not consider the asymptotic regime.

The signal processing approaches to source coding can roughly be classified into transform coding [23, 24], filterbanks [25, 26], and predictive coders [27, 28, 29, 25]. A transform can be expressed in matrix form and applied by multiplying the signal vector by the matrix. Clearly, this operation is only causal if the matrix is lower triangular (when it multiplies the signal vector from the left). Low delay filters have been considered in [30], and zero-delay filtering in [31, 20]. Predictive coders usually operate directly on the time-domain signal, and can easily be made causal (and of zero delay) by making use of only the current and past samples of the source signal. Recently, it has been shown that the causal RDF of a stationary colored scalar Gaussian process can be achieved by causal prediction and noise-shaping [5]. (Footnote 1: This result parallels that of [32], where it was shown that the non-causal RDF of a stationary colored scalar Gaussian process under MSE can be achieved by (non-causal) prediction.)
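
The causality condition on a matrix transform can be checked directly. The following is a minimal Python sketch; the matrices and helper names are illustrative and not taken from the paper. It shows that with a lower triangular matrix, changing a future input sample leaves all earlier outputs unchanged.

```python
def apply_transform(T, x):
    """Multiply the matrix T (a list of rows) by the column vector x."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in T]


def is_causal(T):
    """True if output i never uses inputs j > i, i.e. T is lower triangular."""
    return all(T[i][j] == 0
               for i in range(len(T))
               for j in range(i + 1, len(T[i])))


# Illustrative matrices (hypothetical values).
T_causal = [[1.0, 0.0, 0.0],
            [0.5, 1.0, 0.0],
            [0.25, 0.5, 1.0]]
T_noncausal = [[1.0, 0.5, 0.0],
               [0.0, 1.0, 0.5],
               [0.0, 0.0, 1.0]]

x = [1.0, 2.0, 3.0]
x_future_changed = [1.0, 2.0, -7.0]   # only the last ("future") sample differs

y_causal = apply_transform(T_causal, x)
y_causal_changed = apply_transform(T_causal, x_future_changed)
y_noncausal = apply_transform(T_noncausal, x)
y_noncausal_changed = apply_transform(T_noncausal, x_future_changed)
```

With `T_causal` the first two outputs coincide for both inputs, whereas with `T_noncausal` the second output already depends on the third input sample.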

In this paper, we deal with zero-delay source coding of a vector-valued Gauss-Markov source expressed in state space form subject to an MSE distortion constraint. We recall the complete characterization of the finite-time NRDF for time-varying Gauss-Markov sources subject to a total MSE distortion, developed for scalar-valued sources in [21] and for vector-valued sources in [22], to obtain the following results.

  • Sufficient conditions to ensure, by construction, existence of a solution of the per unit time asymptotic limit of the finite-time Gaussian NRDF. The asymptotic Gaussian NRDF provides a lower bound to the operational zero-delay vector-valued Gaussian RDF.

  • A new feedback realization scheme that corresponds to the optimal test channel of the asymptotic Gaussian NRDF. This scheme is characterized by a resource allocation problem across the dimension of the source.

  • A coding scheme based on predictive coding, which is applied to this feedback realization scheme using scalar or vector quantization and joint entropy coding separately across every dimension of the vector-valued Gauss-Markov source. This scheme provides an achievable (upper) bound to the operational zero-delay vector-valued Gaussian RDF.

  • Several numerical examples that demonstrate our theoretical framework. These examples take into account both stable and unstable Gaussian sources.

In addition to the previous main results, we explain how our scheme can be generalized to vector-valued Gauss-Markov processes of any finite order.

The new feedback realization scheme has a Kalman filter in the feedback loop. The feedback loop serves two purposes: if the Gaussian source is unstable, then the filter, with the help of the feedback loop, tracks it while the estimation error converges; and it removes most of the source redundancy along the temporal direction by means of closed-loop vector prediction. On the other hand, the feed-forward path transforms the residual (innovations) vector Gaussian source into a new vector source that has independent spatial components and can thereby be efficiently encoded by applying, for example, scalar quantization and joint entropy coding separately across each dimension of the vector. Our construction makes use of simple building blocks such as non-singular joint diagonalizers (KLT matrices), diagonal scaling matrices, Kalman filters, and scalar (or lattice) quantization. It also demonstrates the resource allocation of the source signals depending on the data rate budget, i.e., it shows which dimensions are active when the reverse-waterfilling kicks in. This issue and the complete machinery to obtain theoretical lower and upper bounds, as well as the operational rates, to the zero-delay Gaussian RDF are not demonstrated in the recent works of [1, 2].
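
To illustrate the scalar quantization building block mentioned above, the following minimal Python sketch implements a uniform quantizer with subtractive dither. The step size, the input distribution, and the function names are assumptions made for the example; the paper's actual coder is only described later. The sketch shows the standard property that the reconstruction error stays within half a step and has variance close to step²/12 regardless of the input.

```python
import random


def dither_encode(x, step, dither):
    """Quantize x + dither to the nearest multiple of `step`; return the index."""
    return round((x + dither) / step)


def dither_decode(index, step, dither):
    """Reconstruct by subtracting the same dither that the encoder added."""
    return index * step - dither


rng = random.Random(1)      # a shared seed stands in for common randomness
step = 0.5                  # hypothetical quantizer step size
errors = []
for _ in range(50_000):
    x = rng.gauss(0.0, 3.0)                    # arbitrary input sample
    d = rng.uniform(-step / 2, step / 2)       # dither known to both ends
    x_hat = dither_decode(dither_encode(x, step, d), step, d)
    errors.append(x_hat - x)

max_abs_error = max(abs(e) for e in errors)
error_variance = sum(e * e for e in errors) / len(errors)
uniform_variance = step ** 2 / 12              # variance of a uniform error
```

The empirical `error_variance` should be close to `uniform_variance`, which is what makes the quantization noise behave like an additive noise source independent of the input.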

Our coding results demonstrate that when we use scalar quantization, the gap between the obtained lower and theoretical upper bounds is less than or equal to $0.254r + 1$ bits/vector, where $r$ denotes the number of active dimensions of the vector-valued Gauss-Markov source. Moreover, at high rates our simulation experiments demonstrate that the gap between the lower bound and the operational rates reduces to approximately $0.254r$ bits/vector. For vector quantization, we show that in the limit of asymptotically large vector dimensions, it is possible for the causal and zero-delay RDF to coincide with the Gaussian NRDF.
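
Gaps of this kind for entropy-coded scalar quantization typically combine the space-filling loss of a uniform scalar quantizer, $\frac{1}{2}\log_2(2\pi e/12) \approx 0.2546$ bits per active dimension, with at most one extra bit from the entropy coder. Assuming this is the origin of the constants (an assumption of this sketch, not a statement taken from the derivation in the paper), the numbers can be reproduced as follows; the function names are hypothetical.

```python
import math


def space_filling_loss_bits():
    """Rate loss (bits/sample) of a uniform scalar quantizer vs. the Gaussian bound."""
    return 0.5 * math.log2(2 * math.pi * math.e / 12)


def gap_bound_bits(r):
    """Assumed form of the gap for r active dimensions:
    r times the space-filling loss plus one bit for entropy coding."""
    return r * space_filling_loss_bits() + 1.0


loss = space_filling_loss_bits()
gap_two_dims = gap_bound_bits(2)
```

Here `loss` evaluates to roughly 0.2546 bits, so two active dimensions would give a gap of roughly 1.51 bits/vector under the stated assumption.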

It should be noted that our realization scheme can be paralleled to the work developed in [33, Chapter 11] (see also the references therein) where various (possibly partially observable) source signals are communicated via an observer or controller over parallel Gaussian channels with spatially independent delays. Compared to that framework we investigate perfect prediction in the sense that we do not take into account issues like data dropouts or delays within the parallel channels or even conditions for stability of the estimator. Potentially, one can leverage our framework to investigate similar problems to [33, Chapter 11].

This paper is structured as follows. In §II we characterize the zero-delay source coding problem for vector-valued Gauss-Markov sources subject to an asymptotic MSE distortion constraint in terms of the zero-delay Gaussian RDF. In §III we give known lower bounds to the zero-delay Gaussian RDF using general Gaussian sources subject to an MSE distortion, whereas in §III-A we concentrate on the NRDF of the vector-valued Gauss-Markov source. In §IV we show existence of a solution to the asymptotic Gaussian NRDF and we provide a new feedback realization scheme that corresponds to the asymptotic test-channel of this problem. §V derives upper bounds to the zero-delay Gaussian RDF in terms of the Gaussian NRDF using scalar and vector quantization with memoryless entropy coding. In §VI we demonstrate our theoretical framework via several numerical experiments. We draw conclusions and discuss future directions in §VII.

Notation

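$\mathbb{R}$ denotes the set of real numbers, $\mathbb{Z}$ the set of integers, $\mathbb{N}_0$ the set of natural numbers including zero, and $\mathbb{N}_0^n \triangleq \{0, \ldots, n\}$. Let $\mathbb{X}$ be a finite dimensional Euclidean space, and $\mathcal{B}(\mathbb{X})$ be the Borel $\sigma$-algebra on $\mathbb{X}$. A random variable (RV) $\mathbf{x}$ defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is a map $\mathbf{x}: \Omega \mapsto \mathbb{X}$. We denote a sequence of RVs by $\mathbf{x}_r^t \triangleq (\mathbf{x}_r, \ldots, \mathbf{x}_t)$, $t \geq r$, and their values by $x_r^t \in \mathbb{X}_r^t \triangleq \times_{k=r}^{t}\mathbb{X}_k$, with $\mathbb{X}_k = \mathbb{X}$, for simplicity. If $r = -\infty$ and $t = -1$, we use the notation $\mathbf{x}_{-\infty}^{-1} = \mathbf{x}^{-1}$, and if $r = 0$, we use the notation $\mathbf{x}_0^t = \mathbf{x}^t$. The distribution of the RV $\mathbf{x}$ on $\mathbb{X}$ is denoted by $\mathbf{P}_{\mathbf{x}}(dx) \equiv \mathbf{P}(dx)$. The conditional distribution of $\mathbf{y}$ given $\mathbf{x} = x$ is denoted by $\mathbf{P}(dy|x)$. The transpose of a matrix or vector $K$ is denoted by $K^{\mathrm{T}}$. The covariance of a random vector $K$ is denoted by $\Sigma_K$. For a square matrix $K \in \mathbb{R}^{p \times p}$, we denote the diagonal by $\mathrm{diag}(\mu_{K,i})$, where $\mu_{K,i}$ denotes the $i$th eigenvalue of matrix $K$, its determinant by $|K|$, its trace by $\mathrm{trace}(K)$, and its rank by $\mathrm{rank}(K)$. We denote by $\Sigma \succ 0$ (respectively, $\Sigma \succeq 0$) a symmetric positive-definite matrix (respectively, symmetric positive-semidefinite matrix). The statement $\Sigma_1 \succeq \Sigma_2$ means that $\Sigma_1 - \Sigma_2$ is positive semidefinite. We denote the identity matrix by $I$. We denote by $N(\cdot\,;\cdot)$ any RV or vector that is Gaussian distributed. We denote by $H(\cdot)$ the discrete entropy and by $h(\cdot)$ the differential entropy. $D(P\|Q)$ denotes the relative entropy of probability distribution $P$ with respect to probability distribution $Q$. Inside a logarithm, $|\cdot|$ applied to a scalar denotes its absolute value.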

II Problem Statement

In this paper we consider the zero-delay source coding setting illustrated in Fig. 1. In this setting, the vector-valued Gaussian source is governed by the following discrete-time linear time-invariant Gaussian state-space model

$\mathbf{x}_{t+1} = A\mathbf{x}_t + B\mathbf{w}_t, \quad t = 0, 1, \ldots,$ (1)

where $A$ and $B$ are deterministic matrices, $\mathbf{x}_0$ is the initial state, and $\{\mathbf{w}_t\}$ is an IID Gaussian sequence, independent of $\mathbf{x}_0$.
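
As a concrete illustration of such a source, the following Python sketch simulates a linear time-invariant Gauss-Markov model. It assumes the standard form $\mathbf{x}_{t+1} = A\mathbf{x}_t + B\mathbf{w}_t$ with unit-covariance noise; the matrices, the helper names, and the noise covariance are assumptions made for the example.

```python
import random


def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def simulate_source(A, B, x0, steps, rng):
    """Return [x_0, ..., x_steps] for x_{t+1} = A x_t + B w_t, w_t ~ N(0, I)."""
    noise_dim = len(B[0])
    path = [list(x0)]
    for _ in range(steps):
        w = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        drift = mat_vec(A, path[-1])
        noise = mat_vec(B, w)
        path.append([d + n for d, n in zip(drift, noise)])
    return path


# Hypothetical example: one stable and one unstable mode.
A = [[0.5, 0.0],
     [0.0, 2.0]]
B = [[1.0, 0.0],
     [0.0, 1.0]]
path = simulate_source(A, B, x0=[1.0, 1.0], steps=20, rng=random.Random(0))

# With the noise switched off the recursion reduces to x_t = A^t x_0.
B_zero = [[0.0, 0.0],
          [0.0, 0.0]]
noiseless = simulate_source(A, B_zero, x0=[1.0, 1.0], steps=3,
                            rng=random.Random(0))
```

The second component grows geometrically because its eigenvalue has magnitude greater than one, which is the unstable case considered later in the paper.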

The system operates as follows. At every time step, the encoder observes the vector source and produces a single binary codeword from a predefined, at most countable, set of codewords. Since the source is random, the codeword and its length (in bits) are random variables. Upon receiving the codeword, the decoder produces an estimate of the current source sample, under the assumption that all previous estimates have already been reproduced. We assume that both the encoder and decoder process information without delay.

Fig. 1: A zero-delay source coding scenario using variable-length binary codewords.

The analysis of the noiseless digital channel is restricted to the class of instantaneous variable-length binary codes. The countable set of all codewords (the codebook) is time-varying to allow the binary representation to be an arbitrarily long sequence.
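
One familiar example of an instantaneous (prefix-free) binary code over a countable alphabet is the Elias gamma code. It is used below purely as an illustration and is not claimed to be the code employed in the paper. The sketch checks the prefix-free property and the Kraft inequality on the first 200 codewords.

```python
def elias_gamma(n):
    """Elias gamma codeword (as a bit string) for an integer n >= 1."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers")
    binary = bin(n)[2:]                       # binary expansion without '0b'
    return "0" * (len(binary) - 1) + binary   # unary length prefix + binary


def is_prefix_free(codewords):
    """In sorted order, any prefix relation shows up between neighbours."""
    words = sorted(codewords)
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


codebook = [elias_gamma(n) for n in range(1, 201)]
kraft_sum = sum(2.0 ** -len(word) for word in codebook)
```

Because no codeword is a prefix of another, the decoder can recognise the end of each codeword as soon as its last bit arrives, which is what "instantaneous" means here.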

Zero-delay source coding

Formally, the zero-delay source coding problem of Fig. 1 can be explained as follows. The input and output alphabet of the noiseless digital channel is an index set with at most countably many (possibly infinitely many) elements, which enumerate the codewords of the codebook. The encoder is specified by a sequence of measurable functions, one per time step, each acting on the source samples available up to the current time. At each time step, the output of the encoder is a message that is transmitted through a noiseless channel to the decoder. The decoder is specified by a sequence of measurable functions, one per time step, each acting on the messages received up to the current time. At each time step, the decoder generates the current reproduction, assuming all previous reproductions are already generated.
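
The encoder and decoder structure can be illustrated with a toy scalar example. The sketch below is a plain predictive (DPCM-style) coder with a uniform quantizer, chosen only because it satisfies the zero-delay constraint. The class names, the AR coefficient, and the step size are assumptions, and this is not the coding scheme developed later in the paper.

```python
import random


class ZeroDelayEncoder:
    """Maps the current sample and its own past outputs to an integer message."""

    def __init__(self, a, step):
        self.a, self.step, self.y_prev = a, step, 0.0

    def encode(self, x_t):
        prediction = self.a * self.y_prev            # uses past messages only
        z_t = round((x_t - prediction) / self.step)  # quantized innovation
        self.y_prev = prediction + z_t * self.step   # mirror of the decoder
        return z_t


class ZeroDelayDecoder:
    """Maps the messages received so far to the current reproduction."""

    def __init__(self, a, step):
        self.a, self.step, self.y_prev = a, step, 0.0

    def decode(self, z_t):
        self.y_prev = self.a * self.y_prev + z_t * self.step
        return self.y_prev


rng = random.Random(7)
a, step = 1.2, 0.25                 # hypothetical unstable scalar source
encoder, decoder = ZeroDelayEncoder(a, step), ZeroDelayDecoder(a, step)
x, worst_error, in_sync = 0.0, 0.0, True
for _ in range(60):
    x = a * x + rng.gauss(0.0, 1.0)     # source sample at this time step
    z = encoder.encode(x)               # sent immediately, no look-ahead
    y = decoder.decode(z)               # reproduced immediately
    in_sync = in_sync and (y == encoder.y_prev)
    worst_error = max(worst_error, abs(x - y))
```

Even though the source diverges, the reproduction error stays within half a quantizer step because the encoder tracks exactly what the decoder has reconstructed.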

Asymptotic distortion constraint

The design in Fig. 1 is required to yield an asymptotic average distortion , where is the pre-specified distortion level, . For the asymptotic regime, the objective is to minimize the expected average codeword length, i.e., the total number of bits received by the decoder at the time it reproduces , denoted by , over all measurable zero-delay encoding and decoding functions . We denote by the accumulated number of bits received by the decoder at the time it reproduces the estimate .

Problem 1.

(Zero-delay vector-valued Gaussian RDF)
The previous design requirements are formally cast by the following optimization problem:

(2)

We refer to (2) as the operational zero-delay Gaussian RDF.

Unfortunately, the solution of Problem 1 is very hard to find because it is defined over all operational codes. For this reason, in the next section we introduce a lower bound to this problem which is defined based on information theoretic quantities.

III Lower Bounds on Problem 1

In this section, we present known lower bounds to the operational zero-delay Gaussian RDF of Problem 1. To do so, we first formally introduce the definitions of the causal source coder, the causal optimal performance theoretically attainable (OPTA) function, and the NRDF (and its relation to [16]), assuming general Gaussian sources (although the same bounds apply to non-Gaussian sources) subject to an asymptotic MSE distortion constraint. Then, we concentrate on the specific lower bound to Problem 1 investigated in this paper.

Causal function

In general source coding, a source encoder-decoder () pair encodes a source distributed according to with , into binary representations from which the estimate of is generated. The end-to-end effect of any pair is captured by a sequence of reproduction functions such that

Following [4] the pair is called causal if the following definition holds.

Definition 1.

(Causal reproduction coder) 
A sequence of reproduction coders , is called causal if for each ,

(3)

A causal source code is induced by a causal reproduction coder.

Next, we give the definition of the causal function [4].

Definition 2.

(Causal function) 
For , the minimum rate achievable by any causal source code with distortion not exceeding is given by the causal function defined by

(4)

We consider a source that randomly generates sequences , that we wish to reproduce or reconstruct by , subject to a distortion constraint defined by .
Source. The source distribution satisfies conditional independence

(5)

Since no initial information is assumed, the distribution at is . Also, by Bayes’ rule we obtain . Note that for model (1), (5) implies that is independent of the past reproductions .

Reproduction or “test-channel”. Suppose the reproduction of is randomly generated, according to the collection of conditional distributions, known as test-channels, by

(6)

At , no initial state information is assumed, hence . From [34, Remark 1], we know that the conditional distributions in (6), uniquely define the family of conditional distributions on parametrized by , given by

and vice-versa. By (5) and (6), we can uniquely define the joint distribution of by

(7)

In addition, from (7), we can uniquely define the marginal distribution by

and the conditional distributions .

Given the above construction of distributions, we introduce the information measure using relative entropy as follows:

(8a)
(8b)
(8c)

where $(a)$ follows by definition of relative entropy; $(b)$ is due to the Radon-Nikodym derivative theorem [34, Appendix A.C]; $(c)$ is due to the chain rule of relative entropy; $(d)$ follows by definition. Often, we use either (8a) or (8c). It should be remarked that since (5) and (6) hold, (8c) is a special case of directed information from the source process to the reproduction process (see [35]).

Next, we formally define the Gaussian NRDF subject to an MSE distortion. Recall that the following definition was announced in [16] for general distortion functions (including MSE distortions) and in [17] for pointwise distortion functions.

Definition 3.

(Asymptotic Gaussian NRDF subject to an MSE distortion)
For the fixed Gaussian source of (5) and an MSE distortion, the following holds.
(1) The finite-time NRDF is defined by

(9)

assuming the infimum is achieved in the set.
(2) The asymptotic limit of (9) is defined by

(10)

assuming the infimum is achieved in the set and the limit exists and it is finite.

If one interchanges the order of the limit and the infimum in (10), then an upper bound to the asymptotic NRDF is obtained, defined as follows:

(11)

where denotes the sequence of conditional probability distributions .

Next, we discuss some properties of the that can be extracted from different references. First, it can be shown that the optimization problem (9), in contrast to the one of (2), is convex with respect to the test channel, for . Moreover, under mild conditions (given in [34, Theorem 15]), when the source is not necessarily Gaussian, the infimum is achieved and the is finite. By the structural properties of the test channel derived in [20, Theorem 1], if the source is first-order Markov, i.e., with distribution , the test channel distribution is of the form . Finally, combining this structural result, with [36, Theorem 1.8.6], it can be shown that if is Gaussian then a jointly Gaussian process achieves a smaller value of the , and if is Gaussian and Markov, then the infimum in the can be restricted to test channel distributions which are Gaussian, of the form , with linear mean in and conditional covariance which is non-random, . The above results are also derived in [17, Theorem 5] for pointwise distortion constraint.

In view of the above results, the following hold.

Problem 2.

(A lower bound on Problem 1)
Consider the vector-valued Gaussian source model in (1). Then, the finite-time Gaussian NRDF is characterized by the expression

(12)

provided the infimum is achieved in the set.
The asymptotic limit of (12) is defined as:

(13)

provided the infimum is achieved in the set and the limit exists and it is finite. If the source model of (1) is stationary (or asymptotically stationary) then (see [17, Theorem 4]), where is defined as in (11) but denotes the sequence of conditional probability distributions .

The next theorem, provides a series of inequalities that connect all previously discussed information measures in the context of Gaussian sources with asymptotic distortions.

Theorem 1.

(Inequalities) 
For Gaussian sources with asymptotic distortion constraint, the following bounds hold.

(14)

where denotes the classical [3].

Proof.

The bounds are derived in [5, eq. (11)] (the first two inequalities are also derived in [16]). The last inequality follows by definition of the operational zero-delay Gaussian and causal function. ∎

In the next remark, we state a bound for unstable Gauss-Markov sources under an asymptotic MSE distortion.

Remark 1.

(Bound on unstable vector-valued Gauss-Markov sources)
Consider the time-invariant vector-valued Gauss-Markov source of (1), where the system matrix has eigenvalues with magnitude greater than one, and the asymptotic MSE distortion. Then from [37]

(15)
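
Bounds of this kind for unstable linear sources typically take the form $\sum_{i:|\lambda_i(A)|>1}\log_2|\lambda_i(A)|$, i.e., the rate must be at least the sum of the log-magnitudes of the unstable eigenvalues of the system matrix. Assuming (15) has this form (an assumption of this sketch), the bound can be evaluated for a $2\times 2$ system matrix as follows; the helper names and example matrices are hypothetical.

```python
import cmath
import math


def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]


def unstable_rate_bound_bits(A):
    """Sum of log2|lambda| over eigenvalues with magnitude greater than one."""
    return sum(math.log2(abs(lam))
               for lam in eigenvalues_2x2(A) if abs(lam) > 1)


bound_mixed = unstable_rate_bound_bits([[2.0, 0.0], [0.0, 0.5]])      # one unstable mode
bound_rotation = unstable_rate_bound_bits([[0.0, -2.0], [2.0, 0.0]])  # eigenvalues +-2j
bound_stable = unstable_rate_bound_bits([[0.5, 0.1], [0.0, 0.3]])     # no unstable mode
```

Only the unstable modes contribute: the stable eigenvalue 0.5 in the first example adds nothing, while the complex pair of magnitude 2 in the second contributes one bit each.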

III-A Characterization of Problem 2 via filtering and Markov realization

In what follows, we leverage the Markov realization of the optimal test-channel that corresponds to Problem 2, (12), to provide the complete characterization of Problem 2. We note that the following two results are derived in [22], but we provide them herein for completeness.

The first result serves as an intermediate step towards the complete characterization of Problem 2 and is a simple extension of the result derived for the scalar case in [21, Lemma 1] hence we omit its proof.

Lemma 1.

(Realization of the test channel)
Consider the class of test channels . Then, the following statements hold.
(1) Any candidate of is realized by the recursion

(16)

where is an independent Gaussian process independent of and , and are time-varying deterministic matrices.
Moreover, the innovations process of (16) is the orthogonal process defined by

(17)

where and .
(2) Let and . Then, satisfy the following vector-valued equations:

(18a)
(18b)
(18c)
(18d)
(18e)

where and .
(3) is given by

for some and .

The next theorem uses Lemma 1 to identify the missing parameters in the realization of [17, Theorem 5] and, therefore, to provide the complete characterization of the finite-time Gaussian NRDF.

Theorem 2.

(Characterization of Gaussian NRDF)
Consider Problem 2, (12). Then, the following holds.
(1) The “test channel” and is realized by

(20)

where

(21a)
(21b)
(21c)

(2) Moreover, the above realization yields in Lemma 1

(22)

(3) The characterization of is

(23a)
s.t. (23b)
(23c)

for some .

Proof.

The proof is derived in [22].∎

Next, we give sufficient conditions for existence of solution to Theorem 2, (3).

Remark 2.

(Existence of solution of (23)) 
A sufficient condition for existence of solution with finite value in (23) is to consider the strict linear matrix inequality () constraint in (23b), i.e., , because otherwise the value of takes the value of . Then, by construction, the minimization problem of (23) is strictly feasible, i.e., there always exists an optimal solution with finite value. The strict further means that (non-zero distortion) and also . Then, from (18b) the following conditions on matrices and are sufficient for existence of a finite solution:

(24)

IV Asymptotic Feedback Realization Scheme via Kalman Filtering for Problem 2

In this section, we leverage results from §III-A to show that the asymptotic limit of (23) exists and is finite. Then, we propose a new alternative realization scheme that makes use of joint diagonalization matrices, reverse-waterfilling design parameters by means of an innovations encoder, an additive Gaussian channel, and a decoder which includes a Kalman filter. Note that our feedback realization scheme is fundamentally different from the approach considered in [19] because it builds upon the realization scheme of [17, Theorem 5], whereas the one in [19] makes use of the so-called “sensor-estimation separation principle”.

In the first result of this section, we provide sufficient conditions to show existence of solution with finite value to the asymptotic limit of (23) and then we give the asymptotic characterization of Theorem 2.

Theorem 3.

(Existence of solution to the asymptotic characterization of (23)) 
Suppose condition (24) holds and the optimal test channel distribution is restricted to be time invariant and there is a unique invariant distribution of the transition probability . Then, the following statements hold.
(1) The limit

(25)

i.e., exists and it is finite, if the solution of the limiting problem for , is finite, given by

(26)

where are the corresponding time-invariant values of and , respectively.
(2) The asymptotic limit of is realized by

(27)

where ,

(28a)
(28b)
(28c)
and () are the corresponding time-invariant values of and , respectively.
Proof.

(1) Observe that the sequence is sub-additive (see [16, Lemma 1]). Hence, the limit in (25) always exists (although it can be infinite). However, since we assumed the optimal test channel distribution is time-invariant and there is a unique invariant distribution, we ensure that the limit is finite. The last part follows because we assume is time-invariant. (2) follows from (1) and Theorem 2. This completes the proof. ∎
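
A scalar Monte Carlo sketch can make the time-invariant realization more tangible. It assumes that the realization has the familiar form $y_t = Hx_t + (1-H)Ay_{t-1} + v_t$ with $\Lambda = A^2\Delta + B^2$, $H = 1 - \Delta/\Lambda$ and noise variance $\Delta H$, and that the rate is $\frac{1}{2}\log_2(\Lambda/\Delta)$. These expressions, the function name, and the parameter values are assumptions of the sketch rather than quotations of (27)-(28). Under them, the empirical MSE should settle at the design value $\Delta$.

```python
import math
import random


def scalar_test_channel(a, b, delta, steps, seed=0):
    """Simulate the assumed scalar realization; return (empirical MSE, rate in bits)."""
    rng = random.Random(seed)
    lam = a * a * delta + b * b            # assumed a priori error variance
    if not 0.0 < delta < lam:
        raise ValueError("need 0 < delta < a^2 * delta + b^2")
    h = 1.0 - delta / lam                  # assumed feed-forward gain
    v_std = math.sqrt(delta * h)           # assumed channel noise std. deviation
    x = y = 0.0
    squared_error = 0.0
    for _ in range(steps):
        x = a * x + b * rng.gauss(0.0, 1.0)                       # source
        y = h * x + (1.0 - h) * a * y + v_std * rng.gauss(0.0, 1.0)
        squared_error += (x - y) ** 2
    return squared_error / steps, 0.5 * math.log2(lam / delta)


# Stable example to keep the floating-point simulation well conditioned.
mse, rate_bits = scalar_test_channel(a=0.9, b=1.0, delta=0.5, steps=100_000)
```

With these values the rate is $\frac{1}{2}\log_2(2.81) \approx 0.745$ bits per sample. A stable coefficient is used because simulating an unstable source over many steps would overflow floating-point arithmetic, although the error recursion itself does not depend on the source magnitude.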

The minimization problem of Theorem 3, (26), can be solved by employing, for instance, Karush-Kuhn-Tucker (KKT) conditions [22] or a semidefinite programming algorithm [19]. We wish to remark that sufficient conditions which do not require the test channel in Theorem 3 to be time-invariant can be identified upon solving the KKT conditions of the finite-time Gaussian NRDF of Theorem 2 (through the solutions of the Riccati equations). This method is employed in [22]. In the next lemma, we evaluate numerically the optimization problem (26) by providing two equivalent semidefinite programming representations of the asymptotic Gaussian NRDF. The first is similar to the one derived in [19, equation (27)], whereas the second is new. The utility of each of these semidefinite representations will become clear in the sequel.

Lemma 2.

(Optimal solution of the asymptotic Gaussian NRDF)
Suppose the conditions of Theorem 3 are satisfied. Then, the following statements hold.

Suppose matrix is full rank. Introduce the variable , where . Then, for , is semidefinite representable as follows:

(29a)
s.t. (29b)
(29c)
(29d)

Suppose matrix is full rank. Introduce the decision variable , where . Then, for , is semidefinite representable as follows:

(30a)
s.t. (30b)
(30c)
(30d)
Proof.

The derivation of the semidefinite representation of (1) follows similar to [19, equation (27)], hence we omit it. The derivation of (2) is given in Appendix A. ∎
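
As a complement to the semidefinite programs, the KKT route mentioned before Lemma 2 can be sketched for a simplified special case. The sketch assumes that the limiting problem has the form $\min \frac{1}{2}\log\max\{1, |\Lambda|/|\Delta|\}$ with $\Lambda = A\Delta A^{\mathrm{T}} + BB^{\mathrm{T}}$, $0 \prec \Delta \preceq \Lambda$ and $\mathrm{trace}(\Delta) \le D$, and it further restricts $A$, $B$ and $\Delta$ to be diagonal. Both are assumptions of this example, so in general it yields an achievable value rather than the minimum. Function names and parameter values are hypothetical.

```python
import math


def delta_for_multiplier(a, b, theta):
    """Stationary point of 0.5*ln(a^2 + b^2/delta) + theta*delta, then clipped
    so that delta <= a^2*delta + b^2 (only binding for stable modes)."""
    b2 = b * b
    delta = b2 / (theta * b2
                  + math.sqrt(theta * theta * b2 * b2 + 2 * theta * a * a * b2))
    if abs(a) < 1:
        delta = min(delta, b2 / (1 - a * a))   # cap: stationary source variance
    return delta


def reverse_waterfill(a_list, b_list, D, iterations=200):
    """Bisect on the Lagrange multiplier so that the distortions sum to D."""
    def total(theta):
        return sum(delta_for_multiplier(a, b, theta)
                   for a, b in zip(a_list, b_list))

    low, high = 1e-12, 1e12
    for _ in range(iterations):
        mid = math.sqrt(low * high)            # geometric bisection
        if total(mid) > D:
            low = mid
        else:
            high = mid
    deltas = [delta_for_multiplier(a, b, high) for a, b in zip(a_list, b_list)]
    rates = [max(0.0, 0.5 * math.log2(a * a + b * b / d))
             for a, b, d in zip(a_list, b_list, deltas)]
    return deltas, rates


# Scalar check and a two-dimensional example with one inactive dimension.
deltas_1d, rates_1d = reverse_waterfill([0.9], [1.0], D=0.5)
deltas_2d, rates_2d = reverse_waterfill([1.2, 0.3], [1.0, 1.0], D=3.0)
```

In the two-dimensional example the stable mode is switched off: its distortion equals its stationary variance and it receives zero rate, while the unstable mode stays active and keeps a rate above $\log_2 1.2$.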

The two semidefinite representations of