Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources
Abstract
We deal with zero-delay source coding of a vector-valued Gauss-Markov source subject to a mean-squared error (MSE) fidelity criterion, characterized by the operational zero-delay vector-valued Gaussian rate distortion function (RDF). We address this problem by considering the nonanticipative RDF (NRDF), which is a lower bound to the causal optimal performance theoretically attainable (OPTA) function (or simply causal RDF) and to the operational zero-delay RDF. We recall the realization that corresponds to the optimal “test channel” of the Gaussian NRDF when considering a vector Gauss-Markov source subject to an MSE distortion in the finite time horizon. Then, we introduce sufficient conditions to show existence of a solution to this problem in the infinite time horizon (or asymptotic regime). For the asymptotic regime, we use the asymptotic characterization of the Gaussian NRDF to provide a new equivalent realization scheme with feedback, which is characterized by a resource allocation (reverse-waterfilling) problem across the dimension of the vector source. We leverage the new realization to derive a predictive coding scheme via lattice quantization with subtractive dither and joint memoryless entropy coding. This coding scheme offers an upper bound to the operational zero-delay vector-valued Gaussian RDF. When we use scalar quantization, then for active dimensions of the vector Gauss-Markov source the gap between the obtained lower and theoretical upper bounds is less than or equal to bits/vector. However, we further show that by using vector quantization, and assuming infinite-dimensional Gauss-Markov sources, the previous gap can be made negligible, i.e., the Gaussian NRDF approximates the operational zero-delay Gaussian RDF. We also extend our results to vector-valued Gaussian sources of any finite memory under mild conditions. Our theoretical framework is demonstrated with illustrative numerical experiments.
I Introduction
Rate distortion theory describes the fundamental limits between the desired bit rate and the associated achievable distortion, or vice versa, for a specific source and distortion measure [3]. The source coders and decoders that are able to get very close to the fundamental rate-distortion limits are generally computationally expensive and noncausal, and tend to impose long delays on the end-to-end processing of information. When source coding is to be part of a bigger infrastructure, such as distributed data processing over sensor networks, networked control systems, etc., there will often be strict requirements on the tolerable delay and system complexity. This necessitates real-time communication between the systems involved, where delays play a critical role in the performance or even the stability of these systems.
To achieve near-instantaneous encoding and decoding, it is necessary that the source encoder and decoder be causal [4]. Unfortunately, causality comes at a price. In particular, it was shown in [5] that imposing causality on the coder results in an increase in the bit rate due to the quantizer’s space-filling loss and reduced denoising capabilities due to causal filtering at the decoder. If zero delay is furthermore imposed, there will be an additional increase in the bit rate due to having a finite (and often small) alphabet in the entropy coder [5].
In applications where both instantaneous encoding and decoding are required, it is common to use the term zero-delay source coding [6]. Zero-delay source coding is particularly relevant for networked control systems, where an unstable plant is to be stabilized via a communication channel. At each time step, the feedback signal of the plant needs to be encoded, transmitted over a channel, decoded, and reproduced at the controller’s side. Some indicative works on zero-delay source coding for control systems can be found, for instance, in [7, 8, 9, 10, 11, 12, 13, 14].
In the field of information theory, there is a tradition of establishing achievability of a certain rate-distortion performance by showing a construction based on random codebooks, which requires asymptotically large source vector dimensions [15]. However, in the case of zero-delay source coding, the random-coding-based technique is often not applicable. Indeed, the optimal rate-distortion performance for zero-delay source coding, hereinafter called zero-delay rate distortion, is hard to establish and is, for example, not known for the case of general Gaussian sources subject to a mean squared error (MSE) distortion, whereas the noncausal classical rate distortion function (RDF) is, in general, known. To overcome the computational complexity of the zero-delay RDF, there has been a turn toward studying variants of the classical RDF that are lower bounds to the zero-delay RDF. One such variant is the so-called nonanticipative RDF (NRDF), also found as nonanticipatory entropy and sequential RDF in the literature. The NRDF was first introduced in [16] and extensively analysed for Gauss-Markov sources in [17]. In [17, Theorem 5], the authors derived a partial characterization (because certain parameters are not found) of the NRDF for time-varying vector-valued Gauss-Markov sources with square-error distortion function, by providing a parametric realization of the test-channel conditional distribution of the reproduction process that is first-order Markov with respect to the source symbols and depends only on the previous reproduction symbol. Moreover, in [17, Examples 1, 2] the authors derived the complete characterization of the NRDF for time-varying and stationary scalar-valued Gaussian first-order autoregressive (AR(1)) processes with pointwise or per-letter mean squared-error (MSE) distortion fidelity, and gave the expression of the finite-time NRDF in terms of a reverse-waterfilling at each time instant and the corresponding expression in the asymptotic regime. Tatikonda et al. 
in [8] leveraged the results of [17, Examples 1, 2] and applied the asymptotic NRDF to compute the Gaussian NRDF for time-invariant scalar-valued Gaussian sources with an asymptotic MSE distortion constraint. In addition, they gave a parametric expression of the NRDF for time-invariant vector-valued Gauss-Markov sources, described by a reverse-waterfilling algorithm which is unfortunately suboptimal (the suboptimality is demonstrated via a counterexample in [18]). It should be noted that in [8] and also [18] the authors do not attempt to identify the parameters of the realization given in [17, Theorem 5]. Derpich and Østergaard in [5] considered variants of the RDF for stationary scalar-valued Gaussian autoregressive models of any order with pointwise MSE distortion fidelity, and computed the asymptotic expression of the Gaussian NRDF for stationary scalar-valued Gaussian sources, which was first derived in [17, Equation (1.43)], using alternative methods. Tanaka et al. in [19] revisited the finite-time NRDF for vector-valued Gauss-Markov sources with pointwise MSE distortion fidelity following the line of work of [8] and showed that the resulting optimization problem is semidefinite representable and thus can be solved numerically. However, none of the previously discussed works, i.e., [8, 5, 19, 18], provides a realization of the test-channel conditional distribution that achieves the NRDF or attempts to identify the parameters in the realization given in [17, Theorem 5]. Stavrou et al. in [20] considered the NRDF of the time-varying vector-valued Gauss-Markov source under a total MSE distortion fidelity, and gave a suboptimal realization of the design coefficient in the parametric realization given in [17, Theorem 5]. Further, in [20, Theorem 2] the computation of the finite-time NRDF via a dynamic reverse-waterfilling algorithm is not optimal (this is explained in [21]). 
Recently, in [22] the authors computed the finite-time NRDF for vector-valued Gauss-Markov sources subject to a total and per-letter MSE distortion constraint, using convex optimization techniques, and gave a parametric solution via a dynamic reverse-waterfilling algorithm that identifies the parametric realization given in [17, Theorem 5]. The results obtained in [22] did not consider the asymptotic regime.
The signal processing approaches to source coding can roughly be classified into transform coding [23, 24], filter banks [25, 26], and predictive coders [27, 28, 29, 25]. A transform can be put in matrix form, which is multiplied on the signal vector. Clearly, this operation is only causal if the matrix is lower triangular (when multiplied on the left-hand side of the signal vector). Low-delay filters have been considered in [30], and zero-delay filtering in [31, 20]. Predictive coders usually operate directly on the time-domain signal, and can easily be made causal (and of zero delay) by simply making use of only the current and past samples of the source signal. Recently, it has been shown that the causal RDF of a stationary colored scalar Gaussian process can be achieved by causal prediction and noise-shaping [5].^1

^1 This result parallels that of [32], where it was shown that the noncausal RDF of a stationary colored scalar Gaussian process under MSE can be achieved by (noncausal) prediction.
In this paper, we deal with zero-delay source coding of a vector-valued Gauss-Markov source expressed in state-space form subject to an MSE distortion constraint. We recall the complete characterization of the finite-time NRDF for time-varying Gauss-Markov sources subject to a total MSE distortion, developed for scalar-valued sources in [21] and for vector-valued sources in [22], to obtain the following results.

Sufficient conditions to ensure by construction existence of a solution of the per-unit-time asymptotic limit of the finite-time Gaussian NRDF. The asymptotic Gaussian NRDF provides a lower bound to the operational zero-delay vector-valued Gaussian RDF.

A new feedback realization scheme that corresponds to the optimal test channel of the asymptotic Gaussian NRDF. This scheme is characterized by a resource allocation problem across the dimension of the source.

A coding scheme based on predictive coding, applied to this feedback realization scheme using scalar or vector quantization and joint entropy coding separately across every dimension of the vector-valued Gauss-Markov source. This scheme provides an achievable (upper) bound to the operational zero-delay vector-valued Gaussian RDF.

Several numerical examples that demonstrate our theoretical framework. These examples take into account both stable and unstable Gaussian sources.
In addition to the previous main results, we explain how our scheme can be generalized to vector-valued Gauss-Markov processes of any finite order.
The new feedback realization scheme has a Kalman filter in the feedback loop. The feedback loop serves two purposes: if the Gaussian source is unstable, then the filter, with the help of the feedback loop, tracks it while the estimation error converges; and it removes most of the source redundancy along the temporal direction by means of closed-loop vector prediction. On the other hand, the feedforward path transforms the residual (innovations) vector Gaussian source into a new vector source which has independent spatial components, and which can thereby be efficiently encoded by applying, for example, scalar quantization and joint entropy coding separately across each dimension of the vector. Our construction makes use of simple building blocks such as nonsingular joint diagonalizers (KLT matrices), diagonal scaling matrices, Kalman filters, and scalar (or lattice) quantization. It also demonstrates the resource allocation of the source signals depending on the data rate budget. This means that our scheme demonstrates which dimensions are active when the reverse-waterfilling kicks in. This issue, and the complete machinery to obtain theoretical lower and upper bounds as well as the operational rates for the zero-delay Gaussian RDF, is not demonstrated in the recent works of [1, 2].
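The per-dimension resource allocation mentioned above follows the familiar reverse-waterfilling pattern. As a minimal illustration only (classical reverse-waterfilling over parallel Gaussian components, not the exact allocation of this paper, whose water level is tied to the stationary Riccati-type equations), the following sketch finds a water level for a hypothetical set of per-dimension variances and reports which dimensions remain active:

```python
import math

def reverse_waterfill(variances, D, tol=1e-12):
    """Classical reverse-waterfilling over parallel Gaussian components.

    Finds a water level theta such that sum_i min(theta, var_i) = D; the
    rate of an active dimension (var_i > theta) is 0.5*log2(var_i/theta),
    while inactive dimensions (var_i <= theta) get zero rate.
    """
    lo, hi = 0.0, max(variances)
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in variances) < D:
            lo = theta  # water level too low: total distortion under budget
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    active = [i for i, v in enumerate(variances) if v > theta]
    return theta, rates, active

theta, rates, active = reverse_waterfill([4.0, 1.0, 0.25], D=1.0)
# The smallest-variance dimension falls below the water level and is inactive.
```

With the illustrative variances above and budget D = 1, the water level settles at 0.375, so only the first two dimensions carry positive rate.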
Our coding results demonstrate that when we use scalar quantization, the gap between the obtained lower and theoretical upper bounds is less than or equal to bits/vector, where denotes the number of active dimensions of the vector-valued Gauss-Markov source. Moreover, at high rates our simulation experiments demonstrate that the gap between the lower bound and the operational rates shrinks to approximately bits/vector. For vector quantization, we show that in the limit of asymptotically large vector dimensions, it is possible for the causal RDF and the zero-delay RDF to coincide with the Gaussian NRDF.
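The scalar-quantization upper bound above rests on uniform quantization with subtractive dither, whose end-to-end error stays within one quantizer cell regardless of the input. A minimal sketch of that mechanism (the lattice and entropy-coding details of the actual scheme are omitted; all names and parameters here are illustrative):

```python
import random

def uniform_quantize(x, step):
    """Mid-tread uniform quantizer with cell width `step`."""
    return step * round(x / step)

def dithered_quantize(x, step, dither):
    """Subtractive dither: add the dither before quantizing, subtract it after.

    For dither uniform on [-step/2, step/2], the reconstruction error y - x
    is uniform on the same interval and statistically independent of x.
    """
    return uniform_quantize(x + dither, step) - dither

random.seed(0)
step = 0.5
errors = []
for _ in range(1000):
    x = random.gauss(0.0, 1.0)
    d = random.uniform(-step / 2, step / 2)
    y = dithered_quantize(x, step, d)
    errors.append(y - x)
# Every realization of the error lies inside one quantizer cell.
assert all(abs(e) <= step / 2 + 1e-12 for e in errors)
```

The dither is assumed known at both encoder and decoder, which is why it can be subtracted at the reconstruction side.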
It should be noted that our realization scheme can be paralleled with the work developed in [33, Chapter 11] (see also the references therein), where various (possibly partially observable) source signals are communicated via an observer or controller over parallel Gaussian channels with spatially independent delays. Compared to that framework, we investigate perfect prediction in the sense that we do not take into account issues like data dropouts or delays within the parallel channels, or even conditions for stability of the estimator. Potentially, one can leverage our framework to investigate problems similar to [33, Chapter 11].
This paper is structured as follows. In §II we characterize the zero-delay source coding problem for vector-valued Gauss-Markov sources subject to an asymptotic MSE distortion constraint in terms of the zero-delay Gaussian RDF. In §III we give known lower bounds to the zero-delay Gaussian RDF using general Gaussian sources subject to an MSE distortion, whereas in §III-A we concentrate on the NRDF of the vector-valued Gauss-Markov source. In §IV we show existence of a solution to the asymptotic Gaussian NRDF and provide a new feedback realization scheme that corresponds to the asymptotic test channel of this problem. §V derives upper bounds to the zero-delay Gaussian RDF in terms of the Gaussian NRDF using scalar and vector quantization with memoryless entropy coding. In §VI we demonstrate our theoretical framework via several numerical experiments. We draw conclusions and discuss future directions in §VII.
Notation
denotes the set of real numbers, the set of integers, the set of natural numbers including zero, and . Let be a finite-dimensional Euclidean space, and be the Borel σ-algebra on . A random variable (RV) defined on some probability space is a map . We denote a sequence of RVs by , and their values by , with , for simplicity. If and , we use the notation , and if , we use the notation . The distribution of the RV on is denoted by . The conditional distribution of given is denoted by . The transpose of a matrix or vector is denoted by . The covariance of a random vector is denoted by . For a square matrix , we denote its diagonal by , where denotes the th eigenvalue of matrix , its determinant by , its trace by , and its rank by . We denote by (respectively, ) a symmetric positive-definite matrix (respectively, a symmetric positive-semidefinite matrix). The statement means that is positive semidefinite. We denote the identity matrix by . We denote by any RV or vector that is Gaussian distributed. We denote by the discrete entropy and by the differential entropy. denotes the relative entropy of probability distribution with respect to probability distribution . We denote by the absolute value of a quantity in the logarithm.
II Problem Statement
In this paper, we consider the zero-delay source coding setting illustrated in Fig. 1. In this setting, the vector-valued Gaussian source is governed by the following discrete-time linear time-invariant Gaussian state-space model
(1) 
where and are deterministic matrices, is the initial state, , , is an Gaussian sequence, independent of .
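For concreteness, a linear Gaussian state-space model of this kind can be simulated directly. The sketch below assumes the standard form x_{t+1} = A x_t + B w_t (the exact symbols of (1) are elided in the excerpt above), with a hypothetical stable 2-dimensional example; pure-Python matrix-vector products keep it self-contained:

```python
import random

def matvec(M, v):
    """Plain matrix-vector product for nested-list matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def simulate(A, B, x0, n, rng):
    """Generate n steps of x_{t+1} = A x_t + B w_t, w_t i.i.d. standard Gaussian."""
    xs = [x0]
    dim = len(x0)
    for _ in range(n):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        Ax = matvec(A, xs[-1])
        Bw = matvec(B, w)
        xs.append([Ax[i] + Bw[i] for i in range(dim)])
    return xs

rng = random.Random(7)
A = [[0.9, 0.1], [0.0, 0.8]]   # illustrative: eigenvalues inside the unit circle
B = [[1.0, 0.0], [0.0, 1.0]]
trajectory = simulate(A, B, [0.0, 0.0], 100, rng)
```

An unstable source (an eigenvalue of A outside the unit circle) is simulated the same way; only the tracking behavior of the coding scheme changes, as discussed later in the paper.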
The system operates as follows. At every time step , the encoder observes the vector source and produces a single binary codeword from a predefined, at most countable, set of codewords . Since the source is random, the codeword and its length (in bits) are random variables. Upon receiving , the decoder produces an estimate of the source sample , under the assumption that is already reproduced. We assume that both the encoder and the decoder process information without delay.
The analysis of the noiseless digital channel is restricted to the class of instantaneous variable-length binary codes . The countable set of all codewords (codebook) is time-varying to allow the binary representation to be an arbitrarily long sequence.
Zerodelay source coding
Formally, the zero-delay source coding problem of Fig. 1 can be explained as follows. Define the input and output alphabets of the noiseless digital channel by , where (possibly infinite). The elements in enumerate the codewords of . The encoder is specified by the sequence of measurable functions with . At time , the output of the encoder is a message with , which is transmitted through a noiseless channel to the decoder. The decoder is specified by the sequence of measurable functions with . For each , the decoder generates with , assuming is already generated.
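A toy instance of such measurable encoder and decoder maps is easy to write down: at time t the encoder may only read the source history x_0,...,x_t, and the decoder only the message history. The 1-bit sign code below is purely illustrative (not the optimal zero-delay code studied in this paper):

```python
def encoder(x_history):
    """Zero-delay encoder f_t: maps x^t = (x_0, ..., x_t) to one message."""
    return 1 if x_history[-1] >= 0 else 0

def decoder(msg_history):
    """Zero-delay decoder g_t: maps z^t = (z_0, ..., z_t) to the reproduction of x_t."""
    return 1.0 if msg_history[-1] == 1 else -1.0

source = [0.3, -1.2, 2.5, -0.1]
msgs, rec = [], []
for t in range(len(source)):
    msgs.append(encoder(source[:t + 1]))   # uses only x_0..x_t
    rec.append(decoder(msgs))              # uses only z_0..z_t
```

The point of the sketch is the causality constraint on the arguments, not the (very crude) reproduction quality.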
Asymptotic distortion constraint
The design in Fig. 1 is required to yield an asymptotic average distortion , where is the prespecified distortion level, . For the asymptotic regime, the objective is to minimize the expected average codeword length, i.e., the total number of bits received by the decoder at the time it reproduces , denoted by , over all measurable zero-delay encoding and decoding functions . We denote by the accumulated number of bits received by the decoder at the time it reproduces the estimate .
Problem 1.
(Zero-delay vector-valued Gaussian RDF)
The previous design requirements are formally cast by the following optimization problem:
(2)  
We refer to (2) as the operational zero-delay Gaussian RDF.
Unfortunately, the solution of Problem 1 is very hard to find because it is defined over all operational codes. For this reason, in the next section we introduce a lower bound to this problem, which is defined based on information-theoretic quantities.
III Lower Bounds on Problem 1
In this section, we present known lower bounds to the operational zero-delay Gaussian RDF of Problem 1. To do so, we first formally introduce the definitions of a causal source coder, the causal optimal performance theoretically attainable (OPTA) function, and the NRDF (and its relation to [16]), assuming general Gaussian sources (although the same bounds apply to non-Gaussian sources) subject to an asymptotic MSE distortion constraint. Then, we concentrate on the specific lower bound to Problem 1 investigated in this paper.
Causal OPTA function
In general source coding, a source encoder-decoder () pair encodes a source distributed according to with , into binary representations from which the estimate of is generated. The end-to-end effect of any pair is captured by a sequence of reproduction functions such that
Following [4], the pair is called causal if the following definition holds.
Definition 1.
(Causal reproduction coder)
A sequence of reproduction coders , is called causal if for each ,
(3) 
A causal source code is induced by a causal reproduction coder.
Next, we give the definition of the causal function [4].
Definition 2.
(Causal OPTA function)
For , the minimum rate achievable by any causal source code with distortion not exceeding is given by the causal OPTA function, defined by
(4)  
We consider a source that randomly generates sequences , which we wish to reproduce or reconstruct by , subject to a distortion constraint defined by .
Source. The source distribution satisfies conditional independence
(5) 
Since no initial information is assumed, the distribution at is . Also, by Bayes’ rule we obtain . Note that for model (1), (5) implies that is independent of the past reproductions .
Reproduction or “test channel”. Suppose the reproduction of is randomly generated, according to the collection of conditional distributions known as test channels, by
(6) 
At , no initial state information is assumed, hence . From [34, Remark 1], we know that the conditional distributions in (6), uniquely define the family of conditional distributions on parametrized by , given by
and viceversa. By (5) and (6), we can uniquely define the joint distribution of by
(7) 
In addition, from (7), we can uniquely define the marginal distribution by
and the conditional distributions , .
Given the above construction of distributions, we introduce the information measure using relative entropy as follows:
(8a)  
(8b)  
(8c) 
where (a) follows by the definition of relative entropy; (b) is due to the Radon-Nikodym derivative theorem [34, Appendix A.C]; (c) is due to the chain rule of relative entropy; and (d) follows by definition. Often, we use either (8a) or (8c). It should be remarked that since (5) and (6) hold, (8c) is a special case of directed information from to (see [35]).
Next, we formally define the Gaussian NRDF subject to an MSE distortion. Recall that the following definition was announced in [16] for general distortion functions (including MSE distortions) and in [17] for pointwise distortion functions.
Definition 3.
(Asymptotic Gaussian NRDF subject to an MSE distortion)
For the fixed Gaussian source of (5), and a distortion the following holds.
(1) The finite-time NRDF is defined by
(9) 
assuming the infimum is achieved in the set.
(2) The asymptotic limit of (9) is defined by
(10) 
assuming the infimum is achieved in the set and the limit exists and is finite.
If one interchanges the order of the limit and the infimum in (10), then an upper bound to (10) is obtained, defined as follows:
(11)  
where denotes the sequence of conditional probability distributions .
Next, we discuss some properties of the NRDF that can be extracted from different references. First, it can be shown that the optimization problem (9), in contrast to that of (2), is convex with respect to the test channel, for . Moreover, under mild conditions (given in [34, Theorem 15]), when the source is not necessarily Gaussian, the infimum is achieved and the NRDF is finite. By the structural properties of the test channel derived in [20, Theorem 1], if the source is first-order Markov, i.e., with distribution , the test-channel distribution is of the form . Finally, combining this structural result with [36, Theorem 1.8.6], it can be shown that if is Gaussian, then a jointly Gaussian process achieves a smaller value of the NRDF, and if is Gaussian and Markov, then the infimum in the NRDF can be restricted to test-channel distributions which are Gaussian, of the form , with mean linear in and conditional covariance which is nonrandom, . The above results are also derived in [17, Theorem 5] for a pointwise MSE distortion constraint.
In view of the above results, the following holds.
Problem 2.
(A lower bound on Problem 1)
Consider the vector-valued Gaussian source model in (1). Then, the finite-time Gaussian NRDF is characterized by the expression
(12) 
provided the infimum is achieved in the set.
The asymptotic limit of (12) is defined as:
(13) 
provided the infimum is achieved in the set and the limit exists and is finite. If the source model of (1) is stationary (or asymptotically stationary), then (see [17, Theorem 4]), where is defined as in (11) but denotes the sequence of conditional probability distributions .
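For the stationary scalar AR(1) special case recalled in the introduction, the asymptotic Gaussian NRDF admits a well-known closed form (see, e.g., the scalar results in [17] and [8]): R(D) = max{0, (1/2) log2(a^2 + sigma_w^2 / D)} for the source x_{t+1} = a x_t + w_t with w_t ~ N(0, sigma_w^2) and per-letter MSE level D. A one-line sanity check:

```python
import math

def scalar_nrdf(a, var_w, D):
    """Asymptotic NRDF of a stationary scalar Gauss-Markov source
    x_{t+1} = a*x_t + w_t, w_t ~ N(0, var_w), per-letter MSE level D."""
    return max(0.0, 0.5 * math.log2(a * a + var_w / D))

# Memoryless special case (a = 0) recovers the classical 0.5*log2(var/D).
assert abs(scalar_nrdf(0.0, 1.0, 0.25) - 1.0) < 1e-12
```

As expected, the rate increases as D shrinks and hits zero once the distortion budget is large enough that the logarithm's argument drops below one.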
The next theorem provides a series of inequalities that connect all previously discussed information measures in the context of Gaussian sources with asymptotic MSE distortions.
Theorem 1.
(Inequalities)
For Gaussian sources with asymptotic distortion constraint, the following bounds hold.
(14) 
where denotes the classical RDF [3].
Proof.
In the next remark, we state a bound on the NRDF for unstable Gauss-Markov sources and asymptotic MSE distortion.
III-A Characterization of Problem 2 via filtering and Markov realization
In what follows, we leverage the Markov realization of the optimal test channel that corresponds to Problem 2, (12), to provide the complete characterization of Problem 2. We note that the following two results are derived in [22], but we provide them herein for completeness.
The first result serves as an intermediate step towards the complete characterization of Problem 2 and is a simple extension of the result derived for the scalar case in [21, Lemma 1]; hence we omit its proof.
Lemma 1.
(Realization of )
Consider the class of test channels . Then, the following statements hold.
(1) Any candidate of is realized by the recursion
(16) 
where , is an independent Gaussian process independent of and , and are time-varying deterministic matrices.
Moreover, the innovations process of (16) is the orthogonal process defined by
(17) 
where , and .
(2) Let and . Then, satisfy the following vectorvalued equations:
(18a)  
(18b)  
(18c)  
(18d)  
(18e) 
where and .
(3) is given by
for some and .
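The recursions in statement (2) have the familiar Kalman-filter/Riccati structure: a prediction of the error covariance through the source dynamics, a gain computation, and a covariance update. Since the matrix equations (18) are stated above in full generality, the following scalar sketch only illustrates that structure; all coefficients here are illustrative placeholders, not the specific quantities of Lemma 1:

```python
def riccati_step(sigma, a, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    sigma : current estimation-error variance
    a, q  : source dynamics x_{t+1} = a*x_t + w_t with Var(w_t) = q
    r     : noise variance of the (test-channel-style) observation
    """
    pred = a * a * sigma + q          # predicted error variance
    gain = pred / (pred + r)          # Kalman gain
    return (1.0 - gain) * pred        # updated error variance

sigma = 1.0
for _ in range(200):
    sigma = riccati_step(sigma, a=0.9, q=1.0, r=0.5)
# The recursion converges to the stationary error variance, which is the
# quantity that drives the asymptotic characterization in the next section.
```

In the time-invariant regime discussed below, it is this stationary fixed point of the Riccati recursion that replaces the time-varying covariances.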
The next theorem uses Lemma 1 to identify the missing parameters in the realization of [17, Theorem 5] and, therefore, to provide the complete characterization of .
Theorem 2.
Proof.
The proof is derived in [22].∎
Next, we give sufficient conditions for existence of a solution to Theorem 2, (3).
Remark 2.
(Existence of a solution of (23))
A sufficient condition for existence of a solution with finite value in (23) is to consider the strict linear matrix inequality (LMI) constraint in (23b), i.e., , because otherwise the value of takes the value of . Then, by construction, the minimization problem of (23) is strictly feasible, i.e., there always exists an optimal solution with finite value. The strict LMI further means that (nonzero distortion) and also . Then, from (18b), the following conditions on the matrices and are sufficient for existence of a finite solution:
(24) 
IV Asymptotic Feedback Realization Scheme via Kalman Filtering for Problem 2
In this section, we leverage results from §III-A to show that the asymptotic limit of (23) exists and is finite. Then, we propose a new alternative realization scheme that makes use of joint diagonalization matrices, reverse-waterfilling design parameters by means of an innovations encoder, an additive Gaussian channel, and a decoder which includes a Kalman filter. Recall that our feedback realization scheme is fundamentally different from the approach considered in [19], because it builds upon the realization scheme of [17, Theorem 5], whereas the one in [19] makes use of the so-called “sensor-estimation separation principle”.
In the first result of this section, we provide sufficient conditions for existence of a solution with finite value to the asymptotic limit of (23), and then we give the asymptotic characterization of Theorem 2.
Theorem 3.
(Existence of a solution to the asymptotic characterization of (23))
Suppose condition (24) holds, the optimal test-channel distribution is restricted to be time-invariant, and there is a unique invariant distribution of the transition probability . Then, the following statements hold.
(1) The limit
(25) 
i.e., exists and is finite, if the solution of the limiting problem for , is finite, given by
(26) 
where are the corresponding time-invariant values of and , respectively.
(2) The asymptotic limit of is realized by
(27) 
where ,
(28a)  
(28b)  
(28c)  
and (, ) are the corresponding time-invariant values of and , respectively. 
Proof.
(1) Observe that the sequence is subadditive (see [16, Lemma 1]). Hence, the limit in (25) always exists (although it can be infinite). However, since we assumed that the optimal test-channel distribution is time-invariant and that there is a unique invariant distribution, we ensure that the limit is finite. The last part follows because we assume is time-invariant. (2) follows from (1) and Theorem 2. This completes the proof. ∎
The minimization problem of Theorem 3, (26), can be solved by employing, for instance, Karush-Kuhn-Tucker (KKT) conditions [22] or a semidefinite programming algorithm [19]. We wish to remark that sufficient conditions which do not require the test channel in Theorem 3 to be time-invariant can be identified upon solving the conditions of the finite-time Gaussian NRDF of Theorem 2 (through the solutions of the Riccati equations). This method is employed in [22]. In the next lemma, we evaluate the optimization problem (26) numerically by providing two equivalent semidefinite programming representations of . The first is similar to the one derived in [19, equation (27)], whereas the second is new. The utility of each of these semidefinite representations will become apparent in the sequel.
Lemma 2.
(Optimal solution of )
Suppose the conditions of Theorem 3 are satisfied. Then, the following statements hold.
Suppose matrix is full rank. Introduce the variable , where . Then, for , is semidefinite representable as follows:
(29a)  
s.t.  (29b)  
(29c)  
(29d) 
Suppose matrix is full rank. Introduce the decision variable , where . Then, for , is semidefinite representable as follows:
(30a)  
s.t.  (30b)  
(30c)  
(30d) 
Proof.
The two semidefinite representations of