Semidefinite Programming Approach to Gaussian Sequential Rate-Distortion Trade-offs
Sequential rate-distortion (SRD) theory provides a framework for studying the fundamental trade-off between data-rate and data-quality in real-time communication systems. In this paper, we consider the SRD problem for multi-dimensional time-varying Gauss-Markov processes under mean-square distortion criteria. We first revisit the sensor-estimator separation principle, which asserts that the considered SRD problem is equivalent to a joint sensor and estimator design problem in which the data-rate of the sensor output is minimized while the estimator’s performance satisfies the distortion criteria. We then show that the optimal joint design can be performed by semidefinite programming. A semidefinite representation of the corresponding SRD function is obtained. Implications of the obtained result in the context of zero-delay source coding theory and applications to networked control theory are also discussed.
In this paper, we study a fundamental performance limitation of zero-delay communication systems using sequential rate-distortion (SRD) theory. Suppose that is an -valued discrete time random process with known statistical properties. At every time step, the encoder observes a realization of the source and generates a binary sequence of length , which is transmitted to the decoder. The decoder produces an estimate of based on the messages received up to time . Both the encoder and the decoder have infinite memories of the past. A zero-delay communication system is determined by a selected encoder-decoder pair, whose performance is analyzed in the trade-off between the rate (viz. the average number of bits that must be transmitted per time step) and the distortion (viz. the discrepancy between the source signal and the reproduced signal ). The region in the rate-distortion plane achievable by a zero-delay communication system is referred to as the zero-delay rate-distortion region. (Footnote 1: A formal definition of the zero-delay rate-distortion region is given in Section VI-A.)
The standard rate-distortion region identified by Shannon only provides a conservative outer bound of the zero-delay rate-distortion region. This is because, in general, achieving the standard rate-distortion region requires the use of anticipative (non-causal) codes (e.g., [1, Theorem 10.2.1]). It is well known that the standard rate-distortion region can be expressed by the rate-distortion function for general sources. (Footnote 2: This quantity is defined as the infimum of the mutual information between the source and the reproduction subject to the distortion constraint [1, Theorem 10.2.1].) In contrast, a description of the zero-delay rate-distortion region requires more case-dependent knowledge of the optimal source coding schemes. For scalar memoryless sources, it is shown that the optimal performance of zero-delay codes is achievable by a scalar quantizer. Witsenhausen showed that for the -th order Markov sources, there exists an optimal zero-delay quantizer with memory structure of order . Neuhoff and Gilbert considered entropy-coded quantizers within the class of causal source codes, and showed that for memoryless sources, the optimal performance is achievable by time-sharing memoryless codes. This result is extended to sources with memory in . An optimal memory structure of zero-delay quantizers for partially observable Markov processes on abstract (Polish) spaces is identified in . The rate of finite-delay source codes for general sources and general distortion measures is analyzed in . Zero-delay or finite-delay joint source-channel coding problems have also been studied in the literature; see [8, 9, 10, 11], to name a few.
In [12, 13], Tatikonda et al. studied the zero-delay rate-distortion region using a quantity called the sequential rate-distortion function, which is defined as the infimum of Massey’s directed information from the source process to the reproduction process subject to the distortion constraint. (Footnote 3: Closely related or apparently equivalent notions to the sequential rate-distortion function have been given various names in the literature, including nonanticipatory ε-entropy, constrained distortion rate function, causal rate-distortion function, and nonanticipative rate-distortion function.) Although the SRD function does not coincide with the boundary of the zero-delay rate-distortion region in general, it was recently shown that the SRD function provides a tight outer bound of the zero-delay rate-distortion region achievable by uniquely decodable codes [19, 16]. This observation shows an intimate connection between the SRD function and the fundamental performance limitations of real-time communication systems. For this reason, we consider the SRD function as the main object of interest in this paper.
A quantity closely related to the SRD function was studied by Gorbunov and Pinsker in the early 1970’s. Bucy derived the SRD function for Gauss-Markov processes in a simple case. In his approach, the problem of deriving the SRD function for Gauss-Markov processes under mean-square distortion criteria (which henceforth will be simply referred to as the Gaussian SRD problem) is viewed as a sensor-estimator joint design problem to minimize the estimation error subject to the data-rate constraint. This approach is justified by the “sensor-estimator separation principle,” which asserts that an optimal solution (i.e., the optimal stochastic kernel, to be made precise in the sequel) to the Gaussian SRD problem is realizable by a two-stage mechanism with a linear-Gaussian memoryless sensor and the Kalman filter. Although this fact is implicitly shown in [12, 13], for completeness, we reproduce a proof in this paper based on a technique used in [12, 13].
The sensor-estimator separation principle gives us a structural understanding of the Gaussian SRD problem. In particular, based on this principle, we show that the Gaussian SRD problem can be formulated as a semidefinite programming problem (Theorem 1), which is the main contribution of this paper. We derive a computationally accessible form (namely, a semidefinite representation) of the SRD function, and provide an efficient algorithm to solve Gaussian SRD problems numerically. (Footnote 4: To be precise, we show that the exponentiated SRD function for multidimensional Gauss-Markov sources is semidefinite representable by (27).)
The semidefinite representation of the SRD function may be compared with an alternative analytical approach via Duncan’s theorem, which states that “twice the mutual information is merely the integration of the trace of the optimal mean square filtering error” . Duncan’s result was significantly generalized as the “I-MMSE” relationships in non-causal  and causal  estimation problems. Our SDP-based approaches are applicable to the cases with multi-dimensional and time-varying Gauss-Markov sources to which the existing I-MMSE formulas cannot be applied straightforwardly. Although we focus on the Gaussian SRD problems in this paper, we note that the standard RD and SRD problems for general sources and distortion measures in abstract (Polish) spaces are discussed in  and, respectively.
This paper is organized as follows. In Section II, we formally introduce the Gaussian SRD problem, which is the main problem considered in this paper. In Section III, we show that the Gaussian SRD problem is equivalent to what we call the linear-Gaussian sensor design problem, which formally establishes the sensor-estimator separation principle. Then, in Section IV, we show that the linear-Gaussian sensor design problem can be reduced to an SDP problem, which thus provides us with an SDP-based solution synthesis procedure for Gaussian SRD problems. Extensions to stationary and infinite horizon problems are given in Section V. In Section VI, we consider applications of SRD theory to real-time communication systems and networked control systems. Simple simulation results are presented in Section VII. We conclude in Section VIII.
Notation: Let be a Euclidean space, and be the Borel -algebra on . Let be a probability space, and be a random variable. Throughout the paper, we use lower case boldface symbols such as to denote random variables, while is a realization of . We denote by the probability measure of defined by for every . When no confusion occurs, this measure will also be denoted by or . For a Borel measurable function , we write . For a random vector, we write or depending on the initial index, and . Let be a real symmetric matrix of size . Notations or (resp. or ) mean that is a positive definite (resp. positive semidefinite) matrix. For a positive semidefinite matrix , we write .
II Problem Formulation
We begin our discussion with an estimation-theoretic interpretation of a simple rate-distortion trade-off problem. Recall that a rate-distortion problem for a scalar Gaussian random variable with the mean square distortion constraint is an optimization problem of the following form:
Here, is a reproduction of the source , and denotes the mutual information between and . The minimization is over the space of reproduction policies, i.e., stochastic kernels . The optimal value of (1) is known as the rate-distortion function, , and can be explicitly obtained  as
It is also possible to write the optimal reproduction policy explicitly. To this end, consider a linear sensor
where is a Gaussian noise independent of . Also, let
be the least mean square error estimator of given . Notice that the right hand side of (3) is given by . Then, it can be shown that an optimal solution to (1) is a composition of (2) and (3), provided that the signal-to-noise ratio of the sensor (2) is chosen to be
This gives us the following notable observations:
These facts can be significantly generalized, and serve as a guideline to develop a solution synthesis for Gaussian SRD problems in this paper.
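As a numerical illustration of the scalar case, the following minimal sketch checks that a linear sensor followed by the least mean square error estimator meets the distortion level while its mutual information matches the rate-distortion function. The normalization is an assumption on our part: the sensor is taken as y = x + v with unit gain, the signal-to-noise ratio is taken as the reciprocal of the noise variance, and rates are measured in bits. The function names are hypothetical.

```python
import math

def rate_distortion(sigma2: float, D: float) -> float:
    """Scalar Gaussian rate-distortion function max{0, 0.5*log2(sigma2/D)} in bits."""
    return max(0.0, 0.5 * math.log2(sigma2 / D))

def sensor_snr(sigma2: float, D: float) -> float:
    """Assumed normalization: SNR = 1/noise_variance for the sensor y = x + v."""
    return max(0.0, 1.0 / D - 1.0 / sigma2)

def mmse(sigma2: float, snr: float) -> float:
    """Error variance of E[x | y]: 1 / (1/sigma2 + snr)."""
    return 1.0 / (1.0 / sigma2 + snr)

def mutual_information(sigma2: float, snr: float) -> float:
    """I(x; y) in bits for the scalar Gaussian sensor."""
    return 0.5 * math.log2(1.0 + sigma2 * snr)

sigma2, D = 4.0, 1.0
snr = sensor_snr(sigma2, D)
print("SNR:", snr)
print("MMSE:", mmse(sigma2, snr))
print("I(x;y):", mutual_information(sigma2, snr))
print("R(D):", rate_distortion(sigma2, D))
```

When D exceeds the source variance, the sketch returns a zero signal-to-noise ratio, which corresponds to making no observation at all.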
II-A Gaussian SRD problem
The Gaussian SRD problem can be viewed as a generalization of (1). Let be an -valued Gauss-Markov process
where and for are mutually independent Gaussian random variables. The Gaussian SRD problem is formulated as
where (6b) is imposed for every . Here, is an -valued reproduction of . The minimization (6a) is over the space of zero-delay reproduction policies of given and , i.e., the sequences of causal stochastic kernels. (Footnote 5: See Appendix A for a formal description of causal stochastic kernels.) The term is known as directed information, introduced by Massey following Marko’s earlier work, and is defined by
The Gaussian SRD problem is visualized in Fig. 1.
Directed information measures the amount of information flow from to and is not symmetric, i.e., in general. However, when the process is causally dependent on and is not affected by , it can be shown  that . By definition of our source process (5), there is no information feedback from to , and thus holds in our setup. Hence, can be equivalently used as an objective in (P-SRD). However, we choose to use for the future considerations (e.g., ) in which is a controlled stochastic process and is dependent on . In such cases, and are not equal, and the latter is a more meaningful quantity in many applications.
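For reference, the distinction between the two quantities can be written out using Massey's standard definition of directed information and the chain rule for mutual information. The symbols below (source $\mathbf{x}$, reproduction $\mathbf{z}$, horizon $T$) are assumed notation, chosen for this sketch rather than taken from the displayed equations of the paper.

```latex
\begin{align*}
I(\mathbf{x}^T \to \mathbf{z}^T) &\triangleq \sum_{t=1}^{T} I(\mathbf{x}^{t}; \mathbf{z}_t \mid \mathbf{z}^{t-1}), \\
I(\mathbf{x}^T; \mathbf{z}^T) &= \sum_{t=1}^{T} I(\mathbf{x}^{T}; \mathbf{z}_t \mid \mathbf{z}^{t-1}).
\end{align*}
```

The two sums differ only in whether the $t$-th term conditions on the source up to time $t$ or on the entire source sequence. They agree term by term when $\mathbf{z}_t$ is conditionally independent of the future source values given $(\mathbf{x}^t, \mathbf{z}^{t-1})$, which is the no-feedback situation described above.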
Since (P-SRD) is an infinite dimensional optimization problem, it is difficult to apply numerical methods directly. Hence, we first need to develop a structural understanding of its solution. It turns out that the sensor-estimator separation principle still holds for (P-SRD), and this observation plays an important role in the subsequent sections. We are going to establish the following facts:
Fact 1’: A sensor-estimator separation principle holds for the Gaussian SRD problem. That is, an optimal policy for (P-SRD) can be realized as a composition of a sensor mechanism
where are mutually independent Gaussian random variables, and the least mean square error estimator (Kalman filter)
Fact 2’: The original optimization problem (P-SRD) over an infinite-dimensional space is reduced to an optimization problem over a finite-dimensional space of matrix-valued signal-to-noise ratios of the sensor (8), defined by
Moreover, the optimal , which depends on , can be obtained by SDP.
Unlike (4), an analytical expression of the optimal may not be available. Nevertheless, we will show that they can be easily obtained by SDP.
II-B Linear-Gaussian sensor design problem
In Section III, we establish the sensor-estimator separation principle. To this end, we show that (P-SRD) is equivalent to what we call the linear-Gaussian sensor design problem (P-LGS) visualized in Fig. 2. Formally, (P-LGS) is formulated as
where (11b) is imposed for every . We assume that is produced by a linear-Gaussian sensor (8), and is produced by the Kalman filter (9). In other words, the optimization domain is the space of causal stochastic kernels with a separation structure (8) and (9), which is parameterized by a sequence of matrices . Intuitively, in (11a) can be understood as the amount of information acquired by the sensor (8) at time . We call this problem a “sensor design problem” because our focus is on choosing an optimal sensing gain in (8) and the noise covariance . Notice that perfect observation with and is trivially the best to minimize the estimation error in (11b) (in fact, is achieved), but it incurs significant information cost (i.e., ), and hence it is not an optimal solution to (P-LGS).
In (P-LGS), we search for the optimal and . However, the sensor dimension is not given a priori, and choosing it optimally is part of the problem. In particular, if making no observation is the optimal sensing at some specific time instance , we should be able to recover as an optimal solution.
Although the objective functions (6a) and (11a) appear differently, it will be shown in Section III that they coincide in the domain . Moreover, in the same section it will be shown that an optimal solution to (P-SRD) can always be found in the domain . These observations imply that one can obtain an optimal solution to (P-SRD) by solving (P-LGS).
II-C Stationary cases
We will also consider a time-invariant system
where is an -valued random variable with , and is a stationary white Gaussian noise. We assume and . A stationary and infinite-horizon version of the Gaussian SRD problem is formulated as
This is an optimization over the sequence of stochastic kernels . The optimal value of (13) as a function of the average distortion is referred to as the sequential rate-distortion function, and is denoted by .
Similarly, a stationary and infinite horizon version of the linear-Gaussian sensor design problem is formulated as
Here, we assume where is a mutually independent Gaussian stochastic process and . Design variables in (14) are . Again, determining their dimensions is part of the problem.
II-D Soft- vs. hard-constrained problems
Introducing Lagrange multipliers , one can also consider a soft-constrained version of (P-SRD):
Similarly to the Lagrange multiplier theorem (e.g., Proposition 3.1.1 in ), it is possible to show that there exists a set of multipliers such that an optimal solution to (15) is also an optimal solution to (P-SRD). We will prove this fact in Section IV after we establish that both (P-SRD) and (15) can be transformed into finite-dimensional convex optimization problems. For this reason, we refer to both (P-SRD) and (15) as Gaussian SRD problems.
III Sensor-estimator separation principle
Let and be the optimal values of (P-SRD) and (P-LGS) respectively. In this section, we show that , and an optimal solution to (P-LGS) is also an optimal solution to (P-SRD). This result establishes the sensor-estimator separation principle (Fact 1’). We introduce another optimization problem (P-1), which serves as an intermediate step to establish this fact.
The optimization is over the space of linear-Gaussian stochastic kernels , where each stochastic kernel is of the form
where are some matrices with appropriate dimensions, and is a zero-mean, possibly degenerate Gaussian random variable that is independent of . Notice that . The underlying Gauss-Markov process is defined by (5). Let be the optimal value of (P-1). The next lemma claims the equivalence between (P-SRD) and (P-1).
Lemma 1. If there exists attaining a value of the objective function in (P-SRD), then there exists attaining a value of the objective function in (P-1).
Every attaining in (P-1) also attains in (P-SRD).
Lemma 1 is the most significant result in this section, and it essentially guarantees the linearity of an optimal solution to the Gaussian SRD problems. The proof of Lemma 1 can be found in Appendix B. The basic idea of the proof relies on the well-known fact that the Gaussian distribution maximizes entropy when the covariance is fixed. This proposition appears as Lemma 4.3 in , but we modified the proof using the Radon-Nikodym derivatives so that the proof does not require the existence of probability density functions. The next lemma establishes the equivalence between (P-1) and (P-LGS).
Lemma 2. If there exists attaining a value of the objective function in (P-1), then there exists attaining a value of the objective function in (P-LGS).
Every attaining in (P-LGS) also attains in (P-1).
The proof of Lemma 2 is given in Appendix C. Combining the above two lemmas, we obtain the following consequence, which is the main proposition in this section. It guarantees that we can alternatively solve (P-LGS) in order to solve (P-SRD).
Proposition 1. Suppose . Then there exists an optimal solution to (P-LGS). Moreover, an optimal solution to (P-LGS) is also an optimal solution to (P-SRD), and .
IV SDP-based synthesis
In this section, we develop an efficient numerical algorithm to solve (P-LGS). Due to the preceding discussion, this is equivalent to developing an algorithm to solve (P-SRD). Let (5) be given. Assume temporarily that (8) is also fixed. The Kalman filtering formula for computing is
where is the covariance matrix of , which can be recursively computed as
Note that and guarantee that both differential entropy terms are finite. Hence, (P-LGS) is equivalent to the following optimization problem in terms of the variables :
Equality (18b) is obtained by eliminating from (17). At this point, one may note that (18) can be viewed as an optimal control problem with state and control input . Naturally, dynamic programming approaches have been proposed in the literature in similar contexts [15, 12, 10, 11]. Alternatively, we next propose a method to transform (18) into an SDP problem. This allows us to solve (P-SRD) using standard SDP solvers, which are now a mature technology.
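For a fixed sensor sequence, the covariance recursion of the Kalman filter and the information acquired at each step can be evaluated directly. The sketch below is illustrative only and rests on several assumptions: the source is taken to be time-invariant with matrices `A` and `W`, each sensor is a pair `(C, V)` of gain and noise covariance, the per-step information is computed as half the difference of the log-determinants of the prior and posterior covariances (in nats), and all names and numerical values are hypothetical.

```python
import numpy as np

def kalman_information(A, W, sensors, P_prior):
    """Run the covariance recursion for a fixed sensor sequence.

    sensors is a list of (C, V) pairs. Returns the posterior covariances
    and the information acquired at each step, in nats.
    """
    posteriors, infos = [], []
    for C, V in sensors:
        # Matrix-valued signal-to-noise ratio C^T V^{-1} C.
        snr = C.T @ np.linalg.solve(V, C)
        # Measurement update in information form.
        P_post = np.linalg.inv(np.linalg.inv(P_prior) + snr)
        # 0.5 * (log det prior - log det posterior).
        info = 0.5 * (np.linalg.slogdet(P_prior)[1] - np.linalg.slogdet(P_post)[1])
        posteriors.append(P_post)
        infos.append(info)
        # Time update for the next step.
        P_prior = A @ P_post @ A.T + W
    return posteriors, infos

# Hypothetical two-state example observed through its first coordinate.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
W = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]])
V = np.array([[0.1]])
posteriors, infos = kalman_information(A, W, [(C, V)] * 3, np.eye(2))
print("information per step (nats):", infos)
print("final error trace:", np.trace(posteriors[-1]))
```

Note that the cost depends on the sensor only through the product `C.T @ inv(V) @ C`, which is why the design problem can be posed over signal-to-noise ratio matrices rather than over `C` and `V` separately.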
IV-A SRD optimization as max-det problem
Now we show that (18) can be converted to a determinant maximization problem  subject to linear matrix inequality constraints. The first step is to transform (18) into an optimization problem in terms of only. This is possible by simply replacing the nonlinear equality constraint (18b) with a linear inequality constraint
This replacement eliminates from (18) giving us:
Due to the monotonicity of the determinant function, (22) is equal to the optimal value of
Applying the matrix inversion lemma, (23b) is equivalent to , which is further equivalent to
Note that (24) is a linear matrix inequality (LMI) condition. The above discussion leads to the following conclusion.
Theorem 1. An optimal solution to (P-LGS) can be constructed by solving the following determinant maximization problem with decision variables :
where is a constant. The optimal sequence can be reconstructed from (20), from which satisfying (10) can be reconstructed via the singular value decomposition. An optimal solution to (P-LGS) is obtained as a composition of (8) and (9).
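The two algebraic steps used in the derivation above, the matrix inversion lemma and the passage to a linear matrix inequality, can be checked numerically. The sketch below assumes that the inequality has the form Π ⪯ (P⁻¹ + AᵀW⁻¹A)⁻¹ and that the resulting LMI is the block matrix obtained from it by a Schur complement. The symbols `Pi`, `P`, `A`, `W` and the random test data are our own labels for illustration, not a transcription of (23b) and (24).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
N = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)  # positive definite covariance
W = N @ N.T + np.eye(n)  # positive definite noise covariance

inv = np.linalg.inv

# Matrix inversion lemma: two expressions for the same upper bound on Pi.
bound_inverse_form = inv(inv(P) + A.T @ inv(W) @ A)
bound_lemma_form = P - P @ A.T @ inv(A @ P @ A.T + W) @ A @ P
print("lemma holds:", np.allclose(bound_inverse_form, bound_lemma_form))

def lmi_block(Pi):
    """Block matrix whose Schur complement is bound_lemma_form - Pi."""
    return np.block([[P - Pi, P @ A.T], [A @ P, A @ P @ A.T + W]])

def min_eig(X):
    return np.linalg.eigvalsh(X).min()

Pi_feasible = 0.5 * bound_lemma_form                 # strictly below the bound
Pi_infeasible = bound_lemma_form + 0.1 * np.eye(n)   # above the bound
print("feasible Pi, min eigenvalue of block:", min_eig(lmi_block(Pi_feasible)))
print("infeasible Pi, min eigenvalue of block:", min_eig(lmi_block(Pi_infeasible)))
```

Because the lower-right block is positive definite, the block matrix is positive semidefinite exactly when its Schur complement is, which is what makes the constraint linear in the pair `(P, Pi)`.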
Under the assumption that for every , the max-det problem (25) is always strictly feasible and there exists an optimal solution. Invoking Proposition 1, we have thus shown by construction that there always exists an optimal solution to (P-SRD) under this assumption. (Footnote 6: To see the strict feasibility, consider for and for with sufficiently small . The constraint set defined by (25b)-(25f) can be made compact by replacing (25b) with without altering the result. Thus the existence of an optimal solution is guaranteed by the Weierstrass theorem.)
Using the same technique, the soft-constrained version of the problem (15) can be formulated as:
The next proposition claims that (25) and (26) admit the same optimal solution provided Lagrange multipliers , are chosen correctly. This further implies that, with the same choice of , two versions of the Gaussian SRD problems (P-SRD) and (15) are equivalent.
IV-B Max-det problem as SDP
Strictly speaking, optimization problems (25) and (26) are in the class of determinant maximization problems, but not in the standard form of the SDP. (Footnote 7: In the standard form, SDP is an optimization problem of the form .) However, they can be considered as SDPs in a broader sense for the following reasons. First, the hard-constrained version (25) can indeed be transformed into a standard SDP problem. This conversion is possible by following the discussion in Chapter 4 of . Second, sophisticated and efficient algorithms based on the interior-point method for SDP can almost directly be applied to max-det problems as well. In fact, off-the-shelf SDP solvers such as SDPT3 have built-in functions to handle log-determinant terms directly.
Recall that (P-LGS) and (P-SRD) have a common optimal solution. Hence, Proposition 1 shows that both (P-LGS) and (P-SRD) are essentially solvable via SDP, which is much stronger than merely saying that they are convex problems. Note that convexity alone does not guarantee the existence of an efficient optimization algorithm.
IV-C Complexity analysis
In this section, we briefly consider the arithmetic complexity (i.e., the worst case number of arithmetic operations needed to obtain an -optimal solution) of problem (25), and how it grows as the horizon length grows when the dimensions of the Gauss-Markov process (5) are fixed to . For a preliminary analysis, it would be natural for us to resort to the existing interior-point method literature (e.g., [32, 34]). Interior-point methods for the determinant maximization problem are already considered in [29, 35, 36]. The most computationally expensive step in the interior-point method is the Cholesky factorization involved in the Newton steps, which requires operations in general. However, it is possible to exploit the sparsity of coefficient matrices in the SDPs to reduce operation counts [37, 38, 39]. By exploiting the structure of our SDP formulation (25), it is theoretically expected that there exists a specialized interior-point method algorithm for (25) whose arithmetic complexity is . However, more careful study and computational experiments are needed to verify this conjecture.
IV-D Single stage problem
Here, we have already assumed and . This does not result in loss of generality, since otherwise a change of variables , where is an orthonormal matrix that makes diagonal, converts the problem into the above form. For any positive definite matrix , Hadamard’s inequality (e.g., ) states that and the equality holds if and only if the matrix is diagonal. Hence, if diagonal elements of are fixed, is maximized by setting all off-diagonal entries zero. Thus, the optimal solution to the above problem is diagonal. Writing , the problem is decomposed as independent optimization problems, each of which minimizes subject to . It is easy to see that the optimal solution is . This is the closed-form solution to (P-LGS) with , and its pictorial interpretation is shown in Fig. 3. This solution also indicates the optimal sensing formula is given by , where and satisfy
In particular, we have , indicating that the optimal dimension of monotonically decreases as the “price of information” increases.
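The closed-form solution has a reverse water-filling structure that is easy to reproduce. The sketch below assumes a particular normalization that is not taken from the displayed equations: each component with prior variance `sigma_i` minimizes `0.5*log(sigma_i/p_i) + 0.5*alpha*p_i` over `0 < p_i <= sigma_i`, so that the optimal posterior variance is `min(sigma_i, 1/alpha)` and `1/alpha` plays the role of the price of information. The function name and example values are hypothetical.

```python
import math

def single_stage_solution(sigmas, alpha):
    """Reverse water-filling under the assumed cost 0.5*log(s/p) + 0.5*alpha*p.

    Returns posterior variances, per-component SNRs, the rate in nats,
    and the number of components that are actually observed.
    """
    level = 1.0 / alpha                       # water level
    p = [min(s, level) for s in sigmas]       # posterior variances
    snr = [1.0 / pi - 1.0 / s for pi, s in zip(p, sigmas)]
    rate = sum(0.5 * math.log(s / pi) for pi, s in zip(p, sigmas))
    dim = sum(1 for s in sigmas if s > level)  # rank of the SNR matrix
    return p, snr, rate, dim

sigmas = [4.0, 1.0, 0.25]
dims = []
for alpha in [0.1, 1.0, 2.0, 8.0]:
    p, snr, rate, dim = single_stage_solution(sigmas, alpha)
    dims.append(dim)
    print(f"alpha={alpha}: p={p}, rate={rate:.3f} nats, sensor dimension={dim}")
```

Components whose prior variance is already below the water level receive zero signal-to-noise ratio, so the sensor dimension shrinks as `1/alpha` grows, in line with the monotonicity noted above.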
V Stationary problems
V-A Sequential rate-distortion function
We are often interested in infinite-horizon Gaussian SRD problems (13). Assuming that is a detectable pair, it can be shown that (13) is equivalent to the infinite-horizon linear-Gaussian sensor design problem (14) . Moreover,  shows that (13) and (14) admit an optimal solution that can be realized as a composition of a time-invariant sensor mechanism with i.i.d. process and a time-invariant Kalman filter. Hence, it is enough to minimize the average cost per stage, which leads to the following simpler problem.
To confirm (27) is compatible with the existing result, consider a scalar system with and . In this case, a closed-form expression of the SRD function is known in the literature  , which is given by
For a scalar system, (27) further simplifies to
It is elementary to verify that the optimal value of (29) is if , while it is if . Hence, it can be compactly written as , and the result recovers (28). Alternative representations of the SRD function (27) for stationary multi-dimensional Gauss-Markov processes when are reported in [13, Section IV-B] and [17, Section VI].
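The scalar result can be cross-checked against a steady-state Kalman filter. The sketch below assumes a scalar source with dynamics coefficient `a` and process noise variance `w`, takes the closed form as max{0, 0.5*log2(a^2 + w/D)} in bits per step, and chooses the sensor signal-to-noise ratio so that the posterior variance is held at `D`. The function names, the use of base-2 logarithms, and the example values are assumptions made for illustration.

```python
import math

def srd_scalar(a: float, w: float, D: float) -> float:
    """Assumed closed form max{0, 0.5*log2(a^2 + w/D)} in bits per step."""
    return max(0.0, 0.5 * math.log2(a * a + w / D))

def stationary_snr(a: float, w: float, D: float) -> float:
    """SNR that keeps the posterior variance at D (zero if no sensing is needed)."""
    return max(0.0, 1.0 / D - 1.0 / (a * a * D + w))

def steady_state_posterior(a, w, snr, p0=1.0, steps=2000):
    """Iterate the scalar Riccati recursion with a fixed sensor SNR."""
    p = p0
    for _ in range(steps):
        prior = a * a * p + w
        p = 1.0 / (1.0 / prior + snr)
    return p

a, w, D = 1.2, 1.0, 0.5
snr = stationary_snr(a, w, D)
p_inf = steady_state_posterior(a, w, snr)
rate_from_filter = 0.5 * math.log2((a * a * p_inf + w) / p_inf)
print("steady-state posterior variance:", p_inf)
print("rate from filter:", rate_from_filter)
print("closed form:", srd_scalar(a, w, D))
```

For a stable source with a loose distortion level, for instance `a = 0.5`, `w = 1.0`, `D = 2.0`, both functions return zero: the open-loop variance already satisfies the constraint, so no observation is needed.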
V-B Rank monotonicity
Using an optimal solution to (27) the optimal sensing matrices and are recovered from . In particular, determines the optimal dimension of the measurement vector. Similarly to the case of single stage problems, this rank has a tendency to decrease as increases. A typical numerical behavior is shown in Figure 4. We do not attempt to prove the rank monotonicity here.
VI Applications and related works
VI-A Zero-delay source coding
SRD theory plays an important role in the rate analysis of zero-delay source coding schemes. For each , let
be a set of variable-length uniquely decodable codewords. Assume that for , and let be the length of . A zero-delay binary coder is a pair of a sequence of encoders , i.e., stochastic kernels on given , and a sequence of decoders , i.e., stochastic kernels on given . The zero-delay rate-distortion region for the Gauss-Markov process (12) is the epigraph of the function
The SRD function is a lower bound of the achievable rate. Indeed, can be shown straightforwardly as
where (30a) holds since there is no feedback from the process to (Remark 1), (30b) follows from the data processing inequality, (30d) holds since conditional entropy is non-negative, and (30e) is due to the chain rule for entropy. The final inequality (30f) holds since the expected length of a uniquely decodable code is lower bounded by its entropy [1, Theorem 5.3.1].
In general, and do not coincide. Nevertheless, by constructing an appropriate entropy-coded dithered quantizer (ECDQ), it is shown in  that does not exceed more than a constant due to the “space-filling loss” of the lattice quantizer and the loss of entropy coding.
VI-B Networked control theory
Zero-delay source/channel coding technologies are crucial in networked control systems [41, 42, 43, 44]. Gaussian SRD theory plays an important role in the LQG control problems with information-theoretic constraints. It is shown in  that an LQG control problem in which observed data must be transmitted to the controller over a noiseless binary channel is closely related to the LQG control problem with directed information constraints. The latter problem is addressed in  using the SDP-based algorithm presented in this paper. In , the problem is viewed as a sensor-controller joint design problem in which directed information from the state process to the control input is minimized. (Footnote 8: The problem considered in  is different from the sensor-controller joint design problems considered in  and .)
VI-C Experimental design/Sensor scheduling
In this subsection, we compare the linear-Gaussian sensor design problem (P-LGS) with different types of sensor design/selection problems considered in the literature.
A problem of selecting the best subset of sensors to observe a random variable in order to minimize the estimation error and its convex relaxations are considered in . A sensor selection problem for a linear dynamical system is considered in , where submodularity of the objective function is exploited. Dynamic sensor scheduling problems are also considered in the literature. In , an efficient algorithm to explore branches of the scheduling tree is proposed. In , a stochastic sensor selection strategy that minimizes the expected error covariance is considered.
The linear-Gaussian sensor design problem (P-LGS) is different from these sensor selection/scheduling problems in that it is essentially a continuous optimization problem (since matrices can be freely chosen), and the objective is to minimize an information-theoretic cost (11a).
VII Numerical Simulations
In this section, we consider two numerical examples to demonstrate how the SDP-based formulation of the Gaussian SRD problem can be used to calculate the minimal communication bandwidth required for the real-time estimation with desired accuracy.
VII-A Optimal sensor design for double pendulum
A linearized equation of motion of a double pendulum with friction and disturbance is given by
where is a Brownian motion. We consider a discrete time model of the above equation of motion obtained through the Tustin transformation. We are interested in designing a sensing model that optimally trades off information cost and distortion level. (Footnote 9: In practice, it is often the case that is partially observable through a given sensor mechanism. In such cases, the framework discussed in this paper is not appropriate. Instead, one can formulate an SRD problem for partially observable Gauss-Markov processes. See  for details.) We solve the stationary optimization problem (27) for this example with various values of . The result is the sequential rate-distortion function shown in Figure 5. Finally, for every point on the trade-off curve, the optimal sensing matrices and are reconstructed, and the Kalman filter is designed based on them. Figure 6 shows the trade-off between the distortion level and the tracking performance of the Kalman filter. When the distortion constraint is strict (), the optimally designed sensor generates high rate information ( bits/sample) and the Kalman filter built on it tracks the true state very well. When is large (), the optimal sensing strategy chooses “not to observe much”, and the resulting Kalman filter shows poor tracking performance.
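The discretization step mentioned above can be sketched as follows for the state matrix alone. This is a minimal illustration under stated assumptions: the Tustin (bilinear) map is taken in its standard form (I - hA/2)^{-1}(I + hA/2) with sampling period `h`, the matrix `A_c` is a hypothetical damped oscillator and not the double pendulum model of this section, and the discretization of the noise term is not addressed.

```python
import numpy as np

def tustin(A_c: np.ndarray, h: float) -> np.ndarray:
    """Bilinear (Tustin) discretization of the state matrix with period h."""
    I = np.eye(A_c.shape[0])
    # Solves (I - h/2 A_c) A_d = (I + h/2 A_c) for A_d.
    return np.linalg.solve(I - 0.5 * h * A_c, I + 0.5 * h * A_c)

# Hypothetical stable continuous-time system (damped oscillator).
A_c = np.array([[0.0, 1.0], [-2.0, -0.5]])
h = 0.1
A_d = tustin(A_c, h)
spectral_radius = max(abs(np.linalg.eigvals(A_d)))
print("discrete-time state matrix:\n", A_d)
print("spectral radius:", spectral_radius)
```

The bilinear map sends the open left half-plane into the open unit disc, so a stable continuous-time model yields a stable discrete-time state matrix, which can then be used as the source dynamics in the stationary problem.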
VII-B Minimum down-link bandwidth for satellite attitude determination
The equation of motion of the angular velocity vector of a spin-stabilized satellite linearized around the nominal angular velocity vector is