The Entropy Gain of Linear Time-Invariant Filters and Some of its Implications
We study the increase in per-sample differential entropy rate of random sequences and processes after being passed through a non-minimum-phase (NMP) discrete-time, linear time-invariant (LTI) filter $G$. For LTI discrete-time filters and random processes, it has long been established that this entropy gain equals the integral of $\frac{1}{2\pi}\log|G(e^{j\omega})|$ over $[-\pi,\pi]$. It is also known that, if the first sample of the impulse response of $G$ has unit magnitude, then the latter integral equals the sum of the logarithms of the magnitudes of the non-minimum-phase zeros of $G$ (i.e., its zeros outside the unit circle), say $\mathcal{B}(G)$. These existing results have been derived in the frequency domain as well as in the time domain. In this note, we begin by showing that existing time-domain proofs, which consider finite length-$n$ sequences and then let $n$ tend to infinity, have neglected significant mathematical terms and, therefore, are inaccurate. We discuss some of the implications of this oversight when considering random processes. We then present a rigorous time-domain analysis of the entropy gain of LTI filters for random processes. In particular, we show that the entropy gain between equal-length input and output sequences is upper bounded by $\mathcal{B}(G)$ and arises if and only if there exists an output additive disturbance with finite differential entropy (no matter how small) or a random initial state. Unlike what happens with linear maps, the entropy gain in this case depends on the distribution of all the signals involved. Instead, when comparing the input differential entropy to that of the entire (longer) output of $G$, the entropy gain equals $\mathcal{B}(G)$ irrespective of the distributions and without the need for additional exogenous random signals. We illustrate some of the consequences of these results by presenting their implications in three different problems.
Specifically: a simple derivation of the rate-distortion function for Gaussian non-stationary sources, conditions for equality in an information inequality of importance in networked control problems, and an observation on the capacity of auto-regressive Gaussian channels with feedback.
In his seminal 1948 paper [1], Claude Shannon gave a formula for the increase in differential entropy per degree of freedom that a continuous-time, band-limited random process experiences after passing through a linear time-invariant (LTI) continuous-time filter. In this formula, if the input process is band-limited to a frequency range , has differential entropy rate (per degree of freedom) , and the LTI filter has frequency response , then the resulting differential entropy rate of the output process is given by [1, Theorem 14]
The last term on the right-hand side (RHS) of (1) can be understood as the entropy gain (entropy amplification or entropy boost) introduced by the filter . Shannon proved this result by arguing that an LTI filter can be seen as a linear operator that selectively scales its input signal along infinitely many frequencies, each of them representing an orthogonal component of the source. The result is then obtained by writing down the determinant of the Jacobian of this operator as the product of the frequency response of the filter over frequency bands, applying logarithm and then taking the limit as the number of frequency components tends to infinity.
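An analogous result can be obtained for discrete-time input and output processes $\{u(k)\}$ and $\{y(k)\}$, and an LTI discrete-time filter $G(z)$, by relating them to their continuous-time counterparts, which yields
$$\bar{h}(\{y(k)\}) = \bar{h}(\{u(k)\}) + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left|G(e^{j\omega})\right|\,d\omega, \qquad (2)$$
where $\bar{h}(\{u(k)\}) \triangleq \lim_{n\to\infty}\frac{1}{n}h(u(1),\ldots,u(n))$ is the differential entropy rate of the process $\{u(k)\}$. Of course, the same formula can also be obtained by applying the frequency-domain proof technique that Shannon followed in his derivation of (1).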
The rightmost term in (2), which corresponds to the entropy gain of $G(z)$, can be related to the structure of this filter. It is well known that if $G$ is causal with a rational transfer function $G(z)$ such that $\lim_{z\to\infty}|G(z)| = 1$ (i.e., such that the first sample of its impulse response has unit magnitude), then
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left|G(e^{j\omega})\right|\,d\omega = \sum_{\rho_i \notin \mathbb{D}}\log|\rho_i|, \qquad (3)$$
where $\{\rho_i\}$ are the zeros of $G(z)$ and $\mathbb{D}$ is the open unit disk on the complex plane. This provides a straightforward way to evaluate the entropy gain of a given LTI filter with rational transfer function $G(z)$. In addition, (3) shows that, if $\lim_{z\to\infty}|G(z)| = 1$, then such gain is greater than zero if and only if $G(z)$ has zeros with magnitude greater than one. A filter with the latter property is said to be non-minimum phase (NMP); conversely, a filter with all its zeros inside $\mathbb{D}$ is said to be minimum phase (MP).
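The relation between the frequency-domain integral in (2) and the NMP zeros in (3) can be checked numerically. The following is a minimal sketch, assuming a hypothetical rational filter written as $G(z) = \prod_i (1 - \rho_i z^{-1}) / \prod_j (1 - p_j z^{-1})$ (so that the first impulse-response sample is one) with example zeros and poles chosen only for illustration; the function names are not taken from the paper.

```python
import numpy as np

def entropy_gain_integral(zeros, poles=(), num_points=4096):
    """Approximate (1/2pi) * integral over [-pi, pi] of log|G(e^{jw})|.

    G(z) = prod_i (1 - zeros[i]/z) / prod_j (1 - poles[j]/z), so g_0 = 1.
    The integral is approximated by the mean over a uniform frequency grid.
    """
    w = 2.0 * np.pi * np.arange(num_points) / num_points
    z_inv = np.exp(-1j * w)
    log_mag = np.zeros(num_points)
    for rho in zeros:
        log_mag += np.log(np.abs(1.0 - rho * z_inv))
    for p in poles:
        log_mag -= np.log(np.abs(1.0 - p * z_inv))
    return log_mag.mean()

def sum_log_nmp_zeros(zeros):
    """Sum of log-magnitudes of the zeros lying outside the unit circle."""
    return sum(np.log(abs(rho)) for rho in zeros if abs(rho) > 1.0)

# Illustrative (assumed) filter: two NMP zeros, one MP zero, one stable pole.
zeros = [2.0, 0.5, -3.0]
poles = [0.9]
lhs = entropy_gain_integral(zeros, poles)   # frequency-domain side
rhs = sum_log_nmp_zeros(zeros)              # sum over NMP zeros
print(lhs, rhs)
```

For a filter with all its zeros inside the unit circle the same function returns a value that is numerically zero, in line with the statement that only NMP zeros contribute.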
NMP filters appear naturally in various applications. For instance, any unstable LTI system stabilized via linear feedback control will yield transfer functions which are NMP [2, 3]. Additionally, NMP zeros also appear when a discrete-time zero-order-hold (ZOH) equivalent system is obtained from a plant whose number of poles exceeds its number of zeros by at least 2, as the sampling rate increases [4, Lemma 5.2]. On the other hand, all linear-phase filters, which are especially suited for audio and image-processing applications, are NMP [5, 6]. The same is true for any all-pass filter, which is an important building block in signal processing applications [7, 5].
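An alternative approach for obtaining the entropy gain of LTI filters is to work in the time domain: obtain $h(y^n)$ as a function of $h(u^n)$, for every $n$, and evaluate the limit $\lim_{n\to\infty}\frac{1}{n}\left(h(y^n) - h(u^n)\right)$. More precisely, for a filter $G$ with impulse response $\{g_k\}_{k=0}^{\infty}$, we can write
$$y^n = \mathbf{G}_n u^n, \qquad \mathbf{G}_n \triangleq \begin{bmatrix} g_0 & 0 & \cdots & 0 \\ g_1 & g_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{n-1} & g_{n-2} & \cdots & g_0 \end{bmatrix}, \qquad (4)$$
where $y^n \triangleq [y(1)\ \cdots\ y(n)]^T$ and the random vector $u^n$ is defined likewise. From this, it is clear that
$$h(y^n) = h(u^n) + \log\left|\det(\mathbf{G}_n)\right|, \qquad (5)$$
where $\det(\mathbf{G}_n)$ (or simply $|\mathbf{G}_n|$) stands for the determinant of $\mathbf{G}_n$. Thus, since $\mathbf{G}_n$ is lower triangular with $|g_0| = 1$ on its diagonal,
$$\lim_{n\to\infty}\frac{1}{n}\left(h(y^n) - h(u^n)\right) = \log|g_0| = 0, \qquad (6)$$
regardless of whether $G(z)$ (i.e., the polynomial $\sum_k g_k z^{-k}$) has zeros with magnitude greater than one, which clearly contradicts (2) and (3). Perhaps surprisingly, the above contradiction not only has been overlooked in previous works (such as [8, 9]), but the time-domain formulation in the form of (4) has been utilized as a means to prove or disprove (2) (see, for example, the reasoning in [10, p. 568]).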
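The time-domain argument can be reproduced numerically. The sketch below builds the lower-triangular Toeplitz matrix of a filter and evaluates the per-sample log-determinant for an NMP and an MP example. The two FIR filters ($1 + 2z^{-1}$ and $1 + 0.5z^{-1}$) and the helper names are assumptions chosen for illustration only.

```python
import numpy as np

def lower_toeplitz(impulse_response, n):
    """n x n lower-triangular Toeplitz matrix whose first column holds the
    first n impulse-response samples (zero-padded if the filter is shorter)."""
    g = np.zeros(n)
    m = min(n, len(impulse_response))
    g[:m] = impulse_response[:m]
    G = np.zeros((n, n))
    for k in range(n):
        G += g[k] * np.eye(n, k=-k)   # place g[k] on the k-th sub-diagonal
    return G

def per_sample_log_det(impulse_response, n):
    """(1/n) * log|det G_n|, the per-sample entropy change of y = G_n u."""
    _, logdet = np.linalg.slogdet(lower_toeplitz(impulse_response, n))
    return logdet / n

g_nmp = [1.0, 2.0]   # assumed example: zero at z = -2 (outside the unit circle)
g_mp = [1.0, 0.5]    # assumed example: zero at z = -0.5 (inside the unit circle)
for n in (5, 10, 20):
    print(n, per_sample_log_det(g_nmp, n), per_sample_log_det(g_mp, n))
```

Both filters give a per-sample log-determinant that is numerically zero for every length, whereas the frequency-domain formula would assign the first one a gain of $\log 2$. This is the discrepancy discussed above.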
A reason for why the contradiction between (2), (3) and (6) arises can be obtained from the analysis developed in  for an LTI system within a noisy feedback loop, as the one depicted in Fig. 1. In this scheme, represents a causal feedback channel which combines the output of with an exogenous (noise) random process to generate its output. The process is assumed independent of the initial state of , represented by the random vector , which has finite differential entropy.
For this system, it is shown in [11, Theorem 4.2] that
with equality if is a deterministic function of . Furthermore, it is shown in [12, Lemma 3.2] that if and the steady state variance of system remains asymptotically bounded as , then
where are the poles of . Thus, for the (simplest) case in which , the output is the result of filtering by a filter (as shown in Fig. 1-right), and the resulting entropy rate of will exceed that of only if there is a random initial state with bounded differential entropy (see (7a)). Moreover, under the latter conditions, [11, Lemma 4.3] implies that if is stable and , then this entropy gain will be lower bounded by the right-hand side (RHS) of (3), which is greater than zero if and only if is NMP. However, the result obtained in (7b) does not provide conditions under which the equality in the latter equation holds.
Additional results and intuition related to this problem can be obtained from the analysis in . There it is shown that if is a two-sided Gaussian stationary random process generated by a state-space recursion of the form
for some , , , with unit-variance Gaussian i.i.d. innovations , then its entropy rate will be exactly (i.e., the differential entropy rate of ) plus the RHS of (3) (with now being the eigenvalues of outside the unit circle). However, as noted in , if the same system with zero (or deterministic) initial state is excited by a one-sided infinite Gaussian i.i.d. process with unit sample variance, then the (asymptotic) entropy rate of the output process is just (i.e., there is no entropy gain). Moreover, it is also shown that if is a Gaussian random sequence with positive-definite covariance matrix and , then the entropy rate of also exceeds that of by the RHS of (3). This suggests that for an LTI system which admits a state-space representation of the form (8), the entropy gain for a single-sided Gaussian i.i.d. input is zero, and that the entropy gain from the input to the output-plus-disturbance is (3), for any Gaussian disturbance of length with positive definite covariance matrix (no matter how small this covariance matrix may be).
The previous analysis suggests that it is the absence of a random initial state or a random additive output disturbance that makes the time-domain formulation (4) yield a zero entropy gain. But, how would the addition of such finite-energy exogenous random variables to (4) actually produce an increase in the differential entropy rate which asymptotically equals the RHS of (3)? In a broader sense, it is not clear from the results mentioned above what the necessary and sufficient conditions are under which an entropy gain equal to the RHS of (3) arises (the analysis in  provides only a set of sufficient conditions and relies on second-order statistics and Gaussian innovations to derive the results previously described). Another important observation to be made is the following: it is well known that the entropy gain introduced by a linear mapping is independent of the input statistics . However, there is no reason to assume such independence when this entropy gain arises as the result of adding a random signal to the input of the mapping, i.e., when the mapping by itself does not produce the entropy gain. Hence, it remains to characterize the largest set of input statistics which yield an entropy gain, and the magnitude of this gain.
The first part of this paper provides answers to these questions. In particular, in Section 3 we explain how and when the entropy gain arises (in the situations described above), starting with input and output sequences of finite length, in a time-domain analysis similar to (4), and then taking the limit as the length tends to infinity. In Section 4 it is shown that, in the output-plus-disturbance scenario, the entropy gain is at most the RHS of (3). We show that, for a broad class of input processes (not necessarily Gaussian or stationary), this maximum entropy gain is reached only when the disturbance has bounded differential entropy and its length is at least equal to the number of non-minimum phase zeros of the filter. We provide upper and lower bounds on the entropy gain if the latter condition is not met. A similar result is shown in Section 5 to hold when there is a random initial state in the system (with finite differential entropy). In addition, in Section 6 we study the entropy gain between the entire output sequence that a filter yields as response to a shorter input sequence. In this case, however, it is necessary to consider a new definition for differential entropy, named effective differential entropy. Here we show that an effective entropy gain equal to the RHS of (3) is obtained provided the input has finite differential entropy rate, even when there is no random initial state or output disturbance.
In the second part of this paper (Section 7) we apply the conclusions obtained in the first part to three problems, namely, networked control, the rate-distortion function for non-stationary Gaussian sources, and the Gaussian channel capacity with feedback. In particular, we show that equality holds in (7b) for the feedback system in Fig. 1-left under very general conditions (even when the channel is noisy). For the problem of finding the quadratic rate-distortion function for non-stationary auto-regressive Gaussian sources, previously solved in [14, 15, 16], we provide a simpler proof based upon the results we derive in the first part. This proof extends the result stated in [15, 16] to a broader class of non-stationary sources. For the feedback Gaussian capacity problem, we show that capacity results based on using a short random sequence as channel input and relying on a feedback filter which boosts the entropy rate of the end-to-end channel noise (such as the one proposed in ), crucially depend upon the complete absence of any additional disturbance anywhere in the system. Specifically, we show that the information rate of such capacity-achieving schemes drops to zero in the presence of any such additional disturbance. As a consequence, the relevance of characterizing the robust (i.e., in the presence of disturbances) feedback capacity of Gaussian channels, which appears to be a fairly unexplored problem, becomes evident.
Finally, the main conclusions of this work are summarized in Section 8.
Unless presented in the main text, all proofs are given in the appendix.
For any LTI system , the transfer function corresponds to the -transform of the impulse response , i.e., . For a transfer function , we denote by the lower triangular Toeplitz matrix having as its first column. We write as a shorthand for the sequence and, when convenient, we write in vector form as , where denotes transposition. Random scalars (vectors) are denoted using non-italic characters, such as (non-italic and boldface characters, such as ). For matrices we use upper-case boldface symbols, such as . We write to denote the -th smallest-magnitude eigenvalue of . If , then denotes the entry in the intersection between the -th row and the -th column. We write , with , to refer to the matrix formed by selecting the rows to of . The expression corresponds to the square sub-matrix along the main diagonal of , with its top-left and bottom-right corners on and , respectively. A diagonal matrix whose entries are the elements in is denoted as
2 Problem Definition and Assumptions
Consider the discrete-time system depicted in Fig. 2. In this setup, the input is a random process and the block is a causal, linear and time-invariant system with random initial state vector and random output disturbance .
In vector notation,
where is the natural response of to the initial state . We make the following further assumptions about and the signals around it:
is a causal, stable and rational transfer function of finite order, whose impulse response satisfies .
It is worth noting that there is no loss of generality in considering , since otherwise one can write as , and thus the entropy gain introduced by would be plus the entropy gain due to , which has an impulse response where the first sample equals .
The random initial state is independent of .
The disturbance is independent of and belongs to a -dimensional linear subspace, for some finite . This subspace is spanned by the orthonormal columns of a matrix (where stands for the countably infinite size of ), such that . Equivalently, , where the random vector has finite differential entropy and is independent of .
As anticipated in the Introduction, we are interested in characterizing the entropy gain of in the presence (or absence) of the random inputs , denoted by
In the next section we provide geometrical insight into the behaviour of for the situation where there is a random output disturbance and no random initial state. A formal and precise treatment of this scenario is then presented in Section 4. The other scenarios are considered in the subsequent sections.
3 Geometric Interpretation
In this section we provide an intuitive geometric interpretation of how and when the entropy gain defined in (10) arises. This understanding will justify the introduction of the notion of an entropy-balanced random process (in Definition 1 below), which will be shown to play a key role in this and in related problems.
3.1 An Illustrative Example
Suppose for the moment that in Fig. 2 is an FIR filter with impulse response . Notice that this choice yields , and thus has one non-minimum phase zero, at . The associated matrix for is
whose determinant is clearly one (indeed, all its eigenvalues are 1). Hence, as discussed in the introduction, , and thus (and in general) does not introduce an entropy gain by itself. However, an interesting phenomenon becomes evident by looking at the singular-value decomposition (SVD) of , given by , where and are unitary matrices and . In this case, , and thus one of the singular values of is much smaller than the others (although the product of all singular values yields 1, as expected). As will be shown in Section 4, for a stable such uneven distribution of singular values arises only when has non-minimum phase zeros. The effect of this can be visualized by looking at the image of the cube through shown in Fig. 3.
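The uneven distribution of singular values is easy to observe numerically. The sketch below is only illustrative: it assumes the FIR filter $1 + 2z^{-1}$ (one NMP zero at $z=-2$) as the NMP example and $1 + 0.5z^{-1}$ as an MP counterpart, which may differ from the filter used in the example above, and the helper names are hypothetical.

```python
import numpy as np

def lower_toeplitz(impulse_response, n):
    """n x n lower-triangular Toeplitz matrix built from the impulse response."""
    g = np.zeros(n)
    m = min(n, len(impulse_response))
    g[:m] = impulse_response[:m]
    G = np.zeros((n, n))
    for k in range(n):
        G += g[k] * np.eye(n, k=-k)
    return G

def singular_values(impulse_response, n):
    """Singular values of G_n, sorted in descending order."""
    return np.linalg.svd(lower_toeplitz(impulse_response, n), compute_uv=False)

for n in (3, 10, 20):
    s_nmp = singular_values([1.0, 2.0], n)   # assumed NMP example
    s_mp = singular_values([1.0, 0.5], n)    # assumed MP example
    # Smallest singular value of each matrix, and the sum of log singular
    # values of the NMP matrix (which equals log|det G_n| = 0).
    print(n, s_nmp[-1], s_mp[-1], np.sum(np.log(s_nmp)))
```

In this sketch the smallest singular value of the NMP matrix shrinks roughly geometrically with $n$ while the remaining ones stay of order one and the product of all of them stays at one. For the MP filter the smallest singular value remains bounded away from zero.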
If the input were uniformly distributed over this cube (of unit volume), then would distribute uniformly over the unit-volume parallelepiped depicted in Fig. 3, and hence .
Now, if we add to a disturbance , with scalar uniformly distributed over independent of , and with , the effect would be to “thicken” the support over which the resulting random vector is distributed, along the direction pointed by . If is aligned with the direction along which the support of is thinnest (given by , the first row of ), then the resulting support would have its volume significantly increased, which can be associated with a large increase in the differential entropy of with respect to . Indeed, a relatively small variance of and an approximately aligned would still produce a significant entropy gain.
The above example suggests that the entropy gain from to appears as a combination of two factors. The first of these is the uneven way in which the random vector is distributed over . The second factor is the alignment of the disturbance vector with respect to the span of the subset of columns of , associated with smallest singular values of , indexed by the elements in the set . As we shall discuss in the next section, if has non-minimum phase zeros, then, as increases, there will be singular values of going to zero exponentially. Since the product of the singular values of equals for all , it follows that must grow exponentially with , where is the -th diagonal entry of . This implies that expands with along the span of , compensating its shrinkage along the span of , thus keeping for all . Thus, as grows, any small disturbance distributed over the span of , added to , will keep the support of the resulting distribution from shrinking along this subspace. Consequently, the expansion of with along the span of is no longer compensated, yielding an entropy increase proportional to .
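The mechanism described above can be made concrete in the Gaussian case, where all entropies have closed forms. The following is a minimal sketch under stated assumptions: the input is i.i.d. Gaussian with unit variance, the filter is the hypothetical FIR example $1 + 2z^{-1}$, and the disturbance is $\sigma w \phi$ with $w$ standard Gaussian and $\phi$ the unit vector along which the image of the input is thinnest (the left singular vector of the smallest singular value). Since $\det(\mathbf{G}_n\mathbf{G}_n^T)=1$, the matrix determinant lemma gives $h(y^n + z^n) - h(u^n) = \tfrac{1}{2}\log(1 + \sigma^2/s_{\min}^2)$, where $s_{\min}$ is the smallest singular value.

```python
import numpy as np

def lower_toeplitz(impulse_response, n):
    """n x n lower-triangular Toeplitz matrix built from the impulse response."""
    g = np.zeros(n)
    m = min(n, len(impulse_response))
    g[:m] = impulse_response[:m]
    G = np.zeros((n, n))
    for k in range(n):
        G += g[k] * np.eye(n, k=-k)
    return G

def aligned_disturbance_gain(impulse_response, n, sigma):
    """Per-sample entropy gain (1/n) * (h(G u + z) - h(u)) for u ~ N(0, I) and
    z = sigma * w * phi, with w ~ N(0, 1) and phi aligned with the thinnest
    direction of G. Valid for filters with g_0 = 1, so that det(G G^T) = 1."""
    s_min = np.linalg.svd(lower_toeplitz(impulse_response, n), compute_uv=False)[-1]
    return 0.5 * np.log1p((sigma / s_min) ** 2) / n

sigma = 0.1   # assumed (small) disturbance standard deviation
for n in (10, 20, 40):
    print(n,
          aligned_disturbance_gain([1.0, 2.0], n, sigma),   # NMP example
          aligned_disturbance_gain([1.0, 0.5], n, sigma))   # MP example
print("log 2 =", np.log(2.0))
```

Under these assumptions the per-sample gain for the NMP filter increases with $n$ towards $\log 2$, the log-magnitude of its NMP zero, while for the MP filter it vanishes. The lengths are kept moderate because the smallest singular value quickly falls below double-precision resolution.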
The above analysis allows one to anticipate a situation in which no entropy gain would take place even when some singular values of tend to zero as . Since the increase in entropy is made possible by the fact that, as grows, the support of the distribution of shrinks along the span of , no such entropy gain should arise if the support of the distribution of the input expands accordingly along the directions pointed by the rows of .
An example of such situation can be easily constructed as follows: Let in Fig. 2 have non-minimum phase zeros and suppose that is generated as , where is an i.i.d. random process with bounded entropy rate. Since the determinant of equals for all , we have that , for all . On the other hand, . Since for some finite (recall Assumption 3), it is easy to show that , and thus no entropy gain appears.
The preceding discussion reveals that the entropy gain produced by in the situation shown in Fig. 2 depends on the distribution of the input and on the support and distribution of the disturbance. This stands in stark contrast with the well known fact that the increase in differential entropy produced by an invertible linear operator depends only on its Jacobian, and not on the statistics of the input . We have also seen that the distribution of a random process along the different directions within the Euclidean space which contains it plays a key role as well. This motivates the need to specify a class of random processes which distribute more or less evenly over all directions. The following section introduces a rigorous definition of this class and characterizes a large family of processes belonging to it.
3.2 Entropy-Balanced Processes
We begin by formally introducing the notion of an “entropy-balanced” process , being one in which, for every finite , the differential entropy rate of the orthogonal projection of into any subspace of dimension equals the entropy rate of as . This idea is made precise in the following definition.
A random process is said to be entropy balanced if, for every ,
for every sequence of matrices , , with orthonormal rows.
Equivalently, a random process is entropy balanced if every unitary transformation on yields a random sequence such that . This property of the resulting random sequence means that one cannot predict its last samples with arbitrary accuracy by using its previous samples, even if goes to infinity.
We now characterize a large family of entropy-balanced random processes and establish some of their properties. Although intuition may suggest that most random processes (such as i.i.d. or stationary processes) should be entropy balanced, that statement seems rather difficult to prove. In the following, we show that the entropy-balanced condition is met by i.i.d. processes with per-sample probability density function (PDF) being uniform, piece-wise constant or Gaussian. It is also shown that adding an independent random process to an entropy-balanced process yields another entropy-balanced process, and that filtering an entropy-balanced process by a stable and minimum phase filter yields an entropy-balanced process as well.
Let be a Gaussian i.i.d. random process with positive and bounded per-sample variance. Then is entropy balanced.
Let be an i.i.d. process with finite differential entropy rate, in which each is distributed according to a piece-wise constant PDF in which each interval where this PDF is constant has measure greater than , for some bounded-away-from-zero constant . Then is entropy balanced.
Let and be mutually independent random processes. If is entropy balanced, then is also entropy balanced.
The working behind this lemma can be interpreted intuitively by noting that adding to a random process another independent random process can only increase the “spread” of the distribution of the former, which tends to balance the entropy of the resulting process along all dimensions in Euclidean space. In addition, it follows from Lemma 2 that all i.i.d. processes having a per-sample PDF which can be constructed by convolving uniform, piece-wise constant or Gaussian PDFs as many times as required are entropy balanced. It also implies that one can have non-stationary processes which are entropy balanced, since Lemma 2 imposes no requirements for the process .
Our last lemma related to the properties of entropy-balanced processes shows that filtering by a stable and minimum phase LTI filter preserves the entropy balanced condition of its input.
Let be an entropy-balanced process and an LTI stable and minimum-phase filter. Then the output is also an entropy-balanced process.
This result implies that any stable moving-average auto-regressive process constructed from entropy-balanced innovations is also entropy balanced, provided the coefficients of the averaging and regression correspond to a stable MP filter.
We finish this section by pointing out two examples of processes which are non-entropy-balanced, namely, the output of a NMP-filter to an entropy-balanced input and the output of an unstable filter to an entropy-balanced input. The first of these cases plays a central role in the next section.
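The contrast between an entropy-balanced process and the output of an NMP filter can be illustrated for Gaussian vectors. The sketch below evaluates the per-sample difference between the differential entropy of a projection onto $n-1$ orthonormal directions and that of the full vector, for $y = \mathbf{A}u$ with $u$ i.i.d. standard Gaussian. It checks particular projections only (the definition requires the limit to vanish for every sequence of projections), and it assumes the FIR filter $1 + 2z^{-1}$ as the NMP example. All helper names are hypothetical.

```python
import numpy as np

LOG_2PIE_HALF = 0.5 * np.log(2.0 * np.pi * np.e)

def lower_toeplitz(impulse_response, n):
    """n x n lower-triangular Toeplitz matrix built from the impulse response."""
    g = np.zeros(n)
    m = min(n, len(impulse_response))
    g[:m] = impulse_response[:m]
    G = np.zeros((n, n))
    for k in range(n):
        G += g[k] * np.eye(n, k=-k)
    return G

def balance_gap(A, Phi):
    """(1/n) * (h(Phi y) - h(y)) for y = A u, u ~ N(0, I), where A is square
    and invertible and Phi has orthonormal rows."""
    n, m = A.shape[0], Phi.shape[0]
    sv = np.linalg.svd(Phi @ A, compute_uv=False)
    h_proj = m * LOG_2PIE_HALF + np.sum(np.log(sv))        # Gaussian entropy of Phi y
    h_full = n * LOG_2PIE_HALF + np.linalg.slogdet(A)[1]   # Gaussian entropy of y
    return (h_proj - h_full) / n

def random_orthonormal_rows(n, nu, rng):
    """(n - nu) x n matrix with orthonormal rows, from a QR factorization."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n - nu)))
    return Q.T

def drop_thinnest_direction(A):
    """Projection onto all left singular directions except the smallest one."""
    U, _, _ = np.linalg.svd(A)
    return U[:, :-1].T

rng = np.random.default_rng(0)
n = 30
gap_iid = balance_gap(np.eye(n), random_orthonormal_rows(n, 1, rng))
G = lower_toeplitz([1.0, 2.0], n)   # assumed NMP example
gap_nmp = balance_gap(G, drop_thinnest_direction(G))
print(gap_iid, gap_nmp)
```

For the i.i.d. Gaussian input the gap equals $-\tfrac{1}{2n}\log(2\pi e)$, which vanishes as $n$ grows. For the NMP-filtered vector, discarding the thinnest direction leaves a gap that approaches $\log 2$ in this example, so the limit is not zero.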
4 Entropy Gain due to External Disturbances
In this section we formalize the ideas which were qualitatively outlined in the previous section. Specifically, for the system shown in Fig. 2 we will characterize the entropy gain defined in (10) for the case in which the initial state is zero (or deterministic) and there exists an output random disturbance (of possibly infinite length) which satisfies Assumption 3. The following lemmas will be instrumental for that purpose.
Let be a causal, finite-order, stable and minimum-phase rational transfer function with impulse response such that . Then and .
(The proof of this lemma can be found in the Appendix.)
The previous lemma precisely formulates the geometric idea outlined in Section 3. To see this, notice that no entropy gain is obtained if the output disturbance vector is orthogonal to the space spanned by the first columns of . If this were the case, then the disturbance would not be able to fill the subspace along which is shrinking exponentially. Indeed, if for all , then , and the latter sum cancels out the one on the RHS of (12), while since is entropy balanced. On the contrary (and loosely speaking), if the projection of the support of onto the subspace spanned by the first rows of is of dimension , then remains bounded for all , and the entropy limit of the sum on the RHS of (12) yields the largest possible entropy gain. Notice that (because ), and thus this entropy gain stems from the uncompensated expansion of along the space spanned by the rows of .
Lemma 5 also yields the following corollary, which states that only a filter with zeros outside the unit circle (i.e., an NMP transfer function) can introduce entropy gain.
Corollary 1 (Minimum Phase Filters do not Introduce Entropy Gain).
4.1 Input Disturbances Do Not Produce Entropy Gain
In this section we show that random disturbances satisfying Assumption 3, when added to the input (i.e., before ), do not introduce entropy gain. This result can be obtained from Lemma 5, as stated in the following theorem:
Theorem 1 (Input Disturbances do not Introduce Entropy Gain).
Let satisfy Assumption 1. Suppose that is entropy balanced and consider the output
where with being a random vector satisfying , and where has orthonormal columns. Then,
In this case, the effect of the input disturbance in the output is the forced response of to it. This response can be regarded as an output disturbance . Thus, the argument of the differential entropy on the RHS of (12) is
The proof is completed by substituting this result into the RHS of (12) and noticing that
An alternative proof for this result can be given based upon the properties of an entropy-balanced sequence, as follows. Since , we have that . Let and be matrices with orthonormal rows which satisfy and such that is a unitary matrix. Then
where we have applied the chain rule of differential entropy. But
which is upper bounded for all because and , the latter due to being entropy balanced. On the other hand, since is independent of , it follows that , for all . Thus , where the last equality stems from the fact that is entropy balanced.
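The statement of Theorem 1 can be illustrated with a Gaussian computation. The sketch below assumes an i.i.d. standard Gaussian input $u^n$, a one-dimensional input disturbance $b^n = \sigma w \psi$ with $w$ standard Gaussian and $\psi$ an arbitrary unit vector, and the hypothetical NMP filter $1 + 2z^{-1}$. Since the filter acts on $u^n + b^n$ as an invertible linear map, $h(\mathbf{G}_n(u^n + b^n)) - h(u^n) = \log|\det \mathbf{G}_n| + \tfrac{1}{2}\log\det(\mathbf{I} + \sigma^2\psi\psi^T)$.

```python
import numpy as np

def lower_toeplitz(impulse_response, n):
    """n x n lower-triangular Toeplitz matrix built from the impulse response."""
    g = np.zeros(n)
    m = min(n, len(impulse_response))
    g[:m] = impulse_response[:m]
    G = np.zeros((n, n))
    for k in range(n):
        G += g[k] * np.eye(n, k=-k)
    return G

def input_disturbance_gain(impulse_response, n, sigma, psi):
    """(1/n) * (h(G (u + b)) - h(u)) for u ~ N(0, I) and b = sigma * w * psi,
    with w ~ N(0, 1) independent of u and psi normalized to unit length."""
    G = lower_toeplitz(impulse_response, n)
    psi = np.asarray(psi, dtype=float)
    psi = psi / np.linalg.norm(psi)
    input_cov = np.eye(n) + sigma ** 2 * np.outer(psi, psi)   # covariance of u + b
    logdet_G = np.linalg.slogdet(G)[1]
    logdet_cov = np.linalg.slogdet(input_cov)[1]
    return (logdet_G + 0.5 * logdet_cov) / n

rng = np.random.default_rng(1)
sigma = 10.0   # assumed (deliberately large) disturbance standard deviation
for n in (10, 20, 40):
    psi = rng.standard_normal(n)
    print(n, input_disturbance_gain([1.0, 2.0], n, sigma, psi))
```

Whatever the direction $\psi$, the total gain equals $\tfrac{1}{2}\log(1+\sigma^2)$, a constant, so the per-sample gain decays as $1/n$ even though the filter is NMP.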
4.2 The Entropy Gain Introduced by Output Disturbances when has NMP Zeros
We show here that the entropy gain of a transfer function with zeros outside the unit circle is at most the sum of the logarithm of the magnitude of these zeros. To be more precise, the following assumption is required.
The filter satisfies Assumption 1 and its transfer function has poles and zeros, of which are NMP-zeros. Let be the number of distinct NMP zeros, given by , i.e., such that , with being the multiplicity of the -th distinct zero. We denote by , where , the distinct zero of associated with the -th non-distinct zero of , i.e.,
As can be anticipated from the previous results in this section, we will need to characterize the asymptotic behaviour of the singular values of . This is accomplished in the following lemma, which relates these singular values to the zeros of . This result is a generalization of the unnumbered lemma in the proof of [15, Theorem 1] (restated in the appendix as Lemma 8), which holds for FIR transfer functions, to the case of infinite-impulse response (IIR) transfer functions (i.e., transfer functions having poles).
For a transfer function satisfying Assumption 4, it holds that
where the elements in the sequence are positive and increase or decrease at most polynomially with .
(The proof of this lemma can be found in the appendix.)
We can now state the first main result of this section.
See Appendix. ∎
The second main theorem of this section is the following:
See Appendix. ∎
5 Entropy Gain due to a Random Initial State
Here we analyze the case in which there exists a random initial state independent of the input , and zero (or deterministic) output disturbance.
The effect of a random initial state appears in the output as the natural response of to it, namely the sequence . Thus, can be written in vector form as
This reveals that the effect of a random initial state can be treated as a random output disturbance, which allows us to apply the results from the previous sections.
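The decomposition of the output into a forced response plus a natural response follows from linearity and can be verified with a short simulation. The state-space matrices below are arbitrary assumptions chosen for illustration (a stable second-order system), and the function name is hypothetical.

```python
import numpy as np

def simulate(A, B, C, D, u, x0):
    """Simulate x(k+1) = A x(k) + B u(k), y(k) = C x(k) + D u(k)
    for a single-input single-output system, starting from state x0."""
    x = np.array(x0, dtype=float)
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = C @ x + D * uk
        x = A @ x + B * uk
    return y

# Assumed example system (eigenvalues 0.5 and -0.3, hence stable).
A = np.array([[0.5, 1.0], [0.0, -0.3]])
B = np.array([1.0, 0.5])
C = np.array([1.0, -2.0])
D = 1.0

rng = np.random.default_rng(2)
n = 25
u = rng.standard_normal(n)    # random input sequence
x0 = rng.standard_normal(2)   # random initial state

y_total = simulate(A, B, C, D, u, x0)
y_forced = simulate(A, B, C, D, u, np.zeros(2))    # response to u alone
y_natural = simulate(A, B, C, D, np.zeros(n), x0)  # response to x0 alone

# The natural response is C A^k x0, k = 0, ..., n-1.
closed_form = np.array([C @ np.linalg.matrix_power(A, k) @ x0 for k in range(n)])
print(np.allclose(y_total, y_forced + y_natural),
      np.allclose(y_natural, closed_form))
```

The natural response depends only on the initial state, so from the point of view of the input-output map it behaves as an additive output disturbance confined to a subspace whose dimension is at most the order of the system.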
Recall from Assumption 4 that is a stable and biproper rational transfer function with NMP zeros. As such, it can be factored as
where is a biproper filter containing only all the poles of , and is a FIR biproper filter, containing all the zeros of .
We have already established (recall Theorem 1) that the entropy gain introduced by the minimum phase system is zero. It then follows that the entropy gain can be introduced only by the NMP-zeros of and an appropriate output disturbance . Notice that, in this case, the input process to (i.e., the output sequence of due to a random input ) is independent of (since we have placed the natural response after the filters and , whose initial state is now zero). This condition allows us to directly use Lemma 5 in order to analyze the entropy gain that experiences after being filtered by , which coincides with . This is achieved by the next theorem.
Consider a stable -th order biproper filter having NMP-zeros, and with a random initial state , such that . Then, the entropy gain due to the existence of a random initial state is