
Spectral Simplicity of Apparent Complexity, Part II:
Exact Complexities and Complexity Spectra

Paul M. Riechers (pmriechers@ucdavis.edu) and James P. Crutchfield (chaos@ucdavis.edu)
Complexity Sciences Center
Department of Physics
University of California at Davis
One Shields Avenue, Davis, CA 95616
June 30, 2019
Abstract

The meromorphic functional calculus developed in Part I overcomes the nondiagonalizability of linear operators that arises often in the temporal evolution of complex systems and is generic to the metadynamics of predicting their behavior. Using the resulting spectral decomposition, we derive closed-form expressions for correlation functions, finite-length Shannon entropy-rate approximates, asymptotic entropy rate, excess entropy, transient information, transient and asymptotic state uncertainty, and synchronization information of stochastic processes generated by finite-state hidden Markov models. This introduces analytical tractability to investigating information processing in discrete-event stochastic processes, symbolic dynamics, and chaotic dynamical systems. Comparisons reveal mathematical similarities between complexity measures originally thought to capture distinct informational and computational properties. We also introduce a new kind of spectral analysis via coronal spectrograms and the frequency-dependent spectra of past-future mutual information. We analyze a number of examples to illustrate the methods, emphasizing processes with multivariate dependencies beyond pairwise correlation. An appendix presents spectral decomposition calculations for one example in full detail.

Keywords: hidden Markov model, entropy rate, excess entropy, predictable information, statistical complexity, projection operator, complex analysis, resolvent, Drazin inverse
PACS numbers: 02.50.-r, 89.70.+c, 05.45.Tp, 02.50.Ey, 02.50.Ga
Preprint: Santa Fe Institute Working Paper 2017-06-XXX; arXiv:1706.XXXXX [nlin.cd]

The prequel laid out a new toolset that allows one to analyze in detail how complex systems store and process information. Here, we use the tools to calculate in closed form almost all complexity measures for processes generated by finite-state hidden Markov models. Helpfully, the tools also give a detailed view of how subprocess components contribute to a process’ informational architecture. As an application, we show that the widely-used methods based on Fourier analysis and power spectra fail to capture the structure of even very simple structured processes. We introduce the spectrum of past-future mutual information and show that it allows one to detect such structure.

I Introduction

Tracking the evolution of a complex system, an observer often finds that the resulting time series appears quite complicated, exhibiting temporal patterns, stochasticity, and behavior that require significant resources to predict. Such complexity arises from many sources. Apparent complexity, even in simple systems, can be induced by practical measurement and analysis issues, such as small sample size, an inadequate collection of probes, noisy or systematically distorted measurements, coarse-graining, out-of-class modeling, nonconvergent inference algorithms, and so on. These effects can either increase or decrease apparent complexity, as they add or discard information, hiding the system of interest from an observer to one degree or another. Even assuming perfect observation, complexity can be inherent in nonlinear stochastic dynamical processes—deterministic chaos, superexponential transients, high state-space dimension, nonergodicity, nonstationarity, and the like. Moreover, even in ideal settings, the smallest sufficient set of a system's maximally predictive features is generically uncountable, making approximations unavoidable in principle [1]. With nothing else said, these facts would seem to thwart physical science's most basic goal—prediction—and, without that, to preclude understanding how nature works. How can we make progress?

The prequel, Part I, argued that this is too pessimistic a view. It introduced constructive results that address hidden structure and the challenges of predicting complex systems. Part I showed that questions regarding correlation, predictability, and prediction each require their own analytical structure, which becomes tractable once one identifies a system's hidden linear dynamic. It distinguished two genres of quantitative question: (i) cascading, in which the influence of an initial preparation cascades through state-space as time evolves, affecting the final measurement, and (ii) accumulating, in which statistics are gathered during such cascades. Part I identified the linear algebraic structure underlying each kind.

Part I explained that the hidden linear dynamic in such systems induces a nondiagonalizable metadynamics, even if the dynamics are diagonalizable in the underlying state-space. The assumptions of normal and diagonalizable dynamics, so familiar in mathematical physics, simply fail in this setting. Thus, nondiagonalizable dynamics present an analytical roadblock. Part I reviewed a calculus for functions of nondiagonalizable operators—the recently developed meromorphic functional calculus of Ref. [2]—that directly addresses nondiagonalizability, giving constructive calculational methods and algorithms.

Along the way, Part I reviewed relevant background in stochastic processes and their complexities and the hidden Markov models (HMMs) that generate them. It delineated several classes of HMMs—Markov chains, unifilar HMMs, and nonunifilar HMMs. It also reviewed their mixed-state presentations (MSPs)—HMM generators of a process that track distributions induced by observation. Related constructions included the HMM and ε-machine synchronizing MSPs, generator mixed-functional presentations, and cryptic-operator presentations. MSPs are key to calculating complexity measures within an information-theoretic framing. Part I then showed how each complexity measure reduces to a linear algebra of an appropriate HMM adapted to the cascading- or accumulating-question genre. It summarized the meromorphic functional calculus and several of its mathematical implications in relation to projection operators. Part I also highlighted a spectral weighted directed-graph theory that can give useful shortcuts for determining a process' spectral decomposition. Part II here uses Part I's notation and assumes familiarity with its results.

With Part I's toolset laid out, Part II now derives the promised closed-form complexities of a process. Section §II investigates the range of possible behaviors for correlation and myopic uncertainty via convergence to asymptotic correlation and asymptotic entropy rates. Section §III then considers measures related to quantities accumulated during the transient relaxation to synchronization. Section §IV introduces closed-form expressions for a wide range of complexity measures in terms of the spectral decomposition of a process' dynamic. It also introduces complexity spectra and highlights common simplifications for special cases, such as almost diagonalizable dynamics. Section §V gives a new kind of signal analysis in terms of coronal spectrograms. A suite of examples in §VI and §VII grounds the theoretical developments and is complemented by an in-depth pedagogical example worked out in App. §A. Finally, we conclude with a brief retrospective of Parts I and II and an eye towards future applications.

II Correlation and Myopic Uncertainty

Using Part I's methods, our first step is to solve for the correlation function:

$\gamma(L) \equiv \overline{\langle X_t \, X_{t+L} \rangle_t}$   (1)

and the myopic uncertainty or finite-history Shannon entropy rate:

$h_\mu(L) \equiv H[X_{L-1} \mid X_0, \ldots, X_{L-2}] \; .$   (2)

A comparison of the two is informative. We then determine the asymptotic correlation and asymptotic myopic uncertainty from the resulting finite-$L$ expressions.

II.1 Nonasymptotics

A central result in Part I was the spectral decomposition of powers of a linear operator $A$, even if that operator is nondiagonalizable. Recall that, for any nonnegative integer $L$:

$A^L = \sum_{\lambda \in \Lambda_A \setminus \{0\}} \, \sum_{m=0}^{\nu_\lambda - 1} \binom{L}{m} \lambda^{L-m}\, A_\lambda (A - \lambda I)^m \;+\; [0 \in \Lambda_A]\, [L < \nu_0]\, A_0 A^L \; ,$   (3)

where $\Lambda_A$ is $A$'s set of eigenvalues, $A_\lambda$ is the projection operator associated with eigenvalue $\lambda$, $\nu_\lambda$ is the index of $\lambda$, and $\binom{L}{m}$ is the generalized binomial coefficient:

$\binom{L}{m} = \frac{1}{m!} \prod_{n=0}^{m-1} (L - n) \; ,$   (4)

and $[0 \in \Lambda_A]$ is the Iverson bracket. The latter takes on value $1$ if zero is an eigenvalue of $A$ and $0$ if not.
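As a concrete check of Eq. (3), the following minimal Python sketch verifies the decomposition for a single $2 \times 2$ Jordan block, the simplest nondiagonalizable case, where the lone eigenprojector is the identity. The matrix and eigenvalue here are illustrative choices, not taken from the text.

```python
import numpy as np

# Minimal check of Eq. (3) for a nondiagonalizable operator: a single
# 2x2 Jordan block with eigenvalue lam of index nu = 2, whose lone
# eigenprojector A_lam is the identity.
lam = 0.5
A = np.array([[lam, 1.0],
              [0.0, lam]])
N = A - lam * np.eye(2)          # nilpotent part: N @ N = 0

for L in range(1, 8):
    # Eq. (3) with one eigenvalue and nu_lam = 2:
    #   A^L = binom(L,0) lam^L I + binom(L,1) lam^(L-1) N
    spectral = lam**L * np.eye(2) + L * lam**(L - 1) * N
    assert np.allclose(spectral, np.linalg.matrix_power(A, L))
print("Eq. (3) reproduces A^L for the nondiagonalizable example.")
```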

In light of this, the autocorrelation function is simply a superposition of weighted eigen-contributions. Part I showed that Eq. (1) has, for $L \neq 0$, the operator expression:

$\gamma(L) = \langle \pi_A |\, T^{|L|-1} \, | A \mathbf{1} \rangle \; ,$

where $T = \sum_{x \in \mathcal{A}} T^{(x)}$ is the transition dynamic, $\mathcal{A}$ is the output symbol alphabet, and we defined the row vector:

$\langle \pi_A | = \sum_{x \in \mathcal{A}} \overline{x}\, \langle \pi |\, T^{(x)}$

and the column vector:

$| A \mathbf{1} \rangle = \sum_{x \in \mathcal{A}} x \, T^{(x)} | \mathbf{1} \rangle \; .$

Substituting Part I's spectral decomposition of matrix powers, Eq. (3) above, directly leads to the spectral decomposition of $\gamma(L)$ for nonzero integer $L$:

$\gamma(L) = \sum_{\lambda \in \Lambda_T \setminus \{0\}} \, \sum_{m=0}^{\nu_\lambda - 1} \binom{|L|-1}{m} \lambda^{|L|-1-m} \, \langle \pi_A | T_\lambda (T - \lambda I)^m | A \mathbf{1} \rangle$   (5)
$\qquad +\; [0 \in \Lambda_T]\, [\,|L| \leq \nu_0\,] \, \langle \pi_A | T_0 \, T^{|L|-1} | A \mathbf{1} \rangle \; .$   (6)

We denote the persistent first term of Eq. (5) as $\gamma_{\text{per}}(L)$, and note that it can be expressed:

$\gamma_{\text{per}}(L) = \langle \pi_A |\, T^{\mathcal{D}} T^{|L|} \, | A \mathbf{1} \rangle \; ,$

where $T^{\mathcal{D}}$ is $T$'s Drazin inverse. We denote the ephemeral second term as $\gamma_{\text{eph}}(L)$, which can be written as:

$\gamma_{\text{eph}}(L) = \langle \pi_A |\, T_0 \, T^{|L|-1} \, | A \mathbf{1} \rangle \; ,$

where $T_0$ is the eigenprojector associated with the eigenvalue of zero; $T_0 = 0$ if $0 \notin \Lambda_T$.

From Eq. (5), it is now apparent that the index $\nu_0$ of $T$'s zero eigenvalue gives a finite-horizon contribution (for $|L| \leq \nu_0$) to the autocorrelation function. Beyond the index $\nu_0$ of $T$, the only $L$-dependence comes via a weighted sum of terms of the form $\binom{|L|-1}{m} \lambda^{|L|-1-m}$—polynomials in $L$ times decaying exponentials. The set of amplitudes $\{ \langle \pi_A | T_\lambda (T - \lambda I)^m | A \mathbf{1} \rangle \}$ simply weights these contributions. In the familiar diagonalizable case, the behavior of the autocorrelation function is simply a sum of decaying exponentials $\lambda^{|L|-1}$.
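The following sketch evaluates the operator expression for $\gamma(L)$ directly, using a small illustrative HMM (the Golden Mean Process, a standard two-state example; its use here is our choice, not a prescription of the text).

```python
import numpy as np

# Sketch: gamma(L) = <pi_A| T^(|L|-1) |A 1> for a small binary HMM.
# Labeled transition matrices for the illustrative Golden Mean Process:
T0 = np.array([[0.0, 0.5],       # emit symbol 0
               [0.0, 0.0]])
T1 = np.array([[0.5, 0.0],       # emit symbol 1
               [1.0, 0.0]])
T = T0 + T1                      # net transition dynamic

# Stationary distribution pi: left eigenvector of T at eigenvalue 1.
evals, levecs = np.linalg.eig(T.T)
pi = np.real(levecs[:, np.argmax(np.isclose(evals, 1.0))])
pi /= pi.sum()

one = np.ones(2)
pi_A = 0 * (pi @ T0) + 1 * (pi @ T1)      # row vector <pi_A|
A_one = 0 * (T0 @ one) + 1 * (T1 @ one)   # column vector |A 1>

def gamma(L):
    """Autocorrelation at nonzero integer lag L."""
    return pi_A @ np.linalg.matrix_power(T, abs(L) - 1) @ A_one

print([round(gamma(L), 4) for L in range(1, 6)])
```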

Similarly, in light of Part I's expression for the myopic entropy rate in terms of the MSP—starting in the initial unsynchronized mixed state $\delta_\pi$ and evolving the state of uncertainty via the observation-induced MSP transition dynamic $W$:

$h_\mu(L) = \langle \delta_\pi |\, W^{L-1} \, | H(W^{(\mathcal{A})}) \rangle$   (7)

—and its spectral decomposition of $W^{L-1}$, we find the most general spectral decomposition of the myopic entropy rates to be:

$h_\mu(L) = \sum_{\lambda \in \Lambda_W \setminus \{0\}} \, \sum_{m=0}^{\nu_\lambda - 1} \binom{L-1}{m} \lambda^{L-1-m}\, \langle \delta_\pi | W_\lambda (W - \lambda I)^m | H(W^{(\mathcal{A})}) \rangle$   (8)
$\qquad +\; [0 \in \Lambda_W]\, [L \leq \nu_0]\, \langle \delta_\pi | W_0 \, W^{L-1} | H(W^{(\mathcal{A})}) \rangle \; .$   (9)

We denote the persistent first term of Eq. (8) as $h_{\text{per}}(L)$, and note that it can be expressed directly as:

$h_{\text{per}}(L) = \langle \delta_\pi |\, W^{\mathcal{D}} W^{L} \, | H(W^{(\mathcal{A})}) \rangle \; ,$

where $W^{\mathcal{D}}$ is the Drazin inverse of the mixed-state-to-mixed-state net transition dynamic $W$. We denote the ephemeral second term as $h_{\text{eph}}(L)$, which can be written as:

$h_{\text{eph}}(L) = \langle \delta_\pi |\, W_0 \, W^{L-1} \, | H(W^{(\mathcal{A})}) \rangle \; .$

From Eq. (8), we see that the index $\nu_0$ of $W$'s zero eigenvalue gives a finite-horizon contribution (for $L \leq \nu_0$) to the myopic entropy rate. Beyond the index $\nu_0$ of $W$, the only $L$-dependence comes via a weighted sum of terms of the form $\binom{L-1}{m} \lambda^{L-1-m}$—polynomials in $L$ times decaying exponentials. The set of amplitudes $\{ \langle \delta_\pi | W_\lambda (W - \lambda I)^m | H(W^{(\mathcal{A})}) \rangle \}$ weights these contributions.

For stationary processes we anticipate that $\langle \delta_\pi | W_\zeta = \vec{0}$ for all $\zeta \in \Lambda_W \setminus \{1\}$ with $|\zeta| = 1$, and thus that such modes contribute nothing. Hence, we can save ourselves superfluous calculation by excluding the nonunity eigenvalues on the unit circle when calculating the myopic entropy rate of stationary processes. In the diagonalizable case, again, its behavior is simply a sum of decaying exponentials $\lambda^{L-1}$.
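To ground Eq. (7), here is a sketch that evaluates $h_\mu(L)$ from a hand-built MSP. The three mixed states (the unsynchronized start state $\delta_\pi$ plus two synchronized states) and their transition probabilities are those of the illustrative Golden Mean Process used above; building the MSP by hand is a simplification for this small example.

```python
import numpy as np

# Sketch of Eq. (7): h_mu(L) = <delta_pi| W^(L-1) |H(W^A)>.
# MSP states (indices): 0 = initial mixed state pi, 1 = A, 2 = B.
W = np.array([[0.0, 2/3, 1/3],   # from pi: see 1 w.p. 2/3, see 0 w.p. 1/3
              [0.0, 0.5, 0.5],   # from A: fair choice of next symbol
              [0.0, 1.0, 0.0]])  # from B: emit 1 with certainty

# |H(W^A)>: uncertainty (in bits) of the next observation from each state.
H_ket = np.array([-(2/3) * np.log2(2/3) - (1/3) * np.log2(1/3),
                  1.0,
                  0.0])
delta_pi = np.array([1.0, 0.0, 0.0])

for L in range(1, 6):
    h_L = delta_pi @ np.linalg.matrix_power(W, L - 1) @ H_ket
    print(f"h_mu({L}) = {h_L:.4f} bits")
# h_mu(L) settles to the entropy rate, 2/3 bit per symbol, after one step.
```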

In practice, the ephemeral contribution to the autocorrelation often vanishes, whereas the ephemeral contribution to the myopic entropy rate is often nonzero. This practical difference stems from the difference between the typical graph structures of the respective dynamics $T$ and $W$. For a stationary process' generic transition dynamic $T$, zero eigenvalues (and so a nonzero index $\nu_0$) typically arise from hidden symmetries in the dynamic. In contrast, the MSP of a generic transition dynamic often has tree-like ephemeral structures that are primarily responsible for the zero eigenvalues (and nonzero $\nu_0$). Nevertheless, despite their practical typical differences, the same mathematical structures appear and contribute to the most general behavior of each of these cascading quantities.

The breadth of qualitative behaviors shared by autocorrelation and myopic entropy rate is common to the solution of all questions that can be reformulated as a cascading hidden linear dynamic; the myopic state uncertainty is just one of many other examples. As we have already seen, however, different measures of a process reflect signatures of different linear operators.

Next, we explore similarities in the qualitative behavior of asymptotics and discuss the implications for correlation and entropy rate.

II.2 Asymptotic correlation

The spectral decomposition reveals that the autocorrelation converges to a constant value as $|L| \to \infty$, unless $T$ has eigenvalues on the unit circle besides unity itself. This holds if the index $\nu_0$ is finite, which it is for all processes generated by finite-state HMMs and also for many infinite-state HMMs. If unity is the sole eigenvalue with magnitude one, then all other eigenvalues have magnitude less than unity and their contributions vanish for large enough $|L|$. Explicitly, if $\{ \zeta \in \Lambda_T : |\zeta| = 1 \} = \{1\}$, then:

$\lim_{|L| \to \infty} \gamma(L) = \langle \pi_A |\, T_1 \, | A \mathbf{1} \rangle = \bigl| \langle X \rangle \bigr|^2 \; .$

This used the facts that $T_1 = | \mathbf{1} \rangle \langle \pi |$ for an ergodic process and that $\langle \pi_A | \mathbf{1} \rangle \, \langle \pi | A \mathbf{1} \rangle = \overline{\langle X \rangle} \langle X \rangle$.

If other eigenvalues in $\Lambda_T$ besides unity lie on the unit circle, then the autocorrelation approaches a periodic sequence as $|L|$ grows large.

II.3 Asymptotic entropy rate

By the Perron–Frobenius theorem, $\nu_\zeta = 1$ for all eigenvalues $\zeta$ of $W$ on the unit circle. Hence, in the limit $L \to \infty$, we obtain the asymptotic entropy rate for any stationary process:

$h_\mu = \lim_{L \to \infty} h_\mu(L)$   (10)
$\;\; = \lim_{L \to \infty} \sum_{\zeta \in \Lambda_W : |\zeta| = 1} \zeta^{L-1}\, \langle \delta_\pi | W_\zeta | H(W^{(\mathcal{A})}) \rangle$   (11)
$\;\; = \langle \delta_\pi | W_1 | H(W^{(\mathcal{A})}) \rangle \; ,$   (12)

since, for stationary processes, $\langle \delta_\pi | W_\zeta = \vec{0}$ for all $\zeta \neq 1$ with $|\zeta| = 1$. For nonstationary processes, the limit may not exist, but $h_\mu(L)$ may still be found in a suitable sense as a function of time. If the process has only one stationary distribution over mixed states, then $W_1 = | \mathbf{1} \rangle \langle \pi_W |$ and we have:

$h_\mu = \langle \pi_W | H(W^{(\mathcal{A})}) \rangle \; ,$   (13)

where $\langle \pi_W |$ is the stationary distribution over $W$'s states, found either from $\lim_{L \to \infty} \langle \delta_\pi | W^L$ or from solving $\langle \pi_W | W = \langle \pi_W |$.
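A sketch of Eq. (13), reusing the illustrative Golden Mean MSP defined above (an assumption of these sketches, not the text's example):

```python
import numpy as np

# Sketch of Eq. (13): h_mu = <pi_W| H(W^A)>.
W = np.array([[0.0, 2/3, 1/3],
              [0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0]])
H_ket = np.array([-(2/3) * np.log2(2/3) - (1/3) * np.log2(1/3), 1.0, 0.0])

# Stationary distribution over mixed states: left eigenvector at eigenvalue 1.
evals, levecs = np.linalg.eig(W.T)
pi_W = np.real(levecs[:, np.argmax(np.isclose(evals, 1.0))])
pi_W /= pi_W.sum()

print(f"h_mu = {pi_W @ H_ket:.4f} bits per symbol")   # -> 0.6667
```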

A simple but interesting example of when ergodicity does not hold is the multi-armed bandit problem [3, 4]. In this, a realization is drawn from an ensemble of differently biased coins or, for that matter, from any other collection of IID processes. More generally, there can be many distinct memoryful stationary components from which a given realization is sampled, according to some probability distribution. With many attracting components we have the stationary mixed-state eigenprojector $W_1 = \sum_{k=1}^{a_1} | \mathbf{1}_k \rangle \langle \pi_k |$, with $\langle \pi_k |$ the stationary distribution over component $k$, where the algebraic multiplicity $a_1$ of the '1' eigenvalue is the number of attracting components. The entropy rate becomes:

$h_\mu = \langle \delta_\pi | W_1 | H(W^{(\mathcal{A})}) \rangle$   (14)
$\;\; = \sum_{k=1}^{a_1} \langle \delta_\pi | \mathbf{1}_k \rangle \, \langle \pi_k | H(W^{(\mathcal{A})}) \rangle \; .$   (15)

Above, $\langle \delta_\pi | \mathbf{1}_k \rangle$ is the probability of ending up in component $k$, while $\langle \pi_k | H(W^{(\mathcal{A})}) \rangle$ is component $k$'s entropy rate. Thus, if nonergodic, the process' entropy rate may not be the same as the entropy rate of any particular realization. Rather, the process' entropy rate is a weighted average of those for the ensemble of sequences constituting the process.

For a unifilar presentation $M$, the topology, transition probabilities, and stationary distribution over the recurrent states are the same for both $M$ and its synchronizing MSP. Hence, for unifilar $M$ we have:

$h_\mu = \langle \pi | H(T^{(\mathcal{A})}) \rangle \; .$   (16)

One can easily show that Eq. (16) is equivalent to the well-known closed-form expression for $h_\mu$ of unifilar presentations:

$h_\mu = - \sum_{\sigma \in \mathcal{S}} \pi_\sigma \sum_{x \in \mathcal{A}} \sum_{\sigma' \in \mathcal{S}} T^{(x)}_{\sigma \sigma'} \log_2 T^{(x)}_{\sigma \sigma'} \; .$   (17)

For nonunifilar presentations, however, we must use the more general result of Eq. (13). This is similar to the calculation in Eq. (17), but must be performed over the recurrent states of a mixed-state presentation, which may be countable or uncountable.
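For unifilar presentations, Eq. (17) is a one-liner over the labeled transition matrices. A sketch, again with the illustrative Golden Mean presentation:

```python
import numpy as np

# Sketch of Eq. (17): h_mu = -sum_sigma pi_sigma sum_{x, sigma'}
#   T^(x)[sigma, sigma'] log2 T^(x)[sigma, sigma'].
Ts = {"0": np.array([[0.0, 0.5], [0.0, 0.0]]),
      "1": np.array([[0.5, 0.0], [1.0, 0.0]])}
T = sum(Ts.values())

evals, levecs = np.linalg.eig(T.T)
pi = np.real(levecs[:, np.argmax(np.isclose(evals, 1.0))])
pi /= pi.sum()

h_mu = -sum(pi[s] * p * np.log2(p)
            for Tx in Ts.values()
            for (s, sp), p in np.ndenumerate(Tx) if p > 0)
print(f"h_mu = {h_mu:.4f} bits per symbol")   # -> 0.6667
```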

III Accumulated Transients for Diagonalizable Dynamics

In the diagonalizable case, autocorrelation, myopic entropy rate, and myopic state uncertainty reduce to a sum of decaying exponentials. Correspondingly, we can find the power spectrum, excess entropy, and synchronization information respectively via geometric progressions.

For example, if $W$ is diagonalizable and has no zero eigenvalue, then the myopic entropy rate reduces to:

$h_\mu(L) = \langle \delta_\pi | W_1 | H(W^{(\mathcal{A})}) \rangle \;+ \sum_{\lambda \in \Lambda_W \setminus \{1\}} \lambda^{L-1} \, \langle \delta_\pi | W_\lambda | H(W^{(\mathcal{A})}) \rangle \; ,$

where $\langle \delta_\pi | W_1 | H(W^{(\mathcal{A})}) \rangle$ is identifiable as the entropy rate $h_\mu$.

It then follows that the excess entropy, which is the mutual information between the past and the future, is:

$\mathbf{E} = \sum_{L=1}^{\infty} \bigl[ h_\mu(L) - h_\mu \bigr] = \sum_{\lambda \in \Lambda_W \setminus \{1\}} \frac{\langle \delta_\pi | W_\lambda | H(W^{(\mathcal{A})}) \rangle}{1 - \lambda} \; .$   (18)

Note that larger eigenvalues (closer to unity in magnitude) drive the denominator closer to zero and, thus, increase $\mathbf{E}$. Hence, larger eigenvalues—controlling modes of the mixed-state transition matrix that decay slowly—have the potential to contribute most to excess entropy. Small eigenvalues—quickly decaying modes—do not contribute. Putting aside the language of eigenvalues, one can paraphrase: slowly decaying transient behavior (of the distribution of distributions over process states) has the most potential to make a process appear complex.
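A sketch of Eq. (18) for the illustrative Golden Mean MSP, building the eigenprojectors $W_\lambda$ from matched left and right eigenvectors (valid here because this $W$ is diagonalizable):

```python
import numpy as np

# Sketch of Eq. (18): E = sum over lam != 1 of <delta_pi|W_lam|H(W^A)> / (1 - lam).
W = np.array([[0.0, 2/3, 1/3],
              [0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0]])
H_ket = np.array([-(2/3) * np.log2(2/3) - (1/3) * np.log2(1/3), 1.0, 0.0])
delta_pi = np.array([1.0, 0.0, 0.0])

evals, R = np.linalg.eig(W)      # columns of R: right eigenvectors
Lm = np.linalg.inv(R)            # rows of Lm: matching left eigenvectors

E = 0.0
for k, lam in enumerate(evals):
    if np.isclose(lam, 1.0):
        continue
    W_lam = np.outer(R[:, k], Lm[k, :])       # eigenprojector for lam
    E += (delta_pi @ W_lam @ H_ket) / (1 - lam)
print(f"E = {np.real(E):.4f} bits")
```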

Continuing, the transient information is:

$\mathbf{T} = \sum_{\lambda \in \Lambda_W \setminus \{1\}} \frac{\langle \delta_\pi | W_\lambda | H(W^{(\mathcal{A})}) \rangle}{(1 - \lambda)^2} \; .$

We now see that the transient information is very closely related to the excess entropy, differing only via the square in the denominators. This comparison between the closed-form expressions for $\mathbf{E}$ and $\mathbf{T}$ suggests an entire hierarchy of informational quantities based on eigenvalue weighting.

Performing a similar procedure for the synchronization information shows that:

$\mathbf{S} = \sum_{\lambda \in \Lambda_W \setminus \{1\}} \frac{\langle \delta_\pi | W_\lambda | H[\boldsymbol{\eta}] \rangle}{1 - \lambda} \; ,$

where $| H[\boldsymbol{\eta}] \rangle$ is the column vector whose entries are the state-uncertainties of the individual mixed states. The expressions reveal a remarkably close relationship between $\mathbf{S}$ and $\mathbf{E}$: both are eigenvalue-weighted sums of amplitudes of the form $\langle \delta_\pi | W_\lambda | \cdot \rangle$, scaled by $(1 - \lambda)^{-1}$, differing only in the appended ket: the next-observation uncertainty $| H(W^{(\mathcal{A})}) \rangle$ for $\mathbf{E}$ versus the state uncertainty $| H[\boldsymbol{\eta}] \rangle$ for $\mathbf{S}$.

Again, large eigenvalues—slowly decaying modes of the mixed-state transition matrix—can make the largest contribution to synchronization information; small eigenvalues correspond to quickly decaying modes that do not have the opportunity to contribute. In fact, the potential of large eigenvalues to make large contributions is a recurring theme for many questions one has about a process. Simply stated, long-term behavior—what we often interpret as “complex” behavior—is dominated by a process’s largest-eigenvalue modes.

That said, a word of warning is in order. Although large-eigenvalue modes have the most potential to contribute to a process's complexity, the actual set of largest contributors also depends strongly on the amplitudes $\langle \delta_\pi | W_\lambda | v \rangle$, where $| v \rangle$ is some quantifier vector of interest; e.g., $| H(W^{(\mathcal{A})}) \rangle$, $| H[\boldsymbol{\eta}] \rangle$, or $| A \mathbf{1} \rangle$.

Hence, there is an as-yet unanticipated similarity between $\mathbf{E}$ and $\mathbf{T}$ and another between $\mathbf{E}$ and $\mathbf{S}$—at least assuming diagonalizability. We would like to know the relationships between these quantities more generally. However, deriving the general closed-form expressions for accumulated transients is not tractable via the current approach. Rather, to derive the general results, we deploy the meromorphic functional calculus directly at an elevated level, as we now demonstrate.

IV Exact Complexities and Complexity Spectra

We now derive the most general closed-form solutions for several complexity measures, from which expressions for related measures follow straightforwardly. This includes an expression for the past–future mutual information, or excess entropy, identifying two distinct components, one persistent and one ephemeral, and a novel extension of the excess entropy to temporal-frequency spectra. We also give expressions for the synchronization information and power spectra. We explicitly address the class—a common one we argue—of almost diagonalizable dynamics. The section finishes by highlighting finite Markov-order processes which, rather than being simpler than infinite Markov-order processes, introduce technical complications that must be addressed.

Before carrying this out, we define several useful objects. Let $\rho(A)$ be the spectral radius of matrix $A$:

$\rho(A) = \max_{\lambda \in \Lambda_A} |\lambda| \; .$

For stochastic $W$, since $\rho(W) = 1$, let $\Lambda_W^{\text{max}}$ denote the set of eigenvalues with unity magnitude:

$\Lambda_W^{\text{max}} = \{ \zeta \in \Lambda_W : |\zeta| = 1 \} \; .$

We also define the next-observation-uncertainty ket, with components indexed by the mixed states $\eta$:

$\langle \eta | H(W^{(\mathcal{A})}) \rangle = H[ X_0 \mid \mathcal{R} = \eta ]$   (19)

and the state-uncertainty ket:

$\langle \eta | H[\boldsymbol{\eta}] \rangle = H[\eta] \; .$   (20)

Eigenvalues with unity magnitude that are not themselves unity correspond to perfectly periodic cycles of the state-transition dynamic. By their very nature, such cycles are restricted to the recurrent states. Moreover, we expect the projection operators associated with these cycles to have no net overlap with the start-state of the MSP. So, we expect:

$\langle \delta_\pi | W_\zeta = \vec{0}$   (21)

for all $\zeta \in \Lambda_W^{\text{max}} \setminus \{1\}$. Hence:

$\lim_{L \to \infty} \langle \delta_\pi | W^L = \langle \delta_\pi | W_1 \; .$   (22)

We will also use the fact that, since $\rho(W) = 1$:

$\lim_{L \to \infty} \Bigl( W^L - \sum_{\zeta \in \Lambda_W^{\text{max}}} \zeta^L \, W_\zeta \Bigr) = 0 \; ,$

and furthermore:

$\lim_{L \to \infty} \langle \delta_\pi | W^L | H(W^{(\mathcal{A})}) \rangle = h_\mu \; ,$

as a consequence of Eq. (21) and our spectral decomposition.

Having seen that complexity measures associated with prediction all take on a similar form in terms of the MSP state-transition matrix $W$, we expect to encounter similar forms for generically nondiagonalizable state-transition dynamics.

IV.1 Excess entropy

We are now ready to develop the excess entropy in full generality. Our tools turn this into a direct calculation. We find:

$\mathbf{E} = \sum_{L=1}^{\infty} \bigl[ h_\mu(L) - h_\mu \bigr] = \langle \delta_\pi | \bigl[ (I - W + W_1)^{-1} - W_1 \bigr] | H(W^{(\mathcal{A})}) \rangle \; .$

Note that the inverse here exists, since unity is not an eigenvalue of $W - W_1$. Indeed, the unity eigenvalue was explicitly extracted from the former matrix to make an invertible expression.

For an ergodic process, where $W_1 = | \mathbf{1} \rangle \langle \pi_W |$, this becomes:

$\mathbf{E} = \langle \delta_\pi | (I - W + W_1)^{-1} | H(W^{(\mathcal{A})}) \rangle - h_\mu \; .$   (23)

Computationally, Eq. (23) is wonderfully useful. However, the subtraction of $h_\mu$ is at first mysterious, especially when compared to the compact result for the excess-entropy spectral decomposition in the diagonalizable case given by Eq. (18).

Let's explore this. Recall that Ref. [2] showed:

$(I - T + T_1)^{-1} - T_1 = (I - T)^{\mathcal{D}}$   (24)

for any stochastic matrix $T$. From this, we see that the general solution for $\mathbf{E}$ takes on its most elegant form in terms of the Drazin inverse of $I - W$:

$\mathbf{E} = \langle \delta_\pi | (I - W)^{\mathcal{D}} | H(W^{(\mathcal{A})}) \rangle \; .$   (25)
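Equations (24) and (25) translate directly into a few lines of linear algebra. A sketch, using the same illustrative MSP:

```python
import numpy as np

# Sketch of Eqs. (24)-(25): E = <delta_pi| (I - W)^D |H(W^A)>, where
# (I - W)^D = (I - W + W_1)^(-1) - W_1 and, for an ergodic process,
# W_1 = |1><pi_W|.
W = np.array([[0.0, 2/3, 1/3],
              [0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0]])
H_ket = np.array([-(2/3) * np.log2(2/3) - (1/3) * np.log2(1/3), 1.0, 0.0])
delta_pi = np.array([1.0, 0.0, 0.0])
pi_W = np.array([0.0, 2/3, 1/3])              # stationary distribution of W
W_1 = np.outer(np.ones(3), pi_W)              # |1><pi_W|

drazin = np.linalg.inv(np.eye(3) - W + W_1) - W_1
E = delta_pi @ drazin @ H_ket
print(f"E = {E:.4f} bits")    # agrees with the Eq. (18) eigenvalue sum
```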

Recall too Part I's explicit spectral decomposition:

$(I - W)^{\mathcal{D}} = \sum_{\lambda \in \Lambda_W \setminus \{1\}} \, \sum_{m=0}^{\nu_\lambda - 1} \frac{W_\lambda (W - \lambda I)^m}{(1 - \lambda)^{m+1}} \; .$   (26)

From this and Eq. (25), we see that the past–future mutual information—the amount of the future that is predictable from the past—has the general spectral decomposition:

$\mathbf{E} = \sum_{\lambda \in \Lambda_W \setminus \{1\}} \, \sum_{m=0}^{\nu_\lambda - 1} \frac{\langle \delta_\pi | W_\lambda (W - \lambda I)^m | H(W^{(\mathcal{A})}) \rangle}{(1 - \lambda)^{m+1}} \; .$   (27)

IV.2 Persistent excess

In light of Eq. (9), we see that there are two qualitatively distinct contributions to the excess entropy $\mathbf{E}$. One comprises the persistent leaky contributions from all $\lambda \in \Lambda_W \setminus \{0, 1\}$:

$\mathbf{E}_{\text{per}} = \sum_{\lambda \in \Lambda_W \setminus \{0, 1\}} \, \sum_{m=0}^{\nu_\lambda - 1} \frac{\langle \delta_\pi | W_\lambda (W - \lambda I)^m | H(W^{(\mathcal{A})}) \rangle}{(1 - \lambda)^{m+1}}$

and the other is a completely ephemeral piece that contributes only up to $W$'s zero-eigenvalue index $\nu_0$:

$\mathbf{E}_{\text{eph}} = \sum_{L=1}^{\nu_0} \langle \delta_\pi | W_0 \, W^{L-1} | H(W^{(\mathcal{A})}) \rangle \; .$

IV.3 Excess entropy spectrum

Equation (25) immediately suggests that we generalize the excess entropy, a scalar complexity measure, to a complexity function with continuous part defined in terms of the resolvent—say, via introducing the complex variable $z$:

$\langle \delta_\pi | (z I - W)^{-1} | H(W^{(\mathcal{A})}) \rangle \; .$

Such a function not only monitors how much of the future is predictable, but also reveals the time scales of interdependence between the predictable features within the observations. Directly taking the $z$-transform of $h_\mu(L) - h_\mu$ comes to mind, but this requires tracking both real and imaginary parts or, alternatively, both magnitude and complex phase. To ameliorate this, we employ a transform of a closely related function that contains the same information.

Before doing so, we should briefly note that ambiguity surrounds the appropriate excess-entropy generalization. There are many alternate measures that approach the excess entropy as frequency goes to zero. For example, one candidate, calculated directly from the meromorphic functional calculus upon letting $z = e^{i \omega}$, turns out not to be positive at all frequencies, a fact we are challenged to interpret. Enticingly, a second candidate, obtained by another direct calculation, appears to be positive over all frequencies for all examples checked. It is not immediately clear which, if either, is the appropriate generalization, though. Fortunately, the Fourier transform of a two-sided myopic-entropy convergence function makes our upcoming definition of $\mathbf{E}(\omega)$ interpretable and of interest in its own right.

Let $\check{h}(L)$ be the two-sided myopic entropy convergence function defined by:

$\check{h}(L) = h_\mu(|L|) \; ,$

with the convention $h_\mu(0) = \log_2 |\mathcal{A}|$ and with negative-$L$ values referring to the time-reversed process. For stationary processes, it is easy to show that $\check{h}(-L) = \check{h}(L)$, with the result that $\check{h}(L)$ is a symmetric function. Moreover, $\check{h}(L)$ then simplifies to:

$\check{h}(L) = \langle \delta_\pi | W^{|L|-1} | H(W^{(\mathcal{A})}) \rangle \quad \text{for } |L| \geq 1 \; ,$

where $\check{h}(0) = h_\mu + R$ and, as before, $\langle \delta_\pi | W_\zeta = \vec{0}$ for $\zeta \in \Lambda_W^{\text{max}}$ with $\zeta \neq 1$.

The symmetry of the two-sided myopic entropy convergence function $\check{h}(L)$ guarantees that its Fourier transform is also real and symmetric. Explicitly, the continuous part of the Fourier transform turns out to be:

$R \;+\; 2\, \mathrm{Re} \Bigl[ e^{-i\omega} \langle \delta_\pi | \bigl( I - e^{-i\omega} (W - W_1) \bigr)^{-1} | H(W^{(\mathcal{A})}) \rangle \Bigr] \;-\; 2 h_\mu \cos\omega \; ,$

a strictly real and symmetric function of the angular frequency $\omega$. Here, $R = \log_2 |\mathcal{A}| - h_\mu$ is the redundancy of the alphabet $\mathcal{A}$, as in Ref. [5].

The transform also has a discrete impulsive component. For stationary processes this consists solely of a Dirac delta function at zero frequency:

$2\pi h_\mu \, \delta(\omega) \quad \text{for } \omega \in (-\pi, \pi] \; .$

Recall that the Fourier transform of a discrete-domain function is $2\pi$-periodic in the angular frequency $\omega$. This delta function is associated with the nonzero offset of the entropy convergence curve of positive-entropy-rate processes. The full transform is:

$\check{h}(\omega) = 2\pi h_\mu \, \delta(\omega) \;+\; R \;+\; 2\, \mathrm{Re} \Bigl[ e^{-i\omega} \langle \delta_\pi | \bigl( I - e^{-i\omega} (W - W_1) \bigr)^{-1} | H(W^{(\mathcal{A})}) \rangle \Bigr] \;-\; 2 h_\mu \cos\omega \; .$

Direct calculation using the meromorphic functional calculus of Ref. [2] shows that:

$\lim_{\omega \to 0} \Bigl( \check{h}(\omega) - 2\pi h_\mu \, \delta(\omega) \Bigr) = R + 2\, \langle \delta_\pi | (I - W)^{\mathcal{D}} | H(W^{(\mathcal{A})}) \rangle = R + 2 \mathbf{E} \; .$   (28)

This motivates introducing the excess-entropy spectrum $\mathbf{E}(\omega)$:

$\mathbf{E}(\omega) = \tfrac{1}{2} \Bigl[ \check{h}(\omega) - 2\pi h_\mu \, \delta(\omega) - R \Bigr]$   (29)
$\qquad = \mathrm{Re} \Bigl[ e^{-i\omega} \langle \delta_\pi | \bigl( I - e^{-i\omega} (W - W_1) \bigr)^{-1} | H(W^{(\mathcal{A})}) \rangle \Bigr] \;-\; h_\mu \cos\omega \; .$   (30)

The excess-entropy spectrum rather directly displays important frequencies of apparent entropy reduction. For example, leaky period-$p$ processes have a period-$p$ signature in the excess-entropy spectrum.

As with its predecessors, the excess-entropy spectrum also has a natural decomposition into two qualitatively distinct components:

$\mathbf{E}(\omega) = \mathbf{E}_{\text{per}}(\omega) + \mathbf{E}_{\text{eph}}(\omega) \; .$

The excess-entropy spectrum gives an intuitive and concise summary of the complexities associated with a process' predictability. For example, given a graph of the excess-entropy spectrum, the past–future mutual information can be read off as the height of the continuous part of the function as it approaches zero frequency:

$\mathbf{E} = \lim_{\omega \to 0} \mathbf{E}(\omega) \; .$

Indeed, the limit of zero frequency is necessary due to the delta function in the Fourier transform at exactly zero frequency:

$2\pi h_\mu \, \delta(\omega) \; .$

Reflecting on this, the delta function indicates one of the reasons the excess entropy has been difficult to compute in the past. This also sheds light on the role of the Drazin inverse: It removes the infinite asymptotic accumulation, revealing the transient structure of entropy convergence.
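The following sketch evaluates the excess-entropy spectrum along the unit circle using the resolvent form in Eq. (30) above (so its correctness is contingent on that reconstruction) and confirms that it approaches $\mathbf{E}$ as $\omega \to 0$:

```python
import numpy as np

# Sketch: E(omega) from the resolvent of (W - W_1) along the unit circle,
# per the Eq. (30) form above; illustrative Golden Mean MSP throughout.
W = np.array([[0.0, 2/3, 1/3],
              [0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0]])
H_ket = np.array([-(2/3) * np.log2(2/3) - (1/3) * np.log2(1/3), 1.0, 0.0])
delta_pi = np.array([1.0, 0.0, 0.0])
pi_W = np.array([0.0, 2/3, 1/3])
W_1 = np.outer(np.ones(3), pi_W)
h_mu = pi_W @ H_ket

def E_spectrum(omega):
    z = np.exp(-1j * omega)
    resolvent = np.linalg.inv(np.eye(3) - z * (W - W_1))
    return np.real(z * (delta_pi @ resolvent @ H_ket)) - h_mu * np.cos(omega)

print(f"E(omega -> 0) = {E_spectrum(1e-8):.4f} bits")  # -> excess entropy E
```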

We also have a spectral decomposition of the excess-entropy spectrum:

$\mathbf{E}(\omega) = \mathrm{Re}\, \langle \delta_\pi | \bigl( e^{i\omega} I - (W - W_1) \bigr)^{-1} | H(W^{(\mathcal{A})}) \rangle - h_\mu \cos\omega = \mathrm{Re} \sum_{\lambda \in \Lambda_W \setminus \{1\}} \, \sum_{m=0}^{\nu_\lambda - 1} \frac{\langle \delta_\pi | W_\lambda (W - \lambda I)^m | H(W^{(\mathcal{A})}) \rangle}{\bigl( e^{i\omega} - \lambda \bigr)^{m+1}} \; ,$

where, in the last equality, we assume that $\check{h}(L)$ is real. This shows that, in addition to the contribution of typical leaky modes of decay in entropy convergence, the zero-eigenvalue modes contribute uniquely to the excess-entropy spectrum. In addition to the Lorentzian-like spectral curves contributed by leaky periodicities in the MSP, the excess-entropy spectrum also contains sums of cosines up to a frequency controlled by the index $\nu_0$, which corresponds to the depth of the MSP's nondiagonalizability. This is simply the duration of ephemeral synchronization in the time domain.

IV.4 Synchronization information

Once expressed in terms of the MSP transition dynamic, the derivation of the excess synchronization information closely parallels that of the excess entropy, only with a different ket appended. We calculate, as before, finding:

$\mathbf{S} = \sum_{L=0}^{\infty} \bigl[ \mathcal{H}(L) - \mathcal{H}(\infty) \bigr] = \langle \delta_\pi | \bigl[ (I - W + W_1)^{-1} - W_1 \bigr] | H[\boldsymbol{\eta}] \rangle \; .$

For an ergodic process, where $W_1 = | \mathbf{1} \rangle \langle \pi_W |$, this becomes:

$\mathbf{S} = \langle \delta_\pi | (I - W + W_1)^{-1} | H[\boldsymbol{\eta}] \rangle \;-\; \langle \pi_W | H[\boldsymbol{\eta}] \rangle \; .$   (31)

From Eq. (24), we see that the general solution for $\mathbf{S}$ takes on its most elegant form in terms of the Drazin inverse of $I - W$:

$\mathbf{S} = \langle \delta_\pi | (I - W)^{\mathcal{D}} | H[\boldsymbol{\eta}] \rangle \; .$   (32)

From Eq. (32) and Eq. (26), we also see that the excess synchronization information has the general spectral decomposition:

$\mathbf{S} = \sum_{\lambda \in \Lambda_W \setminus \{1\}} \, \sum_{m=0}^{\nu_\lambda - 1} \frac{\langle \delta_\pi | W_\lambda (W - \lambda I)^m | H[\boldsymbol{\eta}] \rangle}{(1 - \lambda)^{m+1}} \; .$   (33)

Again, the form of Eq. (32) suggests generalizing the synchronization information from a complexity measure to a complexity function $\mathbf{S}(\omega)$. In this case, the result is simply related to the Fourier transform of the two-sided myopic state-uncertainty $\mathcal{H}(|L|)$.
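A sketch of Eq. (32): the only change from the excess-entropy computation is the appended ket, now holding the state-uncertainty of each mixed state.

```python
import numpy as np

# Sketch of Eq. (32): S = <delta_pi| (I - W)^D |H[eta]>. For the
# illustrative Golden Mean MSP, only the initial mixed state pi = (2/3, 1/3)
# is uncertain; the synchronized states A and B have zero state-uncertainty.
W = np.array([[0.0, 2/3, 1/3],
              [0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0]])
H_eta = np.array([-(2/3) * np.log2(2/3) - (1/3) * np.log2(1/3), 0.0, 0.0])
delta_pi = np.array([1.0, 0.0, 0.0])
pi_W = np.array([0.0, 2/3, 1/3])
W_1 = np.outer(np.ones(3), pi_W)

drazin = np.linalg.inv(np.eye(3) - W + W_1) - W_1
S = delta_pi @ drazin @ H_eta
print(f"S = {S:.4f} bits")   # synchronization completes on the first symbol
```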

IV.5 Power spectra

The extended complexity functions $\mathbf{E}(\omega)$ and $\mathbf{S}(\omega)$ just introduced give the same intuitive understanding for entropy reduction and synchronization, respectively, as the power spectrum gives for pairwise correlation. Recall that the power spectrum can be written as:

$P(\omega) = \gamma(0) \;+\; 2\, \mathrm{Re} \sum_{L=1}^{\infty} \gamma(L)\, e^{-i\omega L} = \gamma(0) \;+\; 2\, \mathrm{Re}\, \langle \pi_A | \bigl( e^{i\omega} I - T \bigr)^{-1} | A \mathbf{1} \rangle \; .$

We see that $( e^{i\omega} I - T )^{-1}$ is the resolvent of $T$ evaluated along the unit circle $z = e^{i\omega}$. Hence, by Part I's decomposition of the resolvent, the general spectral decomposition of the continuous part of the power spectrum is:

$P_c(\omega) = \gamma(0) \;+\; 2\, \mathrm{Re} \sum_{\lambda \in \Lambda_T : |\lambda| < 1} \, \sum_{m=0}^{\nu_\lambda - 1} \frac{\langle \pi_A | T_\lambda (T - \lambda I)^m | A \mathbf{1} \rangle}{\bigl( e^{i\omega} - \lambda \bigr)^{m+1}} \; .$

As with $\mathbf{E}(\omega)$ and $\mathbf{S}(\omega)$, all continuous frequency dependence of the power spectrum again lies simply and entirely in the denominator of the above expression.

Analogous to Ref. [6]'s results, the power-spectrum delta functions arise from the eigenvalues of $T$ that lie on the unit circle:

$P_d(\omega) = 2\pi \sum_{\zeta \in \Lambda_T : |\zeta| = 1} \langle \pi_A | T_\zeta | A \mathbf{1} \rangle \, \delta(\omega - \omega_\zeta) \quad \text{for } \omega \in (-\pi, \pi] \, , \text{ where } \zeta = e^{i \omega_\zeta} \; .$
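A final sketch evaluates the continuous part of the power spectrum from the resolvent, with the unity-eigenvalue (DC) mode projected out so that its delta-function contribution is excluded; the HMM is again the illustrative Golden Mean Process.

```python
import numpy as np

# Sketch: P_c(omega) from the resolvent of T along the unit circle, with
# the eigenvalue-1 projector T_dc = |1><pi| removed from T.
T0 = np.array([[0.0, 0.5], [0.0, 0.0]])
T1 = np.array([[0.5, 0.0], [1.0, 0.0]])
T = T0 + T1
pi = np.array([2/3, 1/3])
one = np.ones(2)
pi_A = pi @ T1                 # <pi_A| for the binary alphabet {0, 1}
A_one = T1 @ one               # |A 1>
mean = pi @ A_one              # <X>; |<X>|^2 weights the DC delta function
T_dc = np.outer(one, pi)       # eigenprojector for eigenvalue 1

def P_cont(w):
    z = np.exp(1j * w)
    res = np.linalg.inv(z * np.eye(2) - (T - T_dc))
    tail = pi_A @ res @ A_one - mean**2 / z   # lags L >= 1, mean removed
    return (pi @ A_one - mean**2) + 2 * np.real(tail)

print([round(P_cont(w), 4) for w in np.linspace(0.0, np.pi, 5)])
```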