Applicability and interpretation of the deterministic weighted cepstral distance

# Applicability and interpretation of the deterministic weighted cepstral distance

[    [
###### Abstract

Quantifying similarity between data objects is an important part of modern data science. Deciding what similarity measure to use is very application dependent. In this paper, we combine insights from systems theory and machine learning, and investigate the weighted cepstral distance, which was previously defined for signals coming from ARMA models.

We provide an extension of this distance to invertible deterministic linear time invariant single input single output models, and assess its applicability. We show that it can always be interpreted in terms of the poles and zeros of the underlying model, and that, in the case of stable, minimum-phase, or unstable, maximum-phase models, a geometrical interpretation in terms of subspace angles can be given. We then devise a method to assess stability and phase-type of the generating models, using only input/output signal information. In this way, we prove a connection between the extended weighted cepstral distance and a weighted cepstral model norm.

In this way, we provide a purely data-driven way to assess different underlying dynamics of input/output signal pairs, without the need for any system identification step. This can be useful in machine learning tasks such as time series clustering. An iPython tutorial is published complementary to this paper, containing implementations of the various methods and algorithms presented here, as well as some numerical illustrations of the equivalences proven here.

Leuven,imec]Oliver Lauwers, Leuven,imec]Bart De Moor,

KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Kasteelpark Arenberg 10, Leuven, Belgium

imec, Leuven, Belgium

Key words:  Dynamic systems, metrics, subspace methods, time series analysis, deterministic systems

## 1 Introduction

Quantifying similarity between two data objects is a quintessential part of system identification, control theory and machine learning. Whether it is to assess how good an estimated system fits a dataset, decide whether the output of a system is close enough to the reference signal, or to discern relations between different data objects, always, an implicit or explicit choice of similarity measure has to be made.

For signals coming from dynamical systems, which are traditionally handled with techniques from the field of systems and control, it is important to keep track of the dynamics of the underlying generative systems. Indeed, in systems and control applications, it are precisely these dynamics that interest us, as they give us information about normal operation characteristics, changes in system behaviour and many other system properties.

However, in modern machine learning applications, these insights about the correlation in time by which time series are characterized, are rarely taken into account. Rather, signals - or time series, these terms will be used interchangeably in this paper - are either treated as ordinary vectors, and a vector distance (e.g. the Euclidean distance) is employed, or one or more features or statistics of the signals are calculated (e.g. mean, median, standard deviation), and a distance measure on these features is employed, largely ignoring the dynamics of the generative models. On the other hand, explicitly identifying systems is infeasible and computationally expensive in typical machine learning applications, characterized by large amounts of very long time series, often in need of an automated solution, without taking into account much domain knowledge.

In this paper, we argue that there is a need for dynamics-based distance measures in machine learning applications in general and time series clustering specifically. We then look at one such measure, the weighted cepstral measure [2, 3, 8, 10, 17], which was previously and up to now only defined for signals generated by ARMA models, driven by white noise inputs. It is used in various clustering and classification tasks, such as human activity recogniton [6, 15], dynamic texture recognition [21, 24] and structural damage identification [25].

The main contributions of this paper are:

• We extend the weighted cepstral distance to general invertible deterministic linear time invariant (LTI) single input single output (SISO) systems.

• We extend the distance measure to signals coming from unstable, maximum-phase systems.

• We prove that these extensions can still be interpreted geometrically with the notion of subspace angles, and system theoretically in terms of poles and zeros of the generative models. This links the weighted cepstral distance to the weighted cepstral norm.

• We interpret the distance in terms of poles and zeros for systems with a mixture of stable and unstable poles and/or minimum and maximum-phase zeros. The interpretation in terms of subspace angles is lacking in this case.

• We provide a purely data-driven way to assess whether a signal comes from a completely minimum-phase/stable, a completely maximum-phase/unstable, or a mixed model, by employing what is known as the complex cepstrum.

This paper is organised as follows. In Section 2, we discuss an example of a machine learning task that is critically dependent on by the distance measure considered: clustering time series. In Section 3, we introduce some notation and define some notions we will use throughout. Section 4 gives a motivating example of how taking into account underlying dynamics can be critical in a clustering problem. This then leads us to consider a model norm, the weighted cepstral distance, in Section 5, connecting it to a time series distance at the end of that Section. We explain why we need to assess whether systems are minimum, maximum or mixed-phase. Section 6 introduces a novel data-driven approach to asses the phasetype of the underlying model based on input/output signals. The paper is accompanied by an iPython notebook tutorial\@footnotemark\@footnotetexthttps://github.com/Olauwers/Applicability-and-interpretation-of-the-deterministic-weighted-cepstral-distance, results of which are briefly discussed in Section 7. Conclusions and future paths of research can be found in Section 8. The Appendices contain some computational and theoretical details of the implementation of the different methods, and the proofs of the several equivalences.

## 2 Clustering signals

Clustering signals is the (unsupervised) task of finding groups of similar time series in a dataset, and is an important topic in contemporary machine learning. Traditional clustering methods for other types of data do not carry over trivially, as signal datasets typically are high-dimensional, and temporal correlations both between signals and within signals need to be taken into account when clustering.

Time series clustering algorithms consist of three main components:

• a similarity measure based on relevant features of the data,

• a clustering scheme,

• a way to evaluate the clustering results.

While the latter two might carry over from other data types, or can be generic across applications, the notion of similarity depends critically on the specific problem at hand.

From the point of view of systems and control theory, we often are interested in the dynamics underlying the signal, and not so much in the specific shape of the signal. If we had access to the generating model, we could calculate distances based on model norms, such as the or norms, which explicitly takes into account the model dynamics. However, when only input and output signals are accessible, we resort to distance measures that can be calculated from the raw data alone, or explicitly identify a model of the system, which can be computationally expensive and typically requires quite some user expertise and intervention (some norms, such as the norm and the weighted cepstral norm, can be calculated from input/output signals, and as such do not require this step). In practice, the problem is often not given much attention and off-the-shelf distance measures that are sometimes ill-suited to handle the application at hand are used, such as shape-based distances, like the Euclidean distance (see [20] for a discussion).

To avoid the latter, we are thus interested in distances based on model norms that can be calculated from raw data, in an automated way (without user chosen design parameters, model order estimation, …). One such distance is the weighted cepstral distance [2, 3, 8, 10, 17], which was proven to be a model norm in the case of data generated by ARMA models. We will formally introduce the data type we handle in Section 3, as well as the weighted cepstral distance. We proceed to illustrate why it is useful in systems and control theory with a motivating example in Section 4.

## 3 Notation and definitions

In this Section, we first introduce some general definitions and notation, then discuss the cepstrum. Afterwards, we introduce the notion of subspace angles, and we end the Section by introducing some distance measures.

### 3.1 General notation

Define to be an output signal, which is the output of a system driven by the input signal . is then the value of the output signal at timepoint , and similarly for . Both signals and are of length . The output signal is generated by a Linear Time Invariant (LTI) Single-Input Single-Output (SISO) dynamical system, of which the dynamics can be described by the state-space model

 {x(k+1)=Ax(k)+Bu(k)y(k)=Cx(k)+Du(k), (1)

with the states of the model and , , and system matrices of appropriate dimensions. We assume this model is invertible. Furthermore, we define for every state-space model an observability matrix as

 Γj=(CCACA2⋯CAj−1)⊺, (2)

where the and matrix are the matrices from Equation (3.1), and denotes the transpose.

Taking the z-transform of the signal, the relation between the input and output becomes, assuming , (in frequency domain, denoted with the variable )

 Y(z)=H(z)U(z), (3)

with the transfer function of the system, written in terms of system matrices as

 H(z)=D+C(z1−A)−1B. (4)

For a SISO LTI system, the transfer function is a rational function in , with both numerator and denominator a polynomial. We can express such a system as

 Y(z)=b(z)d(z)a(z)c(z)U(z). (5)

Here, the polynomial has roots of a magnitude smaller than 1 (called the minimum-phase zeros, denoted by ), the polynomial has roots of magnitude greater than 1 (maximum-phase zeros, ). Similarly, the polynomial in the denominator has roots of magnitude smaller than 1 (stable poles, ) and has roots of magnitude greater than 1 (unstable poles, ).\@footnotemark\@footnotetextWe assume, throughout the paper, that there are no poles or zeros exactly on the unit circle (i.e. for , with the imaginary unit). Since, in the discrete frequency domain, we evaluate at the unit circle, not all notions introduced here will be defined for systems with poles or zeros on that circle. The cepstrum, for example, is not well defined there. These cases will therefore not be considered here. For a discussion on how to deal with these kinds of roots, see [18].

Written out in terms of poles and zeros, the transfer function becomes

 H(z)=g∏qj=1(1−βjz−1)∏rj=1(1−δjz−1)∏pj=1(1−αjz−1)∏sj=1(1−γjz−1), (6)

where , , , denote the degree of the corresponding polynomial of the respective types of zeros and poles and is a constant factor called the gain of the transfer function.

The power spectrum of the system with transfer function is defined as

 Φh(eiω)=H(eiω)¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯H(eiω)=∣∣H(eiω)∣∣2, (7)

and similarly for , the power spectrum of the output, and , for the input. Here the overbar denotes the complex conjugate, the imaginary unit and denotes magnitude. A property of the power spectrum is that . Furthermore, we have from Equation (3.1) that

 Φy(eiω)=Φh(eiω)Φu(eiω) (8)

The assumption of invertibility of the system amounts to assuming in Equation (3.1). The equivalent of Equation (3.1) for the inverse system is then given by

 {x(k+1)=(A−BD−1C)x(k)+BD−1y(k)u(k)=−D−1Cx(k)+D−1y(k). (9)

The corresponding transfer function of the inverse system can be written respectively in terms of system matrices and pole and zero polynomials as

 H−1(z) (10) =D−1−D−1C(z1−A+BD−1C)−1BD−1 =a(z)c(z)b(z)d(z).

From these equations, we can extend all definitions in this section to the inverse system.

### 3.2 Cepstrum

Denoting the inverse Fourier transform as , the power cepstrum\@footnotemark\@footnotetextThe terminology power spectrum stems from the fact that it is based on the power spectrum. A different notion, called complex cepstrum, will be introduced later on. We will use the terms power cepstrum and cepstrum interchangeably. When we refer to the complex cepstrum, we will always write it out explicitly. , of a transfer function is written as

 ch(k)=F−1(logΦh(eiω)), (11)

again adopting similar definitions for and , the power cepstra of respectively output and input.

The rationale behind employing the power cepstrum in signal processing stems from a subfield called homomorphic signal processing [18], where the objective is to simplify complicated multiplicative operators (such as convolutions). As we can see from Equation (3.2), the convolution from time domain changes into a multiplication by virtue of the property in Equation (3.1). The logarithm takes this multiplication to an addition. Finally, the inverse Fourier transform is applied to convert the problem back to (a transformation of) time domain. This type of analysis is often referred to as quefrency alanysis [4].

Starting from equation (3.1), we can express the power cepstrum as (see Appendix A.1 for a derivation)

 ch(k)=p∑j=1α|k|j|k|+s∑j=1γ−|k|j|k|−q∑j=1β|k|j|k|−r∑j=1δ−|k|j|k|∀k≠0, (12)

and

 ch(0)=g′, (13)

which is a combination of the gain of equation (3.1) and some rest terms coming from the maximum-phase zeros and unstable poles. This term is not too important for our purposes, and we will omit futher discussion. The interested reader is referred to [18]. Note that the poles and zeros of magnitude greater than 1 appear as their inverse. To see why this is so, we refer to Appendix A.1.

As an aside, we can (partly) give an interpretation of these cepstrum coefficients in terms of system matrices. Indeed, note that in the minimum-phase, stable case, the poles of a model are the eigenvalues of the -matrix in state space representation (Equation (3.1)). The zeros are the poles of the inverse model (Equation (3.1)), and therefore the eigenvalues of . All these eigenvalues are, of course, of magnitude smaller than 1.

We can then write, using only the ’s and ’s in Equation (3.2) (i.e., the breakpoints with magnitude smaller than 1),

 ch(k)=tr{Akk}−tr⎧⎨⎩(A−BD−1C)kk⎫⎬⎭, (14)

where denotes the trace of a matrix.

We now have a simple relation for the cepstrum coefficients:

 ch(k)=cy(k)−cu(k)∀k, (15)

which allows us to compute the cepstrum coefficients of the transfer functions from the input-output signal pair alone. In principle, the power cepstrum can be calculated from input/output data and then related via (3.2) to poles and zeros of the underlying LTI system.

In analogy to the power cepstrum, we denote the complex cepstrum\@footnotemark\@footnotetextNote that the term complex cepstrum is a bit of a misnomer, as the complex cepstrum coefficients of a real signal are real. The name stems from the fact that in the complex cepstrum, information on the phase of the system is retained, which is not the case in the power cepstrum, as poles and zeros of magnitude greater than 1 appear as their inverses in Equation (3.2). of a transfer function by , and write

 ^ch=Z−1(logH(z)), (16)

where denotes the inverse z-transform.

A similar derivation as the one for the power cepstrum results in

 ^ch(k)=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩p∑j=1αkjk−q∑j=1βkjk∀k>0log(g′)k=0−s∑j=1γkjk+r∑j=1δkjk∀k<0. (17)

We provide a detailed description on how to numerically compute the power and complex cepstra in Appendix A.

### 3.3 Subspace angles

Principal angles are generalizations of angles between vectors, and describe angles between given subspaces of a vector space, going back to Jordan [12], Hotelling [11] and Akaike [1]. Figure 1 shows a visualisation of these angles, along with a short description of how to obtain them. They are defined iteratively as follows.

Denote the principal angles between two subspaces, and , by , with the dimension of the smallest subspace. Take unit vectors, and . We then define recursively the angles as

 cos(θk)=maximize→v∈V,→w∈W→v⊺⋅→wsubject to||→v||=||→w||=1,→v⊺⋅→vi=0∀i

The and found this way are called principal directions.

These principal angles give a measure of (dis)similarity between subspaces. The smaller the angles, the more similar or closer the subspaces are. If all principle angles were right angles, all principal directions would be perpendicular to each other, and there would be no shared dimensions in the subspaces.

As shown in [8, 9, 22], the squared cosines of the principal angles and the principal directions between the column spaces of two full rank matrices and can be calculated from the symmetric generalized eigenvalue problem

 (0A⊺BB⊺A0)(xy)=λ(A⊺A00B⊺B)(xy), (19)

subject to and . The solutions of this eigenvalue problem are the cosines of the principal angles between the column spaces of and .

From here, one can show that

 cos2θi=λi((A⊺A)−1A⊺B(B⊺B)−1B⊺A), (20)

where denotes the -th eigenvalue of the expression between brackets.

The subspace angles between two models, and are defined as the principal angles between the column spaces of the following infinite observability matrices:

 [M1∢M2]= (21)

where denotes the inverse of model .

### 3.4 Distance measures

We now formally introduce two distance measures:

Euclidean distance

The Euclidean distance is based on the well-known -norm, and given by

 de(y1,y2)= ⎷N∑k=0(y1(k)−y2(k))2. (22)

This distance is purely based on the shape of the particular signal, not on its underlying dynamics, and thus not appropriate for solving problems were these dynamics are the most important factor.\@footnotemark\@footnotetextNote that there is no clear way to take into account the inputs in this distance. Naive approaches like including a term with the euclidean distance between input signals did not change the conclusions presented in this paper. We therefore omit this matter in the rest of our discussion. This is not a criticism against the Euclidean distance, only a statement about its application domain.

Cosine similarity

The cosine similarity gives the cosine of the angle, , between two vectors, i.e.

 dθ(y1,y2) =cos(θ) (23) =∑Nk=0y1(k)y2(k)√∑Nk=0y21(k)√∑Nk=0y22(k).

This similarity interprets the time series and as vectors, and calculates the angle between them. It then returns the cosine of this angle. Note that this is not, strictly speaking, a distance measure, as two orthogonal vectors, which are far apart, will return a right angle, and a cosine of 0. A proper distance would only return a value of 0 for identical objects. It is, however, a similarity measure, and one could define a distance by transforming the end result. We will, however, not do so here, but rather continue with the similarity measure with the understanding that means the two objects are far away from each other, and means two objects are identical to each other (as in this case).

Weighted power cepstral distance

The power cepstral distance is defined in terms of the power cepstrum coefficients of two signals as

 dc(y1,y2)2=∞∑k=1k(ch1(k)−ch2(k))2. (24)

From Equation (3.2), we can see this distance will be related to poles and zeros of the underlying models. This is an indication that the power cepstral distance measures the underlying dynamics of the systems. Indeed, in subsequent Sections, we will show that the power cepstral distance constitutes a model norm and provide a geometrical interpretation for this norm. As can be readily seen from the definition, the zeroth cepstrum coefficients, and do not contribute to the distance, which is why we can omit a more detailed discussion of this zeroth term.

## 4 Motivating example

22footnotetext: The damping is the result of the technical consideration that, analytically, the theoretical framework we develop here does not hold for poles and zeros on the unit circle. A slight damping is enough to alleviate this problem.

When clustering time series from a systems and control perspective, it is necessary to define a notion of similarity that takes into account information about the dynamics of the system. In this section, we present a simple motivating example that shows the need for a model-driven distance measure in systems and control theory.

Suppose we have an exponentially damped sine wave, with a pulsation of 10 rad/s. In our simple example, we have two more signals: an exponentially damped cosine wave, also with a pulsation of 10 rad/s, and an exponentially damped sequence of random numbers. Figure 2 gives part of these signals. Note that there is no input signal provided for these signals, so they are autonomous systems (i.e., the input is a zero signal). A distance measure can then decide which of the latter two signals is most similar to the original sine wave. From a systems and control point of view, it is obvious this would be the cosine wave, as it differs from the original system only in initial state, but not in dynamics.\@footnotemark\@footnotetextIt is important to note that choosing the right distance measure is very much application-dependent. Other applications might rightly deem the random number sequence to be the most similar one. There is thus not one best notion of distance or similarity, but rather a scale of measures varying in appropriateness for the problem at hand.

As can be seen from Table 1, the Euclidean distance deems the similarity between all pairs of signals to be about the same. This is, of course, to be expected, as all vector pairs constitute orthogonal vector pairs. This is not a criticism of the Euclidean distance, but an argument against using it for problems where the underlying dynamics of the signals are critical.

The cosine distance between all signals is about 0. This, again, was to be expected, as sine and cosine constitute orthogonal vectors, and white noise signals are orthogonal to all others.

The weighted cepstral distance, however, correctly identifies the sine and cosine wave as coming from the same model, resulting in a distance close to zero (which would be the analytically expected value; the deviation is the result of the finite length of the series). Furthermore, it recognizes that the sine wave and Gaussian noise are far apart from a dynamical perspective. Interestingly, it returns more or less the same distance for both the sine and cosine wave compared to Gaussian noise, indicating again that it correctly quantifies the difference in dynamics between the signals.

From Table 2, we see that the zeroth, first and second order statistics also do not provide a solution to firmly discern which signal pairs are closest. Again, for all signals, the results are about the same.

Of course, the example here is a toy problem, which can be easily solved by a trivial extension of the Euclidean distance (i.e. time-shifting both signals to minimize the distance between them). However, it does indicate that in systems and control theory, the choice of distance measure should be made with care. A more involved simulation example by the authors can be found in [14]. If the link between the measure and the underlying dynamics is not clear, the validity and usefulness of the clustering results may be in jeopardy.

In what follows, we will interpret the weighted cepstral distance as a model norm in the case of SISO LTI models with known inputs. We then relate this model to the notion of subspace angles between the models. We prove that the distance is a model norm in both the case of minimum-phase, stable and that of maximum-phase, unstable systems. We show some of the problems that arise in the mixed-phase case and a data-driven test to check the phase-type of the underlying dynamics.

## 5 Cepstral norm

In this Section, we consider the power cepstral distance as a model norm, based on the distance measure in Equation (3.4),

 ||H||2c=∞∑k=1k(ch(k))2. (25)

Note that this is not, in fact, a proper model norm. Indeed, it does not take into account information in the zeroth cepstrum coefficient, i.e., information on scale, and only takes into account the power spectrum, which by virtue of Equation (3.1), only takes into account the magnitude of the transfer function, and not it’s angular components (which we call phase information). However, it is a norm on the vector space constituted by equivalence classes containing all systems that have the same power spectrum (3.1), up to a difference in gain.

As shown in [7], the weighted cepstral norm can be linked to the Hilber-Schmidt-Hankel norm of the (double infinite) Hankel matrix of cepstrum coefficients

 Ch=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝ch(1)ch(2)ch(3)…ch(2)ch(3)ch(4)…ch(3)ch(4)ch(5)…⋮⋮⋮⋱⎞⎟ ⎟ ⎟ ⎟ ⎟⎠∈R∞×∞. (26)

The Hilbert-Schmidt norm of is

 ||Ch||2HS=trChC⊺h=∞∑k=1k(ch(k))2, (27)

which is equal to the norm in Equation (5).

In this section, we will give this norm an interpretation in terms of poles and zeros of the underlying model, and the subspace angles of this model. Given a model , these subspace angles are defined (see Equation (3.3)) as

 [M∢M1]=[Γ∞(M)∢Γ∞(M−1)]. (28)

This is equivalent to the angles between and a model, , with the identity function as transfer function. To interpret the norm, we need to consider three different cases:

• minimum-phase, stable systems (i.e. poles and zeros of magnitude smaller than 1),

• maximum-phase, unstable systems (i.e. poles and zeros of magnitude greater than 1),

• the mixed case, with poles and zeros of magnitudes both smaller and greater than 1.

For simplicity, we always assume simple poles and zeros (i.e. multiplicites of 1) in the proofs presented here. Results carry over when this is not the case, but the derivations become much more involved.

This section starts with three Subsections, corresponding to the three different cases. The weighted cepstral distance is interpretable in terms of subspace angles in the minimum and maximum-phase case, but not in the mixed-phase case. A fourth Subsection will provide a link between the weighted cepstral distance from Equation (3.4) and the norm (5). A method to assess phase-type and stability of a system based on data alone will be provided in Section 6.

### 5.1 Minimum-phase stable interpretation

In the case of a discrete-time, completely minimum-phase and stable deterministic system, we write

 Ymin,s(z)=b(z)a(z)Ud(z), (29)

where and are now polynomials built from respectively stable poles and minimum-phase zeros of the system. This assures that both the model and its inverse are completely stable dynamical systems. All poles and zeros are furthermore assumed to be simple. The subscript in denotes the fact that we are dealing with deterministic inputs. ARMA-models fit into this case.

#### 5.1.1 Cepstral norm

From Equation (3.2), we can readily see that in this case

 ch(k)=p∑j=1α|k|j|k|−q∑j=1β|k|j|k|∀k≠0, (30)

and again . We then derive for the norm in Equation (5),

 ||Hmin,s||2c=∞∑k=1k(ch(k))2 (31) =∞∑k=1k(p∑j=1αkjk−q∑j=1βkjk)2 =logp∏i=1q∏j=1∣∣1−αi¯βj∣∣2p∏i,j=1(1−αi¯αj)q∏i,j=1(1−βi¯βj),

where the last equality holds true because of the series expansion

 log(1−x)=−∞∑k=1xkk∀|x|<1. (32)

#### 5.1.2 Subspace angles

Now, we show we can also interpret the subspace angles from Equation (3.3) in terms of poles and zeros of the model. As discussed above Equation (3.3), these angles are defined between the column spaces of two matrices. We can prove (see Appendix B) that the column space of the observability matrix of the system in Equation (5.1) can be expressed in terms of the (simple) poles, of the system, as follows

 range( Γj(Mmin,s)) (33) =range⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝11⋯1α1α2⋯αpα21α22⋯α2p⋮⋮⋱⋮αj−11αj−12⋯αj−1p⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,

where is the infinite observability matrix of the system in Equation (5.1) truncated at the -th term. Analogously, the column space of the inverse of the minimum-phase system can be expressed as

 range( Γj(M−1min,s)) (34) =range⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝11⋯1β1β2⋯βqβ21β22⋯β2q⋮⋮⋱⋮βj−11βj−12⋯βj−1q⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,

where is the infinite observability matrix of the inverse system.

Denote the complex conjugate transpose of a matrix with the superscript . We then introduce the notation

 P(1)(1) =ΓHj(Mmin,s)Γj(Mmin,s) (35) P(1)(−1) =ΓHj(Mmin,s)Γj(M−1min,s) P(−1)(1) =ΓHj(M−1min,s)Γj(Mmin,s) P(−1)(−1) =ΓHj(M−1min,s)Γj(M−1min,s).

Now, starting from Equation (3.3), with and plugged in for and respectively, some algebra easily shows that

 logn∏i=1cos2θi (36) =limj→∞logdet(P−1(1)(1)P(1)(−1)P−1(−1)(−1)P(−1)(1)) =logp∏i,j=1(1−αi¯αj)q∏i,j=1(1−βi¯βj)p∏i=1q∏j=1∣∣1−αi¯βj∣∣2 =−||Hmin,s||2c,

where the denote the subspace angles, is the transfer function from Equation (5.1) and the last equivalence follows from Equation (5.1.1).

We have thus, for the minimum-phase, stable case, interpreted the cepstral norm in terms of subspace angles of the model, connecting them through the poles and zeros of the models. This shows that the cepstral norm is indeed an interpretable model norm.

#### 5.1.3 Stochastic inputs

The stochastic case is a corollary of this result. De Cock and De Moor [8, 10] define the cepstral norm of a signal , generated by model , with transfer function , as

 ||H||2c=∞∑k=1k(cy(k))2, (37)

(compared to our definition in terms of cepstra of the transfer functions in Equation (3.4)). In the stochastic case, however, the output cepstrum, , and cepstrum of the transfer function, , coincide. Indeed, the cepstrum coefficients of a white noise input, fulfil the condition that

 cuw(k)=0,∀k≠0, (38)

and they drop out of the expression for the cepstral distance. In this case we then have

 cy(k)=ch(k),∀k≠0, (39)

and the two definitions thus coincide.

#### 5.1.4 Autonomous systems

In the case of an autonomous system (i.e. the sequence in Equation (3.1)), the same observation as in the previous Subsection can be made, namely

 cy(k)=ch(k),∀k≠0, (40)

and the norm can be calculated from the output signal. Autonomous systems have only poles, but the rest of the interpretation remains the same, and we omit the details here. Unstable autonomous systems exist as well, but we will not discuss them in the next Subsection. The results of Subsection 5.2 also apply to this type of system, however.

### 5.2 Maximum-phase unstable interpretation

In the case of discrete-time, completely maximum-phase and unstable dynamical systems, we assume that for the model

 Ymax,u(z)=d(z)c(z)Ud(z), (41)

all zeros and poles that are the roots of polynomials and , respectively, are unstable. All poles and zeros are assumed simple. Both the system and its inverse are completely unstable.

#### 5.2.1 Cepstral norm

Again from Equation (3.2), we see that in this case

 ch(k)=s∑j=1¯γ−|k|j|k|−r∑j=1¯δ−|k|j|k|∀k≠0, (42)

and . The unstable poles and zeros thus appear as their inverses (see Appendix A.1 to see why this is so). Since these inverted poles and zeros are again of magnitude smaller than 1, we can proceed as in the minimum-phase, stable case. This leads to the following expression for Equation (5):

 ||Hmax,u||2c=∞∑k=1k(ch(k))2 (43) =∞∑k=1k⎛⎝s∑j=1γ−kjk−r∑j=1δ−kjk⎞⎠2 =logs∏i=1r∏j=1∣∣1−γ−1i¯δ−1j∣∣2s∏i,j=1(1−γ−1i¯γ−1j)r∏i,j=1(1−δ−1i¯δ−1j),

where is the transfer function from Equation (5.2).

#### 5.2.2 Subspace angles

Sending all the poles and zeros of the system to their inverses, amounts to an inversion of the plane with respect to the unit circle. It is a well-known result from inversive geometry that such a transformation is a conformal mapping of the plane, and thus preserves angles [13, p.479]. Therefore, computing the subspace angles in the inverted space will give the same results as computing them in the original space. However, in the inverted space, unstable/maximum-phase poles and zeros are mapped to their stable/minimum-phase inverses, which allows us to repeat the results from Section 5.1.

We can thus avoid problems with the infinite limit of the observability matrices in Equation (5.1.2) if we invert the whole plane around the unit circle, which maps the (simple) poles and zeros from Equation (5.2) to their inverses. We can then express the range of the observability matrix of the unstable, maximum-phase system as

 range( Γj(Mmax,u)) (45) =⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝11⋯11/γ11/γ2⋯1/γs1/γ211/γ22⋯1/γ2s⋮⋮⋱⋮1/γj−111/γj−12⋯1/γj−1s⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,

with an analogous result in terms of (simple) zeros for the observability matrix of the inverse of the system, .

Following the same procedure as in the previous Subsection, it is now straightforwardly shown that for the maximum-phase deterministic case

 ||Hmax,u||2c=−log(n∏i=1cos2θi), (46)

with the subspace angles of the model.

This shows that, for the maximum-phase, unstable case, the cepstral norm is again an interpretable model norm.

### 5.3 Mixed-phase interpretation

Suppose now we have a combination of the previous two types of models, where there is a mixture of stable and unstable poles, and both minimum-phase and maximum-phase zeros. We can express such a system as

 Ymix(z)=b(z)d(z)a(z)c(z)Ud(z), (47)

where and contain all stable zeros, , and poles, , respectively, and and contain the unstable zeros, , and poles .

#### 5.3.1 Cepstral norm

We now have the full expression for the cepstrum coefficients, as written down in Equation (3.2), which we repeat here:

 ch(k)= p∑j=1α|k|j|k|+s∑j=1γ−|k|j|k| (48) − q∑j=1β|k|j|k|−r∑j=1δ−|k|j|k|∀k≠0,

giving rise to the cepstrum norm in Equation (44).

#### 5.3.2 Subspace angles

It is not clear how we can give the cepstral norm an interpretation in terms of subspace angles in this case. Directly calculating angles between rows of the observability matrix spanned by poles or zeros of magnitude larger than 1 and rows spanned by poles or zeros of magnitude smaller than 1, makes the limit in Equation (5.1.2) diverge, and renders us unable to prove the equivalence between the cepstral norm and the subspace angles between observability matrices.

Furthermore, Equation (44) shows us that there is no difference in norm between a system with only one stable pole and a system with only one unstable pole , its inverse. Similarly, given any rational polynomial transfer function, we can invert any number of its stable or unstable poles or any minimum-phase or maximum-phase zeros and we will still get the same value for the norm. This is a direct consequence of the fact that the cepstral norm is based on the power spectrum, which is by definition insensitive to stability of poles and zeros, as can be readily seen from Equation (3.1), where the transfer function is evaluated both in (the complex plane) and (the complex plane inverted over the unit circle). This means that the norm for a stable, minimum-phase model and its unstable, maximum-phase equivalent generated by inverting all poles and zeros, would be equal (and, as we will see in the next subsection, the distance between them would ). For most applications where underlying dynamics are critical, this is not acceptable.

The interpretation of the model norm is thus lacking in this case, and we cannot give any meaningful system theoretical insight to the cepstral distance between signals coming from mixed-phase systems. The distance will therefore not be useful in this case. However, given a set of input-output signals, we do not a priori know the phase-type of the underlying model. In the Section 6, we will provide a data-driven way of assessing this phase-type (and thus the applicability of the cepstral distance), by employing the complex cepstrum.

### 5.4 Relation between norm and distance

The cepstral norm in Equation (5) can be related to the cepstral distance in Equation (3.4). We will show this, to simplify notation, for systems with only one stable pole and no zeros, but this result follows straightforwardly for all three cases considered in the previous subsections.

Given two signals, and , coming from two systems, and with transfer functions

 Hi(z)=11−αiz−1,∀i∈{1,2}, (49)

with . The power cepstra coefficients are then expressed, as in Equation (3.2), as

 (50)

We now look at the cascade of transfer functions . Its power cepstra coefficients are

 chtot(k)=α|k|1|k|−α|k|2|k|,∀k≠0 (51)

which means its cepstral norm is

 ||Htot||2c =||H1H−12||2c (52) =∞∑k=1k(αk1k−αk2k)2. =∞∑k=1k(ch1(k)−ch2(k))2 =dc(y1,y2)2.

We have thus identified , which holds true in all cases considered in this paper. This means that, in going from the distance to the norm, zeros of and poles of will act as zeros of the cascaded system, the norm and vice versa for the poles of and the zeros of . All results obtained in this section could just as well be derived for the distance, but keeping track of poles and zeros of the different models results in very cumbersome notation.

Note that the interpretation of the cepstrum in terms of system matrices in Equation (3.2) allows us to not only calculate the distance between time series, but also between time series and state-space models. This can have applications in anomaly detection for industrial processes, where often a model for normal behaviour is known and a large distance between sensor signals and the model can indicate an anomaly.

## 6 Assessing phase-type of systems

Because of the use of the power spectral density in equation (3.2), the power cepstrum does not take into account any information about phase-type and stability of the systems (i.e. whether poles and zeros have magnitudes greater or smaller than 1). Unsurprisingly, it is exactly this information that we need to discern between minimum-phase/stable, maximum-phase/unstable, or mixed models. This can be solved by instead employing the complex cepstrum from Equation (3.2).

 Y(z)=b(z)d(z)a(z)c(z)Ud(z), (53)

where and contain all stable zeros, , and poles, , respectively, and and contain the unstable zeros, , and poles . The degrees of these polynomials are unknown, and might be zero (i.e. no poles and zeros of that type are present in the system). The question we want to answer now, is whether we can, based on input-output signal pairs only, determine whether the system is mixed-phase or not.

If the system is mixed-phase, the results of Subsection 5.3.2 apply, and we cannot straightforwardly interpret the norm in terms of subspace angles. If the system is either minimum-phase stable or maximum-phase unstable, the results of respectively Subsection 5.1 and 5.2 apply. The method in this Section thus serves as a test to determine whether the cepstral norm and the mathematical equivalences to subspace angles behind it apply to the input-output data at hand.

We use the information contained in Equation (3.2) as a test to assess whether the cepstrum norm in Equation (5) is interpretable in terms of subspace angles or not:

• If the coefficients with negative coefficient number are all\@footnotemark\@footnotetextAs can be seen from equation (3.2), the coefficients contain a factor . This means that if the first few coefficients are zero, we can be sure that the following ones will also vanish. We thus do not have to test an infinite number of coefficients. zero, but at least one of those with positive coefficient number is non-zero, the system is minimum-phase, and the results of Section 5.1 apply.

• If at least one of the coefficients with negative coefficient number is non-zero, but all of the ones with positive coefficient number are zero, the system is maximum-phase, and the results of Section 5.2 apply.

• If at least one of the coefficients with negative coefficient number is non-zero, and at least one of the ones with positive coefficient number is also non-zero, the system is mixed-phase, giving rise to the problems discussed in Subsection 5.3.2.

Given an input-output signal dataset, we now have devised a technique that allows us to test whether the reformulated cepstrum distance makes sense. An example for each phase-type is shown in Figure 3.

Note that this technique does not involve any unnecessary or extra calculations. Indeed, comparing the cepstrum coefficients in equation (3.2) with those in equations (5.1.1) and (5.2.1), shows that the complex cepstra of the minimum-phase and maximum-phase case are nothing more than , and , respectively (up to a complex conjugate, which does not matter for the distance, as the complex conjugate pairs always appear together).

The whole procedure of interpreting the cepstral norm in terms of subspace angles is summarized in Figure 4.

## 7 Numerical simulations

An iPython tutorial notebook is available on GitHub\@footnotemark\@footnotetexthttps://github.com/Olauwers/Applicability-and-interpretation-of-the-deterministic-weighted-cepstral-distance, containing a tutorial to check numerically the equivalences shown in this paper and tests the techniques presented for systems of different phase-types.

In time domain, we provide an implementation to check visually the phase of the underlying system (employing and verifying Equation (3.2)), then verify Equation (5.1.2) for stable, minimum-phase systems. For the unstable, maximum-phase case, we start from spectral data to visually check the phase of the underlying system (again employing and verifying Equation (3.2)) and proceed to verify Equation (46). The reader is invited and encouraged to try out the tutorial for various systems, signal lengths, model orders, sampling times, …

We test this for the minimum-phase/stable and maximum-phase/unstable systems in Figure 3. Results are in good agreement with the theory.

Stable minimum-phase system

This system has zeros at , , , and poles at , , . We get an agreement between the weighted cepstral distance and the subspace angle distance (respectively the right hand side and left hand side of Equation (5.1.2)) of order . Computations are done starting from time domain input/output signal data. The agreement with the theoretical formulas in terms of poles and zeros, as in Equation (5.1.1), is of order for the weighted cepstral distance, and of order for the subspace angle norm. Both norms approximate the expression in terms of poles and zeros, though the subspace angle norm is better at this.

Unstable maximum-phase case

This system has zeros at , , , and poles at , , . Since in this case the system is unstable (and any output signals coming from it are unbounded), we start from data in the frequency domain. The agreement between the weighted cepstral distance and the subspace angle distance is again of order . The agreement with the theoretical formulas in terms of poles and zeros, as in Equation (5.2.1), is of order for the weighted cepstral distance, and of order for the subspace angle norm. Both norms again approximate the expression in terms of poles and zeros, though this time, the weighted cepstral norm is better at this.

More examples can be found in the iPython notebook, and the reader is invited to test the equivalences in this paper on their systems/signals of choice.

## 8 Conclusions and future work

In this paper, we extended the subspace angles framework for the weighted cepstral distance from the stable, stochastic AR(MA) model case to its deterministic counterpart. We showed that reformulating the weighted distance in terms of the cepstrum of the transfer function, rather than the output cepstrum, leads to an extension that both reduces to the original stochastic case and readily generalizes to systems with deterministic inputs. We furthermore have proven that this reformulation also extends to unstable, maximum-phase systems, making use of the conformality of circle inversion.

The mixed-phase case, where both stable and unstable poles, and minimum-phase and maximum-phase zeros are present in the same model, remains an open problem. One can hope that a distance measure based on the complex cepstrum, thus taking into account phase information, would resolve this issue. Further research is needed, and will be pursued by the authors in a future paper.

For now, we have provided a way to assess phase-type and stability of the underlying dynamics of input/output signal pairs based on the raw data alone. This test not only allows us to assess whether the interpretation of the weighted cepstral distance in terms of subspace angles is valid for any given application, but also returns the cepstra of the input and output signals if it is valid. In this sense, this test is computationally ‘free’, i.e. it does not introduce any extra computations.

We have formulated a data-driven distance measure, i.e. a method that needs nothing more than the input-output signal data, that provides us with insight into the underlying dynamical systems, allowing us to assess differences in dynamics, without ever having to identify the actual generating systems.

In future works, the results of this reformulation of the cepstrum distance will be applied to several engineering applications, such as signal clustering, classification and fault detection. An approximative on-line algorithm will also be implemented.

Finally, the extension to several more general model classes (i.e. MIMO, spatio-temporal, non-linear) remains an open problem.

## Acknowledgements

O.L. is an SB PhD Fellow with the FWO. This research was supported in part by the Belgian Federal Science Policy Office: IUAP P7/19 (DYSCO, Dynamical systems, control and optimization, 2012-2017); the Flemish Government: IWT: PhD grants - Industrial Research fund (IOF) KU Leuven Internal Funds C16/15/059, C32/16/013; imec strategic funding 2017.

## References

• [1] Hirotugu Akaike. Stochastic theory of minimal realization. IEEE Transactions on Automatic Control, 19(6):667–674, 1974.
• [2] Jeroen Boets, Katrien De Cock, and Bart De Moor. A mutual information based distance for multivariate gaussian processes. In 46th IEEE Conference on Decision and Control, pages 3048–3053. IEEE, 2007.
• [3] Jeroen Boets, Katrien De Cock, Bart De Moor, Marcelo Espinoza, et al. Clustering time series, subspace identification and cepstral distances. Communications in Information & Systems, 5(1):69–96, 2005.
• [4] Bruce P Bogert, Michael JR Healy, and John W Tukey. The quefrency alanysis of time series for echoes: Cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In Proceedings of the symposium on time series analysis, volume 15, pages 209–243, 1963.
• [5] E. Oran Brigham. The Fast Fourier Transform and Its Applications. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1988.
• [6] Rizwan Chaudhry, Ferda Ofli, Gregorij Kurillo, Ruzena Bajcsy, and Rene Vidal. Bio-inspired dynamic 3d discriminative skeletal features for human action recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2013.
• [7] Katrien De Cock, Bernard Hanzon, and Bart De Moor. On a cepstral norm for an arma model and the polar plot of the logarithm of its transfer function. Signal Processing, 83(2):439 – 443, 2003.
• [8] Katrien De Cock. Principal angles in system theory, information theory and signal processing, 2002. PhD thesis, 293 pp, KU Leuven , May 2002.
• [9] Katrien De Cock and Bart De Moor. Subspace angles and distances between arma models. In Proc. of the Intl. Symp. of Math. Theory of networks and systems, volume 1, 2000.
• [10] Katrien De Cock and Bart De Moor. Subspace angles between arma models. Systems & Control Letters, 46(4):265–270, 2002.
• [11] Harold Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321–377, 1936.
• [12] Camille Jordan. Essai sur la géométrie à dimensions. Bulletin de la Société mathématique de France, 3:103–174, 1875.
• [13] David C Kay. College Geometry: A Unified Development. CRC Press, 2011.
• [14] Oliver Lauwers and Bart De Moor. A time series distance measure for efficient clustering of input/output signals by their underlying dynamics. IEEE Control Systems Letters, 1:286–291, June 2017.
• [15] Huaping Liu, Lianzhi Yu, Wen Wang, and Fuchun Sun. Extreme learning machine for time sequence classification. Neurocomputing, 174:322–330, 2016.
• [16] Ketao Liu, Robert N Jacques, and David W Miller. Frequency domain structural system identification by observability range space extraction. Journal of Dynamic Systems, 118:211, 1996.
• [17] Richard J Martin. A metric for arma processes. Signal Processing, IEEE Transactions on, 48(4):1164–1170, 2000.
• [18] A.V. Oppenheim and R.W. Schafer. Digital Signal Processing (First edition). Prentice Hall International Editions. Prentice-Hall International, 1975.
• [19] D.B. Percival and A.T. Walden. Spectral analysis for physical applications. Cambridge University Press, 1993.
• [20] Cássio Martini Martins Pereira and Rodrigo F de Mello. Common dissimilarity measures are inappropriate for time series clustering. Revista de Informática Teórica e Aplicada, 20(1):25–48, 2013.
• [21] Payam Saisan, Gianfranco Doretto, Ying Nian Wu, and Stefano Soatto. Dynamic texture recognition. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 2, pages II–II. IEEE, 2001.
• [22] Peter Van Overschee and Bart De Moor. Subspace identification for linear systems: Theory, Implementation, Applications. Kluwer Academic Publishers, 1996.
• [23] P. Welch. The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on audio and electroacoustics, 15(2):70–73, 1967.
• [24] Feng Yang, Gui-Song Xia, Gang Liu, Liangpei Zhang, and Xin Huang. Dynamic texture recognition by aggregating spatial and temporal features via ensemble {SVMs}. Neurocomputing, 173, Part 3:1310 – 1321, 2016.
• [25] Haitao Zheng and Akira Mita. Damage indicator defined as the distance between arma models for structural health monitoring. Structural Control and Health Monitoring, 15(7):992–1005, 2008.

## A The cepstrum

An iPython notebook accompanying this paper, available on GitHub\@footnotemark\@footnotetexthttps://github.com/Olauwers/Applicability-and-interpretation-of-the-deterministic-weighted-cepstral-distance, implements the proposed techniques in this paper. In this appendix, we first give a derivation of the power cepstrum in terms of poles and zeros, and then expand on the computation of the cepstrum. We do this for an output signal , but similar considerations hold for input signals and transfer functions.

### a.1 Deriving the power cepstrum

In this subsection, we explicitly perform the analytic calculation in Equation (3.2). A similar derivation can be found, for example, in [18].

Starting from equation (3.1), we can express the power spectrum as

 Φh(eiω) (A.1)

Taking the logarithm, we write

 log (Φh(eiω))=log(g) (A.2)

Using the series expansions from Equation (5.1.1) and

 log (1−xe−iω)=−log(x−1%eiωx−1eiω−1) (A.3) =−log(−x−1eiω)−log(11−x−1eiω) =−log(−x−1eiω)−∞∑n=1x−nneinω∀|x|>1,

we then find

 log(Φh(eiω))=log(g)−q∑j=1⎛⎜⎝∞∑n=1βnjne−inω+∞∑n=1¯¯¯βnjneinω⎞⎟⎠−r∑j=1(log(−¯¯¯δ−1e−iω)+log(−δ−1eiω)∞∑n=1δ−njneinω+∞∑n=1¯¯¯δ−njne−inω+∞∑n=1δ−