
# Efficient Measuring of Congruence on High Dimensional Time Series

Jörg P. Bachmann joerg.bachmann@informatik.hu-berlin.de Johann-Christoph Freytag freytag@informatik.hu-berlin.de
December 29, 2018
###### Abstract

A time series is a sequence of data items; typical examples are streams of temperature measurements, stock ticker data, or gestures recorded with modern virtual reality motion controllers. Considerable research has been devoted to comparing and indexing time series. Especially when the comparison should not be affected by time warping, the ubiquitous Dynamic Time Warping distance function (DTW) is one of the most analyzed time series distance functions. The Dog-Keeper distance (DK) is another example of a distance function on time series that is truly invariant under time warping.

For many application scenarios (e. g. motion gesture recognition in virtual reality), the invariance under isometric spatial transformations (i. e. rotation, translation, and mirroring) is as important as the invariance under time warping. Distance functions on time series which are invariant under isometric transformations can be seen as measurements for the congruency of two time series. The congruence distance (CD) is an example for such a distance function. However, it is very hard to compute and it is not invariant under time warpings.

In this work, we take one step towards developing a feasible distance function which is invariant under isometric spatial transformations and time warping: we develop four approximations for CD. Two of these even satisfy the triangle inequality and can thus be used with metric indexing structures. We show that all approximations serve as lower bounds to CD. Our evaluation shows that they achieve remarkable tightness while providing a speedup of more than two orders of magnitude over the congruence distance.

## 1 Introduction

Multimedia retrieval is a common application which requires finding similar objects to a query object. We consider examples such as gesture recognition with modern virtual reality motion controllers and classification of handwritten letters where the objects are multi-dimensional time series.

In many cases, similarity search is performed using a distance function on the time series, where small distances imply similar time series. A nearest neighbor query for a query time series can be a $k$-nearest neighbor ($k$-NN) query or an $\varepsilon$-nearest neighbor ($\varepsilon$-NN) query: a $k$-NN query retrieves the $k$ most similar time series; an $\varepsilon$-NN query retrieves all time series with a distance of at most $\varepsilon$.

In our examples, the time series of the same classes (e. g., same written characters or same gestures) differ by temporal as well as spatial displacements. Time warping distance functions such as dynamic time warping (DTW) [14] and the Dog-Keeper distance (DK) [6, 10] are robust against temporal displacements. They map pairs of time series representing the same trajectory to small distances. Still, they fail when the time series are rotated or translated in space.

The distance functions defined and analyzed in this paper measure the (approximate) congruence of two time series. Thereby, the distance between two time series $S$ and $T$ shall be $0$ iff $S$ can be transformed into $T$ by rotation, translation, and mirroring; in this case, $S$ and $T$ are said to be congruent. A value greater than $0$ shall correlate to the amount of transformation needed to turn the time series into congruent ones.

The classical Congruence problem determines whether two point sets are congruent under isometric transformations (i. e., rotation, translation, and mirroring) [11, 3]. For $2$- and $3$-dimensional spaces, there are algorithms with runtime $O(n \log n)$, where $n$ is the size of the sets [3]. For larger dimensionality $k$, they provide an algorithm with runtime $O(n^{k-2} \log n)$. For various reasons (e. g., bounded floating point precision, physical measurement errors), the approximated Congruence problem is of much more interest in practical applications. Different variations of the approximated Congruence problem have been studied (e. g., which types of transformations are allowed, whether the assignment of points from one set to the other is known, which metric is used) [11, 3, 12, 2].

The Congruence problem is related to our work, since it is concerned with the existence of isometric functions mapping one point set onto another. The main difference is that we consider ordered lists of points (i. e., time series) rather than pure sets. It turns out that solving the approximated Congruence problem for time series is NP-hard regarding length and dimensionality [5].

With this work, we contribute by evaluating the congruence distance with an implementation based on a nonlinear optimizer. We propose two approximations to the congruence distance which have linear runtime regarding the dimensionality and (quasi-)quadratic runtime regarding the length of the time series. We then improve the complexity of both approximations at the cost of approximation quality, such that their complexity is (quasi-)linear regarding the length of the time series. We evaluate all approximations experimentally.

### 1.1 Basic Notation

We denote the natural numbers including zero by $\mathbb{N}$ and the real numbers by $\mathbb{R}$. For $a, b \in \mathbb{N}$, we denote the modulo operation by $a \,\%\, b$. The set of all powers of two is denoted by $2^{\mathbb{N}}$.

Elements of a $k$-dimensional vector $v \in \mathbb{R}^k$ are accessed using subindices, i. e., $v_3$ is the third element of the vector. Sequences (here also called time series) are usually written using capital letters, e. g., $T = (t_0, \ldots, t_{n-1})$ is a sequence of length $n$. Suppose $T \in (\mathbb{R}^k)^n$; then $t_{i,j}$ denotes the $j$-th element of the $i$-th vector in the sequence $T$. The projection to the $j$-th dimension is denoted by $\pi_j$, i. e., $\pi_j(T) = (t_{0,j}, \ldots, t_{n-1,j})$. The Euclidean norm of a vector $v$ is denoted by $\lVert v \rVert$; thus $d(u, v) \coloneqq \lVert u - v \rVert$ denotes the Euclidean distance between $u$ and $v$.

We denote the set of $k$-dimensional orthogonal matrices by $\mathrm{MO}(k)$, the identity matrix by $I$, and the transpose of a matrix $M$ by $M^T$. For a matrix $M$, we denote the matrix holding the absolute values of its entries by $|M|$.

### 1.2 Congruence Distance

While DTW compares two time series $S$ and $T$, it is (nearly) invariant under time warpings. In detail, consider $\sigma$ and $\tau$ as warpings that duplicate elements (e. g., $\sigma(a, b, c) = (a, a, b, c)$); then DTW minimizes the L1 distance over all time warps:

$$\mathrm{DTW}(S,T) \coloneqq \min_{\sigma,\tau} \sum_{i=0}^{|\sigma(S)|-1} d\bigl(\sigma(S)_i, \tau(T)_i\bigr)$$

with $|\sigma(S)| = |\tau(T)|$.
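In practice, the minimum over all warps is computed with the standard dynamic program rather than by enumerating duplications. The following is a minimal sketch of that computation (our own illustration; the function name and the use of the Euclidean ground distance are our assumptions, not taken from the paper):

```python
import numpy as np

def dtw(S, T):
    """Dynamic Time Warping with Euclidean ground distance.

    Standard O(|S|*|T|) dynamic program; equivalent to minimizing the
    summed distances over all element-duplicating warps sigma, tau.
    """
    n, m = len(S), len(T)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(S[i - 1]) - np.asarray(T[j - 1]))
            # extend the cheapest of: duplicate in S, duplicate in T, advance both
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Note that duplicating an element in one series costs nothing as long as the matched elements coincide, which is exactly the invariance under time warping described above.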

On the other hand, the congruence distance is invariant under all isometric transformations. In contrast to DTW, it minimizes the L1 distance by multiplying with an orthogonal matrix and adding a vector:

$$d_C(S,T) \coloneqq \min_{M,v} f(M,v) \coloneqq \min_{M \in \mathrm{MO}(k),\, v \in \mathbb{R}^k} \sum_{i=0}^{n-1} d(s_i, M \cdot t_i + v) \tag{1}$$

The computation of $d_C$ is an optimization problem where $f$ (cf. Equation (1)) corresponds to the objective function. For time series in $\mathbb{R}^k$, the orthogonality of $M$ yields a set of equality-based constraints.

## 2 Approximating the Congruency

Consider two time series $S, T \in (\mathbb{R}^k)^n$, an arbitrary orthogonal matrix $M$, and a vector $v$. Using the triangle inequality, we obtain

$$d(s_i, s_j) \le d(s_i, M \cdot t_i + v) + d(t_i, t_j) + d(M \cdot t_j + v, s_j)$$
$$\Rightarrow\quad \bigl| d(s_i, s_j) - d(t_i, t_j) \bigr| \le d(s_i, M \cdot t_i + v) + d(s_j, M \cdot t_j + v) \tag{2}$$

i. e., we can bound the congruence distance from below without actually solving the optimization problem. We unroll this idea to propose two approximating algorithms in Sections 2.1 and 2.2.

Considering the well-known self-similarity matrix of a time series, the left hand side of Equation (2) matches the $(i,j)$ entry of the difference between two self-similarity matrices. Usually, the self-similarity matrix is used to analyze a time series for patterns (e. g., using Recurrence Plots [9]). The property that makes the self-similarity matrix useful for approximating the congruence distance is its invariance under the transformations considered for the congruence distance, i. e., rotation, translation, and mirroring.

The self-similarity matrix $\Delta_T$ of an arbitrary time series $T$ is defined as follows:

$$\Delta_T \coloneqq \bigl( d(t_i, t_j) \bigr)_{0 \le i, j < n}$$

Note that $\Delta_T$ is symmetric and that its diagonal entries are zero. In fact, the self-similarity matrix completely describes the sequence up to congruence, i. e., up to rotation, translation, and mirroring of the whole sequence in $\mathbb{R}^k$ [5]: two time series $S$ and $T$ are congruent iff they have the same self-similarity matrix, i. e.

$$\exists M \in \mathrm{MO}(k),\, v \in \mathbb{R}^k:\; S = M \cdot T + v \quad\Longleftrightarrow\quad \Delta_S = \Delta_T. \tag{3}$$
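As a small illustration of Equation (3), self-similarity matrices can be computed and compared numerically (our own sketch; the function names and the tolerance parameter are ours):

```python
import numpy as np

def self_similarity(T):
    """Self-similarity matrix: entry (i, j) is the Euclidean distance d(t_i, t_j)."""
    T = np.asarray(T, dtype=float)
    diff = T[:, None, :] - T[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def congruent(S, T, tol=1e-9):
    """Equation (3): S and T are congruent iff their self-similarity matrices agree."""
    return np.allclose(self_similarity(S), self_similarity(T), atol=tol)
```

Rotating, translating, or mirroring a time series leaves its self-similarity matrix unchanged, which is exactly the invariance exploited below.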

### 2.1 Metric Approximation

Equations (2) and (3) yield our approach for approximating the congruency: we measure the congruency of two time series $S$ and $T$ via a metric on their self-similarity matrices.

###### Definition 2.1 (Delta Distance).

Let $S, T \in (\mathbb{R}^k)^n$ be two time series of length $n$. The delta distance $d_\Delta$ is defined as follows:

$$d_\Delta(S,T) \coloneqq \frac{1}{2} \max_{0 < \delta < n} \sum_{i=0}^{n-1} \Bigl| d\bigl(s_i, s_{(i+\delta)\%n}\bigr) - d\bigl(t_i, t_{(i+\delta)\%n}\bigr) \Bigr|$$

###### Proposition 2.2.

The delta distance satisfies the triangle inequality.

###### Proof.

Consider three time series $R$, $S$, and $T$, and fix a shift $\delta^*$ which maximizes the sum in Definition 2.1 for $d_\Delta(R,T)$. Then

$$\begin{aligned}
d_\Delta(R,T) &= \tfrac{1}{2} \sum_{i=0}^{n-1} \bigl| d(r_i, r_{(i+\delta^*)\%n}) - d(t_i, t_{(i+\delta^*)\%n}) \bigr| \\
&\le \tfrac{1}{2} \sum_{i=0}^{n-1} \Bigl( \bigl| d(r_i, r_{(i+\delta^*)\%n}) - d(s_i, s_{(i+\delta^*)\%n}) \bigr| + \bigl| d(s_i, s_{(i+\delta^*)\%n}) - d(t_i, t_{(i+\delta^*)\%n}) \bigr| \Bigr) \\
&\le d_\Delta(R,S) + d_\Delta(S,T)
\end{aligned}$$

proves the triangle inequality. ∎

Since $d$ is symmetric, $d_\Delta$ inherits its symmetry. Hence, $d_\Delta$ is a pseudometric on the set of time series of length $n$, where all time series within an equivalence class are congruent to each other.

We omit pseudocode since the computation directly follows the formula in Definition 2.1. The complexity of computing the delta distance grows quadratically with the length of the time series.
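A direct transcription of Definition 2.1 might look as follows (our own sketch; it is vectorized per shift, so the quadratic cost shows up as $n-1$ passes over length-$n$ wrapped diagonals):

```python
import numpy as np

def delta_distance(S, T):
    """Delta distance (Definition 2.1): half of the maximum, over all shifts
    0 < delta < n, of the summed absolute differences along the delta-th
    (wrapped) diagonals of the two self-similarity matrices."""
    S, T = np.asarray(S, dtype=float), np.asarray(T, dtype=float)
    n = len(S)
    best = 0.0
    for delta in range(1, n):
        j = (np.arange(n) + delta) % n
        ds = np.linalg.norm(S - S[j], axis=1)  # d(s_i, s_{(i+delta)%n})
        dt = np.linalg.norm(T - T[j], axis=1)  # d(t_i, t_{(i+delta)%n})
        best = max(best, np.abs(ds - dt).sum())
    return 0.5 * best
```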

Our next aim is to show that the delta distance provides a lower bound on the congruence distance $d_C$, as formulated in the following theorem.

###### Theorem 2.3.

For all time series $S$ and $T$, the following holds:

$$d_\Delta(S,T) \;\le\; d_C(S,T).$$
###### Proof.

Fix a shift $\delta^*$ which maximizes the sum in Definition 2.1. Using the triangle inequality as in Equation (2) yields

$$\begin{aligned}
d_\Delta(S,T) &= \tfrac{1}{2} \sum_{i=0}^{n-1} \bigl| d(s_i, s_{(i+\delta^*)\%n}) - d(t_i, t_{(i+\delta^*)\%n}) \bigr| \\
&= \tfrac{1}{2} \sum_{i=0}^{n-1} \bigl| d(s_i, s_{(i+\delta^*)\%n}) - d(M \cdot t_i + v,\, M \cdot t_{(i+\delta^*)\%n} + v) \bigr| \\
&\le \tfrac{1}{2} \sum_{i=0}^{n-1} \bigl( d(s_i, M \cdot t_i + v) + d(s_{(i+\delta^*)\%n},\, M \cdot t_{(i+\delta^*)\%n} + v) \bigr) \\
&= \sum_{i=0}^{n-1} d(s_i, M \cdot t_i + v)
\end{aligned}$$

for arbitrary $M \in \mathrm{MO}(k)$ and $v \in \mathbb{R}^k$. Hence, $d_\Delta(S,T) \le d_C(S,T)$. ∎

In this section, we provided the delta distance, which is a metric lower bound to the congruence distance.

### 2.2 Greedy Approximation

The approach of the delta distance is simple: for time series $S$ and $T$, it only sums up values along a (wrapped) diagonal in $|\Delta_S - \Delta_T|$ and chooses the largest such sum. However, another combination of elements within $|\Delta_S - \Delta_T|$ as addends might provide a better approximation of the congruence distance. Since trying all combinations is computationally expensive, we search for a good combination using a greedy algorithm for selecting the entries of $|\Delta_S - \Delta_T|$.

The greedy algorithm first sorts the entries $d_{i,j} \coloneqq |d(s_i, s_j) - d(t_i, t_j)|$ in descending order and stores them in a queue. While iterating over the queue, it adds $d_{i,j}$ to a global sum and marks the indices $i$ and $j$ as already seen. Entries in the queue which access already-seen indices are skipped; thus each index is used at most once. Essentially, this is the reason why the greedy delta distance (denoted $d_G$) is a lower bound to the congruence distance. Theorem 2.4 proves this statement and Algorithm 1 provides the pseudocode for the computation.

The complexity is dominated by sorting the $n^2$ entries, which takes $O(n^2 \log n)$ steps.
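Since Algorithm 1 is not reproduced here, the following is our own Python rendering of the procedure just described (names are ours; we enumerate only the upper triangle because $|\Delta_S - \Delta_T|$ is symmetric):

```python
import numpy as np

def greedy_delta_distance(S, T):
    """Greedy delta distance: sort the entries of |Delta_S - Delta_T| in
    descending order and greedily pick entries whose row and column indices
    have not been used yet, so each index is used at most once."""
    S, T = np.asarray(S, dtype=float), np.asarray(T, dtype=float)
    n = len(S)
    dS = np.linalg.norm(S[:, None] - S[None, :], axis=-1)
    dT = np.linalg.norm(T[:, None] - T[None, :], axis=-1)
    diff = np.abs(dS - dT)
    # enumerate the upper triangle (i < j), sorted by descending difference
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(-diff[iu, ju])
    seen = np.zeros(n, dtype=bool)
    total = 0.0
    for a in order:
        i, j = iu[a], ju[a]
        if seen[i] or seen[j]:
            continue  # skip entries whose indices were already used
        seen[i] = seen[j] = True
        total += diff[i, j]
    return total
```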

###### Theorem 2.4.

For all time series $S$ and $T$, the following holds:

$$d_G(S,T) \;\le\; d_C(S,T).$$
###### Proof.

Let $d_{i_1,j_1}, \ldots, d_{i_r,j_r}$ be the list of entries from the queue in Algorithm 1 which have not been skipped. Since each index appears at most once in this list, the following inequality holds for arbitrary orthogonal matrices $M$ and vectors $v$:

$$\begin{aligned}
d_G(S,T) = \sum_{a=1}^{r} d_{i_a,j_a} &= \sum_{a=1}^{r} \bigl| d(s_{i_a}, s_{j_a}) - d(M \cdot t_{i_a} + v,\, M \cdot t_{j_a} + v) \bigr| \\
&\le \sum_{a=1}^{r} \bigl( d(s_{i_a}, M \cdot t_{i_a} + v) + d(s_{j_a}, M \cdot t_{j_a} + v) \bigr) \\
&\le \sum_{i=0}^{n-1} d(s_i, M \cdot t_i + v)
\end{aligned}$$

Hence, $d_G(S,T) \le d_C(S,T)$. ∎

### 2.3 Runtime improvement

The complexity of the delta distance and the greedy delta distance is linear regarding the dimensionality but (quasi-)quadratic regarding the length. In this section, we motivate an optimization for both algorithms.

Time series usually do not contain random points; rather, they stem from continuous processes in the real world, i. e., the distance between two successive elements is rather small. Hence, the distances $d(t_i, t_j)$ and $d(t_i, t_{j+1})$ are probably close to each other if $j - i$ is large. This insight leads to the idea to only consider shifts $\delta$ which are a power of two, i. e., we consider fewer entries for larger temporal distances.

#### The Fast Delta Distance:

Adapting the idea to the delta distance yields the following definition.

###### Definition 2.5 (Fast Delta Distance).

Let $S, T \in (\mathbb{R}^k)^n$ be two time series of length $n$. The fast delta distance $\tilde{d}_\Delta$ is defined as follows:

$$\tilde{d}_\Delta(S,T) \coloneqq \frac{1}{2} \max_{\substack{0 < \delta < n \\ \delta \in 2^{\mathbb{N}}}} \sum_{i=0}^{n-1} \Bigl| d\bigl(s_i, s_{(i+\delta)\%n}\bigr) - d\bigl(t_i, t_{(i+\delta)\%n}\bigr) \Bigr|$$

Since we only omit some of the values considered in Definition 2.1, the fast version is a lower bound to $d_\Delta$, i. e., the following theorem holds:

###### Theorem 2.6.

For all time series $S$ and $T$, the following holds:

$$\tilde{d}_\Delta(S,T) \;\le\; d_\Delta(S,T)$$

In particular, the fast delta distance is also a lower bound to the congruence distance. For time series of length $n$, the complexity of the fast delta distance improves to $O(n \log n)$, since only $O(\log n)$ shifts are evaluated. On the other hand, equivalence classes regarding the fast delta distance might include time series which are not congruent.
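A sketch of the fast variant (ours): it is identical to the delta distance except that only shifts $\delta \in \{1, 2, 4, \ldots\}$ are evaluated:

```python
import numpy as np

def fast_delta_distance(S, T):
    """Fast delta distance: like the delta distance, but only shifts delta
    that are powers of two are considered, giving O(n log n) runtime."""
    S, T = np.asarray(S, dtype=float), np.asarray(T, dtype=float)
    n = len(S)
    best = 0.0
    delta = 1
    while delta < n:  # delta in {1, 2, 4, ...}
        j = (np.arange(n) + delta) % n
        ds = np.linalg.norm(S - S[j], axis=1)
        dt = np.linalg.norm(T - T[j], axis=1)
        best = max(best, np.abs(ds - dt).sum())
        delta *= 2
    return 0.5 * best
```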

#### The Fast Greedy Delta Distance:

Incorporating the runtime improvement into the greedy delta distance simply changes one line of Algorithm 1: we only consider entries whose indices differ by a power of two. Algorithm 2 provides the line to change in Algorithm 1 in order to obtain the fast greedy delta distance.

The fast greedy delta distance is again dominated by the sorting of entries. This time, only $O(n \log n)$ entries have to be sorted, thus its complexity is $O(n \log n \cdot \log n)$. Hence, both fast versions have quasi-linear runtime regarding the length and linear runtime regarding the dimensionality.

An inequality such as in Theorem 2.6 does not exist for the fast greedy delta distance. Also, there is no general ordering between the (fast) delta distance and the (fast) greedy delta distance. Though, the evaluation shows that the greedy delta distance provides a much better approximation in most cases. See Section 3 for an evaluation of their tightness to the congruence distance.

## 3 Evaluation

Since the exact computation of the congruence distance is a computationally hard problem (and thus not feasible in practical applications), we are mainly interested in the evaluation of the approximations. Unfortunately, there is no direct algorithm for the computation of the congruence distance, and we have to treat it as a nonlinear optimization problem. For two time series $S$ and $T$, we denote the distance value computed by an optimizer with $d_O(S,T)$. Since an optimizer might not find the global optimum, all values for the congruence distance (computed by an optimizer) in this section are in fact upper bounds to the correct but unknown value of the congruence distance, i. e., $d_C(S,T) \le d_O(S,T)$. This circumstance complicates the evaluation of our approximations to the congruence distance.

To estimate the tightness of the approximations, we first evaluate our optimizer on problems for which we know the correct results (cf. Section 3.1). In those cases, where the error of the optimizer is small, the estimation of the tightness of our approximations is accurate. On the other hand, when the error of the optimizer increases, our estimation of the tightness of our approximations are loose and the approximation might be tighter than the experiments claim.

For a detailed explanation, consider a lower bound $\ell$ for the congruence distance (e. g., $\ell$ might be one of $d_\Delta$, $\tilde{d}_\Delta$, $d_G$, or $\tilde{d}_G$) and suppose $d_O(S,T) = d_C(S,T) + \varepsilon$, i. e., $\varepsilon$ is the error of the optimizer. Then, we have the following correlation between the estimated tightness and the real tightness:

$$\frac{\ell(S,T)}{d_C(S,T)} = \frac{\ell(S,T)}{d_O(S,T) - \varepsilon} \;\ge\; \frac{\ell(S,T)}{d_O(S,T)}$$

Hence, for small errors $\varepsilon$, the estimated tightness is accurate, and for large errors we underestimate the tightness. In Sections 3.2 and 3.3, we evaluate the tightness and the speedup of the approximations relative to the (optimizer-based) congruence distance, respectively.

### 3.1 Congruence Distance: An Optimization Problem

Consider fixed time series $S$ and $T$ in $(\mathbb{R}^k)^n$ of length $n$. The congruence distance is a nonlinear optimization problem with equality-based constraints. The function to minimize is

$$f(M,v) = \sum_{i=0}^{n-1} d(s_i, M \cdot t_i + v)$$

while the equality-based constraints correspond to the constraints for orthogonal matrices:

$$M \cdot M^T = I.$$

As an initial “solution” for the optimizer, we simply choose $M = I$ and $v = 0$.

We manually transformed time series with a random orthogonal matrix and a small random vector and solved the optimization problem to examine whether our optimizer works properly. Clearly, we expect the optimizer to find a solution with value close to $0$. Whenever the optimizer claimed large distance values, we concluded that the optimizer was not working. We tried different optimizer strategies and chose an augmented Lagrangian algorithm [7] with the BOBYQA algorithm [13] as local optimizer for further experiments, because it promised the best performance in these experiments. We used the implementations provided by the NLopt library [1].

We used the RAM dataset generator [4] to evaluate the optimizer on time series with varying dimensionality. We solved optimization problems for varying dimensionality and removed all of those runs where the optimizer did not find a reasonable solution (i. e., runs where the optimizer yielded solutions far from $0$). Figure 1 shows the distance values proposed by the optimizer (and therefore the error it makes) per dimensionality. Beyond a certain dimensionality, the optimizer completely failed to find any reasonable value near $0$, although we gave it generous resources of any kind (e. g., number of iterations, computation time). Figure 1 also shows that the computation times rapidly increase with increasing dimensionality. Because of the rising error and runtime, an evaluation of the congruence distance on higher dimensionalities is not feasible. Hence, we only consider low-dimensional time series in all further experiments.

### 3.2 Tightness of Approximations

In order to evaluate the tightness of the (fast) delta distance and the (fast) greedy delta distance as lower bounds to the congruence distance, we used the RAM dataset generator [4] as well as a real world dataset with 2-dimensional time series (character trajectories [8], containing over 2800 time series). Other real world datasets with higher dimensionality were not suitable because the optimizer failed to compute the congruence distance on them.

Since making the (greedy) delta distance time warping aware is future work, we have to deal with time warping in another way. We simply preprocess our datasets such that each time series, seen as a trajectory, moves with constant speed, i. e., for each dewarped time series the following holds:

$$d(t_i, t_{i+1}) \approx d(t_{i+1}, t_{i+2}).$$

We achieve this property by reinterpolating the time series with respect to its arc length.
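A minimal sketch of such a reinterpolation (ours; it resamples the trajectory linearly at points equally spaced in arc length):

```python
import numpy as np

def reinterpolate_arclength(T, m=None):
    """Resample a trajectory at m points equally spaced in arc length, so
    that consecutive elements are (approximately) equidistant, i.e. the
    trajectory 'moves with constant speed'."""
    T = np.asarray(T, dtype=float)
    m = len(T) if m is None else m
    seg = np.linalg.norm(np.diff(T, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])        # cumulative arc length
    targets = np.linspace(0.0, s[-1], m)               # equally spaced lengths
    # linear interpolation of each coordinate over arc length
    return np.stack([np.interp(targets, s, T[:, dim])
                     for dim in range((T.shape[1]))], axis=1)
```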

Figure 2 shows the tightness of the approximations on RAM datasets. As we expected, the greedy delta distance provides the tightest approximation to the congruence distance (provided by our optimizer).

As we observed in Section 3.1, the error of our optimizer increases with increasing dimensionality; hence, the tightness of the optimizer to the real congruence distance decreases. Since we observe a similar behaviour here (the tightness of the approximations decreases with increasing dimensionality), the reason might be the inaccuracy of the optimizer. Either way, the tightness is high in most cases, especially when using the greedy delta distance.

On the character trajectories dataset, the delta distance and the greedy delta distance both achieved a high tightness.

### 3.3 Speedup of Approximations

Figure 3 shows the speedup of the approximations relative to the optimizer. As expected, the speedup increases exponentially with increasing dimensionality. While the fast delta distance is the fastest algorithm, it also provides the worst approximation (compare with Figure 2). On the other hand, the greedy delta distance provides the best approximation while being the slowest of the four algorithms. Still, the greedy delta distance is multiple orders of magnitude faster than our optimizer.

All four approximations achieved substantial speedups on the character trajectory dataset; the results are similar to those on the RAM generated datasets.

## 4 Conclusion and Future Work

In this paper, we analyzed the problem of measuring the congruence between two time series. We provided four measures approximating the congruence distance which are orders of magnitude faster to compute. The first (the delta distance) has the additional ability to be used in metric indexing structures. The second (the greedy delta distance) loses this benefit but achieves a better approximation. Both approximations have linear complexity regarding the dimensionality but at least quadratic complexity regarding the length of the time series. The other two approximations address this problem at the cost of approximation quality: they have quasi-linear runtime regarding the length.

In practical applications, time series distance functions need to be robust against time warping. The approximations provided in this work are based on comparing self-similarity matrices of time series. Based on this idea, our next step is to develop a time warping distance function measuring the congruency.

## References

• [1] NLopt. Accessed: 2018-05-15.
• [2] H. Alt and L. J. Guibas. Discrete geometric shapes: Matching, interpolation, and approximation: A survey. Technical report, Handbook of Comput. Geometry, 1996.
• [3] H. Alt, K. Mehlhorn, H. Wagener, and E. Welzl. Congruence, similarity and symmetries of geometric objects. Discrete Comput. Geom., 3(3):237–256, Jan. 1988.
• [4] J. P. Bachmann and J. Freytag. High dimensional time series generators. CoRR, abs/1804.06352, 2018.
• [5] J. P. Bachmann, J. Freytag, B. Hauskeller, and N. Schweikardt. Measuring congruence on high dimensional time series. CoRR, abs/1805.10697, 2018.
• [6] J. P. Bachmann and J.-C. Freytag. Dynamic Time Warping and the (Windowed) Dog-Keeper Distance, pages 127–140. Springer International Publishing, Cham, 2017.
• [7] E. G. Birgin and J. M. Martínez. Improving ultimate convergence of an augmented lagrangian method. Optimization Methods Software, 23(2):177–195, Apr. 2008.
• [8] D. Dheeru and E. Karra Taniskidou. UCI machine learning repository, 2017.
• [9] J.-P. Eckmann, S. O. Kamphorst, and D. Ruelle. Recurrence plots of dynamical systems. EPL (Europhysics Letters), 4(9):973, 1987.
• [10] T. Eiter and H. Mannila. Computing discrete fréchet distance. Technical report, Technische Universität Wien, 1994.
• [11] P. J. Heffernan and S. Schirra. Approximate decision algorithms for point set congruence. In Proc. of the Symposium on Comput. Geometry, SCG, pages 93–101. ACM, 1992.
• [12] P. Indyk and S. Venkatasubramanian. Approximate congruence in nearly linear time. Comput. Geom. Theory Appl., 24(2):115–128, Feb. 2003.
• [13] M. J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical report, January 2009.
• [14] H. Sakoe and S. Chiba. Readings in speech recognition. In A. Waibel and K.-F. Lee, editors, Readings in Speech Recognition, chapter Dynamic Programming Algorithm Optimization for Spoken Word Recognition, pages 159–165. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990.