# Video-Based Action Recognition Using Rate-Invariant Analysis of Covariance Trajectories

Abstract

Statistical classification of actions in videos is mostly performed by extracting relevant features, particularly covariance features, from image frames and studying time series associated with temporal evolutions of these features. A natural mathematical representation of activity videos is in the form of parameterized trajectories on the covariance manifold, i.e. the set of symmetric, positive-definite matrices (SPDMs). The variable execution rates of actions imply variable parameterizations of the resulting trajectories and complicate their classification. Since action classes are invariant to execution rates, one requires rate-invariant metrics for comparing trajectories. A recent paper represented trajectories using their *transported square-root vector fields* (TSRVFs), defined by parallel translating scaled-velocity vectors of trajectories to a reference tangent space on the manifold. To avoid the arbitrariness of selecting the reference and to reduce the distortion introduced during this mapping, we develop a purely intrinsic approach where SPDM trajectories are represented by redefining their TSRVFs at the starting points of the trajectories, and analyzed as elements of a vector bundle on the manifold. Using a natural Riemannian metric on vector bundles of SPDMs, we compute geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle, with respect to the re-parameterization group. This makes the resulting comparison of trajectories invariant to their re-parameterization. We demonstrate this framework on two applications involving video classification: visual speech recognition (lip-reading) and hand-gesture recognition. In both cases we achieve results either comparable to or better than the current literature.

Keywords: Action recognition, covariance manifold, trajectories on manifolds, vector bundles, rate-invariant classification

## 1 Introduction

The problem of classification of human actions or activities in video sequences is both important and challenging. It has applications in video surveillance, lip reading, pedestrian tracking, hand-gesture recognition, manufacturing quality control, human-machine interfaces, and so on. Since the size of video data is generally very high, the task of video classification is often performed by extracting certain low-dimensional features of interest (geometric, motion, colorimetric features, etc.) from each frame and then forming temporal sequences of these features for full videos. Consequently, analysis of videos gets replaced by modeling and classification of longitudinal observations in a certain feature space. (Some papers discard temporal structure by pooling all features together, but that represents a severe loss of information.) Since many features are naturally constrained to lie on nonlinear manifolds, the corresponding video representations form parameterized trajectories on these manifolds. Examples of these manifolds include unit spheres, Grassmann manifolds, tensor manifolds, and the space of probability distributions.

One of the most commonly used and effective features in image analysis is a covariance matrix, as shown via applications in medical imaging [1] and computer vision [3]. These matrices are naturally constrained to be symmetric positive-definite matrices (SPDMs) and have also played a prominent role as region descriptors in texture classification, object detection, object tracking, action recognition and face recognition. Tuzel et al. [3] introduced the concept of *covariance tracking* where they extracted a covariance matrix for each video frame and studied the temporal evolution of this matrix in the context of pedestrian tracking in videos. Since the set of SPDMs is a nonlinear manifold, denoted by , a whole video segment can be represented as a (parameterized) trajectory on . In this paper we focus on the problem of classification of actions or activities by treating them as parameterized trajectories on . The two specific applications we will study are visual-speech recognition and hand-gesture classification. Fig. ? illustrates examples of video frames for these two applications.

One challenge in characterizing activities as trajectories comes from the variability in execution rates. The execution rate of an activity dictates the parameterization of the corresponding trajectory. Even for the same activity performed by the same person, the execution rates can potentially differ a lot. Different execution rates imply that the corresponding trajectories go through the same sequences of points in but have different parameterizations. Directly analyzing such trajectories without alignment, e.g. comparing their differences or calculating point-wise means and covariances, can be misleading (the mean is not representative of individual trajectories, and the variance is artificially inflated).

To make these issues precise, we develop some notation first. Let be a trajectory and let be a positive diffeomorphism such that and . This plays the role of a time-warping function, or a re-parameterization function, so that the composition is now a time-warped or re-parameterized version of . In other words, the trajectory goes through the same set of points as but at a different rate (speed).
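The effect of a time warping can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the trajectory `alpha` (a rotating 2×2 SPD matrix) and the warping `gamma` are hypothetical choices made only for illustration. It shows that the warped trajectory shares its endpoints and its set of points with the original, yet a point-wise comparison at equal times reports a nonzero gap.

```python
import numpy as np

def alpha(t):
    """Hypothetical trajectory of 2x2 SPD matrices for t in [0, 1]."""
    theta = 0.5 * np.pi * t
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Rotating a fixed positive diagonal matrix keeps it symmetric positive-definite.
    return R @ np.diag([2.0, 1.0]) @ R.T

def gamma(t):
    """Hypothetical warping: gamma(0)=0, gamma(1)=1, derivative in [0.5, 1.5]."""
    return t + 0.5 * t * (1.0 - t)

ts = np.linspace(0.0, 1.0, 11)
# Frobenius gap between the original and the re-parameterized trajectory at equal times.
gaps = np.array([np.linalg.norm(alpha(t) - alpha(gamma(t))) for t in ts])
# Smallest eigenvalue along the warped trajectory (should stay positive).
min_eig = min(np.linalg.eigvalsh(alpha(gamma(t))).min() for t in ts)

print("endpoint gaps:", gaps[0], gaps[-1])
print("largest point-wise gap:", gaps.max())
```

Both curves trace the same path on the manifold, so the nonzero interior gaps come purely from the difference in execution rate.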

**Pairwise Registration**: Now, let be two trajectories on . The process of registration of and is to find a warping such that is optimally registered to for all . In order to ascribe a meaning to optimality, we need to develop a precise criterion.

**Groupwise or Multiple Registration**: This problem can be extended to more than two trajectories: let be trajectories on , and we want to find time warpings such that, for all , the variables are optimally registered. A solution for pairwise registration can be extended to the multiple alignment problem as follows: for the given trajectories, first define a *template* trajectory and then align each given trajectory to this template in a pairwise fashion. One way of defining this template is to use the mean of the given trajectories under an appropriately chosen metric.

Notice that the problem of *comparisons of trajectories* is different from the problem of *curve fitting* or *trajectory estimation* from noisy data. Many papers have studied spline-type solutions for fitting curves to discrete, noisy data points on manifolds [9] but in this paper we assume that the trajectories are already available through some other means.

### 1.1 Past Work & Their Limitations

There are very few papers in the literature for analyzing, in the sense of comparing, averaging or clustering, trajectories on nonlinear manifolds. Let denote the geodesic distance resulting from the chosen Riemannian metric on . It can be shown that the quantity forms a proper distance on the set , the space of all trajectories on . For example, [14] uses this metric, combined with the arc-length distance on , to cluster hurricane tracks. However, this metric is not immune to different temporal evolutions of hurricane tracks. Handling this variability requires performing some kind of temporal alignment. It is tempting to use the following modification of this distance to align two trajectories:

but this can lead to degenerate solutions (also known as the *pinching problem*, described for real-valued functions in [15]). Pinching implies that a severely distorted is used to eliminate (or minimize) those parts of that do not match with , which can be done even when is mostly different from . While this degeneracy can be avoided using a regularization penalty on , some of the other problems remain, including the fact that the solution is not symmetric.

A recent solution, presented in [16], develops the concept of elastic trajectories to deal with the parameterization variability. It represents each trajectory by its transported square-root vector field (TSRVF) defined as:

where is a pre-determined reference point on and denotes a parallel transport of the vector from the point to *along a geodesic path*. This way a trajectory can be mapped into the tangent space and one can compare/align them using the norm on that vector space. More precisely, the quantity provides not only a criterion for optimality of but also approximates a proper metric for averaging and other statistical analyses. (The exact metric is based on the use of semigroup , the set of all absolutely continuous, weakly increasing functions, rather than . We refer the reader to [16] for details.) This TSRVF representation is an extension of the SRVF used for elastic shape analysis of curves in Euclidean spaces [17]. One limitation of this framework is that the choice of reference point, , is left arbitrary. The results can potentially change with , which makes them difficult to interpret. A related, and bigger, issue is that the transport of tangent vectors to , along geodesics, can introduce large distortion, especially when the trajectories are far from on the manifold.
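Why a square-root representation is convenient for alignment can be checked numerically in the Euclidean setting of the SRVF [17]. The sketch below is an illustration under stated assumptions, not the paper's implementation: the curves `q1`, `q2` and the warping `gamma` are hypothetical, and the warping is assumed to act on a square-root field by composing with `gamma` and scaling by the square root of its derivative. Under that action, warping both fields by the same `gamma` leaves their L2 distance unchanged, whereas plain composition does not.

```python
import numpy as np

def q1(s):
    return np.stack([np.cos(2 * np.pi * s), np.sin(2 * np.pi * s)], axis=-1)

def q2(s):
    return np.stack([s, s ** 2], axis=-1)

def gamma(t):
    # Hypothetical warping with gamma(0)=0, gamma(1)=1 and positive derivative.
    return t + 0.3 * np.sin(2 * np.pi * t) / (2 * np.pi)

def gamma_dot(t):
    return 1.0 + 0.3 * np.cos(2 * np.pi * t)

def l2_sq(f, t):
    """Squared L2 norm of a sampled vector field, by the trapezoid rule."""
    vals = np.sum(f ** 2, axis=-1)
    return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t))

def warp(q, t):
    """Assumed action on square-root fields: (q o gamma) * sqrt(gamma')."""
    return q(gamma(t)) * np.sqrt(gamma_dot(t))[:, None]

t = np.linspace(0.0, 1.0, 2001)
d_before = l2_sq(q1(t) - q2(t), t)
d_after = l2_sq(warp(q1, t) - warp(q2, t), t)          # same warping on both fields
d_naive = l2_sq(q1(gamma(t)) - q2(gamma(t)), t)        # composition only, no scaling

print(d_before, d_after, d_naive)
```

The first two values agree up to discretization error because the scaling factor is exactly the Jacobian of the change of variables in the integral.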

### 1.2 Our Approach

We present a different approach that does not require a choice of . Here the trajectories are represented by their transported vector fields but without a need to transport them to a global reference point. For each trajectory , the reference point is chosen to be its starting point , and the transport is performed along the trajectory itself. In other words, for each , the velocity vector is transported along to the tangent space of the starting point . This idea has been used previously in [11] and others for mapping trajectories into vector spaces, and results in a relatively stable curve, with smaller distortion than the TSRVFs of [16]. We then develop a metric-based framework for comparing, averaging, and modeling such curves in a manner that is invariant to their re-parameterizations. Consequently, this framework provides a natural solution for removal of rate, or *phase*, variability from trajectory data.

Another issue is the choice of Riemannian metric on . Although a large number of Riemannian structures and metrics have been used for in past papers and books [18], they do not provide all the mathematical tools we will need for the ensuing statistical analysis. Consequently, we use a different Riemannian structure than those previous papers, a structure that allows relevant mathematical tools for applying the proposed framework.

The rest of this paper is organized as follows. In Section 3, we introduce a framework for aligning, averaging and comparing trajectories on a general manifold . Since we mainly focus on , in Section 2, we introduce the details of the Riemannian structure on used to implement the comparison framework described in Section 3. In Section 4, we demonstrate the proposed work with real-world action recognition data involving two applications: lip-reading and hand-gesture recognition.

## 2 Riemannian Structure on

Here we will discuss the geometry of and impose a Riemannian structure that facilitates our analysis of trajectories on . Most of the background material is derived in Appendix A, with only the final expressions noted here. For a beginning reader in differential geometry, we strongly recommend reading Appendix A first.

We start by choosing an appropriate *Riemannian metric* on . Then, from the resulting structure, we derive expressions for the following: (1) *geodesic paths* between arbitrary points on ; (2) *parallel transport* of tangent vectors along geodesic paths; (3) *exponential map*; (4) *inverse exponential map*; and (5) *Riemannian curvature tensor*. Several past papers have studied the space of SPDMs as a nonlinear manifold and imposed a metric structure on that manifold [2]. While they mostly focus on defining distances, a few of them originate from a Riemannian structure with expressions for geodesics and exponential maps. However, they do not provide expressions for all desired items (parallel transport and Riemannian curvature tensor). In this section, we utilize a particular Riemannian structure on , which is similar but not identical to the one in [2], and is particularly convenient for our purposes. This Riemannian structure has been used previously for other applications such as spline-fitting in [13].

Let be the space of SPDMs, and let be its subset of matrices with determinant one. The tangent spaces of and at , where is the identity matrix, are and . The exponential map at can be shown to be the standard matrix exponential: for any (), there is () such that (), where denotes the matrix (Lie) exponential; this relationship is one-to-one. Our approach is first to identify the space with the quotient space and borrow the Riemannian structure from the latter directly. Then, we straightforwardly extend the Riemannian structure on to . The Riemannian geometries of and its quotient space are discussed in Appendix A. As described there, the Riemannian metric at any point is defined by pulling back the tangent vectors under to , and then using the metric (see Equation 4). This definition leads to expressions for exponential map, its inverse, parallel transport of tangent vectors, and the Riemannian curvature tensor on . It also induces a Riemannian structure on the quotient space in a natural way because it is invariant to the action of on .

### 2.1 Riemannian Structure on

To make the connection with , we state a useful result termed the *polar decomposition* of square matrices. Recall that for any square matrix , one can decompose it uniquely as where is a SPDM with determinant one and . We note in passing that this fact makes a section of under the action of . (If a group acts on a manifold, then a section of that action is defined to be a subset of the manifold that intersects each orbit of the action in at most one point. It is easy to see that satisfies that condition for the action of on . For any orbit , intersects that orbit in only one point given by such that .) We also note that this section is *not* an orthogonal section since it is not perpendicular to the orbit, i.e., the tangent space is not perpendicular to the tangent space . We will identify with the quotient space via a map defined as:

for any . One can check that this map is well defined and is a diffeomorphism. This square-root is the symmetric, positive-definite square-root of a symmetric matrix. One can verify that lies in by letting (polar decomposition), and then . The inverse map of is given by: . This establishes a one-to-one correspondence between the quotient space and . We will use the map to push forward the chosen Riemannian metric from the quotient space to .
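The polar decomposition and the square-root map can be checked numerically. The following is a minimal sketch under stated assumptions: the displayed formula for the map is not reproduced here, so the code assumes, following the description in the text, that it sends the orbit of a determinant-one matrix `G` to the SPD square root of `G @ G.T`. The helper names `polar_left` and `spd_sqrt` are hypothetical.

```python
import numpy as np

def polar_left(G):
    """Left polar decomposition G = P @ S via the SVD (P is SPD, S is orthogonal)."""
    U, s, Vt = np.linalg.svd(G)
    P = U @ np.diag(s) @ U.T
    S = U @ Vt
    return P, S

def spd_sqrt(M):
    """Symmetric positive-definite square root of an SPD matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3))
if np.linalg.det(G) < 0:
    G[0] *= -1.0                      # flip one row so that det(G) > 0
G /= np.linalg.det(G) ** (1.0 / 3.0)  # rescale so that det(G) = 1

P, S = polar_left(G)

# A rotation Q used to move G along its orbit under right multiplication.
c, s_ = np.cos(0.7), np.sin(0.7)
Q = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])

image_of_G = spd_sqrt(G @ G.T)
image_of_GQ = spd_sqrt((G @ Q) @ (G @ Q).T)

print("det(P) =", np.linalg.det(P), " det(S) =", np.linalg.det(S))
print("map agrees with polar factor:", np.allclose(image_of_G, P))
print("map is constant on the orbit:", np.allclose(image_of_G, image_of_GQ))
```

The last check is what makes the map well defined on the quotient: right-multiplying `G` by a rotation does not change `G @ G.T`.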

**Geodesic path and distance**: With that induced Riemannian metric, we can derive the geodesic path and the geodesic distance between any and in . The idea is to pull back these points into the quotient space using , compute the geodesic there and then map the result back to using . As mentioned earlier and . The expression for the geodesic between these points in the quotient space is given in Appendix A. We compute such that for some . (Let . Note that , and the rotation matrix brings to .) The corresponding geodesic in the quotient space () is given by and, therefore, the desired geodesic in is

The corresponding geodesic distance is

**Parallel transport**: To determine the parallel transport along the geodesic path , we recall from Appendix A the two orthogonal subspaces of : and , which we call the *vertical* and the *horizontal* tangent subspaces, respectively. We identify the tangent space with the horizontal subspace in , so that .

Now let be a tangent vector that needs to be translated along a geodesic path from to , given by . Similar to the case of quotient space in Appendix A, let such that is identified with . In the quotient space, the parallel transport of along the geodesic is a vector field . In , the geodesic is obtained by the forward map , . Therefore, the parallel transport vector field in along the geodesic is the image of under :

where and .

**Riemannian curvature tensor**: Let and be three vectors on ; then the Riemannian curvature tensor on with these arguments can be calculated as follows. Let and be elements in such that , and . The tensor is then given by:

We summarize these mathematical tools for trajectory analysis on the manifold :

**Exponential map**: Given a point and a tangent vector , the exponential map is given as:

**Geodesic distance**: For any two points , the geodesic distance between them in is given by: where and .

**Inverse exponential map**: For any , the inverse exponential map can be calculated using the formula:

**Parallel transport**: Given and a tangent vector , the tangent vector at which is the parallel transport of along the shortest geodesic from to is: , where and .

**Riemannian curvature tensor**: For any point and tangent vectors and , the Riemannian curvature tensor is given as: , where and .

### 2.2 Extension of Riemannian Structure to

We now extend the Riemannian structure on to . Since for any we have , we can express with . Thus, is identified with the product space of . Moreover, for any smooth function on , the Riemannian metric , where and denote the Riemannian metrics on and respectively, gives the structure of a “warped” Riemannian product. In this paper we set .
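The product identification can be made concrete with a short sketch. The exact coordinates used in the text are not reproduced here, so the code assumes one natural choice: a general SPD matrix is split into a determinant-one SPD matrix and a real scale coordinate equal to the log-determinant divided by the matrix size. The function names `split_spd` and `combine_spd` are hypothetical.

```python
import numpy as np

def split_spd(P_tilde):
    """Split an SPD matrix into (determinant-one SPD part, log-scale coordinate)."""
    n = P_tilde.shape[0]
    _, logdet = np.linalg.slogdet(P_tilde)   # determinant is positive for SPD input
    x = logdet / n                           # assumed scale coordinate
    P = P_tilde / np.exp(x)                  # dividing by det^(1/n) gives det(P) = 1
    return P, x

def combine_spd(P, x):
    """Inverse of split_spd: rescale the determinant-one part."""
    return np.exp(x) * P

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
P_tilde = A @ A.T + 4.0 * np.eye(4)          # a generic SPD matrix

P, x = split_spd(P_tilde)
P_back = combine_spd(P, x)

print("det of unit part:", np.linalg.det(P))
print("round trip ok:", np.allclose(P_back, P_tilde))
```

Working with the log of the scale turns the positive half-line into the whole real line, so the second factor can be treated as a flat one-dimensional space.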

**Geodesic distance**: Using the above Riemannian metric () on , the resulting squared distance on between and is:

For two points and , let for some , and , we have . Therefore, the resulting squared geodesic distance between two SPDMs and is

Next, let denote the geodesic in , where is a geodesic on and is a geodesic in . The geodesic from to in is given by from earlier derivation, and the geodesic , a line segment from to , is . Therefore, with simple calculation, the matrices in corresponding to the geodesic path are .

**Parallel transport**: If is a tangent vector to at , where is identified with and , we can express as with being a tangent vector of at . The parallel transport of is the parallel transport of each of its two components in the corresponding space, and the transport of is itself. The other part, i.e. the parallel translation of in , has been dealt with earlier.

**Riemannian curvature tensor**: By identifying the tangent vector , the non-zero curvature tensors are just the part in . Therefore, the curvature tensor for , and on is the same as .

Again, we summarize these mathematical tools on :

**Exponential map**: Given and a tangent vector , we denote , where , and . The exponential map is given as , where

**Geodesic distance**: For any , the squared geodesic distance between them is: , where .

**Inverse exponential map**: For any , the inverse exponential map is , where and .

**Parallel transport**: For any and a tangent vector , the parallel transport of along the geodesic from to is: , where and .

**Riemannian curvature tensor**: For tangent vectors and , the Riemannian curvature tensor is the same as .

### 2.3 Differences From Past Riemannian Structures

Pennec et al. [2], others [21], and still earlier authors have utilized a Riemannian structure on that is commonly known as *the Riemannian metric* in the literature. Here we clarify the difference between our Riemannian structure and the structure used in these past papers.

We have used a natural bijection given by to inherit the Riemannian structure from the quotient space to . Its inverse is given by . There is another bijection between these same two sets, defined by , and with the inverse . (Note that an element of can be raised to any positive power in a well-defined way by diagonalizing.) If we use to transfer the Riemannian structure from the quotient space onto , we will reach the one used in earlier papers. In other words, the structure on the quotient space is the same for both cases, only the inherited structure on is different due to the different bijections. Note that for all , . It follows that the map from defined by forms an isometry with our metric on one side and the older metric on the other.

The Riemannian metric on (trace metric on pulled-back matrices at identity) is invariant under right translation by elements of and under left translation by elements of . It follows that the induced metric on the quotient space is invariant under the natural left action of on . (This action is given by .) It is a well-known fact that, up to a fixed scalar multiple, this is the only metric on that is invariant under the action by . This aspect is the same in both cases. If we push forward the action of on the quotient space via , the resulting action of on is given by , but if we push forward the action via , the resulting action of on is given by .
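The invariance property of the older structure can be verified numerically. The sketch below does not implement this paper's metric. It uses the widely known affine-invariant distance on SPD matrices, which is the distance associated with the structure of [2], and it assumes that the relevant group action on SPD matrices is the congruence sending `P` to `G @ P @ G.T`. The helper names are hypothetical.

```python
import numpy as np

def spd_power(P, a):
    """Raise an SPD matrix to a real power by diagonalizing."""
    w, V = np.linalg.eigh(P)
    return (V * w ** a) @ V.T

def affine_invariant_dist(P1, P2):
    """Classical affine-invariant distance: Frobenius norm of log(P1^-1/2 P2 P1^-1/2)."""
    R = spd_power(P1, -0.5)
    M = R @ P2 @ R
    w = np.linalg.eigvalsh(0.5 * (M + M.T))   # symmetrize against round-off
    return np.sqrt(np.sum(np.log(w) ** 2))

rng = np.random.default_rng(1)

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

P1, P2 = random_spd(3), random_spd(3)
G = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # a generic invertible matrix

d_before = affine_invariant_dist(P1, P2)
d_after = affine_invariant_dist(G @ P1 @ G.T, G @ P2 @ G.T)

print(d_before, d_after)
```

The two printed distances coincide up to round-off, which is the numerical counterpart of the invariance statement for the structure used in earlier papers.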

Given the simple relationship between the two Riemannian structures, one may conjecture that the relevant formulae (parallel transport, curvature tensor, etc.) can be mapped from one setup to the other. While this is true in principle, it is much harder in practice since it involves the expression for the differential of this map, which is somewhat complicated.

## 3 Analysis of Trajectories on

Now that we have expressions for calculating relevant geometric quantities on , we return to the problem of analyzing trajectories on . In the following we derive the framework for a general Riemannian manifold , keeping in mind that in our applications.

### 3.1 Representation of Trajectories

Let denote a smooth trajectory on a Riemannian manifold and denote the set of all such trajectories: . Define to be the set of all orientation preserving diffeomorphisms of : is a diffeomorphism. forms a group under the composition operation. If is a trajectory on , then is a trajectory that follows the same sequence of points as but at the evolution rate governed by . More technically, the group acts on , , according to .

We introduce a new representation of trajectories that will be used to compare and register them. We assume that for any two points , we have a mechanism for parallel transporting any vector **along** from to , denoted by .

Note the difference in this definition from the one in [16], where the parallel transport was along geodesics to a reference point . Here the parallel transport is along and to the starting point . The concept of parallel transport of velocity vectors along trajectories has also been used previously in [11] and others. This reduces distortion in the representation relative to the parallel transport of [16] to a faraway reference point.

This TSRVF representation maps a trajectory on to a curve in . What is the range space of all such mappings? For any point , denote the set of all square-integrable curves in as . The space of interest, then, becomes an infinite-dimensional vector bundle , which is the indexed union of for every . Note that the TSRVF representation is bijective: any trajectory is uniquely represented by a pair , where is the starting point and is its TSRVF. We can reconstruct the trajectory from using the covariant integral defined later (in Algorithm ?).
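The bijection between a trajectory and the pair (starting point, square-root field) is easiest to see in flat space. The following is a sketch of the Euclidean special case only, where parallel transport is the identity and the covariant integral reduces to an ordinary integral; it assumes the square-root field is the velocity divided by the square root of the speed, as in the SRVF of [17]. It is not the manifold algorithm referred to in the text, and the function names are hypothetical.

```python
import numpy as np

def srvf(alpha, t, eps=1e-12):
    """Square-root velocity field of a sampled Euclidean curve (rows are time samples)."""
    v = np.gradient(alpha, t, axis=0, edge_order=2)
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(speed + eps)           # eps guards against zero speed

def reconstruct(start, q, t):
    """Recover the curve from its starting point and square-root field."""
    v = q * np.linalg.norm(q, axis=1, keepdims=True)        # q * |q| is the velocity
    steps = 0.5 * (v[1:] + v[:-1]) * np.diff(t)[:, None]    # trapezoid increments
    return start + np.vstack([np.zeros_like(start), np.cumsum(steps, axis=0)])

t = np.linspace(0.0, 1.0, 1001)
alpha = np.stack([np.cos(np.pi * t), np.sin(np.pi * t)], axis=1)   # a half circle

q = srvf(alpha, t)
alpha_rec = reconstruct(alpha[0], q, t)
max_err = np.abs(alpha_rec - alpha).max()

print("max reconstruction error:", max_err)
```

On a curved manifold the same two steps apply, except that each velocity must first be transported to the tangent space at the starting point and the integration must follow the manifold's exponential map.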

### 3.2 Riemannian Structure on

In order to compare trajectories, we will compare their corresponding representations in and that requires a Riemannian structure on the latter space. Let be two trajectories on , with starting points and , respectively, and let the corresponding TSRVFs be and . Now are represented as two points in the vector bundle over . This representation space is an infinite-dimensional bundle, whose fiber over each point in is .

We impose the following Riemannian structure on . For an element in , where , , we naturally identify the tangent space at to be: . To see this, suppose we have a curve in given by , . The velocity vector to this curve at is given by , where denotes , and denotes *covariant differentiation of tangent vectors*. The Riemannian inner product on is defined in an obvious way: If and are both elements of , define

where the inner products on the right denote the original Riemannian metric in .

Next, given two points and on , we are interested in finding the geodesic path connecting them. Let be a path with and . We have the following characterization of geodesics on .

**Proof**: We will prove this theorem in two steps.

(1) First, we consider a simpler case where the space of interest is the tangent bundle of the Riemannian manifold . An element of is denoted by , where and . It is natural to identify . The Riemannian inner product on is defined in the obvious way: If and are both elements of , define

and, again, the inner products on the right denote the original Riemannian metric on . Suppose we have a path in given by . We define the energy of this path by

The integrand is the inner product of the velocity vector of the path with itself. It is a standard result that a geodesic on can be characterized as a path that is a critical point of this energy function on the set of all paths between two fixed points in . To derive local equations for this geodesic, we now assume we have a parameterized family of paths denoted by , where is the parameter of each individual path in the family (as above) and the variable tells us which path in the family we are in. Assume and takes values on for some small . We want all the paths in this family to start and end at the same points of , so assume that and are constant functions of . The energy of the path with index is given by:

To simplify notation in what follows, we will write for and for . To establish conditions for to be critical, we take the derivative of with respect to at :

We will use two elementary facts: (a) and (b) , without presenting their proofs. Plugging these facts into the above calculation then makes equal to:

The second equality comes from using integration by parts on the first and third terms, taking into account the fact that and vanish at , (since all the paths begin and end at the same point). Now, using the standard identities and , we obtain: equals

Now, is critical for if and only if for every possible variation of and of , which is clearly true if and only if

Thus we have derived the geodesic equations for .

(2) Now we consider the case of the infinite dimensional vector bundle whose fiber over is , . A point in is denoted by , where the variable corresponds to the -parameter in . The tangent space to at is . Suppose and are elements of this tangent space and we use the Riemannian metric:

Now we want to work out the local equations for geodesics in . A path in is denoted by . The energy calculation is basically the same as above, except that everything is surrounded by an integration with respect to . So, it starts out with

(Of course does not involve the parameter , but surrounding it with does not change its value!)

In order to do the variational calculation, we now consider a parametrized family of such paths, denoted by where we assume that and are constant functions of , and for each , and are constant functions of , since we want every path in our family to start and end at the same points of .

Then, following through the computation exactly as in the earlier case, we obtain

In order for our path to be critical for , must vanish for every variation of and of , which is clearly true if and only if

The geodesic path can be intuitively understood as follows: (1) is a baseline curve on connecting and , and the covariant differentiation of at the tangent space of equals the negative integral of the Riemannian curvature tensor with respect to . In other words, values of at each equally determine the geodesic acceleration of in the first equation. (2) The second equation implies that is covariant linear, i.e. and for every and . For a geodesic path connecting and , it is natural to let and , where and represent the parallel transport of and along to , and is the difference between the TSRVFs and in , defined as . In Fig. ?, we illustrate the geodesic path between two trajectories on a simpler manifold . In each case, the yellow solid line denotes the baseline and the intermediate lines are the covariant integrals (in Algorithm ?) of with starting point . For comparison, the dashed yellow line shows the geodesic between the starting points and on .

Theorem ? is only a characterization of geodesics but does not provide explicit expressions for them. In the following section, we develop a numerical solution for constructing geodesics on .

### 3.3 Numerical Computation of Geodesic in

There are two main approaches in numerical construction of geodesic paths on manifolds. The first approach, called *path-straightening*, initializes with an arbitrary path between the given two points on the manifold and then iteratively “straightens” it until a geodesic is reached. The second approach, called the *shooting method*, tries to “shoot” a geodesic from the first point, iteratively adjusting the shooting direction, so that the resulting geodesic passes through the second point. In this paper, we use the shooting method to obtain the geodesic paths on .
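The shooting idea can be sketched on a much simpler manifold than the vector bundle studied here. The example below is an illustration under stated assumptions, not the paper's algorithm: it works on the unit sphere, where the exponential map has a closed form, and it uses a simplified update in which the Euclidean discrepancy between the target and the point reached is projected onto the tangent space at the start. This simple update is only expected to work when the target lies within the open hemisphere centered at the start. All function names are hypothetical.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: follow the great circle from p along v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)

def project_to_tangent(p, w):
    """Remove the component of w that is normal to the sphere at p."""
    return w - np.dot(w, p) * p

def shoot(p, q, step=1.0, tol=1e-10, max_iter=200):
    """Iteratively adjust the shooting vector v at p until exp_p(v) reaches q."""
    v = np.zeros_like(p)
    for _ in range(max_iter):
        reached = sphere_exp(p, v)
        discrepancy = q - reached             # how far the shot missed the target
        if np.linalg.norm(discrepancy) < tol:
            break
        v = v + step * project_to_tangent(p, discrepancy)
    return v

p = np.array([0.0, 0.0, 1.0])                       # start: north pole
q = np.array([np.sin(1.0), 0.0, np.cos(1.0)])       # target: one radian away

v = shoot(p, q)
print("shooting vector:", v)
print("geodesic length:", np.linalg.norm(v))
```

Once the shooting vector is found, the geodesic is obtained by evaluating `sphere_exp(p, tau * v)` for `tau` between 0 and 1, and its length is the norm of `v`.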

In order to implement the shooting method, we need the exponential map on . Given a point and a tangent vector , the exponential map for gives a geodesic path on . Equation ? helps us with this construction as follows. The two equations essentially provide expressions for second-order covariant derivatives of and components of the path. Therefore, using numerical techniques, we can perform covariant integration of these quantities to recover the path itself.

Once we have a numerical procedure for the exponential map, we can establish the shooting method for finding geodesics. Let be the starting point and be the target point. The shooting method iteratively updates the tangent or shooting vector on such that . Then, the geodesic between and is given by , . The key step here is to use the current discrepancy between the point reached, , and the target, , to update the shooting vector