# ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows

## Abstract

We introduce ImitationFlow, a novel Deep generative model that allows learning complex globally stable, stochastic, nonlinear dynamics. Our approach extends the Normalizing Flows framework to learn stable Stochastic Differential Equations. We prove the Lyapunov stability for a class of Stochastic Differential Equations and we propose a learning algorithm to learn them from a set of demonstrated trajectories. Our model extends the set of stable dynamical systems that can be represented by state-of-the-art approaches, eliminates the Gaussian assumption on the demonstrations, and outperforms the previous algorithms in terms of representation accuracy. We show the effectiveness of our method with both standard datasets and a real robot experiment.

bcBCBehavioural Cloning \newacronymirlIRLInverse Reinforcement Learning \newacronymlfdLfDLearning from Demonstration \newacronymemEMExpectation Maximization \newacronymprompProMPProbabilistic Movement Primitives \newacronymdmpDMPDynamic Movement Primitives \newacronymsedsSEDSStable Estimator of Dynamical Systems \newacronymgmrGMRGaussian Mixture Regressor \newacronymgprGPRGaussian Process Regressor \newacronymlwrLWRLocally Weighted Regressor \newacronymkmpKMPKernelized Movement Primitives \newacronymclfCLFControl Lyapunov Function \newacronymwsaqfWSAQFWeighted Sum of Asymmetric Quadratic Function) \newacronymnilcNILCNeurally Imprinted Lyapunov Candidate \newacronymclfdmCLF-DMControl Lyapunov Function-based Dynamic Movements \newacronymgclGCLGuided Cost Learning \newacronymmleMLEMaximun Likelihood Estimation \newacronymsdeSDEStochastic Differential Equation \newacronymodeODEOrdinary Differential Equation \newacronymprobsProbSProbabilistic Segmentation \newacronymcrfCRFConditional Random Fields \newacronymppcaPPCAProbabilistic Principal Component Analysis \newacronymgmccGMCCGeneralized Multiple Correlation Coeficcient \newacronymhriHRIHuman-Robot Interaction \newacronymipIPInteraction Primitives \newacronymhmmHMMHidden Markov Model \newacronymcacCACCanonical Correlation Coefficient \newacronymrv Coefficient \newacronymdcordCorDistance Correlation \newacronymdtwDTWDynamic Time Warping \newacronymedrEDREdit Distance With Real Penalty \newacronymtwedTWEDTime Warp Edit Distance \newacronymr2Coefficient of Determination \newacronymsqpSQPSuccessive Quadratic Programming \newacronymrkhsRKHSReproducing Kernel Hilbert Space \newacronymicnnICNNInput-Convex Neural Network \newacronympcaPCAPrincipal Component Analysis \newacronymmafMAFMasked Autoregressive Flow \newacronymiafIAFInverse Autoregressive Flow

## I Introduction

To have fully autonomous robots in real-world scenarios, robots need to be able to perform a wide variety of complex tasks and skills. However, programming a robot to perform complex motion is often a time-consuming task, which requires both expertise and domain knowledge. Imitation Learning tackles this problem and provides robotics non-expert users the capability to teach robots complex trajectories providing a few demonstrations [26, 1].

Given a set of demonstrations, the objective of Imitation learning is to find a generative model that produces trajectories similar to the demonstrated ones. A correct selection of the model will lead to a successful and safe imitation. These generative models are called Movement Primitives. Movement primitives can be divided into three different categories: time-dependant, state-dependant, and time-state dependant movement primitives.

In this work, we focus on state dependant motion primitives as they are robust to both spatial and temporal perturbation. This class of motion primitives is complementary to the time-dependent motion primitives approaches, such as the \glsdmp [27] and the \glspromp [18] approaches, that are particularly well suited when the movement presents a clear temporal (or phase) dependency, while robustness to perturbation is not a major concern.

One of the main issues of state-dependent movement primitives is to learn stable dynamics: While several models exist [25, 3, 4]; most of them are not good ensuring stable dynamics out of the region of the demonstrated trajectories. This issue limits the applications of these methods to real-world scenarios, as it can be difficult to ensure the correct execution of the learned motion. This issue has been faced by the \glsseds algorithm [6]. In this model, the global asymptotically stability is ensured by a quadratic Lyapunov function. Due to the proposed Lyapunov function, the learned dynamics are restricted to continuously decreasing distance towards the attractor. In order to overcome the limitation of \glsseds, different approaches were proposed [13, 7, 22, 16, 19, 21, 28, 10]. However, except for [29], all these models consider deterministic models. Also, all these works consider only point-to-point dynamics.

Contributions In this paper, we present ImitationFlow, a novel approach for learning stable stochastic nonlinear dynamical systems. Our methodology merges Deep generative models like Normalizing Flows [24, 2, 17] with stable dynamical systems modeling.

Our approach not only is capable of representing a wider class of dynamical systems w.r.t. previous works in the field, but it can also represent arbitrary complex densities, removing the Gaussian noise assumption of the demonstrated trajectories. The proposed model can describe both strike-based and periodic movements in a single framework, without changing the core learning algorithm.

We also formally prove that ImitationFlow can learn globally asymptotically stable stochastic dynamics. This property ensures global convergence to the goal for point-to-point motions, ensuring that the learned movement can be applied from any starting point of the workspace. We have released a Pytorch Implementation at https://github.com/TheCamusean/iflow

## Ii Background & Notation

### Ii-a Problem Statement

Let be a set of expert demonstrated trajectories, where each trajectory of length is a sequence of observations . For clarity of presentation, we assume that each element of the trajectory is generated with a fixed sampling time , but our derivations can be extended to the variable sampling time scenario.

Let be the distribution of the trajectory starting point. We assume that each trajectory element is generated by the following \glssde:

(1) |

where is the state, is a continuous function, is a continuous matrix function, and is a dimensional Brownian motion (also called Wiener process).

We remark that (1) is written in autonomous form. The general case, in which and are functions of both the state and time can be easily recovered by defining , .

Our problem is to maximize the likelihood of the unknown model parameters w.r.t. the set of observed trajectories . We frame our learning problem as the following optimization problem

(2) |

where and is defined as

### Ii-B Stability for Stochastic Dynamical systems

Lyapunov stability has various definitions in the case of stochastic dynamical systems. One option is to study the stability in probability, in which the stability is studied replacing by its expected value [14].

By Itô’s formula, the time derivative of the Lyapunov function, , is expressed by the following differential equation

(3) |

where , is known as the diffusion equation. The expected value of is

(4) |

Assume there exist a , , and strictly increasing functions such that

(5a) | ||||

(5b) |

Then, the trivial solution for our \glssde is stochastically asymptotically stable [14].

### Ii-C Normalizing Flows

Normalizing Flows provide a method for expressive density approximation. [24]. Requiring only a base distribution, usually uniform or normal distribution, and a set of bijective transformations, Normalizing Flows allows explicit density estimation and, in most cases, to sample from these complex distributions [17, 2].

Given a latent variable , sampled from a certain distribution and a diffeomorphic transformation , such that , we can compute the distribution of , , in terms of , by the change of variable rule.

(6) |

## Iii Proposed Method

ImitationFlow model extends the concept of Structured Inference Networks for Nonlinear State Space Models [11] to Normalizing Flows. Considering Normalizing Flows as emission function allows exact inference at the cost of imposing a deterministic emission.

The proposed model’s architecture is presented in Fig. 2, and the dynamics are modelled using (7). Our model is composed of two main components. In the latent space , the transition model follows some stable stochastic dynamics parameterized by . Then, we use, as emission function, a bijective, continuous and differentiable transformation transforming the data from the latent space to the observation space .

(7) |

where and are the learnable parameters of the model. Given that the Jacobian of , , is easy to compute, we can reframe (7) to compute the stochastic dynamic model for

(8) |

### Iii-a Learning Algorithm

In this section, we describe how we solve the optimization problem described in (2). In order to estimate correctly the likelihood of the trajectories, we should estimate the probability density of initial states . However, if a few trajectories are available, this estimation can be problematic. We leverage on the fact that the system represents stable dynamics: this means that we will have a stationary distribution in the limit

We exploit this property by assuming that the distribution of the last point of the trajectory is the stationary one. Differently from most of others probabilistic estimation algorithms, where the structural density is considered with forward conditioning, , in our approach we consider a backward conditioning with . It is straightforward to prove that this view is equivalent under Bayes’s rule.

Given the model proposed in (7), we can rewrite the probability distributions in terms of . By applying the change of variable rule we obtain

(9) |

Due to the fact that and are the same event, it follows that . Applying again the change of variable rule we obtain

(10) |

To optimize (10) we first select a simple stochastic dynamical system for the latent dynamics. We choose a parameterization such that the asymptotic behavior of the system e.g., stable equilibrium point or limit cycle, is enforced. Then we select a suitable class of normalizing flows as emission function . In this work, we will use the Euler-Maruyama [20] integration scheme to integrate the \glssde that describes the dynamics of the latent space.

The resulting algorithm is summarized in Alg. 1. On each iteration, a random trajectory and a sampling time are selected. By the inverse of the bijective transformation the latent trajectory is computed. The determinant of the inverse jacobian is also computed as we require it for computing the probability in (9) and (10). The latent trajectory is split to be able to compute the conditional probability and the probability in the end .

### Iii-B Latent Stochastic Linear Dynamics

A simple stochastic stable dynamic system is the stochastic Linear dynamics

(11) |

where and are the learnable parameters of the latent dynamics. We impose stability by constraining the eigenvalues real part to be smaller than zero, .

Besides, we require to compute the stationary distribution of . To simplify the problem, we consider the stationary distribution of the discretized system. Let and , then, we know that

where can be computed in closed form in the vectorized form as

(12) |

To learn the model it is also required to compute the backward conditional probability . Linear stochastic dynamics are invertible. Therefore, we can compute the conditional backward probability, which is normally distributed

### Iii-C Stability analysis for latent stable dynamics

In the following, we prove the asymptotic stability in probability of the learned system, under the assumption of a stable latent dynamic.

###### Lemma 1.

For any diffeomorphic transformation , if the dynamics of are stochastically asymptotically stable, then the dynamics of are also stochastically asymptotically stable.

###### Proof.

We follow a derivation similar to the one proposed in [16] for Lyapunov stability analysis in \glsode. Let be a Lyapunov function for the latent dynamics. We define the following Lyapunov candidate for the observed dynamics:

From this definition it follows that

(13) |

From the equality in (13) and by the fact that U is a valid Lyapunov candidate we get that the condition in (5a) is also satisfied.

To satisfy the condition in (5b), we compute the diffusion (3) of the Lyapunov function . Considering (3) and (8)

(14) |

Moreover, we can rewrite , and in terms of and .

(15) |

as we don’t have explicit time dependency on .

(16) |

where . Finally, we can rewrite

(17) |

Introducing (15), (16) and (III-C) in the diffusion equation (14)

(18) |

By hypothesis satisfies the condition in (5b), therefore the condition is also satisfied by . ∎

### Iii-D Latent Stochastic Limit Cycles

In two dimensional space, attractive limit cycles can be represented by Linear Dynamics in polar coordinates , and then apply the change of Variable rule to move to cartesian coordinates. Given , the radius and , the angle as the latent state, the dynamics can be represented by stochastic linear dynamics

(19) | ||||

(20) |

where are learnable linear dynamics parameters and are learnable standard deviation parameters. The state is transformed from polar coordinates to cartesian coordinates and back with a diffeomorphic transformation, so it allows us to apply the change of variable rule

Similarly to section III-B, in order to compute the loss, we need the stationary distribution . Applying the change of Variable rule, we obtain

where the probability is split between the determinant of the Jacobian for the polar coordinates and the probability of the polar coordinates in . The determinant of the Jacobian is

(21) |

The density of the stationary distribution is the product of two separate distribution families. While, evolves with a linear dynamic towards , leading to a normal distribution, the stationary distribution for depends on the initial density, . Unfortunately, in full generality, the density at the limit is not uniquely given, as it depends on the initial density distribution. For instance, if we assume that the initial phase distribution is uniform, the stationary density will be

where is computed the same way we did in (12). We compute the conditional probability as in the previous section.

For higher dimensions, we propose to maintain the limit cycle in a two-dimensional plane and add further linear attractor dynamics towards zero for the extra coordinates.

## Iv Experimental Evaluation

In this section, we evaluate the learning quality of our model in different datasets. We compare ImitationFlow with \glsseds and \glsclfdm in the LASA dataset [6]. Then, we show that our model is capable of representing limit cycle dynamics, for both generation and discrimination of trajectories. Finally, we show that our algorithm scales to real-world problems by learning drums playing motion in a real robot.

(a) Error SEDS | (b) Frechet Distance | (c) Swept Area | (d) DTW distance |

### Iv-a Point-to-Point trajectories

LASA dataset is a set of two-dimensional point-to-point trajectories with different shapes. This dataset is widely used for testing stable dynamics models. We learn the dynamics and we compare our results with baseline methods such as \glsseds and \glsclfdm in 4 metrics: \glsseds error [6], Swept Area Error [7], Frechet distance and DTW error [5].

As we deal with a small set of demonstrations, the mean velocity of the demonstrated trajectories is set as the initial velocity of the latent linear dynamics. With random initialization parameters, while the model can still match the trajectory closely, it struggles to learn the proper magnitudes of velocities. For these experiments, we model the emission function as a concatenation of 10 coupling layers [2] and 10 orthogonal transformations [9].

To compute the metrics, we generate the expected trajectory from the same starting point of demonstrations and we compute the similarity w.r.t the demonstrations. We compare the obtained similarity measurements with the ones obtained for \glsseds and \glsclfdm.

Fig. 3 shows that ImitationFlow outperforms the other algorithms in all the metrics, providing more accurate results. Moreover, we can see that our model is more robust than \glsseds and \glsclfdm: it is clear that ImitationFlow is the less variant and produces fewer outliers.

The generated trajectories are shown in Fig. 4. ImitationFlow can learn stochastic dynamic systems that follow the trajectories provided by the user while maintaining stability. The vector field in Fig. 4 has been computed with the expected value and we can observe the variance in our model in the light blue trajectories.

### Iv-B Limit Cycle behaviours

We can model a complex limit cycle in the observation space by switching the latent linear dynamic model with a simple limit cycle behavior. To prove this, we have recorded two-dimensional handwritten data, representing different letters. The recorded demonstrations maintain similar shape, orientation, and velocity in the drawings.

Similarly to the previous section, proper initialization of the latent dynamics increases the performance of our model. We apply \glspca over the trajectories and we extract the main frequency of the cycle through Fourier transform. This frequency is set as the initial angular velocity. For the nonlinear emission, we considered a concatenation of ten Coupling layers with ten orthogonal transformations. The dynamics generated by ImitationFlows are shown in Fig. 5. The model can perfectly represent the complex demonstrated dynamics.

### Iv-C Classification for IROS letters

Normalizing Flows compute the exact distribution of the data. This property distinguishes them from other density approximators [8] that lack the normalization term. Due to this fact, we can compare the obtained probabilities and use them for classification.

We show this by applying the previously learned letter models for the limit cycles as classification modules. A simple classifier is built by computing the probability of the given trajectory and selecting the class with the highest probability

(22) |

We have recorded ten new handwritten trajectories for each class, and we use them for testing. As shown in Fig. 7, the classifier gets a high rate of correct classification for all the shapes, with a small prediction error for R and S. The classifier is not considering any scaling or rotating term, which could improve our results. Failure mostly occurs when there is a scale or velocity discrepancy between training and testing trajectories.

### Iv-D Learning drums playing on a robot

We study the applicability of ImitationFlow in a real robotics application, with higher dimensionality(Cartesian Position with fixed orientation). We have recorded trajectories while doing kinesthetic teaching with a DLR-KUKA lightweight robot in drums playing task, presented in Fig. 1.

To deal with a higher dimensional problem, we switch the Coupling layer [2] with a \glsmaf [17]. The orthogonal transformation remains on each output of the \glsmaf layer. We initialize the model in a similar way to the previous section. We apply a Fourier transform in the first dimension of \glspca space and the main frequency is set as the initial angular velocity.

We show in Fig. 7, a generated trajectory compared with the demonstrated data in cartesian space. We consider a starting configuration that is far from the limit cycle and generate a full trajectory offline. We show that the model can generate a stable trajectory towards the attractor.

## V Related Work

Learning (globally) stable dynamical systems is a crucial topic in the \glslfd community, as these kinds of systems provide motion generation controllers that are robust to perturbations and can generalize the motion in regions of the state space where no demonstration is available. Previous works on this topic can be divided into two different categories.

The first set of approaches is based on Lyapunov stability analysis. The common idea behind these methods is to find a candidate function from a family of Lyapunov functions. One of the first examples of these approaches is the \glsseds algorithm [6], that can learn globally stable dynamics towards a goal position. The candidate function is selected from the family of quadratic Lyapunov functions, while the dynamical system is learned using a finite mixture of Gaussian functions. The optimal solution is obtained by maximizing the log-likelihood of the demonstration under the stability constraint using \glssqp. The main limitation of \glsseds is that the dynamics are restricted to contractive ones due to the limitation imposed by the quadratic Lyapunov function family. To overcome this issue, and improve the class of representable movements, the \glsclfdm approach [7] has been proposed. To do so, the authors propose a richer class of Lyapunov functions, the \glswsaqf. The method uses a \glsclf to find stable dynamics. Dynamics are modeled using an unstable dynamical system that is stabilized using an additional control signal. While \glsclfdm is more expressive than the \glsseds algorithm, the learning process can be problematic [16]. An alternative solution to the lack of representation power of \glsseds is the \glsseds algorithm [16]. Differently from the \glsclfdm approach, this algorithm adds a diffeomorphic transformation on top of the stable dynamics learned by \glsseds. However, an application-dependent diffeomorphic transformation must be provided by the designer. Similar in spirit to our work is [19], which also used a diffeomorphic transformation to inherit the stability properties. While in their case, a kernel-based local translation is used, in our case, we use invertible networks as bijective transformation. In [29], the learned stable dynamics were also stochastic. Anyway, the distribution family of the model was imposed and not learned from the data. The Lyapunov stability principle has been studied also in combination with neural networks. In [15], the authors learn stable dynamics and the Lyapunov candidate from data using a neural network. While they can find accurate results, stability is not guaranteed globally.

More recently in [10] also deep models have been proposed for learning stable dynamic systems. In their paper, a deep neural network is proposed that learns the Lyapunov candidate and the system dynamics jointly by using an \glsicnn.

Recently, another set of approaches has been developed, building on the contraction analysis to find a globally stable solution. In [21], the authors propose a method that uses the same dynamic model as \glsseds, but they exploit contraction analysis, instead of Lyapunov functions, to enforce global asymptotical stability. In [28], a contractive vector field is learned by \glsrkhs. This approach can learn a set of equilibrium points and constraint the local curvature using convex optimization.

There have been a few attempts of extending the Normalizing flows to time series [12, 23]. While these approaches benefit from the flexibility of normalizing flows and present remarkable results, they do not have any theoretical guarantees on the stability of the learned dynamical system.

To the best of our knowledge, our method is the only approach that has global stability guarantees for any distribution \glssde, while being able to learn more complex attractors, such as limit cycles.

## Vi Discussion and Future Work

In this paper, we presented a Deep Generative Model that extends the Normalizing Flows for learning stable \glssdes. We have demonstrated the Lyapunov stability for a class of \glssdes and propose a learning algorithm to learn them from a set of demonstrated trajectories.

We show that our model outperforms state-of-the-art algorithms for stable dynamics learning. Moreover, we have shown that our model is not limited to point-to-point dynamics, and we show how it can be extended to complex limit cycle behaviors by plugging a different latent dynamic. The density estimation property of our model has been applied in a classification task, and we show that our model can be extended to higher dimensions to solve a real robot task.

In future works, we will include conditioning to our model, to adapt the dynamics to different environments. Furthermore, we will improve the generalization capabilities of our model in areas without demonstration, to obtain smoother trajectories towards the stable attractor.

### References

- (1997) Locally weighted learning. In Lazy learning, pp. 11–73. Cited by: §I.
- (2017) Density estimation using real nvp. In International Conference in Learning Representations, Cited by: §I, §II-C, §IV-A, §IV-D.
- (2008) Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics 24 (6), pp. 1463–1467. Cited by: §I.
- (2019) Kernelized movement primitives. The International Journal of Robotics Research 38 (7), pp. 833–852. Cited by: §I.
- (2019-05) Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. Cited by: §IV-A.
- (2011) Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Transactions on Robotics 27 (5), pp. 943–957. Cited by: §I, §IV-A, §IV, §V.
- (2014) Learning control lyapunov function to ensure stability of dynamical system-based robot reaching motions. Robotics and Autonomous Systems 62 (6), pp. 752–765. Cited by: §I, §IV-A, §V.
- (2014) Auto-encoding variational bayes. In Conference on Learning Representations, Cited by: §IV-C.
- (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224. Cited by: §IV-A.
- (2019) Learning stable deep dynamics models. In Advances in Neural Information Processing Systems 32, pp. 11126–11134. Cited by: §I, §V.
- (2017) Structured inference networks for nonlinear state space models. In AAAI conference on artificial intelligence, Cited by: §III.
- (2020) VideoFlow: a conditional flow-based model for stochastic video generation. In International Conference on Learning Representations, Cited by: §V.
- (2013) Neurally imprinted stable vector fields. In European Symposium on Artificial Neural Networks, Cited by: §I.
- (2007) Stochastic differential equations and applications. Elsevier. Cited by: §II-B, §II-B.
- (2013) Neural learning of stable dynamical systems based on data-driven lyapunov candidates. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1216–1222. Cited by: §V.
- (2015) Learning robot motions with stable dynamical systems under diffeomorphic transformations. Robotics and Autonomous Systems 70, pp. 1–15. Cited by: §I, §III-C, §V.
- (2017) Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems, pp. 2338–2347. Cited by: §I, §II-C, §IV-D.
- (2013) Probabilistic movement primitives. In Advances in Neural Information Processing Systems, pp. 2616–2624. Cited by: §I.
- (2016) Fast diffeomorphic matching to learn globally asymptotically stable nonlinear dynamical systems. Systems & Control Letters 96, pp. 51–59. Cited by: §I, §V.
- (2010) Numerical solution of stochastic differential equations with jumps in finance. Vol. 64, Springer Science & Business Media. Cited by: §III-A.
- (2017) Learning partially contracting dynamical systems from demonstrations.. In Conference on Robot Learning, pp. 369–378. Cited by: §I, §V.
- (2015) Learning contracting nonlinear dynamics from human demonstration for robot motion planning. In ASME, Dynamic Systems and Control Conference, Cited by: §I.
- (2019) Filtering normalizing flows. In Bayesian Deep Learning Workshop at NeurIPS, Cited by: §V.
- (2015) Variational inference with normalizing flows. In International Conference on Machine Learning, Vol. 37, pp. 1530–1538. Cited by: §I, §II-C.
- (2002) Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence 17 (1), pp. 49–60. Cited by: §I.
- (1997) Learning from demonstration. In Advances in Neural Information Processing Systems, pp. 1040–1046. Cited by: §I.
- (2006) Dynamic movement primitives-a framework for motor control in humans and humanoid robotics. In Adaptive motion of animals and machines, pp. 261–280. Cited by: §I.
- (2018) Learning contracting vector fields for stable imitation learning. arXiv preprint arXiv:1804.04878. Cited by: §I, §V.
- (2017) Learning stable stochastic nonlinear dynamical systems. In International Conference on Machine Learning, pp. 3502–3510. Cited by: §I, §V.