Geodesic Density Tracking with Applications to Data Driven Modeling
Abstract
Many problems in dynamic data driven modeling deals with distributed rather than lumped observations. In this paper, we show that the MongeKantorovich optimal transport theory provides a unifying framework to tackle such problems in the systemscontrol parlance. Specifically, given distributional measurements at arbitrary instances of measurement availability, we show how to derive dynamical systems that interpolate the observed distributions along the geodesics. We demonstrate the framework in the context of three specific problems: (i) finding a feedback control to track observed ensembles over finitehorizon, (ii) finding a model whose prediction matches the observed distributional data, and (iii) refining a baseline model that results a distributionlevel predictionobservation mismatch. We emphasize how the three problems can be posed as variants of the optimal transport problem, but lead to different types of numerical methods depending on the problem context. Several examples are given to elucidate the ideas.
1 Introduction
In traditional systems theory, modeling and control synthesis assumes the availability of measurements in the form of vector signals or trajectories observed over time. However, in many applications, observations are not lumped variables, rather they are distributed over spatial dimensions. For example, in Nuclear Magnetic Resonance (NMR) spectroscopy and Imaging (MRI) applications, the measurement variable is magnetization distribution [1], since sensing individual magnetization states of the order of Avogadro number , remains a technological limitation. On the other hand, in process industry applications like papermaking [2, 3], measurement and control of distributions are motivated by the design choice of tracking desired fibre length and filler size distributions. Similar examples can be found in biological systems [4].
Another motivation to consider modeling, identification and control problems in the distributional setting, comes from the recent proliferation of cyberphysical systems. The tight integration of control, communication and computation has resulted information deluge, popularly termed as “big data” [5]. With the abundance of data, it becomes imperative to seek constructive algorithms that can lead to better phenomenological models or better controllers, specially in the presence of parametric uncertainties and lack of understanding of the firstprinciple physics. The objective of this paper is to introduce a framework, grounded on the theory of optimal transport [6], to generate models that track distributional observations over finite horizon.
1.1 Nomenclature and the Interpretation of Probability
We use the terms “distribution” and “ensemble” interchangeably with the common meaning of the availability of a multitude of real measurements at a fixed time. Also, we assume that the underlying true dynamics generating the measurements is smooth enough [7] so that the realizations are absolutely continuous, and hence we can talk about “densities” in lieu of “distributions”. The optimal transport framework described in this paper will work even if this assumption is violated [6], namely if the distributions exist but the densities don’t.
Notice that the observations are in distributional or density level, does not necessarily imply that the underlying state dynamics is governed by a partial differential equation (PDE) or stochastic differential equation (SDE). Indeed, the density may arise from the parametric and initial condition uncertainties, although the underlying state dynamics may be deterministic, governed by an (unknown) ordinary differential equation (ODE). In this sense, the term “density” refers to the concentration of trajectories, and since the state space mass is preserved under the action of the flow, one can interpret [8] the trajectory concentration as probability density in the sense of propensity [9]. On the other hand, it may indeed be the case that the underlying true dynamics generating the observations, is governed by a SDE, naturally giving rise to probability densities, even if the initial conditions and parameters are fixed. Also, in cases where the observable is naturally distributed over spatial dimensions (e.g. a color image), then one can suitably normalize the data to enable our densitybased optimal transport framework. Hence, without loss of generality, we term the observations as probability density functions (PDFs).
1.2 Related Work
The idea of PDF control is not new in the control literature. Previous works like [10, 11, 12] dealt with asymptotic density shaping, meaning the feedback controls were found so that the desired PDF coincides with the stationary PDF of the closedloop dynamics. Similar ideas predate in terms of covariance control [13]. The framework presented in this paper differs from the existing literature in that we track the distributions observed over finite horizons, and the time instances of distributional measurement availability need not be equispaced. The question then becomes: “over any given horizon, what needs to be done at the realization level trajectory dynamics, such that we track the observations at the ensemble level in some optimal sense?” The optimal transport theory allows us to achieve finitetime distributional tracking while guaranteeing that minimum amount of work is done over each horizon.
1.3 Notations
Most notations are standard. The symbol denotes pushforward of a probability measure. and refer to the kernel and image of a linear operator, respectively. The symbol “Id” denotes identity vector map of appropriate dimension. The notation means that the random vector has the joint PDF . Furthermore, denotes the gradient operator with respect to spatial variables, and refers to determinant of a matrix. The symbol stands for the Hessian. Unless otherwise specified, the superscript refers to optimality, while the superscript refers to the MoorePenrose pseudoinverse of a matrix. The notation denotes the identity matrix, denotes Gaussian PDF with mean and covariance , denotes uniform PDF.
1.4 Problem Formulation
We consider a sequence of time instances , when the measurement vector is recorded as a sequence of PDFs . If we introduce , then we have . The general problem statement can now be stated as follows (see Fig. 1).
Problem 1
For each interval , find a dynamical system that minimizes the total transportation cost
(1) 
over all transportation policy , such that , , .
To understand the problem, let , which is nothing but the differential mass over the product space . If the cost per unit mass equals squared Euclidean distance , then (1) denotes the total cost to transport the density to , while preserving mass (since both and are PDFs). Notice that the total cost depends on the choice of the PDF supported over , that dictates the transportation policy, and hence . However, finding the joint PDF with given marginals and , is not unique. Thus, we seek to find that transportation policy or joint PDF , which is the minimizer of the total transportation cost (1).
Notice that Problem 1 is rather generic in the sense, it does not impose any structural constraint on the dynamical system to be determined. In the context of dynamic data driven modeling, we are interested in three variants of Problem 1, stated next.
Problem 1.1 (Finite Horizon Feedback Control of Output PDFs)
Solve Problem 1 with prespecified control structure on (e.g. affine or nonaffine, linear or nonlinear) by finding the state or output feedback control .
Remark 1
Notice that Problem 1.1 is more specific than Problem 1 in the sense that although could be found by solving Problem 1, additional conditions may be necessary for feedback control to exist that satisfies the desired control structure.
Remark 2
The order of modeling/identification is same as the dimension () of the output vector. This is helpful in practice since is usually much less than the dimension of the true state space. In other words, the model (ODE or map) will be over the outputs, thus naturally resulting reduced order models.
Remark 3
In Problem 1.3, refers to a synthetic notion of time. In the context of refinement problem, the physical time stays zero order hold for each timehorizon in Fig. 1.
1.5 Organization of the Paper
This paper is structured as follows. In Section II, the necessary background on optimal transport theory is given. In particular, we connect different formulations of optimal transport with the different dynamic data driven modeling problems described in the previous subsection, and show how they lead to different types of numerical solutions. Next, using the ideas from Section II, we discuss Problem 1.1, 1.2 and 1.3 in Sections III, IV and V, respectively. To illustrate the solution methodology, analytical and numerical results are provided for example problems. Section VI concludes the paper.
2 Background on Optimal Transport
2.1 Primal Formulation
The optimal transport theory originated in 1781 when Gaspard Monge considered [14] the problem of moving a pile of soil from an excavation to another site that entails minimum work. This idea went mostly unnoticed for 160 years until Leonid Kantorovich provided a modern treatment [15] of this subject in 1942 (the English translation [16] appeared in 1958), which eventually led to the Nobel prize in economics in 1975. In the theory of MongeKantorovich optimal transport, one defines a distance, called Wasserstein distance, between two given PDFs and , that measures the shape difference between them.
Definition 1
(Wasserstein distance) The Wasserstein distance of order 2 (henceforth referred simply as Wasserstein distance ), between two dimensional random vectors , and , is defined as
(2) 
where the is taken with respect to the joint PDF that makes the cost function achieve the infimum. The symbol denotes the set of all joint PDFs supported over , having finite second moments, whose first marginal is , and second marginal is .
Remark 4
It can be shown [17] that defines a metric on the manifold of PDFs, and remains well defined between the distributions even though the random vectors and are not absolutely continuous (i.e. and don’t exist).
Remark 5
Remark 6
Remark 7
The infinite dimensional LP (2) can be solved by directly discretizing the problem in terms of the samples of the constituent PDFs (see [18, 19]). As shown in the first row of Table 1, this results a large scale finite dimensional LP, whose solution provides a consistent approximation [6] of the true solution (of the infinite dimensional LP).
2.2 Variational Formulation for Optimal Transport Map
Instead of solving (2), one could directly solve for the the optimal transport map , that satisfies , by solving
(4) 
Remark 8
Since there are infinite ways to morph to , (4) looks for an optimal pushforward map that would require minimum amount of transport effort among all possible pushforward maps . Then the map characterizes the optimal transport of Problem 1.
Remark 9
In a seminal paper [20], Brenier proved the existence and uniqueness of . Further, his polar factorization theorem [20] proved that the unique vector function can be written as a gradient of a scalar function, i.e. . Furthermore, the scalar function is convex. The optimal transport map is also known as the Brenier map.
Remark 10
Although the cost function in (4) is quadratic in , the pushforward constraint is nonlinear and nonconvex in . Thus, a direct numerical optimization is not straightforward. As shown in the second row of Table 1, [21] used the fact that is curlfree, to formulate a regularized sequential quadratic program (SQP) to solve (4) as
(5) 
where , and is a regularization parameter. The constraint .
2.3 PDE Formulation for Optimal Transport Map
From Remark 9, we can substitute in the pushforward constraint . Then it follows that must solve
(6) 
This is a second order, nonlinear, stationary, elliptic PDE, known as the MongeAmpère equation, to be solved for as a function of . In principle, if we can solve (6), then would solve Problem 1. However, as mentioned in the third row of Table 1, numerically solving the PDE (6) remains a research challenge.
2.4 BenamouBrenier Spacetime Variational Formulation
Benamou and Brenier proposed [23] a dynamic reformulation of the static optimization problem (4) by introducing a synthetic notion of time, which we denote as . Their main result is that the spatial optimization problem (4), is equivalent to solving the following spacetime optimization problem:
(7)  
(8) 
Remark 11
It is important to understand the meaning of solving the optimization problem (7)(8). Notice that the spatial and temporal integrals in the cost function can be interchanged. Thus, if we fix , then the cost is the instantaneous kinetic energy of the ensemble during transport, where each sample moves according to the deterministic ODE , corresponding to the Liouville PDE [24] , appearing in the constraint. Hence, the cost function in (7) is equal to the total kinetic energy up to time . Consequently, equals total work done during the transport process. The optimization is over a pair of vector field and joint PDF , and is convex in both.
Remark 12
It can be shown [6] that the minimizing vector field in the above optimization problem, is a pressureless potential flow. In other words, , where the scalar function solves the HamiltonJacobi equation
(9) 
Remark 13
In p. 384 of [23], using Legendre transform, (7)(8) was further converted to a saddle point optimization problem, which was numerically solved using the augmented Lagrangian technique [25]. Recently, an improved numerical method to solve (7)(8) has been proposed [26] via proximal operator splitting. We will use this technique in Section IV for numerical simulations.
Mathematical formulation  Problem type  Numerical method  Solver 

Primal formulation in  Infinite dimensional LP (2)  Direct discretization  Large scale LP solver 
(e.g. MOSEK^{®} used in [18, 19])  
Variational Formulation for  Quadratic cost with nonlinear  Regularized SQP  “Discretizethenoptimize” 
optimal transport map  nonconvex constraints (4)  (optimality not guaranteed)  solver in [21] 
PDE Formulation for  MongeAmpère PDE (6)  Not well studied  See review article [22] 
optimal transport map  (Second order nonlinear elliptic PDE)  for current research status  
BrenierBenamou spacetime  Nonsmooth convex  Proximal operator splitting  Staggered grid 
variational formulation in  optimization problem (7)(8)  discretization in [26] 
2.5 Wasserstein Geodesics on the Manifold of PDFs
One merit of the BenamouBrenier approach described in Section II.D is that it constructs the transportation path, which is a geodesic connecting the source and target PDFs, and yields the intermediate PDFs satisfying McCann’s displacement interpolation [27]. In particular, the following results [6] hold.

Without loss of generality, let the synthetic time , i.e. in the notation of Section II.D, set . Then the BenamouBrenier vector field constructs the geodesic curve between . Recall that the PDF is the source PDF, and is the target PDF. In other words, has the variational characterization
(10) and it lies on the geodesic curve connecting and . The Wasserstein distance is the length of this geodesic curve on the manifold of PDFs.

As a corollary of the above result, the intermediate optimal transport map that satisfies , is obtained via linear interpolation between the identity map Id and , i.e.
(11) Also, the intermediate Wasserstein distance is obtained via linear interpolation:
(12) (13) However, the intermediate PDF is obtained via nonlinear (displacement) interpolation:
(14) (15)
3 Feedback Control for FiniteHorizon Density Tracking
In this Section, we consider Problem 1.1 under prespecified control structures. For brevity, we only consider the case when has discretetime LTI state dynamics, and the state PDFs are prescribed Gaussians.
3.1 Linear Gaussian PDF Control in Discrete Time
Consider a linear system , , with a sequence of Gaussian PDFs , . The objective is to find state feedback over each time interval , such that , while guaranteeing minimal transportation cost (1). Using the idea of Section II.B, we transcribe the problem of finding optimal control to that of finding the optimal transport map (a.k.a. Brenier map) , where
(16) 
subject to the constraints (C1) , (C2) , and (C3) . Then we have the following result.
Theorem 1
Consider the discretetime Gaussian PDF control problem under LTI structure, i.e. in Fig. 1, let , where . Further, let be given by the discretetime LTI structure: , . Then the state feedback that minimizes the transportation cost (1), has the following properties.

The optimal state feedback, if exists, must be affine.

Optimal state feedback exists iff , where
(17) (18) 
If exists, then the optimal state feedback is given by the pair , i.e. , where , and , for arbitrary real matrixvector pair of appropriate dimensions.

If is full rank, then the optimal state feedback is unique, and is given by , .
Given, , we know [28, 29] that satisfying
(19) 
is a unique affine transformation . Since the optimal transport is , the optimal controller, if exists, must be affine, i.e. of the form , where and solve the linear matrix equations
(20) 
Now, from Lemma 2.4 in [30], there exists solving the equation iff
(21) 
On the other hand, the matrixvector equation admits solution iff
(22) 
When , then the (nonunique) solution is given by [30]: , and , for arbitrary real matrixvector pair of appropriate dimensions. If is full rank, then exists and , resulting the unique solution.
Remark 14
Notice that till now, we assumed is given by the same LTI pair . It is easy to see that the above Theorem generalizes when the LTI pair are different for different horizons.
3.2 How is this Different from Ensemble Control
In recent years, a set of tools have been developed [31, 32, 1] for finitehorizon distribution shaping using openloop control signal, under the constraint that the same openloop excitation is applied to all realizations of the system dynamics (characterized by parametric dispersions). Our approach of solving Problem 1.1 is different from this paradigm, termed as “ensemble control”, in at least two ways. First, the ensemble control seeks an openloop solution while ours is closedloop solution. Second, the ensemble controllability needs to be established to guarantee the existence of such open loop excitation, however, if found, will allow sensorless ensemble shaping. On contrary, our feedback implementation requires sensing at the ensemble level, but guarantees geodesic transport over each horizon.
4 Data Driven ^{th} Order Modeling
In this Section, we consider Problem 1.2, namely interpolating observed distributional data by identifying dynamical models over each finite horizon, in the absence of a priori structural knowledge (unlike Problem 1.1) about the models. The only choice the modeler can make is to decide whether a discretetime or continuoustime model is apt. Once this choice is made, a deterministic trajectorylevel model is desired that satisfies the two point boundary value problem in the output PDF level, at the beginning and end of the horizon length. Notice that we restrict ourselves to derive a deterministic flow or map, even though the observed PDFs may have been generated by a true but unknown state dynamics governed by PDE or SDE. In this sense, the modeling problem can be thought of as a sequence of finitehorizon distributional realization problems.
To illustrate how optimal transport ideas can befit here, we work out an example problem. Since the discretetime Gaussian modeling problem can be dealt similar to Section III.A, we choose a continuoustime nonGaussian scenario.
Example 1
Consider the case when the true dynamics is given by the Duffing oscillator
(23) 
where , , . One can verify that for these values of the parameters , the dynamics (23) has three equilibria: , . Linear stability analysis tells that the origin is a saddle node while the remaining two equilibria are stable foci. We use (23) only to generate synthetic data and assume that the knowledge of this true vector field is unavailable to the modeler.
To generate the true distributional PDFs, we assume that the initial joint state PDF . We generate 500 samples from this uniform PDF, and evaluate them at . Starting from these samples, we evolve the joint state PDF subject to (23) by solving the Liouville PDE , where is the Duffing vector field. We perform this uncertainty propagation by solving the methodofcharacteristics ODE corresponding to the Liouville PDE. Details on this methodology can be found in Section II of [24]. This procedure results scattered colored data (Fig. 2) at every time , , where the location of the samples are determined from the dynamics while the color value at a sample location indicates the exact (unlike Monte Carlo histograms) joint PDF value at that sample location, at that time. Since , hence Fig. 2 depicts the sequence , in the nomenclature of Section I.
Let Fig. 2 be the distributional data akin to Fig. 1, observed by the modeler. A continuoustime model is sought over each horizon: . To solve this problem, we employ the BenamouBrenier spacetime optimization formulation described in Section II.D, resulting a vector field per horizon, which solves the two point Liouville boundary value problem (guaranteeing endpoint PDF matches) while incurring minimum amount of work over each . For this purpose, we take the two end point scattered data representation of and , and interpolate the data over a regular grid, followed by DouglasRachford proximal operator splitting algorithm [26] to solve the ensuing nonsmooth convex optimization (7)(8), resulting the vector field . Fig. 3 and 4 show the gridded observed PDFs and the intermediate PDF reconstructions for , and , respectively, superimposed with their respective BenamouBrenier vector fields (black arrows). In Fig. 5, we compare the PDF transportation paths for in , for the true Duffing dynamics (23) and the optimal transport dynamics. In view of Remark 12, this plot shows that unlike the BrenierBenamou gradient vector field, (23) does not result into geodesic PDF transport. This is not surprising, since , i.e. Duffing vector field has nonzero vorticity everywhere except , thus causing a clockwise rotational flow that requires more transportation effort than what could be achieved by a gradient flow.
5 Model Refinement
In this Section, we consider Problem 1.3, namely refining a baseline model against experimental data. We first formulate the model refinement problem as that of finding the optimal transport map introduced in Section II.
5.1 General formulation
We formulate the model refinement problem (Fig. 6) as the natural successor of the distributional model validation formulation proposed in [18, 19]. In the validation problem, the model predicted output PDF is compared with the experimentally observed output PDF , at each instance of measurement availability , , and an inference is made by looking at the predictionobservation gap quantified via . The key insight behind our refinement formulation is that usually there is no specific requirement on the structure of the refined model, as long as we can make the refined dynamics track the observed output PDFs. This provides us the freedom to formulate the model refinement problem over the model’s output map while keeping the model’s state equation intact. This has two implications: (i) the refinement algorithm will involve the output dimension , typically less than the state dimension, and (ii) both state and output modeling errors would be accounted by updating the model’s output map. To make the ideas precise, we give the model refinement problem statement for a model whose output map is given by , where and are modelpredicted state and output vectors.
5.2 Problem Statement: Transcribing Model Refinement Problem as Finding Optimal Transport Map
At , let us introduce , and denote . We want to find the Brenier map for updating the predicted output, i.e. , where and . In other words, find such that . Clearly, this problem is underdetermined since there are many ways to morph to . Then we must look for an optimal pushforward map that would require minimum amount of transport effort among all possible pushforward maps , i.e. we solve (4). Once has been found, the refined model is given by augmenting the model’s state equation with the new output map:
(24) 
Example 2
(Refining Linear Model against Gaussian Measurements) Let the true data being generated by the discretetime LTI system , , that is unknown to the modeler. The proposed model is , , where the SchurCohn stable matrices and are given by
(25) 
and the output matrices are
(26) 
Starting from the initial Gaussian state PDF with , , we refine the model at three instances of measurement availability: , and 3. The results for the model refinement algorithm are shown in Fig. 7. To illustrate how the results of Section II.E are applied in this particular refinement problem, we provide the following Theorem.
Theorem 2
(^{th} synthetic time PDF at ^{th} physical time) Let and consider the above linear Gaussian refinement problem with initial PDF . At the ^{th} instance of measurement availability, the intermediate PDF during refinement is a Gaussian PDF where
(27) 
(28) 
where
(29) 
We know that , and similarly, . On the other hand, we have , and similarly, .
6 Conclusions
In this paper, we argued that many problems in dynamic data driven modeling lead to distributional observation, either in natural stochastic sense, or in the sense of concentration. We showed how the optimal transport theory can offer a disciplined approach to solve such problems like finite horizon feedback control of PDFs, datadriven reducedorder modeling, and refining a baseline model.
References
 J.S. Li, and N. Khaneja, “Ensemble Control of Bloch Equations”, IEEE Transactions on Automatic Control. Vol. 54, No. 3, pp. 528–536, 2009.
 H. Wang, “Robust Control of the Output Probability Density Functions for Multivariable Stochastic Systems with Guaranteed Stability”, IEEE Transactions on Automatic Control. Vol. 44, No. 11, pp. 2103–2107, 1999.
 H. Wang, H. Baki, and P. Kabore, “Control of Bounded Dynamic Stochastic Distributions using Square Root Models: An Applicability Study in Papermaking Systems”. Transactions of the Institute of Measurement and Control. Vol. 23, No. 1, pp. 51–68, 2001.
 E. Brown, J. Moehlis, and P. Holmes, “On the Phase Reduction and Response Dynamics of Neural Oscillator Populations”. Neural Computation. Vol. 16, No. 4, pp. 673–715, 2004.
 T. White, Hadoop: The Definitive Guide, O’Reilly Media, 3^{rd} Edition, 2012.
 C. Villani, Topics in Optimal Transportation. Graduate Studies in Mathematics, First ed., American Mathematical Society; 2003.
 R.J. DiPerna, and PL. Lions, “Ordinary Differential Equations, Transport Theory and Sobolev Spaces”. Inventiones Mathematicae, Vol. 98, No. 3, pp. 511–547, 1989.
 A. Lasota, and M. Mackey, Chaos, Fractals and Noise: Stochastic Aspects of Dynamics, Applied Mathematical Sciences, Vol. 97, SpringerVerlag, New York, 1994.
 J.C. Willems, “Discussion on: “Why Is Resorting to Fate Wise? A Critical Look at Randomized Algorithms in Systems and Control” ”. European Journal on Control, Vol. 16, No. 5, pp. 436–440, 2010.
 M. Forbes, J.F. Forbes, and M. Guay, “Regulatory Control Design for Stochastic Processes: Shaping the Probability Density Function”. Proceedings of the 2003 American Control Conference, Vol. 5, pp. 3998–4003, 2003.
 C.X. Zhu, W.Q. Zhu, and Y.F. Yang, “Design of Feedback Control of A Nonlinear Stochastic System for Targeting A Prespecified Stationary Probability Distribution”. Probabilistic Engineering Mechanics, Vol. 30, pp. 20–26, 2012.
 L. Guo, and H. Wang, “Generalized Discretetime PI Control of Output PDFs Using Square Root Bspline Expansion”. Automatica, Vol. 41, No. 1, pp. 159–162, 2005.
 A.F. Hotz, and R.E. Skelton, “A Covariance Control Theory”. 24^{th} IEEE Conference on Decision and Control, Vol. 24, pp. 552–557, 1985.
 G. Monge, “Mémoire sur la thèorie des déblais et des remblais”. Histoire de l’Académie Royale des Sciences de Paris, pp. 666–704, 1781.
 L.V. Kantorovich, “On the Translocation of Masses”. Comptes Rendus (Doklady) de l’Académie des Sciences de l’URSS, Vol. XXXVII, No. 78, pp. 199–201, 1942.
 L.V. Kantorovich, “On the Translocation of Masses”. Management Science, Vol. 5, No. 1, pp. 1–4, 1958.
 S.T. Rachev, Probability Metrics and the Stability of Stochastic Models. John Wiley, First ed., 1991.
 A. Halder, and R. Bhattacharya, “Model Validation: A Probabilistic Formulation”, IEEE Conference on Decision and Control, Orlando, Florida, 2011.
 A. Halder, and R. Bhattacharya, “Further Results on Probabilistic Model Validation in Wasserstein Metric”, IEEE Conference on Decision and Control, Maui, 2012.
 Y. Brenier, “Polar Factorization and Monotone Rearrangement of Vectorvalued Functions”. Communications on Pure and Applied Mathematics, Vol. 44, No. 4, pp. 375–417, 1991.
 E. Haber, T. Rehman, and A. Tannnenbaum, “An Efficient Numerical Method for the Solution of the Optimal Mass Transfer Problem”. SIAM Journal on Scientific Computing, Vol. 32, No. 1, pp. 197–211, 2010.
 X. Feng, R. Glowinski, and M. Neilan, “Recent Developments in Numerical Methods for Fully Nonlinear Second Order Partial Differential Equations”. SIAM Review, Vol. 55, No. 2, pp. 205–267, 2013.
 J.D. Benamou, and Y. Brenier, “A Computational Fluid Mechanics Solution to the MongeKantorovich Mass Transfer Problem”. Numerische Mathematik, Vol. 84, No. 3, pp. 375–393, 2000.
 A. Halder, and R. Bhattacharya, “Dispersion Analysis in Hypersonic Flight During Planetary Entry Using Stochastic Liouville Equation”. Journal of Guidance, Control and Dynamics, Vol. 34, No. 2, pp. 459–474, 2011.
 M. Fortin, and R. Glowinski, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundaryvalue Problems. Vol. 15, North Holland, 1983.
 N. Papadakis, G. Peyré, and E. Oudet, “Optimal Transport with Proximal Splitting”. arXiv preprint, arXiv:1304.5784, 2013.
 R.J. McCann, “A Convexity Principle for Interacting Gases”. Advances in Mathematics, Vol. 128, No. 1, pp. 153–179, 1997.
 I. Olkin, and F. Pukelsheim, “The Distance Between Two Random Vectors with Given Dispersion Matrices”. Linear Algebra and its Applications, Vol. 48, pp. 257–263, 1982.
 M. Knott, and C.S. Smith, “On the Optimal Mapping of Distributions”. Journal of Optimization Theory and Applications, Vol. 43, No. 1, pp. 39–49, 1984.
 A. Ohara, and T. Kitamori, “Geometric Structures of Stable State Feedback Systems”. IEEE Transactions on Automatic Control, Vol. 38, No. 10, pp. 1579–1583, 1993.
 J.S. Li, Control of Inhomogeneous Ensembles. Ph.D. thesis, Harvard University, 2006.
 J.S. Li and N. Khaneja, “Ensemble Control on Lie Groups”. 7^{th} IFAC Symposium on Nonlinear Control Systems, 2007.