# Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models

###### Abstract

Latent variable models learn a stochastic embedding from a low-dimensional latent space onto a submanifold of the Euclidean input space, on which the data is assumed to lie. Frequently, however, the data objects are known to satisfy constraints or invariances, which are not enforced by traditional latent variable models. As a result, significant probability mass is assigned to points that violate the known constraints. To remedy this, we propose the wrapped Gaussian process latent variable model (WGPLVM). The model allows non-linear, probabilistic inference of a lower-dimensional submanifold where data is assumed to reside, while respecting known constraints or invariances encoded in a Riemannian manifold. We evaluate our model against the Euclidean GPLVM on several datasets and tasks, including encoding, visualization and uncertainty quantification.

Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models

Anton Mallasto Department of Computer Science University of Copenhagen mallasto@di.ku.dk Søren Hauberg Department of Mathematics and Computer Science Technical University of Denmark sohau@dtu.dk Aasa Feragen Department of Computer Science University of Copenhagen aasa@di.ku.dk

noticebox[b]Preprint. Work in progress.\end@float

## 1 Introduction

We contribute a probabilistic submanifold learning algorithm that respects constraints and invariances by taking values on a Riemannian manifold. By forcing the learned submanifold to lie on the manifold, we avoid assigning positive probability mass to points that violate these restrictions. As the algorithm learns submanifolds that respect the known constraints, we get more efficient data encoding, more faithful visualizations, and less conservative uncertainty estimates.

Manifold learning estimates a low-dimensional manifold that captures the trend of given data. Classical algorithms [30, 28, 4] learn a low distortion projection from an assumed submanifold of the original, Euclidean ambient space, onto a low-dimensional Euclidean latent space. Latent variable models (LVMs) [19, 12, 20] learn the reverse latent embedding from the latent space into the ambient space, associating each point in the latent space with an ambient space point. In the well-known Gaussian process latent variable model (GPLVM) [20], the latent embedding is a Gaussian process (GP) over the latent space, and hence learns not only a manifold embedding into , but also a model of its uncertainty. GPLVMs have inspired other LVMs [31, 22, 33], that all rely on Euclidean geometry.

Where do manifolds come from? In many applications, the measured objects satisfy constraints [3, 35, 34, 15, 24, 5] or have invariances [18, 10, 17], which lead to nonlinear data spaces when enforced. Symmetric positive definite matrices appear e.g. as covariance descriptors in economy and computer vision [37, 32] and tensors in diffusion MRI or materials science [3, 8], and directional statistics refers to data constrained to a sphere [34, 24]. Invariances are similarly common; for instance, object shape is invariant to translation, scaling and rotation [18, 10]. In these examples, as we enforce the constraints or factor out the invariants, we force the data to reside on an ambient, usually nonlinear, manifold . While it is possible to work with the original, Euclidean representation of the data, this will typically result in learned manifolds whose points do not satisfy the constraints of the data. This is illustrated in Fig. 1 for a set of femur bone movement representations extracted from motion capture of a walking human [34]. The Euclidean GPLVM learns a submanifold that does not reside on the sphere, corresponding to a femur bone with non-constant length. As we show in our experiments below, the assignment of significant probability mass outside the manifold leads to overestimated uncertainty. In essence, the model has to learn what we already know.

Submanifold learning algorithms, illustrated in Fig. 2, thus aim to infer an embedding map (dashed red) from a latent space to a submanifold (dashed red) of a known ambient manifold , of poins that satisfy the constraints. The map associates the data (dark grey) with latent variables (blue). Principal geodesic analysis (PGA) [9] estimates geodesic submanifolds, principal curves [13] and barycentric subspaces [25] estimate less constrained submanifolds. Probabilistic PGA [38] introduces uncertainty by estimating probabilistic geodesic subspaces.

Contributions. In this work, we introduce the wrapped Gaussian process latent variable model (WGPLVM). WGPLVM generalizes the Euclidean GPLVM model to Riemannian manifolds by employing the wrapped Gaussian processes (WGPs) introduced in [23]. Like in the Euclidean case, the WGPLVM provides an embedding from the latent space through a WGP, which comes with uncertainty estimates. This ensures that the WGP takes values that satisfy all enforced constraints and invariances, yielding meaningful submanifolds. We demonstrate the WGPLVM on several different manifolds and tasks. We show that our method provides more efficient encoding of the original data compared to the Euclidean GPLVM; that it better captures trends in the data, resulting in better visualization results; and that it provides less conservative uncertainty estimates.

## 2 Preliminaries

This section introduces the necessary preliminaries and notation. We first review the Euclidean Gaussian processes (GP) and the Gaussian process latent variable model (GPLVM) [21]. Next, we summarize the necessary concepts from Riemannian geometry. Continuing, we review wrapped Gaussian processes (WGPs) introduced in [23], which form the cornerstone of the present work.

Gaussian processes. Let denote a multivariate Gaussian distribution with mean and covariance matrix , and write the associated probability density function as for . A Gaussian process (GP) [27] is a collection of random variables, so that any finite subcollection is jointly Gaussian, where are elements of the index set. Any GP is uniquely characterized by the pair

(1) |

called the mean function and covariance function , denoted . For more, see [27].

Gaussian process latent variable model. The Gaussian process latent variable model (GPLVM) is a GP-based dimensionality reduction technique, which aims to learn a chart embedding the lower-dimensional latent space onto a submanifold of the ambient space where the given data set is assumed to live. GPLVM learns a GP indexed by , taking values in , whose mean function is . The method assumes a prior GP with hyper-parameters . Let . By abuse of notation, write for the data matrices of and . Then, for the independent variables and hyperparameters are optimized to maximize the log-likelihood

(2) |

The posterior chart is then given by conditioning on and . Note that the points in the learned submanifold are in no way guaranteed to satisfy constraints or invariances.

Riemannian geometry. A Riemannian manifold is a smooth manifold with a Riemannian metric, i.e. a smoothly varying inner product on the tangent space at each , which induces a distance function on . Each element in the tangent bundle defines a geodesic (locally shortest path) on , so that and .

The Riemannian exponential map is given by , where is the geodesic corresponding to . The exponential map at is a diffeomorphism between a neighborhood and a neighborhood , which is chosen in a maximal way to preserve injectivity. The logarithmic map is characterized by the identity . Outside of , we use to denote with a minimal norm, chosen in a measurable way. The complement of in is called the cut-locus at , where unique minimizing geodesics cannot be defined. A number of useful manifolds have empty cut-locus, such that ; these include manifolds with non-positive curvature as well as the space of positive-definite symmetric matrices used below.

Let and . The differential of with respect to (in some coordinate chart) is given by (see supplementary material for [26])

(3) |

where are Jacobi fields solving the linear ordinary differential equation

(4) |

with initial conditions , , and , . Here is a matrix given by and is an orthonormal basis for , defined by letting and each evolves through parallel transportation. Furthermore, denotes the curvature tensor and is the -by- identity matrix. For a thorough exposition in Riemannian geometry, see [6].

Let be Riemannian manifolds with metrics , exponential maps and logarithmic maps for . Then turns into a Riemannian manifold when endowed with the metric , which has the component-wise computed exponential map . The logarithmic map on the product manifold is defined likewise.

Wrapped Gaussian distributions. Let be an -dimensional Riemannian manifold. Let be a measure on and be a measurable map. We define the push-forward as for any measurable set in . A random point on follows a wrapped Gaussian distribution (WGD), if for some and a symmetric positive definite matrix

(5) |

denoted by . The WGD is thus defined by a GD in the tangent space , that is pushed-forward onto by the exponential map (see Fig. 3). We call the basepoint of , and the tangent space covariance.

Two random points , are jointly WGD, if is a WGD on the product manifold , given by

(6) |

for some matrix . Then, can be conditioned on , resulting in a push-forward of a Gaussian mixture in by the exponential map

(7) |

where is the preimage of . The means and covariance matrices of the Gaussian mixture components are given by

(8) |

and the component weights are

(9) |

Wrapped Gaussian processes. Wrapped Gaussian processes generalize GPs to Riemannian manifolds [23]. Analogous to GPs, a collection of random points on a Riemannian manifold indexed over a set is a wrapped Gaussian process (WGP), if every finite subcollection is jointly WGD on . The functions

(10) |

are called the basepoint function and the tangent space covariance function of (also called as kernel of ), respectively. To denote such a WGP, we use the notation .

Formally, a WGP can be viewed as a GP from the index space to , the family of tangent spaces over the basepoint function . Then, the resulting GP is pushed forward to using the Riemannian exponential map over to obtain the WGP, see Fig. 4.

## 3 Wrapped Gaussian process latent variable model

We now introduce the wrapped Gaussian process latent variable model (WGPLVM) for data lying on an -dimensional ambient Riemannian manifold , where the ”raw” data resides. The goal of WGPLVM is to learn a lower-dimensional submanifold , where the data is assumed to reside. The WGPLVM model is a straight-forward generalization of the GPLVM model, where instead of GPs, we maximize the likelihood of our data combined with the latent variables under the WGPs that are suitable for the manifold context. The WGPLVM pipeline is illustrated in Fig. 5.

We consider a family of WGPs from the latent space onto the ambient manifold , where are hyperparameters, that will be optimized over. The basepoint function can be utilized to delocalize the learning process in order to avoid distortions of the metric caused by the linearization of the curved . The kernel affects how much observations in different tangent spaces affect each other. In order to do this in a coherent way, the kernel should be expressed in a smooth frame (a smoothly changing basis over ). Such a frame can e.g. be contructed by parallel transporting a specific basis along . Any smooth frame can be used, but the kernel should then be adapted to the frame.

The likelihood of a single pair of a latent variable and a data point under the prior WGP is given by

(11) |

where , being the th component of . In practice, we assume the coordinates to be independent.

The approximation in Eq. (11) only takes into account the preimage of with a minimal norm (and thus maximal likelihood), denoted by . The expression gives a lower bound for , thus, maximizing the likelihood of maximizes the lower bound for . It also enforces the WGPLVM to prefer local models over ones that wrap considerable probability mass around the manifold. Note that, for manifolds with empty cut-locus (such as ones with non-positive curvature), the approximation in (11) is exact.

The objective function to be maximized is then the approximated log-likelihood

(12) | ||||

for which the gradient with respect to is given by

(13) |

The differential can be computed using Jacobi fields as explained in expression (3), if no analytical expression exists.

Assuming that the data is i.i.d, the approximate log-likelihood for the data set can be written using Eq. , by considering as a single element of the product manifold . This quantitity is then maximized by optimizing over and the hyperparameters , resulting in the optimal latent variables and hyperparameters for the kernel.

The approximate submanifold elements denoted by can then be predicted at latent variables , by conditioning on the data with the latent variables and (using Eq. (7)). The conditional distribution will then be a non-centered GP defined on pushed forward by the exponential map (see Fig. 5). Then, the mean chart is given by .

In Eq. (7), if the preimage is not uniquely defined, the conditional distribution is approximated by using a preimage with minimal norm, as previously. This approximation is justified as the weights of the components of the Gaussian mixture decrease exponentially as increases.

## 4 Experiments

We demonstrate the WGPLVM on three different manifolds, arising from three different applications: The sphere, Kendall’s shape space [18], and the space of symmetric, positive definite (SPD) matrices. Furthermore, the WGPLVM model is compared with the Euclidean GPLVM model, where an expected consequence is for the mean and sample charts not to lie on the manifold. This effect is clearly visible in Fig. 6 below. A third model, also shown in Fig. 6, is a simple and standard modification of the Euclidean GPLVM, where the GP chart and its samples are projected onto the manifold in order to make them satisfy the desired constraints.

We first introduce the four datasets used and their associated tasks, along with dataset-specific details related to training the models.

Femur dataset on . A set of directions of the left femur bone of a person walking in a circular pattern [1, 13] is measured at time points. We aim to learn the -dimensional submanifold, on which this data approximately lies. The latent variable optimization was initialized using principal curves [13], and the prior WGP and GP had kernel

(14) |

and mean and , respectively, where is the Fréchet mean of the training set and are hyperparameters optimized to maximize the likelihood of the dataset with the latent variables . The coordinates are assumed to be independent, and the same kernel with each coordinate. The trained models are visualized in Fig. 6.

Diatom shapes in Kendall’s shape space. On Kendall’s shape space we analyze a set of outline shapes of 780 diatoms [16, 7], whose species (out of 37 in total) are related to their shapes. A two dimensional latent space is learned, using the prior , with the basepoint function being constantly the Fréchet mean of the population and given by the radial basis function (RBF) kernel

(15) |

Again, the coordinates of our data vectors are assumed to be independent and the same kernel parameters are used for each coordinate. We initialize the GPLVM and WGPLVM models with PGA and PCA, respectively.

Diffusion tensors in . In the space of SPD matrices with the affine-invariant metric [3, 8], we collect a set of diffusion tensors from a diffusion MRI dataset, sampled with approximately uniform fractional anisotropy (FA) values. The FA is a well-known tensor shape descriptor; see the supplementary material for the definition. The diffusion MRI image was a single subject from the Human Connectome Project [36, 11, 29]. In diffusion MRI, low-dimensional encoding with an uncertainty estimate has potential for speeding up image acquisition and processing.

Crypto-tensors in . On with the affine-invariant metric [3, 8], we collect the price of 10 popular crypto-currencies^{1}^{1}1Bitcoin, Dash, Digibyte, Dogecoin, Litecoin, Vertcoin, Stellar, Monero, Ripple, and Verge. in the time 2.12.2014-15.5.2018. The crypto-currency intra-relationship at a given time is encoded in the covariance matrix between the prices in the past 20 days; we include every th day in the period. The result is 126 covariance matrices of size . See the paper [37] for a discussion of covariance descriptors in economy.

For both datasets, the basepoint function and kernel as well as the latent variable initialization are chosen as for Kendall’s shape space.

Application 1: Encoding. We evaluate the quality of the low-dimensional encoding provided by the latent space. We divide the respective datasets into independent training and test sets (consisting of and of the data, respectively), and learn the models on the training set. The trained latent spaces were of dimension 1 (femur), 2 (diatoms), 3 (diffusion tensors) and 2 (crypto-tensors); the dimension was chosen based on prior knowledge. Each test set element is ”encoded” by its projection onto the latent space, which is the point with minimal distance . The quality of encoding was evaluated using the mean Riemannian distance between a test point and its reconstruction from the latent space encoding using the mean chart. Each such experiment was repeated 10 times with different training and test sets; the results are reported in Table 1 below. We see that the WGPLVM encoding is significantly superior on the two tensor datasets, and comparable in the two other cases.

Dataset | Femur | Diatoms | Diffusion tensors | Crypto-tensors |
---|---|---|---|---|

GPLVM | ||||

GPLVMProj | ||||

WGPLVM |

Application 2: Visualization. In Fig. 7, we illustrate the latent spaces of WGPLVM versus GPLVM on the crypto-tensors, which come with an associated time variable, shown in color. The WGPLVM provides a smoother and more consistent transition in color, while the Euclidean GPLVM plots all the earlier (dark blue) tensors on top of each other. Similar visualizations for the diatom and diffusion-tensor datasets can be found in the supplementary material; in these examples, the two visualizations are not significantly different in quality. The visualization was done using GPy [2].

Application 3: Uncertainty quantification. An attractive feature of the GPLVM is that it does not just predict an optimal submanifold embedding, but an optimal distribution of the embedding, which also contains an estimate of uncertainty. The uncertainty estimates are evaluated using the femur dataset on . Since the chart GPs live in different spaces, the likelihoods of observed data under the two different models are not directly comparable. However, all three models yield confidence intervals, which we compare using 20 resampled training and test sets, with 250 and 78 test points, respectively. We project each element in the test set on the mean predicted submanifold for the associated latent variable; sample the learned chart 50 times; and compute the fraction of the samples that lie closer to the prediction than the test point. We visualize this in a cumulative histogram in Fig. 8. In the plot, each -value tells which fraction of the samples from the resulting WGP/GP lie closer to the prediction than the fraction of test points given by the -value. For a perfect model fit, we would see the curve (dashed line). The experiment shows that the models overestimate uncertainty, but that WGPLVM overestimates it less, and thus gives a better model fit. In particular, WGPLVM provides a less conservative uncertainty estimate.

A second, visual, illustration of uncertainty quantification is shown in Fig. 9. Here, four randomly selected test diatoms are reconstructed from their latent space projections. We see the original shape and the reconstructed shape with a point cloud illustrating model uncertainty by plotting the landmark points corresponding to reconstructions via samples from the WGP/GP chart in the three models. Again, we see a much greater level of uncertainty in GPLVM and GPLVMProj.

## 5 Discussion and Conclusion

We introduced the WGPLVM model for non-parametric and probabilistic submanifold learning, which allows us to learn stochastic submanifolds that adhere to constraints or invariances encoded in a Riemannian manifold. We evaluated the WGPLVM on several manifolds and tasks against the GPLVM and a modified GPLVM projected onto the manifold. When validating the quality of the latent space encoding, as well as quality of low-dimensional visualization in the latent space, the WGPLVM encodes the data space better or comparably to the GPLVM and projected GPLVM models. In particular, the WGPLVM model provides superior quality uncertainty estimates.

One might suspect that the improved performance stems from a ”for free” dimensionality reduction through constraints. In this context, it is interesting to note that the most significant improvement in both reconstruction error and visualization was obtained on , where the Riemannian manifold is a full-dimensional, convex subset of the Euclidean ambient space. This may still be due to the constraints, as we see in Table 1 that even here, the mean prediction falls off the manifold. Another possibility, though, is that the difference is caused by the choice of metric.

The projected GPLVM model clearly improves over the vanilla GPLVM model, but it is also notably outperformed by WGPLVM. Due to the metric and curvature of the manifold, interpolation between two points in the ambient space does not necessarily project even approximately onto the manifold interpolation between the projected points. This distortion propagates to the statistics relying on interpolation, and explains both the reduced reconstruction capability and the increased variance.

Although the WGPLVM model provides flexibility through the prior basepoint function, we fixed this to be the Fréchet mean of the training population in our experiments. This choice is well justified if the data set is local enough, and makes the comparison to GPLVM more fair. The flexibility to delocalize the learning process through the basepoint function is, however, important for inference on manifolds when the locality assumption fails. The non-trivial optimization of the basepoint function thus provides an important venue for future research.

In summary, we provide the WGPLVM, a probabilistic submanifold learning algorithm that respects known constraints and invariances by taking values in the associated Riemannian manifold. We compare the model to its Euclidean counterparts it on a number of manifolds, datasets and tasks, and show that it has superior representation capabilities, leading to better encoding, more faithful visualizations and less conservative uncertainty estimates.

## 6 Acknowledgements

Anton Mallasto and Aasa Feragen were supported by Centre for Stochastic Geometry and Advanced Bioimaging, funded by a grant from the Villum Foundation, and Søren Hauberg was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement n 757360), as well as a research grant (15334) from VILLUM FONDEN. Data were provided [in part] by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. The data [in part] used in this project was obtained from mocap.cs.cmu.edu<http://mocap.cs.cmu.edu>. The database was created with funding from NSF EIA-0196217.

## References

- [1] CMU Graphics Lab Motion Capture Database . http://mocap.cs.cmu.edu/. The data used in this project was obtained from mocap.cs.cmu.edu. The database was created with funding from NSF EIA-0196217.
- [2] GPy Gaussian Process Framework for Python . https://sheffieldml.github.io/GPy/.
- [3] P. G. Batchelor, M. Moakher, D. Atkinson, F. Calamante, and A. Connelly. A rigorous framework for diffusion tensor calculus. Magnetic Resonance in Medicine, 53(1):221–225, 2005.
- [4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
- [5] W. Boomsma and J. Frellsen. Spherical convolutions and their application in molecular modelling. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3433–3443. Curran Associates, Inc., 2017.
- [6] M. P. Do Carmo. Riemannian geometry. Birkhauser, 1992.
- [7] J. du Buf and M. Bayer. Automatic Diatom Identification. 2002.
- [8] P. T. Fletcher and S. Joshi. Principal Geodesic Analysis on Symmetric Spaces: Statistics of Diffusion Tensors, pages 87–98. 2004.
- [9] P. T. Fletcher, C. Lu, S. M. Pizer, and S. Joshi. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE transactions on medical imaging, 23(8):995–1005, 2004.
- [10] O. Freifeld and M. Black. Lie bodies: A manifold representation of 3D human shape. In A. Fitzgibbon et al. (Eds.), editor, European Conference on Computer Vision (ECCV), Part I, LNCS 7572, pages 1–14. Springer-Verlag, 2012.
- [11] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, et al. The minimal preprocessing pipelines for the Human Connectome project. Neuroimage, 80:105–124, 2013.
- [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2014.
- [13] S. Hauberg. Principal curves on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
- [14] S. Hauberg. Principal curves on Riemannian manifolds. IEEE transactions on pattern analysis and machine intelligence, 38(9):1915–1921, 2016.
- [15] S. Hauberg, S. Sommer, and K. S. Pedersen. Natural metrics and least-committed priors for articulated tracking. Image and Vision Computing, 30(6-7):453–461, 2012.
- [16] A. Jalba, M. Wilkinson, and J. Roerdink. Shape representation and recognition through morphological curvature scale spaces. IEEE Transactions on Image Processing, 15(2):331–341, feb 2006.
- [17] S. H. Joshi, E. Klassen, A. Srivastava, and I. Jermyn. A novel representation for Riemannian analysis of elastic curves in r. In 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA, 2007.
- [18] D. G. Kendall. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.
- [19] D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.
- [20] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res., 6:1783–1816, 2005.
- [21] N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Advances in neural information processing systems, pages 329–336, 2004.
- [22] N. D. Lawrence and A. J. Moore. Hierarchical Gaussian process latent variable models. In Proceedings of the 24th international conference on Machine learning, pages 481–488. ACM, 2007.
- [23] A. Mallasto and A. Feragen. Wrapped Gaussian process regression on Riemannian manifolds. In CVPR - IEEE Conference on Computer Vision and Pattern Recognition, to appear, 2018.
- [24] K. V. Mardia and P. E. Jupp. Directional statistics, volume 494. John Wiley & Sons, 2009.
- [25] X. Pennec. Barycentric subspaces and affine spans in manifolds. In Geometric Science of Information - Second International Conference, GSI 2015, Palaiseau, France, October 28-30, 2015, Proceedings, pages 12–21, 2015.
- [26] X. Pennec. Barycentric subspace analysis on manifolds. arXiv preprint arXiv:1607.02833, 2016.
- [27] C. E. Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning, pages 63–71. Springer, 2004.
- [28] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. SCIENCE, 290:2323–2326, 2000.
- [29] S. Sotiropoulos, S. Moeller, S. Jbabdi, J. Xu, J. Andersson, E. Auerbach, E. Yacoub, D. Feinberg, K. Setsompop, L. Wald, et al. Effects of image reconstruction on fiber orientation mapping from multichannel diffusion MRI: reducing the noise floor using SENSE. Magnetic resonance in medicine, 70(6):1682–1689, 2013.
- [30] J. B. Tenenbaum, V. d. Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
- [31] M. Titsias and N. D. Lawrence. Bayesian Gaussian process latent variable model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 844–851, 2010.
- [32] O. Tuzel, F. Porikli, and P. Meer. Region covariance: A fast descriptor for detection and classification. European Conference on Computer Vision (ECCV), pages 589–600, 2006.
- [33] R. Urtasun and T. Darrell. Discriminative Gaussian process latent variable model for classification. In Proceedings of the 24th international conference on Machine learning, pages 927–934. ACM, 2007.
- [34] R. Urtasun, D. J. Fleet, and P. Fua. 3d people tracking with Gaussian process dynamical models. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1, pages 238–245. IEEE, 2006.
- [35] Z. V., S. M., and S. G. Visualizing Gradients of Stress Tensor Fields. 2017.
- [36] D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W.-M. H. Consortium, et al. The wu-minn Human Connectome project: an overview. Neuroimage, 80:62–79, 2013.
- [37] A. G. Wilson and Z. Ghahramani. Generalised Wishart processes. In UAI 2011, Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, July 14-17, 2011, pages 736–744, 2011.
- [38] M. Zhang and P. T. Fletcher. Probabilistic principal geodesic analysis. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pages 1178–1186, 2013.

## Appendix A Details on manifolds used

The -sphere is a Riemannian manifold with exponential and logarithmic maps given by

(16) |

where is the -norm induced by the standard Euclidean innerproduct .

Kendall’s shape space forms a quontient manifold of the sphere, so the operations defined for apply, when working with the right quotient representatives. Kendall’s shape space has the additional constraint of representing shapes with respect to an optimal translation between a pair of shapes. Let be the data matrices of two shapes, where is the amount of landmarks, and each column represents the -coordinates after quontienting away scale and translation. Then, the Procrustean distance between the shapes is given by

(17) |

where is a rotation matrix. The shapes are aligned by choosing a reference point, and aligning the population elements by minimizing the Procrustean distance.

The space of symmetric, positive definite matrices can be given the structure of a Riemannian manifold, by endowing it with the affine-invariant metric. The tangent space at each point is the space of -by- symmetric matrices, and the affine-invariant metric is given by

(18) |

and the exponential and logarithmic maps are given by

(19) |

where stands for the matrix exponential and for the matrix logarithm.

## Appendix B Latent space visualization

Here we provide the latent space visualizations for the diffusion-tensor and diatom datasets.

The fractional anisotropy (FA) of a SPD matrix is a shape descriptor taking values between and , where an FA of corresponds to a round tensor, and an FA near corresponds to a very thin one. Given the eigenvalues for an SPD matrix, its FA is defined as

where is the mean of the eigenvalues. In the latent space shown in Fig. 10, the latent variables are colored according to the FA of their associated tensor, and we see that both models provide a smooth transition between different FA values.

The latent space visualization of the diatom dataset is found in Fig. 11; here the latent variables are colored by the species of the corresponding diatom, see Fig. 12 for a visualization of species representatives.