Medians and means in Riemannian geometry: existence, uniqueness and computation
This paper is a short summary of our recent work on the medians and means of probability measures in Riemannian manifolds. Firstly, the existence and uniqueness results of local medians are given. In order to compute medians in practical cases, we propose a subgradient algorithm and prove its convergence. After that, Fréchet medians are considered. We prove their statistical consistency and give some quantitative estimations of their robustness with the aid of upper curvature bounds. We also show that, in compact Riemannian manifolds, the Fréchet medians of generic data points are always unique. Stochastic and deterministic algorithms are proposed for computing Riemannian -means. The rate of convergence and error estimates of these algorithms are also obtained. Finally, we apply the medians and the Riemannian geometry of Toeplitz covariance matrices to radar target detection.
It has been widely accepted that the history of median begins from the following question raised by P. Fermat in 1629: given a triangle in the plan, find a point such that the sum of its distances to the three vertices of the triangle is minimum. It is well known that the answer to this question is: if each angle of the triangle is smaller than , then the minimum point is such that the three segments joining it and the vertices of the triangle form three angles equal to ; in the opposite case, the minimum point is the vertex whose angle is no less than . This point is called the median or the Fermat point of the triangle.
The notion of median also appears in statistics since a long time ago. In 1774, when P. S. Laplace tried to find an appropriate notion of the middle point for a group of observation values, he introduced “the middle of probability”, the point that minimizes the sum of its absolute differences to data points, this is exactly the one dimensional median used by us nowadays.
A sufficiently general notion of median in metric spaces was proposed in 1948 by M. Fréchet in his famous article , where he defined a -mean of a random variable to be a point which minimizes the expectation of its distance at the power to . This flexible definition allows us to define various typical values, among which there are two important cases: and , corresponding to the notions of median and mean, respectively.
Apparently, the median and mean are two notions of centrality for data points. As a result, one may wonder that which one is more advantageous? Statistically speaking, the answer to this question depends on the distribution involved. For example, the mean has obvious advantage over the median when normal distributions are used. On the contrary, as far as Cauchy distributions are concerned, the empirical mean has the same accuracy as one single observation, so that it would be better to use the median instead of the mean in this situation. Perhaps the most significant advantage of the median over the mean is that the former is robust but the latter is not, that is to say, the median is much less sensitive to outliers than the mean. Roughly speaking, in order to move the median of a group of data points to arbitrarily far, at least a half of data points should be moved. Oppositely, in order to move the mean of a group of data points to arbitrarily far, it suffices to move one data point. So that medians are in some sense more prudent than means, as argued by M. Fréchet. The robustness property makes the median an important estimator in situations when there are lots of noise and disturbing factors.
The first formal definition of means for probability measures on Riemannian manifolds was made by H. Karcher in . To introduce Karcher’s result concerning means, consider a Riemannian manifold with Riemannian distance and
is a geodesic ball in centered at with a finite radius . Let be an upper bound of sectional curvatures in and be the injectivity radius of . Under the following condition:
where if , then is interpreted as , Karcher showed that, with the aid of estimations of Jacobi fields, the local energy functional
is strictly convex, thus it has a unique minimizer , which he called the Riemannian center of mass of the probability measure . Moreover, is also the unique solution of the following equation:
From then on, local means of probability measures on Riemannian manifolds are also called Karcher means, meanwhile, global means are often called Fréchet means. A rather general result concerning the uniqueness of local means was proved by W. S. Kendall in . As a particular case of Kendall’s result, the condition
is sufficient to ensure the uniqueness of the Kacher means of .
Some generalizations of Karcher mean are given by many authors. For instance, M. Emery and G. Mokobodzki defined in  the exponential barycenters and convex barycenters for measures on affine manifolds. They also showed that a point is a convex barycenter of a probability if and only if there exists a continuous martingale starting from with terminal law . The uniqueness of exponential barycenters are generalized by M. Arnaudon and X. M. Li in  to probability measures on convex affine manifolds with semilocal convex geometry. Moreover, the behavior of exponential barycenters when measures are pushed by stochastic flows is also considered in . In order to study harmonic maps between Riemannian manifolds with probabilistic methods, J. Picard also gave a generalized notion of barycenters in . As we noted before, Karcher means are only local minimizers of the energy functional in (2), but it is easily seen that can be defined not only on the closed ball but also on the whole manifold as long as the second moment of is finite. This leads to the global minimizers of the second moment function of , which is just the original definition of means made by Fréchet. Global minimizers are more useful in statistics than local ones, so that it is necessary to know whether or under which conditions the Karcher mean of is in fact the Fréchet mean. For the case when is a discrete measure supported by finitely many points in the closed upper hemisphere, S. R. Buss and J. P. Fillmore showed in  that if the support of is not totally contained in the equator then has a unique Karcher mean which lies in the open hemisphere and equals to the Fréchet mean. Inspired by the methods of Buss and Fillmore, B. Afsari showed in  that if the upper curvature bound and the injectivity radius in (4) is replaced by the ones of the larger ball , then all the Fréchet -means of lie inside . Particularly, the Karcher mean coincides with the Fréchet mean. The existence and uniqueness of -means in Finsler geometry are recently proved by M. Arnaudon and F. Nielsen in . They also showed that Finslerian -means are limiting points of continuous gradient flows and developed algorithms for computing -means in Finsler geometry.
Medians of discrete sample points on the sphere are studied by economists and operational research experts in the 1970s and 1980s, but they used the name “location problems on a sphere”. For data points lying in a spherical disc of radius smaller than , Drezner and Wesolowsky showed in  that the cost function is unimodal in that disc and the Fréchet median is unique if the data points are not contained in a single great circle. It is also shown by Z. Drezner in  that if all the sample points are contained in a great circle, then one of the sample points will be a Fréchet median. Perhaps the first work about Fréchet medians on Riemannian manifolds is the paper  by R. Noda and his coauthors. They proved the uniqueness, characterizations and position estimations of Fréchet medians for discrete sample points lying in a Cartan-Hadamard manifold. In order to do robust statistics for data living in a Riemannian manifold P. T. Fletcher and his coauthors defined in  the local medians for discrete sample points and showed their existence and uniqueness.
In this paper, we present our results on medians and means of probability measures in Riemannian manifolds. Above all, the motivation of our work: radar target detection is introduced in section 2. After that, in section 3 we define local medians for probability measures in Riemannian manifolds and consider the problems of uniqueness and approximation. Under the assumption that the support of the probability measure is contained in a convex ball, we give some results on the characterization, the position estimation and the uniqueness of medians. Then we propose a subgradient algorithm to estimate medians as well as giving its convergence result without condition of the sign of curvatures. Our algorithm improves the one proposed in  which is shown to be convergent only if the manifold is nonnegatively curved. Finally, the problem of error estimation and rate of convergence are also considered.
The aim of section 4 is to give some basic properties of Fréchet medians of probability measures in Riemannian maniolds. Firstly, we give the consistency result of Fréchet medians in proper metric spaces. Particularly, if a probability measure has only one Fréchet median, then any sequence of empirical Fréchet medians will converge almost surely to it. After that, we study the robustness of Fréchet medians in Riemannian manifolds. It is well known that in Euclidean spaces, if a group of data points has more than a half concentrated in a bounded region, then its Fréchet median cannot be drown arbitrarily far when the other points move. A generalization and refinement of this result for data points in Riemannian manifolds is given in Theorem 4.3. This theorem also generalizes a result in  which states that if the probability measure is supported in a strongly convex ball, then all its Fréchet medians lie in that ball. At the end of this section, the uniqueness question of Fréchet sample medians is considered in the context of compact Riemannian manifolds. It is shown that, apart from several events of probability zero, the Fréchet sample medians are unique if the sample vector has a density with respect to the canonical Lebesgue measure of the product manifold. In other words, the Fréchet medians of generic data points are always unique.
Section 5 is devoted to presenting algorithms for computing Fréchet -means in order to meet practical needs. Theorem 5.1 gives stochastic algorithms which converge almost surely to -means in manifolds, which are easier to implement than gradient descent algorithm since computing the gradient of the function to minimize is not needed. The idea is at each step to go in the direction of a point of the support of . The point is chosen at random according to and the size of the step is a well chosen function of the distance to the point, and the number of the step. The speed of convergence is given by Theorem 5.2, which says that the renormalized inhomogeneous Markov chain of Theorem 5.1 converges in law to an inhomogeneous diffusion process. We give the explicit expression of this process, as well as its local characteristic. After that, the performance of the stochastic algorithms are illustrated by simulations. Finally, we show that the -mean of can also be computed by the method of gradient descent. The questions concerning the choice of stepsizes and error estimates of this deterministic method are also considered. We note that, for the case when , M. Arnaudon and F. Nielsen developed in  an efficient algorithm to compute the circum-center of probability measures in Riemannian manifolds.
In section 6, we consider the manifold of Toeplitz covariance matrices parameterized by the reflection coefficients which are derived from Levinson’s recursion of autoregressive models. The explicit expression of the reparametrization and its inverse are obtained. With the Riemannian metric given by the Hessian of a Kähler potential, we show that the manifold is in fact a Cartan-Hadamard manifold with lower sectional curvature bound . After that, we compute the geodesics and use the subgradient algorithm introduced in section 3 to find the median of Toeplitz covariance matrices. Finally, we give some simulated examples to illustrate the application of the median method to radar target detection.
2 Motivation: radar target detection
Suggested by J. C. Maxwell’s seminal work on electromagnetism, H. Hertz carried out an experiment in 1886 which validated that radio waves could be reflected by metallic objects. This provided C. Hüelsmeyer the theoretical foundation of his famous patent on “telemobiloscope” in 1904. He showed publicly in Germany and Netherlands that his device was able to detect remote metallic objects such as ships, even in dense fog or darkness, so that collisions could be avoided. Hüelsmeyer’s “telemobiloscope” is recognized as the primogenitor of modern radar even though it could only detect the direction of an object, neither its distance nor its speed. This is because the basic idea of radar was already born: send radio waves in a predetermined direction and then receive the possible echoes reflected by a target. In order to know the distance and the radial speed of the target, it suffices to send successively two radio waves. In fact, it is easily seen that the distance of the target can be computed by the formula
where is the speed of light and is the time interval between every emission and reception in the direction under test. Moreover, the radial speed of the target can be deduced by the Doppler effect which states that the frequency of a wave is changed for an observer moving relatively to the source of the wave. More precisely,
where and are the wavelength and the skewing of the two emitted radio waves, respectively. As a result, the direction, the distance and the speed of the target can all be determined.
For simplicity, from now on we only consider a fixed direction in which a radar sends radio waves. Since the range of emitted waves are finite, we can divide this direction into some intervals each of which represents a radar cell under test. The radar sends each time a rafale of radio waves in this direction and then receive the returning echoes. For each echo we measure its amplitude and phase , so that it can be represented by a complex number . As a result, the observation value of each radar cell is a complex vector , where is the number of waves emitted in each rafale.
The aim of target detection is to know whether there is a target at the location of some radar cell in this direction. Intuitively speaking, a target is an object whose behavior on reflectivity or on speed is very different from its environment. The classical methods for target detection is to compute the difference between the discrete Fourier transforms of the radar observation values of the cell under test and that of its ambient cells. The bigger this difference is, the more likely a target appears at the location of the cell under test. However, the performance of these classical methods based on Doppler filtering using discrete Fourier transforms together with the Constant False Alarm Rate (CFAR) is not very satisfactory due to their low resolutions issues in perturbed radar environment or with smaller bunch of pulses.
In order to overcome these drawbacks, a lot of mathematical models for spectra estimation were introduced, among which the method based on autoregressive models proposed by F. Barbaresco in  is proved to be very preferable. We shall introduce this method in Chapter 6 of this dissertation. The main difference between this new method and the classical ones is that, instead of using directly the radar observation value of each cell, we regard it as a realization of a centered stationary Gaussian process and identify it to its covariance matrix . Thus the new observation value for each radar cell is a covariance matrix which is also Toeplitz due to the stationarity of the process. As a result, the principle for target detection becomes to find the cells where the covariance matrix differs greatly from the average matrix of its neighborhood. Once such cells are determined we can conclude that there are targets in these locations. In order to carry out this new method, there are two important things which should be considered seriously. One is to define a good distance between two Toeplitz covariance matrices. The other is to give a reasonable definition of the average of covariance matrices, which should be robust to outliers so as to be adapted to perturbed radar environment, and develop an efficient method to compute it in practical cases. These works will be done in the following by studying the Riemannian geometry of Toeplitz covariance matrices and the medians of probability measures in Riemannian manifolds.
3 Riemannian median and its estimation
In this section, we define local medians of a probability measure on a Riemannian manifold, give their characterization and a natural condition to ensure their uniqueness. In order to compute medians in practical cases, we also propose a subgradient algorithm and show its convergence. The mathematical details of this section can be found in .
In more detail, let be a complete Riemannian manifold with Riemannian metric and Riemannian distance . We fix an open geodesic ball
in centered at with a finite radius . Let and denote respectively a lower and an upper bound of sectional curvatures in . The injectivity radius of is denoted by . Furthermore, we assume that the radius of the ball verifies
where if , then is interpreted as .
We consider a probability measure on whose support is contained in the open ball and define a function
This function is 1-Lipschitz, hence continuous on the compact set . The convexity of the distance function on yields that is also convex. Hence we don’t need to distinguish its local minima from its global ones. Now we can give the following definition:
A minimum point of is called a median of . The set of all the medians of will be denoted by . The minimal value of will be denoted by .
It is easily seen that is compact and convex. Moreover, by computing the right derivative of we can prove the following characterization of .
The set is characterized by
where for ,
is a tangent vector at satisfying .
Observing that every geodesic triangle in has at most one obtuse angle, we can prove the following result which gives a position estimation for the medians of .
is contained in the smallest closed convex subset of containing the support of .
In Euclidean case, it is well known that if the sample points are
not collinear, then their medians are unique. Hence we get a natural
condition of to ensure the uniqueness for medians in Riemannian case:
The support of is not totally contained in
any geodesic. This means that for every geodesic :
, we have
This condition implies that is strictly convex along every geodesic in , so that it has one and only one minimizer, as stated by the theorem below.
If condition holds, then has a unique median.
With further analysis, we can show a stronger quantitative version of Theorem 3.2, which is crucial in the error estimations of the subgradient algorithm as well as in the convergence proof of the stochastic algorithm for computing medians in section 5.
If condition holds, then there exits a constant such that for every one has
where is the unique median of .
The main results of approximating medians of by subgradient method is summarized in the following theorem. The idea stems from the basic observation that is a subgradient of at for every .
Let be a sequence of real numbers such that
Define a sequence by and for ,
Then there exists some constant such that if we choose for every , then the sequence is contained in and verifies
Moreover, if the sequence also verifies
then there exists some such that .
We can choose the constant in Theorem 3.4 to be
The proposition below gives the error estimation of the algorithm in Theorem 3.4.
Let condition hold and the stepsizes in Theorem 3.4 satisfy
Then there exists , such that for every ,
where is the unique median of and the sequence is defined by
which converges to when . More explicitly, for every ,
4 Some properties of Fréchet medians in Riemannian manifolds
This section is devoted to some basic results about Fréchet medians, or equivalently, global medians. We show the consistency of Fréchet medians in proper metric spaces, give a quantitative estimation for the robustness of Fréchet medians in Riemannian manifolds and show the almost sure uniqueness of Fréchet sample medians in compact Riemannian manifolds. We refer to  for more details of this section.
4.1 Consistency of Fréchet medians in metric spaces
In this subsection, we work in a proper metric space (recall that a metric space is proper if and only if every bounded and closed subset is compact). Let denote the set of all the probability measures on verifying
For every we can define a function
This function is 1-Lipschitz hence continuous on . Since is proper, attains its minimum (see [28, p. 42]), so we can give the following definition:
Let be a probability measure in , then a global minimum point of is called a Fréchet median of . The set of all the Fréchet medians of is denoted by . Let denote the global minimum of .
By the Kantorovich-Rubinstein duality of -Wasserstein distance (see [30, p. 107]), we can show that Fréchet medians are characterized by -Lipschitz functions. A corresponding result that Riemannian barycenters are characterized by convex functions can be found in [25, Lemma 7.2].
Let and be also separable, then
where denotes the set of all the 1-Lipschitz functions on .
The following theorem states that the uniform convergence of first moment functions yields the convergence of Fréchet medians.
Let be a sequence in and be another probability measure in . If converges uniformly on to , then for every , there exists , such that for every we have
Let be a sequence of i.i.d random variables of law and be a sequence of random variables such that with . If has a unique Fréchet median , then a.s.
4.2 Robustness of Fréchet medians in Riemannian manifolds
The framework of this subsection is a complete Riemannian manifold whose dimension is no less than 2. We fix a closed geodesic ball
in centered at with a finite radius and a probability measure such that
The aim of this subsection is to estimate the positions of the Fréchet medians of , which gives a quantitative estimation for robustness. To this end, the following type of functions are of fundamental importance for our methods. Let , define
Obviously, is continuous and attains its minimum.
By a simple estimation on the minimum of we get the following basic result.
The set of all the Fréchet medians of verifies
It is easily seen that the conclusion of Theorem 4.2 also holds if is only a proper metric space.
As a direct corollary to Theorem 4.2, if is a probability measure in such that for some point one has , then is the unique Fréchet median of .
In view of Theorem 4.2, let be an upper bound of sectional curvatures in and be the injectivity radius of . By computing the minima of some typical functions in model spaces , and , and then comparing with the ones in , we get the following main result of this subsection.
where if , then is interpreted as
i) If and , then
Moreover, any of the two conditions below implies :
ii) If , then
iii) If , then
Finally any of the above three closed balls is contained in the open ball .
4.3 Uniqueness of Fréchet sample medians in compact Riemannian manifolds
Before introducing the results of this subsection we give some notations. For each point , denotes the unit sphere in . Moreover, for a tangent vector , the distance between and its cut point along the geodesic starting from with velocity is denoted by . Certainly, if there is no cut point along this geodesic, then we define . For every point , where is a fixed natural number, we write
The set of all the Fréchet medians of , is denoted by .
The following theorem states that in order to get the uniqueness of Fréchet medians, it suffices to move two data points towards a common median along some minimizing geodesics for a little distance.
Let and . Fix two normal geodesics such that , , and . Assume that
Then for every and we have
Generally speaking, the non uniqueness of Fréchet medians is due to some symmetric properties of data points. As a result, generic data points should have a unique Fréchet median. In mathematical language, this means that the set of all the particular positions of data points is of Lebesgue measure zero. After eliminate all these particular cases we obtain the following main result:
Assume that is compact. Then has a unique Fréchet median for almost every .
In probability language, Theorem 4.5 is equivalent to say
is an -valued random variable with density, then has a unique Fréchet median almost surely. Clearly, the same statement is also true if are independent and -valued random variables with density.
5 Stochastic and deterministic algorithms for computing means of probability measures
In this section, we consider a probability measure supported by a regular geodesic ball in a manifold and, for any , define a stochastic algorithm which converges almost surely to the -mean of . Assuming furthermore that the functional to minimize is regular around , we prove that a natural renormalization of the inhomogeneous Markov chain converges in law into an inhomogeneous diffusion process. We give the explicit expression of this process, as well as its local characteristic. After that, the performance of the stochastic algorithms are illustrated by simulations. Finally, we show that the -mean of can also be computed by the method of gradient descent. The questions concerning the choice of stepsizes and error estimates of this deterministic method are also considered. For more mathematical details of this section, see  and .
5.1 Stochastic algorithms for computing -means
Let be a Riemannian manifold whose sectional curvatures verify , where are positive numbers. Denote by the Riemannian distance on . Let be a geodesic ball in and be a probability measure with support included in a compact convex subset of . Fix . We will always make the following assumptions on :
The support of is not reduced to one point. Either or the support of is not contained in a line, and the radius satisfies
has a unique minimizer in , the -mean of , and moreover . If , is the median of .
The existence and uniqueness of -means in Finsler geometry are recently proved by M. Arnaudon and F. Nielsen in . They also showed that Finslerian -means are limiting points of continuous gradient flows and developed algorithms for computing -means in Finsler geometry.
In the following theorem, we define a stochastic gradient algorithm to approximate the -mean and prove its convergence. In the sequel, let
Let be a sequence of independent -valued random variables, with law . Let be a sequence of positive numbers satisfying
where is a constant.
Letting , define inductively the random walk by
where , with the convention .
The random walk converges in and almost surely to .
For the case when , M. Arnaudon and F. Nielsen developed in  an efficient algorithm to compute the circum-center of probability measures in Riemannian manifolds.
In the following example, we focus on the case and where drastic simplifications occur.
In the case when and is a compactly supported probability measure on , the stochastic gradient algorithm (5.1) simplifies into
If furthermore , clearly and , so that the linear relation
holds true and an easy induction proves that
Now, taking , we have
The stochastic gradient algorithm estimating the mean of is given by the empirical mean of a growing sample of independent random variables with distribution . In this simple case, the result of Theorem 5.1 is nothing but the strong law of large numbers. Moreover, fluctuations around the mean are given by the central limit theorem and Donsker’s theorem.
The fluctuation of the random walk defined in Theorem 5.1 is summarized in the following theorem.
Assume that in Theorem 5.1
for some . We define for the Markov chain in by
Assume that is in a neighborhood of and . Then the sequence of processes converges weakly in to a diffusion process given by
where is the standard Brownian motion in and satisfying
is an orthonormal basis diagonalizing the symmetric bilinear form and are the associated eigenvalues.
5.2 Simulations of stochastic algorithms
5.2.1 A non uniform measure on the unit square in the plane
Here is the Euclidean plane and is the renormalized restriction to the square of an exponential law on . The red path represents one trajectory of the inhomogeneous Markov chain corresponding to , with linear interpolation between the different steps. The red point is . Black circles represent the values of .
5.2.2 Medians in the Poincaré disc
In the two figures below, is the Poincaré disc, the blue points are data points and the red path represents one trajectory of the inhomogeneous Markov chain corresponding to , with linear interpolation between the different steps. The green points are medians computed by the subgradient method developed in section 3.
5.3 Computing -means by gradient descent
Gradient descent algorithms for computing are given in the following theorem. In view of Theorem 3.4, it suffices to consider the case when .
Assume that . Let and for define
where is a sequence of real numbers such that
Then the sequence is contained in and converges to .
The following proposition gives the error estimations of the gradient descent algorithms in Theorem 5.3.
Assume that for every in Theorem 5.3, then the following
error estimations hold:
i) if , then for ,
ii) if , then for ,
where the constant
Moreover, the sequences and both tend to zero.
6 Riemannian geometry of Toeplitz covariance matrices
applications to radar target detection
In this section we study the Riemannian geometry of the manifold of Toeplitz covariance matrices of order . The explicit expression of the reflection coefficients reparametrization and its inverse are obtained. With the Riemannian metric given by the Hessian of a Kähler potential, we show that the manifold is in fact a Cartan-Hadamard manifold with lower sectional curvature bound . The geodesics in this manifold are also computed. Finally, we apply the subgradient algorithm introduced in section 3 and the Riemannian geometry of Toeplitz covariance matrices to radar target detection. We refer to  for more mathematical details of this section.
6.1 Reflection coefficients parametrization
Let be the set of Toeplitz Hermitian positive definite matrices of order . It is an open submanifold of . Each element can be written as
For every , the upper left -by- corner of is denoted by . It is associated to a -th order autoregressive model whose Yule-Walker equation is
where are the optimal prediction coefficients and is the mean squared error.
The last optimal prediction coefficient is called the
-th reflection coefficient and is denoted by . It is
easily seen that are uniquely determined by
the matrix . Moreover, the classical Levinson’s recursion gives
that . Hence, by letting , we obtain a map
between two submanifolds of
where is the unit disc of the complex plane.
Using the Cramer’s rule and the method of Schur complement we get the following proposition.
is a diffeomorphism, whose explicit expression is
is the submatrix of obtained by deleting the first row and the last column. On the other hand, if , then its inverse image under can be calculated by the following algorithm:
6.2 Riemannian geometry of Toeplitz covariance matrices
From now on, we regard as a Riemannian manifold whose metric, which is introduced in  by the Hessian of the Kähler potential
is given by
The metric (8) is a Bergman type metric and it has be shown in  that this metric is not equal to the Fisher information metric of . But J. Burbea and C. R. Rao have proved in [18, Theorem 2] that the Bergman metric and the Fisher information metric do coincide for some probability density functions of particular forms. A similar potential function was used by S. Amari in  to derive the Riemannian metric of multi-variate Gaussian distributions by means of divergence functions. We refer to  for more account on the geometry of Hessian structures.
With the metric given by (8) the space is just the product of the Riemannian manifolds and , where
The latter is just times the classical Poincaré metric of . Hence is a Cartan-Hadamard manifold whose sectional curvatures verify . The Riemannian distance between two different points and in is given by
where , ,
The geodesic from to in