
# Geodesic Normal distribution on the circle

**Abstract.** This paper is concerned with the study of a circular random distribution called the geodesic Normal distribution, recently proposed for general manifolds. This distribution, parameterized by two real numbers associated with specific location and dispersion concepts, looks like a standard Gaussian on the real line, except that its support is the circle and that the Euclidean distance is replaced by the geodesic distance on the circle. Some properties are studied and comparisons with the von Mises distribution in terms of intrinsic and extrinsic means and variances are provided. Finally, the problem of estimating the parameters through the maximum likelihood method is investigated and illustrated with some simulations.

Circular statistics deal with random variables taking values on the unit circle and can be included in the broader field of directional statistics. Applications of circular statistics are numerous and can be found, for example, in fields such as climatology (wind direction data [MJ00]), biology (pigeon homing performances [Wat83]) or earth science (earthquake occurrence locations and other data types; see [MJ00] for examples), among others.

A circular distribution is a probability distribution function (pdf) whose mass is concentrated on the circumference of a unit circle. The support of a random variable representing an angle measured in radians may be taken to be $[0,2\pi)$ or $[-\pi,\pi)$. We will focus here on continuous circular distributions, that is, on absolutely continuous (w.r.t. the Lebesgue measure on the circumference) distributions. A pdf $f$ of a circular random variable has to fulfill the following axioms:

• (i) $f(\theta)\geq 0$;

• (ii) $\int_0^{2\pi}f(\theta)\,d\theta=1$;

• (iii) $f(\theta+2k\pi)=f(\theta)$ for any integer $k$ (i.e. $f$ is $2\pi$-periodic).

Among many models of circular data, the von Mises distribution plays a central role (essentially due to the similarities shared with the Normal distribution on the real line). A circular random variable (for short r.v.) $\theta$ is said to have a von Mises distribution, denoted by $vM(\mu,\kappa)$, if it has the density function

$$f(\theta;\mu,\kappa)=\frac{1}{2\pi I_0(\kappa)}\,e^{\kappa\cos(\theta-\mu)},$$

where $\mu\in[0,2\pi)$ and $\kappa\geq 0$ are parameters and where $I_0$ is the modified Bessel function of order 0. The aim of this paper is to review some properties of another circular distribution introduced by [Pen06] which also shares similarities with the Normal distribution on the real line. For some $\mu\in[0,2\pi)$ and for some parameter $\gamma\geq 0$, a r.v. $\theta$ is said to have a geodesic Normal distribution, denoted in the following by $gN(\mu,\gamma)$, if it has the density function

$$f(\theta;\mu,\gamma)=k^{-1}(\gamma)\,e^{-\frac{\gamma}{2}d_G(\mu,\theta)^2},$$

where $d_G$ is the geodesic distance on the circle and where $k(\gamma)$ is the normalizing constant defined by

$$k(\gamma):=\sqrt{\frac{2\pi}{\gamma}}\left(\Phi(\pi\sqrt{\gamma})-\Phi(-\pi\sqrt{\gamma})\right)=\sqrt{\frac{2\pi}{\gamma}}\,\operatorname{erf}\!\left(\pi\sqrt{\gamma/2}\right),$$

where $\Phi$ is the cumulative distribution function of a standard Gaussian random variable and where $\operatorname{erf}$ is the error function.
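As a quick numerical sanity check, the two expressions of $k(\gamma)$ above (via $\Phi$ and via $\operatorname{erf}$) can be compared with a direct numerical integration of the unnormalized density; a minimal sketch in Python (the helper names are ours):

```python
import math

def k_phi(gamma):
    # k(gamma) = sqrt(2*pi/gamma) * (Phi(pi*sqrt(gamma)) - Phi(-pi*sqrt(gamma)))
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return math.sqrt(2.0 * math.pi / gamma) * (
        Phi(math.pi * math.sqrt(gamma)) - Phi(-math.pi * math.sqrt(gamma)))

def k_erf(gamma):
    # equivalent closed form: sqrt(2*pi/gamma) * erf(pi*sqrt(gamma/2))
    return math.sqrt(2.0 * math.pi / gamma) * math.erf(math.pi * math.sqrt(gamma / 2.0))

def k_num(gamma, n=200_000):
    # midpoint-rule integration of exp(-gamma*theta^2/2) over (-pi, pi)
    h = 2.0 * math.pi / n
    return h * sum(math.exp(-0.5 * gamma * (-math.pi + (i + 0.5) * h) ** 2)
                   for i in range(n))
```

All three evaluations agree to high accuracy for any $\gamma>0$.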
Let us underline that Pennec [Pen06] introduced the geodesic Normal distribution for general Riemannian manifolds. We focus here on a special (and simple) manifold, the circle, in order to highlight its basic properties and compare them with the most classical circular distribution, namely the $vM(\mu,\kappa)$ distribution. That is, we present here a new study of the $gN(\mu,\gamma)$ distribution in the framework of circular statistics and provide results in terms of estimation and asymptotic behaviour.

One of the main conclusions of this paper may be summarized as follows. While the von Mises distribution has strong relations with the notion of extrinsic moments (that is, with trigonometric moments), we will emphasize in this paper that the geodesic Normal distribution has strong relations with intrinsic moments, that is, with the Fréchet mean (defined as the angle minimizing the expected squared geodesic distance to the random variable) and the geodesic variance. The definitions of the $vM$ and $gN$ distributions are closely related to extrinsic and intrinsic moments respectively, and we present their similarities together with their dissimilarities.

After introducing the $gN$ distribution in Section 1, we present in Section 2 a brief review of the intrinsic and extrinsic quantities that allow the characterization of distributions on the circle. In Section 3, we present extrinsic and intrinsic properties of the $vM$ and $gN$ distributions. Then, in Section 4, we briefly explain how to simulate $gN$ random variables on the circle. Finally, in Section 5, we present the maximum likelihood estimators for the $gN(\mu,\gamma)$ distribution and study their asymptotic behaviour. Numerical simulations illustrate the presented results.

## 1 Geodesic Normal distribution through the tangent space

As introduced in [Pen06], the geodesic Normal distribution is defined for random variables taking values on Riemannian manifolds and is based on the "geodesic distance" concept and on the total intrinsic variance [BP03, BP05, Pen06]. On a Riemannian manifold $\mathcal{M}$, one can define at each point $p$ a scalar product $\langle\cdot,\cdot\rangle_p$ in the tangent plane $T_p\mathcal{M}$ attached to the manifold at $p$. On $\mathcal{M}$, among the possible smooth curves between two points $x$ and $y$, the curve of minimum length is called a geodesic. The length of a curve is understood as the integral of the norm of its instantaneous velocity along the path, the norm at position $p$ on $\mathcal{M}$ being the one induced by $\langle\cdot,\cdot\rangle_p$. It is well known that, given a point $p$ and a vector $v\in T_p\mathcal{M}$, there exists only one geodesic starting from $p$ with initial tangent vector $v$.

Through the exponential map, each vector $v\in T_x\mathcal{M}$ is associated to the point reached in unit time, i.e. $\exp_x(v)=y$. Using the notation adopted in [Pen06], the vector of $T_x\mathcal{M}$ associated to the geodesic that starts from $x$ at time $0$ and reaches $y$ at time $1$ is denoted $\overrightarrow{xy}$. Thus, the exponential map (at point $x$) maps the vector $\overrightarrow{xy}$ of $T_x\mathcal{M}$ to the point $y\in\mathcal{M}$, i.e. $\exp_x(\overrightarrow{xy})=y$. Now the geodesic distance, denoted $d_G$, between $x$ and $y$ is:

$$d_G(x,y)=\sqrt{\langle\overrightarrow{xy},\overrightarrow{xy}\rangle_x}.$$

The Log map is the inverse map, which associates to a point $y$ in the neighbourhood of $x$ a vector of $T_x\mathcal{M}$, i.e. $\operatorname{Log}_x(y)=\overrightarrow{xy}$.

A random variable $Y$ taking values in $\mathcal{M}$, with density function $f$, is said to have a geodesic Normal distribution with parameters $\mu\in\mathcal{M}$ and $\Gamma$, a symmetric positive definite matrix, denoted by $gN(\mu,\Gamma)$, if:

$$f(y;\mu,\Gamma)=k^{-1}\exp\left(-\frac{\overrightarrow{\mu y}^{T}\,\Gamma\,\overrightarrow{\mu y}}{2}\right).$$

The parameter $\mu$ is related to some specific location concept. Namely, [Pen06] has proved that $\mu$ corresponds to the intrinsic or Fréchet mean of the random variable $Y$ (see Section 2 for more details). The normalizing constant $k$ is given by:

$$k=\int_{\mathcal{M}}\exp\left(-\frac{\overrightarrow{\mu y}^{T}\,\Gamma\,\overrightarrow{\mu y}}{2}\right)d\mathcal{M}(y),$$

where $d\mathcal{M}$ is the Riemannian measure (induced by the Riemannian metric). The matrix $\Gamma$ is called the concentration matrix and is related to the covariance matrix $\Sigma$ of the vector $\overrightarrow{\mu Y}$, given for random variables on a Riemannian manifold by

$$\Sigma:=E\left[\overrightarrow{\mu Y}\,\overrightarrow{\mu Y}^{T}\right]=k^{-1}\int_{\mathcal{M}}\overrightarrow{\mu y}\,\overrightarrow{\mu y}^{T}\exp\left(-\frac{\overrightarrow{\mu y}^{T}\,\Gamma\,\overrightarrow{\mu y}}{2}\right)d\mathcal{M}(y).$$

Note that in the case where $\mathcal{M}=\mathbb{R}^n$, the manifold is flat and the geodesic distance is nothing more than the Euclidean distance. In this case, we retrieve the classical definition of a Gaussian variable in $\mathbb{R}^n$, with $\mu$ and $\Gamma$ corresponding respectively to the classical expectation and to the inverse of the covariance matrix $\Sigma$.

The case of the circle. We now present, as done in [Pen06], the case where $\mathcal{M}$ is the unit circle $\mathbb{S}^1$. The exponential chart is the angle $\theta$. Note that this chart is "local" and is thus defined at a point on the manifold. This must be kept in mind, especially for explicit calculation. Here, as the tangent plane to $\mathbb{S}^1$ is the real line, $\theta$ takes values on the segment between $-\pi$ and $\pi$. Note that $-\pi$ and $\pi$ are omitted as they are in the cut locus of the "development point" (the point on the manifold where the tangent plane is attached). The Riemannian measure $d\mathcal{M}(\theta)$ is simply $d\theta$ here.

As stated before, the $gN(\mu,\gamma)$ distribution on the circle has density function given for $\theta\in(\mu-\pi,\mu+\pi)$ by:

$$f(\theta;\mu,\gamma)=k^{-1}(\gamma)\,e^{-\frac{\gamma}{2}d_G(\mu,\theta)^2},$$

where $\gamma$ is a nonnegative real number. Note that $d_G(\mu,\theta)$ is the arc length between $\mu$ and $\theta$. The normalization constant is:

$$k(\gamma)=\int_{\mu-\pi}^{\mu+\pi}e^{-\frac{\gamma}{2}d_G(\mu,\theta)^2}\,d\theta=\sqrt{\frac{2\pi}{\gamma}}\,\operatorname{erf}\!\left(\pi\sqrt{\gamma/2}\right),$$

with the development made around $\mu$.

In order to consider a $gN$ distribution as a classical circular distribution (defined by axioms (i)-(iii) in the introduction), one must extend the support of this distribution from $(\mu-\pi,\mu+\pi)$ to $\mathbb{R}$. The way to achieve this is to make the geodesic distance periodic. Let us consider the distance $\tilde d_G(\mu,\theta)$ for an angle $\theta$ defined by $\tilde d_G(\mu,\theta):=|\theta-\mu+2k\pi|$ with $k$ the unique integer such that $\theta-\mu+2k\pi\in(-\pi,\pi]$. Let $\tilde f$ be the density where $d_G$ is replaced by $\tilde d_G$. This new density defines a circular distribution satisfying axioms (i)-(iii). In particular,

$$\int_0^{2\pi}\tilde f(\theta;\mu,\gamma)\,d\theta=\int_0^{2\pi}k^{-1}(\gamma)e^{-\frac{\gamma}{2}\tilde d_G(\mu,\theta)^2}\,d\theta=\int_{\mu-\pi}^{\mu+\pi}k^{-1}(\gamma)e^{-\frac{\gamma}{2}d_G(\mu,\theta)^2}\,d\theta=1.$$

For the sake of simplicity, $d_G$ will be understood as the distance $\tilde d_G$ in the rest of the paper. Therefore, the density of a $gN$ distribution is considered periodic, defined on $\mathbb{R}$ and with values in $\mathbb{R}^+$. In the case of a $vM$ distribution (and actually for most circular distributions) no such considerations are needed, the periodic nature being included through the cosine function.
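The periodized distance $\tilde d_G$ and the resulting $2\pi$-periodic density are straightforward to code; the sketch below (with helper names of our own) checks numerically that the density integrates to one over a full period, i.e. that axiom (ii) holds:

```python
import math

def d_geo(mu, theta):
    # periodized geodesic distance: |theta - mu + 2*k*pi| with the unique k
    # bringing theta - mu + 2*k*pi into (-pi, pi]; the result lies in [0, pi]
    d = (theta - mu) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def gn_density(theta, mu, gamma):
    # gN(mu, gamma) density, made 2*pi-periodic through d_geo
    k = math.sqrt(2.0 * math.pi / gamma) * math.erf(math.pi * math.sqrt(gamma / 2.0))
    return math.exp(-0.5 * gamma * d_geo(mu, theta) ** 2) / k

# midpoint rule over [0, 2*pi): the density should integrate to one
n = 100_000
h = 2.0 * math.pi / n
total = h * sum(gn_density((i + 0.5) * h, 2.0, 1.5) for i in range(n))
```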

## 2 Classical measures of location and dispersion for circular random variables

We briefly present the concepts of extrinsic and intrinsic moments for random variables on the circle. While the former are well known in circular statistics (e.g. [MJ00]), the latter, based on the geodesic distance on the circle, are less used in this domain. They have been introduced and are commonly used when dealing with general Riemannian manifolds (e.g. [Kar77, Zie77, BP03] and the numerous references therein).

#### Extrinsic moments

In circular statistics, it is well established that trigonometric moments give access to measures of mean direction and circular variance. Considering a circular random variable $\theta$, its $p$-th order trigonometric moment is defined as $\varphi_p:=E[e^{ip\theta}]=\alpha_p+i\beta_p$, where $\alpha_p=E[\cos p\theta]$ and $\beta_p=E[\sin p\theta]$. These latter quantities are extrinsic by definition. The first order trigonometric moment is thus $\varphi_1=\rho e^{i\mu_E}$, where $\rho$ is called the mean resultant length and $\mu_E$ is the mean direction. In the following, we refer to $\mu_E$ as the extrinsic mean. The extrinsic variance is indeed the circular variance defined as $\sigma_E^2:=1-\rho$. In the sequel, extrinsic moments will be used in place of trigonometric moments, keeping in mind that they are the same quantities. For more details on trigonometric moments see [MJ00, JAS01].

#### Intrinsic moments

Another way to consider moments for distributions of random variables on the circle is to use the fact that $\mathbb{S}^1$ is a Riemannian manifold, so that the geodesic distance can be used to define intrinsic moments. Given a random variable $\theta$ with values on $\mathbb{S}^1$, we define the intrinsic mean set (a particular case of a Fréchet mean set) by $\mu_I:=\operatorname{argmin}_{\mu\in\mathbb{S}^1}E[d_G(\mu,\theta)^2]$, where we recall that $d_G$ is the geodesic distance on the circle, i.e. the arc length. When the intrinsic mean set is reduced to a single element, $\mu_I$ is simply called the intrinsic mean. The intrinsic variance is then uniquely defined by $\sigma_I^2:=E[d_G(\mu_I,\theta)^2]$, where $\mu_I$ is the intrinsic mean (set) defined above. For more details and a thorough study of intrinsic statistics for random variables on Riemannian manifolds, see [Pen06] or [BP03, BP05]. Other concepts of intrinsic variance exist (e.g. variances obtained by residuals and by projection; see [HHM10] for a thorough description). In this paper, we only focus on the intrinsic mean and (total) variance, which are sufficient for a basic comparison of the $vM$ and $gN$ distributions.

## 3 Basic properties of the gN(μ,γ) and vM(μ,κ) distributions

#### Symmetry property

First, let us note that, like the $vM(\mu,\kappa)$ distribution, the $gN(\mu,\gamma)$ distribution has a mode at $\theta=\mu$ and is symmetric around $\mu$. The $vM$ distribution has an anti-mode at $\theta=\mu\pm\pi$. For a $gN$ distribution, the (periodized) density is not differentiable at this point. However, the shared behaviour is that both densities decrease as $\theta$ moves away from $\mu$ on each of the intervals $(\mu-\pi,\mu)$ and $(\mu,\mu+\pi)$.

#### Extrinsic and intrinsic means and variances

Table 1 summarizes extrinsic and intrinsic means and variances for both distributions of interest. Let us make some comments. In [KS08], the authors follow the works of Le ([Le98, Le01]) and give very simple conditions on the density of a circular random variable that ensure the existence and uniqueness of the intrinsic mean. It is left to the reader to check that applying Theorem 1 of [KS08] allows us to assert that the intrinsic mean of a $vM(\mu,\kappa)$ or a $gN(\mu,\gamma)$ distribution is $\mu$. Furthermore, the computation of the intrinsic variance for a $gN$ distribution (resp. the extrinsic moments for a $vM$ distribution) can be found in [Pen06] (resp. e.g. [MJ00]). It remains to explain how we obtain the extrinsic moments for a $gN$ distribution. Both are indeed derived through the $p$-th trigonometric moment of a $gN$ distribution, reported in the following proposition.

###### Proposition 1.

The $p$-th trigonometric moment ($p\in\mathbb{N}$) of a $gN(\mu,\gamma)$ distribution, denoted by $\varphi_p$ and defined by $\varphi_p:=E[e^{ip\theta}]$, is given by

$$\varphi_p=e^{ip\mu}\,e^{-\frac{p^2}{2\gamma}}\,\frac{\operatorname{Re}\left(\operatorname{erf}\left(\pi\sqrt{\gamma/2}-\frac{ip}{\sqrt{2\gamma}}\right)\right)}{\operatorname{erf}\left(\pi\sqrt{\gamma/2}\right)},$$

where $\operatorname{erf}$ is the error function, defined for any complex number $z$ by $\operatorname{erf}(z):=\frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt$.

###### Proof.

Let $\theta\sim gN(\mu,\gamma)$; then

$$E[e^{ip\theta}]=k^{-1}(\gamma)\int_0^{2\pi}e^{ip\theta}e^{-\frac{\gamma}{2}d_G(\theta,\mu)^2}\,d\theta=k^{-1}(\gamma)\int_{\mu-\pi}^{\mu+\pi}e^{ip\theta}e^{-\frac{\gamma}{2}d_G(\theta,\mu)^2}\,d\theta.$$

Since $d_G(\theta,\mu)=|\theta-\mu|$ for $\theta\in(\mu-\pi,\mu+\pi)$,

$$\begin{aligned}E[e^{ip\theta}]&=k^{-1}(\gamma)\int_{-\pi}^{\pi}e^{ip(\theta+\mu)}e^{-\frac{\gamma}{2}\theta^2}\,d\theta\\&=e^{ip\mu}k^{-1}(\gamma)\int_{-\pi}^{\pi}e^{-\left(\left(\theta\sqrt{\gamma/2}-\frac{ip}{\sqrt{2\gamma}}\right)^2-\left(\frac{ip}{\sqrt{2\gamma}}\right)^2\right)}\,d\theta\\&=e^{ip\mu}e^{-\frac{p^2}{2\gamma}}k^{-1}(\gamma)\sqrt{\frac{2\pi}{\gamma}}\,\frac{1}{2}\left(\operatorname{erf}\left(\pi\sqrt{\gamma/2}-\frac{ip}{\sqrt{2\gamma}}\right)-\operatorname{erf}\left(-\pi\sqrt{\gamma/2}-\frac{ip}{\sqrt{2\gamma}}\right)\right)\\&=e^{ip\mu}e^{-\frac{p^2}{2\gamma}}\,\frac{\operatorname{Re}\left(\operatorname{erf}\left(\pi\sqrt{\gamma/2}-\frac{ip}{\sqrt{2\gamma}}\right)\right)}{\operatorname{erf}\left(\pi\sqrt{\gamma/2}\right)},\end{aligned}$$

since for any complex number $z$, $\operatorname{erf}(\bar z)=\overline{\operatorname{erf}(z)}$ and $\operatorname{erf}(-z)=-\operatorname{erf}(z)$. ∎
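Proposition 1 can be checked numerically. The sketch below compares the closed form with a direct numerical integration of $E[e^{ip\theta}]$; the pure-Python complex error function via its Taylor series is our own helper, accurate enough for the moderate arguments involved here:

```python
import cmath, math

def cerf(z, terms=80):
    # erf(z) = (2/sqrt(pi)) * sum_{n>=0} (-1)^n z^(2n+1) / (n! * (2n+1))
    s, fact = 0j, 1.0
    for n in range(terms):
        if n:
            fact *= n
        s += (-1) ** n * z ** (2 * n + 1) / (fact * (2 * n + 1))
    return 2.0 / math.sqrt(math.pi) * s

def phi_p(p, mu, gamma):
    # closed form of Proposition 1
    a = math.pi * math.sqrt(gamma / 2.0)
    b = p / math.sqrt(2.0 * gamma)
    return (cmath.exp(1j * p * mu) * math.exp(-p * p / (2.0 * gamma))
            * cerf(complex(a, -b)).real / math.erf(a))

def phi_p_num(p, mu, gamma, n=20_000):
    # E[exp(i*p*theta)] by midpoint integration over (mu-pi, mu+pi)
    k = math.sqrt(2.0 * math.pi / gamma) * math.erf(math.pi * math.sqrt(gamma / 2.0))
    h = 2.0 * math.pi / n
    s = sum(cmath.exp(1j * p * (mu - math.pi + (i + 0.5) * h))
            * math.exp(-0.5 * gamma * (-math.pi + (i + 0.5) * h) ** 2)
            for i in range(n))
    return h * s / k
```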

Figure 1 shows the evolution of the extrinsic and intrinsic variances in terms of the concentration parameter, $\kappa$ for the $vM$ and $\gamma$ for the $gN$ distribution. It is interesting to notice that the von Mises and the geodesic Normal distributions both have intrinsic variance equal to $\pi^2/3$ when $\kappa$ or $\gamma$ equals zero, corresponding to the variance of the uniform distribution on the circle. Note also that the intrinsic and extrinsic variances tend to zero as $\kappa$ or $\gamma$ tends to infinity.

#### Entropy property

The $gN$ and $vM$ distributions both have a peculiar position, respectively amongst distributions defined on Riemannian manifolds and amongst circular distributions: both maximize a certain definition of the entropy. As explained in [JAS01] (characterization due to Mardia [Mar72]), the circular distribution that maximizes the entropy defined with respect to the angular random variable (see [MJ00] for the exact definition), subject to the constraint that the first trigonometric moment is fixed, i.e. $E[\cos\theta]$ and $E[\sin\theta]$ fixed, is a $vM(\mu,\kappa)$ distribution. In a similar way, as demonstrated in [Pen06], the distribution defined using the geodesic distance on $\mathbb{S}^1$ which maximizes the entropy, when it is defined in the tangent plane, subject to the constraints that the intrinsic mean $\mu$ and the covariance $\Sigma$ are fixed, is the $gN(\mu,\gamma)$ distribution. One can thus conclude that the $vM$ and $gN$ distributions play "similar" roles in that they maximize the entropy with respect to either extrinsic or intrinsic moments of the distribution.

#### Linear approximation of $\overrightarrow{\mu\theta}$

Recall that the random variable $\overrightarrow{\mu\theta}$ represents the algebraic measure of the vector from $\mu$ to $\theta$. The support of $\overrightarrow{\mu\theta}$ is $(-\pi,\pi)$. Its cdf is given for $t\in(-\pi,\pi)$ by

$$F_{\overrightarrow{\mu\theta}}(t)=k^{-1}(\gamma)\int_{\mathbb{S}^1}\mathbf{1}_{[-\pi,t]}(\overrightarrow{\mu y})\,e^{-\frac{\gamma}{2}\overrightarrow{\mu y}^2}\,d\mathcal{M}(y)=k^{-1}(\gamma)\int_{-\pi}^{t}e^{-\frac{\gamma}{2}\theta^2}\,d\theta.$$

This reduces to $\left(\Phi(t\sqrt{\gamma})-\Phi(-\pi\sqrt{\gamma})\right)/\left(\Phi(\pi\sqrt{\gamma})-\Phi(-\pi\sqrt{\gamma})\right)$. In other words, $\overrightarrow{\mu\theta}$ is nothing else than a truncated Gaussian random variable with support $[-\pi,\pi]$, mean 0 and scale parameter $1/\sqrt{\gamma}$, that is

$$\overrightarrow{\mu\theta}\stackrel{d}{=}Z\,\Big|\,|Z|\leq\pi,\quad\text{where }Z\sim\mathcal{N}(0,1/\sqrt{\gamma}).\tag{1}$$

For a large concentration parameter $\gamma$, the geodesic Normal distribution can be "approximated" by a linear Normal distribution in the following sense.

###### Proposition 2.

Let $\theta_\gamma\sim gN(\mu,\gamma)$; then as $\gamma\to+\infty$, $\sqrt{\gamma}\,\overrightarrow{\mu\theta_\gamma}\stackrel{d}{\to}\mathcal{N}(0,1)$.

###### Proof.

Let us denote by $h$ the moment generating function of the random variable $\sqrt{\gamma}\,\overrightarrow{\mu\theta_\gamma}$; then for $t\in\mathbb{R}$,

$$\begin{aligned}h(t)&=k^{-1}(\gamma)\int_{-\pi}^{\pi}e^{\sqrt{\gamma}t\theta}e^{-\frac{\gamma}{2}\theta^2}\,d\theta\\&=e^{\frac{t^2}{2}}\,\frac{k^{-1}(\gamma)}{\sqrt{\gamma}}\int_{-\pi\sqrt{\gamma}}^{\pi\sqrt{\gamma}}e^{-\frac{1}{2}(\theta-t)^2}\,d\theta\\&=e^{\frac{t^2}{2}}\,\frac{\Phi(\pi\sqrt{\gamma}-t)-\Phi(-\pi\sqrt{\gamma}-t)}{\Phi(\pi\sqrt{\gamma})-\Phi(-\pi\sqrt{\gamma})}.\end{aligned}$$

Therefore, as $\gamma\to+\infty$, for fixed $t$, $h(t)$ converges towards $e^{t^2/2}$, which is the moment generating function of a standard Gaussian random variable. ∎

Such a result also holds for the von Mises distribution: as $\kappa\to+\infty$, $\sqrt{\kappa}(\theta-\mu)\stackrel{d}{\to}\mathcal{N}(0,1)$; see e.g. Proposition 2.2 in [JAS01].

## 4 Simulation of a geodesic Normal distribution and examples

The generation of a $gN(\mu,\gamma)$ distribution with support on $[0,2\pi)$ is extremely simple following (1). It consists of two steps.

1. Generate $Z\sim\mathcal{N}(0,1/\sqrt{\gamma})$ conditioned on $|Z|\leq\pi$ (e.g. by rejection).

2. Set $\theta=\mu+Z\ (\operatorname{mod} 2\pi)$.
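The two steps above can be sketched as follows (rejection sampling for the truncated Gaussian; the helper name `rgn` is ours):

```python
import math, random

def rgn(mu, gamma, rng=random):
    # Step 1: Z ~ N(0, 1/sqrt(gamma)) conditioned on |Z| <= pi, by simple rejection
    while True:
        z = rng.gauss(0.0, 1.0 / math.sqrt(gamma))
        if abs(z) <= math.pi:
            # Step 2: wrap around the location parameter mu
            return (mu + z) % (2.0 * math.pi)

random.seed(0)
sample = [rgn(1.0, 4.0) for _ in range(20_000)]
# extrinsic sample mean: Arg of the first empirical trigonometric moment
c = sum(math.cos(t) for t in sample) / len(sample)
s = sum(math.sin(t) for t in sample) / len(sample)
mu_hat = math.atan2(s, c) % (2.0 * math.pi)
```

For a concentrated sample such as this one ($\gamma=4$), the extrinsic sample mean direction `mu_hat` recovers the location parameter closely.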

Figure 2 presents some examples for different values of the location parameter $\mu$ and the concentration parameter $\gamma$.

## 5 Maximum Likelihood Estimation

### 5.1 Preliminary and notation

Let us now consider the identification problem of estimating the parameters of a $gN$ distribution from the observations $\theta_1,\dots,\theta_n$. In this section, we will denote by $\mu^\star$ and $\gamma^\star$ the unknown parameters to estimate. We assume that $\mu^\star\in[0,2\pi)$ and $\gamma^\star>0$. Also, we denote by $\hat\mu_I$ and $\hat\mu_E$ the empirical intrinsic and extrinsic means defined by

$$\hat\mu_I:=\operatorname{argmin}_{\mu\in\mathbb{S}^1}\frac{1}{n}\sum_{i=1}^n d_G(\mu,\theta_i)^2,\tag{2}$$

$$\hat\mu_E:=\operatorname{Arg}(\hat\varphi_1),\quad\text{with }\hat\varphi_1:=\frac{1}{n}\sum_j\cos(\theta_j)+i\,\frac{1}{n}\sum_j\sin(\theta_j).\tag{3}$$

Obtained through the minimization of an empirical function, $\hat\mu_I$ is not necessarily reduced to a single element. The natural intrinsic and extrinsic variances are then denoted by $\hat\sigma_I^2$ and $\hat\sigma_E^2$ and uniquely given by

$$\hat\sigma_I^2=\frac{1}{n}\sum_{i=1}^n d_G(\hat\mu_I,\theta_i)^2\quad\text{and}\quad\hat\sigma_E^2=1-|\hat\varphi_1|.\tag{4}$$
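Equations (2)-(4) translate directly into code; a minimal sketch follows (the brute-force grid minimiser is our own simplistic choice, adequate for illustration):

```python
import math

def d_geo(mu, theta):
    # periodized geodesic distance on the circle, in [0, pi]
    d = (theta - mu) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def extrinsic_mean(thetas):
    # Arg of the first empirical trigonometric moment, eq. (3)
    c = sum(math.cos(t) for t in thetas) / len(thetas)
    s = sum(math.sin(t) for t in thetas) / len(thetas)
    return math.atan2(s, c) % (2.0 * math.pi)

def intrinsic_mean(thetas, grid=3600):
    # brute-force minimisation of the empirical Frechet functional, eq. (2)
    best = min((sum(d_geo(2.0 * math.pi * i / grid, t) ** 2 for t in thetas), i)
               for i in range(grid))
    return 2.0 * math.pi * best[1] / grid

def intrinsic_variance(thetas):
    # left part of eq. (4)
    m = intrinsic_mean(thetas)
    return sum(d_geo(m, t) ** 2 for t in thetas) / len(thetas)
```

On a sample that does not wrap around the circle, both empirical means agree with the intuitive average of the angles.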

We will need the following lemma and notation.

###### Lemma 3.

For a random variable $\theta_\gamma\sim gN(\mu^\star,\gamma)$, let $V(\mu,\gamma):=E[d_G(\mu,\theta_\gamma)^2]$ for $\mu\in\mathbb{S}^1$ and $\delta:=\overrightarrow{\mu\mu^\star}$; then

$$V(\mu,\gamma)=g(\delta)\mathbf{1}_{[0,\pi)}(\delta)+g(-\delta)\mathbf{1}_{(-\pi,0]}(\delta),$$

where

$$g(\delta):=\int_{-\pi}^{\pi}(\delta+\alpha)^2f(\alpha)\,d\alpha+4\pi\int_{\pi-\delta}^{\pi}\bigl(\pi-(\delta+\alpha)\bigr)f(\alpha)\,d\alpha$$

and $f(\alpha):=k^{-1}(\gamma)e^{-\frac{\gamma}{2}\alpha^2}$ is the density of $\overrightarrow{\mu^\star\theta_\gamma}$.

###### Proof.

Let us fix $\mu$ such that $\delta=\overrightarrow{\mu\mu^\star}\in[0,\pi)$; then

$$\overrightarrow{\mu\alpha}=\begin{cases}\overrightarrow{\mu\mu^\star}+\overrightarrow{\mu^\star\alpha}&\text{when }\alpha\in(\mu^\star-\pi,\;\mu^\star+\pi-(\mu^\star-\mu)),\\\overrightarrow{\mu\mu^\star}+\overrightarrow{\mu^\star\alpha}-2\pi&\text{when }\alpha\in(\mu^\star+\pi-(\mu^\star-\mu),\;\mu^\star+\pi).\end{cases}$$

Denoting $\alpha=\overrightarrow{\mu^\star\theta_\gamma}$ (with density $f$), this expansion allows us to derive

$$\begin{aligned}V(\mu,\gamma)&=E[d_G(\mu,\theta_\gamma)^2]\\&=\int_{-\pi}^{\pi-\delta}(\delta+\alpha)^2f(\alpha)\,d\alpha+\int_{\pi-\delta}^{\pi}(2\pi-\delta-\alpha)^2f(\alpha)\,d\alpha\\&=\int_{-\pi}^{\pi}(\delta+\alpha)^2f(\alpha)\,d\alpha+4\pi\int_{\pi-\delta}^{\pi}\bigl(\pi-(\delta+\alpha)\bigr)f(\alpha)\,d\alpha=g(\delta).\end{aligned}$$

Now, let $\mu$ be such that $\delta=\overrightarrow{\mu\mu^\star}\in(-\pi,0]$; then

$$\overrightarrow{\mu\alpha}=\begin{cases}\overrightarrow{\mu\mu^\star}+\overrightarrow{\mu^\star\alpha}&\text{when }\alpha\in(\mu^\star-\pi+(\mu-\mu^\star),\;\mu^\star+\pi),\\\overrightarrow{\mu\mu^\star}+\overrightarrow{\mu^\star\alpha}+2\pi&\text{when }\alpha\in(\mu^\star-\pi,\;\mu^\star-\pi+(\mu-\mu^\star)),\end{cases}$$

$$\begin{aligned}V(\mu,\gamma)&=\int_{-\pi-\delta}^{\pi}(\delta+\alpha)^2f(\alpha)\,d\alpha+\int_{-\pi}^{-\pi-\delta}(2\pi+\delta+\alpha)^2f(\alpha)\,d\alpha\\&=\int_{-\pi}^{\pi}(\delta+\alpha)^2f(\alpha)\,d\alpha+4\pi\int_{-\pi}^{-\pi-\delta}\bigl(\pi+(\delta+\alpha)\bigr)f(\alpha)\,d\alpha\\&=\int_{-\pi}^{\pi}(\delta+\alpha)^2f(\alpha)\,d\alpha+4\pi\int_{\pi+\delta}^{\pi}\bigl(\pi+\delta-\alpha\bigr)f(\alpha)\,d\alpha=g(-\delta).\end{aligned}$$

Obviously, $V(\mu^\star,\gamma)$ corresponds to the intrinsic variance of $\theta_\gamma$. From Table 1, this quantity does not depend on $\mu^\star$ and will therefore be denoted simply by $V(\gamma)$. As it is used in Theorem 4, let us recall here the expression of the latter quantity:

$$V(\gamma)=V(\mu^\star,\gamma):=\frac{1}{\gamma}\left(1-2\pi k^{-1}(\gamma)e^{-\frac{\gamma\pi^2}{2}}\right)\quad\text{with }k(\gamma)=\sqrt{\frac{2\pi}{\gamma}}\,\operatorname{erf}\!\left(\pi\sqrt{\gamma/2}\right).\tag{5}$$
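Expression (5) can be verified against a direct numerical evaluation of $E[d_G(\mu^\star,\theta_\gamma)^2]$; a small sketch (our own helper names):

```python
import math

def k_gn(gamma):
    return math.sqrt(2.0 * math.pi / gamma) * math.erf(math.pi * math.sqrt(gamma / 2.0))

def V_closed(gamma):
    # eq. (5): V(gamma) = (1/gamma) * (1 - 2*pi*k(gamma)^(-1)*exp(-gamma*pi^2/2))
    return (1.0 - 2.0 * math.pi * math.exp(-0.5 * gamma * math.pi ** 2) / k_gn(gamma)) / gamma

def V_num(gamma, n=200_000):
    # E[d_G(mu, theta)^2] = int_{-pi}^{pi} t^2 exp(-gamma*t^2/2) dt / k(gamma)
    h = 2.0 * math.pi / n
    s = sum(t * t * math.exp(-0.5 * gamma * t * t)
            for t in (-math.pi + (i + 0.5) * h for i in range(n)))
    return h * s / k_gn(gamma)
```

One can also check numerically that $V$ is decreasing and that $V(\gamma)\to\pi^2/3$ (the intrinsic variance of the uniform distribution) as $\gamma\to 0$.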

### 5.2 Maximum Likelihood Estimate

The log-likelihood for an i.i.d. sample $\theta_1,\dots,\theta_n$ from a $gN(\mu,\gamma)$ distribution is given by:

$$\ell(\mu,\gamma)=-n\log(k(\gamma))-\frac{\gamma}{2}\sum_{i=1}^n d_G(\mu,\theta_i)^2.$$

Let $\Theta:=[0,2\pi)\times(0,+\infty)$ and assume that the true parameter $(\mu^\star,\gamma^\star)$ belongs to the interior of $\Theta$. The MLE estimates and asymptotic results are given by the following result.

###### Theorem 4.

The MLE estimate of $\mu^\star$ corresponds to the intrinsic sample mean set, that is $\hat\mu_{MLE}=\hat\mu_I$. The MLE estimate of $\gamma^\star$ is uniquely given by $\hat\gamma_{MLE}=V^{-1}(\hat\sigma_I^2)$, where $V$ is the function defined by (5) and where $\hat\sigma_I^2$ is the intrinsic sample variance.
As $n\to+\infty$, $(\hat\mu_{MLE},\hat\gamma_{MLE})$ is a strongly consistent estimate of $(\mu^\star,\gamma^\star)$.
As $n\to+\infty$, the MLE estimates satisfy the following central limit theorem

$$\sqrt{n}\,(\hat\mu_{MLE}-\mu^\star,\;\hat\gamma_{MLE}-\gamma^\star)^T\stackrel{d}{\to}\mathcal{N}\left(0,J^{-1}(\gamma^\star)\right),$$

where $J(\gamma^\star)$ is the Fisher information matrix given by $J(\gamma^\star)=\operatorname{diag}(J_1(\gamma^\star),J_2(\gamma^\star))$ with $J_1(\gamma^\star)=(\gamma^\star)^2V(\gamma^\star)$ and $J_2(\gamma^\star)=-V'(\gamma^\star)/2$.

We emphasize that we do not derive an analytic formula for $\hat\gamma_{MLE}$, but we prove its uniqueness by showing that the function $V$ is strictly decreasing (which is illustrated by Figure 1). From a practical point of view, the computation of $\hat\gamma_{MLE}$ (as well as that of $\hat\mu_{MLE}$) has been derived using a simple optimization algorithm.
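One simple way to compute $\hat\gamma_{MLE}=V^{-1}(\hat\sigma_I^2)$ numerically is bisection, which is guaranteed to work since $V$ is strictly decreasing; a sketch (the bracketing bounds and iteration budget are our own choices, not prescribed by the paper):

```python
import math

def V(gamma):
    # eq. (5)
    k = math.sqrt(2.0 * math.pi / gamma) * math.erf(math.pi * math.sqrt(gamma / 2.0))
    return (1.0 - 2.0 * math.pi * math.exp(-0.5 * gamma * math.pi ** 2) / k) / gamma

def gamma_mle(sigma2_I, lo=1e-9, hi=1e9):
    # V is strictly decreasing, so V(gamma) = sigma2_I has a unique root in [lo, hi];
    # bisect on a log scale since gamma may span many orders of magnitude
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if V(mid) > sigma2_I:
            lo = mid   # variance still too large -> increase the concentration
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Feeding the sample intrinsic variance $\hat\sigma_I^2$ to `gamma_mle` returns the unique solution of $V(\hat\gamma_{MLE})=\hat\sigma_I^2$.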

As Mardia and Jupp did for the von Mises distribution ([MJ00], Section 5.3, p. 86), $\hat\mu_{MLE}$ is regarded as unwrapped onto the line for the asymptotic normality result.

As is the case for a Gaussian distribution on the real line or for a $vM$ distribution on the circle, the Fisher information matrix of a $gN$ distribution does not depend on the true location parameter, and the two estimates of $\mu^\star$ and $\gamma^\star$ are asymptotically independent. Let us also note that the geodesic moment estimates, that is the estimates of $\mu^\star$ and $\gamma^\star$ based on the first two geodesic moment equations, coincide exactly with the maximum likelihood estimates. Here again is another analogy with the $vM$ distribution, since the MLE of a $vM$ distribution corresponds to the estimates of $\mu$ and $\kappa$ based on the extrinsic moments (see [MJ00] for further details).

###### Proof.

Since the minimum of $\mu\mapsto\frac{1}{n}\sum_{i=1}^n d_G(\mu,\theta_i)^2$ defines the intrinsic sample mean set, the MLE of $\mu^\star$ corresponds to the intrinsic sample mean set of $\theta_1,\dots,\theta_n$. Now, the partial derivative of $\ell$ with respect to $\gamma$ is given by

$$\frac{\partial\ell}{\partial\gamma}(\mu,\gamma)=-n\,\frac{k'(\gamma)}{k(\gamma)}-\frac{1}{2}\sum_{i=1}^n d_G(\mu,\theta_i)^2.$$

Let us note that

$$k'(\gamma)=-\frac{1}{2}\int_{-\pi}^{\pi}\theta^2e^{-\frac{\gamma}{2}\theta^2}\,d\theta=-\frac{k(\gamma)}{2}E\left[d_G(\mu^\star,\theta_\gamma)^2\right]=-\frac{k(\gamma)}{2}V(\gamma).\tag{6}$$

Replacing $\mu$ by its MLE estimate and setting the derivative of $\ell$ w.r.t. $\gamma$ equal to zero implies that the MLE estimate of $\gamma^\star$ is defined by the following equation:

$$V(\hat\gamma_{MLE})=\frac{1}{n}\sum_{i=1}^n d_G(\hat\mu_{MLE},\theta_i)^2=:\hat\sigma_I^2.$$

The proof of uniqueness is ended by showing that $V$ is a strictly decreasing function on $(0,+\infty)$. Similarly to (6), we notice that $k''(\gamma)=\frac{k(\gamma)}{4}E\left[d_G(\mu^\star,\theta_\gamma)^4\right]$. Now,

$$V'(\gamma)=-2\left(\frac{k''(\gamma)}{k(\gamma)}-\left(\frac{k'(\gamma)}{k(\gamma)}\right)^2\right)=-2\left(\frac{1}{4}E\left[d_G(\mu^\star,\theta_\gamma)^4\right]-\left(\frac{1}{2}E\left[d_G(\mu^\star,\theta_\gamma)^2\right]\right)^2\right)=-\frac{1}{2}\operatorname{Var}\left[d_G(\mu^\star,\theta_\gamma)^2\right]<0.\tag{7}$$

As $\mu^\star$ corresponds to the unique intrinsic mean of a $gN(\mu^\star,\gamma^\star)$ distribution, the strong consistency of the intrinsic sample mean $\hat\mu_{MLE}$ is derived from e.g. Theorem 2.3 of [BP03]. Now, for the consistency of $\hat\gamma_{MLE}$, let us introduce the variable $M_n:=\frac{1}{n}\sum_{i=1}^n d_G(\mu^\star,\theta_i)^2$. From the LLN, $M_n$ converges almost surely towards $V(\gamma^\star)$. Moreover, we may prove that $|M_n-\hat\sigma_I^2|$ tends to zero (almost surely). The latter follows from

$$\begin{aligned}\left|M_n-\hat\sigma_I^2\right|&=\left|\frac{1}{n}\sum_{i=1}^n\left(d_G(\hat\mu_{MLE},\theta_i)^2-d_G(\mu^\star,\theta_i)^2\right)\right|\\&\leq\frac{4\pi}{n}\sum_{i=1}^n\left|d_G(\theta_i,\hat\mu_{MLE})-d_G(\theta_i,\mu^\star)\right|\\&\leq\frac{4\pi}{n}\sum_{i=1}^n d_G(\hat\mu_{MLE},\mu^\star)=4\pi\,d_G(\hat\mu_{MLE},\mu^\star).\end{aligned}$$

Combining the previous convergences leads to the almost sure convergence of $\hat\sigma_I^2$ to $V(\gamma^\star)$, and then to the result, since $V$ is continuous and invertible on $(0,+\infty)$.

Standard theory of maximum likelihood estimators (Theorem 5.1, p. 463 of [LC98]) shows that the asymptotic normality result holds. The verification of the assumptions (A-D) of [LC98], pp. 462-463, is omitted; we just focus here on the computation of the Fisher information matrix. The antidiagonal term is given by

$$J_{12}:=\frac{1}{2}E\left[\frac{\partial}{\partial\mu}d_G(\mu,\theta_\gamma)^2\right]\bigg|_{\mu=\mu^\star,\gamma=\gamma^\star}=\frac{1}{2}\frac{\partial V}{\partial\mu}(\mu^\star,\gamma^\star)=0,$$

since $\mu^\star$ corresponds to the intrinsic mean and thus minimizes the geodesic variance. The asymptotic variance of $\hat\gamma_{MLE}$ is given by the inverse of

$$J_2(\gamma^\star)=\frac{k''(\gamma^\star)}{k(\gamma^\star)}-\frac{k'(\gamma^\star)^2}{k(\gamma^\star)^2}.$$

Recall from (7) that this constant is positive. Now, the last term to compute is the asymptotic variance of $\hat\mu_{MLE}$, given by the inverse of

$$J_1(\gamma^\star):=\frac{\gamma^\star}{2}E\left[\frac{\partial^2}{\partial\mu^2}d_G(\mu,\theta_\gamma)^2\right]\bigg|_{\mu=\mu^\star,\gamma=\gamma^\star}=\frac{\gamma^\star}{2}\frac{\partial^2V}{\partial\mu^2}(\mu^\star,\gamma^\star).$$

From Lemma 3, $V(\cdot,\gamma^\star)$ is a function of $\delta$. Without loss of generality, assume $\delta\in[0,\pi)$ (the other case leads to the same conclusion); then the function $g$ (in Lemma 3) is twice continuously differentiable on $[0,\pi)$ and $g''(\delta)=2-4\pi f(\pi-\delta)$. Setting $\delta=0$ in the last equation leads to the stated result. ∎

### 5.3 Simulation study

We have investigated the efficiency of the maximum likelihood estimates in a simulation study; a part of the results is presented in Table 2. As expected, the empirical MSE of the estimates of both parameters $\mu^\star$ and $\gamma^\star$ converges towards zero as the sample size grows. We also notice that it is harder to estimate the intrinsic mean when the concentration parameter is low. Conversely, the concentration parameter is better estimated for low values of $\gamma^\star$. These facts are confirmed by Figure 3, which shows the constants of the asymptotic variances of both estimates, i.e. $J_1^{-1}(\gamma)$ and $J_2^{-1}(\gamma)$, in terms of $\gamma$. Figure 4 illustrates the central limit theorem satisfied by the MLE estimates.