# The affinely invariant distance correlation

[1] Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany

[2] Heidelberg Institute for Theoretical Studies and Karlsruhe Institute of Technology, HITS gGmbH, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany

[3] Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA
Received April 2013; revised August 2013.

###### Abstract

Székely, Rizzo and Bakirov (Ann. Statist. 35 (2007) 2769–2794) and Székely and Rizzo (Ann. Appl. Statist. 3 (2009) 1236–1265), in two seminal papers, introduced the powerful concept of distance correlation as a measure of dependence between sets of random variables. We study in this paper an affinely invariant version of the distance correlation and an empirical version of that distance correlation, and we establish the consistency of the empirical quantity. In the case of subvectors of a multivariate normally distributed random vector, we provide exact expressions for the affinely invariant distance correlation in both finite-dimensional and asymptotic settings, and in the finite-dimensional case we find that the affinely invariant distance correlation is a function of the canonical correlation coefficients. To illustrate our results, we consider time series of wind vectors at the Stateline wind energy center in Oregon and Washington, and we derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations.


Bernoulli 20(4), 2014, 2305–2330. DOI: 10.3150/13-BEJ558


Johannes Dueck [1], Dominic Edelmann [1], Tilmann Gneiting [2] and Donald Richards [3] (corresponding author, richards@stat.psu.edu)

Keywords: affine invariance; distance correlation; distance covariance; hypergeometric function of matrix argument; multivariate independence; multivariate normal distribution; vector time series; wind forecasting; zonal polynomial

## 1 Introduction

Székely, Rizzo and Bakirov [SzeRizBak07] and Székely and Rizzo [SzeRiz09], in two seminal papers, introduced the distance covariance and distance correlation as powerful measures of dependence. Contrary to the classical Pearson correlation coefficient, the population distance covariance vanishes only in the case of independence, and it applies to random vectors of arbitrary dimensions, rather than to univariate quantities only.

As noted by Newton [New09], the “distance covariance not only provides a bona fide dependence measure, but it does so with a simplicity to satisfy Don Geman’s elevator test (i.e., a method must be sufficiently simple that it can be explained to a colleague in the time it takes to go between floors on an elevator).” In the case of the sample distance covariance, find the pairwise distances between the sample values for the first variable, and center the resulting distance matrix; then do the same for the second variable. The square of the sample distance covariance equals the average entry in the componentwise or Schur product of the two centered distance matrices. Given the theoretical appeal of the population quantity, and the striking simplicity of the sample version, it is not surprising that the distance covariance is experiencing a wealth of applications, despite having been introduced merely half a decade ago.

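Specifically, let $p$ and $q$ be positive integers. For column vectors $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$, denote by $|s|_p$ and $|t|_q$ the standard Euclidean norms on the corresponding spaces; thus, if $s = (s_1, \ldots, s_p)'$ then

$$|s|_p = (s_1^2 + \cdots + s_p^2)^{1/2},$$

and similarly for $|t|_q$. For vectors $u$ and $v$ of the same dimension, $p$, we let $\langle u, v \rangle_p$ be the standard Euclidean scalar product of $u$ and $v$. For jointly distributed random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, let

$$f_{X,Y}(s,t) = E \exp\bigl[i \langle s, X \rangle_p + i \langle t, Y \rangle_q\bigr]$$

be the joint characteristic function of $(X,Y)$, and let $f_X(s) = f_{X,Y}(s,0)$ and $f_Y(t) = f_{X,Y}(0,t)$ be the marginal characteristic functions of $X$ and $Y$, where $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$. Székely et al. [SzeRizBak07] introduced the distance covariance between $X$ and $Y$ as the nonnegative number $\mathcal{V}(X,Y)$ defined by

$$\mathcal{V}^2(X,Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}(s,t) - f_X(s) f_Y(t)|^2}{|s|_p^{p+1} \, |t|_q^{q+1}} \, ds \, dt, \tag{1}$$

where $|z|$ denotes the modulus of $z \in \mathbb{C}$ and

$$c_p = \frac{\pi^{(1/2)(p+1)}}{\Gamma((1/2)(p+1))}. \tag{2}$$

The distance correlation between $X$ and $Y$ is the nonnegative number defined by

$$\mathcal{R}(X,Y) = \frac{\mathcal{V}(X,Y)}{\sqrt{\mathcal{V}(X,X) \, \mathcal{V}(Y,Y)}} \tag{3}$$

if both $\mathcal{V}(X,X)$ and $\mathcal{V}(Y,Y)$ are strictly positive, and defined to be zero otherwise. For distributions with finite first moments, the distance correlation characterizes independence in that $0 \le \mathcal{R}(X,Y) \le 1$ with $\mathcal{R}(X,Y) = 0$ if and only if $X$ and $Y$ are independent.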

A crucial property of the distance correlation is that it is invariant under transformations of the form

$$(X,Y) \longmapsto (a_1 + b_1 C_1 X, \; a_2 + b_2 C_2 Y), \tag{4}$$

where $a_1 \in \mathbb{R}^p$ and $a_2 \in \mathbb{R}^q$, $b_1$ and $b_2$ are nonzero real numbers, and the matrices $C_1 \in \mathbb{R}^{p \times p}$ and $C_2 \in \mathbb{R}^{q \times q}$ are orthogonal. However, the distance correlation fails to be invariant under the group of all invertible affine transformations of $(X,Y)$, which led Székely et al. [SzeRizBak07], pages 2784–2785, and Székely and Rizzo [SzeRiz09], pages 1252–1253, to propose an affinely invariant sample version of the distance correlation.

Adapting this proposal to the population setting, the affinely invariant distance covariance between distributions $X$ and $Y$ with finite second moments and nonsingular population covariance matrices $\Sigma_X$ and $\Sigma_Y$, respectively, can be introduced as the nonnegative number $\widetilde{\mathcal{V}}(X,Y)$ defined by

$$\widetilde{\mathcal{V}}^2(X,Y) = \mathcal{V}^2\bigl(\Sigma_X^{-1/2} X, \; \Sigma_Y^{-1/2} Y\bigr). \tag{5}$$

The affinely invariant distance correlation between $X$ and $Y$ is the nonnegative number defined by

$$\widetilde{\mathcal{R}}(X,Y) = \frac{\widetilde{\mathcal{V}}(X,Y)}{\sqrt{\widetilde{\mathcal{V}}(X,X) \, \widetilde{\mathcal{V}}(Y,Y)}} \tag{6}$$

if both $\widetilde{\mathcal{V}}(X,X)$ and $\widetilde{\mathcal{V}}(Y,Y)$ are strictly positive, and defined to be zero otherwise. In the sample versions proposed by Székely et al. [SzeRizBak07], the population quantities are replaced by their natural estimators. Clearly, the population affinely invariant distance correlation and its sample version are invariant under the group of invertible affine transformations, and in addition to satisfying this often-desirable group invariance property (Eaton [Eat89]), they inherit the desirable properties of the standard distance dependence measures. In particular, $0 \le \widetilde{\mathcal{R}}(X,Y) \le 1$ and, for populations with finite second moments and positive definite covariance matrices, $\widetilde{\mathcal{R}}(X,Y) = 0$ if and only if $X$ and $Y$ are independent.

The remainder of the paper is organized as follows. In Section 2, we review the sample version of the affinely invariant distance correlation introduced by Székely et al. [SzeRizBak07], and we prove that the sample version is strongly consistent. In Section 3, we provide exact expressions for the affinely invariant distance correlation in the case of subvectors from a multivariate normal population of arbitrary dimension, thereby generalizing a result of Székely et al. [SzeRizBak07] in the bivariate case; our result is non-trivial, being derived using the theory of zonal polynomials and the hypergeometric functions of matrix argument, and it enables the explicit and efficient calculation of the affinely invariant distance correlation in the multivariate normal case.

In Section 4, we study the behavior of the affinely invariant distance measures for subvectors of multivariate normal populations in limiting cases as the Frobenius norm of the cross-covariance matrix converges to zero, or as the dimensions of the subvectors converge to infinity. We expect that these results will motivate and provide the theoretical basis for many applications of distance correlation measures for high-dimensional data.

As an illustration of our results, Section 5 considers time series of wind vectors at the Stateline wind energy center in Oregon and Washington; we shall derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations. Finally, we provide in Section 6 a discussion in which we make a case for the use of the distance correlation and the affinely invariant distance correlation, which we believe to be appealing and powerful multivariate measures of dependence.

## 2 The sample version of the affinely invariant distance correlation

In this section, which is written primarily to introduce readers to distance correlation measures, we describe sample versions of the affinely invariant distance covariance and distance correlation as introduced by Székely et al. [SzeRizBak07], pages 2784–2785, and Székely and Rizzo [SzeRiz09], pages 1252–1253.

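First, we review the sample versions of the standard distance covariance and distance correlation. Given a random sample $(X_1,Y_1), \ldots, (X_n,Y_n)$ from jointly distributed random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, we set

$$\boldsymbol{X} = [X_1, \ldots, X_n] \in \mathbb{R}^{p \times n} \quad \text{and} \quad \boldsymbol{Y} = [Y_1, \ldots, Y_n] \in \mathbb{R}^{q \times n}.$$

A natural way of introducing a sample version of the distance covariance is to let

$$f^n_{X,Y}(s,t) = \frac{1}{n} \sum_{j=1}^n \exp\bigl[i \langle s, X_j \rangle_p + i \langle t, Y_j \rangle_q\bigr]$$

be the corresponding empirical characteristic function, and to write $f^n_X(s) = f^n_{X,Y}(s,0)$ and $f^n_Y(t) = f^n_{X,Y}(0,t)$ for the respective marginal empirical characteristic functions. The sample distance covariance then is the nonnegative number $\mathcal{V}_n(\boldsymbol{X},\boldsymbol{Y})$ defined by

$$\mathcal{V}_n^2(\boldsymbol{X},\boldsymbol{Y}) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|f^n_{X,Y}(s,t) - f^n_X(s) f^n_Y(t)|^2}{|s|_p^{p+1} \, |t|_q^{q+1}} \, ds \, dt,$$

where $c_p$ is the constant given in (2).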

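Székely et al. [SzeRizBak07], in a tour de force, showed that

$$\mathcal{V}_n^2(\boldsymbol{X},\boldsymbol{Y}) = \frac{1}{n^2} \sum_{k,l=1}^n A_{kl} B_{kl}, \tag{7}$$

where

$$a_{kl} = |X_k - X_l|_p, \qquad \bar{a}_{k\cdot} = \frac{1}{n} \sum_{l=1}^n a_{kl}, \qquad \bar{a}_{\cdot l} = \frac{1}{n} \sum_{k=1}^n a_{kl}, \qquad \bar{a}_{\cdot\cdot} = \frac{1}{n^2} \sum_{k,l=1}^n a_{kl}$$

and

$$A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot},$$

and similarly for $b_{kl} = |Y_k - Y_l|_q$, $\bar{b}_{k\cdot}$, $\bar{b}_{\cdot l}$, $\bar{b}_{\cdot\cdot}$, and $B_{kl}$, where $k, l = 1, \ldots, n$. Thus, the squared sample distance covariance equals the average entry in the componentwise or Schur product of the centered distance matrices for the two variables. The sample distance correlation then is defined by

$$\mathcal{R}_n(\boldsymbol{X},\boldsymbol{Y}) = \frac{\mathcal{V}_n(\boldsymbol{X},\boldsymbol{Y})}{\sqrt{\mathcal{V}_n(\boldsymbol{X},\boldsymbol{X}) \, \mathcal{V}_n(\boldsymbol{Y},\boldsymbol{Y})}} \tag{8}$$

if both $\mathcal{V}_n(\boldsymbol{X},\boldsymbol{X})$ and $\mathcal{V}_n(\boldsymbol{Y},\boldsymbol{Y})$ are strictly positive, and defined to be zero otherwise. Computer code for calculating these sample versions is available in an R package by Rizzo and Székely [autokey16].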
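The recipe in (7) and (8) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the R implementation cited above; the function names `centered_distances`, `dcov2` and `dcor` are hypothetical, and samples are assumed to be given as lists of equal-length tuples, one tuple per observation (rows rather than the columns of $\boldsymbol{X}$ and $\boldsymbol{Y}$).

```python
import math


def centered_distances(points):
    """Double-centered distance matrix A_kl = a_kl - abar_k. - abar_.l + abar_.. ."""
    n = len(points)
    a = [[math.dist(u, v) for v in points] for u in points]
    row = [sum(r) / n for r in a]   # abar_{k.}; equals abar_{.k} because a is symmetric
    grand = sum(row) / n            # abar_{..}
    return [[a[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]


def dcov2(xs, ys):
    """Squared sample distance covariance, as in (7)."""
    A, B = centered_distances(xs), centered_distances(ys)
    n = len(xs)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n**2


def dcor(xs, ys):
    """Sample distance correlation, as in (8); zero if a denominator term vanishes."""
    vxy, vxx, vyy = dcov2(xs, ys), dcov2(xs, xs), dcov2(ys, ys)
    if vxx <= 0 or vyy <= 0:
        return 0.0
    # R_n = V_n(X,Y) / sqrt(V_n(X,X) V_n(Y,Y)), written in terms of squared quantities;
    # max(..., 0) guards against tiny negative values caused by rounding.
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vxx * vyy))
```

The sketch costs $O(n^2)$ time and memory, which is adequate for illustration. Applying `dcor` to a sample and to a shifted, scaled and rotated copy of it returns 1 up to rounding, in line with the invariance property (4).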

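Now let $S_X$ and $S_Y$ denote the usual sample covariance matrices of the data $\boldsymbol{X}$ and $\boldsymbol{Y}$, respectively. Following Székely et al. [SzeRizBak07], page 2785, and Székely and Rizzo [SzeRiz09], page 1253, the sample affinely invariant distance covariance is the nonnegative number $\widetilde{\mathcal{V}}_n(\boldsymbol{X},\boldsymbol{Y})$ defined by

$$\widetilde{\mathcal{V}}_n^2(\boldsymbol{X},\boldsymbol{Y}) = \mathcal{V}_n^2\bigl(S_X^{-1/2} \boldsymbol{X}, \; S_Y^{-1/2} \boldsymbol{Y}\bigr) \tag{9}$$

if $S_X$ and $S_Y$ are positive definite, and defined to be zero otherwise. The sample affinely invariant distance correlation is defined by

$$\widetilde{\mathcal{R}}_n(\boldsymbol{X},\boldsymbol{Y}) = \frac{\widetilde{\mathcal{V}}_n(\boldsymbol{X},\boldsymbol{Y})}{\sqrt{\widetilde{\mathcal{V}}_n(\boldsymbol{X},\boldsymbol{X}) \, \widetilde{\mathcal{V}}_n(\boldsymbol{Y},\boldsymbol{Y})}} \tag{10}$$

if the quantities in the denominator are strictly positive, and defined to be zero otherwise. The sample affinely invariant distance correlation inherits the properties of the sample distance correlation; in particular

$$0 \le \widetilde{\mathcal{R}}_n(\boldsymbol{X},\boldsymbol{Y}) \le 1,$$

and $\widetilde{\mathcal{R}}_n(\boldsymbol{X},\boldsymbol{Y}) = 1$ implies that $p = q$, that the linear spaces spanned by $\boldsymbol{X}$ and $\boldsymbol{Y}$ have full rank, and that there exist a vector $a \in \mathbb{R}^p$, a nonzero number $b \in \mathbb{R}$, and an orthogonal matrix $C \in \mathbb{R}^{p \times p}$ such that $S_Y^{-1/2} \boldsymbol{Y} = a + b \, C S_X^{-1/2} \boldsymbol{X}$.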
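A standard-library sketch of the sample affinely invariant distance correlation in (9) and (10) follows. As a design shortcut that is an assumption of this sketch rather than the construction in the text, the data are whitened with the inverse Cholesky factor $L^{-1}$, where $S = LL'$, instead of the symmetric root $S^{-1/2}$. The two whitened samples differ by an orthogonal transformation, so all pairwise distances, and hence the distance covariance, agree. All function names are hypothetical, and samples are lists of tuples, one per observation.

```python
import math


def dcov2(xs, ys):
    """Squared sample distance covariance via double-centered distance matrices."""
    n = len(xs)

    def centered(pts):
        a = [[math.dist(u, v) for v in pts] for u in pts]
        row = [sum(r) / n for r in a]
        grand = sum(row) / n
        return [[a[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]

    A, B = centered(xs), centered(ys)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n**2


def sample_cov(pts):
    """Usual sample covariance matrix (divisor n - 1)."""
    n, d = len(pts), len(pts[0])
    mean = [sum(p[i] for p in pts) / n for i in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / (n - 1)
             for j in range(d)] for i in range(d)]


def cholesky(S):
    """Lower-triangular L with S = L L'; raises ValueError if a pivot is not positive."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:  # exact check; a tolerance may be preferable in practice
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def whiten(pts):
    """Map each point x to L^{-1} x by forward substitution (no centering needed)."""
    L = cholesky(sample_cov(pts))
    out = []
    for p in pts:
        z = []
        for i in range(len(p)):
            z.append((p[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        out.append(tuple(z))
    return out


def affine_dcor(xs, ys):
    """Sample affinely invariant distance correlation; zero in degenerate cases."""
    try:
        zx, zy = whiten(xs), whiten(ys)
    except ValueError:
        return 0.0
    vxy, vxx, vyy = dcov2(zx, zy), dcov2(zx, zx), dcov2(zy, zy)
    if vxx <= 0 or vyy <= 0:
        return 0.0
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vxx * vyy))
```

Replacing one sample by an invertible affine image of itself leaves `affine_dcor` unchanged up to rounding, and a sample paired with an invertible affine image of itself attains the value 1.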

Our next result shows that the sample affinely invariant distance correlation is a consistent estimator of the respective population quantity.

###### Theorem 2.1

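Let $(X,Y) \in \mathbb{R}^{p+q}$ be jointly distributed random vectors with positive definite marginal covariance matrices $\Sigma_X \in \mathbb{R}^{p \times p}$ and $\Sigma_Y \in \mathbb{R}^{q \times q}$, respectively. Suppose that $(X_1,Y_1), \ldots, (X_n,Y_n)$ is a random sample from $(X,Y)$, and let $\boldsymbol{X} = [X_1, \ldots, X_n]$ and $\boldsymbol{Y} = [Y_1, \ldots, Y_n]$. Also, let $\hat{\Sigma}_X$ and $\hat{\Sigma}_Y$ be strongly consistent estimators for $\Sigma_X$ and $\Sigma_Y$, respectively. Then

$$\mathcal{V}_n^2\bigl(\hat{\Sigma}_X^{-1/2} \boldsymbol{X}, \; \hat{\Sigma}_Y^{-1/2} \boldsymbol{Y}\bigr) \to \widetilde{\mathcal{V}}^2(X,Y),$$

almost surely, as $n \to \infty$. In particular, the sample affinely invariant distance correlation satisfies

$$\widetilde{\mathcal{R}}_n(\boldsymbol{X},\boldsymbol{Y}) \to \widetilde{\mathcal{R}}(X,Y), \tag{11}$$

almost surely.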

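###### Proof

As the covariance matrices $\Sigma_X$ and $\Sigma_Y$ are positive definite, we may assume that the strongly consistent estimators $\hat{\Sigma}_X$ and $\hat{\Sigma}_Y$ also are positive definite. Therefore, in order to prove the first statement it suffices to show that

$$\mathcal{V}_n^2\bigl(\hat{\Sigma}_X^{-1/2} \boldsymbol{X}, \; \hat{\Sigma}_Y^{-1/2} \boldsymbol{Y}\bigr) - \mathcal{V}_n^2\bigl(\Sigma_X^{-1/2} \boldsymbol{X}, \; \Sigma_Y^{-1/2} \boldsymbol{Y}\bigr) \to 0, \tag{12}$$

almost surely. By the decomposition of Székely et al. [SzeRizBak07], page 2776, equation (2.18), the left-hand side of (12) can be written as an average of terms of the form

$$\bigl|\hat{\Sigma}_X^{-1/2}(X_k - X_l)\bigr|_p \, \bigl|\hat{\Sigma}_Y^{-1/2}(Y_k - Y_m)\bigr|_q - \bigl|\Sigma_X^{-1/2}(X_k - X_l)\bigr|_p \, \bigl|\Sigma_Y^{-1/2}(Y_k - Y_m)\bigr|_q.$$

Using the identity

$$\bigl|\hat{\Sigma}_X^{-1/2}(X_k - X_l)\bigr|_p \, \bigl|\hat{\Sigma}_Y^{-1/2}(Y_k - Y_m)\bigr|_q = \bigl|\bigl(\hat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2} + \Sigma_X^{-1/2}\bigr)(X_k - X_l)\bigr|_p \, \bigl|\bigl(\hat{\Sigma}_Y^{-1/2} - \Sigma_Y^{-1/2} + \Sigma_Y^{-1/2}\bigr)(Y_k - Y_m)\bigr|_q$$

and the triangle inequality, we obtain

$$\begin{aligned}
&\bigl|\hat{\Sigma}_X^{-1/2}(X_k - X_l)\bigr|_p \, \bigl|\hat{\Sigma}_Y^{-1/2}(Y_k - Y_m)\bigr|_q - \bigl|\Sigma_X^{-1/2}(X_k - X_l)\bigr|_p \, \bigl|\Sigma_Y^{-1/2}(Y_k - Y_m)\bigr|_q \\
&\qquad \le \bigl\|\hat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2}\bigr\| \, \bigl\|\hat{\Sigma}_Y^{-1/2} - \Sigma_Y^{-1/2}\bigr\| \, |X_k - X_l|_p \, |Y_k - Y_m|_q \\
&\qquad\quad + \bigl\|\hat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2}\bigr\| \, |X_k - X_l|_p \, \bigl|\Sigma_Y^{-1/2}(Y_k - Y_m)\bigr|_q \\
&\qquad\quad + \bigl\|\hat{\Sigma}_Y^{-1/2} - \Sigma_Y^{-1/2}\bigr\| \, \bigl|\Sigma_X^{-1/2}(X_k - X_l)\bigr|_p \, |Y_k - Y_m|_q,
\end{aligned}$$

where the matrix norm $\|\Lambda\|$ is the largest eigenvalue of $\Lambda$ in absolute value. Now we can separate the three sums in the decomposition of Székely et al. [SzeRizBak07], page 2776, equation (2.18), and place the factors like $\|\hat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2}\|$ in front of the sums, since they appear in every summand. Then, $\|\hat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2}\|$ and $\|\hat{\Sigma}_Y^{-1/2} - \Sigma_Y^{-1/2}\|$ tend to zero and the remaining averages converge to constants (representing some distance correlation components) almost surely as $n \to \infty$, and this completes the proof of the first statement. Finally, the property (11) of strong consistency of $\widetilde{\mathcal{R}}_n(\boldsymbol{X},\boldsymbol{Y})$ is obtained immediately upon setting $\hat{\Sigma}_X = S_X$ and $\hat{\Sigma}_Y = S_Y$.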

Székely et al. [SzeRizBak07], page 2783, proposed a test for independence that is based on the sample distance correlation. From their results, we see that the asymptotic properties of the test statistic are not affected by the transition from the standard distance correlation to the affinely invariant distance correlation. Hence, a completely analogous but different test can be stated in terms of the affinely invariant distance correlation. Noting the results of Kosorok [Kos09, Section 4; Kos13], we raise the possibility that the specific details can be devised in a judicious, data-dependent way so that the power of the test for independence increases when the transition is made to the affinely invariant distance correlation. Alternative multivariate tests for independence based on distances have recently been proposed by Heller et al. [HelHelGor13] and Székely and Rizzo [SzeRiz13].

## 3 The affinely invariant distance correlation for multivariate normal populations

We now consider the problem of calculating the affinely invariant distance correlation between the random vectors $X$ and $Y$ where $(X,Y) \sim \mathcal{N}_{p+q}(\mu, \Sigma)$, a multivariate normal distribution with mean vector $\mu \in \mathbb{R}^{p+q}$, covariance matrix $\Sigma \in \mathbb{R}^{(p+q) \times (p+q)}$, where $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ have nonsingular marginal covariance matrices $\Sigma_X \in \mathbb{R}^{p \times p}$ and $\Sigma_Y \in \mathbb{R}^{q \times q}$, respectively.

For the case in which $p = q = 1$, that is, the bivariate normal distribution, the problem was solved by Székely et al. [SzeRizBak07]. In that case, the formula for the affinely invariant distance correlation depends only on $\rho$, the correlation coefficient, and appears in terms of the functions $\sin^{-1} \rho$ and $(1 - \rho^2)^{1/2}$, both of which are well-known to be special cases of Gauss’ hypergeometric series. Therefore, it is natural to expect that the general case will involve generalizations of Gauss’ hypergeometric series, and Theorem 3.1 below demonstrates that such is indeed the case. To formulate this result, we need to recall the rudiments of the theory of zonal polynomials (Muirhead [Mui82], Chapter 7).

A partition $\kappa$ is a vector $\kappa = (k_1, \ldots, k_q)$ of nonnegative integers such that $k_1 \ge \cdots \ge k_q$. The integer $|\kappa| = k_1 + \cdots + k_q$ is called the weight of $\kappa$; and $\ell(\kappa)$, the length of $\kappa$, is the largest integer $j$ such that $k_j > 0$. The zonal polynomial $C_\kappa(\Lambda)$ is a polynomial mapping from the class of symmetric matrices $\Lambda \in \mathbb{R}^{q \times q}$ to the real line which satisfies several properties, the following of which are crucial for our results:

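(a) Let $O(q)$ denote the group of orthogonal matrices in $\mathbb{R}^{q \times q}$. Then

$$C_\kappa(K' \Lambda K) = C_\kappa(\Lambda) \tag{13}$$

for all $K \in O(q)$; thus, $C_\kappa(\Lambda)$ is a symmetric function of the eigenvalues of $\Lambda$.

(b) The polynomial $C_\kappa(\Lambda)$ is homogeneous of degree $|\kappa|$ in $\Lambda$: For any $\delta \in \mathbb{R}$,

$$C_\kappa(\delta \Lambda) = \delta^{|\kappa|} C_\kappa(\Lambda). \tag{14}$$

(c) If $\Lambda$ is of rank $r$, then $C_\kappa(\Lambda) = 0$ whenever $\ell(\kappa) > r$.

(d) For any nonnegative integer $k$,

$$\sum_{|\kappa| = k} C_\kappa(\Lambda) = (\operatorname{tr} \Lambda)^k. \tag{15}$$

(e) For any symmetric matrices $\Lambda_1, \Lambda_2 \in \mathbb{R}^{q \times q}$,

$$\int_{O(q)} C_\kappa(K' \Lambda_1 K \Lambda_2) \, dK = \frac{C_\kappa(\Lambda_1) \, C_\kappa(\Lambda_2)}{C_\kappa(I_q)}, \tag{16}$$

where $I_q$ denotes the $q \times q$ identity matrix and the integral is with respect to the Haar measure on $O(q)$, normalized to have total volume 1.

(f) Let $\lambda_1, \ldots, \lambda_q$ be the eigenvalues of $\Lambda$. Then, for a partition $(k)$ with one part,

$$C_{(k)}(\Lambda) = \frac{k!}{(1/2)_k} \sum_{i_1 + \cdots + i_q = k} \; \prod_{j=1}^q \frac{(1/2)_{i_j} \lambda_j^{i_j}}{i_j!}, \tag{17}$$

where the sum is over all nonnegative integers $i_1, \ldots, i_q$ such that $i_1 + \cdots + i_q = k$, and

$$(\alpha)_k = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)} = \alpha (\alpha + 1)(\alpha + 2) \cdots (\alpha + k - 1),$$

$\alpha \in \mathbb{C}$, is standard notation for the rising factorial. In particular, on setting $\lambda_j = 1$, $j = 1, \ldots, q$, we obtain from (17)

$$C_{(k)}(I_q) = \frac{((1/2)q)_k}{(1/2)_k} \tag{18}$$

(Muirhead [Mui82], page 237, equation (18), Gross and Richards [GroRic87], page 807, Lemma 6.8).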
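Formula (17) is easy to evaluate numerically, because the constrained sum over $i_1 + \cdots + i_q = k$ is a $q$-fold convolution of the sequences $(1/2)_i \lambda_j^i / i!$. The Python sketch below computes the one-part zonal polynomial from the eigenvalues in this way; the names `rising` and `zonal_one_part` are hypothetical, and the code is an illustration rather than a reference implementation.

```python
import math


def rising(alpha, k):
    """Rising factorial (alpha)_k = alpha (alpha + 1) ... (alpha + k - 1)."""
    out = 1.0
    for i in range(k):
        out *= alpha + i
    return out


def zonal_one_part(eigenvalues, k):
    """C_(k)(Lambda) computed from the eigenvalues of Lambda, following (17)."""
    # coeffs[m] holds the sum over i_1 + ... + i_j = m of prod (1/2)_i lam^i / i!
    # for the eigenvalues processed so far (a truncated polynomial product).
    coeffs = [1.0] + [0.0] * k
    for lam in eigenvalues:
        factor = [rising(0.5, i) * lam**i / math.factorial(i) for i in range(k + 1)]
        coeffs = [sum(coeffs[m - i] * factor[i] for i in range(m + 1))
                  for m in range(k + 1)]
    return math.factorial(k) / rising(0.5, k) * coeffs[k]
```

With a single eigenvalue the function returns $\lambda^k$, with all eigenvalues equal to one it reproduces (18), and for $k = 1$ it returns the trace, consistent with (15).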

With these properties of the zonal polynomials, we are ready to state our key result which obtains an explicit formula for the affinely invariant distance covariance in the case of a Gaussian population of arbitrary dimension and arbitrary covariance matrix with positive definite marginal covariance matrices. This formula turns out to be a function depending only on the dimensions $p$ and $q$ and the eigenvalues of the matrix $\Lambda = \Sigma_Y^{-1/2} \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY} \Sigma_Y^{-1/2}$, that is, the squared canonical correlation coefficients of the subvectors $X$ and $Y$. For fixed dimensions this implies $\widetilde{\mathcal{R}}^2(X,Y) = g(\lambda_1, \ldots, \lambda_r)$, where $r = \min(p,q)$ and $\lambda_1, \ldots, \lambda_r$ are the canonical correlation coefficients of $X$ and $Y$. Due to the functional invariance, the maximum likelihood estimator (MLE) for the affinely invariant distance correlation in the Gaussian setting is hence defined by $g(\hat{\lambda}_1, \ldots, \hat{\lambda}_r)$, where $\hat{\lambda}_1, \ldots, \hat{\lambda}_r$ are the MLEs of the canonical correlation coefficients.

###### Theorem 3.1

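Suppose that $(X,Y) \sim \mathcal{N}_{p+q}(\mu, \Sigma)$, where

$$\Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}$$

with $\Sigma_X \in \mathbb{R}^{p \times p}$, $\Sigma_Y \in \mathbb{R}^{q \times q}$, and $\Sigma_{XY} \in \mathbb{R}^{p \times q}$. Then

$$\widetilde{\mathcal{V}}^2(X,Y) = 4\pi \, \frac{c_{p-1}}{c_p} \, \frac{c_{q-1}}{c_q} \sum_{k=1}^\infty \frac{2^{2k} - 2}{k! \, 2^{2k}} \, \frac{(1/2)_k \, (-1/2)_k \, (-1/2)_k}{((1/2)p)_k \, ((1/2)q)_k} \, C_{(k)}(\Lambda), \tag{19}$$

where

$$\Lambda = \Sigma_Y^{-1/2} \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY} \Sigma_Y^{-1/2} \in \mathbb{R}^{q \times q}. \tag{20}$$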
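
###### Proof

We may assume, with no loss of generality, that $\mu$ is the zero vector. Since $\Sigma_X$ and $\Sigma_Y$ both are positive definite, the inverse square-roots, $\Sigma_X^{-1/2}$ and $\Sigma_Y^{-1/2}$, exist.

By considering the standardized variables $\tilde{X} = \Sigma_X^{-1/2} X$ and $\tilde{Y} = \Sigma_Y^{-1/2} Y$, we may replace the covariance matrix $\Sigma$ by

$$\tilde{\Sigma} = \begin{pmatrix} I_p & \Lambda_{XY} \\ \Lambda_{XY}' & I_q \end{pmatrix},$$

where

$$\Lambda_{XY} = \Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1/2}. \tag{21}$$

Once we have made these reductions, it follows that the matrix $\Lambda$ in (20) can be written as $\Lambda = \Lambda_{XY}' \Lambda_{XY}$ and that it has norm less than or equal to $1$. Indeed, by the partial Iwasawa decomposition of $\tilde{\Sigma}$, viz., the identity,

$$\tilde{\Sigma} = \begin{pmatrix} I_p & 0 \\ \Lambda_{XY}' & I_q \end{pmatrix} \begin{pmatrix} I_p & 0 \\ 0 & I_q - \Lambda_{XY}' \Lambda_{XY} \end{pmatrix} \begin{pmatrix} I_p & \Lambda_{XY} \\ 0 & I_q \end{pmatrix},$$

where the zero matrix of any dimension is denoted by $0$, we see that the matrix $\tilde{\Sigma}$ is positive semidefinite if and only if $I_q - \Lambda_{XY}' \Lambda_{XY}$ is positive semidefinite. Hence, $\Lambda \le I_q$ in the Loewner ordering and therefore $\|\Lambda\| \le 1$.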

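We proceed to calculate the distance covariance $\mathcal{V}(\tilde{X}, \tilde{Y})$. It is well-known that the characteristic function of $(\tilde{X}, \tilde{Y})$ is

$$f_{\tilde{X},\tilde{Y}}(s,t) = \exp\Bigl[-\tfrac{1}{2} \begin{pmatrix} s \\ t \end{pmatrix}' \tilde{\Sigma} \begin{pmatrix} s \\ t \end{pmatrix}\Bigr] = \exp\bigl[-\tfrac{1}{2}\bigl(|s|_p^2 + |t|_q^2 + 2 s' \Lambda_{XY} t\bigr)\bigr],$$

where $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$. Therefore,

$$\bigl|f_{\tilde{X},\tilde{Y}}(s,t) - f_{\tilde{X}}(s) f_{\tilde{Y}}(t)\bigr|^2 = \bigl(1 - \exp(-s' \Lambda_{XY} t)\bigr)^2 \exp\bigl(-|s|_p^2 - |t|_q^2\bigr),$$

and hence

$$\begin{aligned}
c_p c_q \mathcal{V}^2(\tilde{X}, \tilde{Y}) &= \int_{\mathbb{R}^{p+q}} \bigl(1 - \exp(-s' \Lambda_{XY} t)\bigr)^2 \exp\bigl(-|s|_p^2 - |t|_q^2\bigr) \, \frac{ds}{|s|_p^{p+1}} \, \frac{dt}{|t|_q^{q+1}} \\
&= \int_{\mathbb{R}^{p+q}} \bigl(1 - \exp(s' \Lambda_{XY} t)\bigr)^2 \exp\bigl(-|s|_p^2 - |t|_q^2\bigr) \, \frac{ds}{|s|_p^{p+1}} \, \frac{dt}{|t|_q^{q+1}},
\end{aligned} \tag{22}$$

where the latter integral is obtained by making the change of variables $s \mapsto -s$ within the former integral.

By a Taylor series expansion, we obtain

$$\bigl(1 - \exp(s' \Lambda_{XY} t)\bigr)^2 = 1 - 2 \exp(s' \Lambda_{XY} t) + \exp(2 s' \Lambda_{XY} t) = \sum_{k=2}^\infty \frac{2^k - 2}{k!} (s' \Lambda_{XY} t)^k.$$

Substituting this series into (22) and interchanging summation and integration, a procedure which is straightforward to verify by means of Fubini’s theorem, and noting that the odd-order terms integrate to zero, we obtain

$$c_p c_q \mathcal{V}^2(\tilde{X}, \tilde{Y}) = \sum_{k=1}^\infty \frac{2^{2k} - 2}{(2k)!} \int_{\mathbb{R}^{p+q}} (s' \Lambda_{XY} t)^{2k} \exp\bigl(-|s|_p^2 - |t|_q^2\bigr) \, \frac{ds}{|s|_p^{p+1}} \, \frac{dt}{|t|_q^{q+1}}. \tag{23}$$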

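To calculate, for $k \ge 1$, the integral

$$\int_{\mathbb{R}^{p+q}} (s' \Lambda_{XY} t)^{2k} \exp\bigl(-|s|_p^2 - |t|_q^2\bigr) \, \frac{ds}{|s|_p^{p+1}} \, \frac{dt}{|t|_q^{q+1}}, \tag{24}$$

we change variables to polar coordinates, putting $s = r_x \theta$ and $t = r_y \phi$ where $r_x, r_y > 0$, $\theta = (\theta_1, \ldots, \theta_p)' \in S^{p-1}$, and $\phi = (\phi_1, \ldots, \phi_q)' \in S^{q-1}$. Then the integral (24) separates into a product of multiple integrals over $(r_x, r_y)$, and over $(\theta, \phi)$, respectively. The integrals over $r_x$ and $r_y$ are standard gamma integrals,

$$\int_0^\infty \int_0^\infty r_x^{2k-2} \, r_y^{2k-2} \exp\bigl(-r_x^2 - r_y^2\bigr) \, dr_x \, dr_y = \tfrac{1}{4} \bigl[\Gamma\bigl(k - \tfrac{1}{2}\bigr)\bigr]^2, \tag{25}$$

and the remaining factor is the integral

$$\int_{S^{q-1}} \int_{S^{p-1}} (\theta' \Lambda_{XY} \phi)^{2k} \, d\theta \, d\phi, \tag{26}$$

where $d\theta$ and $d\phi$ are unnormalized surface measures on $S^{p-1}$ and $S^{q-1}$, respectively. By a standard invariance argument,

$$\int_{S^{p-1}} (\theta' v)^{2k} \, d\theta = |v|_p^{2k} \int_{S^{p-1}} \theta_1^{2k} \, d\theta,$$

$v \in \mathbb{R}^p$. Setting $v = \Lambda_{XY} \phi$ and applying some well-known properties of the surface measure $d\theta$, we obtain

$$\begin{aligned}
\int_{S^{p-1}} (\theta' \Lambda_{XY} \phi)^{2k} \, d\theta &= |\Lambda_{XY} \phi|_p^{2k} \int_{S^{p-1}} \theta_1^{2k} \, d\theta \\
&= 2 c_{p-1} \, \frac{\Gamma(k + \frac{1}{2}) \, \Gamma(\frac{1}{2} p)}{\Gamma(k + \frac{1}{2} p) \, \Gamma(\frac{1}{2})} \, (\phi' \Lambda \phi)^k.
\end{aligned}$$

Therefore, in order to evaluate (26), it remains to evaluate

$$J_k(\Lambda) = \int_{S^{q-1}} (\phi' \Lambda \phi)^k \, d\phi.$$

Since the surface measure is invariant under transformation $\phi \mapsto K \phi$, $K \in O(q)$, it follows that $J_k(\Lambda) = J_k(K' \Lambda K)$ for all $K \in O(q)$. Integrating with respect to the normalized Haar measure on the orthogonal group, we conclude that

$$J_k(\Lambda) = \int_{O(q)} J_k(K' \Lambda K) \, dK = \int_{S^{q-1}} \int_{O(q)} (\phi' K' \Lambda K \phi)^k \, dK \, d\phi. \tag{27}$$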

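We now use the properties of the zonal polynomials. By (15),

$$(\phi' K' \Lambda K \phi)^k = (\operatorname{tr} K' \Lambda K \phi \phi')^k = \sum_{|\kappa| = k} C_\kappa(K' \Lambda K \phi \phi');$$

therefore, by (16),

$$\int_{O(q)} (\phi' K' \Lambda K \phi)^k \, dK = \sum_{|\kappa| = k} \int_{O(q)} C_\kappa(K' \Lambda K \phi \phi') \, dK = \sum_{|\kappa| = k} \frac{C_\kappa(\Lambda) \, C_\kappa(\phi \phi')}{C_\kappa(I_q)}.$$

Since $\phi \phi'$ is of rank $1$ then, by property (c), $C_\kappa(\phi \phi') = 0$ if $\ell(\kappa) > 1$; it now follows, by (15) and the fact that $\phi \in S^{q-1}$, that

$$C_{(k)}(\phi \phi') = \sum_{|\kappa| = k} C_\kappa(\phi \phi') = (\operatorname{tr} \phi \phi')^k = (\phi' \phi)^k = |\phi|_q^{2k} = 1.$$

Therefore,

$$\int_{O(q)} (\phi' K' \Lambda K \phi)^k \, dK = \frac{C_{(k)}(\Lambda)}{C_{(k)}(I_q)} = \frac{(1/2)_k}{((1/2)q)_k} \, C_{(k)}(\Lambda),$$

where the last equality follows by (18). Substituting this result at (27), we obtain

$$J_k(\Lambda) = 2 c_{q-1} \, \frac{(1/2)_k}{((1/2)q)_k} \, C_{(k)}(\Lambda).$$

Collecting together these results, and using the well-known identity $(2k)! = k! \, 2^{2k} \, (1/2)_k$, we obtain the representation (19), as desired.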
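For readers who wish to see the final assembly written out, the following sketch combines the pieces of the proof. It relies on three standard gamma-function identities that are not stated in the text and are assumed here: $\Gamma(k - \frac{1}{2}) = \Gamma(-\frac{1}{2}) \, (-\frac{1}{2})_k$ with $\Gamma(-\frac{1}{2}) = -2\pi^{1/2}$, $\Gamma(k + \frac{1}{2}) / \Gamma(\frac{1}{2}) = (\frac{1}{2})_k$, and $\Gamma(k + \frac{1}{2}p) / \Gamma(\frac{1}{2}p) = (\frac{1}{2}p)_k$. The radial integrals contribute $\frac{1}{4}[\Gamma(k - \frac{1}{2})]^2 = \pi \, [(-\frac{1}{2})_k]^2$, the integral over $\theta$ contributes $2 c_{p-1} (\frac{1}{2})_k / (\frac{1}{2}p)_k$, and the integral over $\phi$ contributes $2 c_{q-1} (\frac{1}{2})_k \, C_{(k)}(\Lambda) / (\frac{1}{2}q)_k$, so that

$$\begin{aligned}
c_p c_q \mathcal{V}^2(\tilde{X}, \tilde{Y}) &= \sum_{k=1}^\infty \frac{2^{2k} - 2}{(2k)!} \cdot \pi \bigl[(-\tfrac{1}{2})_k\bigr]^2 \cdot 2 c_{p-1} \frac{(\frac{1}{2})_k}{(\frac{1}{2}p)_k} \cdot 2 c_{q-1} \frac{(\frac{1}{2})_k}{(\frac{1}{2}q)_k} \, C_{(k)}(\Lambda) \\
&= 4\pi \, c_{p-1} c_{q-1} \sum_{k=1}^\infty \frac{2^{2k} - 2}{(2k)!} \, \frac{\bigl[(\frac{1}{2})_k\bigr]^2 \bigl[(-\frac{1}{2})_k\bigr]^2}{(\frac{1}{2}p)_k \, (\frac{1}{2}q)_k} \, C_{(k)}(\Lambda).
\end{aligned}$$

Replacing $(2k)!$ by $k! \, 2^{2k} \, (\frac{1}{2})_k$ cancels one factor $(\frac{1}{2})_k$, and dividing by $c_p c_q$ then yields the series in (19).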

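We remark that by interchanging the roles of $X$ and $Y$ in Theorem 3.1, we would obtain (19) with $\Lambda$ in (20) replaced by

$$\Lambda_0 = \Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX} \Sigma_X^{-1/2} \in \mathbb{R}^{p \times p}.$$

Since $\Lambda = \Lambda_{XY}' \Lambda_{XY}$ and $\Lambda_0 = \Lambda_{XY} \Lambda_{XY}'$ have the same set of nonzero eigenvalues (their characteristic polynomials differ at most by a power of the indeterminate), and noting that $C_{(k)}(\Lambda)$ depends only on the eigenvalues of $\Lambda$, it follows that $C_{(k)}(\Lambda) = C_{(k)}(\Lambda_0)$. Therefore, the series representation (19) for $\widetilde{\mathcal{V}}^2(X,Y)$ remains unchanged if the roles of $X$ and $Y$ are interchanged.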
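As a numerical illustration, the series (19) can be evaluated by truncation. The Python sketch below is one possible way to do so under stated assumptions: the function names `c` and `affine_dcov2_normal` and the truncation level `terms` are hypothetical choices, and the input is the list of eigenvalues of $\Lambda$ in (20). The sketch uses the fact that $(1/2)_k \, C_{(k)}(\Lambda) / k!$ is the $k$th Taylor coefficient of $\prod_j (1 - \lambda_j t)^{-1/2}$, which follows from (17), and it updates the ratio of rising factorials recursively to avoid overflow.

```python
import math


def c(p):
    """Constant c_p from (2); in particular c_0 = 1 and c_1 = pi."""
    return math.pi ** ((p + 1) / 2) / math.gamma((p + 1) / 2)


def affine_dcov2_normal(p, q, eigenvalues, terms=400):
    """Truncated series (19) for the squared affinely invariant distance covariance.

    `eigenvalues` are the eigenvalues of Lambda in (20), each assumed to lie in [0, 1].
    """
    # d[k] = (1/2)_k C_(k)(Lambda) / k!, built as a truncated product of power series.
    d = [1.0] + [0.0] * terms
    for lam in eigenvalues:
        f = [1.0]
        for i in range(1, terms + 1):
            f.append(f[-1] * (i - 0.5) / i * lam)   # (1/2)_i lam^i / i!
        d = [sum(d[m - i] * f[i] for i in range(m + 1)) for m in range(terms + 1)]
    total, ratio = 0.0, 1.0
    for k in range(1, terms + 1):
        # ratio = ((-1/2)_k)^2 / ((p/2)_k (q/2)_k), updated from its value at k - 1
        ratio *= (k - 1.5) ** 2 / ((p / 2 + k - 1) * (q / 2 + k - 1))
        total += (1.0 - 2.0 ** (1 - 2 * k)) * ratio * d[k]
    return 4 * math.pi * c(p - 1) / c(p) * c(q - 1) / c(q) * total
```

For $p = q = 1$ and $\Lambda = \rho^2$, the truncated series can be compared with the closed-form bivariate expression of Székely et al. [SzeRizBak07], recalled here as $\frac{4}{\pi}\bigl(\rho \sin^{-1}\rho + (1 - \rho^2)^{1/2} - \rho \sin^{-1}(\rho/2) - (4 - \rho^2)^{1/2} + 1\bigr)$. Padding the eigenvalue list with zeros, or swapping $p$ and $q$, leaves the value unchanged, in line with the remark above. Convergence is geometric when all eigenvalues are below one and slows as the largest eigenvalue approaches one, so the default truncation level should be treated with caution in that regime.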

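The series appearing in Theorem 3.1 can be expressed in terms of the generalized hypergeometric functions of matrix argument (Gross and Richards [GroRic87], James [Jam64], Muirhead [Mui82]). For this purpose, we introduce the partitional rising factorial for any $\alpha \in \mathbb{C}$ and any partition $\kappa = (k_1, \ldots, k_q)$ as

$$(\alpha)_\kappa = \prod_{j=1}^q \bigl(\alpha - \tfrac{1}{2}(j - 1)\bigr)_{k_j}.$$

Let $\alpha_1, \ldots, \alpha_l, \beta_1, \ldots, \beta_m \in \mathbb{C}$ where $-\beta_i + \frac{1}{2}(j - 1)$ is not a nonnegative integer, for all $i = 1, \ldots, m$ and $j = 1, \ldots, q$. Then the ${}_lF_m$ generalized hypergeometric function of matrix argument is defined as

$${}_lF_m(\alpha_1, \ldots, \alpha_l; \beta_1, \ldots, \beta_m; S) = \sum_{k=0}^\infty \frac{1}{k!} \sum_{|\kappa| = k} \frac{(\alpha_1)_\kappa \cdots (\alpha_l)_\kappa}{(\beta_1)_\kappa \cdots (\beta_m)_\kappa} \, C_\kappa(S),$$

where $S$ is a symmetric matrix. A complete analysis of the convergence properties of this series was derived by Gross and Richards [GroRic87], page 804, Theorem 6.3, and we refer the reader to that paper for the details.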

###### Corollary 3.2

In the setting of Theorem 3.1, we have

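$$\widetilde{\mathcal{V}}^2(X,Y) = 4\pi \, \frac{c_{p-1}}{c_p} \, \frac{c_{q-1}}{c_q} \Bigl({}_3F_2\bigl(\tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}; \tfrac{1}{2}p, \tfrac{1}{2}q; \Lambda\bigr) - 2 \, {}_3F_2\bigl(\tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}; \tfrac{1}{2}p, \tfrac{1}{2}q; \tfrac{1}{4}\Lambda\bigr) + 1\Bigr).$$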

###### Proof

It is evident that