The affinely invariant distance correlation
Abstract
Székely, Rizzo and Bakirov (Ann. Statist. 35 (2007) 2769–2794) and Székely and Rizzo (Ann. Appl. Statist. 3 (2009) 1236–1265), in two seminal papers, introduced the powerful concept of distance correlation as a measure of dependence between sets of random variables. In this paper we study an affinely invariant version of the distance correlation together with its empirical counterpart, and we establish the consistency of the empirical quantity. In the case of subvectors of a multivariate normally distributed random vector, we provide exact expressions for the affinely invariant distance correlation in both finite-dimensional and asymptotic settings, and in the finite-dimensional case we find that the affinely invariant distance correlation is a function of the canonical correlation coefficients. To illustrate our results, we consider time series of wind vectors at the Stateline wind energy center in Oregon and Washington, and we derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations.
Volume 20, Issue 4 (2014), pages 2305–2330. DOI: 10.3150/13-BEJ558
Johannes Dueck, Dominic Edelmann, Tilmann Gneiting and Donald Richards (richards@stat.psu.edu)
Keywords: affine invariance; distance correlation; distance covariance; hypergeometric function of matrix argument; multivariate independence; multivariate normal distribution; vector time series; wind forecasting; zonal polynomial
1 Introduction
Székely, Rizzo and Bakirov [SzeRizBak07] and Székely and Rizzo [SzeRiz09], in two seminal papers, introduced the distance covariance and distance correlation as powerful measures of dependence. In contrast to the classical Pearson correlation coefficient, the population distance covariance vanishes only in the case of independence, and it applies to random vectors of arbitrary dimensions, rather than to univariate quantities only.
As noted by Newton [New09], the “distance covariance not only provides a bona fide dependence measure, but it does so with a simplicity to satisfy Don Geman’s elevator test (i.e., a method must be sufficiently simple that it can be explained to a colleague in the time it takes to go between floors on an elevator).” In the case of the sample distance covariance, find the pairwise distances between the sample values for the first variable, and center the resulting distance matrix; then do the same for the second variable. The square of the sample distance covariance equals the average entry in the componentwise or Schur product of the two centered distance matrices. Given the theoretical appeal of the population quantity, and the striking simplicity of the sample version, it is not surprising that the distance covariance is experiencing a wealth of applications, despite having been introduced merely half a decade ago.
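Specifically, let $p$ and $q$ be positive integers. For column vectors $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$, denote by $|s|_p$ and $|t|_q$ the standard Euclidean norms on the corresponding spaces; thus, if $s = (s_1, \ldots, s_p)'$ then
$$|s|_p = (s_1^2 + \cdots + s_p^2)^{1/2},$$
and similarly for $|t|_q$. For vectors $u$ and $v$ of the same dimension, $p$, we let $\langle u, v \rangle_p$ be the standard Euclidean scalar product of $u$ and $v$. For jointly distributed random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, let
$$f_{X,Y}(s,t) = \mathrm{E} \exp\bigl[ i \langle s, X \rangle_p + i \langle t, Y \rangle_q \bigr]$$
be the joint characteristic function of $(X,Y)$, and let $f_X(s) = f_{X,Y}(s,0)$ and $f_Y(t) = f_{X,Y}(0,t)$ be the marginal characteristic functions of $X$ and $Y$, where $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$. Székely et al. [SzeRizBak07] introduced the distance covariance between $X$ and $Y$ as the nonnegative number $\mathcal{V}(X,Y)$ defined by
$$\mathcal{V}^2(X,Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}(s,t) - f_X(s) f_Y(t)|^2}{|s|_p^{p+1} \, |t|_q^{q+1}} \, ds \, dt, \tag{1}$$
where $|z|$ denotes the modulus of $z \in \mathbb{C}$ and
$$c_p = \frac{\pi^{(p+1)/2}}{\Gamma((p+1)/2)}. \tag{2}$$
The distance correlation between $X$ and $Y$ is the nonnegative number defined by
$$\mathcal{R}(X,Y) = \frac{\mathcal{V}(X,Y)}{\sqrt{\mathcal{V}(X,X) \, \mathcal{V}(Y,Y)}} \tag{3}$$
if both $\mathcal{V}(X,X)$ and $\mathcal{V}(Y,Y)$ are strictly positive, and defined to be zero otherwise. For distributions with finite first moments, the distance correlation characterizes independence in that $0 \le \mathcal{R}(X,Y) \le 1$ with $\mathcal{R}(X,Y) = 0$ if and only if $X$ and $Y$ are independent.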
A crucial property of the distance correlation is that it is invariant under transformations of the form
$$(X, Y) \longmapsto (a_1 + b_1 C_1 X, \; a_2 + b_2 C_2 Y), \tag{4}$$
where $a_1 \in \mathbb{R}^p$ and $a_2 \in \mathbb{R}^q$, $b_1$ and $b_2$ are nonzero real numbers, and the matrices $C_1 \in \mathbb{R}^{p \times p}$ and $C_2 \in \mathbb{R}^{q \times q}$ are orthogonal. However, the distance correlation fails to be invariant under the group of all invertible affine transformations of $(X,Y)$, which led Székely et al. [SzeRizBak07], pages 2784–2785, and Székely and Rizzo [SzeRiz09], pages 1252–1253, to propose an affinely invariant sample version of the distance correlation.
Adapting this proposal to the population setting, the affinely invariant distance covariance between random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ with finite second moments and nonsingular population covariance matrices $\Sigma_X$ and $\Sigma_Y$, respectively, can be introduced as the nonnegative number $\widetilde{\mathcal{V}}(X,Y)$ defined by
$$\widetilde{\mathcal{V}}^2(X,Y) = \mathcal{V}^2(\Sigma_X^{-1/2} X, \, \Sigma_Y^{-1/2} Y). \tag{5}$$
The affinely invariant distance correlation between $X$ and $Y$ is the nonnegative number defined by
$$\widetilde{\mathcal{R}}(X,Y) = \frac{\widetilde{\mathcal{V}}(X,Y)}{\sqrt{\widetilde{\mathcal{V}}(X,X) \, \widetilde{\mathcal{V}}(Y,Y)}} \tag{6}$$
if both $\widetilde{\mathcal{V}}(X,X)$ and $\widetilde{\mathcal{V}}(Y,Y)$ are strictly positive, and defined to be zero otherwise. In the sample versions proposed by Székely et al. [SzeRizBak07], the population quantities are replaced by their natural estimators. Clearly, the population affinely invariant distance correlation and its sample version are invariant under the group of invertible affine transformations, and in addition to satisfying this often-desirable group invariance property (Eaton [Eat89]), they inherit the desirable properties of the standard distance dependence measures. In particular, $0 \le \widetilde{\mathcal{R}}(X,Y) \le 1$ and, for populations with finite second moments and positive definite covariance matrices, $\widetilde{\mathcal{R}}(X,Y) = 0$ if and only if $X$ and $Y$ are independent.
The remainder of the paper is organized as follows. In Section 2, we review the sample version of the affinely invariant distance correlation introduced by Székely et al. [SzeRizBak07], and we prove that the sample version is strongly consistent. In Section 3, we provide exact expressions for the affinely invariant distance correlation in the case of subvectors from a multivariate normal population of arbitrary dimension, thereby generalizing a result of Székely et al. [SzeRizBak07] in the bivariate case; our result is nontrivial, being derived using the theory of zonal polynomials and the hypergeometric functions of matrix argument, and it enables the explicit and efficient calculation of the affinely invariant distance correlation in the multivariate normal case.
In Section 4, we study the behavior of the affinely invariant distance measures for subvectors of multivariate normal populations in limiting cases as the Frobenius norm of the cross-covariance matrix converges to zero, or as the dimensions of the subvectors converge to infinity. We expect that these results will motivate and provide the theoretical basis for many applications of distance correlation measures for high-dimensional data.
As an illustration of our results, Section 5 considers time series of wind vectors at the Stateline wind energy center in Oregon and Washington; we shall derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations. Finally, we provide in Section 6 a discussion in which we make a case for the use of the distance correlation and the affinely invariant distance correlation, which we believe to be appealing and powerful multivariate measures of dependence.
2 The sample version of the affinely invariant distance correlation
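In this section, which is written primarily to introduce readers to distance correlation measures, we describe sample versions of the affinely invariant distance covariance and distance correlation as introduced by Székely et al. [SzeRizBak07], pages 2784–2785, and Székely and Rizzo [SzeRiz09], pages 1252–1253.
First, we review the sample versions of the standard distance covariance and distance correlation. Given a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from jointly distributed random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, we set
$$\mathbf{X} = [X_1, \ldots, X_n] \in \mathbb{R}^{p \times n} \quad \text{and} \quad \mathbf{Y} = [Y_1, \ldots, Y_n] \in \mathbb{R}^{q \times n}.$$
A natural way of introducing a sample version of the distance covariance is to let
$$f^n_{X,Y}(s,t) = \frac{1}{n} \sum_{j=1}^n \exp\bigl[ i \langle s, X_j \rangle_p + i \langle t, Y_j \rangle_q \bigr]$$
be the corresponding empirical characteristic function, and to write $f^n_X(s) = f^n_{X,Y}(s,0)$ and $f^n_Y(t) = f^n_{X,Y}(0,t)$ for the respective marginal empirical characteristic functions. The sample distance covariance then is the nonnegative number $\mathcal{V}_n(\mathbf{X}, \mathbf{Y})$ defined by
$$\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y}) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|f^n_{X,Y}(s,t) - f^n_X(s) f^n_Y(t)|^2}{|s|_p^{p+1} \, |t|_q^{q+1}} \, ds \, dt,$$
where $c_p$ is the constant given in (2).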
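Székely et al. [SzeRizBak07], in a tour de force, showed that
$$\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y}) = \frac{1}{n^2} \sum_{k,l=1}^n A_{kl} B_{kl}, \tag{7}$$
where
$$a_{kl} = |X_k - X_l|_p, \qquad \bar{a}_{k\cdot} = \frac{1}{n} \sum_{l=1}^n a_{kl}, \qquad \bar{a}_{\cdot l} = \frac{1}{n} \sum_{k=1}^n a_{kl}, \qquad \bar{a}_{\cdot\cdot} = \frac{1}{n^2} \sum_{k,l=1}^n a_{kl},$$
and
$$A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot},$$
and similarly for $b_{kl} = |Y_k - Y_l|_q$, $\bar{b}_{k\cdot}$, $\bar{b}_{\cdot l}$, $\bar{b}_{\cdot\cdot}$, and $B_{kl}$, where $k, l = 1, \ldots, n$. Thus, the squared sample distance covariance equals the average entry in the componentwise or Schur product of the centered distance matrices for the two variables. The sample distance correlation then is defined by
$$\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = \frac{\mathcal{V}_n(\mathbf{X}, \mathbf{Y})}{\sqrt{\mathcal{V}_n(\mathbf{X}, \mathbf{X}) \, \mathcal{V}_n(\mathbf{Y}, \mathbf{Y})}} \tag{8}$$
if both $\mathcal{V}_n(\mathbf{X}, \mathbf{X})$ and $\mathcal{V}_n(\mathbf{Y}, \mathbf{Y})$ are strictly positive, and defined to be zero otherwise. Computer code for calculating these sample versions is available in an R package by Rizzo and Székely [autokey16].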
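The recipe in (7) and (8) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the R package mentioned above: the function names `centered_distance_matrix`, `dcov_sq` and `dcor` are hypothetical, and each sample is assumed to be a list of equal-length tuples of floats.

```python
import math

def centered_distance_matrix(points):
    """Pairwise Euclidean distances a_kl, double-centered to give A_kl."""
    n = len(points)
    d = [[math.dist(points[k], points[l]) for l in range(n)] for k in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so the column means coincide with the row means
    return [[d[k][l] - row_means[k] - row_means[l] + grand_mean
             for l in range(n)] for k in range(n)]

def dcov_sq(x, y):
    """Squared sample distance covariance: average entry of the Schur product."""
    a = centered_distance_matrix(x)
    b = centered_distance_matrix(y)
    n = len(x)
    return sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2

def dcor(x, y):
    """Sample distance correlation; zero if either self-covariance vanishes."""
    vxy, vxx, vyy = dcov_sq(x, y), dcov_sq(x, x), dcov_sq(y, y)
    if vxx <= 0.0 or vyy <= 0.0:
        return 0.0
    # R_n = V_n(X,Y) / sqrt(V_n(X,X) V_n(Y,Y)), written in terms of squared values
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vxx * vyy))
```

For instance, `dcor([(0.0,), (1.0,), (2.0,)], [(1.0,), (3.0,), (5.0,)])` compares a sample with a shifted and rescaled copy of itself, a transformation of the form (4). The double loop costs $O(n^2)$ time and memory, which is acceptable for a sketch but not for long series.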
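Now let $S_X$ and $S_Y$ denote the usual sample covariance matrices of the data $\mathbf{X}$ and $\mathbf{Y}$, respectively. Following Székely et al. [SzeRizBak07], page 2785, and Székely and Rizzo [SzeRiz09], page 1253, the sample affinely invariant distance covariance is the nonnegative number $\widetilde{\mathcal{V}}_n(\mathbf{X}, \mathbf{Y})$ defined by
$$\widetilde{\mathcal{V}}_n^2(\mathbf{X}, \mathbf{Y}) = \mathcal{V}_n^2(S_X^{-1/2} \mathbf{X}, \, S_Y^{-1/2} \mathbf{Y}) \tag{9}$$
if $S_X$ and $S_Y$ are positive definite, and defined to be zero otherwise. The sample affinely invariant distance correlation is defined by
$$\widetilde{\mathcal{R}}_n(\mathbf{X}, \mathbf{Y}) = \frac{\widetilde{\mathcal{V}}_n(\mathbf{X}, \mathbf{Y})}{\sqrt{\widetilde{\mathcal{V}}_n(\mathbf{X}, \mathbf{X}) \, \widetilde{\mathcal{V}}_n(\mathbf{Y}, \mathbf{Y})}} \tag{10}$$
if the quantities in the denominator are strictly positive, and defined to be zero otherwise. The sample affinely invariant distance correlation inherits the properties of the sample distance correlation; in particular
$$0 \le \widetilde{\mathcal{R}}_n(\mathbf{X}, \mathbf{Y}) \le 1,$$
and $\widetilde{\mathcal{R}}_n(\mathbf{X}, \mathbf{Y}) = 1$ implies that $p = q$, that the linear spaces spanned by $\mathbf{X}$ and $\mathbf{Y}$ have full rank, and that there exist a vector $a \in \mathbb{R}^p$, a nonzero number $b \in \mathbb{R}$, and an orthogonal matrix $C \in \mathbb{R}^{p \times p}$ such that $S_Y^{-1/2} \mathbf{Y} = a + b \, C S_X^{-1/2} \mathbf{X}$.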
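A minimal sketch of the sample affinely invariant distance correlation follows. It relies on an observation that is ours rather than the source's: formula (7) depends on the data only through pairwise distances, and $|S^{-1/2}(x_k - x_l)|$ equals $|L^{-1}(x_k - x_l)|$ for the Cholesky factor $S = LL'$, so whitening by $L^{-1}$ gives the same value as whitening by $S^{-1/2}$. All function names are hypothetical, and near-singular sample covariance matrices are not treated with any care.

```python
import math

def dcov_sq_from_distances(a, b):
    """Squared sample distance covariance from two n x n distance matrices."""
    n = len(a)

    def center(d):
        row = [sum(r) / n for r in d]  # symmetric, so also the column means
        grand = sum(row) / n
        return [[d[k][l] - row[k] - row[l] + grand for l in range(n)]
                for k in range(n)]

    ca, cb = center(a), center(b)
    return sum(ca[k][l] * cb[k][l] for k in range(n) for l in range(n)) / n**2

def cholesky(s):
    """Lower-triangular L with L L' = s; raises if s is not positive definite."""
    p = len(s)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("sample covariance is not positive definite")
                low[i][i] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low

def mahalanobis_distances(points):
    """Pairwise distances |L^{-1}(x_k - x_l)|, with S = L L' the sample covariance."""
    n, p = len(points), len(points[0])
    mean = [sum(x[i] for x in points) / n for i in range(p)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in points) / (n - 1)
            for j in range(p)] for i in range(p)]
    low = cholesky(cov)
    white = []
    for x in points:
        z = []
        for i in range(p):  # forward substitution: solve low z = x
            z.append((x[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
        white.append(z)
    return [[math.dist(u, v) for v in white] for u in white]

def affinely_invariant_dcor(x, y):
    try:
        a, b = mahalanobis_distances(x), mahalanobis_distances(y)
    except ValueError:
        return 0.0  # convention: zero unless both covariance matrices are positive definite
    vxy = dcov_sq_from_distances(a, b)
    vxx = dcov_sq_from_distances(a, a)
    vyy = dcov_sq_from_distances(b, b)
    if vxx <= 0.0 or vyy <= 0.0:
        return 0.0
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vxx * vyy))
```

Applying an invertible affine map to either sample should leave the returned value unchanged up to rounding, which is a convenient sanity check on any implementation.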
Our next result shows that the sample affinely invariant distance correlation is a consistent estimator of the respective population quantity.
Theorem 2.1
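Let $(X, Y) \in \mathbb{R}^{p+q}$ be jointly distributed random vectors with positive definite marginal covariance matrices $\Sigma_X \in \mathbb{R}^{p \times p}$ and $\Sigma_Y \in \mathbb{R}^{q \times q}$, respectively. Suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from $(X, Y)$, and let $\mathbf{X} = [X_1, \ldots, X_n]$ and $\mathbf{Y} = [Y_1, \ldots, Y_n]$. Also, let $\widehat{\Sigma}_X$ and $\widehat{\Sigma}_Y$ be strongly consistent estimators for $\Sigma_X$ and $\Sigma_Y$, respectively. Then
$$\mathcal{V}_n^2(\widehat{\Sigma}_X^{-1/2} \mathbf{X}, \, \widehat{\Sigma}_Y^{-1/2} \mathbf{Y}) \to \widetilde{\mathcal{V}}^2(X, Y)$$
almost surely, as $n \to \infty$. In particular, the sample affinely invariant distance correlation satisfies
$$\widetilde{\mathcal{R}}_n(\mathbf{X}, \mathbf{Y}) \to \widetilde{\mathcal{R}}(X, Y) \tag{11}$$
almost surely.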
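Proof. As the covariance matrices $\Sigma_X$ and $\Sigma_Y$ are positive definite, we may assume that the strongly consistent estimators $\widehat{\Sigma}_X$ and $\widehat{\Sigma}_Y$ also are positive definite. Therefore, in order to prove the first statement it suffices to show that
$$\mathcal{V}_n^2(\widehat{\Sigma}_X^{-1/2} \mathbf{X}, \, \widehat{\Sigma}_Y^{-1/2} \mathbf{Y}) - \mathcal{V}_n^2(\Sigma_X^{-1/2} \mathbf{X}, \, \Sigma_Y^{-1/2} \mathbf{Y}) \to 0 \tag{12}$$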
almost surely. By the decomposition of Székely et al. [SzeRizBak07], page 2776, equation (2.18), the left-hand side of (12) can be written as an average of terms of the form
Using the identity
we obtain
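where the matrix norm $\|\Lambda\|$ is the largest eigenvalue of $\Lambda$ in absolute value. Now we can separate the three sums in the decomposition of Székely et al. [SzeRizBak07], page 2776, equation (2.18), and place the factors like $\|\widehat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2}\|$ in front of the sums, since they appear in every summand. Then $\|\widehat{\Sigma}_X^{-1/2} - \Sigma_X^{-1/2}\|$ and $\|\widehat{\Sigma}_Y^{-1/2} - \Sigma_Y^{-1/2}\|$ tend to zero and the remaining averages converge to constants (representing some distance correlation components) almost surely as $n \to \infty$, and this completes the proof of the first statement. Finally, the property (11) of strong consistency of $\widetilde{\mathcal{R}}_n(\mathbf{X}, \mathbf{Y})$ is obtained immediately upon setting $\widehat{\Sigma}_X = S_X$ and $\widehat{\Sigma}_Y = S_Y$.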
Székely et al. [SzeRizBak07], page 2783, proposed a test for independence that is based on the sample distance correlation. From their results, we see that the asymptotic properties of the test statistic are not affected by the transition from the standard distance correlation to the affinely invariant distance correlation. Hence, a completely analogous but different test can be stated in terms of the affinely invariant distance correlation. Noting the results of Kosorok [Kos09, Section 4; Kos13], we raise the possibility that the specific details can be devised in a judicious, data-dependent way so that the power of the test for independence increases when the transition is made to the affinely invariant distance correlation. Alternative multivariate tests for independence based on distances have recently been proposed by Heller et al. [HelHelGor13] and Székely and Rizzo [SzeRiz13].
3 The affinely invariant distance correlation for multivariate normal populations
We now consider the problem of calculating the affinely invariant distance correlation between the random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ where $(X, Y) \sim \mathcal{N}_{p+q}(\mu, \Sigma)$, a multivariate normal distribution with mean vector $\mu \in \mathbb{R}^{p+q}$ and covariance matrix $\Sigma \in \mathbb{R}^{(p+q) \times (p+q)}$, where $X$ and $Y$ have nonsingular marginal covariance matrices $\Sigma_X \in \mathbb{R}^{p \times p}$ and $\Sigma_Y \in \mathbb{R}^{q \times q}$, respectively.
For the case in which $p = q = 1$, that is, the bivariate normal distribution, the problem was solved by Székely et al. [SzeRizBak07]. In that case, the formula for the affinely invariant distance correlation depends only on $\rho$, the correlation coefficient, and appears in terms of the functions $\sin^{-1} \rho$ and $(1 - \rho^2)^{1/2}$, both of which are well known to be special cases of Gauss’ hypergeometric series. Therefore, it is natural to expect that the general case will involve generalizations of Gauss’ hypergeometric series, and Theorem 3.1 below demonstrates that such is indeed the case. To formulate this result, we need to recall the rudiments of the theory of zonal polynomials (Muirhead [Mui82], Chapter 7).
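A partition $\kappa$ is a vector $\kappa = (k_1, \ldots, k_q)$ of nonnegative integers such that $k_1 \ge \cdots \ge k_q$. The integer $|\kappa| = k_1 + \cdots + k_q$ is called the weight of $\kappa$; and $\ell(\kappa)$, the length of $\kappa$, is the largest integer $j$ such that $k_j > 0$. The zonal polynomial $C_\kappa(\Lambda)$ is a polynomial mapping from the class of symmetric matrices $\Lambda \in \mathbb{R}^{q \times q}$ to the real line which satisfies several properties, the following of which are crucial for our results:

(a) Let $O(q)$ denote the group of orthogonal matrices in $\mathbb{R}^{q \times q}$. Then
$$C_\kappa(K' \Lambda K) = C_\kappa(\Lambda) \tag{13}$$
for all $K \in O(q)$; thus, $C_\kappa(\Lambda)$ is a symmetric function of the eigenvalues of $\Lambda$.

(b) The polynomial $C_\kappa(\Lambda)$ is homogeneous of degree $|\kappa|$ in $\Lambda$: For any $\delta \in \mathbb{R}$,
$$C_\kappa(\delta \Lambda) = \delta^{|\kappa|} C_\kappa(\Lambda). \tag{14}$$

(c) If $\Lambda$ is of rank $r$, then $C_\kappa(\Lambda) = 0$ whenever $\ell(\kappa) > r$.

(d) For any nonnegative integer $k$,
$$(\operatorname{tr} \Lambda)^k = \sum_{|\kappa| = k} C_\kappa(\Lambda). \tag{15}$$

(e) For any symmetric matrices $\Lambda_1, \Lambda_2 \in \mathbb{R}^{q \times q}$,
$$\int_{O(q)} C_\kappa(K' \Lambda_1 K \Lambda_2) \, dK = \frac{C_\kappa(\Lambda_1) \, C_\kappa(\Lambda_2)}{C_\kappa(I_q)}, \tag{16}$$
where $I_q$ denotes the $q \times q$ identity matrix and the integral is with respect to the Haar measure on $O(q)$, normalized to have total volume 1.

(f) Let $\lambda_1, \ldots, \lambda_q$ be the eigenvalues of $\Lambda$. Then, for a partition $(k)$ with one part,
$$C_{(k)}(\Lambda) = \frac{k!}{(\frac12)_k} \sum_{i_1 + \cdots + i_q = k} \; \prod_{j=1}^q \frac{(\frac12)_{i_j}}{i_j!} \, \lambda_j^{i_j}, \tag{17}$$
where the sum is over all nonnegative integers $i_1, \ldots, i_q$ such that $i_1 + \cdots + i_q = k$, and
$$(\alpha)_k = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)} = \alpha (\alpha + 1) \cdots (\alpha + k - 1),$$
$\alpha \in \mathbb{C}$, is standard notation for the rising factorial. In particular, on setting $\lambda_j = 1$, $j = 1, \ldots, q$, we obtain from (17)
$$C_{(k)}(I_q) = \frac{(\frac12 q)_k}{(\frac12)_k} \tag{18}$$
(Muirhead [Mui82], page 237, equation (18), Gross and Richards [GroRic87], page 807, Lemma 6.8).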
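The one-part zonal polynomial in (17) is easy to evaluate directly for small $k$ and $q$. The sketch below assumes that (17) reads $C_{(k)}(\Lambda) = \frac{k!}{(1/2)_k} \sum_{i_1 + \cdots + i_q = k} \prod_j \frac{(1/2)_{i_j}}{i_j!} \lambda_j^{i_j}$ and uses exact rational arithmetic; the helper names are hypothetical, and the enumeration of index tuples grows quickly with $k$ and $q$, so this is meant for checking identities rather than for production use.

```python
from fractions import Fraction
from math import factorial

def rising(alpha, k):
    """Rising factorial (alpha)_k = alpha (alpha + 1) ... (alpha + k - 1)."""
    out = Fraction(1)
    for i in range(k):
        out *= alpha + i
    return out

def compositions(k, q):
    """All tuples of q nonnegative integers that sum to k."""
    if q == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, q - 1):
            yield (first,) + rest

def zonal_one_part(k, eigenvalues):
    """C_(k)(Lambda) computed from the eigenvalues of Lambda, assuming the form of (17) stated above."""
    half = Fraction(1, 2)
    total = 0
    for idx in compositions(k, len(eigenvalues)):
        term = Fraction(1)
        for i, lam in zip(idx, eigenvalues):
            term *= rising(half, i) / factorial(i) * lam**i
        total += term
    return factorial(k) / rising(half, k) * total
```

With all eigenvalues equal to 1 the result can be compared against the ratio of rising factorials in (18), and with $k = 1$ it reduces to the trace, in agreement with (15).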
With these properties of the zonal polynomials, we are ready to state our key result, which provides an explicit formula for the affinely invariant distance covariance in the case of a Gaussian population of arbitrary dimension and arbitrary covariance matrix with positive definite marginal covariance matrices. This formula turns out to be a function depending only on the dimensions $p$ and $q$ and the eigenvalues of the matrix $\Lambda$ defined in (20) below, that is, the squared canonical correlation coefficients of the subvectors $X$ and $Y$. For fixed dimensions this implies $\widetilde{\mathcal{R}}^2(X, Y) = g(\lambda_1, \ldots, \lambda_r)$, where $r = \min(p, q)$ and $\lambda_1, \ldots, \lambda_r$ are the canonical correlation coefficients of $X$ and $Y$. Due to the functional invariance, the maximum likelihood estimator (MLE) for the affinely invariant distance correlation in the Gaussian setting is hence defined by $g(\widehat{\lambda}_1, \ldots, \widehat{\lambda}_r)$, where $\widehat{\lambda}_1, \ldots, \widehat{\lambda}_r$ are the MLEs of the canonical correlation coefficients.
Theorem 3.1
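Suppose that $(X, Y) \sim \mathcal{N}_{p+q}(\mu, \Sigma)$, where
$$\Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}$$
with $\Sigma_X \in \mathbb{R}^{p \times p}$, $\Sigma_Y \in \mathbb{R}^{q \times q}$, and $\Sigma_{XY} \in \mathbb{R}^{p \times q}$. Then
$$\widetilde{\mathcal{V}}^2(X, Y) = 4\pi \, \frac{c_{p-1}}{c_p} \, \frac{c_{q-1}}{c_q} \sum_{k=1}^\infty \frac{2^{2k} - 2}{k! \, 2^{2k}} \, \frac{(\frac12)_k \, (-\frac12)_k \, (-\frac12)_k}{(\frac12 p)_k \, (\frac12 q)_k} \, C_{(k)}(\Lambda), \tag{19}$$
where
$$\Lambda = \Sigma_Y^{-1/2} \, \Sigma_{YX} \, \Sigma_X^{-1} \, \Sigma_{XY} \, \Sigma_Y^{-1/2} \in \mathbb{R}^{q \times q}. \tag{20}$$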
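Proof. We may assume, with no loss of generality, that $\mu$ is the zero vector. Since $\Sigma_X$ and $\Sigma_Y$ both are positive definite, the inverse square-roots, $\Sigma_X^{-1/2}$ and $\Sigma_Y^{-1/2}$, exist.
By considering the standardized variables $\Sigma_X^{-1/2} X$ and $\Sigma_Y^{-1/2} Y$, we may replace the covariance matrix $\Sigma$ by
$$\begin{pmatrix} I_p & \Lambda_{XY} \\ \Lambda_{XY}' & I_q \end{pmatrix},$$
where
$$\Lambda_{XY} = \Sigma_X^{-1/2} \, \Sigma_{XY} \, \Sigma_Y^{-1/2}. \tag{21}$$
Once we have made these reductions, it follows that the matrix $\Lambda$ in (20) can be written as $\Lambda = \Lambda_{XY}' \Lambda_{XY}$ and that it has norm less than or equal to 1. Indeed, by the partial Iwasawa decomposition of $\Sigma$, viz., the identity,
$$\begin{pmatrix} I_p & \Lambda_{XY} \\ \Lambda_{XY}' & I_q \end{pmatrix} = \begin{pmatrix} I_p & 0 \\ \Lambda_{XY}' & I_q \end{pmatrix} \begin{pmatrix} I_p & 0 \\ 0 & I_q - \Lambda_{XY}' \Lambda_{XY} \end{pmatrix} \begin{pmatrix} I_p & \Lambda_{XY} \\ 0 & I_q \end{pmatrix},$$
where the zero matrix of any dimension is denoted by $0$, we see that the matrix $\Sigma$ is positive semidefinite if and only if $I_q - \Lambda$ is positive semidefinite. Hence, $\Lambda \le I_q$ in the Loewner ordering and therefore $\|\Lambda\| \le 1$.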
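We proceed to calculate the distance covariance $\widetilde{\mathcal{V}}(X, Y)$. It is well known that the characteristic function of $(X, Y)$ is
$$f_{X,Y}(s,t) = \exp\bigl[ -\tfrac12 \bigl( |s|_p^2 + |t|_q^2 + 2 s' \Lambda_{XY} t \bigr) \bigr],$$
where $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$. Therefore,
$$|f_{X,Y}(s,t) - f_X(s) f_Y(t)|^2 = \bigl( 1 - \exp(-s' \Lambda_{XY} t) \bigr)^2 \exp(-|s|_p^2 - |t|_q^2),$$
and hence
$$c_p c_q \, \widetilde{\mathcal{V}}^2(X, Y) = \int_{\mathbb{R}^{p+q}} \bigl( 1 - \exp(-s' \Lambda_{XY} t) \bigr)^2 \exp(-|s|_p^2 - |t|_q^2) \, \frac{ds \, dt}{|s|_p^{p+1} |t|_q^{q+1}} = \int_{\mathbb{R}^{p+q}} \bigl( 1 - \exp(s' \Lambda_{XY} t) \bigr)^2 \exp(-|s|_p^2 - |t|_q^2) \, \frac{ds \, dt}{|s|_p^{p+1} |t|_q^{q+1}}, \tag{22}$$
where the latter integral is obtained by making the change of variables $s \mapsto -s$ within the former integral.
By a Taylor series expansion, we obtain
$$\bigl( 1 - \exp(s' \Lambda_{XY} t) \bigr)^2 = 1 - 2 \exp(s' \Lambda_{XY} t) + \exp(2 s' \Lambda_{XY} t) = \sum_{k=2}^\infty \frac{2^k - 2}{k!} \, (s' \Lambda_{XY} t)^k.$$
Substituting this series into (22) and interchanging summation and integration, a procedure which is straightforward to verify by means of Fubini’s theorem, and noting that the odd-order terms integrate to zero, we obtain
$$c_p c_q \, \widetilde{\mathcal{V}}^2(X, Y) = \sum_{k=1}^\infty \frac{2^{2k} - 2}{(2k)!} \int_{\mathbb{R}^{p+q}} (s' \Lambda_{XY} t)^{2k} \exp(-|s|_p^2 - |t|_q^2) \, \frac{ds \, dt}{|s|_p^{p+1} |t|_q^{q+1}}. \tag{23}$$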
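To calculate, for $k \ge 1$, the integral
$$\int_{\mathbb{R}^{p+q}} (s' \Lambda_{XY} t)^{2k} \exp(-|s|_p^2 - |t|_q^2) \, \frac{ds \, dt}{|s|_p^{p+1} |t|_q^{q+1}}, \tag{24}$$
we change variables to polar coordinates, putting $s = r_x \theta$ and $t = r_y \phi$ where $r_x, r_y > 0$, $\theta = (\theta_1, \ldots, \theta_p)' \in S^{p-1}$, and $\phi \in S^{q-1}$. Then the integral (24) separates into a product of multiple integrals over $(r_x, r_y)$, and over $(\theta, \phi)$, respectively. The integrals over $r_x$ and $r_y$ are standard gamma integrals,
$$\int_0^\infty \int_0^\infty r_x^{2k-2} \, r_y^{2k-2} \exp(-r_x^2 - r_y^2) \, dr_x \, dr_y = \tfrac14 \bigl[ \Gamma(k - \tfrac12) \bigr]^2, \tag{25}$$
and the remaining factor is the integral
$$\int_{S^{q-1}} \int_{S^{p-1}} (\theta' \Lambda_{XY} \phi)^{2k} \, d\theta \, d\phi, \tag{26}$$
where $d\theta$ and $d\phi$ are unnormalized surface measures on $S^{p-1}$ and $S^{q-1}$, respectively. By a standard invariance argument,
$$\int_{S^{p-1}} (\theta' v)^{2k} \, d\theta = |v|_p^{2k} \int_{S^{p-1}} \theta_1^{2k} \, d\theta,$$
$v \in \mathbb{R}^p$. Setting $v = \Lambda_{XY} \phi$ and applying some well-known properties of the surface measure $d\theta$, we obtain
$$\int_{S^{p-1}} (\theta' \Lambda_{XY} \phi)^{2k} \, d\theta = 2 c_{p-1} \, \frac{(\frac12)_k}{(\frac12 p)_k} \, (\phi' \Lambda \phi)^k.$$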
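Therefore, in order to evaluate (26), it remains to evaluate
$$\int_{S^{q-1}} (\phi' \Lambda \phi)^k \, d\phi.$$
Since the surface measure is invariant under transformation $\phi \mapsto K \phi$, $K \in O(q)$, it follows that $\int_{S^{q-1}} (\phi' \Lambda \phi)^k \, d\phi = \int_{S^{q-1}} (\phi' K' \Lambda K \phi)^k \, d\phi$ for all $K \in O(q)$. Integrating with respect to the normalized Haar measure on the orthogonal group, we conclude that
$$\int_{S^{q-1}} (\phi' \Lambda \phi)^k \, d\phi = \int_{S^{q-1}} \int_{O(q)} (\phi' K' \Lambda K \phi)^k \, dK \, d\phi. \tag{27}$$
We now use the properties of the zonal polynomials. By (15),
$$(\phi' K' \Lambda K \phi)^k = (\operatorname{tr} K' \Lambda K \phi \phi')^k = \sum_{|\kappa| = k} C_\kappa(K' \Lambda K \phi \phi');$$
therefore, by (16),
$$\int_{O(q)} (\phi' K' \Lambda K \phi)^k \, dK = \sum_{|\kappa| = k} \frac{C_\kappa(\Lambda) \, C_\kappa(\phi \phi')}{C_\kappa(I_q)}.$$
Since $\phi \phi'$ is of rank 1 then, by property (c), $C_\kappa(\phi \phi') = 0$ if $\ell(\kappa) > 1$; it now follows, by (15) and the fact that $\phi' \phi = 1$, that
$$C_{(k)}(\phi \phi') = (\operatorname{tr} \phi \phi')^k = (\phi' \phi)^k = 1.$$
Therefore,
$$\int_{O(q)} (\phi' K' \Lambda K \phi)^k \, dK = \frac{C_{(k)}(\Lambda)}{C_{(k)}(I_q)} = \frac{(\frac12)_k}{(\frac12 q)_k} \, C_{(k)}(\Lambda),$$
where the last equality follows by (18). Substituting this result at (27), we obtain
$$\int_{S^{q-1}} (\phi' \Lambda \phi)^k \, d\phi = 2 c_{q-1} \, \frac{(\frac12)_k}{(\frac12 q)_k} \, C_{(k)}(\Lambda).$$
Collecting together these results, and using the well-known identity $(2k)! = k! \, 2^{2k} \, (\frac12)_k$, we obtain the representation (19), as desired.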
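We remark that by interchanging the roles of $X$ and $Y$ in Theorem 3.1, we would obtain (19) with $\Lambda$ in (20) replaced by
$$\Lambda_0 = \Sigma_X^{-1/2} \, \Sigma_{XY} \, \Sigma_Y^{-1} \, \Sigma_{YX} \, \Sigma_X^{-1/2} \in \mathbb{R}^{p \times p}.$$
Since $\Lambda = \Lambda_{XY}' \Lambda_{XY}$ and $\Lambda_0 = \Lambda_{XY} \Lambda_{XY}'$ have characteristic polynomials that agree up to a power of the variable, and hence the same set of nonzero eigenvalues, and noting that $C_{(k)}(\Lambda)$ depends only on the eigenvalues of $\Lambda$, it follows that $C_{(k)}(\Lambda) = C_{(k)}(\Lambda_0)$. Therefore, the series representation (19) for $\widetilde{\mathcal{V}}^2(X, Y)$ remains unchanged if the roles of $X$ and $Y$ are interchanged.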
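As a numerical sanity check of Theorem 3.1, the series can be summed directly when $\Lambda$ has a single nonzero eigenvalue $\rho^2$, in which case only one term survives in (17) and $C_{(k)}(\Lambda) = \rho^{2k}$. The sketch below assumes that (19) has the form $4\pi \frac{c_{p-1}}{c_p} \frac{c_{q-1}}{c_q} \sum_{k \ge 1} \frac{2^{2k} - 2}{k! \, 2^{2k}} \frac{(1/2)_k (-1/2)_k^2}{(p/2)_k (q/2)_k} C_{(k)}(\Lambda)$; the function names and the truncation at 200 terms are illustrative choices, and convergence becomes slow as $|\rho|$ approaches 1.

```python
import math

def c(p):
    """Constant c_p = pi^((p+1)/2) / Gamma((p+1)/2) from (2)."""
    return math.pi ** ((p + 1) / 2) / math.gamma((p + 1) / 2)

def ai_dcov_sq_rank_one(p, q, rho, terms=200):
    """Partial sum of the series, assuming Lambda has one nonzero eigenvalue rho**2."""
    prefactor = 4 * math.pi * c(p - 1) * c(q - 1) / (c(p) * c(q))
    total = 0.0
    # ratio tracks (1/2)_k (-1/2)_k^2 / (k! (p/2)_k (q/2)_k) * rho^(2k), updated from k-1 to k
    ratio = 1.0
    for k in range(1, terms + 1):
        j = k - 1
        ratio *= ((0.5 + j) * (-0.5 + j) ** 2
                  / ((j + 1) * (p / 2 + j) * (q / 2 + j)) * rho**2)
        total += (1.0 - 2.0 * 0.25**k) * ratio  # (2^(2k) - 2) / 2^(2k)
    return prefactor * total

def bivariate_closed_form(rho):
    """Closed form for p = q = 1 reported by Székely et al. for the bivariate normal case."""
    return 4 / math.pi * (rho * math.asin(rho) + math.sqrt(1 - rho**2)
                          - rho * math.asin(rho / 2) - math.sqrt(4 - rho**2) + 1)
```

For $p = q = 1$ the two functions should agree to rounding error for moderate $\rho$, which ties the general series back to the bivariate formula in terms of $\sin^{-1} \rho$ and $(1 - \rho^2)^{1/2}$.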
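The series appearing in Theorem 3.1 can be expressed in terms of the generalized hypergeometric functions of matrix argument (Gross and Richards [GroRic87], James [Jam64], Muirhead [Mui82]). For this purpose, we introduce the partitional rising factorial for any $\alpha \in \mathbb{C}$ and any partition $\kappa = (k_1, \ldots, k_q)$ as
$$(\alpha)_\kappa = \prod_{j=1}^q \bigl( \alpha - \tfrac12 (j - 1) \bigr)_{k_j}.$$
Let $\alpha_1, \ldots, \alpha_l, \beta_1, \ldots, \beta_m \in \mathbb{C}$ where $-\beta_i + \frac12 (j - 1)$ is not a nonnegative integer, for all $i = 1, \ldots, m$ and $j = 1, \ldots, q$. Then the ${}_lF_m$ generalized hypergeometric function of matrix argument is defined as
$${}_lF_m(\alpha_1, \ldots, \alpha_l; \beta_1, \ldots, \beta_m; S) = \sum_{k=0}^\infty \frac{1}{k!} \sum_{|\kappa| = k} \frac{(\alpha_1)_\kappa \cdots (\alpha_l)_\kappa}{(\beta_1)_\kappa \cdots (\beta_m)_\kappa} \, C_\kappa(S),$$
where $S$ is a symmetric matrix. A complete analysis of the convergence properties of this series was derived by Gross and Richards [GroRic87], page 804, Theorem 6.3, and we refer the reader to that paper for the details.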
Corollary 3.2
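In the setting of Theorem 3.1, we have
$$\widetilde{\mathcal{V}}^2(X, Y) = 4\pi \, \frac{c_{p-1}}{c_p} \, \frac{c_{q-1}}{c_q} \Bigl( {}_3F_2\bigl( \tfrac12, -\tfrac12, -\tfrac12; \tfrac12 p, \tfrac12 q; \Lambda \bigr) - 2 \, {}_3F_2\bigl( \tfrac12, -\tfrac12, -\tfrac12; \tfrac12 p, \tfrac12 q; \tfrac14 \Lambda \bigr) + 1 \Bigr).$$
Proof. It is evident that
$$(\tfrac12)_\kappa = \begin{cases} (\tfrac12)_k, & \text{if } \kappa = (k), \\ 0, & \text{if } \ell(\kappa) > 1. \end{cases}$$
Therefore, we now can write the series in (19), up to a multiplicative constant, in terms of a generalized hypergeometric function of matrix argument, in that
$$\sum_{k=1}^\infty \frac{2^{2k} - 2}{k! \, 2^{2k}} \, \frac{(\frac12)_k \, (-\frac12)_k \, (-\frac12)_k}{(\frac12 p)_k \, (\frac12 q)_k} \, C_{(k)}(\Lambda) = {}_3F_2\bigl( \tfrac12, -\tfrac12, -\tfrac12; \tfrac12 p, \tfrac12 q; \Lambda \bigr) - 2 \, {}_3F_2\bigl( \tfrac12, -\tfrac12, -\tfrac12; \tfrac12 p, \tfrac12 q; \tfrac14 \Lambda \bigr) + 1.$$
Due to property (14) it remains to show that the zonal polynomial series expansion for the generalized hypergeometric function ${}_3F_2( \frac12, -\frac12, -\frac12; \frac12 p, \frac12 q; \Lambda )$ of matrix argument converges absolutely for all $\Lambda$ with $0 \le \Lambda \le I_q$ in the Loewner ordering. By (18)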