Brownian distance covariance

Gábor J. Székely
Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, Ohio 43403, USA
and Rényi Institute of Mathematics, Hungarian Academy of Sciences, Budapest, Hungary
gszekely@nsf.gov

Maria L. Rizzo
Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, Ohio 43403, USA
mrizzo@bgsu.edu

Research supported in part by the NSF.

Received June 2009; revised October 2009.
Abstract

Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but generalize and extend these classical bivariate measures of dependence. Distance correlation characterizes independence: it is zero if and only if the random vectors are independent. The notion of covariance with respect to a stochastic process is introduced, and it is shown that population distance covariance coincides with the covariance with respect to Brownian motion; thus, both can be called Brownian distance covariance. In the bivariate case, Brownian covariance is the natural extension of product-moment covariance, as we obtain Pearson product-moment covariance by replacing the Brownian motion in the definition with the identity function. The corresponding statistic has an elegantly simple computing formula. Advantages of applying Brownian covariance and correlation versus the classical Pearson covariance and correlation are discussed and illustrated.

The Annals of Applied Statistics 2009, Vol. 3, No. 4, 1236–1265. DOI: 10.1214/09-AOAS312.

Discussed in 10.1214/09-AOAS312A, 10.1214/09-AOAS312B, 10.1214/09-AOAS312C, 10.1214/09-AOAS312D, 10.1214/09-AOAS312E, 10.1214/09-AOAS312F and 10.1214/09-AOAS312G; rejoinder at 10.1214/09-AOAS312REJ.



Keywords: distance correlation, dCor, Brownian covariance, independence, multivariate.

1 Introduction

The importance of independence arises in diverse applications, for inference and whenever it is essential to measure complicated dependence structures in bivariate or multivariate data. This paper focuses on a new dependence coefficient that measures all types of dependence between random vectors $X$ and $Y$ in arbitrary dimension. Distance correlation and distance covariance (Székely, Rizzo, and Bakirov srb07 ()), and Brownian covariance, introduced in this paper, provide a new approach to the problem of measuring dependence and testing the joint independence of random vectors in arbitrary dimension. The corresponding statistics have simple computing formulae, apply to sample sizes $n \ge 2$ (not constrained by dimension), and do not require matrix inversion or estimation of parameters. For example, the distance covariance (dCov) statistic, derived in the next section, is the square root of

$$\mathcal{V}_n^2 = \frac{1}{n^2}\sum_{k,l=1}^n A_{kl} B_{kl},$$

where $A_{kl}$ and $B_{kl}$ are simple linear functions of the pairwise distances between sample elements. It will be shown that the definitions of the new dependence coefficients have theoretical foundations based on characteristic functions and on the new concept of covariance with respect to Brownian motion. Our independence test statistics are consistent against all types of dependent alternatives with finite second moments.

Classical Pearson product-moment correlation ($\rho$) and covariance measure linear dependence between two random variables, and in the bivariate normal case $\rho = 0$ is equivalent to independence. In the multivariate normal case, a diagonal covariance matrix implies independence, but a diagonal covariance matrix is not a sufficient condition for independence in the general case. Nonlinear or nonmonotone dependence may exist. Thus, $\rho = 0$ or $\operatorname{cov}(X, Y) = 0$ does not characterize independence in general.

Although it does not characterize independence, classical correlation is widely applied in time series, clinical trials, longitudinal studies, modeling financial data, meta-analysis, model selection in parametric and nonparametric models, classification and pattern recognition, etc. Ratios and other methods of combining and applying correlation coefficients have also been proposed. An important example is maximal correlation, characterized by Rényi renyi59 ().

For multivariate inference, methods based on likelihood ratio tests (LRT) such as Wilks’ Lambda wilks () or Puri-Sen ps () are not applicable if dimension exceeds sample size, or when distributional assumptions do not hold. Although methods based on ranks can be applied in some problems, many classical methods are effective only for testing linear or monotone types of dependence.

There is much literature on testing or measuring independence. See, for example, Blomqvist blom50 (), Blum, Kiefer, and Rosenblatt bkr61 (), or methods outlined in Hollander and Wolfe hw99 () and Anderson anderson03 (). Multivariate nonparametric approaches to this problem can be found in Taskinen, Oja, and Randles tor05 (), and the references therein.

Our proposed distance correlation represents an entirely new approach. For all distributions with finite first moments, distance correlation $\mathcal{R}$ generalizes the idea of correlation in at least two fundamental ways:

(i) $\mathcal{R}(X, Y)$ is defined for $X$ and $Y$ in arbitrary dimension.

(ii) $\mathcal{R}(X, Y) = 0$ characterizes independence of $X$ and $Y$.

The coefficient $\mathcal{R}(X, Y)$ is a standardized version of distance covariance $\mathcal{V}(X, Y)$, defined in the next section. Distance correlation satisfies $0 \le \mathcal{R} \le 1$, and $\mathcal{R} = 0$ only if $X$ and $Y$ are independent. In the bivariate normal case, $\mathcal{R}$ is a deterministic function of the correlation $\rho$, and $\mathcal{R}(X, Y) \le |\rho(X, Y)|$ with equality when $\rho = \pm 1$.

Thus, distance covariance and distance correlation provide a natural extension of Pearson product-moment covariance $\sigma_{X,Y}$ and correlation $\rho$, and new methodology for measuring dependence in all types of applications.

The notion of covariance of random vectors $(X, Y)$ with respect to a stochastic process is introduced in this paper. This new notion contains as distinct special cases distance covariance $\mathcal{V}^2(X, Y)$ and, for bivariate $(X, Y)$, the squared product-moment covariance $\sigma_{X,Y}^2$. The title of this paper refers to $\operatorname{Cov}_W(X, Y) = \mathcal{W}(X, Y)$, where $W$ is a Wiener process.

Brownian covariance $\mathcal{W} = \mathcal{W}(X, Y)$ is based on Brownian motion, or the Wiener process, for random variables $X$ and $Y$ with finite second moments. An important property of Brownian covariance is that $\mathcal{W}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

A surprising result develops: the Brownian covariance is equal to the distance covariance. This equivalence is not only surprising, it also shows that distance covariance is a natural counterpart of product-moment covariance. For bivariate $(X, Y)$, by considering the simplest nonrandom function, identity ($\mathrm{id}$), we obtain $\operatorname{Cov}_{\mathrm{id}}(X, Y) = |\sigma_{X,Y}|$. Then by considering the most fundamental random process, Brownian motion $W$, we arrive at $\operatorname{Cov}_W(X, Y) = \mathcal{W}(X, Y)$. Brownian correlation is a standardized Brownian covariance, such that if Brownian motion is replaced with the identity function, we obtain the absolute value of Pearson's correlation $\rho$.

A further advantage of extending Pearson correlation with distance correlation is that while uncorrelatedness ($\rho = 0$) can sometimes replace independence, for example, in proving some classical laws of large numbers, uncorrelatedness is too weak to imply a central limit theorem, even for strongly stationary summands (see Bradley bradley81 (), bradley88 (), bradley07 ()). On the other hand, a central limit theorem for strongly stationary sequences of summands follows from $\mathcal{R} = 0$ type conditions (Székely and Bakirov TR08 ()).

Distance correlation and distance covariance are presented in Section 2. Brownian covariance is introduced in Section 3. Extensions and applications are discussed in Sections 4 and 5.

2 Distance covariance and distance correlation

Let $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$ be random vectors, where $p$ and $q$ are positive integers. The lower case $f_X$ and $f_Y$ will be used to denote the characteristic functions of $X$ and $Y$, respectively, and their joint characteristic function is denoted $f_{X,Y}$. In terms of characteristic functions, $X$ and $Y$ are independent if and only if $f_{X,Y} = f_X f_Y$. Thus, a natural approach to measuring the dependence between $X$ and $Y$ is to find a suitable norm to measure the distance between $f_{X,Y}$ and $f_X f_Y$.

Distance covariance $\mathcal{V}$ is a measure of the distance between $f_{X,Y}$ and the product $f_X f_Y$. A norm $\|\cdot\|$ and a distance $\|f_{X,Y} - f_X f_Y\|$ are defined in Section 2.2. Then an empirical version of $\mathcal{V}$ is developed and applied to test the hypothesis of independence

$$H_0\colon f_{X,Y} = f_X f_Y \quad\text{vs}\quad H_1\colon f_{X,Y} \ne f_X f_Y.$$

In Székely et al. srb07 () an omnibus test of independence based on the sample distance covariance is introduced that is easily implemented in arbitrary dimension without requiring distributional assumptions. In Monte Carlo studies, the distance covariance test exhibited superior power relative to parametric or rank-based likelihood ratio tests against nonmonotone types of dependence. It was also demonstrated that the tests were quite competitive with the parametric likelihood ratio test when applied to multivariate normal data. The practical message is that distance covariance tests are powerful tests for all types of dependence.

2.1 Motivation

Notation

The scalar product of vectors $t$ and $s$ is denoted by $\langle t, s\rangle$. For complex-valued functions $f(\cdot)$, the complex conjugate of $f$ is denoted by $\bar f$, and $|f|^2 = f \bar f$. The Euclidean norm of $x$ in $\mathbb{R}^p$ is $|x|_p$. A primed variable $X'$ is an independent copy of $X$; that is, $X$ and $X'$ are independent and identically distributed (i.i.d.).

For complex functions $\gamma$ defined on $\mathbb{R}^p \times \mathbb{R}^q$, the $\|\cdot\|_w$-norm in the weighted $L_2$ space of functions on $\mathbb{R}^{p+q}$ is defined by

$$\|\gamma(t, s)\|^2_w = \int_{\mathbb{R}^{p+q}} |\gamma(t, s)|^2\, w(t, s)\,dt\,ds, \qquad (1)$$

where $w(t, s)$ is an arbitrary positive weight function for which the integral above exists.

With a suitable choice of weight function $w(t, s)$, discussed below, we shall define a measure of dependence

$$\mathcal{V}^2(X, Y; w) = \|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2_w = \int_{\mathbb{R}^{p+q}} |f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2\, w(t, s)\,dt\,ds, \qquad (2)$$

which is analogous to classical covariance, but with the important property that $\mathcal{V}^2(X, Y; w) = 0$ if and only if $X$ and $Y$ are independent. In what follows, $w$ is chosen such that we can also define

$$\mathcal{V}^2(X; w) = \mathcal{V}^2(X, X; w) = \|f_{X,X}(t, s) - f_X(t) f_X(s)\|^2_w,$$

and similarly define $\mathcal{V}^2(Y; w)$. Then a standardized version of $\mathcal{V}(X, Y; w)$ is

$$\mathcal{R}^2(X, Y; w) = \frac{\mathcal{V}^2(X, Y; w)}{\sqrt{\mathcal{V}^2(X; w)\,\mathcal{V}^2(Y; w)}},$$

a type of unsigned correlation.

In the definition of the norm (1) there is more than one potentially interesting and applicable choice of weight function $w$, but not every $w$ leads to a dependence measure that has desirable statistical properties. Let us now discuss the motivation for our particular choice of weight function leading to distance covariance.

At least two conditions should be satisfied by the standardized coefficient $\mathcal{R}_w(X, Y)$:

(i) $\mathcal{R}_w(X, Y) \ge 0$, and $\mathcal{R}_w(X, Y) = 0$ only if independence holds.

(ii) $\mathcal{R}_w(X, Y)$ is scale invariant, that is, invariant with respect to transformations $(X, Y) \mapsto (\epsilon X, \epsilon Y)$, for $\epsilon > 0$.

However, if we consider an integrable weight function $w(t, s)$, then for $X$ and $Y$ with finite variance

$$\lim_{\epsilon \to 0} \mathcal{R}^2_w(\epsilon X, \epsilon Y) = \rho^2(X, Y).$$

The above limit is obtained by considering the Taylor expansions of the underlying characteristic functions. Thus, if the weight function is integrable, $\mathcal{R}_w$ can be arbitrarily close to zero even if $X$ and $Y$ are dependent. By using a suitable nonintegrable weight function, we can obtain an $\mathcal{R}_w$ that satisfies both properties (i) and (ii) above.
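To make the Taylor-expansion argument concrete, here is a short sketch in the simplest bivariate case ($p = q = 1$), under the extra assumption (ours, for illustration only) that $\int t^2 s^2 w(t, s)\,dt\,ds < \infty$. Expanding the characteristic functions for small $\epsilon$ gives

$$f_{\epsilon X, \epsilon Y}(t, s) - f_{\epsilon X}(t) f_{\epsilon Y}(s) = -\epsilon^2\, t s \operatorname{cov}(X, Y) + o(\epsilon^2) \qquad (\epsilon \to 0),$$

so that

$$\mathcal{V}^2_w(\epsilon X, \epsilon Y) = \epsilon^4 \operatorname{cov}^2(X, Y) \int t^2 s^2\, w(t, s)\,dt\,ds + o(\epsilon^4),$$

and similarly for $\mathcal{V}^2_w(\epsilon X)$ and $\mathcal{V}^2_w(\epsilon Y)$ with the covariance replaced by $\operatorname{var}(X)$ and $\operatorname{var}(Y)$. In the ratio defining $\mathcal{R}^2_w$, the factors $\epsilon^4$ and the weight integral cancel, leaving $\rho^2(X, Y)$ in the limit.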

Considering the operations on characteristic functions involved in evaluating the integrand in (2), a promising solution to the choice of weight function $w$ is suggested by the following lemma.

Lemma 1

If $0 < \alpha < 2$, then for all $x$ in $\mathbb{R}^d$,

$$\int_{\mathbb{R}^d} \frac{1 - \cos\langle t, x\rangle}{|t|_d^{d+\alpha}}\,dt = C(d, \alpha)\,|x|_d^{\alpha},$$

where

$$C(d, \alpha) = \frac{2\pi^{d/2}\,\Gamma(1 - \alpha/2)}{\alpha\,2^{\alpha}\,\Gamma\bigl((d + \alpha)/2\bigr)}$$

and $\Gamma(\cdot)$ is the complete gamma function. The integrals at $0$ and $\infty$ are meant in the principal value sense: $\lim_{\epsilon \to 0}\int_{\mathbb{R}^d \setminus \{\epsilon B + \epsilon^{-1} B^c\}}$, where $B$ is the unit ball (centered at 0) in $\mathbb{R}^d$ and $B^c$ is the complement of $B$.

A proof of Lemma 1 is given in Székely and Rizzo sr05b (). Lemma 1 suggests the weight functions

$$w(t, s; \alpha) = \bigl(C(p, \alpha)\,C(q, \alpha)\,|t|_p^{p+\alpha}\,|s|_q^{q+\alpha}\bigr)^{-1}, \qquad 0 < \alpha < 2. \qquad (3)$$

The weight functions (3) result in coefficients $\mathcal{R}_w$ that satisfy the scale invariance property (ii) above.

In the simplest case, corresponding to $\alpha = 1$ and the Euclidean norm $|\cdot|$,

$$w(t, s) = \bigl(c_p\,c_q\,|t|_p^{1+p}\,|s|_q^{1+q}\bigr)^{-1}, \qquad (4)$$

where

$$c_d = C(d, 1) = \frac{\pi^{(1+d)/2}}{\Gamma\bigl((1 + d)/2\bigr)}. \qquad (5)$$

(The constant $2c_d$ is the surface area of the unit sphere in $\mathbb{R}^{d+1}$.)

Remark 1.

Lemma 1 is applied to evaluate the integrand in (2) for weight functions (3) and (4). For example, if $\alpha = 1$ [weight function (4)], then by Lemma 1 there exist constants $c_p$ and $c_q$ such that for $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$,

$$\int_{\mathbb{R}^p} \frac{1 - \cos\langle t, X\rangle}{|t|_p^{1+p}}\,dt = c_p\,|X|_p, \qquad \int_{\mathbb{R}^q} \frac{1 - \cos\langle s, Y\rangle}{|s|_q^{1+q}}\,ds = c_q\,|Y|_q.$$

Distance covariance and distance correlation are a class of dependence coefficients and statistics obtained by applying a weight function of the type (3), $0 < \alpha < 2$. This type of weight function leads to a simple product-average form of the covariance (8) analogous to Pearson covariance. Other interesting weight functions could be considered (see, e.g., Bakirov, Rizzo and Székely brs06 ()), but only the weight functions (3) lead to distance covariance type statistics (8).

In this paper we apply weight function (4) and the corresponding weighted $L_2$ norm $\|\cdot\|$, omitting the index $w$, and write the dependence measure (2) as $\mathcal{V}^2(X, Y)$. Section 4.1 extends our results for general $\alpha \in (0, 2)$.

For finiteness of $\|f_{X,Y} - f_X f_Y\|^2$, it is sufficient that $E|X|_p < \infty$ and $E|Y|_q < \infty$.

2.2 Definitions

Definition 1.

The distance covariance (dCov) between random vectors $X$ and $Y$ with finite first moments is the nonnegative number $\mathcal{V}(X, Y)$ defined by

$$\mathcal{V}^2(X, Y) = \|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2 = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2}{|t|_p^{1+p}\,|s|_q^{1+q}}\,dt\,ds. \qquad (6)$$

Similarly, distance variance (dVar) is defined as the square root of

$$\mathcal{V}^2(X) = \mathcal{V}^2(X, X) = \|f_{X,X}(t, s) - f_X(t) f_X(s)\|^2.$$

By definition of the norm $\|\cdot\|$, it is clear that $\mathcal{V}(X, Y) \ge 0$ and $\mathcal{V}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

Definition 2.

The distance correlation (dCor) between random vectors $X$ and $Y$ with finite first moments is the nonnegative number $\mathcal{R}(X, Y)$ defined by

$$\mathcal{R}^2(X, Y) =
\begin{cases}
\dfrac{\mathcal{V}^2(X, Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}}, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y) > 0, \\[2mm]
0, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y) = 0.
\end{cases} \qquad (7)$$

Several properties of $\mathcal{R}$ analogous to those of $\rho$ are given in Theorem 3. Results for the special case of bivariate normal $(X, Y)$ are given in Theorem 6.

The distance dependence statistics are defined as follows. For a random sample $(\mathbf{X}, \mathbf{Y}) = \{(X_k, Y_k)\colon k = 1, \dots, n\}$ of $n$ i.i.d. random vectors from the joint distribution of random vectors $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$, compute the Euclidean distance matrices $(a_{kl}) = (|X_k - X_l|_p)$ and $(b_{kl}) = (|Y_k - Y_l|_q)$. Define

$$A_{kl} = a_{kl} - \bar a_{k\cdot} - \bar a_{\cdot l} + \bar a_{\cdot\cdot}, \qquad k, l = 1, \dots, n,$$

where

$$\bar a_{k\cdot} = \frac{1}{n}\sum_{l=1}^n a_{kl}, \qquad \bar a_{\cdot l} = \frac{1}{n}\sum_{k=1}^n a_{kl}, \qquad \bar a_{\cdot\cdot} = \frac{1}{n^2}\sum_{k,l=1}^n a_{kl}.$$

Similarly, define $B_{kl} = b_{kl} - \bar b_{k\cdot} - \bar b_{\cdot l} + \bar b_{\cdot\cdot}$, for $k, l = 1, \dots, n$.

Definition 3.

The nonnegative sample distance covariance $\mathcal{V}_n(\mathbf{X}, \mathbf{Y})$ and sample distance correlation $\mathcal{R}_n(\mathbf{X}, \mathbf{Y})$ are defined by

$$\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y}) = \frac{1}{n^2}\sum_{k,l=1}^n A_{kl} B_{kl} \qquad (8)$$

and

$$\mathcal{R}_n^2(\mathbf{X}, \mathbf{Y}) =
\begin{cases}
\dfrac{\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y})}{\sqrt{\mathcal{V}_n^2(\mathbf{X})\,\mathcal{V}_n^2(\mathbf{Y})}}, & \mathcal{V}_n^2(\mathbf{X})\,\mathcal{V}_n^2(\mathbf{Y}) > 0, \\[2mm]
0, & \mathcal{V}_n^2(\mathbf{X})\,\mathcal{V}_n^2(\mathbf{Y}) = 0,
\end{cases} \qquad (9)$$

respectively, where the sample distance variance is defined by

$$\mathcal{V}_n^2(\mathbf{X}) = \mathcal{V}_n^2(\mathbf{X}, \mathbf{X}) = \frac{1}{n^2}\sum_{k,l=1}^n A_{kl}^2. \qquad (10)$$

The nonnegativity of $\mathcal{V}_n^2$ and $\mathcal{R}_n^2$ may not be immediately obvious from the definitions above, but this property, as well as the motivation for the definitions of the statistics, will become clear from Theorem 1 below.
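As an illustration of formulas (8)–(10), the following is a minimal base-R sketch (ours, not the energy package implementation); the function name dcov_stats and the toy data are hypothetical.

    dcov_stats <- function(x, y) {
      # Sample distance covariance / correlation via the double-centering
      # formulas (8)-(10); base R only.
      x <- as.matrix(x); y <- as.matrix(y)
      n <- nrow(x)
      stopifnot(nrow(y) == n)
      a <- as.matrix(dist(x))          # Euclidean distance matrix (a_kl)
      b <- as.matrix(dist(y))          # Euclidean distance matrix (b_kl)
      dbl_center <- function(d)        # A_kl = a_kl - abar_k. - abar_.l + abar_..
        d - matrix(rowMeans(d), n, n) - matrix(colMeans(d), n, n, byrow = TRUE) + mean(d)
      A <- dbl_center(a); B <- dbl_center(b)
      V2xy <- mean(A * B)              # (8): (1/n^2) sum_kl A_kl B_kl
      V2x  <- mean(A * A)              # (10)
      V2y  <- mean(B * B)
      R2   <- if (V2x * V2y > 0) V2xy / sqrt(V2x * V2y) else 0   # (9)
      list(dCov = sqrt(V2xy), dCor = sqrt(R2), dVarX = sqrt(V2x), dVarY = sqrt(V2y))
    }

    # Toy example: quadratic dependence that Pearson correlation misses.
    set.seed(1)
    x <- rnorm(200); y <- x^2 + 0.1 * rnorm(200)
    dcov_stats(x, y)$dCor    # clearly positive
    cor(x, y)                # near zero

In the toy example, the quadratic dependence is invisible to Pearson correlation but clearly detected by the sample distance correlation.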

2.3 Properties of distance covariance

Several interesting properties of distance covariance are obtained. Results in this section are summarized as follows:

(i) Equivalent definition of $\mathcal{V}_n$ in terms of empirical characteristic functions and the norm $\|\cdot\|$.

(ii) Almost sure convergence $\mathcal{V}_n \to \mathcal{V}$ and $\mathcal{R}_n^2 \to \mathcal{R}^2$.

(iii) Properties of $\mathcal{V}(X, Y)$, $\mathcal{V}(X)$, and $\mathcal{R}(X, Y)$.

(iv) Properties of $\mathcal{V}_n$ and $\mathcal{R}_n$.

(v) Weak convergence of $n\mathcal{V}_n^2$, the limit distribution of $n\mathcal{V}_n^2$, and statistical consistency.

(vi) Results for the bivariate normal case.

Many of these results were obtained in Székely et al. srb07 (). Here we give the proofs of new results, and readers are referred to srb07 () for more details and proofs of our previous results.

An equivalent definition of $\mathcal{V}_n$

The coefficient $\mathcal{V}(X, Y)$ is defined in terms of characteristic functions; thus, a natural approach is to define the statistic $\mathcal{V}_n(\mathbf{X}, \mathbf{Y})$ in terms of empirical characteristic functions. The joint empirical characteristic function of the sample, $\{(X_1, Y_1), \dots, (X_n, Y_n)\}$, is

$$f_{X,Y}^n(t, s) = \frac{1}{n}\sum_{k=1}^n \exp\{i\langle t, X_k\rangle + i\langle s, Y_k\rangle\}.$$

The marginal empirical characteristic functions of the $\mathbf{X}$ sample and $\mathbf{Y}$ sample are

$$f_X^n(t) = \frac{1}{n}\sum_{k=1}^n \exp\{i\langle t, X_k\rangle\}, \qquad f_Y^n(s) = \frac{1}{n}\sum_{k=1}^n \exp\{i\langle s, Y_k\rangle\},$$

respectively. Then an empirical version of distance covariance could have been defined as $\|f_{X,Y}^n(t, s) - f_X^n(t) f_Y^n(s)\|$, where the norm $\|\cdot\|$ is defined by the integral as above in (1). Theorem 1 establishes that this definition is equivalent to Definition 3.

Theorem 1

If $(\mathbf{X}, \mathbf{Y})$ is a sample from the joint distribution of $(X, Y)$, then

$$\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y}) = \|f_{X,Y}^n(t, s) - f_X^n(t) f_Y^n(s)\|^2.$$

The proof applies Lemma 1 to evaluate the integral $\|f_{X,Y}^n - f_X^n f_Y^n\|^2$ with $\alpha = 1$. An intermediate result is

$$\|f_{X,Y}^n(t, s) - f_X^n(t) f_Y^n(s)\|^2 = S_1 + S_2 - 2 S_3, \qquad (11)$$

where

$$S_1 = \frac{1}{n^2}\sum_{k,l=1}^n |X_k - X_l|_p\,|Y_k - Y_l|_q, \qquad S_2 = \frac{1}{n^2}\sum_{k,l=1}^n |X_k - X_l|_p\;\frac{1}{n^2}\sum_{k,l=1}^n |Y_k - Y_l|_q,$$

$$S_3 = \frac{1}{n^3}\sum_{k=1}^n \sum_{l,m=1}^n |X_k - X_l|_p\,|Y_k - Y_m|_q.$$

Then the algebraic identity $S_1 + S_2 - 2 S_3 = \mathcal{V}_n^2(\mathbf{X}, \mathbf{Y})$, where $\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y})$ is given by Definition 3, is established to complete the proof.

As a corollary to Theorem 1, we have $\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y}) \ge 0$. It is also easy to see that the statistic $\mathcal{V}_n(\mathbf{X}) = 0$ if and only if every sample observation is identical. If $\mathcal{V}_n(\mathbf{X}) = 0$, then $A_{kl} = 0$ for $k, l = 1, \dots, n$. Thus, $0 = A_{kk} = -\bar a_{k\cdot} - \bar a_{\cdot k} + \bar a_{\cdot\cdot}$ implies that $\bar a_{k\cdot} = \bar a_{\cdot k} = \bar a_{\cdot\cdot}/2$, and

$$0 = A_{kl} = a_{kl} - \bar a_{k\cdot} - \bar a_{\cdot l} + \bar a_{\cdot\cdot} = a_{kl},$$

so $X_1 = \cdots = X_n$.

Remark 2.

The simplicity of formula (8) for $\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y})$ in Definition 3 has practical advantages. Although the identity (11) in Theorem 1 provides an alternate computing formula for $\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y})$, the original formula in Definition 3 is simpler and requires less computing time (13% less time per statistic on our current machine, for sample size 100). Reusable computations and other efficiencies possible using the simpler formula (8) allow our permutation tests to execute in 94% to 98% less time, depending on the number of replicates. It is straightforward to apply resampling procedures without the need to recompute the distance matrices. See Example 5.2, where a jackknife procedure is illustrated.
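The following is a minimal sketch of such a permutation test (our illustration, not the energy package code). It exploits the fact that permuting the sample indices of $\mathbf{Y}$ amounts to permuting the rows and columns of its distance matrix, so the distance matrices are computed only once; the function name dcov_perm_test and the default number of replicates are hypothetical.

    dcov_perm_test <- function(x, y, R = 199) {
      # Permutation test of independence based on n * Vn^2; the distance
      # matrices are computed once, then rows/columns of b are permuted.
      x <- as.matrix(x); y <- as.matrix(y)
      n <- nrow(x)
      a <- as.matrix(dist(x)); b <- as.matrix(dist(y))
      nV2 <- function(a, b) {
        dbl_center <- function(d)
          d - matrix(rowMeans(d), n, n) - matrix(colMeans(d), n, n, byrow = TRUE) + mean(d)
        n * mean(dbl_center(a) * dbl_center(b))
      }
      obs  <- nV2(a, b)
      reps <- replicate(R, { i <- sample(n); nV2(a, b[i, i]) })
      list(statistic = obs, p.value = (1 + sum(reps >= obs)) / (R + 1))
    }

    set.seed(1)
    x <- rnorm(100); y <- cos(2 * x) + 0.2 * rnorm(100)
    dcov_perm_test(x, y)$p.value    # small p-value: nonmonotone dependence detected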

Theorem 2

If $E|X|_p < \infty$ and $E|Y|_q < \infty$, then almost surely

$$\lim_{n \to \infty} \mathcal{V}_n(\mathbf{X}, \mathbf{Y}) = \mathcal{V}(X, Y).$$

Corollary 1

If $E(|X|_p + |Y|_q) < \infty$, then almost surely

$$\lim_{n \to \infty} \mathcal{R}_n^2(\mathbf{X}, \mathbf{Y}) = \mathcal{R}^2(X, Y).$$

Theorem 3

For random vectors $X$ and $Y$ such that $E(|X|_p + |Y|_q) < \infty$, the following properties hold:

(i) $0 \le \mathcal{R}(X, Y) \le 1$, and $\mathcal{R}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

(ii) $\mathcal{V}(a_1 + b_1 C_1 X,\; a_2 + b_2 C_2 Y) = \sqrt{|b_1 b_2|}\;\mathcal{V}(X, Y)$, for all constant vectors $a_1$, $a_2$, scalars $b_1$, $b_2$, and orthonormal matrices $C_1$, $C_2$ in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively.

(iii) If the random vector $(X_1, Y_1)$ is independent of the random vector $(X_2, Y_2)$, then

$$\mathcal{V}(X_1 + X_2, Y_1 + Y_2) \le \mathcal{V}(X_1, Y_1) + \mathcal{V}(X_2, Y_2).$$

Equality holds if and only if $X_1$ and $Y_1$ are both constants, or $X_2$ and $Y_2$ are both constants, or $X_1, X_2, Y_1, Y_2$ are mutually independent.

(iv) $\mathcal{V}(X) = 0$ implies that $X = EX$, almost surely.

(v) $\mathcal{V}(a + bCX) = |b|\,\mathcal{V}(X)$, for all constant vectors $a$ in $\mathbb{R}^p$, scalars $b$, and $p \times p$ orthonormal matrices $C$.

(vi) If $X$ and $Y$ are independent, then $\mathcal{V}(X + Y) \le \mathcal{V}(X) + \mathcal{V}(Y)$. Equality holds if and only if one of the random vectors $X$ or $Y$ is constant.

Proofs of statements (iii) and (vi) are given in the Appendix.

Theorem 4

(i) $\mathcal{V}_n(\mathbf{X}, \mathbf{Y}) \ge 0$.

(ii) $\mathcal{V}_n(\mathbf{X}) = 0$ if and only if every sample observation is identical.

(iii) $0 \le \mathcal{R}_n(\mathbf{X}, \mathbf{Y}) \le 1$.

(iv) $\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = 1$ implies that the dimensions of the linear subspaces spanned by the $\mathbf{X}$ and $\mathbf{Y}$ samples, respectively, are almost surely equal, and if we assume that these subspaces are equal, then in this subspace

$$\mathbf{Y} = a + b\,\mathbf{X}\,C$$

for some vector $a$, nonzero real number $b$, and orthogonal matrix $C$.

Theorem 3 and the results below for the dCov test can be applied in a wide range of problems in statistical modeling and inference, including nonparametric models, models with multivariate response, or when dimension exceeds sample size. Some applications are discussed in Section 5.

Asymptotic properties of $n\mathcal{V}_n^2$

A multivariate test of independence is determined by $n\mathcal{V}_n^2$ or $n\mathcal{V}_n^2/S_2$, where $S_2$ is as defined in Theorem 1. If we apply the latter version, it normalizes the statistic so that asymptotically it has expected value 1. Then if $E(|X|_p + |Y|_q) < \infty$, under independence, $n\mathcal{V}_n^2/S_2$ converges in distribution to a quadratic form

$$Q \stackrel{D}{=} \sum_{j=1}^{\infty} \lambda_j Z_j^2, \qquad (12)$$

where $Z_j$ are independent standard normal random variables, $\{\lambda_j\}$ are nonnegative constants that depend on the distribution of $(X, Y)$, and $E[Q] = 1$. A test of independence that rejects independence for large $n\mathcal{V}_n^2/S_2$ (or $n\mathcal{V}_n^2$) is statistically consistent against all alternatives with finite first moments.

In the next theorem we need only assume finiteness of first moments for weak convergence of $n\mathcal{V}_n^2$ under the independence hypothesis.

Theorem 5 (Weak convergence)

If $X$ and $Y$ are independent and $E(|X|_p + |Y|_q) < \infty$, then

$$n\mathcal{V}_n^2 \xrightarrow{D} \|\zeta(t, s)\|^2,$$

where $\zeta(\cdot, \cdot)$ is a complex-valued zero mean Gaussian random process with covariance function

$$R(u, u_0) = \bigl(f_X(t - t_0) - f_X(t)\,\overline{f_X(t_0)}\bigr)\bigl(f_Y(s - s_0) - f_Y(s)\,\overline{f_Y(s_0)}\bigr)$$

for $u = (t, s)$, $u_0 = (t_0, s_0) \in \mathbb{R}^{p+q}$.

Corollary 2

If $E(|X|_p + |Y|_q) < \infty$, then:

(i) If $X$ and $Y$ are independent, then $n\mathcal{V}_n^2/S_2 \xrightarrow{D} Q$, where $Q$ is a nonnegative quadratic form of centered Gaussian random variables (12) and $E[Q] = 1$.

(ii) If $X$ and $Y$ are independent, then $n\mathcal{V}_n^2 \xrightarrow{D} Q_1$, where $Q_1$ is a nonnegative quadratic form of centered Gaussian random variables and $E[Q_1] = E|X - X'|_p\, E|Y - Y'|_q$.

(iii) If $X$ and $Y$ are dependent, then $n\mathcal{V}_n^2/S_2 \xrightarrow{P} \infty$ and $n\mathcal{V}_n^2 \xrightarrow{P} \infty$.

Corollary 2(i), (ii) guarantees that the dCov test statistic has a proper limit distribution under the hypothesis of independence for all $X$ and $Y$ with finite first moments, while Corollary 2(iii) shows that under any dependent alternative, the dCov test statistic tends to infinity (stochastically). Thus, the dCov test of independence is statistically consistent against all types of dependence.

The dCov test is easy to implement as a permutation test, which is the method that we applied in our examples and power comparisons. For the permutation test implementation one can apply the test statistic $n\mathcal{V}_n^2$. Large values of $n\mathcal{V}_n^2$ (or $n\mathcal{V}_n^2/S_2$) are significant. The dCov test and test statistics are implemented in the energy package for R in the functions dcov.test, dcov, and dcor R (), energy ().
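A typical call looks like the following sketch; the data are made up, and argument names or defaults may differ across versions of the energy package.

    # install.packages("energy")    # if not already installed
    library(energy)

    set.seed(1)
    x <- matrix(rnorm(100 * 5), 100, 5)                # X sample in R^5
    y <- x^2 + 0.5 * matrix(rnorm(100 * 5), 100, 5)    # nonlinearly dependent Y
    dcor(x, y)                 # sample distance correlation
    dcov.test(x, y, R = 199)   # permutation test of independence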

We have also obtained a result that gives an asymptotic critical value applicable to arbitrary distributions. If $Q$ is a quadratic form of centered Gaussian random variables and $E[Q] = 1$, then

$$P\bigl\{Q \ge \bigl(\Phi^{-1}(1 - \alpha/2)\bigr)^2\bigr\} \le \alpha$$

for all $0 < \alpha \le 0.215$, where $\Phi^{-1}(\cdot)$ denotes the standard normal quantile function, so that $(\Phi^{-1}(1 - \alpha/2))^2$ is the $(1 - \alpha)$ quantile of a chi-square variable with 1 degree of freedom. This result follows from a theorem of Székely and Bakirov sb (), page 181.

Thus, a test that rejects independence if $n\mathcal{V}_n^2/S_2 \ge (\Phi^{-1}(1 - \alpha/2))^2$ has an asymptotic significance level at most $\alpha$. This test criterion could be quite conservative for many distributions. Although this critical value is conservative, it is a sharp bound; the upper bound $\alpha$ is achieved when $X$ and $Y$ are independent Bernoulli variables.
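A short sketch of this conservative criterion, computing $\mathcal{V}_n^2$ through the identity (11) rather than by double centering (the helper name ndcov_conservative and the default level are ours):

    ndcov_conservative <- function(x, y, alpha = 0.1) {
      # Conservative criterion: reject independence when
      # n * Vn^2 / S2 >= (qnorm(1 - alpha/2))^2; Vn^2 is computed via (11).
      a <- as.matrix(dist(x)); b <- as.matrix(dist(y))
      n  <- nrow(a)
      S1 <- mean(a * b)
      S2 <- mean(a) * mean(b)
      S3 <- sum(rowSums(a) * rowSums(b)) / n^3
      Tn <- n * (S1 + S2 - 2 * S3) / S2       # n * Vn^2 / S2
      crit <- qnorm(1 - alpha / 2)^2          # (1 - alpha) quantile of chi-square(1)
      list(statistic = Tn, critical.value = crit, reject = Tn >= crit)
    }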

Results for the bivariate normal distribution

When $(X, Y)$ has a bivariate normal distribution, there is a deterministic relation between $\mathcal{R}(X, Y)$ and $|\rho|$.

Theorem 6

If $X$ and $Y$ are standard normal, with correlation $\rho = \rho(X, Y)$, then:

(i) $\mathcal{R}(X, Y) \le |\rho|$,

(ii) $\mathcal{R}^2(X, Y) = \dfrac{\rho \arcsin \rho + \sqrt{1 - \rho^2} - \rho \arcsin(\rho/2) - \sqrt{4 - \rho^2} + 1}{1 + \pi/3 - \sqrt{3}}$,

(iii) $\displaystyle \inf_{\rho \ne 0} \frac{\mathcal{R}(X, Y)}{|\rho|} = \lim_{\rho \to 0} \frac{\mathcal{R}(X, Y)}{|\rho|} = \frac{1}{2\bigl(1 + \pi/3 - \sqrt{3}\bigr)^{1/2}} \approx 0.891.$

The relation between $\mathcal{R}$ and $\rho$ for a bivariate normal distribution is shown in Figure 1.

Figure 1: Dependence coefficient $\mathcal{R}$ (solid line) and correlation $\rho$ (dashed line) in the bivariate normal case.

3 Brownian covariance

To introduce the notion of Brownian covariance, let us begin by considering the squared product-moment covariance. Recall that a primed variable $X'$ denotes an i.i.d. copy of the unprimed symbol $X$. For two real-valued random variables, the square of their classical covariance is

$$\operatorname{cov}^2(X, Y) = E\bigl[(X - EX)(X' - EX')(Y - EY)(Y' - EY')\bigr]. \qquad (13)$$
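A quick Monte Carlo sanity check of (13) (our illustration; the simulated linear model is arbitrary):

    # Monte Carlo check of (13): the average of
    # (X - EX)(X' - EX')(Y - EY)(Y' - EY') over independent copies
    # approximates cov(X, Y)^2.
    set.seed(1)
    n  <- 1e6
    x  <- rnorm(n); y  <- 0.6 * x + rnorm(n)     # cov(X, Y) = 0.6
    x2 <- rnorm(n); y2 <- 0.6 * x2 + rnorm(n)    # an independent copy (X', Y')
    lhs <- mean((x - mean(x)) * (x2 - mean(x2)) * (y - mean(y)) * (y2 - mean(y2)))
    c(lhs = lhs, rhs = cov(x, y)^2)              # both approximately 0.36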

Now we generalize the squared covariance and define the square of conditional covariance, given two real-valued stochastic processes $U$ and $V$. We obtain an interesting result when $U$ and $V$ are independent Wiener processes.

First, to center the random variable $X$ in the conditional covariance, we need the following definition. Let $X$ be a real-valued random variable and $\{U(t)\colon t \in \mathbb{R}\}$ a real-valued stochastic process, independent of $X$. The $U$-centered version of $X$ is defined by

$$X_U = U(X) - E\bigl[U(X) \mid U\bigr], \qquad (14)$$

whenever the conditional expectation exists.

Note that if $U$ is the identity function $\mathrm{id}$, we have $X_{\mathrm{id}} = X - EX$. The important examples in this paper apply Brownian motion/Wiener processes.

3.1 Definition of Brownian covariance

Let $W$ be a two-sided one-dimensional Brownian motion/Wiener process with expectation zero and covariance function

$$E\bigl[W(s) W(t)\bigr] = |s| + |t| - |s - t|, \qquad t, s \in \mathbb{R}. \qquad (15)$$

This is twice the covariance of the standard Wiener process. Here the factor 2 simplifies the computations, so throughout the paper, covariance function (15) is assumed for $W$.
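The following small simulation sketch (ours, not from the paper) illustrates covariance function (15): a Brownian path is anchored at 0 and its increments are rescaled so that an increment over an interval of length $h$ has variance $2h$.

    # Simulate the rescaled two-sided Wiener process on a grid and compare the
    # empirical covariance of W(s), W(t) with |s| + |t| - |s - t|.
    set.seed(1)
    grid <- seq(-2, 2, by = 0.01)
    i0   <- which.min(abs(grid))                                 # index of t = 0
    sim_path <- function() {
      inc <- rnorm(length(grid) - 1, sd = sqrt(2 * diff(grid)))  # Var = 2 * dt
      w <- c(0, cumsum(inc))
      w - w[i0]                                                  # anchor so that W(0) = 0
    }
    W <- replicate(5000, sim_path())                             # columns are sample paths
    i_s <- which.min(abs(grid - 0.5)); i_t <- which.min(abs(grid - 1.2))
    c(empirical   = cov(W[i_s, ], W[i_t, ]),
      theoretical = abs(0.5) + abs(1.2) - abs(0.5 - 1.2))        # = 1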

Definition 4.

The Brownian covariance or the Wiener covariance of two real-valued random variables $X$ and $Y$ with finite second moments is the nonnegative number $\mathcal{W}(X, Y)$ defined by its square

$$\mathcal{W}^2(X, Y) = \operatorname{Cov}_W^2(X, Y) = E\bigl[X_W X'_W Y_{W'} Y'_{W'}\bigr], \qquad (16)$$

where $(W, W')$ does not depend on $(X, Y, X', Y')$.

Note that if $W$ in $\operatorname{Cov}_W$ is replaced by the (nonrandom) identity function $\mathrm{id}$, then $\operatorname{Cov}_{\mathrm{id}}(X, Y) = |\operatorname{cov}(X, Y)| = |\sigma_{X,Y}|$, the absolute value of Pearson's product-moment covariance. While the standardized product-moment covariance, Pearson correlation ($\rho$), measures the degree of linear relationship between two real-valued variables, we shall see that the standardized Brownian covariance measures the degree of all kinds of possible relationships between two real-valued random variables.
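Indeed, taking the identity function in (14) gives $X_{\mathrm{id}} = X - EX$ and $Y_{\mathrm{id}} = Y - EY$, so by (16) and (13),

$$\operatorname{Cov}^2_{\mathrm{id}}(X, Y) = E\bigl[(X - EX)(X' - EX')(Y - EY)(Y' - EY')\bigr] = \operatorname{cov}^2(X, Y).$$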

The definition of $\mathcal{W}$ can be extended to random processes in higher dimensions as follows. If $X$ is an $\mathbb{R}^p$-valued random variable, and $U(s)$ is a random process (random field) defined for all $s \in \mathbb{R}^p$ and independent of $X$, define the $U$-centered version of $X$ by

$$X_U = U(X) - E\bigl[U(X) \mid U\bigr],$$

whenever the conditional expectation exists.

Definition 5.

If $X$ is an $\mathbb{R}^p$-valued random variable, $Y$ is an $\mathbb{R}^q$-valued random variable, and $U(s)$ and $V(t)$ are arbitrary random processes (random fields) defined for all $s \in \mathbb{R}^p$, $t \in \mathbb{R}^q$, then the $(U, V)$ covariance of $(X, Y)$ is defined as the nonnegative number $\operatorname{Cov}_{U,V}(X, Y)$ whose square is

$$\operatorname{Cov}^2_{U,V}(X, Y) = E\bigl[X_U X'_U Y_V Y'_V\bigr], \qquad (17)$$

whenever the right-hand side is nonnegative and finite.

In particular, if $W$ and $W'$ are independent Brownian motions with covariance function (15) on $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively, the Brownian covariance of $X$ and $Y$ is defined by

$$\mathcal{W}^2(X, Y) = \operatorname{Cov}_W^2(X, Y) = \operatorname{Cov}^2_{W,W'}(X, Y). \qquad (18)$$

Similarly, for random variables $X$ with finite variance, define the Brownian variance by

$$\mathcal{W}(X) = \mathcal{W}(X, X) = \operatorname{Cov}_W(X, X).$$

Definition 6.

The Brownian correlation is defined as

$$\operatorname{Cor}_W(X, Y) = \frac{\mathcal{W}(X, Y)}{\sqrt{\mathcal{W}(X)\,\mathcal{W}(Y)}},$$

whenever the denominator is not zero; otherwise $\operatorname{Cor}_W(X, Y) = 0$.

In the following sections we prove that $\mathcal{W}(X, Y)$ exists for random vectors $X$ and $Y$ with finite second moments, and derive the Brownian covariance in this case.

3.2 Existence of $\mathcal{W}(X, Y)$

In the following, the subscript $d$ on the Euclidean norm $|x|_d$ for $x \in \mathbb{R}^d$ is omitted when the dimension is self-evident.

Theorem 7

If $X$ is an $\mathbb{R}^p$-valued random variable, $Y$ is an $\mathbb{R}^q$-valued random variable, and $E(|X|^2 + |Y|^2) < \infty$, then $E[X_W X'_W Y_{W'} Y'_{W'}]$ is nonnegative and finite, and

$$\mathcal{W}^2(X, Y) = E\bigl[X_W X'_W Y_{W'} Y'_{W'}\bigr] = E|X - X'|\,|Y - Y'| + E|X - X'|\,E|Y - Y'| - E|X - X'|\,|Y - Y''| - E|X - X''|\,|Y - Y'|,$$

where $(X, Y)$, $(X', Y')$, and $(X'', Y'')$ are i.i.d.

Proof.

Observe that

$$E\bigl[X_W X'_W Y_{W'} Y'_{W'}\bigr] = E\bigl\{E\bigl[X_W X'_W Y_{W'} Y'_{W'} \mid W, W'\bigr]\bigr\} = E\bigl\{\bigl(E\bigl[X_W Y_{W'} \mid W, W'\bigr]\bigr)^2\bigr\},$$

and this is always nonnegative. For finiteness, it is enough to prove that all factors in the definition of $\mathcal{W}^2(X, Y)$ have finite fourth moments. The displayed equation in Theorem 7 relies on the special form of the covariance function (15) of $W$. The remaining details are in the Appendix.

See Section 4.1 for definitions and extension of results for the general case of fractional Brownian motion with Hurst parameter $0 < H < 1$ and covariance function $|s|^{2H} + |t|^{2H} - |s - t|^{2H}$.

3.3 The surprising coincidence: $\mathcal{W} = \mathcal{V}$

Theorem 8

For arbitrary $X \in \mathbb{R}^p$, $Y \in \mathbb{R}^q$ with finite second moments,

$$\mathcal{W}(X, Y) = \mathcal{V}(X, Y).$$

Proof.

Both $\mathcal{W}$ and $\mathcal{V}$ are nonnegative; hence, it is enough to show that their squares coincide. Lemma 1 can be applied to evaluate $\mathcal{V}^2(X, Y)$. In the numerator of the integrand we have terms like

$$E\bigl[\cos\langle t, X - X'\rangle \cos\langle s, Y - Y'\rangle\bigr],$$

where $X, X'$ are i.i.d. and $Y, Y'$ are i.i.d. Now apply the identity

$$\cos u \cos v = 1 - (1 - \cos u) - (1 - \cos v) + (1 - \cos u)(1 - \cos v)$$

and Lemma 1 to simplify the integrand. After cancelation in the numerator of the integrand, it remains to evaluate integrals of the type