# Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence

Ze Jin, David S. Matteson

Research support from an NSF Award (DMS-1455172), a Xerox PARC Faculty Research Award, and Cornell University Atkinson Center for a Sustainable Future (AVF-2017).

August 19, 2019

###### Abstract

We propose three new measures of mutual dependence between multiple random vectors. Each measure is zero if and only if the random vectors are mutually independent. The first generalizes distance covariance from pairwise dependence to mutual dependence, while the other two measures are sums of squared distance covariances. The proposed measures share similar properties and asymptotic distributions with distance covariance, and capture non-linear and non-monotone mutual dependence between the random vectors. Inspired by complete and incomplete V-statistics, we define empirical and simplified empirical measures as a trade-off between the complexity and statistical power when testing mutual independence. The implementation of corresponding tests is demonstrated by both simulation results and real data examples.

Key words: characteristic functions; distance covariance; multivariate analysis; mutual independence; V-statistics

## 1 Introduction

Let be a set of variables where each component , is a continuous random vector, and let be an i.i.d. sample from , the joint distribution of . We are interested in testing the hypothesis

$$
H_0: X_1, \dots, X_d \text{ are mutually independent}, \qquad H_A: X_1, \dots, X_d \text{ are dependent},
$$

which has many applications, including independent component analysis (Matteson and Tsay, 2017), graphical models (Fan et al., 2015), naive Bayes classifiers (Tibshirani et al., 2002), etc. This problem has been studied under different settings and assumptions, including pairwise () and mutual () independence, univariate () and multivariate () components, and more. Specifically, we focus on the general case that are not assumed jointly normal.

The most extensively studied case is pairwise independence with univariate components (): Rank correlation is considered as a non-parametric counterpart to Pearson’s product-moment correlation (Pearson, 1895), including Kendall’s $\tau$ (Kendall, 1938), Spearman’s $\rho$ (Spearman, 1904), etc. Bergsma and Dassios (2014) proposed a test based on an extension of Kendall’s $\tau$, testing an equivalent condition to $H_0$. Additionally, Hoeffding (1948) proposed a non-parametric test based on marginal and joint distribution functions, testing a necessary condition to investigate $H_0$.

For pairwise independence with multivariate components (): Székely et al. (2007), Székely and Rizzo (2009) proposed a test based on distance covariance with fixed and , testing an equivalent condition to . Further, Székely and Rizzo (2013a) proposed a -test based on a modified distance covariance for the setting in which is finite and , testing an equivalent condition to as well.

For mutual independence with univariate components (): One natural way to extend the pairwise rank correlation to multiple components is to collect the rank correlations between all pairs of components, and examine the norm () of this collection. Leung and Drton (2015) proposed a test based on the norm with , and , and Han and Liu (2014) proposed a test based on the norm with , and . Each tests a necessary condition to $H_0$, in general.

For mutual independence with multivariate components (): This challenging scenario has not been well studied. Yao et al. (2016) proposed a test based on distance covariance between all pairs of components with , testing a necessary condition to . Inspired by distance covariance in Székely et al. (2007), we propose a new test based on measures of mutual dependence with fixed and in this paper, testing an equivalent condition to . All computational complexities in this paper make no reference to the dimensions , as they are treated as constants.

Our measures of mutual dependence involve V-statistics, and are 0 if and only if mutual independence holds. They belong to the class of energy statistics (Székely and Rizzo, 2013b), and share many statistical properties with distance covariance. Besides, Pfister et al. (2016) proposed the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) under the same setting, which originates from HSIC (Gretton et al., 2005), and also is 0 if and only if mutual independence holds. Although dHSIC involves V-statistics as well, they pursue kernel methods and overcome the computational bottleneck by resampling and Gamma approximation, while we take advantage of characteristic functions and resort to incomplete V-statistics.

The weakness of testing mutual independence through a necessary condition, namely all pairwise independencies, motivates our work on measures of mutual dependence, as demonstrated by the examples in section 5. If we directly test mutual independence based on the measures of mutual dependence proposed in this paper, we successfully detect mutual dependence. Alternatively, if we check all pairwise independencies based on distance covariance, we fail to detect any pairwise dependence and mistakenly conclude that mutual independence holds, probably because the mutual effect averages out when we narrow down to a pair.

The rest of this paper is organized as follows. In section 2, we give a brief overview of distance covariance. In section 3, we generalize distance covariance to the complete measure of mutual dependence, with its properties and asymptotic distributions derived. In section 4, we propose asymmetric and symmetric measures of mutual dependence, defined as sums of squared distance covariances. We present synthetic and real data analysis in section 5, followed by simulation results in section 6. An accompanying R package EDMeasure (Jin et al., 2018) is available on CRAN. Finally, section 7 summarizes our work. All proofs have been moved to the appendix.

The following notations will be used throughout this paper. Let denote a concatenation of (vector) components into a vector. Let where , such that is the marginal dimension, , and is the total dimension. The assumed “” under is denoted by , where , , are mutually independent, and are independent. Let be independent copies of , i.e., , and be independent copies of , i.e., . Let the weighted norm of complex-valued function be defined by where , is the complex conjugate of , and is any positive weight function for which the integral exists.

Given the i.i.d. sample from , let denote the corresponding i.i.d. sample from , , such that . Denote the joint characteristic functions of and as and , and denote the empirical versions of and as and .

## 2 Distance Covariance

Székely et al. (2007) proposed distance covariance to capture non-linear and non-monotone pairwise dependence between two random vectors ().

are pairwise independent if and only if , , which is equivalent to , if the integral exists. A class of the weight functions make the integral a finite and meaningful quantity composed of -th moments according to Lemma 1 in Székely and Rizzo (2005), where , and is the gamma function.

The non-negative distance covariance is defined by , where

$$
w_0(t) = \left(K_{p_1} K_{p_2} |t_1|^{p_1+1} |t_2|^{p_2+1}\right)^{-1}, \tag{1}
$$

with and , while any following result can be generalized to . If , then , and if and only if are pairwise independent.

The non-negative empirical distance covariance is defined by . Calculating via the symmetry of Euclidean distances has the time complexity . Some asymptotic properties of are derived. If , then (i) . (ii) Under , where is a complex-valued Gaussian process with mean zero and covariance function . (iii) Under , .
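As a rough illustration of the quadratic-time computation mentioned above, the following sketch evaluates the squared sample distance covariance in its V-statistic form through double-centered Euclidean distance matrices, as in Székely et al. (2007). The function names (`pairwise_dist`, `double_center`, `dcov2`) are hypothetical and chosen only for this example; they are not taken from the EDMeasure package.

```python
import numpy as np

def pairwise_dist(a):
    """Euclidean distance matrix between the rows of an (n, p) array."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D input as n univariate observations
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form), O(n^2) time."""
    a = double_center(pairwise_dist(x))
    b = double_center(pairwise_dist(y))
    return float((a * b).mean())
```

Calling `dcov2(x, y)` on two samples with the same number of rows returns a non-negative number; it is zero when one of the samples is constant.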

## 3 Complete Measure of Mutual Dependence

Generalizing the idea of distance covariance, we propose the complete measure of mutual dependence to capture non-linear and non-monotone mutual dependence between multiple random vectors ().

are mutually independent if and only if , , which is equivalent to , if the integral exists. We put all components together instead of separating them, and choose the weight function

$$
w_1(t) = \left(K_p |t|^{p+1}\right)^{-1}. \tag{2}
$$

###### Definition 1.

The complete measure of mutual dependence is defined by

$$
Q(X) = \|\phi_X(t) - \phi_{\tilde{X}}(t)\|^2_{w_1} = \int_{\mathbb{R}^p} |\phi_X(t) - \phi_{\tilde{X}}(t)|^2 \, w_1(t) \, dt.
$$

We can show an equivalence to mutual independence based on $Q(X)$ according to Lemma 1 in Székely and Rizzo (2005).

###### Theorem 1.

If , then , and if and only if are mutually independent. In addition, has an interpretation as expectations

$$
Q(X) = E|X - \tilde{X}'| + E|X' - \tilde{X}| - E|X - X'| - E|\tilde{X} - \tilde{X}'|.
$$

It is straightforward to estimate $Q(X)$ by replacing the characteristic functions with the empirical characteristic functions from the sample.

###### Definition 2.

The empirical complete measure of mutual dependence is defined by

$$
Q_n(X) = \|\phi^n_X(t) - \phi^n_{\tilde{X}}(t)\|^2_{w_1} = \int_{\mathbb{R}^p} |\phi^n_X(t) - \phi^n_{\tilde{X}}(t)|^2 \, w_1(t) \, dt.
$$

###### Lemma 1.

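$Q_n(X)$ has an interpretation as complete V-statistics

$$
Q_n(X) = \frac{2}{n^{d+1}} \sum_{k,\ell_1,\dots,\ell_d=1}^{n} |X^k - (X_1^{\ell_1},\dots,X_d^{\ell_d})| - \frac{1}{n^2} \sum_{k,\ell=1}^{n} |X^k - X^\ell| - \frac{1}{n^{2d}} \sum_{k_1,\dots,k_d,\ell_1,\dots,\ell_d=1}^{n} |(X_1^{k_1},\dots,X_d^{k_d}) - (X_1^{\ell_1},\dots,X_d^{\ell_d})|,
$$

whose naive implementation has the time complexity $O(n^{2d})$.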
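To make the cost of the complete V-statistic concrete, here is a minimal brute-force sketch that enumerates all $n^d$ recombinations of the component samples. It uses the signs of the expectation form in Theorem 1 (one positive cross term, two negative within terms). The names `mean_dist` and `complete_qn` are hypothetical, and the code is only practical for very small $n$ and $d$.

```python
import itertools
import numpy as np

def mean_dist(a, b):
    """Mean Euclidean distance over all pairs of rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

def complete_qn(components):
    """Naive complete V-statistic; `components` is a list of d arrays with n rows each."""
    comps = [np.asarray(c, dtype=float).reshape(len(c), -1) for c in components]
    n = comps[0].shape[0]
    joint = np.hstack(comps)  # observed sample X^1, ..., X^n
    # All n^d recombinations (X_1^{l_1}, ..., X_d^{l_d}) of the component samples.
    recombined = np.array([
        np.concatenate([c[i] for c, i in zip(comps, idx)])
        for idx in itertools.product(range(n), repeat=len(comps))
    ])
    return (2 * mean_dist(joint, recombined)
            - mean_dist(joint, joint)
            - mean_dist(recombined, recombined))
```

When the sample is itself a full product grid of its component values, the empirical joint distribution coincides with the product of the empirical marginals and the statistic vanishes up to rounding.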

In view of the definition of distance covariance, it may seem natural to define the measure using the weight function

$$
w_2(t) = \left(K_{p_1} \cdots K_{p_d} |t_1|^{p_1+1} \cdots |t_d|^{p_d+1}\right)^{-1}, \tag{3}
$$

which equals when . Given the weight function , we can define the squared distance covariance of mutual dependence and its empirical counterpart , which equal and when . The naive implementation of has the time complexity .

The reason to favor instead of is a trade-off between the moment condition and time complexity. We often cannot afford the time complexity of or , and have to simplify them through incomplete V-statistics. An incomplete V-statistic is obtained by sampling the terms of a complete V-statistic, where the summation extends over only a subset of the tuple of indices. To simplify by replacing complete V-statistics with incomplete V-statistics, requires the additional -th moment condition , while does not require any other condition in addition to the first moment condition . Thus, we can reduce the complexity of to with a weaker condition, which makes and from a more general solution. Moreover, we define the simplified empirical version of as

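$$
\phi^{n\star}_{\tilde{X}}(t) = \frac{1}{n} \sum_{k=1}^{n} e^{i \sum_{j=1}^{d} \langle t_j, X_j^{k+j-1} \rangle} = \frac{1}{n} \sum_{k=1}^{n} e^{i \langle t, (X_1^k, \dots, X_d^{k+d-1}) \rangle},
$$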

in order to substitute for simplification, where is interpreted as for .

###### Definition 3.

The simplified empirical complete measure of mutual dependence is defined by

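$$
Q^\star_n(X) = \|\phi^n_X(t) - \phi^{n\star}_{\tilde{X}}(t)\|^2_{w_1} = \int_{\mathbb{R}^p} |\phi^n_X(t) - \phi^{n\star}_{\tilde{X}}(t)|^2 \, w_1(t) \, dt.
$$

###### Lemma 2.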

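$Q^\star_n(X)$ has an interpretation as incomplete V-statistics

$$
Q^\star_n(X) = \frac{2}{n^2} \sum_{k,\ell=1}^{n} |X^k - (X_1^{\ell},\dots,X_d^{\ell+d-1})| - \frac{1}{n^2} \sum_{k,\ell=1}^{n} |X^k - X^\ell| - \frac{1}{n^2} \sum_{k,\ell=1}^{n} |(X_1^{k},\dots,X_d^{k+d-1}) - (X_1^{\ell},\dots,X_d^{\ell+d-1})|,
$$

whose naive implementation has the time complexity $O(n^2)$.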
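The incomplete V-statistic can be sketched in a few lines by building a single shifted sample in which the $j$-th component is taken from observation $k+j-1$. The sketch below assumes that indices beyond $n$ wrap around to the start of the sample, and it uses the signs of the expectation form in Theorem 1. The names `mean_dist` and `simplified_qn` are hypothetical.

```python
import numpy as np

def mean_dist(a, b):
    """Mean Euclidean distance over all pairs of rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

def simplified_qn(components):
    """Incomplete V-statistic based on one shifted recombination, O(n^2) time."""
    comps = [np.asarray(c, dtype=float).reshape(len(c), -1) for c in components]
    joint = np.hstack(comps)  # observed sample X^1, ..., X^n
    # Row k of the shifted sample is (X_1^k, X_2^{k+1}, ..., X_d^{k+d-1}),
    # with indices wrapped modulo n (assumed convention).
    shifted = np.hstack([np.roll(c, -j, axis=0) for j, c in enumerate(comps)])
    return (2 * mean_dist(joint, shifted)
            - mean_dist(joint, joint)
            - mean_dist(shifted, shifted))
```

Only $n$ recombined observations are used instead of $n^d$, which is the source of the reduction in cost; the price is that the statistic depends on the ordering of the sample.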

Using a similar derivation to Theorems 2 and 5 of Székely et al. (2007), some asymptotic properties of $Q_n(X)$ and $Q^\star_n(X)$ are obtained as follows.

###### Theorem 2.

If , then

$$
Q_n(X) \xrightarrow[n \to \infty]{a.s.} Q(X) \quad \text{and} \quad Q^\star_n(X) \xrightarrow[n \to \infty]{a.s.} Q(X).
$$

###### Theorem 3.

If , then under , we have

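$$
n Q_n(X) \xrightarrow[n \to \infty]{\mathcal{D}} \|\zeta(t)\|^2_{w_1} \quad \text{and} \quad n Q^\star_n(X) \xrightarrow[n \to \infty]{\mathcal{D}} \|\zeta^\star(t)\|^2_{w_1},
$$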

where are complex-valued Gaussian processes with mean zero and covariance functions

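$$
\begin{aligned}
R(t, t^0) &= \prod_{j=1}^{d} \phi_{X_j}(t_j - t^0_j) + (d-1) \prod_{j=1}^{d} \phi_{X_j}(t_j) \overline{\phi_{X_j}(t^0_j)} - \sum_{j=1}^{d} \phi_{X_j}(t_j - t^0_j) \prod_{\ell \neq j} \phi_{X_\ell}(t_\ell) \overline{\phi_{X_\ell}(t^0_\ell)}, \\
R^\star(t, t^0) &= 2 R(t, t^0).
\end{aligned}
$$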

Under , we have

$$
n Q_n(X) \xrightarrow[n \to \infty]{a.s.} \infty \quad \text{and} \quad n Q^\star_n(X) \xrightarrow[n \to \infty]{a.s.} \infty.
$$

Therefore, a mutual independence test can be proposed based on the weak convergence of in Theorem 3. Since the asymptotic distributions of depend on , a permutation procedure is used to approximate them in practice.
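A permutation procedure of this kind can be sketched generically. The version below is an illustration under stated assumptions rather than the procedure used in the paper: it keeps the first component fixed, permutes the rows of every other component independently to mimic the null hypothesis, and uses the common $(1 + \text{count}) / (1 + B)$ p-value convention. The function name `permutation_pvalue` and its parameters are hypothetical; `statistic` can be any function that maps a list of component samples to a number, with large values indicating dependence.

```python
import numpy as np

def permutation_pvalue(statistic, components, n_perm=199, seed=0):
    """Approximate p-value of a dependence statistic by permuting component samples."""
    rng = np.random.default_rng(seed)
    comps = [np.asarray(c, dtype=float) for c in components]
    observed = statistic(comps)
    exceed = 0
    for _ in range(n_perm):
        # Permuting rows of components 2..d independently breaks any dependence
        # between components while preserving each marginal sample.
        permuted = [comps[0]] + [c[rng.permutation(len(c))] for c in comps[1:]]
        if statistic(permuted) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)
```

The smallest attainable p-value is $1 / (1 + B)$, so the number of permutations $B$ bounds the resolution of the test.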

## 4 Asymmetric and Symmetric Measures of Mutual Dependence

As an alternative, we now propose the asymmetric and symmetric measures of mutual dependence to capture mutual dependence via aggregating pairwise dependencies.

The subset of components on the right of is denoted by , with , . The subset of components except is denoted by , with , .

We denote pairwise independence by . The collection of pairwise independencies implied by mutual independence includes “one versus others on the right”

$$
\{X_1 \perp\!\!\!\perp X_{1^+},\; X_2 \perp\!\!\!\perp X_{2^+},\; \dots,\; X_{d-1} \perp\!\!\!\perp X_d\}, \tag{4}
$$

“one versus all the others”

$$
\{X_1 \perp\!\!\!\perp X_{-1},\; X_2 \perp\!\!\!\perp X_{-2},\; \dots,\; X_d \perp\!\!\!\perp X_{-d}\}, \tag{5}
$$

and many others, e.g., . In fact, the number of pairwise independencies resulting from mutual independence is at least , which grows exponentially with the number of components . Therefore, we cannot test mutual independence simply by checking all pairwise independencies even with moderate .

Fortunately, we have two options to test only a small subset of all pairwise independencies to fulfill the task. The first one is that holds if and only if (4) holds, which can be verified via the sequential decomposition of distribution functions. This option is asymmetric and not unique, having feasible subsets with respect to different orders of . The second one is that holds if and only if (5) holds, which can be verified via the stepwise decomposition of distribution functions and the fact that implies . This option is symmetric and unique, having only one feasible subset.

To shed light on why these two options are necessary and sufficient conditions to mutual independence, we present the following inequality that the mutual dependence can be bounded by a sum of several pairwise dependencies as

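$$
\left| \phi_X(t) - \prod_{j=1}^{d} \phi_{X_j}(t_j) \right| \le \sum_{c=1}^{d-1} \left| \phi_{(X_c, X_{c^+})}((t_c, t_{c^+})) - \phi_{X_c}(t_c) \, \phi_{X_{c^+}}(t_{c^+}) \right|.
$$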
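One way to see a bound of this type, written here with first-power moduli, is a telescoping argument. The sketch assumes that $X_{c^+} = (X_{c+1}, \dots, X_d)$, so that $(X_1, X_{1^+}) = X$ and $X_{(d-1)^+} = X_d$, and it uses only the fact that every characteristic function has modulus at most one.

$$
\phi_X(t) - \prod_{j=1}^{d} \phi_{X_j}(t_j) = \sum_{c=1}^{d-1} \Big( \prod_{j<c} \phi_{X_j}(t_j) \Big) \Big[ \phi_{(X_c, X_{c^+})}((t_c, t_{c^+})) - \phi_{X_c}(t_c) \, \phi_{X_{c^+}}(t_{c^+}) \Big].
$$

The identity holds because the $c$-th summand equals $\prod_{j<c} \phi_{X_j}(t_j) \, \phi_{(X_c, X_{c^+})} - \prod_{j \le c} \phi_{X_j}(t_j) \, \phi_{X_{c^+}}$ and $\phi_{X_{c^+}} = \phi_{(X_{c+1}, X_{(c+1)^+})}$, so consecutive terms cancel. Applying the triangle inequality and $|\prod_{j<c} \phi_{X_j}(t_j)| \le 1$ gives

$$
\left| \phi_X(t) - \prod_{j=1}^{d} \phi_{X_j}(t_j) \right| \le \sum_{c=1}^{d-1} \left| \phi_{(X_c, X_{c^+})}((t_c, t_{c^+})) - \phi_{X_c}(t_c) \, \phi_{X_{c^+}}(t_{c^+}) \right|.
$$

In particular, if every pairwise independence in (4) holds, each summand on the right is zero and the joint characteristic function factorizes.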

In consideration of these two options, we test a set of pairwise independencies in place of mutual independence, where we use to test pairwise independence.

###### Definition 4.

The asymmetric and symmetric measures of mutual dependence are defined by

$$
R(X) = \sum_{c=1}^{d-1} V^2((X_c, X_{c^+})) \quad \text{and} \quad S(X) = \sum_{c=1}^{d} V^2((X_c, X_{-c})).
$$

We can show an equivalence to mutual independence based on according to Theorem 3 of Székely et al. (2007).

###### Theorem 4.

If , then , and if and only if are mutually independent.

It is straightforward to estimate by replacing the characteristic functions with the empirical characteristic functions from the sample.

###### Definition 5.

The empirical asymmetric and symmetric measures of mutual dependence are defined by

$$
R_n(X) = \sum_{c=1}^{d-1} V^2_n((X_c, X_{c^+})) \quad \text{and} \quad S_n(X) = \sum_{c=1}^{d} V^2_n((X_c, X_{-c})).
$$

The implementations of $R_n(X)$ and $S_n(X)$ have the time complexity $O(n^2)$. Using a similar derivation to Theorems 2 and 5 of Székely et al. (2007), some asymptotic properties of $R_n(X)$ and $S_n(X)$ are obtained as follows.
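The two empirical measures in Definition 5 can be sketched as sums of squared sample distance covariances, each computed in quadratic time through double-centered distance matrices. The helper and function names below (`dcov2`, `asymmetric_rn`, `symmetric_sn`) are hypothetical, and the code assumes at least two components, each given as an array with $n$ rows.

```python
import numpy as np

def _as_2d(a):
    """View a sample as an (n, p) float array."""
    a = np.asarray(a, dtype=float)
    return a.reshape(len(a), -1)

def _centered_dist(a):
    """Double-centered Euclidean distance matrix of the rows of a."""
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    return float((_centered_dist(_as_2d(x)) * _centered_dist(_as_2d(y))).mean())

def asymmetric_rn(components):
    """Sum over c of dcov2 between X_c and the components to its right."""
    comps = [_as_2d(c) for c in components]
    return sum(dcov2(comps[c], np.hstack(comps[c + 1:]))
               for c in range(len(comps) - 1))

def symmetric_sn(components):
    """Sum over c of dcov2 between X_c and all remaining components."""
    comps = [_as_2d(c) for c in components]
    return sum(dcov2(comps[c], np.hstack(comps[:c] + comps[c + 1:]))
               for c in range(len(comps)))
```

For two components the symmetric measure is simply twice the asymmetric one, since both of its terms compare the same pair.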

###### Theorem 5.

If , then

$$
R_n(X) \xrightarrow[n \to \infty]{a.s.} R(X) \quad \text{and} \quad S_n(X) \xrightarrow[n \to \infty]{a.s.} S(X).
$$

###### Theorem 6.

If , then under , we have

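$$
n R_n(X) \xrightarrow[n \to \infty]{\mathcal{D}} \sum_{j=1}^{d-1} \|\zeta^R_j((t_j, t_{j^+}))\|^2_{w_0} \quad \text{and} \quad n S_n(X) \xrightarrow[n \to \infty]{\mathcal{D}} \sum_{j=1}^{d} \|\zeta^S_j((t_j, t_{-j}))\|^2_{w_0},
$$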

where are complex-valued Gaussian processes corresponding to the limiting distributions of . Under , we have

$$
n R_n(X) \xrightarrow[n \to \infty]{a.s.} \infty \quad \text{and} \quad n S_n(X) \xrightarrow[n \to \infty]{a.s.} \infty.
$$

It is surprising to find that are mutually independent asymptotically, and are mutually independent asymptotically as well, which is a crucial discovery behind Theorem 6.

Alternatively, we can plug in instead of in Definition 4 and instead of in Definition 5, and define the asymmetric and symmetric measures accordingly, which equal when . The naive implementations of have the time complexity . Similarly, we can replace with to simplify them, and define the simplified empirical asymmetric and symmetric measures , reducing their complexities to without any other condition except the first moment condition . Through the same derivations, we can show that , have similar convergences as in Theorems 5 and 6.

## 5 Illustrative Examples

We start with two examples comparing different methods to show the value of our mutual independence tests. In practice, people usually check all pairwise dependencies to test mutual independence, due to the lack of reliable and universal mutual independence tests. This practice is likely to miss a complicated mutual dependence structure, and to lead to unsound decisions in applications that assume mutual independence holds.

### 5.1 Synthetic Data

We define a triplet of random vectors on , where , , the first element of is and the remaining elements are , and are mutually independent. Clearly, is a pairwise independent but mutually dependent triplet.
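The phenomenon of pairwise independence without mutual independence can be checked exactly on a much simpler toy triplet. The sketch below is a stand-in for illustration only and is not the construction used in this section: it assumes $X$ and $Y$ are independent fair signs and sets $Z = XY$, then compares joint and product probability mass functions by exhaustive enumeration. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Toy triplet (assumed for illustration): X, Y independent fair signs, Z = X * Y.
outcomes = [(x, y, x * y) for x, y in product((-1, 1), repeat=2)]
prob = Fraction(1, len(outcomes))  # each (x, y) pair is equally likely

def pmf(indices):
    """Joint pmf of the selected coordinates, as {value tuple: probability}."""
    table = {}
    for w in outcomes:
        key = tuple(w[i] for i in indices)
        table[key] = table.get(key, 0) + prob
    return table

def is_independent(indices):
    """True if the joint pmf of the selected coordinates equals the product of marginals."""
    joint = pmf(indices)
    margins = [pmf((i,)) for i in indices]
    for values in product((-1, 1), repeat=len(indices)):
        expected = Fraction(1)
        for m, v in zip(margins, values):
            expected *= m.get((v,), Fraction(0))
        if joint.get(values, Fraction(0)) != expected:
            return False
    return True

pairwise = [is_independent(p) for p in ((0, 1), (0, 2), (1, 2))]
mutual = is_independent((0, 1, 2))
```

Every pair passes the factorization check while the full triplet does not, which is exactly the situation in which pairwise tests are blind to the dependence.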

An i.i.d. sample of is randomly generated with sample size and dimension . On the one hand, we test the null hypothesis are mutually independent using proposed measures . On the other hand, we test the null hypotheses , , and using distance covariance . An adaptive permutation size is used for all tests.

As expected, mutual dependence is successfully captured, as the p-values of mutual independence tests are 0.0143 (), 0.0286 (), 0 (), 0.0381 () and 0 (). Meanwhile, the p-values of pairwise independence tests are 0.2905 (), 0.2619 (), and 0.3048 (). According to the Bonferroni correction for multiple tests among all the pairs, the significance level should be adjusted as for pairwise tests. As a result, no signal of pairwise dependence is detected, and we cannot reject mutual independence.
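The Bonferroni comparison for the three pairwise tests amounts to a short calculation. In the sketch below, the p-values are those reported above, while the nominal level `alpha = 0.05` is an assumption made for illustration.

```python
# Pairwise p-values reported in the text for the three distance covariance tests.
pairwise_pvalues = [0.2905, 0.2619, 0.3048]

alpha = 0.05  # assumed nominal significance level
adjusted_level = alpha / len(pairwise_pvalues)  # Bonferroni: divide by the number of tests

# A pairwise null hypothesis is rejected only if its p-value is at most the adjusted level.
rejected = [p <= adjusted_level for p in pairwise_pvalues]
any_pairwise_signal = any(rejected)
```

Under this assumed level none of the three pairwise hypotheses is rejected, which matches the conclusion stated above.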

### 5.2 Financial Data

We collect the annual Fama/French 5 factors in the past 52 years between 1964 and 2015. In particular, we are interested in whether mutual dependence among three factors, Mkt-RF (excess return on the market), SMB (small minus big), and RF (risk-free return) exists, where annual returns are considered as nearly independent observations. Both histograms and pair plots of are depicted in Figure 1.

For one, we apply a single mutual independence test are mutually independent. For another, we apply three pairwise independence tests , , and . An adaptive permutation size is used for all tests.

The p-values of mutual independence tests are 0.0236 (), 0.0642 (), 0.0541 (), 0.1588 () and 0.1486 (), indicating that mutual dependence is successfully captured. Meanwhile, the p-values of pairwise independence tests using distance covariance are 0.1419 (), 0.5743 () and 0.5405 (). Similarly, the significance level should be adjusted as according to the Bonferroni correction, and thus we cannot reject mutual independence, since no signal of pairwise dependence is detected.

## 6 Simulation Studies

In this section, we evaluate the finite sample performance of proposed measures , by performing simulations similar to Székely et al. (2007), and compare them to benchmark measures (Székely et al., 2007) and (Han and Liu, 2014). We also include permutation tests based on finite-sample extensions of , denoted by .

We test the null hypothesis with significance level and examine the empirical size and power of each measure. In each scenario, we run 1,000 repetitions with the adaptive permutation size where is the sample size, for all empirical measures that require a permutation procedure to approximate their asymptotic distributions, i.e., .

In the following two examples, we fix and change from 25 to 500, and compare , to .

###### Example 1 (pairwise multivariate normal).

, where . Under , , . Under , , . See results in Table 2 and 2.

###### Example 2 (pairwise multivariate non-normal).

, where . . Under , , . Under , , . See results in Table 4 and 4.

For both Examples 1 and 2, the empirical size of all measures is close to . The empirical power of is almost the same as that of , while the empirical power of is lower than that of , which makes sense because the simplified measures trade testing power for lower time complexity.

In the following two examples, we fix and change from 25 to 500, and compare to .

###### Example 3 (mutual multivariate normal).

, where . Under , , . Under , , . See results in Table 6 and 6.

###### Example 4 (mutual multivariate non-normal).

. where . , . Under , , . Under , , . See results in Table 8 and 8.

For both Examples 3 and 4, the empirical size of all measures is close to . The empirical power of is almost the same, the empirical power of is almost the same, while the empirical power of is lower than that of , , which makes sense since the simplified measures trade testing power for lower time complexity.

In the last example, we change from 5 to 50 and fix , and compare , to .

###### Example 5 (mutual univariate normal high-dimensional).

. where . Under , , . Under