Distribution-Free GOF Testing for Copulas

Asymptotically Distribution-Free Goodness-of-Fit Testing for Copulas

Abstract.

Consider a random sample from a continuous multivariate distribution function with copula . In order to test the null hypothesis that belongs to a certain parametric family, we construct an under asymptotically distribution-free process that serves as a tests generator. The process is a transformation of the difference of a semi-parametric and a parametric estimator of . This transformed empirical process converges weakly to a standard multivariate Wiener process, paving the way for a multitude of powerful asymptotically distribution-free goodness-of-fit tests for copula families. We investigate the finite-sample performance of our approach through a simulation study and illustrate its applicability with a data analysis.

1. Introduction

Consider a -variate () distribution function (df) with continuous margins . By the representation theorem of Sklar (1959), there is a unique df on the unit hypercube with uniform margins such that

(1)

In fact, if is a random vector with joint df , then it is easily seen that the unique df satisfying (1) is the joint df of the component-wise probability integral transforms , given by

(2)

with denoting the left-continuous quantile function associated with , i.e. , for .

The df satisfying (1) or (2) is called the copula associated with , and it is a representation of the dependence structure between the margins of , since contains no information about the margins, yet together with the margins it characterizes . Thus copulas allow separate modeling of margins and dependence structure in multivariate settings, which has proved to be a useful approach in a wide range of applied fields, from medicine and climate research to finance and insurance. We refer to recent comprehensive monographs such as Nelsen (2006), Joe (2015) and Durante and Sempi (2016) for more background on copula theory and its various applications.

The present work is concerned with goodness-of-fit (GOF) testing for copulas. More specifically, we assume that an i.i.d. sample is observed from an unknown -variate df with continuous margins and copula as above. We are interested in testing the hypothesis against the alternative , where denotes a parametric family of copulas, indexed by a finite-dimensional parameter . There is a rich cornucopia of parametric copula families used in various applications (see, e.g., Ch. 4 of Joe (2015) or Ch. 6 of Durante and Sempi (2016) for extensive lists), and new ones are introduced regularly in the literature, so the testing problem just described is clearly very relevant for practitioners making use of copula modeling in their work.

GOF testing for copulas is not a new problem, and several approaches have been proposed in the literature since the early 1990s, each with their advantages and limitations in specific situations. A partial list includes Genest and Rivest (1993), Shih (1998), Wang and Wells (2000), Breymann et al. (2003), Fermanian (2005), Genest et al. (2006) and Dobrić and Schmid (2007). There seems to be no single approach that is universally preferred over others. For a broad overview and comparison of various GOF testing procedures for copulas, we refer to Berg (2009), Genest et al. (2009) and Fermanian (2013). As pointed out in the latter papers, a common problem with many GOF approaches is that the asymptotic distribution of the test statistic under the null hypothesis depends on the particular family that is tested for, as well as on the unknown true value of the parameter . In other words, many of the proposed GOF tests in the literature are not asymptotically distribution-free. As a result, the asymptotic distribution of the test statistics under the null hypothesis cannot be tabulated for universal use, and approximate -values have to be computed for each model via, e.g., specialized bootstrap procedures such as the ones outlined in Genest et al. (2009), App. A-D.

We develop an approach to construct asymptotically distribution-free GOF tests for any parametric copula family satisfying some rather mild smoothness assumptions. We do not propose a particular test statistic, but instead construct a whole test process on the unit hypercube , which converges weakly to a standard -variate Wiener process under the null hypothesis. Thus GOF tests can be conducted by comparing the observed path of this test process with the statistical behavior of a standard Wiener process. Various functionals of the test process can be used for this comparison, such as the absolute maximum over or integral functionals. Since the weak limit of the test process is a standard process independent of the family or the true value of , the limiting distributions of these functionals will also be independent of and , and they only need to be tabulated once for use in all testing problems. Our results can also be used to test the GOF of fully specified copula models rather than parametric families.

Our approach relies on parametric estimation of the marginal distributions . That is, we assume that there is a parametric family of univariate dfs such that for . In fact, this requirement can be relaxed to for , where the parametric families may be different from each other, but we will stick with the stronger requirement for notational simplicity.

The remainder of this paper is structured as follows. In Section 2, we introduce two estimators for the copula , a parametric one that works under the null hypothesis and a semi-parametric one that works in general. We consider the normalized difference between these estimators and determine its weak limit under the null hypothesis, as the sample size tends to infinity. We will see that the distribution of depends on the family , as well as on the true values of the parameter and the marginal parameters, so cannot be used as a basis for distribution-free testing. The crucial step in our approach is introduced in Section 3, where we describe a transformation that turns into a standard Wiener process on . In Section 4, we apply an empirical version of this transformation to , and show that the resulting process converges weakly to a standard Wiener process on under the null hypothesis. This is our main result, and the transformed process is the test process that was alluded to above. In Section 5, we present some simulation results that demonstrate the finite-sample behavior of some functionals of under the null and alternative hypotheses, and we then apply our approach to a real-world data set and analyze the results. Section 6 contains all the proofs.

2. Comparing two copula estimators

As in the Introduction, we assume that are i.i.d. random vectors with common df , which has continuous marginal dfs and copula . We further assume throughout that the marginal dfs are members of some parametric family of univariate dfs, , indexed by , where is some open subset of . This means that there exist such that for . Our ultimate aim is to test the hypothesis , where the parameter runs through some open subset of . Throughout Sections 2-4, we will assume that the null hypothesis holds, i.e. there is a such that .

There are various ways of estimating the copula from i.i.d. data, depending on the assumptions one is willing to make about the underlying model, as well as the requirements one chooses to impose on the estimator, such as smoothness. Perhaps the most straightforward and well-known copula estimator is the non-parametric empirical copula discussed in Ruymgaart (1973), Rüschendorf (1976), Deheuvels (1979, 1981), Gaenssler and Stute (1987), Fermanian et al. (2004) and Segers (2012), among others. Other commonly used approaches for copula estimation include two-step methods where the first step involves (non-parametric or parametric) estimation of the margins and the second step estimates the copula parametrically from marginal data transformed in accordance with the first step. Such estimators are studied in, e.g., Genest et al. (1995) and Shih and Louis (1995). A broad overview of various copula estimation methods can be found in Charpentier et al. (2007), Choroś et al. (2010) and Ch. 5 of Joe (2015).

In this paper, we will make use of two estimators for : a parametric estimator and a semi-parametric estimator . We do not specify the estimator , but require it to satisfy a rather non-restrictive convergence assumption to be stated below. The semi-parametric estimator is defined as

where denote appropriate estimators for , denotes the quantile function associated with , and denotes the -variate empirical df generated by the sample :

Here, is short-hand notation for “ for all ”. Note that in view of the representation (2) of the copula and our parametric assumption on the marginal dfs of , the estimator is a natural one. To our knowledge, its asymptotic behavior has not been studied in the existing literature.

Under the null hypothesis, both and estimate the true copula , while only correctly estimates when the null hypothesis does not hold. Thus the asymptotic discrepancy between the two estimators provides a natural starting point for a GOF test. With that in mind, we define

(3)

Our first result will be a theorem describing the asymptotic behavior of , but first we establish some notation and state the necessary assumptions about the various estimators introduced above, as well as about the parametric families and .

Let denote the empirical df generated by the (unobserved) copula sample , . That is,

Note that we can then write

We also define

so that is the classical empirical process associated with the df . The asymptotic behavior of is well-known, see e.g. Neuhaus (1971): we have in the Skorohod space , where “” denotes weak convergence and is a -Brownian bridge, that is, a mean-zero Gaussian process on with covariance structure

Here, .

The assumptions needed for our first result are listed below.

A1. There exist a -variate random vector and -variate random vectors such that

(4)

in .

A2. The mappings

and

are continuous on .

A3. The mapping

is continuous on , the mapping is bounded on compact subsets of , and the mapping is continuous on .

Theorem 2.1.

Let be the process defined in (3), and let . Under Assumptions A1-A3,

(5)

in , where and are short-hand notation for and .

Remark 2.2.

Heuristically, the limiting process consists of a -Brownian bridge plus additive terms “contributed by” the estimation of the -dimensional parameters , plus additive terms “contributed by” the estimation of the -dimensional parameter . Since the distribution of clearly depends on the underlying family , as well as the (unknown) true values of and , Theorem 2.1 is far from suitable for distribution-free testing.

Remark 2.3.

The convergence in (5) does not necessarily hold in under the stated assumptions. For example, in the case , consider the parametric family , with

Note that is a beta distribution with shape parameters fixed at 1 and , and a free scale parameter . Also note that satisfies Assumption A3. However, for any , the expression

is unbounded near and hence the process is in general not well-defined on the closed hypercube , so the convergence in (5) cannot hold in .

3. Transforming into a standard Wiener process

As we observed in the previous section, the empirical process cannot directly be used as a basis for distribution-free testing, since its limiting process depends on the underlying family and unknown parameter values. We will remedy this problem by transforming into a standard -variate Wiener process. The transformation itself will depend on and parameter values, but the distribution of the resulting process will not, which will facilitate asymptotically distribution-free testing. We first introduce some notation and assumptions.

Recall that

with a -Wiener process on , i.e. a mean-zero Gaussian process with covariance . We can thus alternatively express the limiting process in (5) as

(6)

for . Hence we see that is of the form

(7)

where the are some random variables, and the are deterministic functions on defined by

(8)

We note that (7) is analogous to the bivariate form (23) in Can et al. (2015). In that paper, a transformation of such processes into a standard bivariate Wiener process was described, which was an application of the “innovation martingale transform” idea developed in Khmaladze (1981, 1988, 1993). This idea has been applied to various statistical problems in the literature over the last couple of decades; see, for example, McKeague et al. (1995), Nikabadze and Stute (1997), Stute et al. (1998), Koenker and Xiao (2002, 2006), Khmaladze and Koul (2004, 2009), Delgado et al. (2005) and Dette and Hetzler (2009). We will apply a suitable innovation martingale transform to construct a standard -variate Wiener process on from the process in (7).

We first state the necessary assumptions and establish some notation.

A4. For each , the copula has a strictly positive density on , and the mappings

and

are continuous on .

Now, with the functions as defined in (3), let us denote

(9)

so that

Let denote the column vector consisting of . We will also write for the vector with true parameter values , replaced by arbitrary values .

Finally, given , let

and introduce matrices

(10)

We also define

(11)

Note that in the nomenclature of likelihood theory, is the vector of score functions for the underlying copula model (extended by the constant function 1 in the first component), and is a partial Fisher information matrix constructed from these functions.

Our next assumption is:

A5. The matrices in (11) are well-defined and invertible for all , , .

Given , let denote the point , and given points , let denote the hyperrectangle . Also, given , let denote the point .

We are now ready to state our transformation result.

Theorem 3.1.

Let be the limiting process appearing in Theorem 2.1 and let . If Assumptions A4-A5, restricted to the true parameter values , , hold, then the process

(12)

is a standard Wiener process on .

Remark 3.2.

The transformation in (3.1) “annihilates” the terms following in (7) to produce a -Wiener process on , which is then normalized and scaled to the entire hypercube , so that the end result is indeed a standard Wiener process on . In the next section, we will describe how the transformation of Theorem 3.1 facilitates asymptotically distribution-free testing for .

4. Goodness-of-fit testing

Recall the empirical process defined in (3). In Theorem 2.1 we have derived its weak limit as , and in Theorem 3.1 we have described a transformation that turns into a standard Wiener process on . In this section, we will apply the same transformation (or rather, its empirical version, with unknown parameters replaced by estimators) to , and we will show that the resulting process converges weakly to a standard Wiener process. This is the main result of this paper.

Applying transformation (3.1) to , with unknown parameters replaced by estimators, we obtain the following empirical process on :

(13)

Here, and are short-hand notations for and , respectively.

Before stating the convergence result on , we introduce some further notation and assumptions. Given a hyperrectangle and a function , let denote the total variation of on in the sense of Vitali; see e.g. Owen (2005), Sec. 4 for a definition. Also, given and , let denote the cardinality of , and let denote the point in obtained by discarding all coordinates of for . Moreover, given disjoint subsets with , let denote the function on obtained by fixing the argument of at for and at for .

We consider an alternative concept of “total variation” on , as follows:

(14)

with denoting a disjoint union. In other words, sums the Vitali variations over the hyperrectangle and over all of its “faces” where the coordinate is fixed at or , for at least one . Note that is a variant of the so-called Hardy-Krause variation for multivariate functions; cf. Owen (2005), Def. 2. For and functions , we will write instead of , for brevity.

Let us also denote

Similarly, we will denote for , with the as defined in (9). We introduce the final assumption needed for our convergence result:

A6. For any , we have and . Also, and for .

Theorem 4.1.

Let . Under Assumptions A1-A6, the process in (4) converges weakly to a standard Wiener process in .

Remark 4.2.

Theorem 4.1 is analogous to Theorem 2.1 in that it describes the asymptotic behavior of an empirical process constructed from the data . However, unlike the process , the asymptotic behavior of is distribution-free: it converges to a standard Wiener process. Thus a test for the null hypothesis can now be performed by assessing how the observed path of compares to the “usual” statistical behavior of a standard Wiener process. Since this comparison can be done through many different functionals of , we can construct a multitude of asymptotically distribution-free tests. In the following section, we demonstrate through simulations and a real-world data analysis how such tests can be conducted.

Remark 4.3.

The statement and proof of Theorem 4.1 also applies, with obvious modifications, in the case , where denotes a fully specified copula. Thus the test process can also be used for testing null hypotheses of the form . This will also be demonstrated in the simulations of the following section.

5. Simulations and data analysis

In this section we present the results of a simulation study and a data analysis in order to illustrate the applicability of our approach in finite samples. All computations are performed in R. The code to implement the simulations and the data analysis is available from the authors upon request.

5.1. Simulation study

We consider two widely used parametric copula models, namely Clayton and Gumbel, and we demonstrate how one might test for the goodness-of-fit of these models, both in parametric and fully specified form, using our approach. We limit the simulations to the bivariate case, and we perform tests on simulated data both under the null and alternative hypotheses.

To be more specific, we consider the following copula models:

Clayton:

Clayton(1): ;

Gumbel:

Gumbel(2):

To test for Clayton and Clayton(1) models, we generate 500 samples of size from the bivariate distribution with Exponential(1) margins and Clayton(1) copula. To test for Gumbel and Gumbel(2) models, we generate 500 samples of size from the bivariate distribution with Lomax(3,1) margins and Gumbel(2) copula. Recall that for and , the Lomax(, ) distribution has the cdf

so it is a shifted Pareto distribution with tail parameter and scale parameter .

From each simulated sample, we compute the test process in (4) on a grid spanned over . Parameter estimates are computed through maximum likelihood (ML) estimation. For the parametric Clayton and Gumbel models, ML estimates are computed for the entire bivariate distribution (Exponential margins + Clayton copula in the first case, Lomax margins + Gumbel copula in the second), while for the Clayton(1) and Gumbel(2) models, ML estimates are computed separately for the two marginal parameter sets (Exponential margins in the first case, Lomax margins in the second). Note that Assumption A1 holds for these estimators, which follows from arguments similar to those for asymptotic normality of MLEs. Assumptions A2-A6 are smoothness assumptions that are straightforward to verify for the considered models.

To compare the observed paths of to a standard Wiener process, two functionals are computed from each path of , namely:

where denotes the mesh length of the grid , i.e. 1/200. To create benchmark distribution tables for these statistics, we also simulate 10,000 true standard Wiener process paths on the grid and compute the same functionals for each path. We denote these functionals by and .

Figure 5.1. PP-plots for the Kolmogorov-Smirnov (top) and Cramér-von Mises (bottom) type test statistics

For each model, we construct PP-plots to compare the empirical distributions of and with the theoretical distributions of and (as inferred from the 10,000 simulated Wiener process paths). The results are shown in Fig. 5.1 below. We observe a good match of empirical and limiting distributions for both statistics, especially in the upper right corners of the plots, which are important for testing. These results suggest that Theorem 4.1 yields good finite-sample approximations. This is also confirmed by the observed fractions of Type I errors at 5% and 1% significance levels, given in Table 5.1. Note that the rejection counts are consistent with draws from a Binomial or a Binomial distribution, respectively.

We emphasize here that due to the distribution-free nature of our approach, the critical values of the test statistics need to be computed only once. The critical values of and at commonly used significance levels are given in Table 5.2 below.

Clayton Clayton(1) Gumbel Gumbel(2)
34/500 29/500 25/500 28/500
26/500 23/500 31/500 27/500

Clayton Clayton(1) Gumbel Gumbel(2)
4/500 6/500 5/500 2/500
7/500 8/500 3/500 3/500

Table 5.1. Observed rejection frequencies at 5% (top) and 1% (bottom) significance levels under the various null hypotheses
10% 5% 1%
2.124 2.373 2.986
0.537 0.723 1.197

Table 5.2. Critical values of and at various significance levels

To analyze the behavior of the test statistics under the alternative hypothesis, we generate 100 out-of-model samples (of size as before) for each of the four models considered above. For the Clayton model, we generate samples from the bivariate distribution with Exponential(1) margins and Frank(4) copula. For the Clayton(1) model, we use Exponential(1) margins and Clayton(3) copula. For the Gumbel model, we use Lomax(3,1) margins and Clayton(2) copula, and for the Gumbel(2) model, we use Lomax(3,1) margins and Gumbel(4) copula. From each sample, we construct the test process on the grid and compute the two test statistics and as before. The resulting rejection frequencies at 5% significance level are shown in Table 5.3 below. These numbers suggest that tests based on our approach have high power.

testing for: Clayton Clayton(1) Gumbel Gumbel(2)
sampled from: Frank(4) Clayton(3) Clayton(2) Gumbel(4)
96/100 93/100 100/100 100/100
98/100 94/100 100/100 100/100

Table 5.3. Observed rejection frequencies at the 5% significance level under the various alternative hypotheses

5.2. Data analysis

We consider a data set consisting of log-concentrations of seven metallic elements (uranium [U], lithium [Li], cobalt [Co], potassium [K], caesium [Cs], scandium [Sc], titanium [Ti]) in 655 water samples collected near Grand Junction, Colorado in the late 1970s. In Cook and Johnson (1986), the pairwise dependence structures of U-Cs, Co-Ti and Cs-Sc log-concentrations were investigated, and it was found that the Clayton copula, or a two-parameter extension of it, provides a better fit (in terms of likelihood values) to each of these pairs than the normal copula, under the assumption of normal marginal distributions. This two-parameter extension of the Clayton copula is defined as:

(15)

for parameters and . Note that when , the expression (5.2) reduces to the usual Clayton copula with parameter .

For our analysis, we focus on the pair Co-Sc (which was not investigated in Cook and Johnson (1986)), since the assumption of normal margins seems most plausible for the Co and Sc log-concentrations; see normal QQ-plots in Fig. 5.2 below. Also see Fig. 5.3 for a scatter plot of the Co-Sc log-concentrations as well as a scatter plot of the rank-transformed data.

Figure 5.2. Normal QQ-plots for Co and Sc log-concentrations
Figure 5.3. Scatter plots for Co and Sc log-concentrations (left) and normalized ranks of Co and Sc log-concentrations (right)

Under the assumption of normal margins, we test for four parametric copula families: Clayton, Frank, Gumbel, and the two-parameter family described in (5.2), which we will call the Cook-Johnson copula. Recall that the bivariate Frank family of copulas is given by

for , with . Following the methodology described in Subsection 5.1, we compute the test process and the corresponding test statistics and for each of these four models. For the observed values of the test statistics, we compute -values from the benchmark distribution tables constructed from the true bivariate standard Wiener process. The results are given in Table 5.4. We observe that all models except Frank are rejected at 5% significance level, and Clayton and Cook-Johnson models in particular are rejected very strongly.

The model with normal margins and Frank copula yields maximum likelihood estimates of and for the mean and standard deviation of Co log-concentrations, and for the mean and standard deviation of Sc log-concentrations, and for the Frank copula parameter. The estimated value of suggests moderate positive dependence, corresponding to a Kendall’s of 0.544 and Spearman’s of 0.743. Direct sample estimates for these coefficients are and , respectively. The contour plot of the fitted Frank copula density, together with the scatter plot of the rank-transformed data, can be seen in Fig. 5.4.

Clayton Frank Gumbel C-J
0.0000 0.1730 0.0338 0.0000
0.0000 0.1156 0.0206 0.0000

Table 5.4. -values for various copula models for the Co-Sc log-concentrations, under assumption of marginal normality
Figure 5.4. Contour plot of the fitted Frank copula density, superimposed on the scatter plot of the rank-transformed Co and Sc log-concentrations

6. Proofs

Proof of Theorem 2.1

By Skorohod’s representation theorem (Billingsley (1999), Theorem 6.7) there is a probability space where all random elements in (4) are defined and the convergence (4) holds in probability. We work on this probability space.

Let us introduce the notation for , with

It is then easy to verify that

(16)

We will show that the three components produce the three components of the process in the limit.

Let us first prove the auxiliary result that

(17)

The proof of (17) will be identical for each index , so we will suppress it below for ease of notation. For any , we can use the Mean Value Theorem to write

for some that lies on the line segment connecting to in . It follows that

(18)

where absolute values of vectors should be interpreted component-wise. Assumptions A1 and A3 ensure that each summand on the right-hand side of (6) is uniformly over , so the convergence (17) is established.

Now recall the decomposition (16) of . The first component, , is equal to by definition. By virtue of (17), the convergence (4) in probability, and the (uniform) continuity of over , we can now conclude that

(19)

Moreover, by the Mean Value Theorem, the process can be rewritten for each as

for some lying on the line segment connecting to . Therefore,

(20)