# Exceedance-based nonlinear regression of tail dependence

Abstract The probability and structure of co-occurrences of extreme values in multivariate data may critically depend on auxiliary information provided by covariates. In this contribution, we develop a flexible generalized additive modeling framework based on high threshold exceedances for estimating covariate-dependent joint tail characteristics for regimes of asymptotic dependence and asymptotic independence. The framework is based on suitably defined marginal pretransformations and projections of the random vector along the directions of the unit simplex, which lead to convenient univariate representations of multivariate exceedances based on the exponential distribution. Good performance of our estimators of a nonparametrically designed influence of covariates on extremal coefficients and tail dependence coefficients are shown through a simulation study. We illustrate the usefulness of our modeling framework on a large dataset of nitrogen dioxide measurements recorded in France between 1999 and 2012, where we use the generalized additive framework for modeling marginal distributions and tail dependence in monthly maxima. Our results imply asymptotic independence of data observed at different stations, and we find that the estimated coefficients of tail dependence decrease as a function of spatial distance and show distinct patterns for different years and for different types of stations (traffic vs. background).

Keywords Asymptotic independence, Extreme value theory, Generalized additive models, Penalized likelihood, Tail dependence.

## 1 Introduction

Modeling co-occurrence patterns of extreme values arising in multi-component systems is crucial for an accurate prediction of aggregated risks. The limiting dependence structures for extreme values do not present a simple parametric form, such as the Gaussian dependence arising from the multivariate central limit theorem, which has spawned an extensive literature covering a wide range of parametric to fully nonparametric dependence models. When asymptotic independence arises, the limit model has a simple form but does not indicate the rate of convergence to this limit, suggesting alternative joint tail representations should be used to model the residual dependence at the observed levels of the process. In this paper, we use threshold exceedance data, and we develop nonparametric modeling approaches suitable for characterizing asymptotic dependence and asymptotic independence when the strength of the dependence and more generally its shape may be governed by additional information given by covariate data. Only a few approaches exist in the current literature for modeling and estimating covariate influence on the joint tail structure. In a parametric framework, Mhalla, de Carvalho & Chavez-Demoulin (2017) proposed integrating covariate information through the parameters indexing angular density models, but we want to avoid such strong parametric assumptions. In the asymptotic dependence case, a nonparametric approach based on a baseline density for the dependence, modified through a density ratio to obtain a set of different dependence models according to covariate information, was developed by De Carvalho & Davison (2014). More recently, Mhalla, Chavez-Demoulin & Naveau (2017) proposed very flexible modeling of the extremal dependence based on a generalized additive model with shape constraints using blockwise maxima data. In this paper, we extend their approach from maxima to threshold exceedance data, and we lift the restriction on asymptotic dependence by proposing a set of tools that work for both asymptotic dependence classes.

Classical extreme value theory and practice are founded on max-stable processes, which are the only non-trivial limits arising from pointwise normalized maxima taken over independent replicates of a continuous stochastic process , with a set of indexes. When observed over a finite-dimensional set, the -dimensional joint distribution of a max-stable random vector is of multivariate extreme value type with marginal generalized extreme value distributions and a max-stable dependence structure (de Haan & Ferreira, 2006, Section 9.2). A handy representation of is obtained when the marginal distributions are transformed to a common unit Fréchet distribution . The multivariate max-stable distribution is then written as with the exponent function measuring the strength and form of the dependence; the max-stable process is then said to be simple. We will assume this property throughout, without loss of generality. The convergence of the extremal dependence of any distribution in the domain of attraction of is chacterized by the property of multivariate regular variation (Resnick, 1987):

(1) |

Max-stable models are useful when data are asymptotically dependent. When this assumption does not hold, limiting models are of little practical use as the property (1) fails to distinguish between asymptotic independence and exact independence (Heffernan & Resnick, 2007), and models for the joint tails based on hidden regular variation (Resnick, 2002) are indispensable.

Joint tail characterizations at the interface of asymptotic dependence and independence are often presented in a bivariate setup. In more than two dimensions, some pairs of components may be asymptotically dependent while others are not; for a deeper theoretical treatment, see de Haan & Zhou (2011). We say here that a stochastic process is asymptotically independent if all of its pairs of components have this property; the only possible max-stable limit in this case is full independence. In view of our aim of modeling the joint tail decay while abstracting away from the univariate marginal distributions, we now suppose that the -dimensional random vector is nonnegative with Pareto marginal distributions by applying a marginal probability integral transform to if necessary. The use of unit Fréchet margins (denoted by ), which are tail equivalent to , would yield the same representations in the following.

With asymptotic independence of and , their dependence strength vanishes as we move further into the joint tail. A general flexible representation of the joint tail, leading to a broad class of models suitable for asymptotic independence, was introduced by Ledford & Tawn (1996, 1997) for bivariate random vectors and then generalized to the multivariate setup by Wadsworth & Tawn (2013). We denote by a direction (also called weight or angle) that is on the unit simplex in . Any positive random vector can be represented as . The joint tail representation is

(2) |

where is a slowly varying function at infinity for any value of held fixed. The function is called the angular dependence function (Wadsworth & Tawn, 2012, 2013). It must satisfy certain shape constraints and describes the decay rate along rays in direction . In the case of asymptotic dependence, we have and as . If (1) holds, then tends to a limit expressed through values of the exponent function , which for is . In the bivariate case, generalizes the coefficient of tail dependence introduced by Ledford & Tawn (1996) and the dependence measure (Coles et al., 1999), where characterizes the joint tail decay rate along the diagonal of the first hyperoctant. The coefficient can be defined in dimensions as . In a similar way, the extremal coefficient (Schlather & Tawn, 2003) is related to via .

In the remainder of the paper, Section 2 introduces projections based on weighted maxima and minima of multivariate random vectors leading to convenient univariate representations with appealing distributional properties for characterizing dependence in the tail. In Section 3, we develop nonparametric inference for such dependence based on a generalized additive modeling framework allowing the inclusion of covariate influence. The simulation study in Section 4 illustrates the good performance of our methods when the model is exact for the data but also when it represents an asymptotic approximation to the true data distribution. In the application presented in Section 5, we use our new techniques to reveal the influence of spatial distance and time on the co-occurrence patterns of extreme values in a large dataset of French air pollution data. Conclusions with an outlook on future work are given in Section 6.

## 2 Max- and min-projections

We define the notions of max-projection and min-projection of a vector with respect to a weight vector . The max-projection is given as , and the min-projection is defined as . The link between the two projections is established through the inversion using the convention that and .

### 2.1 Asymptotic dependence

We assume that the random vector with distribution function is in the max-domain of attraction of a simple max-stable process . The exponent function in (1) characterizing a max-stable random vector is positive, continuous, convex, and homogeneous of order such that for . Exploiting the homogeneity of , an alternative characterization of the extremal dependence is possible through the Pickands dependence function (Pickands, 1981), where

(3) |

Given , (3) implies . We denote by the max-projection of the random vector . Then is Fréchet distributed with scale parameter reflecting the level of dependence in at an angle : For , we get

(4) | |||||

From (3), we conclude that the scale parameter is equal to , and we remark that the corresponding min-projection follows an exponential distribution with rate . As argued by Ledford & Tawn (1997), a censored version of the multivariate max-stable distribution is a natural model for asymptotic dependent threshold exceedances. In view of convergence (1), this corresponds to applying the approximation for large values of or equivalently, to replacing by using the logarithmic series approximation for small . We propose to consider projected data values using standard exponentially distributed , where we use ”” to emphasize the tail inversion. As left-censoring the upper tail is equivalent to right-censoring the lower tail after inverting the tails, we model, below a small fixed threshold, the projected values by the distribution, while we censor the values above the threshold.

Our max-projection is closely related to the max-projection used by Mhalla, Chavez-Demoulin & Naveau (2017) for estimating max-stable dependence from maxima data where no censoring is applied. They define

(5) |

which corresponds to . The link between the distribution functions of the two max-projections arises from the fact that implies for a random variable . The components of the direction in (5) are inversed as we define the Pickands dependence function differently from Mhalla, Chavez-Demoulin & Naveau (2017).

### 2.2 Residual dependence in asymptotic independence

In this section, we consider marginal transformations of the data vector to either standard Pareto margins or to unit exponential margins . We define the min-projection of as for a direction in . Based on the multivariate tail representation (2), the function

is regularly varying at infinity with index . Thus,

(6) | |||||

where as . Equivalently, the excesses of the structure variable above a high threshold are exponentially distributed with rate in the limit: Setting in (6) yields

(7) |

By using the tail structure (2) for characterizing asymptotic independence and appropriate marginal pretransformations, we can therefore model the positive excess of the min-projection above a fixed high threshold through an -distribution.

A more specific yet still very flexible class of asymptotically independent processes satisfying (2) are the inverted max-stable processes discussed in Wadsworth & Tawn (2012). In unit Pareto margins, the limit relation (2) is exact for those models and can be written

(8) |

with the Pickands dependence function of the associated max-stable process, which takes the role of the angular dependence function . If is an inverted max-stable process, then the original max-stable process is recovered through the transformation . The slowly varying function in (2) is equal to in this special case, such that

(9) |

Therefore, if we assume an inverted max-stable dependence in the joint tail of , we can model data values of above a fixed high threshold through the exponential distribution in (9) while censoring values below . This is different from the general setup with arbitrary unknown where only the excesses above the threshold are used, not the information contained in the censoring indicator .

### 2.3 The case of Gaussian dependence

Suppose that and are unit Fréchet distributed. On one hand, Ledford & Tawn (1996) showed that if and have bivariate normal dependence with correlation , then

(10) |

On the other hand, if the distribution function of is a bivariate extreme value distribution with exponent function and extremal coefficient , then

(11) |

By assuming asymptotic dependence for data from a Gaussian copula with , we would fit an extreme value model at a sub-asymptotic level. Such model misspecification will result in biased extremal coefficient estimates smaller than , the value for asymptotic independence, owing to residual dependence in the data at finite levels. We can approximately quantify this bias by equating (10) and (11). The resulting subsasymptotic extremal coefficient is equal to

(12) |

with the marginal threshold. This relationship between , , and is displayed in Figure 1 where the threshold is set to high quantiles of the unit Fréchet distribution, with .

For any , the coefficient tends to when tends to infinity. This is obtained from an application of the binomial series formula to approximate the right-hand side of (12).

The extremal coefficient behaves as expected in the limit cases of perfect independence () and perfect dependence (). By taking high thresholds such that residual dependence in the exceedance data vanishes, the extremal coefficient is close to unless the correlation . Detection of asymptotic independence in the Gaussian dependence case with finite sample size was studied in Bücher et al. (2011) where the authors proposed new estimators of the Pickands dependence function.

### 2.4 An illustration on data

In Sections 2.1 and 2.2, we developed asymptotically justified univariate exponential models for both classes of asymptotic dependence and independence using projections. Equation (2) characterizes the joint tail behavior of random vectors with

where has a well-defined positive limit in the asymptotic dependence case if the regular variation property (1) holds. If we look at data under the assumption of asymptotic dependence, then we calculate the projections to estimate the values of the Pickands dependence function. If we start with the assumption of asymptotic independence, then we calculate to estimate the values of the angular dependence function. A max-stable model characterized by the Pickands function provides an appealing link between the two dependence classes by the possibility of modeling asymptotic independence with the corresponding inverted max-stable model whose function is known a priori and whose angular dependence function is . Notice that the link between the two projections is as follows, using probability integral transformations:

Figure 2 illustrates these relationships on bivariate data simulated according to an extreme value distribution, where various marginal scales and projections at are considered. Empirical thresholds at the and levels are used to censor the upper and lower tails, respectively.

## 3 Inference and regression modeling of dependence

The theory above shows that the projection techniques are appropriate for modeling asymptotic dependence and asymptotic independence in threshold excesses. In either case, we aim to develop a dependence model for the upper joint tail of the data distribution. With asymptotic independence, we transform the data margins to the exponential distribution, calculate the min-projection, and model its excesses above a high threshold through an exponential distribution, whose rate is given by the angular dependence function evaluated at the projection angle. More specifically, the latter corresponds to the Pickands dependence function if we utilize an inverted max-stable model. With asymptotic dependence, we transform the data margins to the unit Fréchet distribution, calculate the max-projection, invert the latter, and model the resulting deficits below a small fixed threshold through an upper-censored exponential distribution, whose rate is given by the Pickands dependence function evaluated at the projection angle.

By using min- and max-projections, inference is based on a univariate variable, which frees us from handling multivariate censoring schemes in likelihood-based approaches and the resulting computational burden. A difference between the two dependence regimes arises in the construction of the projection. On one hand, the max-projection in the case of asymptotic dependence retains only the highest value, which makes sense since the asymptotic theory implies that the limit model gives a good approximation for the components that are large. On the other hand, the min-projection in the case of asymptotic independence retains the smallest value, which contains crucial information on the faster tail decay rate along various directions , while the max-projection would converge to a unit exponential limit carrying no useful information in this case.

In either case, inference for the dependence structure is a two-step procedure based on a univariate structure variable. In the first step, the margins are transformed to the unit Fréchet scale and then inverted to the unit exponential scale (asymptotic dependence) or directly to the unit exponential scale (asymptotic independence), and we then calculate the min-projection. The second step consists of fitting the appropriate exponential model (4) or (7) to and , respectively. In the following, we detail the second step of the inference procedure with a view towards estimating the influence of a set of covariates .

### 3.1 Inference for the case of asymptotic dependence

Given observations of the random vector with unit Fréchet margins, we suppose that is in the max-domain of attraction of an extreme value distribution with Pickands dependence function . We fix a direction and calculate the observed structure variables , . To put focus on the dependence of extremes in the second step of the inference procedure, we censor observations that are above a low threshold , for instance, chosen as the empirical quantile of . The likelihood function is with contributions

When a set of covariates is available, we propose using a generalized additive model (GAM) structure (Wood, 2017) to model the dependence in the extremes, similar to the approach of Mhalla, Chavez-Demoulin & Naveau (2017), who estimated the Pickands dependence function based on the block maxima approach. The dependence of on is assumed to be at the extremal dependence level, i.e., the covariates solely influence the Pickands dependence function . This assumption implies no loss of generality as the marginal inference is performed in a separate step. A very general model for arises from supposing the semi-parametric form

(13) |

where is a link function and and are subvectors of , or products of covariates if interactions between some covariates are considered. The column vector gathers linear coefficients whereas are smooth functions supported on closed intervals and admitting a finite quadratic penalty representation (Green & Silverman, 1993). The column vector gathers all parameters to be estimated in the model, i.e., the vector and the linear basis coefficients of each of the smooth functions . Based on a sample , we estimate the GAM (13) by maximizing the penalized log-likelihood

(14) |

with ,

The integrals in (14) are componentwise roughness penalties with smoothing parameters that balance between the smoothness of the model and its goodness of fit. Higher values of yield smoother fitted curves. The related effective degrees of freedom of each smooth function are defined as , where is the positive definite penalty matrix associated to the basis representation of (Wood, 2017, Chapter 5). The maximization of the penalized log-likelihood (14) is performed based on an outer-iteration procedure. At each iteration, and are estimated separately by penalized iteratively re-weighted least squares (PIRLS) and a prediction error method (Generalized Cross Validation), respectively; see Wood (2017) for a detailed description of the available methods for GAM fitting.

The penalized maximum log-likelihood estimator then defines the estimate of the Pickands dependence function evaluated at . A vast amount of literature on GAM-related theory is available and includes Wood (2004, 2006); Marra & Wood (2011), among others.

### 3.2 Inference for the case of asymptotic independence

Given observations of the random vector with standard Pareto margins, we suppose that has an asymptotic independent tail structure as in (2). We fix a direction and calculate the observed structure variables

We fix a high threshold , for instance, chosen as the the empirical quantile of , and extract the sample of positive excesses , with a positive number of excesses . Then, we maximize the likelihood composed of contributions

(15) |

Given covariate vectors , we proceed as for the asymptotic dependence case by proposing a GAM (13) for , i.e.,

which is fitted by maximizing a penalized version of the likelihood (15) similarly to (14) and results in the estimate of the tail dependence function evaluated at .

## 4 Simulation study

We study the properties of the estimators and when they are related to a covariate vector using the link function , a modification of the link resulting in values within . We focus on the estimation of these two dependence functions in the bivariate case at , which yields estimates of the covariate-dependent coefficients of extremal dependence and tail dependence . Realistic sample sizes are chosen, similar to those in the subsequent application.

### 4.1 Case of asymptotic dependence

Suppose that is an asymptotically dependent random vector with unit Fréchet margins. We first consider the max-domain of attraction (MDA) setting where our model represents an asymptotic approximation of the exact tail behavior in the data.

#### 4.1.1 Data distribution in the maximum domain of attraction

We focus on the bivariate case with a random vector with unit Fréchet margins and an Archimedean copula (Nelsen, 2006, Chapter 4) with generator for some , i.e., the distribution function of is

(16) |

This distribution is in the max-domain of attraction of the logistic bivariate extreme value distribution (Tawn, 1990) with parameter (Fougères, 2004), and can be simulated using the algorithm from Nelsen (2006, example 4.15, p. 144). We estimate the covariate-dependent extremal coefficient using deficits of by setting and fixing different threshold levels.

Equation (4) holds when at least one variable exceeds some high threshold, and the limiting dependence structure in the tails of is equal to that of a bivariate logistic extreme value distribution (Coles & Tawn, 1991). We empirically study the bias of the covariate-dependent extremal coefficient estimator resulting from the application of the limiting extreme value model.

We simulate from a covariate-dependent model (16) with the following generator

where is observed at equally spaced values and

Thus, the covariate-dependent extremal coefficient is

(17) |

The sample size is chosen between for a threshold at the level and at the level, such that an average of threshold deficits of arises for a fixed value of . Additionally, we empirically quantify the performance of our estimator in an overparametrized setting where one of the covariates has no influence on the response by considering a dummy categorical covariate with two levels. In the true model (17), the Pickands dependence function does not depend on . We simulate the observed values of at random and include level-dependent predictors in the GAM structure for ; thus, the estimated model is

where , are the estimated intercepts and , and are the estimated smooth curves describing the dependence of the Pickands function at on and at each level of . The Monte Carlo procedure is based on repetitions, and percentile confidence intervals are constructed. Figure 3 displays the pointwise mean estimates of in both levels of with a threshold set at the level.

As expected, the inclusion of in the model does not prevent it from recovering the true dependence structure of the extremal coefficient on in both levels. Moreover, when considering the point estimates (i.e., the single-run experiments), the results indicate that the variable is not statistically significant in the fitted GAM. Next, we remove this artificial covariate from the model and compare the root mean squared error (RMSE) of the resulting extremal coefficient fits obtained for different threshold levels. The RMSE over the samples is defined as

where is the covariate-dependent extremal coefficient estimate obtained from the th sample. Figure 4 shows the RMSE of for different threshold levels. The RMSE tends to decrease when we take lower threshold levels. Similar shapes of the RMSE function are observed for threshold levels below , with a noticeable decrease in the RMSE for values of corresponding to weak extremal dependence, i.e., for values of between and .

#### 4.1.2 Max-stable data distribution

We now simulate bivariate samples with max-stable dependence where our asymptotic dependence model class contains the exact model. The observations come from the logistic extreme value copula with unit Fréchet margins and dependence parameter , . The covariate-dependent extremal coefficient is . Figure 5 displays the RMSE of the extremal coefficient estimates with respect to for the , , , and threshold levels. The RMSE is mostly unaffected by the threshold level as our modeling assumption (4) holds exactly, as opposed to the sub-asymptotic setting in Section 4.1.1. Therefore, we observe a lower RMSE (see Figure 4), as there is no estimation bias resulting from penultimate modeling.

### 4.2 Case of asymptotic independence

We consider two models for an asymptotically independent random vector with standard exponential margins. The dependence in the tails of depends on the covariates and , where is a categorical covariate with two levels and . Bivariate Gaussian dependence with correlation is observed in the first level of , and an inverted logistic extreme value dependence with dependence parameter is observed in the second level of . The dependence on covariates is as follows:

(18) | |||||

(19) | |||||

Then,

The strength of the tail dependence varies from low (small values of , large values of ) to strong (large values of , small values of ) where unless or (perfect dependence).

For assessing the performance of our generalized additive modeling framework for the coefficient of tail dependence when different strengths of tail dependence are observed, we construct the following model for the angular dependence function:

(20) |

with intercept and smooth functions and of . Our simulation study is based on an average of threshold exceedances of for a fixed value of the covariate vector ; the sample size varies between for a threshold at the level and at the level. As before, repetitions are carried out. We consider different threshold levels to quantify the bias resulting from the estimation of based on (7) at a finite threshold or equivalently (2) at a finite level .

The RMSE defined as

with the covariate-dependent tail dependence coefficient estimate obtained from the th bootstrap sample, is displayed in Figure 6. In the Gaussian case, the RMSE decreases when the threshold level increases but increases when stronger dependence is considered, i.e., when (or , equivalently) increases. This is due to the slow convergence of tail measures at sub-asymptotic levels for the Gaussian dependence; see Coles et al. (1999). In the inverted extreme value case, the estimator performance is largely unaffected by the threshold level as Equation (8) entails that the approximation (2) is exact at finite levels.

## 5 Application to nitrogen dioxide data

We illustrate the modeling of covariate-dependent tail dependence on a nitrogen dioxide () data set, extracted from the European air quality database for pollutants AirBase^{2}^{2}2https://www.eea.europa.eu/data-and-maps/data/aqereporting-2. It comprises measurement stations in France with hourly records of observed over 14 years between 1999 and 2012; see the map of stations in Figure 7. With produced mostly by the burning of fossil fuel and motor vehicle exhaust, we propose to distinguish traffic stations where the level is predominantly determined by nearby traffic ( stations) from background stations, often located in built-up areas, whose level of is influenced by a combination of many sources ( stations). Figure 8 shows measurements in for two background stations located km apart (“Metz-Centre” and “Metz-Borny”) and for two traffic stations located km apart (“Auto A1-Saint-Denis” and “Rue Bonaparte”). The measurements at the two background stations seem to follow a similar pattern, with large values recorded around the same time periods and low values observed during the summer season when most of the is transformed into ozone through sunlight. The comparison of the measurements for the two traffic stations is less straightforward. The magnitude of observations is much higher than for the background sites. Large observations seem to occur mostly locally although the stations being very close.

Such insights justify the distinction between the two station types when investigating the co-occurrence of large concentrations measured at pairs of stations. Moreover, it is natural to ask whether the frequency of co-occurrences of high pollution levels has changed over time, for instance, as a result of regulatory measures. For the present analysis, we reduce the dimension of data by considering monthly maxima at each of the stations, which avoids modeling of hourly patterns in concentration levels and of intraday dependence between the measurements (Shi et al., 2014).

At each station, the marginal distribution of monthly maxima is modeled using a generalized extreme value (GEV) distribution. Its parameters for location and scale are allowed to vary smoothly with the year and month of the observed maxima as follows:

where and denote the year and month of the observed maxima, respectively, and are smooth functions accounting for the trend, and and are cyclic smooth functions accounting for the seasonal component in the data (Wood, 2017, Section 5.3). Likelihood ratio tests are performed to assess whether we need smoothly varying terms or whether parametric and sinusoidal terms already provide a good fit. The final fitted model is then used to transform the data at each station to the unit Fréchet distribution using the probability integral transform.

We focus on modeling the dependence between high measurements recorded at pairs of stations. The dependence should naturally tend to decay with the distance between the stations in each pair. Therefore, we include distance (in kilometers) as a covariate in our model along with the type of area (traffic/background) and time (year), as discussed above. We consider the great circle distance between the stations, i.e., the shortest distance over Earth’s surface, and fix the distance resolution at km, which represents the minimal distance by which stations must be separated to be distinguishable. This setting leads to distinct values for the distance covariate, pairs of observations for the background stations and pairs for the traffic stations. Both tail measures, based on the assumption of either asymptotic dependence or asymptotic independence, are modeled. We construct the following “full” models for the extremal coefficient and the tail dependence coefficient:

(21) | |||||

(22) | |||||

where represents time (in years), the distance between the stations in each pair (in km), and . The interaction between time and distance is represented using a tensor product basis (Wood, 2017, Section 5.6). We fit models (21) and (22) based on the deficits of the quantile of and the exceedances of the quantile of with , respectively.

To start, we conduct a simpler, purely spatial analysis and consider the models without time effects and such that data are pooled together over the whole period for each pair of stations during the estimation. Estimated summaries and are shown in Figure 9; they clearly hint at asymptotic independence with estimates and bootstrap-based pointwise confidence intervals of very close to , while those of are clearly bounded away from . Overall, the joint tail decay rates in asymptotic independence appear to be quite fast with pointwise confidence envelopes of contained between and approximately. A partial explanation for this relatively weak dependence is that our univariate models have already appropriately removed seasonal trends in the data, such that dependence in the resulting residuals cannot arise from intermediate-range clustering in space and time. As a matter of fact, as seasonal patterns are typically spatial, filtering out these patterns would result in a spatial de-clustering in the region where the seasonal features are observed. We detect a stronger weakening of dependence with increasing distance in the background stations, which makes sense because the peaks in the concentrations are strongly influenced by nearby traffic.

Next, we consider regression fits (21) and (21) with spatial and temporal components; i.e., we have a more complex, higher-dimensional model for the predictors, and we expect the estimation uncertainty to be higher. Table 1 summarizes both fitted models. All the considered covariate effects, except for the distance between traffic stations when the tail dependence coefficient is modeled, are statistically significant.

estimate/edf | |||||||||
---|---|---|---|---|---|---|---|---|---|

-value | |||||||||

estimate/edf | |||||||||

-value |

This confirms our intuition that high concentrations are relatively localized for the traffic stations. A block bootstrap procedure treating measurements from each month and each year as independent for each type of area is used to assess the uncertainty. Figure 10 shows cross sections of the estimates of the extremal coefficient for the traffic and background stations; we fix one of the two continuous covariates to show the smooth effect of the other.

For both area types, the estimates are very close to the upper bound of the extremal coefficient for almost all values of and . This result implies weak extremal dependence between the measurements at the pairs of stations, and we can suppose asymptotic independence in the data. We now focus on the estimates of the tail dependence coefficient; the sub-asymptotic modeling of the tails should be more informative in this case. Figure 11 displays cross sections of the estimate of the tail dependence coefficient for the different types of area.

For the considered years, the residual tail dependence is relatively weak and largely unaffected by the distance between traffic stations, owing to the very localized features of traffic pollution inducing high concentrations of . Unreported results with a distance resolution of km have shown globally higher tail dependence estimates (over time) for close traffic stations that are at most km apart. The uncertainty in the estimators is relatively high due to the small amount of information available from the traffic stations. For background stations, the smooth effect of distance on the tail dependence is more pronounced with a decrease in the dependence at larger distances. The sharpness of the decrease shows some variation over time, with higher tail dependence for close background stations observed in . The smooth effect of time on the tail dependence is important mainly for small distances with an overall slight increase over time for the traffic stations and a bump (increase) in the dependence around and for the background stations. These effects are observable for stations up to km apart although with a lower magnitude of the dependence measure.

## 6 Conclusion

Starting from a -dimensional random vector with either asymptotic dependence or independence, we developed min- and max-projection techniques allowing us to simplify the joint tail characterization problem to univariate modeling with well-understood exponential distributions for which generalized additive modeling under censoring is feasible and well-known from survival modeling. The exponential rate carries crucial information about the form and the rate of the joint tail decay in different directions. This setup facilitates flexible inference for the tail dependence as it is based on the excesses or deficits of a univariate exponential random variable while censoring observations that do not contribute to the joint tail, and it allows us to include multiple covariates of different types through the GAM framework. Although we focused on estimating the covariate influence for a fixed direction, we can apply the projection technique of Mhalla, Chavez-Demoulin & Naveau (2017) in different directions to obtain smooth and valid estimates of the Pickands dependence function or the angular dependence function under shape constraints and for a fixed set of covariates.

Our application demonstrates that it is useful to apply both projection techniques in practice to compare covariate-driven estimates for tail dependence summaries in each of the two asymptotic regimes. Our pairwise modeling of measurements in France to investigate the effect of time and spatial distance on the joint tail behavior showed strong evidence against asymptotic dependence. The results of our application gave strong support for asymptotic independence with estimated extremal coefficients close to and the confidence intervals of tail dependence coefficients bounded away from . Our methodology constitutes an important step toward the distinction between asymptotic dependence and independence, when there is no clear evidence for one of the two models or when different model classes arise for different covariate configurations. Formal hypothesis testing of asymptotic dependence against asymptotic independence for a fixed set of covariates would be an important extension. As our censoring mechanisms select different observations according to the two models, likelihood-based tests are not directly applicable, but non-parametric tests of extremal dependence could provide guidance (e.g., Dey & Yan, 2015, Chapters 17 and 18). Finally, depending on the application context, a threshold that varies with covariates and/or directions could be used to determine excesses and deficits, which could reduce or homogenize estimation bias and uncertainty.

## References

- (1)
- Bücher et al. (2011) Bücher, A., Dette, H. & Volgushev, S. (2011), ‘New estimators of the Pickands dependence function and a test for extreme-value dependence’, The Annals of Statistics 39, 1963–2006.
- Coles et al. (1999) Coles, S., Heffernan, J. E. & Tawn, J. A. (1999), ‘Dependence measures for extreme value analyses’, Extremes 2, 339–365.
- Coles & Tawn (1991) Coles, S. & Tawn, J. A. (1991), ‘Modelling extreme multivariate events’, Journal of the Royal Statistical Society, Series B (Statistical Methodology) 53, 377–392.
- De Carvalho & Davison (2014) De Carvalho, M. & Davison, A. C. (2014), ‘Spectral density ratio models for multivariate extremes’, Journal of the American Statistical Association (506), 764–776.
- de Haan & Ferreira (2006) de Haan, L. & Ferreira, A. (2006), Extreme Value Theory: An Introduction, Springer, New York.
- de Haan & Zhou (2011) de Haan, L. & Zhou, C. (2011), ‘Extreme residual dependence for random vectors and processes’, Advances in Applied Probability 43, 217–242.
- Dey & Yan (2015) Dey, D. & Yan, J. (2015), Extreme Value Modeling and Risk Analysis, Chapman and Hall/CRC, New York.
- Fougères (2004) Fougères, A.-L. (2004), ‘Multivariate extremes’, In Extreme Values in Finance, Telecommunications, and the Environment, Ed. Finkenstädt, B. and Rootzén, H., Monographs on Statistics and Applied Probability 99.
- Green & Silverman (1993) Green, P. & Silverman, B. (1993), Nonparametric Regression and Generalized Linear Models: A roughness penalty approach, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Taylor & Francis.
- Heffernan & Resnick (2007) Heffernan, J. E. & Resnick, S. I. (2007), ‘Limit laws for random vectors with an extreme component’, Annals of Applied Probability 17, 537–571.
- Ledford & Tawn (1996) Ledford, A. W. & Tawn, J. A. (1996), ‘Statistics for near independence in multivariate extreme values’, Biometrika 83, 169–187.
- Ledford & Tawn (1997) Ledford, A. W. & Tawn, J. A. (1997), ‘Modelling dependence within joint tail regions’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59, 475–499.
- Marra & Wood (2011) Marra, G. & Wood, S. N. (2011), ‘Practical variable selection for generalized additive models’, Computational Statistics and Data Analysis 55, 2372–2387.
- Mhalla, Chavez-Demoulin & Naveau (2017) Mhalla, L., Chavez-Demoulin, V. & Naveau, P. (2017), ‘Non-linear models for extremal dependence’, Journal of Multivariate Analysis 159, 49–66.
- Mhalla, de Carvalho & Chavez-Demoulin (2017) Mhalla, L., de Carvalho, M. & Chavez-Demoulin, V. (2017), ‘Regression type models for extremal dependence’, arXiv preprint arXiv:1704.08447 .
- Nelsen (2006) Nelsen, R. B. (2006), An Introduction to Copulas, Springer series in statistics, 2nd ed edn, Springer, New York.
- Pickands (1981) Pickands, J. (1981), Multivariate extreme value distributions, in ‘Proc. 43rd Session of the International Statistical Institute’, pp. 859–878.
- Resnick (1987) Resnick, S. I. (1987), Extreme values, regular variation and point processes, Springer, New York.
- Resnick (2002) Resnick, S. I. (2002), ‘Hidden regular variation, second order regular variation and asymptotic independence’, Extremes 5, 303–336.
- Schlather & Tawn (2003) Schlather, M. & Tawn, J. A. (2003), ‘A dependence measure for multivariate and spatial extreme values: Properties and inference’, Biometrika 90, 139–156.
- Shi et al. (2014) Shi, P., Xie, P.-H., Qin, M., Si, F.-Q., Dou, K. & Du, K. (2014), ‘Cluster analysis for daily patterns of SO2 and NO2 measured by the DOAS system in Xiamen’, Aerosol and Air Quality Research 14, 1455–1465.
- Tawn (1990) Tawn, J. A. (1990), ‘Modelling multivariate extreme value distributions’, Biometrika 77, 245–253.
- Wadsworth & Tawn (2012) Wadsworth, J. L. & Tawn, J. A. (2012), ‘Dependence modelling for spatial extremes’, Biometrika 99, 253–272.
- Wadsworth & Tawn (2013) Wadsworth, J. L. & Tawn, J. A. (2013), ‘A new representation for multivariate tail probabilities’, Bernoulli 19(5B), 2689–2714.
- Wood (2004) Wood, S. N. (2004), ‘Stable and efficient multiple smoothing parameter estimation for generalized additive models’, Journal of the American Statistical Association 99, 673–686.
- Wood (2006) Wood, S. N. (2006), ‘On confidence intervals for generalized additive models based on penalized regression splines’, Australian & New Zealand Journal of Statistics 48, 445–464.
- Wood (2017) Wood, S. N. (2017), Generalized Additive Models: An Introduction with R, 2 edn, Chapman and Hall/CRC.