# Statistical Estimation of Composite Risk Functionals and Risk Optimization Problems

Darinka Dentcheva, Stevens Institute of Technology, Hoboken, NJ 07030, USA; Email: darinka.dentcheva@stevens.edu    Spiridon Penev, The University of New South Wales, Sydney, 2052 NSW, Australia; Email: s.penev@unsw.edu.au    Andrzej Ruszczyński, Rutgers University, Piscataway, NJ 08854, USA; Email: rusz@rutgers.edu
###### Abstract

We address the statistical estimation of composite functionals which may be nonlinear in the probability measure. Our study is motivated by the need to estimate coherent measures of risk, which have become increasingly popular in finance, insurance, and other areas associated with optimization under uncertainty and risk. We establish central limit formulae for composite risk functionals. Furthermore, we discuss the asymptotic behavior of optimization problems whose objectives are composite risk functionals, and we establish a central limit formula for their optimal values when an estimator of the risk functional is used. While the mathematical structures accommodate commonly used coherent measures of risk, they are of a more general character, which may be of independent interest.
Keywords: Risk Measures, Composite Functionals, Central Limit Theorem

## 1 Introduction

Increased interest in the analysis of coherent measures of risk is motivated by their application as mathematical models of risk quantification in finance and other areas. This line of research leads to new mathematical problems in convex analysis, optimization, and statistics. The uncertainty in risk assessment is expressed mathematically as a functional of a random variable, which may be nonlinear with respect to the probability measure. Most frequently, the risk measures of interest in practice arise when we evaluate gains or losses depending on the choice $z$, which represents the control of a decision maker, and on random quantities, which may be summarized in a random vector $X$. More precisely, we are interested in a functional of the resulting random outcome, which may be optimized under practically relevant restrictions on the decisions $z$. Most frequently, some moments of the random variable are evaluated. However, when models of risk are used, the existing theory of statistical estimation is not always applicable.

Our goal is to address the question of statistical estimation of composite functionals depending on random vectors and their moments. Additionally, we analyse the optimal values of such functionals, when they depend on finite-dimensional decisions within a deterministic compact set. The known coherent measures of risk can be cast in the structures considered here and we shall specialize our results to several classes of popular risk measures. We emphasize however, that the results address composite functionals of more general structure with a potentially wider applicability.

An axiomatic definition of risk measures was first proposed in [18]. The currently accepted definition of a coherent risk measure was introduced in [1] for finite probability spaces and was further extended to more general spaces in [34, 13]. Given a probability space $(\Omega,\mathcal{F},P)$, we consider the set of random variables defined on it which have finite $p$-th moments, and denote it by $\mathcal{L}_p(\Omega,\mathcal{F},P)$. A coherent measure of risk is a convex, monotonically increasing, and positively homogeneous functional $\varrho:\mathcal{L}_p(\Omega,\mathcal{F},P)\to\mathbb{R}$, which satisfies the translation-equivariance property $\varrho(X+a)=\varrho(X)+a$ for all $a\in\mathbb{R}$. Here $p\in[1,\infty)$, and we assume that realizations of $X$ represent losses, i.e., smaller realizations are preferred. Related concepts are introduced in [31, 12].

A measure of risk is called law-invariant if it depends only on the distribution of the random variable, i.e., if $\varrho(X)=\varrho(Y)$ for all random variables $X$ and $Y$ having the same distribution.

A practically relevant law-invariant coherent measure of risk is the mean–semideviation of order $p$ (see [24, 25], [36, s. 6.2.2]), defined in the following way:

$$\varrho(X)=\mathbb{E}[X]+\kappa\big\|(X-\mathbb{E}[X])_+\big\|_p=\mathbb{E}[X]+\kappa\Big[\mathbb{E}\big[(\max\{0,X-\mathbb{E}[X]\})^p\big]\Big]^{\frac{1}{p}},\tag{1}$$

where $\kappa\in[0,1]$ and $p\in[1,\infty)$. Note the nonlinearity with respect to the probability measure in formula (1).

Another popular law-invariant coherent measure of risk is the Average Value at Risk at level $\alpha\in(0,1]$ (see [30, 26]), which is defined as follows:

$$\mathrm{AVaR}_\alpha(X)=\frac{1}{\alpha}\int_{1-\alpha}^{1}F_X^{-1}(\beta)\,d\beta=\min_{\eta\in\mathbb{R}}\Big\{\eta+\frac{1}{\alpha}\mathbb{E}\big[(X-\eta)_+\big]\Big\}.\tag{2}$$

Here, $F_X$ denotes the distribution function of $X$. The reader may consult, for example, [36, Chapter 6] and the references therein, for a more detailed discussion of these risk measures and their representation.
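The equality of the two representations of the Average Value at Risk in (2) can be checked numerically on an empirical distribution. The following sketch is only an illustration, not part of the theoretical development; the sample and the level $\alpha$ are arbitrary choices.

```python
# Illustration: for an empirical distribution, the tail-average form of
# AVaR in (2) coincides with the minimization form.

def avar_tail_average(xs, alpha):
    """(1/alpha) times the integral of the quantile function over (1-alpha, 1];
    when n*alpha is an integer, this is the mean of the largest n*alpha points."""
    xs = sorted(xs)
    k = round(len(xs) * alpha)
    return sum(xs[-k:]) / k

def avar_minimization(xs, alpha):
    """min over z of z + (1/alpha) * E[(X - z)_+]; the objective is piecewise
    linear with knots at the sample points, so scanning them suffices."""
    n = len(xs)
    def objective(z):
        return z + sum(max(0.0, x - z) for x in xs) / (alpha * n)
    return min(objective(z) for z in xs)

sample = [1.0, 2.0, 3.0, 4.0]
print(avar_tail_average(sample, 0.5))   # mean of the two largest values: 3.5
print(avar_minimization(sample, 0.5))   # the same value: 3.5
```

Scanning the sample points in the minimization form is justified because the objective is convex piecewise linear with breakpoints only at the observations.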

The risk measure $\mathrm{AVaR}_\alpha$ plays a fundamental role as a building block in the description of every law-invariant coherent risk measure via the Kusuoka representation. The original result is presented in [20] for risk measures defined on $\mathcal{L}_\infty(\Omega,\mathcal{F},P)$ with an atomless probability space. It states that for every law-invariant coherent risk measure $\varrho$, a convex set $\mathcal{M}$ exists such that for all $X$, it holds

$$\varrho(X)=\sup_{m\in\mathcal{M}}\int_0^1\mathrm{AVaR}_\alpha(X)\,m(d\alpha).\tag{3}$$

Here $\mathcal{M}$ is a convex subset of the set of probability measures on the interval $(0,1]$. This result is extended to the setting of $\mathcal{L}_p$ spaces with $p\in[1,\infty)$; see [14], [27], [28], [36], [9], and the references therein.

The extremal representation of $\mathrm{AVaR}_\alpha$ on the right-hand side of (2) was used as a motivation in [19] to propose the following higher-moment coherent measures of risk:

$$\varrho(X)=\min_{\eta\in\mathbb{R}}\Big\{\eta+\frac{1}{\alpha}\big\|(X-\eta)_+\big\|_p\Big\},\qquad p>1.\tag{4}$$

These risk measures are special cases of a more general family considered in [7]; they are also examples of the optimized certainty equivalents of [3]. In the paper [9], the explicit Kusuoka representation of the higher-order risk measures (4) was described by utilising duality theorems from [29]. These risk measures are used for portfolio optimization in [19], where their advantages in comparison to the classical mean–variance optimization model of Markowitz ([21, 22]) are demonstrated on examples. The recent work [23] indicates that if such a risk measure is used as a risk criterion in European option portfolio optimization, the time evolution of the portfolio is superior to the evolution of a portfolio optimized with respect to the AVaR or with respect to the mean–variance optimization model of Markowitz. Similar observations were recently made in [15].

A connection of measures of risk to utility theories is discussed in the literature. Many of the risk measures of interest can be expressed via optimization of the so-called optimized certainty equivalent [3] for a suitable choice of the utility function. Relations of risk measures to rank-dependent utility functions are given in [13]. In [10], it is established that coherent measures of risk are a numerical representation of a certain preference relation defined on the space of bounded quantile functions.

In practical applications, we deal with samples and stochastic models of the underlying random quantities. Therefore, the questions pertaining to statistical estimation of measures of risk are crucial to the proper use of law-invariant measures of risk. Several measures of risk have an explicit formula, which can be used as a plug-in estimator, with the original measure replaced by the empirical measure. The empirical quantile is a natural estimator of the Value at Risk. A natural empirical estimator of the Average Value at Risk leads to the use of $L$-statistics (see [16, 8]). Furthermore, the Kusuoka representation, as well as the use of distortion functions in insurance, has motivated the construction and analysis of empirical estimates of spectral measures of risk using $L$-statistics. We refer to [16, 6, 17, 4, 37, 2] for more details on this approach. Some risk measures, such as the tail risk measures of form (4), cannot be estimated via simple explicit formulae but are obtained as a solution of a convex optimization problem with convex constraints. Although the asymptotic behavior of optimal values of sample-based expected value models has been investigated before (see [32, Ch. 8], [36, Ch. 5] and the references therein), the existing results do not address models with risk measures.

Our paper is organized as follows. Section 2 contains the key result of our paper, which establishes a central limit formula for a composite risk functional. We provide a characterization of the limiting distribution of the empirical estimators for such functionals. Section 3 contains a central limit formula for risk functionals which are obtained as the optimal value of composite functionals. Section 4 provides asymptotic analysis and central limit formulae for the optimal value of optimization problems which use measures of risk in their objective functions. We pay special attention to some popular measures and we discuss several illustrative examples in Sections 2, 3, and 4. In Section 5, we perform a simple simulation study to assess the accuracy of our approximations. Section 6 concludes.

## 2 Estimation of composite risk functionals

In the first part of our paper, we focus on functionals of the following form:

$$\varrho(X)=\mathbb{E}\Big[f_1\Big(\mathbb{E}\big[f_2\big(\mathbb{E}[\cdots f_k(\mathbb{E}[f_{k+1}(X)],X)\cdots],X\big)\big],X\Big)\Big],$$

where $X$ is an $m$-dimensional random vector, $f_j:\mathbb{R}^{m_j}\times\mathbb{R}^m\to\mathbb{R}^{m_{j-1}}$, $j=1,\dots,k$, with $m_0=1$, and $f_{k+1}:\mathbb{R}^m\to\mathbb{R}^{m_k}$. Let $\mathcal{X}\subset\mathbb{R}^m$ be the domain of the random variable $X$. We denote the probability distribution of $X$ by $P$.

Given a sample $X_1,\dots,X_n$ of independent identically distributed observations, we consider the following plug-in empirical estimate of the value of $\varrho(X)$:

$$\varrho^{(n)}=\sum_{i_0=1}^{n}\frac{1}{n}\Bigg[f_1\Bigg(\sum_{i_1=1}^{n}\frac{1}{n}\bigg[f_2\bigg(\sum_{i_2=1}^{n}\frac{1}{n}\Big[\cdots f_k\Big(\sum_{i_k=1}^{n}\frac{1}{n}f_{k+1}(X_{i_k}),X_{i_{k-1}}\Big)\Big]\cdots,X_{i_1}\bigg)\bigg],X_{i_0}\Bigg)\Bigg].$$

Our construction is motivated by the aim to estimate coherent measures of risk from the family of mean–semideviations ([24, 25]).
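The nested plug-in estimate $\varrho^{(n)}$ can be computed in a single pass over the layers: the inner empirical mean does not depend on the outer summation indices, so it is evaluated once per layer and propagated outward. The following sketch is only an illustration; scalar-valued layers are assumed for simplicity, and the function names are placeholders for $f_1,\dots,f_{k+1}$.

```python
# A sketch of the plug-in estimator rho^(n): the innermost mean of f_{k+1}
# is computed first, then each outer layer averages f_j over the same sample
# with the previous layer's value as first argument.

def nested_plugin(fs, f_last, xs):
    """fs = [f_1, ..., f_k], each with signature f_j(eta, x);
    f_last = f_{k+1}(x).  Returns the nested empirical estimate."""
    n = len(xs)
    eta = sum(f_last(x) for x in xs) / n        # innermost layer: mean of f_{k+1}
    for f in reversed(fs):                      # layers k, k-1, ..., 1
        eta = sum(f(eta, x) for x in xs) / n
    return eta

# Sanity check: with k = 1, f_1(eta, x) = eta and f_2(x) = x, the estimator
# reduces to the sample mean.
print(nested_plugin([lambda eta, x: eta], lambda x: x, [1.0, 2.0, 3.0]))  # 2.0
```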

###### Example 2.1 (Semideviations).

Consider the functional (1) representing the mean–semideviation of order $p$. In this case, we have $k=2$, and

$$f_1(\eta_1,x)=x+\kappa\,\eta_1^{1/p},\qquad f_2(\eta_2,x)=[\max\{0,x-\eta_2\}]^p,\qquad f_3(x)=x.\quad\blacktriangle$$
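A direct implementation of the plug-in estimate for this example may look as follows. This is a sketch under the assumption of scalar data; the sample, $\kappa$, and $p$ are arbitrary illustrative choices.

```python
# Plug-in estimate of the mean-semideviation (1), following the three layers
# above: f_3(x) = x, f_2(eta, x) = max(0, x - eta)^p,
# f_1(eta, x) = x + kappa * eta^(1/p).

def mean_semideviation(xs, p=2.0, kappa=0.5):
    n = len(xs)
    mu = sum(xs) / n                                          # layer f_3: E[X]
    semidev_p = sum(max(0.0, x - mu) ** p for x in xs) / n    # layer f_2
    return sum(x + kappa * semidev_p ** (1.0 / p) for x in xs) / n  # layer f_1

xs = [0.0, 0.0, 2.0, 2.0]
# sample mean 1; upper semideviation of order 2: sqrt(mean([0, 0, 1, 1]))
print(mean_semideviation(xs))   # 1 + 0.5 * sqrt(0.5), about 1.3536
```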

In order to formulate the main theorem of this section, we introduce several relevant quantities. We define:

$$\bar f_j(\eta_j)=\int_{\mathcal{X}}f_j(\eta_j,x)\,P(dx),\quad j=1,\dots,k,\qquad \mu_{k+1}=\int_{\mathcal{X}}f_{k+1}(x)\,P(dx),\qquad \mu_j=\bar f_j(\mu_{j+1}),\quad j=1,\dots,k.$$

Suppose $I_1,\dots,I_k$ are compact subsets of $\mathbb{R}^{m_1},\dots,\mathbb{R}^{m_k}$, respectively, such that $\mu_{j+1}\in\operatorname{int}I_j$, $j=1,\dots,k$. We introduce the notation $\mathcal{H}=\mathcal{C}_1(I_1)\times\mathcal{C}_{m_1}(I_2)\times\cdots\times\mathcal{C}_{m_{k-1}}(I_k)\times\mathbb{R}^{m_k}$, where $\mathcal{C}_{m_{j-1}}(I_j)$ is the space of continuous functions on $I_j$ with values in $\mathbb{R}^{m_{j-1}}$, equipped with the usual supremum norm. The space $\mathbb{R}^{m_k}$ is equipped with the Euclidean norm, and $\mathcal{H}$ is assumed equipped with the product norm. We use Hadamard directional derivatives of the functions $f_j(\cdot,x)$ at points $\mu_{j+1}$ in directions $\zeta_{j+1}$, i.e.,

$$f_j'(\mu_{j+1},x;\zeta_{j+1})=\lim_{\substack{t\downarrow 0\\ s\to\zeta_{j+1}}}\frac{1}{t}\big[f_j(\mu_{j+1}+ts,x)-f_j(\mu_{j+1},x)\big].$$

For every direction $d=(d_1,\dots,d_k,d_{k+1})\in\mathcal{H}$, we define recursively the sequence of vectors:

$$\xi_{k+1}(d)=d_{k+1},\qquad \xi_j(d)=\int_{\mathcal{X}}f_j'(\mu_{j+1},x;\xi_{j+1}(d))\,P(dx)+d_j(\mu_{j+1}),\quad j=k,k-1,\dots,1.\tag{5}$$
###### Theorem 2.2.

Suppose the following conditions are satisfied:

• (i) $\int_{\mathcal{X}}\|f_j(\eta_j,x)\|^2\,P(dx)<\infty$ for all $\eta_j\in I_j$, $j=1,\dots,k$, and $\int_{\mathcal{X}}\|f_{k+1}(x)\|^2\,P(dx)<\infty$;

• (ii) For all $x\in\mathcal{X}$, the functions $f_j(\cdot,x)$, $j=1,\dots,k$, are Lipschitz continuous:

$$\|f_j(\eta_j',x)-f_j(\eta_j'',x)\|\le\gamma_j(x)\,\|\eta_j'-\eta_j''\|,\qquad\forall\,\eta_j',\eta_j''\in I_j,$$

and $\int_{\mathcal{X}}\gamma_j(x)^2\,P(dx)<\infty$, $j=1,\dots,k$.

• (iii) For all $x\in\mathcal{X}$, the functions $f_j(\cdot,x)$, $j=1,\dots,k$, are Hadamard directionally differentiable.

Then

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho\big]\xrightarrow{\ \mathcal{D}\ }\xi_1(W),$$

where $W=(W_1,\dots,W_k,W_{k+1})$ is a zero-mean Brownian process on $I=I_1\times\cdots\times I_k$. Here $W_j$ is a Brownian process of dimension $m_{j-1}$ on $I_j$, $j=1,\dots,k$, and $W_{k+1}$ is an $m_k$-dimensional normal vector. The covariance function of $W$ has the following form:

$$\begin{aligned}
\operatorname{cov}\big[W_i(\eta_i),W_j(\eta_j)\big]&=\int_{\mathcal{X}}\big[f_i(\eta_i,x)-\bar f_i(\eta_i)\big]\big[f_j(\eta_j,x)-\bar f_j(\eta_j)\big]^\top P(dx),\quad \eta_i\in I_i,\ \eta_j\in I_j,\ i,j=1,\dots,k,\\
\operatorname{cov}\big[W_i(\eta_i),W_{k+1}\big]&=\int_{\mathcal{X}}\big[f_i(\eta_i,x)-\bar f_i(\eta_i)\big]\big[f_{k+1}(x)-\mu_{k+1}\big]^\top P(dx),\quad \eta_i\in I_i,\ i=1,\dots,k,\\
\operatorname{cov}\big[W_{k+1},W_{k+1}\big]&=\int_{\mathcal{X}}\big[f_{k+1}(x)-\mu_{k+1}\big]\big[f_{k+1}(x)-\mu_{k+1}\big]^\top P(dx).
\end{aligned}\tag{6}$$
###### Proof.

We define $\eta=(\eta_1,\dots,\eta_k)$, $I=I_1\times\cdots\times I_k$, and the vector-valued function $f(\eta,x)$ with block coordinates $f_j(\eta_j,x)$, $j=1,\dots,k$, and $f_{k+1}(x)$. Similarly, we define $\bar f(\eta)$ with block coordinates $\bar f_j(\eta_j)$, $j=1,\dots,k$, and $\mu_{k+1}$. Consider the empirical estimates of the function $\bar f$:

$$h^{(n)}(\eta)=\frac{1}{n}\sum_{i=1}^{n}f(\eta,X_i),\qquad n=1,2,\dots.$$

Due to assumptions (i)–(ii), all functions $h^{(n)}$ are elements of the space $\mathcal{H}$.

Furthermore, assumptions (i)–(ii) guarantee that the class of functions $f(\eta,\cdot)$, $\eta\in I$, is Donsker, that is, the following uniform Central Limit Theorem holds (see [38, Ex. 19.7]):

$$\sqrt{n}\,\big(h^{(n)}-\bar f\big)\xrightarrow{\ \mathcal{D}\ }W,\tag{7}$$

where $W$ is a zero-mean Brownian process on $I$ with covariance function

$$\operatorname{cov}\big[W(\eta'),W(\eta'')\big]=\int_{\mathcal{X}}\big[f(\eta',x)-\bar f(\eta')\big]\big[f(\eta'',x)-\bar f(\eta'')\big]^\top P(dx).\tag{8}$$

This fact will allow us to establish asymptotic properties of the sequence $\{\varrho^{(n)}\}$. First, we define the subset of $\mathcal{H}$ containing all elements $h=(h_1,\dots,h_{k+1})$ for which the successive compositions are well defined, i.e., $h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1})\cdots))\in I_j$, $j=1,\dots,k$. On this subset we define the operators

$$\Psi_{k+1}(h)=h_{k+1},\qquad \Psi_j(h)=h_j\big(\Psi_{j+1}(h)\big),\quad j=k,k-1,\dots,1,$$

and set $\Psi=\Psi_1$. By construction, the value of $\Psi(h^{(n)})$ is equal to the value of $\varrho^{(n)}$, and the value of $\Psi(\bar f)$ is equal to the value of $\varrho$. To derive the limit properties of the sequence $\{\varrho^{(n)}\}$, we shall use the Delta Theorem (see [33]). The essence of applying the theorem is in identifying conditions under which a statement about convergence in distribution of a scaled version of a statistic can be translated into a statement about convergence in distribution of a scaled version of a transformed statistic. To this end, we have to verify Hadamard directional differentiability of $\Psi$ at $\bar f$.

Observe that the point $\bar f$ is an element of the subset, because $\mu_{j+1}\in\operatorname{int}I_j$, $j=1,\dots,k$. Moreover, due to assumption (ii), the following inequality is true for every $h$ in the subset:

$$\begin{aligned}
\big\|h_j(h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1})\cdots)))-\mu_j\big\|
&\le\|h_j-\bar f_j\|+\big\|\bar f_j(h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1})\cdots)))-\bar f_j(\mu_{j+1})\big\|\\
&\le\|h_j-\bar f_j\|+\int_{\mathcal{X}}\gamma_j(x)\,P(dx)\cdot\big\|h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1})\cdots))-\mu_{j+1}\big\|.
\end{aligned}$$

Recursive application of this inequality demonstrates that $\bar f$ is an interior point of the subset. Therefore, the quotients appearing in the definition of the Hadamard directional derivative are well defined. Conditions (ii) and (iii) imply that the functions $\bar f_j$, $j=1,\dots,k$, are also Hadamard directionally differentiable.

Consider the operator $\Psi_k$ at $h$. Let $\{d^\ell\}$ be a sequence of directions converging in norm to an arbitrary direction $d$. For a sequence $t_\ell\downarrow 0$ and $\ell$ sufficiently large, we have

$$\begin{aligned}
\Psi_k'(h;d)&=\lim_{\ell\to\infty}\frac{1}{t_\ell}\big[\Psi_k\big(h_k+t_\ell d_k^\ell,\,h_{k+1}+t_\ell d_{k+1}^\ell\big)-\Psi_k(h_k,h_{k+1})\big]\\
&=\lim_{\ell\to\infty}\frac{1}{t_\ell}\Big(\big[h_k+t_\ell d_k^\ell\big]\big(h_{k+1}+t_\ell d_{k+1}^\ell\big)-h_k(h_{k+1})\Big)\\
&=\lim_{\ell\to\infty}\Big[\frac{1}{t_\ell}\Big(h_k\big(h_{k+1}+t_\ell d_{k+1}^\ell\big)-h_k(h_{k+1})\Big)+d_k^\ell\big(h_{k+1}+t_\ell d_{k+1}^\ell\big)\Big]\\
&=h_k'(h_{k+1};d_{k+1})+d_k(h_{k+1}).
\end{aligned}$$

Consider now the operator $\Psi_{k-1}$. By the chain rule for Hadamard directional derivatives we obtain

$$\Psi_{k-1}'(h;d)=h_{k-1}'\big(\Psi_k(h);\Psi_k'(h;d)\big)+d_{k-1}\big(\Psi_k(h)\big).$$
In this way, we can recursively calculate the Hadamard directional derivatives of the operators $\Psi_j$:

$$\Psi_j'(h;d)=h_j'\big(\Psi_{j+1}(h);\Psi_{j+1}'(h;d)\big)+d_j\big(\Psi_{j+1}(h)\big),\quad j=k,k-1,\dots,1.\tag{9}$$

Now the Delta Theorem [33], relation (7), and the Hadamard directional differentiability of $\Psi$ at $\bar f$ imply that

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho(X)\big]=\sqrt{n}\,\big[\Psi(h^{(n)})-\Psi(\bar f)\big]\xrightarrow{\ \mathcal{D}\ }\Psi'(\bar f;W).\tag{10}$$

The application of the recursive procedure (9) at $h=\bar f$ and $d=W$ leads to formulae (5). The covariance structure (6) of $W$ follows directly from (8). ∎

We return to Example 2.1 and apply Theorem 2.2.

###### Example 2.3 (Semideviations continued).

We have defined the mappings

$$\bar f_1(\eta_1)=\mathbb{E}[X]+\kappa\,\eta_1^{1/p}=\int f_1(\eta_1,x)\,P(dx),\qquad \bar f_2(\eta_2)=\mathbb{E}\big\{[\max\{0,X-\eta_2\}]^p\big\},$$

and the constants

$$\mu_3=\mathbb{E}[X],\qquad \mu_2=\mathbb{E}\big\{[\max\{0,X-\mathbb{E}[X]\}]^p\big\},\qquad \mu_1=\varrho(X).$$

We assume that $X$ has a bounded support and that $I_2$ is a compact interval containing the support of the random variable $X$. The interval $I_1=[0,b]$ can be defined by choosing $b$ so that $\mu_2<b$; for example, $b$ may be equal to the diameter of the support of $X$ raised to power $p$. The space $\mathcal{H}$ is $\mathcal{C}_1(I_1)\times\mathcal{C}_1(I_2)\times\mathbb{R}$, and we take a direction $d=(d_1,d_2,d_3)\in\mathcal{H}$. Following (5), we calculate

$$\begin{aligned}
\xi_2(d)&=\bar f_2'(\mu_3;d_3)+d_2(\mu_3)=-p\,\mathbb{E}\big\{[\max\{0,X-\mu_3\}]^{p-1}\big\}\,d_3+d_2(\mu_3),\\
\xi_1(d)&=\bar f_1'(\mu_2;\xi_2(d))+d_1(\mu_2)=\frac{\kappa}{p}\,\mu_2^{\frac{1-p}{p}}\,\xi_2(d)+d_1(\mu_2).
\end{aligned}$$

We obtain the expression

$$\xi_1(W)=W_1(\mu_2)+\frac{\kappa}{p}\,\mu_2^{\frac{1-p}{p}}\Big(W_2(\mu_3)-p\,\mathbb{E}\big\{[\max\{0,X-\mu_3\}]^{p-1}\big\}\,W_3\Big).\tag{11}$$

The covariance structure of the process $W$ can be determined from (6). The process $W_1$ has the constant covariance function:

$$\operatorname{cov}\big[W_1(\eta'),W_1(\eta'')\big]=\int_{\mathcal{X}}\big[f_1(\eta',x)-\bar f_1(\eta')\big]\big[f_1(\eta'',x)-\bar f_1(\eta'')\big]\,P(dx)=\operatorname{Var}[X].$$

It follows that $W_1$ has constant paths. The third coordinate, $W_3$, has variance equal to $\operatorname{Var}[X]$. It also follows from (6) that $\operatorname{cov}[W_1(\eta),W_3]=\operatorname{Var}[X]$. Therefore, $W_1$ and $W_3$ are, in fact, one normal random variable, which we denote by $V_1$. Observe that (11) involves only the value of the process $W_2$ at $\mu_3$, which we denote by $V_2$. The variance of the random variable $V_2$ and its covariance with $V_1$ can be calculated from (6) in a similar way:

$$\operatorname{Var}[V_2]=\mathbb{E}\Big\{\big([\max\{0,X-\mathbb{E}[X]\}]^p-\mathbb{E}\big([\max\{0,X-\mathbb{E}[X]\}]^p\big)\big)^2\Big\},$$
$$\operatorname{cov}[V_2,V_1]=\mathbb{E}\Big\{\big([\max\{0,X-\mathbb{E}[X]\}]^p-\mu_2\big)\big(X-\mathbb{E}[X]\big)\Big\}.$$

Formula (11) becomes

$$\xi_1(W)=V_1+\frac{\kappa}{p}\Big(\mathbb{E}\big\{[\max\{0,X-\mathbb{E}[X]\}]^p\big\}\Big)^{\frac{1-p}{p}}\Big(V_2-p\,\mathbb{E}\big\{[\max\{0,X-\mathbb{E}[X]\}]^{p-1}\big\}\,V_1\Big).\tag{12}$$
We conclude that

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho\big]\xrightarrow{\ \mathcal{D}\ }\mathcal{N}(0,\sigma^2),$$

where the variance $\sigma^2$ can be calculated in a routine way as the variance of the right-hand side of (12), by substituting the expressions for the variances and covariances of $V_1$ and $V_2$.

###### Remark 2.4.

Following Example 2.3, we could derive the limiting distribution of $\varrho^{(n)}$ for $p=1$ as well. However, the risk measure for $p=1$ enjoys a simpler form and is already analysed in the literature (see [36, Section 6.5]).

## 3 Estimation of Risk Measures Representable as Optimal Value of Composite Functional

As an extension of the methods of Section 2, we consider the following general setting. Functions $f_1(z,\eta)$ and $f_2(z,x)$ and a random vector $X$ in $\mathbb{R}^m$ are given. Our intention is to estimate the value of a composite risk functional

$$\varrho=\min_{z\in Z}f_1\big(z,\mathbb{E}[f_2(z,X)]\big),\tag{13}$$

where $Z$ is a nonempty compact set. We note that the compactness restriction is made for technical convenience and can be relaxed. Let $X_1,\dots,X_n$ be an iid random sample from the probability distribution of $X$. We construct the empirical estimate

$$\varrho^{(n)}=\min_{z\in Z}f_1\Big(z,\frac{1}{n}\sum_{i=1}^n f_2(z,X_i)\Big).$$

Our intention is to analyze the asymptotic behavior of $\varrho^{(n)}$ as $n\to\infty$. Following the method of Section 2, we define the mapping $\Phi$ as follows:

$$\Phi(z,h)=f_1(z,h(z)).$$

The domain of $\Phi$ is equipped with the product norm of the Euclidean norm on $Z$ and the supremum norm on the space of continuous functions on $Z$. We also define the functional

$$v(h)=\min_{z\in Z}\Phi(z,h).\tag{14}$$

Setting

$$\bar h(z)=\mathbb{E}[f_2(z,X)],\qquad h^{(n)}(z)=\frac{1}{n}\sum_{i=1}^n f_2(z,X_i),$$

we see that

$$\varrho=v(\bar h),\qquad \varrho^{(n)}=v(h^{(n)}),\quad n=1,2,\dots.$$

Let $\hat Z$ denote the set of optimal solutions of problem (13).

###### Theorem 3.1.

In addition to the general assumptions, suppose the following conditions are satisfied:

• (i) The function $f_2(z,\cdot)$ is measurable for all $z\in Z$;

• (ii) The function $f_1(z,\cdot)$ is differentiable for all $z\in Z$, and both $f_1$ and its derivative with respect to the second argument, $\nabla f_1$, are continuous with respect to both arguments;

• (iii) An integrable function $\gamma(\cdot)$ exists such that

$$\|f_2(z',x)-f_2(z'',x)\|\le\gamma(x)\,\|z'-z''\|$$

for all $z',z''\in Z$ and all $x\in\mathcal{X}$; moreover, $\int_{\mathcal{X}}\gamma(x)^2\,P(dx)<\infty$.
Then

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho\big]\xrightarrow{\ \mathcal{D}\ }\min_{z\in\hat Z}\big\langle\nabla f_1\big(z,\mathbb{E}[f_2(z,X)]\big),W(z)\big\rangle,\tag{15}$$

where $W$ is a zero-mean Brownian process on $Z$ with the covariance function

$$\operatorname{cov}\big[W(z'),W(z'')\big]=\int_{\mathcal{X}}\big(f_2(z',x)-\mathbb{E}[f_2(z',X)]\big)\big(f_2(z'',x)-\mathbb{E}[f_2(z'',X)]\big)^\top P(dx).\tag{16}$$

###### Proof.

Observe that assumptions (i)–(ii) of Theorem 2.2 are satisfied due to the compactness of the set $Z$ and assumptions (ii)–(iii) of this theorem. Therefore, formula (7) holds:

$$\sqrt{n}\,\big(h^{(n)}-\bar h\big)\xrightarrow{\ \mathcal{D}\ }W.$$

The limiting process $W$ is a zero-mean Brownian process on $Z$ with covariance function (16).

Furthermore, due to assumption (ii), the function $\Phi$ is continuous. As the set $Z$ is compact, problem (14) has a nonempty solution set $S(\bar h)$. By virtue of [5, Theorem 4.13], the optimal value function $v$ is Hadamard directionally differentiable at $\bar h$ in every direction $d$ with

$$v'(\bar h;d)=\min_{z\in S(\bar h)}\Phi_h'(z,\bar h)\,d,$$

where $\Phi_h'(z,\bar h)$ is the Fréchet derivative of $\Phi(z,\cdot)$ at $\bar h$. Therefore, we can apply the delta method ([33]) to infer that

$$\sqrt{n}\,\big(v(h^{(n)})-v(\bar h)\big)\xrightarrow{\ \mathcal{D}\ }\min_{z\in S(\bar h)}\Phi_h'(z,\bar h)\,W.$$

Substituting the functional form of $\Phi$, we obtain

$$\Phi_h'(z,\bar h)=\nabla f_1\big(z,\mathbb{E}[f_2(z,X)]\big)\,\delta_z,$$

where $\delta_z$ is the Dirac measure at $z$. Application of this operator to the process $W$ yields formula (15). Observe that $W$ has continuous paths, so the minimum in (15) exists. ∎

###### Corollary 3.2.

If, in addition to the conditions of Theorem 3.1, the set $\hat Z$ contains only one element $\hat z$, then the following central limit formula holds:

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho\big]\xrightarrow{\ \mathcal{D}\ }\big\langle\nabla f_1\big(\hat z,\mathbb{E}[f_2(\hat z,X)]\big),W(\hat z)\big\rangle,\tag{17}$$

where $W(\hat z)$ is a zero-mean normal vector with the covariance

$$\operatorname{cov}\big[W(\hat z),W(\hat z)\big]=\operatorname{cov}\big[f_2(\hat z,X),f_2(\hat z,X)\big].$$

The following examples show that two notable categories of risk measures fall into the structure (13).

###### Example 3.3 (Average Value at Risk).

Average Value at Risk (2) is one of the most popular and most basic coherent measures of risk. Recall that for a random variable $X$, it is representable as follows:

$$\mathrm{AVaR}_\alpha(X)=\min_{z\in\mathbb{R}}\Big\{z+\frac{1}{\alpha}\mathbb{E}\big[(X-z)_+\big]\Big\}.$$

This measure fits in the structure (13) by setting

$$f_1(z,\eta)=z+\frac{1}{\alpha}\eta,\qquad f_2(z,x)=\max(0,x-z).$$

The plug-in empirical estimators of (2) have the following form

$$\varrho^{(n)}=\min_{z\in\mathbb{R}}\Big\{z+\frac{1}{\alpha n}\sum_{i=1}^{n}\max(0,X_i-z)\Big\}.$$

If the support of the distribution of $X$ is bounded, then so is the support of all empirical distributions, and we can assume that the set $Z$ contains the support of the distribution. Observe that all assumptions of Theorem 3.1 are satisfied. If the distribution function of the random variable $X$ is strictly increasing at its $(1-\alpha)$-quantile $\hat z$, then the solution of the optimization problem on the right-hand side of (2) is unique. In that case, the assumptions of Corollary 3.2 are also satisfied. We conclude that

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho\big]\xrightarrow{\ \mathcal{D}\ }\nabla f_1\big(\hat z,\mathbb{E}[\max(0,X-\hat z)]\big)\,W=\frac{1}{\alpha}W,$$

where $W$ is a normal random variable with zero mean and variance

$$\operatorname{Var}[W]=\mathbb{E}\Big[\big(\max(0,X-\hat z)-\mathbb{E}[\max(0,X-\hat z)]\big)^2\Big].$$

We note that the assumption of bounded support of the random variable $X$ is not essential because we could take a sufficiently large compact set $Z$, which would contain the corresponding quantile of the distribution function of $X$ and all empirical quantiles for sufficiently large sample sizes.
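The central limit formula above also suggests a simple standard-error estimate for the plug-in AVaR: the limiting variance is $\operatorname{Var}[\max(0,X-\hat z)]/\alpha^2$, which can itself be estimated from the sample. The following sketch is only an illustration; the Gaussian data-generating distribution, the sample size, and the level $\alpha$ are arbitrary choices.

```python
# Plug-in AVaR estimate with a delta-method standard error based on the
# central limit formula: limiting variance Var[max(0, X - z_hat)] / alpha^2.
import math
import random

def avar_with_stderr(xs, alpha):
    n = len(xs)
    z_hat = sorted(xs)[max(0, math.ceil((1 - alpha) * n) - 1)]   # empirical VaR
    tails = [max(0.0, x - z_hat) for x in xs]
    m = sum(tails) / n
    rho_hat = z_hat + m / alpha                                  # plug-in AVaR
    var_w = sum((t - m) ** 2 for t in tails) / n                 # Var[max(0, X - z_hat)]
    return rho_hat, math.sqrt(var_w) / (alpha * math.sqrt(n))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(10000)]
rho_hat, se = avar_with_stderr(sample, alpha=0.05)
print(rho_hat, se)   # for standard normal losses, AVaR_0.05 is about 2.06
```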

Additionally, we refer to another method for estimating the Average Value at Risk at all levels simultaneously, which is discussed in [8], where central limit formulae are established under a different set of assumptions.

###### Example 3.4 (Higher-order Inverse Risk Measures).

Consider a higher-order inverse risk measure of form (4) with $c=\frac{1}{\alpha}$:

$$\varrho[X]=\min_{z\in\mathbb{R}}\big\{z+c\,\|\max(0,X-z)\|_p\big\},\tag{18}$$

where $c>1$ and $\|\cdot\|_p$ is the norm in the $\mathcal{L}_p$ space. We define:

$$f_1(z,y)=z+c\,y^{1/p},\qquad f_2(z,x)=(\max(0,x-z))^p.$$

If the support of the distribution of $X$ is bounded, so is the support of all empirical distributions. In this case, we can find a bounded set $Z$ (albeit larger than the support of $X$) such that all solutions of problems (18)–(19) belong to this set. For $p>1$ and $c>1$, problem (18) has a unique solution, which we denote by $\hat z$.

The plug-in empirical estimators of (18) have the following form

$$\varrho^{(n)}=\min_{z\in\mathbb{R}}\Big\{z+c\Big(\frac{1}{n}\sum_{i=1}^{n}(\max(0,X_i-z))^p\Big)^{1/p}\Big\}.\tag{19}$$

Observe that all assumptions of Theorem 3.1 and Corollary 3.2 are satisfied. We conclude that

$$\sqrt{n}\,\big[\varrho^{(n)}-\varrho\big]\xrightarrow{\ \mathcal{D}\ }\frac{c}{p}\Big(\mathbb{E}\big[(\max(0,X-\hat z))^p\big]\Big)^{\frac{1-p}{p}}W,\tag{20}$$

where is a normal random variable with zero mean and variance

$$\operatorname{Var}[W]=\mathbb{E}\Big[\big((\max(0,X-\hat z))^p-\mathbb{E}\big[(\max(0,X-\hat z))^p\big]\big)^2\Big].$$
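Since the objective in (19) is convex in the scalar $z$, the plug-in value can be computed by any one-dimensional convex minimization method. The following sketch is only an illustration: it uses ternary search over a conservative bracket, and the parameters $p$ and $c$ and the samples are arbitrary choices.

```python
# Plug-in estimate (19) of the higher-order inverse risk measure: minimize
# z + c * (mean(max(0, x - z)^p))^(1/p) over z by ternary search.
# The objective is convex in z; c > 1 is assumed, so the objective tends to
# +infinity in both directions and the bracket below contains a minimizer.

def higher_order_risk(xs, p=2.0, c=2.0, iters=200):
    n = len(xs)
    def objective(z):
        return z + c * (sum(max(0.0, x - z) ** p for x in xs) / n) ** (1.0 / p)
    spread = max(xs) - min(xs)
    lo = min(xs) - c * spread / (c - 1.0)   # conservative lower bracket
    hi = max(xs)                            # objective equals z for z >= max(xs)
    for _ in range(iters):                  # ternary search for a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective((lo + hi) / 2.0)

# Degenerate check: for a constant loss, the risk measure equals that constant.
print(higher_order_risk([1.0, 1.0, 1.0]))   # 1.0
```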

## 4 Estimation of Optimized Composite Risk Functionals

In this section, we are concerned with optimization problems in which the objective function is a composite risk functional. Our goal is to establish a central limit formula for the optimal value of such problems.

Our methods allow for the analysis of more complicated structures of optimized risk functionals:

$$\min_{u\in U}\,\mathbb{E}\Big[f_1\Big(u,\mathbb{E}\big[f_2\big(u,\mathbb{E}[\cdots f_k(u,\mathbb{E}[f_{k+1}(u,X)],X)\cdots],X\big)\big],X\Big)\Big].\tag{21}$$

Here $X$ is an $m$-dimensional random vector, $f_j:U\times\mathbb{R}^{m_j}\times\mathbb{R}^m\to\mathbb{R}^{m_{j-1}}$, $j=1,\dots,k$, with $m_0=1$, and $f_{k+1}:U\times\mathbb{R}^m\to\mathbb{R}^{m_k}$. We assume that $U$ is a compact set in a finite-dimensional space and that the optimal solution $\hat u$ of this problem is unique.

We define the functions:

$$\bar f_j(u,\eta_j)=\int_{\mathcal{X}}f_j(u,\eta_j,x)\,P(dx),\quad j=1,\dots,k,\qquad \bar f_{k+1}(u)=\int_{\mathcal{X}}f_{k+1}(u,x)\,P(dx),$$

and the quantities

$$\mu_{k+1}=\bar f_{k+1}(\hat u),\qquad \mu_j=\bar f_j(\hat u,\mu_{j+1}),\quad j=1,\dots,k.$$

We assume that compact sets $I_j\subset\mathbb{R}^{m_j}$ are selected so that $\mu_{j+1}\in\operatorname{int}I_j$, $j=1,\dots,k$. Let us define the space

$$\mathcal{H}=\mathcal{C}^{(0,1)}_1(U\times I_1)\times\mathcal{C}^{(0,1)}_{m_1}(U\times I_2)\times\cdots\times\mathcal{C}^{(0,1)}_{m_{k-1}}(U\times I_k)\times\mathcal{C}_{m_k}(U),$$

where $\mathcal{C}^{(0,1)}_{m_{j-1}}(U\times I_j)$ is the space of $\mathbb{R}^{m_{j-1}}$-valued continuous functions on $U\times I_j$, which are differentiable with respect to the second argument with continuous derivatives on $U\times I_j$. We denote the Jacobian of $f_j$ with respect to the second argument at $(\hat u,\mu_{j+1})$ by $f_j'(\hat u,\mu_{j+1},x)$. For every direction $d=(d_1,\dots,d_{k+1})\in\mathcal{H}$, we define recursively the sequence of vectors:

$$\xi_{k+1}(d)=d_{k+1},\qquad \xi_j(d)=\int_{\mathcal{X}}f_j'(\hat u,\mu_{j+1},x)\,\xi_{j+1}(d)\,P(dx)+d_j(\hat u,\mu_{j+1}),\quad j=k,k-1,\dots,1.$$