A Complete Bell polynomials

# Percentiles of sums of heavy-tailed random variables: Beyond the single-loss approximation.

## Abstract

A perturbative approach is used to derive approximations of arbitrary order to estimate high percentiles of sums of positive independent random variables that exhibit heavy tails. Closed-form expressions for the successive approximations are obtained both when the number of terms in the sum is deterministic and when it is random. The zeroth order approximation is the percentile of the maximum term in the sum. Higher orders in the perturbative series involve the right-truncated moments of the individual random variables that appear in the sum. These censored moments are always finite. As a result, and in contrast to previous approximations proposed in the literature, the perturbative series has the same form regardless of whether these random variables have a finite mean or not. For high percentiles, and specially for heavier tails, the quality of the estimate improves as more terms are included in the series, up to a certain order. Beyond that order the convergence of the series deteriorates. Nevertheless, the approximations obtained by truncating the perturbative series at intermediate orders are remarkably accurate for a variety of distributions in a wide range of parameters.

Keywords: Subexponential distributions, Heavy tails, Percentile estimation, Aggregate loss distribution, Censored moments, Value at Risk

\sectionfont\subsectionfont

## 1 Introduction

In this article we derive accurate closed-form approximations for high percentiles of sums of positive independent identically distributed random variables (iidrv’s) with heavy tails. This is an important computational task in applications such as wireless communications [1], workload process [2, 3] and in the quantification of risk in insurance and finance [4, 5]. A particularly important application in finance is the quantification of operational risk [6, 7, 8, 9].

There are several numerical procedures to estimate percentiles of sums of iidrv’s random variables: the Panjer recursion algorithm, a method based on the Fast Fourier Transform, and Monte Carlo simulation [10, 8, 11]. These numerical techniques are efficient and yield accurate estimates of high percentiles of sums of random variables provided that these are not too heavy-tailed: their computational cost increases as the tails of the probability distribution become heavier, and eventually become impracticable. When Monte Carlo simulation is used, this difficulty can be addressed using variance reduction techniques [12, 13].

In this work we take a different approach and derive closed-form approximations for high percentiles of the aggregate distribution based on a perturbative expansion. The zeroth order term in the perturbative expansion is similar to the single-loss approximation [14], which assumes that the sum is dominated by the maximum. This dominance in the sum by the maximum is a property of subexponential distributions, a subclass of heavy-tailed distributions [15, 16]. These types of distributions appear in important areas of application, such as insurance and finance [4], hydrology [17], queueing models [18, 19], the characterization of the Internet [20], and other areas of application [21].

The first order perturbative approximation, which includes the zeroth order term plus a first order correction, is similar to approximations that can be derived from the asymptotic tail behavior of sums of subexponential variables [22, 23, 24, 25, 26, 27, 28, 29, 30]. Assuming that the mean of the individual random variables in the sum is finite, these approximations are all similar to the mean-corrected single-loss formula, which was proposed by [31] using heuristic arguments. In this article we provide an explicit procedure to derive higher order terms in the perturbative expansion, which provides a more accurate approximation to high percentiles of sums of positive iidrv’s.

The perturbative series introduced in this article differs in important aspects from previous approximations proposed in the literature. In particular, the terms in the perturbative series are expressed as a function of the moments of the right-truncated distribution for the individual rv’s in the sum. These censored moments exist even when the moments of the original distribution (without truncation) diverge. Consequently, the same expression is valid for both the finite and infinite mean cases. For high percentiles, the perturbative expansion provides a sequence of approximations that, up to certain order, has increasing quality as more terms are included. Beyond that order the convergence of the series deteriorates.

The article is organized as follows: section 2 presents the derivation of a perturbative expansion for the percentile of sums of two random variables. This expansion is then applied to the estimation of high percentiles of sums of independent random variables in section 3. The key idea is to treat separately the maximum and the remaining terms in the sum. Explicit formulas are derived when , the number of terms in the sum, is either deterministic or stochastic. Section 4 reviews the approximations for high percentiles of sums of iidrv’s that have been proposed in the literature.

The accuracy of the perturbative series is illustrated in section 5 by comparing with exact results or with Monte Carlo estimates, if closed-form expressions are not available. Finally, section 6 summarizes the contributions of this work and discusses the perspectives for further research.

## 2 Perturbative expansion for the percentiles of the sum of two random variables

In this section we derive a perturbative expansion of the percentile of a sum of two random variables. The zeroth order term in the perturbative series is the percentile of one of the variables in the sum. Higher order terms involve the moments of the second variable, conditioned to the first one having a fixed value. In the following section, these general expressions are applied to the particular case sums of random variables by identifying the first random variable with the maximum in the sum and the second one with the remainder.

Let and be two rv’s whose joint distribution function is (density ). Consider the random variable

 Z=X+ϵY, (1)

whose probability distribution is (density ). It is not possible to express this distribution in a closed form that does not involve a convolution, except in special cases [32, 33]. Let and be the -percentiles of and , respectively. The percentile of at probability level can be formally represented by a power series in

 Q=Q0+δQ=Q0+∞∑k=11k!Qkϵk. (2)

The approximation of order to is the result of keeping only the first terms in the series

 Q(K)≡Q0+K∑k=1Qkϵk/k!. (3)

Explicit expressions for the zeroth and first coefficients in (2) have been derived in [34], in the context of credit risk. Also in this context, [35] give an explicit expression for the derivatives , which are used in the perturbative expansion in for , the CDF of the sum. Our goal in this section is to derive a general expression for the terms in a perturbative expansion of the percentile (i.e. the inverse function ).

The starting point of the derivation is the identity

 0 =FZ(Q)−FX(Q0) (4) =∫∞−∞dy∫Q−ϵyQ0dxfX,Y(x,y).

For a sufficiently smooth , one can define the operators

 et∂xf(x) ≡∞∑k=0tkk!∂k∂xkf(x)=f(x+t) (5) ∂−1xf(x) ≡∫x−∞duf(u),

where , and their composition

 (et∂x−1)∂−1xf(x)=∫x+txf(u)du. (6)

In terms of these operators

 ∫Q−ϵyQ0dxfX,Y(x,y)=∫Q0+δQ−ϵyQ0dxfX,Y(x,y) (7) =(e(δQ−ϵy)∂x−1)∂−1xfX,Y(x,y)∣∣x=Q0.

Using this result, (4) can be expressed as

 0=∫∞−∞dy(e(δQ−ϵy)∂x−1)∂−1xfX,Y(x,y)∣∣x=Q0. (8)

Expanding the exponential operator in a formal Taylor power series and using the definition of the complete Bell polynomials (Appendix A, eq. (80)) this expression becomes

 0 =∫∞−∞dy∞∑k=1ϵkk!× (9) Bk((Q1−y)∂x,Q2∂x,…,Qk∂x)∂−1xfX,Y(x,y)∣∣x=Q0.

Since this equality holds for all , each coefficient in the sum must be zero separately. This yields the system of equations

 0=∫∞−∞dy (10) Bk((Q1−y)∂x,Q2∂x,…,Qk∂x)∂−1xfX,Y(x,y)∣∣x=Q0, for k≥1.

Explicit expressions for can be derived in terms of , a centered version of the Bell polynomials (Appendix A, eq. (82))

 Q1 =E[Y|X=Q0] (11) Qk =−1fX(Q0)[ k∑i=1(ki)Ck−i(Q2∂x,…,Qk−i∂x)∂i−1x{fX(x)~Mi(x)} +k−2∑i=2(k−1i−1)QiCk−i(Q2∂x,…,Qk−i∂x)fX(x)]x=Q0, for  k≥2,

where

 ~Mi(x) ≡E[(Q1−Y)i|X=x] (12) =i∑j=0(ij)(−1)jQi−j1Mj(x) Mj(x) ≡E[Yj|X=x].

These recursive formulas for the coefficients and for can be used to compute the approximation to the percentile to any order in . However, the complexity of the explicit formulas for the coefficients increases with their order. The first four terms in the perturbative series are

 Q0= F−1X(α) Q1= E[Y|X=Q0] Q2= −1fX(Q0)∂x{fX(x)~M2(x)}x=Q0 = −1fX(Q0)∂x{fX(x)Var[Y|X=x]}x=Q0 Q3= −1fX(Q0){∂2x(fX(x)~M3(x))+ 3Q2∂x(fX(x)~M1(x))}x=Q0 Q4= −1fX(Q0){∂3x(fX(x)~M4(x))+ 6Q2∂2x(fX(x)~M2(x))+4Q3∂x(fX(x)~M1(x))+ 3Q22∂xfX(x)}x=Q0. (13)

The term can be expressed in terms of the conditional variance () instead of because, for this particular term, can be replaced by . This substitution is not possible in general for higher order terms.

These general expressions for the terms in a perturbative expansion of the percentiles of the sum of two random variables will be applied in the following section to sums of independent random variables, where can be deterministic or stochastic.

## 3 Perturbative expansion around the percentile of the maximum

In this section (2) is used to estimate high percentiles of the sums of independent random variables with heavy tails

 ZN=N∑i=1Li, (14)

where are positive iidrv’s sampled from (the corresponding density is ). Let be the probability distribution of the sum , and the corresponding density. The key idea is to partition the sum into two contributions: the maximum and the sum of the remaining terms

 ZN(ϵ) = XN+ϵYN XN = L[N]≡max[{Li}Ni=1] YN = N−1∑i=1L[i] (15)

where is the -th order statistic of the sample (i.e. ). The formal parameter is introduced to order the terms in the perturbative expansion. It is eventually set to one (), so that . As shown in Appendix D, the perturbative series truncated to first order provides an estimate that is similar to approximations that can be derived from the tail behavior of sums of subexponential variables [25, 26, 30]. Therefore, the analysis presented in [22, 23] can be used to establish the asymptotic properties of this approximation. The issue of convergence of the perturbative series outside of the asymptotic regime is analyzed empirically in section 5. Qualitatively, the perturbation term in (3) is small if ; that is, when the sum (14) is dominated by the maximum. This is the case when the probability distribution of is subexponential, provided that the value of the sum is sufficiently large [15, 16]. In consequence, the perturbative series should be more accurate for high percentiles. The empirical analysis carried out reveals that, for sufficiently high percentiles, the accuracy of the approximation initially improves as more terms are included in the series. However, beyond a certain order the approximation actually becomes worse when further terms are used, which indicates that, in the cases studied, the perturbative series is not convergent.

The probability distribution of the maximum is

 F[N](x)=F(x)N. (16)

The corresponding density is obtained by taking the derivative of (16)

 f[N](x)=NF(x)N−1f(x). (17)

In terms of these, the perturbative expansion (11) becomes

 Q0 =F−1(α1N) (18) Q1 =E[N−1∑i=1L[i]|L[N]=Q0] Qk =−1f[N](Q0)[ k∑i=1(ki)Ck−i(Q2∂x,…,Qk−i∂x)∂i−1x{f[N](x)˜Mi(x)} +k−2∑i=2(k−1i−1)QiCk−i(Q2∂x,…,Qk−i∂x)f[N](x)]x=Q0, for  k≥2,

with

 ˜Mi(x) Missing or unrecognized delimiter for \Big (19) =i∑j=0(ij)(−1)jQi−j1Mj(x),

where is the th conditional moments of the random variable

 Mj(x)≡E⎡⎣(N−1∑i=1L[i])j∣∣ ∣∣L[N]=x⎤⎦. (20)

These closed-form expressions for the terms in the perturbative series (18) are the main contribution of this research. Explicit formulas for the conditional moments (20) can be readily obtained using the invariance of under an arbitrary permutation of the indices

 Mj(x) Missing or unrecognized delimiter for \Big (21) =E[(N−1∑i=1Li)j∣∣{Li≤x}N−1i=1] =∫x0dl1…∫x0dlN−1 (N−1∑i=1li)jN−1∏i=1f(li)F(x).

The last quadrature is the average of the th power of the sum of independent random variables , whose joint distribution is

 f({li}N−1i=1∣∣{li≤x}N−1i=1) =N−1∏i=1f(li|li≤x) (22) =N−1∏i=1f(li)F(x)θ(x−li),

where is a product of Heaviside step functions, which is equal to in the region and outside this region. Using the definition of the complete Bell polynomials (81), it is possible to express the th moment of the sum , where the terms in the sum are constrained to be in the region ,

 Mj(x) =Bj(K1(x),…,Kj(x)), (23)

in terms of the conditional cumulants , defined as

 Kj(x)=djdsj[log(∫∞0dyesyfYN|XN(y|x))]∣∣∣s=0. (24)

Finally, using the property that the th cumulant of a sum of independent variables is the sum of the th cumulants of the individual variables

 Kp(x)=(N−1)κp(x),p=1,2,… (25)

we obtain

 Mj(x) =Bj((N−1)κ1(x),…,(N−1)κj(x)), (26)

where is the th censored cumulant of

 κj(x)=djdsj[log(∫x0dleslf(l)F(x))]∣∣ ∣∣s=0. (27)

These censored cumulants can also be expressed in terms of the censored moments of

 κj(x) = μj(x)−j−1∑i=1(j−1i)κj−i(x)μi(x), (28) μj(x) = ∫x0dlljf(l)F(x),forj=1,2,… (29)

Using these relations, it is possible to derive explicit formulas for the terms in the perturbative series. In particular, the first three are

 Q0 = F−1(α1N) (30) Q1 = (N−1)E[L|L≤Q0] (31) Q2 = −N−1F(Q0)N−1f(Q0) (32) ∂x(F(x)N−1f(x)Var[L|L≤x])x=Q0 = −(N−1)[ ((N−2)f(Q0)F(Q0)+f′(Q0)f(Q0))Var[L|L≤Q0]+ f(Q0)F(Q0)(Q0−E[L|L≤Q0])2].

An attractive feature of this expansion is that the approximation of order depends only on the censored moments of of order lower or equal to . Since they are censored, these always exist, even for distributions whose moments diverge. These expressions have been obtained for cases in which the number of terms in the sum (14) is fixed. In the next section, we derive closed-form expressions for sums with a random number of terms.

### 3.1 Sums with a random number of terms

In many applications the quantities of interest are aggregate random variables consisting of a variable number of terms

 ZN=N∑i=1Li, (33)

where is a discrete random variable whose probability mass function is

 P[N=n]≡pn ,  n=0,…,∞. (34)

In insurance and operational risk [4, 5], where represents the aggregate loss in a fixed time period (e.g. yearly losses), is referred to as the frequency of the loss events. For convenience, we will use this term to refer to in the remainder of the article.

Consider the random variable , with

 XN = L[N] (35) YN = N−1∑i=1L[i], (36)

as in (14,3), where is now a integer random variable. We denote and the corresponding random variables conditional on a fixed value . In terms of the probability distribution of , the probability distribution of the maximum of the terms in the sum (), and of the corresponding density (), the probability distribution and the density of are

 F[N](x)=∞∑n=0pnF[n](x)f[N](x)=∞∑n=0pnf[n](x), (37)

respectively.

For random the zeroth order term in the perturbative expansion satisfies the relation

 α =F[N](Q0)=∞∑n=0pnF[n](Q0)=∞∑n=0pnF(Q0)n (38) =E[F(Q0)N]=M(logF(Q0)),

where is the moment generating function of the random variable

 MN(s)≡E[esN]=∞∑n=0pnesn. (39)

Using this definition we can invert (38)

 Q0=F−1(eM−1N(α)). (40)

Starting from (10) with it is possible derive an expression for the first term in the perturbative series in terms of

 Q1∞∑n=0pnf[n](Q0)= (41) ∞∑n=0pnf[n](Q0)E[n−1∑i=1L[i]∣∣L[n]=Q0].

Using the explicit form of the probability distribution of the maximum and equation (21), we get

 Q1=E[N(N−1)FN(Q0)]E[NFN(Q0)]E[L|L≤Q0]. (42)

For the higher order coefficients an analogous derivation from (11) yields

 −Qk∞∑n=0pnf[n](Q0)=[k∑s=1(ks)Ck−s(…)∂s−1xUs(x) +k−2∑s=2(k−1s−1)QsCk−s(…)∞∑n=0pnf[n](x)]x=Q0 (43)

where

 Us(x)= ∞∑n=0pnf[n](x)E[(Q1−Yn)s|Xn=x] (44) = ∞∑n=0pnf[n](x)s∑q=0(sq)(−1)qQs−q1Mn,q(x) Mn,q(x)≡ Bq((n−1)κ1(x),…,(n−1)κq(x)).

To compute the expected values over the frequency, one needs to isolate the dependency on . For this purpose, it is convenient to use an alternative representation of the Bell polynomials that allows to express moments in terms of cumulants using partitions of sets (Appendix A, eq. (84))

 Mn,q(x)=∑A∈P(q)(n−1)|A|∏b∈Aκ|b|(x), (45)

where is the set of all partitions of the set , and and denote the number of elements in the sets and respectively, and is the th censored cumulant of , as defined in (27).

Using this expression the coefficients become

 Q1 =λ1(Q0)λ0(Q0)κ1(Q0) (46) Qk =−1λ0(Q0)[ k∑s=1(ks)Ck−s(…)∂s−1xs∑q=0(sq)(−1)qQs−q1× ∑A∈P(q)λ|A|(x)∏b∈Aκ|b|(x) +k−2∑s=2(k−1s−1)QsCk−s(…)λ0(x)]x=Q0,  for  k≥2,

with

 λa(x) ≡EN[(N−1)af[N](x)] (47) =f(x)F(x)E[N(N−1)aF(x)N],  for %  a≥0.

The explicit expressions for the first four coefficients are

 Q0 =F−1(eM−1N(α)) (48) Q1 =λ1(Q0)λ0(Q0)κ1(Q0) Q2 =−1λ0(Q0)∂x[Q21λ0−2Q1λ1κ1+λ1κ2+λ2κ21]x=Q0 =−1λ0(Q0)∂x[λ1κ2+(λ2−λ21λ0)κ21]x=Q0 Q3 =−1λ0(Q0){3Q2∂x[Q1λ0−λ1κ1]x=Q0 +∂2x[Q31λ0−3Q21λ1κ1+3Q1(λ1κ2+λ2κ12) −λ1κ3−3λ2κ1κ2−λ3κ13]x=Q0},

where, to simplify the notation, the dependence on in the and has been omitted.

The functions can also be expressed in terms of the moment generating function of as

 λa(x)=f(x)F(x)∂s(∂s−1)aMN(s)∣∣s=logF(x)for  a≥0. (49)

Explicit expressions for the Poisson and negative binomial probability distributions are given in Appendix B. These types of distributions are commonly used in applications.

### 3.2 Approximation in terms of frequency moments for high percentiles

The formulas derived in the previous section (48) are different from the standard single-loss approximation [14] and corrections thereof [31, 25, 26, 30]. In this section we show that for high percentiles one recovers the single-loss approximation and correction terms. In the limit the inverse of the moment generating function in (40) can be approximated as

 MN(s)=E[esN]=1+sE[N]+O(s2), (50)

for . From this expression,

 M−1N(α)≈−1−αE[N],% forα→1−. (51)

This leads to the standard single-loss approximation [14]

 Q0≈QSL≡F−1(1−1−αE[N])). (52)

In this limit, the survival function approaches , and simpler approximate expressions for are obtained by keeping terms only up to 1st order in

 λa(x) =∂xE[(N−1)a(1−S(x))N] (53) ≈∂xE[N(N−1)a(1−NS(x)] =−∂xS(x)E[N(N−1)a] =f(x)a∑s=0(as)(−1)a−sνs+1, a=0,1,…

where are the moments of the frequency distribution. Using these approximations, the high-percentile corrections to the single-loss formula can be expressed directly in terms of the moments of the frequency distribution

 Q1 ≈ (E[N2]E[N]−1)E[L|L≤Q0] (54) Q2 ≈ −(E[N2]E[N]−1)1f(x)∂x[f(x)Var[L|L≤x]]x=Q0 (55) +⎛⎝(E[N2]E[N])2−E[N3]E[N]⎞⎠× 1f(x)∂x[f(x)E[L|L≤x]2]x=Q0.

The approximation to is similar to the corrections to the single loss formula proposed in the literature [31, 25, 26, 30]. In section 4, we provide a review of these corrections. Their accuracy will be compared to the perturbative expansion in section 5. To make the numerical computation of the perturbative approximation up to high orders feasible it is useful to express the terms of the series recursively. These recursive expressions are presented in Appendix C.

## 4 Related work

In this section we review closed-form approximations for the percentile of sums of positive iidrvÃ¢’s that have been proposed in previous investigations. Even though it is possible to derive approximations for particular heavy-tailed distributions, such as [36] for the Pareto distribution, in this work we consider comparisons only with approximations for general subexponential distributions [15, 16]. The single-loss approximation can be derived using first order asymptotics of the tail of sums of subexponential random variables [37, 38, 14]. Higher order asymptotic expansions of the tails of the compound distribution [22, 23, 24, 27, 28, 29] can be used to obtain corrections to the single-loss approximation [25, 26, 30]. These high order corrections are similar to the successive terms in the perturbative expansion analyzed in this article. However, there are some important differences. In particular, these terms are expressed as a function of right-censored moments, which are always finite. In the section on experimental evaluation (section 5) we will further show that the perturbative series provides more accurate approximations than the expressions introduced in this section.

One of the defining properties of subexponential distributions is that large values of sums of subexponential random variables are dominated by the maximum

 ZN=N∑i=1Li≈max{L1,…LN},ZN→∞. (56)

In insurance mathematics this corresponds to the ’one loss causes ruin’ regime [4]. Using the property of subexponential distributions [37, 38]

 limx→∞P(L1+…+LN>x)P(L1>x)=N, (57)

it is possible to show that, for this type of distributions, the percentile of at the probability level is approximately

 QSL=F−1(1−1−αN),for α→1−. (58)

In this limit, expression (58) is very similar to the zeroth order term in the perturbative expansion

 Q0 =F−1(α1N) (59) =F−1(1−1−αN+O((1−α)2N))≈QSL.

The derivation of a closed-form approximation for high percentiles using first order tail asymptotics can be readily extended to sums of subexponential iirdv’s with a random number of terms

 QSL=F−1(1−1−αE[N]), (60)

where is the average number of terms in the sum. In the area of operational risk, this expression is known as the ’single-loss approximation’ [14, 31].

Using heuristic arguments, a correction to the single-loss approximation was proposed in [31] for distributions with finite mean

 Q≈F−1(1−1−αE[N])+(E[N]−1)μL,μL≡E[L]. (61)

In the limit , the value is large, so that and the approximation given by (61) becomes similar to (54).

Besides the heuristic derivation given in [31] and the perturbative expansion proposed in this work, higher order corrections to the single-loss approximation can be derived in at least three different ways: Using the second order asymptotic approximations introduced in [22, 23, 25, 26], from the asymptotic expansion analyzed in [27, 28, 29] or from asymptotic approximations based on evaluations of at different arguments [30].

In the case of distributions with finite mean, the asymptotic analysis of the tail of a subordinated distribution analyzed in [23] can be used to obtain , a second order approximation of the percentile of sums of subexponential iidrv’s, as the solution of

 QOW=F−1[1−1−αE[N]+(E[N2]E[N]−1)μLf(QOW)]. (62)

This implicit nonlinear equation can be solved numerically using, for example, an iterative scheme. Alternatively, one can retain only the leading terms in a perturbative expansion of this expression

 Q∗OW = QSL+(E[N]+(D−1))μL, (63)

where is the index of dispersion ( for the Poisson distribution and for the negative binomial distribution). The first term in (63) is the single-loss approximation [14, 31]. The second term is a correction that involves the mean and is similar to (61) when and . As shown in Appendix D, expression (63) can be derived in a number of different ways [26, 27, 28, 29, 30].

In the case of distributions with infinite mean, in which the density is regularly varying at infinity with index , [39], the second order approximation of satisfies the relation [22]

 QOW =F−1(1−1−αE[N] (64) +ca(E[N2]E[N]−1)μF(QOW)f(QOW)),

where

 μF(x)≡∫x0ds(1−F(s))=(1−F(x))x+F(x)E[L|≤x], (65)

and

 ca=⎧⎨⎩