An analysis of the Rüschendorf transform - with a view towards Sklar’s Theorem
Frank Oertel*
*Deloitte & Touche GmbH, FSI Assurance, Quantitative Services & Valuation, D-81669 Munich, Email: f.oertel@email.de
Abstract: In many applications including financial risk measurement, copulas have proven to be a powerful building block to reflect multivariate dependence between several random variables, including the mapping of tail dependencies.
A famous key result in this field is Sklar’s Theorem. By now, there exist several approaches to prove Sklar’s Theorem in its full generality. An elegant probabilistic proof was provided by L. Rüschendorf. To this end he implemented a certain “distributional transform” which naturally transforms an arbitrary distribution function $F$ to a flexible parameter-dependent function which exhibits exactly the same jump size as $F$.
By using some real analysis and measure theory only (without involving the use of a given probability measure) we explore the underlying rich structure of the distributional transform.
Based on derived results from this analysis (such as Proposition 2.5 and Theorem 2.12) including a strong and frequent use of the right quantile function, we revisit Rüschendorf’s proof of Sklar’s theorem and provide some supplementing observations including a further characterisation of distribution functions (Remark 2.3) and a strict mathematical description of their “flat pieces” (Corollary 2.8 and Remark 2.9).
Keywords: Copulas, distributional transform, generalised inverse functions, Sklar’s Theorem.
MSC: 26A27, 60E05, 60A99, 62H05.
1 Introduction
The mathematical investigation of copulas started in 1951, due to the following problem of M. Fréchet: suppose one is given $n$ random variables $X_1, \ldots, X_n$, all defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$, such that each random variable $X_i$ has a (not necessarily continuous) distribution function $F_i$. What can then be said about the set of all possible $n$-dimensional distribution functions of the random vector $(X_1, \ldots, X_n)$ (cf. [7])? This question has an immediate answer if the random variables are assumed to be independent, since in this case there exists a unique $n$-dimensional distribution function of the random vector $(X_1, \ldots, X_n)$, which is given by the product $\prod_{i=1}^{n} F_i$. However, if the random variables are not independent, there was no clear answer to M. Fréchet’s problem.
In [15], A. Sklar introduced the expression “copula” (referring to a grammatical term for a word that links a subject and predicate), and provided answers to some of the questions of M. Fréchet.
In the following couple of decades, copulas (which are precisely finite dimensional distribution functions with uniformly distributed marginals) were mainly used in the framework of probabilistic metric spaces (cf. e. g. [13, 14]). Later, probabilists and statisticians became interested in copulas, since copulas defined in a “natural way” nonparametric measures of dependence between random variables, allowing one to include a mapping of tail dependencies. Since then, they began to play an important role in several areas of probability and statistics (including Markov processes and nonparametric statistics), in financial and actuarial mathematics (particularly with respect to the measurement of credit risk), and even in medicine and engineering.
One of the key results in the theory and applications of copulas is Sklar’s Theorem (which actually was proven in [13] and not in [15]). It says:
Sklar’s Theorem.
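Let $F$ be an $n$-dimensional distribution function with marginals $F_1, \ldots, F_n$. Then there exists a copula $C_F$, such that for all $(x_1, \ldots, x_n) \in \mathbb{R}^n$ we have
$$F(x_1, \ldots, x_n) = C_F(F_1(x_1), \ldots, F_n(x_n)).$$
Furthermore, if $F$ is continuous, the copula $C_F$ is unique. Conversely, for any univariate distribution functions $H_1, \ldots, H_n$, and any copula $C$, the composition $C \circ (H_1, \ldots, H_n)$ defines an $n$-dimensional distribution function with marginals $H_1, \ldots, H_n$.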
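The converse part of Sklar’s Theorem can be illustrated numerically in dimension two. The following is only a minimal sketch: the two marginals (an exponential and a Bernoulli distribution function) and the choice of the upper Fréchet-Hoeffding bound $\min(u, v)$ as copula are our own illustrative assumptions and do not appear in the text.

```python
import math

def exp_cdf(x, rate=1.0):
    """Continuous marginal (assumed for illustration): exponential distribution function."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def bernoulli_cdf(y, p=0.3):
    """Discontinuous marginal (assumed for illustration): Bernoulli(p) distribution function."""
    if y < 0:
        return 0.0
    return 1.0 - p if y < 1 else 1.0

def min_copula(u, v):
    """Upper Frechet-Hoeffding bound M(u, v) = min(u, v), which is a copula."""
    return min(u, v)

def joint_cdf(x, y):
    """Composition C(F1(x), F2(y)) from the converse part of the theorem."""
    return min_copula(exp_cdf(x), bernoulli_cdf(y))

BIG = 1e9  # stands in for "+infinity" in the marginalisation below

# Sending one argument to +infinity recovers the other marginal.
first_marginal_ok = all(joint_cdf(x, BIG) == exp_cdf(x) for x in (0.1, 0.5, 2.0))
second_marginal_ok = all(joint_cdf(BIG, y) == bernoulli_cdf(y) for y in (-1.0, 0.0, 0.5, 1.0))
print(first_marginal_ok, second_marginal_ok)
```

Note that the second marginal has a jump, so the sketch also covers a case in which uniqueness of the copula is not guaranteed.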
Since the original proof of (the general noncontinuous case of) Sklar’s Theorem is rather complicated and technical, there have been several attempts to provide different and more lucidly appearing proofs, involving not only techniques from probability theory and statistics but also from topology and functional analysis (cf. [4]).
Among those different proofs of Sklar’s Theorem, there is an elegant, yet rather short proof, provided by L. Rüschendorf, originally published in [12]. He provided a very intuitive, and primarily probabilistic approach which allows one to treat general distribution functions (including discrete parts and jumps) in a similar way as continuous distribution functions. To this end, he applied a generalised “distributional transform” which, according to [12], has been used in statistics for a long time in relation to a construction of randomised tests. By making consistent use of the properties of this generalised “distributional transform” together with Proposition 2.1 in [12], the proof of Sklar’s Theorem in fact follows immediately (cf. Theorem 2.2 in [12]). Independently of [12], the same idea was used in the (again rather short) proof of Lemma 3.2 in [11]. All key inputs for the proof of Sklar’s Theorem clearly are provided by Proposition 2.1 in [12]. However, the proof of the latter result is rather difficult to reconstruct. It says:
[12], Proposition 2.1.
Let $X, V$ be two random variables, defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$, such that $V \sim U(0,1)$ and $V$ is independent of $X$. Let $F$ be the distribution function of the random variable $X$. Then $U := F(X-) + V \, (F(X) - F(X-)) \sim U(0,1)$, and $X = F^{\wedge}(U)$ almost surely.
Here, $F^{\wedge}$ denotes the (left-continuous) left quantile of $F$ which in particular is the lowest generalised inverse of $F$ (cf. e. g. [14, Chapter 4.4], respectively [8, Definition 2]). In our paper we consistently adopt the very suitable symbolic notation of [14], respectively [8] to identify generalised inverse functions in general (cf. (2.2) and (2.3)).
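For a purely discrete random variable, both claims of the proposition can be checked with exact rational arithmetic. The sketch below rests on assumptions of our own: a hypothetical three-point distribution, the transformed variable written as $F(X-) + V \, (F(X) - F(X-))$ with $V$ uniform on the unit interval and independent of $X$, and the left quantile taken as $\inf\{x : F(x) \geq \alpha\}$.

```python
from fractions import Fraction

# Hypothetical discrete distribution: (atom, probability) pairs.
ATOMS = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def cdf(x):
    """F(x) = P(X <= x)."""
    return sum((p for a, p in ATOMS if a <= x), Fraction(0))

def cdf_left(x):
    """F(x-) = P(X < x)."""
    return sum((p for a, p in ATOMS if a < x), Fraction(0))

def left_quantile(alpha):
    """inf{x : F(x) >= alpha} for 0 < alpha <= 1 (attained at an atom here)."""
    return min(a for a, _ in ATOMS if cdf(a) >= alpha)

def prob_u_at_most(u):
    """P(U <= u), obtained by conditioning on X.

    Given X = a, U is uniform on the interval from F(a-) to F(a), hence
    P(U <= u | X = a) is (u - F(a-)) / P(X = a), clipped to [0, 1].
    """
    total = Fraction(0)
    for a, p in ATOMS:
        cond = (u - cdf_left(a)) / p
        total += p * min(max(cond, Fraction(0)), Fraction(1))
    return total

# Claim 1: U is uniformly distributed, i.e. P(U <= u) = u on [0, 1].
uniform_ok = all(prob_u_at_most(Fraction(k, 20)) == Fraction(k, 20) for k in range(21))

# Claim 2: the left quantile of U returns X whenever V > 0.
recover_ok = all(
    left_quantile(cdf_left(a) + v * p) == a
    for a, p in ATOMS
    for v in (Fraction(1, 10), Fraction(1, 2), Fraction(1))
)
print(uniform_ok, recover_ok)
```

The value $V = 0$ is excluded in the second check; it has probability zero, which is where the “almost surely” enters.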
While studying (and reconstructing) carefully the proof of Sklar’s Theorem built on Proposition 2.1 in [12], we recognise that it actually implements key mathematical objects which do not involve probability theory at all and play an important role beyond statistical applications.
The main contribution of our paper is to provide a thorough analysis of these mathematical building blocks by studying carefully properties of a real-valued (deterministic) function, used in the proof of Proposition 2.1 in [12]; the so-called “Rüschendorf transform”. We reveal some interesting structural properties of this function which to the best of our knowledge have not been published before, such as e. g. Theorem 2.12 which actually is a result on Lebesgue-Stieltjes measures, strongly built on the role of the right quantile function which seems to be not widely used in the literature (as opposed to the left quantile function).
Equipped with Theorem 2.12 we then revisit the proof of Proposition 2.1 in [12] (cf. also [10, Chapter 1.1.2]). However, in our approach Proposition 2.1 in [12] is an implication of Theorem 2.12 and Lemma 2.15. For the sake of completeness we include a proof of Sklar’s Theorem again (cf. also [10, Chapter 1.1.2]), yet as an implication of Theorem 2.12, finally leading to Remark 2.21.
2 The Rüschendorf Transform
At the moment let us completely ignore randomness and probability theory. We “only” are working within a subclass of real-valued functions, all defined on the real line, and with suitable subsets of the real line.
Let $F : \mathbb{R} \to \mathbb{R}$ be an arbitrary right-continuous and nondecreasing function. Let $x \in \mathbb{R}$. Since $F$ is nondecreasing, it is well-known that both, the left-hand limit
$$F(x-) := \lim_{z \uparrow x} F(z)$$
and the right-hand limit
$$F(x+) := \lim_{z \downarrow x} F(z)$$
are well-defined real numbers, satisfying $F(x-) \leq F(x) \leq F(x+)$. Moreover, due to the assumed right-continuity of $F$, it follows that $F(x) = F(x+)$ for all $x \in \mathbb{R}$. $\Delta F(x) := F(x) - F(x-)$ denotes the (left-hand) “jump” of $F$ at $x$. We consider the following important transform of $F$:
Definition 2.1.
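Let $x \in \mathbb{R}$ and $\lambda \in [0,1]$. Put
$$R_F(x, \lambda) := F(x-) + \lambda \, \Delta F(x).$$
We call the real-valued function $R_F : \mathbb{R} \times [0,1] \to \mathbb{R}$ the Rüschendorf transform of $F$. For given $\lambda \in [0,1]$, $R_F(\cdot, \lambda)$ is called the Rüschendorf $\lambda$-transform of $F$.
Clearly, we have the following equivalent representation of the Rüschendorf transform $R_F$:
$$R_F(x, \lambda) = (1 - \lambda) \, F(x-) + \lambda \, F(x).$$
In particular, for all $(x, \lambda) \in \mathbb{R} \times [0,1]$ the following inequality holds:
(2.1) $F(x-) \leq R_F(x, \lambda) \leq F(x).$
Moreover, $F$ is continuous if and only if $R_F(x, \lambda) = F(x)$ for all $(x, \lambda) \in \mathbb{R} \times [0,1]$, and for all $x \in \mathbb{R}$ we have $R_F(x, 0) = F(x-)$ and $R_F(x, 1) = F(x)$.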
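To make the definition tangible, the transform can be evaluated for a step function. The sketch below assumes that the transform takes the form $F(x-) + \lambda \, (F(x) - F(x-))$ and uses a hypothetical three-point distribution; the function names are ours.

```python
from fractions import Fraction

# Hypothetical discrete distribution: (atom, probability) pairs.
ATOMS = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def cdf(x):
    """F(x) = P(X <= x); right-continuous and nondecreasing."""
    return sum((p for a, p in ATOMS if a <= x), Fraction(0))

def cdf_left(x):
    """F(x-) = P(X < x), the left-hand limit of F at x."""
    return sum((p for a, p in ATOMS if a < x), Fraction(0))

def transform(x, lam):
    """Assumed form of the transform: F(x-) + lam * (F(x) - F(x-))."""
    return cdf_left(x) + lam * (cdf(x) - cdf_left(x))

xs = [Fraction(k, 2) for k in range(-2, 9)]
lams = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]

# The transform is sandwiched between the left-hand limit and the value of F.
sandwich_ok = all(cdf_left(x) <= transform(x, l) <= cdf(x) for x in xs for l in lams)

# The endpoints lam = 0 and lam = 1 give F(x-) and F(x), respectively.
endpoints_ok = all(transform(x, 0) == cdf_left(x) and transform(x, 1) == cdf(x) for x in xs)

# At a continuity point of F (here x = 2) the parameter lam has no effect.
flat_ok = all(transform(2, l) == cdf(2) for l in lams)
print(sandwich_ok, endpoints_ok, flat_ok)
```

At the atom $x = 1$, where the jump equals $1/2$, the choice $\lambda = 1/2$ places the transform exactly halfway up the jump.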
Assumption 2.2.
In the following we assume throughout that $F$ is bounded on $\mathbb{R}$ (i. e., the range $F(\mathbb{R})$ is a bounded subset of $\mathbb{R}$), implying that $a \leq F(x) \leq b$ for all $x \in \mathbb{R}$, for some real numbers $a < b$. Moreover, let us assume that for any $\alpha \in (a, b)$ the set $\{x \in \mathbb{R} : F(x) \geq \alpha\}$ is nonempty and bounded from below (in particular, $F$ cannot be a constant function on the whole real line). WLOG, we may assume from now on that $a = 0$ and $b = 1$ (else we would have to work with the function $\frac{F - a}{b - a}$).
Although its proof (by contradiction) mostly is an easy calculus exercise with sequences, the following observation, which does not require a right-continuity assumption, should be explicitly noted (cf. also [5, 6, 13]):
Remark 2.3.
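Let $F : \mathbb{R} \to [0,1]$ be an arbitrary nondecreasing function. Then the following statements are equivalent:
(i) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$;
(ii) For any $\alpha \in (0,1)$ the sets $\{x \in \mathbb{R} : F(x) \geq \alpha\}$ and $\{x \in \mathbb{R} : F(x) < \alpha\}$ both are nonempty;
(iii) For any $\alpha \in (0,1)$ the set $\{x \in \mathbb{R} : F(x) \geq \alpha\}$ is nonempty and bounded from below;
(iv) $\inf\{x \in \mathbb{R} : F(x) \geq \alpha\}$ is a well-defined real number for any $\alpha \in (0,1)$.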
Hence, given Assumption 2.2, the assumed right-continuity of $F$ and Remark 2.3 imply that $F$ (possibly after shifting and stretching adequately) actually is a distribution function! Therefore, its generalised inverse function $F^{\wedge} : (0,1) \to \mathbb{R}$, given by
(2.2) $F^{\wedge}(\alpha) := \inf\{x \in \mathbb{R} : F(x) \geq \alpha\}$
is welldefined and satisfies
(2.3) 
for any (cf. e. g. [9]). Actually, since $F$ is assumed to be right-continuous, it follows that
for all (cf. [5, Proposition 2.3 (4)]). Moreover, the following important inequality is satisfied:
(2.4) 
for all , , and for all . Hence,
(2.5) 
for all . Also recall from e. g. [14] that , respectively for any .
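The interplay between a right-continuous distribution function and its lowest generalised inverse can be explored on a small example. The sketch below takes the left quantile to be $\inf\{x : F(x) \geq \alpha\}$ and checks two standard properties on a hypothetical three-point distribution; the grids and names are illustrative assumptions of ours.

```python
from fractions import Fraction

# Hypothetical discrete distribution: (atom, probability) pairs.
ATOMS = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def cdf(x):
    """F(x) = P(X <= x); right-continuous and nondecreasing."""
    return sum((p for a, p in ATOMS if a <= x), Fraction(0))

def left_quantile(alpha):
    """inf{x : F(x) >= alpha} for 0 < alpha <= 1 (attained at an atom here)."""
    return min(a for a, _ in ATOMS if cdf(a) >= alpha)

grid_alpha = [Fraction(k, 8) for k in range(1, 9)]
grid_x = [Fraction(k, 2) for k in range(-2, 9)]

# For right-continuous F: F(x) >= alpha holds exactly when x >= left_quantile(alpha).
switch_ok = all(
    (cdf(x) >= alpha) == (x >= left_quantile(alpha))
    for alpha in grid_alpha
    for x in grid_x
)

# Applying the left quantile to F(x) never overshoots x.
no_overshoot = all(left_quantile(cdf(x)) <= x for x in grid_x if cdf(x) > 0)
print(switch_ok, no_overshoot)
```

The first property is the one that typically fails without right-continuity, which is why that assumption is kept in force here.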
Let us fix the distribution function $F$. Then by we denote the set of all jumps of $F$ which is well-known to be at most countable.
Throughout the remaining part of our paper, we follow the notation of [12] and put for fixed . By taking a closer look at , we firstly note the following observation.
Remark 2.4.
Let and . Then
Proof.
Fix and put , where . Then is welldefined. Since , the claim follows. ∎
The next result shows an important part of the role of the Rüschendorf transform which can be more easily understood if one sketches the graph of including its jumps. Since is at most countable, it follows that , where either or . By making use of this representation and the canonically defined function , (cf. also [14, Chapter 4.4]) we arrive at the following
Proposition 2.5.

Let . Then
In particular, if then . Moreover,
implying that the mapping , is welldefined and bijective. Its inverse is given by
Proof.
To prove the first set inclusion, we may assume without loss of generality that is not continuous in . So, let . Then (else we would obtain the contradiction , respectively ) and for all . Hence, for all (cf. [5, Proposition 2.3 (5)]), implying the first inclusion. Now let such that . Due to (2.5) it follows that
which gives the second set inclusion.
To verify the representation of the disjoint union let for some . Then and hence and . Put
Then and
Furthermore, a straightforward application of the inequality (2.4) (together with (2.5) and the monotonicity assumption on ) shows the graphically clear fact that there is no such that contains elements of the form , respectively for some . Now, given the construction of above and the listed properties of any of the sets , the assertion about the mapping follows immediately. ∎
Definition 2.6.
Let and . Put:
Firstly note that is nonempty. To see this, consider any . Then for some . Hence, . To motivate the following representation of the set , let us assume for the moment that is continuous at . Due to (2.5), it follows that . Hence, in this case, , implying that .
However, in the general (noncontinuous) case, need not be an element of the set . Therefore (by fixing and ), we are going to represent the set as a disjoint union of the following three subsets of the real line:
and
Thus,
Next, we are going to simplify the sets and as far as possible. To this end, we have to analyse carefully the jump , implying that we have to check against the (finite) value of the largest generalised inverse of (cf. [9] and [14, Chapter 4.4])
The inequality (2.5) is also satisfied for (cf. [6, Lemma A. 15]):
(2.6) 
Note that since is a distribution function, (respectively ) is precisely the right (respectively left) quantile of .
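The difference between the two quantiles shows up exactly at the levels where the distribution function has a flat piece. The sketch below assumes the usual definitions, $\inf\{x : F(x) \geq \alpha\}$ for the left quantile and $\inf\{x : F(x) > \alpha\}$ for the right quantile, and a hypothetical three-point distribution.

```python
from fractions import Fraction

# Hypothetical discrete distribution: (atom, probability) pairs.
ATOMS = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def cdf(x):
    """F(x) = P(X <= x); right-continuous and nondecreasing."""
    return sum((p for a, p in ATOMS if a <= x), Fraction(0))

def left_quantile(alpha):
    """inf{x : F(x) >= alpha} for 0 < alpha <= 1."""
    return min(a for a, _ in ATOMS if cdf(a) >= alpha)

def right_quantile(alpha):
    """inf{x : F(x) > alpha} for 0 <= alpha < 1."""
    return min(a for a, _ in ATOMS if cdf(a) > alpha)

levels = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
pairs = {alpha: (left_quantile(alpha), right_quantile(alpha)) for alpha in levels}

# Between the two quantiles F is constant and equal to the level alpha.
grid_x = [Fraction(k, 4) for k in range(-4, 17)]
flat_ok = all(
    cdf(x) == alpha
    for alpha in levels
    for x in grid_x
    if left_quantile(alpha) <= x < right_quantile(alpha)
)
print(pairs, flat_ok)
```

Here the levels $1/4$ and $3/4$ are attained by $F$ on whole intervals, so the two quantiles differ, whereas the level $1/2$ is skipped by a jump and both quantiles coincide.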
Clearly, for every . However, if , we even obtain equality of both sets, since:
Lemma 2.7.
Let and . Put and .

If , then and . Moreover, the restricted function is continuous, and
(2.7) 
If , then .

Furthermore,
In particular, the following statements are equivalent:

;

.
Proof.
Put . Clearly, we always have .
To verify (i), let . Then for some . Thus, , implying that and . Assume by contradiction that . Then for all , implying the contradiction . Hence, . Proposition 2.5 therefore implies that .
Let . Assume by contradiction that is not continuous at . Then (since ). Since , we have for some . Thus,
Hence, , which is a contradiction. Thus, the restricted function is continuous on . Let . Since is continuous at , it follows that
Thus, .
To prove (ii), suppose that is nonempty. The previous calculations show that the existence of an element already implies . Consequently, cannot coincide with (since ), implying that .
To finish the proof of (i), we have to verify (2.7). To this end, let and . Then there exists such that . Consequently, . Thus,
Moreover, [5, Proposition 2.3 (6)] implies that
Hence,
If , then and hence . If , then and hence .
Statement (iii) is a direct implication of (i) and Proposition 2.5. ∎
Regarding a visualisation of Lemma 2.7 consider the set . Note that
Thus, by joining Lemma 2.7 with Proposition 2.5 we immediately obtain the following tangible mathematical description of the (preimages of) “flat pieces” of (and hence allowing us to complete related observations from e. g. [14, Chapter 4.4] and [5, Proposition 2.3 (6)] coherently):
Corollary 2.8.
Let and . Put and .

If , then

If , then
In particular, if and only if , and if and only if , and if , then if and only if .
Remark 2.9.
Let . Then, according to [1, Corollary 1.1] for a large class of distribution functions any nonempty set even emerges as a set of optimal solutions of the so-called “single period newsvendor problem” which asks for the minimisation of coherent risk measures, such as the conditional value-at-risk (which coincides with Expected Shortfall), corresponding to a cost function, induced by random demand. Here, one should recall that recently the Basel Committee on Banking Supervision (BCBS) suggested in their updated consultative document “Fundamental review of the trading book” to implement Expected Shortfall at in a bank’s internal market risk model to calculate its minimum capital requirements with respect to market risk.
Let $\mathcal{B}(\mathbb{R})$ denote the set of all Borel subsets of $\mathbb{R}$. In the following, let $\mu_F$ be the Lebesgue-Stieltjes measure of $F$. For a detailed description of the construction and properties of the Lebesgue-Stieltjes measure (including Lebesgue-Stieltjes integration), we refer the reader to e. g. [2] and [3]. For the convenience of the reader, we recall the following fundamental result (cf. [3, Theorem 12.4]):
Theorem 2.10 (Lebesgue-Stieltjes measure).
Let $G : \mathbb{R} \to \mathbb{R}$ be an arbitrary nondecreasing and right-continuous function. Then there exists a unique Borel measure $\mu_G$ satisfying
$$\mu_G((a, b]) = G(b) - G(a)$$
for all real numbers $a \leq b$.
Clearly, this crucial result implies that and hence
for all . Moreover, if and only if is a constant function on .
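For a step function the Lebesgue-Stieltjes measure is simply a sum of point masses, which makes the defining identity easy to check by hand or by machine. The sketch below assumes the standard normalisation, in which a half-open interval $(a, b]$ receives mass $F(b) - F(a)$ and a singleton receives the jump $F(x) - F(x-)$; the distribution and the names are hypothetical.

```python
from fractions import Fraction

# Hypothetical discrete distribution: (atom, probability) pairs.
ATOMS = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (3, Fraction(1, 4))]

def cdf(x):
    """F(x) = P(X <= x); right-continuous and nondecreasing."""
    return sum((p for a, p in ATOMS if a <= x), Fraction(0))

def cdf_left(x):
    """F(x-) = P(X < x), the left-hand limit of F at x."""
    return sum((p for a, p in ATOMS if a < x), Fraction(0))

def mu_half_open(a, b):
    """Mass of (a, b] under the measure generated by F: F(b) - F(a)."""
    return cdf(b) - cdf(a)

def mu_point(x):
    """Mass of {x}: the limit of mu((x - h, x]) as h decreases to 0, i.e. F(x) - F(x-)."""
    return cdf(x) - cdf_left(x)

def mass_by_atoms(a, b):
    """Direct count: total probability of the atoms lying in (a, b]."""
    return sum((p for t, p in ATOMS if a < t <= b), Fraction(0))

ends = [Fraction(k, 2) for k in range(-2, 9)]
interval_ok = all(mu_half_open(a, b) == mass_by_atoms(a, b) for a in ends for b in ends if a <= b)

# Additivity over adjacent half-open intervals.
additive_ok = mu_half_open(-1, 1) + mu_half_open(1, 4) == mu_half_open(-1, 4)
print(interval_ok, additive_ok, mu_point(1), mu_point(2))
```

The singleton $\{1\}$ carries the full jump $1/2$, whereas the continuity point $2$ carries no mass.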
Returning to our distribution function , a direct application of leads to another important implication of Lemma 2.7:
Corollary 2.11.
Let and . Then , and
In particular, if , then
Proof.
Nothing is to prove if . So, let . Then .
Suppose first that . Then
Consequently, since in general for all , it follows that
Now suppose that . Then , and it follows that is continuous at . Thus, . Since in this case
it consequently follows that
∎
Next, we are going to reveal in detail that the function is almost “left-invertible” at every which does not belong to the preimage of a “flat piece” of . More precisely:
Theorem 2.12.
Let . Assume that almost everywhere. Then
In particular, if almost everywhere, then
Proof.
Let . Consider the Borel set
where denotes the set of all jumps of the function (note that by construction if ). Since the (left-continuous) function is nondecreasing, is at most countable. Hence, if , there exists a subset of , and a sequence , consisting of pairwise distinct elements , such that . Thus, . Corollary 2.11 therefore implies that, in any case, and hence (since cannot be a constant function on the whole real line).
Let . Put . Then , and . Thus, is welldefined. Consider .
First, let . Then . Lemma 2.7 therefore implies that . In particular, . Hence, since , it consequently follows that
and hence .
Now let . If , it follows again that and hence
as above. So, let . Then for some , and hence . Since , it follows once again that , and hence
∎
Remark 2.13.
.
Moreover, by an argument similar to the one showing that the set is nonempty, we further obtain
Remark 2.14.
.
Observe that only the subset of depends on the choice of .
2.1 The inclusion of randomness
In addition to our assumptions above, we now fix a given probability space . Let and be two given random variables (on this probability space) such that is uniformly distributed over and independent of . In the following we consider the random variable , defined on as
where here , and (since , we obviously have and hence ). Next, we have to evaluate ; i. e., we wish to calculate
Due to our previous observations, we have
for all . Consequently, given the assumed independence of and , Lemma 2.7 implies that (here, and , where ):
Apparently, to continue with the calculation of the respective probabilities, we have to consider the following two possible cases: and :

Let . Thus, since , it follows that

Let . Since is uniformly distributed over , we have
. Hence, since , it follows that
Moreover, by taking into account that in case (i) (since is continuous at if ), we have arrived at the following important
Lemma 2.15.
Suppose that is an arbitrary distribution function. Let . Put and . Let be two random variables, both defined on the same probability space , such that and is independent of . Then
where if and if .
To conclude, let us briefly point out that Lemma 2.15 could also be viewed as a building block of a probabilistic limit theorem (whose detailed discussion would exceed the main goal of this paper, though).
2.2 The role of the distribution function of
From now on, is given as the distribution function of a given random variable .
Proposition 2.16.
Let be two random variables, both defined on the same probability space , such that and is independent of . Let be the distribution function of . Then is a uniformly distributed random variable. Moreover,
on the set