Some indices to measure departures from stochastic order
Research partially supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds, grants MTM2014-56235-C2-1,2 and MTM2017-86061-C2-1,2, and by Consejería de Educación de la JCyL and FEDER funds, grant VA005P17.
Abstract
This paper deals with three famous statistics involved in two-sample problems. The Mann-Whitney, the one-sided Kolmogorov-Smirnov, and the Galton rank-order statistics are invoked here in an unusual way. Looking for indices that capture the disagreement with stochastic dominance of a distribution function $G$ over another $F$, we resort to suitable couplings $(X,Y)$ of random variables with marginal distribution functions $F$ and $G$. We show that the common representation $P(X>Y)$, under the independent, the contamination and the quantile frameworks, gives interpretable indices whose plug-in sample-based versions lead to these widely known statistics and can be used for statistical validation of approximate stochastic dominance. This supplies a workaround to the non-viable statistical problem of validating stochastic dominance on the basis of two samples. While the available literature on the asymptotics of the first and second statistics justifies their use for this task, for the Galton statistic the existing results cover only the case where both distributions coincide to a large extent or the case of distribution functions with a single crossing point. In this paper we provide new findings, giving a full picture of the complex behaviour of the Galton statistic: the time that a sample quantile function spends below another. We illustrate the performance of this index through simulations and discuss its application in a case study on the improvement of household wealth distribution in Spain over the period around the recent financial crisis.
Keywords: relaxed stochastic dominance, asymptotics, copulas, inferential procedures, representation, Galton’s rank statistic.
1 Introduction
Comparison is a common daily task in any type of research or activity. However, although almost instinctive when it involves two individuals, it is far from obvious when it involves populations or their sophisticated versions, like distributions or experiments. If we are interested in comparing a feature over two given populations, except in the trivial case where all the members of one population exhibit lower values than all those of the other population, the task admits very different points of view. Often it is approached through the comparison of mean values or some other available summary value (median, Gini index), but the real meaning of such a comparison is frequently just suspected (even wrongly) by practitioners. This situation has been addressed in Álvarez-Esteban et al. (2017), with emphasis on location-scale models, defending instead the stochastic order as the natural gold standard behind two-sample comparison problems. To pursue that direction we return to the very principles of comparison, considering the real meaning of stochastic order and some natural relaxed alternatives.
Let us begin by considering a simple generic statement like “men are taller than women”. If we understand that expression in terms of mean height, its precise meaning would be that if we (randomly) chose one group of men and another group of women, both large enough, almost certainly the average height of the former would be greater than that of the latter. Nonetheless, as is well known and stressed in Álvarez-Esteban et al. (2017), such a comparison is compatible with very different shapes of the parent distributions, and could lead to a false picture, highly unsuitable e.g. when comparison between treatments is the goal of our research. An order between populations or distributions should indicate a comprehensive relation between them, improving on those based on comparisons through individual indices or features of the distributions.
To simplify the initial exposition and produce sensible pictures, let us begin by considering the comparison of the heights of two equal-sized populations () of men and women, represented in Figure 1 through black and white bars. In the left graphic we can observe some tendency towards higher values of the black bars over the white bars. It is also obvious that there are white bars that are larger than some black ones and vice versa. The right graphic shows the same bars but now juxtaposes the smallest black bar with the smallest white one, the second with the second, and so on. If we consider the status (or rank) of any individual as its position, by height, in his/her population, we now see that any ranked black bar is larger than the equal-ranked white bar. If we replace the ordered positions by real numbers in the unit interval, giving the relative position of any individual by the proportion of individuals in his/her population who are no taller than him/herself, we arrive at a simple general way of comparing populations or distributions: just compare the values occupying these generalized ranks, once the proper definitions are provided. Notice that these generalized ranks are nothing but the values of the distribution functions, while the associated values are the corresponding quantile values. The relation between the heights of these particular populations of men and women made apparent in the right graphic (any black bar is larger than the juxtaposed white bar) is named the stochastic dominance of the first one over the second.
The attained relation above generalizes in a straightforward way to two distributions on the real line with respective distribution functions (d.f.’s in the sequel) $F$ and $G$. Denoting by $F^{-1}$ the quantile function of $F$ (which, we recall, is defined by $F^{-1}(t)=\inf\{x : t\le F(x)\}$ for $t\in(0,1)$), we say that $F$ is stochastically dominated by $G$, denoted $F\le_{st}G$, if
(1) $F^{-1}(t)\le G^{-1}(t)$ for every $t\in(0,1)$.
Note that this is not the usual definition of stochastic dominance, namely:
(2) $F(x)\ge G(x)$ for every $x\in\mathbb{R}$,
but they are easily seen to be equivalent while, as noted in Lehmann (1955), (1) is more intuitive. Recall that, when defined on the unit interval $(0,1)$ equipped with the Lebesgue measure, $\ell$, the quantile function $F^{-1}$ can be considered as a random variable with d.f. $F$. This fact and (1) lead to a new and appealing characterization of stochastic dominance:
(3) $F\le_{st}G$ if and only if there exist random variables $X$ and $Y$, with d.f.’s $F$ and $G$ respectively, such that $P(X\le Y)=1$.
Once we have a representation of two distributions in terms of random variables defined on a probability space, we are fixing a joint distribution, which makes it possible to design a sampling procedure able to produce simultaneous samples from both distributions. Therefore, after (1), the meaning of stochastic dominance is that we can design a sampling procedure such that any value obtained from the first distribution is no larger than the paired value obtained from the second. The quantile functions give a kind of canonical joint representation that allows us to characterize stochastic dominance, but (1) opens the possibility of other representations of stochastic dominance in terms of almost sure dominance.
While it is hard to argue against the interest of stochastic dominance, it is often observed that such a relation is too strong to be guaranteed. As an example of this kind of claim, Arcones et al. (2002) note that the domination (2) may well hold over an important part of the range but fail over another small part of the range, or may simply be unknown or unknowable over the entire range (a fact also implicit in Leshno and Levy (2002), although presented in terms of utility functions). Even worse, from an inferential point of view, as noted in Berger (1988), Davidson and Duclos (2013) or Álvarez-Esteban et al. (2016), it is not possible to statistically guarantee stochastic dominance. In general, the available literature on testing stochastic dominance considers either the problem of testing equality in distribution against stochastic dominance (as, for instance, in Section 6.9 in Lehmann and Romano (2005) or in the inference under order restrictions setting described in El Barmi and Mukerjee (2005)), or the problem of testing the null model of stochastic dominance against the alternative of lack of stochastic dominance (this is the most frequent case in the econometric literature). In the first case we would be assuming that the ‘new treatment’ can only improve the existing one, a strong assumption that should be carefully analyzed. In the second, we could conclude, at best, that there is not enough statistical evidence against stochastic dominance, although some practitioners wrongly use these tests to conclude improvements in income distributions, for instance.
Hypothesis testing theory is designed to provide evidence to reject the null hypothesis; its goal is not the confirmation of the null. If we want to conclude that stochastic order holds, we should gather statistical evidence to reject the null in the testing problem of $H_0: F\not\le_{st}G$ vs $H_a: F\le_{st}G$. However, on the basis of samples $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ obtained from the corresponding distributions, the no-data test (that rejects the null with probability $\alpha$ irrespective of the data) is uniformly most powerful for this problem.
Following ideas that go back to Hodges and Lehmann (1954), these facts invite us to consider relaxed versions of stochastic order for which feasible statistical tests can be designed. This is often carried out through a suitable distance, through contamination neighbourhoods, or through combinations of both methods. In any case the goal is to replace the null by a larger one that includes “similar” distributions, so that it is rejected only if there are “relevant changes” with respect to the original hypothesis (Rudas et al. (1994), Munk and Czado (1998), Liu and Lindsay (2009), Álvarez-Esteban et al. (2012), or recently Dette and Wied (2016) and Dette and Wu (2018) share this point of view).
We can cite several interesting examples of relaxations of stochastic order. The ‘stochastic precedence’ relation, introduced in Arcones et al. (2002), is related to the probability of obtaining greater values under $F$ than under $G$ when independently sampled. Stochastic precedence would mean that this probability is at most 1/2. In a different line, the approach of “almost stochastic dominance”, introduced in Leshno and Levy (2002), is based on measuring the relative contribution of the set $\{x: F(x)<G(x)\}$, where (2) is not satisfied, to the $L_1$ distance between $F$ and $G$. In turn, Álvarez-Esteban et al. (2016) introduced another index in the spirit of similarity, based on the contamination model.
Very recently, Álvarez-Esteban et al. (2017) introduced a new index, measuring the size of the set where the associated quantile functions do not satisfy the appropriate pointwise order. For a better understanding of our general approach, we present that index by resorting to the comparison of finite populations, handling two new populations with characteristics similar to those already considered. The barplot on the left of Figure 2 has been produced in the same way as the right one in Figure 1. We can observe that, although in these new populations men are “generally” taller than women, that is not the case for the third and fourth elements of both populations; thus, for these populations, (1) rules out stochastic dominance of the height of men over that of women. However, we can give a precise measure of the extent to which the dominance fails: 2/20. In other words, through the corresponding quantile functions, we can guarantee that sampling from a U(0,1) distribution (uniform on $(0,1)$) and transforming these data through both the quantile function of men and the quantile function of women would produce legitimate samples of the heights of men and of women for which, for large sample sizes, the proportion of pairs violating the order is approximately 2/20. We notice that, having introduced this index in this way, we realized that it is just the same one used by Galton to answer a query by Darwin in 1876 (see Subsection 2.1).
Let us now recall that, in Figure 1, ranking the elements of both populations results in an easier comparison, based on just comparing the pairs formed by the elements sharing identical rank. However, any joint representation of our populations, defined on a suitable probability space, would give a sense to a question like “how large is the probability that the order is reversed within a pair?”, thus we could choose a different coupling strategy for comparison. In fact, by transposing the third and fourth white bars in the left picture in Figure 2, we get the one on the right, and see that now only the bars occupying the fourth position fail to meet the general ordering. Therefore, on the basis of this coupling, we could measure the disagreement with the stochastic order as 1/20. Moreover, it is easy to see that no coupling improves this index. We stress the fact that, through this coupling, we could quickly design a sampling procedure able to produce samples of the heights of men and of women for which, for large sample sizes, the proportion of pairs violating the order is approximately 1/20, thus improving the previous rate. To the best of our knowledge, the coupling most compatible with pointwise order has not been addressed until now in the literature (but see (11) in Section 2.3).
Throughout this paper we explore this coupling approach to evaluate the extent to which a d.f. $G$ stochastically dominates another, $F$. We relativize the problem by considering a random mechanism able to simultaneously produce items, described as realizations of a pair $(X,Y)$ of random variables with marginal d.f.’s $F$ and $G$. The natural index under this model is then $P(X>Y)$, although this index will strongly depend on the joint law of $(X,Y)$, relating the marginal laws $F$ and $G$. Fortunately, this task can be generally addressed through the copula.
As we show in this paper, several proposals of relaxed versions of stochastic order can be analyzed through this common framework, which takes advantage of basic coupling ideas. In particular we will solve the notable problem of obtaining a coupling whose associated index takes the minimum possible value. Unexpectedly, this problem is related to the contamination index introduced in Álvarez-Esteban et al. (2016), which also plays a key role in this approach (see Section 2.3).
Indices that fit into the ‘$P(X>Y)$ representation’ enjoy the property of invariance under increasing transformations, which we think is a desirable feature in this setting. We notice at this point that, among the already mentioned indices, only the index associated to the Leshno and Levy (2002) approach does not share this invariance, hence it does not fit into the representation (see Remark 2.4.1).
To show that our relaxed versions of stochastic order can be assessed through valid inferential procedures, we will pay some attention to the corresponding plug-in estimators of the indices, that is, the indices computed on the basis of the sample distribution functions $F_n$ and $G_m$. Let us also notice that the statistical analysis of $P(X>Y)$ began in Birnbaum (1965) and, under the suggestive title of Stress-Strength Model, is widely recognized for its multiple applications (see Kotz et al. (2003) for a general account).
The paper is structured in the following way. First we give a quick overview of the already introduced indices. Then, in Section 2.4, we introduce the framework to analyze these indices from a common perspective. Section 3 includes the pertinent comments and theory for the statistical use of the indices. In fact, the first and second indices considered involve well-studied statistics: the Mann-Whitney version of the Wilcoxon statistic, and the one-sided Kolmogorov-Smirnov statistic. We notice that Álvarez-Esteban et al. (2017) includes some asymptotic theory for the third index, but it only covers the case of distribution functions with a single crossing point. It was also studied at the opposite extreme, when both distributions coincide to a large extent, in its role as Galton’s statistic. Here we revisit Galton’s rank-order statistic, showing (in Section 5) the complex panorama of the asymptotic behaviour of its generalized version, which measures the time that the sample quantile function obtained from a sample of $F$ spends over that obtained from a sample of $G$. Our results resort to well-known theorems on empirical processes as well as to a less known, profound theorem of Paul Lévy (1939) related to the “arcsin law”. A simple realistic version for applications is given in Theorems 3.2 and 3.3. We also provide, in Section 4, a set of simulations and a case study showing the performance of this index in an applied setting.
We end this Introduction with some words on notation. Throughout the paper will denote the law of the random vector or variable . We will consider a generic probability space , where the involved random objects are defined and verify the current assumptions ad hoc. As already noted, will denote the Lebesgue measure on the unit interval . Convergences in the almost sure or in law (or weak) senses will respectively be denoted by and . Let us also recall the well-known fact that for a bivariate d.f. with continuous marginal d.f.’s , the associated copula capturing its dependence structure is . Finally, given a real value, , we will write for the sign of (defined as if and in the rest).
2 Measuring departures from stochastic order
Throughout this section $X$ and $Y$ will be real random variables with respective d.f.’s $F$ and $G$. Other common elements in the section include comparisons between some fixed normal distributions that play a kind of reference role. To get a quick visual perspective of the relation between some of these indices to measure stochastic dominance, we refer to the contour plots relating normal distributions in Álvarez-Esteban et al. (2017).
2.1 The quantile approach
From the beginning we have adopted the quantile approach to introduce not only stochastic dominance, but also the possibility that it opens to measure the disagreement with the order. Here we emphasize that the quantile representation translates the arguably more abstract concept of stochastic order into a well-known type of relation (pointwise ordering) between random variables with the same laws as those of $X$ and $Y$. In this case this is achieved with a particular dependence structure, defined by the copula $C(u,v)=\min(u,v)$. This copula is associated to the construction $(F^{-1},G^{-1})$ on the unit interval. We note that from this quantile characterization it follows easily that if $X$ and $Y$ have normal distributions $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, respectively, $F\le_{st}G$ will hold if and only if $\mu_1\le\mu_2$ and $\sigma_1=\sigma_2$. Therefore, if e.g. $\sigma_1\neq\sigma_2$ the stochastic order relation would be impossible, although, for example, for , the subset of $(0,1)$ where (1) fails has measure . In other words, a mechanism generating data according to this scheme would produce lower values for $Y$ than for $X$ only in a proportion around 5%, while for it would be about 30%. This observation led to the definition in Álvarez-Esteban et al. (2017) of the following index to measure the level of disagreement with the stochastic order between distribution functions:
(4) $\gamma(F,G):=\ell\left(t\in(0,1):\ F^{-1}(t)>G^{-1}(t)\right)$.
A statement like $\gamma(F,G)\le\gamma_0$, for some fixed (small) $\gamma_0$, would give a quantified approach to an approximate stochastic order. This index allows us to measure the level at which a restricted (usually to an interval) stochastic dominance holds, a concept already considered in Berger (1988), Lehmann and Rojo (1992) and Davidson and Duclos (2013). Let us also note the easy facts that $\gamma(F,G)=0$ if and only if $F\le_{st}G$ and that, for any pair of continuous d.f.’s which, at most, coincide on a denumerable set of points, the relation $\gamma(F,G)=1-\gamma(G,F)$ holds.
We recall that for finite populations, and thus for samples $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ coming from $F$ and $G$, the value of the index (4) is obtained by sorting both data sets in increasing order and counting the number of times (relative to the sample size) that an $X$ of given rank exceeds the $Y$ of the same rank. As explained in Example III.1 b) in Feller (1968) or in Hodges (1955), Galton used this procedure to answer in the affirmative a query of Charles Darwin on a data set composed of two samples of size 15 for which the order was reversed only twice. We note that this (rank-order Galton) statistic has been considered in the literature just to reject, for small enough values of the count, the null hypothesis that the treatment is without effect ($F=G$), in favour of the alternative that the treatment tends to increase the measurements. Therefore, until now, its use belongs to the class of procedures for testing equality in distribution against stochastic dominance. Probably this was the first use of a rank statistic documented in the literature, although Galton was not able to quantify his argument through a significance level. In fact, Chung and Feller (1949) (see also Sparre Andersen (1953) or Hodges (1955) for alternative proofs) showed that under the null ($F=G$), the number of times that an $X$ of given rank exceeds the $Y$ of the same rank is uniformly distributed on the set $\{0,1,\dots,n\}$, thus the p-value associated to Galton’s argument for the Darwin data should be 3/16, which is not as rare as Galton suspected.
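As a minimal sketch of the counting procedure just described, the following Python snippet sorts two equal-sized samples, counts the ranks at which the order is reversed, and evaluates the p-value under the uniform null distribution recalled above. The function names and the toy data are illustrative assumptions (they are not Darwin's measurements); only the configuration of 15 pairs with 2 reversals is taken from the text.

```python
from fractions import Fraction


def galton_count(x, y):
    """Number of ranks at which the i-th smallest x exceeds the i-th smallest y."""
    if len(x) != len(y):
        raise ValueError("both samples must have the same size")
    return sum(xi > yi for xi, yi in zip(sorted(x), sorted(y)))


def galton_p_value(k, n):
    """P(count <= k) when the count is uniform on {0, 1, ..., n} (null F = G)."""
    return Fraction(k + 1, n + 1)


# Hypothetical toy data, used only to exercise the function.
x = [1.2, 3.4, 2.2, 5.0]
y = [1.0, 3.9, 2.5, 6.1]
print(galton_count(x, y))      # 1 (only the smallest pair is reversed)
print(galton_p_value(2, 15))   # 3/16, the value quoted for Darwin's data
```

Dividing the count by the common sample size gives the sample version of the index (4).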
2.2 The independent sampling approach
Assume now that $(X,Y)$ also have the same marginal distributions $F$ and $G$, but that they are independent random variables. Under this joint structure, associated to the copula $C(u,v)=uv$, the relation $P(X\le Y)=1$ would be too extreme, because it demands that for some value $x_0$ the support of $X$ is contained in $(-\infty,x_0]$ while the support of $Y$ is contained in $[x_0,\infty)$. However, in order to compare, say, treatments, it would also be very informative to know that
(5) 
is very small. In fact, this would mean that with large probability, treatment will produce better results than treatment when used on independent samples of patients. For the parameters already considered in Subsection 2.1, while
Notice that with equality if or is continuous. Moreover, implies that but the opposite fails.
The concept of ‘stochastic precedence’ of to (noted ) introduced in Arcones et al. (2002) corresponds to the case , leading to a weaker relation than the stochastic order. In fact, if then we see that
where is an independent copy of . Of course the value 1/2 can be considered as a maximal value of to guarantee some advantage of over in the sense considered in (5), but lower values of would confirm a larger guarantee of improvement.
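A plug-in version of the probability in (5) can be sketched as follows, assuming two independent samples and replacing both d.f.’s by their empirical counterparts; this is the Mann-Whitney form of the statistic. The function name and the data are hypothetical and serve only as an illustration.

```python
def prob_x_exceeds_y(x, y):
    """Fraction of pairs (x_i, y_j) with x_i > y_j, a plug-in estimate of P(X > Y)."""
    wins = sum(xi > yj for xi in x for yj in y)
    return wins / (len(x) * len(y))


# Hypothetical samples of different sizes.
x = [1, 4, 6]
y = [2, 3, 5, 7]
estimate = prob_x_exceeds_y(x, y)
print(estimate)          # 5 of the 12 pairs have x_i > y_j
print(estimate <= 0.5)   # True: compatible with stochastic precedence
```

The double loop costs a number of comparisons equal to the product of the sample sizes; for large samples a rank-based computation would be preferable, but the direct count keeps the link with (5) transparent.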
Arcones et al. (2002) mention, as a convenient feature of stochastic precedence, that it holds for normal distributions whenever their means satisfy the corresponding order. It is easy to show that this generalizes to other situations involving location-scale families:
Proposition 2.1
Let be two random variables whose distributions are symmetrical w.r.t. zero. Let be their d.f.’s which we assume strictly increasing on for some . Let and and let be respectively the d.f.’s of and . Then, if and only if .
Proof: W.l.o.g. we can assume that are independent. We have , hence
Thus, is always true. If we take , then obviously In fact, if the reverse inequality holds and under the symmetry hypothesis, it should happen for any what is impossible by assumption.
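For the normal case mentioned before Proposition 2.1 the probability in (5) has a closed form, since the difference of two independent normal variables is again normal. The sketch below (function name and parameter values are assumptions chosen for illustration) evaluates it and shows that it stays at or below 1/2 exactly when the means are ordered, whatever the standard deviations.

```python
import math


def normal_prob_x_exceeds_y(mu_x, sigma_x, mu_y, sigma_y):
    """P(X > Y) for independent X ~ N(mu_x, sigma_x^2) and Y ~ N(mu_y, sigma_y^2)."""
    # X - Y is normal with mean mu_x - mu_y and variance sigma_x^2 + sigma_y^2,
    # so P(X - Y > 0) is the standard normal d.f. evaluated at z.
    z = (mu_x - mu_y) / math.hypot(sigma_x, sigma_y)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_equal = normal_prob_x_exceeds_y(0.0, 1.0, 0.0, 3.0)   # equal means
p_lower = normal_prob_x_exceeds_y(0.0, 5.0, 0.1, 0.2)   # mu_x < mu_y
p_higher = normal_prob_x_exceeds_y(0.1, 0.2, 0.0, 5.0)  # mu_x > mu_y
print(p_equal, p_lower, p_higher)
```

With very different standard deviations the probability can be arbitrarily close to 1/2 even though the means are ordered, which illustrates why stochastic precedence alone carries limited information.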
Thus, although stochastic precedence is an appealing characterization, it seems too loose a condition to describe stochastic dominance, while a quantification like that provided by the index (5) could improve the information.
2.3 The contamination approach
Let us also consider this alternative approach, developed in Álvarez-Esteban et al. (2016). Note that there always exists some $\pi\in[0,1]$ allowing decompositions of $F$ and $G$ of the form
(6) $F=(1-\pi)\tilde F+\pi H_F,\qquad G=(1-\pi)\tilde G+\pi H_G,\qquad$ for some d.f.’s $\tilde F,\tilde G,H_F,H_G$ with $\tilde F\le_{st}\tilde G$.
We can interpret these mixture decompositions in terms of a two-stage random generation in which a Bernoulli trial chooses, with probability $1-\pi$, the distributions that effectively satisfy the stochastic order. Therefore, if such a $\pi$ is small enough, we could say that the greater part of the distribution $G$ dominates that of $F$. Hence the lowest $\pi$ compatible with such decompositions,
(7) $\pi(F,G):=\min\{\pi\in[0,1]:\ F \text{ and } G \text{ satisfy (6)}\}$,
can be considered as a level of disagreement with the stochastic order.
Fortunately the index defined through (7) can be easily characterized by resorting to the intrinsic relation between trimmings and contamination mixtures pointed out e.g. in Proposition 2.1 in Álvarez-Esteban et al. (2011). For that purpose it is necessary to introduce a general version of trimming which allows partial trimming of data. In the sample setting, for a fixed , trimming a data set at the level consists in giving a weight function, , on that satisfies
where is the remaining proportion of after trimming. Each trimming has an associated trimmed probability: . For general probability measures the generalization is simple. The probability is a trimming of the probability , a fact denoted by if there exists a function satisfying a.s. and such that
We now include the link between the contamination model and trimmings, as well as some relevant facts in our present setting (see Propositions 2.3 and 2.4 in Álvarez-Esteban et al. (2016) for details and additional discussion).
Proposition 2.2
Let be probability distributions on with d.f.’s and , respectively and . Also define the d.f.’s
Then, is equivalent to each of the following statements:

for every for some d.f. .

.
Statement b) implies that the set of the trimmed versions of a given distribution on has a minimum and a maximum with respect to the stochastic order, just characterized by and . These d.f.’s are respectively obtained by trimming at level just on the right (resp. left) tail the probability with d.f. . From this, it is easy to show that the decompositions (6) hold for if and only if , and this holds if and only if . This leads to the appealing characterization of ,
(8) $\pi(F,G)=\sup_{x\in\mathbb{R}}\,\bigl(G(x)-F(x)\bigr)$,
allowing to define a quantified approximation to the stochastic order, denoted , whenever . Returning to the example already considered in the preceding subsections, for this index we have , while
Obviously, if and only if . Also the relation holds, although strict inequality is the typical situation.
2.4 A unifying framework
Let us return to the initial problem, on two populations with the same number of individuals, for a simple comparison of the ways in which the different approaches measure the lack of stochastic dominance. Let $x_1,\dots,x_n$ and $y_1,\dots,y_n$ be the heights of the individuals belonging to two different populations.
To analyze a possible dominance of the height of population $y$ over that of population $x$, we can reorder the individuals of both populations according to their heights, leading to $x_{(1)}\le\dots\le x_{(n)}$ and $y_{(1)}\le\dots\le y_{(n)}$. In this way, stochastic dominance would be equivalent to $x_{(i)}\le y_{(i)}$ for every $i=1,\dots,n$, corresponding just to the comparison between the individuals with identical height rank in each population.

If the order fails for, say, $k$ of the $n$ equally ranked pairs, the quantile-index approach would report the value $k/n$ as the measure of lack of dominance. In the two-sample setting this would be the value of Galton’s statistic.

Although it is not obvious from the definition (7) of the contamination index, by using the characterization after Proposition 2.2 (leading to (8)), to compute that index we would consider the smallest value, say $k$, such that, after deleting the $k$ greatest ranked individuals in the population to be dominated and the $k$ lowest ranked in the dominating one, the remaining subsets verify stochastic dominance (thus the $i$-th smallest remaining height in the former does not exceed the $i$-th smallest in the latter, for every $i=1,\dots,n-k$); the index is then $k/n$. Also, recalling characterization (8), in the two-sample setting this would be the value of the one-sided Kolmogorov-Smirnov statistic.
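The two descriptions of the contamination index for finite populations (the one-sided Kolmogorov-Smirnov value and the trimming of extreme individuals) can be compared numerically. The following sketch assumes two equal-sized populations; the function names and the data are hypothetical.

```python
from fractions import Fraction


def one_sided_ks(x, y):
    """sup over t of G_m(t) - F_n(t), with F_n, G_m the empirical d.f.'s of x and y."""
    n, m = len(x), len(y)
    best = Fraction(0)  # the difference is 0 to the left of all the data
    for t in sorted(set(x) | set(y)):  # the step functions only change at data points
        f = Fraction(sum(xi <= t for xi in x), n)
        g = Fraction(sum(yj <= t for yj in y), m)
        best = max(best, g - f)
    return best


def dominates_after_trimming(x, y, k):
    """Drop the k largest x and the k smallest y (equal sizes), then compare equal ranks."""
    xs, ys = sorted(x)[: len(x) - k], sorted(y)[k:]
    return all(a <= b for a, b in zip(xs, ys))


# Hypothetical populations of size 4.
x = [1, 4, 6, 9]
y = [2, 3, 5, 10]
print(one_sided_ks(x, y))                 # 1/4
print(dominates_after_trimming(x, y, 0))  # False: the ranked pairs (4, 3) and (6, 5) fail
print(dominates_after_trimming(x, y, 1))  # True: one deletion on each side suffices
```

In this toy example the smallest admissible number of deletions is 1 out of 4, which agrees with the one-sided Kolmogorov-Smirnov value 1/4, while the rank-by-rank comparison without trimming reports 2 reversals out of 4.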
Now, it is time for their relations and similarities. In spite of their different meanings, the indices and share a common principle that can be generalized in the following way. If is any bivariate random vector with marginal distribution functions and , the laws of and can be decomposed as:
(9) 
Of course, if (resp. 1) we would have stochastic order (resp. ). In general, if regardless the joint distribution of the relations
hold for all . Therefore the conditional laws satisfy the stochastic order relations
that embedded in (2.4) result in a decomposition of and (that depends of the joint law ) as
(10) 
As a first byproduct, from (7) we conclude that independently from the chosen representation. Particularizing for the pair given by the quantile functions , takes the value , and the d.f.’s and (resp. and ) are the d.f.’s of the quantile function (resp. ) conditioned to the subsets and of In particular
Recall that for independent random variables and with respective d.f.’s and , leading to (10) with , and and (resp. and ) being the d.f.’s of the first (resp. second) coordinate conditioned to the halfspaces and of equipped with the (product) probability associated to the d.f. on .
Other decompositions based on different dependence structures may be of some interest, but instead let us focus on the problem of searching for a pair , if it exists, that minimizes in the decompositions (10). This would result in the new suggestive index
(11) 
that, taking into account that is a lower bound for any satisfying (10), verifies
(12) 
Now recall that the quantile functions are just explicit realizations of Strassen’s theorem on the existence of stochastic representations of the stochastic order (see e.g. Lindvall (1999)). In such a context (i.e., when $F\le_{st}G$), Strassen’s theorem states that there exists a pair $(X,Y)$ of random variables with marginal d.f.’s $F$ and $G$ which minimizes $P(X>Y)$, and we know that $(F^{-1},G^{-1})$ gives such a pair. Therefore the inequalities in (12) invite us to raise the following one-sided Strassen coupling problems:
a) Is the minimum in the definition of the index (11) attained?
b) In case of a positive answer, which is the dependence structure on $(X,Y)$ that yields that minimum?
To answer these questions the following new characterization of the approximate stochastic order in terms of quantile representations will be useful. We want to remark the interplay with the way to obtain the index for the finite populations, explained in item iii) at the beginning of this subsection.
Proposition 2.3
For d.f.’s and we have if and only if
(13) 
Proof: We keep the notation introduced in Proposition 2.2, denoting by and , the minimal and maximal trimmings, respectively, of .
A simple computation shows that the associated quantile functions are
As a consequence, the quantile function of any d.f. in satisfies
thus the characterizations following Proposition 2.2 lead to if and only if
or, equivalently, if and only if (13) holds.
Now, returning to items a) and b) above, first note that under the stochastic order we would have thus both inequalities in (12) would be equalities. On the other hand, if , then . Thus, by Proposition 2.3, holds for every . We introduce the following rearrangement, , of the quantile function ,
(14) 
It is easy to see that (seen as a r.v. defined on with probability given by Lebesgue measure) the d.f. of is also . Our construction guarantees that for every . Therefore , hence by definition of and the first inequality in (12),
This shows that providing an alternative characterization of and a representation for which the minimum is attained. We summarize all these facts in the following proposition.
Proposition 2.4
In the light of the last result we can revisit the quantile and contamination indices, resorting to the example of comparison of normal distributions already handled. According to the ideas supporting the first index, its value shows that a random generator based on the quantile functions would reverse the order in nearly 5% of the draws, a nice description of the existing high level of dominance. On the other hand, the value of the second index shows that it is possible to create a suitable random generator able to reduce the proportion above to 1.5% of the draws, but that it is impossible to create another one able to improve this rate. Of course this “smart generator” needs previous knowledge of the pair of distributions to be built, because it cannot handle both distributions in the same way, while the “naive generator”, associated to the quantile functions, gives identical treatment to every distribution, as a universal representation would demand. In contrast with the meaning of these indices, the scope of the index (5) relies on the comparison of physical samples, giving additional and valuable information about the dominance.
The construction of allows to consider a copula associated to the index . It is straightforward to obtain that the dependence structure obtained between and has the associated copula
(15) 
Of course, to obtain the right copula for particular d.f.’s we need to know the value of the index. Also note that (15) gives a general expression that allows us to incorporate that index into our setting. However, given $F$ and $G$, it is often possible to find more natural couplings that achieve the same result. For a better understanding of this fact, we show in Figure 3 the coupling associated through (15) to our example in Figure 2, to be compared with the coupling shown there. Note that the last bars correspond to the comparison between the tallest woman and the shortest man.
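The rearrangement (14) has a simple finite-population analogue, sketched below under the assumption of two equal-sized populations: with $k/n$ the one-sided Kolmogorov-Smirnov value, pair the $i$-th smallest element of the first population with the $(i+k)$-th smallest of the second, and the $k$ largest elements of the first with the $k$ smallest of the second. The data and the function names are hypothetical; the brute-force search over all pairings is included only to check, on this toy example, that no coupling does better.

```python
from fractions import Fraction
from itertools import permutations


def one_sided_ks(x, y):
    """sup over t of G_n(t) - F_n(t) for the empirical d.f.'s of x and y."""
    n, m = len(x), len(y)
    best = Fraction(0)
    for t in sorted(set(x) | set(y)):
        f = Fraction(sum(xi <= t for xi in x), n)
        g = Fraction(sum(yj <= t for yj in y), m)
        best = max(best, g - f)
    return best


def reversals(pairs):
    """Number of pairs (a, b) in which the order a <= b fails."""
    return sum(a > b for a, b in pairs)


def shifted_coupling(x, y):
    """Finite analogue of the rearranged quantile coupling (equal sizes assumed)."""
    n = len(x)
    k = int(one_sided_ks(x, y) * n)  # the index times n is an integer here
    xs, ys = sorted(x), sorted(y)
    # x_(i) is paired with y_(i+k); the k largest x meet the k smallest y.
    return list(zip(xs[: n - k], ys[k:])) + list(zip(xs[n - k:], ys[:k]))


x = [1, 4, 6, 9]
y = [2, 3, 5, 10]
quantile_pairs = list(zip(sorted(x), sorted(y)))
best_by_search = min(reversals(zip(x, perm)) for perm in permutations(y))

print(reversals(quantile_pairs))          # 2 reversals with the rank-by-rank coupling
print(reversals(shifted_coupling(x, y)))  # 1 reversal with the shifted coupling
print(best_by_search)                     # 1: no pairing of these data does better
```

On these data the three quantities in (12) are 1/4, 1/4 and 2/4: the shifted coupling attains the lower bound, while the quantile coupling does not.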
Remark 2.4.1
An important property of stochastic dominance is that it is invariant w.r.t. monotone functions, in the sense that if and is a nondecreasing function, then . In this sense, invariance of an index measuring departures from the stochastic order seems highly desirable. Since for every strictly increasing function , this invariance is obviously shared by all the indices that can be represented as for some pair with marginal d.f.’s and . Therefore this invariance property holds for all the indices considered in this section. Notice that, except for linear functions, this property is not satisfied by the index
(16) 
naturally associated to the almost stochastic dominance approach of Leshno and Levy (2002). In particular, as already noted in the introduction, this shows that does not admit a representation.
Remark 2.4.2
Since we are mainly interested in using indices able to evaluate small departures from stochastic dominance, a natural property to demand of any index is that it be zero whenever stochastic dominance holds. This property is shared by and (and even by as defined in (16)), but it is not satisfied by .
Finally let us notice that, taking we obtain
giving an upper bound for the probability and the least favourable decomposition (looking for stochastic dominance of over ) in the way considered in (10). Recall also that and can be simultaneously small, when and are “almost equivalent”, which would lead to a large value for . In fact the index , allowing universal representations, plus the indices and give a sufficiently complete picture of the possibilities of generating samples which individually satisfy the prescribed order up to the fixed proportion. However, when a sampling plan is mandatory, other indices related to that plan, such as those based on the independence of the samples, could also be useful.
3 Testing the levels of dominance
Throughout this section and will be independent samples of i.i.d. random variables such that and have respective d.f.’s and . Also, by and we will denote the respective sample distribution functions based on the and samples.
The Mann-Whitney version of the Wilcoxon statistic (Mann and Whitney (1947)):
yields a natural estimator for through . In fact, since , the estimator is just the plug-in version of :
This estimator has been widely analyzed in the statistical literature since the early 1950s (see Birnbaum (1965) and references therein, Govindarajulu (1968), Yu and Govindarajulu (1995)). Chapter 5 in Kotz et al. (2003) is mainly devoted to describing the asymptotic properties of and to deriving asymptotic confidence intervals and bounds for based on that asymptotic normality. Therefore we do not pursue this topic here.
Characterization (8) of also suggests considering the plug-in version,
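As a minimal sketch of the plug-in estimator just described, the Mann-Whitney count can be normalized by the number of pairs. The orientation used here (counting pairs with `x[i] > y[j]`) and the function name `mann_whitney_proportion` are assumptions made for illustration; swap the arguments if the opposite orientation is intended.

```python
import numpy as np

def mann_whitney_proportion(x, y):
    """Fraction of pairs (i, j) with x[i] > y[j] (ties count as 0)."""
    x = np.asarray(x, dtype=float)
    y = np.sort(np.asarray(y, dtype=float))
    # For each x[i], side="left" counts the y-values strictly below it
    counts = np.searchsorted(y, x, side="left")
    return counts.sum() / (x.size * y.size)
```

Sorting one sample and using a binary search keeps the cost at O((n + m) log m) instead of the O(nm) of a direct double loop.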
(17) 
as an estimator of the index . This is widely known as the one-sided Kolmogorov-Smirnov statistic, which plays an important role in nonparametric goodness of fit and also in testing stochastic dominance (see e.g. McFadden (1989), Barrett and Donald (2003), Linton et al. (2005)), although mainly in the context of testing vs .
The asymptotic distribution of (17) under the hypothesis was already obtained by Smirnov in the late 1930s, while Raghavachari (1973) treated the general case, which we state below:
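A one-sided Kolmogorov-Smirnov statistic of this kind can be computed exactly from the two samples. The sketch below assumes the orientation "supremum of the first empirical d.f. minus the second"; the function name `one_sided_ks` is hypothetical, and the arguments should be swapped if the other orientation is needed.

```python
import numpy as np

def one_sided_ks(x, y):
    """sup_t (F_n(t) - G_m(t)) for the empirical d.f.'s F_n of x and G_m of y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Both empirical d.f.'s are right-continuous step functions that only
    # change at sample points, so it suffices to evaluate them there
    grid = np.concatenate([x, y])
    f_n = np.searchsorted(x, grid, side="right") / x.size
    g_m = np.searchsorted(y, grid, side="right") / y.size
    # The difference is 0 to the left of all observations, hence the floor at 0
    return max(0.0, float(np.max(f_n - g_m)))
```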
Proposition 3.1 (Raghavachari (1973))
Let and be continuous, and in such a way that . If we denote and and are independent Brownian bridges on , then
(18) 
A general result including a bootstrap version has also been given in Álvarez-Esteban et al. (2014). That paper also includes the details showing that the limit law in (18) has quantiles that can be suitably bounded below by normal quantiles and above by quantiles of the law of:
Moreover, it contains a useful expression that makes the computation of its quantiles through numerical integration feasible.
These facts provide the basis for statistical inference on . In particular, uniformly exponentially consistent tests have been obtained even for the more interesting problem vs , thus allowing statistical assessment of almost stochastic dominance like when the null is rejected at the fixed level. We refer to Álvarez-Esteban et al. (2016) and its Electronic Supplementary Material for details and extensive simulations showing the sample performance of the tests and confidence bounds relative to this index .
In contrast, to the best of our knowledge, the apparently simplest index among those considered in Section 2, , has been previously analyzed only for d.f.’s with just one crossing point (as usually holds in a location-scatter family) and in the homogeneous case. An intermediate case, where not necessarily but , was treated in Gross and Holland (1968). Therefore, we will give some pertinent theory and some discussion to complete the (in fact, complex) picture and to describe the limitations of this index.
It is intuitively obvious that for d.f.’s that coincide on some interval, or even that cross each other infinitely many times, cannot be consistently estimated by the plug-in version on the basis of finite samples. This was shown for equal-sized samples in Gross and Holland (1968). We will solve the problem on the basis of a remarkable theorem of Lévy (1939) (see Theorem 5.2 in Section 5). In that section we will give a full picture of the possible asymptotic behaviours of the plug-in estimator. In particular, from Lemma 5.4 we trivially obtain the a.s. consistency under the most general possible assumption, which is stated in the following theorem. In Section 5, we will show that the condition is also necessary for consistency (see Theorem 5.8).
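To make the plug-in version concrete, the following sketch computes the length of the set of levels in (0, 1) on which one empirical quantile function lies strictly above the other, which is how the abstract describes the Galton statistic. The orientation (first sample above the second), the function name `galton_plugin` and the left-continuous convention for empirical quantiles are assumptions made for illustration.

```python
import numpy as np

def galton_plugin(x, y):
    """Length of {t in (0,1): F_n^{-1}(t) > G_m^{-1}(t)} for samples x and y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = x.size, y.size
    # Both empirical quantile functions are constant between consecutive
    # breakpoints of the merged grid {i/n} U {j/m}
    breaks = np.union1d(np.arange(n + 1) / n, np.arange(m + 1) / m)
    mids = (breaks[:-1] + breaks[1:]) / 2
    widths = np.diff(breaks)
    # For t in ((i-1)/n, i/n) the empirical quantile is the i-th order statistic
    qx = x[np.minimum((mids * n).astype(int), n - 1)]
    qy = y[np.minimum((mids * m).astype(int), m - 1)]
    return float(widths[qx > qy].sum())
```

For equal sample sizes this reduces to the proportion of indices at which the order statistic of the first sample exceeds the corresponding order statistic of the second, that is, the classical Galton rank order count divided by the sample size.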
Theorem 3.2
Let , be such that . Then the plug-in estimator is a.s. consistent for :
Moreover, with statistical applications in mind, and adopting a realistic point of view, we state below a simple version of our general asymptotic law result (see Theorem 5.11) covering only the case of d.f.’s with at most a finite number of ‘clean’ crossings.
Theorem 3.3
Assume that and are supported in (possibly unbounded) intervals where they have continuous, positive densities and . Assume further that in such a way that , and

There exist such that is constant in , , with opposite signs in consecutive intervals (in particular , ) and for .

There exist and and such that for every

For some
Then the plug-in estimator satisfies:
(19) 
where are independent Brownian bridges on .
Although the limit expression in (19) is cumbersome, it simply defines a centered normal law. Therefore, if is an estimate of the standard deviation of , then we can define asymptotic upper and lower confidence bounds for , based on the sample distribution functions , respectively by
where is the d.f. of the standard normal law. In view of the theory developed above, it is clear that, given any , rejecting in favour of at level is equivalent to checking that .
Moreover, for large samples, approximate normality of the estimator justifies the use of the bootstrap to obtain as follows: for samples and , we compute . Bootstrapping with the same sizes as the original samples, we obtain bootstrap d.f.’s , and compute . This set of estimates leads to the bootstrap estimate of the variance of given by .
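The bootstrap scheme and the normal-approximation bounds can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `statistic` stands for any two-sample plug-in estimator taking values in [0, 1], the bounds are taken as the estimate plus or minus a standard normal quantile times the bootstrap standard deviation, and the clipping to [0, 1], the default `n_boot` and the seed are choices made here.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_bounds(x, y, statistic, alpha=0.05, n_boot=1000, seed=0):
    """Bootstrap s.d. of statistic(x, y) and one-sided normal confidence bounds."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    estimate = statistic(x, y)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each sample separately, keeping the original sizes
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        reps[b] = statistic(xb, yb)
    sd = float(reps.std(ddof=1))           # bootstrap standard deviation
    z = NormalDist().inv_cdf(1 - alpha)    # standard normal quantile
    lower = max(0.0, estimate - z * sd)    # clipping to [0, 1] is our choice
    upper = min(1.0, estimate + z * sd)
    return estimate, sd, lower, upper

# Usage with an illustrative statistic: proportion of pairs with x[i] > y[j]
stat = lambda a, b: float(np.mean(a[:, None] > b[None, :]))
est, sd, lo, hi = bootstrap_bounds(np.arange(20.0), np.arange(20.0) + 0.5,
                                   stat, n_boot=200)
```

Under this sketch, a null hypothesis that the index exceeds a threshold would be rejected at level `alpha` when the upper bound `hi` falls below that threshold.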
Remark 3.3.1
Some caution is advised regarding the possibility of a large asymptotic variance for d.f.’s that cross at points where the densities are very close, a fact that would lead to very unstable estimates of . The index seems appropriate for applications involving a very limited number of crossings, as happens in restricted stochastic dominance, where the stochastic order is expected to hold on a wide interval, perhaps excluding part of one or both tails (see Davidson and Duclos (2013)). The index would give a measure of the relative importance of the considered interval in both populations. In our simulations, a bootstrap estimation of the variance of