JavaNPST: Nonparametric Statistical Tests in Java
Abstract
Nonparametric statistical tests are useful procedures that can be applied in a wide range of situations, such as testing randomness or goodness of fit, one-sample, two-sample and multiple-sample analysis, association between bivariate samples, or count data analysis. They are often preferred to parametric tests because they require less restrictive assumptions about the sampled population.
In this work, JavaNPST, an open source Java library implementing 40 nonparametric statistical tests, is presented. It can be helpful for programmers and practitioners interested in performing nonparametric statistical analyses, providing a quick and easy way of running these tests directly within any Java code. Some examples of use are also shown, highlighting some of the more remarkable capabilities of the library.
Keywords: Nonparametric tests, nonparametric inference, Java library, Java, open source
1 Introduction
Nonparametric statistical tests [1, 2] comprise a class of hypothesis testing procedures in which the null hypothesis is not a statement about parameter values. Instead, the hypotheses are usually concerned with the probability distribution of the sample data used in the test or with the form of the population. Generally speaking, most of them can be considered distribution-free procedures, in the sense that the distribution of the random variables involved does not depend on the specific distribution function of the population from which the testing sample was drawn [3].
A notable number of problems can be tackled with these procedures. Within the field of nonparametric statistical inference, a practitioner can find tasks such as testing for randomness or goodness of fit, locating the median or some other quantile of a particular distribution, comparing two samples in terms of the location of their medians or their scale, testing the equality of multiple independent samples, or analyzing count data.
Interest in this topic is widespread: for example, a quick search on www.amazon.com with the query "nonparametric statistical tests" would return more than 800 different books, and the same search performed in Google Scholar would retrieve more than 268,000 different documents indexed on the Web. From statisticians developing new procedures and analyzing their features to practitioners in a very broad range of fields, nonparametric statistical tests have attracted the attention of the research community since the beginning of the 20th century.
Currently, many statistical software suites such as SPSS [4], SAS [5], Minitab [6] or StatXact [7, 8] include some of these procedures. Many smaller software packages implementing some of the most popular methods can be found on the Web, in programming languages such as C, C++, Java, Fortran 90, R, Matlab or Mathematica. However, to the best of our knowledge, no single software package implements a complete set of nonparametric statistical tests ready to be used in most situations. Practitioners who wish to employ nonparametric tests in their applications often need to search through very different sources until they obtain the specific set of tests needed, or to rely on the tests implemented in other software suites, outside their own software projects.
In this paper we present JavaNPST, a Java library of nonparametric statistical tests that is suitable for practitioners in most of these situations. It is an open source library (freely available at http://sci2s.ugr.es/software/javanpst/) featuring 40 different nonparametric tests. They are classified into 10 families of methods, each one oriented to a particular kind of problem. As a Java library, it can easily be integrated into any Java software project without requiring a deep understanding of the language, and it can be used under any operating system able to run the Java Virtual Machine. Moreover, given the popularity of Java, many solutions for translating or running Java code in other environments have been developed, such as rJava in R [9], which increases the accessibility of our library to interested practitioners. Therefore, it should not be difficult to integrate it into existing software projects, even those developed in a different programming language.
The design of the library offers a simple interface to the user. A homogeneous set of methods, common to every procedure, allows the user to define the necessary tests, set them up, run them, and obtain complete results, including test statistics and computed p-values. This property, together with the generality of the problems covered by the 10 families, makes JavaNPST a suitable tool in many different application fields, and even for teachers in introductory courses on nonparametric statistical inference, as a resource that lets students experiment with the tests by themselves.
The rest of this paper is organized as follows: Section 2 surveys the 10 families of nonparametric tests considered in the library. Section 3 describes the structure and the main features of JavaNPST. Section 4 shows several cases of use in various application fields. Finally, Section 5 discusses the conclusions achieved.
2 Nonparametric statistical tests
Given their broad definition, a substantial number of tests have been developed for the many situations in which nonparametric statistical inference suits the problem at hand (mostly due to the nature of the data). A convenient way of giving a quick snapshot of the field is therefore to establish a taxonomy of the tests, classified by the kinds of problems that they tackle. In the development of JavaNPST, we have followed the taxonomy established in [3], where 10 families of methods are presented (Table 1 summarizes them):
Table 1: Nonparametric tests implemented in JavaNPST, grouped by family.

| Family | Test | Reference |
| --- | --- | --- |
| Tests of randomness | Number of Runs | [10] |
|  | Runs Up and Down | [11] |
|  | Runs Up and Down (Median) | [10] |
|  | Von Neumann | [12] |
| Tests of goodness of fit | Chi-Square test | [13] |
|  | Kolmogorov-Smirnov | [14] |
|  | Lilliefors | [15] |
|  | Anderson-Darling | [16] |
| One-sample and paired-samples | Confidence Quantile | [3] |
|  | Population Quantile | [3] |
|  | Sign test | [2] |
|  | Wilcoxon Signed-Ranks | [17] |
| Two-sample general procedures | Wald-Wolfowitz | [18] |
|  | Median test | [19] |
|  | Control Median | [20] |
|  | Kolmogorov-Smirnov | [14] |
| Location problem | Wilcoxon Rank-Sum | [17] |
|  | van der Waerden | [21] |
| Scale problem | David-Barton | [22] |
|  | Freund-Ansari-Bradley | [23] |
|  | Mood | [24] |
|  | Klotz | [25] |
|  | Siegel-Tukey | [26] |
|  | Sukhatme | [27] |
| Equality of independent samples | Extended Median test | [19] |
|  | Kruskal-Wallis | [28] |
|  | Jonckheere-Terpstra | [29] |
|  | Chakraborti-Desu | [30] |
| Association for bivariate samples | Kendall | [31] |
|  | Daniel Trend | [31] |
| Association in multiple classifications | Friedman | [32] |
|  | Page | [33] |
|  | Concordance Coefficient | [3] |
|  | Incomplete Concordance | [34] |
|  | Partial Correlation | [35] |
| Analysis of count data | Contingency Coefficient | [36] |
|  | Fisher’s exact test | [37] |
|  | McNemar | [38] |
|  | Multinomial Equality test | [3] |
|  | Ordered Equality test | [3] |

Tests of randomness: These tests are used to check randomness either in binary symbolic sequences (for example, XYXYXY) or in numerical sequences. The Number of Runs test can be applied to test the randomness of the former, whereas for numerical sequences a Runs Up and Down test, based either on previous values or on the median value of the sequence, can be used. Furthermore, the Von Neumann rank-based test can also be applied to numerical sequences.
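For binary sequences, the quantity behind the Number of Runs test can be illustrated with a short, self-contained sketch. It counts the runs and standardizes the count with the classical large-sample normal approximation. It is not JavaNPST's implementation, the class name RunsTestSketch is hypothetical, and the exact small-sample distribution tabulated in [10] is not reproduced.

```java
// Hypothetical sketch of the Number of Runs statistic for a two-symbol sequence
// (not JavaNPST's API). Uses the large-sample normal approximation only.
final class RunsTestSketch {

    // Counts maximal blocks of identical consecutive symbols, e.g. "XXYXX" has 3 runs.
    static int countRuns(String seq) {
        if (seq.isEmpty()) return 0;
        int runs = 1;
        for (int i = 1; i < seq.length(); i++) {
            if (seq.charAt(i) != seq.charAt(i - 1)) runs++;
        }
        return runs;
    }

    // z = (R - E[R]) / sqrt(Var[R]) under the hypothesis of randomness, with
    // E[R] = 2 n1 n2 / n + 1 and Var[R] = 2 n1 n2 (2 n1 n2 - n) / (n^2 (n - 1)).
    static double zStatistic(String seq) {
        char first = seq.charAt(0);
        int n1 = 0, n2 = 0;
        for (char c : seq.toCharArray()) {
            if (c == first) n1++; else n2++;
        }
        if (n2 == 0) throw new IllegalArgumentException("both symbols must occur");
        double n = n1 + n2;
        double product = 2.0 * n1 * n2;
        double mean = product / n + 1.0;
        double variance = product * (product - n) / (n * n * (n - 1.0));
        return (countRuns(seq) - mean) / Math.sqrt(variance);
    }
}
```

Too many runs (perfect alternation) yields a large positive z, and too few runs (long blocks) a large negative z; both point away from randomness.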

Tests of goodness of fit: JavaNPST implements the Chi-Square test for fitting data to discrete distributions. For continuous distributions, the Kolmogorov-Smirnov test is provided as an omnibus test that should be useful in most situations. Two more tests, Lilliefors and Anderson-Darling, can also be applied when the fit has to be tested against a normal or exponential distribution with unknown parameters.
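The Chi-Square goodness-of-fit statistic for discrete data can be sketched as follows. The class ChiSquareGofSketch is hypothetical and computes only Pearson's statistic; comparing it with a chi-square distribution with k − 1 degrees of freedom (fewer if parameters were estimated from the data) is left out.

```java
// Hypothetical sketch of Pearson's chi-square goodness-of-fit statistic
// (not JavaNPST's API).
final class ChiSquareGofSketch {

    // Returns sum_i (O_i - E_i)^2 / E_i, where E_i = n * p_i and n = sum_i O_i.
    static double statistic(long[] observed, double[] probs) {
        if (observed.length != probs.length) {
            throw new IllegalArgumentException("one probability per category is required");
        }
        long n = 0;
        for (long o : observed) n += o;
        double chi2 = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double expected = n * probs[i];
            double diff = observed[i] - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }
}
```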

One-sample and paired-samples procedures: These procedures are useful for verifying hypotheses related to a given quantile of the sample’s distribution (usually the median). The Confidence Quantile test is used to obtain a confidence interval for a specified quantile, whereas the Population Quantile test allows the user to test a hypothesis concerning a specific value of any quantile. For paired samples, the Sign test can be employed to test the location of the median of the population of differences. Finally, the Wilcoxon Signed-Ranks test is used in the same scenario, but it employs information about the relative magnitudes as well as the directions of the differences.
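The statistic of the Wilcoxon Signed-Ranks test can be sketched as the sum T+ of the ranks of |d_i| over the positive paired differences d_i = x_i − y_i. The class name is hypothetical, and the sketch follows the usual textbook conventions (zero differences dropped, tied absolute differences given their average rank) rather than any documented JavaNPST behaviour; no p-value is computed.

```java
import java.util.Arrays;

// Hypothetical sketch of the Wilcoxon signed-ranks statistic T+ (not JavaNPST's API).
final class SignedRankSketch {

    static double tPlus(double[] x, double[] y) {
        // Paired differences, dropping zeros.
        double[] d = new double[x.length];
        int m = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            if (diff != 0.0) d[m++] = diff;
        }
        final double[] dd = Arrays.copyOf(d, m);

        // Order the differences by absolute value.
        Integer[] idx = new Integer[m];
        for (int i = 0; i < m; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(Math.abs(dd[a]), Math.abs(dd[b])));

        // Walk over groups of tied |d|; positions i..j receive the average of ranks i+1..j+1.
        double tPlus = 0.0;
        int i = 0;
        while (i < m) {
            int j = i;
            while (j + 1 < m && Math.abs(dd[idx[j + 1]]) == Math.abs(dd[idx[i]])) j++;
            double avgRank = (i + j + 2) / 2.0;
            for (int k = i; k <= j; k++) {
                if (dd[idx[k]] > 0) tPlus += avgRank;
            }
            i = j + 1;
        }
        return tPlus;
    }
}
```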

Two-sample general procedures: These procedures are used to verify the equality of two populations without assuming any specific model. The Wald-Wolfowitz test maps the values of the ordered combined sample into a binary symbolic sequence and then applies a Number of Runs test, detecting differences if too few runs are found. The Median and Control Median tests highlight differences between populations using the median value of the samples. Finally, the Kolmogorov-Smirnov Two-Sample test can also be used to test equality under general assumptions.

Tests for the location problem: Tests in this category follow the location model

$$ F_Y(x) = F_X(x - \theta) \ \text{ for all } x, \qquad H_0: \theta = 0 \tag{1} $$

According to this model, the Wilcoxon Rank-Sum test provides a way to compare two samples and to estimate a confidence interval for the location parameter θ. The van der Waerden test also follows the location model, but uses inverse normal scores as weights when forming the linear rank statistic of the test.
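The Wilcoxon Rank-Sum statistic is the sum of the ranks of the X sample within the pooled sample. A minimal sketch, assuming midranks for ties and using the hypothetical class name RankSumSketch (not part of JavaNPST), follows; subtracting m(m + 1)/2, where m is the size of the X sample, gives the equivalent Mann-Whitney count.

```java
import java.util.Arrays;

// Hypothetical sketch of the Wilcoxon rank-sum statistic W (not JavaNPST's API).
final class RankSumSketch {

    // Sum of the (mid)ranks of sample x within the pooled sample x ∪ y.
    static double rankSum(double[] x, double[] y) {
        double[] sorted = new double[x.length + y.length];
        System.arraycopy(x, 0, sorted, 0, x.length);
        System.arraycopy(y, 0, sorted, x.length, y.length);
        Arrays.sort(sorted);
        double w = 0.0;
        for (double v : x) w += midrank(sorted, v);
        return w;
    }

    // Average of the 1-based positions occupied by v in the sorted array.
    static double midrank(double[] sorted, double v) {
        int first = -1, last = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) {
                if (first < 0) first = i;
                last = i;
            }
        }
        return (first + last + 2) / 2.0;
    }
}
```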

Tests for the scale problem: Analogously to the former category, the tests for the scale problem check for differences between the two distributions regarding the scale parameter θ:

$$ F_Y(x) = F_X(\theta x) \ \text{ for all } x, \ \theta > 0, \qquad H_0: \theta = 1 \tag{2} $$

Following this model, the David-Barton, Freund-Ansari-Bradley and Mood tests establish three different weighting schemes for building the linear rank statistics used to test the hypothesis. The Klotz test, on the other hand, uses weights based on the normal scores of the van der Waerden test, whereas the Siegel-Tukey test employs the Wilcoxon Rank-Sum test weights and the Sukhatme procedure tests the hypothesis with a Mann-Whitney-based statistic.
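Mood's statistic illustrates the linear rank statistics mentioned above: it sums the squared deviations of the X-sample ranks from the average rank (N + 1)/2, so a sample that is more dispersed than the other obtains a larger value. The sketch assumes continuous data without ties and uses the hypothetical class name MoodSketch; the mean and variance in the z score are the standard null moments of the statistic, not values taken from JavaNPST.

```java
import java.util.Arrays;

// Hypothetical sketch of Mood's scale statistic (not JavaNPST's API). Assumes no ties.
final class MoodSketch {

    // M = sum over x of (r_i - (N+1)/2)^2, r_i being the rank of x_i in the pooled sample.
    static double statistic(double[] x, double[] y) {
        int n = x.length + y.length;
        double[] pooled = new double[n];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        Arrays.sort(pooled);
        double centre = (n + 1) / 2.0;
        double sum = 0.0;
        for (double v : x) {
            double rank = Arrays.binarySearch(pooled, v) + 1;  // exact 1-based rank without ties
            sum += (rank - centre) * (rank - centre);
        }
        return sum;
    }

    // Large-sample z score with E[M] = m(N^2 - 1)/12 and Var[M] = mn(N + 1)(N^2 - 4)/180.
    static double z(double[] x, double[] y) {
        double m = x.length, n2 = y.length, total = m + n2;
        double mean = m * (total * total - 1) / 12.0;
        double variance = m * n2 * (total + 1) * (total * total - 4) / 180.0;
        return (statistic(x, y) - mean) / Math.sqrt(variance);
    }
}
```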

Tests of equality of independent samples: This family contains several methods that tackle the natural extension of the two-sample problem, namely the k-sample problem, whose null hypothesis states that all samples are drawn from identical populations, whereas the general alternative is simply that the populations differ in some way:

$$ H_0: F_1(x) = F_2(x) = \cdots = F_k(x) \ \text{ for all } x, \qquad H_1: F_i(x) \neq F_j(x) \ \text{ for some } i \neq j \text{ and some } x \tag{3} $$

The Extended Median test and the Kruskal-Wallis test belong to this category, as the natural extensions of the Median and Wilcoxon Rank-Sum tests, respectively.
Ordered alternatives concerning the location parameters, such as

$$ H_1: \theta_1 \le \theta_2 \le \cdots \le \theta_k, \ \text{ with at least one strict inequality,} \tag{4} $$

can be tested with the Jonckheere-Terpstra test. Finally, comparisons with a control (where the alternative hypothesis states that every treatment parameter is greater than or equal to that of the control, with at least one of the inequalities strict) can be performed with the Chakraborti-Desu test, the last procedure of this family.
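The first procedure of this family, the Kruskal-Wallis test, can be illustrated with a sketch of its H statistic computed from midranks of the pooled samples. The class name is hypothetical and no tie correction is applied; for moderately large samples, H is usually compared with a chi-square distribution with k − 1 degrees of freedom.

```java
import java.util.Arrays;

// Hypothetical sketch of the Kruskal-Wallis H statistic (not JavaNPST's API).
final class KruskalWallisSketch {

    // H = 12 / (N (N+1)) * sum_j R_j^2 / n_j - 3 (N+1), R_j = rank sum of sample j.
    static double h(double[][] samples) {
        int total = 0;
        for (double[] s : samples) total += s.length;
        double[] sorted = new double[total];
        int pos = 0;
        for (double[] s : samples) for (double v : s) sorted[pos++] = v;
        Arrays.sort(sorted);

        double sum = 0.0;
        for (double[] s : samples) {
            double rj = 0.0;
            for (double v : s) rj += midrank(sorted, v);
            sum += rj * rj / s.length;
        }
        return 12.0 / (total * (total + 1.0)) * sum - 3.0 * (total + 1.0);
    }

    // Average of the 1-based positions occupied by v in the sorted pooled sample.
    static double midrank(double[] sorted, double v) {
        int first = -1, last = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) {
                if (first < 0) first = i;
                last = i;
            }
        }
        return (first + last + 2) / 2.0;
    }
}
```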

Measures of association for bivariate samples: Two well-known measures of association for bivariate samples, together with their respective formulations of the hypothesis of association, are included in this category: Kendall’s τ coefficient (Kendall test) and Spearman’s rank correlation coefficient (Daniel Trend test).
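Kendall's coefficient can be sketched by counting concordant and discordant pairs. The sketch computes the variant without tie adjustment (often called tau-a); the class name is hypothetical, and JavaNPST may treat ties differently.

```java
// Hypothetical sketch of Kendall's tau-a (not JavaNPST's API).
final class KendallTauSketch {

    // (concordant - discordant) / (n (n - 1) / 2); pairs tied in x or y count as neither.
    static double tauA(double[] x, double[] y) {
        int n = x.length;
        long concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }
}
```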

Measures of association in multiple classifications: These procedures are the nonparametric analogs of the two-way analysis-of-variance problem, where the data cannot be considered a single random sample because of certain relationships among them, such as row and column effects. The general test to employ here is the Friedman test, whereas the Page test is used for testing ordered alternatives (with the same form as the alternative shown in Equation 4).
Other tests in this category measure the strength of the relationship between rankings. This measure, the coefficient of concordance, can be computed for complete samples (the Concordance Coefficient procedure) or for incomplete samples (the Incomplete Concordance procedure) arising from Youden-square or Latin-square experimental designs. Finally, the partial correlation between ranks (Partial Correlation test) can also be computed using Kendall’s τ.
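The Friedman statistic and the coefficient of concordance are closely related, which a short sketch makes explicit. Rows are blocks (for example, judges or data sets) and columns are treatments; each row is ranked separately with midranks, and, in the absence of ties, Kendall's coefficient of concordance equals W = S / (n(k − 1)). The class name is hypothetical and no tie correction is applied.

```java
// Hypothetical sketch of the Friedman statistic and Kendall's W (not JavaNPST's API).
final class FriedmanSketch {

    // S = 12 / (n k (k+1)) * sum_j R_j^2 - 3 n (k+1), where R_j sums the within-row
    // midranks of treatment j over the n rows. Assumes a rectangular, non-empty table.
    static double statistic(double[][] data) {
        int n = data.length, k = data[0].length;
        double[] rankSums = new double[k];
        for (double[] row : data) {
            for (int j = 0; j < k; j++) {
                int less = 0, equal = 0;                 // 'equal' includes row[j] itself
                for (double v : row) {
                    if (v < row[j]) less++;
                    else if (v == row[j]) equal++;
                }
                rankSums[j] += less + (equal + 1) / 2.0; // midrank within the row
            }
        }
        double sumSquares = 0.0;
        for (double r : rankSums) sumSquares += r * r;
        return 12.0 / (n * k * (k + 1.0)) * sumSquares - 3.0 * n * (k + 1.0);
    }

    // Kendall's coefficient of concordance, W = S / (n (k - 1)), valid without ties.
    static double concordance(double[][] data) {
        int n = data.length, k = data[0].length;
        return statistic(data) / (n * (k - 1.0));
    }
}
```

Perfect agreement among the rows gives W = 1, while rankings that cancel each other out give W = 0.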

Analysis of count data: Count data can also be analyzed in various ways using nonparametric procedures. Several coefficients measuring the association between the rows and columns of a contingency table can be computed (Contingency Coefficient procedure).
Fisher’s exact test assesses the significance of the association between the two classifications of a 2 × 2 contingency table. McNemar’s test for 2 × 2 contingency tables can be applied to determine whether the row and column marginal frequencies are equal.
Finally, the multinomial test of equality is used to test the equality of probabilities in multiple classifications. The alternative hypothesis may simply be that the probabilities are not equal (the Multinomial Equality test) or that they form an ordered alternative (the Ordered Equality test).
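Fisher's exact test for a 2 × 2 table can be sketched with the hypergeometric distribution of the table given its margins. The sketch uses the common two-sided convention of summing the probabilities of all tables no more likely than the observed one; other conventions exist, and the class name is hypothetical, so this should not be read as JavaNPST's implementation.

```java
// Hypothetical sketch of Fisher's exact test for [[a, b], [c, d]] (not JavaNPST's API).
final class FisherExactSketch {

    // log C(n, k) as a running sum of logs (adequate for small tables).
    static double logChoose(int n, int k) {
        double s = 0.0;
        for (int i = 1; i <= k; i++) s += Math.log(n - k + i) - Math.log(i);
        return s;
    }

    // Hypergeometric probability of the table given its row and column totals.
    static double tableProb(int a, int b, int c, int d) {
        int n = a + b + c + d;
        return Math.exp(logChoose(a + b, a) + logChoose(c + d, c) - logChoose(n, a + c));
    }

    // Two-sided p-value: sum over all tables with the same margins whose probability
    // does not exceed that of the observed table.
    static double twoSided(int a, int b, int c, int d) {
        int row1 = a + b, col1 = a + c, n = a + b + c + d;
        double observed = tableProb(a, b, c, d);
        int lo = Math.max(0, col1 - (n - row1)), hi = Math.min(row1, col1);
        double p = 0.0;
        for (int x = lo; x <= hi; x++) {
            double px = tableProb(x, row1 - x, col1 - x, n - row1 - col1 + x);
            if (px <= observed * (1 + 1e-7)) p += px;   // small tolerance for rounding
        }
        return Math.min(1.0, p);
    }
}
```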
3 The JavaNPST library
JavaNPST is a Java library featuring a wide collection of nonparametric tests, together with several definitions of data structures and numeric distributions needed to deploy and carry out the tests. The interfaces of the library’s public classes are restricted to the essentials, which facilitates its use inside high-level applications.
The library is built around three core packages:

Data (javanpst.data): Modeling the data structures of the library (sequences and tables).

Distributions (javanpst.distributions): Including classical discrete and continuous distributions, and distributions related to the tests.

Tests (javanpst.tests): Implementing the nonparametric tests of the library.
A fourth package, Utils (javanpst.utils), contains some common tools for the internal use of the library (functions for file input/output, formats, some mathematical operations, and so forth).
All these elements have been developed using an object-oriented style. Therefore, the user of JavaNPST can expect to find objects representing all the necessary pieces to perform an analysis. In this way, a typical use of the library will include the declaration of an object modeling the data to analyze (data sequences, or samples from various populations), the creation of another object representing the nonparametric test selected, and, finally, the evaluation of the test, obtaining an output report as a result.
The rest of this section is devoted to describing in depth the main packages: The Data package (Section 3.1), the Distributions package (Section 3.2) and the Tests package (Section 3.3). A more detailed description of the JavaNPST API, together with other resources and examples of use can be found at the web page of the project http://sci2s.ugr.es/software/javanpst/.
3.1 The Data package
Before running a test, data should be provided to the test object. Input data (usually composed of several samples drawn from different populations, or of sequences of numerical or string values) should be represented in a proper way before the analysis starts.
JavaNPST defines two main data structures for storing samples: Sequence and DataTable objects. The former is suitable when the sample to study is either a sequence of values or a sample drawn from a single population. The latter should be used when the data comprise more than one sample.
Sequences
Sequences can be instantiated as two different objects: NumericSequence (for numerical data) and StringSequence (for textual data). Both can easily be built from ArrayList objects.
A second option for initializing Sequence objects is to load data into them directly from a file. Currently, JavaNPST can read data stored in XML, CSV and TXT formats. For both kinds of sequences, the input format is the following:

XML:
<sequence>
<element>Value 1</element>
<element>Value 2</element>
...
<element>Value K</element>
...
<element>Value N</element>
</sequence>
CSV or TXT:
Value 1;Value 2;...;Value K;...;Value N
Once the data file has been properly formatted, new sequences can be created directly from it.
Data tables
The DataTable object can be used to store numerical data in tabular form. In JavaNPST, data tables are usually used to store several samples drawn from different populations, where the values in a column belong to the same sample.
As with sequences, data tables can easily be filled during their instantiation.
Data tables can also load data from XML, CSV or TXT files. The required format is shown below:

XML:
<tabular rows="#Rows" columns="#Columns">
<row><element>Value(1,1)</element>...<element>Value(1,n)</element></row>
...
<row><element>Value(m,1)</element>...<element>Value(m,n)</element></row>
</tabular>
CSV or TXT:
Value(1,1);...;Value(1,n)
...
Value(m,1);...;Value(m,n)
Again, loading data from a file into a DataTable object can be done in a single instruction.
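As an illustration of the CSV/TXT table format, the following sketch parses it into a row-by-column matrix and extracts a column as one sample. It is not JavaNPST's loader: the class and method names are hypothetical, and it assumes one table row per line, ';' as the only separator and no header row. A sequence file corresponds to the single-row case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the "v11;...;v1n" per-line table format (not JavaNPST's loader).
final class TableFormatSketch {

    // Parses the text into an m x n matrix, rejecting rows of different lengths.
    static double[][] parse(String text) {
        List<double[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) continue;           // tolerate blank lines
            String[] cells = line.split(";");
            double[] row = new double[cells.length];
            for (int j = 0; j < cells.length; j++) row[j] = Double.parseDouble(cells[j].trim());
            if (!rows.isEmpty() && row.length != rows.get(0).length) {
                throw new IllegalArgumentException(
                    "row " + (rows.size() + 1) + " has " + row.length + " values");
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    // Column j of the table, i.e. the j-th sample.
    static double[] column(double[][] table, int j) {
        double[] col = new double[table.length];
        for (int i = 0; i < table.length; i++) col[i] = table[i][j];
        return col;
    }
}
```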
Apart from constructors and methods for reading data, Sequence and DataTable objects provide the user with an interface offering basic functionality for obtaining and modifying values, writing contents to a file, and so forth.
The rest of the Data package contains several inner classes developed for writing and loading data, as well as specific data structures used for storing data belonging to test distributions.
3.2 The Distributions package
This package contains implementations of all the distributions used by the tests of the library. They can be classified as follows:

Common distributions:

Discrete distributions: BinomialDistribution, PoissonDistribution, ...

Continuous distributions: NormalDistribution, ChiSquareDistribution, ...

Test distributions: KolmogorovDistribution, KendallDistribution, ...
Usually, a programmer using JavaNPST should not need to declare or employ a distribution in isolation, since distributions are automatically created and configured as soon as each test requires them. However, direct access is provided, since it may sometimes be useful to work with some common distributions directly; for example, a normal distribution can be modeled by declaring a NormalDistribution object.
Once the distribution has been initialized, operations such as the computation of cumulative distribution values are straightforward.
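To illustrate what such a cumulative distribution computation involves, the following sketch evaluates the normal CDF with the polynomial approximation 26.2.17 of Abramowitz and Stegun (absolute error below 7.5 × 10⁻⁸). Whether JavaNPST's NormalDistribution uses this method is not stated here; the class NormalCdfSketch is hypothetical.

```java
// Hypothetical sketch of a normal CDF (not JavaNPST's NormalDistribution).
final class NormalCdfSketch {

    // Standard normal CDF via Abramowitz & Stegun 26.2.17.
    static double cdf(double z) {
        if (z < 0) return 1.0 - cdf(-z);                      // symmetry for negative arguments
        double t = 1.0 / (1.0 + 0.2316419 * z);
        double poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                    + t * (-1.821255978 + t * 1.330274429))));
        double density = Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - density * poly;
    }

    // CDF of N(mean, sd^2), obtained by standardizing x.
    static double cdf(double x, double mean, double sd) {
        return cdf((x - mean) / sd);
    }
}
```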
Test-related distributions, on the other hand, are used internally by those tests whose exact distribution is known and is stored inside the library as a table (Wilcoxon’s and Lilliefors’ distributions are representative examples). As they are managed internally, users should not need to access them in most cases.
3.3 The Tests package
Test objects are the main components on which JavaNPST is based. Given an appropriately formatted set of data, they allow the user to perform a test and obtain all the results of the inference process.
All the tests share a common interface that provides the user with basic functionality: the setData and doTest methods load the data to test and perform the inference, respectively, whereas the printReport method shows the full results of the process.
As an example, when the data have already been represented in a DataTable, a Median test can be performed using only these methods.
Moreover, in some scenarios the user may be interested in obtaining a single value of the inference (for instance, only the p-value of one tail) or the value of a test statistic. JavaNPST also allows this, offering direct access to every output value of each test.
No other procedures or operations are needed to perform a test and obtain its results. This simplicity is one of the key features of JavaNPST, making it very easy to integrate the tests into more advanced software projects.
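The common setData / doTest / printReport pattern can be illustrated with a self-contained sketch. TestSketch and SignTestSketch are hypothetical names, and their signatures (double arrays in, a report on standard output, plain getters for direct access to outputs) are our assumptions; JavaNPST's actual classes, constructors and method signatures are described in its JavaDoc and are not reproduced here. The procedure implemented is the exact two-sided Sign test for paired samples.

```java
import java.util.Locale;

// Hypothetical interface mirroring the setData / doTest / printReport pattern.
interface TestSketch {
    void setData(double[] x, double[] y);   // load the paired samples
    void doTest();                          // run the inference
    void printReport();                     // write a summary to standard output
}

// Exact two-sided Sign test for paired samples (hypothetical, not JavaNPST's class).
final class SignTestSketch implements TestSketch {
    private double[] x, y;
    private int plus, minus;
    private double pValue = Double.NaN;

    public void setData(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("paired samples must have equal length");
        this.x = x.clone();
        this.y = y.clone();
    }

    public void doTest() {
        plus = 0;
        minus = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] > y[i]) plus++;
            else if (x[i] < y[i]) minus++;          // ties are discarded
        }
        int n = plus + minus, k = Math.min(plus, minus);
        // p = 2 * P(K <= k), K ~ Binomial(n, 1/2). Terms are built iteratively; 0.5^n
        // underflows for n above about 1070, where a normal approximation would be used.
        double term = Math.pow(0.5, n), cdf = 0.0;
        for (int i = 0; i <= k; i++) {
            cdf += term;
            term = term * (n - i) / (i + 1);
        }
        pValue = Math.min(1.0, 2.0 * cdf);
    }

    public void printReport() {
        System.out.printf(Locale.ROOT, "Sign test: %d positive, %d negative, exact p-value = %.4f%n",
                plus, minus, pValue);
    }

    // Direct access to individual outputs.
    int getPlus() { return plus; }
    int getMinus() { return minus; }
    double getPValue() { return pValue; }
}
```

Applied to the 50 pairs of Table 2, where CHC obtains the higher value in 35 data sets and DROP3 in 15, this sketch gives an exact p-value of about 0.0066, in line with the Sign test result reported in Section 4.1.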
4 Using JavaNPST: Examples of use
JavaNPST is freely available at http://sci2s.ugr.es/software/javanpst/. On this website, users can find both the JAR file with the library and the API documentation in HTML (JavaDoc-based) format. Moreover, the source code is also offered, under the terms of the GNU General Public License, version 3 (GPL v3).
The source code itself is documented thoroughly, so practitioners should have no difficulty using the 40 tests of the library and its associated tools. In addition, the website provides a number of code samples illustrating the use of the tests (see http://sci2s.ugr.es/software/javanpst/tests.php).
In this section, we focus on two elaborate examples. Section 4.1 covers the use of nonparametric tests as a tool for contrasting comparisons of machine learning algorithms (reviewing an already published research experiment), and Section 4.2 shows an application of the Kolmogorov-Smirnov test of JavaNPST to optimizing a scoring model in credit risk management.
4.1 Nonparametric tests for comparing machine learning algorithms
In recent years, nonparametric tests have attracted the attention of numerous researchers in the machine learning community, where they are used as a tool for contrasting the results of experiments. In most cases they are preferred to parametric alternatives, such as the t-test, because the conditions required for the safe use of the latter (independence, normality and homoscedasticity [39]) cannot be guaranteed.
Table 2: Values used in the comparison between DROP3 and CHC.

| Data set | DROP3 | CHC | Data set | DROP3 | CHC |
| --- | --- | --- | --- | --- | --- |
| abalone | 0.7550 | 0.5517 | lrs | 0.8607 | 0.8211 |
| anneal | 0.8735 | 0.9255 | lymphography | 0.6687 | 0.8321 |
| audiology | 0.7255 | 0.8216 | newthyroid | 0.8727 | 0.8330 |
| autos | 0.5946 | 0.8297 | optdigits | 0.8997 | 0.5581 |
| balance | 0.7821 | 0.8723 | pageblocks | 0.9562 | 0.7740 |
| breastcancer | 0.8186 | 0.9178 | pendigits | 0.9468 | 0.5089 |
| cancer | 0.9362 | 0.9451 | phoneme | 0.8191 | 0.7262 |
| card | 0.7468 | 0.9163 | pima | 0.7775 | 0.8708 |
| dermatology | 0.8615 | 0.9021 | postoperative | 0.8790 | 0.7852 |
| ecoli | 0.8380 | 0.8673 | primaryt | 0.7369 | 0.8075 |
| gene | 0.6230 | 0.568 | promoters | 0.6042 | 0.8187 |
| german | 0.7441 | 0.8553 | satimage | 0.8737 | 0.5889 |
| glass | 0.7078 | 0.8471 | segment | 0.8648 | 0.9049 |
| glassg2 | 0.6783 | 0.8667 | sick | 0.9483 | 0.7957 |
| heart | 0.7889 | 0.9198 | sonar | 0.6670 | 0.8649 |
| heartc | 0.7937 | 0.8658 | soybean | 0.8132 | 0.9057 |
| hepatitis | 0.8521 | 0.9186 | texture | 0.9015 | 0.5130 |
| horse | 0.7872 | 0.9125 | tictactoe | 0.7316 | 0.8838 |
| hypothyroid | 0.9749 | 0.7834 | vehicle | 0.6808 | 0.8155 |
| ionosphere | 0.9070 | 0.8592 | vote | 0.9072 | 0.9265 |
| iris | 0.8348 | 0.9141 | vowel | 0.5741 | 0.8094 |
| KrvsKp | 0.7623 | 0.7926 | waveform | 0.7291 | 0.5044 |
| labor | 0.6308 | 0.8846 | wine | 0.8298 | 0.8801 |
| led24 | 0.5583 | 0.7894 | yeast | 0.7282 | 0.8374 |
| liver | 0.5871 | 0.9061 | zoo | 0.7681 | 0.8374 |
A representative example can be found in [40], where the Sign and Wilcoxon tests are used to contrast the results obtained when analyzing the behavior of several machine learning algorithms. Here we reproduce one of the comparisons performed, between the DROP3 and CHC methods, in terms of a specific performance measure (storage requirements).
Table 2 shows an adaptation of the values used for the comparison: each value has been transformed into 1 − (storage requirements), so that the performance measure is to be maximized, before applying the Wilcoxon and Sign tests.
Using JavaNPST, the Wilcoxon and Sign tests can be carried out in a few lines of code.
The results obtained exactly match those reported by the authors for the Sign test (p-value: 0.0066), and the p-value obtained for the Wilcoxon test (0.0718) is very close to theirs. These results corroborate the original study (the small difference is probably caused by the authors’ use of an asymptotic distribution of the Wilcoxon statistic instead of the exact one) and show how JavaNPST can be used to perform these tests quickly and easily.
4.2 Nonparametric tests in credit scoring improvement
Applications of nonparametric tests can also be found in the field of Economics. Specifically, in this section we show how the Kolmogorov-Smirnov Two-Sample test (and its associated statistic) can be used to evaluate the quality of a scoring process, and how JavaNPST can be employed to build a testing method for it.
Credit scoring [41] is the process by which the potential risk posed by lending money to consumers is evaluated, with the aim of mitigating losses due to bad debt. A great variety of techniques may be used to establish a scoring scale, which is then used to discriminate between “goods” (credits that are very likely to be repaid) and “bads” (those that will be lost). Hence, new applications can be assigned to one of these two classes depending on the score they achieve.
One way of analyzing the predictive power of a credit scoring system (for example, for credit cards) consists of using the Kolmogorov-Smirnov statistic to measure how far apart the cumulative distribution functions of the scores of the “goods” and the “bads” are. If the Kolmogorov-Smirnov statistic is high enough, both distributions will be well separated, and thus the predictive power of the scoring system will be good. A graphical example is shown in Figure 1, where the vertical line marks the maximum separation between both distributions, representing the Kolmogorov-Smirnov statistic.
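The Kolmogorov-Smirnov statistic used in this setting can be sketched by walking through the two sorted score samples and tracking the largest gap between their empirical cumulative distribution functions. The class name is hypothetical and the sketch returns only D (no p-value); the score at which the maximum occurs, marked by the vertical line in Figure 1, could be recorded in the same loop.

```java
import java.util.Arrays;

// Hypothetical sketch of the two-sample Kolmogorov-Smirnov statistic (not JavaNPST's K_STest).
final class KsTwoSampleSketch {

    // D = max_t |F_goods(t) - F_bads(t)|, evaluated at every distinct pooled score.
    static double statistic(double[] goods, double[] bads) {
        double[] a = goods.clone(), b = bads.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            double t = Math.min(a[i], b[j]);
            while (i < a.length && a[i] == t) i++;      // step both ECDFs past every copy of t
            while (j < b.length && b[j] == t) j++;
            d = Math.max(d, Math.abs((double) i / a.length - (double) j / b.length));
        }
        // Once one sample is exhausted its ECDF equals 1 and the gap can only shrink.
        return d;
    }
}
```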
The process of optimizing such a scoring model can be evaluated using JavaNPST. For example, given a SearchMethod able to generate several ScoreCard models (inside an optimization process), the classifications obtained by these models can be evaluated by a Kolmogorov-Smirnov Two-Sample test (K_STest), using the Kolmogorov-Smirnov statistic as a performance measure. When all the iterations of the SearchMethod have been carried out, the best model is reported, together with its associated statistic and the p-value of the difference between the distributions of “good” and “bad” credits.
This example illustrates an advantage of test libraries such as JavaNPST over software suites: partial results of the tests can easily be included inside a larger process, without developing complex procedures to extract the data, migrate them to a software suite, and send the results back to the native application.
5 Conclusions
In this article we have introduced JavaNPST, a software library featuring 40 nonparametric tests which can be applied in 10 different families of problems. Given the homogeneity of use of the tests, its easy integration and the wide range of problems covered, it may become a useful tool for practitioners in very different fields of science and research, providing them with an open source library ready to be used in any Java software project where a nonparametric statistical test is necessary.
Finally, we conclude this work by noting that, although the current version of JavaNPST provides tests that fit many situations, it remains open to extension: with the main infrastructure of data structures, distributions and tests already working, the inclusion of new procedures as needed should be straightforward, enabling the library to remain up to date.
Acknowledgments
This work was supported by the Spanish Ministry of Science and Technology under Research Project TIN2011-28488.
Footnotes
 Elements do not have to be separated by new lines
 Unless explicitly stated in the documentation of a test
 For more details, see the JavaDoc documentation of JavaNPST
References
[1] J. J. Higgins. Introduction to Modern Nonparametric Statistics. Duxbury Press, 2003.
[2] D. J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures, 5th edition. Chapman & Hall/CRC, 2011.
[3] J. D. Gibbons and S. Chakraborti. Nonparametric Statistical Inference, 5th edition. Chapman & Hall, 2010.
[4] SPSS software. http://www.spss.com/, 2015.
[5] SAS. http://www.sas.com/, 2015.
[6] Minitab. http://www.minitab.com/, 2015.
[7] StatXact. http://www.cytel.com/Software/StatXact.aspx.com/, 2015.
[8] C. R. Mehta. StatXact: A statistical package for exact nonparametric inference. The American Statistician, 45(1):74–75, 1991.
[9] rJava. http://www.rforge.net/rJava/, 2015.
[10] F. S. Swed and C. Eisenhart. Tables for testing the randomness of grouping in a sequence of alternatives. The Annals of Mathematical Statistics, 14:66–87, 1943.
[11] E. S. Edgington. Probability table for number of runs of signs of first differences. Journal of the American Statistical Association, 56:156–159, 1961.
[12] R. Bartels. The rank version of von Neumann’s ratio test for randomness. Journal of the American Statistical Association, 77:40–46, 1982.
[13] K. Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(5):157–175, 1900.
[14] N. V. Smirnov. Estimate of deviation between empirical distribution functions in two independent samples (in Russian). Bulletin of Moscow University, 2:3–16, 1939.
[15] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62:399–402, 1967.
[16] T. W. Anderson and D. A. Darling. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 23:193–212, 1952.
[17] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945.
[18] A. Wald and J. Wolfowitz. On a test whether two samples are from the same population. The Annals of Mathematical Statistics, 11:147–162, 1940.
[19] G. W. Brown and A. M. Mood. Homogeneity of several samples. The American Statistician, 2:22, 1948.
[20] H. C. Mathisen. A method of testing the hypothesis that two samples are from the same population. The Annals of Mathematical Statistics, 14:188–194, 1943.
[21] B. L. van der Waerden and E. Nievergelt. Order tests for the two-sample problem and their power. Indagationes Mathematicae, 14:453–458, 1952.
[22] F. N. David and D. E. Barton. A test for birth-order effects. Annals of Human Eugenics, 22:250–257, 1958.
[23] A. R. Ansari and R. A. Bradley. Rank-sum tests for dispersions. Annals of Mathematical Statistics, 31:1174–1189, 1960.
[24] A. M. Mood. On the asymptotic efficiency of certain nonparametric two-sample tests. Annals of Mathematical Statistics, 25:514–522, 1954.
[25] J. Klotz. Nonparametric tests for scale. Annals of Mathematical Statistics, 33:495–512, 1962.
[26] S. Siegel and J. W. Tukey. A nonparametric sum of ranks procedure for relative spread in unpaired samples. Journal of the American Statistical Association, 55:429–445, 1960.
[27] B. V. Sukhatme. On certain two-sample nonparametric tests for variances. Annals of Mathematical Statistics, 28:188–194, 1957.
[28] W. H. Kruskal and W. A. Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47:583–621, 1952.
[29] A. R. Jonckheere. A distribution-free k-sample test against ordered alternatives. Biometrika, 41:133–145, 1954.
[30] S. Chakraborti and M. M. Desu. Generalization of Mathisen’s median test for comparing several treatments with a control. Communications in Statistics–Simulation and Computation, 17:947–967, 1988.
[31] M. G. Kendall. Rank Correlation Methods, 4th edition. Charles Griffin and Co., Ltd., London and High Wycombe, 1970.
[32] M. Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32:675–701, 1937.
[33] E. B. Page. Ordered hypotheses for multiple treatments: A significance test for linear ranks. Journal of the American Statistical Association, 58:216–230, 1963.
[34] J. Durbin. Incomplete blocks in ranking experiments. British Journal of Psychology (Statistical Section), 4:85–90, 1951.
[35] S. Maghsoodloo. Estimates of the quantiles of Kendall’s partial rank correlation coefficient and additional quantile estimates. Journal of Statistical Computation and Simulation, 4:155–164, 1975.
[36] K. Pearson. On the Theory of Contingency and Its Relation to Association and Normal Correlation. Dulau and Co., 1904.
[37] R. A. Fisher. On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1):87–94, 1922.
[38] Q. McNemar. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157, 1947.
[39] S. García, A. Fernández, J. Luengo, and F. Herrera. A study of statistical techniques and performance measures for genetics-based machine learning: Accuracy and interpretability. Soft Computing, 13(10):959–977, 2009.
[40] N. García-Pedrajas, J. A. Romero del Castillo, and D. Ortiz-Boyer. A cooperative coevolutionary algorithm for instance selection for instance-based learning. Machine Learning, 78(3):381–420, 2010.
[41] N. Siddiqi. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley, 2005.