Computing the asymptotic power of a Euclidean-distance test for goodness-of-fit

William Perkins
School of Mathematics, Georgia Institute of Technology
686 Cherry St., Atlanta, GA 30332-0160
perkins@math.gatech.edu

Gary Simon
IOMS Department, Stern School of Business, NYU
44 West 4th St., New York, NY 10012
gsimon@stern.nyu.edu

Mark Tygert
Courant Institute of Mathematical Sciences, NYU
251 Mercer St., New York, NY 10012
tygert@aya.yale.edu
Abstract

A natural (yet unconventional) test for goodness-of-fit measures the discrepancy between the model and empirical distributions via their Euclidean distance (or, equivalently, via its square). The present paper characterizes the statistical power of such a test against a family of alternative distributions, in the limit that the number of observations is large, with every alternative departing from the model in the same direction. Specifically, the paper provides an efficient numerical method for evaluating the cumulative distribution function (cdf) of the square of the Euclidean distance between the model and empirical distributions under the alternatives, in the limit that the number of observations is large. The paper illustrates the scheme by plotting the asymptotic power (as a function of the significance level) for several examples.

Supported in part by NSF Grant OISE-0730136, an NSF Postdoctoral Research Fellowship, a DARPA Young Faculty Award, and an Alfred P. Sloan Research Fellowship.

AMS subject classifications: Primary 62G10, 62F03; secondary 65C60.

Keywords: rms, root-mean-square, significance, statistic.

1 Introduction

Given $n$ observations, each falling in one of $m$ bins, we would like to test whether these observations are consistent with having arisen as independent and identically distributed (i.i.d.) draws from a specified probability distribution $p = (p_1, p_2, \dots, p_m)$ over the $m$ bins ($p$ is known as the “model”). A natural measure of the deviation between $p$ and the observations is the square of the Euclidean distance between the actually observed distribution of the draws and the expected distribution $p$, that is,

X^2 = \sum_{k=1}^{m} (x_k - p_k)^2,    (1)

where $x_1$, …, $x_m$ are the proportions of the $n$ observations falling in bins $1$, …, $m$, respectively.

The “P-value” is then defined to be the probability that $Y^2 \ge X^2$, where $Y^2$ would be the same as $X^2$, but constructed from draws that definitely are taken i.i.d. from $p$, that is,

Y^2 = \sum_{k=1}^{m} (y_k - p_k)^2,    (2)

where $y_1$, …, $y_m$ are the proportions of $n$ i.i.d. draws from $p$ falling in bins $1$, …, $m$, respectively. When calculating the P-value (the probability that $Y^2 \ge X^2$), we view $Y^2$ as a random variable while viewing $X^2$ as a fixed number. If the P-value is small, then we can be confident that the observed draws were not taken i.i.d. from the model $p$.

To characterize the statistical power of the P-value based on the Euclidean distance, we consider $n$ i.i.d. draws from the alternative distribution

q = p + c/\sqrt{n},    (3)

where $c$ is a vector whose entries satisfy $\sum_{k=1}^{m} c_k = 0$. We thus need to calculate the distribution of the square of the Euclidean distance,

Z^2 = \sum_{k=1}^{m} (z_k - p_k)^2,    (4)

where $z_1$, …, $z_m$ are the proportions of $n$ i.i.d. draws from $q$ falling in bins $1$, …, $m$, respectively. Section 4 below provides an efficient method for calculating the cumulative distribution function (cdf) of $nZ^2$ in the limit that the number of draws $n$ is large. Section 5 below then describes how to use such a method to plot the cdf of the P-values; this cdf is the same as the statistical power function of the hypothesis test based on the Euclidean distance (as a function of the significance level). Presenting this method is the principal purpose of the present paper, complementing the earlier discussions of Perkins, Tygert, and Ward (2011b) and Perkins, Tygert, and Ward (2011a), which compare the Euclidean distance with classical statistics such as $\chi^2$, the log-likelihood-ratio $G^2$, and other members of the Cressie-Read power-divergence family; Perkins, Tygert, and Ward (2011b) and Perkins, Tygert, and Ward (2011a) review the classical statistics and provide detailed comparisons.
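In code, the squared-distance statistic and a Monte-Carlo P-value can be sketched as follows; this is our own minimal illustration (the function names and the simulation count are our choices), not the quadrature-based method developed below:

```python
import numpy as np

def euclid_sq(counts, p):
    """Square of the Euclidean distance between the empirical proportions
    (counts normalized to sum to 1) and the model distribution p."""
    x = counts / counts.sum()
    return float(np.sum((x - p) ** 2))

def p_value(counts, p, trials=10000, seed=None):
    """Monte-Carlo P-value: the fraction of statistics simulated under the
    model p that are at least as large as the observed statistic."""
    rng = np.random.default_rng(seed)
    n = int(counts.sum())
    observed = euclid_sq(counts, p)
    sims = rng.multinomial(n, p, size=trials) / n
    stats = np.sum((sims - p) ** 2, axis=1)
    return float(np.mean(stats >= observed))
```

For observations whose proportions match the model exactly, the statistic is zero and the Monte-Carlo P-value is 1.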

As reviewed, for example, by Kendall et al. (2009) and Rao (2002), $nZ^2$, with $Z^2$ defined in (4), converges in distribution (up to a constant scaling) to a noncentral $\chi^2$ in the limit that the number of draws $n$ is large, when the model $p$ is a uniform distribution. When $p$ is nonuniform, $nZ^2$ converges in distribution to the sum of the squares of independent Gaussian random variables in the limit that $n$ is large, as shown by Moore and Spruill (1975) and reviewed in Section 2 below. Section 3 provides integral representations for the cdf of the sum of the squares of independent Gaussian random variables and applies suitable quadratures for their numerical evaluation. Section 4 summarizes the numerical method obtained by combining Sections 2 and 3. Section 5 summarizes a scheme for plotting the asymptotic power (as a function of the significance level) using the method of Section 4. Section 6 illustrates the methods via several numerical examples.

The extension to models with nuisance parameters is straightforward, following Perkins, Tygert, and Ward (2011c); the present paper focuses on the simpler case in which the model is a single, fully specified probability distribution.

2 Preliminaries

This section states Theorem 2.1, which is a special case of Theorem 4.2 of Moore and Spruill (1975). Before stating the theorem, we need to set up some notation. The set-up amounts to an algorithm for computing the real numbers $\lambda_1$, …, $\lambda_{m-1}$ and $\mu_1$, …, $\mu_{m-1}$ used in Theorem 2.1, where $m$ is an integer greater than 1.

First, we aim to define the positive real numbers $\lambda_1$, …, $\lambda_{m-1}$, given any vector $p$ whose $m$ entries are all positive and sum to 1. We define $D$ to be the $m \times m$ diagonal matrix with the diagonal entries

D_{k,k} = p_k    (5)

for $k = 1$, …, $m$. We define $P$ to be the $m \times m$ matrix with the entries

P_{j,k} = \delta_{j,k} - \sqrt{p_j \, p_k}    (6)

for $j, k = 1$, …, $m$, where $\delta_{j,k}$ denotes the Kronecker delta. Note that $P$ is an orthogonal projector. We define $B = D^{1/2} P D^{1/2}$, so that $B$ is the self-adjoint $m \times m$ matrix with the entries

B_{j,k} = p_k \, \delta_{j,k} - p_j \, p_k    (7)

for $j, k = 1$, …, $m$. As a self-adjoint matrix whose rank is $m-1$ (after all, $B = D^{1/2} P D^{1/2}$, $P$ is an orthogonal projector whose rank is $m-1$, and $D$ is a full-rank diagonal matrix), $B$ given in (7) has an eigendecomposition

B = V \Lambda V^{\mathrm{T}},    (8)

where $V$ is a real unitary matrix and $\Lambda$ is a diagonal matrix such that $\Lambda_{1,1} \ge \Lambda_{2,2} \ge \dots \ge \Lambda_{m-1,m-1} > \Lambda_{m,m} = 0$. Finally, we define the positive real numbers $\lambda_1$, …, $\lambda_{m-1}$ via the formula

\lambda_k = \Lambda_{k,k}    (9)

for $k = 1$, …, $m-1$, where $\Lambda_{1,1}$, …, $\Lambda_{m-1,m-1}$ are the leading diagonal entries of $\Lambda$ from the eigendecomposition (8).

Next, we define the real numbers $\mu_1$, …, $\mu_{m-1}$, given both $p$ and an $m \times 1$ vector $c$ such that $\sum_{k=1}^{m} c_k = 0$. We define the $(m-1) \times 1$ vector

v = \bar{V}^{\mathrm{T}} c,    (10)

where $\bar{V}$ is the leftmost $m \times (m-1)$ block of $V$ from the eigendecomposition (8), that is, $\bar{V}$ is the same as $V$ after deleting the last column of $V$. We can then define the real numbers $\mu_1$, …, $\mu_{m-1}$ via the formula

\mu_k = v_k / \sqrt{\lambda_k}    (11)

for $k = 1$, …, $m-1$, where $v$ is defined in (10) and $\lambda_k$ is defined in (9).
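In code, the set-up above can be sketched as follows (our own NumPy illustration of the construction (5)-(11); the helper name is ours):

```python
import numpy as np

def lambdas_and_mus(p, c):
    """The numbers lambda_1, ..., lambda_{m-1} and mu_1, ..., mu_{m-1} of
    Section 2, for a model p (positive entries summing to 1) and a direction
    c whose entries sum to 0."""
    m = len(p)
    sqrtp = np.sqrt(p)
    P = np.eye(m) - np.outer(sqrtp, sqrtp)       # the orthogonal projector (6)
    B = np.diag(sqrtp) @ P @ np.diag(sqrtp)      # B = D^{1/2} P D^{1/2}, as in (7)
    w, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
    order = np.argsort(w)[::-1]                  # descending, zero eigenvalue last
    w, V = w[order], V[:, order]
    lam = w[:m - 1]                              # the positive eigenvalues (9)
    v = V[:, :m - 1].T @ c                       # the vector (10)
    mu = v / np.sqrt(lam)                        # the formula (11)
    return lam, mu
```

For a uniform model over $m$ bins, every $\lambda_k$ equals $1/m$, recovering the classical $\chi^2$ scaling.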

With this notation, we can state the following special case of Theorem 4.2 of Moore and Spruill (1975).

Theorem 2.1.

Suppose that $m$ is an integer greater than one, $p$ is a probability distribution over $m$ bins (that is, $p$ is an $m \times 1$ vector whose entries are all positive and $\sum_{k=1}^{m} p_k = 1$), $c$ is an $m \times 1$ vector such that $\sum_{k=1}^{m} c_k = 0$, and $z_1$, …, $z_m$ are the proportions of draws falling in bins $1$, …, $m$, respectively, out of a total of $n$ i.i.d. draws from the probability distribution

q = p + c/\sqrt{n}.    (12)

Suppose further that $Z^2$ is the random variable

Z^2 = \sum_{k=1}^{m} (z_k - p_k)^2.    (13)

Then, $nZ^2$ converges in distribution to the random variable

\sum_{k=1}^{m-1} \lambda_k (G_k + \mu_k)^2    (14)

as $n$ becomes large, where $G_1$, …, $G_{m-1}$ are i.i.d. Gaussian random variables of zero mean and unit variance, $\lambda_1$, …, $\lambda_{m-1}$ are the positive real numbers defined in (9), and $\mu_1$, …, $\mu_{m-1}$ are the real numbers defined in (11). The values of $\lambda_1$, …, $\lambda_{m-1}$ do not depend on the vector $c$; the values of $\mu_1$, …, $\mu_{m-1}$ do depend on $c$.

Remark 2.2.

The matrix $B$ defined in (7) is the sum of a diagonal matrix and a low-rank matrix. The methods of Gu and Eisenstat (1994, 1995) for computing the eigenvalues of such a matrix $B$ and for computing the result of applying $V$ from (8) to an arbitrary vector require only $\mathcal{O}(m \log m)$ or $\mathcal{O}(m^2)$ floating-point operations. The method of Gu and Eisenstat (1994) is usually more efficient than the method of Gu and Eisenstat (1995), unless $m$ is impractically large.

3 Integral representations

This section describes efficient algorithms for evaluating the cdf of the sum (14) of the squares of independent Gaussian random variables. The bibliography of Duchesne and de Micheaux (2010) gives references to possible alternatives to the methods of the present section. Our principal tool is the following theorem, representing the cdf as an integral suitable for evaluation via quadratures (see, for example, Remark 3.2 below); the theorem expresses formula 7 of Rice (1980) in the same form as formula 8 of Perkins, Tygert, and Ward (2011b).

Theorem 3.1.

Suppose that $n$ is a positive integer, $G_1$, …, $G_n$ are i.i.d. Gaussian random variables of zero mean and unit variance, and $a_1$, …, $a_n$ and $b_1$, …, $b_n$ are real numbers. Suppose in addition that $X$ is the random variable

X = \sum_{k=1}^{n} a_k (G_k + b_k)^2.    (15)

Then, the cdf of $X$ is

(16)

for any positive real number $x$, where

(17)

and the cdf of $X$ is 0 for any nonpositive real number $x$. The square roots in (16) denote the principal branch, and $\operatorname{Im}$ takes the imaginary part.

Remark 3.2.

An efficient means of evaluating (16) numerically is to employ adaptive Gaussian quadratures; see, for example, Section 4.7 of Press et al. (2007). Good choices for the lowest orders of the quadratures used in the adaptive Gaussian quadratures are 10 and 21, for double-precision accuracy.
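As a concrete illustration (ours, not from the paper): SciPy's `scipy.integrate.quad` wraps QUADPACK's adaptive scheme, which is built on a Gauss-Kronrod pair of exactly the 10- and 21-point orders recommended above.

```python
import numpy as np
from scipy.integrate import quad

# Adaptive quadrature applied to a smooth, mildly oscillatory integrand of the
# kind arising in integral representations of cdfs; the exact value of the
# integral of exp(-u) cos(3u) over [0, infinity) is 1/(1 + 3^2) = 0.1, and the
# tail beyond u = 50 is below e^{-50}, hence negligible.
val, err_estimate = quad(lambda u: np.exp(-u) * np.cos(3.0 * u), 0.0, 50.0)
```

The routine subdivides the interval adaptively until its internal error estimate (returned as the second value) meets the requested tolerance.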

The remainder of the present section (particularly Remark 3.5) discusses the numerical stability of the method of Remark 3.2 and recalls an alternative integral representation suitable for use when the method of Remark 3.2 is not guaranteed to be numerically stable. The following lemma, proven in Remark 3.2 of Perkins, Tygert, and Ward (2011b), ensures that the denominator in (16) is not too small.

Lemma 3.3.

Suppose that $n$ is a positive integer, and $a_1$, …, $a_n$ and $u$ are positive real numbers. Suppose further that (in parallel with formula (17) above)

(18)

for $k = 1$, …, $n$.

Then,

(19)

The following lemma ensures that the numerator in (16) is not too large, provided that $x$ is not large.

Lemma 3.4.

Suppose that $x$, $a$, and $b$ are positive real numbers and (in parallel with formulae (17) and (18) above)

(20)

Then,

(21)
Proof.

Defining

(22)

and

(23)

we obtain that

(24)

It follows from (24) that

(25)

It follows from (22) that and from (23) that , and hence

(26)

If , then (26) yields that

(27)

which in turn yields that

(28)

If , then (recalling that , too)

(29)

We see from (28) and (29) that, in all cases,

(30)

Combining (25) and (30) yields (21). ∎

Remark 3.5.

The bound (19) shows that the integrand in (16) is not too large for any nonnegative $u$, provided that the numerator of (16) is not too large. An upper bound on the numerator follows immediately from (21):

(31)

For any particular application, we can check that the right-hand side of (31) is not too many orders of magnitude in size, guaranteeing that applying quadratures to the integral in (16) cannot lead to catastrophic cancellation in floating-point arithmetic. Naturally, it is also possible to check on the magnitude of the integrand in (16) during its numerical evaluation, indicating even better numerical stability than guaranteed by our a priori estimates. See Theorem 3.7 and Remark 3.8 below for an alternative integral representation suitable for use when the right-hand side of (31) is large.

Remark 3.6.

The bound in (31) is quite pessimistic. In fact, the real part of the exponent in the numerator of (16) is often nonpositive, so that

(32)

If the right-hand side of (31) is large, then we can use the method of Imhof (1961), Davies (1980), and others, applying numerical quadratures to the integral in the following theorem. Please note that the integrand in the following theorem decays reasonably fast when the right-hand side of (31) is large.

Theorem 3.7.

Suppose that $n$ is a positive integer, $G_1$, …, $G_n$ are i.i.d. Gaussian random variables of zero mean and unit variance, and $a_1$, …, $a_n$ and $b_1$, …, $b_n$ are real numbers. Suppose in addition that $X$ is the random variable

X = \sum_{k=1}^{n} a_k (G_k + b_k)^2.    (33)

Then, the cdf of $X$ is

(34)

for any positive real number $x$, where

(35)

and the cdf of $X$ is 0 for any nonpositive real number $x$. The square roots in (34) denote the principal branch, and $\operatorname{Im}$ takes the imaginary part.

Remark 3.8.

The integrand in (34) is not too large (except for values of $u$ that are closer to 0 than are typical quadrature nodes), since the real part of the exponent is always nonpositive, so that

(36)

Moreover, the numerator in (34) decays reasonably fast (it is sub-Gaussian) when the right-hand side of (31) is large.
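The method of Imhof (1961) invoked above admits a compact implementation; the following Python sketch is our own illustration (using SciPy rather than the authors' Fortran) of Imhof's integral for the cdf of $\sum_k a_k (G_k + b_k)^2$ with positive weights $a_k$:

```python
import numpy as np
from scipy.integrate import quad

def imhof_cdf(x, a, b2):
    """P(sum_k a_k * (G_k + b_k)^2 <= x) for positive weights a_k and squared
    offsets b2_k = b_k^2, via Imhof's (1961) integral representation."""
    a = np.asarray(a, dtype=float)
    b2 = np.asarray(b2, dtype=float)

    def integrand(u):
        au = a * u
        theta = 0.5 * np.sum(np.arctan(au) + b2 * au / (1.0 + au ** 2)) - 0.5 * x * u
        rho = (np.prod((1.0 + au ** 2) ** 0.25)
               * np.exp(0.5 * np.sum(b2 * au ** 2 / (1.0 + au ** 2))))
        return np.sin(theta) / (u * rho)

    val, _ = quad(integrand, 0.0, np.inf, limit=200)
    return 0.5 - val / np.pi
```

The integrand is bounded at the origin (theta vanishes linearly in $u$) and decays polynomially, with the decay improving as the number of terms grows; for sums with very few terms, the more carefully engineered representations of this section are preferable.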

4 Numerical method

Combining Sections 2 and 3 yields an efficient method for calculating the cdf of $n$ times the square of the Euclidean distance between the model and empirical distributions, in the limit that the number of draws $n$ is large, when the observed draws are taken i.i.d. from an alternative distribution $q = p + c/\sqrt{n}$ (as always, $p$ is the model, a probability distribution over $m$ bins, and $c$ is a vector whose entries satisfy $\sum_{k=1}^{m} c_k = 0$). Indeed, Theorem 2.1 shows that the desired cdf is the same as that in (16) and (34), with the real numbers $a_1$, …, $a_{m-1}$ and $b_1$, …, $b_{m-1}$ calculated as detailed in Section 2 (identifying the $n$ of Theorems 3.1 and 3.7 with $m-1$, $a_k$ with $\lambda_k$, and $b_k$ with $\mu_k$). Remark 3.2 describes an efficient means of evaluating the cdf in (16) that is numerically stable when the right-hand side of (31) is not too many orders of magnitude in size. When the right-hand side of (31) is many orders of magnitude in size, we can apply quadratures to the representation of the cdf in (34) instead (see Remark 3.8).

5 Plotting the asymptotic statistical power

Let us denote by $F$ the cdf of the P-values for the Euclidean distance (or, equivalently, for any positive multiple of the square of the Euclidean distance); $F$ is also the statistical power function of the hypothesis test based on the Euclidean distance (as a function of the significance level). The method of Section 4 is sufficient for plotting $F$ in the limit that the number of draws $n$ is large. Indeed, suppose that $F_0$ denotes the cdf of $n$ times the square of the Euclidean distance between the model and empirical distributions when taking the draws i.i.d. from the model probability distribution $p$, and $F_1$ denotes the cdf of the same random variable when taking the draws i.i.d. from $q = p + c/\sqrt{n}$, where $c$ is a vector whose entries satisfy $\sum_{k=1}^{m} c_k = 0$. The P-value equals $1 - F_0$ evaluated at the observed value of the statistic, in the limit that $n$ is large, and then the cdf of the P-values for draws from $q$ is

F(\alpha) = 1 - F_1(F_0^{-1}(1 - \alpha))    (37)

for any nonnegative real number $\alpha$; thus, the graph of all points $(\alpha, F(\alpha))$ with $\alpha$ ranging from $0$ to $1$ is the same as the graph of all points $(1 - F_0(x),\, 1 - F_1(x))$ with $x$ ranging from $\infty$ to $0$, in the limit that $n$ is large. Section 4 describes how to evaluate $F_0(x)$ and $F_1(x)$ for any real number $x$, in the limit that the number of draws is large; note that $F_1 = F_0$ when the entries of $c$ are all zeros, so the procedure of Section 4 can evaluate $F_0$ as well as $F_1$. When the entries of $c$ are all zeros, $\mu_1 = \mu_2 = \dots = \mu_{m-1} = 0$ in the method of Section 4, and then the right-hand side of (31) is exactly 1.
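The parametric plotting scheme of this section can be emulated end-to-end with plain Monte Carlo (our own sketch; the sample sizes, seed, and function name are our choices, whereas the quadrature method of Section 4 is what reaches the large-$n$ limit exactly):

```python
import numpy as np

def power_curve(p, c, n, trials=20000, seed=0):
    """Points (significance level, power) for the Euclidean-distance test:
    parametrically, alpha = 1 - F0(x) against power = 1 - F1(x), where F0 and
    F1 are Monte-Carlo estimates of the cdfs of the statistic under the model
    p and under the alternative q = p + c/sqrt(n), respectively."""
    rng = np.random.default_rng(seed)
    q = p + c / np.sqrt(n)
    null_stats = n * np.sum((rng.multinomial(n, p, size=trials) / n - p) ** 2, axis=1)
    alt_stats = n * np.sum((rng.multinomial(n, q, size=trials) / n - p) ** 2, axis=1)
    x = np.sort(null_stats)
    alpha = 1.0 - np.arange(1, trials + 1) / trials                     # 1 - F0(x)
    power = 1.0 - np.searchsorted(np.sort(alt_stats), x, side="right") / trials  # 1 - F1(x)
    return alpha, power
```

When $c = 0$, the alternative coincides with the model and the curve hugs the diagonal.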

6 Numerical examples

This section illustrates the algorithms of the present paper via several numerical examples. As detailed in the subsections below, we consider three examples for the model $p$ (as always, $p$ is a probability distribution over $m$ bins, that is, an $m \times 1$ vector whose entries are all positive and sum to 1), taking $n$ i.i.d. draws from the alternative probability distribution

q = p + c/\sqrt{n},    (38)

where $c$ is a vector whose entries satisfy $\sum_{k=1}^{m} c_k = 0$ (the subsections below detail several examples for $c$). Figure 1 plots the cdf $F$ of the P-values for $n$ i.i.d. draws taken from the alternative distribution $q$, when $n$ is large; $F$ is also the statistical power function of the hypothesis test based on the Euclidean distance (as a function of the significance level). For each of the examples, Figure 1 plots the cdf both for $n$ = 1,000,000 draws (computed via Monte-Carlo simulations) and in the limit that $n$ is large (computed via the algorithms of the present paper); not surprisingly, there is little difference between the plots for $n$ = 1,000,000 and for the limit that $n$ is large. The lines in Figure 1 corresponding to 1,000,000 draws are colored green; the lines corresponding to the limit of large $n$ are black.

Remark 6.1.

For each example, we computed the cdf for $n$ = 1,000,000 draws via 40,000 Monte-Carlo simulations. A straightforward argument based on the binomial distribution, detailed in Remark 3.4 of Perkins, Tygert, and Ward (2011a), shows that the standard errors of the resulting estimates of the P-values are equal to $\sqrt{P(1-P)/40{,}000}$, ensuring that the standard errors of the plotted abscissae for the green points in Figure 1 are at most 0.0025 (roughly the size of the radii of the plotted points).
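The arithmetic behind the remark's bound is worth recording (our own two-line check):

```python
import math

# The standard error of a Monte-Carlo estimate of a P-value P from ell
# independent simulations is sqrt(P * (1 - P) / ell), which is maximized
# at P = 1/2; with ell = 40,000 the worst case is 0.0025.
ell = 40000
worst_case = math.sqrt(0.5 * 0.5 / ell)
```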

Remark 6.2.

For each example, we plotted the cdf in the limit of a large number of draws via the scheme of Section 5. Figure 1 displays the points $(1 - F_0(x),\, 1 - F_1(x))$ for 10,000 values of $x$, in the limit that the number of draws is large, where $F_0$ and $F_1$ are defined in Section 5 and computed to at least 6-digit accuracy via the method of Section 4.

Table 1 summarizes computational costs of the procedure described in Section 4. The headings of Table 1 have the following meanings:

  • $m$ is the number of bins in the probability distributions $p$ and $q$.

  • $k_0$ is the maximum number of quadrature nodes required in any of the 10,000 evaluations of $F_0$ plotted in Figure 1 (Section 5 defines $F_0$), using adaptive Gaussian quadratures as described in Remark 3.2.

  • $k_1$ is the maximum number of quadrature nodes required in any of the 10,000 evaluations of $F_1$ plotted in Figure 1 (Section 5 defines $F_1$), using adaptive Gaussian quadratures as described in Remark 3.2.

  • $t$ is the time in seconds required to perform the quadratures for both $F_0$ and $F_1$ at a single value of $x$, amortized over the 10,000 pairs plotted in Figure 1 (Section 5 defines $F_0$ and $F_1$).

6.1 Uniform model

For our first example, we take

p_k = 1/10    (39)

for $k = 1$, …, $10$, and take

(40)

for $k = 1$, …, $10$. The Euclidean distance is equivalent to the canonical $\chi^2$ statistic for this example, since $p$ is a uniform distribution.

6.2 Nonuniform model

For our second example, we take

(41)

for $k = 1$, …, $100$, and take

(42)

for $k = 1$, …, $100$.

6.3 Poisson model

For our third example, we take

(43)

for $k = 1, 2, \dots$, and take

(44)

for $k = 1, 2, \dots$. For all numerical computations associated with this example, we can truncate $p$ and $c$ to the first 20 bins, since the total mass of $p$ beyond the 20th bin is negligibly small.
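The truncation step can be carried out as follows (our own sketch; the Poisson rate 3 here is an arbitrary stand-in, since this illustration does not reproduce the paper's particular model):

```python
import math

def truncated_poisson(rate, m):
    """The first m masses of a Poisson(rate) distribution (bin k holding the
    count k - 1), renormalized to sum to 1; also returns the discarded tail."""
    masses = [math.exp(-rate) * rate ** j / math.factorial(j) for j in range(m)]
    total = sum(masses)
    return [w / total for w in masses], 1.0 - total

p20, tail = truncated_poisson(3.0, 20)
# With rate 3, the tail mass beyond 20 bins is on the order of 1e-10, so the
# truncation does not affect 6-digit accuracy.
```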

6.4 Poisson model with a different alternative

For our fourth example, we again take

(45)

for $k = 1, 2, \dots$, but now take

(46)

for $k = 1, 2, \dots$. For all numerical computations associated with this example, we can truncate $p$ and $c$ to the first 20 bins, since the total mass of $p$ beyond the 20th bin is negligibly small.

Remark 6.3.

The right-hand side of (31) is 8.233 for Subsection 6.1, 2.443 for Subsection 6.2, and 24.05 for Subsection 6.3. As discussed in Remark 3.5, roundoff errors in the numerical evaluation of (16) are therefore guaranteed to be negligible for the standard floating-point arithmetic (the mantissa in the standard, double-precision arithmetic has a dynamic range of about $10^{16}$). The right-hand side of (31) is very large for Subsection 6.4, so we used (34) rather than (16) for the last example (Remark 3.8 explains why).

We used Fortran 77 and ran all examples on one core of a 2.2 GHz Intel Core 2 Duo microprocessor with 2 MB of L2 cache. Our code is compliant with the IEEE double-precision standard (so that the mantissas of variables have approximately one bit of precision less than 16 digits, yielding a relative precision of about $2 \times 10^{-16}$). We diagonalized the matrix $B$ defined in (7) using the Jacobi algorithm (see, for example, Chapter 8 of Golub and Van Loan (1996)), not taking advantage of Remark 2.2; explicitly forming the entries of the matrix $B$ defined in (7) can incur a numerical error of at most the machine precision (about $2 \times 10^{-16}$) times the magnitudes of the entries, yielding 6-digit accuracy or better for all our examples. A future article will exploit the interlacing properties of eigenvalues, following Gu and Eisenstat (1994), to obtain higher precision. Of course, even 4-digit precision would suffice for most statistical applications; however, modern computers can produce high accuracy very fast, as the examples in this section illustrate.
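For reference, a bare-bones cyclic Jacobi iteration for the eigenvalues of a symmetric matrix can be sketched as follows (a textbook illustration in Python, not the authors' Fortran 77 code):

```python
import numpy as np

def jacobi_eigenvalues(A, sweeps=30, tol=1e-13):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations; each
    rotation annihilates one off-diagonal pair, and the off-diagonal mass
    decreases monotonically until the diagonal holds the eigenvalues."""
    A = np.array(A, dtype=float)
    m = A.shape[0]
    for _ in range(sweeps):
        off = np.sqrt(np.sum(A * A) - np.sum(np.diag(A) ** 2))
        if off < tol:
            break
        for i in range(m - 1):
            for j in range(i + 1, m):
                if abs(A[i, j]) < tol:
                    continue
                # rotation angle that zeros the (i, j) entry
                phi = 0.5 * np.arctan2(2.0 * A[i, j], A[j, j] - A[i, i])
                cos, sin = np.cos(phi), np.sin(phi)
                J = np.eye(m)
                J[i, i] = J[j, j] = cos
                J[i, j] = sin
                J[j, i] = -sin
                A = J.T @ A @ J
    return np.sort(np.diag(A))
```

Forming the full rotation matrix at every step keeps the sketch short; a serious implementation updates only the two affected rows and columns.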

Figure 1: Cumulative distribution functions of the P-values for draws from the alternative distributions defined in Subsections 6.1-6.4
            $m$    $k_0$   $k_1$   $t$
example 1   10     230     230     0.006
example 2   100    530     550     0.090
example 3   20     250     330     0.013
example 4   20     350     350     0.010
Table 1: Computational costs

Acknowledgements

We would like to thank Jim Berger, Tony Cai, Jianqing Fan, Andrew Gelman, Peter W. Jones, Ron Peled, Vladimir Rokhlin, and Rachel Ward for many helpful discussions.

References

  • Davies, R. B. (1980). Algorithm AS 155: The distribution of a linear combination of $\chi^2$ random variables. J. Roy. Statist. Soc. Ser. C 29, 323–333.
  • Duchesne, P. and Lafaye de Micheaux, P. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Comput. Statist. Data Anal. 54, 858–862.
  • Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Johns Hopkins University Press, Baltimore, Maryland.
  • Gu, M. and Eisenstat, S. C. (1994). A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. SIAM J. Matrix Anal. Appl. 15, 1266–1276.
  • Gu, M. and Eisenstat, S. C. (1995). A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. SIAM J. Matrix Anal. Appl. 16, 172–191.
  • Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika 48, 419–426.
  • Kendall, M. G., Stuart, A., Ord, K., and Arnold, S. (2009). Kendall's Advanced Theory of Statistics, Vol. 2A, 6th ed. Wiley.
  • Moore, D. S. and Spruill, M. C. (1975). Unified large-sample theory of general chi-squared statistics for tests of fit. Ann. Statist. 3, 599–616.
  • Perkins, W., Tygert, M., and Ward, R. (2011a). $\chi^2$ and classical exact tests often wildly misreport significance; the remedy lies in computers. Technical Report 1108.4126, arXiv. http://cims.nyu.edu/tygert/abbreviated.pdf.
  • Perkins, W., Tygert, M., and Ward, R. (2011b). Computing the confidence levels for a root-mean-square test of goodness-of-fit. Appl. Math. Comput. 217, 9072–9084.
  • Perkins, W., Tygert, M., and Ward, R. (2011c). Computing the confidence levels for a root-mean-square test of goodness-of-fit, II. Technical Report 1009.2260, arXiv.
  • Press, W., Teukolsky, S., Vetterling, W., and Flannery, B. (2007). Numerical Recipes, 3rd ed. Cambridge University Press, Cambridge, UK.
  • Rao, C. R. (2002). Karl Pearson chi-square test: The dawn of statistical inference. In Goodness-of-Fit Tests and Model Validity (C. Huber-Carol, N. Balakrishnan, M. S. Nikulin, and M. Mesbah, eds.), 9–24. Birkhäuser, Boston.
  • Rice, S. O. (1980). Distribution of quadratic forms in normal random variables: Evaluation by numerical integration. SIAM J. Sci. Stat. Comput. 1, 438–448.