BET on Independence

Kai Zhang
University of North Carolina at Chapel Hill
Abstract

We study the problem of nonparametric dependence detection. Many existing methods suffer severe power loss due to non-uniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a novel binary expansion filtration approximation of the copula. Through a Hadamard-Walsh transform, we find that the cross interactions of binary variables in the filtration are complete sufficient statistics for dependence. These interactions are also uncorrelated under the null. By utilizing these interactions, the BET avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for specified power and (b) by providing clear interpretations of global and local relationships upon rejection of independence. The binary expansion approach also connects the test statistics with the current computing system to facilitate efficient bitwise implementation. We illustrate the BET by a study of the distribution of stars in the night sky and by an exploratory data analysis of the TCGA breast cancer data.

1 Introduction

Independence is one of the most foundational concepts in statistical theory and practice. It is also one of the most common assumptions in the statistical literature. Thus verifying independence is at the core of nearly all statistical tests. If we are not able to check this crucial condition, then we are “betting on independence” at the risk of losing control of the validity of our conclusions. In this paper we study the dependence detection problem in a distribution-free setting, in which we do not make any assumption on the joint distribution. We focus on the test of independence between two continuous variables, though the approach can be easily generalized to more variables. Without loss of generality, we consider i.i.d. observations $(U_1, V_1), \ldots, (U_n, V_n)$ from the copula distribution, whose marginal distributions are uniform over $[0,1]$. This copula can be obtained by transforming each variable with its marginal cumulative distribution function (CDF) when the CDFs are known. In this case, $U$ and $V$ are independent if and only if their joint distribution is the bivariate uniform distribution over $[0,1]^2$, denoted by $\mathbb{P}_0$. We also study the case when the marginal CDFs are unknown. In this case, we can use the empirical CDFs, and the theory and procedures are shown to be similar.

Tests of independence have been extensively studied in statistics and information theory. One of the most classical parametric methods is based on the Pearson correlation, which has the important property that it can be interpreted as a measure of linear relationship. Classical results connecting correlation and independence, including Rényi (1959), have led to useful methods such as Breiman and Friedman (1985). Existing nonparametric testing procedures can be roughly categorized into three main classes:

(a) The CDF approach, which compares the joint CDF with the product of the marginal CDFs: A pioneering nonparametric test of independence is that of Hoeffding (1948), a variant of the Kolmogorov-Smirnov test. Other important tests in this approach include Romano (1989).

(b) The distance and kernel based approach, which can be regarded as a generalization of the correlation: One important recent development on dependence measures is the distance correlation (Székely et al., 2007, 2009), which possesses the crucial property that a zero distance correlation implies independence. Tests based on sample versions of the distance correlation (Székely and Rizzo, 2013a, b) have since been popular methods. Other important methods include the generalized measures of correlation (GMC) by Zheng et al. (2012) and the Hilbert Schmidt independence criterion (HSIC) of Gretton et al. (2007), Sejdinovic et al. (2013), and Pfister et al. (2016), which studies dependence through distances between embeddings of distributions into reproducing kernel Hilbert spaces (RKHS).

(c) The binning approach, which generalizes the comparison of the joint density with the product of the marginal densities: By discretizing the two variables into finitely many categories, classical statistical or information theoretic methods such as $\chi^2$ tests and Fisher’s exact tests can be applied to study the dependence. Miller and Siegmund (1982) studied the maximal $\chi^2$ statistic from forming $2 \times 2$ tables through partitions of the data. Reshef et al. (2011, 2015a, 2015b) introduced the maximal information coefficient (MIC) by aggregating information from optimal partitions of the scatterplot for different partition sizes. This approach was further studied through the $k$-nearest neighbor mutual information (KNN-MI) approach as described in Kraskov et al. (2004) and Kinney and Atwal (2014). Heller et al. (2012, 2016) and Heller and Heller (2016) studied optimal permutation tests over partitions to improve the power. Filippi and Holmes (2015) took a Bayesian nonparametric approach to the partitions. Wang et al. (2016) considered a generalized $R^2$ to detect piecewise linear relationships, a compromise between the distance approach and the binning approach that takes advantage of both. A very recent paper on Fisher exact scanning (FES) by Ma and Mao (2017) constructed multi-scale scan statistics that are particularly effective at detecting local dependency through Fisher’s exact tests over rectangular scanning windows.

Most of the above nonparametric procedures enjoy the property of universal consistency against any particular form of dependence. Formally, this universality means that for any specific copula distribution $\mathbb{P} \neq \mathbb{P}_0$ in the alternative, the test

$H_0: \mathbb{P} = \mathbb{P}_0 \quad \text{v.s.} \quad H_1: \mathbb{P} \neq \mathbb{P}_0$     (1.1)

has asymptotic power of 1 as $n \to \infty$. However, one important problem with many distribution-free dependence detection methods is the lack of uniformity. To see this, consider the total variation (TV) distance, defined by $D_{TV}(\mathbb{P}, \mathbb{Q}) = \sup_{A \in \mathcal{F}} |\mathbb{P}(A) - \mathbb{Q}(A)|$, where $\mathcal{F}$ is the $\sigma$-algebra of the sample space. The uniform consistency of nonparametric dependence detection requires consistency against any joint distribution that is a certain distance away from independence, i.e.,

$H_0: \mathbb{P} = \mathbb{P}_0 \quad \text{v.s.} \quad H_1: D_{TV}(\mathbb{P}, \mathbb{P}_0) \geq \delta$     (1.2)

for some $\delta > 0$. For the testing problem in (1.2), although many tests are universally consistent, we show in Section 2 and Theorem 2.2 the non-existence of a test that is uniformly consistent. The uniformity issue is due to the fact that the space of alternative distributions is large. Said another way: when two variables are not independent, there are many ways in which they can be dependent. In practice, this non-uniform consistency problem means having “blind spots” in dependence detection for a given sample size, especially for nonlinear forms of dependence. Note that nonlinear forms of dependence are ubiquitous in science, for example in laws of physics defined by differential equations. Therefore, avoiding the power loss due to the non-uniform consistency problem in nonparametric dependence detection is a fundamental problem in statistics and has broad potential impact in many areas of science.

To avoid the power loss due to non-uniform consistency, we propose a novel framework for understanding dependence through a filtration approach: The procedure is constructed by decomposing the joint distribution into many layers such that (a) the $\sigma$-fields of the layers form a filtration that cumulatively approximates the true underlying dependence structure, and (b) at every resolution, the problem is identifiable with a well-defined model and set of parameters. Similar filtration ideas are nicely described in Liu and Meng (2014, 2016) in their study of Simpson’s paradox. The approximation idea is also related to the “probably approximately correct” (PAC) approach in machine learning (Valiant, 1984). In this paper, the layers are constructed through the binary expansion of a uniformly distributed variable into i.i.d. Bernoulli variables (Kac, 1959). Through a truncation of the binary expansions of the marginal distributions, the resulting distribution at each truncation is equivalent to a distribution over a contingency table, to which classical categorical data analysis can be applied. Such truncated variables also induce a filtration which provides a universal approximation of the underlying joint distribution. We explain the details in Section 3.1.

We note here that although many other ways of filtration approximations are available, there are a few important advantages of the proposed binary expansion filtration that facilitate studies of dependence.

  1. The $\sigma$-field in the binary expansion is finite since functions of binary variables are at most binary.

  2. For binary variables, uncorrelatedness does imply independence.

We call the statistics that are functions of the Bernoulli variables from the marginal binary expansions binary expansion statistics (BEStat), and we call the framework of testing overall independence by testing independence at each truncation the binary expansion testing (BET) framework. Although classical tests for contingency tables such as the $\chi^2$ tests (Lehmann and Romano, 2006) are readily available, they have some drawbacks: (a) the exponentially growing degrees of freedom, which affect the power, and (b) the unclear interpretability of the dependence when the independence hypothesis is rejected. To improve on these two issues, we consider a reparametrization of the likelihood of the contingency tables through a novel binary interaction design (BID) equation (Theorem 3.4), which connects the study of dependence to the Hadamard-Walsh transform in signal processing. Through this connection, the interactions of binary variables in the filtration are shown to be complete sufficient statistics for dependence. These interactions are also pairwise independent under the null. By utilizing these interactions, we convert the dependence detection problem into a multiple testing problem. Statistically speaking, the benefits of the above approach are summarized below:

  1. The Hadamard-Walsh transform provides new insights for the analysis of any contingency table whose size is a power of 2. The novel parameterization with marginal interaction odds ratios (MIOR) and cross interaction odds ratios (CIOR) separates the marginal information and dependence information. As an analogy, the CIORs are to contingency tables as the correlations are to multivariate normal distributions. See Theorem 3.7 and Theorem 3.8.

  2. The symmetry statistics from the reparametrization are shown to be complete sufficient statistics for dependence. They are identically distributed and are uncorrelated under the null of independence. See Theorem 4.1, Theorem 4.2 and Theorem 4.3.

  3. As a consequence of the above properties, the multiple testing procedure is shown to be minimax in the sample size requirement for reliable power. See Theorem 4.4.

  4. Upon rejection of independence, the symmetry statistic with the largest absolute value and the corresponding cross interaction provide a clear interpretation of the dependency.

Although theories for copulas and contingency tables are well developed, we are not aware of a similar approach or results in the statistical literature.

We also note that the BEStat approach is closely related to computing. In current standard computing systems, each decimal number is coded as a sequence of binary bits, which is exactly the binary expansion of that number. This connection means that one can carry out BEStat procedures by operating directly on the bits. Since bitwise operations are among the most efficient operations in current computing systems, we are able to develop computationally efficient implementations of the proposed method. The detailed algorithm is described in a separate paper (Zhao et al., 2017), and it improves the speed of existing methods by orders of magnitude.
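To make the connection to bits concrete, here is a minimal Python sketch (not the optimized algorithm of Zhao et al. (2017)) showing that the leading binary expansion bits of a number in $[0,1)$ can be read off with integer shifts and masks; the function name first_bits is ours for illustration.

```python
# Minimal illustration of reading binary expansion bits with bitwise operations.
# This is only a sketch of the idea, not the optimized algorithm of Zhao et al. (2017).

def first_bits(u: float, depth: int) -> list:
    """Return the first `depth` binary expansion bits of u in [0, 1)."""
    # Scale u to an integer on a 2^depth grid; the bits of that integer
    # are exactly the first `depth` bits of the binary expansion of u.
    m = int(u * (1 << depth))          # floor(u * 2^depth)
    return [(m >> (depth - 1 - k)) & 1 for k in range(depth)]

print(first_bits(0.8125, 4))  # 0.8125 = 0.1101 in binary -> [1, 1, 0, 1]
```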

This paper is organized as follows: Section 2 explains the problems of the clustering intuition and non-uniform consistency. Section 3 introduces the concepts and basic theory in the framework of BEStat. Section 4 studies the Max BET procedure and its properties. Section 5 connects the BEStat framework to current computing systems. Section 6, Section 7 and Section 8 illustrate the procedure with simulated and real data studies. Section 9 concludes the paper with discussions of future work. The proofs can be found in the supplemental file.

2 Motivation: The Problem of Non-Uniform Consistency

To explain the problem of non-uniform consistency, we develop the following example of the bisection expanding cross (BEX). Many existing methods suffer substantial power loss under this example due to this problem, which can be avoided through the binary expansion statistics proposed in Section 3 and Section 4.

We call the following sequence of one-dimensional manifolds in $[0,1]^2$ the bisection expanding cross (BEX). These manifolds, denoted by $\text{BEX}_d$, can be defined through an implicit function for every positive integer $d$.

The BEX structure is illustrated in Figure 1, where the first four levels are plotted. Graphically, this grid can be regarded as a space-filling fractal with the following recursive construction steps: (a) $\text{BEX}_1$ is the usual “cross” in the unit square, consisting of the two diagonals $y = x$ and $y = 1 - x$. (b) $\text{BEX}_d$ with $d \geq 2$ is constructed by expanding the bisector of each of the “arms” of $\text{BEX}_{d-1}$ until intersection.

Figure 1: The bisection expanding cross (BEX) at levels $d = 1, 2, 3, 4$.
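The following sketch gives one way to simulate from a distribution of this type. It assumes that $\text{BEX}_d$ consists of all slope $\pm 1$ lines at spacing $2^{-(d-1)}$ within the unit square (with $\text{BEX}_1$ being the two diagonals). This reading matches the recursive description above but is our own reconstruction rather than the paper's exact implicit function; the sampler keeps both marginals uniform while the support stays one-dimensional.

```python
import numpy as np

def sample_bex(n, d, seed=0):
    """Sample n points from an assumed form of BEX_d: the union of slope +1 and
    slope -1 lines at spacing 2^{-(d-1)} inside the unit square."""
    rng = np.random.default_rng(seed)
    h = 2.0 ** (-(d - 1))                     # spacing between parallel lines
    u = rng.random(n)                         # first coordinate, Unif[0,1]
    r = u % h                                 # position of u within its dyadic block
    seg = rng.integers(0, 2 ** (d - 1), n)    # which parallel line the point lands on
    up = rng.random(n) < 0.5                  # slope +1 or slope -1
    v = np.where(up, seg * h + r, seg * h + (h - r))
    return u, v

u, v = sample_bex(100000, d=3)
print(u.mean(), v.mean())            # both close to 0.5: uniform-looking margins
print(np.corrcoef(u, v)[0, 1])       # close to 0: no linear signal despite degenerate support
```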

Now we consider random variables that are uniformly distributed over $\text{BEX}_d$, whose joint distribution is denoted by $\mathbb{P}_{\text{BEX}_d}$. The properties of these distributions are summarized in the following proposition.

Proposition 2.1.
  (a) Both coordinates of $\mathbb{P}_{\text{BEX}_d}$ are marginally $\text{Unif}[0,1]$ for any $d$.

  (b) $D_{TV}(\mathbb{P}_{\text{BEX}_d}, \mathbb{P}_0) = 1$ for any $d$, i.e., the joint distribution over $\text{BEX}_d$ is degenerate. In particular, $\mathbb{P}_{\text{BEX}_d} \neq \mathbb{P}_0$ for any $d$.

  (c) $\mathbb{P}_{\text{BEX}_d} \Rightarrow \mathbb{P}_0$ as $d \to \infty$, i.e., the distribution over $\text{BEX}_d$ converges weakly to the independence null.

Part (b) and part (c) of Proposition 2.1 seem to contradict each other: Part (b) says that the joint distribution over $\text{BEX}_d$ is as far from independence as possible in total variation distance, so the two coordinates are strongly dependent. Yet, part (c) claims that when $d$ is large, they are nearly independent. Indeed, the BEX shows that despite a total variation distance of 1, degenerate distributions can be arbitrarily close to independence. We shall explain this paradox in Section 4.3. This paradox in turn leads to a challenge: Given a finite sample, can we effectively distinguish any form of dependency from independence?

Unfortunately, for any testing method, the answer is negative. Intuitively speaking, this is because for any given test with a given sample size $n$, one can keep expanding the BEX until it is so close to independence that the test becomes powerless. This example thus illustrates the problem of non-uniform consistency of the test in (1.2): No test can be uniformly consistent against all forms of dependence, not even against all levels of the BEX, for which $D_{TV}(\mathbb{P}_{\text{BEX}_d}, \mathbb{P}_0) = 1$ in (1.2).

The power loss due to non-uniform consistency can be severe. For example, simulations show that many CDF based and kernel based tests are powerless in detecting the BEX even when the sample size is very large. Note that with such a large sample, the BEX structure can be almost completely plotted and the dependency can be clearly observed by the naked eye. However, many existing tests cannot distinguish it from independence.

We make two remarks about the BEX example before proceeding.

(a) Although the BEX structure is uncommon in the statistical literature, it is related to many research problems such as chessboard detection in computer vision (Forsyth and Ponce, 2002).

(b) The BEX is not the first example in which a sequence of degenerate distributions converges to independence. The earliest example we could find is in Kimeldorf and Sampson (1978). There are also other interesting and useful fractal applications in statistics, such as Craiu and Meng (2005, 2006). The basis of the BEX example is a classical result in Vitale (1990). We construct the BEX paradox because its fractal structure explains the problem of non-uniform consistency.

The following theorem provides a formal statement of the problem of non-uniform consistency:

Theorem 2.2.

Consider the testing problem in (1.2). For any finite number $n$ of i.i.d. observations and for any test that has a Lebesgue measurable critical region $C$ with size $\mathbb{P}_0^{\otimes n}(C) \leq \alpha$ and $0 < \alpha < 1$, there exists a bivariate distribution $\mathbb{P}$ with $D_{TV}(\mathbb{P}, \mathbb{P}_0) = 1$ and power $\mathbb{P}^{\otimes n}(C) \leq \alpha$.

The message of Theorem 2.2 is that in a distribution-free setting without any assumption on the joint distribution, dependence is not a tractable target. The intractability comes from the fact that without a model of the joint distribution, there is no parameter to characterize and identify the underlying form of dependency. Therefore, there is no target of inference about dependence from a test or any other statistical method. Although one can develop good measures of dependence such as distance correlation, GMC, HSIC and MIC, such measures cannot make the joint distribution identifiable. Therefore, they can never replace the role of parameters in statistical inference about dependence. This fact motivates the following three key elements in the BEStat approach and the BET framework:

  1. Rather than one measure of dependence, we will study dependence through a carefully designed sequence of tests based on a filtration to achieve universality.

  2. For every test statistic in the sequence, there is an explicit well-defined set of parameters as the target for inference to achieve identifiability.

  3. At every step in the sequence, the test is consistent against all alternatives that are $\delta$-away from independence in total variation distance, to achieve uniformity.

3 The Basic Theory of Binary Expansion Statistics

3.1 Approximating Dependence through the Binary Expansion Filtration

The considerations in Section 2 necessitate a multi-scale binning approach to study dependence. For the dependence detection problem, this multi-scale approach means testing some approximate form of independence rather than the exact hypothesis in (1.2). We study the known marginal CDF case first, for which we develop such a multi-scale framework through the following classical result on the binary expansion of a uniform random variable (Kac, 1959):

Theorem 3.1.

For $U \sim \text{Unif}[0,1]$, we have $U = \sum_{k=1}^{\infty} A_k/2^k$, where the $A_k$'s are i.i.d. $\text{Bernoulli}(1/2)$ variables.

We note here that in describing the interactions of binary variables, it is more convenient to consider $\dot{A}_k = 2A_k - 1$, i.e., $\dot{A}_k \in \{-1, 1\}$, so that the interactions of binary variables can be written as products, as we will see later.

The binary expansion in Theorem 3.1 decomposes the information about $U$ into information from the independent fair coins $A_k$. The $A_k$'s can also be regarded as indicator functions of $U$: for example, $A_1 = \mathbb{1}_{\{U \geq 1/2\}}$ and $A_2 = \mathbb{1}_{\{U \in [1/4, 1/2) \cup [3/4, 1)\}}$; see Kac (1959). To study the dependence between $U$ and $V$, we consider the binary expansion of both $U$ and $V$: $U = \sum_{k=1}^{\infty} A_k/2^k$ and $V = \sum_{k=1}^{\infty} B_k/2^k$, where the $A_k$'s and the $B_k$'s are each i.i.d. $\text{Bernoulli}(1/2)$ variables.
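As a quick numerical sanity check of Theorem 3.1 (our own illustration, not part of the original development), the first few bits of simulated $\text{Unif}[0,1]$ draws behave like independent fair coins:

```python
import numpy as np

# Empirical check: the binary expansion bits of a Unif[0,1] variable
# look like independent Bernoulli(1/2) coins.
rng = np.random.default_rng(1)
u = rng.random(200000)
bits = np.stack([np.floor(u * 2 ** k).astype(int) % 2 for k in (1, 2, 3)])  # A_1, A_2, A_3

print(bits.mean(axis=1))        # each close to 0.5
print(np.corrcoef(bits))        # off-diagonal entries close to 0
```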

Note that if we truncate the binary expansions of $U$ and $V$ at some finite depths $d_1$ and $d_2$ respectively, i.e., $U_{d_1} = \sum_{k=1}^{d_1} A_k/2^k$ and $V_{d_2} = \sum_{k=1}^{d_2} B_k/2^k$, then $U_{d_1}$ and $V_{d_2}$ are two discrete variables that can take $2^{d_1}$ and $2^{d_2}$ possible values respectively. Moreover, as $d_1, d_2 \to \infty$, $U_{d_1} \to U$ and $V_{d_2} \to V$ in probability. These convergence results in turn imply that

$(U_{d_1}, V_{d_2}) \xrightarrow{\mathcal{D}} (U, V), \quad \text{as } d_1, d_2 \to \infty.$     (3.1)

The above considerations are apparent if one regards the truncations as a filtration generated by $A_1, \ldots, A_{d_1}$ and $B_1, \ldots, B_{d_2}$ for each $d_1$ and $d_2$. Indeed, the filtration idea is a consequence of George Box’s aphorism “All models are wrong, but some are useful.” At every $d_1$ and $d_2$, the probability model of $(U_{d_1}, V_{d_2})$ is a “wrong” model for the joint distribution of $(U, V)$; however, this “wrong” model can be very useful in many ways. In particular, we show below how the three key elements described at the end of Section 2 are achieved from this approach:

(a) Universality: The important message from (3.1) is that one can approximate the joint distribution of $(U, V)$, and hence the dependence between $U$ and $V$, through that of $(U_{d_1}, V_{d_2})$. Although the dependence in the joint distribution of $(U, V)$ can be arbitrarily complicated, when $d_1$ and $d_2$ are large, we expect a good approximation from the discrete variables $(U_{d_1}, V_{d_2})$. In terms of testing independence, this means that although the joint distribution of $(U, V)$ can be arbitrarily close to independence, due to the filtration feature of the sequence, one can always detect the dependence when $d_1$ and $d_2$ are large, achieving universality.

(b) Identifiability: As we explained in Section 2, one crucial challenge in distribution-free dependence detection is identifiability. Without models and parameters, dependence is not a tractable target. On the other hand, the joint distribution of $(U_{d_1}, V_{d_2})$ can only take finitely many possible values. This means we have a partition of the scatterplot of the data into a $2^{d_1} \times 2^{d_2}$ contingency table (a small code sketch after item (c) below illustrates this partition). With this consideration, the truncation of the binary expansions turns the problem of dependence, which is unidentifiable under the distribution-free setting, into a problem over a contingency table, which is fully identifiable. In terms of testing, when we begin without any assumptions about the joint distribution, there is no explicit way to write out the alternative likelihood under dependence. However, at each depth $d_1$ and $d_2$, due to the discreteness, the class of alternative distributions is restricted to those over the $2^{d_1} \times 2^{d_2}$ contingency table, which has an explicit distribution and has cell probabilities as identifiable parameters for inference (Agresti and Kateri, 2011; Fienberg, 2007).

(c) Uniformity: As a consequence of identifiability, we can solve the problem of non-uniform consistency described in Section 2. At any depth $d_1$ and $d_2$, one can write out the total variation distance between an alternative distribution and the null distribution in terms of the parameters in the contingency table model. We are thus able to show the uniform consistency and optimality of the proposed Max BET procedure in Theorem 4.4 for alternative distributions whose total variation distances from the independence null are at least $\delta$, for any $\delta > 0$.
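As referenced in item (b), the following sketch shows the partition in code: truncating the binary expansions at depths $d_1$ and $d_2$ amounts to cross-tabulating the copula observations into a $2^{d_1} \times 2^{d_2}$ table. The helper name binary_expansion_table is ours for illustration.

```python
import numpy as np

def binary_expansion_table(u, v, d1, d2):
    """Cross-tabulate copula observations into the 2^{d1} x 2^{d2} contingency table
    induced by truncating the binary expansions of u and v at depths d1 and d2."""
    rows = np.minimum((u * 2 ** d1).astype(int), 2 ** d1 - 1)   # dyadic interval of u
    cols = np.minimum((v * 2 ** d2).astype(int), 2 ** d2 - 1)   # dyadic interval of v
    table = np.zeros((2 ** d1, 2 ** d2), dtype=int)
    np.add.at(table, (rows, cols), 1)
    return table

rng = np.random.default_rng(2)
u, v = rng.random(1000), rng.random(1000)                        # independent case
print(binary_expansion_table(u, v, 2, 2))                        # roughly 1000/16 per cell
```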

The above considerations motivate us to propose the binary expansion statistics for studying the dependence between $U$ and $V$ in a distribution-free setting. Formally, we define binary expansion statistics as follows:

Definition 3.2.

We call statistics that are functions of finitely many Bernoulli variables from the marginal binary expansions in Theorem 3.1 the binary expansion statistics (BEStat).

Similarly, for the problem of detecting dependence from independence in a distribution-free setting, we define the binary expansion testing framework as follows.

Definition 3.3.

We call the testing framework based on the binary expansion statistics of the variables up to depths $d_1$ and $d_2$ the binary expansion testing (BET) at depths $d_1$ and $d_2$.

In the context of testing independence in bivariate distributions, the BET at depths $d_1$ and $d_2$ is to test the independence up to depths $d_1$ and $d_2$: Denote the joint distribution of $(U_{d_1}, V_{d_2})$ by $\mathbb{P}_{d_1,d_2}$, and denote the distribution of $(U_{d_1}, V_{d_2})$ under the bivariate uniform distribution over $[0,1]^2$ by $\mathbb{P}_{0,d_1,d_2}$. Consider

$H_0: \mathbb{P}_{d_1,d_2} = \mathbb{P}_{0,d_1,d_2} \quad \text{v.s.} \quad H_1: D_{TV}(\mathbb{P}_{d_1,d_2}, \mathbb{P}_{0,d_1,d_2}) \geq \delta$     (3.2)

for some $\delta > 0$.

Not rejecting the null hypothesis in the BET at depths $d_1$ and $d_2$ thus indicates that there is no strong evidence against the null hypothesis of independence between $U$ and $V$ up to depths $d_1$ and $d_2$ in the binary expansions. Note that this interpretation is weaker than claiming independence between $U$ and $V$: the dependence can appear at some larger depth. However, as described earlier in Section 2, claiming complete independence with a finite sample and without any restriction on the alternative is impossible. On the other hand, this weaker hypothesis helps us to avoid the uniform consistency problem in dependence detection under the distribution-free setting and provides valid statements. To see the gains from these weaker hypotheses, one can compare our uniform consistency result in Theorem 4.4 with part (b) of Proposition 2.1, which states that for any $d$, $\mathbb{P}_{\text{BEX}_d}$ has a TV distance of 1 from the independence null. See Section 4.2 for more details.

We remark here that the filtration for approximating dependence is not unique. For example, one can consider the filtration corresponding to the Fourier basis rather than the binary expansion. However, the $\sigma$-field in the binary expansion filtration has a few important advantages that facilitate studies of dependence.

(a) Finiteness of $\sigma$-fields: For each depth $d_1$ and $d_2$, the number of generating binary variables in the $\sigma$-field is finite. This is because functions of binary variables are at most binary. If we consider some other filtration (for example the Fourier basis) for the approximation of dependence, then the $\sigma$-field might not be generated by finitely many variables and can be much more complicated.

(b) Uncorrelatedness implying independence: Although uncorrelatedness usually does not imply independence, it is well known that it does for two binary variables. This property can greatly simplify studies of dependence in the filtration. Again, if we consider some other filtration (for example the Fourier basis) for the approximation of dependence, then quantifying the dependence between variables in the $\sigma$-field can be much more complicated.

The above considerations also work similarly for the case when the marginal distributions are unknown. To study the binary expansion in this case, suppose the sample size $n$ is a power of 2 for ease of explanation. With the marginal empirical CDF transformations, the coordinates of the $i$-th observation in the empirical copula are the ranks of the observed values divided by $n$, whose marginal distribution is the discrete uniform distribution over $\{1/n, 2/n, \ldots, 1\}$. The Bernoulli variables up to depths $d_1$ and $d_2$ can then be defined from the binary expansions of these transformed observations as before, and it is easy to see that, for each fixed observation, these variables are independent $\text{Bernoulli}(1/2)$. Therefore, the binary expansion filtration can be similarly defined, and the BET at depths $d_1$ and $d_2$ is to test the independence of the row and column assignments of the observations:

(3.3)

The interpretation of this null hypothesis is that for each observation, the row assignment and column assignment to the contingency table are independent, as in classical categorical data analysis (Agresti and Kateri, 2011; Fienberg, 2007).
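A minimal sketch of the empirical CDF transformation described above, assuming the simple convention of ranks divided by $n$ (ties are not an issue for continuous data); the sample size 256 is a power of 2, matching the setup:

```python
import numpy as np
from scipy.stats import rankdata

# From raw observations to the empirical copula: each coordinate is replaced by
# its rank divided by n.
rng = np.random.default_rng(3)
x = rng.standard_normal(256)
y = x ** 2 + rng.standard_normal(256)            # a nonlinear form of dependence
u_hat = rankdata(x) / len(x)                     # empirical CDF transform of x
v_hat = rankdata(y) / len(y)                     # empirical CDF transform of y
print(u_hat[:4], v_hat[:4])
# u_hat and v_hat take values in {1/n, ..., 1} and can be fed to the same binary
# expansion machinery (bits, contingency tables) as in the known-CDF case.
```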

We explain the details of these tests in Section 3.2 and Section 4. We remark here that although copula theory is well developed (Nelsen, 2007), we are not aware of any filtration approach in the copula literature. We also remark here that tests of approximate independence are also considered in a very recent paper (Ma and Mao, 2017) for scanning purposes, in which the binary expansion filtration is implicitly described. In this paper, our goal is to formally develop the framework of binary expansion statistics. We shall compare the theory and methods in both papers in Section 4.4.

3.2 Revisiting the Classical Theory for Contingency Tables

We start our analysis by first revisiting the model and theory of a general contingency table with $r$ rows and $c$ columns formed from $n$ i.i.d. samples. The parameters of interest are the cell probabilities $p_{ij}$, and the cell counts are $n_{ij}$. The only constraints are on the totals: $\sum_{i,j} p_{ij} = 1$ and $\sum_{i,j} n_{ij} = n$. The two most important models for the likelihood are as follows (Agresti and Kateri, 2011; Fienberg, 2007):

(a) When there is no restriction on marginal totals, the joint distribution of the random vector of cell counts is multinomial:

$P(\{n_{ij}\}) = \frac{n!}{\prod_{i,j} n_{ij}!} \prod_{i,j} p_{ij}^{n_{ij}},$     (3.4)

where $n = \sum_{i,j} n_{ij}$.

(b) When the row totals and column totals are known, we denote the row total probabilities by $p_{i\cdot}$'s and the column total probabilities by $p_{\cdot j}$'s. Consider the reparametrization $\theta_{ij} = p_{ij}/(p_{i\cdot}\, p_{\cdot j})$. It is easy to see that the joint distribution of the cell counts given the marginal totals $\{n_{i\cdot}\}$ and $\{n_{\cdot j}\}$ is the (Fisher’s) noncentral multivariate hypergeometric distribution (Freeman and Halton, 1951):

$P\big(\{n_{ij}\} \mid \{n_{i\cdot}\}, \{n_{\cdot j}\}\big) = \prod_{i,j} \frac{\theta_{ij}^{n_{ij}}}{n_{ij}!} \Big/ \sum_{\{m_{ij}\} \in \mathcal{T}} \prod_{i,j} \frac{\theta_{ij}^{m_{ij}}}{m_{ij}!},$     (3.5)

where $\mathcal{T}$ is the collection of all tables of nonnegative integer counts with row totals $\{n_{i\cdot}\}$ and column totals $\{n_{\cdot j}\}$. Note that if the marginal variables are independent, then $\theta_{ij} = 1$ for all $i$ and $j$, and the joint distribution reduces to

$P\big(\{n_{ij}\} \mid \{n_{i\cdot}\}, \{n_{\cdot j}\}\big) = \frac{\prod_{i} n_{i\cdot}! \, \prod_{j} n_{\cdot j}!}{n! \, \prod_{i,j} n_{ij}!},$     (3.6)

which is the (central) multivariate hypergeometric distribution.
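For concreteness, the classical $\chi^2$ test discussed next can be applied directly to a binary expansion contingency table. The sketch below is only a baseline illustration with simulated data, not the BET itself:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Classical chi-square test of independence on the 4 x 4 table at depths d1 = d2 = 2,
# built directly from a simulated dependent pair with uniform margins.
rng = np.random.default_rng(4)
u = rng.random(512)
v = (u + 0.1 * rng.standard_normal(512)) % 1.0    # dependent, yet marginally uniform
rows = np.minimum((u * 4).astype(int), 3)
cols = np.minimum((v * 4).astype(int), 3)
table = np.zeros((4, 4), dtype=int)
np.add.at(table, (rows, cols), 1)

chi2, pvalue, dof, _ = chi2_contingency(table)
print(chi2, pvalue, dof)                          # dof = (4 - 1)(4 - 1) = 9
```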

With the above distributions for a contingency table, tests of independence can be done through classical methods such as $\chi^2$ tests, Fisher’s exact tests, and likelihood ratio tests (LRT). For the nonparametric dependence detection problem, the BET with these tests is uniformly consistent for any depths $d_1$ and $d_2$. However, these classical methods have two important limitations on power and interpretability:

(a) The minimal sample size for classical $\chi^2$ tests to have reliable power is known (Agresti and Kateri, 2011; Fienberg, 2007) to be similar to the size of the contingency table, i.e., of the order $2^{d_1+d_2}$. However, recent developments (Acharya et al., 2015) show that the optimal lower bound on the sample size requirement for reliable power in this testing problem is of the order $\sqrt{2^{d_1+d_2}}$. This result indicates that classical $\chi^2$ tests may suffer huge power loss in dependence detection, especially when $d_1$ and $d_2$ are large. For a well-known example, when the contingency table contains many empty cells, the LRT and $\chi^2$ tests will fail to work.

(b) The rejections from classical tests are not very interpretable. Even if we can claim significant dependence with a classical test, the test does not provide information about how the variables are dependent.

One intuition for the above limitations of classical tests is that each cell in a contingency table is considered in isolation, so the information shared between cells is lost. To improve on classical tests, we consider grouping cells together to improve power and interpretability. Such a grouping process is efficiently achieved with the interactions of binary variables in the binary expansion filtration, as we describe in Section 3.3.

3.3 Binary Interaction Design: Reparametrization of the Contingency Table Likelihood

We now turn to the case when the contingency table is generated by the binary expansion up to depths $d_1$ and $d_2$ as described in Section 3.1, so that the table has $2^{d_1} \times 2^{d_2}$ cells (assuming $U_{d_1}$ on the horizontal axis and $V_{d_2}$ on the vertical axis). However, we emphasize here that in this subsection we do not restrict the total probability of each row and column to be the same (which happens when the $A_k$'s and $B_k$'s are $\text{Bernoulli}(1/2)$), in order to provide a general theory for contingency tables.

To combine the cell information, we consider the $\sigma$-field generated from the binary expansion filtration. We explain the known marginal distribution case first, since the unknown marginal distribution case is similar. With $d_1$ independent Bernoulli variables $A_1, \ldots, A_{d_1}$ and another $d_2$ independent Bernoulli variables $B_1, \ldots, B_{d_2}$ (again, in this subsection we do not assume them to be symmetric), consider two general discrete variables defined by $U_{d_1} = \sum_{k=1}^{d_1} A_k/2^k$ and $V_{d_2} = \sum_{k=1}^{d_2} B_k/2^k$. The $\sigma$-field here is $\sigma(A_1, \ldots, A_{d_1}, B_1, \ldots, B_{d_2})$ and is generated by the Bernoulli variables resulting from interactions between the $A_k$'s and $B_k$'s. We shall use the equivalent binary variables $\dot{A}_k = 2A_k - 1$ and $\dot{B}_k = 2B_k - 1$, since the interactions between them can be conveniently written as products. For example, the event $\{A_1 = B_1\}$ is equivalent to the event $\{\dot{A}_1 \dot{B}_1 = 1\}$.

Note that each of these binary variables leads to a partition of the unit square into two groups of cells according to whether the binary variable is positive. Moreover, for each non-trivial binary variable in the $\sigma$-field, the number of cells in the region where it takes the value $1$ (and $-1$) is exactly $2^{d_1+d_2-1}$. This fact can be explained by the BID equation (Theorem 3.4) below, and it facilitates the definition of the interaction odds ratio (IOR) as in Definition 3.6 as well as the reparametrization with IORs. To illustrate these two features with an example, when $d_1 = 1$ and $d_2 = 2$, all events in the $\sigma$-field are determined by the signs of the binary variables $\dot{A}_1$, $\dot{B}_1$, $\dot{B}_2$, $\dot{B}_1\dot{B}_2$, $\dot{A}_1\dot{B}_1$, $\dot{A}_1\dot{B}_2$, and $\dot{A}_1\dot{B}_1\dot{B}_2$. The region where each of these variables is positive corresponds to 4 cells out of the total of 8. See Figure 2.

Note also that the binary variables in the $\sigma$-field can be categorized into two classes: The binary variables that involve only the $\dot{A}_k$'s or only the $\dot{B}_k$'s will be referred to as marginal interactions, since they only involve the marginal distributions. On the other hand, the binary variables that involve both some $\dot{A}_k$ and some $\dot{B}_k$ will be referred to as cross interactions, since they contain information about both $U_{d_1}$ and $V_{d_2}$.

In explaining the theory, we use the following binary integer indexing for the related distributional quantities: Denote the Bernoulli random vectors in the binary expansions by $\mathbf{A} = (A_1, \ldots, A_{d_1})$ and $\mathbf{B} = (B_1, \ldots, B_{d_2})$, and denote generic binary vectors of length $d_1$ and $d_2$ with entries 0 and 1 by $\mathbf{a}$ and $\mathbf{b}$. The probability of each of the $2^{d_1+d_2}$ cells can then be written as $p_{\mathbf{c}} = P(\mathbf{A} = \mathbf{a}, \mathbf{B} = \mathbf{b})$, with $\mathbf{c}$ being the concatenation of $\mathbf{a}$ and $\mathbf{b}$. Now consider the integer whose binary representation is $\mathbf{c}$. Let $\mathbf{p}$ be the $2^{d_1+d_2}$-dimensional vector of cell probabilities whose entry at this integer index is $p_{\mathbf{c}}$.

We also denote the expected values of the binary variables in the $\sigma$-field with binary integer indices as follows. For the interaction involving the $\dot{A}_k$'s with $k \in \Lambda_1$ and the $\dot{B}_k$'s with $k \in \Lambda_2$, we denote its expectation by $E_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2}$, where $\boldsymbol{\lambda}_1$ is a $d_1$-dimensional binary vector with 1’s at the positions in $\Lambda_1$ and 0’s otherwise, and $\boldsymbol{\lambda}_2$ is a $d_2$-dimensional binary vector with 1’s at the positions in $\Lambda_2$ and 0’s otherwise. Note here that the trivial interaction with $\boldsymbol{\lambda}_1 = \boldsymbol{\lambda}_2 = \mathbf{0}$ is the constant 1. We also write the interaction as a product of binary variables, $\dot{A}_{\boldsymbol{\lambda}_1}\dot{B}_{\boldsymbol{\lambda}_2} = \prod_{k \in \Lambda_1} \dot{A}_k \prod_{k \in \Lambda_2} \dot{B}_k$. With the binary integer indexing defined in the previous paragraph, let $\mathbf{E}$ be the $2^{d_1+d_2}$-dimensional vector of expected values whose entry indexed by the concatenation of $\boldsymbol{\lambda}_1$ and $\boldsymbol{\lambda}_2$ is $E_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2} = \mathbb{E}\big[\dot{A}_{\boldsymbol{\lambda}_1}\dot{B}_{\boldsymbol{\lambda}_2}\big]$.

The above notation also applies to observed quantities: With $n$ total observations, the cell counts are denoted by $n_{\mathbf{c}}$. The collection of all $n_{\mathbf{c}}$'s is denoted by $\mathbf{n}$ and is indexed as in $\mathbf{p}$. We also denote the sum of the observed binary interaction variables by $S_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2}$, with $S_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2} = \sum_{i=1}^{n} \dot{a}_{i,\boldsymbol{\lambda}_1} \dot{b}_{i,\boldsymbol{\lambda}_2}$, where $\dot{a}_{i,\boldsymbol{\lambda}_1} \dot{b}_{i,\boldsymbol{\lambda}_2}$ is the value of the interaction $\dot{A}_{\boldsymbol{\lambda}_1}\dot{B}_{\boldsymbol{\lambda}_2}$ for the $i$-th observation. The collection of all $S_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2}$'s is denoted by $\mathbf{S}$ and is indexed as in $\mathbf{E}$. We shall refer to $S_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2}$ as the symmetry statistic for $\dot{A}_{\boldsymbol{\lambda}_1}\dot{B}_{\boldsymbol{\lambda}_2}$, as it can be regarded as the difference between the numbers of points in the positive and negative regions. Thus, $S_{\boldsymbol{\lambda}_1 \boldsymbol{\lambda}_2}$ is a statistic about symmetry. See Figure 2.
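The symmetry statistics can be computed directly from the $\pm 1$ bits, as in the following sketch (our own illustration; the indexing by a pair of 0/1 vectors mirrors the notation above):

```python
import numpy as np
from itertools import product

def symmetry_statistics(u, v, d1, d2):
    """Symmetry statistic S for every interaction of the +/-1 bits of u (depth d1)
    and v (depth d2): (# points in the positive region) - (# points in the negative region)."""
    a = np.stack([2 * (np.floor(u * 2 ** k).astype(int) % 2) - 1 for k in range(1, d1 + 1)])
    b = np.stack([2 * (np.floor(v * 2 ** k).astype(int) % 2) - 1 for k in range(1, d2 + 1)])
    stats = {}
    for lam1 in product([0, 1], repeat=d1):
        for lam2 in product([0, 1], repeat=d2):
            signs = np.prod(a[np.array(lam1, dtype=bool)], axis=0) * \
                    np.prod(b[np.array(lam2, dtype=bool)], axis=0)
            stats[(lam1, lam2)] = int(signs.sum())
    return stats

rng = np.random.default_rng(5)
u = rng.random(64)
v = u                                                          # perfectly dependent toy example
print(symmetry_statistics(u, v, d1=1, d2=2)[((1,), (1, 0))])   # cross interaction of A1 and B1
```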

With the above notation, we establish the equation connecting the contingency table distribution and the interactions of binary variables in the $\sigma$-field. The equation is established through $\mathbf{H}_{2^{d_1+d_2}}$, the Sylvester construction of the Hadamard matrix (Sylvester, 1867). We shall refer to this equation as the binary interaction design (BID) equation (a name coined in Zhao et al. (2017)).

Theorem 3.4.
  1. Distribution version of the BID equation:

    $\mathbf{E} = \mathbf{H}_{2^{d_1+d_2}}\, \mathbf{p}.$     (3.7)
  2. Sample version of the BID equation:

    $\mathbf{S} = \mathbf{H}_{2^{d_1+d_2}}\, \mathbf{n}.$     (3.8)

The Hadamard matrix is referred to as the Walsh matrix in the literature of signal processing, where a linear transformation with $\mathbf{H}_{2^{d_1+d_2}}$ as in (3.7) and (3.8) is referred to as the Hadamard-Walsh transform (Lynn, 1973; Golubov et al., 2012; Harmuth, 2013). The earliest reference to the Hadamard matrix we found in the statistical literature is Pearl (1971). The Hadamard matrix is also closely related to the orthogonal full factorial design (Box et al., 2005; Cox and Reid, 2000). In the context of dependence detection, this transform maps the cell domain (in $\mathbf{p}$ or $\mathbf{n}$) to the interaction domain (in $\mathbf{E}$ or $\mathbf{S}$). Thus, the information in individual cells can be pooled together to provide information about global dependency. Although theory and methods for contingency tables are well developed, we are not aware of a similar approach in the related literature.
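The sample version of the BID equation, as reconstructed here, can be sketched as a single matrix-vector product: building the Sylvester Hadamard matrix and applying it to the vector of cell counts yields all symmetry statistics at once. The cell counts below are hypothetical, and the ordering of cells and interactions is an assumption that must be kept consistent on both sides:

```python
import numpy as np

def sylvester_hadamard(d):
    """Sylvester's recursive construction of the 2^d x 2^d Hadamard (Walsh) matrix."""
    h = np.array([[1]])
    for _ in range(d):
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h

d1, d2 = 1, 2
counts = np.array([3, 5, 2, 6, 4, 4, 1, 7])           # hypothetical cell counts, n = 32
S = sylvester_hadamard(d1 + d2) @ counts              # all 2^{d1+d2} symmetry statistics at once
print(S[0])                                           # the trivial interaction recovers n = 32
```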

Figure 2: The binary interaction design (BID) at depths $d_1 = 1$ and $d_2 = 2$. The number of observations in each cell is presented in the top left plot. There are 7 non-trivial binary variables in the $\sigma$-field, whose positive regions are in white and whose negative regions are in blue. Symmetry statistics are calculated for these 4 marginal interactions and 3 cross interactions.

It can be helpful to understand the notation and the equation through the case when $d_1 = 1$ and $d_2 = 2$. Note that in this case, (3.7) and (3.8) become (3.9) and (3.10) respectively, as illustrated in Figure 2:

(3.9)

and

(3.10)

To see the importance of the BID equation and the symmetry statistics, we introduce some more notation here. We label the rows (and columns) of $\mathbf{H}_{2^{d_1+d_2}}$ with binary integer indices from $0\cdots0$ to $1\cdots1$. For a binary index $\boldsymbol{\lambda}$, denote by $\bar{\boldsymbol{\lambda}}$ its binary conjugate, or logical negation, i.e., the binary vector obtained by flipping every bit of $\boldsymbol{\lambda}$. With the above notation, we summarize some useful properties of the Hadamard matrix in the following proposition (Golubov et al., 2012).

Proposition 3.5.
  (a) $\mathbf{H}_{2^d}$ is symmetric. The entry of $\mathbf{H}_{2^d}$ at the row with binary index $\boldsymbol{\lambda}$ and the column with binary index $\boldsymbol{\mu}$ is $(-1)^{\boldsymbol{\lambda}^\top \boldsymbol{\mu}}$.

  (b) $\mathbf{H}_{2^d}$ has orthogonal columns: $\mathbf{H}_{2^d}^\top \mathbf{H}_{2^d} = 2^d \mathbf{I}_{2^d}$.

  (c) Hadamard matrices can be defined recursively: $\mathbf{H}_1 = (1)$ and $\mathbf{H}_{2^{d+1}} = \begin{pmatrix} \mathbf{H}_{2^d} & \mathbf{H}_{2^d} \\ \mathbf{H}_{2^d} & -\mathbf{H}_{2^d} \end{pmatrix}$.

Part (b) of Proposition 3.5 implies that $\mathbf{H}_{2^d}^{-1} = 2^{-d}\,\mathbf{H}_{2^d}$, i.e., $\mathbf{h}_{\boldsymbol{\lambda}}^\top \mathbf{h}_{\boldsymbol{\mu}} = 2^{d}\,\mathbb{1}_{\{\boldsymbol{\lambda} = \boldsymbol{\mu}\}}$, where $\mathbf{h}_{\boldsymbol{\mu}}$ is the $\boldsymbol{\mu}$-th column of $\mathbf{H}_{2^d}$. With the above notation and transformation of variables, and by part (a) of Proposition 3.5, the distribution of the contingency table in (3.4) can be written as

(3.11)
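A small numerical check of part (b) of Proposition 3.5 and the inversion noted above (a sketch under the same indexing assumptions as before):

```python
import numpy as np
from scipy.linalg import hadamard

# Check H^T H = 2^d I, so the cell-domain vector can be recovered from the
# interaction-domain vector by applying H again and dividing by 2^d.
d = 3
H = hadamard(2 ** d)                              # Sylvester construction, same as H_{2^d}
assert np.array_equal(H.T @ H, 2 ** d * np.eye(2 ** d, dtype=int))

p = np.full(2 ** d, 1.0 / 2 ** d)                 # uniform cell probabilities (independence)
E = H @ p                                         # interaction expectations: [1, 0, ..., 0]
assert np.allclose(H @ E / 2 ** d, p)             # applying H again and rescaling recovers p
print(E)
```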

We give the following formal definition of this important quantity:

Definition 3.6.

We call the interaction odds ratio (IOR) with respect to interaction Denote the vector of ’s by in the same way as in .

There are three cases for the IOR:
(a) When $\boldsymbol{\lambda}_1 = \mathbf{0}$ and $\boldsymbol{\lambda}_2 = \mathbf{0}$: the corresponding term does not involve any interaction and is constant.
(b) When $\boldsymbol{\lambda}_1 \neq \mathbf{0}$ but $\boldsymbol{\lambda}_2 = \mathbf{0}$ (or when $\boldsymbol{\lambda}_2 \neq \mathbf{0}$ but $\boldsymbol{\lambda}_1 = \mathbf{0}$), the IOR is a marginal interaction odds ratio (MIOR) quantifying the balance in the marginal interaction variable $\dot{A}_{\boldsymbol{\lambda}_1}$ (or $\dot{B}_{\boldsymbol{\lambda}_2}$), i.e., the homogeneity in the corresponding marginal distribution. Note also that there are $2^{d_1} + 2^{d_2} - 2$ MIORs at depths $d_1$ and $d_2$. Denote the collections of MIORs for the two margins accordingly.
(c) When $\boldsymbol{\lambda}_1 \neq \mathbf{0}$ and $\boldsymbol{\lambda}_2 \neq \mathbf{0}$, the IOR is a cross interaction odds ratio (CIOR) quantifying the balance in the cross interaction variable $\dot{A}_{\boldsymbol{\lambda}_1}\dot{B}_{\boldsymbol{\lambda}_2}$. Note also that there are $(2^{d_1}-1)(2^{d_2}-1)$ CIORs at depths $d_1$ and $d_2$. Denote the collection of CIORs accordingly.

An important observation is that with the IOR, (3.11) becomes

(3.12)

where and There