Upper bounds on the sizes of variable strength covering arrays using the Lovász local lemma
Covering arrays are generalizations of orthogonal arrays that have been widely studied and are used in software testing. The probabilistic method has been employed to derive upper bounds on the sizes of minimum covering arrays and give asymptotic upper bounds that are logarithmic in the number of columns of the array. This corresponds to test suites with a desired level of coverage of the parameter space where we guarantee the number of test cases is logarithmic in the number of parameters of the system. In this paper, we study variable strength covering arrays, a generalization of covering arrays that uses a hypergraph to specify the sets of columns where coverage is required; (standard) covering arrays are the special case where coverage is required for all sets of columns of a fixed size $t$, the strength. We use the probabilistic method to obtain upper bounds on the number of rows of a variable strength covering array, given in terms of parameters of the hypergraph. We then compare this upper bound with another one given by a density-based greedy algorithm on different types of hypergraphs, such as $t$-designs, cyclic consecutive hypergraphs, planar triangulation hypergraphs, and a more specific hypergraph given by a clique of higher strength on top of a “base strength”. The conclusions depend on the class of hypergraph, and we discuss specific characteristics of the hypergraphs which are more amenable to using different versions of the Lovász local lemma.
Keywords: covering arrays, variable strength, Lovász local lemma, greedy algorithm, derandomization.
Covering arrays are well-studied combinatorial designs [colbourn04, crchandbook], in part because of their utility in software and network testing [cohen96, dalal98, kuhn04]. For more information about covering arrays, including combinatorial constructions and an overview of algorithms and applications, see the survey by Colbourn [colbourn04]. In this paper, we focus on a recent generalization of covering arrays called variable strength covering arrays (VCAs). We begin by defining VCAs and indicating how covering arrays are a special case. For more information on VCAs, see [raaphorstphd, vca].
Let $H = (V, E)$ be a hypergraph and let $k = |V|$. A variable-strength covering array, denoted $\mathrm{VCA}(N; H, v)$, is an $N \times k$ array filled from $\mathbb{Z}_v$ such that for each $e \in E$, the subarray of columns indexed by the elements of $e$ is covered, that is, it has every possible $|e|$-tuple in $\mathbb{Z}_v^{|e|}$ as a row at least once. The variable-strength covering array number, written $\mathrm{VCAN}(H, v)$, is the smallest $N$ such that a $\mathrm{VCA}(N; H, v)$ exists.
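The coverage condition in this definition can be checked mechanically. The following is a minimal sketch, assuming the array is given as a list of rows with entries in {0, ..., v-1} and the hypergraph as a list of column-index sets; the function name `is_vca` and the toy hypergraph are illustrative choices, not taken from the paper.

```python
def is_vca(array, edges, v):
    """Check the variable-strength coverage condition.

    array: list of rows, each a tuple with entries in range(v) (assumed).
    edges: iterable of sets of column indices (the hyperedges).
    Returns True if, for every edge e, the columns indexed by e contain
    all v**|e| possible tuples as rows.
    """
    for edge in edges:
        cols = sorted(edge)
        # Distinct tuples that actually appear on these columns.
        seen = {tuple(row[c] for c in cols) for row in array}
        if len(seen) < v ** len(cols):
            return False
    return True


# Toy hypergraph on 3 columns: coverage is required on {0,1} and {1,2} only.
edges = [{0, 1}, {1, 2}]
array = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]
print(is_vca(array, edges, 2))      # all four pairs appear on both edges
print(is_vca(array[:3], edges, 2))  # dropping a row loses the pair (1, 1) on {0,1}
```

Only the columns named by an edge are inspected, which is the point of variable strength: column sets outside the hypergraph carry no coverage requirement.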
Consider the complete $t$-uniform hypergraph on $k$ vertices, denoted $K_k^{(t)}$, that is, the hypergraph $(V, E)$ where $|V| = k$ and $E$ is the set of all $t$-subsets of $V$. A covering array of strength $t$, denoted $\mathrm{CA}(N; t, k, v)$, is precisely a $\mathrm{VCA}(N; K_k^{(t)}, v)$; the covering array number is denoted $\mathrm{CAN}(t, k, v)$. In this article, we use the Lovász local lemma to give an upper bound on $\mathrm{VCAN}(H, v)$.
Theorem 1.2 (Lovász local lemma - symmetric case [lovasz75, spencer77]).
Let $A_1, \ldots, A_n$ be a finite set of events in a probability space such that $P(A_i) \le p$ for all $i$, and each event is independent of all but at most $d$ of the other events. If $ep(d+1) \le 1$, where $e$ is the base of the natural logarithm, then the probability that none of the events occur is nonzero.
For standard covering arrays, the local lemma has been used [godbole96] to prove
$$\mathrm{CAN}(t, k, v) \le \frac{(t-1)\ln k}{\ln\frac{v^t}{v^t-1}}\,(1 + o(1)).$$
More recently, this local lemma technique and the deterministic analogue, often called entropy compression [moser2010], have improved the coefficient of the leading term in this bound for all when and for when , by using columns that have balanced numbers of symbols [MR3328867, francetic2016]. Sarkar and Colbourn match and extend these improvements by exploiting permutation groups with the use of the local lemma [sarkar_upper_2017]. They also combine the local lemma, permutation groups, graph colouring techniques and a density-based greedy approach into two-stage methods which further improve covering array upper bounds [colbourn_two-stage_nodate_alt]. In [colbourn_asymptotic_nodate], together with Lanus, they use the local lemma to construct covering perfect hash families which are then used to construct covering arrays; covering perfect hash families are smaller than covering arrays and thus can be generated more efficiently. These constructions are related to covering arrays constructed from linear feedback shift register sequences [raaphorst2012, MR3433921].
Probabilistic methods have previously been used in the context of variable strength covering arrays. In [godbole2011, godbole2010cca] probabilistic methods, but not explicitly the local lemma, are used to derive bounds on variable strength covering arrays for consecutive hypergraphs, which are similar to the cyclic consecutive hypergraphs we discuss in Section 3 but without the edges wrapping from the end of the vertex set to the beginning. Sarkar et al. [MR3808633] use the local lemma to construct partial covering arrays, which cover at least a prescribed fraction of the possible $t$-sets of columns. The main difference between these and variable strength covering arrays is that variable strength covering arrays specify exactly which $t$-sets of columns will be covered, whereas partial covering arrays only specify that a certain proportion of $t$-sets must be covered, without constraining which ones they are.
1.1 Main Result
For a hypergraph $H$, let the rank of $H$, denoted $\mathrm{rank}(H)$, be the largest cardinality of an edge in $H$.
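Theorem 1.3.
Let $H = (V, E)$ be a hypergraph with $\mathrm{rank}(H) = t$, and let $d$ be an integer such that no edge of $H$ intersects more than $d$ other edges of $H$. Then, for any $v \ge 2$, we have:
$$\mathrm{VCAN}(H, v) \le \left\lceil \frac{\ln(d+1) + t\ln v + 1}{\ln\frac{v^t}{v^t-1}} \right\rceil. \tag{1.1}$$
Proof. Let $k = |V|$ and let $N$ be a positive integer. Consider a randomly generated $N \times k$ array $A$ with entries chosen from $\mathbb{Z}_v$ with uniform probability. For each edge $e \in E$, write $t_e = |e|$, and associate an event $B_e$ that the subarray of $A$ formed by the columns corresponding to $e$ is missing one or more of the $t_e$-tuples of $\mathbb{Z}_v^{t_e}$ as a row. Since the entries of $A$ are independent, $B_e$ is independent of all events $B_f$ with $f \cap e = \emptyset$, so each event depends on at most $d$ others. Define:
$$p = v^t\left(1 - \frac{1}{v^t}\right)^N.$$
Since $t_e \le t$, we have that
$$P(B_e) \le v^{t_e}\left(1 - \frac{1}{v^{t_e}}\right)^N \le p.$$
We apply the symmetric local lemma as in Theorem 1.2, which states that if $ep(d+1) \le 1$, the probability that none of the events $B_e$ occur is positive, and hence there is some array that is a $\mathrm{VCA}(N; H, v)$. This happens when:
$$e\,(d+1)\,v^t\left(\frac{v^t-1}{v^t}\right)^N \le 1.$$
Thus, we have:
$$N \ge \frac{\ln(d+1) + t\ln v + 1}{\ln\frac{v^t}{v^t-1}}.$$
If we use the approximation
$$\ln\frac{1}{1-x} \ge x$$
for $0 \le x < 1$ from the Taylor series expansion, applied with $x = 1/v^t$, we can rewrite the bound as follows:
$$\mathrm{VCAN}(H, v) \le \left\lceil v^t\left(\ln(d+1) + t\ln v + 1\right) \right\rceil. \tag{1.2}$$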
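The derivation above can be sanity-checked numerically. The sketch below assumes the bound takes the form $(\ln(d+1) + t\ln v + 1)/\ln\frac{v^t}{v^t-1}$, where $d$ bounds the number of other edges any edge intersects, $t$ is the rank and $v$ the alphabet size; the function names and the sample parameters are illustrative only.

```python
import math


def lll_rows(d, t, v):
    """Smallest integer N satisfying e * p * (d + 1) <= 1,
    where p = v**t * (1 - 1/v**t)**N (assumed form of the bad-event bound)."""
    q = v ** t
    return math.ceil((math.log(d + 1) + t * math.log(v) + 1) / math.log(q / (q - 1)))


def lll_condition(N, d, t, v):
    """Evaluate the symmetric local lemma condition e * p * (d + 1) <= 1 directly."""
    q = v ** t
    p = q * (1 - 1 / q) ** N
    return math.e * p * (d + 1) <= 1


def relaxed_rows(d, t, v):
    """Weaker closed form obtained from ln(1 / (1 - x)) >= x with x = 1 / v**t."""
    return math.ceil(v ** t * (math.log(d + 1) + t * math.log(v) + 1))


d, t, v = 10, 3, 2  # hypothetical parameters
N = lll_rows(d, t, v)
print(N, lll_condition(N, d, t, v), lll_condition(N - 1, d, t, v))
print(relaxed_rows(d, t, v))
```

For these parameters the condition holds at `N` and fails at `N - 1`, which is what one expects if the closed form is the exact threshold of the inequality; the relaxed form is never smaller.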
Theorem 1.3 can be applied to any hypergraph and thus gives a very general existence result for VCAs with arbitrary parameters. The rest of the paper focuses on comparing the upper bound given by Theorem 1.3 with a constructive upper bound for VCAs obtained by a density-based greedy algorithm called VarDens introduced in [raaphorstphd, vcadbga] and given next. This method is a generalization of the density method of Bryce and Colbourn [bryce09] for dealing with variable strength.
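Theorem 1.4 ([raaphorstphd, vcadbga]).
Let $H = (V, E)$ be a hypergraph such that $\mathrm{rank}(H) = t$ and $|E| = m$. Algorithm VarDens returns a $\mathrm{VCA}(N; H, v)$ where $N$ satisfies
$$N \le \frac{\ln m + t\ln v}{\ln\frac{v^t}{v^t-1}}. \tag{1.3}$$
For the remainder of the paper we compare the bounds given by the probabilistic method and by the VarDens algorithm. Let $N_{LLL}$ be the upper bound given by the probabilistic method (the right-hand side of equation (1.1) in Theorem 1.3), and let $N_{VD}$ be the upper bound given by the VarDens algorithm (the right-hand side of equation (1.3) in Theorem 1.4). That is, we denote
$$N_{LLL}(d, t, v) = \left\lceil \frac{\ln(d+1) + t\ln v + 1}{\ln\frac{v^t}{v^t-1}} \right\rceil, \qquad N_{VD}(m, t, v) = \frac{\ln m + t\ln v}{\ln\frac{v^t}{v^t-1}}.$$
We note that if absolutely nothing is known about the hypergraph except the number of edges $m$ and the rank $t$, then we can substitute $d = m - 1$ into $N_{LLL}$, obtaining an upper bound quite close to $N_{VD}$, especially as we fix $t$ and $v$ and let $m$ grow.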
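A short worked comparison may help to see why the two bounds are close in this case. It assumes that the local lemma bound has the form $(\ln(d+1) + t\ln v + 1)/\ln\frac{v^t}{v^t-1}$ and the VarDens bound the form $(\ln m + t\ln v)/\ln\frac{v^t}{v^t-1}$, ignores ceilings, and takes $d = m - 1$ (every edge may meet every other edge). Here $m$, $d$, $t$ and $v$ stand for the number of edges, the dependency bound, the rank and the alphabet size, and $N_{LLL}$, $N_{VD}$ for the two bounds.

```latex
\begin{align*}
N_{LLL} - N_{VD}
  &= \frac{\ln m + t\ln v + 1}{\ln\frac{v^t}{v^t-1}}
   - \frac{\ln m + t\ln v}{\ln\frac{v^t}{v^t-1}}
   = \frac{1}{\ln\frac{v^t}{v^t-1}}, \\[4pt]
\frac{1}{v^t} &< \ln\frac{v^t}{v^t-1} < \frac{1}{v^t-1}
  \quad\Longrightarrow\quad
  v^t - 1 < N_{LLL} - N_{VD} < v^t .
\end{align*}
```

Under these assumptions the gap is fewer than $v^t$ rows regardless of $m$, while both bounds grow like $v^t \ln m$, so their ratio tends to 1 as $m$ grows with $t$ and $v$ fixed.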
In the rest of the article, we show that when better estimates on the number of edges intersecting a given edge are known, the local lemma bound outperforms the VarDens bound. In Section 2, we look at hypergraphs that are combinatorial designs. In Section 3, we study two families of hypergraphs (cyclic consecutive hypergraphs and planar triangulations) where the variable-strength covering array number is known, to assess how well each of these methods performs. In Section 4, we consider other versions of the Lovász local lemma, namely the asymmetric and the general cases, and illustrate the challenges and benefits of using them. An extended abstract containing the main results and discussions in Sections 1-3 appeared in [cai-ea].
2 VCAs over designs
Combinatorial designs can be used to obtain hypergraphs that have a great deal of regularity, which can be exploited to determine the number of dependent events.
An - design is a collection of -subsets (called blocks) of a -set with the property that any -subset of points from appear in exactly blocks of .
For a fixed , and , these designs are known to exist for all sufficiently large that satisfy the necessary conditions [1401.3665].
We begin by counting the number of blocks in a - design that intersect a fixed block.
Let be an - design. Then for any :
When and the design has no repeated blocks,
Furthermore, for any , odd, we can truncate the summation after terms to derive an upper bound on the summation.
Let be the point set of the design. To count the number of blocks that intersect in at least one point we use the inclusion-exclusion principle. Let be the number of blocks that contain a set . The inclusion-exclusion principle states that the number of blocks that intersect in at least one point is precisely
where the 1 is subtracted to not count itself. Whenever the outer summation is truncated ending with a positive term an upper bound is achieved.
For , can be computed from the parameters of the design. Each of the -sets , occurs in exactly -sets each of which appears times amongst the blocks of . Each block of which contains , contains exactly -sets containing , and hence blocks of . For , cannot be derived from just the parameters of the design. If there are repeated blocks then itself may be more than one. Thus when , or repeated blocks are permitted, the value of will depend on the particular design and not just on the parameters. So we calculate an upper bound on , by truncating the inclusion-exclusion after a positive term. We stop at the largest odd integer no larger than , that is .
When and there are no repeated blocks, every term in the summation can be computed from the parameters of the design and the computation is exact. ∎
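The inclusion-exclusion count in this proof can be checked on a small design. The sketch below uses the Fano plane, a 2-(7, 3, 1) design in the standard notation (7 points, blocks of size 3, every pair of points in exactly one block); this example design and all function names are illustrative choices, and the code assumes a design without repeated blocks.

```python
from itertools import combinations

# Fano plane: every pair of the points 0..6 lies in exactly one block.
blocks = [frozenset(b) for b in
          [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
           (1, 4, 6), (2, 3, 6), (2, 4, 5)]]


def blocks_containing(S, blocks):
    """Number of blocks that contain the point set S."""
    return sum(1 for b in blocks if S <= b)


def intersecting_count(B, blocks, last=None):
    """Inclusion-exclusion count of blocks meeting B, excluding B itself.

    Summation runs over subset sizes 1..last; stopping at an odd `last`
    (a positive final term) gives an upper bound instead of the exact count.
    """
    last = len(B) if last is None else last
    total = 0
    for i in range(1, last + 1):
        sign = 1 if i % 2 == 1 else -1
        total += sign * sum(blocks_containing(frozenset(S), blocks)
                            for S in combinations(sorted(B), i))
    return total - 1  # subtract 1 so that B itself is not counted


def intersecting_direct(B, blocks):
    """Brute-force count of other blocks sharing at least one point with B."""
    return sum(1 for b in blocks if b != B and b & B)


B = blocks[0]
print(intersecting_count(B, blocks), intersecting_direct(B, blocks))
print(intersecting_count(B, blocks, last=1))  # truncated after the first term
```

Here the full sum is 3·3 − 3·1 + 1 − 1 = 6, matching the direct count, while truncating after the first (positive) term gives 8, an overestimate, as the proof describes.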
Theorem 2.3 (Bound Comparison for over - designs).
Let be an - design. Then,
and Furthermore, for , , , and fixed, as , we get
We conclude that for designs, is asymptotically better than as it reduces the coefficient for the leading term by .
When and the design has no repeated blocks we can take advantage of the equality in Lemma 2.2. Table 1 gives the bounds from Theorem 2.3 for - designs for . When , explicit designs are known [crchandbook]. Although for small , may outperform , as grows even modestly, becomes better.
3 Ability of bounds to capture global properties of
In this section, we compare the two bounds for two families of hypergraphs for which VCANs are almost completely known and do not grow with [vca]. The cyclic consecutive hypergraph is with and ; for many cyclic consecutive hypergraphs equals , and in all cases we know that for some that does not grow with [vca, Theorem 3.8].
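In this hypergraph, each edge intersects exactly $2(t-1)$ other edges (provided $k \ge 2t - 1$). Theorem 1.3 gives
$$\mathrm{VCAN} \le \left\lceil \frac{\ln(2t-1) + t\ln v + 1}{\ln\frac{v^t}{v^t-1}} \right\rceil. \tag{3.1}$$
In agreement with the covering array number, this bound is independent of $k$ and thus, for fixed $t$ and $v$, it is $O(1)$. On the other hand, since the hypergraph has $k$ edges, the bound from algorithm VarDens is
$$\frac{\ln k + t\ln v}{\ln\frac{v^t}{v^t-1}},$$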
which does grow with . Thus we have an example of a family of hypergraphs where is substantially better than . The homomorphism construction (see [raaphorstphd, Chapter 3]) gives , where and the term in Equation 3.1 in place of the usual term suggests that “recognizes” this homomorphism while does not. However, experiments show that running VarDens algorithm for does seem to generate arrays where the array size is independent of [raaphorstphd].
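The contrast between the two bounds on this family can be reproduced numerically. The sketch below assumes the cyclic consecutive hypergraph has vertex set $\mathbb{Z}_k$ and, as edges, the $k$ windows of $t$ cyclically consecutive vertices; it also assumes the local lemma bound has the form $(\ln(d+1) + t\ln v + 1)/\ln\frac{v^t}{v^t-1}$ and the VarDens guarantee the form $(\ln m + t\ln v)/\ln\frac{v^t}{v^t-1}$ with $m$ the number of edges. All function names are illustrative.

```python
import math


def cyclic_consecutive_edges(k, t):
    """Edges {i, i+1, ..., i+t-1} taken modulo k, for i = 0..k-1."""
    return [frozenset((i + j) % k for j in range(t)) for i in range(k)]


def max_intersections(edges):
    """Largest number of other edges that any single edge intersects."""
    return max(
        sum(1 for j, f in enumerate(edges) if j != i and e & f)
        for i, e in enumerate(edges)
    )


def lll_bound(d, t, v):
    # Assumed form of the symmetric local lemma bound (without ceiling).
    q = v ** t
    return (math.log(d + 1) + t * math.log(v) + 1) / math.log(q / (q - 1))


def density_bound(m, t, v):
    # Assumed form of the VarDens guarantee, with m the number of edges.
    q = v ** t
    return (math.log(m) + t * math.log(v)) / math.log(q / (q - 1))


t, v = 3, 2
for k in (10, 100, 1000):
    edges = cyclic_consecutive_edges(k, t)
    d = max_intersections(edges)
    print(k, d, round(lll_bound(d, t, v), 1),
          round(density_bound(len(edges), t, v), 1))
```

The computed `d` stays at 2(t − 1) for every `k`, so the local lemma column is constant while the density column keeps increasing with `k`.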
A triangulation hypergraph of the sphere, is a rank-3 hypergraph which corresponds to a planar graph all of whose faces are triangles; the rank-3 hyperedges are precisely the faces of the planar embedding. From colourings and homomorphisms we know that [vca]. If has vertices then it has edges (triangles) and a triangle may intersect up to other triangles where is the maximum degree of the planar graph. Thus, and
while VarDens gives
Thus, only performs better than when the maximum degree of the triangulation grows more slowly than the number of vertices. In addition, unless is bounded as grows, grows with , even though . Hence, triangulation hypergraphs represent a case where, unlike cyclic consecutive hypergraphs, our application of the local lemma is unable to capture the behaviour of the homomorphism . The bound also grows with and, in contrast to the previous class of hypergraphs, experiments running VarDens on random triangulation hypergraphs suggest that the size of the arrays produced indeed increases as some function of , see Table 2. For more details about generating the random triangulation hypergraphs see [raaphorstphd].
4 Using the general and asymmetric local lemma
In Theorem 1.3, we use the symmetric version of the local lemma to establish the existence of covering arrays. In Section 2, we apply this theorem to hypergraphs that are highly symmetric: the probability that any set of columns represented by a hyperedge be covered is precisely the same as for any other hyperedge, because all the hyperedges have the same size. Additionally, for - designs the size of the sets of dependent events also does not vary; even for - designs the sizes of sets of dependent events can only vary in a fixed range. In Section 3, the hyperedges were again of a fixed size. If the hyperedges themselves vary in size or the size of sets of dependent events significantly vary, using the symmetric version of the local lemma requires taking the worst probability of a set of columns being uncovered and requires taking the largest set of dependent events. In this section, we explore the benefit of using versions of the local lemma that can adapt to varying sizes of hyperedges and sets of dependent events.
The most general statement of the local lemma was given by Lovász in 1975.
Theorem 4.1 (Lovász local lemma - general case [lovasz75]).
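Let $\mathcal{A} = \{A_1, \ldots, A_n\}$ be a finite set of events in a probability space $\Omega$. Define a function $\Gamma : \mathcal{A} \to 2^{\mathcal{A}}$ such that for $A \in \mathcal{A}$, $A$ is independent from all events in $\mathcal{A} \setminus (\{A\} \cup \Gamma(A))$. If there is a map $x : \mathcal{A} \to [0, 1)$ such that for all $A \in \mathcal{A}$:
$$P(A) \le x(A) \prod_{B \in \Gamma(A)} (1 - x(B)),$$
then the probability that none of the events occur is nonzero, and is at least:
$$\prod_{A \in \mathcal{A}} (1 - x(A)).$$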
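Verifying a candidate function against the general local lemma is straightforward once the dependency neighbourhoods are listed. The sketch below assumes the usual form of the condition, $P(A) \le x(A)\prod_{B \in \Gamma(A)}(1 - x(B))$, with the product $\prod_A (1 - x(A))$ as the resulting lower bound; the dictionaries, names and toy dependency graph are hypothetical.

```python
from math import prod


def general_lll_holds(prob, nbrs, x):
    """Check P(A) <= x(A) * prod(1 - x(B) for B in nbrs[A]) for every event A.

    prob: dict event -> upper bound on its probability
    nbrs: dict event -> list of events it may depend on
    x:    dict event -> value in [0, 1)
    """
    return all(
        prob[a] <= x[a] * prod(1 - x[b] for b in nbrs[a])
        for a in prob
    )


def avoidance_lower_bound(x):
    """Lower bound on the probability that no event occurs."""
    return prod(1 - xa for xa in x.values())


# Toy dependency graph: a path A - B - C, each event with probability 0.1.
prob = {"A": 0.1, "B": 0.1, "C": 0.1}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

good_x = {e: 0.2 for e in prob}
bad_x = {e: 0.1 for e in prob}
print(general_lll_holds(prob, nbrs, good_x), avoidance_lower_bound(good_x))
print(general_lll_holds(prob, nbrs, bad_x))
```

Checking a given function is easy; the difficulty lies in finding one, since each choice of `x` trades a larger left factor against smaller factors in the neighbours' products.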
The asymmetric local lemma was given by Habib in 1998.
Theorem 4.2 (Lovász local lemma - asymmetric case [habib1998]).
Let be a finite set of events in a probability space . Define a function such that for , is independent from all events in . If, for each , we have that both:
then the probability that none of the events occur is positive.
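The appeal of an asymmetric form is that its conditions involve only sums of probabilities, with no auxiliary function to find. The sketch below assumes one commonly stated version of the conditions, namely $P(A) < 1/2$ and $\sum_{B \in \Gamma(A)} P(B) \le 1/4$ for every event $A$; this exact form is an assumption here. It also checks that, under these conditions, the choice $x(A) = 2P(A)$ satisfies the general condition $P(A) \le x(A)\prod_{B \in \Gamma(A)}(1 - x(B))$. All names and the toy dependency graph are hypothetical.

```python
from math import prod


def asymmetric_lll_holds(prob, nbrs):
    """Assumed asymmetric conditions: P(A) < 1/2 and the neighbour
    probabilities of A sum to at most 1/4, for every event A."""
    return all(
        prob[a] < 0.5 and sum(prob[b] for b in nbrs[a]) <= 0.25
        for a in prob
    )


def general_condition_with_doubling(prob, nbrs):
    """Check the general condition for the specific choice x(A) = 2 * P(A)."""
    x = {a: 2 * p for a, p in prob.items()}
    return all(
        prob[a] <= x[a] * prod(1 - x[b] for b in nbrs[a])
        for a in prob
    )


# Toy dependency graph: a star with centre "c" and four leaves.
leaves = ["l0", "l1", "l2", "l3"]
nbrs = {"c": leaves, **{leaf: ["c"] for leaf in leaves}}

ok = {e: 0.05 for e in nbrs}       # neighbour sums are at most 0.2
too_big = {e: 0.07 for e in nbrs}  # centre's neighbour sum is 0.28

print(asymmetric_lll_holds(ok, nbrs), general_condition_with_doubling(ok, nbrs))
print(asymmetric_lll_holds(too_big, nbrs))
```

The doubling check reflects the standard argument: if the neighbour probabilities sum to at most 1/4, then $\prod (1 - 2P(B)) \ge 1 - 2\sum P(B) \ge 1/2$, which is exactly what $x(A) = 2P(A)$ needs.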
For any particular hypergraph, using these more general versions of the local lemma can require establishing complicated bounds on the probabilities and size of sets of dependent events which are different when the hypergraph is changed. We choose to focus on one specific hypergraph to highlight the challenges and benefits of this approach. Let be the hypergraph with vertex set and edge set containing the four hyperedges of rank 3 from and all possible edges of rank 2 that are not contained within any hyperedge of rank 3. Cohen et al. [cohen2003vsi] previously studied this hypergraph and other similar ones and used simulated annealing to construct variable strength covering arrays over them.
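With $N$ rows, the probability of missing coverage on any pair of columns is at most
$$v^2\left(1 - \frac{1}{v^2}\right)^N.$$
The probability of missing coverage on one of the four sets of three columns is at most
$$v^3\left(1 - \frac{1}{v^3}\right)^N.$$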
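To get a feel for these probabilities, the sketch below compares the simple union bound $v^s(1 - v^{-s})^N$ on the probability that a set of $s$ columns misses some tuple with the exact value obtained by inclusion-exclusion over the missed tuples. Whether the analysis here uses the union bound or the exact value is an assumption; the function names and sample values are illustrative.

```python
from math import comb


def miss_union_bound(N, v, s):
    """Union bound: each of the v**s tuples is absent from N uniform rows
    with probability (1 - 1/v**s)**N."""
    q = v ** s
    return q * (1 - 1 / q) ** N


def miss_exact(N, v, s):
    """Exact probability that N uniform random rows on s columns
    fail to contain at least one of the v**s tuples."""
    q = v ** s
    # Probability that all q tuples appear, by inclusion-exclusion on the
    # set of tuples that are avoided.
    covered = sum((-1) ** j * comb(q, j) * (1 - j / q) ** N
                  for j in range(q + 1))
    return 1 - covered


v = 2
for N in (4, 10, 20, 40):
    print(N,
          round(miss_exact(N, v, 2), 6), round(miss_union_bound(N, v, 2), 6),
          round(miss_exact(N, v, 3), 6), round(miss_union_bound(N, v, 3), 6))
```

The union bound exceeds 1 for small `N` and so is vacuous there, but it approaches the exact value as `N` grows, which is the regime where the local lemma conditions start to hold.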
To determine the sets of dependent events we classify the hyperedges into three kinds. is the set of pairs from , is the set of pairs from and is the set of triples from . The number of events of each type that are dependent on other types is summarized in Table 3. The entry in row and column is the number of events that are dependent on an event.
We start by applying the symmetric local lemma for this hypergraph. For , ; for , ; and for , . Thus all bad events are avoided when:
These values of are given in Table 4 in column for .
To use the general form of the local lemma, as in Theorem 4.1, we need to find a function such that, for each :
If such a function exists, then the local lemma guarantees that the probability that all bad events can be avoided is nonzero. We look for a function that is fixed on each . Equation (4.1) gives the following system of inequalities:
No closed form is apparent so we solved this system numerically using OpenOpt [openopt]. We provide the results in column of Table 4. For details of the solution technique see [raaphorstphd].
Since the solutions for the system of equations from the general local lemma can be quite difficult to produce [habib1998], we also consider the asymmetric local lemma, as in Theorem 4.2. This gives the following system of inequalities:
which can be solved more directly. Details can be found in [raaphorstphd]. The results are given in column of Table 4.
Table 4 gives the values of obtained for using each of the three versions of the local lemma. Column gives the percentage improvement of the general local lemma with respect to the symmetric local lemma. Column gives the percentage improvement of the asymmetric local lemma with respect to the symmetric local lemma. The results obtained from the general local lemma show significant improvement over those from the symmetric local lemma, with an average improvement of 24.71% and a median improvement of 23.65%. We note, however, that considerable work went into finding valid functions satisfying the conditions in Theorem 4.1. The process we used would be hard to automate and was labour-intensive, requiring significant manual experimentation and interaction. The asymmetric local lemma gives results that are slightly worse than those given by the general local lemma. On average, they are within 2.76% (median: 2.66%) of those given by the general local lemma. While this difference is small, the improvement that the asymmetric local lemma gives over the symmetric local lemma is considerably larger: on average, 22.63% (median: 21.29%). We believe that when the hypergraph for a variable strength covering array lacks uniformity with respect to the size of edges or the sizes of their neighbourhoods, there is benefit in using the general and asymmetric forms of the local lemma.
As a final comparison, we compare the sizes of covering arrays given by the local lemma with the guaranteed bound of VarDens and also with the results of running VarDens. The results are shown in Figure 1. The best local lemma bounds (from the general local lemma) are better than the theoretical guarantee of VarDens, but the actual sizes of the arrays produced by running VarDens are even better.
We used variants of the Lovász local lemma to find upper bounds on the sizes of variable strength covering arrays and compared them with the arrays constructed using VarDens, a derandomized greedy construction, and with its guaranteed upper bound. Our main result, Theorem 1.3, is a general upper bound on the size of a variable strength covering array in terms of the parameters of the associated hypergraph, obtained via the symmetric local lemma. When nothing is known about the hypergraph, this bound and the VarDens bound are very similar.
The bounds obtained from the symmetric local lemma work best when the hypergraph’s edges are of a fixed size and the number of edges intersecting an edge is invariant. For example, for $t$-designs the local lemma bound is better than the VarDens bound by a constant that depends on . A more extreme example is the family of cyclic consecutive hypergraphs, for which the local lemma bound, unlike the VarDens bound, remains constant as the number of vertices grows. We suggest that when edges are of varying sizes, the general and asymmetric versions of the local lemma may work best, and we illustrate this with an example. We note that in some instances the VarDens bound is much worse than actual runs of VarDens on given inputs, and in several cases these runs outperform the local lemma bound.
One direction for future research is to examine how many of the recent improvements in the application of both the local lemma and density-based greedy algorithms for standard covering arrays [MR3328867, francetic2016, sarkar_upper_2017, colbourn_two-stage_nodate_alt, colbourn_asymptotic_nodate] can be extended to variable strength covering arrays. Another direction is to continue the exploration of the general and asymmetric local lemmas and their utility for variable strength covering arrays. Perhaps some families of hypergraphs are more amenable to the general case, and closed-form solutions for the required function exist. The weighted local lemma is also deserving of attention [habib1998].
Lucia Moura and Brett Stevens were supported by NSERC Discovery grants. The authors would like to thank an anonymous reviewer for various suggestions that improved the presentation of this paper.