Learning-Graph-Based Quantum Algorithm for $k$-distinctness
We present a quantum algorithm solving the $k$-distinctness problem in $O\bigl(n^{1-2^{k-2}/(2^k-1)}\bigr)$ queries with bounded error. This improves on the previous $O(n^{k/(k+1)})$-query algorithm by Ambainis. The construction uses a modified learning graph approach. Compared to the recent paper by Belovs and Lee, the algorithm doesn’t require any prior information on the input, and the complexity analysis is much simpler.
Additionally, we introduce an $O(\sqrt{n}\,\alpha^{1/6})$ algorithm for the graph collision problem, where $\alpha$ is the independence number of the graph.
The element distinctness problem consists of computing the function $f\colon [m]^n \to \{0,1\}$ that evaluates to 1 iff there is a pair of equal elements in the input, i.e., $f(x_1,\dots,x_n)=1$ iff $\exists i\ne j\colon x_i=x_j$. (Here we use the notation $[n]=\{1,2,\dots,n\}$.) The quantum query complexity of the element distinctness problem is well understood. It is known to be $\Theta(n^{2/3})$, with the algorithm given by Ambainis, and the lower bound shown by Aaronson and Shi and by Kutin for the case of large alphabet size, and by Ambainis in the general case.
Ambainis’ algorithm for the element distinctness problem was the first application of the quantum random walk framework to a “natural” problem (i.e., one seemingly having little relation to random walks), and it significantly changed the way quantum algorithms have been developed since then. The core of the algorithm is a quantum walk on the Johnson graph. This primitive has been reused in many other algorithms: triangle detection in a graph given by its adjacency matrix, matrix product verification, restricted range associativity, and others. Given that the behavior of quantum walks is well understood for arbitrary graphs, it is somewhat surprising that the applications have been mostly limited to the Johnson graph.
The $k$-distinctness problem is a direct generalization of the element distinctness problem. Given the same input, the function evaluates to 1 iff there is a set of $k$ input elements that are all equal, i.e., a set of indices $a_1,\dots,a_k\in[n]$ with $a_i\ne a_j$ and $x_{a_i}=x_{a_j}$ for all $i\ne j$.
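As a purely classical reference point (not the quantum algorithm), the function being computed can be sketched in a few lines. The name `k_distinctness` and the representation of the input as a Python sequence are illustrative assumptions; setting `k=2` gives element distinctness.

```python
from collections import Counter


def k_distinctness(x, k):
    """Return 1 iff some value occurs at least k times in the input x."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts = Counter(x)
    # k equal elements exist iff some value appears at least k times.
    return int(any(c >= k for c in counts.values()))
```

This reads all $n$ input elements, which is exactly the cost the query algorithms discussed here aim to beat.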
The situation with the quantum query complexity of the $k$-distinctness problem is not so clear. (In this paper we assume $k=O(1)$, and consider the complexity of $k$-distinctness as $n\to\infty$.) As element distinctness reduces to $k$-distinctness by repeating each element $k-1$ times, the lower bound of $\Omega(n^{2/3})$ carries over to the $k$-distinctness problem (this argument is attributed to Aaronson in Ref. ). This simple lower bound is the best known so far.
In the same paper that contains the element distinctness algorithm, Ambainis applied quantum walk on the Johnson graph in order to solve the $k$-distinctness problem. This resulted in a quantum algorithm with query complexity $O(n^{k/(k+1)})$. This was the best known algorithm for this problem prior to this paper.
The aforementioned algorithms work by searching for a small subset of input variables such that the value of the function is completely determined by the values within the subset. For instance, the values of two input variables are sufficient to claim the value of the element distinctness function is 1, provided their values are equal. This is formalized by the notion of certificate complexity as follows.
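An assignment for a function $f\colon \mathcal{D}\to\{0,1\}$ with $\mathcal{D}\subseteq[m]^n$ is a function $\alpha\colon S\to[m]$ with $S\subseteq[n]$. The size of $\alpha$ is $|S|$. An input $x\in[m]^n$ satisfies assignment $\alpha$ if $\alpha(i)=x_i$ for all $i\in S$. An assignment $\alpha$ is called a $b$-certificate for $f$, with $b\in\{0,1\}$, if $f(x)=b$ for any $x\in\mathcal{D}$ satisfying $\alpha$. The certificate complexity $C_x(f)$ of $f$ on $x$ is defined as the minimal size of a certificate for $f$ that $x$ satisfies. The $b$-certificate complexity $C^{(b)}(f)$ is defined as $\max_{x\in f^{-1}(b)} C_x(f)$. Thus, for instance, the 1-certificate complexity of element distinctness is 2, and the 1-certificate complexity of triangle detection is 3.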
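The definitions above can be checked by brute force on a toy instance. The following is a minimal sketch, assuming assignments are represented as Python dicts from indices to values and the domain is small enough to enumerate; all function names are hypothetical.

```python
from itertools import combinations, product


def satisfies(x, alpha):
    """x satisfies alpha (a dict index -> value) if it matches on every index."""
    return all(x[i] == v for i, v in alpha.items())


def is_certificate(f, domain, alpha, b):
    """alpha is a b-certificate if f equals b on every input satisfying alpha."""
    return all(f(y) == b for y in domain if satisfies(y, alpha))


def certificate_complexity(f, domain, x):
    """Smallest size of an f(x)-certificate that x satisfies (brute force)."""
    b = f(x)
    for size in range(len(x) + 1):
        for subset in combinations(range(len(x)), size):
            alpha = {i: x[i] for i in subset}
            if is_certificate(f, domain, alpha, b):
                return size
    raise AssertionError("the full assignment is always a certificate")


def element_distinctness(x):
    return int(len(set(x)) < len(x))


# Toy instance: n = 3 variables over the alphabet {0, 1, 2}.
DOMAIN = list(product(range(3), repeat=3))
```

On this toy domain, every positive input of element distinctness has a certificate of size 2 (a pair of equal elements), while the negative input `(0, 1, 2)` needs all three positions fixed.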
Soon after Ambainis’ paper, it was realized that the algorithm developed for $k$-distinctness can be used to evaluate, in the same number of queries, any function with 1-certificate complexity equal to $k$. Now we know that for some functions this algorithm is tight, due to the lower bound for the $k$-sum problem. The goal of the $k$-sum problem is to detect, given $n$ elements of an Abelian group as input, whether there are $k$ of them that sum up to a prescribed element of the group. The $k$-sum problem is notable in the sense that, given any $(k-1)$-tuple of input elements, one has absolutely no information on whether they form a part of an (inclusion-wise minimal) 1-certificate, or not.
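For concreteness, here is a brute-force classical sketch of the $k$-sum predicate. It assumes the Abelian group is the integers modulo `modulus`, which is only one possible choice; the function name and parameters are illustrative.

```python
from itertools import combinations


def k_sum(x, k, target, modulus):
    """Return 1 iff some k distinct positions of x sum to target in Z_modulus."""
    return int(any(
        sum(x[i] for i in idx) % modulus == target % modulus
        for idx in combinations(range(len(x)), k)
    ))
```

Unlike $k$-distinctness, looking at the values of a few elements here does not rule them out of a 1-certificate, since whether they can be completed depends entirely on the remaining elements.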
The aforementioned applications of the quantum walk on the Johnson graph (triangle finding, etc.) went beyond the $O(n^{k/(k+1)})$ upper bound by utilizing additional relations between the input variables: the adjacency relation of the edges for the triangle problem, row-column relations for the matrix products, and so on. For instance, two edges in a graph can’t be part of a 1-certificate for the triangle problem if they are not adjacent.
The $k$-distinctness problem is different in the sense that it doesn’t possess any structure on the variables. But it does possess a relation between the values of the variables: two elements can’t be a part of a 1-certificate if their values are different. However, it seems that quantum walk on the Johnson graph fails to utilize this structure efficiently.
In this paper, we use the learning graph approach to construct a quantum algorithm that solves the $k$-distinctness problem in $O\bigl(n^{1-2^{k-2}/(2^k-1)}\bigr)$ queries. Note that $O\bigl(n^{1-2^{k-2}/(2^k-1)}\bigr)=o(n^{3/4})$. Thus, our algorithm solves $k$-distinctness, for arbitrary $k$, in asymptotically fewer queries than the best previously known algorithm solves 3-distinctness.
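The comparison of exponents can be checked with exact arithmetic. This sketch assumes the new exponent is $1-2^{k-2}/(2^k-1)$ and the previous one is $k/(k+1)$; the function names are illustrative.

```python
from fractions import Fraction


def new_exponent(k):
    """Exponent 1 - 2^(k-2) / (2^k - 1), assumed from the stated bound (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return 1 - Fraction(2 ** (k - 2), 2 ** k - 1)


def previous_exponent(k):
    """Exponent k / (k + 1) of the earlier quantum-walk algorithm."""
    return Fraction(k, k + 1)
```

Since $2^{k-2}/(2^k-1) > 1/4$ for every $k$, the new exponent stays strictly below $3/4$, which is the previous exponent for $k=3$. For $k=2$ both exponents equal $2/3$.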
The learning graph is a novel way of constructing quantum query algorithms. In a sense, it may be thought of as a way of designing a more flexible quantum walk than just one on the Johnson graph. And compared to the quantum walk design paradigms from Ref. , it is easier to deal with. In particular, it doesn’t require any spectral analysis of the underlying graph.
To date, the applications of learning graphs are as follows. Belovs introduced the framework and used it to improve the query complexity of triangle detection. Zhu and Lee, Magniez and Santha extended this algorithm to the containment of arbitrary subgraphs. Belovs and Lee developed an algorithm for the $k$-distinctness problem that beats the $O(n^{k/(k+1)})$-query algorithm given some prior information about the input. Belovs and Reichardt use a construction resembling a learning graph to obtain an optimal algorithm for finding paths and claws of arbitrary length in the input graph. They also deal with time-efficient implementation of learning graphs.
The paper is organized as follows. In Section 2 we define the (dual of the) adversary bound. It is the main technical tool underlying our algorithm. Also, we describe learning graphs and the previous algorithm for the $k$-distinctness problem. In Section 3, we describe the intuition behind our algorithm, and describe the changes we have made to the model of the learning graph. In Section 4 we give an algorithm for the graph collision problem as a preparation for the $k$-distinctness algorithm that we describe in Sections 5 and 6. Strictly speaking, Sections 2.2 to 4 are not necessary for understanding the $k$-distinctness algorithm: the proofs in Sections 5 and 6 rely on Theorem ? only. However, these sections are necessary for understanding the intuition behind the algorithm.
In this paper, we are mainly concerned with query complexity of quantum algorithms, i.e., we measure the complexity by the number of queries to the input the algorithm makes in the worst case. For the definition of query complexity and its basic properties, a good reference is .
In Section 2.1 we describe a tight characterization of the query complexity by a relatively simple semi-definite program (SDP): the adversary bound, Eq. ( ?). This is the main technical tool underlying our algorithm.
Although Eq. ( ?) is an SDP, and thus can be solved in polynomial time in the size of the program, the latter is exponential in the number of variables, and becomes very hard to solve exactly as its size grows. The learning graph is a tool for designing feasible solutions to Eq. ( ?), whose complexity is easier to analyze. We define it in Sections 2.2 and 2.3. In the first one, we describe the model following Ref. . In the second one, we describe a common way of constructing learning graphs for specific problems, and give an example of a learning graph for the $k$-distinctness problem corresponding to Ambainis’ algorithm.
2.1 Dual adversary bound
The adversary bound, originally introduced by Ambainis, is one of the most important lower bound techniques for quantum query complexity. A strengthening of the adversary bound, known as the general adversary bound, has recently been shown to characterize quantum query complexity, up to constant factors.
The (general) adversary bound is a semi-definite program, and admits two equivalent formulations: the primal, used to prove lower bounds; and the dual, used in algorithm construction. We use the latter.
The general adversary bound characterizes quantum query complexity. Let $Q(f)$ denote the query complexity of the best quantum algorithm evaluating $f$ with a bounded error.
2.2 Learning graphs: Model-driven description
In this section we briefly introduce the simplest model of learning graph following Ref. .
Note that it is allowed to have several vertices (or none) labeled by the same subset $S\subseteq[n]$. If there is a unique vertex of the learning graph labeled by $S$, we usually use $S$ to denote it. Otherwise, we denote the vertex by $(S,a)$, where $a$ is some additional parameter used to distinguish vertices labeled by the same subset $S$.
A learning graph can be thought of as a way of modeling the development of one’s knowledge about the input during a query algorithm. Initially, nothing is known, and this is represented by the root labeled by . At a vertex labeled by , the values of the variables in have been learned. Following an arc connecting vertices labeled by to can be interpreted as querying the value of variable . We say the arc loads element . When talking about a vertex labeled by , we call the set of loaded elements.
The graph itself has a very loose connection to the function being calculated. The following notion is the essence of the construction.
We always assume a learning graph is equipped with a function and a flow that satisfy the constraints of Definition ?. Define the negative complexity of and the positive complexity for input as
respectively, where is the set of arcs of . The positive complexity and the (total) complexity of are defined as
We reduce to Theorem ?. For each arc from to , we define a block-diagonal matrix , where the sum is over all assignments on . Each is defined as where, for each :
Finally, we define in ( ?) as where the sum is over all arcs loading .
Condition ( ?) is trivial, and the expression for the objective value ( ?) is straightforward to check. The feasibility ( ?) is as follows. Fix any and . By construction, , if where is the origin of ; otherwise, it is zero. Thus, only arcs from to , such that and , contribute to the sum in ( ?). These arcs define a cut between the source and all the sinks of the flow , hence, the total value of the flow on these arcs is 1, as required.
2.3 Learning graphs: Procedure-driven description
In this section, we describe a way of designing learning graphs that was used in Ref.  and other papers. The learning graph, introduced in Section 2.2, may be considered as a randomized procedure for loading values of the variables with the goal of convincing someone the value of the function is 1. For each input , the designer of the learning graph builds its own procedure. The goal is to load a 1-certificate for . Usually, for each positive input, one specific 1-certificate is chosen. The elements inside the certificate are called marked. The procedure is not allowed to err, i.e., it always has to load all the marked elements in the end. The value of the complexity of the learning graph arises from the interplay between the procedures for different inputs.
We illustrate these concepts with an example of a learning graph corresponding to the $k$-distinctness algorithm by Ambainis. Fix a positive input $x$, i.e., one evaluating to 1. Let $M=\{a_1,\dots,a_k\}$ be such that $x_{a_1}=\dots=x_{a_k}$. It is a 1-certificate for $x$. The elements inside $M$ are marked. One possible way of loading the marked elements consists of $k+1$ stages and is given in Table 1. The internal randomness of the procedure is concealed in the choice of the $r$ elements on stage I. (Here $r$ is some parameter to be specified later.) Each choice has probability $q=\binom{n-k}{r}^{-1}$.
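| I. | Load $r$ elements different from $a_1,\dots,a_k$. |
| II.1 | Load $a_1$. |
| II.2 | Load $a_2$. |
| ⋮ | ⋮ |
| II.$k$ | Load $a_k$. |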
Let us describe how a graph and flow is constructed from the description in Table 1. At first, we define the key vertices of . If is the number of stages, the key vertices are , where and consists of all possible sets of variables loaded after stages.
For a fixed input and fixed internal randomness, the sets and of variables loaded before and after stage , respectively, are uniquely defined. In this case, we connect and by a transition .
in . Here, additional labels in the internal vertices assure that the paths corresponding to the transitions do not intersect, except at the ends. We say transition and all arcs therein belong to stage .
In the case like in the previous paragraph, we say the transition is taken for this choice of and the randomness. We say a transition is used for input , if it is taken for some choice of the internal randomness. The set of transitions of is the union of all transitions used for all inputs in . For instance, stage II.2 of the learning graph from Table 1 consists of all transitions from to where and . For an example refer to Figure 1.
The flow is defined as the probability, over the internal randomness, that transition is taken for input . All arcs forming the transition are assigned the same flow. Thus, the transition is used by iff . In the learning graph from Table 1, attains two values only: 0 and .
So far, we have constructed the graph and the flow . It remains to define the weights . This is done using Theorem ? below. But, for that, we need some additional notions.
The length of stage is the number of variables loaded on this stage, i.e., for a transition from to of stage . In our applications in this paper this number is independent of the choice of . We say the flow is symmetric on stage if the non-zero value of is the same for all on stage and all .
If the flow is symmetric on stage , we define the speciality of stage as the ratio of the total number of transitions on stage , to the number of ones used by . In a symmetric flow, this quantity doesn’t depend on .
Finally, we define the (total) complexity of stage , , similarly as is defined in (Equation 1) and (Equation 2) with the summation over , the set of all arcs on stage , instead of . It is easy to see that is at most .
Let be the non-zero value of the flow on stage . Assign weight to all arcs on stage .
Now we are able to calculate the complexity of the learning graph in Table 1. The length of stage I is $r$, and the length of stage II.$i$ is 1 for all $i$. It is also not hard to see that the corresponding specialities are $O(1)$ and $O(n^i/r^{i-1})$. For example, a transition from $S$ to $S\cup\{j\}$ on stage II.$i$ is used by input $x$ iff $a_1,\dots,a_{i-1}\in S$ and $j=a_i$. For a random choice of $S$ and $j$, the probability of $j=a_i$ is $1/n$, and the probability of $a_1,\dots,a_{i-1}\in S$, given $j=a_i$, is $\Omega(r^{i-1}/n^{i-1})$. Thus, the total probability is $\Omega(r^{i-1}/n^i)$ and the speciality is the inverse of that.
Thus, the complexity of the algorithm, by Theorems ? and ?, is $O\bigl(r+\sqrt{n^k/r^{k-1}}\bigr)$. It is optimized when $r=n^{k/(k+1)}$, and the complexity is $O(n^{k/(k+1)})$.
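The balancing of the two stages can be illustrated numerically. This is a sketch under stated assumptions: the cost is taken to be $r+\sqrt{n^k/r^{k-1}}$ with all constant factors dropped (stage I contributing $r$, the last stage contributing the square root of its speciality), and the function names are hypothetical.

```python
import math


def learning_graph_cost(n, k, r):
    """Assumed cost r + sqrt(n^k / r^(k-1)), with constant factors dropped."""
    return r + math.sqrt(n ** k / r ** (k - 1))


def balanced_r(n, k):
    """Value of r at which the two terms of the assumed cost are equal."""
    return n ** (k / (k + 1))
```

At `r = balanced_r(n, k)` both terms equal $n^{k/(k+1)}$, so the cost is twice that value. Moving `r` a factor of 10 in either direction makes one of the terms dominate and increases the total, for example for `n = 10**6` and `k = 3`.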
3 Outline of the algorithm
In this section we describe how the learning graph from Table 1 is transformed into a new learning graph with a better complexity. Many times when learning graphs were applied to new problems, they were modified accordingly . This paper is not an exception, thus, we also describe the modifications we make to the model of a learning graph.
The main point of the learning graph in Table 1 and similar ones is to reduce the speciality of the last step, loading $a_k$. In the learning graph from Table 1, this is achieved by loading $r$ non-marked elements before loading the certificate. This way, the speciality of the last step gets reduced from $O(n^k)$ to $O(n^k/r^{k-1})$. We say that $a_1,\dots,a_{k-1}$ are hidden among the $r$ elements loaded on stage I. The larger the set we hide the elements into, the better.
Unfortunately, we can’t make $r$ as large as we like, because loading the non-marked elements also counts towards the complexity. At the equilibrium point $r=n^{k/(k+1)}$, we attain the optimal complexity of the learning graph.
In Ref.  a learning graph was constructed with better complexity. It uses a more general version of the learning graph than in Section 2.2, with weights of the arcs dependent on the values of the elements loaded so far. Its main idea is to hide $a_1,\dots,a_{k-1}$ as one entity, not as $k-1$ independent elements. By gradually distilling vertices of the learning graph having a large number of $(k-1)$-tuples of equal elements, the learning graph manages to reduce the speciality of the last step without increasing the number of elements loaded, because $a_1,\dots,a_{k-1}$ gets hidden among a relatively large number of $(k-1)$-tuples of equal elements.
But this learning graph has serious drawbacks. Due to dealing with the values of the variables in the distilling phase, the flow through the learning graph ceases to be symmetric and depends heavily on the input. This makes the analysis of the learning graph quite complicated. What is even worse, the learning graph requires strong prior knowledge on the structure of the input to attain reasonable complexity.
In this paper we construct a learning graph that combines the best features of both learning graphs. Its complexity is the same as in Ref. . Also, its flow is symmetric and almost independent of the input, like the one in Table 1. This has three advantages compared to the learning graph in Ref. : its complexity is easier to analyze, it doesn’t require any prior information on the input, and it is more suitable for a time-efficient implementation along the lines of Ref. . This is achieved at the cost of a more involved construction.
Let us outline the modifications the learning graph from Table 1 undergoes in order to reduce the complexity. Again, we assume is a positive input, and is such that .
We achieve a symmetric flow with smaller speciality of the last step by finding a way to load more non-marked elements in the first stages of the learning graph. There is an indication that it is possible in some cases: the values of Boolean variables can be learned in less than queries, if there is a bias between the number of ones and zeros . More precisely, if the number of ones is , the values can be loaded in queries.
We start with dividing the set of loaded elements into subsets: , where denotes disjoint union. Set has size . We use to hide when loading . This step doesn’t reduce the speciality, but this division will be necessary further.
Consider the situation before loading . If an element is such that for all , this element cannot be a part of the certificate (i.e., it can’t be ), and its precise value is irrelevant. (This is the place where we utilize the relations between the values of the variables as mentioned in the introduction.) In this case, we say doesn’t have a match in , and represent it by a special symbol . Otherwise, we uncover the element, i.e., load its precise value. Similarly, when loading with , we uncover those elements only that have a match among the uncovered elements of .
Usually, the number of elements in having a match in is much smaller than the total number of elements in . Similarly to Point ?, we can reduce the complexity of loading elements in because of this bias. Thus, we have , while the complexity of loading remains . Now we have more elements to hide in between, hence, the speciality of loading gets reduced.
When loading , we do want to be in for , because that is where we hide them. On the other hand, in order to keep the speciality of loading non-marked elements in equal to , we would like to add to only after all elements in have been already loaded. Thus, we load between these two stages and put them in . This is summarized in Table 2.
Since the uncovering of elements in , for , depends on the values contained in with , adding to afterwards is a bit of cheating. This does cause some problems we describe in more detail in Section 5.3. We describe a solution in Section 6.
| I.1 | Load a set $S_1$ of $r_1$ elements not from $\{a_1,\dots,a_k\}$. |
| I.2 | Load a set $S_2$ of $r_2$ elements not from $\{a_1,\dots,a_k\}$, uncovering only those elements that have a match in $S_1$. |
| I.3 | Load a set $S_3$ of $r_3$ elements not from $\{a_1,\dots,a_k\}$, uncovering only those elements that have a match among the uncovered elements of $S_2$. |
| I.$(k-1)$ | Load a set $S_{k-1}$ of $r_{k-1}$ elements not from $\{a_1,\dots,a_k\}$, uncovering only those elements that have a match among the uncovered elements of $S_{k-2}$. |
| II.1 | Load $a_1$ and add it to $S_1$. |
| II.$i$ | Load $a_i$ and add it to $S_i$. |
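The uncovering rule of stages I.2 onward can be sketched classically as follows. This is only an illustration of which values end up visible, not of the query cost. The sentinel `COVERED`, the function name, and the representation of the subsets as lists of indices are assumptions made for this example.

```python
COVERED = object()  # hypothetical stand-in for the special "covered" symbol


def load_with_covering(x, subsets):
    """Load S_1 fully; in each later S_i uncover only elements with a match.

    x is the input as a sequence; subsets is [S_1, S_2, ...], each a list of
    indices. Returns one dict per subset, mapping index -> value or COVERED.
    """
    loaded = []
    uncovered_values = None  # values uncovered in the previous subset
    for s in subsets:
        view = {}
        for j in s:
            if uncovered_values is None or x[j] in uncovered_values:
                view[j] = x[j]       # uncovered: the precise value is learned
            else:
                view[j] = COVERED    # no match: the value stays hidden
        uncovered_values = {v for v in view.values() if v is not COVERED}
        loaded.append(view)
    return loaded
```

For instance, with `x = [5, 7, 5, 9, 7, 5]` and subsets `[[0, 1], [2, 3], [4, 5]]`, the element at index 4 (value 7) stays covered even though 7 occurs in the first subset, because it has no match among the uncovered elements of the second subset.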
In order to account for these changes, we use the following modifications to the learning graph model.
In Section 6, we are forced to drop the flow notion from Definition ?.
We use Theorem ? directly, borrowing some concepts from the proof of Theorem ?, namely, the notion of a vertex and of an arc leaving it. Also, we keep the internal randomness intuition from Section 2.3. The loading procedure still doesn’t err, in a sense formalized in (Equation 13).
We change the way the vertices of the learning graph are represented. Firstly, we keep track of which $S_i$ each loaded element belongs to, as described in Point ?. Also, we assume the condition on uncovering of elements, and use the special symbol as a notation for a covered element, as described in Point ?. Technically, this corresponds to a modification of the definition of an assignment in the proof of Theorem ?.
Instead of having a rank-1 matrix as in the proof of Theorem ?, we define it as a rank-2 matrix. The weight of the arc depends now on the value of the variable being loaded as well, although in a rather restricted form. Thus, we are able to make use of the bias as described in Point ?, and to account for the introduction of in Point ?.
The remaining part of the paper is organized as follows. In Section 4 we give a learning graph for the graph collision problem that uses some ideas from above (Points ? and ?). In Sections 5 and 6 we describe the algorithm for $k$-distinctness. In order to simplify the exposition, we first give a version of the learning graph from Table 2 that illustrates the main idea of the algorithm, but has a flaw. We identify it in Section 5.3 and then describe a work-around in Section 6. The complexity analysis of the second algorithm is analogous to that of the first one, so we do it for the first algorithm.
4 Warm-up: Graph collision
In order to get ready for the $k$-distinctness algorithm, we start with a learning graph for the graph collision problem with an additional promise. It is a learning graph version of the algorithm by Ambainis.
The graph collision problem is one of the ingredients of the triangle finding quantum algorithm by Magniez et al.  and the learning-graph-based quantum algorithm by Belovs . It is also used in the algorithm for boolean matrix multiplication by Jeffery et al. .
The problem is parametrized by a simple graph $G$ on $n$ vertices. The input is formed by $n$ boolean variables: one for each vertex of the graph. The function evaluates to 1 if there exists an edge of $G$ with both endpoints marked by value 1, and to 0 otherwise.
The best known quantum algorithm solving this problem for a general graph uses $O(n^{2/3})$ queries. For specific classes of graphs one can do better. For instance, if $G$ is the complete graph, graph collision is equivalent to the 2-threshold problem, which can be solved in $O(\sqrt{n})$ queries by two applications of the Grover algorithm. The algorithm in this section may be interpreted as an interpolation between this trivial special case and the general case.
Recall that the independence number $\alpha(G)$ of a simple graph $G$ is the maximal cardinality of a subset of vertices of $G$ such that no two of them are connected by an edge.
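Both notions can be made concrete with a small brute-force sketch. It assumes vertices are numbered `0..n-1` and the graph is given as a list of edges; the function names are illustrative, and the independence number routine is exponential in `n`, so it is meant for toy graphs only.

```python
from itertools import combinations


def graph_collision(edges, x):
    """Return 1 iff some edge (u, v) has x[u] == x[v] == 1."""
    return int(any(x[u] == 1 and x[v] == 1 for u, v in edges))


def independence_number(n, edges):
    """Largest set of vertices with no edge inside it (exponential brute force)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return size
    return 0
```

The two functions are linked by a simple observation: if the number of ones in the input exceeds the independence number, the vertices marked 1 cannot form an independent set, so a collision must exist.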
Note that if $G$ is a complete graph, $\alpha(G)=1$, and we get the previously mentioned $O(\sqrt{n})$-algorithm for this trivial case. In the general case, $\alpha(G)\le n$, and the complexity of the algorithm is $O(n^{2/3})$, which coincides with the complexity of the algorithm for a general graph.
Jeffery et al.  build a quantum algorithm solving graph collision on in queries if misses edges to be a complete graph. This algorithm is incomparable to the one in Theorem ?: for some graphs the algorithm from Theorem ? performs better, for some graphs, vice versa.
Let be the graph collision function specified by graph . The first step of the algorithm is quantum counting . We distinguish the case when the number of ones in the input is at most , and when it is at least . In the intermediate case, the counting subroutine is allowed to return any of the outcomes. The complexity of the subroutine is .
If we know, with high probability, that the number of ones is greater than , we may claim that graph collision exists. Otherwise, we may assume the number of ones is at most . In this case, we execute the following learning graph .
The learning graph is essentially the learning graph from Table 1 for 2-distinctness. Let us denote, for simplicity, and . Then, instead of loading and such that , the graph collision learning graph loads and such that and is an edge of . We reduce the complexity of the learning graph by utilizing the bias between the number of zeros and ones induced by the small independence number, as outlined in Points ? and ? of Section 3.
One could prove the correctness of the algorithm completely analogously to the correctness proof of the algorithm from Section 2.3. However, in preparation for discarding the notion of flow later (Point ? from Section 3), we use the language of Section 6. The reader is encouraged to compare both versions of the proof.
Let be a positive input, and let and be such that and is an edge of . Set is a 1-certificate for .
The key vertices of the learning graph are , where and consist of all subsets of of sizes and , respectively, where is some parameter to be specified later.
A vertex in completely specifies the internal randomness. For each , we fix an arbitrary order of its elements: . We say the choice of randomness is consistent with if . For each , there are exactly choices of consistent with . We take each of them with probability .
For a fixed input and fixed randomness consistent with , the elements are loaded (we are going to define what this means later) in the following order:
The non-key vertices of are of the form , where , , and are from (Equation 3). Recall that, as stated in Section 2.2, the first element of the pair is the set of loaded elements, and the second one is an additional mark used to distinguish vertices with the same set of loaded elements.
An arc of the learning graph is a process of loading one variable. We denote it by . Here, is the variable the arc loads, and is a vertex of it originates in. In our case, the arcs are as follows. The arcs of the stage I have and with .
For a fixed and fixed internal randomness consistent with , the arcs taken are
Recall, we say satisfies an arc if the arc is taken for some consistent with . Note also, no arc is taken for two different choices of the randomness.
Like in the proof of Theorem ?, for each arc , we assign a matrix . Then, in ( ?) are given by .
Fix , and let be the set of loaded elements. Recall that an assignment on is a function . An input satisfies assignment iff for each . We say inputs and agree on , if they satisfy the same assignment . Let where the sum is over all assignments on . The matrix is defined as , where, for each ,
Here and are parameters to be specified later (the weights of the arc). They depend only on the stage the arc belongs to. In other words, consists of the blocks of the following form:
Here each of the 16 elements corresponds to a block in with all entries equal to this element. The first and the second columns represent the elements from that satisfy and , and such that their th element equals and , respectively. Similarly, the third and the fourth columns represent elements from that satisfy and such that their th element equals and , respectively. This construction is due to Robin Kothari .
Assume and are inputs such that and . Let be a choice of the internal randomness consistent with . Let be the matrix corresponding to the arc loading that is taken for the input and randomness . I.e., is either the matrix of (Equation 4) with sub-index , or , if there are none, i.e., when . We are going to prove that
(This is what we meant by saying in Point ? of Section 3 that the learning graph doesn’t err for all choices of the internal randomness.) Since there are choices of consistent with , and no arc is taken for two different choices of the randomness, this proves the feasibility condition in ( ?).
Consider the order (Equation 3) in which elements are loaded for this particular choice of and . Before any element is loaded, both inputs agree (they satisfy the same assignment ). After all elements are loaded, and disagree, because it is not possible that and . With each element loaded, the assignments become more specific. This means that there exists an element such that and agree before loading , but disagree afterwards. In particular, . By construction, this contributes to the sum in (Equation 6). All other contribute 0 to the sum. Indeed, if with then , hence, contributes 0. For with , and disagree on , hence, by construction.
Similarly to Section 2.3, let us define the complexity of stage on input as , where with the sum over such that belongs to stage . Also, define the complexity of stage as the maximum complexity over all inputs . Clearly, the objective value ( ?) of the whole program is at most the sum of the complexities of all stages.
Let us start with stages II.1 and II.2.
The total number of arcs on stages II.1 and II.2 are and , respectively. Each of them contributes at most to the complexity of any on stages II.1 and II.2, respectively.
Thus, the complexities of stages II.1 and II.2 on any is . On any , it is at most and , respectively. If we set equal to on stage II.1 and to on stage II.2, the complexities of these stages become and , respectively.
Consider stage I now. Let be the number of variables with value 1 in the input ( or ). The total number of arcs on this stage is . Out of them, exactly load a variable with value 1. Thus, for , the complexity of stage I is
Similarly, for , the complexity of stage I is . If we set and then, since , the complexity of stage I becomes . The total complexity of the learning graph is
5 Algorithm for $k$-distinctness: First attempt
The aim of this and the next sections is to prove the following theorem:
As mentioned in Section 3, we do not rely on previous results like Theorem ? in the proof, and use Theorem ? directly. The construction of the algorithm deviates from the graph representation: a bit in Section 5, and quite strongly in Section 6. However, we keep the term “vertex” for an entity describing some knowledge of the values of the input variables, and the term “arc” for a process of loading a value of a variable (possibly, only partially). Each arc originates in a vertex, but we do not specify where it goes. Inspired by Section 2.3, the vertices are divided into key ones denoted by the set of loaded variables with additional structure. The non-key vertices are denoted by where is the set of loaded variables, and is an additional label used to distinguish vertices with the same , as described in Section 2.2. Also, we use the “internal randomness” term from Section 2.3.
Throughout Sections 5 and 6, let be the $k$-distinctness function. The section is organized as follows. In Section 5.1, we rigorously define the learning graph from Table 2; in Section 5.2, we analyze its complexity; and, finally, in Section 5.3, we describe the flaw mentioned in Point ? of Section 3.
Similarly to the analysis in Ref. , we may assume there is a unique $k$-tuple of equal elements in any positive input.
The construction of the learning graph for -distinctness is similar to the one in Theorem ?. Let be a positive input, and let denote the unique -tuple of equal elements in . The key vertices of the learning graph are , where , for , consists of all -tuples of pairwise disjoint subsets of of the following sizes. For , we require that for , and for .
Again, a vertex completely specifies the internal randomness. We assume that, for any , an arbitrary order of the elements in is fixed so that all elements of precede all elements of for all . (Here .) We say is consistent with if .
For each , there are exactly choices of consistent with . We take each of them, in the sense of Section 2.3, with probability . Here we use notation
For a fixed input and fixed randomness consistent with , the elements are loaded in the following order:
We name the vertices and the arcs of the learning graph using a convention similar to that of Theorem ?. The non-key vertices of are of the form , where , , and are from (Equation 7). Here we use notation . The first element of the pair describes the set of loaded elements.
Let us describe the arcs of , where, again, is the variable the arc loads, and is the vertex it originates in. The arcs of stages I. have and with . The arc belongs to stage I. iff . The arcs of stage II. have , with , and .
For a fixed and fixed internal randomness consistent with , the following arcs are taken:
We say satisfies all these arcs. Note that, for a fixed , no arc is taken for two different choices of .
Again, for each arc , we assign a matrix , so that in ( ?) are given by . Assume is fixed. Let be the set of loaded elements. Define an assignment on as a function , where represents the covered elements of stages I. for . Thus, must satisfy and for . An input satisfies assignment iff, for each ,
Each input satisfies a unique assignment on . Again, we say inputs and agree on if they satisfy the same assignment on .
We define as where the sum is over all assignments on . The definition of depends on whether is on stage I. with , or not. If is not on one of these stages then where, for each ,
Here is a positive real number: the weight of the arc. It only depends on the stage of the arc, and will be specified later. Thus, consists of the blocks of the following form:
Here and represent inputs mapping to 1 and 0, respectively, all satisfying some assignment . The inputs represented by have to satisfy the arc as well.
If is on stage I. with , the elements having a match in and the ones that don’t must be treated differently. In this case, , where
Here and are again parameters to be specified later. In other words, consists of the blocks of the following form:
Here and are as in (Equation 9). This is a generalization of the construction from Theorem ?. Note that if and are both represented by in the assignments on they satisfy then .
Let us estimate the complexity of the learning graph. We use the notion of the complexity of a stage from Section 4.2.
Let us start with stage I.1. We set for all arcs on this stage. There are arcs on this stage, and, by (Equation 9), each of them contributes at most to the complexity of each . Hence, the complexity of stage I.1 is .
Now consider stage II. for .
respectively. By setting , we get complexity of stage II.. The maximal complexity is attained for stage II..
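The choice of weight follows a standard balancing pattern. As a hedged illustration, assume the two bounds for a stage have the shape $wA$ and $B/w$, so that one grows linearly in the weight $w$ and the other decays as $1/w$. Here $A$ and $B$ are placeholder quantities introduced only for this sketch, not values taken from the text.

```latex
\[
  \min_{w > 0} \max\{\, wA,\ B/w \,\} = \sqrt{AB},
  \qquad \text{attained at } w = \sqrt{B/A},
  \quad \text{where } wA = B/w .
\]
```

Under this assumption, the complexity of the stage is the geometric mean of the two unweighted bounds.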
Now let us calculate the complexity of stage I. for . The total number of arcs on this stage is . Consider an input , and a choice of the internal randomness . An element is uncovered on stage I. for this choice of if and only if there is an -tuple of elements with such that and for all . By our assumption on the uniqueness of a -tuple of equal elements in a positive input, the total number of such -tuples is . And, for each of them, there are choices of such that for all . By (Equation 10), the complexities of this stage for an input in and in are, respectively, at most
By assigning and , both these quantities become .
With this choice of the weights, the value of the objective function in ( ?) is
Assuming all terms in (Equation 11) except the last one are equal, and denoting , we get that