
# Measurement-dependent locality beyond i.i.d.

## Abstract

When conducting a Bell test, it is normal to assume that the preparation of the quantum state is independent of the measurements performed on it. Remarkably, the violation of local realism by entangled quantum systems can be certified even if this assumption is partially relaxed. Here, we allow such measurement dependence to correlate multiple runs of the experiment, going beyond previous studies that considered independent and identically distributed (i.i.d.) runs. To do so, we study the polytope that defines block-i.i.d. measurement-dependent local models. We prove that non-i.i.d. models are strictly more powerful than i.i.d. ones, and comment on the relevance of this work for the study of randomness amplification in simple Bell scenarios with suitably optimised inequalities.

## I Introduction

Since their introduction by John Bell in 1964 Bell (1964), Bell inequalities have been studied extensively, as they highlight the fact that quantum theory is incompatible with local realism. Numerous experimental tests of Bell inequalities have been carried out, with results overwhelmingly in favour of the quantum predictions. Of particular note are the recent loophole-free Bell tests Hensen et al. (2015); Shalm et al. (2015); Giustina et al. (2015), which simultaneously addressed several loopholes that had been raised regarding previous experiments. All these tests were conducted under the assumption that the choice of measurements and the state of the source are independent in each run. This observation should not be taken as a reservation: such measurement independence is an essential part of the scientific method, and its negation would rightly be considered conspiratorial. This makes it all the more remarkable that quantum theory can be proved incompatible with local realism even if this assumption is relaxed to some extent.

Indeed, while unrestricted measurement dependence would lead to an unfalsifiable superdeterminism Brans (1988), it was noted by Hall Hall (2011) that the violation of Bell inequalities keeps its meaning if some restrictions are made. This led to the study of measurement-dependent local (MDL) scenarios, where some correlation is allowed between the measurement choices and the source. A few subsequent works refined our understanding of measurement dependence Barrett and Gisin (2011); Koh et al. (2012); Pope and Kay (2013); Yuan et al. (2015), all sticking to known inequalities. A significant breakthrough was achieved when Pütz and coworkers noticed that the traditional Bell inequalities are no longer optimal: other linear constraints, suitably named MDL inequalities, more tightly define the conditions under which local realism holds in the MDL scenario. Their works Pütz et al. (2014); Pütz and Gisin (2016) developed the mathematical framework to study these inequalities. Their most celebrated discovery is the following: there exist quantum correlations that violate local realism with “arbitrarily low measurement independence”, that is, as long as the MDL model does not trivially allow us to reproduce all no-signalling correlations. The corresponding inequality has been tested in an experiment Pütz et al. (2016).

Measurement dependence in Bell-type tests is also central to the task of randomness amplification, where one aims to turn a single weak source of randomness (one in which successive outcomes may be correlated in an almost unrestricted way) into a perfect coin. This task is provably impossible with classical information processing, but it becomes possible if the weak source is used to choose the inputs (including the state) in a Bell test, whose outcomes are taken as the new random numbers Colbeck and Renner (2012); Gallego et al. (2013); Brandão et al. (2016); Bouda et al. (2014); Chung et al. (2015). One may wonder why the optimised approach of Pütz and coworkers has not yet been applied to improve the bounds on randomness amplification. The reason is that, so far, that approach has been developed under the assumption that the runs are independent and identically distributed (i.i.d.), and amplification of an i.i.d. source is trivial.

In this paper we study MDL for block-i.i.d. models, in which, as the name indicates, blocks of runs are i.i.d. but the runs in each block can be arbitrarily correlated Pope and Kay (2013); Yuan et al. (2015). Among the results (see Table 1 for a comprehensive overview), we prove that MDL models in fact become strictly more powerful if the i.i.d. assumption is dropped. This was not a foregone conclusion: under measurement independence, the local bound of a Bell inequality is the same with or without the i.i.d. assumption, only the estimates of finite-sample fluctuations differ.

## II Measurement dependence

Let us begin by reviewing how measurement dependence is formalised in the i.i.d. case. Consider a bipartite Bell test with two experimenters, Alice and Bob. Alice’s measurement choice is specified by an input $x$, and she obtains an output $a$. Similarly, Bob’s input and output are labelled $y$ and $b$ respectively. In the case of i.i.d. runs, Alice and Bob can measure the probabilities $P(abxy)$ of their various inputs and outputs occurring in a single experimental run. The set of all valid probability distributions $P(abxy)$ will be denoted $\mathcal{P}_1$, the single-run probability space. Later, we will consider probability spaces $\mathcal{P}_N$, which account for input-output combinations over $N$ runs.

Local realistic models are defined as those that admit a decomposition

$$P(abxy) = \left(\int d\lambda\, w(\lambda)\, P(a|x\lambda)\, P(b|y\lambda)\right) P(xy), \tag{1}$$

where $\lambda$ is a “local hidden variable” or “strategy” that determines the conditional probabilities of the outputs given the inputs. For any fixed $P(xy)$, the set of points in $\mathcal{P}_1$ admitting a local realistic model as described above forms a polytope Fine (1982); Brunner et al. (2014), denoted as $\mathcal{L}_1$. The linear inequalities satisfied by all points in $\mathcal{L}_1$ are the Bell inequalities. In MDL models, the input probabilities can be conditioned on $\lambda$ as well (Fig. a), so the achievable probability distributions are of the form

$$P(abxy) = \int d\lambda\, w(\lambda)\, P(a|x\lambda)\, P(b|y\lambda)\, P(xy|\lambda). \tag{2}$$

Clearly, this is a superset of $\mathcal{L}_1$. If no constraints are imposed on the conditional input probabilities $P(xy|\lambda)$, MDL models can trivially reproduce all quantum distributions Brans (1988); Pütz et al. (2014). To exclude this scenario, one approach Pütz et al. (2014); Pütz and Gisin (2016) is to impose linear bounds

$$l \le P(xy|\lambda) \le h. \tag{3}$$

Let us note that in the language of randomness, a source characterised only by $P(xy|\lambda) \le h$ would be called an i.i.d. min-entropy source. Given such constraints, the set of points in $\mathcal{P}_1$ achievable by MDL models is a polytope as well Pütz et al. (2014); Pütz and Gisin (2016), which we shall denote as $\mathrm{MDL}_1$. The linear inequalities satisfied by all points in the MDL polytope are the MDL inequalities. While the local polytope $\mathcal{L}_1$ and the quantum set $\mathcal{Q}_1$ are famously subsets of the no-signalling polytope $\mathcal{NS}_1$, which is a slice of $\mathcal{P}_1$ defined by suitable linear equality constraints Brunner et al. (2014), the MDL polytope extends into the signalling region Thinh et al. (2013); Pütz et al. (2014). This makes its characterisation, if not conceptually harder, certainly computationally heavier.

It is important to become familiar with the constraints (3), so we make a few remarks about them. First, the normalisation $\sum_{x,y} P(xy|\lambda) = 1$ implies $l \le 1/(d_Xd_Y) \le h$, because there are $d_Xd_Y$ combinations of inputs in the Bell test. If either $l$ or $h$ is set to $1/(d_Xd_Y)$, this enforces $P(xy|\lambda) = 1/(d_Xd_Y)$, which is the case of measurement independence. A deterministic choice of inputs conditioned on $\lambda$ would be allowed by $l = 0$ and $h = 1$, but one does not need to go all the way to determinism for the MDL scenario to become trivial. In particular, $l = 0$ already means that there can exist one or more pairs of inputs that are never used, and this is very powerful. For instance, consider the 2-input 2-output case ($d_X = d_Y = 2$): as soon as one pair of settings is not used, one can use local variables to fake any no-signalling distribution. Therefore, the values $l = 0$ and $h = 1/3$ already describe a trivial situation in which the violation of local realism cannot possibly be certified. As mentioned in the introduction, a key finding of Pütz and coworkers Pütz et al. (2014) is an MDL inequality that admits a quantum violation for any $l > 0$ in the i.i.d. case; in particular, this statement also holds for all $h < 1/3$ even if $l$ is left unspecified.
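The claim that a single unused pair of settings trivialises the 2-input 2-output case can be checked on a concrete example. The following Python sketch is our own illustration (the helper names are hypothetical, and the construction is not taken from Refs. Pütz et al. (2014); Pütz and Gisin (2016)): it builds an MDL model with $l = 0$ and $h = 1/3$ from eight hidden variables. For each input pair, two values of $\lambda$ never use that pair, draw the other three pairs with probability $1/3$ each, and play a local deterministic strategy (or its output-flipped copy) satisfying $a \oplus b = xy$ on the three pairs in use. Averaging over $\lambda$ gives uniform inputs $P(xy) = 1/4$ and reproduces the Popescu-Rohrlich box, $P(ab|xy) = 1/2$ whenever $a \oplus b = xy$, although every $\lambda$ is local.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch: an MDL model with l = 0, h = 1/3 reproducing the PR box.
PAIRS = [(x, y) for x in (0, 1) for y in (0, 1)]


def wins(strategy, x, y):
    """True if the deterministic strategy (a0, a1, b0, b1) gives a XOR b = x*y at inputs (x, y)."""
    a0, a1, b0, b1 = strategy
    return ((a0, a1)[x] ^ (b0, b1)[y]) == x * y


def build_model():
    """One (weight, strategy, P(xy|lambda)) triple per hidden variable lambda."""
    model = []
    for missing in PAIRS:
        used = [p for p in PAIRS if p != missing]
        # A local deterministic strategy satisfying a XOR b = xy on the three used pairs.
        strategy = next(s for s in product((0, 1), repeat=4)
                        if all(wins(s, x, y) for x, y in used))
        flipped = tuple(1 - v for v in strategy)  # flipping both outputs keeps a XOR b
        p_xy = {p: Fraction(1, 3) if p in used else Fraction(0) for p in PAIRS}
        for s in (strategy, flipped):
            model.append((Fraction(1, 8), s, p_xy))
    return model


def joint_distribution(model):
    """Eq. (2): P(abxy) = sum over lambda of w P(a|x lambda) P(b|y lambda) P(xy|lambda)."""
    P = {}
    for a, b, (x, y) in product((0, 1), (0, 1), PAIRS):
        P[(a, b, x, y)] = sum(
            (w * p_xy[(x, y)] for w, (a0, a1, b0, b1), p_xy in model
             if (a0, a1)[x] == a and (b0, b1)[y] == b),
            Fraction(0))
    return P


model = build_model()
P = joint_distribution(model)
h_used = max(p for _, _, p_xy in model for p in p_xy.values())
p_inputs = {(x, y): sum(P[(a, b, x, y)] for a in (0, 1) for b in (0, 1)) for x, y in PAIRS}
reproduces_pr = all(
    P[(a, b, x, y)] / p_inputs[(x, y)] == (Fraction(1, 2) if (a ^ b) == x * y else 0)
    for a, b, (x, y) in product((0, 1), (0, 1), PAIRS))
print(h_used, p_inputs, reproduces_pr)
```

Exact `Fraction` arithmetic is used so that the equalities hold exactly rather than up to rounding.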

Having introduced these notions, following Pope and Kay Pope and Kay (2013) we generalise them to block-$N$-i.i.d. MDL models (or MDL$_N$ models for short), which are the focus of this work. We now consider blocks of $N$ experimental runs in parallel, dealing with $N$-tuples of inputs and outputs (Fig. b). After many repetitions of these $N$ runs, one can reconstruct $P(\vec a\vec b\vec x\vec y)$; the set of all valid probability distributions of this form is denoted by $\mathcal{P}_N$.

Within this set, MDL models achieve probability distributions of the form

$$P(\vec a\vec b\vec x\vec y) = \int d\lambda\, w(\lambda)\, P(\vec a|\vec x\lambda)\, P(\vec b|\vec y\lambda)\, P(\vec x\vec y|\lambda). \tag{4}$$

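As in the i.i.d. case, one can impose linear constraints $l_N \le P(\vec x\vec y|\lambda) \le h_N$ on the MDL$_N$ model. Under such constraints, the set of all points in $\mathcal{P}_N$ attainable by MDL$_N$ models is again a polytope. This can be shown simply by noticing that the MDL scenario for $N$ runs is mathematically equivalent to a single-run ($N = 1$) MDL scenario in which each $N$-tuple of inputs or outputs is treated as a single symbol from a larger alphabet.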

In this work, we focus on the case where the lower bound $l_N$ is left unspecified, with larger values of $h_N$ corresponding to greater amounts of measurement dependence. To facilitate comparison with the i.i.d. case, we denote $h_N = h^N$, so that we finally work with the constraint

$$P(\vec x\vec y|\lambda) \le h^N. \tag{5}$$

In the language of randomness, Eq. (5) says that the inputs are drawn from a block-i.i.d. min-entropy source with $-\log_2 h$ bits of input entropy per use, which is strictly more general than $N$ uses of an i.i.d. min-entropy source with $-\log_2 h$ bits of input entropy per use.
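As a minimal illustration of why Eq. (5) is more general than $N$ i.i.d. uses, the sketch below takes $N = 2$, four input pairs per run, and the example value $h = 1/2$ (all three are assumptions chosen for illustration). The block source always repeats the same input pair in both runs: every nonzero block probability equals $1/4 = h^2$, so Eq. (5) holds with $-\log_2 h = 1$ bit of min-entropy per use, yet the distribution is not the product of its single-run marginals and therefore cannot come from two independent uses of a source.

```python
import math
from fractions import Fraction
from itertools import product

# Assumptions for illustration: N = 2 runs, d_X * d_Y = 4 input pairs per run
# (labelled 0..3), and the example value h = 1/2.
N = 2
h = Fraction(1, 2)
labels = range(4)

# Correlated block source: both runs always receive the same input pair.
block = {(t1, t2): Fraction(1, 4) if t1 == t2 else Fraction(0)
         for t1, t2 in product(labels, repeat=N)}

# Eq. (5): every block probability is at most h^N.
satisfies_eq5 = max(block.values()) <= h ** N

# Single-run marginals, and whether the block distribution factorises into them.
first = {t: sum(block[(t, s)] for s in labels) for t in labels}
second = {t: sum(block[(s, t)] for s in labels) for t in labels}
is_product = all(block[(t1, t2)] == first[t1] * second[t2] for t1, t2 in block)

# Min-entropy per use of the block source, to compare with -log2(h).
min_entropy_per_use = -math.log2(float(max(block.values()))) / N
print(satisfies_eq5, is_product, min_entropy_per_use, -math.log2(float(h)))
```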

Before presenting our results, we need to introduce two more notions. The first is single-run coarse-graining, which converts the block probabilities $P(\vec a\vec b\vec x\vec y)$ into average single-run probabilities $P(abxy)$. The function that represents this coarse-graining is linear, and hence maps polytopes to polytopes; it is described in detail in the Supplemental Material. Information is obviously lost in this procedure, but the resulting probability space is of considerably lower dimension and hence easier to study. Moreover, if a violation of local realism is seen in the coarse-grained version, it must also be present in the full probabilities. Finally, for large $N$ it will be hard to reconstruct $P(\vec a\vec b\vec x\vec y)$ from the experimental data, whereas the coarse-grained probabilities involve far fewer parameters.

The second notion is that of restricting MDL models to local strategies that are independent but not identically distributed. This is obtained by assuming

$$P(\vec a|\vec x\lambda) = \prod_{j=1}^N P(a_j|x_j\lambda), \qquad P(\vec b|\vec y\lambda) = \prod_{j=1}^N P(b_j|y_j\lambda) \tag{6}$$

in Eq. (4). The corresponding set of probabilities still forms a polytope (see Supplemental Material), which we denote as $\mathrm{MDL}_N^{\mathrm{ind}}$. This is useful for comparison with the works of Pope and Kay Pope and Kay (2013) and Pütz and coworkers Pütz et al. (2014); Pütz and Gisin (2016).

## III Results

Our results are obtained for the 2-input 2-output case and on the slice $P(\vec x\vec y) = 1/(d_Xd_Y)^N$, describing the natural assumption that all the inputs appear uniformly distributed when there is no information on $\lambda$. The results are listed in Table 1, with detailed proofs given in the Supplemental Material. Here we comment on them.

Firstly, most of these results relate an MDL$_N$ scenario to product sets of the type $\mathcal{Q}_1^{\otimes N}$ or $\mathcal{NS}_1^{\otimes N}$ (see Supplemental Material), defined by a condition similar to that in Eq. (6). The reason for this choice goes back to a previous remark: the MDL$_N$ scenario is mathematically equivalent to a single-run MDL scenario with larger alphabets. By studying the most general quantum statistics $\mathcal{Q}_N$, we would actually be discussing MDL for larger alphabets. In order to give our study a clear flavour of going beyond i.i.d., therefore, we discuss the power of MDL$_N$ models to reproduce statistics achievable with $N$ independent entangled pairs. In other words, we are addressing the following question: by implementing a “routine” Bell test with $N$ independent entangled pairs, up to which value of $h$ can one obtain a probability distribution that falsifies local realism, even under an MDL$_N$ assumption? We note that the pairs do not need to be identically distributed, which is a pleasant feature for comparison with experiments, in which some parameters may drift with time.

We start with the left column of Table 1, dealing with the general . Unfortunately, already has up to vertices, making it impractical to compute all its facets. By instead exploiting properties of the Popescu-Rohrlich (PR) box Popescu and Rohrlich (1994), we have been able to prove that is already enclosed by for (top-left corner). In particular, then, it will be impossible for to violate local realism all the way up to in the MDL scenario. We found that the MDL inequality that was violated in the whole non-trivial range of  Pütz et al. (2014, 2016) loses much of its robustness under MDL: it can no longer be violated by the quantum distribution specified in Refs. Pütz et al. (2014, 2016) when (see Supplemental Material for details). We were still able to show that for all , but it remains an open question whether other points in can violate local realism for higher values, possibly up to .

The bottom-left corner shows that the robustness reported by Pütz and coworkers is recovered if the two runs are constrained to be independent as in Eq. (6). This result, though maybe not surprising, does constitute a generalisation of the original one, insofar as the runs are not required to be identical.

Moving to the right column of Table 1, we deal with the coarse-grained probabilities . In the upper-right corner, the new piece of information is that , while the converse implication follows from the previous result. More interestingly, here we are able to make a statement for any , and about rather than only , because it can be shown that (see Supplemental Material). Therefore, after coarse-graining, no quantum statistics will show any violation of local realism for under the MDL assumption. We do not know if this bound is tight, but we find at least that there exists a point in which remains outside for all .

Finally, the lower-right corner is the most constrained situation, that was studied by Pope and Kay Pope and Kay (2013) 3. They considered the CHSH inequality Clauser et al. (1969) and proved that it is violated by for any up to (denoted in their paper); when in particular, it is violated up to . For this scenario with , we have found points in that violate local realism for , but cannot make conclusive statements for larger values of .
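For reference, the two CHSH values that frame such statements can be computed directly: the local bound $2$, obtained by enumerating deterministic $\pm 1$ strategies, and the quantum value $2\sqrt{2}$, obtained with the maximally entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$ and observables $\cos\theta\, Z + \sin\theta\, X$. This Python sketch only recalls these textbook values, using the usual optimal angles (an assumption, not a choice made in this work); it does not reproduce the MDL analysis of Pope and Kay (2013).

```python
import math
from itertools import product


def observable(theta):
    """The +/-1-valued observable cos(theta) Z + sin(theta) X as a real 2x2 matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def kron(A, B):
    """Kronecker product of two 2x2 matrices."""
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)] for i in range(4)]


def expectation(state, M):
    """<state|M|state> for a real state vector."""
    return sum(state[i] * M[i][j] * state[j] for i in range(4) for j in range(4))


phi_plus = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]   # (|00> + |11>)/sqrt(2)
alice = [observable(0.0), observable(math.pi / 2)]            # assumed optimal angles
bob = [observable(math.pi / 4), observable(-math.pi / 4)]

# CHSH = E_00 + E_01 + E_10 - E_11
quantum_chsh = sum((-1) ** (x * y) * expectation(phi_plus, kron(alice[x], bob[y]))
                   for x, y in product((0, 1), repeat=2))
local_chsh = max(sum((-1) ** (x * y) * A[x] * B[y] for x, y in product((0, 1), repeat=2))
                 for A in product((1, -1), repeat=2) for B in product((1, -1), repeat=2))
print(local_chsh, quantum_chsh)
```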

## IV Conclusion

We have studied block-i.i.d. models for measurement-dependent locality (MDL$_N$), and their power to reproduce statistics that can be produced with $N$ independent entangled pairs. The MDL$_N$ model is the least constrained one: a weak random source with min-entropy $-N\log_2 h$. For specific results, we have considered the Bell scenario with two inputs and two outputs per party. For $N = 1$, it was known that MDL models become too powerful at $h = 1/3$, and remarkably, quantum correlations could demonstrate violation of local realism all the way up to that value. However, this conclusion does not hold when the i.i.d. assumption is relaxed: already for $N = 2$ we have shown that the threshold value is reduced to ; and with correlations achievable with two entangled pairs we have not been able to find any violation beyond . We have obtained similar results for more restricted MDL$_N$ models and for a coarse-grained data processing, some of which are valid for arbitrary $N$.

We finish by commenting on the implications of our results for randomness amplification. The first results Colbeck and Renner (2012); Gallego et al. (2013); Brandão et al. (2016) were obtained for a slightly stronger model of random sources, the so-called Santha-Vazirani sources, but subsequent results have claimed the possibility of amplifying even a min-entropy source Bouda et al. (2014); Chung et al. (2015). However, all these protocols require multi-partite entanglement and are not robust to deviations from an ideal quantum state; it is currently an open problem to devise a randomness amplification protocol that can be implemented with existing devices. The MDL approach started by Pütz and coworkers offers hope of deriving such a protocol: robust, and for the simplest Bell scenario. Our paper is a first step in this direction, but we are not there yet. A computational study of MDL$_N$ for larger $N$ would be challenging, so one would have to try obtaining analytical results instead.

## Acknowledgments

We acknowledge useful correspondence with Gilles Pütz and Nicolas Gisin, particularly regarding the computational methods used to study the MDL polytope. This work is funded by the Singapore Ministry of Education (partly through the Academic Research Fund Tier 3 MOE2012-T3-1-009) and by the National Research Foundation of Singapore, Prime Minister’s Office, under the Research Centres of Excellence programme.

## Appendix A Supplemental Material

In this Supplemental Material, we begin by laying out a framework for block-i.i.d. models, defining the quantum sets and no-signalling sets of interest in the block-i.i.d. case. We then show that the MDL set is a polytope, and give the form of its vertices. This allows us to study the i.i.d. MDL inequality described in Refs. Pütz et al. (2014); Pütz and Gisin (2016) in the MDL scenario, and show that it becomes substantially less robust. Finally, we describe several techniques that can be used to study the MDL polytope, most importantly the notion of -mismatch strategies. Using these techniques, we derive the results shown in Table 1 in the main text.

## Appendix B The probability space and input probabilities

In this section, we discuss the block-$N$-i.i.d. scenario directly, with the i.i.d. scenario being described by the $N = 1$ case. We begin by noting that since $\mathcal{P}_N$ is defined by a set of linear equality and inequality constraints on a vector space, it forms a polytope. In this work, we define a polytope as an intersection of finitely many half-spaces, and all polytopes are implicitly assumed to be compact. This is then equivalent to defining a polytope as the convex hull of finitely many points.

As mentioned in the main text, the probabilities $P(\vec a\vec b\vec x\vec y)$ can be converted into conditional probabilities $P(\vec a\vec b|\vec x\vec y)$ by dividing by $P(\vec x\vec y)$, assuming all values of $P(\vec x\vec y)$ are nonzero. This is not a linear transformation on $\mathcal{P}_N$ as a whole; however, on any slice of $\mathcal{P}_N$ specified by fixing the values of $P(\vec x\vec y)$, it is indeed an invertible linear transformation. This allows us to directly apply many theorems derived in terms of conditional probabilities to the full probabilities $P(\vec a\vec b\vec x\vec y)$. An alternative approach for future work may be to work entirely with the conditional probabilities, in which case the local realistic set and quantum set have been extensively characterised, but the structure of the MDL set becomes less clear.

There are multiple ways to compute the input probabilities $P(\vec x\vec y)$; for instance, given the probabilities $P(\vec a\vec b\vec x\vec y)$, they can be computed by summing over $\vec a$ and $\vec b$. This is used to convert to conditional probabilities $P(\vec a\vec b|\vec x\vec y)$. Alternatively, they could be computed as $\int d\lambda\, w(\lambda)\, P(\vec x\vec y|\lambda)$ if an MDL$_N$ strategy is specified. As a consistency check, we note that these methods are equivalent, since

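$$\begin{aligned}
\sum_{\vec a,\vec b} P(\vec a\vec b\vec x\vec y) &= \sum_{\vec a,\vec b} \int d\lambda\, w(\lambda)\, P(\vec a\vec b|\vec x\vec y\lambda)\, P(\vec x\vec y|\lambda)\\
&= \int d\lambda\, w(\lambda) \Bigg(\sum_{\vec a,\vec b} P(\vec a\vec b|\vec x\vec y\lambda)\Bigg) P(\vec x\vec y|\lambda)\\
&= \int d\lambda\, w(\lambda)\, P(\vec x\vec y|\lambda) \qquad \text{by normalisation.}
\end{aligned} \tag{7}$$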
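Eq. (7) can also be checked numerically. The sketch below is stdlib Python; the sizes, the random seed and the variable names are arbitrary choices for illustration, and $N = 1$ is used for brevity, since the block case is the same computation with larger alphabets. It draws a random MDL model with five values of $\lambda$, builds $P(abxy)$ from Eq. (4), and compares the two routes to the input probabilities.

```python
import random
from itertools import product

# Illustrative sizes: N = 1 (the block case is identical with larger alphabets),
# binary inputs and outputs, five hidden-variable values. Seed chosen arbitrarily.
random.seed(1)
N_LAMBDA = 5
BITS = (0, 1)


def random_distribution(k):
    """A random probability vector of length k."""
    v = [random.random() + 1e-9 for _ in range(k)]
    s = sum(v)
    return [t / s for t in v]


w = random_distribution(N_LAMBDA)                                        # w(lambda)
p_a = [[random_distribution(2) for _ in BITS] for _ in range(N_LAMBDA)]  # P(a|x, lambda)
p_b = [[random_distribution(2) for _ in BITS] for _ in range(N_LAMBDA)]  # P(b|y, lambda)
p_xy = [dict(zip(product(BITS, BITS), random_distribution(4)))          # P(xy|lambda)
        for _ in range(N_LAMBDA)]

# Eq. (4) with N = 1: P(abxy) = sum_lambda w P(a|x lambda) P(b|y lambda) P(xy|lambda).
P = {(a, b, x, y): sum(w[lam] * p_a[lam][x][a] * p_b[lam][y][b] * p_xy[lam][(x, y)]
                       for lam in range(N_LAMBDA))
     for a, b, x, y in product(BITS, repeat=4)}

# Route 1: sum P(abxy) over the outputs. Route 2: average P(xy|lambda) over w(lambda).
from_outputs = {(x, y): sum(P[(a, b, x, y)] for a in BITS for b in BITS)
                for x, y in product(BITS, BITS)}
from_model = {(x, y): sum(w[lam] * p_xy[lam][(x, y)] for lam in range(N_LAMBDA))
              for x, y in product(BITS, BITS)}
max_gap = max(abs(from_outputs[k] - from_model[k]) for k in from_outputs)
print(max_gap)
```

The two routes agree up to floating-point rounding, as Eq. (7) requires.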

We also note that if the average single-run input probabilities $P(xy)$ are needed, they could be computed by averaging the probabilities $P(\vec x\vec y)$ over the runs directly, or by first computing the probabilities $P(abxy)$ by averaging $P(\vec a\vec b\vec x\vec y)$, then summing over $a$ and $b$. All these methods can be shown to give the same value. This raises the question of whether to use $P(\vec x\vec y)$ or $P(xy)$ to specify a slice of $\mathcal{P}_N$. In this work, however, we restrict ourselves to the former, as the latter specification results in a slice where the conversion between $P(\vec a\vec b\vec x\vec y)$ and $P(\vec a\vec b|\vec x\vec y)$ is not necessarily linear.

## Appendix C The coarse-graining function

Starting with the block-2-i.i.d. case, we define the coarse-graining function $c_2$ as follows: given any point $p \in \mathcal{P}_2$ corresponding to probabilities $P_p(\vec a\vec b\vec x\vec y)$, we define $c_2(p)$ as the point corresponding to probabilities

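$$P_{c_2(p)}(abxy) = \frac{1}{2}\Big(P_{A_1B_1X_1Y_1}(abxy) + P_{A_2B_2X_2Y_2}(abxy)\Big) \tag{8}$$

$$P_{c_2(p)}(abxy) = \frac{1}{2}\Bigg(\sum_{a_2,b_2,x_2,y_2} P_p(a a_2\, b b_2\, x x_2\, y y_2) + \sum_{a_1,b_1,x_1,y_1} P_p(a_1 a\, b_1 b\, x_1 x\, y_1 y)\Bigg). \tag{9}$$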

This represents the average, over the first and second runs, of the probability of obtaining the input-output combination $(a,b,x,y)$. The generalisation of the coarse-graining function to the block-$N$-i.i.d. case is fairly straightforward, though cumbersome to express:

$$P_{c_N(p)}(abxy) = \frac{1}{N}\sum_{j=1}^{N}\Bigg(\sum_{(\vec a,\vec b,\vec x,\vec y)\in S_j(a,b,x,y)} P_p(\vec a\vec b\vec x\vec y)\Bigg), \tag{10}$$

where $S_j(a,b,x,y)$ is defined as the set of tuples $(\vec a,\vec b,\vec x,\vec y)$ such that $(a_j,b_j,x_j,y_j) = (a,b,x,y)$, using $a_j$ to denote the $j$th entry of $\vec a$ and so on. The function $c_N$ is clearly linear, and hence admits a matrix representation for computational purposes. It is not injective as a function from $\mathcal{P}_N$ to $\mathcal{P}_1$, because it is easy to find two points $p \neq p'$ such that $c_N(p) = c_N(p')$, for instance by permuting the order of the runs. This is consistent with its interpretation as a coarse-graining which loses some information about the original point. Another fairly intuitive property of the coarse-graining function is as follows:

###### Proposition 1.

Consider any point $p \in \mathcal{P}_N$ that corresponds to the repetition of a single-run probability distribution $p_1 \in \mathcal{P}_1$ over $N$ runs,

$$P_p(\vec a\vec b\vec x\vec y) = \prod_{j=1}^N P_{p_1}(a_jb_jx_jy_j). \tag{11}$$

Then $c_N(p) = p_1$.

###### Proof.

Referring to Eq. (10) for the definition of the coarse-graining function, we see that for the specified point $p$, the first ($j = 1$) term in the summation has the form

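$$\begin{aligned}
\sum_{(\vec a,\vec b,\vec x,\vec y)\in S_1(a,b,x,y)} P_p(\vec a\vec b\vec x\vec y) &= \sum_{(\vec a,\vec b,\vec x,\vec y)\in S_1(a,b,x,y)} \Bigg(\prod_{j=1}^N P_{p_1}(a_jb_jx_jy_j)\Bigg)\\
&= \sum_{a_2,b_2,x_2,y_2} \cdots \sum_{a_N,b_N,x_N,y_N} \Bigg(P_{p_1}(abxy) \prod_{j=2}^N P_{p_1}(a_jb_jx_jy_j)\Bigg)\\
&= P_{p_1}(abxy) \prod_{j=2}^N \Bigg(\sum_{a_j,b_j,x_j,y_j} P_{p_1}(a_jb_jx_jy_j)\Bigg)\\
&= P_{p_1}(abxy) \qquad \text{by normalisation of } P_{p_1},
\end{aligned} \tag{12}$$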

and similarly for the other terms as well. Therefore,

$$P_{c_N(p)}(abxy) = \frac{1}{N}\sum_{j=1}^N P_{p_1}(abxy) = P_{p_1}(abxy), \tag{13}$$

which is the result to be proven. ∎
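The coarse-graining function and Proposition 1 are easy to test on small examples. In this Python sketch, a block distribution is stored as a dictionary from tuples of per-run events $((a_1,b_1,x_1,y_1),\dots,(a_N,b_N,x_N,y_N))$ to probabilities; this layout and the function names are our own assumptions, not a representation used in this work. `coarse_grain` implements Eq. (10), `iid_repetition` implements Eq. (11), and exact `Fraction` arithmetic lets Proposition 1 be checked with equality. The last lines illustrate that $c_N$ is not injective, since swapping the runs leaves the image unchanged.

```python
from fractions import Fraction
from itertools import product


def coarse_grain(block_p, N):
    """Eq. (10): average over runs j of the marginal distribution of run j.

    block_p maps a tuple of N per-run events ((a1, b1, x1, y1), ..., (aN, bN, xN, yN))
    to its probability (a layout chosen for this sketch).
    """
    single = {}
    for runs, prob in block_p.items():
        for j in range(N):
            single[runs[j]] = single.get(runs[j], Fraction(0)) + prob / N
    return single


def iid_repetition(p1, N):
    """Eq. (11): repeat the single-run distribution p1 independently over N runs."""
    block = {}
    for runs in product(p1, repeat=N):
        prob = Fraction(1)
        for event in runs:
            prob *= p1[event]
        block[runs] = prob
    return block


# Example single-run point: PR box with uniform inputs, P(abxy) = P_PR(ab|xy) / 4.
p1 = {(a, b, x, y): Fraction(1, 8) if (a ^ b) == x * y else Fraction(0)
      for a, b, x, y in product((0, 1), repeat=4)}

# Proposition 1: c_N maps the N-fold repetition of p1 back to p1.
prop1_holds = all(coarse_grain(iid_repetition(p1, N), N) == p1 for N in (1, 2, 3))

# c_N is not injective: a block distribution and its run-swapped copy share one image.
q = {((0, 0, 0, 0), (1, 1, 1, 1)): Fraction(1)}
q_swapped = {((1, 1, 1, 1), (0, 0, 0, 0)): Fraction(1)}
print(prop1_holds, q != q_swapped, coarse_grain(q, 2) == coarse_grain(q_swapped, 2))
```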

## Appendix D The quantum set and no-signalling set

As stated in the main text, when considering quantum models, we shall mainly consider product sets $\mathcal{Q}_1^{\otimes N}$, defined as the set of points in $\mathcal{P}_N$ admitting a decomposition

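$$P(\vec a\vec b\vec x\vec y) = \left(\int d\lambda\, w(\lambda) \prod_{j=1}^N \mathrm{Tr}\Big[\rho^{(j)}(\lambda)\big(P^{(j)}_{a_j|x_j}(\lambda) \otimes P^{(j)}_{b_j|y_j}(\lambda)\big)\Big]\right) P(\vec x\vec y), \tag{14}$$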

where the measurements $P^{(j)}_{a_j|x_j}(\lambda)$ and $P^{(j)}_{b_j|y_j}(\lambda)$ can be taken to be projective without loss of generality, because the dimension of the systems is left unconstrained. This describes the situation where the quantum states and measurements may differ from one run to the other but are independent across the runs, similar to independent-runs MDL models (Eq. (6)). We contrast this with the most general quantum set $\mathcal{Q}_N$, given by points of the form

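$$P(\vec a\vec b\vec x\vec y) = \left(\int d\lambda\, w(\lambda)\, \mathrm{Tr}\Big[\rho(\lambda)\big(P_{\vec a|\vec x}(\lambda) \otimes P_{\vec b|\vec y}(\lambda)\big)\Big]\right) P(\vec x\vec y). \tag{15}$$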

This allows for coherent quantum states and measurements across multiple runs, and clearly $\mathcal{Q}_1^{\otimes N} \subseteq \mathcal{Q}_N$. However, we mostly do not consider $\mathcal{Q}_N$ in this work, because it would be mathematically identical to an i.i.d. scenario with a larger alphabet. In addition, it would be difficult to implement experimentally. A possible direction for future investigation would be to allow the quantum states and measurements to be conditioned on the inputs or outputs of past runs, producing a set intermediate between $\mathcal{Q}_1^{\otimes N}$ and $\mathcal{Q}_N$.

While quantum distributions can violate Bell inequalities, they still obey the no-signalling conditions, preventing faster-than-light communication. In the i.i.d. case, the no-signalling conditions can be expressed mathematically as

$$\begin{aligned}
\sum_b P(ab|xy) &= \sum_b P(ab|xy') \quad \forall a,x,y,y',\\
\sum_a P(ab|xy) &= \sum_a P(ab|x'y) \quad \forall b,y,x,x'.
\end{aligned} \tag{16}$$

For any fixed $P(xy)$, these specify a set of linear equality constraints on $\mathcal{P}_1$, and hence define a slice of the polytope $\mathcal{P}_1$, which we denote as $\mathcal{NS}_1$, the no-signalling polytope. The no-signalling set can be easier to study than the quantum set since, unlike the quantum set, it can be characterised by a finite set of vertices. However, we note that when studying MDL models, the MDL polytope is not constrained to lie on the no-signalling slice, because the measurement dependence in $P(xy|\lambda)$ can introduce correlations between the inputs and the hidden variable.

A particularly significant point on the no-signalling slice is the Popescu-Rohrlich (PR) box Popescu and Rohrlich (1994), defined by

$$P_{\mathrm{PR}}(ab|xy) = \begin{cases} \tfrac{1}{2} & \text{if } a \oplus b = xy,\\ 0 & \text{otherwise,} \end{cases} \tag{17}$$

where $\oplus$ represents addition modulo 2. This distribution satisfies the no-signalling constraints, but cannot be achieved by any quantum model. It has the important property that, up to permutations of inputs, outputs and parties, every vertex of $\mathcal{NS}_1$ in the bipartite 2-input 2-output case is either a vertex of $\mathcal{L}_1$ or a PR box Barrett et al. (2005).
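The stated properties of the PR box can be verified mechanically. The Python sketch below (the function names are ours) checks the no-signalling conditions of Eq. (16) for Eq. (17) and evaluates the CHSH expression $\sum_{x,y}(-1)^{xy}E_{xy}$, with correlators $E_{xy} = \sum_{a,b}(-1)^{a+b}P(ab|xy)$, both for the PR box and for the 16 local deterministic strategies. The PR box reaches the algebraic maximum of 4, whereas local deterministic strategies reach at most 2.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)


def pr_box(a, b, x, y):
    """Eq. (17): P_PR(ab|xy) = 1/2 if a XOR b = x*y, and 0 otherwise."""
    return Fraction(1, 2) if (a ^ b) == x * y else Fraction(0)


def deterministic(a0, a1, b0, b1):
    """Local deterministic box: Alice outputs a_x, Bob outputs b_y."""
    return lambda a, b, x, y: Fraction(int(a == (a0, a1)[x] and b == (b0, b1)[y]))


def is_no_signalling(P):
    """Check both conditions of Eq. (16) exactly."""
    alice = all(sum(P(a, b, x, y) for b in BITS) == sum(P(a, b, x, y2) for b in BITS)
                for a, x, y, y2 in product(BITS, repeat=4))
    bob = all(sum(P(a, b, x, y) for a in BITS) == sum(P(a, b, x2, y) for a in BITS)
              for b, y, x, x2 in product(BITS, repeat=4))
    return alice and bob


def chsh(P):
    """sum_{x,y} (-1)^{xy} E_xy, with correlators E_xy = sum_{a,b} (-1)^{a+b} P(ab|xy)."""
    return sum((-1) ** (x * y) * sum((-1) ** (a + b) * P(a, b, x, y)
                                     for a, b in product(BITS, repeat=2))
               for x, y in product(BITS, repeat=2))


local_bound = max(chsh(deterministic(*s)) for s in product(BITS, repeat=4))
print(is_no_signalling(pr_box), chsh(pr_box), local_bound)
```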

We can also generalise $\mathcal{NS}_1$ to the product set $\mathcal{NS}_1^{\otimes N}$, defined as the set of points admitting a decomposition

$$P(\vec a\vec b\vec x\vec y) = \left(\int d\lambda\, w(\lambda) \prod_{j=1}^N P_{q_j(\lambda)}(a_jb_j|x_jy_j)\right) P(\vec x\vec y), \tag{18}$$

where all the points $q_j(\lambda)$ satisfy the i.i.d. no-signalling constraints in Eq. (16). Since quantum models are no-signalling, $\mathcal{Q}_1^{\otimes N}$ is clearly a subset of $\mathcal{NS}_1^{\otimes N}$. Intuitively, we would also expect that allowing block-i.i.d. quantum models still does not result in apparently-signalling distributions after averaging over the runs, which is to say $c_N(\mathcal{Q}_N) \subseteq \mathcal{NS}_1$. We now show that this is indeed the case, at least on the uniform-measurements slice.

###### Proposition 2.

Consider any $q \in \mathcal{P}_N$ on the uniform-measurements slice $P(\vec x\vec y) = 1/(d_Xd_Y)^N$. If its conditional probabilities satisfy the constraints

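$$\begin{aligned}
\sum_{\vec b} P(\vec a\vec b|\vec x\vec y) &= \sum_{\vec b} P(\vec a\vec b|\vec x\vec y') \quad \forall \vec a,\vec x,\vec y,\vec y',\\
\sum_{\vec a} P(\vec a\vec b|\vec x\vec y) &= \sum_{\vec a} P(\vec a\vec b|\vec x'\vec y) \quad \forall \vec b,\vec y,\vec x,\vec x',
\end{aligned} \tag{19}$$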

then the conditional probabilities corresponding to $c_N(q)$ satisfy the i.i.d. no-signalling constraints specified in Eq. (16), and thus $c_N(q) \in \mathcal{NS}_1$ with $P(xy) = 1/(d_Xd_Y)$.

###### Corollary.

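If the uniform-measurements constraint is imposed, we have $c_N(\mathcal{Q}_N) \subseteq \mathcal{NS}_1$ and $c_N(\mathcal{NS}_1^{\otimes N}) = \mathcal{NS}_1$, with $P(xy) = 1/(d_Xd_Y)$.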

###### Proof.

Suppose that the conditions in Eq. (19) are fulfilled by the point $q$. Given the uniform-measurements constraint $P(\vec x\vec y) = 1/(d_Xd_Y)^N$, these conditions on $P_q(\vec a\vec b|\vec x\vec y)$ can be converted directly to the same statements for $P_q(\vec a\vec b\vec x\vec y)$ simply by multiplying throughout by $1/(d_Xd_Y)^N$. We can hence write

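$$\begin{aligned}
\sum_{\vec b} P_q(\vec a\vec b\vec x\vec y) &= f(\vec a,\vec x),\\
\sum_{\vec a} P_q(\vec a\vec b\vec x\vec y) &= g(\vec b,\vec y),
\end{aligned} \tag{20}$$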

expressing the fact that these sums are independent of $\vec y$ and $\vec x$ respectively. On this slice, we also have $P_{c_N(q)}(xy) = 1/(d_Xd_Y)$ for all $x, y$. Hence, considering the i.i.d. no-signalling condition for Alice (the first equation in Eq. (16)), we note that for any $a, x, y$, we have

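$$\begin{aligned}
\sum_b P_{c_N(q)}(ab|xy) &= d_Xd_Y \sum_b P_{c_N(q)}(abxy)\\
&= \frac{d_Xd_Y}{N} \sum_b \sum_{j=1}^N \Bigg(\sum_{(\vec a,\vec b,\vec x,\vec y)\in S_j(a,b,x,y)} P_q(\vec a\vec b\vec x\vec y)\Bigg)\\
&= \frac{d_Xd_Y}{N} \sum_{j=1}^N \Bigg(\sum_{(\vec a,\vec x,\vec y)\in S'_j(a,x,y)} \sum_{\vec b} P_q(\vec a\vec b\vec x\vec y)\Bigg)\\
&= \frac{d_Xd_Y}{N} \sum_{j=1}^N \Bigg(\sum_{(\vec a,\vec x,\vec y)\in S'_j(a,x,y)} f(\vec a,\vec x)\Bigg)\\
&= \frac{d_Xd_Y}{N} \sum_{j=1}^N \Bigg(d_Y^{N-1} \sum_{(\vec a,\vec x)\in S''_j(a,x)} f(\vec a,\vec x)\Bigg) \qquad \text{since the summand is independent of } \vec y\\
&= \frac{d_X d_Y^N}{N} \sum_{j=1}^N \Bigg(\sum_{(\vec a,\vec x)\in S''_j(a,x)} f(\vec a,\vec x)\Bigg),
\end{aligned} \tag{21}$$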

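where $S'_j(a,x,y)$ is defined as the set of tuples $(\vec a,\vec x,\vec y)$ such that $(a_j,x_j,y_j) = (a,x,y)$, and similarly for $S''_j(a,x)$, analogous to the definition of $S_j(a,b,x,y)$ for the coarse-graining function. From the final expression, we see that $\sum_b P_{c_N(q)}(ab|xy)$ is independent of $y$, and thus the i.i.d. no-signalling condition for Alice is fulfilled. Applying the same argument to Bob, we conclude that indeed $c_N(q) \in \mathcal{NS}_1$, with $P(xy) = 1/(d_Xd_Y)$.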

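As for the corollary, the first statement follows immediately by noting that even for the quantum points described in Eq. (15), the constraints in Eq. (19) are still satisfied, as can be seen by treating it as an i.i.d. scenario with a larger alphabet. Regarding $\mathcal{NS}_1^{\otimes N}$, we similarly have $c_N(\mathcal{NS}_1^{\otimes N}) \subseteq \mathcal{NS}_1$, since it can be shown that any point in $\mathcal{NS}_1^{\otimes N}$ obeys the constraints of Eq. (19). We note also that Proposition 1 implies $\mathcal{NS}_1 \subseteq c_N(\mathcal{NS}_1^{\otimes N})$ on the uniform-measurements slice, since it shows that any point in $\mathcal{NS}_1$ with $P(xy) = 1/(d_Xd_Y)$ has a pre-image in $\mathcal{NS}_1^{\otimes N}$ with $P(\vec x\vec y) = 1/(d_Xd_Y)^N$ under the coarse-graining function (simply by repetition). Therefore, we can conclude that $c_N(\mathcal{NS}_1^{\otimes N}) = \mathcal{NS}_1$ under the uniform-measurements condition. ∎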
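Proposition 2 can be sanity-checked on a single example point of the product no-signalling set of Eq. (18) with $N = 2$. The sketch below mixes three hypothetical two-run models, each assigning a possibly different no-signalling box (the PR box, its anti-correlated variant, or a local deterministic box) to each run, fixes uniform inputs $P(\vec x\vec y) = 1/16$, applies $c_2$, and tests Eq. (16) exactly with `Fraction`. A single example is not a proof; it only illustrates the statement.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)
EVENTS = list(product(BITS, repeat=4))   # single-run events (a, b, x, y)


def pr(a, b, x, y):          # Eq. (17)
    return Fraction(1, 2) if (a ^ b) == x * y else Fraction(0)


def anti_pr(a, b, x, y):     # PR box with a XOR b = xy XOR 1, also no-signalling
    return Fraction(1, 2) if (a ^ b) != x * y else Fraction(0)


def local_box(a, b, x, y):   # deterministic box: a = x, b = 0
    return Fraction(int(a == x and b == 0))


# Hypothetical model in the form of Eq. (18): each lambda picks one NS box per run.
model = [(Fraction(1, 2), (pr, local_box)),
         (Fraction(1, 3), (anti_pr, pr)),
         (Fraction(1, 6), (local_box, anti_pr))]

# Block distribution on the uniform-measurements slice, P(x1 y1 x2 y2) = 1/16.
block = {(e1, e2): sum(w * q1(*e1) * q2(*e2) for w, (q1, q2) in model) / 16
         for e1, e2 in product(EVENTS, repeat=2)}


def coarse_grain(block_p, N):   # Eq. (10)
    single = {e: Fraction(0) for e in EVENTS}
    for runs, prob in block_p.items():
        for j in range(N):
            single[runs[j]] += prob / N
    return single


# Conditional probabilities of c_2(q): divide by P(xy) = 1/(d_X d_Y) = 1/4.
cond = {e: 4 * p for e, p in coarse_grain(block, 2).items()}
ns_alice = all(sum(cond[(a, b, x, y)] for b in BITS) == sum(cond[(a, b, x, y2)] for b in BITS)
               for a, x, y, y2 in product(BITS, repeat=4))
ns_bob = all(sum(cond[(a, b, x, y)] for a in BITS) == sum(cond[(a, b, x2, y)] for a in BITS)
             for b, y, x, x2 in product(BITS, repeat=4))
print(ns_alice and ns_bob)
```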

## Appendix E Vertices of the MDLN polytope

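We now turn to the issue of characterising the MDL$_N$ set, as defined in Eq. (4) and subject to the constraints $l_N \le P(\vec x\vec y|\lambda) \le h_N$. Regarding these constraints, we note that any $l_N$ implicitly imposes an upper bound $P(\vec x\vec y|\lambda) \le 1 - ((d_Xd_Y)^N - 1)\, l_N$ by normalisation of $P(\vec x\vec y|\lambda)$. Similarly, any $h_N < 1/((d_Xd_Y)^N - 1)$ implies a nonzero lower bound $P(\vec x\vec y|\lambda) \ge 1 - ((d_Xd_Y)^N - 1)\, h_N$. In particular, for the 2-input 2-output i.i.d. case, this shows that any $h < 1/3$ imposes a nonzero lower bound, even if $l$ is left unspecified.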
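A small helper makes the normalisation argument concrete. With $n$ input combinations ($n = (d_Xd_Y)^N$) and bounds `lower` $\le P(\vec x\vec y|\lambda) \le$ `upper` on every entry, each entry is at most $1 - (n-1)\,$`lower` and at least $1 - (n-1)\,$`upper`. The Python sketch below (the function name is hypothetical) returns the tightened bounds. The examples reproduce the statement that in the 2-input 2-output i.i.d. case ($n = 4$) an upper bound $3/10 < 1/3$ forces a nonzero lower bound, while $1/3$ does not; for $N = 2$ ($n = 16$) the corresponding block bound $(3/10)^2$ no longer implies any lower bound.

```python
from fractions import Fraction


def implied_bounds(lower, upper, n_inputs):
    """Tighten bounds lower <= P(xy|lambda) <= upper using normalisation.

    The n_inputs probabilities sum to 1, so each one is at most
    1 - (n_inputs - 1) * lower and at least 1 - (n_inputs - 1) * upper.
    """
    new_upper = min(upper, 1 - (n_inputs - 1) * lower)
    new_lower = max(lower, 1 - (n_inputs - 1) * upper, Fraction(0))
    return new_lower, new_upper


# 2-input 2-output i.i.d. case: n_inputs = d_X * d_Y = 4, lower bound left at 0.
print(implied_bounds(Fraction(0), Fraction(3, 10), 4))   # upper bound 3/10 < 1/3
print(implied_bounds(Fraction(0), Fraction(1, 3), 4))    # upper bound 1/3
# Block case N = 2: n_inputs = 16 and the per-block upper bound is (3/10)^2.
print(implied_bounds(Fraction(0), Fraction(3, 10) ** 2, 16))
```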

In cases of potential ambiguity, we shall refer to the general MDL$_N$ models in Eq. (4) as dependent-runs models, and to those constrained to independent local strategies (Eq. (6)) as independent-runs models. Their corresponding sets are denoted $\mathrm{MDL}_N$ and $\mathrm{MDL}_N^{\mathrm{ind}}$ respectively. There may be other MDL sets of interest, such as models where the outputs depend on the inputs of all past runs but not future runs. However, in this work we shall only consider the dependent-runs and independent-runs models.

As claimed in the main text, the set of probability distributions admitting dependent-runs or independent-runs MDL$_N$ models forms a polytope in $\mathcal{P}_N$. When subsequently taking a slice of $\mathcal{P}_N$ by specifying $P(\vec x\vec y)$, the values chosen for $P(\vec x\vec y)$ must be compatible with the constraints $l_N \le P(\vec x\vec y|\lambda) \le h_N$ in order for the MDL$_N$ set to have non-empty intersection with this slice. For $\mathrm{MDL}_N$, its vertices are precisely the set of points of the form

$$P(\vec a\vec b\vec x\vec y) = P(\vec a|\vec x)\, P(\vec b|\vec y)\, P(\vec x\vec y), \tag{22}$$

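where the values of $P(\vec a|\vec x)$ and $P(\vec b|\vec y)$ are all equal to either 0 or 1, and the values of $P(\vec x\vec y)$ are extremal in the sense that all but at most one of them are equal to either $l_N$ or $h_N$. Such an assignment of values for $P(\vec a|\vec x)$ and $P(\vec b|\vec y)$ is referred to as a local deterministic strategy. Similarly, the vertices of $\mathrm{MDL}_N^{\mathrm{ind}}$ are the set of points of the form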

$$P(\vec a\vec b\vec x\vec y) = \left(\prod_{j=1}^N P(a_j|x_j)\, P(b_j|y_j)\right) P(\vec x\vec y), \tag{23}$$

where the values of $P(a_j|x_j)$ and $P(b_j|y_j)$ are all equal to either 0 or 1, and the values of $P(\vec x\vec y)$ are extremal as described above.

To justify this claim, we note that for dependent-runs models, the MDL scenario for $N$ runs is mathematically equivalent to the single-run ($N = 1$) MDL scenario with inputs $\vec x$, $\vec y$ and outputs $\vec a$, $\vec b$. Hence the proof in Refs. Pütz et al. (2014); Pütz and Gisin (2016) for the i.i.d. MDL set with arbitrary finite inputs and outputs carries over directly to this case, and it shows that the dependent-runs MDL set is a polytope with vertices of the form in Eq. (22).

For independent-runs models, we instead use an intermediate theorem from Refs. Pütz et al. (2014); Pütz and Gisin (2016): if the conditional output probabilities $P(\vec a\vec b|\vec x\vec y\lambda)$ and the input probabilities $P(\vec x\vec y|\lambda)$ are both drawn from polytopes, then combining them in the manner of Eq. (4) produces a polytope. Since the independent-runs restriction only affects $P(\vec a\vec b|\vec x\vec y\lambda)$, it suffices to show that these conditional output probabilities still form a polytope under the independent-runs condition.

By noting that the probabilities that admit an independent-runs decomposition are isomorphic to those for a -party local realistic model where parties have inputs and parties have outputs, we see that they indeed form a polytope, with vertices given by local deterministic strategies. The theorem from Refs. Pütz et al. (2014); Pütz and Gisin (2016) then shows that the independent-runs MDL set is a polytope, and that its vertices are given by combinations of local deterministic strategies with an extremal assignment of values to , as described in Eq. (23).

We see from this that the number of vertices of the MDL polytope equals the product of the number of local deterministic strategies with the number of possibilities for an extremal assignment of values to . In the 2-input 2-output case, there are local deterministic strategies for dependent-runs models, or local deterministic strategies for independent-runs models. As for the number of possible extremal assignments for , this depends on the values of and . By directly applying the proof given in Refs. Pütz et al. (2014); Pütz and Gisin (2016) for the i.i.d. case with inputs and outputs, we note that up to permutation, the unique extremal assignment of values to the terms is to have of them equal to , of them equal to , and the last chosen to satisfy normalisation. The number of permutations is hence given by the multinomial coefficient

$$\binom{2^{2N}}{m,\; 2^{2N}-m-1,\; 1} = \frac{(2^{2N})!}{m!\,(2^{2N}-m-1)!}. \tag{24}$$

A special case is when the values of are such that is already an integer, in which case all of can be set equal to either or . In that case, letting the number of terms equal to be , the number of possible permutations is the binomial coefficient

$$\binom{2^{2N}}{m,\; 2^{2N}-m} = \frac{(2^{2N})!}{m!\,(2^{2N}-m)!}. \tag{25}$$

From these expressions, we see that for the special values of $l_N$ and $h_N$ where all of the $P(\vec x\vec y)$ can be set equal to either $l_N$ or $h_N$, the number of vertices of the MDL$_N$ polytope tends to be smaller. In general, however, the number of vertices is large enough that working with them would be intractable for anything but small values of $N$.
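The counting argument above can be turned into a short calculation. The Python sketch below assumes the 2-input 2-output case, sets the lower bound to $0$, and uses an illustrative value of $h$; these choices and the function names are ours. For dependent runs, each party's deterministic strategy is an arbitrary function from $N$ input bits to $N$ output bits, giving $(2^N)^{2^N}$ strategies per party; for independent runs there are $16$ joint strategies per run. With the lower bound at $0$, the number of entries equal to $h^N$ is $m = \lfloor 1/h^N \rfloor$, and the input assignments are counted with Eq. (24), or with Eq. (25) when $1/h^N$ is an integer.

```python
from fractions import Fraction
from math import comb, factorial


def n_deterministic(N, dependent=True):
    """Number of local deterministic strategies, 2-input 2-output case, N runs."""
    if dependent:
        per_party = (2 ** N) ** (2 ** N)   # all functions from N input bits to N output bits
        return per_party ** 2
    return 16 ** N                          # 4 strategies per party per run, runs independent


def n_input_assignments(N, h):
    """Extremal assignments of P(xy|lambda), lower bound 0, upper bound h^N (Eqs. (24)-(25))."""
    n = 2 ** (2 * N)        # number of input combinations
    hN = h ** N
    m = int(1 / hN)         # entries set equal to h^N (floor of 1/h^N)
    if m * hN == 1:         # Eq. (25): no intermediate entry is needed
        return comb(n, m)
    return factorial(n) // (factorial(m) * factorial(n - m - 1))   # Eq. (24)


h = Fraction(3, 10)        # illustrative value only
for N in (1, 2):
    assignments = n_input_assignments(N, h)
    print(N, n_deterministic(N) * assignments,
          n_deterministic(N, dependent=False) * assignments)
```

Exact `Fraction` arithmetic keeps the integrality test `m * hN == 1` reliable, which floating-point values would not guarantee.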

## Appendix F The i.i.d. MDL inequality

The MDL inequality shown in Eq. 5 of Pütz et al. (2014) can be written in the form

 lP(0000)−h(P(0101)+P