Simulation of Quantum Walks and Fast Mixing with Classical Processes
Abstract
We compare discrete-time quantum walks on graphs to their natural classical equivalents, which we argue are lifted Markov chains, that is, classical Markov chains with added memory. We show that these can simulate quantum walks, allowing us to answer an open question on how the graph topology ultimately bounds their mixing performance, and that of any stochastic local evolution. The results highlight that speedups in mixing and transport phenomena are not necessarily diagnostic of quantum effects, although superdiffusive spreading is more prominent with quantum walks.
Random walks are both ubiquitous models for natural processes and a powerful, versatile algorithmic tool to explore networks and extract information about their structure. In recent years their quantum analogue, named quantum walks (QWs), was shown to hold similar promises. QWs describe the evolution of the position probability distribution of a “walking” quantum particle on a graph, possibly entangled with other quantum degrees of freedom (the so-called coin). The joint dynamics can be either discrete-time or continuous and must respect the graph locality [aharonov1993quantum, farhi1998quantum, watrous1999quantum, kempe2003quantum]. Following the realization that QWs on a line can beat the diffusive behavior typical of classical stochastic processes [ambainis2001one, aharonov1993quantum], they have been invoked to explain improved transport phenomena in biological systems [engel2007evidence, mohseni2008environment], linked to thermodynamic theories, breakdown models and topological states of matter [romanelli2014thermodynamics, oka2005breakdown, kitagawa2012observation], and simulated in various experiments [karski2009quantum, peruzzo2010quantum, genske2013electric, preiss2015strongly, flurin2017observing]. Furthermore, they have been intensely studied as a paradigm for quantum computing [childs2009universal, lovett2010universal] and to speed up algorithmic tasks [ambainis2003quantum], in particular, those related to the celebrated Grover search algorithm [shenvi2003quantum, childs2004spatial, magniez2011search].
Despite impressive advances in their analysis, elucidating the source and extent of quantum advantages from the perspective of QWs, as well as providing general design principles to ensure a quantum speedup, remain ongoing lines of research. A general quadratic speedup by QWs has been established for the hitting time [childs2003exponential, szegedy2004quantum, kempe2005discrete, magniez2011search, krovi2016quantum, hoyer2016efficient], that is, for finding a marked node in a graph. The complementary problem of mixing, that is, converging to a particular probability distribution over the nodes, has so far resisted a general QW speedup analysis, although it is closer to the original observation on the line [ambainis2001one]. There is further evidence for a quadratic speedup with respect to classical Markov chains on specific graphs including the cycle [aharonov2001quantum], the hypercube [moore2002quantum], and the torus [richter2007quantum]. A general characterization of QW mixing would be a fundamental step for investigating quantum vs. classical differences in statistical mechanics (thermodynamic equilibration, transport phenomena, localization defects), and its algorithmic complexity is of key relevance for applications like sampling and Monte Carlo simulations [sinclair2012algorithms].
In this paper, we characterize the mixing performance of QWs by showing that they belong to a class of processes which can be simulated by classical Markov chains with additional finite memory, called “lifted Markov chains” (LMCs) [chen1999lifting]. For general graphs, our constructive proof is reminiscent of a classical version of the “Feynman clock Hamiltonians” used to prove universality of adiabatic computing [feynman1982simulating, kitaev2002classical, aharonov2008adiabatic], in combination with “stochastic bridges” generalizing [aaronson2005quantum] and [pavon2010discrete, georgiou2015positive] to simulate quantum channels for fixed initial conditions. This allows us to derive a tight bound on potential QW mixing speedup, improving the known bounds from [aharonov2001quantum, temme2010chi]. Furthermore, for lattices, on which most QW mixing speedups have been demonstrated, we relate the QWs to fast mixing LMCs that have not only the same mixing performance, but also the same structure [diaconis2000analysis, diaconis2013spectral], making them their natural classical analogue.
These results provide several insights. First, an observed speedup in mixing is not fundamentally diagnostic of a quantum effect, as it may always be explained by a purely classical memory. Second, QWs are essentially subject to the same bound on their mixing performance as other local processes, induced solely by the topology of the graph. Third, the search for a quantum advantage should focus on identifying efficient designs, in terms of the amount of memory or the graph knowledge required. For lattices, beating efficient classical algorithms is possible only for tasks beyond pure mixing. Whether for statistical mechanics, evolutionarily selected biological systems, or design of faster Monte Carlo algorithms, our results significantly narrow the context in which quantum effects may provide an intrinsic advantage.
QWs and their classical counterparts: a paradigmatic example. – Usually, QWs are presented as the quantum analogues of, and compared to, classical random walks. We next argue that different classical models should be considered towards establishing an intrinsic quantum advantage in mixing, as QWs exhibit genuine memory effects. Standard discrete-time QWs [meyer1996quantum, aharonov2001quantum] describe the evolution of the position distribution of a quantum particle (“walker”) over a discrete set of graph nodes . The quantum evolution of position is conditioned on additional degrees of freedom , the coin of the walker. The walker state is thus defined on the joint Hilbert space . The cycle graph is a simple example where QWs provide a mixing speedup with respect to a classical walk, see Fig. 1. To the nodes of the cycle, the QW adds a binary coin , see [ambainis2001one, aharonov2001quantum]. Denoting the cyclic permutation of position, that is, for , the unitary QW primitive reads
where expresses a conditional shift, while is a general unitary “coin toss” on . The conditional motion can also be viewed as spin-orbit coupling. To actually mix, some decoherence or measurement rule must be added to this unitary evolution, see [kendon2007decoherence] for a survey. For instance, after every application of , one can perform with probability a projective measurement in the canonical basis, after which the unitary evolution is resumed:
(1) 
A purely unitary QW is obtained with , while projects the state on the reference basis at each step. The position distribution is obtained by tracing over the coin and considering the probabilities induced in the node basis at time . The QW of Eq. (1) with, e.g., parameters and , converges towards a uniform in steps, from any initial distribution [ambainis2001one, aharonov2001quantum]. In contrast, a classical random walk over with transition matrix reaches the same distribution only after steps.
Compared to a classical random walk, the QW above clearly adds memory via the coin degrees of freedom. Yet, QWs can exhibit memory effects even without a coin. Consider the two-node graph without coin, equivalent to a qubit, and take the Hadamard gate as QW primitive. Starting on a given node, after one step, the distribution over the two nodes is uniform, yet at the second step the initial state is perfectly recovered since applying the Hadamard gate twice gives the identity operator. This behavior, impossible for any classical Markov process on the two nodes, is due to the quantum state storing information in its relative phases, or coherences. Hence, to establish if there is an intrinsic quantum advantage, QWs should be compared to classical local processes with at least a certain amount of additional memory.
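The two-node example can be checked numerically. The following is a minimal sketch, assuming the standard real matrix form of the Hadamard gate and reading node probabilities as squared amplitudes; the helper names `apply` and `node_probabilities` are ours, introduced only for illustration.

```python
import math

# Hadamard gate on the two-node graph (a single qubit), standard real form.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

def apply(gate, amplitudes):
    """Multiply a 2x2 gate with a length-2 amplitude vector."""
    return [sum(gate[r][c] * amplitudes[c] for c in range(2)) for r in range(2)]

def node_probabilities(amplitudes):
    """Probabilities of finding the walker on each node (Born rule)."""
    return [abs(a) ** 2 for a in amplitudes]

start = [1.0, 0.0]              # walker on the first node
after_one = apply(H, start)     # one QW step
after_two = apply(H, after_one) # two QW steps

probs_one = node_probabilities(after_one)
probs_two = node_probabilities(after_two)
print(probs_one)  # uniform over the two nodes, up to rounding
print(probs_two)  # back on the starting node, up to rounding
```

For comparison, a classical stochastic matrix that maps the first node to the uniform distribution would need a column with a negative entry in order to map the uniform distribution back onto the first node, which is why no memoryless classical process reproduces this behavior.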
Remarkably, a classical walker with memory that mixes fast on the cycle has already been proposed independently of the QW literature [diaconis2000analysis, diaconis2013spectral], and it shares striking similarities. This walker moves among classical states in Its probability distribution over evolves as with stochastic transition matrix having the same structure as , yet with now replaced by a stochastic coin toss:
(2) 
This can be seen as the mixture of two reversible evolutions: with probability the state follows the conditional shift ; or, with probability the coin is switched before applying The coin allows the classical walker to retain and use information about its previous motion direction, in physical terms its momentum. The similarity between and carries a deeper connection, as in Eq. (2) exactly describes the probabilistic evolution induced by Eq. (1) when starting with for some and with .
This mixes over the cycle in steps [diaconis2000analysis, diaconis2013spectral], provided . This speedup, only due to classical memory, matches the one provided by the QW in Eq.(1) with . In both cases, an nonunitarity provides a good tradeoff between fast (deterministic) motion along the graph and losing correlation with the initial condition. From these observations, it appears most natural to compare QWs like Eq.(1) to classical evolutions with memory like Eq.(2), which are formalized as LMCs [chen1999lifting].
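The speedup from classical memory can be observed on a small instance. The sketch below is illustrative only and rests on several assumptions of ours: the coin is switched with probability 1/N before the conditional shift, the reference classical walk is the lazy walk (stay with probability 1/2, move to either neighbor with probability 1/4), the cycle has N = 51 nodes (odd, to avoid parity effects), and both processes run for 6N steps from a single node.

```python
def lifted_step(p_plus, p_minus, q):
    """One LMC step on the cycle: switch the coin w.p. q, then shift by the coin."""
    n = len(p_plus)
    new_plus = [0.0] * n
    new_minus = [0.0] * n
    for x in range(n):
        # Mass ending with coin +1 moves from x to x + 1.
        new_plus[(x + 1) % n] += (1 - q) * p_plus[x] + q * p_minus[x]
        # Mass ending with coin -1 moves from x to x - 1.
        new_minus[(x - 1) % n] += (1 - q) * p_minus[x] + q * p_plus[x]
    return new_plus, new_minus

def lazy_step(p):
    """One step of the lazy random walk on the cycle (no memory)."""
    n = len(p)
    return [0.5 * p[x] + 0.25 * p[(x - 1) % n] + 0.25 * p[(x + 1) % n]
            for x in range(n)]

def tv_to_uniform(p):
    """Total variation distance between p and the uniform distribution."""
    n = len(p)
    return 0.5 * sum(abs(v - 1 / n) for v in p)

n = 51          # assumed cycle size
steps = 6 * n   # a number of steps linear in n
q = 1 / n       # assumed coin-switching probability

p_plus = [0.0] * n
p_minus = [0.0] * n
p_plus[0] = 1.0            # start on node 0 with coin +1
p_lazy = [0.0] * n
p_lazy[0] = 1.0            # start on node 0

for _ in range(steps):
    p_plus, p_minus = lifted_step(p_plus, p_minus, q)
    p_lazy = lazy_step(p_lazy)

# Distribution of interest for the LMC: marginal over the coin.
marginal = [a + b for a, b in zip(p_plus, p_minus)]
tv_lifted = tv_to_uniform(marginal)
tv_lazy = tv_to_uniform(p_lazy)
print(tv_lifted, tv_lazy)
```

After a number of steps linear in N, the marginal of the lifted chain is already close to uniform, whereas the memoryless walk is still far from it.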
QWs and LMCs as local processes with equivalent mixing performance. – Consider a graph with node set and edges . The nodes could represent energy levels and the edges allowed transitions. The QW and LMC constructions both start by building a lifted graph, where each node of the initial graph is split into “lifted nodes” or “sublevels”. This is done without loss of generality by introducing a coin set defining the lifted nodes and selecting lifted edges in , thus without introducing transitions that were not allowed before lifting.
A general QW is then described by a quantum channel over the space generated by viewing coin and node as quantum numbers, i.e.,
(3) 
where is a density operator on and the satisfy , with denoting the identity (footnote 1: some authors add a so-called Cesàro averaging routine on top of this QW model [aharonov2001quantum, marquezino2008mixing]; our results can explicitly capture this and similar extensions via local stochastic maps, see the Supplemental Material in Appendix A). The graph locality is imposed by if . To complete the setup, an initial distribution over is mapped onto the lifted nodes (or sublevels) by thus associating some fixed initial coin state to each . The object of interest is the distribution over , the main nodes or levels, obtained with the partial trace as (footnote 2: standard literature like [chen1999lifting] defines LMCs with the joint distribution over as the object of interest, without initialization map ; this does not affect our QW results, and in fact it implies no significant difference in general, see [apers2017lifting]).
Similarly, an LMC follows the dynamics where is a vector representing the probability distribution over and is a stochastic matrix expressing the jump probabilities among sublevels. Namely, denoting by and the distributions with probability 1 of being on and on respectively, is the transition probability from to . Graph locality imposes if . Initial lifted nodes are assigned by . The distribution of interest is obtained by marginalizing over , thus for all .
Clearly, an LMC is a particular QW where populations evolve without coherences, i.e., where remains diagonal at all times and , with index running over all nonzero elements of . The key to our main result will be to observe how, conversely, any QW can be simulated by some LMC (with possibly higher-dimensional coin). In other words, the non-Markovian evolution of under a QW can be described as a classical Markovian evolution of sublevel populations.
We focus on comparing the mixing behavior induced by QWs and LMCs. A QW or LMC mixes to some distribution over if for any initial state the induced distribution converges to . The mixing time , for any , is the time required to get close to the limit distribution in total variation distance, i.e., the smallest time such that for all and all . A standard “stabilizing” requirement for a process that converges to is that should imply at all times. This holds automatically for the time-invariant considered by the LMC framework. The QW framework allows the to depend on time, but in standard constructions only through the measurement mechanism, like making time-dependent in the cycle example (see [kendon2007decoherence] for a review). Such QWs too preserve at all times when . We call this property invariance (footnote 3: this condition involves both the channel and the initialization ), and we will come back to its significance. Our first result shows that the mixing performance of such a QW can be closely matched by an LMC.
Theorem 1.
Given a invariant QW with mixing time for some , we can construct an LMC that has mixing time for all .
Mathematical details for all our results are available in the Supplemental Material, Appendix A. The main idea in proving Thm.1 is to simulate the QW over the time interval using an LMC. Indeed, as shown for unitary evolutions in [aaronson2005quantum], the probability distribution in the fixed measurement basis associated to the nodes is not subject to the no-go results for general local hidden-variable theories. We extend this result to induced by an arbitrary QW that starts from a given node . Following step by step, one builds a sequence of stochastic matrices acting on only, satisfying graph locality, and such that when starting on . The max-flow min-cut theorem from graph theory ensures that such a construction always exists. It can be traced back to a property that holds for LMCs, QWs, and more general local stochastic processes independently of the underlying physical mechanism: a node cannot contain more population at time , than the population at time on itself and on its neighbors [aharonov2001quantum]. We thus simulate the QW with a classical process whose jump probabilities depend on time and on the starting node . To obtain a simulation with a (time-independent) LMC, at least for finite time horizon , we encode these dependencies into the coin. This follows the same spirit as adding registers in the clock Hamiltonians by Feynman and Kitaev [feynman1982simulating, kitaev2002classical]. Explicitly, we let current time and initial node act as a coin degree of freedom , which conditionally selects the proper transition matrix , see Fig. 2. The resulting LMC describes a distribution over with associated stochastic transition matrix
and initial assignment . Finally, we apply an amplification technique that is exploited in randomized algorithms: the action of on is modified to have , so that the first steps are repeated iteratively. Thanks to invariance of the QW that was used to generate , the resulting LMC will contract towards at the announced exponential rate for all .
Beyond the comparison with LMCs, this construction implies a general bound on the mixing performance of invariant QWs. This tightens and generalizes the bounds of [aharonov2001quantum, temme2010chi], which are restricted to generating uniform with unital quantum channels. The bound involves a function of graph topology and target distribution only, meant to capture the bottlenecks that slow down mixing, called the graph conductance . Specifically, partitioning into two subsets and , consider that all the stationary population on is lost; the conductance counts which fraction of the remaining population jumps back to in one step. More precisely, if on has a stationary distribution , then
The maximal over all Markov chains that keep invariant on a given graph, is the graph conductance .
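The conductance of a given chain can be evaluated by brute force on small graphs. The sketch below is ours and assumes the standard textbook definition, which matches the verbal description above: minimize, over subsets carrying at most half of the stationary mass, the stationary flow leaving the subset in one step divided by the stationary mass of the subset. The column convention `P[j][i]` (probability of jumping from `i` to `j`) and the lazy walk on a six-node cycle are illustrative choices. The code evaluates the conductance of one chain only; the graph conductance would additionally require maximizing over all chains with the same stationary distribution.

```python
from itertools import combinations

def conductance(P, pi):
    """Brute-force conductance of a chain with stationary distribution pi.

    P[j][i] is the probability of jumping from node i to node j.
    Only practical for small graphs: all subsets are enumerated.
    """
    n = len(pi)
    best = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            mass = sum(pi[i] for i in subset)
            # Keep only subsets with positive mass not exceeding one half.
            if mass <= 0 or mass > 0.5 + 1e-12:
                continue
            inside = set(subset)
            escaping = sum(pi[i] * P[j][i]
                           for i in inside
                           for j in range(n) if j not in inside)
            best = min(best, escaping / mass)
    return best

# Illustrative chain: lazy random walk on a cycle with 6 nodes.
n = 6
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    P[i][i] = 0.5
    P[(i + 1) % n][i] = 0.25
    P[(i - 1) % n][i] = 0.25
pi = [1 / n] * n

phi = conductance(P, pi)
print(phi)  # the bottleneck is a half-cycle of three adjacent nodes
```

For this chain the minimizing subset is a contiguous half of the cycle, with only two boundary edges carrying flow out of it.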
The estimate is a well-known lower bound on the mixing time of any classical Markov chain, and it carries over to the convergence of in associated LMCs [chen1999lifting]. Conversely, [chen1999lifting] establishes a construction of LMCs that essentially saturate this bound; it however requires solving a hard multicommodity flow problem over the entire graph. A novel observation, obtained essentially by fully exploiting the triangle inequality while computing the marginal probabilities, is that the bound keeps holding when taking the marginal over sublevels (i.e., over coin values) of a invariant LMC. Combining this with Theorem 1 provides a tight bound for the ultimately achievable mixing time of QWs.
Theorem 2.
Any invariant QW has a mixing time , and there exists such a QW that has a mixing time for all .
Besides mixing, the LMC construction has relevance for other tasks, for instance enabling the effective simulation of quantum transport with finite classical resources.
On efficient design of fast mixing QWs and LMCs. – Fast mixing LMCs can often be built significantly more simply than with the general construction of Thm.1, by mirroring the structure of corresponding QWs. Accelerated mixing with QWs has been mostly demonstrated for graphs with strong symmetries, more specifically lattices [ambainis2001one, aharonov2001quantum, moore2002quantum, richter2007quantum, marquezino2008mixing, marquezino2010mixing]. Similarly to the QW on the circle above, these examples use coin values to encode the lattice generators among which the walker can select its next move.
Remarkably, the same structure is found in a proposal for designing fast mixing LMCs [diaconis2013spectral]. For a dimensional square lattice of size , the coin features values of type , with indicating the axis and the direction of conditional motion among the nodes. At each step, the coin has a probability to switch to each of the other coin values, thus retaining a high probability to stay with the same generator. This dynamics precisely corresponds to a QW with diagonally dominant coin update that is projectively measured at each step, as in Eq.(1) with . For fixed dimension , it also provides the same order of speedup as a QW with [richter2007quantum], and as the best possible QW according to Theorem 2, namely linear in . Indeed, by counting the probabilities of applying, to each lattice dimension consecutively, the sequence of steps that lead to fast mixing on the cycle, one obtains the following (possibly loose) bound (footnote 4: this bound was conjectured to hold more generally for all Abelian Cayley graphs in [diaconis2013spectral], and the authors provide a concrete proof only for the case of the circle).
Theorem 3.
The just described LMC on has a mixing time
Thus, QW and LMC have the same order of mixing time; the same structure; and they require the same graph knowledge for tuning ( and/or ), namely the time at which mixing will be considered accomplished.
In summary, we clarify that QWs on a graph induce non-Markovian local processes whose mixing behavior can be simulated by LMCs (Thm.1), and that this has several implications for the search for a quantum speedup in mixing processes. The construction of Thm.1 can in fact be extended to abstract local stochastic dynamics (see the Supplemental Material in Appendix A) beyond the QW model. As a consequence, the hierarchy LMCs ⊂ QWs ⊂ {general local processes} collapses regarding mixing speed, not only in terms of ultimate speedup achievable on general graphs (Thm.2), but also in terms of paradigmatic cases for which efficient mixing designs are known (lattices, Thm.3). In this light, a mixing speedup with respect to Markov chains is not diagnostic of underlying quantum dynamics, but potentially just of a memory effect. This prompts the question whether there is room for a “quantum advantage” at all in QW mixing.
Besides establishing that there is no advantage in terms of best achievable mixing time, our analysis also suggests why this is not the end of the story. While the property of invariance holds and stabilizes the system in typical QW proposals, it does not hold in some mixing-related applications, like simulated annealing. This distinction may be important as, without invariance, the conductance bound of Thm.2 could be broken significantly [apers2017lifting]. As another memory-related aspect, in Eq. (1) on the cycle, taking leads to fast QWs, while the corresponding “projectively measured” LMC boils down to the quadratically slower standard random walk. This shows that coherences can play a beneficial role, and could guide future research towards designing simple yet fast mixing QWs on graphs for which, unlike on lattices, LMCs of simple design do not meet the conductance bound yet. Furthermore, the QW of Eq. (1), taking and , turns out to efficiently mix over the nodes closest to its starting node, for any number of iterations [ambainis2001one]. Such multiscale mixing cannot be achieved with the LMC of Eq. (2), where tuning to have good mixing at implies almost deterministic motion for . This feature could point to efficient quantum algorithms addressing tasks related to mixing, yet not directly reducible to it.
The authors want to thank Giuseppe Vallone and Lorenza Viola for valuable suggestions and comments on earlier versions of the manuscript.
References
Appendix A Supplemental Material
The objective of the paper is the comparison of Quantum Walks (QWs) and Lifted Markov Chains (LMCs). However, the main results can be extended to a more general setting that includes both QWs and LMCs, namely, local stochastic processes preserving the target distribution. An example of such a generalized setting would be Cesàro averaging, i.e., to consider as output distribution the uniform time average of the evolution generated by a QW or LMC. Here we shall provide detailed proofs of our results directly in this generalized setting. The main ideas remain the same as for the particular case of QWs.
The supplemental material is organized as follows. We start with some notation and define the generalized class of processes that will be studied. We then explicitly show how QWs fall under this setting. The next three sections are respectively devoted to a detailed proof, with all mathematical details worked out, of each of the three theorems of the main paper.
Notation – We first recall some notation that will be used throughout these notes. Let be a graph with node set and edge set . By convention, we include in the edge set all , . We define the in-neighborhood or simply neighborhood of a set as . Note that we will keep, throughout the report, this notation such that “probability mass flows from to ”. We create a lifted graph by expanding the node set to , for some finite set , so the nodes for the lifted graph are pairs , with and we let the edge set be a subset of . We associate the Hilbert space to this graph and call the set of density operators over , that is, positive semidefinite Hermitian operators of trace one. For any subset and , we define , where is the projector onto the subspace associated to the subset of nodes of the original graph . More generally, we will use the standard notation to denote the probability of some event according to the probability measure ; occasionally we also use the notation . We will also denote, as in the main paper, by the probability (column) vector with all weight on the node , and by the dual classical (row) vector. Using the tensor product, also known as the Kronecker product of vectors, we get the probability vector with all weight on the single event .
A.1 Local stochastic processes
In this section we introduce the concept of local stochastic dynamics, and we show how QWs and LMCs fall under this general framework. A stochastic map over is a function that maps a probability distribution to another probability distribution ; it is linear and preserves both the positivity and the sum of the components of . The general stochastic processes which we consider are a family of stochastic linear maps , indexed by time , and which map an initial probability distribution over to a probability distribution over at each time . We say that the family is local with respect to a graph with nodes if and only if [aharonov2001quantum]:
(4) 
This formula expresses the intuitive statement from the main paper, that a node cannot contain more population at time , than the population at time on itself and on its neighbors .
The family is invariant with respect to a distribution , or, in short, invariant, if and only if , . This expresses that for all when , and it ensures that the process stabilizes at all times.
A.1.1 Quantum channels as invariant local stochastic processes
We will now show how such a family of abstract processes explicitly covers the specific case of induced by QWs. To this end, let be a completely positive trace-preserving (CPTP) map representing a QW, defined by
together with a linear initialization map . For any starting condition , we can compute the distribution induced by the QW as the diagonal of the partial trace over of . Thanks to linearity of all these steps, the computation of the resulting evolutions can be described by a family of linear maps such that , for general too. Of course they preserve positivity and total probability, so they are stochastic; and if a target distribution is invariant under , then obviously it is invariant too under the family of induced . In the following lemma we prove that if is local, in the sense that the have zero entries where nodes are not connected in , then so is in the sense of Eq.(4).
Lemma 4.
Let be a quantum channel. The following statements are equivalent:
(a) For all , it holds: if .
(b) For all and
Proof.
“(a) ⇒ (b)”: We will show that the inequality in (b) holds for a one-dimensional projection ; due to linearity of the involved operators in , the inequality must then necessarily hold also for all density operators , being convex combinations of projections. We can write
Since we assume that (a) holds, we have that , for and , where . If we now write , where for any , then (a), in particular, implies that and all , we have Intuitively, this expresses that does not contribute to the probability of observing after the action of . Inserting into the above sum, we thus get:
“(b) ⇒ (a)”: Assume that (a) does not hold. Thus, there exists some , some , some and such that . If we now consider , then , whereas . So (b) does not hold when (a) does not; thus conversely, if (b) holds then (a) must hold too. ∎
In the light of the above lemma, a quantum channel is said to be local with respect to a reference lifted graph if (a) or equivalently (b) holds; and from (b) thus, the associated will be local in the sense of Eq.(4) too.
A.2 Proof of Theorem 1
In the main paper, Theorem 1 essentially states that QW mixing can be simulated by an LMC, and the main steps of its proof are described for this setting. Here we provide a formal proof for a more general statement: the mixing performance of any stochastic process that is local and invariant can be simulated using a suitably constructed local LMC.
A.2.1 Simulability of stochastic linear maps
In the following, we first show that the generated by a local stochastic map, starting from any given initial distribution, can always be simulated by a sequence of stochastic transition matrices which each satisfy the graph locality. This sequence will be in general dependent on the initial distribution. The lemma and proof are a generalization of the result by [aaronson2005quantum] from unitary evolution to abstract stochastic linear maps.
Lemma 5 (Local simulability).
If is local, then for every pair with there exists a local stochastic matrix such that , where .
Proof.
Call and . In order to prove the above statement, it is convenient to resort to results concerning flows over capacitated networks [ford1956maximal], and, in particular, consider the graph shown in Figure 3, where each edge is assigned a corresponding weight, or capacity. The network consists of a source node , a sink node , and two copies and of the set of node states . Node is connected with capacity to any node ; any node is connected with capacity 1 to any node iff , else the nodes are not connected; and any node is connected with capacity to node . The capacities and , respectively from and to , thus reflect the probability distributions to be mapped. The key observation is the following: if this network can route a steady flow of value 1 from node to node , then the fraction from that is routed towards directly defines the entry that we need, and also denoted . Indeed, to route a flow of value 1 the edges from to will have to be used to their full capacities , such that the flow through the edges from to becomes ; so we would have as claimed.
The max-flow min-cut theorem [ford1956maximal] states that the maximum steady flow which can be routed from node to node is equal to the minimum cut value of the graph, where a cut value is the sum of the capacities of a set of edges that disconnects from . It is clear that cutting all edges arriving at disconnects the graph, with a cut value of , whereas cutting any middle edge between and gives a cut value . So the minimum cut should not include any of these “middle” edges, and it must be some combination of edges starting on or arriving at . Assume that we know the optimal cut, and let be such that the cut involves the edges from the complement of to . To block any flow from to while keeping all middle edges, we must then cut the edges from to all the which have an edge to . This corresponds to all . The value of this cut is thus
Recalling that and , locality imposes
from which it follows that the minimum value of the cut is . This minimum is attained (among others) by cutting all edges arriving at , i.e., with the empty set. Hence, the minimum cut value is 1 and a solution to our problem exists. ∎
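The flow argument in this proof is constructive, and it can be turned into a small routine. The following is a sketch under our own conventions, not the implementation used for the paper: it builds the source/sink network described above, runs a plain breadth-first augmenting-path max-flow with exact rational arithmetic, and reads off the column-stochastic bridge from the flow on the middle edges. The function name `local_bridge`, the `neighbors` adjacency format, and the four-node cycle are hypothetical choices for illustration; columns of nodes that carry no population are filled arbitrarily with the first listed neighbor.

```python
from collections import deque
from fractions import Fraction

def local_bridge(neighbors, p, p_next):
    """Return A with A[j][i] = jump probability i -> j such that A p = p_next.

    neighbors[i] lists the nodes reachable from i in one step (self-loop first).
    Raises ValueError when no local stochastic matrix maps p to p_next.
    """
    n = len(p)
    src, snk = 2 * n, 2 * n + 1
    cap = {}  # residual capacities, keyed by directed edge

    def add_edge(u, v, c):
        cap[(u, v)] = Fraction(c)
        cap.setdefault((v, u), Fraction(0))

    for i in range(n):
        add_edge(src, i, p[i])           # source -> first copy, capacity p_i
        add_edge(n + i, snk, p_next[i])  # second copy -> sink, capacity p'_i
        for j in neighbors[i]:
            add_edge(i, n + j, 1)        # middle edge, only along graph edges

    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)

    total = Fraction(0)
    while True:
        # Breadth-first search for an augmenting path in the residual network.
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break
        path = []
        v = snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for (u, v) in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        total += push

    if total != 1:
        raise ValueError("no local bridge: the locality condition is violated")

    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        if p[i] > 0:
            for j in neighbors[i]:
                # The flow on the middle edge i -> j is the reverse residual capacity.
                A[j][i] = cap[(n + j, i)] / p[i]
        else:
            A[neighbors[i][0]][i] = Fraction(1)  # arbitrary local column
    return A

# Illustrative instance: cycle with 4 nodes, self-loops included.
neighbors = [[i, (i + 1) % 4, (i - 1) % 4] for i in range(4)]
p = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
p_next = [Fraction(1, 4)] * 4
A = local_bridge(neighbors, p, p_next)
image = [sum(A[j][i] * p[i] for i in range(4)) for j in range(4)]
```

Rational arithmetic is used so that the check "maximum flow equals 1" is exact; with floating-point capacities a tolerance would be needed instead.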
A.2.2 Amplification lemma
Lemma 5 is instrumental in proving Thm.1 of the main paper for a finite time frame, by simulating the QW up to some given time. The following will be instrumental to prove the theorem for arbitrary time, showing that a finite-memory process is sufficient to extend this mixing performance to arbitrarily small . In particular, we now show that, given an evolution map that mixes up to a certain total variation distance, we can iterate this map in order to mix to arbitrarily small distance, a process informally known as amplification.
Lemma 6 (Amplification lemma).
Assume that is a family of stochastic linear maps that mixes to an invariant and admits a mixing time for all . Then for any , its amplified version defined as
with , has a mixing time for all .
Proof.
We will thus check that at any time , , the total variation distance to is lower than . The proof uses invariance of under to transform into .
For , we get
On the last line we have used that , and that as soon as ; from first to second line we have used the submultiplicativity of the total variation norm under any linear map in the form stated in [levin2017markov].
For with any , we know that
thanks to contractivity of the 1-norm under stochastic maps. So finally we find that, for arbitrary ,
if . ∎
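The effect of amplification can be illustrated in the simplest special case, a time-homogeneous Markov chain, where repeating a block of steps is the same as running the chain longer. The sketch below does not reproduce the exact constants of Lemma 6; it only checks the standard submultiplicativity consequence that, if the worst-case total variation distance after a block of T steps is eps, then after k blocks it is at most (2 eps)^k. The lazy walk on a seven-node cycle and the block length T = 8 are assumptions chosen for illustration.

```python
def lazy_step(p):
    """One step of the lazy random walk on the cycle."""
    n = len(p)
    return [0.5 * p[x] + 0.25 * p[(x - 1) % n] + 0.25 * p[(x + 1) % n]
            for x in range(n)]

def tv_to_uniform(p):
    """Total variation distance between p and the uniform distribution."""
    n = len(p)
    return 0.5 * sum(abs(v - 1 / n) for v in p)

def worst_case_distance(n, t):
    """Distance to uniform after t steps.

    The cycle is vertex-transitive, so every starting node gives the same value.
    """
    p = [0.0] * n
    p[0] = 1.0
    for _ in range(t):
        p = lazy_step(p)
    return tv_to_uniform(p)

n, T = 7, 8                       # assumed cycle size and block length
eps = worst_case_distance(n, T)   # distance reached after one block
distances = [worst_case_distance(n, k * T) for k in range(1, 6)]
bounds = [(2 * eps) ** k for k in range(1, 6)]

for k, (d, b) in enumerate(zip(distances, bounds), start=1):
    print(k, d, b)
```

The bound is only informative when eps is below 1/2, which is the regime in which repeating the block yields exponential convergence.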
A.2.3 Proof of main theorem
We can now finalize the proof of a generalized form of Theorem 1 of the main paper for a general stochastic process associated to a family of linear maps over the node set that are local, and that leave the target distribution invariant.
Theorem 7 (main paper Thm.1, generalized version).
Let be a stochastic linear map that mixes to some distribution with mixing time , satisfying some locality constraint and leaving invariant. Then for any we can construct an LMC that satisfies the same locality constraint and that mixes to with a mixing time for all , and a mixing time for all .
Proof.
The proof essentially combines the two previous lemmas, and for the rest it follows the construction from the main text. We shall first use Lemma 5 to build a lifted Markov chain that simulates the dynamics of this channel up to time , and next apply the amplification lemma 6 to prove exponential convergence for and thus .
[First part: construction for ] Lemma 5 tells us that for every initial state and given time bound , there exists a local stochastic bridge such that, for all , we have
This allows us to construct the operator sets for , where we recall that is the elementary vector corresponding to node , representing the classical probability vector whose entire weight is on node . We will combine these bridges into a single, time-invariant lifted Markov chain , mapping probability distributions over the extended set , where . Let denote the probability (column) vector over whose weight is concentrated on element , and let denote its dual or adjoint (row) vector. We can now define
(5) 
To complete the construction, the above evolution should be locally initialized according to the map , which maps any probability vector over to a probability vector over , defined as
(6) 
The probability vector is such that induces the same marginal distribution on as for a fixed time frame:
where computes the marginal probability distribution induced by the lift on , i.e.,
As a consequence, initial states , with an arbitrary over , will mix on with the same mixing time as for any , i.e., for any . This proves the first part of the theorem.
[Second part: modifying the construction towards ] As the size of the lift transition matrix scales linearly with , which in general is unbounded for , the above construction only makes sense for fixed . To build a lift for arbitrary , and thus prove the second part of the theorem, we invoke the amplification lemma (Lemma 6). The lemma shows that, instead of the full process , we can simulate the simpler one , defined as
and ensure a mixing time . It is not difficult to show that the evolution induced by can in fact be simulated for an arbitrary number of steps, with a lift of fixed size. To this end, we modify the lift construction of the first part by replacing the unit probability of staying at with a unit probability of jumping from to . Explicitly, we adapt the lift as follows:
(7) 
When combined with the same initialization map and marginalization , this transition matrix gives exactly the same output distributions over as the LMC of the first part, for all . At , in fact, (7) takes the output of the LMC constructed in the first part and reinitializes the walk with for the next steps. It follows that with (7) we have:
Lemma 6 implies the conclusion about mixing time for all . ∎
The version reported in the main text is the second part of the above theorem for the special case of a stochastic process generated from a QW.
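To make the restart mechanism concrete, the following sketch builds a toy lift of fixed size and checks that its marginal reproduces the amplified map for more than one block. It is a simplified variant under our own assumptions, not the construction of Eqs. (5)-(7): lifted states are triples (start node, clock, current node), the clock runs over 0, ..., T-1, and the last step of a block lands directly in the reinitialized state. The per-start-node step matrices in `bridges` are random placeholders standing in for the stochastic bridges of Lemma 5; they satisfy no locality constraint and leave no particular distribution invariant. All function names are hypothetical.

```python
import numpy as np

def random_stochastic(n, rng):
    """Random column-stochastic matrix (placeholder for one bridge step)."""
    M = rng.random((n, n))
    return M / M.sum(axis=0, keepdims=True)

def build_restart_lift(bridges):
    """Time-invariant lift on triples (start x, clock t, position v).

    bridges[x][t] is the step applied at clock t when the block started at x.
    After the last step of a block the walker lands in (w, 0, w), i.e. the
    lift is reinitialized at the current position w.
    """
    n, T = len(bridges), len(bridges[0])
    idx = lambda x, t, v: (x * T + t) * n + v
    A = np.zeros((n * T * n, n * T * n))
    for x in range(n):
        for t in range(T):
            for v in range(n):
                for w in range(n):
                    target = idx(x, t + 1, w) if t < T - 1 else idx(w, 0, w)
                    A[target, idx(x, t, v)] += bridges[x][t][w, v]
    return A, idx

def initialize(p, n, T, idx):
    """Local initialization: node x is mapped to the lifted state (x, 0, x)."""
    q = np.zeros(n * T * n)
    for x in range(n):
        q[idx(x, 0, x)] = p[x]
    return q

def marginal(q, n, T):
    """Marginal over the current position v of a lifted distribution."""
    return q.reshape(n, T, n).sum(axis=(0, 1))

def reference_map(bridges, t):
    """Linear map whose column x is the bridge from x run for t steps."""
    n = len(bridges)
    E = np.zeros((n, n))
    for x in range(n):
        col = np.eye(n)[:, x]
        for s in range(t):
            col = bridges[x][s] @ col
        E[:, x] = col
    return E

rng = np.random.default_rng(0)
n, T = 3, 2                                   # hypothetical sizes
bridges = [[random_stochastic(n, rng) for _ in range(T)] for _ in range(n)]
A, idx = build_restart_lift(bridges)
p = np.array([0.2, 0.5, 0.3])                 # arbitrary initial distribution
k, s = 2, 1                                   # two full blocks plus one step
q_final = np.linalg.matrix_power(A, k * T + s) @ initialize(p, n, T, idx)
expected = (reference_map(bridges, s)
            @ np.linalg.matrix_power(reference_map(bridges, T), k) @ p)
print(np.allclose(marginal(q_final, n, T), expected))
```

The lift has n·T·n states regardless of how many steps are simulated, which is the point of the restart: memory grows with the block length T but not with the total running time.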
A.3 Proof of conductance bound
We have just shown that quantum channels, and stochastic linear maps in general, can be simulated by lifted Markov chains under appropriate conditions. Accordingly, we can prove a conductance bound for quantum channels and stochastic linear maps by building on a similar bound for LMCs that we provide in Lemma 9 below. It generalizes the bound formulated in, for instance, [chen1999lifting] to the setting where the Markov chain is initialized on a lifted space with some local map and where the convergence to a limit distribution only involves the marginal over .
Before going into the statement of the lemmas, let us formalize some concepts more rigorously with the notation of this Supplemental Material. We say that an LMC with initialization map mixes to with a mixing time for all if, for any over , the induced distribution of over is close in total variation distance to for all . We will bound this mixing time using the conductance, a quantity that we can associate to a general irreducible Markov chain on . We recall that if has a stationary distribution , then its conductance is defined as
where and is the ergodic flow from to its complement. Here we use as earlier the notation . We can also associate a conductance to a graph and distribution, without specifying an associated Markov chain. The graph conductance with respect to some distribution is defined as , where the maximization runs over all satisfying the locality of the graph and .
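For small examples the conductance of a chain can be evaluated by brute force. The sketch below assumes the standard definition, namely the minimum over nonempty node sets S of stationary mass at most 1/2 of the ergodic flow out of S divided by the mass of S, which we take to match the definition above. The function name `conductance` and the example chain (a simple random walk on a 6-cycle) are our own illustrative choices.

```python
import itertools
import numpy as np

def conductance(P, pi):
    """Brute-force conductance of a column-stochastic chain P with stationary pi.

    Uses Phi = min over nonempty S with pi(S) <= 1/2 of Q(S, S^c) / pi(S),
    where Q(S, S^c) = sum_{i in S, j not in S} P[j, i] * pi[i].
    Exponential in the number of nodes; only meant for tiny examples.
    """
    n = len(pi)
    best = np.inf
    for size in range(1, n):
        for S in itertools.combinations(range(n), size):
            mass = pi[list(S)].sum()
            if mass <= 0 or mass > 0.5 + 1e-12:
                continue
            complement = [j for j in range(n) if j not in S]
            flow = sum(P[j, i] * pi[i] for i in S for j in complement)
            best = min(best, flow / mass)
    return best

n = 6
P = np.zeros((n, n))
for i in range(n):                 # simple random walk on the 6-cycle
    P[(i + 1) % n, i] = 0.5
    P[(i - 1) % n, i] = 0.5
pi = np.full(n, 1.0 / n)           # uniform stationary distribution
phi = conductance(P, pi)
print(phi)
```

For this chain the minimum is attained by an arc of three consecutive nodes: two boundary edges each carry flow 1/12, the arc has mass 1/2, and the conductance is 1/3.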
To any lifted Markov chain on a lifted graph, we can associate an induced chain on the original graph, as introduced in [aldous2002reversible]. To this end, let be an irreducible lifted Markov chain on the nodes of a lifted graph , having stationary distribution . The induced chain over is defined by
where represents the stationary distribution of , defined by . This definition is motivated by obtaining matching ergodic flows