Logic Synthesis for Fault-Tolerant Quantum Computers
Department: Electrical Engineering
Principal adviser: Yoshihisa Yamamoto
First reader: Jelena Vuckovic
Second reader: William D. Oliver, MIT Lincoln Laboratory
Abstract
Efficient constructions for quantum logic are essential since quantum computation is experimentally challenging. This thesis develops quantum logic synthesis as a paradigm for reducing the resource overhead in fault-tolerant quantum computing. The model for error correction considered here is the surface code. After developing the theory behind general logic synthesis, the resource costs of magic-state distillation for the $T$ gate are quantitatively analyzed. The resource costs for a relatively new protocol distilling multi-qubit Fourier states are calculated for the first time. Four different constructions of the fault-tolerant Toffoli gate, including two which incorporate error detection, are analyzed and compared. The techniques of logic synthesis reduce the cost of fault-tolerant quantum computation by one to two orders of magnitude, depending on which benchmark is used.
Using resource analysis for $T$ gates and Toffoli gates, several proposals for constructing arbitrary quantum gates are compared, including “Clifford+$T$” sequences, $V$-basis sequences, phase kickback, and programmable ancilla rotations. The application of arbitrary gates to quantum algorithms for simulating chemistry is discussed as well. Finally, the thesis examines the techniques which lead to efficient constructions of quantum logic, and these observations point to even broader applications of logic synthesis.
Acknowledgements
As you might expect, there is a long list of people to thank for contributing to a doctoral education. I owe a large debt of gratitude to my advisor, Yoshi Yamamoto. From the beginning, he knew that the best way to guide me was a hands-off approach. I was allowed to make my own mistakes (and fix them), knowing that I could count on his gentle counsel when it was needed.
There are so many members of the Yamamoto group that deserve thanks. Among the people that I worked with and learned from are: Darin Sleiter, Leo Yu, Shruti Puri, Chandra Natarajan, Na Young Kim, Kaoru Sanaka, Wolfgang Nitsche, Zhe Wang, Georgios Roumpos, Kai Wen, Dave Press, and Susan Clark. Kristiaan De Greve and Peter McMahon deserve special mention for their parts in (mostly) high-minded conversations, whether at Ike’s or the Rose and Crown, and for being true friends. Thaddeus Ladd only overlapped with me for three months in the group, but I learned a tremendous amount from him. He is largely responsible for developing my interest in quantum computer architecture, and I continue to learn from his astonishing breadth of knowledge (except in Starcraft, perhaps the only subject I know better). I must also thank Yurika Peterman and Rieko Sasaki, who work tirelessly in the background to keep the group functioning.
There are many academic collaborators from whom I learned a great deal. Jungsang Kim, a Yamamoto group alumnus, helped me focus on the engineering aspects of quantum architecture. In Japan, I found kindred spirits to my architecture research in Rod Van Meter, Simon Devitt, and Bill Munro. Austin Fowler taught me a great deal about the surface code, and maybe someday I will know half as much as he does on the matter. At Harvard, I learned far more about quantum chemistry than I ever would have expected from James Whitfield, Man-Hong Yung, and Alán Aspuru-Guzik. I was privileged to work on simulation and dynamical decoupling with Bryan Fong, Jim Harrington, and Thaddeus at HRL Laboratories, and I look forward to joining them as a colleague.
I want to thank the National Science Foundation, which supported my graduate studies through the Graduate Research Fellowships Program.
Finally, I want to thank my brother, Jason, and my loving parents, who made me into the man I am.
Chapter 1 Introduction to Quantum Computing
Quantum computing is a research field that promises to solve problems that normal computers cannot, using quantum physics. The notion of a powerful computer built on the esoteric rules of quantum mechanics sounds like an idea from science fiction, and the field has generated considerable interest in technical communities and the general public. However, quantum computing is built on very sound principles. There is ample theoretical analysis that shows the concept is viable, under the right conditions [1, 2, 3]. Furthermore, intense experimental work is steadily improving the reliability of quantum hardware [4, 5, 6]. Quantum computing is not science fiction, and the best evidence for this assertion is that the future of the field is not novel discoveries in physics, but rather steady advances in engineering.
The topic of this thesis is quantum logic synthesis. This is just one component of designing a quantum computer, but I will argue several times that it is a very important component. Logic synthesis is concerned with arranging the instructions in a quantum computer to minimize resource costs, such as number of quantum bits (“memory”) and gates (“calculations”). Based on current understanding of experimental hardware, error correction will be the most costly feature of a quantum computer, and logic synthesis will play a crucial role in managing these costs.
As an introduction to the subject, this chapter gives a high-level overview of quantum computing. I start with applications, just to show why the field has attracted the attention of so many. The next section gives a basic primer on quantum bits, gates, and measurement. The last section covers the greatest challenge for quantum information, noise and errors. For a more comprehensive introduction, I refer the reader to Ref. [7].
1.1 Applications of Quantum Computing
Quite a few applications for quantum computers have been identified. A website maintained by Stephen Jordan has the most comprehensive list that I have seen (“Quantum Algorithm Zoo,” [8]), which currently counts 50 different algorithms. The performance advantage for each over a classical computer varies, from polynomial to exponential to unknown. Some applications are very general, while others address narrowly defined problems. This section discusses just a handful of these algorithms, the ones which I think will have the greatest impact.
Shor’s integer-factoring algorithm [9] is one of the oldest and most widely known applications of quantum computing. Due to the connection to RSA cryptography [10], the integer factoring problem was already a problem of considerable interest to computer science. The heart of the matter is that multiplication of integers is computationally efficient (polynomial-bounded time and space complexity), but no efficient method in classical computing is known to decompose an integer into its prime factors. For some time, the digital security firm RSA Security (founded by the creators of the protocol) held an open challenge to factor numbers typical of the RSA protocol [11]. In this case, the number to be factored is $N = pq$, where $p$ and $q$ are prime numbers, typically both very large in size (e.g. around 1000 bits). Shor’s algorithm demonstrates that quantum computers can factor such a number in polynomial-bounded time and space. Nevertheless, the computation is rather complex when error correction is included, requiring perhaps millions of qubits and billions of gates [12, 13, 14].
Simulating quantum physics is another problem ideal for quantum computers [15, 16]. In this application, the state of a quantum system is encoded into quantum bits, and the time evolution of this state is reproduced in simulated time using quantum gates. Many useful quantities can be calculated, such as energy eigenvalues and chemical reaction rates [17]. Simulation algorithms will be examined in more detail in Chapter 8. Multiple forms of encoding are possible, and the choice has consequences for the way logic is synthesized. More recently, a closely related linear systems algorithm has been proposed [18], which may have promising applications like solving partial differential equations for electromagnetic scattering [19].
In these and other cases, the quantum computer solves a particular computational task better than a conventional computer. Even in doing so, the quantum computer requires substantial classical computing support for pre- and post-processing, as well as managing the considerable task of quantum error correction [20, 21]. For these reasons, quantum computers are appropriately viewed as “coprocessors” that perform specialized tasks in a classical-quantum hybrid computing environment.
1.2 Quantum States, Operations, and Measurement
The information states in a quantum computer are normalized, complex-valued vectors. The elements of each such vector correspond to the projection into a basis. The most common basis will be the “computational basis”, which is spanned by binary values. For example, a single quantum bit, or qubit, is a superposition of the states $|0\rangle$ and $|1\rangle$ (the “ket” notation is a convention for identifying states). Quantum states must be normalized, so an arbitrary qubit state can be specified by
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle \qquad (1.1)$$
subject to the constraint that
$$|\alpha|^2 + |\beta|^2 = 1. \qquad (1.2)$$
Normalization ensures that, for measurement processes, the probabilities of all outcomes sum to one. Quantum states can consist of multiple qubits. Furthermore, “global phase,” which is a unit-modulus scalar coefficient multiplying an entire state, is meaningless in quantum mechanics. In general, the possible $n$-qubit states span a basis of dimension $2^n$, with the additional degrees of freedom of allowing complex amplitudes. For simplicity, states consisting of multiple qubits use abbreviated state notation, such as $|01\rangle = |0\rangle \otimes |1\rangle$.
Some notational shorthand will be used throughout this thesis. The Pauli spin operators, which are used to define operations on a qubit, will be denoted as $X$, $Y$, and $Z$. Similarly, $I$ is the identity operator, with dimensionality appropriate for its context. As an example, one might write the projector onto the state $|0\rangle$ as $|0\rangle\langle 0| = \frac{1}{2}(I + Z)$, where $I$, like $Z$, has dimension two.
The state of a quantum system is modified using gates. Each gate is a unitary operator $U$, meaning $U^\dagger U = U U^\dagger = I$, where “$\dagger$” denotes conjugate transpose (Hermitian adjoint). One example is $X = |0\rangle\langle 1| + |1\rangle\langle 0|$. When applied to a state, each right-hand “bra” such as $\langle 0|$ combines with a ket to form an inner product, which is a scalar-valued quantity. For example, $\langle 0|0\rangle = 1$, because the bra and ket vectors are parallel. Likewise, $\langle 1|0\rangle = 0$, because the two unit vectors are orthogonal. As a result, the action of $X$ on $|0\rangle$ is $X|0\rangle = |1\rangle$.
A sequence of gates is often illustrated with a “circuit diagram,” as shown in Fig. 1.1. Operations read left to right in time, so first an $X$ gate flips the state of the top qubit, then a controlled-NOT (CNOT) gate will apply $X$ to the bottom qubit if the first is $|1\rangle$, otherwise it does nothing. The CNOT is a two-qubit gate, meaning that it couples the state of two qubits. Assuming both qubits start in $|0\rangle$, the output state is $|11\rangle$.
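The gate sequence just described can be checked numerically. The following is a minimal sketch in plain Python, assuming that both qubits start in $|0\rangle$ and that amplitudes are ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the top qubit written first. The helper names `apply` and `kron` are illustrative and do not come from the thesis.

```python
def apply(matrix, state):
    """Multiply a square matrix (list of rows) by a state vector (list of amplitudes)."""
    return [sum(m * a for m, a in zip(row, state)) for row in matrix]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


I2 = [[1, 0], [0, 1]]   # single-qubit identity
X = [[0, 1], [1, 0]]    # Pauli X (bit flip)

# CNOT with the top (first) qubit as control and the bottom qubit as target.
# Basis ordering is an assumption of this sketch: |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

ket00 = [1, 0, 0, 0]                   # assumed input state |00>
after_x = apply(kron(X, I2), ket00)    # X on the top qubit gives |10>
output = apply(CNOT, after_x)          # control is |1>, so the target flips: |11>

# The state stays normalized because every gate is unitary.
norm_squared = sum(abs(a) ** 2 for a in output)
print(after_x, output, norm_squared)
```

Writing the single-qubit gate as `kron(X, I2)` makes explicit that a gate on one qubit acts as the identity on the other.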
In addition to gates, quantum computation also relies on measurement to reveal the underlying state. However, the superposition nature of quantum states means that there is no single basis in which to measure states. This thesis will only consider strong projective measurements, where the measurement process projects the quantum system into one of several orthogonal states. This can be described using projectors, such as $P_0 = |0\rangle\langle 0|$. A complete measurement basis is defined by the set $\{P_m\}$ such that $\sum_m P_m = I$. For some state $|\psi\rangle$, the probability of measuring outcome $m$ and projecting the system into $|\psi_m\rangle$ is given by $p(m) = \langle\psi|P_m|\psi\rangle$. The state after measurement result $m$ is determined by
$$|\psi_m\rangle = \frac{P_m|\psi\rangle}{\sqrt{\langle\psi|P_m|\psi\rangle}} \qquad (1.3)$$
which is known as the projection postulate of quantum mechanics (as noted above, any global phase on $|\psi_m\rangle$ has no effect). Informally, the system becomes consistent with the observed measurement. Frequently, measurement bases are those of Pauli operators, which play an important role in error correction [22, 7]. For example, the computational basis is $P_0 = |0\rangle\langle 0| = \frac{1}{2}(I + Z)$ and $P_1 = |1\rangle\langle 1| = \frac{1}{2}(I - Z)$. This common measurement operation will be denoted $M_Z$.
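To make the projection postulate concrete, here is a small sketch of a computational-basis measurement on a single qubit. The example state and the function names are assumptions chosen for illustration. The code computes the outcome probabilities and the renormalized post-measurement state for a given outcome, and it does not sample a random result.

```python
import math


def measure_probabilities(state):
    """Outcome probabilities for a qubit [alpha, beta] measured in the
    computational basis: p(m) = <psi|P_m|psi> = |amplitude_m|^2."""
    alpha, beta = state
    return abs(alpha) ** 2, abs(beta) ** 2


def post_measurement_state(state, outcome):
    """Project onto the observed outcome and renormalize: P_m|psi> / sqrt(p(m))."""
    p = measure_probabilities(state)[outcome]
    if p == 0:
        raise ValueError("this outcome has zero probability")
    projected = [amp if index == outcome else 0 for index, amp in enumerate(state)]
    return [amp / math.sqrt(p) for amp in projected]


# Hypothetical example state: amplitude 1/2 on |0> and i*sqrt(3)/2 on |1>.
psi = [math.sqrt(0.25), 1j * math.sqrt(0.75)]
p0, p1 = measure_probabilities(psi)
collapsed = post_measurement_state(psi, 1)
print(p0, p1, collapsed)
```

In this example the collapsed state keeps the phase factor $i$ on $|1\rangle$, which is a global phase and therefore has no observable consequence.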
In addition to gates and measurement, one must be able to initialize to a well-defined quantum state. The reason for saving this process for last is that initialization and measurement are dual operations. They are both non-unitary with respect to the logical space of computable states. Furthermore, the measurement process can be used to perform initialization, using the projection postulate of quantum mechanics. Other methods, such as cooling the system to a ground state, are also used in practice.
1.3 Noise and Decoherence
Quantum operations are not perfect. Gates will probabilistically introduce errors, and even idle qubits will experience “decoherence,” disturbance from the original state due to interactions with the environment. In all hardware platforms considered so far, the error rates are so high as to make error correction mandatory for reliable computing. Error correction will be the subject of later chapters, so I will just briefly review quantum errors here.
In general, an error is any change in the state of a quantum system that is not perfectly known by the system controller. Errors are modeled with a quantum distribution known as a density matrix. Previously, I introduced vectors like $|\psi\rangle$ that are “pure” states, having no error and being perfectly defined. A density matrix is a probability-weighted distribution of pure states, which gives the likelihood of the system being in each of those states. For example, $\rho = 0.9|0\rangle\langle 0| + 0.1|1\rangle\langle 1|$ is a “mixed” state for a system with 90% probability of being $|0\rangle$. The normalization for a density matrix is having a trace (sum of diagonal entries) of one: $\mathrm{Tr}(\rho) = 1$.
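Density matrices are useful for modeling quantum noise. Imagine one starts in the pure state $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, and this state experiences dephasing noise, which is a positive probability $p$ of the phase being flipped by operator $Z$. Consider the dephasing channel $\mathcal{E}(\rho) = (1-p)\rho + pZ\rho Z$. The initial density matrix is
$$\rho_0 = |+\rangle\langle +| = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (1.4)$$
After dephasing, the state is
$$\rho_1 = \mathcal{E}(\rho_0) = \frac{1}{2}\begin{pmatrix} 1 & 1-2p \\ 1-2p & 1 \end{pmatrix}. \qquad (1.5)$$
If the qubit continues to dephase through $n$ such time intervals, the state is
$$\rho_n = \frac{1}{2}\begin{pmatrix} 1 & (1-2p)^n \\ (1-2p)^n & 1 \end{pmatrix}, \qquad (1.6)$$
where it becomes clear that dephasing is damping the off-diagonal terms of the density matrix. In the limit $n \rightarrow \infty$, dephasing turns the quantum state into an evenly mixed distribution of states $|0\rangle$ and $|1\rangle$. Because phase is critically important to most quantum algorithms, this type of error will corrupt data.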
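The damping of the off-diagonal terms can be verified numerically. The sketch below assumes the initial state $|+\rangle\langle +|$ and a dephasing channel of the form $(1-p)\rho + pZ\rho Z$. The values of `p` and `n` are arbitrary illustrative choices, and `matmul` and `dephase` are hypothetical helper names.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


Z = [[1, 0], [0, -1]]  # Pauli Z (phase flip)


def dephase(rho, p):
    """One application of the channel (1 - p) * rho + p * Z rho Z."""
    z_rho_z = matmul(matmul(Z, rho), Z)
    return [[(1 - p) * rho[i][j] + p * z_rho_z[i][j] for j in range(2)]
            for i in range(2)]


rho = [[0.5, 0.5], [0.5, 0.5]]  # assumed initial state |+><+|
p = 0.1                         # illustrative phase-flip probability per interval
n = 20                          # illustrative number of intervals

for _ in range(n):
    rho = dephase(rho, p)

off_diagonal = rho[0][1]
predicted = 0.5 * (1 - 2 * p) ** n  # closed form for the off-diagonal entry
trace = rho[0][0] + rho[1][1]
print(off_diagonal, predicted, trace)
```

The diagonal entries and the trace are unchanged by the channel, while the off-diagonal entry follows the geometric decay $(1-2p)^n/2$.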
A dephasing event can be modeled with the probabilistic application of operator $Z$. Another common error channel is the depolarizing channel,
$$\mathcal{E}(\rho) = (1-p)\rho + \frac{p}{3}\left(X\rho X + Y\rho Y + Z\rho Z\right), \qquad (1.7)$$
which applies one of the Pauli errors with probability $p/3$ each, for total probability of error $p$. Viewing error events this way avoids the need to explicitly write density matrices, allowing one to analyze error correction without knowing the underlying state. At an abstract level, error correction will act as a filter to catch these errors by performing measurements which reveal what error (if any) has occurred to the system. This technique will be used often in Chapters 5–7.
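The error-event view of the depolarizing channel lends itself to simple Monte Carlo sampling. The following sketch draws Pauli error labels without ever constructing a density matrix. The error probability, sample count, and random seed are arbitrary choices for illustration, and `sample_depolarizing_error` is a hypothetical helper name.

```python
import random


def sample_depolarizing_error(p, rng):
    """Return 'I' with probability 1 - p, otherwise one of 'X', 'Y', 'Z'
    chosen uniformly, so each Pauli error occurs with probability p / 3."""
    if rng.random() < p:
        return rng.choice("XYZ")
    return "I"


rng = random.Random(1234)   # fixed seed so the run is reproducible
p = 0.03                    # illustrative total error probability
num_samples = 100_000

samples = [sample_depolarizing_error(p, rng) for _ in range(num_samples)]
counts = {label: samples.count(label) for label in "IXYZ"}
error_fraction = 1 - counts["I"] / num_samples
print(counts, error_fraction)
```

With enough samples, the observed fraction of non-identity events should approach $p$, with the three Pauli errors occurring in roughly equal numbers.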
Chapter 2 Architecture of a Quantum Computer
Quantum computing as an engineering discipline is still in its infancy. Although the physics is well understood, developing devices which compute with quantum mechanics is technologically daunting. While experiments to date manipulate only a handful of quantum bits [4], this chapter considers what effort is required to build a large-scale quantum computer. One must consider the faulty quantum hardware, with errors caused by both the environment and deliberate control operations; when error correction is invoked, classical processing is required; constructing arbitrary gate sequences from a limited fault-tolerant set requires special treatment, and so on. Quantum computer architecture, the subject of this chapter, is a framework to address the complete challenge of designing a quantum computer.
This chapter provides an overview of the steps for designing a quantum computer, and it is based on Ref. [13]. The chapter concludes with resource estimates for large-scale quantum computation. Although they were the best estimates when Ref. [13] was published, the logic being used was not optimized. The daunting numbers, such as million qubits, serve as a pretext for why resource-reduction techniques through logic synthesis, the subject of this thesis, are so important.
2.1 Layered Architecture Overview
Many researchers have presented and examined components of large-scale quantum computing. This chapter considers how these components may be combined in an efficient design, and later chapters introduce methods to improve the quantum computer. This engineering pursuit is quantum computer architecture, which is developed here in layers. An architecture decomposes complex system behaviors into a manageable set of operations. A layered architecture does this through layers of abstraction where each embodies a critical set of related functions. Each ascending layer brings the system closer to an ideal quantum computing environment by suppressing errors and hiding implementation details not needed elsewhere. This section reviews the field of quantum computer architecture, then discusses the layered architecture of Ref. [13].
2.1.1 Prior Work on Quantum Computer Architecture
Many different quantum computing technologies are under experimental investigation [4], but for each a scalable system architecture remains an open research problem. Since DiVincenzo introduced his fundamental criteria for a viable quantum computing technology [23] and Steane emphasized the difficulty of designing systems capable of running quantum error correction (QEC) adequately [24, 25, 26], several groups of researchers have examined the architectural needs of large-scale systems [27, 28]. As an example, small-scale interconnects have been proposed for many technologies, but the problems of organizing subsystems using these techniques into a complete architecture for a large-scale system have been addressed by only a few researchers. In particular, the issue of heterogeneity in system architecture has received relatively little attention.
The most important subroutine in fault-tolerant quantum computers considered thus far is the preparation of ancilla states for fault-tolerant circuits, because very many ancillas are required [12, 13, 14]. Taylor et al. proposed a design with alternating “ancilla blocks” and “data blocks” in the device layout [29]. Steane introduced the idea of “factories” for creating ancillas [24], as examined later in this chapter. Isailovic et al. [12] studied this problem for ion trap architectures and found that, for typical quantum circuits, approximately 90% of the quantum computer must be devoted to such factories in order to calculate “at the speed of data,” or where ancilla production is not the rate-limiting process. The results in this chapter are in close agreement with this estimate. Metodi et al. also considered production of ancillas in ion trap designs, focusing instead on a 3-qubit ancilla state used for the Toffoli gate [30], which is an alternative pathway to a universal fault-tolerant set of gates.
Some researchers have studied the difficulty of moving data in a quantum processor. Kielpinski et al. proposed a scalable ion trap technology utilizing separate memory and computing areas [31]. Because quantum error correction requires rapid cycling across all physical qubits in the system, this approach is best used as a unit cell replicated across a larger system. Other researchers have proposed homogeneous systems built around this basic concept. One common structure is a recursive H-tree, which works well with a small number of layers of a Calderbank-Shor-Steane (CSS) code, targeted explicitly at ion trap systems [32, 33]. Oskin et al. [34], building on the Kane solid-state NMR technology [35], proposed a loose lattice of sites, explicitly considering the issues of classical control and movement of quantum data in scalable systems, but without a specific plan for QEC. In the case of quantum computing with superconducting circuits, the quantum von Neumann architecture specifically considers dedicated hardware for quantum memories, zeroing registers, and a quantum bus [5].
Long-range coupling and communication is a significant challenge for quantum computers. Cirac et al. proposed the use of photonic qubits to distribute entanglement between distant atoms [36], and other researchers have investigated the prospects for optically-mediated nonlocal gates [37, 38, 39, 40, 41]. Such photonic channels could be utilized to realize a modular, scalable distributed quantum computer [42]. Conversely, Metodi et al. consider how to use local gates and quantum teleportation to move logical qubits throughout their ion-trap QLA architecture [30]. Fowler et al. [43] investigated a Josephson junction flux qubit architecture considering the extreme difficulties of routing both the quantum couplers and large numbers of classical control lines, producing a structure with support for CSS codes and logical qubits organized in a line. Whitney et al. [44, 45] have investigated automated layout and optimization of circuit designs specifically for ion trap architectures, and Isailovic et al. [46, 12] have studied interconnection and data throughput issues in similar ion trap systems, with an emphasis on preparing ancillas for teleportation gates [47].
Other work has studied quantum computer architectures with only nearest-neighbor coupling between qubits in an array [48, 49, 50, 51, 21, 13, 14], which is appealing from a hardware design perspective. With the recent advances in the operation of the topological codes and their desirable characteristics such as having a high practical threshold and requiring only nearest-neighbor interactions, research effort has shifted toward architectures capable of building and maintaining large two- and three-dimensional cluster states [52, 53, 20, 54]. These systems rely on topological error correction models [55, 56, 57, 58], whose higher tolerance to error often comes at the cost of a larger footprint in the hardware, relative to, for example, implementations based on the Steane code [59]. The surface code [56, 57, 60, 61, 14], which is studied throughout this thesis, belongs to the topological family of codes.
Recent attention has been directed at distributed models of quantum computing. Devitt et al. studied how to distribute a photonic cluster-state quantum computing network over different geographic regions [62]. The abstract framework of a quantum multicomputer recognizes that large-scale systems demand heterogeneous interconnects [63]. For most quantum computing technologies, it may be challenging to build monolithic systems that contain, couple, and control billions of physical qubits. Van Meter et al. [64] extended this architectural framework with a design based on nanophotonic coupling of electron spin quantum dots that explicitly uses multiple levels of interconnect with varying coupling fidelities (resulting in varying purification requirements), as well as the ability to operate with a very low yield of functional devices. Although that proposed system has many attractive features, concerns about the difficulty of fabricating adequately high quality optical components and the desire to reduce the surface code lattice cycle time led to the system design proposed in Ref. [13].
2.1.2 Layered Framework
A good architecture must have a simple structure while also efficiently managing the complex array of resources in a quantum computer. Layered architectures are a conventional approach to solving such engineering problems in many fields of information technology. For example, Ref. [33] presents a layered architecture for quantum computer design software. The architecture developed in Ref. [13] describes the physical design of the quantum computer, which consists of five layers, where each layer has a prescribed set of duties to accomplish. The interface between two layers is defined by the services a lower layer provides to the one above it. To execute an operation, a layer must issue commands to the layer below and process the results. Designing a system this way ensures that related operations are grouped together and that the system organization is hierarchical. Such an approach allows quantum engineers to focus on individual challenges, while also seeing how a process fits into the overall design. The architecture is organized in layers to deliberately create a modular design for the quantum computer.
The layered framework can be understood by a control stack composed of the five layers in the architecture. Figure 2.1 shows an example of the control stack for the specific quantum dot architecture considered in this chapter [13], but the particular interfaces between layers will vary according to the physical hardware, quantum error correction scheme, etc. that one chooses to implement. At the top of the control stack is the Application layer, where a quantum algorithm is implemented and results are provided to the user. The bottom Physical layer hosts the raw physical processes supporting the quantum computer. The layers between (Virtual, Quantum Error Correction, and Logical) are essential for shaping the faulty quantum processes in the Physical layer into a system of reliable fault-tolerant [3, 65, 7, 25, 26, 66] qubits and quantum gates at the Application layer.
2.1.3 Interaction between Layers
Two layers meet at an interface, which defines how they exchange instructions or the results of those instructions. Many different commands are being executed and processed simultaneously, so one must also consider how the layers interact dynamically. For the quantum computer to function efficiently, each layer must issue instructions to layers below in a tightly defined sequence. However, a robust system must also be able to handle errors caused by faulty devices. To satisfy both criteria, a control loop must handle operations at all layers simultaneously while also processing syndrome measurements to correct errors that occur. A prototype for this control loop is shown in Fig. 2.2.
The primary control cycle defines the dynamic behavior of the quantum computer in this architecture since all operations must interact with this loop. The principal purpose of the control cycle is to successfully implement quantum error correction. The quantum computer must operate fast enough to correct errors; still, some control operations necessarily incur delays, so this cycle does not simply issue a single command and wait for the result before proceeding — pipelining is essential [67, 12]. A related issue is that operations in different layers occur on drastically different timescales, as discussed later in Section 2.5. Figure 2.2 also describes the control structure needed for the quantum computer. Processors at each layer track the current operation and issue commands to lower layers. Layers 1 to 4 interact in the loop, whereas the Application layer interfaces only with the Logical layer, making the algorithm independent of the hardware.
2.2 Quantum Hardware and Control
The essential requirements for the Physical layer are embodied by the DiVincenzo criteria [23]. The layered framework for quantum computing was developed in tandem with a specific hardware platform, known as QuDOS (quantum dots with optically-controlled spins). The QuDOS platform uses electron spins within quantum dots for qubits. The quantum dots are arranged in a two-dimensional array; Figure 2.3 shows a cutaway rendering of the quantum dot array inside an optical microcavity, which facilitates control of the electron spins with laser pulses. Reference [13] argued that the QuDOS design is a promising candidate for large-scale quantum computing, and I use it here as a model for generating concrete resource estimates.
The physical qubit used by QuDOS is the spin of an electron bound within an InGaAs self-assembled quantum dot (QD) surrounded by GaAs substrate [68, 69, 70, 71, 72, 73]. These QDs can be optically excited to trion states (a bound electron and exciton), which emit light of wavelength nm when they decay. A transverse magnetic field splits the spin levels into two metastable ground states [74], which form the computational basis states for a qubit. The energy separation of the spin states is important for two reasons related to controlling the electron spin [75]. First, the energy splitting facilitates control with optical pulses. Second, there is continuous phase rotation between spin states $|\uparrow\rangle$ and $|\downarrow\rangle$ around the $Z$ axis on the qubit Bloch sphere, which in conjunction with timed optical pulses provides complete unitary control of the electron spin vector.
The electron spin is bound within a quantum dot. These quantum dots are embedded in an optical microcavity, which will facilitate quantum gate operations via laser pulses. To accommodate the two-dimensional array of the surface code detailed in Layer 3, this microcavity must be planar in design, so the cavity is constructed from two distributed Bragg reflector (DBR) mirrors stacked vertically with a cavity layer in between, as shown in Fig. 2.3. This cavity is grown by molecular beam epitaxy (MBE). The QDs are embedded at the center of this cavity to maximize interaction with antinodes of the cavity field modes. Using MBE, high-quality () microcavities can be grown with alternating layers of GaAs/AlAs [76]. The nuclei in the quantum dot and surrounding substrate have nonzero spin, which is an important source of noise that must be suppressed through control techniques like dynamical decoupling [77, 78, 79, 80, 81, 82, 83, 84, 85, 13].
Control in QuDOS uses laser pulses which selectively target quantum dots; see Ref. [13] for details. The 1-qubit operations are developed using a transverse magnetic field and ultrafast laser pulses [75, 73]. The construction of a practical, scalable 2-qubit gate in QuDOS remains the most challenging element of the hardware, and various methods are currently under development. A fast, all-optically controlled 2-qubit gate would certainly be attractive, and early proposals [69] identified the importance of employing the nonlinearities of cavity QED. Reference [69] suggests the application of two lasers for both single-qubit and 2-qubit control; more recent developments have indicated that both single-qubit gates [86, 87, 75] and 2-qubit gates [88] can be accomplished using only a single optical pulse.
QuDOS requires a measurement scheme that is still under experimental development. The proposed mechanism (shown in Fig. 2.4) is based on Faraday/Kerr rotation. The underlying physical principle is as follows: an off-resonant probe pulse impinges on a quantum dot, and it receives a different phase shift depending on whether the quantum dot electron is in the spin-up or spin-down state (these are separated in energy by the external magnetic field). Sensitive photodetectors combined with homodyne detection measure the phase shift to enact a projective QND measurement on the electron spin. Several results in recent years have demonstrated the promise of this mechanism for measurement: multi-shot experiments by Berezovsky et al. [89] and Atatüre et al. [90] have measured spin-dependent phase shifts in charged quantum dots, and Fushman et al. [91] observed a large phase shift induced by a neutral quantum dot in a photonic crystal cavity. Most recently, Young et al. observed a significantly enhanced phase shift from a quantum dot embedded in a micropillar cavity [92].
2.3 Error Correction and Fault Tolerance
Error correction is essential to quantum computation, given current understanding of hardware technology. Some of the best experimental results achieve an error-per-operation of about or (Refs. [4, 6], and references therein), but even these impressive feats are not close to the to error rates needed for large-scale quantum algorithms, as will be demonstrated below. The gap can be bridged with fault-tolerant quantum computing [1, 3, 65, 7], so long as error rates in the hardware are below a threshold value which is specific to the code being used [2].
The threshold theorem has garnered significant attention in the community, but it is sometimes mistakenly presumed that the threshold itself is the target performance. A functioning quantum error correction system must operate below threshold, and a practical system must operate well below threshold. Later chapters show that the resources required for error correction become manageable when the hardware error rate is about an order of magnitude below the threshold of the chosen code. The code used throughout this thesis is the surface code [55, 56, 57, 60], which is distinguished by requiring only nearest-neighbor operations in two dimensions and by having a high threshold around 1% error per physical gate [93, 61, 14].
This section discusses some of the features of error correction that are salient to quantum computer architecture. First, I briefly outline the advantages of the surface code. Second, I discuss the use of Pauli frames, which is a simple but effective technique for reducing the number of gates implemented. Finally, I give an overview of magic-state distillation, which is a powerful technique in fault tolerance and the subject of more intense investigation in Chapter 5.
2.3.1 Surface Code Error Correction
As just mentioned, the primary justifications for the surface code are that it requires only a two-dimensional geometry of nearest-neighbor gates in the hardware, yet still has one of the highest threshold error rates of any code considered thus far [57, 14]. There is also evidence that the surface code might have lower overhead than other codes, when the demands of fault tolerance are considered [94].
In this thesis, I base nearly all of my analysis on a hypothetical quantum computer that uses surface code error correction. A complete explanation of the code and its properties is a subject of active research, so I defer to the literature [55, 56, 57, 60, 95, 93, 61, 54, 96, 97, 98, 99, 94, 14, 100]. Some of the features will be examined throughout the thesis. Chapter 3 shows how to depict the dynamic implementation of operations in the surface code, as well as calculating a power-law approximation to how resources scale with increasing levels of error correction. Still, I only touch on the aspects immediately relevant to my analysis, and otherwise I assume the reader is familiar with the mechanics of the surface code.
2.3.2 Pauli Frames
A Pauli frame [101, 102] is a simple and efficient classical computing technique to track the result of applying a series of Pauli gates ($X$, $Y$, or $Z$) to single qubits. The Gottesman-Knill Theorem implies that tracking Pauli gates can be done efficiently on a classical computer [103]. Many quantum error correction codes, such as the surface code, project the encoded state into a perturbed codeword with erroneous single-qubit Pauli gates applied (relative to states within the code subspace). The syndrome reveals what these Pauli errors are, up to undetectable stabilizers and logical operators, and error correction is achieved by applying those same Pauli gates to the appropriate qubits (since Pauli gates are Hermitian and unitary). However, quantum gates are faulty, and applying additional gates may introduce more errors into the system.
Rather than applying every correction operation, one can keep track of what Pauli correction operation would be applied, and continue with the computation. This is possible because the operations needed for error correction are in the Clifford group. When a measurement in a Pauli $X$, $Y$, or $Z$ basis is finally made on a qubit, the result is modified based on the corresponding Pauli gate which should have been applied earlier. This stored Pauli gate is called the Pauli frame [101, 102], since instead of applying a Pauli gate, the quantum computer changes the reference frame for the qubit, which can be understood as remapping the axes on the Bloch sphere, rather than moving the Bloch vector.
I want to emphasize that the Pauli frame is a classical object stored in the digital circuitry that handles error correction. Pauli frames are nonetheless very important to the functioning of a surface code quantum computer. Layer 3 in the control stack (Fig. 2.1) uses a Pauli frame with an entry for each qubit in the error-correcting code. As errors occur, the syndrome processing step identifies a most-likely pattern of Pauli errors. Instead of applying the recovery step directly, the Pauli frame is updated in classical memory. The Pauli gates form a closed group under multiplication (and global phase of the quantum state is unimportant), so a Pauli frame only tracks one of four values ($I$, $X$, $Y$, or $Z$) for each qubit in the hardware.
The Pauli frame is maintained as follows. Denote the Pauli frame at time $t$ as $F_t$:
(2.1)  $F_t = \bigotimes_j P_j^t$
where $P_j^t$ is an element from the Pauli group corresponding to qubit $j$ at time $t$. Any Pauli gate in the quantum circuit is multiplied into the Pauli frame and is not implemented in hardware, so $F_{t+1} = P F_t$ for all Pauli gates $P$ in the circuit at time $t$. Other gates $U_C$ in the Clifford group are implemented in hardware, but they also transform the Pauli frame according to
(2.2)  $F_{t+1} = U_C F_t U_C^{\dagger}$
When using Pauli frames, the flow of the computation proceeds in the same manner as if Pauli gates were being implemented, with the only change being how the final measurement of that qubit is interpreted. The set of Clifford gates is sufficient for implementing surface code error correction, though one also needs to implement non-Clifford logical operations for universal quantum computing.
Quantum algorithms need to apply gates outside the Clifford group. When using a Pauli frame, the gate that is actually implemented, $U'$, is given in terms of the requested gate $U$ by:
(2.3)  $U' = F_t U F_t^{\dagger}$
Note the distinction between this expression and Eqn. (2.2). In Eqn. (2.2), the Pauli frame is changed by application of a Clifford-group gate, but here an unchanging Pauli frame modifies the gate that is applied.
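The frame-update rules described in this section can be illustrated with a short classical program. The following is only a minimal sketch, not the implementation of Ref. [13]: the class and method names are hypothetical, the encoding of each Pauli as a pair of bits is a common convention that I assume here, global phase is ignored, and only the Hadamard and CNOT gates are included as examples of Clifford gates.

```python
# Minimal Pauli-frame tracker (illustrative sketch; all names are hypothetical).
# Each frame entry is one of I, X, Y, Z, stored as an (x_bit, z_bit) pair.
# Global phase is ignored, as in the text.

TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: name for name, bits in TO_BITS.items()}


class PauliFrame:
    def __init__(self, num_qubits):
        self.x = [0] * num_qubits
        self.z = [0] * num_qubits

    def apply_pauli(self, qubit, pauli):
        """Multiply a Pauli gate into the frame instead of running it in hardware."""
        px, pz = TO_BITS[pauli]
        self.x[qubit] ^= px
        self.z[qubit] ^= pz

    def apply_h(self, qubit):
        """Conjugation by Hadamard exchanges X and Z (Y is unchanged up to sign)."""
        self.x[qubit], self.z[qubit] = self.z[qubit], self.x[qubit]

    def apply_cnot(self, control, target):
        """Conjugation by CNOT copies X from control to target and Z from target to control."""
        self.x[target] ^= self.x[control]
        self.z[control] ^= self.z[target]

    def entry(self, qubit):
        return FROM_BITS[(self.x[qubit], self.z[qubit])]

    def correct_measurement(self, qubit, basis, raw_bit):
        """Flip the raw outcome when the frame entry anticommutes with the measured Pauli."""
        if basis == "Z":
            flip = self.x[qubit]
        elif basis == "X":
            flip = self.z[qubit]
        elif basis == "Y":
            flip = self.x[qubit] ^ self.z[qubit]
        else:
            raise ValueError("basis must be 'X', 'Y', or 'Z'")
        return raw_bit ^ flip


if __name__ == "__main__":
    frame = PauliFrame(2)
    frame.apply_pauli(0, "X")   # an X correction is recorded, not executed
    frame.apply_cnot(0, 1)      # the hardware CNOT propagates the recorded X
    print(frame.entry(0), frame.entry(1))
    print(frame.correct_measurement(1, "Z", 0))
```

In this example the recorded $X$ on qubit 0 spreads to qubit 1 through the CNOT, so a raw $Z$-basis outcome on qubit 1 is flipped when it is interpreted.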
2.3.3 MagicState Distillation
In the layered framework, the Logical layer takes the fault-tolerant resources from Layer 3 and creates a logical substrate for universal quantum computing. This task requires additional processing of error-corrected gates and qubits to produce any arbitrary gate required in the Application layer [13]. Quantum error correction provides only a limited set of gates, such as the Clifford group (or only a subset thereof, as in the surface code [60]). Although circuits from this set can be simulated efficiently on a classical computer by the Gottesman-Knill Theorem [7], the Clifford group forms the backbone of quantum logic.
The Logical layer constructs arbitrary gates from circuits of fundamental gates and ancillas injected into the error-correcting code [60, 13]. For example, surface code architectures inject and purify the ancillas $|Y\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$ and $|A\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$; then the surface code consumes these ancillas in quantum circuits to produce $S$ and $T$ gates, respectively [7, 56, 57, 60]. Because the ancillas are faulty, they must be purified through a process known as magic-state distillation [104, 57, 60, 64, 13, 105, 106, 107].
Magic-state distillation will be examined at length in Chapter 5. For now, I only want to explain the simple method that was used in Ref. [13]. Consider the process of distilling the ancilla state $|A\rangle$ that is used to construct the $T$ gate [104, 57, 60, 13]. Figure 2.5 provides an illustration of why this process is important by showing the fault-tolerant construction of a Toffoli gate at the Application layer using distilled ancillas at the Logical layer. Two separate analyses contend that ancilla distillation circuits constitute over 90% of the computing effort for a single Toffoli gate [12, 13]. Viewed another way, for every qubit used by the algorithm, approximately 10 qubits are working in the background to generate the necessary distilled ancillas.
The circuit in Fig. 2.5 shows one level of distillation, but a lengthy computation like Shor’s algorithm will typically require two levels, where the outputs of the first round are distilled again. Moreover, since perhaps trillions of distilled ancillas will be needed for the entire algorithm, QuDOS uses a “distillation factory” [24, 64], which is a dedicated region of the computer that continually produces these states as fast as possible. Speed is important, because ancilla distillation can be the rate-limiting step in quantum circuits [12]. Figure 2.5 shows how to construct a Toffoli gate, but the $T$ gates can be used to approximate any other gate as well (see Ref. [7]; more details in Chapter 7).
Each distillation circuit will require 15 lower-level $|A\rangle$ states, but they are not all used at the same time. For simplicity, set the “clock cycle time” for each gate equal to the time to implement a logical CNOT, so that with initialization and measurement, the distillation circuit requires 6 cycles. By only using ancillas when they are needed, the circuit can be compacted to require at most 12 logical qubits at any time. The computing effort can be characterized by a “circuit volume,” which is the product of logical memory space (i.e. area of the computer) and time. The circuit volume of distillation is $12 \times 6 = 72$ qubits·cycles. A two-level distillation will require 16 distillation circuits (15 at the first level and 1 at the second), or a circuit volume of $16 \times 72 = 1152$ qubits·cycles. An efficient distillation factory will produce, on average, a number of distilled ancillas per clock cycle equal to its area divided by the circuit volume per distilled ancilla. Table 2.1 summarizes these results.
Parameter | Symbol | Value
Circuit depth | | 6 clock cycles
Circuit area | | 12 logical qubits
Circuit volume | | 72 qubits·cycles
Factory rate (level ) | | ancillas/cycle
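The arithmetic behind these figures can be checked with a few lines of Python. This is a sketch under stated assumptions: the constants are the values quoted in the text, the function names are hypothetical, the factory rate assumes perfect packing of circuits with no failed distillation rounds, and applying the circuit count beyond two levels is my extrapolation rather than a result from Ref. [13].

```python
# Sketch of the circuit-volume bookkeeping for the 15-input distillation circuit.
# Constants are taken from the text; function names are hypothetical.

CIRCUIT_DEPTH = 6         # clock cycles, including initialization and measurement
CIRCUIT_AREA = 12         # peak number of logical qubits in use
INPUTS_PER_CIRCUIT = 15   # lower-level states consumed per distilled output


def single_circuit_volume():
    """Space-time volume of one distillation circuit, in qubit-cycles."""
    return CIRCUIT_AREA * CIRCUIT_DEPTH


def circuits_per_output(levels):
    """Circuits needed for one output after `levels` rounds: 1 + 15 + 15^2 + ..."""
    return sum(INPUTS_PER_CIRCUIT ** k for k in range(levels))


def volume_per_output(levels):
    """Total circuit volume consumed per distilled ancilla at the given level."""
    return circuits_per_output(levels) * single_circuit_volume()


def factory_rate(factory_area, levels):
    """Average distilled ancillas per clock cycle (assumes perfect packing, no failures)."""
    return factory_area / volume_per_output(levels)


if __name__ == "__main__":
    print(single_circuit_volume())   # 72
    print(circuits_per_output(2))    # 16
    print(volume_per_output(2))      # 1152
    print(factory_rate(1152, 2))     # 1.0
```

Under these assumptions, a factory occupying 1152 logical qubits would deliver about one twice-distilled ancilla per clock cycle.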
As a research effort, magic-state distillation has exploded in the last year. Chapter 5 will cover these matters in more detail, but many new results were produced in the short time since Ref. [13] was published. Fowler and Devitt developed a highly efficient implementation of distillation in the surface code, along with good estimates of resources [94]. Several new schemes for distilling states were also developed [105, 106, 107]. Section 5.2 will examine my proposal for “multilevel distillation,” which is asymptotically very efficient but perhaps too complicated to be useful in practice. These developments, and those in other chapters, will dramatically lower the cost of fault-tolerant quantum computing. As I mentioned at the outset of this chapter, one of the purposes of the calculations given here is to provide contrast for the new methods developed later.
2.4 Quantum Algorithms
The Application layer is where quantum algorithms are executed. The efforts of Layers 1 through 4 have produced a computing substrate that supplies any arbitrary gate needed. The Application layer is therefore not concerned with the implementation details of the quantum computer—it is an ideal quantum programming environment. This section deals with estimating the resources required for a target application. This analysis can indicate the feasibility of a proposed quantum computer design, which is a worthwhile consideration when evaluating the longterm prospects of a quantum computing research program.
A quantum engineer could start here in Layer 5 with a specific application in mind and work down the layers to determine the system design necessary to achieve desired functionality. I take this approach for QuDOS by examining two interesting quantum algorithms: Shor’s factoring algorithm and simulation of quantum chemistry. A rigorous system design is beyond the scope of the present work, but this section considers the computing resources required for each application in sufficient detail that one may gauge the engineering effort necessary to design a quantum computer based on QuDOS technology.
2.4.1 Elements of the Application Layer
The Application layer is composed of application qubits and gates that act on the qubits. Application qubits are logical qubits used explicitly by a quantum algorithm. As discussed in Section 2.3.3, many logical qubits are also used to distill ancilla states necessary to produce a universal set of gates, but these distillation logical qubits are not visible to the algorithm in Layer 5. When an analysis of a quantum algorithm quotes a number of qubits without reference to fault-tolerant error correction, often this means the number of application qubits [108, 16, 109, 110]. Similarly, Application-layer gates are equivalent in most respects to logical gates; the distinction is made according to what resources are visible to the algorithm or deliberately hidden in the machinery of the Logical layer, which affords some discretion to the computer designer.
A quantum algorithm could request any arbitrary gate in Layer 5, but not all quantum gates are equal in terms of resource costs. As shown in Section 2.3.3, distilling ancillas for $T$ gates is a very expensive process. For example, Fig. 2.5 shows how Layers 4 and 5 coordinate to produce an Application-layer Toffoli gate, illustrating the extent to which ancilla distillation consumes resources in the computer. When ancilla preparation is included, $T$ gates can account for over 90% of the circuit complexity in a fault-tolerant quantum algorithm [12, 13].
When analyzing algorithms, it is convenient to count resources in terms of Toffoli gates. This is a natural choice, because the level of ancilla distillation, number of virtual qubits, etc. depend on the choice of hardware, error correction, and many other design-specific parameters; by comparison, the number of Toffoli gates is machine-independent since this quantity depends only on the algorithm (much like the number of application qubits mentioned above). To determine error correction or hardware resources for a given algorithm, one can take the Layer 5 resource estimates and work down through Layers 4 to 1, which is an example of modularity in this architecture framework. As shown in Ref. [13], an Application-layer Toffoli gate in QuDOS has an execution time of 930 μs (31 logical gate cycles including the $T$ gate circuits).
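To make the Toffoli-based accounting concrete, the following sketch converts a Toffoli circuit depth into wall-clock time. It assumes that the Toffoli execution time quoted above is in microseconds, that Toffoli gates along the critical path execute strictly one after another, and that no other operations add to the run time. The function names and the example depth of $10^8$ are hypothetical and are not taken from Ref. [13].

```python
# Sketch: convert a Toffoli circuit depth into run time.
# Assumes sequential Toffoli gates and the per-gate figures quoted in the text.

TOFFOLI_TIME_US = 930.0     # execution time of one Application-layer Toffoli gate
CYCLES_PER_TOFFOLI = 31     # logical gate cycles per Toffoli gate
SECONDS_PER_DAY = 86400.0


def logical_cycle_time_us():
    """Duration of one logical gate cycle implied by the Toffoli figures."""
    return TOFFOLI_TIME_US / CYCLES_PER_TOFFOLI


def runtime_days(toffoli_depth):
    """Run time in days for a circuit with the given sequential Toffoli depth."""
    seconds = toffoli_depth * TOFFOLI_TIME_US * 1e-6
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(logical_cycle_time_us())        # 30.0 microseconds per logical cycle
    print(round(runtime_days(1e8), 3))    # hypothetical depth of 1e8 Toffoli gates
```

Because the Toffoli depth depends only on the algorithm, the same function can be reused with different per-gate timings to compare machine designs.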
2.4.2 Shor’s IntegerFactoring Algorithm
Perhaps the best-known application of quantum computers is Shor’s algorithm, which decomposes an integer into its prime factors [9]. Solving the factoring problem efficiently would compromise the RSA cryptosystem [10]. Because of the prominence of Shor’s algorithm in the field of large-scale, fault-tolerant quantum computing, I estimate the resources required to factor a number of size typical for RSA.
A common key length for RSA publickey cryptography is 1024 bits. Factoring a number this large is not trivial, even on a quantum computer, as the following analysis shows. Figure 2.6 shows the expected run time on QuDOS for one iteration of Shor’s algorithm versus key length in bits for two different quantum computers: one where system size increases with the problem size, and one where the system size is limited to logical qubits (including application qubits). For the fixedsize quantum computer, the runtime begins to grow faster than the minimal circuit depth when factoring numbers 2048 bits and higher. Fixing the machine size highlights the importance of the ancilla distillation factories. For this instance of Shor’s algorithm, about 90% of the machine should be devoted to distillation; if insufficient resources are devoted to distillation, performance of the factoring algorithm plummets. For example, the 4096bit factorization devotes of the machine to distillation, but about as many factories would be needed to achieve maximum execution speed in the lower trace in Fig. 2.6. I should also mention here that Shor’s algorithm is probabilistic, so a few iterations may be required [9].
2.4.3 Simulation of Quantum Chemistry
Quantum computers were inspired by the problem that simulating quantum systems on a classical computer is fundamentally difficult. Feynman postulated that one quantum system could simulate another much more efficiently than a classical processor, and he proposed a quantum processor to perform this task [111]. Quantum simulation is one of the few known quantum algorithms that solves a useful problem believed to be intractable on classical computers, so I estimate the resource requirements for quantum simulation in QuDOS, and more details are available in Ref. [13].
This section specifically considers fault-tolerant quantum simulation. Other methods of simulation are under investigation [112, 113, 114], but they lie outside the scope of this work. The particular example selected here is simulating the Schrödinger equation for time-independent Hamiltonians in first-quantized form, where each Hamiltonian represents the electron/nuclear configuration in a molecule [115, 116, 17, 117]. An application of such a simulation is to determine ground- and excited-state energy levels in a molecule. This analysis focuses on first-quantized instead of second-quantized form for better resource scaling at large problem sizes [17]. Digital quantum simulation will also be examined in Chapter 8.
Figure 2.7 shows the time necessary to execute the simulation algorithm for determining an energy eigenstate on the QuDOS computer as a function of the size of the simulation problem, expressed in number of electrons and nuclei. Firstquantized form stores the positionbasis information for an electron wavefunction in a quantum register, and the complete Hamiltonian is a function of one and twobody interactions between these registers, so this method does not depend on the particular molecular structure or arrangement; hence, the method is very general. Note that the calculation time scales linearly in problem size, as opposed to the exponential scaling seen in exact classical methods. The precision of the simulation scales with the number of time steps simulated [16], and this example uses time steps for a maximum precision of about 3 significant figures.
2.5 Quantum Computing and the Need for Logic Synthesis
The factoring algorithm and quantum simulation represent interesting applications of largescale quantum computing, and for each the computing resources required of a layered architecture based on QuDOS are listed in Table 2.2. The algorithms are comparable in total resource costs, as reflected by the fact that these two example problems require similar degrees of error correction. The simulation algorithm is more compact than Shor’s, requiring fewer logical qubits for distillation, which is a consequence of this algorithm performing fewer arithmetic operations in parallel. However, Shor’s algorithm has a shorter execution time owing to its use of parallel computation. Both algorithms can be accelerated through parallelism if the quantum computer has more logical qubits available [118, 117].
Layer | Computing Resource | Shor’s Algorithm (1024-bit) | Molecular Simulation (alanine)
Layer 5 | Application qubits | 6144 | 6650
 | Circuit depth (Toffoli) | |
Layer 4 | Log. distillation qubits | 66564 | 15860
 | Logical clock cycles | |
Layer 3 | Code distance | 31 | 31
 | Error per lattice cycle | |
Layer 2 | Virtual qubits | |
 | Error per virtual gate | |
Layer 1 | Quantum dots | |
 | (area on chip) | (4.54 ) | (1.40 )
 | Execution time (est.) | 1.81 days | 13.7 days
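The logical-qubit counts in Table 2.2 can be used to check how much of each machine is devoted to distillation. The short sketch below uses only the Layer 4 and Layer 5 entries of the table; the dictionary layout and function name are hypothetical, and I assume that the total number of logical qubits is simply the sum of application and distillation qubits.

```python
# Sketch: fraction of logical qubits devoted to ancilla distillation,
# using the application and distillation qubit counts from Table 2.2.
# Assumes total logical qubits = application qubits + distillation qubits.

TABLE_2_2 = {
    "Shor (1024-bit)": {"application": 6144, "distillation": 66564},
    "Molecular simulation (alanine)": {"application": 6650, "distillation": 15860},
}


def distillation_fraction(entry):
    """Share of logical qubits that work on distillation rather than the algorithm."""
    total = entry["application"] + entry["distillation"]
    return entry["distillation"] / total


if __name__ == "__main__":
    for name, entry in TABLE_2_2.items():
        print(f"{name}: {distillation_fraction(entry):.1%} of logical qubits distill ancillas")
```

For the Shor entry this gives a little over 90%, consistent with the statement in Section 2.4.2 that about 90% of the machine should be devoted to distillation; the simulation entry comes out near 70%, reflecting its smaller distillation demand.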
Precise timing and sequencing of operations are crucial to making an architecture efficient. In the layered framework presented by Ref. [13], an upper layer in the architecture depends on processes in the layer beneath, so that logical gate time is dictated by QEC operations, and so forth. This system of dependence of operation times is depicted for QuDOS in Fig. 2.8. The horizontal axis is a logarithmic scale in the time to execute an operation at a particular layer, while the arrows indicate fundamental dependence of one operation on other operations in lower layers.
Examining Fig. 2.8, the operation timescales increase as one moves to higher layers. This is because a higher layer must often issue multiple commands to layers below. A crucial point shown in Fig. 2.8 is that the time to implement a logical quantum gate is four orders of magnitude greater than the duration of each individual physical gate, such as a laser pulse. For large-scale quantum computing, the speed of error-corrected operations is the key figure of merit, and the substantial overhead for fault tolerance shown in Fig. 2.8 indicates that improved methods are needed.
The findings in Table 2.2 and Fig. 2.8 were, more or less, the key results of Ref. [13]. In an earnest attempt to design the architecture of a quantum computer, it was revealed that a few error correction processes accounted for a substantial portion of the resource overhead. These include magic-state distillation, Toffoli gates, and approximations to arbitrary gates. These tasks all involve the synthesis of fault-tolerant quantum logic, and it soon became apparent to other researchers and myself that significant improvements are possible by optimizing the logic constructions. Quantum logic synthesis is the subject of my thesis, and the following chapters develop methodology and novel techniques for this new field of research. The processes listed above are considered explicitly in Chapters 5–7. The methods in those chapters will improve on the resource costs given here by about a factor of 500.
Chapter 3 Preliminaries for Quantum Logic
Quantum logic is the result of composition. Every quantum program is a sequence of instructions, each being one of three types: preparing quantum states (qubits), applying unitary operations (gates), and performing projective measurement. In addition to quantum logic, classical logic is often included when gates are conditioned on the result of an earlier measurement. Because the order of operations is important, quantum programs can be quite complicated. This chapter examines how quantum programs are specified, how programs are represented in diagrams, and how the resource costs are calculated for a program in the surface code.
Informative diagrams are essential for quantum circuit designers to see the action of a sequence of operations. Having easy-to-understand pictorial diagrams helps to: design programs, adapt using previous results, identify mistakes, and communicate results. This chapter discusses two types of quantum logic diagrams. The first is the familiar quantum circuit, which was introduced in Chapter 1. The second type is a surface code topology diagram, which is a three-dimensional rendering of how quantum logic is implemented using surface code error correction. Surface codes are preferred in this work for reasons outlined in Chapter 2. What is particularly useful about this diagram is that it provides both visual and quantitative assessment of actual resource costs at the hardware level; the disadvantage is that such diagrams are difficult to interpret alone. Circuit diagrams and surface code diagrams will play complementary roles in this thesis.
Analyzing resource costs is essential to quantum logic synthesis. The objective is to compose logic in a way that minimizes costs while ensuring reliable execution of the quantum program. This chapter concludes by explaining how to quantitatively estimate resource costs in the surface code. I also introduce the concept of the Trivial Upper Bound (TUB), which for any program is the resource costs for using a naive, “worst case” compilation. The TUB represents the cost of a program that surely works but is probably not optimal, and TUBs will be used as benchmarks to demonstrate the efficacy of logic synthesis.
3.1 Quantum Programs
A quantum program is any sequence of operations on a quantum state. As mentioned in the introduction, there are three types of operations: initialization of quantum states, unitary gates, and measurement. A program is defined by this sequence of operations and any input or output states that are fixed externally. A program might not have an input state; if this is the case, the program initializes all of its quantum data. A program also might not have an output state, returning only classical information from internal measurements. A key concept for logic synthesis is that two programs are logically equivalent if they produce the same outputs from the same inputs, within some specified error tolerance.
In practice, most operations belong to finite sets. Within an error-correcting code, such as the surface code [56, 57, 60], the logical operations are constrained. The Eastin-Knill theorem and related results indicate that it is impossible for all logical operations to be native to the code [119, 120]. Moreover, the available error-corrected operations are often discrete. The error-corrected operations supported by the surface code are:
• Initialize $|0\rangle$ or $|+\rangle$ ($Z$ basis or $X$ basis, respectively);
• Unitary $H$ gate (Hadamard);
• Unitary CNOT gate;
• Measurement in the $Z$ basis or $X$ basis.
These operations are not universal for quantum computing, but they will account for most of the operations in quantum programs.
The final operation in the surface code is the ability to initialize a single qubit in any arbitrary state, though it has error probability proportional to the hardware error rate. The qubit is called an “injected state,” because it was teleported into the code using faulty methods [56, 57, 60, 95, 14]. The error probability is a sum of error rates in the hardware. Reference [100] estimates an injection error that is 10 times the gate error probability $p$, so the injected state could have error on the order of 1% for $p = 10^{-3}$. These faulty states are essential for universal quantum computation, but fault tolerance requires that they be purified in some manner using the error-corrected operations listed above. The choice of program to “clean up” these noisy inputs will have a dramatic impact on resource costs, as will be considered in detail in Chapters 5 and 6.
The simplest way to implement a program is to initialize all the states that one might need at the beginning, then apply all of the operations using unitary gates, then perform all of the measurements at the end. However, the same output can often be achieved by performing some initialization and measurement in the middle of the program. Doing so can lower resource costs in several ways. For one thing, idle quantum states still require error correction at the hardware level, so if initialization can be delayed until the state is needed or if measurement can be performed as soon as possible, then the program should do so. Moreover, sometimes a unitary gate can be replaced by a nonunitary sequence of operations that has lower resource cost.
The technique of replacing unitary logic with nonunitary logic will be used frequently in later chapters. It may seem counterintuitive to replace a single gate with multiple nonunitary operations, as the latter appears more complex. However, some unitary gates are very expensive, so replacing them with a nonunitary sequence of operations can lead to a net reduction in resources. Consider the circuit in Fig. 3.1 as an example. On the left, one would like to implement the gate , but this gate is not available (i.e. it has infinite cost). However, the logically equivalent program on the right uses an ancilla state (injected and purified), , CNOT, and measurement. The gates enclosed in the dashed box are conditionally applied based on the measurement outcome. Neglecting for now the way in which the injected state is purified (Chapter 5 covers this in detail), it is clear that all operations are available in the surface code, so this program has lower (finite) cost.
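The replacement of a unitary gate by a nonunitary program can be verified numerically for a simple case. The sketch below assumes, for illustration only, that the desired gate is the $T$ gate and that the ancilla is $|A\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$; the circuit (CNOT from data to ancilla, $Z$-basis measurement of the ancilla, and an $S$ gate applied when the outcome is 1) is a standard construction and may differ in detail from Fig. 3.1. All function names are hypothetical, and the measurement outcome is passed in explicitly so that both branches can be checked.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)   # phase that the T gate applies to |1>


def t_gate_direct(psi):
    """Apply T = diag(1, e^{i pi/4}) to a single-qubit state [a, b]."""
    a, b = psi
    return [a, OMEGA * b]


def t_gate_via_ancilla(psi, outcome):
    """Apply T using an |A> ancilla, a CNOT, a measurement, and a conditional S gate.

    `outcome` forces the ancilla measurement result (0 or 1) for testing.
    The input state `psi` is assumed to be normalized.
    """
    a, b = psi
    s = 1.0 / math.sqrt(2.0)
    # Joint amplitudes indexed as state[data][ancilla] for |psi> tensor |A>.
    state = [[a * s, a * s * OMEGA], [b * s, b * s * OMEGA]]
    # CNOT with the data qubit as control: flip the ancilla when data = 1.
    state[1][0], state[1][1] = state[1][1], state[1][0]
    # Project the ancilla onto the measured outcome and renormalize.
    post = [state[0][outcome], state[1][outcome]]
    norm = math.sqrt(sum(abs(c) ** 2 for c in post))
    post = [c / norm for c in post]
    # Conditional correction: S = diag(1, i) when the outcome is 1.
    if outcome == 1:
        post = [post[0], 1j * post[1]]
    return post


def equal_up_to_phase(u, v, tol=1e-12):
    """True when two normalized states differ only by a global phase."""
    overlap = sum(x.conjugate() * y for x, y in zip(u, v))
    return abs(abs(overlap) - 1.0) < tol


if __name__ == "__main__":
    psi = [0.6, 0.8j]
    target = t_gate_direct(psi)
    for outcome in (0, 1):
        print(outcome, equal_up_to_phase(t_gate_via_ancilla(psi, outcome), target))
```

Both measurement branches reproduce the action of the $T$ gate up to a global phase, so the nonunitary program is logically equivalent to the unitary gate while using only an ancilla, a CNOT, a measurement, and a conditional phase gate.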
Considering the list of available error-corrected operations above, only trivial quantum programs can be implemented with unitary gates in the surface code. This list is a subset of the Clifford group, and even programs that use the full Clifford group can be simulated classically using the Gottesman-Knill Theorem [121, 7]. Therefore, all useful programs in the surface code require purified injected states, and using these states requires nonunitary operations. Hence all useful quantum programs in the surface code are nonunitary, at some level. However, a quantum program can encapsulate the nonunitary details, so that the external world only sees the program perform a unitary mapping of an input state to an output state. When some arbitrary program, such as a quantum algorithm, needs to be implemented in a fault-tolerant manner, the synthesis procedure will replace many unitary operations with logically equivalent, nonunitary programs so as to minimize resource costs.
Finally, quantum programs can call subprograms. Using some inductive reasoning, any program is a valid composite operation because it is composed of valid operations. Hence, programs can be structured in a hierarchical fashion. This is a common technique in classical programming, but it plays a special role in quantum computing. Later chapters show that certain choices of subprograms can be easily verified, thereby lowering the costs of error correction substantially. Logic synthesis will tend to produce hierarchical quantum programs.
3.2 Quantum Logic Diagrams
Quantum logic diagrams provide a visual aid for understanding properties of quantum logic. Moreover, each type of diagram is useful for a different purpose. This section covers two frequently used diagrams, quantum circuits and surface code depictions. Quantum circuits are one of the oldest methods to represent quantum programs, and they are straightforward to interpret. Time progresses left to right, like a musical score, and each horizontal line is a qubit. By contrast, surface code diagrams are challenging to interpret, but they explicitly account for the resource costs of implementing a program. When used together, the diagrams explain both the action of a program and its costs, which are the main concerns of logic synthesis.
Quantum circuits were introduced in Chapter 1, so I will be brief. In a quantum circuit diagram, each qubit is a horizontal line, and operations affecting a certain qubit touch the corresponding line. The line begins where the qubit is initialized or at an input to the program, and it ends where the qubit is measured or at an output of the program. In some contexts, multi-qubit states are grouped into one line, often borrowing the digital-logic notation of a slash “/” through the line to denote multiple bits. Figure 3.1 is a quantum circuit, and Nielsen and Chuang provide a more detailed overview of quantum circuits (Ref. [7], Ch. 1). In the List of Figures, I denote circuit diagrams by the prefix “Circuit.”
The second type of logic diagram, the surface code diagram, is a three-dimensional geometric depiction. Two dimensions are space, and one dimension is time. In most cases, I will set the viewing angle such that time flows left to right, making the spatial dimensions vertical and out-of-page. The diagram represents how the surface code implements encoded gates with many physical gates and qubits. By using surface code diagrams, I implicitly assume that the quantum computer implements surface code error correction at the lowest level. This is justified by arguments in Chapter 2, which in essence reduce to the following: surface codes are the best error correction scheme published so far when hardware gates are constrained to a nearest-neighbor, two-dimensional geometry [56, 57, 60, 95, 14]. In the List of Figures, I denote surface code diagrams by the prefix “Surface Code.”
Surface code diagrams are useful for two reasons. First, this type of diagram accurately represents the total resource costs of quantum logic, because there is a direct correspondence between the features of the diagram and the operations at the hardware level, in both space and time. Such information is not readily available in circuit diagrams, where the costs associated with two different gates may differ by orders of magnitude. Second, surface code diagrams provide a visual way to modify or optimize logic while maintaining the error-correction capacity of the surface code. In this work, I make use only of the first purpose, though optimization within the surface code is actively being studied elsewhere [94, 100, 122].
For all their utility, surface code diagrams have a notable downside. Owing to the way that quantum logic in the surface code depends on topology [56, 57], it is virtually impossible to determine the underlying logic being shown, as will become apparent in the examples which follow. For this reason, a surface code diagram should always be paired with a quantum circuit diagram, because the two are complementary. The quantum circuit shows what the logic does, while the surface code diagram shows how the logic is implemented and what the resource costs are. This complementarity will be used frequently to demonstrate logic synthesis in later chapters.
An example of a surface code diagram is shown in Fig. 3.2. The left side is a simple circuit with a CNOT gate acting on two qubits, while the right shows how this might be implemented in the surface code. CNOT gates in the surface code are determined by the topology of how defects in the code (shown here as yellow and black pipes) braid around each other. Each defect is a hole of sorts in the surface code lattice, and Refs. [60, 14] give a good explanation of how this is implemented at the hardware level. Some other common circuit primitives are initialization and measurement (Fig. 3.3), which at this level are mirror images in time, and state injection (Fig. 3.4). In Fig. 3.4, the tip of each pyramid is a single physical qubit, whose state is converted into a surface code logical qubit contained in the defect. As mentioned earlier, state injection is a critical process in surface code programs, and Refs. [56, 57, 60, 95, 14] give a proper explanation. The Hadamard gate is also important, but it is not shown in braiding diagrams because it requires some manipulation of the code properties; see Refs. [98, 14] for details. In other codes, the Hadamard gate may be the “hard” operation [123].
3.3 Resource Calculations
The objective of quantum logic synthesis is to minimize resource costs while executing a reliable quantum program. There are many resources that require consideration for running a quantum computer, but this work will focus on only two: qubits and gates used for faulttolerant computation. Suppose that qubits are regularly spaced on a twodimensional grid and that gates are regularly separated in time, or “clocked.” Using this model, one can account for resource costs by the threedimensional volume (space and time) required to execute the program, which corresponds exactly to the volume required to implement the braiding topology in the surface code. Volume is a useful measure for resource cost because it depends mostly on the underlying logic of the program and the error rates of the hardware, and less on the sequence of gates in the program.
The reliability of a quantum program is the probability that the output does not have an error. A program is reliable if the output error probability is below some target value. Errors are suppressed using techniques of quantum error correction, but these are costly in terms of resources [3, 7, 101, 12, 13, 14]. The overhead associated with fault tolerance depends on error rates in the hardware and the chosen code. Generally speaking, the cost scales as $O(\log^{c}(1/\epsilon))$, where $\epsilon$ is an upper bound on the logical error of the program and the exponent $c$ is a constant that depends on the logic synthesis method. Instead of relying on asymptotic estimates, a more precise resource analysis described below will be used in later chapters to give quantitative resource costs.
Operations in the surface code are convenient to analyze at a high level of abstraction, where one only considers the arrangement of the braiding surfaces. Surface code diagrams exist at this level, as the details of hardware operations are not shown. Apart from visual clarity, this abstraction also gives the diagram a sense of scale invariance, because the same topology, hence the same program, could be implemented in two instances of a surface code, where each has a different code distance. The code distance, often denoted $d$, determines how far apart the braid surfaces must be separated in terms of qubits (space) or stabilizer measurements (time). Because of this fundamental spacing, one can define a unit cell as two stabilizer measurements, one of each type ($X$ and $Z$), as shown in Fig. 3.5. The surface code consists of these unit cells tiled across the 2D plane in space, and repeated in time. Viewed this way, the surface code is a crystal, in the abstract sense, where the unit cell is repeated in three dimensions. Logic is implemented with defects, or holes, in the repeated pattern [56, 57, 60], but the volume can still be accounted in terms of these unit cells, which is the methodology I use throughout this manuscript.
A relevant example of the resource overhead required for fault-tolerant quantum computing is the cost of making some cubic region of the surface code sufficiently reliable. First, let me explain some rules for surface code logic. Fowler, Devitt, and collaborators [94, 100] develop a simple set of design rules for spacing defects. For a given code distance $d$, the rules are:

(1) two defects (or other boundaries) of the same type must be separated by $d$;

(2) any defect must have circumference greater than or equal to $d$, so square defects must have side length $d/4$;

(3) given (1) and (2), two defects of different types must be separated by $d/8$.
A simple strategy to follow these rules is to design braiding patterns using cubic regions with side length $5d/4$. The finite set of allowable braiding patterns is known as “plumbing pieces,” because visually they are pipes that connect together [94, 100]. A simple estimate for the probability of error in a plumbing piece with distance $d$ is derived in Ref. [100]:
(3.1)  $P_L(d) \approx d\,(100\,p_g)^{(d+1)/2}$
where $p_g$ is the error per hardware gate and the factor 100 comes from numerical data fitting in Refs. [96, 14, 99].
The volume of a plumbing piece as a function of its logical error probability $P_L$ is plotted in Fig. 3.6 for a fixed hardware error rate $p_g$. This type of plot will be used many times throughout this manuscript to quantify the resource cost of making a quantum program sufficiently reliable. The volume is measured in unit cells of the surface code, as discussed earlier. The only notable feature of this plot is that the resource scaling obeys a power law (dashed line) very well, with the volume in unit cells growing as $\log^{2.84}(1/P_L)$. This is in close agreement with other findings that the “scaling exponent” should be 3 [94]. The exponent is less than 3 here only because of the coefficient $d$ in Eqn. (3.1), whose presence skews the estimated error rate up more at lower values of $d$. Indeed, the fitted exponent will approach 3 as $P_L \to 0$, but the plot in Fig. 3.6 only shows the range relevant to practical quantum computing. This is a good time to remark that power law fits should only be used for estimating quantities like resources, not revealing some deep meaning about quantum information.
On the subject of surface code scaling trends, I would also like to note that the error bound in Eqn. (3.1) may overstate error probability at low values of $d$, because there is numerical evidence that the surface code actually performs better than the asymptotic fits for low code distance [96, 14]. Hence a more accurate error rate (as a function of $d$) may come closer to the expected scaling exponent of 3. The reason for this behavior is that the edges and corners of the surface code become more important at low distance, and these stabilizers have lower weight (two or three, instead of four), which reduces the possible sources of error at the physical level.
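The trade-off between code distance, logical error, and volume can be explored numerically. The following is a minimal sketch, assuming the plumbing-piece estimate takes the form $P_L(d) \approx d\,(100\,p_g)^{(d+1)/2}$ and that a plumbing piece is a cube of side $5d/4$ unit cells. The function names, the restriction to odd distances, and the value of `p_g` are illustrative assumptions rather than part of the analysis above.

```python
def plumbing_piece_error(d, p_g):
    """Assumed estimate of logical error per plumbing piece: d * (100 p_g)^((d+1)/2)."""
    return d * (100.0 * p_g) ** ((d + 1) / 2)


def plumbing_piece_volume(d):
    """Volume in unit cells of a cube with assumed side length 5d/4."""
    return (5.0 * d / 4.0) ** 3


def smallest_distance(target_error, p_g, d_max=199):
    """Smallest odd distance (an assumption) whose estimated error meets the target."""
    for d in range(3, d_max + 1, 2):
        if plumbing_piece_error(d, p_g) <= target_error:
            return d
    raise ValueError("target error not reachable below d_max")


if __name__ == "__main__":
    p_g = 1e-3  # hypothetical hardware error rate
    for exponent in (6, 9, 12):
        d = smallest_distance(10.0 ** -exponent, p_g)
        print(exponent, d, plumbing_piece_volume(d))
```

Because the volume grows as $d^3$ while the error falls exponentially in $d$, the volume grows roughly as the cube of the logarithm of the inverse target error, which is the trend discussed above.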
A primary concern of this thesis is optimizing quantum logic, and any statement of improvement requires some point of reference. For comparison purposes, it is useful to define the worstcase resource cost for implementing a quantum program. Given a quantum program composed of some fundamental operations, the Trivial Upper Bound (TUB) is the resource cost associated with the simplest logic design. For example, one could make the probability of error in each fundamental operation so low that, when summed together, the total probability is small and the entire program is guaranteed to be reliable. This approach is usually not optimal, but it is a starting point that is useful for comparison. For a particular program, the difference between TUB and optimized logic shows how important logic synthesis can be.
Chapter 4 Quantum Logic Synthesis
The purpose of logic synthesis is to execute a quantum program in a way that minimizes resource costs. The previous chapter introduced quantum programs to encapsulate quantum logic, diagrams to depict quantum logic, and ways to estimate resources. These are the tools required for logic synthesis. This chapter gives an overview of the common synthesis techniques, while the subsequent chapters provide detailed examples with resource analysis.
4.1 Generalized Teleportation Gates
A crucial development for faulttolerant quantum logic was the teleportation gate [1, 65, 47, 124]. Instead of teleporting a quantum state from one position to another, this procedure implements a logical gate using a sequence of operations fueled by a special quantum state. In effect, the quantum state changes through teleportation, even though it may not change its physical location. The novelty of this proposal is that a gate can be encoded into a quantum state, so long as one knows how to “read” this information.
Let me introduce the notion of a “quantum lookup table” (QLUT). Take any $N$-dimensional unitary operator $U$ and represent it in the spectral decomposition using eigenvalues $e^{i\theta_k}$ and eigenvectors $|u_k\rangle$:
(4.1)  $U = \sum_{k=1}^{N} e^{i\theta_k}\, |u_k\rangle\langle u_k|$
Let $|+_U\rangle = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} |u_k\rangle$ be a uniform superposition over the eigenvectors. The QLUT for $U$ is
(4.2)  $U|+_U\rangle = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} e^{i\theta_k}\, |u_k\rangle$
There is a clear similarity between the RHS of Eqns. (4.1) and (4.2). The reason I call this a “lookup table” is that the QLUT is a state that encodes the action of $U$. For any eigenvector of $U$, the QLUT has the associated eigenvalue stored in its state. In many contexts, these are also called “magic states,” for precisely the same reason. For example, the magic state for $T$ is $T|+\rangle$, where $|+\rangle$ is the uniform superposition over the eigenvectors of $T$.
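A small numerical illustration may help make the lookup-table picture concrete. The sketch below works entirely in the eigenbasis of a unitary, so the unitary is specified only by its eigenphases; the QLUT amplitudes are then the eigenvalues scaled by $1/\sqrt{N}$. The convention $T = \mathrm{diag}(1, e^{i\pi/4})$ and all function names are assumptions made for illustration.

```python
import cmath
import math


def qlut_amplitudes(phases):
    """Amplitudes of the QLUT in the eigenbasis of a unitary with eigenvalues exp(i*theta_k)."""
    n = len(phases)
    return [cmath.exp(1j * theta) / math.sqrt(n) for theta in phases]


def stored_eigenvalue(amplitudes, k):
    """Read back the eigenvalue stored in component k of the QLUT."""
    return amplitudes[k] * math.sqrt(len(amplitudes))


# Assumed convention: T = diag(1, exp(i*pi/4)) in the computational basis.
T_PHASES = [0.0, math.pi / 4]

if __name__ == "__main__":
    amps = qlut_amplitudes(T_PHASES)
    print(amps)                      # amplitudes of T|+> on |0> and |1>
    print(stored_eigenvalue(amps, 1))  # eigenvalue of T on |1>
```

Each component of the state carries the eigenvalue for the corresponding eigenvector, which is the sense in which the state acts as a table.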
One way to compile a quantum program into a QLUT is to begin with a teleportation circuit that takes as an input. Specifically, the circuit teleports an arbitrary qubit onto the ancilla , then implements . This process is depicted in Fig. 4.1(a) for and . The QLUT may be formed by using commutation rules to move to before the teleportation circuit, as in Fig. 4.1(b) (cf. [7], p.487). In general, this commutation step modifies the teleportation procedure, so it is crucial that the new circuit has an efficient faulttolerant construction. Developing other general procedures for designing teleportation gates is an area of future research.
4.2 Off-Line Validation and Fault Tolerance
The technique of compiling a quantum program into a QLUT can take much of the computational effort off of the data path. The data path is the sequence of operations which come in direct contact with data qubits in an algorithm. If there is a failure here, the data is corrupted. By contrast, operations off of the data path (“offline”) may be expendable; if an error is detected, the faulty states are removed without affecting the rest of the computation. Reference [12] also discusses how offline preparation of QLUTs enables fast computation.
A QLUT can be compiled in a faulty manner, then validated using a procedure that checks for error in the QLUT. Using the quantum measurement postulate, successful validation projects the QLUT into a higherfidelity state. This is essentially a variant of postselected quantum computation [101]. Fault tolerance is achieved by bringing the QLUT to sufficient fidelity for interaction on the data path.
At first glance, the strategy of moving quantum programs into QLUTs would appear to just redistribute the effort of error correction from one place to another. However, the ability to discard states which fail validation is quite valuable. Validation only requires error detection instead of correction, and the former is more efficient. A distance-$d$ code can correct $\lfloor (d-1)/2 \rfloor$ errors, leading to an output error of order $p^{\lfloor (d+1)/2 \rfloor}$ for physical error probability $p$ [7]. By contrast, the same code can detect $d-1$ errors, leading to a validated output state with error $O(p^{d})$. Moreover, error detection is almost always less taxing on classical control hardware, which can be a concern in some contexts [20]. For these reasons, performing validation can lead to substantial reductions in the overhead for fault-tolerant computation.
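The gap between detection and correction can be made concrete with the standard counting for a distance-$d$ code: correction fails at the first uncorrectable weight, $\lfloor (d-1)/2 \rfloor + 1$, while detection fails only at weight $d$. The sketch below compares the leading-order error exponents; it ignores combinatorial prefactors and the probability of discarding a state, and the function names are illustrative assumptions.

```python
def correction_error_order(d):
    """Leading power of p in the output error when a distance-d code corrects errors."""
    return (d - 1) // 2 + 1


def detection_error_order(d):
    """Leading power of p in the output error when the same code only detects errors."""
    return d


if __name__ == "__main__":
    p = 1e-3  # hypothetical physical error probability
    for d in (2, 3, 5, 7):
        print(d, p ** correction_error_order(d), p ** detection_error_order(d))
```

For a distance-two code, for example, correction gives no suppression at all (order $p$), whereas post-selected detection suppresses the error to order $p^2$.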
The steps for offline logic synthesis are: (1) identify an important and frequently used quantum program; (2) compile this program into a teleportation gate using a QLUT; (3) develop an efficient procedure to validate the QLUT. This design methodology will be demonstrated repeatedly in Chapters 5–7. Chapter 9 will examine common features of these techniques, which may be useful both for developing new methods and for understanding limitations of this approach.
Chapter 5 Distillation Protocols
Distillation protocols are a special case of error detection where many noisy copies of a quantum state are “distilled” into fewer lowerror copies of the same state. A common theme for this chapter is that, in many circumstances, an important but difficult operation can be encoded into a wellcharacterized quantum state, such as a quantum lookup table (QLUT; see Chapter 4). After injecting noisy copies of the desired state into the surface code, they are distilled before being used by computation. Error detection is often employed with a quantum code that uses only operations that are themselves errorcorrected by the surface code (see Section 3.1 for a list). However, at the end of the chapter, I discuss Fourierstate distillation, which distills a special class of multiqubit quantum states. This is a new protocol that relies on Toffoli gates, which are not native to the surface code. The Toffoli gates require techniques developed in Chapter 6, and this example shows that distillation can be applied to produce useful multiqubit states beyond just satisfying the minimum requirements of universal computation.
Distillation protocols hold an important place in faulttolerant quantum computing. For example, entanglement distillation demonstrated that arbitrarily longrange quantum entanglement was achievable in principle, using quantum repeaters [125, 126, 127]. The advent of magicstate distillation made the prospect of largescale quantum computing more plausible [101]. There are alternative ways to achieve universal, faulttolerant quantum computing [128, 129, 130, 131, 22, 3, 7, 25], but the magicstate techniques developed by Knill [132, 101] and Bravyi and Kitaev [104] are compatible with broader sets of codes, including the surface code.
When viewed as a quantum program, a distillation protocol takes many copies of the same state as inputs and returns fewer copies of the same state as outputs. By assumption, the input states have independent errors, which is essential for the technique. Moreover, it is often assumed that the errors are also identically distributed, but this is not necessary. Because the inputs and outputs are of the same form, distillation protocols can be executed recursively. Recursive distillation is needed when just one round does not purify the desired state to sufficiently low error probability. The different rounds have different requirements for error correction, and hence different resource costs.
Resource costs for magic-state distillation can dominate the total resources required for quantum computing [12, 13, 94, 14, 100]. A recursive distillation protocol used to make a single gate, such as $T$, requires very many fundamental gates in the surface code. Fowler and Devitt estimate that a single $T$ gate requires 46 times the surface code volume of a single CNOT [94]. Resource costs will be a central concern for this chapter, as the distillation protocols examined here will be the first concrete demonstrations of the techniques of logic synthesis.
5.1 Magic-State Distillation
Magic-state distillation purifies a quantum lookup table (QLUT) for a gate that is otherwise unavailable within the chosen code. For example, the surface code is usually implemented with two distinct types of magic-state distillation. The gates $S$ and $T$ are required for universal computation, and they may be produced using magic states $|Y\rangle$ and $|A\rangle$, respectively [57, 60, 13, 14]. This section focuses on distilling $|A\rangle$ because this process is more costly than distilling $|Y\rangle$; however, $S$ gates are a necessary part of $|A\rangle$ distillation, as discussed later.
There are many proposals for distilling $|A\rangle$ states [132, 104, 105, 106, 107], but I focus on the 15-to-1 Bravyi-Kitaev (or “BK”) protocol, named for the authors of Ref. [104]. The label “15-to-1” refers to the ratio of input states to output states, which is an important consideration for efficiency. A circuit diagram for the BK protocol is shown in Fig. 5.1. Each $T$ gate is produced using a copy of $|A\rangle$, as shown in Fig. 5.2, so the BK distillation protocol takes 15 copies of $|A\rangle$ as inputs. When each of the input states has independent error $\epsilon$, the distilled output state has error $35\epsilon^3$ to lowest nonvanishing order.
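To see how quickly recursion suppresses error, one can iterate the leading-order relation for 15-to-1 distillation, $\epsilon \to 35\epsilon^3$. The sketch below is a simplified illustration that assumes perfect Clifford operations and ignores failed (discarded) attempts, so the input count is simply $15^r$ after $r$ rounds. The function names and the example error rates are assumptions.

```python
def bk_output_error(eps_in):
    """Leading-order output error of one 15-to-1 round with input error eps_in."""
    return 35.0 * eps_in ** 3


def rounds_needed(eps_in, target, max_rounds=10):
    """Return (rounds, final error, raw input states) needed to reach the target error."""
    rounds, eps = 0, eps_in
    while eps > target:
        if rounds == max_rounds or bk_output_error(eps) >= eps:
            raise ValueError("distillation does not converge to the target")
        eps = bk_output_error(eps)
        rounds += 1
    return rounds, eps, 15 ** rounds


if __name__ == "__main__":
    print(rounds_needed(1e-2, 1e-10))  # hypothetical input and target errors
```

The convergence check reflects the fact that the leading-order map only reduces error when $35\epsilon^2 < 1$.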
The BK protocol has an important advantage over many other competing protocols: it distills only one output state. Other protocols [105, 106, 107] that distill two or more states within the same code block inevitably lead to correlated errors at the output. This poses a problem when one round of magicstate distillation is insufficient, so the output of the first round must be purified again. In such a scenario, states with correlated errors must fan out to different secondround distillation blocks, etc. By not having this issue, the BK protocol is much simpler to analyze. Still, recent analysis suggests there may be advantages to the multipleoutput distillation methods [105, 106, 107], if the routing considerations can be addressed.
5.1.1 Bravyi-Kitaev Distillation in Surface Code
Several works have analyzed the BK protocol assuming perfect Clifford gates [104, 133, 13] and the costs associated with making Clifford gates fault tolerant [56, 57, 94, 14, 100]. Using the implementation from Ref. [94] of the BK protocol in the surface code, one can estimate the resources required to implement sufficient error correction for this distillation routine. Moreover, the cost of a $T$ gate at any level of fidelity can be calculated by accounting for the costs of multi-round distillation, as explained below.
Following the methodology developed in Refs. [94, 100], one constructs programs in the surface code using regular-sized “plumbing pieces” (see also Chapter 3). Each piece occupies a cube in the surface code with side length $5d/4$. The probability of logical error in a single plumbing piece can be bounded from above by
(5.1)  $P_L(d) \lesssim d\,(100\,p_g)^{(d+1)/2}$
as derived in Ref. [100]. Subscript $L$ denotes logical error, $p_g$ is error-per-gate at the hardware level, $d$ is the distance of this implementation of the surface code, and the power law scaling is a fit to numerical simulations of surface code error correction [93, 61, 96, 97]. The error at the output of BK distillation is therefore bounded by the sum of probabilities for distillation error from input states and for error in the distillation circuit: $\epsilon_{\mathrm{out}} \leq 35\,\epsilon_{\mathrm{in}}^{3} + N P_L(d)$, where $N$ is the number of plumbing pieces and $\epsilon_{\mathrm{in}}$ is the error of the input $T$ gates. The volume, in unit cells of the surface code, is the product of the number of plumbing pieces and the volume of a single plumbing piece, which is $(5d/4)^3$ unit cells.
Fowler and Devitt constructed a version of the BK protocol in the surface code with 192 plumbing pieces [94]; however, this work considers the volume to be slightly larger. An important issue for distilling $|A\rangle$ states in the surface code is that $T$ gates are implemented using the teleportation circuit in Fig. 5.2, which may also require an $S$ gate correction. $S$ is not a native operation in the surface code, but it is still relatively inexpensive since it can be catalyzed by $|Y\rangle$ without destroying the magic state [134, 13], as shown in Fig. 5.3. Moreover, the additional overhead is small because each $S$ gate need only have a fidelity on the same order as the $T$ gate input error $\epsilon$, and hence lower code distance can be used for $S$ gates. I estimate that the operations for these $S$ gates can be implemented in a depth of two plumbing pieces (there was already one allocated in the volume estimate above), making the total volume now 224 plumbing pieces for the Bravyi-Kitaev distillation protocol, as shown in Fig. 5.4. A bounding box serves as a guide to how the volume is estimated.
5.1.2 Resource Analysis for Bravyi-Kitaev Distillation
Determining the best combination of Bravyi-Kitaev distillation protocols at different code distances is a resource optimization problem. In the first round of distillation, increasing code distance will increase volume and lower output error, until the probability of surface code error is negligible in comparison to the error from faulty input states. To move beyond this limit, one must use two rounds of distillation. Since the inputs to the second round require distillation, the total volume will be 15 times the volume in the first round, plus the volume of the second round. Furthermore, there is positive probability that any distillation circuit will fail, requiring repetition; I account for this by multiplying volume by $1/(1 - p_{\mathrm{fail}})$, which gives the mean volume including repeated distillation. For BK distillation, $p_{\mathrm{fail}} \approx 15\epsilon$ for input error $\epsilon$. For a specified number of rounds, let $\epsilon_0$ denote the injected state error probability, $\epsilon_1$ the probability of error after one round of distillation, etc. The approximate volume $V_r$ and output error $\epsilon_r$ after round $r$, which uses code distance $d_r$, are given by the recursion relations:
(5.2)  $V_r = \dfrac{224\,(5 d_r/4)^3 + 15\,V_{r-1}}{1 - 15\,\epsilon_{r-1}}$
(5.3)  $\epsilon_r = 35\,\epsilon_{r-1}^{3} + 224\,P_L(d_r)$
The factor 224 is the estimated size of the surface code program, in plumbing pieces. The factor 15 is the number of copies of round $(r-1)$ distillation needed to feed into one instance of round $r$ distillation.
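The recursion can be evaluated directly for any choice of code distances. The sketch below is one possible reading of the procedure described above, under several assumptions that should be checked against the original equations: the failure probability of a round is taken as 15 times its input error, the retry factor multiplies the whole volume of the round, the plumbing-piece error is $d\,(100\,p_g)^{(d+1)/2}$, a plumbing piece has volume $(5d/4)^3$ unit cells, and injected states carry no volume. All names and example values are illustrative.

```python
def logical_error(d, p_g):
    """Assumed estimate of logical error per plumbing piece."""
    return d * (100.0 * p_g) ** ((d + 1) / 2)


def bk_recursion(distances, p_g, eps_injected, pieces=224):
    """Return (mean volume in unit cells, output error) after one round per entry of distances."""
    volume, eps = 0.0, eps_injected  # assumption: injected states cost no volume
    for d in distances:
        p_fail = 15.0 * eps  # assumed leading-order failure probability
        volume = (pieces * (5.0 * d / 4.0) ** 3 + 15.0 * volume) / (1.0 - p_fail)
        eps = 35.0 * eps ** 3 + pieces * logical_error(d, p_g)
    return volume, eps


if __name__ == "__main__":
    p_g = 1e-4  # hypothetical hardware error rate
    print(bk_recursion([9], p_g, 10 * p_g))
    print(bk_recursion([7, 15], p_g, 10 * p_g))
```

Sweeping `distances` over all combinations of interest yields the set of candidate protocols from which the best trade-offs can be selected.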
Using the formulas in Eqns. (5.2) and (5.3), I calculated all possible combinations of BK distillation volume (in unit cells) and output error rate, as explained below. The number of rounds ranged from one to three, the distance in each round ranged from , and I calculated results for values of the hardware gate error such that . By conventional assumption, the injected states used by the first round have [100], where the factor 10 accounts for the number of faulty hardware operations during injection and before error correction. To narrow focus to useful results, I only make note of protocols on the “efficient frontier,” which consists of those protocols (each having a unique combination of parameters ,,) that are not dominated by any other protocol. In terms of performance, one protocol dominates another if the first has both lower volume and lower output error rate; there is no reason to use a dominated protocol. The results for are shown in Fig. 5.5. The results for other values of show effectively the same behavior, so they are not plotted. In addition to this plot, Table 5.1 gives the estimated resource costs when using BK distillation for different input error rates and output error rates . This can be compared with tables in Refs. [105, 106, 107, 100], most of which only consider cost in number of input magic states. The difference between my results and those in Ref. [100] is due mostly to my definition of a unit cell, which contains four qubits when ancillas are used for stabilizer measurement. Additionally, I estimate a slightly larger volume for BK distillation in the surface code (224 vs. 192 plumbing pieces).
Volume (unit cells)
3 | (13) | (9)
4 | (15) | (9) | (7) | (5)
5 | (11,17) | (11) | (9) | (7) | (5)
6 | (11,19) | (7,13) | (9) | (7) | (5)
7 | (13,21) | (7,15) | (11) | (9) | (7)
8 | (13,23) | (9,15) | (7,11) | (9) | (7)
9 | (13,27) | (9,17) | (7,13) | (5,9) | (9)
10 | (15,27) | (9,19) | (7,13) | (5,11) | (9)
11 | (15,31) | (9,19) | (7,15) | (5,11) | (5,9)
12 | (11,15,33) | (11,21) | (7,15) | (5,13) | (5,11)
13 | (11,17,33) | (11,23) | (7,17) | (7,13) | (5,11)
14 | (11,17,35) | (11,23) | (9,17) | (7,13) | (5,11)
15 | (11,19,37) | (13,25) | (9,19) | (7,15) | (5,13)
Key: Level-1 BK | Level-2 BK | Level-3 BK
In the numerical optimization above, I allowed the distance in each round of distillation to be independent of the other rounds, as suggested in Ref. [14]. The protocols on the efficient frontier approximately double the code distance from one round to the next. This is an optimization that lowers the burden of error correction; with increasing strength of error correction, the probability of surface code failure becomes negligible compared to the distillation error term, which is cubic in the input error. To see what difference is made by this optimization, consider what happens when the distance is the same in all rounds. The code distance will need to be large enough for the total probability of logical error in the surface code to be well below the target error probability, which makes the entire distillation procedure very expensive. Figure 5.6 shows the efficient frontier from Fig. 5.5 compared to this naive approach.
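Selecting the efficient frontier is a small Pareto-filtering step: a protocol is kept unless some other protocol has both lower volume and lower output error. The sketch below implements that rule with a simple quadratic-time comparison. The tuple layout and the sample data are hypothetical and chosen only to illustrate the filter.

```python
def dominates(a, b):
    """True if protocol a has both lower volume and lower output error than protocol b."""
    return a[0] < b[0] and a[1] < b[1]


def efficient_frontier(protocols):
    """Return the protocols (volume, error, distances) not dominated by any other, sorted by volume."""
    protocols = list(protocols)
    return sorted(p for p in protocols if not any(dominates(q, p) for q in protocols))


if __name__ == "__main__":
    # Hypothetical (volume, output error, code distances) entries.
    candidates = [
        (100.0, 1e-3, (5,)),
        (200.0, 1e-5, (7,)),
        (250.0, 1e-4, (9,)),      # dominated by the (7,) protocol
        (400.0, 1e-8, (5, 11)),
    ]
    for entry in efficient_frontier(candidates):
        print(entry)
```

For the sweeps considered here the number of candidates is modest, so the quadratic comparison is adequate; a sort-based scan would be preferable for much larger sets.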
As mentioned in the introduction, I will use a Trivial Upper Bound (TUB) for each program to demonstrate the importance of logic synthesis. In Fig. 5.6, the naive approach will be the TUB for magicstate distillation. The power law fits show how the volume required by each method scales as demands on output error are increased. The optimized logic requires a surface code volume of approximately unit cells, while the TUB requires . As demonstrated in Section 3.3, error per plumbing piece in the surface code scales with exponent 2.84, and the asymptotic scaling exponent of BK distillation is . A simple guess would suppose that the naive protocol would have scaling coefficient , which is rather close to the fitting parameter calculated above.
The optimized protocol has scaling exponent 3.27, which is surprisingly close to the lower bound of 2.84 for a surface code operation. Compressing early rounds of distillation by lowering code distance works very well to improve performance over the TUB. For example, at output error , the optimized protocols reduce resources by about a factor of 10. The resource estimates in Ref. [13] and Chapter 2 used the naive method; moreover, gate production was found to dominate those resource totals. Using the optimized BK distillation protocols examined here would dramatically lower the estimated costs to execute those quantum algorithms. This is the first example of how logic synthesis can reduce the overhead of fault tolerance by a sizable factor relative to naive constructions, but this chapter and those that follow will examine several more.
5.1.3 Alternative Protocols
Many other magicstate distillation protocols have been developed but are not analyzed here. Notable examples are:

a 2to1, protocol for [134];

the MeierEastinKnill 10to2, protocol for distilling (which can be converted to a protocol for distilling ) [105];

BravyiHaah triorthogonal codes distilling to with output error [106];

LandahlCesare generalized ReedMuller codes distilling the family of states for integers [135];

a distillation protocol for Toffoli magic states [134]
In addition to analysis in the original proposals, there has been work to understand the resource costs associated with fault tolerance for some of these protocols. Several researchers have considered the limits on errors in magic states for distillation to succeed [133, 136]. Reference [137] considers the effectiveness of magicstate distillation using faulty gates. An equivalent form of the triorthogonal codes called “block codes,” implemented with surface code error correction, was analyzed in Ref. [100]. Finally, it is worth mentioning that a handful of codes, such as the Steane code and some topological color codes, use a faulttolerant state injection method rather than magicstate distillation. While this technique is appealing for its simplicity, the surface code still appears to have better performance for the reasons outlined in Section 2.3.
5.2 Multilevel Distillation
I developed another distillation protocol for $|H\rangle$ states (or $|A\rangle$, through slight modification) to probe the limits of magic-state distillation [107]. Let the efficiency of a distillation protocol be measured only in terms of input and output states with $\gamma = \log(n/k)/\log(d)$, where $n$ is the number of input states, $k$ is the number of output states, and the output error is $O(\epsilon^{d})$ for input error $\epsilon$. In their work on triorthogonal codes, Bravyi and Haah conjectured that $\gamma \geq 1$ [106]. All previous distillation methods obey this limit, but there was no protocol for $|H\rangle$ states that approached $\gamma = 1$. The multilevel protocols described below come arbitrarily close to $\gamma = 1$ in certain limits, which is interesting theoretically. However, I ultimately conclude that these methods are probably not useful for quantum computing. This is an instructive example for logic synthesis because it shows that narrowly improving one aspect of fault-tolerant computation may not be effective at lowering overall resource costs. Still, the techniques developed below may be useful in other applications.
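As a quick check on this efficiency measure, the exponent can be computed for protocols mentioned earlier in the chapter, taking $\gamma = \log(n/k)/\log(d)$ with $n$ inputs, $k$ outputs, and output error of order $\epsilon^{d}$. The sketch below assumes leading-order error suppression of order 3 for the 15-to-1 protocol and order 2 for the 10-to-2 protocol; the function name is illustrative.

```python
import math


def scaling_exponent(n_in, k_out, order):
    """gamma = log(n/k) / log(d) for an n-to-k protocol with output error O(eps^d)."""
    return math.log(n_in / k_out) / math.log(order)


if __name__ == "__main__":
    print(scaling_exponent(15, 1, 3))  # 15-to-1 Bravyi-Kitaev
    print(scaling_exponent(10, 2, 2))  # 10-to-2 Meier-Eastin-Knill
```

Both values lie well above 1, which is why a protocol approaching $\gamma = 1$ is of theoretical interest.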
5.2.1 Block Codes with Transversal Hadamard
As a preliminary step, I define a family of CSS quantum codes that encode logical qubits, where is even, using physical qubits. Furthermore, these codes possess a transversal Hadamard operation, so I call them collectively “ codes” and denote as the code using physical qubits. Any code may be defined as follows. The stabilizer generators are , , , , where subscripts index over physical qubits and tensor product between Pauli operators is implicit. The logical Pauli operators (corresponding to logical qubits), denoted with an over bar and indexed by , are and . The Hadamard transform exchanges and operators, so application of transversal Hadamard gates at the physical level enacts a transversal Hadamard operation at the logical level, which will be a useful property when I later concatenate these codes. All codes have distance two, which means they can detect a single physical Pauli error. The product of two logical Pauli operators of the same type for two distinct logical qubits has weight two (number of nonidentity physical, singlequbit Pauli operators); the product of sametype Pauli operators on all logical qubits is also weighttwo at the physical level. The stabilizers come in matched / pairs, so there are no weightone logical operators.
The eigenstate of the Hadamard operator is a magic state for universal quantum computing [132, 104, 133, 105, 106]. In particular, two of these magic states can be consumed to implement a controlled operation [132, 105], enabling one to measure in the basis of (see Fig. 5.7(a)). The distillation procedure is as follows: (a) encode faulty magic states in an code; (b) measure in the basis of the transversal Hadamard gate by consuming ancillas; (c) reject the output states if either the measureHadamard or codestabilizer circuits detect an error. For example, when an code is used for distillation, states are encoded as logical qubits using physical qubits. Each transversal controlledHadamard gate consumes two states [105], and this gate is applied to all physical qubits, which results in the to input/output distillation efficiency of these codes. A diagram of the quantum circuit for distillation using is shown in Fig. 5.7(b).
5.2.2 Multilevel Protocols
Multilevel distillation uses concatenated codes with transversal Hadamard for distillation, in such a manner that the protocol uses two classes of input magic states, where the classes have different levels of infidelity and enter at different concatenation levels in the code. The ancillas consumed for transversal controlled-Hadamard measurement are of lower fidelity than the encoded logical states being distilled. When two quantum codes with transversal Hadamard are concatenated, the resulting code also has transversal Hadamard. Under appropriate conditions, the distance of the concatenated code is the product of the distances for the individual codes: $d = d_1 d_2$ [105]. Thus the concatenation of two $H$ codes yields a distance-4 code with transversal Hadamard, and $r$-level concatenation has distance $2^r$.
The concatenation conditions for $H$ codes are that, through all levels of concatenation, any pair of physical qubits have at most one encoding block (at any level) in common. The reasons for this restriction are that logical errors in the same block are correlated and that the statement above regarding distance multiplying through concatenation assumes independence of errors. Consequently, two logical qubits from the same encoding block can never be paired again in a different encoding block. The required arrangement of qubits can be given a geometric interpretation. Arrange all physical qubits at points on a Cartesian grid in the shape of a rectangular solid, with the number of dimensions given by the number of levels of concatenation. A square, cube, or hypercube are possible examples at dimensionality two, three, or four. Each dimension is associated with a level of concatenation, and there must be an even number of qubits in each dimension to form an $H$ code. Construct $H$ codes in the first dimension by forming an encoding block with each line of qubits in this direction, as in Fig. 5.8(a). This will give rise to logical qubits along each line in this direction. Repeat this procedure by grouping these first-level logical qubits in lines along the second dimension to form logical qubits in a two-level concatenated code, as in Fig. 5.8(b). Continuing in this fashion through all dimensions ensures that any pair of qubits have at most one encoding block in common.
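The geometric condition can be checked by brute force on small grids. The sketch below is a simplified model that treats every encoding block as an axis-aligned line of grid points and ignores the distinction between physical qubits and the logical qubits grouped at higher levels. It enumerates the blocks of a rectangular grid and counts how many blocks any pair of points shares. The function names are illustrative assumptions.

```python
from itertools import combinations, product


def encoding_blocks(shape):
    """List every axis-aligned line of grid points as (axis, points) for a rectangular grid."""
    if any(n % 2 for n in shape):
        raise ValueError("each dimension must contain an even number of qubits")
    blocks = []
    for axis, length in enumerate(shape):
        others = [range(n) for i, n in enumerate(shape) if i != axis]
        for fixed in product(*others):
            line = []
            for pos in range(length):
                coords = list(fixed)
                coords.insert(axis, pos)  # restore the varying coordinate
                line.append(tuple(coords))
            blocks.append((axis, tuple(line)))
    return blocks


def max_shared_blocks(shape):
    """Largest number of encoding blocks shared by any pair of distinct grid points."""
    blocks = encoding_blocks(shape)
    points = list(product(*(range(n) for n in shape)))
    worst = 0
    for a, b in combinations(points, 2):
        shared = sum(1 for _, line in blocks if a in line and b in line)
        worst = max(worst, shared)
    return worst


if __name__ == "__main__":
    print(len(encoding_blocks((4, 6))), max_shared_blocks((4, 6)))
```

Two distinct grid points lie on a common axis-aligned line only if they differ in exactly one coordinate, so the count never exceeds one in this model.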
As with the codes, multilevel codes use a transversal logical Hadamardbasis measurement to detect whether any one encoded qubit has an error (an even number of encoded errors would not be detected). If the logical states have independent error probabilities , then the distilled states will have infidelity with perfect distillation. One must also consider whether the Hadamardbasis measurement has an error. For a twolevel code arranged as a square of side length , the transversal controlledHadamard gates at the lowest physical level require magic states, each of which has infidelity . However, this is a distance4 code, so for independent input error rates, the probability of failing to detect errors at the physical level is (analysis is provided in Section 5.2.3). The code can detect more errors in the magic states at the lower physical level, so these states can be of lower fidelity than the magic states encoded as logical qubits and successfully perform distillation. This is the essential distinction between multilevel distillation and all prior distillation protocols. When multiple rounds of distillation are required [13, 14], lowfidelity magic states are less expensive to produce, so multilevel protocols achieve higher efficiency.
Multilevel distillation protocols are applied in rounds, beginning with a small protocol (such as an code) and progressing to concatenated multilevel codes. Let us denote the output infidelity from a single round by the function . For each such function, is the dimensionality (number of levels of concatenation) and are the sizes of each dimension, which need not all be the same. As before, and refer to the independent error probabilities on logical and physical magic states, respectively. A typical progression of rounds using a source of states with infidelity might be , , etc.
Multilevel protocols tend to be much larger in both qubits and gates than other protocols. Because there can be many encoded qubits, the protocol is still very efficient, but the size of the overall circuit may be a concern for some quantum computing architectures. At any number of levels, the distilled output states have correlated errors, so distilled magic-state qubits in multilevel distillation must never meet again in a subsequent distillation circuit (it is a requirement that errors are independent within the same encoding block, as in Refs. [105, 106]). Let us suppose that one performs two rounds of distillation, where the first round uses one-level distillers with encoded magic states and the second round uses two-level distillers with encoded states. Because the inputs to each distiller in the second round must have independent errors, there must be independent distillation blocks in the first round. Therefore, to distill output states through two rounds, the number of input states is
(5.4)  
Consider a similar sequence through rounds with each distiller in round having encoded qubits. The total number of logical magic states is to ensure that errors are independent between logical magic states in every round. In the first round, the number of consumed magic states is ; in any subsequent round , the number of consumed magic states is (recall that the Hadamard measurement is implemented times, meaning it is repeated for ). The total number of input magic states can thus be expressed as
(5.5) 
For $r = 2$, this reproduces Eqn. (5.4). What also becomes clear is that the total size of multilevel protocols becomes unwieldy as and increase. For example, the case of and would require about input magic states and a comparable number of gates to distill output magic states. In general, the most efficient multilevel distillation protocols use large and multiple rounds, where efficiency is measured by the ratio of low-fidelity input states consumed to yield a single high-fidelity output. Because of the complexity of such protocols, the greatest benefit from their application is seen in large-scale quantum computing, where a typical algorithm run may require magic states, each with error probability [13]. It may be possible for alternative designs to circumvent these issues. If the first round uses a different protocol without correlated errors across logical magic states, such as Bravyi-Kitaev 15-to-1 distillation, then having multiple distillation blocks is unnecessary in the second round using a two-level concatenated protocol, which would lead to smaller multiround, multilevel protocols. Indeed, the results below show that optimal protocols found by numerical search happen to take this approach.
The scaling exponent $\gamma$ of a distillation protocol characterizes its efficiency. Specifically, $O(\log^{\gamma}(1/\epsilon))$ input states are required to distill one magic state of infidelity $\epsilon$. Scaling exponents for previous protocols are $\gamma = \log_3 15 \approx 2.47$ (“15-to-1” [132, 104]), $\gamma = \log_2 5 \approx 2.32$ (“10-to-2” [105]), and $\gamma \to \log_2 3 \approx 1.6$ (triorthogonal codes [106]). Moreover, Bravyi and Haah conjecture that no magic-state distillation protocol has $\gamma < 1$ [106]. In this work, if each round of distillation uses one higher level of concatenation in the multilevel protocols, then the number of consumed inputs doubles. In the limits of $k \to \infty$, $\epsilon \to 0$, multilevel protocols require $2^r + 1$ input states to each output state for $r$ rounds of distillation, where the $q$th round is a $q$-level distiller. The final infidelity is $O(\epsilon^{2^r})$, so the scaling exponent is $\gamma = \log_2(2^r + 1)/r \to 1$ as $r \to \infty$, which is the closest any protocol has come to reaching the conjectured bound. I show later through numerical simulation that for error rates relevant to quantum computing.
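As a rough numerical illustration, the scaling exponent of a protocol that consumes $n$ inputs per output and suppresses the error from $\epsilon$ to $O(\epsilon^m)$ can be estimated as $\log n / \log m$. The Python sketch below evaluates this ratio. The per-protocol figures it uses (15 inputs with cubic suppression for 15-to-1, 5 inputs per output with quadratic suppression for 10-to-2, 3 inputs per output with quadratic suppression for triorthogonal codes in the large-block limit, and $2^r + 1$ inputs per output with order $2^r$ suppression for $r$ multilevel rounds) are the commonly quoted asymptotic values and are assumptions of this example, as are the function names.

```python
import math


def scaling_exponent(inputs_per_output, suppression_order):
    """Estimate gamma = log(inputs per output) / log(order of error suppression)."""
    return math.log(inputs_per_output) / math.log(suppression_order)


def multilevel_exponent(rounds):
    """Asymptotic exponent assuming 2**r + 1 inputs per output and order 2**r."""
    return scaling_exponent(2 ** rounds + 1, 2 ** rounds)


if __name__ == "__main__":
    print("15-to-1       ", round(scaling_exponent(15, 3), 3))
    print("10-to-2       ", round(scaling_exponent(5, 2), 3))
    print("triorthogonal ", round(scaling_exponent(3, 2), 3))
    for r in range(1, 6):
        print("multilevel r =", r, round(multilevel_exponent(r), 3))
```

The multilevel values decrease toward 1 as the number of rounds grows, but only in the asymptotic regime assumed here; finite block sizes and finite error rates give larger exponents.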
5.2.3 Error Analysis for Multilevel Distillation
For simplicity, make the conventional assumption that all quantum circuit operations are perfect, except for the initial magic states being distilled. This is a valid approximation if all operations are performed using fault-tolerant quantum error correction where the logical gate error is far below the final infidelity for distilled magic states [57, 13, 14]; for a more explicit construction of fault-tolerant distillation circuits, see Ref. [94]. Additionally, following the methodology in Refs. [104, 105], one can consider each magic state with infidelity $\epsilon$ as the mixed state $\rho = (1-\epsilon)|H\rangle\langle H| + \epsilon|{-H}\rangle\langle{-H}|$, where $|{\pm H}\rangle$ are the $\pm 1$ eigenstates of the Hadamard operation.
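The mixed-state model can be made concrete with a small calculation. The Python sketch below assumes the standard form $\rho = (1-\epsilon)|H\rangle\langle H| + \epsilon|{-H}\rangle\langle{-H}|$, with $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$ as the $+1$ eigenstate of the Hadamard operation; the helper names are hypothetical and chosen only for this example. It checks the eigenvector property and confirms that the infidelity of $\rho$ with respect to $|H\rangle$ equals $\epsilon$.

```python
import math

C, S = math.cos(math.pi / 8), math.sin(math.pi / 8)
PLUS_H = (C, S)     # +1 eigenvector of the Hadamard matrix
MINUS_H = (-S, C)   # -1 eigenvector, orthogonal to PLUS_H


def hadamard(v):
    """Apply the 2x2 Hadamard matrix to a real two-component vector."""
    r = 1 / math.sqrt(2)
    return (r * (v[0] + v[1]), r * (v[0] - v[1]))


def projector(v):
    """Outer product |v><v| for a real two-component vector."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]


def noisy_magic_state(eps):
    """Density matrix (1 - eps)|H><H| + eps|-H><-H| (assumed error model)."""
    p, m = projector(PLUS_H), projector(MINUS_H)
    return [[(1 - eps) * p[i][j] + eps * m[i][j] for j in range(2)]
            for i in range(2)]


def fidelity_with_plus_h(rho):
    """Return <H|rho|H>."""
    return sum(PLUS_H[i] * rho[i][j] * PLUS_H[j]
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    rho = noisy_magic_state(0.01)
    print("infidelity:", 1 - fidelity_with_plus_h(rho))
```

Because the two eigenstates are orthogonal, the error model describes a single type of fault on each input state, which is what makes the counting arguments in this section tractable.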
Determining the infidelity at the output of distillation becomes simply a matter of counting the distinct ways that errors lead to the circuit incorrectly accepting faulty states, which is aided by the geometric picture from Section 5.2.2. It is essential that the logical and physical error probabilities for each input magic state are independent. Then a one-level distiller using the code has output error rate on each state as
(5.6) 
where higher-order terms are omitted. The numerical results justify the use of lowest-order approximations, as higher-order terms are negligible in optimally efficient protocols. The lowest-order error rates are both second order, because the Hadamard-basis measurement and code can together detect a single error in any magic state. The probability of the distiller detecting an error, in which case the output is discarded, is . If , then the output error rate of conditioned on success is the same as in Ref. [106]. Using the two-level distiller constructed from concatenated codes, the output infidelity for each distilled state is
(5.7)  
The probability of the two-level distiller detecting an error is
(5.8) 
Similar error suppression extends to higher multilevel protocols, as examined in more detail below.
The multilevel codes analyzed here use concatenated codes, though other codes could be concatenated. When two codes are concatenated, the logical qubits of the first level of encoding are used as physical qubits for completely distinct codes at the second level. Consider a twolevel scheme: if the codes at first and second levels are and , respectively, then the concatenated code is , as shown in Fig. 5.8(b). This process can be extended to higher levels of concatenation.
Determining the potential errors and their likelihood in multilevel protocols requires careful analysis. Let us enumerate the error configurations which are detected by the protocols; the error probability is given by summing the probability of all error configurations that are not detected and that lead to error(s) in the encoded states. As a first step, the analysis of multilevel codes is simplified by considering each input magic state to the quantum computer as having an independent probability of error, as discussed in Refs. [104, 105]. Hence only one type of error stems from each magic state used in the protocol.
Identifying undetected error events in multilevel distillation, which determine the output error rate, is aided by the geometric picture introduced before. Qubits which will form the code are arranged in a rectangular solid, then grouped in lines along each dimension for encoding. There are two error-detecting steps which together implement distillation: the Hadamard-basis measurement and the error detection of the codes. The Hadamard measurement registers an error for odd parity in the total of encoded-state errors and physical-level errors in the first round of gates, and there is one of these for each qubit site in the code (see Fig. 5.7).
The second method for codes to detect errors is by measuring the code stabilizers. The code stabilizers detect any configuration of errors which is not a logical operator in the concatenated code. Because of the redundant structure using overlapping codes, only a very small fraction of error configurations evade detection. Before moving on, note that at each qubit site, there are two faulty gates applied, and two errors on the same qubit will cancel (however, the first error will propagate to the Hadamard-basis measurement). Conversely, a single error in one of the two gates will propagate to the stabilizer-measurement round, but only an error in the first gate will also propagate to the Hadamard measurement. The stabilizer-measurement round will only “see” the odd/even parity of the number of errors at each qubit site.
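The propagation rules for the two faulty gates at a qubit site amount to simple parity bookkeeping, which the following Python sketch tabulates. It is a minimal illustration of the rules stated in the text, with errors represented as 0 or 1; the function names are hypothetical, and the sketch does not model the stabilizers of any particular code.

```python
from itertools import product


def site_signals(first_gate_error, second_gate_error):
    """Return (stabilizer_sees, hadamard_sees) for one qubit site.

    The stabilizer-measurement round sees only the parity of the two gate
    errors, while the Hadamard measurement sees only the first-gate error.
    """
    stabilizer_sees = first_gate_error ^ second_gate_error
    hadamard_sees = first_gate_error
    return stabilizer_sees, hadamard_sees


def hadamard_measurement_flags(encoded_errors, first_gate_errors):
    """True when encoded-state errors plus first-gate errors have odd parity."""
    return (sum(encoded_errors) + sum(first_gate_errors)) % 2 == 1


if __name__ == "__main__":
    for e1, e2 in product((0, 1), repeat=2):
        print("gate errors", (e1, e2), "->", site_signals(e1, e2))
    # One encoded error alone is flagged; paired with one first-gate error
    # the total parity is even and the Hadamard measurement stays silent.
    print(hadamard_measurement_flags([1, 0], [0, 0, 0]))
    print(hadamard_measurement_flags([1, 0], [1, 0, 0]))
```

The table makes the asymmetry explicit: two errors at the same site are invisible to the stabilizers yet still reach the Hadamard measurement through the first gate.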
One type of error event that occurs at concatenation levels three and higher requires special treatment. If there is an error in an encoded magic state and errors on two physical states used for the same controlled-Hadamard gate at the physical level, then this combination of input errors is not detected by the distillation protocol, leading to logical output error. This event leads to the error probability mentioned previously, which is not an issue for two-level protocols, but it must be addressed in levels three and higher. The solution for level distillation, where , is to repeat the controlled-Hadamard measurement times, consuming magic states at the physical level. After each transversal controlled-Hadamard, the code syndrome checks for detectable error patterns. With this procedure, one encoded-state error would also require at least errors in physical-level magic states to go undetected, leading to probability of error that scales as .
Consider the pattern of errors after the two potentially faulty gates on each qubit in the dimensional Cartesian grid arrangement. The many levels of error checking in the codes can detect a single error in any encoding block at any encoding level. For this analysis, let us separate the qubits in a single code block into two groups: the first four qubits are “preamble” qubits, while the remaining qubits are index qubits. The reason for this distinction is that the logical operators, which would also be undetected error configurations, have common physical-qubit operators in the preamble, with a degeneracy of two: , because of the stabilizer . Conversely, the logical operators are distinguished by the logical Pauli operator having a physical Pauli operator on the index qubit (numbered when preamble is included).
The preamble/index distinction makes it easier to identify the most likely error patterns. For any size code, there are two weight-2 errors in the preamble: and . Logically, these represent the product of operators on all encoded qubits. In the index qubits, any pair of errors is logical: . However, a pair of errors split with one each in preamble and index is always detectable by the code stabilizers. Thus, any single encoded qubit could have a logical error stemming from a pair of errors in two different configurations in the preamble or configurations in the index qubits. There is also one weight-three error. Each physical-state error configuration is multiplied by a degeneracy factor that is the number of ways an even number of errors occur before the CNOT in Fig. 5.7, thereby evading the Hadamard measurement. Thus the probability of logical error is . The Hadamard measurement fails to detect an even number of errors in the logical input states. There are ways that a pair of encoded input errors could corrupt any given qubit and ways four errors could corrupt any given qubit (assuming ). This contributes error terms