Logic Synthesis for Fault-Tolerant Quantum Computers

Nathan Cody Jones

Electrical Engineering
Principal adviser: Yoshihisa Yamamoto
First reader: Jelena Vuckovic
Second reader: William D. Oliver, MIT Lincoln Laboratory



Efficient constructions for quantum logic are essential since quantum computation is experimentally challenging. This thesis develops quantum logic synthesis as a paradigm for reducing the resource overhead in fault-tolerant quantum computing. The model for error correction considered here is the surface code. After developing the theory behind general logic synthesis, the resource costs of magic-state distillation for the $T$ gate are quantitatively analyzed. The resource costs for a relatively new protocol distilling multi-qubit Fourier states are calculated for the first time. Four different constructions of the fault-tolerant Toffoli gate, including two which incorporate error detection, are analyzed and compared. The techniques of logic synthesis reduce the cost of fault-tolerant quantum computation by one to two orders of magnitude, depending on which benchmark is used.

Using resource analysis for $T$ gates and Toffoli gates, several proposals for constructing arbitrary quantum gates are compared, including “Clifford+$T$” sequences, $V$-basis sequences, phase kickback, and programmable ancilla rotations. The application of arbitrary gates to quantum algorithms for simulating chemistry is discussed as well. Finally, the thesis examines the techniques which lead to efficient constructions of quantum logic, and these observations point to even broader applications of logic synthesis.


Acknowledgements

As you might expect, there is a long list of people to thank for contributing to a doctoral education. I owe a large debt of gratitude to my advisor, Yoshi Yamamoto. From the beginning, he knew that the best way to guide me was a hands-off approach. I was allowed to make my own mistakes (and fix them), knowing that I could count on his gentle counsel when it was needed.

There are so many members of the Yamamoto group that deserve thanks. Among the people that I worked with and learned from are: Darin Sleiter, Leo Yu, Shruti Puri, Chandra Natarajan, Na Young Kim, Kaoru Sanaka, Wolfgang Nitsche, Zhe Wang, Georgios Roumpos, Kai Wen, Dave Press, and Susan Clark. Kristiaan De Greve and Peter McMahon deserve special mention for their parts in (mostly) high-minded conversations, whether at Ike’s or the Rose and Crown, and for being true friends. Thaddeus Ladd only overlapped with me for three months in the group, but I learned a tremendous amount from him. He is largely responsible for developing my interest in quantum computer architecture, and I continue to learn from his astonishing breadth of knowledge (except in Starcraft, perhaps the only subject I know better). I must also thank Yurika Peterman and Rieko Sasaki, who work tirelessly in the background to keep the group functioning.

There are many academic collaborators from whom I learned a great deal. Jungsang Kim, a Yamamoto group alumnus, helped me focus on the engineering aspects of quantum architecture. In Japan, I found kindred spirits to my architecture research in Rod Van Meter, Simon Devitt, and Bill Munro. Austin Fowler taught me a great deal about the surface code, and maybe someday I will know half as much as he does on the matter. At Harvard, I learned far more about quantum chemistry than I ever would have expected from James Whitfield, Man-Hong Yung, and Alán Aspuru-Guzik. I was privileged to work on simulation and dynamical decoupling with Bryan Fong, Jim Harrington, and Thaddeus at HRL Laboratories, and I look forward to joining them as a colleague.

I want to thank the National Science Foundation, which supported my graduate studies through the Graduate Research Fellowships Program.

Finally, I want to thank my brother, Jason, and my loving parents, who made me into the man I am.


Chapter 1 Introduction to Quantum Computing

Quantum computing is a research field that promises to use quantum physics to solve problems that conventional computers cannot. The notion of a powerful computer built on the esoteric rules of quantum mechanics sounds like an idea from science fiction, and the field has generated considerable interest in technical communities and the general public. However, quantum computing is built on very sound principles. There is ample theoretical analysis that shows the concept is viable, under the right conditions [1, 2, 3]. Furthermore, intense experimental work is steadily improving the reliability of quantum hardware [4, 5, 6]. Quantum computing is not science fiction, and the best evidence for this assertion is that the future of the field lies not in novel discoveries in physics, but rather in steady advances in engineering.

The topic of this thesis is quantum logic synthesis. This is just one component of designing a quantum computer, but I will argue several times that it is a very important component. Logic synthesis is concerned with arranging the instructions in a quantum computer to minimize resource costs, such as the number of quantum bits (“memory”) and gates (“calculations”). Based on current understanding of experimental hardware, error correction will be the most costly feature of a quantum computer, and logic synthesis will play a crucial role in managing these costs.

As an introduction to the subject, this chapter gives a high-level overview of quantum computing. I start with applications, just to show why the field has attracted the attention of so many. The next section gives a basic primer on quantum bits, gates, and measurement. The last section covers the greatest challenge for quantum information, noise and errors. For a more comprehensive introduction, I refer the reader to Ref. [7].

1.1 Applications of Quantum Computing

Quite a few applications for quantum computers have been identified. A website maintained by Stephen Jordan has the most comprehensive list that I have seen (“Quantum Algorithm Zoo,” [8]), which currently counts 50 different algorithms. The performance advantage for each over a classical computer varies, from polynomial to exponential to unknown. Some applications are very general, while others address narrowly defined problems. This section discusses just a handful of these algorithms, the ones which I think will have the greatest impact.

Shor’s integer-factoring algorithm [9] is one of the oldest and most widely known applications of quantum computing. Because of its connection to RSA cryptography [10], integer factoring was already a problem of considerable interest in computer science. The heart of the matter is that multiplying integers is computationally efficient (polynomial-bounded time and space complexity), but no efficient classical method is known to decompose an integer into its prime factors. For some time, the digital security firm RSA Security (founded by the creators of the protocol) held an open challenge to factor numbers typical of the RSA protocol [11]. In this case, the number to be factored is $N = pq$, where $p$ and $q$ are prime numbers, typically both very large (e.g. around 1000 bits). Shor’s algorithm demonstrates that quantum computers can factor such a number in polynomial-bounded time and space. Nevertheless, the computation is rather complex when error correction is included, requiring perhaps millions of qubits and billions of gates [12, 13, 14].
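In Shor's algorithm, the quantum computer's job is to find the multiplicative order $r$ of a randomly chosen $a$ modulo $N$. Turning $r$ into a factor of $N$ is ordinary classical number theory. The sketch below shows only that classical reduction. A brute-force loop stands in for the quantum order-finding subroutine, so it takes exponential time and is meant only to illustrate the logic. The function names are hypothetical.

```python
from math import gcd


def order(a: int, n: int) -> int:
    """Smallest r > 0 with a**r == 1 (mod n); assumes gcd(a, n) == 1.

    This brute-force loop stands in for quantum order finding,
    which is the step Shor's algorithm performs efficiently.
    """
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r


def factor_from_order(n: int, a: int):
    """Try to split n = p*q using the order of a modulo n.

    Returns a nontrivial factor pair, or None when this choice of a
    fails (r odd, or a**(r/2) == -1 mod n) and another a must be tried.
    """
    g = gcd(a, n)
    if g != 1:
        return g, n // g  # a already shares a factor with n
    r = order(a, n)
    if r % 2 == 1:
        return None
    y = pow(a, r // 2, n)  # y*y == 1 (mod n) and y != 1 by minimality of r
    if y == n - 1:
        return None
    p = gcd(y - 1, n)  # y != +-1 (mod n), so this factor is nontrivial
    return p, n // p
```

For example, `factor_from_order(15, 7)` finds $r = 4$ and returns `(3, 5)`. With $a = 14$ it returns `None`, because $14^{1} \equiv -1 \pmod{15}$.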

Simulating quantum physics is another problem ideal for quantum computers [15, 16]. In this application, the state of a quantum system is encoded into quantum bits, and the time evolution of this state is reproduced in simulated time using quantum gates. Many useful quantities can be calculated, such as energy eigenvalues and chemical reaction rates [17]. Simulation algorithms will be examined in more detail in Chapter 8. Multiple forms of encoding are possible, and the choice has consequences for the way logic is synthesized. More recently, a closely related linear systems algorithm has been proposed [18], which may have promising applications like solving partial differential equations for electromagnetic scattering [19].

In these and other cases, the quantum computer solves a particular computational task better than a conventional computer. Even in doing so, the quantum computer requires substantial classical computing support for pre- and post-processing, as well as managing the considerable task of quantum error correction [20, 21]. For these reasons, quantum computers are appropriately viewed as “co-processors” that perform specialized tasks in a classical-quantum hybrid computing environment.

1.2 Quantum States, Operations, and Measurement

The information states in a quantum computer are normalized, complex-valued vectors. The elements of each such vector correspond to projections onto a basis. The most common basis will be the “computational basis”, which is spanned by binary values. For example, a single quantum bit, or qubit, is a superposition of the states $|0\rangle$ and $|1\rangle$ (the “ket” notation is a convention for identifying states). Quantum states must be normalized, so an arbitrary qubit state can be specified by

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad \alpha, \beta \in \mathbb{C},$$

subject to the constraint that

$$|\alpha|^2 + |\beta|^2 = 1.$$

Normalization ensures that, for measurement processes, the probabilities of all outcomes sum to one. Quantum states can consist of multiple qubits. Furthermore, “global phase,” which is a unit-modulus scalar factor $e^{i\phi}$ multiplying an entire state, is meaningless in quantum mechanics. In general, the states of $n$ qubits live in a space of dimension $2^n$, with complex amplitudes providing additional degrees of freedom. For simplicity, states consisting of multiple qubits use abbreviated state notation, such as $|0\rangle \otimes |1\rangle = |01\rangle$.
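The normalization constraint, the irrelevance of global phase, and the $2^n$ scaling can be checked numerically. This minimal NumPy sketch represents kets as arrays in the computational basis and builds multi-qubit states with the Kronecker (tensor) product. The helper name `qubit` is hypothetical and is not notation used in this thesis.

```python
import numpy as np

# Computational basis states of one qubit as vectors.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)


def qubit(alpha, beta):
    """Return alpha|0> + beta|1>, enforcing |alpha|^2 + |beta|^2 = 1."""
    psi = alpha * ket0 + beta * ket1
    if not np.isclose(np.vdot(psi, psi).real, 1.0):
        raise ValueError("|alpha|^2 + |beta|^2 must equal 1")
    return psi


# Equal superposition with a relative phase of i: still normalized.
psi = qubit(1 / np.sqrt(2), 1j / np.sqrt(2))

# A global phase changes no measurement probability |amplitude|^2.
phased = np.exp(1j * 0.7) * psi

# Multi-qubit states are tensor products: |01> = |0> (x) |1>.
ket01 = np.kron(ket0, ket1)

# An n-qubit register lives in a space of dimension 2**n.
n = 3
register = ket0
for _ in range(n - 1):
    register = np.kron(register, ket0)
```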

Some notational shorthand will be used throughout this thesis. The Pauli spin operators, which are used to define operations on a qubit, will be denoted as $X$, $Y$, and $Z$. Similarly, $I$ is the identity operator, with dimensionality appropriate for its context. As an example, one might write the projector onto the state $|0\rangle$ as $|0\rangle\langle 0| = (I + Z)/2$, where $I$, like $Z$, has dimension two.
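The projector identity above can be checked numerically. This sketch assumes the standard matrix representation of the Pauli operators in the computational basis, with $|0\rangle = (1, 0)^T$ and $|1\rangle = (0, 1)^T$.

```python
import numpy as np

# Pauli spin operators and the 2x2 identity in the computational basis.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)

# Projector onto |0>, written two ways: as an outer product |0><0|
# and as the Pauli combination (I + Z)/2.
P0_outer = np.outer(ket0, ket0.conj())
P0_pauli = (I + Z) / 2
```

Both forms give the same matrix, and it is idempotent ($P^2 = P$), as a projector must be.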

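The state of a quantum system is modified using gates. Each gate is a unitary operator $U$, meaning $U^\dagger U = I$, where “$\dagger$” denotes conjugate transpose (Hermitian adjoint). One example is $X = |0\rangle\langle 1| + |1\rangle\langle 0|$. When applied to a state, each righthand “bra” such as $\langle 1|$ combines with a ket to form an inner product, which is a scalar-valued quantity. For example, $\langle 0|0\rangle = 1$, because the bra and ket vectors are parallel. Likewise, $\langle 1|0\rangle = 0$, because the two unit vectors are orthogonal. As a result, the action of $X$ on $|0\rangle$ is $X|0\rangle = |0\rangle\langle 1|0\rangle + |1\rangle\langle 0|0\rangle = |1\rangle$.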
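The bra-ket calculation above maps directly onto outer and inner products of vectors. This is a minimal sketch. The helper `ketbra` is a hypothetical name, and NumPy's `vdot` is used because it conjugates its first argument, which turns that argument into a bra.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)


def ketbra(ket, bra):
    """Outer product |ket><bra|; the bra is the conjugate transpose of a ket."""
    return np.outer(ket, bra.conj())


# X = |0><1| + |1><0|
X = ketbra(ket0, ket1) + ketbra(ket1, ket0)

# Unitarity check: X^dagger X = I.
is_unitary = np.allclose(X.conj().T @ X, np.eye(2))

# Inner products <0|0> = 1 (parallel) and <1|0> = 0 (orthogonal).
overlap_00 = np.vdot(ket0, ket0)
overlap_10 = np.vdot(ket1, ket0)

# Hence X|0> = |0><1|0> + |1><0|0> = |1>.
result = X @ ket0
```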

A sequence of gates is often illustrated with a “circuit diagram,” as shown in Fig. 1.1. Operations read left to right in time, so first an $X$ gate flips the state of the top qubit, then a controlled-NOT (CNOT) gate applies $X$ to the bottom qubit if the top qubit is $|1\rangle$ and otherwise does nothing. The CNOT is a two-qubit gate, meaning that it couples the state of two qubits. The output state is $|11\rangle$.

Figure 1.1: Example of a quantum circuit diagram. The $X$ gate flips the top qubit to $|1\rangle$, then the CNOT will flip the state of the bottom qubit.
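A state-vector sketch of the circuit in Fig. 1.1 makes the left-to-right reading concrete. It assumes both qubits start in $|0\rangle$, since the input state is not spelled out in the text. It also assumes the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, with the top qubit as the leftmost bit.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# CNOT controlled on the top (first) qubit, basis order |00>,|01>,|10>,|11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Assumed input |00>. Gates act left to right in the diagram, so the
# later gate (CNOT) multiplies from the left in the matrix product.
state_in = np.kron(ket0, ket0)
state_out = CNOT @ np.kron(X, I) @ state_in

# Two-bit label of the single nonzero amplitude.
label = format(int(np.argmax(np.abs(state_out))), "02b")
```

Here `label` evaluates to `"11"`, which matches the output state $|11\rangle$.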

In addition to gates, quantum computation also relies on measurement to reveal the underlying state. However, because quantum states can be superpositions, there is no single preferred basis in which to measure them, and a measurement basis must be chosen. This thesis will only consider strong projective measurements, where the measurement process projects the quantum system into one of several orthogonal states. This can be described using projectors, such as $P_0 = |0\rangle\langle 0|$. A complete measurement basis is defined by a set of orthogonal projectors $\{P_i\}$ such that $\sum_i P_i = I$. For some state $|\psi\rangle$, the probability of measuring outcome $i$ and projecting the system into (the normalized form of) $P_i|\psi\rangle$ is given by $p_i = \langle\psi|P_i|\psi\rangle$. The state after measurement result $i$ is determined by

$$|\psi\rangle \rightarrow \frac{P_i|\psi\rangle}{\sqrt{\langle\psi|P_i|\psi\rangle}},$$

which is known as the projection postulate of quantum mechanics (as noted above, any global phase in the resulting state has no effect). Informally, the system becomes consistent with the observed measurement. Frequently, measurement bases are those of Pauli operators, which play an important role in error correction [22, 7]. For example, the computational basis is $P_0 = (I + Z)/2$ and $P_1 = (I - Z)/2$. This common measurement operation is referred to as measurement in the $Z$ basis.
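The Born probabilities $p_i = \langle\psi|P_i|\psi\rangle$ and the projection postulate can be sketched for a $Z$-basis measurement as follows. The example state and the function names are illustrative assumptions.

```python
import numpy as np

I = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Z-basis (computational-basis) projectors; they sum to the identity.
projectors = {0: (I + Z) / 2, 1: (I - Z) / 2}


def measure_probabilities(psi):
    """p_i = <psi|P_i|psi> for each outcome i."""
    return {i: np.vdot(psi, P @ psi).real for i, P in projectors.items()}


def collapse(psi, outcome):
    """Projection postulate: P_i|psi> / sqrt(<psi|P_i|psi>)."""
    P = projectors[outcome]
    p = np.vdot(psi, P @ psi).real
    if np.isclose(p, 0.0):
        raise ValueError("outcome has zero probability")
    return (P @ psi) / np.sqrt(p)


# Example state with |alpha|^2 = 0.25 and |beta|^2 = 0.75.
psi = np.array([0.5, np.sqrt(3) / 2 * 1j], dtype=complex)
probs = measure_probabilities(psi)
post = collapse(psi, 1)
```

After outcome 1 the state is $i|1\rangle$. The leftover factor of $i$ is a global phase, so it is physically the same state as $|1\rangle$.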

In addition to gates and measurement, one must be able to initialize qubits to a well-defined quantum state. The reason for saving this process for last is that initialization and measurement are dual operations: both are non-unitary with respect to the logical space of computable states. Furthermore, the measurement process can be used to perform initialization, using the projection postulate of quantum mechanics. Other methods, such as cooling the system to a ground state, are also used in practice.

1.3 Noise and Decoherence

Quantum operations are not perfect. Gates will probabilistically introduce errors, and even idle qubits will experience “decoherence,” disturbance from the original state due to interactions with the environment. In all hardware platforms considered so far, the error rates are so high as to make error correction mandatory for reliable computing. Error correction will be the subject of later chapters, so I will just briefly review quantum errors here.

In general, an error is any change in the state of a quantum system that is not perfectly known by the system controller. Errors are modeled with a quantum probability distribution known as a density matrix. Previously, I introduced vectors like $|\psi\rangle$ that are “pure” states, having no error and being perfectly defined. A density matrix is a probability-weighted distribution of pure states, which gives the likelihood of the system being in each of those states. For example, $\rho = 0.9|0\rangle\langle 0| + 0.1|1\rangle\langle 1|$ is a “mixed” state for a system with 90% probability of being $|0\rangle$. The normalization condition for a density matrix is that its trace (sum of diagonal entries) equals one: $\mathrm{Tr}(\rho) = 1$.
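The mixed-state example can be sketched as a probability-weighted sum of pure-state projectors. The sketch also uses two standard facts that are not stated above. Outcome probabilities for a density matrix are $\mathrm{Tr}(P_i\rho)$, and the purity $\mathrm{Tr}(\rho^2)$ equals 1 only for pure states.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)


def pure(ket):
    """Density matrix |psi><psi| of a pure state."""
    return np.outer(ket, ket.conj())


# Mixed state: 90% probability of |0>, 10% probability of |1>.
rho = 0.9 * pure(ket0) + 0.1 * pure(ket1)

trace = np.trace(rho).real               # normalization: Tr(rho) = 1
p0 = np.trace(pure(ket0) @ rho).real     # Z-measurement probability of 0
purity = np.trace(rho @ rho).real        # 0.9**2 + 0.1**2 = 0.82 < 1
```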

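Density matrices are useful for modeling quantum noise. Imagine one starts in the pure state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and this state experiences dephasing noise, which is a nonzero probability $p$ of the phase being flipped by operator $Z$. Consider the dephasing channel $\rho \rightarrow (1-p)\rho + pZ\rho Z$. The initial density matrix is

$$\rho_0 = |+\rangle\langle +| = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

After dephasing, the state is

$$\rho_1 = (1-p)\rho_0 + pZ\rho_0 Z = \frac{1}{2}\begin{pmatrix} 1 & 1-2p \\ 1-2p & 1 \end{pmatrix}.$$

If the qubit continues to dephase through $n$ such time intervals, the state is

$$\rho_n = \frac{1}{2}\begin{pmatrix} 1 & (1-2p)^n \\ (1-2p)^n & 1 \end{pmatrix},$$

where it becomes clear that dephasing is damping the off-diagonal terms of the density matrix. In the limit $n \rightarrow \infty$ (for $0 < p < 1$), dephasing turns the quantum state into an evenly mixed distribution of states $|0\rangle$ and $|1\rangle$. Because phase is critically important to most quantum algorithms, this type of error will corrupt data.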
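Iterating the dephasing channel numerically reproduces the closed form for $\rho_n$. In this sketch the values $p = 0.1$ and $n = 20$ are arbitrary illustrative choices.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)


def dephase(rho, p):
    """One use of the dephasing channel: rho -> (1-p) rho + p Z rho Z."""
    return (1 - p) * rho + p * (Z @ rho @ Z)


p, n = 0.1, 20
rho = np.outer(plus, plus.conj())  # rho_0 = |+><+|
for _ in range(n):
    rho = dephase(rho, p)

# Closed form: diagonal entries stay at 1/2, and each off-diagonal
# entry is (1 - 2p)**n / 2.
predicted = (1 - 2 * p) ** n / 2
```

Each application multiplies the coherences by $(1 - 2p)$, because $Z\rho Z$ flips the sign of the off-diagonal entries and leaves the populations untouched.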

A dephasing event can be modeled with the probabilistic application of operator $Z$. Another common error channel is the depolarizing channel,

$$\rho \rightarrow (1-p)\rho + \frac{p}{3}\left(X\rho X + Y\rho Y + Z\rho Z\right),$$

which applies one of the Pauli errors with probability $p/3$ each, for a total probability of error $p$. Viewing error events this way avoids the need to explicitly write density matrices, allowing one to analyze error correction without knowing the underlying state. At an abstract level, error correction will act as a filter to catch these errors by performing measurements which reveal what error (if any) has occurred to the system. This technique will be used often in Chapters 5 through 7.
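The two views of the depolarizing channel can be compared side by side. One acts on a density matrix, and the other samples discrete Pauli errors without tracking the state. This sketch assumes $p = 0.03$ and a fixed random seed purely for illustration. `PAULI_LABELS` and `sample_errors` are hypothetical names.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

PAULI_LABELS = np.array(["I", "X", "Y", "Z"])


def depolarize(rho, p):
    """rho -> (1-p) rho + (p/3)(X rho X + Y rho Y + Z rho Z)."""
    return (1 - p) * rho + (p / 3) * sum(P @ rho @ P for P in (X, Y, Z))


def sample_errors(rng, p, size):
    """Pauli-error view: 'I' with probability 1-p, else X, Y, Z at p/3 each."""
    return rng.choice(PAULI_LABELS, size=size, p=[1 - p, p / 3, p / 3, p / 3])


p = 0.03
rng = np.random.default_rng(seed=1)
errors = sample_errors(rng, p, size=100_000)
error_rate = np.mean(errors != "I")  # should be close to p

# Density-matrix view on |0><0|: the X and Y terms flip the bit and the
# Z term does not, so a population of 2p/3 moves to |1>.
rho = depolarize(np.diag([1.0, 0.0]).astype(complex), p)
```

The sampled error rate matches $p$ up to statistical fluctuation. The density-matrix view gives $\mathrm{diag}(1 - 2p/3,\ 2p/3)$, which is exactly the average over the sampled Pauli errors.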

Chapter 2 Architecture of a Quantum Computer

Quantum computing as an engineering discipline is still in its infancy. Although the physics is well understood, developing devices which compute with quantum mechanics is technologically daunting. While experiments to date manipulate only a handful of quantum bits [4], this chapter considers what effort is required to build a large-scale quantum computer. One must account for faulty quantum hardware, with errors caused both by the environment and by deliberate control operations; for the classical processing required whenever error correction is invoked; for the special treatment needed to construct arbitrary gate sequences from a limited fault-tolerant gate set; and so on. Quantum computer architecture, the subject of this chapter, is a framework to address the complete challenge of designing a quantum computer.

This chapter provides an overview of the steps for designing a quantum computer, and it is based on Ref. [13]. The chapter concludes with resource estimates for large-scale quantum computation. Although they were the best estimates when Ref. [13] was published, the logic being used was not optimized. The daunting numbers, with qubit counts running into the millions, show why resource-reduction techniques through logic synthesis, the subject of this thesis, are so important.

2.1 Layered Architecture Overview

Many researchers have presented and examined components of large-scale quantum computing. This chapter considers how these components may be combined in an efficient design, and later chapters introduce methods to improve the quantum computer. This engineering pursuit is quantum computer architecture, which is developed here in layers. An architecture decomposes complex system behaviors into a manageable set of operations. A layered architecture does this through layers of abstraction where each embodies a critical set of related functions. Each ascending layer brings the system closer to an ideal quantum computing environment by suppressing errors and hiding implementation details not needed elsewhere. This section reviews the field of quantum computer architecture, then discusses the layered architecture of Ref. [13].

2.1.1 Prior Work on Quantum Computer Architecture

Many different quantum computing technologies are under experimental investigation [4], but for each a scalable system architecture remains an open research problem. Since DiVincenzo introduced his fundamental criteria for a viable quantum computing technology [23] and Steane emphasized the difficulty of designing systems capable of running quantum error correction (QEC) adequately [24, 25, 26], several groups of researchers have examined the architectural needs of large-scale systems [27, 28]. As an example, small-scale interconnects have been proposed for many technologies, but the problems of organizing subsystems using these techniques into a complete architecture for a large-scale system have been addressed by only a few researchers. In particular, the issue of heterogeneity in system architecture has received relatively little attention.

The most important subroutine in fault-tolerant quantum computers considered thus far is the preparation of ancilla states for fault-tolerant circuits, because very many ancillas are required [12, 13, 14]. Taylor et al. proposed a design with alternating “ancilla blocks” and “data blocks” in the device layout [29]. Steane introduced the idea of “factories” for creating ancillas [24], as examined later in this chapter. Isailovic et al. [12] studied this problem for ion trap architectures and found that, for typical quantum circuits, approximately 90% of the quantum computer must be devoted to such factories in order to calculate “at the speed of data,” or where ancilla-production is not the rate-limiting process. The results in this chapter are in close agreement with this estimate. Metodi et al. also considered production of ancillas in ion trap designs, focusing instead on a 3-qubit ancilla state used for the Toffoli gate [30], which is an alternative pathway to a universal fault-tolerant set of gates.

Some researchers have studied the difficulty of moving data in a quantum processor. Kielpinski et al. proposed a scalable ion trap technology utilizing separate memory and computing areas [31]. Because quantum error correction requires rapid cycling across all physical qubits in the system, this approach is best used as a unit cell replicated across a larger system. Other researchers have proposed homogeneous systems built around this basic concept. One common structure is a recursive H tree, which works well with a small number of layers of a Calderbank-Shor-Steane (CSS) code, targeted explicitly at ion trap systems [32, 33]. Oskin et al. [34], building on the Kane solid-state NMR technology [35], proposed a loose lattice of sites, explicitly considering the issues of classical control and movement of quantum data in scalable systems, but without a specific plan for QEC. In the case of quantum computing with superconducting circuits, the quantum von Neumann architecture specifically considers dedicated hardware for quantum memories, zeroing registers, and a quantum bus [5].

Long-range coupling and communication is a significant challenge for quantum computers. Cirac et al. proposed the use of photonic qubits to distribute entanglement between distant atoms [36], and other researchers have investigated the prospects for optically-mediated nonlocal gates [37, 38, 39, 40, 41]. Such photonic channels could be utilized to realize a modular, scalable distributed quantum computer [42]. Conversely, Metodi et al. consider how to use local gates and quantum teleportation to move logical qubits throughout their ion-trap QLA architecture [30]. Fowler et al. [43] investigated a Josephson junction flux qubit architecture considering the extreme difficulties of routing both the quantum couplers and large numbers of classical control lines, producing a structure with support for CSS codes and logical qubits organized in a line. Whitney et al. [44, 45] have investigated automated layout and optimization of circuit designs specifically for ion trap architectures, and Isailovic et al. [46, 12] have studied interconnection and data throughput issues in similar ion trap systems, with an emphasis on preparing ancillas for teleportation gates [47].

Other work has studied quantum computer architectures with only nearest-neighbor coupling between qubits in an array [48, 49, 50, 51, 21, 13, 14], which is appealing from a hardware design perspective. With the recent advances in the operation of the topological codes and their desirable characteristics such as having a high practical threshold and requiring only nearest-neighbor interactions, research effort has shifted toward architectures capable of building and maintaining large two- and three-dimensional cluster states [52, 53, 20, 54]. These systems rely on topological error correction models [55, 56, 57, 58], whose higher tolerance to error often comes at the cost of a larger footprint in the hardware, relative to, for example, implementations based on the Steane code [59]. The surface code [56, 57, 60, 61, 14], which is studied throughout this thesis, belongs to the topological family of codes.

Recent attention has been directed at distributed models of quantum computing. Devitt et al. studied how to distribute a photonic cluster-state quantum computing network over different geographic regions [62]. The abstract framework of a quantum multicomputer recognizes that large-scale systems demand heterogeneous interconnects [63]. For most quantum computing technologies, it may be challenging to build monolithic systems that contain, couple, and control billions of physical qubits. Van Meter et al. [64] extended this architectural framework with a design based on nanophotonic coupling of electron spin quantum dots that explicitly uses multiple levels of interconnect with varying coupling fidelities (resulting in varying purification requirements), as well as the ability to operate with a very low yield of functional devices. Although that proposed system has many attractive features, concerns about the difficulty of fabricating adequately high quality optical components and the desire to reduce the surface code lattice cycle time led to the system design proposed in Ref. [13].

2.1.2 Layered Framework

A good architecture must have a simple structure while also efficiently managing the complex array of resources in a quantum computer. Layered architectures are a conventional approach to solving such engineering problems in many fields of information technology. For example, Ref. [33] presents a layered architecture for quantum computer design software. The architecture developed in Ref. [13] describes the physical design of the quantum computer, which consists of five layers, where each layer has a prescribed set of duties to accomplish. The interface between two layers is defined by the services a lower layer provides to the one above it. To execute an operation, a layer must issue commands to the layer below and process the results. Designing a system this way ensures that related operations are grouped together and that the system organization is hierarchical. Such an approach allows quantum engineers to focus on individual challenges, while also seeing how a process fits into the overall design. The architecture is organized in layers to deliberately create a modular design for the quantum computer.

The layered framework can be understood by a control stack composed of the five layers in the architecture. Figure 2.1 shows an example of the control stack for the specific quantum dot architecture considered in this chapter [13], but the particular interfaces between layers will vary according to the physical hardware, quantum error correction scheme, etc. that one chooses to implement. At the top of the control stack is the Application layer, where a quantum algorithm is implemented and results are provided to the user. The bottom Physical layer hosts the raw physical processes supporting the quantum computer. The layers between (Virtual, Quantum Error Correction, and Logical) are essential for shaping the faulty quantum processes in the Physical layer into a system of reliable fault-tolerant [3, 65, 7, 25, 26, 66] qubits and quantum gates at the Application layer.

Figure 2.1: Layered control stack which forms the framework of a quantum computer architecture. Vertical arrows indicate services provided to a higher layer. Originally published in Ref. [13].

2.1.3 Interaction between Layers

Two layers meet at an interface, which defines how they exchange instructions or the results of those instructions. Many different commands are being executed and processed simultaneously, so one must also consider how the layers interact dynamically. For the quantum computer to function efficiently, each layer must issue instructions to layers below in a tightly defined sequence. However, a robust system must also be able to handle errors caused by faulty devices. To satisfy both criteria, a control loop must handle operations at all layers simultaneously while also processing syndrome measurements to correct errors that occur. A prototype for this control loop is shown in Fig. 2.2.

Figure 2.2: Primary control cycle of the layered architecture quantum computer. Whereas the control stack in Fig. 2.1 dictates the interfaces between layers, the control cycle determines the timing and sequencing of operations. The dashed box encircling the Physical layer indicates that all quantum processes happen exclusively here, and the layers above process and organize the operations of the Physical layer. The Application layer is external to the loop since it functions without any dependence on the specific quantum computer design. Originally published in Ref. [13].

The primary control cycle defines the dynamic behavior of the quantum computer in this architecture since all operations must interact with this loop. The principal purpose of the control cycle is to successfully implement quantum error correction. The quantum computer must operate fast enough to correct errors; still, some control operations necessarily incur delays, so this cycle does not simply issue a single command and wait for the result before proceeding — pipelining is essential [67, 12]. A related issue is that operations in different layers occur on drastically different timescales, as discussed later in Section 2.5. Figure 2.2 also describes the control structure needed for the quantum computer. Processors at each layer track the current operation and issue commands to lower layers. Layers 1 to 4 interact in the loop, whereas the Application layer interfaces only with the Logical layer, making the algorithm independent of the hardware.

2.2 Quantum Hardware and Control

The essential requirements for the Physical layer are embodied by the DiVincenzo criteria [23]. The layered framework for quantum computing was developed in tandem with a specific hardware platform, known as QuDOS (quantum dots with optically-controlled spins). The QuDOS platform uses electron spins within quantum dots for qubits. The quantum dots are arranged in a two-dimensional array; Figure 2.3 shows a cut-away rendering of the quantum dot array inside an optical microcavity, which facilitates control of the electron spins with laser pulses. Reference [13] argued that the QuDOS design is a promising candidate for large-scale quantum computing, and I use it here as a model for generating concrete resource estimates.

Figure 2.3: Quantum dots in a planar optical microcavity form the basis of the QuDOS hardware platform. (a) The quantum dots are arranged 1 μm apart in a two-dimensional square array. The quantum dots trap single electrons, whose spins will be used for quantum information processing. (b) Side view. The electron spins are manipulated with laser pulses sent into the optical cavity from above, and two neighboring quantum dots can be coupled by a laser optical field which overlaps them. The purple and green layers are AlGaAs and GaAs, grown by molecular beam epitaxy. The alternating layers form a distributed Bragg reflector (DBR) optical cavity which is planar, confining light in the vertical direction and extending across the entire system in horizontal directions. Originally published in Ref. [13].

The physical qubit used by QuDOS is the spin of an electron bound within an InGaAs self-assembled quantum dot (QD) surrounded by GaAs substrate [68, 69, 70, 71, 72, 73]. These QDs can be optically excited to trion states (a bound electron and exciton), which emit light of wavelength nm when they decay. A transverse magnetic field splits the spin levels into two metastable ground states [74], which form the computational basis states for a qubit. The energy separation of the spin states is important for two reasons related to controlling the electron spin [75]. First, the energy splitting facilitates control with optical pulses. Second, there is continuous phase rotation between the two spin states, that is, rotation around the $z$-axis of the qubit Bloch sphere, which in conjunction with timed optical pulses provides complete unitary control of the electron spin vector.

The electron spin is bound within a quantum dot. These quantum dots are embedded in an optical microcavity, which will facilitate quantum gate operations via laser pulses. To accommodate the two-dimensional array of the surface code detailed in Layer 3, this microcavity must be planar in design, so the cavity is constructed from two distributed Bragg reflector (DBR) mirrors stacked vertically with a cavity layer in between, as shown in Fig. 2.3. This cavity is grown by molecular beam epitaxy (MBE). The QDs are embedded at the center of this cavity to maximize interaction with antinodes of the cavity field modes. Using MBE, high-quality microcavities can be grown with alternating layers of GaAs/AlAs [76]. The nuclei in the quantum dot and surrounding substrate have nonzero spin, which is an important source of noise that must be suppressed through control techniques like dynamical decoupling [77, 78, 79, 80, 81, 82, 83, 84, 85, 13].

Control in QuDOS uses laser pulses which selectively target quantum dots; see Ref. [13] for details. The 1-qubit operations are developed using a transverse magnetic field and ultrafast laser pulses [75, 73]. The construction of a practical, scalable 2-qubit gate in QuDOS remains the most challenging element of the hardware, and various methods are currently under development. A fast, all-optically controlled 2-qubit gate would certainly be attractive, and early proposals [69] identified the importance of employing the nonlinearities of cavity QED. Reference [69] suggests the application of two lasers for both single-qubit and 2-qubit control; more recent developments have indicated that both single-qubit gates [86, 87, 75] and 2-qubit gates [88] can be accomplished using only a single optical pulse.

QuDOS requires a measurement scheme that is still under experimental development. The proposed mechanism (shown in Fig. 2.4) is based on Faraday/Kerr rotation. The underlying physical principle is as follows: an off-resonant probe pulse impinges on a quantum dot, and it receives a different phase shift depending on whether the quantum dot electron is in the spin-up or spin-down state (these are separated in energy by the external magnetic field). Sensitive photodetectors combined with homodyne detection measure the phase shift to enact a projective QND measurement on the electron spin. Several results in recent years have demonstrated the promise of this mechanism for measurement: multi-shot experiments by Berezovsky et al. [89] and Atatüre et al. [90] have measured spin-dependent phase shifts in charged quantum dots, and Fushman et al. [91] observed a large phase shift induced by a neutral quantum dot in a photonic crystal cavity. Most recently, Young et al. observed a significantly enhanced phase shift from a quantum dot embedded in a micropillar cavity [92].

Figure 2.4: A dispersive quantum non-demolition (QND) readout scheme for QuDOS. (a) A probe pulse is sent into a microcavity containing a charged quantum dot. (b) The cavity-enhanced dispersive interaction between the pulse and the electron spin creates a state-dependent phase shift in the light which leaves the cavity. Measurement of the phase shift can perform projective measurement on the electron spin. Originally published in Ref. [13].

2.3 Error Correction and Fault Tolerance

Error correction is essential to quantum computation, given current understanding of hardware technology. Some of the best experimental results achieve an error-per-operation of about or  (Refs. [4, 6], and references therein), but even these impressive feats are not close to the to error rates needed for large-scale quantum algorithms, as will be demonstrated below. The gap can be bridged with fault-tolerant quantum computing [1, 3, 65, 7], so long as error rates in the hardware are below a threshold value which is specific to the code being used [2].

The threshold theorem has garnered significant attention in the community, but it is sometimes mistakenly presumed that the threshold itself is the target performance. A functioning quantum error correction system must operate below threshold, and a practical system must operate well below threshold. Later chapters show that the resources required for error correction become manageable when the hardware error rate is about an order of magnitude below the threshold of the chosen code. The code used throughout this thesis is the surface code [55, 56, 57, 60], which is distinguished by requiring only nearest-neighbor operations in two dimensions and by having a high threshold around 1% error per physical gate [93, 61, 14].

This section discusses some of the features of error correction that are salient to quantum computer architecture. First, I briefly outline the advantages of the surface code. Second, I discuss the use of Pauli frames, which is a simple but effective technique for reducing the number of gates implemented. Finally, I give an overview of magic-state distillation, which is a powerful technique in fault tolerance and the subject of more intense investigation in Chapter 5.

2.3.1 Surface Code Error Correction

As just mentioned, the primary justifications for the surface code are that it requires only a two-dimensional geometry of nearest-neighbor gates in the hardware, yet still has one of the highest threshold error rates of any code considered thus far [57, 14]. There is also evidence that the surface code might have lower overhead than other codes, when the demands of fault tolerance are considered [94].

In this thesis, I base nearly all of my analysis on a hypothetical quantum computer that uses surface code error correction. A complete explanation of the code and its properties is a subject of active research, so I defer to the literature [55, 56, 57, 60, 95, 93, 61, 54, 96, 97, 98, 99, 94, 14, 100]. Some of the features will be examined throughout the thesis. Chapter 3 shows how to depict the dynamic implementation of operations in the surface code, as well as how to calculate a power-law approximation to how resources scale with increasing levels of error correction. Still, I only touch on the aspects immediately relevant to my analysis, and otherwise I assume the reader is familiar with the mechanics of the surface code.

2.3.2 Pauli Frames

A Pauli frame [101, 102] is a simple and efficient classical computing technique to track the result of applying a series of Pauli gates (X, Y, or Z) to single qubits. The Gottesman-Knill Theorem implies that tracking Pauli gates can be done efficiently on a classical computer [103]. Many quantum error correction codes, such as the surface code, project the encoded state into a perturbed codeword with erroneous single-qubit Pauli gates applied (relative to states within the code subspace). The syndrome reveals what these Pauli errors are, up to undetectable stabilizers and logical operators, and error correction is achieved by applying those same Pauli gates to the appropriate qubits (since Pauli gates are Hermitian and unitary). However, quantum gates are faulty, and applying additional gates may introduce more errors into the system.

Rather than applying every correction operation, one can keep track of what Pauli correction operation would be applied, and continue with the computation. This is possible because the operations needed for error correction are in the Clifford group. When a measurement in a Pauli X, Y, or Z basis is finally made on a qubit, the result is modified based on the corresponding Pauli gate which should have been applied earlier. This stored Pauli gate is called the Pauli frame [101, 102], since instead of applying a Pauli gate, the quantum computer changes the reference frame for the qubit, which can be understood by remapping the axes on the Bloch sphere, rather than moving the Bloch vector.

I want to emphasize that the Pauli frame is a classical object stored in the digital circuitry that handles error correction. Pauli frames are nonetheless very important to the functioning of a surface code quantum computer. Layer 3 in the control stack (Fig. 2.1) uses a Pauli frame with an entry for each qubit in the error-correcting code. As errors occur, the syndrome processing step identifies a most-likely pattern of Pauli errors. Instead of applying the recovery step directly, the Pauli frame is updated in classical memory. The Pauli gates form a closed group under multiplication (and global phase of the quantum state is unimportant), so a Pauli frame only tracks one of four values (I, X, Y, or Z) for each qubit in the hardware.

The Pauli frame is maintained as follows. Denote the Pauli frame at time t as P_t:

$$P_t = F_1^t \otimes F_2^t \otimes \cdots \otimes F_n^t, \tag{2.1}$$

where F_j^t is an element from the Pauli group corresponding to qubit j at time t. Any Pauli gate in the quantum circuit is multiplied into the Pauli frame and is not implemented in hardware, so P_{t+1} = G_t P_t for all Pauli gates G_t in the circuit at time t. Other gates in the Clifford group are implemented in hardware, but they also transform the Pauli frame according to

$$P_{t+1} = C_t P_t C_t^\dagger. \tag{2.2}$$
When using Pauli frames, the flow of the computation proceeds in the same manner as if Pauli gates were being implemented, with the only change being how the final measurement of that qubit is interpreted. The set of Clifford gates is sufficient for implementing surface code error correction, though one also needs to implement non-Clifford logical operations for universal quantum computing.
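The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Layer 3 implementation of QuDOS: the class name `PauliFrame` and its methods are hypothetical, each frame entry is stored as a pair of bits (x, z) with I = (0, 0), X = (1, 0), Z = (0, 1), Y = (1, 1), and global phases are discarded so that only the four values I, X, Y, Z ever appear. Pauli gates are XORed into the frame, Hadamard and CNOT conjugate it, and a measurement outcome is flipped whenever the stored Pauli anticommutes with the measured one.

```python
# Minimal Pauli-frame tracker (illustrative sketch; names are hypothetical).
# Each qubit's entry is a bit pair (x, z): I=(0,0), X=(1,0), Z=(0,1), Y=(1,1).
# Global phases are dropped, so only these four values are ever stored.

_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
_NAMES = {bits: name for name, bits in _BITS.items()}


class PauliFrame:
    def __init__(self, n_qubits):
        self.x = [0] * n_qubits
        self.z = [0] * n_qubits

    def pauli(self, q, name):
        """Pauli gate in the circuit: multiply it into the frame; nothing runs in hardware."""
        dx, dz = _BITS[name]
        self.x[q] ^= dx
        self.z[q] ^= dz

    def h(self, q):
        """Hadamard runs in hardware; conjugation swaps X and Z (Y -> -Y, phase dropped)."""
        self.x[q], self.z[q] = self.z[q], self.x[q]

    def cnot(self, c, t):
        """CNOT runs in hardware; conjugation maps X_c -> X_c X_t and Z_t -> Z_c Z_t."""
        self.x[t] ^= self.x[c]
        self.z[c] ^= self.z[t]

    def measure(self, q, basis, raw):
        """Reinterpret a raw outcome (0 or 1) measured in the X, Y, or Z basis.

        The outcome flips when the stored Pauli anticommutes with the measured one.
        """
        flip = {"Z": self.x[q], "X": self.z[q], "Y": self.x[q] ^ self.z[q]}[basis]
        return raw ^ flip

    def label(self, q):
        return _NAMES[(self.x[q], self.z[q])]


frame = PauliFrame(2)
frame.pauli(0, "X")                     # tracked classically, not applied
frame.cnot(0, 1)                        # X on the control spreads to the target
print(frame.label(0), frame.label(1))   # X X
frame.h(1)
print(frame.label(1))                   # Z
print(frame.measure(1, "X", 0))         # 1: the frame flips the raw result
```

The bit-pair encoding is a common choice for stabilizer bookkeeping and is assumed here only for illustration; any representation of the four single-qubit Pauli classes would serve.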

Quantum algorithms need to apply gates outside the Clifford group. When using a Pauli frame, the gate that is actually implemented, U'_t, is given by:

$$U'_t = P_t U_t P_t^\dagger, \tag{2.3}$$

where U_t is the gate requested by the algorithm at time t. Note the distinction between this expression and Eqn. (2.2). In Eqn. (2.2), the Pauli frame is changed by application of a Clifford-group gate, but here an unchanging Pauli frame modifies the gate that is applied.
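To see the rule for non-Clifford gates concretely, the sketch below (hypothetical helper names, plain-Python 2 × 2 matrices) computes the gate P U P† that would be sent to hardware when the frame for a qubit holds P and the algorithm requests U = T = diag(1, e^{iπ/4}). It shows that an X or Y entry in the frame turns a requested T into T† up to a global phase, while I and Z leave it unchanged, which is why the frame must be consulted before a non-Clifford gate is applied.

```python
import cmath
import math

# Single-qubit matrices as nested lists of complex numbers.
W = cmath.exp(1j * math.pi / 4)          # e^{i pi/4}
T = [[1, 0], [0, W]]
T_DAG = [[1, 0], [0, W.conjugate()]]
PAULI = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def same_up_to_phase(a, b, tol=1e-9):
    """For 2x2 unitaries, |Tr(a^dagger b)| / 2 == 1 exactly when b = e^{i phi} a."""
    tr = sum(matmul(dagger(a), b)[i][i] for i in range(2))
    return abs(abs(tr) / 2 - 1) < tol


def implemented_gate(frame_pauli, u):
    """Gate actually sent to hardware when the frame holds frame_pauli: P U P^dagger."""
    p = PAULI[frame_pauli]
    return matmul(matmul(p, u), dagger(p))


for name in "IXYZ":
    u_prime = implemented_gate(name, T)
    if same_up_to_phase(u_prime, T):
        kind = "T"
    elif same_up_to_phase(u_prime, T_DAG):
        kind = "T^dagger"
    else:
        kind = "other"
    print(name, "->", kind)
# I -> T
# X -> T^dagger
# Y -> T^dagger
# Z -> T
```

The frame itself is left unchanged by this step, matching the statement that an unchanging Pauli frame modifies the applied gate.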

2.3.3 Magic-State Distillation

In the layered framework, the Logical layer takes the fault-tolerant resources from Layer 3 and creates a logical substrate for universal quantum computing. This task requires additional processing of error-corrected gates and qubits to produce any arbitrary gate required in the Application layer [13]. Quantum error correction provides only a limited set of gates, such as the Clifford group (or only a subset thereof, as in the surface code [60]). Although circuits from this set can be simulated efficiently on a classical computer by the Gottesman-Knill Theorem [7], the Clifford group forms the backbone of quantum logic.

The Logical layer constructs arbitrary gates from circuits of fundamental gates and ancillas injected into the error-correcting code [60, 13]. For example, surface code architectures inject and purify the ancillas (|0⟩ + i|1⟩)/√2 and (|0⟩ + e^{iπ/4}|1⟩)/√2; then the surface code consumes these ancillas in quantum circuits to produce S and T gates, respectively [7, 56, 57, 60]. Because the ancillas are faulty, they must be purified through a process known as magic-state distillation [104, 57, 60, 64, 13, 105, 106, 107].

Magic-state distillation will be examined at length in Chapter 5. For now, I only want to explain the simple method that was used in Ref. [13]. Consider the process of distilling the ancilla state that is used to construct the T gate [104, 57, 60, 13]. Figure 2.5 provides an illustration of why this process is important by showing the fault-tolerant construction of a Toffoli gate at the Application layer using distilled ancillas at the Logical layer. Two separate analyses contend that ancilla distillation circuits constitute over 90% of the computing effort for a single Toffoli gate [12, 13]. Viewed another way, for every qubit used by the algorithm, approximately 10 qubits are working in the background to generate the necessary distilled ancillas.

Figure 2.5: A Toffoli gate at the Application layer is constructed with assistance from the Logical layer, using the decomposition in Ref. [7]. There are only three application qubits, but substantially more logical qubits are needed for distillation circuits in Layer 4. The distilled ancillas are the result of two levels of distillation (starting from an injected state) on the ancilla required for T gates. Note that each time an ancilla is used with measurement, the Pauli frame may need to be updated. The ancilla-based circuit for S gates is not shown here, for clarity. Modified from version published in Ref. [13].

The circuit in Fig. 2.5 shows one level of distillation, but a lengthy computation like Shor’s algorithm will typically require two levels, where the outputs of the first round are distilled again. Moreover, since perhaps trillions of distilled ancillas will be needed for the entire algorithm, QuDOS uses a “distillation factory” [24, 64], which is a dedicated region of the computer that continually produces these states as fast as possible. Speed is important, because ancilla distillation can be the rate-limiting step in quantum circuits [12]. Figure 2.5 shows how to construct a Toffoli gate, but the T gates can be used to approximate any other gate as well (see Ref. [7]; more details in Chapter 7).

Each distillation circuit will require 15 lower-level states, but they are not all used at the same time. For simplicity, set the “clock cycle time” for each gate equal to the time to implement a logical CNOT, so that with initialization and measurement, the distillation circuit requires 6 cycles. By using ancillas only when they are needed, the circuit can be compacted to require at most 12 logical qubits at any time. The computing effort can be characterized by a “circuit volume,” which is the product of logical memory space (i.e., area of the computer) and time. The circuit volume of distillation is 12 × 6 = 72 qubits × cycles. A two-level distillation will require 16 distillation circuits (15 at the first level and one at the second), or a circuit volume of 16 × 72 = 1152 qubits × cycles. An efficient distillation factory with an area of A logical qubits will therefore produce on average A/1152 two-level distilled ancillas per clock cycle. Table 2.1 summarizes these results.

Parameter Symbol Value
Circuit depth 6 clock cycles
Circuit area 12 logical qubits
Circuit volume 72 qubits × cycles
Factory rate (level )
Table 2.1: Resource analysis for a distillation factory. These factories are crucial to quantum computers which require ancillas for universal gates. Magic-state distillation uses Clifford gates and measurement, so the circuit can be deformed to reduce depth and increase area, or vice versa, while keeping volume approximately constant.
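The arithmetic behind Table 2.1 can be reproduced with a short sketch. The helper names are hypothetical, and two assumptions go beyond the text: every level-l ancilla is built from 15 level-(l − 1) ancillas by one more circuit, with no sharing (so the circuit count obeys N_1 = 1, N_l = 15 N_{l−1} + 1, which gives the 16 circuits quoted for two levels), and the factory packs circuits with no idle volume.

```python
# Hypothetical helper names; circuit numbers are those quoted for Table 2.1.
CIRCUIT_DEPTH = 6                 # clock cycles (one cycle = one logical CNOT)
CIRCUIT_AREA = 12                 # peak logical qubits after compaction
CIRCUIT_VOLUME = CIRCUIT_DEPTH * CIRCUIT_AREA   # 72 qubits x cycles
INPUTS_PER_CIRCUIT = 15


def circuits_per_output(levels):
    """Distillation circuits behind one ancilla distilled `levels` times.

    Assumes N_1 = 1 and N_l = 15 * N_{l-1} + 1 (no circuits shared).
    """
    n = 1
    for _ in range(levels - 1):
        n = INPUTS_PER_CIRCUIT * n + 1
    return n


def volume_per_output(levels):
    return CIRCUIT_VOLUME * circuits_per_output(levels)


def factory_rate(area, levels):
    """Average distilled ancillas per clock cycle from `area` logical qubits,
    assuming perfect packing of circuits in space and time."""
    return area / volume_per_output(levels)


print(circuits_per_output(2), volume_per_output(2))   # 16 1152
print(factory_rate(11520, 2))                         # 10.0
```

Under the same assumptions a third level would need 241 circuits per output, which illustrates why the number of distillation levels dominates factory size.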

As a research effort, magic-state distillation has exploded in the last year. Chapter 5 will cover these matters in more detail, but many new results were produced in the short time since Ref. [13] was published. Fowler and Devitt developed a highly efficient implementation of distillation in the surface code, along with good estimates of resources [94]. Several new schemes for distilling states were also developed [105, 106, 107]. Section 5.2 will examine my proposal for “multilevel distillation,” which is asymptotically very efficient but perhaps too complicated to be useful in practice. These developments, and those in other chapters, will dramatically lower the cost of fault-tolerant quantum computing. As I mentioned at the outset of this chapter, one of the purposes of the calculations given here is to provide contrast for the new methods developed later.

2.4 Quantum Algorithms

The Application layer is where quantum algorithms are executed. The efforts of Layers 1 through 4 have produced a computing substrate that supplies any arbitrary gate needed. The Application layer is therefore not concerned with the implementation details of the quantum computer—it is an ideal quantum programming environment. This section deals with estimating the resources required for a target application. This analysis can indicate the feasibility of a proposed quantum computer design, which is a worthwhile consideration when evaluating the long-term prospects of a quantum computing research program.

A quantum engineer could start here in Layer 5 with a specific application in mind and work down the layers to determine the system design necessary to achieve desired functionality. I take this approach for QuDOS by examining two interesting quantum algorithms: Shor’s factoring algorithm and simulation of quantum chemistry. A rigorous system design is beyond the scope of the present work, but this section considers the computing resources required for each application in sufficient detail that one may gauge the engineering effort necessary to design a quantum computer based on QuDOS technology.

2.4.1 Elements of the Application Layer

The Application layer is composed of application qubits and gates that act on the qubits. Application qubits are logical qubits used explicitly by a quantum algorithm. As discussed in Section 2.3.3, many logical qubits are also used to distill ancilla states necessary to produce a universal set of gates, but these distillation logical qubits are not visible to the algorithm in Layer 5. When an analysis of a quantum algorithm quotes a number of qubits without reference to fault-tolerant error correction, often this means the number of application qubits [108, 16, 109, 110]. Similarly, Application-layer gates are equivalent in most respects to logical gates; the distinction is made according to what resources are visible to the algorithm or deliberately hidden in the machinery of the Logical layer, which affords some discretion to the computer designer.

A quantum algorithm could request any arbitrary gate in Layer 5, but not all quantum gates are equal in terms of resource costs. As shown in Section 2.3.3, distilling ancillas for T gates is a very expensive process. For example, Fig. 2.5 shows how Layers 4 and 5 coordinate to produce an Application-layer Toffoli gate, illustrating the extent to which ancilla distillation consumes resources in the computer. When ancilla preparation is included, T gates can account for over 90% of the circuit complexity in a fault-tolerant quantum algorithm [12, 13].

When analyzing algorithms, it is convenient to count resources in terms of Toffoli gates. This is a natural choice, because the level of ancilla distillation, number of virtual qubits, etc. depend on the choice of hardware, error correction, and many other design-specific parameters; by comparison, the number of Toffoli gates is machine-independent since this quantity depends only on the algorithm (much like the number of application qubits mentioned above). To determine error correction or hardware resources for a given algorithm, one can take the Layer 5 resource estimates and work down through Layers 4 to 1, which is an example of modularity in this architecture framework. As shown in Ref. [13], an Application-layer Toffoli gate in QuDOS has an execution time of 930 μs (31 logical gate cycles including the T gate circuits).
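Working down from Layer 5 starts with converting a Toffoli count into wall-clock time. The sketch below does this with the per-Toffoli figures quoted for QuDOS, reading the execution time as 930 μs (microseconds); the function names are hypothetical, and it assumes the idealized case in which the machine never waits on distillation factories, so delays of the kind shown for the fixed-size machine in Fig. 2.6 are not modeled.

```python
# Per-Toffoli figures quoted for QuDOS, read as microseconds.
TOFFOLI_TIME_S = 930e-6      # one Application-layer Toffoli gate
CYCLES_PER_TOFFOLI = 31      # logical gate cycles per Toffoli, including T-gate circuits
SECONDS_PER_DAY = 86_400


def logical_cycle_time_s():
    """Implied duration of one logical gate cycle."""
    return TOFFOLI_TIME_S / CYCLES_PER_TOFFOLI


def runtime_days(toffoli_depth):
    """Idealized wall-clock time for a circuit of the given Toffoli depth.

    Assumes no stalls from insufficient distillation capacity.
    """
    return toffoli_depth * TOFFOLI_TIME_S / SECONDS_PER_DAY


print(f"{logical_cycle_time_s() * 1e6:.1f} us per logical cycle")   # 30.0 us per logical cycle
print(f"{runtime_days(1e8):.3f} days for 1e8 sequential Toffolis")   # 1.076 days for 1e8 sequential Toffolis
```

The depth of 10^8 is an arbitrary illustrative value, not a figure from Table 2.2.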

2.4.2 Shor’s Integer-Factoring Algorithm

Perhaps the most well-known application of quantum computers is Shor’s algorithm, which decomposes an integer into its prime factors [9]. Solving the factoring problem efficiently would compromise the RSA cryptosystem [10]. Because of the prominence of Shor’s algorithm in the field of large-scale, fault-tolerant quantum computing, I estimate the resources required to factor a number of size typical for RSA.

A common key length for RSA public-key cryptography is 1024 bits. Factoring a number this large is not trivial, even on a quantum computer, as the following analysis shows. Figure 2.6 shows the expected run time on QuDOS for one iteration of Shor’s algorithm versus key length in bits for two different quantum computers: one where system size increases with the problem size, and one where the system size is limited to logical qubits (including application qubits). For the fixed-size quantum computer, the runtime begins to grow faster than the minimal circuit depth when factoring numbers 2048 bits and higher. Fixing the machine size highlights the importance of the ancilla distillation factories. For this instance of Shor’s algorithm, about 90% of the machine should be devoted to distillation; if insufficient resources are devoted to distillation, performance of the factoring algorithm plummets. For example, the 4096-bit factorization devotes of the machine to distillation, but about as many factories would be needed to achieve maximum execution speed in the lower trace in Fig. 2.6. I should also mention here that Shor’s algorithm is probabilistic, so a few iterations may be required [9].

Figure 2.6: Execution time for Shor’s algorithm, using the same circuit implementation as Ref. [64]. The vertical axis shows circuit depth, in terms of Toffoli gates, and the plot is labeled with estimated runtime on the QuDOS architecture. The blue trace is a quantum computer whose size in logical qubits scales as necessary to compute at the speed of data (no delays). The green trace is a machine limited to logical qubits, which experiences rapidly increasing delays as problem size increases beyond 2048 bits. The problem is that insufficient resources are available to distill ancillas for T gates, which are used to produce universal logic. The inset shows the same data on a linear vertical scale, illustrating when the quantum computer experiences delays for lack of enough qubits. Originally published in Ref. [13].

2.4.3 Simulation of Quantum Chemistry

Quantum computers were inspired by the problem that simulating quantum systems on a classical computer is fundamentally difficult. Feynman postulated that one quantum system could simulate another much more efficiently than a classical processor, and he proposed a quantum processor to perform this task [111]. Quantum simulation is one of the few known quantum algorithms that solves a useful problem believed to be intractable on classical computers, so I estimate the resource requirements for quantum simulation in QuDOS, and more details are available in Ref. [13].

This section specifically considers fault-tolerant quantum simulation. Other methods of simulation are under investigation [112, 113, 114], but they lie outside the scope of this work. The particular example selected here is simulating the Schrödinger equation for time-independent Hamiltonians in first-quantized form, where each Hamiltonian represents the electron/nuclear configuration in a molecule [115, 116, 17, 117]. An application of such a simulation is to determine ground- and excited-state energy levels in a molecule. This analysis focuses on first-quantized instead of second-quantized form for better resource scaling at large problem sizes [17]. Digital quantum simulation will also be examined in Chapter 8.

Figure 2.7 shows the time necessary to execute the simulation algorithm for determining an energy eigenstate on the QuDOS computer as a function of the size of the simulation problem, expressed in number of electrons and nuclei. First-quantized form stores the position-basis information for an electron wavefunction in a quantum register, and the complete Hamiltonian is a function of one- and two-body interactions between these registers, so this method does not depend on the particular molecular structure or arrangement; hence, the method is very general. Note that the calculation time scales linearly in problem size, as opposed to the exponential scaling seen in exact classical methods. The precision of the simulation scales with the number of time steps simulated [16], and this example uses time steps for a maximum precision of about 3 significant figures.

Figure 2.7: Execution time for simulation of a molecular Hamiltonian in first-quantized form, as a function of problem size. The horizontal axis is number of particles being simulated, and the plot is labeled with some interesting examples from chemistry. The vertical axis is circuit depth in Toffoli gates, and the plot is labeled with estimated runtime on QuDOS. Each simulation uses 12-bit spatial precision in the wavefunction and time steps for 10-bit precision in readout, or at most significant figures. The linear scaling in algorithm runtime versus problem size is due to two-body potential energy calculations, which constitute the majority of the quantum circuit. The number of potential energy calculations increases quadratically with problem size, but through parallel computation they require linear execution time [13, 117]. Originally published in Ref. [13].
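The caption’s claim that a quadratic number of pairwise potential-energy calculations can run in linear time can be illustrated with a round-robin (circle-method) schedule, a standard way to split all n(n − 1)/2 pairs into rounds of disjoint pairs. This is a sketch under the idealized assumption that every pair term takes the same time and disjoint pairs can be evaluated concurrently; it is not necessarily the schedule used in Refs. [13, 117], and `round_robin_rounds` is a hypothetical name.

```python
def round_robin_rounds(n):
    """Split all n(n-1)/2 particle pairs into rounds of disjoint pairs (circle method).

    Even n gives n-1 rounds of n/2 pairs; odd n adds a dummy slot and gives n rounds.
    """
    players = list(range(n))
    if n % 2:
        players.append(None)  # a particle paired with None sits out that round
    m = len(players)
    rounds = []
    for _ in range(m - 1):
        pairs = []
        for i in range(m // 2):
            a, b = players[i], players[m - 1 - i]
            if a is not None and b is not None:
                pairs.append((min(a, b), max(a, b)))
        rounds.append(pairs)
        # Keep players[0] fixed and rotate the others by one position.
        players = [players[0], players[-1]] + players[1:-1]
    return rounds


for r in round_robin_rounds(4):
    print(r)
# [(0, 3), (1, 2)]
# [(0, 2), (1, 3)]
# [(0, 1), (2, 3)]
```

The number of rounds grows linearly with the number of particles while the total number of pair calculations grows quadratically, which is the scaling described in the caption.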

2.5 Quantum Computing and the Need for Logic Synthesis

The factoring algorithm and quantum simulation represent interesting applications of large-scale quantum computing, and for each the computing resources required of a layered architecture based on QuDOS are listed in Table 2.2. The algorithms are comparable in total resource costs, as reflected by the fact that these two example problems require similar degrees of error correction. The simulation algorithm is more compact than Shor’s, requiring fewer logical qubits for distillation, which is a consequence of this algorithm performing fewer arithmetic operations in parallel. However, Shor’s algorithm has a shorter execution time owing to its use of parallel computation. Both algorithms can be accelerated through parallelism if the quantum computer has more logical qubits available [118, 117].

Computing Resource    Shor’s Algorithm (1024-bit)    Molecular Simulation (alanine)
Layer 5 Application qubits 6144 6650
Circuit depth (Toffoli)
Layer 4 Log. distillation qubits 66564 15860
Logical clock cycles
Layer 3 Code distance 31 31
Error per lattice cycle
Layer 2 Virtual qubits
Error per virtual gate
Layer 1 Quantum dots
(area on chip) (4.54 ) (1.40 )
Execution time (est.) 1.81 days 13.7 days
Table 2.2: Summary of the computing resources in a layered architecture based on the QuDOS platform, for Shor’s algorithm factoring a 1024-bit number (same implementation as Ref. [64]) and the ground state simulation of the molecule alanine (C3H7NO2) using first-quantized representation.

Precise timing and sequencing of operations are crucial to making an architecture efficient. In the layered framework presented by Ref. [13], an upper layer in the architecture depends on processes in the layer beneath, so that logical gate time is dictated by QEC operations, and so forth. This system of dependence of operation times is depicted for QuDOS in Fig. 2.8. The horizontal axis is a logarithmic scale in the time to execute an operation at a particular layer, while the arrows indicate fundamental dependence of one operation on other operations in lower layers.

Figure 2.8: Relative timescales for critical operations in QuDOS within each layer. Each bar indicates the approximate timescale of an operation, and the width indicates that some operation times may vary with improvements in technology. The arrows indicate dependence of higher operations on lower layers. The red arrow signifies that the surface code lattice refresh must be 2–3 orders of magnitude faster than the dephasing time in order for error correction to function. The Application layer is represented here with a Toffoli gate, which is a common building block of quantum algorithms. Complete algorithm runtimes can vary significantly, depending on both the capabilities of the quantum computer and the specific way each algorithm is compiled, such as to what extent calculations are performed in parallel. Originally published in Ref. [13].

Examining Fig. 2.8, the operation timescales increase as one moves to higher layers. This is because a higher layer must often issue multiple commands to layers below. A crucial point shown in Fig. 2.8 is that the time to implement a logical quantum gate is four orders of magnitude greater than the duration of each individual physical gate, such as a laser pulse. For large-scale quantum computing, the speed of error-corrected operations is the crucial figure of merit, and the substantial overhead for fault tolerance shown in Fig. 2.8 indicates that improved methods are needed.

The findings in Table 2.2 and Fig. 2.8 were, more or less, the key results of Ref. [13]. In an earnest attempt to design the architecture of a quantum computer, it was revealed that a few error correction processes accounted for a substantial portion of the resource overhead. These include magic-state distillation, Toffoli gates, and approximations to arbitrary gates. These tasks all involve the synthesis of fault-tolerant quantum logic, and it soon became apparent to me and to other researchers that significant improvements are possible by optimizing the logic constructions. Quantum logic synthesis is the subject of my thesis, and the following chapters develop methodology and novel techniques for this new field of research. The processes listed above are considered explicitly in Chapters 5 through 7. The methods in those chapters will improve on the resource costs given here by about a factor of 500.

Chapter 3 Preliminaries for Quantum Logic

Quantum logic is the result of composition. Every quantum program is a sequence of instructions, each being one of three types: preparing quantum states (qubits), applying unitary operations (gates), and performing projective measurement. In addition to quantum logic, classical logic is often included when gates are conditioned on the result of an earlier measurement. Because the order of operations is important, quantum programs can be quite complicated. This chapter examines how quantum programs are specified, how programs are represented in diagrams, and how the resource costs are calculated for a program in the surface code.

Informative diagrams are essential for quantum circuit designers to see the action of a sequence of operations. Easy-to-understand pictorial diagrams help designers create programs, adapt previous results, identify mistakes, and communicate results. This chapter discusses two types of quantum logic diagrams. The first is the familiar quantum circuit, which was introduced in Chapter 1. The second type is a surface code topology diagram, which is a three-dimensional rendering of how quantum logic is implemented using surface code error correction. Surface codes are preferred in this work for reasons outlined in Chapter 2. What is particularly useful about this diagram is that it provides both visual and quantitative assessment of actual resource costs at the hardware level; the disadvantage is that such diagrams are difficult to interpret alone. Circuit diagrams and surface code diagrams will play complementary roles in this thesis.

Analyzing resource costs is essential to quantum logic synthesis. The objective is to compose logic in a way that minimizes costs while ensuring reliable execution of the quantum program. This chapter concludes by explaining how to quantitatively estimate resource costs in the surface code. I also introduce the concept of the Trivial Upper Bound (TUB), which for any program is the resource cost of a naive, “worst case” compilation. The TUB represents the cost of a program that surely works but is probably not optimal, and TUBs will be used as benchmarks to demonstrate the efficacy of logic synthesis.

3.1 Quantum Programs

A quantum program is any sequence of operations on a quantum state. As mentioned in the introduction, there are three types of operations: initialization of quantum states, unitary gates, and measurement. A program is defined by this sequence of operations and any input or output states that are fixed externally. A program might not have an input state; if this is the case, the program initializes all of its quantum data. A program also might not have an output state, returning only classical information from internal measurements. A key concept for logic synthesis is that two programs are logically equivalent if they produce the same outputs from the same inputs, within some specified error tolerance.
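For purely unitary programs, the notion of logical equivalence within an error tolerance can be made concrete. The sketch below is one possible test, assuming nested-list matrices and a hypothetical function `logically_equivalent`: two d × d unitaries act identically on every input up to a global phase exactly when |Tr(U†V)|/d = 1, and the parameter `tol` stands in for the specified error tolerance. Programs with mid-circuit measurement would instead require comparing quantum channels, which this sketch does not attempt.

```python
import cmath
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def logically_equivalent(u, v, tol=1e-9):
    """True if unitaries u and v agree up to a global phase, within tolerance `tol`.

    |Tr(u^dagger v)| / d equals 1 exactly when v = e^{i phi} u.
    """
    d = len(u)
    overlap = abs(sum(matmul(dagger(u), v)[i][i] for i in range(d))) / d
    return overlap >= 1 - tol


r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
S = [[1, 0], [0, 1j]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

print(logically_equivalent(matmul(matmul(H, Z), H), X))  # True: HZH = X
print(logically_equivalent(matmul(T, T), S))             # True: T^2 = S
print(logically_equivalent(H, X))                        # False
```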

In practice, most operations belong to finite sets. Within an error correcting code, such as the surface code [56, 57, 60], the logical operations are constrained. The Eastin-Knill theorem and related results indicate that it is impossible for all logical operations to be native to the code [119, 120]. Moreover, the available error-corrected operations are often discrete. The error-corrected operations supported by the surface code are:

  • Initialize |0⟩ or |+⟩ (Z-basis or X-basis, respectively);

  • Unitary X, Y, or Z gate (via Pauli frame, see [101, 102, 13]);

  • Unitary H gate (Hadamard);

  • Unitary CNOT gate;

  • Measurement in the Z basis or the X basis.

These operations are not universal for quantum computing, but they will account for most of the operations in quantum programs.

The final operation in the surface code is the ability to initialize a single qubit in any arbitrary state, though it has error probability proportional to the hardware error rate. The qubit is called an “injected state,” because it was teleported into the code using faulty methods [56, 57, 60, 95, 14]. The error probability is a sum of error rates in the hardware. Reference [100] estimates an injection error that is 10 times the gate error probability p, so the injected state could have error on the order of 1% for p = 0.1%. These faulty states are essential for universal quantum computation, but fault tolerance requires that they be purified in some manner using the error corrected operations listed above. The choice of program to “clean up” these noisy inputs will have a dramatic impact on resource costs, as will be considered in detail in Chapters 5 and 6.

The simplest way to implement a program is to initialize all the states that one might need at the beginning, then apply all of the operations using unitary gates, then perform all of the measurements at the end. However, the same output can often be achieved by performing some initialization and measurement in the middle of the program. Doing so can lower resource costs in several ways. For one thing, idle quantum states still require error correction at the hardware level, so if initialization can be delayed until the state is needed or if measurement can be performed as soon as possible, then the program should do so. Moreover, sometimes a unitary gate can be replaced by a non-unitary sequence of operations that has lower resource cost.
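The saving from delaying initialization and measuring early can be quantified with the same “circuit volume” idea used for distillation. In the sketch below, each qubit is described by a hypothetical (start, end) lifetime in clock cycles, and the scenario is an illustrative assumption: one data qubit lives for four cycles and one ancilla is consumed per cycle. Initializing everything at the start and measuring everything at the end is compared with a just-in-time schedule.

```python
def volume_and_peak(lifetimes):
    """Space-time volume and peak live-qubit count of a schedule.

    `lifetimes` holds one (start, end) pair per qubit: initialized at the start
    of cycle `start`, measured at the end of cycle `end - 1`.
    """
    volume = sum(end - start for start, end in lifetimes)
    events = []
    for start, end in lifetimes:
        events.append((start, +1))
        events.append((end, -1))
    events.sort()  # at equal times (t, -1) sorts before (t, +1): free before allocate
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return volume, peak


steps = 4
eager = [(0, steps)] + [(0, steps)] * steps                  # all qubits live throughout
just_in_time = [(0, steps)] + [(t, t + 1) for t in range(steps)]
print(volume_and_peak(eager))          # (20, 5)
print(volume_and_peak(just_in_time))   # (8, 2)
```

Both schedules perform the same logical operations, but the just-in-time version needs less than half the volume and fewer simultaneously protected qubits, which is the kind of reduction the paragraph above describes.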

The technique of replacing unitary logic with non-unitary logic will be used frequently in later chapters. It may seem counterintuitive to replace a single gate with multiple non-unitary operations, as the latter appears more complex. However, some unitary gates are very expensive, so replacing them with a non-unitary sequence of operations can lead to a net reduction in resources. Consider the circuit in Fig. 3.1 as an example. On the left, one would like to implement a non-native gate, but this gate is not available (i.e. it has infinite cost). However, the logically equivalent program on the right uses specially prepared ancilla states (injected and purified), CNOT, and measurement. The gates enclosed in the dashed box are conditionally applied based on the measurement outcome. Neglecting for now the way in which the injected states are purified (Chapter 5 covers this in detail), all of these operations are available in the surface code, so this program has lower (finite) cost.

Figure 3.1: Circuit for implementing a non-native gate using specially prepared ancilla states. This technique is sometimes referred to as a “teleportation gate.” The circuit enclosed in the dashed box is implemented only for one of the two measurement outcomes.

Considering the list of available error-corrected operations above, only trivial quantum programs can be implemented with unitary gates in the surface code. This list is a subset of the Clifford group, and even programs that use the full Clifford group can be simulated classically using the Gottesman-Knill Theorem [121, 7]. Therefore, all useful programs in the surface code require purified injected states, and using these states requires non-unitary operations. Hence all useful quantum programs in the surface code are non-unitary, at some level. However, a quantum program can encapsulate the non-unitary details, so that the external world only sees the program perform a unitary mapping of an input state to an output state. When some arbitrary program, such as a quantum algorithm, needs to be implemented in a fault-tolerant manner, the synthesis procedure will replace many unitary operations with logically equivalent, non-unitary programs so as to minimize resource costs.

Finally, quantum programs can call subprograms. Using some inductive reasoning, any program is a valid composite operation because it is composed of valid operations. Hence, programs can be structured in a hierarchical fashion. This is a common technique in classical programming, but it plays a special role in quantum computing. Later chapters show that certain choices of subprograms can be easily verified, thereby lowering the costs of error correction substantially. Logic synthesis will tend to produce hierarchical quantum programs.

3.2 Quantum Logic Diagrams

Quantum logic diagrams provide a visual aid for understanding properties of quantum logic. Moreover, each type of diagram is useful for a different purpose. This section covers two frequently used diagrams, quantum circuits and surface code depictions. Quantum circuits are one of the oldest methods to represent quantum programs, and they are straightforward to interpret. Time progresses from left to right, like a musical score, and each horizontal line is a qubit. By contrast, surface code diagrams are challenging to interpret, but they explicitly account for the resource costs of implementing a program. When used together, the diagrams explain both the action of a program and its costs, which are the main concerns of logic synthesis.

Quantum circuits were introduced in Chapter 1, so I will be brief. In a quantum circuit diagram, each qubit is a horizontal line, and operations affecting a certain qubit touch the corresponding line. The line begins where the qubit is initialized or at an input to the program, and it ends where the qubit is measured or at an output of the program. In some contexts, multi-qubit states are grouped into one line, often borrowing the digital-logic notation of a slash “/” through the line to denote multiple bits. Figure 3.1 is a quantum circuit, and Nielsen and Chuang provide a more detailed overview of quantum circuits (Ref. [7], Ch. 1). In the List of Figures, I denote circuit diagrams by the prefix “Circuit.”

The second type of logic diagram, the surface code diagram, is a three-dimensional geometric depiction. Two dimensions are space, and one dimension is time. In most cases, I will set the viewing angle such that time flows left to right, making the spatial dimensions vertical and out-of-page. The diagram represents how the surface code implements encoded gates with many physical gates and qubits. By using surface code diagrams, I implicitly assume that the quantum computer implements surface code error correction at the lowest level. This is justified by arguments in Chapter 2, which in essence reduce to the following: surface codes are the best error correction scheme published so far when hardware gates are constrained to a nearest-neighbor, two-dimensional geometry [56, 57, 60, 95, 14]. In the List of Figures, I denote surface code diagrams by the prefix “Surface Code.”

Surface code diagrams are useful for two reasons. First, this type of diagram accurately represents the total resource costs of quantum logic, because there is a direct correspondence between the features of the diagram and the operations at the hardware level, in both space and time. Such information is not readily available in circuit diagrams, where the costs associated with two different gates may differ by orders of magnitude. Second, surface code diagrams provide a visual way to modify or optimize logic while maintaining the error-correction capacity of the surface code. In this work, I make use of only the first purpose, though optimization within the surface code is actively being studied elsewhere [94, 100, 122].

For all their utility, surface code diagrams have a notable downside. Owing to the way that quantum logic in the surface code depends on topology [56, 57], it is virtually impossible to determine the underlying logic being shown, as will become apparent in the examples which follow. For this reason, a surface code diagram should always be paired with a quantum circuit diagram, because the two are complementary. The quantum circuit shows what the logic does, while the surface code diagram shows how the logic is implemented and what the resource costs are. This complementarity will be used frequently to demonstrate logic synthesis in later chapters.

An example of a surface code diagram is shown in Fig. 3.2. The left side is a simple circuit with a CNOT gate acting on two qubits, while the right shows how this might be implemented in the surface code. CNOT gates in the surface code are determined by the topology of how defects in the code (shown here as yellow and black pipes) braid around each other. Each defect is a hole of sorts in the surface code lattice, and Refs. [60, 14] give a good explanation of how this is implemented at the hardware level. Some other common circuit primitives are initialization and measurement (Fig. 3.3), which at this level are mirror images in time, and state injection (Fig. 3.4). In Fig. 3.4, the tip of each pyramid is a single physical qubit, whose state is converted into a surface code logical qubit contained in the defect. As mentioned earlier, state injection is a critical process in surface code programs, and Refs. [56, 57, 60, 95, 14] give a proper explanation. The Hadamard gate is also important, but it is not shown in braiding diagrams because it requires some manipulation of the code properties; see Refs. [98, 14] for details. In other codes, the Hadamard gate may be the “hard” operation [123].

Figure 3.2: Implementation of CNOT in the surface code. (a) Circuit diagram for CNOT. (b) Perspective rendering of CNOT implemented in the surface code. Each horizontal pair of yellow pipes corresponds to the qubit on the left in the same vertical position. Each logical qubit is a pair of yellow defects, arranged along the out-of-page dimension.

Figure 3.3: Initialization and measurement operations in the surface code. Primal defects are shown; the equivalent operations for dual defects would initialize or measure in the complementary basis. (a) Circuit element for initialization. (b) Initialization in the surface code. (c) Circuit element for measurement. (d) Measurement in the surface code.

Figure 3.4: State injection in the surface code. (a) Circuit diagram for initializing an injected state. The state shown is a commonly injected one, but in principle any single-qubit state can be injected. (b) Depiction of state injection in the surface code. The viewing angle is from the side to provide better perspective. The injection process uses two pyramids, which are point defects expanding in circumference. The pyramids are colored differently from other defects to stand out visually.

3.3 Resource Calculations

The objective of quantum logic synthesis is to minimize resource costs while executing a reliable quantum program. There are many resources that require consideration for running a quantum computer, but this work will focus on only two: qubits and gates used for fault-tolerant computation. Suppose that qubits are regularly spaced on a two-dimensional grid and that gates are regularly separated in time, or “clocked.” Using this model, one can account for resource costs by the three-dimensional volume (space and time) required to execute the program, which corresponds exactly to the volume required to implement the braiding topology in the surface code. Volume is a useful measure for resource cost because it depends mostly on the underlying logic of the program and the error rates of the hardware, and less on the sequence of gates in the program.

The reliability of a quantum program is the probability that the output does not have an error. A program is reliable if the output error probability is below some target value. Errors are suppressed using techniques of quantum error correction, but these are costly in terms of resources [3, 7, 101, 12, 13, 14]. The overhead associated with fault tolerance depends on error rates in the hardware and the chosen code. Generally speaking, the cost scales as $O(\log^{\gamma}(1/\epsilon))$, where $\epsilon$ is an upper bound on the logical error of the program and the exponent $\gamma$ is a constant that depends on the logic synthesis method. Instead of relying on asymptotic estimates, a more precise resource analysis described below will be used in later chapters to give quantitative resource costs.

Operations in the surface code are convenient to analyze at a high level of abstraction, where one only considers the arrangement of the braiding surfaces. Surface code diagrams exist at this level, as the details of hardware operations are not shown. Apart from visual clarity, this abstraction also gives the diagram a sense of scale invariance, because the same topology, hence same program, could be implemented in two instances of a surface code, where each has a different code distance. The code distance, often denoted $d$, determines how far apart the braid surfaces must be separated in terms of qubits (space) or stabilizer measurements (time). Because of this fundamental spacing, one can define a unit cell as two stabilizer measurements, one of each type ($X$ and $Z$), as shown in Fig. 3.5. The surface code consists of these unit cells tiled across the 2D plane in space, and repeated in time. Viewed this way, the surface code is a crystal, in the abstract sense, where the unit cell is repeated in three dimensions. Logic is implemented with defects, or holes, in the repeated pattern [56, 57, 60], but the volume can still be accounted in terms of these unit cells, which is the methodology I use throughout this manuscript.

Figure 3.5: Diagram showing the unit cell of the surface code. (a) A 2D square array, where each circle represents a qubit. The open and filled circles play different roles in the surface code. A unit cell encompasses four qubits, two of each type. (b) A unit cell in the surface code includes two stabilizers. The red and blue “plus” shapes are four-body, nearest-neighbor stabilizer measurements of $X^{\otimes 4}$ or $Z^{\otimes 4}$ (see Refs. [60, 14] for details). Neighboring stabilizers are shown for illustration.

A relevant example of the resource overhead required for fault-tolerant quantum computing is the cost of making some cubic region of the surface code sufficiently reliable. First, let me explain some rules for surface code logic. Fowler, Devitt, and collaborators [94, 100] develop a simple set of design rules for spacing defects. For a given code distance $d$, the rules are:

  1. two defects (or other boundaries) of the same type must be separated by at least $d$;

  2. any defect must have circumference greater than or equal to $d$, so square defects must have side length at least $d/4$;

  3. given (1) and (2), two defects of different types must be separated by .

A simple strategy to follow these rules is to design braiding patterns using cubic regions of the surface code with side length $5d/4$. The allowable braiding patterns within such a cube form a finite set, known as “plumbing pieces,” because visually they are pipes that connect together [94, 100]. A simple estimate for the probability of error in a plumbing piece with distance $d$ is derived in Ref. [100]:

$$P_L(p, d) \lesssim d\,(100\,p)^{(d+1)/2}, \qquad (3.1)$$

where $p$ is the error per hardware gate and the factor 100 comes from numerical data fitting in Refs. [96, 14, 99].

The volume of a plumbing piece as a function of its logical error probability is plotted in Fig. 3.6. This type of plot will be used many times throughout this manuscript to quantify the resource cost of making a quantum program sufficiently reliable. The volume is measured in unit cells of the surface code, as discussed earlier. The only notable feature of this plot is that the resource scaling obeys a power law (dashed line) very well: $V \propto [\log(1/P_L)]^{2.84}$ unit cells. This is in close agreement with other findings that the “scaling exponent” should be 3 [94]. The exponent is less than 3 here only because of the coefficient in Eqn. (3.1), whose presence skews the estimated error rate upward. Indeed, the fitted exponent will approach 3 as $P_L \to 0$, but the plot in Fig. 3.6 only shows the range relevant to practical quantum computing. This is a good time to remark that power-law fits should only be used for estimating quantities like resources, not for revealing some deep meaning about quantum information.

Figure 3.6: Resource costs for a plumbing piece in the surface code. The hardware error rate is . Lower logical error rate is achieved by increasing code distance, which also increases the volume of the plumbing piece. A power law fit shows that volume scales with an exponent of 2.84.
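The power-law fit can be reproduced in outline with a short script. This is a minimal sketch. It assumes Eqn. (3.1) has the form $P_L \approx d\,(100p)^{(d+1)/2}$ and that a plumbing piece has a volume of $(5d/4)^3$ unit cells. It also uses an illustrative $p = 10^{-3}$ with odd distances 9 to 35. The hardware error rate and distance range behind Fig. 3.6 are not restated here, so the exponent it prints is indicative only. The function names are hypothetical.

```python
import math

def plumbing_error(p, d):
    """Estimated logical error of one plumbing piece (assumed form of Eqn. 3.1)."""
    return d * (100 * p) ** ((d + 1) / 2)

def plumbing_volume(d):
    """Volume of one plumbing piece in unit cells, assuming a cube of side 5d/4."""
    return (5 * d / 4) ** 3

def fitted_exponent(p, distances):
    """Least-squares slope of log V versus log log(1/P_L) over the given distances."""
    xs = [math.log(math.log(1 / plumbing_error(p, d))) for d in distances]
    ys = [math.log(plumbing_volume(d)) for d in distances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

gamma = fitted_exponent(1e-3, range(9, 37, 2))
print(f"fitted exponent: {gamma:.2f}")  # somewhat below 3 under these assumptions
```

Under these assumptions, the local slope dips below 3 at moderate distance because of the prefactor $d$. It then creeps back toward 3 as $d$ grows.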

On the subject of surface code scaling trends, I would also like to note that the error bound in Eqn. (3.1) may overstate error probability at low values of $d$, because there is numerical evidence that the surface code actually performs better than the asymptotic fits for low code distance [96, 14]. Hence a more accurate error rate (as a function of $d$) may come closer to the expected scaling exponent of 3. The reason for this behavior is that the edges and corners of the surface code become more important at low distance, and these stabilizers have lower weight (two or three, instead of four), which reduces the possible sources of error at the physical level.

A primary concern of this thesis is optimizing quantum logic, and any statement of improvement requires some point of reference. For comparison purposes, it is useful to define the worst-case resource cost for implementing a quantum program. Given a quantum program composed of some fundamental operations, the Trivial Upper Bound (TUB) is the resource cost associated with the simplest logic design. For example, one could make the probability of error in each fundamental operation so low that, when summed together, the total probability is small and the entire program is guaranteed to be reliable. This approach is usually not optimal, but it is a starting point that is useful for comparison. For a particular program, the difference between TUB and optimized logic shows how important logic synthesis can be.
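The Trivial Upper Bound can be illustrated by splitting a target failure probability evenly across every plumbing piece of a program (a union bound). The smallest code distance that meets the per-piece budget is then chosen. The sketch below is a minimal illustration under two assumptions: the plumbing-piece model $P_L \approx d\,(100p)^{(d+1)/2}$ from Eqn. (3.1), and a volume of $(5d/4)^3$ unit cells per piece. The helper names and the uniform budget split are hypothetical choices, not a prescribed procedure.

```python
def plumbing_error(p, d):
    """Estimated logical error per plumbing piece (assumed form of Eqn. 3.1)."""
    return d * (100 * p) ** ((d + 1) / 2)

def smallest_distance(p, budget):
    """Smallest odd code distance whose per-piece error is within budget."""
    if not 0 < 100 * p < 1:
        raise ValueError("model assumes hardware error p below 1e-2")
    d = 3
    while plumbing_error(p, d) > budget:
        d += 2
    return d

def trivial_upper_bound(p, n_pieces, target):
    """Give every plumbing piece an equal share of the target error (union bound)."""
    d = smallest_distance(p, target / n_pieces)
    volume = n_pieces * (5 * d / 4) ** 3   # total volume in unit cells
    return d, volume

d, volume = trivial_upper_bound(p=1e-3, n_pieces=1000, target=1e-2)
print(d, volume)  # 13 4291015.625
```

Optimized logic synthesis is measured against a baseline of this kind. The gap between the two is what the later chapters quantify.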

Chapter 4 Quantum Logic Synthesis

The purpose of logic synthesis is to execute a quantum program in a way that minimizes resource costs. The previous chapter introduced quantum programs to encapsulate quantum logic, diagrams to depict quantum logic, and ways to estimate resources. These are the tools required for logic synthesis. This chapter gives an overview of the common synthesis techniques, while the subsequent chapters provide detailed examples with resource analysis.

4.1 Generalized Teleportation Gates

A crucial development for fault-tolerant quantum logic was the teleportation gate [1, 65, 47, 124]. Instead of teleporting a quantum state from one position to another, this procedure implements a logical gate using a sequence of operations fueled by a special quantum state. In effect, the quantum state changes through teleportation, even though it may not change its physical location. The novelty of this proposal is that a gate can be encoded into a quantum state, so long as one knows how to “read” this information.

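Let me introduce the notion of a “quantum look-up table” (QLUT). Take any $N$-dimensional unitary operator $U$ and represent it in the spectral decomposition using eigenvalues $\lambda_j$ and eigenvectors $|\psi_j\rangle$:

$$U = \sum_{j=1}^{N} \lambda_j\, |\psi_j\rangle\langle\psi_j|. \qquad (4.1)$$

Let $|\Psi\rangle = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} |\psi_j\rangle$ be a uniform superposition over the eigenvectors. The QLUT for $U$ is

$$|\Psi_U\rangle = U|\Psi\rangle = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \lambda_j\, |\psi_j\rangle. \qquad (4.2)$$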
There is a clear similarity between the right-hand sides of Eqns. (4.1) and (4.2). The reason I call this a “look-up table” is that the QLUT is a state that encodes the action of the unitary. For any eigenvector of the unitary, the QLUT stores the associated eigenvalue as the amplitude of that eigenvector. In many contexts, these are also called “magic states,” for precisely the same reason. For example, the magic state for the $T$ gate is $T|+\rangle$, where $|+\rangle$ is the uniform superposition over the eigenvectors of $T$.
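Equation (4.2) can be checked numerically for small gates. The sketch below builds the QLUT directly from a list of eigenvalues and eigenvectors using plain Python complex lists; the function name `qlut` is hypothetical. For the $T$ gate it reproduces $T|+\rangle$. For the Pauli $X$ gate, the eigenvectors are $|+\rangle$ and $|-\rangle$, so the uniform superposition is $|0\rangle$ and the QLUT is $X|0\rangle = |1\rangle$.

```python
import cmath
import math

def qlut(eigenvalues, eigenvectors):
    """Return (1/sqrt(N)) * sum_j lambda_j |psi_j> as a list of complex amplitudes."""
    n = len(eigenvalues)
    dim = len(eigenvectors[0])
    state = [0j] * dim
    for lam, vec in zip(eigenvalues, eigenvectors):
        for i in range(dim):
            state[i] += lam * vec[i] / math.sqrt(n)
    return state

# T gate: eigenvectors |0>, |1> with eigenvalues 1, e^{i pi/4}
t_qlut = qlut([1, cmath.exp(1j * math.pi / 4)], [[1, 0], [0, 1]])

# Pauli X: eigenvectors |+>, |-> with eigenvalues +1, -1
s = 1 / math.sqrt(2)
x_qlut = qlut([1, -1], [[s, s], [s, -s]])

print(t_qlut)  # amplitudes of (|0> + e^{i pi/4}|1>)/sqrt(2)
print(x_qlut)  # approximately [0, 1], i.e. |1>
```

The $X$ example shows that the QLUT depends on the eigenbasis. The same operator yields a computational-basis state once the uniform superposition is taken over $|\pm\rangle$.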

One way to compile a quantum program into a QLUT is to begin with a teleportation circuit whose ancilla input is the uniform superposition over the eigenvectors. Specifically, the circuit teleports an arbitrary qubit onto this ancilla, then implements the desired unitary. This process is depicted in Fig. 4.1(a) for the $T$ gate, whose ancilla is $|+\rangle$. The QLUT may be formed by using commutation rules to move the gate to before the teleportation circuit, where it acts on the ancilla, as in Fig. 4.1(b) (cf. [7], p. 487). In general, this commutation step modifies the teleportation procedure, so it is crucial that the new circuit has an efficient fault-tolerant construction. Developing other general procedures for designing teleportation gates is an area of future research.

Figure 4.1: Circuit technique for creating the $T$-gate QLUT. (a) Generic teleportation, followed by a $T$ gate. (b) The $T$ gate is moved (using commutation rules) to act on the ancilla $|+\rangle$ before teleportation, forming a QLUT. Here, the commutation affects the conditional operation, as $TX = e^{-i\pi/4} SXT$ and $TZ = ZT$. The global phase is dropped.
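The commutation rules behind Fig. 4.1(b), $TX = e^{-i\pi/4} SXT$ and $TZ = ZT$, can be verified with 2×2 matrices. This short check is written in plain Python; the helper names are hypothetical. It confirms why moving $T$ ahead of the teleportation turns the conditional $X$ correction into a conditional $SX$ correction, up to a global phase, while leaving the conditional $Z$ correction unchanged.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

w = cmath.exp(1j * math.pi / 4)
T = [[1, 0], [0, w]]
S = [[1, 0], [0, 1j]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

phase = cmath.exp(-1j * math.pi / 4)
sxt = matmul(matmul(S, X), T)
print(close(matmul(T, X), [[phase * v for v in row] for row in sxt]))  # True
print(close(matmul(T, Z), matmul(Z, T)))                               # True
```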

4.2 Off-Line Validation and Fault Tolerance

The technique of compiling a quantum program into a QLUT can take much of the computational effort off of the data path. The data path is the sequence of operations which come in direct contact with data qubits in an algorithm. If there is a failure here, the data is corrupted. By contrast, operations off of the data path (“off-line”) may be expendable; if an error is detected, the faulty states are removed without affecting the rest of the computation. Reference [12] also discusses how off-line preparation of QLUTs enables fast computation.

A QLUT can be compiled in a faulty manner, then validated using a procedure that checks for error in the QLUT. Using the quantum measurement postulate, successful validation projects the QLUT into a higher-fidelity state. This is essentially a variant of post-selected quantum computation [101]. Fault tolerance is achieved by bringing the QLUT to sufficient fidelity for interaction on the data path.

At first glance, the strategy of moving quantum programs into QLUTs would appear to just redistribute the effort of error correction from one place to another. However, the ability to discard states which fail validation is quite valuable. Validation only requires error detection instead of correction, and the former is more efficient. For component error probability $p$, a distance-$d$ code can correct $\lfloor (d-1)/2 \rfloor$ errors, leading to an output error of order $p^{\lfloor (d+1)/2 \rfloor}$ [7]. By contrast, the same code can detect $d-1$ errors, leading to a validated output state with error of order $p^{d}$. Moreover, error detection is almost always less taxing on classical control hardware, which can be a concern in some contexts [20]. For these reasons, performing validation can lead to substantial reductions in the overhead for fault-tolerant computation.
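The counting argument can be made concrete. The sketch below assumes that every fault location fails independently with probability $p$, so that leading-order failure scales as $p$ raised to the minimum number of faults. Combinatorial prefactors are ignored. It compares the two ways of using a distance-$d$ code; the helper names are hypothetical.

```python
def correction_failure_weight(d):
    """Fewest faults that can defeat a distance-d code used for correction."""
    correctable = (d - 1) // 2        # errors guaranteed to be corrected
    return correctable + 1

def detection_failure_weight(d):
    """Fewest faults that can slip through a distance-d code used for detection."""
    detectable = d - 1                # errors guaranteed to be detected
    return detectable + 1

p = 1e-3
for d in (3, 5, 7):
    c, e = correction_failure_weight(d), detection_failure_weight(d)
    print(f"d={d}: correction ~ p^{c} = {p**c:.0e}, detection ~ p^{e} = {p**e:.0e}")
```

For $d = 5$, correction fails at third order while detection fails only at fifth order. That gap is what off-line validation exploits, at the price of discarding rejected states.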

The steps for off-line logic synthesis are: (1) identify an important and frequently used quantum program; (2) compile this program into a teleportation gate using a QLUT; (3) develop an efficient procedure to validate the QLUT. This design methodology will be demonstrated repeatedly in Chapters 5 through 7. Chapter 9 will examine common features of these techniques, which may be useful both for developing new methods and for understanding limitations of this approach.

Chapter 5 Distillation Protocols

Distillation protocols are a special case of error detection where many noisy copies of a quantum state are “distilled” into fewer low-error copies of the same state. A common theme for this chapter is that, in many circumstances, an important but difficult operation can be encoded into a well-characterized quantum state, such as a quantum look-up table (QLUT; see Chapter 4). After injecting noisy copies of the desired state into the surface code, they are distilled before being used by computation. Error detection is often employed with a quantum code that uses only operations that are themselves error-corrected by the surface code (see Section 3.1 for a list). However, at the end of the chapter, I discuss Fourier-state distillation, which distills a special class of multi-qubit quantum states. This is a new protocol that relies on Toffoli gates, which are not native to the surface code. The Toffoli gates require techniques developed in Chapter 6, and this example shows that distillation can be applied to produce useful multi-qubit states beyond just satisfying the minimum requirements of universal computation.

Distillation protocols hold an important place in fault-tolerant quantum computing. For example, entanglement distillation demonstrated that arbitrarily long-range quantum entanglement was achievable in principle, using quantum repeaters [125, 126, 127]. The advent of magic-state distillation made the prospect of large-scale quantum computing more plausible [101]. There are alternative ways to achieve universal, fault-tolerant quantum computing [128, 129, 130, 131, 22, 3, 7, 25], but the magic-state techniques developed by Knill [132, 101] and Bravyi and Kitaev [104] are compatible with broader sets of codes, including the surface code.

When viewed as a quantum program, a distillation protocol takes many copies of the same state as inputs and returns fewer copies of the same state as outputs. By assumption, the input states have independent errors, which is essential for the technique. Moreover, it is often assumed that the errors are also identically distributed, but this is not necessary. Because the inputs and outputs are of the same form, distillation protocols can be executed recursively. Recursive distillation is needed when just one round does not purify the desired state to sufficiently low error probability. The different rounds have different requirements for error correction, and hence different resource costs.

Resource costs for magic-state distillation can dominate the total resources required for quantum computing [12, 13, 94, 14, 100]. A recursive distillation protocol used to make a single gate, such as $T$, requires very many fundamental gates in the surface code. Fowler and Devitt estimate that a single $T$ gate requires 46 times the surface code volume of a single CNOT [94]. Resource costs will be a central concern for this chapter, as the distillation protocols examined here will be the first concrete demonstrations of the techniques of logic synthesis.

5.1 Magic-State Distillation

Magic-state distillation purifies a quantum look-up table (QLUT) for a gate that is otherwise unavailable within the chosen code. For example, the surface code is usually implemented with two distinct types of magic-state distillation. The gates $S$ and $T$ are required for universal computation, and they may be produced using the magic states $|Y\rangle = S|+\rangle$ and $|A\rangle = T|+\rangle$, respectively [57, 60, 13, 14]. This section focuses on distilling $|A\rangle$ because this process is more costly than distilling $|Y\rangle$; however, $S$ gates are a necessary part of distillation, as discussed later.

There are many proposals for distilling $|A\rangle$ states [132, 104, 105, 106, 107], but I focus on the 15-to-1 Bravyi-Kitaev (or “BK”) protocol, named for the authors of Ref. [104]. The label “15-to-1” refers to the ratio of input states to output states, which is an important consideration for efficiency. A circuit diagram for the BK protocol is shown in Fig. 5.1. Each $T$ gate is produced using a copy of $|A\rangle$, as shown in Fig. 5.2, so the BK distillation protocol takes 15 copies of $|A\rangle$ as inputs. When each of the input states has independent error $\epsilon$, the distilled output state has error $35\epsilon^3$ to lowest non-vanishing order.

Figure 5.1: Circuit for distilling the magic state $|A\rangle = T|+\rangle$. Each $T$ gate is produced using $|A\rangle$, its QLUT, with the circuit in Fig. 5.2. The bottom qubit is the output state.

Figure 5.2: Circuit for teleporting a $T$ gate using the QLUT $|A\rangle = T|+\rangle$. The error in the gate depends on the error in the magic state.

The BK protocol has an important advantage over many other competing protocols: it distills only one output state. Other protocols [105, 106, 107] that distill two or more states within the same code block inevitably lead to correlated errors at the output. This poses a problem when one round of magic-state distillation is insufficient, so the output of the first round must be purified again. In such a scenario, states with correlated errors must fan out to different second-round distillation blocks, etc. By not having this issue, the BK protocol is much simpler to analyze. Still, recent analysis suggests there may be advantages to the multiple-output distillation methods [105, 106, 107], if the routing considerations can be addressed.
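The $35\epsilon^3$ figure can be checked by brute force. This sketch assumes the standard simplified error model for 15-to-1 distillation:

- each of the 15 input states independently carries a $Z$ error with probability $\epsilon$;
- the protocol accepts whenever all $X$-type checks are satisfied;
- the output is wrong when the accepted error pattern has odd weight.

Label the qubits by the nonzero 4-bit strings 1 to 15. A pattern passes the checks exactly when the XOR of its labels is zero, which makes the accepted patterns codewords of the [15, 11] Hamming code. The 35 weight-3 patterns among them give the leading term. This is an illustrative model of the distillation code only, not a simulation of the surface code circuit, and the function names are hypothetical.

```python
def bk_distillation(eps):
    """Exact post-selected output error and acceptance probability for 15-to-1
    distillation under independent Z errors of probability eps on each input."""
    accept = 0.0
    fail = 0.0
    for pattern in range(2 ** 15):
        syndrome, weight = 0, 0
        for j in range(15):
            if (pattern >> j) & 1:
                syndrome ^= j + 1          # qubit j carries the 4-bit label j + 1
                weight += 1
        if syndrome == 0:                  # all X-type checks pass: state is kept
            prob = eps ** weight * (1 - eps) ** (15 - weight)
            accept += prob
            if weight % 2 == 1:            # odd weight: logical error on output
                fail += prob
    return fail / accept, accept

def count_weight3_logical():
    """Number of weight-3 error patterns that pass every check."""
    labels = range(1, 16)
    return sum(1 for a in labels for b in labels for c in labels
               if a < b < c and a ^ b ^ c == 0)

out_err, p_accept = bk_distillation(1e-3)
print(count_weight3_logical())     # 35
print(out_err / (35 * 1e-3 ** 3))  # close to 1
print(p_accept)                    # close to 1 - 15 * 1e-3
```

The acceptance probability also shows where the first-order rejection rate of about $15\epsilon$ comes from: any single input error is caught and the block is discarded.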

5.1.1 Bravyi-Kitaev Distillation in Surface Code

Several works have analyzed the BK protocol assuming perfect Clifford gates [104, 133, 13] and the costs associated with making Clifford gates fault tolerant [56, 57, 94, 14, 100]. Using the implementation from Ref. [94] of the BK protocol in the surface code, one can estimate the resources required to implement sufficient error correction for this distillation routine. Moreover, the cost of a $T$ gate at any level of fidelity can be calculated by accounting for the costs of multi-round distillation, as explained below.

Following the methodology developed in Refs. [94, 100], one constructs programs in the surface code using regular-sized “plumbing pieces” (see also Chapter 3). Each piece occupies a cube in the surface code with side length $5d/4$. The probability of logical error in a single plumbing piece can be bounded from above by

$$P_L(p, d) \lesssim d\,(100\,p)^{(d+1)/2}, \qquad (5.1)$$

as derived in Ref. [100]. Subscript $L$ denotes logical error, $p$ is error-per-gate at the hardware level, $d$ is the distance of this implementation of the surface code, and the power-law scaling is a fit to numerical simulations of surface code error correction [93, 61, 96, 97]. The error at the output of BK distillation is therefore bounded by the sum of probabilities for distillation error from input states and for error in the distillation circuit: $\epsilon_{\mathrm{out}} \lesssim 35\epsilon_{\mathrm{in}}^3 + N\,P_L(p, d)$, where $N$ is the number of plumbing pieces and $\epsilon_{\mathrm{in}}$ is the error of the input $T$ gates. The volume, in unit cells of the surface code, is the product of the number of plumbing pieces and the volume of a single plumbing piece, which is $(5d/4)^3$ unit cells.

Fowler and Devitt constructed a version of the BK protocol in the surface code with 192 plumbing pieces [94]; however, this work considers the volume to be slightly larger. An important issue for distilling $|A\rangle$ states in the surface code is that $T$ gates are implemented using the teleportation circuit in Fig. 5.2, which may also require an $S$-gate correction. $S$ is not a native operation in the surface code, but it is still relatively inexpensive since it can be catalyzed by $|Y\rangle$ without destroying the magic state [134, 13], as shown in Fig. 5.3. Moreover, the additional overhead is small because each $S$ gate need only have an error rate on the same order as the input error of the $T$ gates, and hence lower code distance can be used for $S$ gates. I estimate that the operations for these $S$ gates can be implemented in a depth of two plumbing pieces (there was already one allocated in the volume estimate above), making the total volume 224 plumbing pieces for the Bravyi-Kitaev distillation protocol, as shown in Fig. 5.4. A bounding box serves as a guide to how the volume is estimated.

Figure 5.3: Circuit for producing an $S$ gate using the QLUT $|Y\rangle = S|+\rangle$. The circuit does not destroy the ancilla qubit, and the resulting Pauli correction is recorded in the Pauli frame.

Figure 5.4: Bravyi-Kitaev 15-to-1 distillation implemented in the surface code. (a) Circuit diagram for BK distillation from Fig. 5.1, repeated here for convenience. (b) Surface code braiding pattern, derived in Ref. [94]. The horizontal primal defects (yellow pipes) correspond to qubits on the left. Dual defects (black pipes) implement logical CNOT gates. The colored pyramids on the right are state injection, corresponding to the $T$ gate followed by the final measurement in the circuit. The bottom pair of primal defects is the output qubit. The volume is 224 plumbing pieces. Extra volume is allotted for the conditional $S$ gates in the circuit diagram, as explained in the text.

5.1.2 Resource Analysis for Bravyi-Kitaev Distillation

Determining the best combination of Bravyi-Kitaev distillation protocols at different code distances is a resource optimization problem. In the first round of distillation, increasing code distance will increase volume and lower output error, until the probability of surface code error is negligible in comparison to the error from faulty input states. To move beyond this limit, one must use two rounds of distillation. Since the inputs to the second round require distillation, the total volume will be 15 times the volume in the first round, plus the volume of the second round. For a specified number of rounds, let $\epsilon_0$ denote the injected state error probability, $\epsilon_1$ the probability of error after one round of distillation, and so on, and let $d_k$ be the code distance used in round $k$. There is also a nonzero probability that any distillation circuit will fail, requiring repetition. I account for this by multiplying volume by $1/(1 - P_{\mathrm{fail}})$, which gives the mean volume including repeated distillation. For BK distillation in round $k$, $P_{\mathrm{fail}} \approx 15\epsilon_{k-1}$, the probability that one of the 15 input states has an error. The approximate volume $V_k$ and output error $\epsilon_k$ after round $k$ are given by the recursion relations:

$$V_k = \frac{15\,V_{k-1} + 224\,(5d_k/4)^3}{1 - 15\,\epsilon_{k-1}}, \qquad V_0 = 0, \qquad (5.2)$$

$$\epsilon_k = 35\,\epsilon_{k-1}^3 + 224\,P_L(p, d_k), \qquad (5.3)$$

with $P_L$ from Eqn. (5.1). The factor 224 is the estimated size of the surface code program, in plumbing pieces. The factor 15 is the number of copies of round-$(k-1)$ distillation needed to feed into one instance of round-$k$ distillation.

Using the formulas in Eqns. (5.2) and (5.3), I calculated all possible combinations of BK distillation volume (in unit cells) and output error rate, as explained below. The number of rounds ranged from one to three, the code distance in each round was varied independently over a range of odd values, and I calculated results for several values of the hardware gate error $p$. By conventional assumption, the injected states used by the first round have $\epsilon_0 = 10p$ [100], where the factor 10 accounts for the number of faulty hardware operations during injection and before error correction. To narrow focus to useful results, I only make note of protocols on the “efficient frontier,” which consists of those protocols (each having a unique combination of parameters) that are not dominated by any other protocol. In terms of performance, one protocol dominates another if the first has both lower volume and lower output error rate; there is no reason to use a dominated protocol. The results for one representative value of $p$ are shown in Fig. 5.5. The results for other values of $p$ show effectively the same behavior, so they are not plotted. In addition to this plot, Table 5.1 gives the estimated resource costs when using BK distillation for different hardware error rates $p$ and target output error rates $\epsilon_{\mathrm{target}}$. These can be compared with tables in Refs. [105, 106, 107, 100], most of which only consider cost in number of input magic states. The difference between my results and those in Ref. [100] is due mostly to my definition of a unit cell, which contains four qubits when ancillas are used for stabilizer measurement. Additionally, I estimate a slightly larger volume for BK distillation in the surface code (224 vs. 192 plumbing pieces).
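The efficient-frontier filter can be written as a single pass over protocols sorted by volume. A protocol survives only if its output error is strictly lower than that of every protocol that sorts before it. The sketch below works on any list of (volume, error, label) tuples. Such tuples could come from evaluating the recursion of Eqns. (5.2) and (5.3) for each choice of rounds and distances. The sample numbers are made up purely to exercise the function, and the function name is hypothetical. One convention is assumed: a protocol that only ties a cheaper one in error is dropped as well.

```python
def efficient_frontier(protocols):
    """Return the (volume, error, label) tuples that are not dominated.

    A protocol is kept only if its error is strictly below that of every
    protocol sorting before it (lower volume, or equal volume and lower error).
    """
    frontier = []
    best_error = float("inf")
    for volume, error, label in sorted(protocols):   # ascending volume, then error
        if error < best_error:
            frontier.append((volume, error, label))
            best_error = error
    return frontier

# Hypothetical (volume, error, label) data, not results from the text.
candidates = [
    (1.0e6, 1e-4, "A"),
    (1.5e6, 2e-4, "B"),   # more volume and more error than A: dominated
    (2.0e6, 1e-6, "C"),
    (3.0e6, 1e-6, "D"),   # same error as C at higher volume: dropped by convention
    (4.0e6, 1e-9, "E"),
]
print([label for _, _, label in efficient_frontier(candidates)])  # ['A', 'C', 'E']
```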

Figure 5.5: Resource costs for Bravyi-Kitaev distillation protocols. Color denotes the number of rounds of distillation. For each of one, two or three rounds, the efficient frontier is plotted. Dashed lines show protocols dominated by a protocol having a different number of rounds, so the efficient frontier considering any number of rounds is the union of the solid lines.
Volume (unit cells)
(13) (9)
(15) (9) (7) (5)
(11,17) (11) (9) (7) (5)
(11,19) (7,13) (9) (7) (5)
(13,21) (7,15) (11) (9) (7)
(13,23) (9,15) (7,11) (9) (7)
(13,27) (9,17) (7,13) (5,9) (9)
(15,27) (9,19) (7,13) (5,11) (9)
(15,31) (9,19) (7,15) (5,11) (5,9)
(11,15,33) (11,21) (7,15) (5,13) (5,11)
(11,17,33) (11,23) (7,17) (7,13) (5,11)
(11,17,35) (11,23) (9,17) (7,13) (5,11)
(11,19,37) (13,25) (9,19) (7,15) (5,13)
Key: Level-1 BK Level-2 BK Level-3 BK
Table 5.1: Resources for Bravyi-Kitaev magic-state distillation as a function of hardware gate error and target output-state error. Volume is given in unit cells of the surface code. Beneath volume, the code distance for each round is given. The background color for each protocol provides a visual guide to the number of rounds. Each reported protocol has the lowest resource cost for the given gate error while producing an output state with error below the target. Injected magic states have error equal to ten times the gate error. Empty cells do not require distillation because the injected-state error is at or below the target.

In the numerical optimization above, I allowed the distance in each round of distillation to be independent of the other rounds, as suggested in Ref. [14]. The protocols on the efficient frontier approximately double the code distance from one round to the next. This optimization lowers the burden of error correction: early rounds process states that still have comparatively high error, so a modest code distance already makes the probability of surface code failure negligible compared to the distillation error term, and only the final round needs the strongest error correction. To see what difference this optimization makes, consider what happens when the distance is the same in all rounds. The code distance will then need to be large enough for the total probability of logical error in the surface code to be well below the target error probability, which makes the entire distillation procedure very expensive. Figure 5.6 shows the efficient frontier from Fig. 5.5 compared to this naive approach.

Figure 5.6: Scaling of efficient and naive programs for BK distillation. The lower blue trace is the efficient frontier for multi-round BK distillation with code distance varying by round. The black dashed line is a power law fit with exponent 3.27. The red trace is the trivial upper bound, i.e. the naive approach with all rounds of distillation using the same code distance; red overlaps blue for just one round of distillation. The gray dashed line shows a power law fit with exponent 5.17. For both solid traces, the abrupt jumps in volume occur where the distillation procedure must use an additional round.

As mentioned in the introduction, I use a Trivial Upper Bound (TUB) for each program to demonstrate the importance of logic synthesis. In Fig. 5.6, the naive approach serves as the TUB for magic-state distillation. The power-law fits show how the volume required by each method scales as the target output error ε is lowered: the optimized logic requires a surface code volume of approximately [log(1/ε)]^3.27 unit cells (up to a constant factor), while the TUB requires approximately [log(1/ε)]^5.17. As demonstrated in Section 3.3, error per plumbing piece in the surface code scales with exponent 2.84, and the asymptotic scaling exponent of BK distillation is γ = log_3 15 ≈ 2.46. Because the naive protocol runs every operation at the full final code distance, a simple guess would suppose that its scaling exponent is the sum of these, 2.84 + 2.46 ≈ 5.30, which is rather close to the fitted value of 5.17.
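The additive guess can be written out explicitly. As a rough sketch, assume the naive volume is the number of plumbing operations, which grows like the BK input count, times the volume of one plumbing piece run at the final code distance, and write L = log(1/ε):

```latex
V_{\mathrm{naive}}(\epsilon) \;\propto\;
\underbrace{L^{\gamma}}_{\text{BK operations}} \times \underbrace{L^{2.84}}_{\text{one plumbing piece}}
\;=\; L^{\gamma + 2.84},
\qquad \gamma = \log_3 15 \approx 2.46
\;\;\Longrightarrow\;\; V_{\mathrm{naive}} \propto L^{5.30}.
```

The optimized frontier avoids paying the full L^2.84 factor in the early rounds, which is why its fitted exponent, 3.27, sits much closer to 2.84.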

The optimized protocol has scaling exponent 3.27, which is surprisingly close to the lower bound of 2.84 for a surface code operation. Compressing early rounds of distillation by lowering code distance works very well to improve performance over the TUB. For example, at low target output error, the optimized protocols reduce resources by about a factor of 10. The resource estimates in Ref. [13] and Chapter 2 used the naive method; moreover, T-gate production was found to dominate those resource totals. Using the optimized BK distillation protocols examined here would dramatically lower the estimated costs to execute those quantum algorithms. This is the first example of how logic synthesis can reduce the overhead of fault tolerance by a sizable factor relative to naive constructions; this chapter and those that follow examine several more.

5.1.3 Alternative Protocols

Many other magic-state distillation protocols have been developed but are not analyzed here. Notable examples are:

  • the 7-to-1 protocol for distilling |Y⟩ states with output error cubic in the input error [56, 57, 60, 94];

  • a 2-to-1, protocol for  [134];

  • the Meier-Eastin-Knill 10-to-2 protocol, with output error quadratic in the input error, for distilling |H⟩ (which can be converted to a protocol for distilling T-gate magic states) [105];

  • Bravyi-Haah triorthogonal codes distilling (3k + 8)-to-k with output error quadratic in the input error [106];

  • Landahl-Cesare generalized Reed-Muller codes distilling the family of states |0⟩ + e^{iπ/2^k}|1⟩ (unnormalized) for integers k [135];

  • a distillation protocol for Toffoli magic states [134];

  • a block-code distillation protocol for distilling controlled-controlled-Z (locally equivalent to Toffoli) [107, 123].

In addition to analysis in the original proposals, there has been work to understand the resource costs associated with fault tolerance for some of these protocols. Several researchers have considered the limits on errors in magic states for distillation to succeed [133, 136]. Reference [137] considers the effectiveness of magic-state distillation using faulty gates. An equivalent form of the triorthogonal codes called “block codes,” implemented with surface code error correction, was analyzed in Ref. [100]. Finally, it is worth mentioning that a handful of codes, such as the Steane code and some topological color codes, use a fault-tolerant state injection method rather than magic-state distillation. While this technique is appealing for its simplicity, the surface code still appears to have better performance for the reasons outlined in Section 2.3.

5.2 Multilevel Distillation

I developed another distillation protocol for |H⟩ states (eigenstates of the Hadamard operator), or for T-gate magic states through slight modification, to probe the limits of magic-state distillation [107]. Let the efficiency of a distillation protocol be measured only in terms of input and output states through the scaling exponent γ, defined by n/k = O(log^γ(1/ε)), where n is the number of input states, k is the number of output states, and the output error is ε. In their work on triorthogonal codes, Bravyi and Haah conjectured that γ ≥ 1 [106]. All previous distillation methods obey this limit, but no earlier protocol for these states approached γ = 1. The multilevel protocols described below come arbitrarily close to γ = 1 in certain limits, which is interesting theoretically. However, I ultimately conclude that these methods are probably not useful for quantum computing. This is an instructive example for logic synthesis because it shows that narrowly improving one aspect of fault-tolerant computation may not be effective at lowering overall resource costs. Still, the techniques developed below may be useful in other applications.

5.2.1 Block Codes with Transversal Hadamard

As a preliminary step, I define a family of CSS quantum codes that encode k logical qubits, where k is even, using k + 4 physical qubits. Furthermore, these codes possess a transversal Hadamard operation, so I call them collectively “H codes” and denote by H_m the code using m = k + 4 physical qubits. Any H code may be defined as follows. The stabilizer generators are X_1X_2X_3X_4, Z_1Z_2Z_3Z_4, X_1X_2X_5X_6⋯X_m, and Z_1Z_2Z_5Z_6⋯Z_m, where subscripts index over physical qubits and the tensor product between Pauli operators is implicit. The logical Pauli operators (corresponding to the k logical qubits), denoted with an overbar and indexed by j = 1, …, k, are X̄_j = X_1X_3X_{j+4} and Z̄_j = Z_1Z_3Z_{j+4}. The Hadamard transform exchanges X and Z operators, so application of transversal Hadamard gates at the physical level enacts a transversal Hadamard operation at the logical level, which will be a useful property when I later concatenate these codes. All H codes have distance two, which means they can detect a single physical Pauli error. The product of two logical Pauli operators of the same type for two distinct logical qubits has weight two (the number of non-identity physical, single-qubit Pauli operators); the product of same-type Pauli operators on all logical qubits is also weight-two at the physical level. Because the stabilizers come in matched X/Z pairs that together cover every physical qubit, every single-qubit error anticommutes with at least one stabilizer, so there are no weight-one logical operators.
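The defining properties listed above can be checked mechanically. The sketch below is an illustration, not code from the thesis. It assumes the generators X_1X_2X_3X_4, Z_1Z_2Z_3Z_4, X_1X_2X_5⋯X_m, Z_1Z_2Z_5⋯Z_m and the logicals X̄_j = X_1X_3X_{j+4}, Z̄_j = Z_1Z_3Z_{j+4}, with qubits re-indexed from 0. Since the X- and Z-type operators share supports, each operator is stored as a set of qubit indices, and an X-type and a Z-type operator commute exactly when their supports overlap on an even number of qubits. The helper names (`h_code`, `commute`, `pauli_product`, `check_h_code`) are hypothetical.

```python
def h_code(m):
    """Hypothetical helper: supports of the [[m, m-4, 2]] H code (m even, m >= 6).

    Qubits are 0-indexed: 0-3 form the preamble, 4..m-1 the index qubits.
    X- and Z-type generators share supports, so one list serves both types.
    """
    if m < 6 or m % 2:
        raise ValueError("m must be even and at least 6")
    stabilizers = [frozenset({0, 1, 2, 3}), frozenset({0, 1, *range(4, m)})]
    logicals = [frozenset({0, 2, 4 + j}) for j in range(m - 4)]
    return stabilizers, logicals

def commute(x_support, z_support):
    """An X-type and a Z-type Pauli commute iff their supports overlap evenly."""
    return len(set(x_support) & set(z_support)) % 2 == 0

def pauli_product(*supports):
    """Support of a product of same-type Paulis (symmetric difference)."""
    out = frozenset()
    for s in supports:
        out ^= s
    return out

def check_h_code(m):
    stabs, logs = h_code(m)
    k = len(logs)
    # X-type generators commute with Z-type generators (identical supports).
    assert all(commute(a, b) for a in stabs for b in stabs)
    # Logical operators commute with all stabilizers of the opposite type.
    assert all(commute(op, s) for op in logs for s in stabs)
    # Xbar_i and Zbar_j anticommute exactly when i == j.
    assert all(commute(logs[i], logs[j]) == (i != j)
               for i in range(k) for j in range(k))
    # Every single-qubit error anticommutes with some stabilizer (distance >= 2).
    assert all(any(not commute({q}, s) for s in stabs) for q in range(m))
    # Distance is exactly 2: the product of two logicals has weight two.
    assert len(pauli_product(logs[0], logs[1])) == 2
    return True

stabs, logs = h_code(8)
print(check_h_code(8))                          # True
print(sorted(pauli_product(*logs, stabs[1])))   # [0, 1]
```

The last line multiplies all same-type logicals together with the second stabilizer, confirming that the product over all encoded qubits reduces to the weight-two operator on qubits 0 and 1 of the preamble.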

The +1 eigenstate of the Hadamard operator, denoted |H⟩, is a magic state for universal quantum computing [132, 104, 133, 105, 106]. In particular, two of these magic states can be consumed to implement a controlled-Hadamard operation [132, 105], enabling one to measure in the eigenbasis of the Hadamard operator (see Fig. 5.7(a)). The distillation procedure is as follows: (a) encode k faulty magic states in an H code; (b) measure in the basis of the transversal Hadamard gate by consuming 2(k + 4) ancilla magic states; (c) reject the output states if either the Hadamard-measurement or code-stabilizer circuits detect an error. For example, when an H_m code is used for distillation, k = m − 4 states are encoded as logical qubits using m physical qubits. Each transversal controlled-Hadamard gate consumes two |H⟩ states [105], and this gate is applied to all m physical qubits, which results in the (3k + 8)-to-k input/output distillation efficiency of these codes. A diagram of the quantum circuit for distillation using an H code is shown in Fig. 5.7(b).
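Tallying the states consumed in steps (a) and (b) for a code with k encoded qubits gives the quoted efficiency; the count below is only bookkeeping of the procedure just described.

```latex
n_{\mathrm{in}} = \underbrace{k}_{\text{encoded states}} + \underbrace{2(k+4)}_{\text{two per physical qubit}} = 3k + 8,
\qquad n_{\mathrm{out}} = k,
\qquad \frac{n_{\mathrm{in}}}{n_{\mathrm{out}}} = 3 + \frac{8}{k} \;\xrightarrow{\;k\to\infty\;}\; 3 .
```

For instance, the smallest member of the family, H_6 (k = 2), is a 14-to-2 distiller.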

Figure 5.7: Distillation of magic states using an  code. (a) Controlled-Hadamard gate is constructed using  gates, each of which requires one state (or ). (b) Initial states (left) are encoded with four additional qubits, initialized to here. The boxes “Encode” and “Decode” represent quantum circuits for encoding and decoding, which are not shown here. Modified from version published in Ref. [107].

5.2.2 Multilevel Protocols

Multilevel distillation uses concatenated codes with transversal Hadamard for distillation, in such a manner that the protocol uses two classes of input magic states, where the classes have different levels of infidelity and enter at different concatenation levels in the code. The ancillas consumed for transversal controlled-Hadamard measurement are of lower fidelity than the encoded logical states being distilled. When two quantum codes with transversal Hadamard are concatenated, the resulting code also has transversal Hadamard. Under appropriate conditions, the distance of the concatenated code is the product of the distances for the individual codes: d = d_1 d_2 [105]. Thus the concatenation of two H codes yields a distance-4 code with transversal Hadamard, and r-level concatenation has distance 2^r.

The concatenation conditions for H codes are that, through all levels of concatenation, any pair of physical qubits has at most one encoding block (at any level) in common. The reasons for this restriction are that logical errors in the same block are correlated and that the statement above regarding distance multiplying through concatenation assumes independence of errors. Consequently, two logical qubits from the same encoding block can never be paired again in a different encoding block. The required arrangement of qubits can be given a geometric interpretation. Arrange all physical qubits at points on a Cartesian grid in the shape of a rectangular solid, with the number of dimensions given by the number of levels of concatenation. A square, cube, or hypercube are possible examples at dimensionality two, three, or four. Each dimension is associated with a level of concatenation, and there must be an even number of qubits (at least six) in each dimension to form an H code. Construct H codes in the first dimension by forming an encoding block from each line of qubits in this direction, as in Fig. 5.8(a). Each line of m qubits in this direction gives rise to m − 4 logical qubits. Repeat this procedure by grouping these first-level logical qubits in lines along the second dimension to form logical qubits in a two-level concatenated code, as in Fig. 5.8(b). Continuing in this fashion through all dimensions ensures that any pair of qubits has at most one encoding block in common.

Figure 5.8: Concatenation of H codes. (a) Six physical qubits are coupled into an H_6 code with two logical qubits. (b) An array of physical qubits is coupled into a concatenated two-level H code. Originally published in Ref. [107]. ©2013 American Physical Society.

As with the H codes, multilevel codes use a transversal logical Hadamard-basis measurement to detect whether any one encoded qubit has an error (an even number of encoded errors would not be detected). If the logical states have independent error probabilities p, then the distilled states will have infidelity O(p^2) with perfect distillation. One must also consider whether the Hadamard-basis measurement has an error. For a two-level code arranged as a square of side length m, the transversal controlled-Hadamard gates at the lowest physical level require 2m^2 magic states, each of which has infidelity q. However, this is a distance-4 code, so for independent input error rates, the probability of failing to detect errors at the physical level is O(q^4) (analysis is provided in Section 5.2.3). The code can detect more errors in the magic states at the lower physical level, so these states can be of lower fidelity than the magic states encoded as logical qubits and still successfully perform distillation. This is the essential distinction between multilevel distillation and all prior distillation protocols. When multiple rounds of distillation are required [13, 14], low-fidelity magic states are less expensive to produce, so multilevel protocols achieve higher efficiency.

Multilevel distillation protocols are applied in rounds, beginning with a small protocol (such as an H code) and progressing to concatenated multilevel codes. Let us denote the output infidelity from a single round by a function of the input error probabilities, parameterized by the dimensionality r (number of levels of concatenation) and by the sizes m_1, …, m_r of each dimension, which need not all be the same. As before, p and q refer to the independent error probabilities on logical and physical magic states, respectively. A typical progression of rounds, starting from a source of states with infidelity ε, uses a one-level distiller in the first round, a two-level distiller in the second round, and so on, with the outputs of each round supplying the logical inputs to the next.

Multilevel protocols tend to be much larger in both qubits and gates than other protocols. Because there can be many encoded qubits, the protocol is still very efficient, but the size of the overall circuit may be a concern for some quantum computing architectures. At any number of levels, the distilled output states have correlated errors, so magic-state qubits distilled in the same multilevel distiller must never meet again in a subsequent distillation circuit (errors must be independent within the same encoding block, as in Refs. [105, 106]). Let us suppose that one performs two rounds of distillation, where the first round uses one-level distillers with k_1 encoded magic states and the second round uses two-level distillers with k_2 encoded states. Because the inputs to each distiller in the second round must have independent errors, there must be k_2 independent distillation blocks in the first round. Therefore, to distill k_1k_2 output states through two rounds, the number of input states is


Consider a similar sequence through rounds with each distiller in round having encoded qubits. The total number of logical magic states is to ensure that errors are independent between logical magic states in every round. In the first round, the number of consumed magic states is ; in any subsequent round , the number of consumed magic states is (recall that the Hadamard measurement is implemented times, meaning it is repeated for ). The total number of input magic states can thus be expressed as


For two rounds, this reproduces Eqn. (5.4). What also becomes clear is that the total size of multilevel protocols becomes unwieldy as the number of rounds and the block sizes increase; even modest parameter choices require an enormous number of input magic states, and a comparable number of gates, to distill the output magic states. In general, the most efficient multilevel distillation protocols use large code blocks and multiple rounds, where efficiency is measured by the ratio of low-fidelity input states consumed to yield a single high-fidelity output. Because of the complexity of such protocols, the greatest benefit from their application is seen in large-scale quantum computing, where a typical algorithm run may require a very large number of magic states, each with very low error probability [13]. It may be possible for alternative designs to circumvent these issues. If the first round uses a different protocol without correlated errors across logical magic states, such as Bravyi-Kitaev 15-to-1 distillation, then multiple distillation blocks are unnecessary in the second round using a two-level concatenated protocol, which would lead to smaller multi-round, multilevel protocols. Indeed, the results below show that optimal protocols found by numerical search happen to take this approach.

The scaling exponent of a distillation protocol characterizes its efficiency. Specifically, O(log^γ(1/ε)) input states are required to distill one magic state of infidelity ε. Scaling exponents for previous protocols are γ = log_3 15 ≈ 2.46 (“15-to-1” [132, 104]), γ = log_2 5 ≈ 2.32 (“10-to-2” [105]), and γ = log_2[(3k + 8)/k], which approaches log_2 3 ≈ 1.58 for large k (triorthogonal codes [106]). Moreover, Bravyi and Haah conjecture that no magic-state distillation protocol has γ < 1 [106]. In this work, if each round of distillation uses one higher level of concatenation in the multilevel protocols, then the number of consumed inputs doubles. In the appropriate limits (large code blocks and low-error physical-level states), multilevel protocols therefore require on the order of 2^r input states for each output state for r rounds of distillation, where the r-th round is a level-r distiller. The final infidelity is O(ε^(2^r)), so the scaling exponent approaches γ = 1 as r → ∞, which is the closest any protocol has come to reaching the conjectured bound. I later use numerical simulation to evaluate γ at error rates relevant to quantum computing.
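The exponents quoted above follow from iterating a single protocol. This is a minimal sketch that assumes a protocol consumes n_in states to emit n_out states, with output error proportional to the input error raised to a fixed order, and is repeated identically for r rounds. Inputs per output then grow as (n_in/n_out)^r while log(1/ε) grows as order^r, so γ = log(n_in/n_out)/log(order). The function name `scaling_exponent` is hypothetical, and the sketch does not model the mixed rounds of multilevel protocols.

```python
import math

def scaling_exponent(n_in, n_out, order):
    """Hypothetical helper: gamma for a protocol repeated identically each round.

    Each round multiplies inputs-per-output by n_in / n_out and raises the
    error to the power `order`, so after r rounds inputs-per-output is
    (n_in / n_out)**r while log(1/eps) grows as order**r. Hence
    inputs-per-output ~ log(1/eps)**gamma, gamma = log(n_in/n_out)/log(order).
    """
    return math.log(n_in / n_out) / math.log(order)

print(round(scaling_exponent(15, 1, 3), 2))  # 2.46  Bravyi-Kitaev 15-to-1
print(round(scaling_exponent(10, 2, 2), 2))  # 2.32  Meier-Eastin-Knill 10-to-2

# Triorthogonal (3k+8)-to-k codes approach log2(3) ~ 1.58 as k grows.
for k in (2, 8, 32, 1000):
    print(k, round(scaling_exponent(3 * k + 8, k, 2), 3))
```

The loop shows why triorthogonal codes stall near 1.58: their per-round input overhead never drops below 3 while the error order stays at 2.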

5.2.3 Error Analysis for Multilevel Distillation

For simplicity, make the conventional assumption that all quantum circuit operations are perfect, except for the initial magic states being distilled. This is a valid approximation if all operations are performed using fault-tolerant quantum error correction where the logical gate error is far below the final infidelity for distilled magic states [57, 13, 14]; for a more explicit construction of fault-tolerant distillation circuits, see Ref. [94]. Additionally, following the methodology in Refs. [104, 105], one can consider each magic state with infidelity ε as the mixed state (1 − ε)|H⟩⟨H| + ε|H^⊥⟩⟨H^⊥|, where |H⟩ is the +1 eigenstate of the Hadamard operation and |H^⊥⟩ is the orthogonal eigenstate.

Determining the infidelity at the output of distillation becomes simply a matter of counting the distinct ways that errors lead to the circuit incorrectly accepting faulty states, which is aided by the geometric picture from Section 5.2.2. It is essential that the error probabilities p (for encoded magic states) and q (for physical-level magic states) are independent for each input magic state. Then a one-level, (3k + 8)-to-k distiller using the H_{k+4} code has output error rate on each state of

$$\epsilon_{\mathrm{out}} \approx (k-1)\,p^2 + 2(k+1)\,q^2,$$

where higher-order terms are omitted. The first term counts undetected pairs of encoded-state errors, and the second counts undetected pairs of physical-level errors. The numerical results justify the use of lowest-order approximations, as higher-order terms are negligible in optimally efficient protocols. The lowest-order error terms are both second order because the Hadamard-basis measurement and the code can together detect a single error in any magic state. The probability of the distiller detecting an error, in which case the output is discarded, is approximately kp + 2(k + 4)q. If p = q, then the output error rate of (3k + 1)p^2 conditioned on success is the same as in Ref. [106]. Using the two-level distiller constructed from concatenated H codes, the output infidelity for each distilled state is


The probability of the two-level distiller detecting an error is


Similar error suppression extends to higher multilevel protocols, as examined in more detail below.

The multilevel codes analyzed here use concatenated H codes, though other codes could be concatenated. When two H codes are concatenated, the logical qubits of the first level of encoding are used as physical qubits for completely distinct codes at the second level. Consider a two-level scheme: if the codes at the first and second levels are H_{m_1} and H_{m_2}, respectively, then the concatenated code uses m_1m_2 physical qubits to encode (m_1 − 4)(m_2 − 4) logical qubits, as shown in Fig. 5.8(b). This process can be extended to higher levels of concatenation.

Determining the potential errors and their likelihood in multilevel protocols requires careful analysis. Let us enumerate the error configurations which are detected by the protocols; the error probability is given by summing the probability of all error configurations that are not detected and that lead to error(s) in the encoded states. As a first step, the analysis of multilevel codes is simplified by considering each input magic state to the quantum computer as having an independent probability of error, as discussed in Refs. [104, 105]. Hence only one type of error stems from each magic state used in the protocol.

Identifying undetected error events in multilevel distillation, which lead to output error, is aided by the geometric picture introduced before. Qubits which will form the code are arranged in a rectangular solid, then grouped in lines along each dimension for encoding. There are two error-detecting steps which together implement distillation: the Hadamard-basis measurement and the error detection of the H codes. The Hadamard measurement registers an error for odd parity in the total number of encoded-state errors and physical-level errors in the first of the two magic-state gates applied at each qubit site (see Fig. 5.7).

The second method for H codes to detect errors is by measuring the code stabilizers. The code stabilizers detect any configuration of errors that is neither a logical operator nor a stabilizer of the concatenated code. Because of the redundant structure using overlapping H codes, only a very small fraction of error configurations evade detection. Before moving on, note that at each qubit site, there are two faulty gates applied, and two errors on the same qubit will cancel (however, the first error will propagate to the Hadamard-basis measurement). Conversely, a single error in either of the two gates will propagate to the stabilizer-measurement round, but only an error in the first gate will also propagate to the Hadamard measurement. The stabilizer-measurement round will only “see” the odd/even parity of the number of errors at each qubit site.
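The detection rules above can be checked by brute force for a one-level H code. This is a minimal sketch under simplifying assumptions, not the analysis code behind the thesis. Each faulty magic state is treated as a classical parity flip:

- An encoded-state error flips the Hadamard-measurement parity.
- An ancilla error at a site flips that site's data error, and also the Hadamard parity if it occurs in the first of the two gates.
- Errors are taken to be Z-type and are detected by the assumed X-type stabilizers X_1X_2X_3X_4 and X_1X_2X_5⋯X_{k+4}.
- Output qubit j is corrupted when the net error anticommutes with an assumed logical X̄_j = X_1X_3X_{j+4}.

Counting undetected two-fault patterns that corrupt one output qubit yields k − 1 for pairs of encoded errors, 2(k + 1) for pairs of physical-level errors, and none for mixed pairs. These sum to 3k + 1 when p = q. All function names are hypothetical.

```python
from itertools import combinations

def h_code_supports(k):
    """Hypothetical helper: X-stabilizer and logical-X supports of the H code
    with k (even) encoded qubits; qubits 0-3 are the preamble, 4..k+3 the index."""
    index = set(range(4, k + 4))
    stabilizers = [{0, 1, 2, 3}, {0, 1} | index]
    logical_x = [{0, 2, 4 + j} for j in range(k)]
    return stabilizers, logical_x

def fault_locations(k):
    """('enc', j): faulty encoded state j (probability p).
    ('phys', s, g): faulty ancilla for gate g in {1, 2} at site s (probability q)."""
    locs = [("enc", j) for j in range(k)]
    locs += [("phys", s, g) for s in range(k + 4) for g in (1, 2)]
    return locs

def evaluate(k, faults):
    """Return (accepted, qubit_0_corrupted) under the parity model."""
    stabilizers, logical_x = h_code_supports(k)
    h_parity = 0         # parity seen by the Hadamard-basis measurement
    site_errors = set()  # sites with an odd number of data errors
    enc_errors = set()
    for f in faults:
        if f[0] == "enc":
            enc_errors.add(f[1])
            h_parity ^= 1
        else:
            _, site, gate = f
            site_errors ^= {site}  # two errors at one site cancel
            if gate == 1:          # only first-gate errors reach the H measurement
                h_parity ^= 1
    syndrome = any(len(site_errors & s) % 2 for s in stabilizers)
    accepted = h_parity == 0 and not syndrome
    corrupted = (0 in enc_errors) != (len(site_errors & logical_x[0]) % 2 == 1)
    return accepted, corrupted

def undetected_single_faults(k):
    """Number of single faults that the distiller fails to reject."""
    return sum(evaluate(k, [f])[0] for f in fault_locations(k))

def second_order_coefficients(k):
    """Count accepted two-fault patterns that corrupt qubit 0, by fault type."""
    counts = {"pp": 0, "qq": 0, "pq": 0}
    for a, b in combinations(fault_locations(k), 2):
        accepted, corrupted = evaluate(k, [a, b])
        if accepted and corrupted:
            key = "".join(sorted("p" if f[0] == "enc" else "q" for f in (a, b)))
            counts[key] += 1
    return counts

print(undetected_single_faults(2))   # 0
print(second_order_coefficients(2))  # {'pp': 1, 'qq': 6, 'pq': 0}
```

Because every single fault is rejected, the leading failure terms are second order, which is the property the text attributes to the combined Hadamard measurement and stabilizer check.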

One type of error event that occurs at concatenation levels three and higher requires special treatment. If there is an error in an encoded magic state and errors on both physical states used for the same controlled-Hadamard gate at the physical level, then this combination of input errors is not detected by the distillation protocol, leading to logical output error. This event contributes the error term proportional to pq^2 mentioned previously (where p and q are the error probabilities of encoded and physical-level magic states), which is not an issue for two-level protocols, but it must be addressed in levels three and higher. The solution for distillation with three or more levels is to repeat the controlled-Hadamard measurement several times, consuming additional magic states at the physical level for each repetition. After each transversal controlled-Hadamard, the code syndrome checks for detectable error patterns. With this procedure, one encoded-state error would also require at least two physical-level errors per repetition to go undetected, so the probability of this event is suppressed by an additional factor of q^2 for each repetition.

Consider the pattern of errors after the two potentially faulty gates on each qubit in the multidimensional Cartesian grid arrangement. The many levels of error checking in the codes can detect a single error in any encoding block at any encoding level. For this analysis, let us separate the qubits in a single H code block into two groups: the first four qubits are “preamble” qubits, while the remaining k qubits are index qubits. The reason for this distinction is that the logical Z̄ operators, which would also be undetected error configurations, have common physical-qubit operators in the preamble, with a degeneracy of two: Z_1Z_3 and Z_2Z_4 are equivalent because of the stabilizer Z_1Z_2Z_3Z_4. Conversely, the logical operators are distinguished by the logical Pauli operator Z̄_j having a physical Pauli operator on index qubit j (numbered j + 4 when the preamble is included).

The preamble/index distinction makes it easier to identify the most likely error patterns. For any size H code, there are two weight-2 errors in the preamble: Z_1Z_2 and Z_3Z_4. Logically, these represent the product of Z̄ operators on all encoded qubits. In the index qubits, any pair of errors is logical: Z_{i+4}Z_{j+4} is equivalent to Z̄_iZ̄_j. However, a pair of errors split with one each in preamble and index is always detectable by the code stabilizers. Thus, for an H code with k index qubits, any single encoded qubit could have a logical error stemming from a pair of errors in two different configurations in the preamble or k − 1 configurations in the index qubits. There is also one weight-three error. Each physical-state error configuration is multiplied by a degeneracy factor that is the number of ways an even number of errors occur before the CNOT in Fig. 5.7, thereby evading the Hadamard measurement; for a pair of errors this factor is two (both errors before the CNOT, or both after). Thus, for a single H code block, the lowest-order probability of logical error from physical-level states is 2(k + 1)q^2, where q is the error probability of each physical-level magic state. The Hadamard measurement fails to detect an even number of errors in the logical input states. If the distiller encodes K logical qubits, there are K − 1 ways that a pair of encoded input errors could corrupt any given qubit and C(K − 1, 3) ways four errors could corrupt any given qubit (assuming K ≥ 4). This contributes error terms