Realization of a scalable Shor algorithm
Abstract
Quantum computers are able to outperform classical computers. This was long recognized by the visionary Richard Feynman, who pointed out in the 1980s that quantum mechanical problems were better solved with quantum machines. It was only in 1994 that Peter Shor came up with an algorithm that is able to calculate the prime factors of a large number vastly more efficiently than is known to be possible with a classical computer Shor (1994). This paradigmatic algorithm stimulated the flourishing research in quantum information processing and the quest for an actual implementation of a quantum computer. Over the last fifteen years, using skillful optimizations, several instances of a Shor algorithm have been implemented on various platforms and clearly proved the feasibility of quantum factoring Politi et al. (2009); Martin-Lopez et al. (2012); Lucero et al. (2012); Lu et al. (2007); Lanyon et al. (2007); Vandersypen et al. (2001). For general scalability, though, a different approach has to be pursued Smolin et al. (2013). Here, we report the realization of a fully scalable Shor algorithm as proposed by Kitaev (1995). For this, we demonstrate factoring the number fifteen by effectively employing and controlling seven qubits and four “cache-qubits”, together with the implementation of generalized arithmetic operations, known as modular multipliers. The scalable algorithm has been realized with an ion-trap quantum computer exhibiting success probabilities in excess of 90%.
PACS: 03.67.Lx, 37.10.Ty, 32.80.Qk
Shor’s algorithm for factoring integers Shor (1994) is one of the examples where a quantum computer (QC) outperforms the most efficient known classical algorithms. Experimentally, its implementation is highly demanding, as it requires both a sufficiently large quantum register and high-fidelity control. Clearly, such challenging requirements raise the question whether optimizations and experimental shortcuts are possible. While optimizations, especially system-specific or architectural ones, certainly are possible, for a demonstration of Shor’s algorithm to be scalable special care has to be taken not to oversimplify the implementation (for instance by employing knowledge about the solution prior to the actual experimental implementation), as pointed out by Smolin et al. (2013).
In order to elucidate the general task at hand, we first explain and exemplify Shor’s algorithm for factoring the number 15 in a (quantum) circuit model. Subsequently, we show how this circuit model is translated to and implemented with an ion-trap quantum computer.
How does Shor’s algorithm work? Here is a classical recipe to find the factors of a large number. As an example, assume the number we want to factor is $N = 15$. Then pick a random number $a \in [2, N-1]$ (which we call the base in the following), say $a = 7$. Check that the greatest common divisor $\gcd(a, N) = 1$; otherwise a factor is already determined. This is the case for $a = 7$. Next, calculate the modular exponentiations $a^x \bmod N$ for $x = 0, 1, 2, \dots$ and find their period $r$: the first $x > 0$ such that $a^x \bmod N = 1$. Given the period $r$ (provided $r$ is even and $a^{r/2} \not\equiv -1 \bmod N$), finding the factors requires calculating the greatest common divisors of $a^{r/2} - 1$ and $a^{r/2} + 1$ with $N$, which is efficiently possible classically, for instance using Euclid’s algorithm. For our example ($a = 7$) the modular exponentiation yields 1, 7, 4, 13, 1, …, which has period $r = 4$. The greatest common divisors of $a^{r/2} - 1 = 48$ and $a^{r/2} + 1 = 50$ with $N = 15$ are 3 and 5, the nontrivial factors of $N$. For the chosen example $N = 15$, the cases $a = \{4, 11, 14\}$ have periodicity $r = 2$ and would only require a single multiplication step ($a^2 \bmod N = 1$), which is considered an “easy” case Smolin et al. (2013). Note that the periodicity for a chosen $a$ cannot be predicted in advance.
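The classical recipe above can be written down directly. The following Python sketch (the helper names `classical_period` and `factor_from_base` are hypothetical) finds the period by brute force. This is exactly the step whose cost grows exponentially with the number of bits of $N$, and it is the step that Shor’s algorithm replaces. The gcd steps use Euclid’s algorithm via `math.gcd`.

```python
from math import gcd


def classical_period(a, N):
    """Smallest r > 0 with a**r % N == 1, found by brute force.

    The loop runs r times, and r can be as large as about N, i.e. exponential
    in the number of bits of N. Requires gcd(a, N) == 1, otherwise no such r exists.
    """
    if gcd(a, N) != 1:
        raise ValueError("a must be coprime to N")
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r


def factor_from_base(a, N):
    """Follow the classical recipe for one base a; return a factor pair or None."""
    g = gcd(a, N)
    if g != 1:
        return g, N // g                  # lucky guess: a already shares a factor with N
    r = classical_period(a, N)
    half = pow(a, r // 2, N)              # a^(r/2) mod N; reducing mod N leaves the gcds unchanged
    if r % 2 == 1 or half == N - 1:
        return None                       # this base fails; pick another one
    return gcd(half - 1, N), gcd(half + 1, N)


print([pow(7, x, 15) for x in range(5)])  # [1, 7, 4, 13, 1]
print(classical_period(7, 15))            # 4
print(factor_from_base(7, 15))            # (3, 5)
```

For $a = 14$ the period is 2, but $a^{r/2} = 14 \equiv -1 \bmod 15$. In that case `factor_from_base(14, 15)` returns `None`, and another base has to be tried.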
How can this recipe be implemented in a QC? A QC also has to calculate $a^x \bmod N$ in a computational register for $x = 0, 1, 2, \dots$ and then extract $r$. However, using the quantum Fourier transform (QFT), this can be done with high probability in a single step (compared to up to $\mathcal{O}(N)$ steps for a classical brute-force search). Here, $x$ is stored in a quantum register consisting of $k$ qubits, the period-register, which is in a superposition of all values from $0$ to $2^k - 1$. The superposition in the period-register on its own does not provide a speedup compared to a classical computer: measuring the period-register would collapse the state and only return a single value, say $x_0$, together with the corresponding answer $a^{x_0} \bmod N$ in the computational register. However, if the QFT is applied to the period-register, the period of $a^x \bmod N$ can be extracted from measurements.
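To make the role of the QFT concrete, the following sketch computes the probability of each outcome $y$ of a $k$-qubit period-register after the QFT. It assumes an ideal, noise-free device; it is plain Python with no quantum library, and the helper `qft_outcome_distribution` is hypothetical. Values of $x$ that share the same $a^x \bmod N$ interfere with each other, so when $r$ divides $2^k$ only multiples of $2^k/r$ survive.

```python
import cmath


def qft_outcome_distribution(a, N, k):
    """Ideal probabilities of the k-qubit period-register outcomes y.

    Starts from (1/sqrt(2^k)) * sum_x |x>|a^x mod N>, applies a QFT to the
    period register and traces out the computational register.
    """
    M = 2 ** k
    # Group the period-register values x by the computational value a^x mod N
    # they are entangled with; different groups do not interfere.
    groups = {}
    for x in range(M):
        groups.setdefault(pow(a, x, N), []).append(x)
    probs = []
    for y in range(M):
        p = 0.0
        for xs in groups.values():
            amp = sum(cmath.exp(2j * cmath.pi * x * y / M) for x in xs) / M
            p += abs(amp) ** 2
        probs.append(p)
    return probs


for a in (7, 11):
    print(a, [round(p, 3) for p in qft_outcome_distribution(a, 15, 3)])
# 7 [0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0]
# 11 [0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

With $k = 3$, the period $r = 4$ ($a = 7$) leaves only the outcomes 0, 2, 4 and 6. The period $r = 2$ ($a = 11$) leaves only 0 and 4.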
What are the requirements and challenges to implement Shor’s algorithm? First, we focus on the period-register, and subsequently address modular exponentiation in the computational register. Factoring $N$, an $n$-bit number, requires a minimum of $n$ qubits in the computational register (to store the results of $a^x \bmod N$) and generally about $2n$ qubits in the period-register Nielsen and Chuang (2004). Thus even a seemingly simple example such as factoring 15 (an $n = 4$-bit number) would require $3n = 12$ qubits when implemented in this straightforward way. These qubits would then have to be manipulated with high-fidelity gates. Given the current state-of-the-art control over quantum systems Southwell (2008), such an approach likely yields an unsatisfying performance. However, a full quantum implementation of this part of the algorithm is not really necessary. Kitaev (1995) notes that, if only the classical information of the QFT (such as the period $r$) is of interest, the $2n$ qubits subject to a QFT can be replaced by a single qubit. This approach, however, requires qubit-recycling (specifically: in-sequence single-qubit readout and state reinitialization) paired with feedforward to compensate for the reduced system size.
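The following is a minimal state-vector sketch of the single-qubit (semiclassical) QFT idea for $N = 15$ with $k = 3$ readout rounds. It assumes an ideal, noise-free register and uses the standard iterative phase-estimation ordering: the largest power $a^{2^{k-1}}$ is applied first, and the least-significant output bit is read first. The helper names are hypothetical. Instead of sampling, the sketch enumerates every possible sequence of readout results and computes its probability. In each round, the feedforward phase depends on the bits already measured.

```python
import cmath
import itertools
import math


def multiply_mod(state, multiplier, N):
    """Apply |z> -> |multiplier * z mod N> to a register state {z: amplitude}.

    Values z >= N are left unchanged so the map stays a permutation.
    """
    out = {}
    for z, amp in state.items():
        z_new = (multiplier * z) % N if z < N else z
        out[z_new] = out.get(z_new, 0) + amp
    return out


def readout_probability(bits, a, N, k):
    """Probability of the single recycled qubit returning `bits` in sequence.

    bits[m] is the outcome of round m and equals bit m of y (least significant first).
    """
    state = {1: 1 + 0j}                      # computational register starts in |1>
    for m, b in enumerate(bits):
        j = k - 1 - m                        # round m applies a controlled U^(2^j)
        # Feedforward: phase correction computed from the bits measured so far
        theta = 2 * math.pi * sum(bits[t] * 2 ** t for t in range(m)) / 2 ** (m + 1)
        # Recycled qubit prepared in |+>: its |0> branch leaves the register unchanged,
        # its |1> branch gets the controlled multiplication and the phase correction
        branch0 = {z: amp / math.sqrt(2) for z, amp in state.items()}
        multiplied = multiply_mod(state, pow(a, 2 ** j, N), N)
        branch1 = {z: amp * cmath.exp(-1j * theta) / math.sqrt(2) for z, amp in multiplied.items()}
        # Hadamard on the recycled qubit, then project onto the observed outcome b
        sign = 1 if b == 0 else -1
        keys = set(branch0) | set(branch1)
        state = {z: (branch0.get(z, 0) + sign * branch1.get(z, 0)) / math.sqrt(2) for z in keys}
    return sum(abs(amp) ** 2 for amp in state.values())


def semiclassical_distribution(a, N, k):
    """Outcome distribution {y: probability} of the one-qubit readout scheme."""
    dist = {}
    for bits in itertools.product((0, 1), repeat=k):
        y = sum(b << t for t, b in enumerate(bits))
        dist[y] = readout_probability(bits, a, N, k)
    return dist


dist = semiclassical_distribution(7, 15, 3)
print({y: round(p, 3) for y, p in sorted(dist.items()) if p > 1e-9})
# {0: 0.25, 2: 0.25, 4: 0.25, 6: 0.25}
```

The one recycled qubit reproduces the outcome statistics of a full three-qubit QFT followed by measurement, which is the point of Kitaev’s observation. The price is the mid-sequence measurement, the re-preparation of the qubit, and the classically controlled phase `theta`.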
In the following, Kitaev’s QFT will be referred to as KQFT. It replaces a QFT acting on $2n$ qubits with a semiclassical QFT acting repeatedly on a single qubit. Similar applications of Kitaev’s approach to a semiclassical QFT in quantum algorithms have been investigated by Griffiths and Niu (1996); Parker and Plenio (2000); Mosca and Ekert (1999). For the implementation of Shor’s algorithm, Kitaev’s approach provides a reduction from the previous $n$ computational qubits and $2n$ QFT qubits (in total $3n$ qubits) to only $n$ computational qubits and 1 KQFT qubit (in total $n + 1$ qubits).
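The qubit bookkeeping of the two approaches can be written as a trivial sketch. The function names are hypothetical, and cache qubits used for decoupling are not counted.

```python
def qubits_textbook(n):
    """n computational qubits plus about 2n period-register qubits for a full QFT."""
    return n + 2 * n


def qubits_kitaev(n):
    """n computational qubits plus a single recycled KQFT qubit."""
    return n + 1


n = (15).bit_length()                                   # 15 = 0b1111 is a 4-bit number
print(n, qubits_textbook(n), qubits_kitaev(n))          # 4 12 5
print(round(qubits_textbook(2048) / qubits_kitaev(2048), 3))  # 2.999
```

For $N = 15$ this gives 12 versus 5 qubits. The ratio approaches 3 only for large $n$.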
A notably more challenging aspect than the QFT, and the second key ingredient of Shor’s algorithm, is the modular exponentiation, which admits these general simplifications:
(i) Considering Kitaev’s approach (see Fig. 1), the input state $|1\rangle$ (in decimal representation) is subject to a conditional multiplication based on the most-significant bit of the period register. At most there will be two results after this first step. It follows that, for the very first step, it is sufficient to implement an optimized operation that conditionally maps $|1\rangle \to |a^{2^{k-1}} \bmod N\rangle$, with $k$ the number of period-register qubits. Considering the importance of a high-fidelity multiplication (with its performance being fed forward to all subsequent qubits), this efficient simplification improves the overall performance of experimental realizations.
(ii) Subsequent multipliers can similarly be replaced with maps by considering only the possible outputs of the previous multiplications. However, using such maps becomes exponentially more challenging, as the number of input and output states to be considered grows exponentially with the number of steps: after $s$ steps, $2^s$ possible outcomes need to be considered, a numerical task as challenging as factoring by classical means. Thus, controlled full modular multipliers need to be implemented. Fig. 2 shows the experimentally obtained truth table for the modular multiplier (see also the supplementary material for modular multipliers with other bases). These quantum circuits can be efficiently derived from classical procedures using a variety of standard techniques for reversible quantum arithmetic and local logic optimization Vedral et al. (1996); Van Meter and Itoh (2005).
(iii) The very last multiplier allows one more simplification: considering that the actual results of the modular exponentiation are not required for Shor’s algorithm (as only the period encoded in the period-register is of interest), the last multiplier only has to create the correct amount of correlations between the period register and the computation register. Local operations after the conditional (entangling) operations may be discarded to facilitate the final multiplication without affecting the results of the implementation.
(iv) In rare cases, certain qubits are not subject to any operation during the computation. These qubits can then be removed from the algorithm entirely.
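The truth table mentioned in (ii) has a simple classical counterpart. An ideal controlled modular multiplier only permutes computational basis states, so its expected truth table can be tabulated classically. The sketch below does this for a control bit plus a 4-bit register and checks that the table is a permutation, which is necessary for it to correspond to a unitary gate. The helper names are hypothetical, and leaving the unused register value 15 unchanged is an assumed convention. For a base sharing a factor with 15, such as 5, the table is not a permutation.

```python
def controlled_mod_mult(control, z, a, N=15):
    """Classical action of a controlled multiplier |c>|z> -> |c>|a*z mod N> on basis states.

    Register values z >= N (here only z = 15) are left unchanged; this padding
    convention is an assumption so that the map is defined on all 4-bit inputs.
    """
    if control and z < N:
        return control, (a * z) % N
    return control, z


def truth_table(a, N=15, n_bits=4):
    """Map every basis input (control, z) to its output."""
    return {(c, z): controlled_mod_mult(c, z, a, N) for c in (0, 1) for z in range(2 ** n_bits)}


def is_reversible(table):
    """A classical truth table defines a unitary gate only if it is a permutation."""
    return sorted(table.values()) == sorted(table.keys())


table = truth_table(7)
print(table[(1, 1)], table[(1, 4)], table[(0, 4)])          # (1, 7) (1, 13) (0, 4)
print(is_reversible(table), is_reversible(truth_table(5)))  # True False
```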
For larger-scale quantum computation, optimization steps (i), (iii) and (iv) will only marginally affect the performance of the implementation, since they represent only a small subset of the entire computation, which mainly consists of the full modular multipliers. Thus, the realization of these modular multipliers is a core requirement for scalable implementations of Shor’s algorithm.
Furthermore, Kitaev’s approach requires in-sequence measurements, qubit-recycling to reset the measured qubit, feedforward of gate settings based on previous measurement results, as well as numerous controlled quantum operations: tasks that have not been realized in a combined experiment so far.
We demonstrate these techniques in our realization of Shor’s algorithm in an ion-trap quantum computer, with five $^{40}$Ca$^+$ ions in a linear Paul trap. The qubit is encoded in the ground state $S_{1/2}$ and the metastable state $D_{5/2}$. The universal set of quantum gates consists of the entangling Mølmer-Sørensen interaction Sørensen and Mølmer (1999), collective operations of the form $R(\theta, \phi) = \exp(-i \frac{\theta}{2} S_\phi)$ with $S_\phi = \sum_i (\cos\phi\, \sigma_x^{(i)} + \sin\phi\, \sigma_y^{(i)})$, where $\sigma_{x,y}^{(i)}$ are the Pauli operators of qubit $i$, $\theta$ is determined by the Rabi frequency and laser pulse duration, and $\phi$ is determined by the relative phase between qubit and laser, and single-qubit phase rotations induced by localized AC-Stark shifts (for more details see the supplementary material and Schindler et al. (2013)). Unitary operations illustrated in Fig. 1 are decomposed into primitive components such as two-target CNOT and CSWAP gates (or gates with global symmetries such as the four-target CNOT employed here), from which an adaptation of the GRAPE algorithm Nebendahl et al. (2009) can efficiently derive an equivalent sequence of laser pulses acting on only the relevant qubits. The problem with this approach is that the resulting sequence generally includes operations acting on all qubits. Implementing the optimized 3-qubit operations on a 5-ion string therefore requires decoupling of the remaining qubits from the computation space. We spectroscopically decouple qubits by transferring their quantum information into other Zeeman sublevels of the $D_{5/2}$ manifold. This subspace serves as a readily available “quantum cache” to store and retrieve quantum information in order to facilitate quantum computations.
Finally, to complete the toolbox necessary for Kitaev’s approach to Shor’s algorithm, we also implement single-qubit readout (by encoding all other qubits in the $D_{5/2}$ subspace and subsequent electron shelving Dehmelt (1975) on the $S_{1/2} \leftrightarrow P_{1/2}$ transition), feedforward (by storing counts detected during the single-qubit readout Riebe et al. (2004) in a classical register and subsequent conditional laser pulses) and state-reinitialization (using optical pumping for the ion, and Raman cooling Wineland et al. (1997); Marzoli et al. (1994) for the motional state of the ion string). The pulse sequences and additional information on the implementation of the modular multipliers are available as supplementary material.
The key differences of our implementation with respect to previous realizations of Shor’s algorithm are: a) the entire quantum register is employed, without sparing qubits that do not partake in the calculation; b) besides the trivial first multiplication step (the conditional map of $|1\rangle$ discussed in (i), which is the identity for all bases used here since $a^4 \bmod 15 = 1$), all nontrivial modular multipliers have been realized and applied; and c) Kitaev’s originally proposed scheme is implemented with complete qubit recycling, doing both readout and reinitialization on the very same physical qubit. This is especially important for factoring 15 with bases {2, 7, 8, 13}, as at least two steps are required for the semiclassical QFT. In our realization we go beyond the minimal implementation of Shor’s algorithm and not only employ all 7 qubits (4 physical qubits in the computational register and 1 qubit in the periodicity register, recycled twice, plus additional cache qubits), but also include the multiplications by up to the fourth power of $a$ (although they correspond to the identity operation). This represents a realistic attempt at a scalable implementation of Shor’s algorithm, as the entire qubit register remains subject to decoherence processes along the computation, and no simplifications are employed which presume prior knowledge of the solution.
The measurement results for bases $a = \{2, 7, 8, 11, 13\}$ with periodicities $r = \{4, 4, 4, 2, 4\}$ are shown in Fig. 3. In order to quantify the performance of the implementation, previous realizations mainly focused on the squared statistical overlap (SSO) Chiaverini et al. (2005), the classical equivalent to the Uhlmann fidelity Nielsen and Chuang (2004). While we achieved an SSO of {0.968(1), 0.964(1), 0.966(1), 0.901(1), 0.972(1)} for the case of $a = \{2, 7, 8, 11, 13\}$, we argue that this does not answer the question of a user in front of the quantum computer: “What is the periodicity?” Shor’s algorithm allows one to deduce the periodicity with high probability from a single-shot measurement, since the output of the QFT is, in the exact case, a ratio of integers, where the denominator gives the desired periodicity. This periodicity is extracted using a continued fraction expansion applied to $y/2^k \approx c/r$, with $y$ the measured output of the period-register and $c$ an integer. This is a good approximation of the ideal case when $k$, the number of qubits, is sufficiently large. For the realised examples, the probabilistic nature of Shor’s algorithm becomes clear: the output state $|0\rangle$ never yields any information. For periodicity $r = 4$ (and 3 qubits in the period-register), the output state $|4\rangle$ suggests a fraction $4/8 = 1/2$, thus a periodicity of 2, and also fails; only the output states $|2\rangle$ and $|6\rangle$ (fractions $1/4$ and $3/4$) allow one to deduce the correct periodicity. For $r = 2$, the output state $|4\rangle$ yields the correct periodicity. In our realisations with bases $a = \{2, 7, 8, 11, 13\}$, the probabilities to obtain output states that allow the derivation of the correct periodicity are {…}%, compared to 50% in the ideal case. Thus, reaching a confidence of more than 99% that the correct periodicity is obtained requires running the experiment about 8 times.
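The classical post-processing can be sketched as follows (hypothetical helper names; ideal outcomes for $a = 7$ and $k = 3$). The sketch expands $y/2^k$ into its continued-fraction convergents and accepts the first denominator $q < N$ with $a^q \equiv 1 \bmod N$. It also estimates how many repetitions are needed to reach a given confidence, assuming independent runs with a fixed success probability $p$. With the ideal $p = 0.5$, seven runs already exceed 99%. A slightly lower experimental $p$ leads to the roughly 8 runs quoted above.

```python
import math


def convergents(num, den):
    """Yield the continued-fraction convergents (p, q) of num/den."""
    p_prev, p = 0, 1
    q_prev, q = 1, 0
    while den:
        a, (num, den) = num // den, (den, num % den)
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield p, q


def period_from_outcome(y, k, a, N):
    """Return the first convergent denominator q of y/2^k with a^q = 1 mod N, else None.

    In general q may be a multiple of the true period r.
    """
    for _, q in convergents(y, 2 ** k):
        if 0 < q < N and pow(a, q, N) == 1:
            return q
    return None


def runs_needed(p_success, confidence=0.99):
    """Repetitions needed so that at least one run succeeds with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_success))


print({y: period_from_outcome(y, 3, 7, 15) for y in (0, 2, 4, 6)})
# {0: None, 2: 4, 4: None, 6: 4}
print(runs_needed(0.5), runs_needed(0.45))  # 7 8
```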
In summary, we have presented the realization of Kitaev’s vision of a scalable Shor algorithm with 3-digit resolution to factor 15 using bases {2, 7, 8, 11, 13}. Here, a semiclassical QFT combined with single-qubit readout, feedforward and qubit recycling was successfully employed. Compared to the traditional algorithm, the required number of qubits can thus be reduced by almost a factor of 3. Furthermore, the entire quantum register has been subject to the computation in a “black-box” fashion. Employing the equivalent of a quantum cache by spectroscopic decoupling significantly facilitated the derivation of the necessary pulse sequences to achieve high-fidelity results. In the future, spectroscopic decoupling might be replaced by physically moving the qubits out of the computational zone using segmented traps Kielpinski et al. (2002).
Our investigations also reveal some open questions and problems for current and upcoming realizations of Shor’s algorithm, which also apply to several other large-scale quantum algorithms of interest, in particular finding system-specific implementations of suitable pulse sequences to realize the desired evolution. The presented operations were efficiently constructed from classical circuits and decomposed into manageable unitary building blocks (quantum gates), for which pulse sequences were obtained by an adapted GRAPE algorithm. Thus, the presented successful implementation in an ion-trap quantum computer demonstrates a viable approach to a scalable Shor algorithm.
We gratefully acknowledge support by the Austrian Science Fund (FWF) through the SFB FoQus (FWF Project No. F4002-N16), by the European Commission (AQUTE), the NSF iQuISE IGERT, as well as the Institut für Quantenoptik und Quanteninformation GmbH. EM is a recipient of a DOC Fellowship of the Austrian Academy of Sciences. This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Office grant W911NF-10-1-0284. All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government.
Appendix A Supplementary Material
a.1 Pulse sequences
In the following, the pulse sequences employed in the experiment are discussed in more detail. The nomenclature is as follows: the collective operations on the $S_{1/2} \leftrightarrow D_{5/2}$ qubit transition, addressing all ion-qubits, realize the unitary operation
$$R(\theta, \phi) = \exp\left(-i \frac{\theta}{2} S_\phi\right)$$
with the collective spin operator
$$S_\phi = \sum_i \left(\cos\phi \, \sigma_x^{(i)} + \sin\phi \, \sigma_y^{(i)}\right),$$
based on the Pauli operators $\sigma_{x,y}^{(i)}$ acting on qubit $i$. Here, the rotation angle is defined by $\theta = \Omega t$, with the Rabi frequency $\Omega$ and the laser pulse duration $t$. In this notation, a bit flip around $\sigma_x$ corresponds to $R(\pi, 0)$. The collective operations are supplemented by single-qubit phase shifts of the form
$$S_z^{(i)}(\theta) = \exp\left(-i \frac{\theta}{2} \sigma_z^{(i)}\right).$$
The phase shift is realized by illuminating a single qubit with a tightly focused laser beam detuned by … MHz from the carrier transition. Here, the induced AC-Stark shift implements the desired phase shift, with the rotation angle $\theta$ depending on the pulse duration $t$. In combination, collective operations and single-qubit phase shifts allow us to implement arbitrary local operations. A universal set of quantum gates, capable of implementing any desired unitary operation, can be realized by combining these arbitrary local operations with an entangling interaction. In our experiment, we employ the Mølmer-Sørensen (MS) interaction Sørensen and Mølmer (1999) to realize entangling operations of the form
$$MS(\theta, \phi) = \exp\left(-i \frac{\theta}{4} S_\phi^2\right)$$
with $S_\phi$ as defined above. Using this notation, the maximally entangling operation $MS(\pi/2, 0)$ applied onto the $N$-qubit state $|0 \cdots 0\rangle$ directly creates the $N$-qubit GHZ state.
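Assume the convention $R(\theta, \phi) = \exp(-i \frac{\theta}{2} S_\phi)$ with $S_\phi = \sum_i (\cos\phi\, \sigma_x^{(i)} + \sin\phi\, \sigma_y^{(i)})$. Under that assumption, the collective rotation can be checked numerically. This NumPy sketch (helper names hypothetical) builds $R(\theta, \phi)$ for a few qubits and confirms that $R(\pi, 0)$ flips every qubit, up to a global phase.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)


def single_qubit_rotation(theta, phi):
    """exp(-i theta/2 sigma_phi) with sigma_phi = cos(phi) sigma_x + sin(phi) sigma_y."""
    sigma_phi = np.cos(phi) * SIGMA_X + np.sin(phi) * SIGMA_Y
    # sigma_phi squares to the identity, so the exponential has this closed form
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * sigma_phi


def collective_rotation(theta, phi, n_qubits):
    """R(theta, phi) = exp(-i theta/2 S_phi) on n qubits.

    The single-qubit terms in S_phi act on different qubits and commute, so the
    exponential factorizes into a tensor product of identical single-qubit rotations.
    """
    U = np.array([[1.0 + 0j]])
    for _ in range(n_qubits):
        U = np.kron(U, single_qubit_rotation(theta, phi))
    return U


n = 3
psi0 = np.zeros(2 ** n, dtype=complex)
psi0[0] = 1                                   # |000>
psi = collective_rotation(np.pi, 0, n) @ psi0
print(np.round(abs(psi) ** 2, 6))             # all population ends up in |111> (last entry)
```

The MS interaction contains $S_\phi^2$, which couples the qubits and does not factorize this way, so it is not covered by this sketch.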
a.2 Single-qubit measurement
Electron shelving Dehmelt (1975) on the $S_{1/2} \leftrightarrow P_{1/2}$ transition addresses, and thus projects, all qubits of the quantum register. For Kitaev’s implementation, however, only one qubit needs to be measured. With collective illumination, this can be achieved by transferring the quantum information encoded in qubits that should not be measured into the $D_{5/2}$ state manifold. There, the quantum information is protected against the shelving light on the $S_{1/2} \leftrightarrow P_{1/2}$ transition, since the ion will not scatter any photons. Using light resonant with the $S_{1/2} \leftrightarrow D_{5/2}$ transition, a refocusing sequence efficiently encodes all qubits except qubit $i$ in the $D_{5/2}$ manifold. Subsequently, the entire quantum register may be subject to shelving light, yet only qubit $i$ will be projected.
a.3 In-sequence detection and feedforward
When all qubits that need to be protected against projection have been encoded in the $D_{5/2}$ manifold, light at 397 nm resonant with the $S_{1/2} \leftrightarrow P_{1/2}$ transition state-dependently scatters photons on the remaining ion-qubits. The illumination time is set to 300 µs. A histogram of the photon counts detected at the photomultiplier tube is shown in Fig. 4. Using counter electronics with the discriminator set at 4 counts within the detection window, the dark state, with a mean count rate of 0.24 counts/ms (or 0.07 counts within the detection window), and the bright state, with a mean count rate of 48 counts/ms (or 14.4 counts within the detection window), can be distinguished with a confidence better than 99.8%. The Boolean output of the discriminator is subsequently used in the electronics for state-dependent pulses and thus state-dependent operations.
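Assuming ideal Poissonian photon statistics, the count rates quoted above give a quick estimate of the threshold-discrimination error. The following sketch uses the 300 µs window and the 4-count threshold; treating "at least 4 counts" as a bright result is an assumption about the discriminator convention. The variable names are illustrative.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson-distributed photon count X with mean mu."""
    return sum(mu ** i * math.exp(-mu) / math.factorial(i) for i in range(k + 1))


window_ms = 0.3                      # 300 us detection window
mu_dark = 0.24 * window_ms           # mean counts of the dark state, about 0.07
mu_bright = 48 * window_ms           # mean counts of the bright state, 14.4
threshold = 4                        # assumption: >= 4 counts is classified as "bright"

p_dark_misread = 1 - poisson_cdf(threshold - 1, mu_dark)   # dark ion shows >= 4 counts
p_bright_misread = poisson_cdf(threshold - 1, mu_bright)   # bright ion shows <= 3 counts
print(f"{p_dark_misread:.1e} {p_bright_misread:.1e}")
```

Both error probabilities come out well below the 0.2% implied by the quoted 99.8% confidence. The model ignores all non-Poissonian effects, so it should be read as an idealized estimate rather than a description of the measured value.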
a.4 Recooling and qubit reset
Scattering photons during the detection window heats the ion string and can lower the quality of subsequent quantum operations applied to the register. Therefore, recooling of the ion string after the illumination with electron-shelving light is necessary. However, this recooling must not destroy any quantum information stored in the other qubits. Since the hidden quantum information is stored in the $D_{5/2}$ manifold, we employ 3-beam Raman cooling Wineland et al. (1997); Marzoli et al. (1994) in the $S_{1/2}$ manifold. The Raman light field, consisting of two beams with different polarizations with respect to the quantization axis, is detuned by 1.5 GHz from the resonant $S_{1/2} \leftrightarrow P_{1/2}$ transition. The relative detuning between the two beams is chosen such that it creates a resonant coupling between the two Zeeman sublevels of $S_{1/2}$ that removes one quantum of motion, $n \to n - 1$, with $n$ representing the quantized axial state of motion of the ion string. The transfer is reset by resonant optical-pumping light. Raman cooling is employed for 500 µs. The qubit is reinitialized after cooling by an additional 50 µs of optical pumping. However, if the measured qubit was found in the dark state $D_{5/2}$, the measurement does not heat the ion string, nor does the Raman cooling affect the register. Therefore, the qubit is transferred from $D_{5/2}$ to the $S_{1/2}$ sublevel that was depleted by the previous 50 µs of optical pumping. An additional 50 µs pulse of optical-pumping light finally initializes the qubit, regardless of whether it was projected into $S_{1/2}$ or $D_{5/2}$. During the entire time when the qubit is subject to Raman cooling or initializing light, a repump laser at 866 nm is applied to prevent population trapping in the $D_{3/2}$ manifold due to spontaneous decay from the $P_{1/2}$ state to $D_{3/2}$.
a.5 Pulse sequence optimisation
For a sufficiently large Hilbert space, it will no longer be possible to directly optimize unitary operations acting on the entire register. Decomposing the necessary unitary operations into building blocks acting on smaller register sizes will allow one to use optimized pulse sequences for large-scale quantum computation. From a methodological point of view, it may be preferable to physically decouple the qubits from any interactions (for instance by splitting and moving part of the ion-qubit quantum register out of an interaction region, as proposed by Kielpinski et al. (2002)). However, given the technical requirements and challenges of splitting and moving ion strings, we focus on spectroscopically decoupling certain ion-qubits from the interaction. In particular, we spectroscopically decouple an ion from subsequent interactions by transferring any quantum information into the $D_{5/2}$ manifold using refocusing techniques on $S_{1/2} \leftrightarrow D_{5/2}$ transitions. Using this approach, we optimise the controlled-swap operation in a 3-qubit Hilbert space rather than a 5-qubit Hilbert space.
a.6 Controlled-SWAP
The controlled-SWAP operation, also known as the Fredkin operation, plays a crucial role in the modular multiplication. For its implementation, however, we could not derive a pulse sequence that can incorporate an arbitrary number of spectator qubits (qubits that should be subject to the identity operation), in the presented case 2 spectator qubits in the computational register. Using decoupling of the spectator qubits, however, this additional requirement on the implementation is no longer necessary. Using pulse sequence optimization Nebendahl et al. (2009), we obtained a sequence for the exact three-qubit case, as shown in Tab. 1. In total, the sequence consists of 18 pulses, including 4 MS interactions.
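For $N = 15$, multiplication by 2 modulo 15 acts on the values 1 to 14 as a cyclic rotation of the four register bits. A controlled $\times 2 \bmod 15$ can therefore be composed from controlled-SWAP gates. The sketch below (hypothetical helper names) checks this on all basis states. It works at the bit level only, not with the pulse-level sequence of Tab. 1, and it illustrates why Fredkin gates are central to the multipliers; the decomposition actually used in the experiment (Fig. 1d) may differ.

```python
def fredkin(control, z, i, j):
    """Controlled-SWAP of bits i and j of the integer register z."""
    if control and ((z >> i) & 1) != ((z >> j) & 1):
        z ^= (1 << i) | (1 << j)          # swapping two different bits flips both
    return z


def controlled_times2_mod15(control, z):
    """Cyclic left rotation of a 4-bit register built from three controlled-SWAPs."""
    for i, j in ((3, 2), (2, 1), (1, 0)):
        z = fredkin(control, z, i, j)
    return z


# Compare with the arithmetic definition (z = 15 is left unchanged, like z = 0).
print(all(controlled_times2_mod15(1, z) == (2 * z % 15 if z < 15 else z) for z in range(16)))  # True
print(all(controlled_times2_mod15(0, z) == z for z in range(16)))                              # True
```

Applying the network twice gives a controlled $\times 4 \bmod 15$.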
Table 1: Pulse sequence implementing the controlled-SWAP operation (18 pulses, including 4 MS interactions).
a.7 Four-Target Controlled-NOT
Two of the modular multipliers require, besides Fredkin operations, also CNOT operations acting on all qubits in the computational register. Such an operation can be implemented (see Nebendahl (2008), p. 90, eq. 5.21) with 2 MS operations plus local operations only, regardless of the size of the computational register. The respective sequence is shown in Tab. 2.
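The following is one way to see where a CNOT on all computational qubits can enter. It reflects our own reading of the arithmetic, not necessarily the exact decomposition of Fig. 1d. Modulo 15, multiplication by $2^k$ merely rotates the four register bits (for values 1 to 14). Multiplication by $15 - 2^k$ is the same rotation followed by flipping all four bits, since $-w \bmod 15 = 15 - w$ is the bitwise complement of a 4-bit $w$. When the multiplication is controlled, flipping all four bits is a four-target CNOT. The helper `rotl4` is hypothetical.

```python
def rotl4(z, k):
    """Rotate a 4-bit value z left by k positions."""
    return ((z << k) | (z >> (4 - k))) & 0xF


# For z in 1..14: 2^k * z mod 15 is a pure bit rotation, and since
# (15 - 2^k) * z = -(2^k * z) mod 15 and 15 - w == w ^ 0b1111 for 4-bit w,
# the complementary multipliers are a rotation followed by flipping all four bits.
for k in range(4):
    m_rot, m_not = 2 ** k, 15 - 2 ** k
    ok_rot = all(m_rot * z % 15 == rotl4(z, k) for z in range(1, 15))
    ok_not = all(m_not * z % 15 == rotl4(z, k) ^ 0xF for z in range(1, 15))
    print(m_rot, ok_rot, m_not, ok_not)
# 1 True 14 True
# 2 True 13 True
# 4 True 11 True
# 8 True 7 True
```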
Table 2: Pulse sequence implementing the four-target controlled-NOT operation (9 pulses, including 2 MS interactions).
a.8 Two-Target Controlled-NOT
There exists an analytic solution to realize multi-target controlled-NOT operations in the presence of spectator qubits with the presented set of gates Nebendahl (2008), as required for one of the modular multipliers. However, we find that decoupling subsets of qubits of the quantum register prior to the application of the multi-target controlled-NOT operation presented above both facilitates the optimisation and improves the performance of the realisation of a two-target controlled-NOT operation. Thus, the required two-target controlled-NOT operation is implemented via (i) decoupling qubits 2 and 4, (ii) performing a multi-target controlled-NOT on all remaining qubits with the first qubit acting as control, and (iii) recoupling qubits 2 and 4.
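At the level of computational basis states, the three steps reduce to applying the all-qubit CNOT only to the qubits that remain coupled. The sketch assumes, for illustration only, a 5-qubit register numbered 1 to 5 with qubit 1 as control. Decoupling is modelled simply by excluding qubits 2 and 4, so the effective targets are qubits 3 and 5. The helper names are hypothetical, and the sketch says nothing about the pulse-level implementation.

```python
def multi_target_cnot(bits, control, coupled):
    """Flip every coupled qubit except the control if the control is 1.

    bits: dict {qubit index: 0/1}; decoupled ("cached") qubits are simply not in `coupled`.
    """
    out = dict(bits)
    if bits[control]:
        for q in coupled:
            if q != control:
                out[q] ^= 1
    return out


def two_target_cnot(bits, register=(1, 2, 3, 4, 5), control=1, hidden=(2, 4)):
    """(i) decouple `hidden`, (ii) multi-target CNOT on the rest, (iii) recouple."""
    coupled = [q for q in register if q not in hidden]
    return multi_target_cnot(bits, control, coupled)


state = {1: 1, 2: 0, 3: 0, 4: 1, 5: 1}
print(two_target_cnot(state))   # {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}
```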
a.9 Controlled Quantum Modular Multipliers
Based on the decomposition shown in Fig. 1d) and the respective pulse sequences outlined in the previous sections, we investigate the performance of the building blocks as well as of the respective conditional multipliers. In the following, fidelities are given as the mean probability (with its standard deviation) of observing the correct output state. The elements in the respective truth tables have been obtained as averages over 200 repetitions.

The Fredkin operation, controlled by qubit 1 and acting on qubits …, yields fidelities of …. These numbers are consistent with MS gate interactions at a fidelity of about 95% acting on three ions (in the presence of two decoupled ions) and local operations at a fidelity of 99.3%.
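A back-of-the-envelope check of this consistency argument is sketched below. It makes two simplifying assumptions: errors are independent, so fidelities simply multiply, and all 14 non-MS pulses of the 18-pulse controlled-SWAP sequence count as local operations. The function name is hypothetical.

```python
def sequence_fidelity(f_ms, n_ms, f_local, n_local):
    """Crude estimate: independent errors, so pulse fidelities multiply."""
    return f_ms ** n_ms * f_local ** n_local


# Controlled-SWAP sequence of Tab. 1: 18 pulses, 4 of them MS interactions.
# Assumption: the remaining 14 pulses are all counted as local operations.
estimate = sequence_fidelity(0.95, 4, 0.993, 14)
print(round(estimate, 2))   # 0.74
```

The result, roughly 74%, is an estimate derived from the quoted gate fidelities, not a measured value.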

The four-target CNOT gate operates at a fidelity of ….

For the modular multipliers …, we find fidelities of …. This performance is consistent with the product of the fidelities of the individual building blocks: ….
References
[1] P. W. Shor, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science (1994), pp. 124–134.
[2] A. Politi, J. C. F. Matthews, and J. L. O’Brien, Science 325, 1221 (2009).
[3] E. Martin-Lopez, A. Laing, T. Lawson, R. Alvarez, X.-Q. Zhou, and J. L. O’Brien, Nat. Photon., advance online publication (2012).
[4] E. Lucero, R. Barends, Y. Chen, J. Kelly, M. Mariantoni, A. Megrant, P. O’Malley, D. Sank, A. Vainsencher, J. Wenner, et al., Nat. Phys. 8, 719 (2012).
[5] C.-Y. Lu, D. E. Browne, T. Yang, and J.-W. Pan, Phys. Rev. Lett. 99, 250504 (2007).
[6] B. P. Lanyon, T. J. Weinhold, N. K. Langford, M. Barbieri, D. F. V. James, A. Gilchrist, and A. G. White, Phys. Rev. Lett. 99, 250505 (2007).
[7] L. M. K. Vandersypen, M. Steffen, G. Breyta, C. S. Yannoni, M. H. Sherwood, and I. L. Chuang, Nature 414, 883 (2001).
[8] J. A. Smolin, G. Smith, and A. Vargo, Pretending to factor large numbers on a quantum computer (2013), eprint arXiv:1301.7007, URL http://arxiv.org/abs/1301.7007.
[9] A. Y. Kitaev, Quantum measurements and the Abelian stabilizer problem (1995), eprint quant-ph/9511026.
[10] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge Series on Information and the Natural Sciences) (Cambridge University Press, 2004), 1st ed., ISBN 0521635039.
[11] K. Southwell, Quantum Coherence (Nature Insight), Nature 453, 1003 (2008).
[12] R. B. Griffiths and C.-S. Niu, Phys. Rev. Lett. 76, 3228 (1996).
[13] S. Parker and M. B. Plenio, Phys. Rev. Lett. 85, 3049 (2000).
[14] M. Mosca and A. Ekert, in Quantum Computing and Quantum Communications, edited by C. Williams, Lecture Notes in Computer Science Vol. 1509 (Springer, Berlin, Heidelberg, 1999), pp. 174–188.
[15] V. Vedral, A. Barenco, and A. Ekert, Phys. Rev. A 54, 147 (1996), URL http://dx.doi.org/10.1103/physreva.54.147.
[16] R. Van Meter and K. M. Itoh, Phys. Rev. A 71, 052320 (2005), URL http://dx.doi.org/10.1103/physreva.71.052320.
[17] A. Sørensen and K. Mølmer, Phys. Rev. Lett. 82, 1971 (1999), eprint quant-ph/9810039.
[18] P. Schindler, D. Nigg, T. Monz, J. T. Barreiro, E. Martinez, S. X. Wang, S. Quint, M. F. Brandl, V. Nebendahl, C. F. Roos, et al., New J. Phys. 15, 123012 (2013), URL http://dx.doi.org/10.1088/1367-2630/15/12/123012.
[19] V. Nebendahl, H. Häffner, and C. F. Roos, Phys. Rev. A 79, 012312 (2009).
[20] H. Dehmelt, Bull. Am. Phys. Soc. 20, 60 (1975).
[21] M. Riebe, H. Häffner, C. F. Roos, W. Hänsel, J. Benhelm, G. P. T. Lancaster, T. W. Körber, C. Becher, F. Schmidt-Kaler, D. F. V. James, et al., Nature 429, 734 (2004).
[22] D. J. Wineland, C. Monroe, W. M. Itano, D. Leibfried, B. E. King, and D. M. Meekhof, J. Res. Natl. Inst. Stand. Technol. 103, 259 (1997), eprint quant-ph/9710025, URL http://arxiv.org/abs/quant-ph/9710025.
[23] I. Marzoli, J. Cirac, R. Blatt, and P. Zoller, Phys. Rev. A 49, 2771 (1994), URL http://dx.doi.org/10.1103/physreva.49.2771.
[24] J. Chiaverini, J. Britton, D. Leibfried, E. Knill, M. D. Barrett, R. B. Blakestad, W. M. Itano, J. D. Jost, C. Langer, R. Ozeri, et al., Science 308, 997 (2005), URL http://dx.doi.org/10.1126/science.1110335.
[25] D. Kielpinski, C. Monroe, and D. J. Wineland, Nature 417, 709 (2002).
[26] V. Nebendahl, Master’s thesis, University of Innsbruck, Austria (2008), URL http://heart-c704.uibk.ac.at/publications/diploma/diplom_nebendahl.pdf.