An efficient quantum circuits optimizing scheme compared with QISKit
Abstract
Recently, the development of quantum chips has made great progress: the number of qubits is increasing and the fidelity is getting higher. However, the qubits of these chips are not always fully connected, which sets additional barriers for implementing quantum algorithms and programming quantum programs. In this paper, we introduce a general circuit optimizing scheme, which can efficiently adjust and optimize quantum circuits according to an arbitrary given qubit layout by adding additional quantum gates, exchanging qubits and merging single-qubit gates. Compared with the optimizing algorithm of IBM’s QISKit, our scheme consumes 74.7% of the quantum gates and only 12.9% of the execution time on average.
1 Introduction
Quantum computing has attracted increasing attention in recent years because of its tremendous computing power [1, 2, 3]. More and more companies and scientific research institutions are devoting themselves to developing quantum chips with more qubits and higher fidelity. While most theoretical studies assume that interactions between arbitrary pairs of qubits are available, almost all realistic chips have certain constraints on qubit connectivity [4, 5]. For example, IBM’s 5-qubit superconducting chips Tenerife and Yorktown [6] adopt neighboring connectivity (illustrated in Fig.1 (a) and 1 (b), respectively). Zhong et al. [7] use a 4-qubit superconducting chip in which the four qubits are not directly connected to each other, but are connected through a central resonator. That is, the layout of this chip is central, as shown in Fig.1 (c). In addition, CAS-Alibaba Quantum Laboratory’s 11-qubit superconducting chip [8] and Tsinghua University’s 4-qubit NMR chip [9] both reduce full connectivity to linear nearest-neighbor connectivity, as shown in Fig.1 (d). Clearly, such non-fully connected layouts set additional barriers for implementing quantum algorithms and programming quantum programs.
On the other hand, decoherence [10] is a huge challenge for quantum computing, and quantum programs should be executed within the coherence time [11]. To obtain more reliable results, we need to reduce the quantum circuit depth [12] as far as possible. However, for non-fully connected physical layouts, executing arbitrary quantum programs requires adding additional quantum gates to adjust the original program, which inevitably increases the depth. Therefore, it is of great practical significance to design an optimization algorithm that minimizes this overhead as much as possible.
As early as 2007, D. Cheung et al. discussed non-fully connected physical layouts [4]. By adding SWAP gates, they turned illegal CNOT operations into legal ones and proved that, for star-shaped or linear nearest-neighbor connectivity, the adjustment can be completed with a number of additional quantum gates that depends on $n$, where $n$ stands for the number of qubits. In 2017, IBM developed a quantum information science kit, namely QISKit [13], which contains an algorithm that can adjust and optimize quantum programs according to any layout. Recently, in order to find more efficient solutions, IBM organized the QISKit Developer Challenge [14]. As for the optimization of quantum circuits, in order to simulate more qubits on classical computers, E. Pednault et al. proposed a method, namely slice [15], that splits the original quantum circuit into multiple subcircuits. In this way, they simulated a random quantum circuit with depth 27 on a 2D lattice of $7 \times 7$ qubits and a circuit with depth 23 on a 2D lattice of $7 \times 8$ qubits on the IBM Blue Gene/Q supercomputer, which increased the number of entangled qubits that classical computers can simulate. However, the slice approach focuses on simulating more entangled qubits, so it does not take the physical layout into account and is only applicable to programs with short circuit depth.
In this paper, we propose a sufficiently general quantum circuit optimizing scheme which can efficiently adjust and optimize any quantum circuit according to any layout. The remainder of this paper is organized as follows: Section 2 briefly introduces the necessary concepts. In Section 3, the design of our optimizing scheme is presented in detail. We then compare the cost and efficiency of our scheme with QISKit’s optimizing method in Section 4. The conclusions and future research can be found in Section 5.
2 Preliminaries
2.1 QISKit
QISKit is a quantum information science kit developed by IBM, which takes quantum programs written in OpenQASM [16] as input. It adjusts and optimizes the input programs according to the given layout, and then executes them on its built-in QASM simulator or on cloud-based quantum chips.
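OpenQASM is a variant of QASM [17], which is designed to control a physical system with a parameterized gate set. Specifically, OpenQASM takes $\{\mathrm{CNOT}, u_1, u_2, u_3\}$ as the basic set of quantum gates, where
$$u_1(\lambda) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\lambda} \end{pmatrix}, \quad u_2(\phi,\lambda) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -e^{i\lambda} \\ e^{i\phi} & e^{i(\phi+\lambda)} \end{pmatrix}, \quad u_3(\theta,\phi,\lambda) = \begin{pmatrix} \cos\frac{\theta}{2} & -e^{i\lambda}\sin\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} & e^{i(\phi+\lambda)}\cos\frac{\theta}{2} \end{pmatrix}. \tag{1}$$
Obviously, this set actually contains an infinite number of single-qubit gates, and it is universal [18]. For comparison with QISKit, our optimizing scheme also takes it as the basic set of quantum gates.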
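As a minimal sketch of the single-qubit part of this gate set, assuming it consists of the u1, u2 and u3 gates of QISKit's standard gate library (the function names and the nested-list matrix representation are our own illustrative choices), the following lines build the matrices and check two familiar special cases:

```python
import cmath
import math

def u3(theta, phi, lam):
    """General single-qubit gate u3 as a 2x2 matrix (nested lists)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]

def u2(phi, lam):
    """u2 is u3 with the rotation angle fixed to pi/2."""
    return u3(math.pi / 2, phi, lam)

def u1(lam):
    """u1 is a pure phase gate, i.e. u3 with theta = phi = 0."""
    return u3(0.0, 0.0, lam)

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

r = 1 / math.sqrt(2)
HADAMARD = [[r, r], [r, -r]]
PHASE_07 = [[1, 0], [0, cmath.exp(0.7j)]]

print(close(u2(0.0, math.pi), HADAMARD))  # the Hadamard gate is u2(0, pi)
print(close(u1(0.7), PHASE_07))           # diag(1, e^{0.7i}) is u1(0.7)
```

Because the three parameters are continuous, the set contains infinitely many single-qubit gates, which is what the universality remark relies on.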
2.2 Common solutions
Before introducing the common solutions, we need to point out the main obstacles that hinder the execution of quantum programs:

Obstacle1: the direction of CNOT gate is illegal, as shown in the red line in Fig.2 (a);

Obstacle2: the connectivity between two specific qubits is illegal, as shown in the blue line.
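For Obstacle1, a common solution is to flip the direction of the CNOT gate with 4 additional H gates:
$$\mathrm{CNOT}(a, b) = (H \otimes H)\,\mathrm{CNOT}(b, a)\,(H \otimes H), \tag{2}$$
where $\mathrm{CNOT}(a, b)$ denotes a CNOT gate with control qubit $a$ and target qubit $b$.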
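The direction flip can be checked numerically. The sketch below is only an illustration (the helper names and the basis ordering, with the first qubit as the most significant bit, are our own assumptions): it conjugates a CNOT by a Hadamard gate on each qubit and compares the result with the CNOT in the opposite direction.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
HH = kron(H, H)  # one Hadamard on each qubit, i.e. 2 H gates per side, 4 in total

# Basis order |q0 q1> = |00>, |01>, |10>, |11>.
CNOT_01 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control q0, target q1
CNOT_10 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control q1, target q0

flipped = matmul(HH, matmul(CNOT_10, HH))
same = all(abs(flipped[i][j] - CNOT_01[i][j]) < 1e-12
           for i in range(4) for j in range(4))
print(same)
```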
As for Obstacle2, the basic idea is to exchange the states of qubits with SWAP gates. For example, although a CNOT gate between two qubits that are not directly connected is illegal in Fig.2 (a), the same task can be accomplished in another way, such as the circuit shown in Fig.3.
However, the additional overhead of this solution is costly, especially for sparse physical layouts. Specifically, the overhead grows with the number of intermediate nodes on the shortest path between the control qubit and the target qubit, and each SWAP gate stands for 3 CNOT gates and 4 H gates.
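To make the overhead concrete, the following sketch counts the intermediate nodes on a shortest path and converts them into gates. It rests on two assumptions of ours that are not taken from the text: the layout is an undirected adjacency dictionary, and the basic solution uses one SWAP per intermediate node to bring the qubits together and the same number again to restore the original positions. The function names are hypothetical.

```python
from collections import deque

def shortest_path(layout, src, dst):
    """Breadth-first search on an undirected layout {node: set_of_neighbours}."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(layout[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError("layout is not connected")

def swap_and_restore_overhead(layout, control, target):
    """Gate overhead of routing one CNOT and then undoing the routing."""
    n = len(shortest_path(layout, control, target)) - 2  # intermediate nodes
    swaps = 2 * n  # assumption: n SWAPs to bring the qubits together, n to restore
    return {"swaps": swaps, "cnot": 3 * swaps, "h": 4 * swaps}

# Linear nearest-neighbor layout 0 - 1 - 2 - 3.
linear = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(swap_and_restore_overhead(linear, 0, 3))
print(swap_and_restore_overhead(linear, 0, 1))
```

On the linear layout, a CNOT between the two end qubits has two intermediate nodes, whereas adjacent qubits need no SWAP at all, which illustrates why sparse layouts are expensive.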
3 Our Optimizing Scheme
As mentioned before, non-fully connected layouts are widely adopted. There are only two ways to execute arbitrary quantum programs:

Hardware solution: Completely changing the layouts of chips and constructing fully connected chips;

Software solution: Designing a circuit optimization algorithm, which is able to adjust the original quantum program to meet requirements of the chip.
Our optimizing scheme is an efficient general solution at the software level. Specifically, we design the following three steps to adjust and optimize quantum programs based on the common solutions described in Section 2.2.
3.1 The global adjustment of qubits
The global adjustment of qubits means that, before the quantum program is executed, we first compare the connectivity required by the program with the given layout, and directly exchange the qubits.
The greatest advantage of this step is that no additional quantum gates need to be consumed.
Therefore, the number of additional quantum gates consumed will be minimum if all illegal CNOT gates can be handled in this step.
For simplicity, we assume in this step and in the local adjustment that every edge of the given layout is bidirectional; that is, Obstacle1 is ignored in these two steps.
Specifically, this step can be described as Algorithm 1.
In Algorithm 1, we extract all CNOT gates from the quantum program separately and traverse them from front to back.
Once we encounter an illegal CNOT gate, we try to find an available qubit mapping that adjusts the whole OpenQASM code without making any already-traversed CNOT gate illegal.
At each adjustment, the number of available mappings is determined by the numbers of qubits adjacent to the control qubit and to the target qubit in the given layout, excluding the mappings that make some traversed CNOT gates illegal.
The traversal terminates when there is no illegal CNOT gate left or when no available mapping remains.
Suppose that this yields $m$ possible mappings, where $m$ is related to the given layout and the connectivity of the quantum program.
At this point, we estimate the cost of solving Obstacle2 in the program adjusted according to each of these mappings (the $m$ mappings and one empty mapping), and take the mapping with the smallest cost as the global adjustment mapping. The reason for estimating rather than calculating exactly, and the estimation process itself, are explained in the next part.
Finally, we adjust the qubits of the original OpenQASM code according to the global mapping. The classical register, which stores the measurement results, does not need to be modified.
For example, a CNOT gate that is illegal in Fig.2 can be made legal by a global mapping, as shown in Fig.4.
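The effect of a global mapping can be illustrated with a small sketch. It is not Algorithm 1: instead of the traversal described above, it simply tries every relabelling of the qubits, which is only feasible for very small layouts. It also assumes, as in this step, that every edge is bidirectional, and that logical qubits carry the same labels as the layout nodes. All names are hypothetical.

```python
from itertools import permutations

def count_illegal(cnots, layout, mapping):
    """Number of CNOTs whose mapped control and target are not adjacent."""
    return sum(1 for c, t in cnots if mapping[t] not in layout[mapping[c]])

def best_global_mapping(cnots, layout):
    """Exhaustive stand-in for the global adjustment: try every relabelling."""
    nodes = sorted(layout)
    best = {q: q for q in nodes}  # the empty (identity) mapping
    best_cost = count_illegal(cnots, layout, best)
    for perm in permutations(nodes):
        candidate = dict(zip(nodes, perm))
        cost = count_illegal(cnots, layout, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

# Linear layout 0 - 1 - 2 - 3 and a program given as (control, target) pairs.
linear = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cnots = [(0, 2), (0, 2), (2, 3)]
identity = {q: q for q in linear}

mapping, cost = best_global_mapping(cnots, linear)
print(count_illegal(cnots, linear, identity), cost)
```

In this toy program two CNOT gates are illegal under the identity mapping, and a pure relabelling removes both without adding a single gate, which is the advantage this step aims for.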
3.2 The local adjustment of qubits
In this step, the exchange of qubits’ state mainly depends on adding SWAP gates. Compared with the basic solution described in Section 2.2, our scheme has the following differences:

There is no need to use SWAP gates again to restore the state. Instead, we use the qubits involved in the exchange and intermediate qubits to generate a local mapping, then modify the subsequent gates and classical registers according to the mapping;

Because of the first difference, exchanging the control qubit with the intermediate qubits by SWAP gates and exchanging the target qubit with these qubits have completely different effects on the subsequent code. Therefore, we need to calculate the gate costs of the two cases separately and take the cheaper one as the object of exchange.
However, it is difficult to accurately calculate the costs of these two cases in the second difference.
During the calculation, we will encounter several illegal CNOT gates, and for each illegal CNOT, we have two solutions.
In fact, the solution space is a binary tree whose height is $k$ and whose number of leaf nodes is approximately $2^k$, where $k$ stands for the number of illegal CNOT gates.
Obviously, classical computers cannot complete such a large-scale calculation in a reasonably short time, so we can only estimate the cost. Essentially, the estimation process is greedy and is easily trapped in a local optimum.
As the scale of quantum programs increases, the effect of this greedy choice becomes more obvious, as can be seen in Section 4.
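The two candidate exchanges for a single illegal CNOT can be sketched as follows. This is an illustration under our own assumptions rather than the authors' Algorithm 2: the layout is undirected, `pos` is a one-to-one map from logical qubits to physical nodes covering every node, and the function names are hypothetical. SWAPs are inserted along a shortest path and never undone; the updated `pos` plays the role of the local mapping applied to the subsequent gates.

```python
from collections import deque

def bfs_path(layout, src, dst):
    """Shortest path between two nodes of an undirected layout."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(layout[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError("layout is not connected")

def route_cnot(layout, pos, control, target, move="control"):
    """Make control and target adjacent with SWAPs; pos is updated in place."""
    where = {node: q for q, node in pos.items()}  # physical node -> logical qubit
    path = bfs_path(layout, pos[control], pos[target])
    if move == "target":
        path = path[::-1]  # walk the target towards the control instead
    ops = []
    for a, b in zip(path[:-2], path[1:-1]):  # stop one node before the far end
        ops.append(("swap", a, b))
        qa, qb = where[a], where[b]
        where[a], where[b] = qb, qa
        pos[qa], pos[qb] = b, a
    ops.append(("cx", pos[control], pos[target]))
    return ops

linear = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
pos_c = {q: q for q in linear}
ops_c = route_cnot(linear, pos_c, 0, 3, move="control")
pos_t = {q: q for q in linear}
ops_t = route_cnot(linear, pos_t, 0, 3, move="target")
print(ops_c, pos_c)
print(ops_t, pos_t)
```

Both choices cost the same number of SWAPs for this CNOT, but they leave the qubits in different positions, so the cost of every later CNOT can differ. This is what makes each illegal CNOT a binary branching point.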
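In our scheme, the cost of adjusting the OpenQASM code is estimated by Equation (4), which combines, for each illegal CNOT gate, a correction factor with a SWAP cost that depends on $n_i$.
Here $n_i$ stands for the number of intermediate qubits between the control qubit and the target qubit of the $i$th illegal CNOT, and each SWAP stands for 3 CNOT gates and 4 H gates.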
Among the various estimation formulas we tried, the result obtained by Equation (4) is optimal.
The reason for adding the correction factor in Equation (4) is that the later the CNOT gate is executed, the easier it is influenced by the previous adjustments.
That is, estimation is not reliable for the later CNOT gates.
Multiplying the estimated results by this factor, which keeps decreasing as the estimation progresses, has a certain correction effect.
To improve the accuracy of the estimation, we exactly calculate the top 4 layers of the binary tree and estimate the cost of the subsequent gates for each of the resulting $2^4 = 16$ cases, where the value 4 is the optimal one determined after repeated trials.
We then add the estimated result to the calculated result for each case and choose the smallest of the 16 cases.
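The combination of an exact search over the first layers with an estimate for the rest can be sketched as below. The sketch is a simplified stand-in under stated assumptions: costs are counted in SWAPs, the estimate is a decayed sum of intermediate-qubit counts with a hypothetical decay factor of 0.9 (it is not Equation (4)), and all function names are our own.

```python
from collections import deque
from itertools import product

def bfs_path(layout, src, dst):
    """Shortest path between two nodes of an undirected layout."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(layout[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError("layout is not connected")

def move(layout, pos, c, t, choice):
    """Move c towards t (choice 0) or t towards c (choice 1); return (new_pos, swaps)."""
    path = bfs_path(layout, pos[c], pos[t])
    if choice == 1:
        path = path[::-1]
    where = {node: q for q, node in pos.items()}
    new_pos = dict(pos)
    for a, b in zip(path[:-2], path[1:-1]):
        qa, qb = where[a], where[b]
        where[a], where[b] = qb, qa
        new_pos[qa], new_pos[qb] = b, a
    return new_pos, max(len(path) - 2, 0)

def estimate(layout, pos, cnots, decay=0.9):
    """Hypothetical estimate: later illegal CNOTs are weighted less and less."""
    total, weight = 0.0, 1.0
    for c, t in cnots:
        n = len(bfs_path(layout, pos[c], pos[t])) - 2
        if n > 0:
            total += weight * n
            weight *= decay
    return total

def lookahead_first_choice(layout, pos, cnots, depth=4):
    """Try all 2**depth choice sequences exactly, estimate the remainder."""
    best = None
    for choices in product((0, 1), repeat=depth):
        cur, cost, used, i = dict(pos), 0.0, 0, 0
        while i < len(cnots) and used < depth:
            c, t = cnots[i]
            if cur[t] not in layout[cur[c]]:  # illegal CNOT: branch on the choice
                cur, swaps = move(layout, cur, c, t, choices[used])
                cost += swaps
                used += 1
            i += 1
        cost += estimate(layout, cur, cnots[i:])
        if best is None or cost < best[0]:
            best = (cost, choices[0])
    return best

linear = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
start = {q: q for q in linear}
result = lookahead_first_choice(linear, start, [(0, 3), (0, 2)])
print(result)
```

In this toy program, moving the control qubit of the first CNOT leaves the second CNOT legal, whereas moving the target makes the second CNOT illegal again, so the search prefers the first option. With `depth=4` the loop visits exactly 16 choice sequences.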
Specifically, we traverse the OpenQASM code. Whenever we encounter an illegal CNOT, we call Algorithm 2 to adjust it and then update the subsequent code and the classical register, until the traversal terminates. It can be seen from Algorithm 2 that the mapping generated by the Adjust function only affects the code after the adjusted CNOT gate, which is why we call this step local adjustment.
At this point, there is no Obstacle2 left in the quantum program. We then traverse the new OpenQASM code again to handle Obstacle1 by Equation (2).
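The final pass over the code, which handles Obstacle1, can be sketched as follows. The gate representation (tuples such as `("cx", control, target)`) and the function name are our own assumptions; the rewriting rule itself is the standard flip of a CNOT by four H gates.

```python
def fix_directions(gates, directed_edges):
    """Rewrite CNOTs whose direction is unsupported using four H gates."""
    out = []
    for gate in gates:
        if gate[0] != "cx":
            out.append(gate)  # single-qubit gates are copied unchanged
            continue
        _, c, t = gate
        if (c, t) in directed_edges:
            out.append(gate)  # direction already legal
        elif (t, c) in directed_edges:
            # Flip: H on both qubits, reversed CNOT, H on both qubits again.
            out += [("h", c), ("h", t), ("cx", t, c), ("h", c), ("h", t)]
        else:
            raise ValueError(f"qubits {c} and {t} are not connected")
    return out

# Hypothetical directed layout: only 1 -> 0 and 2 -> 1 are native CNOT directions.
edges = {(1, 0), (2, 1)}
program = [("cx", 0, 1), ("cx", 2, 1)]
fixed = fix_directions(program, edges)
print(fixed)
```

The pass raises an error for unconnected qubits on purpose: by this point Obstacle2 is assumed to have been removed, so any such CNOT indicates a bug in the earlier steps.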
3.3 The mergence of singlequbit gates
In this step, we reduce the circuit depth by merging single-qubit gates.
First, we need to determine which single-qubit gates can be merged.
The random quantum circuit shown in Fig.5 (a) contains three CNOT gates, and these gates divide the execution process of each of the three qubits into three parts. Obviously, the single-qubit gates within each part can be merged, which reduces Fig.5 (a) to Fig.5 (b). From this example, we conclude that for any qubit $q$, the multi-qubit gates involving $q$ divide the execution process of $q$ into subintervals, and the single-qubit gates in each subinterval can be merged into one gate.
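The subinterval rule can be sketched as a single pass over the gate list. The representation of a gate as a `(name, qubits)` pair and the function name are illustrative assumptions; the sketch only finds the groups of mergeable gates and does not compute the merged gate itself.

```python
def single_qubit_runs(gates):
    """Group consecutive single-qubit gates per qubit; a multi-qubit gate closes the run."""
    open_runs = {}  # qubit -> gates of the run that is currently open
    runs = []       # finished runs as (qubit, [gates]) pairs
    for gate in gates:
        qubits = gate[1]
        if len(qubits) == 1:
            open_runs.setdefault(qubits[0], []).append(gate)
        else:
            for q in qubits:  # the multi-qubit gate ends the run on each involved qubit
                if q in open_runs:
                    runs.append((q, open_runs.pop(q)))
    for q in sorted(open_runs):  # runs that reach the end of the circuit
        runs.append((q, open_runs[q]))
    return runs

circuit = [
    ("u1", (0,)), ("u2", (0,)),
    ("cx", (0, 1)),
    ("u3", (0,)), ("u1", (1,)),
]
runs = single_qubit_runs(circuit)
print(runs)
```

Each returned run can be replaced by one single-qubit gate, so the two gates on qubit 0 before the CNOT collapse into one, while the gate after the CNOT stays in its own subinterval.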
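As mentioned before, all single-qubit gates in OpenQASM belong to $\{u_1, u_2, u_3\}$. Therefore, merging single-qubit gates actually contains 9 different cases: $u_1 \cdot u_1$, $u_1 \cdot u_2$, $u_2 \cdot u_1$, $u_1 \cdot u_3$, $u_3 \cdot u_1$, $u_2 \cdot u_2$, $u_2 \cdot u_3$, $u_3 \cdot u_2$ and $u_3 \cdot u_3$. In order to handle these cases, we need the ZY decompositions [19] of $u_1$, $u_2$ and $u_3$. By Equation (1), we obtain, up to a global phase:
$$u_1(\lambda) \simeq R_z(\lambda), \quad u_2(\phi,\lambda) \simeq R_z(\phi)\,R_y(\tfrac{\pi}{2})\,R_z(\lambda), \quad u_3(\theta,\phi,\lambda) \simeq R_z(\phi)\,R_y(\theta)\,R_z(\lambda). \tag{5}$$
The first five cases can be merged directly [18]. As for the last four cases, we have (with $\theta = \pi/2$ for $u_2$):
$$u_3(\theta_1,\phi_1,\lambda_1)\cdot u_3(\theta_2,\phi_2,\lambda_2) \simeq R_z(\phi_1)\,R_y(\theta_1)\,R_z(\lambda_1+\phi_2)\,R_y(\theta_2)\,R_z(\lambda_2). \tag{6}$$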
The key to this kind of merging lies in how to transform the YZ decomposition of a quantum gate into its ZY decomposition, and we use QISKit’s merge method [20] to solve this problem. At this point, the adjustment and optimization of the original quantum program according to any given layout is complete.
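One way to see that two single-qubit gates always merge into one is to multiply their matrices and read the three angles of the product back off. The sketch below does this numerically; it is a stand-in of our own and not the QISKit routine referenced in [20]. It assumes the gates are given in the u3 form of QISKit's standard gate library, and it treats gates that differ only by a global phase as equal.

```python
import cmath
import math

def u3(theta, phi, lam):
    """General single-qubit gate as a 2x2 matrix (nested lists)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def merge(a, b):
    """Angles (theta, phi, lam) with u3(theta, phi, lam) = a*b up to a global phase."""
    m = matmul2(a, b)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(det)  # dividing by it removes the global phase
    m00, m10, m11 = m[0][0] / root, m[1][0] / root, m[1][1] / root
    theta = 2 * math.atan2(abs(m10), abs(m00))
    plus = cmath.phase(m11)   # equals (phi + lam) / 2
    minus = cmath.phase(m10)  # equals (phi - lam) / 2
    return theta, plus + minus, plus - minus

def same_up_to_phase(a, b, tol=1e-9):
    """For 2x2 unitaries, |tr(a^dagger b)| = 2 exactly when a = e^{i alpha} b."""
    tr = sum(a[i][j].conjugate() * b[i][j] for i in range(2) for j in range(2))
    return abs(abs(tr) - 2) < tol

first, second = u3(0.3, 1.1, -0.4), u3(1.2, 0.5, 2.0)
angles = merge(first, second)
print(same_up_to_phase(u3(*angles), matmul2(first, second)))
```

The same function also covers the simple cases: two phase gates, written as `u3(0, 0, a)` and `u3(0, 0, b)`, merge into a gate with rotation angle zero.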
4 Numerical Results
In this section, we take QISKit’s optimizing method as the benchmark to evaluate the performance of our optimizing scheme for different scales of quantum programs and different layouts of quantum chips. In addition, we use the gate cost function proposed in the QISKit Developer Challenge, which is computed from the number of CNOT gates and the number of single-qubit gates in the optimized quantum circuit.
4.1 Platform
Hardware Platform
All the experiments in this paper are executed on a PC with an Intel Core i7 processor and 8GB of RAM. Furthermore, we have no special hardware acceleration, such as a GPU.
Software Platform
In order to verify the correctness of our scheme, we use the QASM simulator to execute the optimized circuits. In addition, we use a special method to generate random quantum circuits: it first generates random circuits whose quantum gates belong to $SU(4)$ [21], and then decomposes these gates into gates belonging to the basic gate set of Section 2.1 [22]. The advantage of this method is that we can fully test different connections between qubits, and the fairness of the comparison between our optimizing scheme and QISKit (version 0.4.11) can be guaranteed.
The detailed execution flow of our experiments is shown in Fig.6.
It should be noted that, for accuracy of description, the circuit depth mentioned in the following is still the depth of the $SU(4)$ circuit, and the actual depth is about 7 times as large.
4.2 Results
As we all know, the number of qubits and the circuit depth are important indicators for the scale of quantum programs.
Therefore, the experiments are designed as follows: for every number of qubits and every circuit depth in the tested ranges, we generate a fixed number of different random quantum circuits.
Then we choose four common connected graphs (linear, central, neighboring and circular) and use our optimizing scheme and QISKit’s algorithm to adjust and optimize these random circuits according to each of these layouts. That is, each algorithm handles every generated circuit four times, once per layout.
Finally, the optimized quantum programs are executed by the QASM simulator. If the result of our scheme is consistent with QISKit’s result, we record the cost and the execution time of each circuit.
All quantum circuits, layouts and the source code of our scheme can be found on GitHub (https://github.com/zhangxin20121923/QISKit_Deve_Challenge).
Comparison with QISKit’s optimizing method
Table 1 shows the quantum gate consumption of the original random quantum circuits, and the average cost of gates and compile time required to adjust and optimize these circuits by our scheme and by QISKit.

Table 1
                    Time (s)      Gate Cost
Original Circuit    0             3084391
Our Scheme          16472.48      6703061
QISKit              127751.99     8974717

Overall, our scheme consumes 74.7% of the quantum gates consumed by QISKit, and its execution time is only 12.9% of QISKit’s.
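The two percentages follow directly from the entries of Table 1, as the short calculation below shows (the dictionary layout is just an illustrative way to hold the table values):

```python
# (time in seconds, gate cost) taken from Table 1
table = {
    "Our Scheme": (16472.48, 6703061),
    "QISKit": (127751.99, 8974717),
}

gate_ratio = table["Our Scheme"][1] / table["QISKit"][1]
time_ratio = table["Our Scheme"][0] / table["QISKit"][0]

print(f"gate cost relative to QISKit: {gate_ratio:.1%}")
print(f"execution time relative to QISKit: {time_ratio:.1%}")
```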
Specifically, the performance of our scheme varies for different scales of quantum circuits.
Fig.7 (a) and Fig.7 (b) illustrate, for various numbers of qubits and circuit depths, the ratio between QISKit and our scheme in terms of the cost of quantum gates and of efficiency, respectively. Both ratios are computed from the gate cost and execution time of QISKit’s algorithm and those of our method.
Fig.7 shows that in all cases we executed, our algorithm can use fewer quantum gates to adjust and optimize the original circuits in less time.
In the worst case (more qubits and larger circuit depth), we use 6% fewer gates and are about 5 times as efficient;
in the best case (more qubits and smaller circuit depth), we use 63% fewer gates and are about 20 times as efficient.
Obviously, the results are consistent with the theory:
when the number of qubits is large and the circuit depth is small, since we recursively calculate 4 layers of the solution space tree, the choice is more reliable and the performance is better;
when the number of qubits is small, the layout tends to be fully connected and our scheme does not have advantages;
and when the circuit depth is large, we are easily trapped in a local optimum, and the performance of our scheme is worse than for small depths.
Performance in different physical layouts
For the four layouts we have chosen, there are also significant differences in costs of quantum gates and execution time.
In order to treat circuits of different scales fairly and to prevent the statistics from being dominated by large-scale circuits, we no longer directly sum up the gate costs of the different cases (as was done in Table 1).
Specifically, the statistics are computed circuit by circuit from the gate cost of the $i$th original circuit, the gate costs of the $i$th circuit adjusted by QISKit and by our scheme, and the times required to compile the $i$th circuit by QISKit and by our scheme.
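The difference between summing costs directly and working circuit by circuit can be seen in a toy example. The sketch assumes that the per-circuit statistic is the mean of the individual ratios, which is one natural reading of the description above and not necessarily the exact formula used; the numbers are invented purely for illustration and are not experimental data.

```python
def ratio_of_sums(original, optimized):
    """Summing first lets the largest circuits dominate the result."""
    return sum(optimized) / sum(original)

def mean_of_ratios(original, optimized):
    """Averaging per-circuit ratios gives every circuit the same weight."""
    return sum(opt / orig for orig, opt in zip(original, optimized)) / len(original)

# Illustrative gate costs: one small circuit and one large circuit.
original = [100, 10000]
optimized = [300, 12000]

summed = ratio_of_sums(original, optimized)
averaged = mean_of_ratios(original, optimized)
print(summed, averaged)
```

Here the small circuit triples in cost while the large one grows by only a fifth. The summed ratio stays close to the large circuit's ratio, whereas the per-circuit mean reflects both circuits equally.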
Fig.8 (a) shows, for the central, linear, circle and neighbour layouts, the gate consumption of our scheme and of QISKit’s optimizing method as multiples of the gate cost of the original circuit; in each of these layouts our scheme consumes fewer gates than QISKit. Fig.8 (b) illustrates that for the linear, circle and neighbour layouts, our scheme is about 4 times faster than QISKit, while for the central layout it is about 17.3 times as fast as QISKit’s method.
5 Conclusions and Future Research
Considering the cost of physical implementation, the layouts of most existing quantum chips are not fully connected, which sets additional barriers for implementing quantum algorithms and programming quantum programs.
Therefore, a better approach is to automate the task of adjusting and optimizing quantum programs according to any given layout by the compiler of quantum computer.
We propose a general optimizing scheme to accomplish the task by adding additional logic gates, exchanging qubits in the quantum register and merging single-qubit gates.
Compared with QISKit’s optimizing method, our scheme consumes 74.7% of the quantum gates and only 12.9% of the execution time overall. For circuits with more qubits and smaller circuit depth, this advantage is more obvious.
In addition, several common connected graphs (linear, central, neighboring and circular) are compared as well. In all four cases, our scheme has advantages. Especially for the central layout, our scheme adjusts and optimizes the original quantum circuits with fewer gates than QISKit’s optimizing algorithm and about 17.3 times as fast.
Future Research
In our scheme, we often make greedy choices when the circuit depth of the quantum program is large. However, the experimental results in Section 4 show that we made wrong choices in some cases and got trapped in a local optimum. If we can find better selection criteria, or even calculate the global optimal solution, we will further reduce the consumption of additional logic gates.
In addition, high-precision floating-point calculation is needed when combining single-qubit logic gates, which takes up about 70% of the total compile time. Whether more efficient merging methods can be found is a problem worth considering. In order to further evaluate different physical layouts, we also plan to discuss with the R&D teams of actual quantum chips, so as to combine the actual overhead of designing different layouts with the expense at the software level.
References
 [1] Daniel R Simon. On the power of quantum computation. SIAM journal on computing, 26(5):1474–1483, 1997.
 [2] Peter W Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review, 41(2):303–332, 1999.
 [3] Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 212–219. ACM, 1996.
 [4] Donny Cheung, Dmitri Maslov, and Simone Severini. Translation techniques between quantum circuit architectures.
 [5] Norbert M Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt, Kevin A Landsman, Kenneth Wright, and Christopher Monroe. Experimental comparison of two quantum computing architectures. Proceedings of the National Academy of Sciences, page 201618020, 2017.
 [6] The backend information of IBM quantum cloud. https://github.com/QISKit/qiskit-backend-information/.
 [7] YP Zhong, D Xu, P Wang, C Song, QJ Guo, WX Liu, K Xu, BX Xia, CY Lu, Siyuan Han, et al. Emulating anyonic fractional statistical behavior in a superconducting quantum circuit. Physical review letters, 117(11):110501, 2016.
 [8] The url of alibaba’s quantum cloud platform. http://quantumcomputer.ac.cn/index.html.
 [9] Tao Xin, Shilin Huang, Sirui Lu, Keren Li, Zhihuang Luo, Zhangqi Yin, Jun Li, Dawei Lu, Guilu Long, and Bei Zeng. NMRCloudQ: a quantum cloud experience on a nuclear magnetic resonance quantum computer. Science Bulletin, 2017.
 [10] H Dieter Zeh. On the interpretation of measurement in quantum theory. Foundations of Physics, 1(1):69–76, 1970.
 [11] David P DiVincenzo et al. The physical implementation of quantum computation. arXiv preprint quant-ph/0002077, 2000.
 [12] A Chi-Chih Yao. Quantum circuit complexity. In Foundations of Computer Science, 1993. Proceedings., 34th Annual Symposium on, pages 352–361. IEEE, 1993.
 [13] Qiskit python api. https://qiskit.org/.
 [14] Qiskit developer challenge. https://qxawards.mybluemix.net/.
 [15] Edwin Pednault, John A Gunnels, Giacomo Nannicini, Lior Horesh, Thomas Magerlein, Edgar Solomonik, and Robert Wisnieff. Breaking the 49-qubit barrier in the simulation of quantum circuits. arXiv preprint arXiv:1710.05867, 2017.
 [16] Andrew W Cross, Lev S Bishop, John A Smolin, and Jay M Gambetta. Open quantum assembly language. arXiv preprint arXiv:1707.03429, 2017.
 [17] Krysta M Svore, Alfred V Aho, Andrew W Cross, Isaac Chuang, and Igor L Markov. A layered software architecture for quantum computing design tools. Computer, 39(1):74–83, 2006.
 [18] A Barenco, C. H. Bennett, R Cleve, D. P. Divincenzo, N Margolus, P Shor, T Sleator, J. A. Smolin, and H Weinfurter. Elementary gates for quantum computation. Physical Review A, 52(5):3457, 1995.
 [19] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2002.
 [20] The code of merging two single gates. https://github.com/QISKit/qiskit-sdk-py/blob/master/qiskit/mapper/_mapping.py.
 [21] Francesco Iachello. Lie algebras and applications, volume 708. Springer, 2006.
 [22] Farrokh Vatan and Colin Williams. Optimal quantum circuits for general two-qubit gates. Physical Review A, 69(3):032315, 2004.