Universal Programmable Quantum Circuit Schemes to Emulate an Operator

Anmer Daskin Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA    Ananth Grama Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA    Giorgos Kollias Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA    Sabre Kais kais@purdue.edu Department of Chemistry, Department of Physics and Birck Nanotechnology Center, Purdue University, West Lafayette, IN 47907, USA Qatar Environment and Energy Research Institute, Qatar Foundation, Doha, Qatar

Unlike fixed designs, programmable circuit designs support an infinite number of operators. The functionality of a programmable circuit can be altered by simply changing the angle values of the rotation gates in the circuit. Here, we present a new quantum circuit design technique resulting in two general programmable circuit schemes. The circuit schemes can be used to simulate any given operator by setting the angle values in the circuit. This provides a fixed circuit design whose angles are determined in an efficient way from the elements of the given matrix, which can be non-unitary. We also give both the classical and quantum complexity analysis for these circuits and show that the circuits require only a few classical computations and have almost the same quantum complexities as non-general circuits. Since the presented circuit designs are independent of the matrix decomposition techniques and the global optimization processes used to find quantum circuits for a given operator, high accuracy simulations can be done for the unitary propagators of molecular Hamiltonians on quantum computers. As an example, we show how to build the circuit design for the hydrogen molecule.

I Introduction

Classical logic devices can be broadly categorized as fixed or programmable. As the names suggest, a circuit in a fixed logic device supports only one function, which is determined at the time of manufacture and cannot be changed at a later date. Programmable devices such as PLDs and FPGAs, on the other hand, are able to support an infinite number of functionalities since they can be reconfigured outside of the manufacturing environment. With this feature, designers and programmers can run and simulate their test designs and algorithms bobda2007introduction ().

Quantum computing has become a huge new interdisciplinary area by providing different approaches and protocols to various subfields including: communication, encryption, global binary optimization (see adiabatic quantum computing Farhi ()), linear algebra, and so on Chuang (); Kaye (); Williams (); however, programmable quantum circuits and chip designs like those in classical computers have remained an open issue.

In the circuit model of quantum computing, unitary matrix operators represent the algorithms or some part of the computations Daskin2 (). Hence, one of the fundamental issues is to have a general purpose quantum circuit or a quantum chip that can realize different types of algorithms in a fast and efficient way. The possibility of designing universal quantum gate arrays as a general purpose quantum computer has been discussed in ref. Nielsen (). It is shown that a gate array can be programmed to evaluate the expectation value of a given operator Juan (). For the realization of a quantum gate, a cell structured quantum circuit design based on the activation and the deactivation of the gates on different qubits is proposed: it is shown that a combination of such cells can be used to realize a given quantum gate sequence Desousa (). Moreover, different schemes of general programmable universal quantum circuits are shown for two Vidal (); Zhang () and three qubits mottonen (); Vatan (); Rui (), found by applying different decomposition schemes to a given unitary operator. Based on the general two-qubit circuit design, a two-qubit quantum processor is experimentally realized Hanneke (). However, the realization of a general quantum processor and a full-scale quantum computer is still an obstacle which requires new theoretical and experimental improvements Lanyon ().

It is known that the realization of quantum logical operations can be simplified by using higher dimensional Hilbert spaces Lanyon (); Bary (). In this paper, using ancilla qubits, we describe a new circuit design approach which produces two programmable quantum circuit designs. These can be further improved to design general large-scale quantum chips and programmable quantum gate arrays. The circuits also support simulation of non-unitary matrices. We also show the complexity analysis for the circuits: in terms of quantum complexity, they have about the same complexity as non-programmable designs which are generated by using matrix decompositions in numerical linear algebra such as the QR decomposition Golub (), the quantum Shannon decomposition, the cosine-sine decomposition and some others Peter (); vivek () (see ref. vivek () for the comparison and the complexities of these methods). In terms of classical complexity, since the angles for our programmable circuits can be determined simply from individual matrix elements, the classical complexity is much lower than that of the decomposition methods.

This paper is organized as follows: After giving the general simulation idea, the details of two circuit designs implementing this idea are presented. Then the complexity of the circuits is analyzed in terms of classical and quantum complexities. Finally, we discuss the circuit designs and possible future directions. In the appendix, more computational details related to the matrices are presented.

II The General Simulation Idea

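For a given real unitary $U \in \mathbb{R}^{N \times N}$ with $N = 2^n$, where $n$ is the number of qubits, the relationship between the input $|\psi\rangle$ and the output $|\phi\rangle$ is defined as $U|\psi\rangle = |\phi\rangle$, generating the states:

$$\phi_i = \sum_{j=1}^{N} u_{ij}\,\psi_j, \qquad i = 1, \dots, N. \tag{1}$$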

Any system of higher dimension (obtained by adding ancilla qubits to the original system) can also be used to generate this output on chosen states with some normalization. Our goal is to create a matrix $V$ (shown in Eq.(2)) which represents the system with the ancilla. We then modify the initial input to this extended system (the initial state of the ancilla is taken as $|0\rangle$) by using quantum operations such that the application of $V$ to this modified input includes the output given in Eq.(1) with a normalization constant:

$$V = \begin{pmatrix} V_1 & & & \\ & V_2 & & \\ & & \ddots & \\ & & & V_N \end{pmatrix}, \tag{2}$$

where each $V_i$ has some distinct rows of $U$ as its leading rows. Adding a sufficient number of ancilla qubits to control each $V_i$ uniformly (as shown in Fig.1) permits us to produce the circuit equivalent of the matrix $V$ in the above equation. If we assume that the first row of $V_i$ is (or includes) the $i$th row of $U$, then we need to use $N$ such blocks, as shown in Eq.(2).

Figure 1: The number of qubits on the ancilla determines the number of $V_i$s and hence the size of $V$ in Eq.(2).

The quantum operations to construct the matrix $V$ and the operations to modify the input form the circuit that simulates the given operator. That is, the steps that form the rows of $U$ in $V$ and that transform the input into its modified form generate the general circuit design for the simulation of $U$. One way to formulate these steps and to build the matrix $V$ and the input is as follows. First, the system is extended by adding auxiliary qubits. These ancilla qubits uniformly control different block quantum operations, the $V_i$s, on the main qubits (in this paper, $n$ or $n+1$ auxiliary qubits are used). After the formation of all elements of $U$, which we call the Formation step, the elements of the same row of $U$ are brought to the first row of each $V_i$, which we call the Combination step. The input is modified by a small circuit such that $V$ produces an output which includes the normalized states expected from the operation of $U$ on the original input. We call this step the Input modification step. The measurement results for these states exactly simulate $U$. The circuit design found with these steps can be drawn as a block circuit diagram (as shown in Fig.2). This approach provides a new way to find circuit designs. Hence, we will describe two different programmable circuit schemes based on the block circuit in Fig.2.

III Generation of Programmable Circuits

III.1 The First Circuit Design

In this design, we first create all elements of $U$ at the diagonal positions of $V$ by using one rotation gate for each element of $U$ (the Formation step). In the Combination step, the elements on each $i$th row of $U$ are collected in the first row of each $V_i$.

Figure 2: Block circuit diagram to simulate $U$ by modifying the input and constructing $V$ in two steps: formation, which forms the elements of $U$ in $V$, and combination, which brings the elements of the same row of $U$ to the first rows of the $V_i$s in $V$. The gates needed to form $V$ and to transform the input generate the circuit.

Formation Step: In this step, the elements of $U$ are tiled across the diagonal of a new higher-dimensional matrix. This is a block diagonal matrix with $2 \times 2$ blocks across the diagonal. For each element of $U$, one rotation gate is used. The angle of the gate is chosen so that its cosine equals the corresponding element of $U$. Controlling such gates in a uniform binary coded fashion produces a matrix which has all elements of $U$ on its diagonal:


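where the rotation gate $R_{ij}$ generates the element $u_{ij}$ of $U$, that is, $\cos\theta_{ij} = u_{ij}$. Each $R_{ij}$ acts on a single qubit and is uniformly controlled by the other $2n$ qubits of the $(2n+1)$-qubit system, $1 \le i, j \le N$.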

Combination Step: To bring the elements of the same row of $U$ to the first rows of the $V_i$s, we need a quantum operation which will produce the matrix represented as:


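where $V_i$ should have a form similar to the following matrix:

$$V_i = \frac{1}{\sqrt{N}} \begin{pmatrix} u_{i1} & \bullet & u_{i2} & \bullet & \cdots & u_{iN} & \bullet \\ \bullet & \bullet & \bullet & \bullet & \cdots & \bullet & \bullet \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \end{pmatrix}. \tag{5}$$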

For a system with $(n+1)$ qubits, single Hadamard gates on the first $n$ qubits generate a matrix of the above form, with the normalization factor $1/\sqrt{N}$. Hence, $I^{\otimes n} \otimes H^{\otimes n} \otimes I$ is the matrix form of this operation in the system with $(2n+1)$ qubits, where we apply the Hadamard gates to the $(n+1)$st, $n$th, …, 3rd, and 2nd qubits from the bottom in the circuit.

Input modification Step: In the final matrix in Eq.(4), the states corresponding to the rows which possess the elements of $U$ with the normalization factor are assigned as the chosen states simulating $U$. We should therefore modify the input in such a way that the elements represented as "$\bullet$"s between the elements of $U$ are disregarded. That is, the initial input should be transformed by an operation applied before the final matrix, so that the input entries that meet the "$\bullet$" elements are set to zero:


where the prefactor is a normalization constant. It is easy to see that this modification can be achieved by simple Hadamard gates on the first $n$ qubits and sequential swap operations between the $(n+1)$st and the remaining qubits.

The equivalent circuit simulating any $U$ is drawn in Fig.3 for an $n$-qubit system, with the ancilla qubits added and the block circuits in Fig.2 replaced by the explicit circuits found above.

At the end of this circuit, which can be decomposed into one- and two-qubit gates by using the decomposition technique discussed in Sec.IV, the following set of states exactly simulates the given unitary after normalization:


where the dashes are used to separate the main and the ancilla qubits.
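The three steps of the first design can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a qubit ordering in which the top $n$ qubits index the row $i$, the next $n$ qubits index the column $j$, and the bottom qubit is the rotation target, and it assumes the rotation convention in `ry`. The function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Real rotation with cos(theta) on the diagonal (assumed gate convention)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def hadamard_n(n):
    """n-fold Kronecker power of the 2x2 Hadamard matrix."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, H)
    return out

def simulate_first_design(U, psi):
    """Recover U @ psi from the (2n+1)-qubit construction (illustrative only)."""
    N = U.shape[0]
    n = int(round(np.log2(N)))
    # Formation: one rotation per element u_ij, with cos(theta_ij) = u_ij.
    thetas = np.arccos(np.clip(U, -1.0, 1.0)).ravel()  # row-major order (i, j)
    V = np.zeros((2 * N * N, 2 * N * N))
    for k, t in enumerate(thetas):
        V[2 * k:2 * k + 2, 2 * k:2 * k + 2] = ry(t)
    # Combination: Hadamards on the n qubits that index the column j.
    C = np.kron(np.eye(N), np.kron(hadamard_n(n), np.eye(2)))
    # Input modification: uniform superposition on the row-index qubits,
    # |psi> on the column-index qubits, |0> on the rotation target.
    ancilla = np.full(N, 1.0 / np.sqrt(N))
    state = np.kron(ancilla, np.kron(psi, np.array([1.0, 0.0])))
    out = C @ V @ state
    # The chosen states |i>|0...0>|0> sit at indices i * 2N; undo the 1/N factor.
    return N * out[::2 * N]
```

Under these assumptions the amplitudes on the chosen states equal $(U|\psi\rangle)_i / N$. The same routine also accepts a non-unitary real matrix whose entries lie in $[-1, 1]$, since only the arccosine of each element is used.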

In Appendix A, we give an example of the explicit matrices used for each step of the algorithm.

Figure 3: The first circuit design for a given general matrix: the initial Hadamards and the SWAPs modify the input, and the last Hadamards carry the elements to the first rows of the $V_i$s (combination step). The uniformly controlled quantum gates in the middle form all elements of $U$ on the diagonal (formation step).

III.2 The Second Circuit Design

In the first circuit design, the elements of $U$ are initially formed on the diagonal of $V$ by using uniformly controlled rotation gates. Here, we take a group of elements from a row of $U$ and create them as the leading row of small block matrices by preserving the ratios between the elements. Using a rotation gate for each pair of these initial small blocks, we create larger block matrices which have more elements of $U$ in their first rows. This combination step is repeated until the final $V_i$s, whose leading rows are the rows of $U$ as in Eq.(2), are constructed. Since the final blocks, the $V_i$s, are $N \times N$, the matrix $V$ is $N^2 \times N^2$; therefore $n$ qubits are needed for the ancilla. The input modification step follows the same idea as described for the first design.

Formation Step: As stated above, instead of forming matrix elements at the diagonal positions by using a rotation gate for each element of $U$, a group of elements is created in the first row of each block with the same ratio as those elements in the original matrix. For instance, if the initial blocks are of dimension 2 by 2, the first row implements two elements, $u_{i1}$ and $u_{i2}$, of $U$. Thus, the ratio between the first element and the second element of a 2 by 2 block matrix is the same as $u_{i1}/u_{i2}$ (since the block is 2 by 2, the elements of the block matrix are the cosine and sine values of an angle $\theta$ which provides the equality $\cos\theta/\sin\theta = u_{i1}/u_{i2}$). In our circuit designs, the first row elements of each block implement the ratios of the elements in the same order as the original matrix. Therefore, if the first blocks are of dimension $d \times d$, the total number of initial blocks will be $N^2/d$, since each block implements $d$ elements. The following matrix represents the formation step for 2 by 2 initial blocks:


where the prefactors are normalization constants and the $u_{ij}$s are the elements of $U$. The block operations in Fig.4 produce a matrix with 4 by 4 block matrices on its diagonal.

Combination Step: After the formation with ratios, blocks are combined using one rotation gate for each pair of two blocks so as to form new larger blocks with new normalization constants that preserve the original ratios of the elements. Each of these new blocks has twice as many elements as the former blocks. As an example, we will combine two 4 by 4 matrices located on the diagonal of the matrix :


where and and are the normalization factors. The following matrix, , can be used as a combination matrix to generate an 8 by 8 larger block from the above pair of two 4 by 4 blocks:


where , , and is an angle to achieve the required ratio. The matrix multiplication produces a matrix with the leading row , where and .
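The combination of two blocks can be illustrated with a short NumPy sketch. It is an illustration under stated assumptions rather than the paper's own construction: `rot` fixes one possible rotation convention, `orthogonal_with_leading_row` is a hypothetical helper that stands in for the 4 by 4 blocks produced by the formation step, and the angle is chosen so that its cosine and sine are the two block normalization constants divided by their combined norm.

```python
import numpy as np

def rot(angle):
    """2x2 rotation whose first row is (cos, sin) (assumed convention)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [-s, c]])

def orthogonal_with_leading_row(v):
    """Hypothetical helper: an orthogonal matrix whose first row is v / |v|."""
    v = np.asarray(v, dtype=float)
    M = np.eye(v.size)
    M[:, 0] = v / np.linalg.norm(v)
    Q, _ = np.linalg.qr(M)          # first column of Q is +-v/|v|
    if Q[:, 0] @ v < 0:
        Q[:, 0] = -Q[:, 0]          # fix the sign; Q stays orthogonal
    return Q.T

def combine(B1, k1, B2, k2):
    """Merge two m x m blocks with norms k1, k2 into one 2m x 2m block."""
    m = B1.shape[0]
    phi = np.arctan2(k2, k1)        # cos(phi) = k1/K, sin(phi) = k2/K
    Z = np.zeros((m, m))
    D = np.block([[B1, Z], [Z, B2]])
    C = np.kron(rot(phi), np.eye(m))  # one rotation on the next qubit up
    return C @ D, np.hypot(k1, k2)
```

With these choices the leading row of the combined block is the concatenation of the two original leading rows, rescaled by the new normalization constant, so the ratios between all eight elements are preserved.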

It is easy to see that this combination matrix can be written as $R \otimes I^{\otimes 2}$, where $R$ is a general one qubit rotation gate. Hence, any such general combination matrix can be written as $R \otimes I^{\otimes k}$, where $2^k$ is the size of the blocks to be combined. This means that for blocks operating on $k$ qubits, applying a rotation gate to the $(k+1)$st qubit is equivalent in matrix form to the operation $R \otimes I^{\otimes k}$. Hence, putting single rotation gates on the $(k+1)$st, $(k+2)$nd, …, $n$th qubits generates an $N$ by $N$ matrix. Furthermore, by controlling each of these operations (or equivalently the single rotation gates, $R$s) uniformly by the upper qubits in the circuit (see the uniformly controlled rotation gates located after the block operations in Fig.4), we can generate $N$ such separate blocks and the following final matrix:


Since the resulting rows in each block are unit vectors and have the same ratios as the row elements of $U$, they are equal to the corresponding rows of $U$. (The final normalization constants become equal to 1.)

For the general case, if the initial blocks operate on the last $k$ qubits, we need to use uniformly controlled rotation gates on each main qubit (excluding the last $k$ qubits) in order to recursively combine small blocks. At the end, we have $N$ blocks whose leading rows are the rows of $U$, as shown in Eq.(11).

Input Modification: Modifying the input with the normalization constant $1/\sqrt{N}$ allows us to simulate $U$ by using $V$ in Eq.(11) on the chosen states:


This input can be produced by applying the Hadamard gates to all ancilla qubits at the beginning of the circuit.
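The second design as a whole can be sketched in the same way. The NumPy code below is an illustrative sketch under assumptions, not the authors' implementation: it uses 2 by 2 initial blocks, the rotation convention in `rot`, an ordering in which the $n$ ancilla qubits sit above the $n$ main qubits, and hypothetical function names. It applies to real matrices whose rows are unit vectors (for example a real unitary), because each block reproduces a row only up to its norm.

```python
import numpy as np

def rot(angle):
    """2x2 rotation whose first row is (cos, sin) (assumed convention)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [-s, c]])

def row_block(x):
    """Orthogonal matrix with leading row x/|x|, and |x|, built from rotations."""
    x = np.asarray(x, dtype=float)
    if x.size == 2:
        # Formation: one rotation reproduces the ratio of the two elements.
        return rot(np.arctan2(x[1], x[0])), np.hypot(x[0], x[1])
    h = x.size // 2
    B1, k1 = row_block(x[:h])
    B2, k2 = row_block(x[h:])
    Z = np.zeros((h, h))
    D = np.block([[B1, Z], [Z, B2]])
    # Combination: one rotation on the next qubit restores the ratio k1 : k2.
    C = np.kron(rot(np.arctan2(k2, k1)), np.eye(h))
    return C @ D, np.hypot(k1, k2)

def simulate_second_design(U, psi):
    """Recover U @ psi from the 2n-qubit construction (illustrative only)."""
    N = U.shape[0]
    V = np.zeros((N * N, N * N))
    for i in range(N):
        Vi, _ = row_block(U[i])
        V[i * N:(i + 1) * N, i * N:(i + 1) * N] = Vi
    # Input modification: Hadamards on the ancilla give a uniform superposition.
    ancilla = np.full(N, 1.0 / np.sqrt(N))
    out = V @ np.kron(ancilla, psi)
    # The chosen states |i>|0...0> sit at indices i * N; undo the 1/sqrt(N) factor.
    return np.sqrt(N) * out[::N]
```

Here the output amplitudes on the chosen states are $(U|\psi\rangle)_i/\sqrt{N}$, which is why the result is rescaled by $\sqrt{N}$ at the end.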

Consequently, the general circuit design shown in Fig.4 is obtained which is able to simulate any real unitary matrix. For more explicit matrix forms and illustrative details, please refer to Appendix A and Appendix A.2.

Figure 4: The second circuit with 4 by 4 initial blocks: the differently controlled quantum gates in the networks, after the blocks, combine small blocks and build the $N$ by $N$ blocks at the end. The initial Hadamards modify the input. The blocks implement the formation step.

IV Complexity Analysis of The Circuits

Both the classical and the quantum complexities of the circuits explained above depend mostly on the costs of uniformly controlled networks such as the one in Fig.5(a). Such a network controlled by $k$ qubits can be decomposed in terms of $2^k$ CNOT gates and $2^k$ single rotation gates mottonen (). For instance, the circuit illustrated in Fig.5(a) can be decomposed as in Fig.5(b).

Figure 5: (a) A gray-coded multi-control network. (b) The decomposition of the gray-coded network in (a) into CNOT and single quantum gates.

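The angle values in the decomposed circuit are found as the solution of the linear system:

$$M^k \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_{2^k} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{2^k} \end{pmatrix}, \tag{13}$$

where $k$ is the number of control qubits in the network, the $\alpha_i$s are the angles of the uniformly controlled network, the $\theta_i$s are the angles in the decomposed circuit, and the entries of $M^k$ are defined as:

$$M^k_{ij} = (-1)^{b_{i-1} \cdot g_{j-1}}, \tag{14}$$

in which the power term is found by taking the dot product of the standard binary code of the index $i-1$, $b_{i-1}$, and the binary representation of the $(j-1)$th gray coded integer, $g_{j-1}$. Since $M^k$ is a column permuted version of the Hadamard matrix, we see that $2^{-k/2} M^k$ is unitary. Thus, $(M^k)^{-1} = 2^{-k} (M^k)^T$, and the new angle values in the decomposed circuit are the result of a mere matrix vector multiplication mottonen ():

$$\begin{pmatrix} \theta_1 \\ \vdots \\ \theta_{2^k} \end{pmatrix} = 2^{-k} (M^k)^T \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{2^k} \end{pmatrix}. \tag{15}$$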

IV.1 The complexity of the first circuit design

IV.1.1 The Classical Complexity

In the first circuit diagram (see Fig.3), since there is only one such network, we need to multiply the matrix $(M^{2n})^T$ by the vector of dimension $N^2$ constructed by taking the arc-cosines of every element of $U$. Hence, the classical complexity for the first circuit is $O(N^4)$ by the naive matrix vector multiplication. However, since $M^{2n}$ is a permuted version of the Hadamard matrix, the fast Hadamard transform Fino (), which requires $O(m \log m)$ computations for the transform of a vector of length $m$ by the Hadamard matrix, reduces this to $O(N^2 \log N^2)$.
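The angle computation for one uniformly controlled network can be sketched as follows. This is a minimal NumPy sketch under assumptions: it uses 0-based indices, takes the entries of the matrix as $(-1)^{b_i \cdot g_j}$ with $g_j$ the binary reflected Gray code of $j$, and solves the system in the direction "matrix times decomposed angles equals network angles". The function names are hypothetical.

```python
import numpy as np

def gray(j):
    """Binary reflected Gray code of the integer j."""
    return j ^ (j >> 1)

def build_M(k):
    """M[i, j] = (-1)^(b_i . g_j): bits of i dotted with the Gray code of j."""
    size = 1 << k
    M = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            M[i, j] = (-1) ** bin(i & gray(j)).count("1")
    return M

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform, O(m log m) for length m."""
    a = np.array(a, dtype=float)
    h = 1
    while h < a.size:
        for start in range(0, a.size, 2 * h):
            x = a[start:start + h].copy()
            y = a[start + h:start + 2 * h].copy()
            a[start:start + h] = x + y
            a[start + h:start + 2 * h] = x - y
        h *= 2
    return a

def decomposed_angles(alpha):
    """theta = 2^{-k} M^T alpha, evaluated with the fast transform."""
    alpha = np.asarray(alpha, dtype=float)
    perm = np.array([gray(j) for j in range(alpha.size)])
    # Column j of M is column gray(j) of the Hadamard matrix, so M^T alpha
    # is the Hadamard transform of alpha read out in Gray-code order.
    return fwht(alpha)[perm] / alpha.size
```

The fast version avoids forming the matrix at all: it transforms the angle vector once and then reorders the result by the Gray code, which is where the saving over the naive matrix vector product comes from.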


IV.1.2 The Quantum Complexity

The quantum complexity of the circuit is the number of gates required for the decomposition of the network, the combination of the blocks, and the input modification: $N^2$ CNOT, $N^2$ single rotation, $2n$ Hadamard, and $n$ SWAP gates.

IV.2 The complexity of the second circuit

Figure 6: The circuit in (a) with 4 by 4 initial blocks can be represented as in (b) by using the circuit given in Fig.(7). Without changing the order of the gates having the same control state, the gates can be moved to form uniformly controlled networks as in (c): If a gate has the same angle value for all control states such as the control gates in the circuit, they are equal to a single gate (in the case of gates in the circuit, only one CNOT is required).

The classical and the quantum complexities for the second circuit are determined by the number of networks which are formed by putting the uniformly controlled quantum gates in the blocks together, as shown in Fig.6, and by the combination steps. Since the quantum gates in different blocks with the same angles operate for every case of the control qubits, putting them together does not produce networks. Instead, they need to be applied only once, such as the controlled gates shown in Fig.6(c). Hence, if the initial blocks of $2^k$ by $2^k$ (operating on $k$ qubits) include $l$ different quantum gates (the types of the gates are the same, but each requires different angles in different blocks), these blocks together produce $l$ gray coded networks controlled by $2n-k$ qubits.

In addition, in the combination step, we use binary coded networks on each main qubit, excluding the last $k$ qubits, to produce $N$ by $N$ blocks. Thus, we will also have $n-k$ gray coded networks for the combination step, for which the number of control qubits goes down by one from one combination step to another (or from one gray-coded network to another). The classical and the quantum complexity will be determined mostly by the decompositions of these networks.

IV.2.1 Classical Complexity

As mentioned above, in the formation step, the combination of decomposed block circuits together form gray coded networks for different gate as represented for two-qubit blocks in Fig.6. Hence, to find the decompositions of these networks as in Fig.(b)b by the formula given in Eq.(15), number of matrix-vector multiplications are needed: The dimensions of the matrices are and the dimensions of the vectors are . Using the fast Hadamard transform, the complexity for this part is found to be instead of by the naive matrix vector multiplication.

Furthermore, the combination step is the summation of the computations done for finding the angles of gray coded networks (remember that the number of control qubits decreases by one from one network to another). This is equal to by the naive matrix vector multiplication. By the fast Hadamard transform, the complexity of the combination step is as follows:


Thus, while the total complexity by the naive multiplication is


by the fast Hadamard transform, it is:


IV.2.2 The Quantum Complexity

In terms of the quantum complexity, the analysis follows the same structure: as mentioned, different gates in the blocks on qubits create networks controlled by qubits. The decomposition of these networks requires CNOT and the same number of single gates.

Since combinations ( network) are necessary, the complexity of the combination step is the summation of terms: .

Then the total CNOT complexity reads as:


where represents the common gates in each block that needs to be run only once.

Example: As an example, the complexity of a general 4 by 4 block circuit can be found as follows: By using the Schmidt decompositionKaye (), any 1 by 4 unit vector can be decomposed as: Since and composed of and vectors are 2 by 2 unitary matrices, these matrices (with the elements ( and for , and and for ) and the coefficients satisfying can be considered as the rotation gates. For the coefficients, and are the cosine and the sine values of a rotation gate ( and ). The resulting decomposition becomes equal to the following:


which requires three rotation gates in general. The circuit given in Fig.7 forms any such unit vector as the leading row of its 4 by 4 matrix.
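The Schmidt decomposition used in this example can be reproduced numerically with a singular value decomposition. The NumPy sketch below is an illustration under assumptions: the 4-vector is reshaped row-major into a 2 by 2 matrix, the function names are hypothetical, and the two orthogonal factors returned by the SVD may be rotations or reflections, so mapping them onto specific rotation gates is left open.

```python
import numpy as np

def schmidt_factors(x):
    """Return orthogonal A, B and an angle theta for a real unit 4-vector x."""
    x = np.asarray(x, dtype=float)
    A, s, Bt = np.linalg.svd(x.reshape(2, 2))
    # For a unit vector s[0]^2 + s[1]^2 = 1, so the two Schmidt coefficients
    # are the cosine and sine of a single angle.
    theta = np.arctan2(s[1], s[0])
    return A, Bt.T, theta

def rebuild(A, B, theta):
    """(A kron B) applied to cos(theta)|00> + sin(theta)|11>."""
    core = np.array([np.cos(theta), 0.0, 0.0, np.sin(theta)])
    return np.kron(A, B) @ core
```

The rebuilt vector appears as the first column of the corresponding 4 by 4 operator; taking the transpose places it in the leading row, which is the form used for the blocks above.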

Figure 7: Quantum circuit which is found by following the Schmidt decomposition and can generate any vector of dimension 4 as the first row of its matrix representation.

Therefore, taking this circuit to implement the blocks in Fig.4 gives , and ; hence the CNOT complexity of the whole circuit in Fig.4 reads as Also note that if the blocks in the circuit shown in Fig.4 were of dimension 2 by 2, then the complexity would be .

IV.3 Comparison with the Non-Programmable Circuit Designs

The reported non-general circuit decompositions have the CNOT complexities ranging from to the most efficient one . The proven lower bound for the CNOT complexity is without using any auxiliary qubits vivek (). Even though the circuit designs given in this paper are general and fixed size for any operator, their complexities are greater by roughly a factor of 2 compared to those nonprogrammable circuits. In addition, if we can make less than or equal to , then we can also go below the lower bound. This is likely to happen because the common quantum gates in the blocks (as two CNOTs in 4 by 4 blocks) do not affect the upper bound of the complexity. Hence, by benefiting from this property, the lower bound complexity may be reduced with the use of higher Hilbert spaces.

V Discussion and Conclusion

V.1 Programmable Quantum Chips

The circuit designs given here are independent of the type of operator; hence they can be used to design general purpose quantum processors and quantum chips in which the angles are set by a preprocessing unit. They can also help in the design of possible quantum gate arrays Nielsen (). In addition, because the circuit designs are highly dependent on the matrix elements, for application specific circuits aimed at implementing particular types of systems, any level of sparsity in the system may reduce the number of gates significantly in the general design; hence, more efficient quantum chips can be built for particular uses. For instance, if half of the elements in each row of the given matrix are zero, then, considering the first approach, the blocks at the end of the combination steps can be made to have the dimension $N/2$ by $N/2$. Hence, the circuit will require fewer combination steps (the number of qubits in the ancilla is reduced by one), which lowers both the classical and the CNOT complexities and makes any possible fabrication easier.

V.2 Finding Angles

In the case of finding the angle values on classical computers for a given unitary operator, the process can be parallelized conveniently to find the angles. For instance, the distribution of each row to the different cores may be one way of parallelization of the method. This can be further improved and designed in terms of more small blocks. And so the computation time to generate angles for both circuits can be very fast.

The combination procedure described for both circuit design processes can be further improved to combine circuits for different unitary operations by considering them as initial blocks. One of the individual blocks used to generate a row of the given matrix can also be used as the state preparation circuit (for instance Fig.7) for an arbitrary circuit. Furthermore, the circuits generated by the first approach closely resemble the qubus quantum computer Katherine (). Similar ideas can be used to implement circuit design techniques for this type of quantum computer as well.

V.3 Complex Cases

It is important to note that in this paper, even though real matrices are considered, it is straightforward to implement any complex case as well by considering each rotation gate as also being able to produce any complex element of a unitary matrix in the first circuit design. This may require more than one simple rotation gate, but it shall not increase the upper bound of the quantum complexity. However, the modification for the second circuit may not be as simple as for the first one: this may require additional gates during the combination and formation steps.

V.4 Simulation of Molecular Hamiltonians

The exponential growth of computational cost with the number of atoms is a huge computational challenge for exact quantum chemistry calculations. Even for a simple molecule like methanol, using only the 6-31G** basis for the valence electrons, there are 50 orbitals. The 18 valence electrons can be distributed in these orbitals in any way that satisfies the Pauli exclusion principle. This leads to an enormous number of possible configurations, making an exact or Full Configuration Interaction (FCI) calculation almost impossible on classical computers Daskin2 (). However, it has been shown that a quantum computer can be used to estimate the ground and excited state energies of molecules efficiently Daskin2 (); Abrams (); Alan-Science (); Veis (); Hefeng (); Alan2 (); Papageorgiou (); Lanyon2 (); Kassal (); Veis2 (); Kassal2 (). For the simulation of a quantum system, it is necessary to find a quantum circuit equivalent to the unitary propagator of the Hamiltonian representing that system. The molecular electronic Hamiltonian, in the Born-Oppenheimer approximation, is described in the second quantization form as Daskin2 (); Lanyon (); Alan2 ():

$$H = \sum_{pq} h_{pq}\, a_p^\dagger a_q + \frac{1}{2} \sum_{pqrs} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s, \tag{22}$$

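where the matrix elements $h_{pq}$ and $h_{pqrs}$ are the set of one- and two-electron integrals, and $a_j$ and $a_j^\dagger$ are the spinless fermionic annihilation and creation operators. Let the set of single-particle spatial functions constitute the molecular orbitals, and let the set of spin orbitals $\chi_p$ be defined as products of a spatial function and a spin function over the set of space-spin coordinates $\mathbf{x} = (\mathbf{r}, \omega)$. The one-electron integral is defined as Daskin2 ():

$$h_{pq} = \int d\mathbf{x}\, \chi_p^*(\mathbf{x}) \left( -\frac{1}{2}\nabla^2 - \sum_{\alpha} \frac{Z_\alpha}{r_{\alpha \mathbf{x}}} \right) \chi_q(\mathbf{x}), \tag{23}$$

and the two electron integral is:

$$h_{pqrs} = \int d\mathbf{x}_1\, d\mathbf{x}_2\, \frac{\chi_p^*(\mathbf{x}_1)\, \chi_q^*(\mathbf{x}_2)\, \chi_r(\mathbf{x}_2)\, \chi_s(\mathbf{x}_1)}{r_{12}}, \tag{24}$$

where $r_{\alpha \mathbf{x}}$ is the distance between the $\alpha$th nucleus and the electron, $r_{12}$ is the distance between electrons, $\nabla^2$ is the Laplacian of the electron spatial coordinates, and $\chi_p(\mathbf{x})$ is a selected single-particle basis.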

To describe the hydrogen molecule in minimal basis which is the minimum number of spatial functions required to describe the system, one spatial function is needed per atom denoted and . The molecular spatial-orbitals are defined by symmetry: and ; which correspond to four spinorbitals: and . The STO-3G basis is used to evaluate the spatial integrals of the Hamiltonian which is defined as , where since , and are simplified asDaskin2 (); Lanyon (); Alan2 ():




With the spatial integral values evaluated for the chosen atomic distance, the Hamiltonian matrix is found as a 16 by 16 matrix Daskin2 (), so 4 qubits are required to implement the unitary propagator of this Hamiltonian, which is found from the matrix exponential of the Hamiltonian. (For the matrix exponentiation, the MATLAB expm function, which uses the Padé approximation with scaling and squaring, is used Daskin2 ().)

The accuracy of the circuit design for the unitary propagator also determines the accuracy of the simulation. The generation of quantum circuits by using matrix decomposition techniques or global optimization methods Daskin () (as done for water and hydrogen molecules in ref. Daskin2 ()) requires searching a huge complex space and simulating the unitary matrices of quantum systems on classical computers. For large matrices, this hinders the efficiency, and hence the accuracy, of the circuits. The angles for the rotation gates in our circuits are determined from the matrix elements directly (for instance, in the first design, Fig.3, we only take the arccosine of the values), so generating these angles requires only a few computations, and the accuracy and the efficiency of the circuits are always high. This helps to get very accurate circuit designs for the simulation of quantum systems. For instance, for the 16 by 16 unitary propagator of the hydrogen molecule given in ref. Daskin2 (), nine qubits are required in the circuit scheme given in Fig.3. Since the unitary propagator is highly sparse and has only 19 nonzero elements, all but 19 of the uniformly controlled gates in the circuit will be the identity. Hence, in Appendix B we show how to reduce the number of qubits to 6 (Fig.8). We give the rotation values for the gates in Table 1. Moreover, since our circuits have fixed designs, using different basis sets or parameters to compute the Hamiltonian will not change the circuit design or its accuracy.

In summary, we present general programmable quantum circuits which can simulate any given real matrix. Because of the structure of the circuits, they can be used to fabricate specific or general purpose quantum chips and processors. Since the circuit designs are highly dependent on the matrix elements, for application specific circuits aimed at implementing particular types of systems, any level of sparsity in the system may reduce the number of gates significantly. In addition, we show that generating circuits with a complexity below the lower bound may be possible for suitable choices of the parameters in the given complexity.

VI Acknowledgments

This work is supported by the NSF Centers for Chemical Innovation: Quantum Information for Quantum Chemistry, CHE-1037992.

Appendix A The explicit illustration of the steps

Here, we detail the implementation of the input modification, the formation, and the combination steps. A sketch of the matrix format of the operations can be found in Eq.(29) (for the one-qubit case in the first circuit design) and in Eq.(34) and Eq.(35) (for the two-qubit case in the second circuit design); here blanks denote zeros and dots denote matrix parts of no interest for the final operation.

A.1 First circuit design

Starting with an arbitrary input, , and the following arbitrary unitary matrix:


the first method requires qubits for the simulation with the input:


The following represent the formation matrix, the matrix after the combination step, and the modified input:


For illustration purposes, below we also present full forms of some of the operators and the output vector for the same case:

The full form of the resulting matrix from the formation step is as follows:


The combination matrix and the matrix for input modification are defined as:


For the initial input as in Eq.(28), the final output state becomes:


Clearly the normalized states and simulate the original given system.

A.2 Second circuit design

For the same case, since the second circuit design initially works at least a pair of matrix elements, it will create the unitary at the initial step. There will be no need for the combination step. Hence, the output will be simulated on the states and . For two qubit system below, the simulation goes as follows:


In the formation step, if we use 4 by 4 blocks as shown in Fig.4, there will be no need for the combination step since we will have already formed the rows of at the formation step. However, if we use 2 by 2 initial blocks, we need to use one rotation gate for each pair of the elements, then the combination step. Thus, at the end of the formation step, we get the following matrix:


where s are the normalization constants. After the sequential combination steps and the modification on the input, we get the following matrix and the modified input:


The final state is equivalent to . In , the states and are the respective states that simulate the original given unitary matrix.

Appendix B Explicit Circuit for the unitary propagator of the Hydrogen Molecule

As mentioned, the unitary matrix, , for the hydrogen molecule has 19 nonzero elements, 15 of them located at the diagonal. Since the unitary is 16 by 16 we need 4 main and 5 ancilla qubits for the first circuit design given in Fig.3. And the uniformly controlled rotation gates in the formation steps are the gates followed by gates where we use identity for the zero elements. However, we can benefit from the sparsity of the matrix and reduce the number of ancilla to 2 qubits instead of 5: The non diagonal matrix elements are located at and , where are the row and column indices. We apply a permutation matrix, , to reduce the bandwidth of the matrix.