# Variational Quantum Singular Value Decomposition

## Abstract

Singular value decomposition is central to many problems in engineering and scientific fields. Several quantum algorithms have been proposed to determine the singular values and their associated singular vectors of a given matrix. Although these algorithms are promising, the required quantum subroutines and resources are too costly on near-term quantum devices. In this work, we propose a variational quantum algorithm for singular value decomposition (VQSVD). By exploiting the variational principles for singular values and the Ky Fan Theorem, we design a novel loss function such that two quantum neural networks (or parameterized quantum circuits) could be trained to learn the singular vectors and output the corresponding singular values. Furthermore, we conduct numerical simulations of VQSVD for random matrices as well as its applications in image compression of handwritten digits. Finally, discuss the applications of our algorithm in recommendation systems and polar decomposition. Our work explores new avenues for quantum information processing beyond the conventional protocols that only works for Hermitian data, and reveals the capability of matrix decomposition on near-term quantum devices.

[skipabove=7pt, skipbelow=7pt, backgroundcolor=darkblue!15, innerleftmargin=5pt, innerrightmargin=5pt, innertopmargin=5pt, leftmargin=0cm, rightmargin=0cm, innerbottommargin=5pt, linewidth=1pt]tBox \newmdenv[skipabove=7pt, skipbelow=7pt, backgroundcolor=blue2!25, innerleftmargin=5pt, innerrightmargin=5pt, innertopmargin=5pt, leftmargin=0cm, rightmargin=0cm, innerbottommargin=5pt, linewidth=1pt]dBox \newmdenv[skipabove=7pt, skipbelow=7pt, backgroundcolor=darkkblue!15, innerleftmargin=5pt, innerrightmargin=5pt, innertopmargin=5pt, leftmargin=0cm, rightmargin=0cm, innerbottommargin=5pt, linewidth=1pt]sBox

## I Introduction

Matrix decompositions are integral parts of many algorithms in optimization Boyd and Vandenberghe (2004), machine learning Murphy (2012), and recommendation systems Koren et al. (2009). One crucial approach is the singular value decomposition (SVD). Mathematical applications of the SVD include computing the pseudoinverse, matrix approximation, and estimating the range and null space of a matrix. SVD has also been successfully applied to many areas of science and engineering industry, such as data compression, noise reduction, and image processing. The goal of SVD is to decompose a square matrix to with diagonal matrix and unitaries and , where denotes the rank of matrix .

Quantum computing is believed to deliver new technology to speed up computation, and it already promises speedups for integer factoring Shor (1997) and database search Grover (1996) in theory. Enormous efforts have been made in exploring the possibility of using quantum resources to speed up other important tasks, including linear system solvers Harrow et al. (2009); Clader et al. (2013); Childs et al. (2017); Wossnig et al. (2018); Subasi et al. (2018), convex optimizations Brandao and Svore (2017); Chakrabarti et al. (2020); Brandão et al. (2019); van Apeldoorn et al. (2020), and machine learning Biamonte et al. (2017); Schuld and Petruccione (2018); Ciliberto et al. (2018). Quantum algorithms for SVD have been proposed in Kerenidis and Prakash (2016); Rebentrost et al. (2018), which leads to applications in solving linear systems of equations Wossnig et al. (2018) and developing quantum recommendation systems Kerenidis and Prakash (2016). However, these algorithms above are too costly to be convincingly validated for near-term quantum devices, which only support a restricted number of physical qubits and limited gate fidelity.

Hence, an important direction is to find useful algorithms that could work on noisy intermediate-scale quantum (NISQ) devices Preskill (2018). The leading strategy to solve various problems using NISQ devices are called variational quantum algorithms McClean et al. (2016). These algorithms can be implemented on shallow-depth quantum circuits that depend on external parameters (e.g., angle in rotation gates ), which are also known as parameterized quantum circuits or quantum neural networks (QNNs). These parameters will be optimized externally by a classical computer with respect to certain loss functions. Various variational algorithms using QNNs have been proposed for Hamiltonian ground and excited states preparation Peruzzo et al. (2014); McClean et al. (2017); Higgott et al. (2019); Barkoutsos et al. (2020); Nakanishi et al. (2019a); Ostaszewski et al. (2019); Nakanishi et al. (2019b), quantum state fidelity estimation Cerezo et al. (2019), quantum Gibbs state preparation Wu and Hsieh (2019); Chowdhury et al. (2020); Wang et al. (2020), quantum compiling Khatri et al. (2019); Heya et al. (2018); Jones and Benjamin (2018); Sharma et al. (2020a) etc. Furthermore, unlike the strong need of error correction in fault-tolerant quantum computation, noise in shallow quantum circuits can be suppressed via error mitigation Temme et al. (2017); McArdle et al. (2019); Strikis et al. (2020); Kandala et al. (2019), indicating the feasibility of quantum computing with NISQ devices.

In this paper, we formulate the task of SVD as an optimization problem and derive a variational quantum algorithm for singular value decomposition (VQSVD) that can be implemented on near-term quantum computers. The core idea is to construct a novel loss function inspired by the variational principles and properties of singular values. We theoretically show that the optimized quantum neural networks based on this loss function could learn the singular vectors of a given matrix. That is, we could train two quantum neural networks and to learn the singular vectors of a matrix in the sense that , where the diagonal matrix provides us the singular values. Our approach generalizes the conventional methods of Hamiltonian diagonalization Larose et al. (2019); Nakanishi et al. (2019a) to a non-Hermitian regime, extending the capabilities of matrix decomposition on near-term quantum computers. As a proof of principle, we conduct numerical simulations to estimate the SVD of random matrices. Furthermore, we explore the possibility of applying VQSVD to compress images of size pixel, including the famous MNIST dataset. Finally, we showcase the applications of VQSVD in recommendation systems and polar decomposition.

## Ii Main results

### ii.1 Variational quantum singular value decomposition

In this section, we present a variational quantum algorithm for singular value decomposition of matrices, and it can be naturally generalized for complex matrices. For given matrix , there exists a decomposition of the form , where are unitary operators and is a diagonal matrix with positive entries and is the rank of . Alternatively, we write , where , , and are the sets of left and right orthonormal singular vectors, and singular values of , respectively.

A vital issue in NISQ algorithm is to choose a suitable loss function. A desirable loss function here should be able to output the target singular values and vectors after the optimization and in particular should be implementable on near-term devices. As a key step, we design such a desirable loss function for quantum singular value decomposition (cf. Section II.2).

The input of our VQSVD algorithm is a decomposition of the matrix into a linear combination of unitaries of the form with real numbers . For example, we could assume that can be decomposed into a linear combination of Pauli terms. After taking in the inputs, our VQSVD algorithm enters a hybrid quantum-classical optimization loop to train the parameters and in the parameterized quantum circuits and via a designed loss (cf. Section II.2). This loss function can be computed on quantum computers via the Hadamard tests. We then feeds the value of the loss function or its gradients (in gradient-based optimization) to a classical computer, which adjusts the parameters and for the next round of the loop. The goal is to find the global minimum of , i.e., .

In practice, one will need to set some termination condition (e.g., convergence tolerance) on the optimization loop. After the hybrid optimization, one will obtain values and optimal parameters and . The outputs approximate the singular values of , and approximate singular vectors of are obtained by inputting optimal parameters and into the parameterized circuits and in VQSVD and then applying to the orthonormal vectors for all . The detailed VQSVD algorithm is included in Algorithm 1. A schematic diagram is shown in Fig. 1.

### ii.2 Loss function

In this section, we provide more details and intuitions of the loss function in VQSVD. The key idea is to exploit the variational principles in matrix computation, which have great importance in analysis for error bounds of matrix analysis . In particular, the singular values satisfy a subtler variational property that incorporates both left and right singular vectors at the same time. For a given matrix , the largest singular value of can be characterized by

(1) |

where is the set of pure states (normalized vectors) and Re means taking the real part. Moreover, by denoting the optimal singular vectors as , the remaining singular values () can be deduced using similar methods by restricting the unit vectors to be orthogonal to previous singular value vectors.

For a given matrix , the largest singular value of can be characterized by

(2) |

where is the set of pure states (normalized vectors) and means to take the real part. Moreover, by denoting the optimal singular vectors as , the remaining singular values () can be deduced as follows

(3) |

For a given matrix , the loss function in our VQSVD algorithm is defined as

(5) |

where are real weights and is a set of orthonormal states.

###### Theorem 1

For a given matrix , the loss function is maximized if only if

(6) |

where are the largest singular values of and are orthonormal vectors. Moreover, and are left and right singular vectors, respectively.

###### Proof.

Let assume that are real numbers for simplicity, which could be achieved after the ideal maximization process.

Then we have

(7) | ||||

(8) | ||||

(9) | ||||

(10) |

Assume . The first inequality Eq. (8) follows due to the rearrangement inequality. The second inequality Eq. (10) follows due to property of singular values in Eq. (4). Note that the upper bound in Eq. (10) could be achieved if and only if for all , which is equivalent to

(11) |

Therefore, the loss function is maximized if and only if extracts the singular values for each from to . Further, due to the variational property of singular values in Eq. (2) and Eq. (3), we conclude that the quantum neural networks and learn the singular vectors in the sense that and extract the left and right singular vectors, respectively.

###### Proposition 2

The loss function can be estimated on near-term quantum devices.

For with unitaries and real numbers , the quantity can be decomposed to

(12) |

To estimate the quantity in Eq. (12), we could use quantum subroutines for estimating the quantity for a general unitary . One of these subroutines is to utilize the well-known Hadamard test Aharonov et al. (2009), which requires only one ancillary qubit, one copy of state , and one controlled unitary operation , and hence it can be experimentally implemented on near term quantum hardware. To be specific, Hadamard test (see Fig. 2) starts with state , where denotes the ancillary qubit, and denotes the work register, and then apply a controlled unitary , conditioned on the qubit in register , to prepare state , at last, apply a Hadamard gate on the ancillary qubit, and measure. If the measurement outcome is , then let the output be ; otherwise, let the output be , and the expectation of output is . As for the imaginary part , it also can be estimated via Hadamard test by starting with state .

### ii.3 As a generalization of the VQE family

In this subsection, we discuss the connection between our VQSVD algorithm and the variational quantum eigensolver (VQE) Peruzzo et al. (2014), which estimates the ground-state energy given a molecular electronic-structure Hamiltonian . This is a central problem to quantum chemistry as many properties of the molecule can be calculated once we determine the eigenstates of . Several related algorithms have been proposed to improve the spectrum learning capability of VQE (i.e., SSVQE Nakanishi et al. (2019b)) such that the properties of excited states can also be explored. We consider these algorithms as a unified family and promising applications for quantum chemistry.

For practical reasons, one has to discretize the aforementioned Hamiltonian into a series of Pauli tensor products to work on quantum devices. Many physical models naturally fit into this scheme including the quantum Ising model and the Heisenberg Model. In particular, if the input of VQSVD in Eq. (5) is a Hamiltonian in a discretized form (i.e., a linear combination of Pauli tensor products or equivalently a Hermitian matrix), VQSVD could be naturally applied to diagonalize this Hamiltonian and prepare the eigenstates. Therefore, VQSVD can be seen as a generalization of the VQE family that works not only for Hamiltonians, but also for more general non-Hermitian matrices.

## Iii Optimization of the loss function

Finding optimal parameters is a significant part of variational hybrid quantum-classical algorithms. Both gradient-based and gradient-free methods could be used to do the optimization. Here, we provide analytical details on the gradient-based approach, and we refer to Benedetti et al. (2019) for more information on the optimization subroutines in variational quantum algorithms. Reference about gradients estimation via quantum devices can be found in Ref. Mitarai et al. (2018); Schuld et al. (2019); Ostaszewski et al. (2019).

### iii.1 Gradients estimation

Here, we discuss the computation of the gradient of the global loss function by giving an analytical expression, and show that the gradients can be estimated by shifting parameters of the circuit used for evaluating the loss function. In Algorithm 1, to prepare states and , we apply gate sequences and in turn to state , where each gate and are either fixed, e.g., C-NOT gate, or parameterized, for all and . The parameterized gates and have forms and , respectively, where and are real parameters, and and are tensor products of Pauli matrices. Hence the gradient of loss function is dependent on parameters and the following proposition shows it can be computed on near-term quantum devices.

###### Proposition 3

The gradient of loss function can be estimated on near-term quantum devices and its explicitly form is defined as follows,

(13) |

Particularly, the derivatives of with respect to and can be computed using following formulas, respectively,

(14) | ||||

(15) |

where notations and denote parameters and .

###### Proof.

Notice that the partial derivatives of loss function are given by Eqs. (14) (15), and hence the gradient is computed by shifting the parameters of circuits that are used to evaluate the loss function. Since the loss function can be estimated on near term quantum devices, claimed in Proposition 2, thus, the gradient can be calculated on near-term quantum devices.

### iii.2 Barren plateaus

It has been shown that when employing hardware-efficient ansatzes (applying 2-qubit gates only to adjacent qubits) Kandala et al. (2017), the global cost functions like the expectation value of an observable are untrainable for large problem sizes since they exhibit exponentially vanishing gradients which makes the optimization landscape flat (i.e., barren plateaus McClean et al. (2018)). Such barren plateaus happens even when the ansatz is short depth Cerezo et al. (2020a) and the work Sharma et al. (2020b) showed that the barren plateau phenomenon could also arise in the architecture of dissipative quantum neural networks Poland et al. (2020); Bondarenko and Feldmann (2020); Beer et al. (2020).

Such barren plateau issues can be avoided by either implementing identity-block initialization strategy or employing the technique of local cost Cerezo et al. (2020a), where the local cost function is defined such that one firstly construct a local observable , calculating the expectation value with respect to each individual qubit rather than gathering information in a global sense, and finally adding up all the local contributions. The latter strategy has been verified to extend the trainable circuit depth up to where denotes qubit number. Recently, a new method of constructing adaptive Hamiltonians during the optimization loop has been proposed to resolve similar issues Cerezo et al. (2020b). Moreover, one may also employ gradient-free optimization methods Rio (2013) such as genetic algorithms, simulated annealing, and Nelder–Mead method Nelder and Mead (1965) to address the barren plateau challenges.

## Iv Numerical experiments and applications

Here we numerically simulate the VQSVD algorithm with randomly generated non-Hermitian real matrices as a proof of concept. Then, to demonstrate the possibility of scaling VQSVD to larger and more exciting applications, we simulate VQSVD to compress some standard gray-scale images. Fig. 3 shows the variational ansatz used. We choose the input states to be and the circuit depth is set to be . The parameters are initialized randomly from an uniform distribution . All simulations and optimization loop are implemented via Paddle Quantum Pad (2020) on the PaddlePaddle Deep Learning Platform Pad (); Ma et al. (2019).

### iv.1 Three-Qubit example

The VQSVD algorithm described in Section II.1 can find largest singular values of matrix at once. Here, we choose the weight to be positive integers . Fig. 4 shows the learning process. One can see that this approach successfully find the desired singular values and one can reconstruct the given matrix with inferred singular values and vectors . The distance between and the original matrix is taken to be the matrix 2-norm where are the matrix elements. As illustrated in Fig. 4, the distance decreases as more and more singular values being used.

### iv.2 Image compression

Next, we apply the VQSVD algorithm to compress a pixel handwritten digit image taken from the MNIST dataset with only (choose rank to be ) of its original information. By comparing with the classical SVD method, one can see that the digit is successfully reconstructed with some background noise. Notice the circuit structure demonstrated in Fig. 3 is ordinary and it is a not well-studied topic for circuit architecture. Although we don’t know how to load such images into quantum data at this stage, this simulation shows the scaling potential for the VQSVD algorithm. Future studies are needed to efficiently load classical information into NISQ devices.

## V Solution quality estimation

After obtaining the results (singular values and vectors) from Algorithm 1, it is natural and desirable to have a method to benchmark or verify these results. In this section, we further introduce a procedure for verification purposes. Particularly, we propose a variational quantum algorithm for estimating the sum of largest squared singular values, i.e., , of a matrix as a subroutine. In the following, we first define the error of the inferred singular values and singular vectors, and then show its estimation via a tool provided in Algorithm 3.

Let denote the inferred singular values of the matrix that are arranged in descending order, and let () denote the associated inferred left (right) singular vectors. The error of inferred singular values is defined as follows,

(16) |

where s are the exact singular values of matrix and also arranged in descending order. And the error of inferred singular vectors is defined below,

(17) |

where is a Hermitian of the form , and . It is worth pointing out that when inferred vectors approximate the eigenvectors of , i.e., , inferred singular vectors and approximate the singular vectors and respectively, and vice versa.

## Vi Applications

In this section, we give a brief discussion about the application of VQSVD in recommendation systems, and some other problems such as polar decomposition.

#### Recommendation systems

The goal of a recommendation system is to provide personalized items to customers based on their ratings of items, or other information. Usually, the preferences of customers are modeled by an matrix , where we assume there are customers and items. The element indicates the level that customer rates the item . In particular, finding suitable recommendations means to locate the large entries of rows.

One commonly used method for locating the large entries of rows is matrix reconstruction. Given an arbitrary matrix and integer , the goal is to output a small -rank matrix as an approximate matrix. In recommendation system, such an low-rank approximate matrix of the preference matrix is usually used to recommend items. In this sense, our VQSVD algorithm finds its application in recommendation system by learning such a small rank matrix. To be more specific, the information about its singular values and vectors can be obtained.

After obtaining the low rank matrix , the process of recommendation for customer proceeds as follows: project the th row vector of onto the space spanned by the right singular vectors of . Specifically, denote the right singular vectors of by , and projection operator by , then apply to the th normalized row vectors of . Specially, let denote the normalized vector after projection, given below:

(20) |

where the right singular vectors form an orthogonal basis, are the corresponding coefficients, and denotes the normalization factor, i.e., .

The state can be efficiently prepared via paramerized circuit as long as we can prepare the state . Particularly, there are coefficients in Eq. (20), determining . Further, we present a procedure for estimating these coefficients in Algorithm 2 (shown below).

Finally, measure in the computational basis and output the outcome.

#### Polar decomposition and other applications

Another important application of our VQSVD algorithm is finding the polar decomposition which has many applications in linear algebra Higham (1986); Gower et al. (2004). Recently, Lloyd et al Lloyd et al. (2020) proposed a quantum polar decomposition algorithm that performs in a deterministic way. For a given a complex matrix , the right polar decomposition is defined as follows,

(21) |

where is a unitary and is a positive-semidefinite Hermitian matrix. Particularly, suppose the singular value decomposition of is calculated through the VQSVD algorithm , then and . It is interesting to explore whether our VQSVD could be applied to polar decomposition.

For Hermitian matrices, the VQSVD algorithm can be applied as an eigensolver since singular value decomposition reduces to spectral decomposition in this case. Recently, some work has been proposed to extract eigenvalues and reconstruct eigenvectors of Hermitian matrices Nakanishi et al. (2019b), density operators LaRose et al. (2019). Our VQSVD algorithm may also be applied to Schmidt decomposition of bipartite quantum states since SVD is the core subroutine. We also note that a recent work Bravo-Prieto et al. Bravo-Prieto et al. (2019a) introduces a quantum singular value decomposer in the sense that it could output Schmidt coefficients and associated orthonormal vectors of a bipartite pure state.

## Vii Discussion and outlook

To summarize, we have presented a variational quantum algorithm for singular value decomposition with NISQ devices. One key contribution is to design a loss function that could be used to train the quantum neural networks to learn the left and right singular vectors and output the target singular values. Further improvements on the performance of our VQSVD algorithm may be done for sparse matrices together with more sophisticated ansatzes. We have numerically verified our algorithm for singular value decomposition of random matrices and image compression and proposed extensive applications in solving linear systems of equations. As a generalization of the family of VQE algorithms on Hamiltonian diagonalization (or spectral decomposition), the VQSVD algorithm may have potential applications in quantum chemistry, quantum machine learning. and quantum optimization in the NISQ era.

One future direction is to develop near-term quantum algorithms for non-negative matrix factorization Koren et al. (2009) which have various applications and broad interests in machine learning. See Du et al. (2018) as an example of the quantum solution. Another interesting direction is to develop near-term quantum algorithms for higher-order singular value decomposition Kolda and Bader (2009).

## Acknowledgements

We would like to thank Runyao Duan for helpful discussions. This work was done when Z. S. was a research intern at Baidu Research. Y. W. acknowledged support from the Baidu-UTS AI Meets Quantum project and the Australian Research Council (Grant No: DP180100691).

## Appendix A Proof details for Proposition 3

In this section, we show a full derivation on the gradients of the loss function in our VQSVD algorithm.

(22) |

The real part can be estimated via Hadamard test. Equivalently saying that,

(23) |

Consider the parametrized quantum circuit and . For convenience, denote . We can write the cost function as:

(24) |

Absorb most gates into state and matrix ,

(25) |

where , and . We assume is generated by a Pauli product and same for . The derivative with respect to a certain angle is

(26) | ||||

(27) |

Thus the gradient is calculated to be

(28) |

With the following property, we can absorb the Pauli product by an rotation on

(29) |

Plug it back and we get,

(30) |

This can be further simplified as

(31) |

Similarly, for we have

(32) |

This can be further simplified as

(33) |

One can see from the above derivation that calculating the analytical gradient of VQSVD algorithm simply means rotating a specific gate parameter (angle or ) by which can be easily implemented on near-term quantum devices.

## Appendix B Supplemental material for verification of the solution quality

In this section, we provide the necessary proofs in Sec. V and detailed discussions on variational quantum Frobenius norm estimation algorithm.

### b.1 Definitions

Recall that the error of inferred singular values is defined as follows,

(34) |

where are the exact singular values of matrix and also arranged in descending order. And the error of inferred singular vectors is defined below,

(35) |

where is a Hermitian of the form , and . The quantity quantifies the component of that is perpendicular to , which follows from .

It is worth pointing out that when inferred vectors approximate the eigenvectors , where , of , i.e., , inferred singular vectors and approximate the singular vectors and respectively, and vice versa. On the other hand, the error which is used to quantify the extent that vectors approximate eigenvector can quantify the extent that inferred vectors and approximate the singular vectors. Specifically, these distances have an equal relation, which is depicted in the following equation.

(36) |

where denotes the distance between vectors.

Here, we give the explicit forms of the distances . (36). The distances between and are defined in the following form,

(37) | ||||

(38) |

And the distances between and are defined below,

(39) | ||||

(40) |

Notice that both of the Right-Hand-Sides of Eqs. (38), (40) are equivalent to , and then the relation in Eq. (36) follows.

### b.2 Error analysis

After running VQSVD, it would be ideal if we can verify the quality of the outputs from VQSVD. For achieving this purpose, we show that these error and are upper bounded and give the explicit form of upper bounds. We present the derivation for upper bounds on errors in the following lemma.