Generalized Coherent States, Reproducing Kernels, and Quantum Support Vector Machines

Abstract

The support vector machine (SVM) is a popular machine learning classification method which produces a nonlinear decision boundary in a feature space by constructing linear boundaries in a transformed Hilbert space. It is well known that these algorithms, when executed on a classical computer, do not scale well with the size of the feature space, both in terms of the number of data points and their dimensionality. One of the most significant limitations of classical algorithms using non-linear kernels is that the kernel function has to be evaluated for all pairs of input feature vectors, which themselves may be of substantially high dimension. This can lead to excessive computation times during training and during the prediction process for a new data point. Here, we propose using both canonical and generalized coherent states to rapidly calculate specific nonlinear kernel functions. The key link will be the reproducing kernel Hilbert space (RKHS) property for SVMs that naturally arises from canonical and generalized coherent states. Specifically, we discuss the fast evaluation of radial kernels through a positive operator valued measure (POVM) on a quantum optical system based on canonical coherent states. A similar procedure may also lead to fast calculations of kernels not usually used in classical algorithms, such as those arising from generalized coherent states.

Rupak.Chatterjee@stevens.edu

Ting.Yu@stevens.edu

1 Introduction

Machine learning is an area of mathematical statistics that uses computer driven statistical learning techniques to find patterns in known empirical data with the intent of applying these learned patterns to new data. When one analyzes input data with the goal of predicting or estimating a specific output, this is called supervised learning. An example would be using a financial firm's accounting reports to determine its credit rating. A machine learning algorithm would be trained on the historical financial data of those firms with known credit ratings and then be used either to predict a new rating for a firm as new data arrives or to predict the rating of a firm with no past credit rating history. Typically, one has a set of features along with many observations of these feature vectors, together with a response, called the supervising output, which is measured on the same observations. When a new set of observations is given, one would like to predict or estimate the specific output. A support vector machine (SVM) [1] is a supervised learning method that has been applied successfully to a very wide range of binary classification problems (where the supervised output is binary), such as text classification (noun or verb), medical risk (heart disease, no heart disease), homeland security (potential risk, not a risk), etc.

All machine learning models must necessarily deal with big data issues such as the storage and fast retrieval of large amounts (~ petabytes) of data, which becomes increasingly difficult as the dimension of the feature space and the number of training observations grow. Furthermore, many popular machine learning techniques use a multi-layered or non-linear approach that leads to highly complex calculations and excessive runtimes. It will be shown that quantum support vector machines based on coherent states may begin to address these issues. The tensor product of coherent states allows an efficient representation of high dimensional feature spaces. Quantum state overlap measurements allow for the calculation of various non-linear SVM kernel functions, indicating a substantial runtime improvement over classical algorithms.

A quantum version of a support vector classifier was given in [3] where the authors provided a qubit representation of feature space that was adaptable for simple polynomial type kernels. Yet, popular nonlinear kernels such as those of the exponential or hyperbolic tangent types are not easily amenable to their qubit representation. Here, we propose using both canonical and generalized coherent states to rapidly calculate these nonlinear kernel functions on high dimensional feature spaces. A recent review of quantum machine learning techniques may be found in [4].

2 Support Vector Machines and Kernel Methods

A short review of SVMs and a recent quantum version of a linear SVM are described below. For more details, see [5].

Consider $N$ observations of a $d$-dimensional feature vector $\mathbf{x}$. Suppose that these observations live in a $d$-dimensional vector space and are separable into two classes by a $(d-1)$-dimensional hyperplane. For a 2-dimensional feature space, this hyperplane is simply a line, such as the solid black line in figure 1, whereas in 3 dimensions this hyperplane will be a flat 2-dimensional plane.

Figure 1: Optimal Margin Classifier

Consider a $d$-dimensional weight vector $\mathbf{w}$ and a bias parameter $b$. A $(d-1)$-dimensional hyperplane is defined by the equation

$\mathbf{w} \cdot \mathbf{x} + b = 0,$

where $\mathbf{w}$ is the normal vector to the hyperplane. Since our observations were assumed to be separable into two classes, each observation $\mathbf{x}_i$ satisfies either

$\mathbf{w} \cdot \mathbf{x}_i + b > 0$

or

$\mathbf{w} \cdot \mathbf{x}_i + b < 0.$

Defining the supervised output as

$y_i = \pm 1,$

the two classes can be combined as

$y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) > 0.$

The goal here is to find the optimal margin hyperplane (the best choice of $\mathbf{w}$ and $b$) based on the training observations such that when a new test data point needs to be analyzed, it will be correctly classified. This optimal margin hyperplane will then act as a decision boundary for all new data points.

The perpendicular distance from the separating hyperplane to a particular training point $\mathbf{x}_i$ is given by

$d_i = \frac{y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b)}{\lVert \mathbf{w} \rVert}.$

Each training point will have a specific margin distance for a given set of hyperplane parameters $\mathbf{w}$ and $b$. The optimality problem is to find the hyperplane parameters that give the single biggest margin distance for all training points simultaneously. In figure 1, the margin distance is the distance between the solid black line, the decision boundary, and the other two parallel lines that define the maximal margin region. It is often the case that this problem has no perfect solution. Therefore, rather than searching for a perfect decision boundary, one can look for a hyperplane that separates most of the training data, where a few feature vectors fall inside the margin region or on the wrong side of the decision boundary. These few observations are called support vectors and are assigned error terms (or slack variables) $\xi_i$ that allow them to violate the margin width. The sum of these error terms is bounded, i.e. $\sum_i \xi_i \le \text{constant}$.

As $\mathbf{w}$ is a normal vector to the hyperplane, the margin width is proportional to $1/\lVert \mathbf{w} \rVert$. Therefore, maximizing the margin is the same as minimizing the norm of $\mathbf{w}$,

$\min_{\mathbf{w}, b, \xi_i} \ \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \sum_i \xi_i \le \text{constant},$

which may be rewritten as

$\min_{\mathbf{w}, b, \xi_i} \ \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$

where $C$ controls the effect of the error terms coming from the support vectors. If $C$ is very high, very few errors will be accepted by the optimizer, and the limit $C \to \infty$ reduces to the completely separable case.

One may use a Lagrange multiplier method to solve this problem. The Lagrangian is given by

$L_P = \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - (1 - \xi_i) \right] - \sum_i \mu_i \xi_i,$

with the optimality conditions given by the following minimizations,

$\frac{\partial L_P}{\partial \mathbf{w}} = 0, \qquad \frac{\partial L_P}{\partial b} = 0, \qquad \frac{\partial L_P}{\partial \xi_i} = 0.$

The respective optimality conditions are

$\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \alpha_i = C - \mu_i.$

Note that the Lagrange multipliers must be positive, i.e. $\alpha_i, \mu_i \ge 0$. Furthermore, only those Lagrange multipliers that exactly satisfy

$y_i (\mathbf{w} \cdot \mathbf{x}_i + b) = 1 - \xi_i$

can have strictly nonzero values (the Karush-Kuhn-Tucker conditions, see [2]).

By substituting the solutions (11) into the Lagrangian (9), one obtains the dual Lagrangian

$L_D = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j.$

Maximizing the dual Lagrangian with constraints $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$ is often an easier problem than the minimization (8) given above. In the dual problem, the Lagrange multipliers $\alpha_i$ are solved for and provide the optimal solution via the relation $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$. The solution for the optimal hyperplane that solves the binary classification problem is

$f(\mathbf{x}) = \mathrm{sign}\Big( \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b \Big).$
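
As a concrete illustration of the dual solution, the following sketch (assuming Python with NumPy and scikit-learn, which are not part of this paper) fits a linear soft-margin SVM on toy data and reads off the support vectors and the combinations $\alpha_i y_i$ that determine the optimal hyperplane.

    import numpy as np
    from sklearn.svm import SVC

    # Toy two-class data: two Gaussian blobs in 2 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=-1.0, size=(20, 2)),
                   rng.normal(loc=+1.0, size=(20, 2))])
    y = np.array([-1] * 20 + [+1] * 20)

    # Linear kernel, soft margin controlled by C (large C -> few margin violations).
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # dual_coef_ holds alpha_i * y_i for the support vectors only;
    # the weight vector is w = sum_i alpha_i y_i x_i and b is the intercept.
    w = clf.dual_coef_ @ clf.support_vectors_
    b = clf.intercept_
    print("number of support vectors:", len(clf.support_vectors_))
    print("w =", w.ravel(), "b =", b)

New points are then classified by the sign of the decision function built from $\mathbf{w}$ and $b$.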

It is useful to introduce the concept of a kernel that can be used to link support vector classifiers to the SVM technique below. A kernel is a type of similarity measure between two observations, and in the simple linear case described here, it is given by

$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j.$

This polynomial type kernel may be seen as a square symmetric matrix with components given by (15). The dual Lagrangian and the optimal hyperplane may be written as

$L_D = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)$

and

$f(\mathbf{x}) = \mathrm{sign}\Big( \sum_i \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big).$

There are $O(N^2)$ dot products to calculate in (17), corresponding to a square symmetric kernel matrix, where each dot product takes $O(d)$ time to calculate. Finding the optimal Lagrange multipliers takes $O(N^3)$ time. The convergence to an optimality error of $\epsilon$ requires $O(\log(1/\epsilon))$ iterations, as shown in [6]. Therefore classically, the dual problem takes a computational time of roughly $O(\log(1/\epsilon)\, N^2 (N + d))$. This can be improved by a quantum approach given in [3], briefly described as follows.
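
To make the kernel-matrix cost concrete, here is a minimal sketch (Python with NumPy assumed; the sizes are illustrative only) of the Gram matrix computation, which requires of order $N^2 d$ scalar multiplications.

    import numpy as np

    N, d = 1000, 500          # number of training vectors and feature dimension
    X = np.random.randn(N, d)

    # Linear-kernel Gram matrix K_ij = x_i . x_j : N^2 entries,
    # each a d-dimensional dot product, i.e. ~N^2 * d scalar multiplications.
    K = X @ X.T
    print(K.shape)            # (N, N)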

Consider the following quantum representation of the observation feature vector using the base-2 bit string configuration introduced in [3], where

$|\mathbf{x}_i\rangle = \frac{1}{|\mathbf{x}_i|} \sum_{k=1}^{d} (\mathbf{x}_i)_k \, |k\rangle.$

Using quantum parallelism, consider the following superposition state of all the training data joined via a tensor product to an auxiliary Hilbert space of computational basis states $|i\rangle$, where

$|\chi\rangle = \frac{1}{\sqrt{N_\chi}} \sum_{i=1}^{N} |\mathbf{x}_i| \, |i\rangle \otimes |\mathbf{x}_i\rangle, \qquad N_\chi = \sum_{i=1}^{N} |\mathbf{x}_i|^2.$

The density operator associated with this state is given by

$\rho = |\chi\rangle\langle\chi|.$

Taking a partial trace over the training-data states $|\mathbf{x}_i\rangle$ gives

$\mathrm{tr}_{\mathbf{x}}(\rho) = \frac{1}{N_\chi} \sum_{i,j=1}^{N} |\mathbf{x}_i| |\mathbf{x}_j| \, \langle \mathbf{x}_j | \mathbf{x}_i \rangle \, |i\rangle\langle j|.$

From equation (18), it is clear that

$\langle \mathbf{x}_j | \mathbf{x}_i \rangle = \frac{\mathbf{x}_j \cdot \mathbf{x}_i}{|\mathbf{x}_i| |\mathbf{x}_j|},$

and therefore

$\mathrm{tr}_{\mathbf{x}}(\rho) = \frac{K}{\mathrm{tr}(K)},$

where

$K_{ij} = \mathbf{x}_i \cdot \mathbf{x}_j.$

According to [3], by using this method to calculate the kernel matrix and furthermore turning the optimization problem into a quantum matrix inversion problem, the runtime for their quantum support vector classifier method becomes logarithmic in both the number of training vectors and the dimension of the feature space, ~$O(\log(Nd))$.
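
For reference, the object produced by the partial trace above is proportional to the classical kernel matrix, and it can be previewed with a minimal classical sketch (Python with NumPy assumed; the data are illustrative only).

    import numpy as np

    # Toy training set of N feature vectors of dimension d.
    N, d = 4, 3
    X = np.random.randn(N, d)

    # Kernel (Gram) matrix of dot products.
    K = X @ X.T

    # Normalizing by the trace gives a positive semi-definite, unit-trace matrix,
    # i.e. a valid density operator on the N-dimensional index register,
    # which is the object obtained after the partial trace in [3].
    rho = K / np.trace(K)
    print(np.trace(rho))                                # 1.0
    print(np.all(np.linalg.eigvalsh(rho) >= -1e-12))    # positive semi-definite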

Consider the classification problem illustrated in figure 2. Visually, one sees a distinct boundary on the left side of the figure, yet it is clearly not linear. Neither of the previous two methods will find a satisfactory decision hyperplane. However, a non-linear decision boundary appears to be possible. Consider a set of square integrable functions $\phi_k(\mathbf{x})$ that map the finite dimensional feature space into an infinite dimensional space, $\Phi(\mathbf{x}) = (\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \dots)$. The equation in this new space, analogous to (1), is given by

$\mathbf{w} \cdot \Phi(\mathbf{x}) + b = 0,$

with an optimal classification hyperplane of

$f(\mathbf{x}) = \mathrm{sign}\Big( \sum_i \alpha_i y_i \, \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}) + b \Big)$

(the right hand side of figure 2). One would like to define an inner product kernel

$K(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{x}') = \sum_k \phi_k(\mathbf{x}) \, \phi_k(\mathbf{x}')$

such that (26) once again reduces to the form given in (17),

$f(\mathbf{x}) = \mathrm{sign}\Big( \sum_i \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big).$

Clearly, specifying the kernel is sufficient to find the optimal classification boundary. These methods are often referred to as kernel methods [5], since one does not need the explicit mapping itself: the kernel alone defines the solution in equation (28) ("the kernel trick" or, more formally, the Representer Theorem [2]).

The form of (27) is possible using Mercer's theorem from functional analysis [8], which comes down to a spectral decomposition of a continuous symmetric kernel using an eigenvalue-eigenfunction expansion (see section IV below),

$K(\mathbf{x}, \mathbf{x}') = \sum_i \lambda_i \, \psi_i(\mathbf{x}) \, \psi_i(\mathbf{x}'),$

where the special case $\lambda_i = 1$ has been used in (27). Some popular kernels in the SVM literature are

Polynomial of degree d: $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x} \cdot \mathbf{x}')^d$

Radial Kernels (Gaussian): $K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2)$

Radial Kernels (Ornstein-Uhlenbeck): $K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert)$

Sigmoidal (two-layer perceptron): $K(\mathbf{x}, \mathbf{x}') = \tanh(\kappa_1 \, \mathbf{x} \cdot \mathbf{x}' + \kappa_2)$
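
For concreteness, these four kernels may be written as simple functions (a Python/NumPy sketch; the hyperparameters gamma, kappa1, and kappa2 are illustrative defaults, not values taken from the text):

    import numpy as np

    def k_poly(x, y, d=3):
        # Polynomial kernel of degree d.
        return (1.0 + np.dot(x, y)) ** d

    def k_gauss(x, y, gamma=1.0):
        # Gaussian (radial) kernel.
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def k_ou(x, y, gamma=1.0):
        # Ornstein-Uhlenbeck (exponential) radial kernel.
        return np.exp(-gamma * np.linalg.norm(x - y))

    def k_sigmoid(x, y, kappa1=1.0, kappa2=-1.0):
        # Sigmoidal (two-layer perceptron) kernel.
        return np.tanh(kappa1 * np.dot(x, y) + kappa2)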

Figure 2: Support Vector Machine

By using these types of kernels, one is looking for a hyperplane in a higher dimensional feature space. This decision boundary hyperplane in the higher dimensional space, as in the right side of figure 2, results in a nonlinear decision boundary in the original feature space.

Analogous to the evaluation issues of (17), one of the most significant limitations of classical algorithms using non-linear kernels is that the kernel function has to be evaluated for all pairs of input feature vectors, which themselves may be of substantially high dimension. This can lead to computationally excessive times during training and during the prediction process for new data points. In fact, classical methods such as sparse kernel methods [5] have been developed to deal with this issue, but they are mostly heuristic methods used to increase runtime speeds. Rather, quantum methods analogous to the linear kernel methods described in (18)-(24) would be highly desirable and beneficial for using nonlinear SVMs in a fast and efficient manner. We now demonstrate how this is possible using coherent states.

3 Radial Kernels from Canonical Coherent States

A feature space representation using canonical coherent states may be achieved as follows. Consider the $d$-dimensional tensor product of canonical coherent states

$|\mathbf{x}\rangle = |x^1\rangle \otimes |x^2\rangle \otimes \cdots \otimes |x^d\rangle$

and their overlap

$\langle \mathbf{x} | \mathbf{x}' \rangle = \prod_{k=1}^{d} \langle x^k | x'^k \rangle = \exp\Big( -\tfrac{1}{2} \lVert \mathbf{x} \rVert^2 - \tfrac{1}{2} \lVert \mathbf{x}' \rVert^2 + \mathbf{x}^* \cdot \mathbf{x}' \Big).$

In a feature space with real components rescaled as $x^k \to \sqrt{2\gamma}\, x^k$, this becomes

$\langle \mathbf{x} | \mathbf{x}' \rangle = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2),$

which is the popular radial kernel form of SVMs.
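
A quick numerical check of this identity (Python with NumPy assumed; each rescaled feature component is encoded as the amplitude of one coherent state mode, following the construction above):

    import numpy as np

    def overlap(a, b):
        # Overlap <a|b> of two canonical coherent states with amplitudes a, b.
        return np.exp(-0.5 * np.abs(a) ** 2 - 0.5 * np.abs(b) ** 2 + np.conj(a) * b)

    gamma = 0.7
    x  = np.array([0.3, -1.2, 0.8])
    xp = np.array([1.0,  0.4, -0.5])

    # Tensor product of single-mode coherent states: the total overlap is the
    # product of the single-mode overlaps, one mode per (rescaled) feature component.
    total = np.prod([overlap(np.sqrt(2 * gamma) * u, np.sqrt(2 * gamma) * v)
                     for u, v in zip(x, xp)])

    # Radial (Gaussian) kernel value for comparison.
    radial = np.exp(-gamma * np.sum((x - xp) ** 2))
    print(np.isclose(total, radial))   # True (the overlap is real for real amplitudes)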

Figure 3: Coherent State Overlap Measurement

Several simple measurements [10] may produce this overlap function in computation times faster than those needed on a classical computer. Consider figure 3, which has been adapted from [10]. The input annihilation operator is associated with one of two possible coherent states. This unknown state enters the 50-50 beam splitter along with a vacuum field, whereupon the output states undergo a coherent displacement given by

This displacement operator acting on the vacuum state (indicated in figure 3) creates two coherent states whose mode operators are related to the incident fields via a Hadamard gate,

For the joint measurement performed by the two detectors, one has the following four outcomes with projections given by

and

along with the corresponding POVM,

Following [10], consider the following explicit representation, using normal ordering, of the coherent state projection operator,

One can show that this is given by

which simplifies to

where we have defined a new mode operator. Following in this manner, it can be verified that the POVM (44) may now be written as in [10],

and

Finally, the nonzero probabilities of measuring one of the two coherent states are given by

and

which provides the necessary evaluation of the radial kernel for our quantum SVM.

How has this method improved the classical runtime? Classically, there are $O(N^2)$ exponentials of pairs to evaluate, where each exponential of a dot product takes $O(d)$ time to calculate. Finding the optimal Lagrange multipliers takes $O(N^3)$ time, while the convergence to an optimality error of $\epsilon$ requires $O(\log(1/\epsilon))$ iterations [6], leading to a total classical computational time of $O(\log(1/\epsilon)\, N^2 (N + d))$. The POVM probabilities (52)-(54) converge after repeated measurements. For the case of large $d$, which is common in the big data arena, the calculation of the radial kernel via the POVM set-up becomes largely independent of $d$ (due to the tensor product Hilbert space representation of the $d$-dimensional feature vector in the quantum domain). The classical time has been effectively reduced to $O(\log(1/\epsilon)\, N^3)$. For data sets where $d \gg N$, this is a substantial improvement.
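
The statistical nature of the repeated measurements can be illustrated with a simple classical simulation (Python with NumPy assumed; for illustration the measurement probability is taken to be the kernel value itself, whereas the actual probabilities are those of (52)-(54)). Estimating a Bernoulli probability to additive precision ε requires of order 1/ε² shots.

    import numpy as np

    rng = np.random.default_rng(1)
    gamma = 0.5
    x  = np.array([0.2, 0.9])
    xp = np.array([-0.4, 0.5])

    # Target kernel value, playing the role of a two-outcome measurement probability.
    p = np.exp(-gamma * np.sum((x - xp) ** 2))

    # Simulate repeated measurements and estimate p from the observed frequency.
    for shots in (100, 10_000, 1_000_000):
        p_hat = rng.binomial(shots, p) / shots
        print(shots, abs(p_hat - p))   # error shrinks roughly like 1/sqrt(shots)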

The $O(N^2)$ time coming from the pairs of feature vectors may be further reduced by generalizing this methodology to multiple coherent states. The two state experiment above was generalized to four coherent states in [11], where the POVM to unambiguously identify one of the possible non-orthogonal states (unambiguous state discrimination or USD) is

Note that this formula is only an extension of the projections above, as neither of them alone provides a USD result. Therefore, the authors of [11] add an inconclusive USD measurement to complete the POVM set. Equation (55) has a natural extension for an arbitrary number of coherent states. From this generalization, we postulate that the pair calculation time may be reduced by the parallel measurement of several pairs of coherent states, but a further study will be needed to estimate the reduction in runtime.

Finally, the time coming from finding the optimal Lagrange multipliers may also be reduced as follows. In [12], a least squares method is introduced to solve the quadratic programming problem of SVMs. Rebentrost et al. [3] take this least squares method and propose an approximate quantum least squares method for their polynomial based kernel SVM using a quantum matrix inversion algorithm. Their method is largely independent of the type of kernel, and therefore the quantum radial kernel calculation described here may be combined with their method to substantially reduce the runtime of the classical quadratic programming problem ([3] reduce their polynomial kernel problem to a runtime logarithmic in both the number of training vectors and the feature dimension).

4 Reproducing Kernel Hilbert Spaces and Mercer's Theorem

As the structure of SVMs comes down to the kernel function, a more precise description of its structure will lead naturally to a relation with other types of coherent states rather than just the canonical ones used above. For more details, see [8].

Let $\mathcal{H}$ be a Hilbert space of real square integrable functions defined on a set $X$. A Mercer kernel is a continuous symmetric kernel $K(\mathbf{x}, \mathbf{y})$ on $X \times X$ with the following positive semi-definite property,

$\sum_{i,j=1}^{n} c_i \, c_j \, K(\mathbf{x}_i, \mathbf{x}_j) \ge 0$

for any finite set of points $\mathbf{x}_i \in X$ and real coefficients $c_i$.

Mercer’s Theorem:

For a Mercer kernel $K$, there exists a set of orthonormal functions $\psi_i(\mathbf{x})$,

$\int_X \psi_i(\mathbf{x}) \, \psi_j(\mathbf{x}) \, d\mathbf{x} = \delta_{ij},$

such that

$K(\mathbf{x}, \mathbf{y}) = \sum_i \lambda_i \, \psi_i(\mathbf{x}) \, \psi_i(\mathbf{y}), \qquad \lambda_i \ge 0.$

Reproducing Kernel Hilbert Space

(RKHS): Let $\mathcal{H}$ be a Hilbert space of real functions defined on a set $X$. $\mathcal{H}$ is called a reproducing kernel Hilbert space with an inner product $\langle \cdot, \cdot \rangle$ if there exists a kernel function $K(\mathbf{x}, \mathbf{y})$ with the following properties,

$K(\cdot, \mathbf{y}) \in \mathcal{H} \quad \text{for every fixed } \mathbf{y} \in X$

(if $\mathbf{y}$ is regarded as an index, then $K(\cdot, \mathbf{y})$ may be seen as a function of $\mathbf{x}$) and the reproducing property

$\langle f, K(\cdot, \mathbf{y}) \rangle = f(\mathbf{y}) \quad \text{for all } f \in \mathcal{H}.$

Property (59) can be used with Mercer type kernels to induce a RKHS. The complex versions of these kernels are often referred to as Bergman kernels [13], which produce RKHSs of square integrable holomorphic functions. All the standard kernels used in SVMs satisfy Mercer's theorem [2] and may be used to create RKHSs.
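
As a short worked example (assuming the Mercer notation above, with eigenvalues λ_i and orthonormal eigenfunctions ψ_i, and the standard RKHS inner product that weights expansion coefficients by 1/λ_i), the reproducing property follows directly from the eigenfunction expansion:

    f(x) = \sum_i f_i \, \psi_i(x), \qquad
    K(x, y) = \sum_i \lambda_i \, \psi_i(x) \, \psi_i(y), \qquad
    \langle f, g \rangle_{\mathcal{H}} = \sum_i \frac{f_i \, g_i}{\lambda_i}

    \Longrightarrow \quad
    \langle f, K(\cdot, y) \rangle_{\mathcal{H}}
      = \sum_i \frac{f_i \, (\lambda_i \psi_i(y))}{\lambda_i}
      = \sum_i f_i \, \psi_i(y)
      = f(y).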

It is also possible to create new kernels from existing ones. Suppose one has two kernels $K_1(\mathbf{x}, \mathbf{y})$ and $K_2(\mathbf{x}, \mathbf{y})$. One can add or multiply these kernels to produce a new kernel $K(\mathbf{x}, \mathbf{y})$, i.e.

$K(\mathbf{x}, \mathbf{y}) = K_1(\mathbf{x}, \mathbf{y}) + K_2(\mathbf{x}, \mathbf{y})$

or

$K(\mathbf{x}, \mathbf{y}) = K_1(\mathbf{x}, \mathbf{y}) \, K_2(\mathbf{x}, \mathbf{y}).$
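
A small numerical check (Python with NumPy assumed) that both constructions again yield positive semi-definite kernel matrices, as required of a Mercer kernel; note that the product of kernel functions corresponds to the elementwise (Schur) product of kernel matrices:

    import numpy as np

    def is_psd(K, tol=1e-10):
        # A symmetric matrix is positive semi-definite iff its eigenvalues are >= 0.
        return np.all(np.linalg.eigvalsh(K) >= -tol)

    X = np.random.randn(6, 3)
    K1 = X @ X.T                                                  # linear kernel matrix
    D  = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K2 = np.exp(-0.5 * D)                                         # Gaussian kernel matrix

    print(is_psd(K1 + K2))   # sum of kernels
    print(is_psd(K1 * K2))   # elementwise (Schur) product of kernels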

5 A View Towards Generalized Coherent States

In order for the quantum SVM method to go beyond radial kernels, one needs to analyze generalized coherent states and their relation to RKHSs. The generalization of canonical coherent states has historically proceeded along three (not necessarily equivalent) lines of thought which happen to result in equivalent definitions for canonical states. One generalization initiated by Barut and Girardello [14] follows the path of creating generalized coherent states as eigenstates of a specific operator from a Lie algebra. Another group theoretical generalization begun independently by Perelomov [15] and Gilmore [17] considers a generalized displacement operator acting on a vacuum state. Here, we wish to consider a third approach started by Klauder and Gazeau [18] based on the following definition.

Definition:

Coherent states are wave-packets formed as superpositions of the eigenstates of a self-adjoint operator, with coefficients given by square integrable functions, such that the states are normalized and admit a resolution of identity.

This generalization has a natural structure of an underlying RKHS. A resolution of identity requirement leads to the existence of a POVM. Let a self-adjoint operator with a discrete spectrum have normalized eigenstates that form an orthonormal basis in a separable complex Hilbert space,

Consider a measure space and the Hilbert space of all square-integrable functions on this set,

with orthonormal basis functions

Furthermore, assume that the eigenstates are in a one-to-one correspondence with these orthonormal basis functions. Generalized coherent states may be defined by [13]

with a normalization restriction of

The resolution of identity in the Hilbert space is

For any element of the Hilbert space, one can associate a square-integrable function as

Note that there is a natural isomorphism between the Hilbert space and the space of square-integrable functions because of the one-to-one correspondence of basis functions, explicitly given by the special case of (69),

Using the resolution of identity (68) in (69), one has

which may be written as

or

One may create a RKHS composed of elements given by (69) spanned by basis functions (70). By defining a kernel as

equation (73) becomes identical to the kernel reproducing property (60),

making it a RKHS. For standard canonical coherent states, the self-adjoint operator is the harmonic oscillator Hamiltonian, the eigenstates are the Fock states $|n\rangle$, the label set is the complex plane with a Gaussian-weighted measure, and the basis functions are proportional to $z^n/\sqrt{n!}$, with the reproducing kernel given by the coherent state overlap $\langle z | w \rangle$. By using the kernel property (62), one has another kernel given by $|\langle z | w \rangle|^2 = \exp(-|z - w|^2)$, the radial kernel of section 3.
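
The canonical case can be checked numerically. The sketch below (Python with NumPy assumed; the normalization conventions are the standard ones for canonical coherent states and are assumed rather than quoted from the text) builds the kernel as a truncated sum over basis functions and compares it with the closed-form overlap and with the radial kernel of section 3.

    import numpy as np
    from math import factorial

    def kernel_sum(z, w, nmax=60):
        # Truncated sum over the canonical basis functions h_n(z) ~ z^n / sqrt(n!),
        # including the Gaussian normalization factors of the coherent states.
        s = sum((np.conj(z) * w) ** n / factorial(n) for n in range(nmax))
        return np.exp(-0.5 * abs(z) ** 2 - 0.5 * abs(w) ** 2) * s

    def overlap(z, w):
        # Closed-form coherent state overlap <z|w>.
        return np.exp(-0.5 * abs(z) ** 2 - 0.5 * abs(w) ** 2 + np.conj(z) * w)

    z, w = 0.7 + 0.2j, -0.3 + 1.1j
    print(np.isclose(kernel_sum(z, w), overlap(z, w)))                      # True
    print(np.isclose(abs(overlap(z, w)) ** 2, np.exp(-abs(z - w) ** 2)))    # radial kernel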

A few examples of well known generalized coherent states that can be derived using the above methodology are as follows. In [21], the authors use a (1+1)-dimensional anti-de Sitter space, a constant negative curvature metric measure space, and produce a reproducing kernel given by

Generalized coherent states using a basis of Pöschl-Teller states [21] may be shown to produce a modified Bessel function kernel given by

Several authors [22] have investigated oscillating Hermite polynomial Gaussian type wave functions that lead to a reproducing kernel given by

where $L_n$ are Laguerre polynomials.

The nonlinear optical properties of confining Pöschl-Teller potentials have been studied in [25]. Experimental realizations of Laguerre type generalized coherent states using the behavior of a beamsplitter with a specific geometry are explored in [23]. They also appear in the realization of a quantum oscillator consisting of a trapped ion in [27]. The realization of generalized coherent states has also been investigated in photonic lattices [28].

For future work, each of these experiments needs to be analyzed in a manner similar to that of section III in order to fully assess the quantum runtime efficiency. These recent experimental realizations indicate that the quantum SVM approach to calculating reproducing kernels from generalized coherent states may provide a fast and viable alternative to classical computations.

6 Conclusion

In this paper, the potential of using generalized coherent states as a calculational tool for quantum SVMs was demonstrated. The key connecting thread was the RKHS concept used in SVMs. Such reproducing kernels naturally arise in the quantum state overlap of canonical and generalized coherent states. It was shown that canonical coherent states reproduce the popular radial kernels of SVMs, wherein POVM measurements of overlap functions substantially reduce the computational times of such kernels, especially for the high dimensional feature spaces found in big data sets. The use of reproducing kernels that are not common in classical algorithms due to their complexity, such as those arising from anti-de Sitter space coherent states and the Bessel function kernels from Pöschl-Teller coherent states, is now conceivable using the coherent state driven quantum SVM approach. The realization of generalized coherent states via experiments in quantum optics indicates the near term feasibility of this approach.

References

  1. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, 2009.
  2. S. Haykin, Neural Networks and Learning Machines: 3rd Edition, Prentice Hall, New Jersey, 2009.
  3. P. Rebentrost, M. Mohseni, and S. Lloyd, Physical Review Letters 113(13), 130503 (2014).
  4. J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum Machine Learning, arXiv:1611.09347 (2016).
  5. C. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
  6. N. List and H. U. Simon, SVM Optimization and Steepest-Descent Line Search, in Proceedings of the 22nd Annual Conference on Computational Learning Theory, 2009.
  7. T. Hofmann, B. Schölkopf, and A. J. Smola, The Annals of Statistics 36(3), 1171–1220 (2008).
  8. A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, Springer Science & Business Media, 2001.
  9. R. Courant and D. Hilbert, Methods of Mathematical Physics, Volumes I and II, Wiley Interscience, 1970.
  10. K. Banaszek, Physics Letters A 253(1-2), 12–15 (1999).
  11. F. Becerra, J. Fan, and A. Migdall, Nature Communications 4 (2013).
  12. J. A. Suykens and J. Vandewalle, Neural Processing Letters 9(3), 293–300 (1999).
  13. S. T. Ali, J.-P. Antoine, and J.-P. Gazeau, Coherent States, Wavelets and their Generalizations: 2nd Edition, Springer, New York, 2014.
  14. A. Barut and L. Girardello, Communications in Mathematical Physics 21(1), 41–55 (1971).
  15. A. Perelomov, Communications in Mathematical Physics 26(3), 222–236 (1972).
  16. A. Perelomov, Generalized Coherent States and their Applications, Springer-Verlag, 1986.
  17. R. Gilmore, Annals of Physics 74(2), 391–463 (1972).
  18. J. R. Klauder, Journal of Physics A: Mathematical and General 29(12), L293 (1996).
  19. J. P. Gazeau and J. R. Klauder, Journal of Physics A: Mathematical and General 32(1), 123 (1999).
  20. J.-P. Gazeau and P. Monceau, in Conférence Moshé Flato 1999, pages 131–144, Springer, 2000.
  21. J.-P. Gazeau, Coherent States in Quantum Physics, Wiley, 2009.
  22. M. Marhic, Lettere Al Nuovo Cimento (1971–1985) 22(9), 376–378 (1978).
  23. T. Philbin, American Journal of Physics 82(8), 742–748 (2014).
  24. I. Senitzky, Physical Review 95(5), 1115 (1954).
  25. H. Yildirim and M. Tomak, Physical Review B 72(11), 115340 (2005).
  26. S. Şakiroğlu, F. Ungan, U. Yesilgul, M. Mora-Ramos, C. Duque, E. Kasapoglu, H. Sari, and I. Sökmen, Physics Letters A 376(23), 1875–1880 (2012).
  27. F. Ziesel, T. Ruster, A. Walther, H. Kaufmann, S. Dawkins, K. Singer, F. Schmidt-Kaler, and U. Poschinger, Journal of Physics B: Atomic, Molecular and Optical Physics 46(10), 104008 (2013).
  28. A. Perez-Leija, H. Moya-Cessa, F. Soto-Eguibar, O. Aguilar-Loreto, and D. N. Christodoulides, Optics Communications 284(7), 1833–1836 (2011).
  29. A. Perez-Leija, A. Szameit, I. Ramos-Prieto, H. Moya-Cessa, and D. N. Christodoulides, Physical Review A 93(5), 053815 (2016).