A quantum speedup in machine learning: Finding an N-bit Boolean function for a classification
We compare quantum and classical machines designed for learning an N-bit Boolean function in order to address how a quantum system improves machine learning behavior. The two types of machines consist of the same number of operations and control parameters, but only the quantum machines utilize the quantum coherence naturally induced by unitary operators. We show that quantum superposition enables quantum learning that is faster than classical learning by expanding the approximate solution regions, i.e., the acceptable regions. This is also demonstrated by means of numerical simulations with a standard feedback model, namely random search, and a practical model, namely differential evolution.
Keywords: Quantum Information, Quantum Learning, Machine Learning
Over the past decades, quantum physics has brought remarkable innovations across a variety of disciplines. For example, quantum algorithms exist that are exponentially faster than their classical counterparts [1, 2, 3]. The physical limit of measurement precision has been improved in quantum metrology [4, 5], and a number of protocols offering higher security have been proposed in quantum cryptography [6, 7]. These achievements are enabled by the appropriate use of quantum effects such as quantum superposition and quantum entanglement.
Another remarkable field of science is machine learning, a sub-field of artificial intelligence and one of the most advanced automatic control techniques. While learning is usually regarded as a characteristic of humans or living things, machine learning enables a machine to learn a task. Machine learning has attracted great attention with its novel ability to learn. On the one hand, machine learning has been studied theoretically for understanding the learning of real biological systems. On the other hand, machine learning is also expected to provide reliable control techniques for designing complex systems in practice.
Recently, the hybridization of the two fields described above, quantum technology and machine learning, has received great interest [9, 10, 11, 12]. One question naturally arises: can machine learning be improved by using favorable quantum effects? Several attempts to answer this question have been made in the past years, for example, quantum perceptrons [13], neural networks [14, 15, 16], and quantum-inspired evolutionary algorithms [17, 18]. Most recently, remarkable studies have been made [19, Bang:2013uj, 20, 21]. In Ref. [19], a learning speedup of a quantum machine, together with a reduced memory requirement, was observed for a specific example called the -th root of NOT. In Ref. [Bang:2013uj], a strategy to design a quantum algorithm was introduced, establishing the link between the learning speedup and the speedup of the found quantum algorithm. In Refs. [20, 21], the authors showed a quantum speedup in the task of classifying large amounts of data. However, it is still unclear which quantum effects work in machine learning and how, particularly in the absence of a fair comparison between classical and quantum machines.
In this work, we consider a binary classification problem as a learning task. Such a classification can be realized by an N-bit Boolean function that maps the set of N-bit binary strings into a single output bit. The main work in this paper is to compare a quantum machine with a classical machine. The two machines are designed to be equivalent; the only difference is that the quantum machine can exploit quantum effects, whereas the classical machine cannot. The machines are analyzed in terms of the acceptable region, defined as a localized solution region of the parameter space. In the analysis, it is shown that the quantum machine can learn faster because quantum superposition expands the acceptable region. To make the analysis more explicit, we further analyze the machines using random search, which is a standard model for learning-performance analysis. In this primitive model, we validate the quantum speedup, showing that the overall number of iterations required to complete the learning grows more slowly for the quantum machine than for the classical machine as the search space grows. Differential evolution is then employed as a learning model to take a more realistic circumstance into account. By numerical simulations, we show that the quantum speedup is still observed even in such a case.
2 Classical and quantum machines
Machine learning can be decomposed into two parts: the machine and the feedback. The machine performs various tasks depending on its internal parameters, and the feedback adjusts the parameters of the machine so that the machine performs a required task, called the target. Learning is a process of finding suitable machine parameters with which the machine is expected to generate the desired results for a target. (We consider the case of supervised learning, in which the desired results of the task are given to the machine.) This concept of machine learning has been widely adopted in the machine-learning literature at the fundamental level.
In this work, we assign the machine a binary classification problem as its task, where the machine will learn a target N-bit Boolean function f : {0,1}^N → {0,1}, defined as
where the input is represented as an N-bit binary string. This function can be written using the positive-polarity Reed-Muller expansion,
where ⊕ denotes modulo-2 addition taken over the index set, and the Reed-Muller coefficients are either 0 or 1. Here, the index set is given in the following way: a number is taken to be an element of the set only if the corresponding condition holds when the number is written as an N-bit string. Thus, each index set corresponds to one of the Boolean functions.
The Boolean function can be implemented by a reversible circuit, as shown in Figure 1, where an additional bit channel, called the work channel, and controlled operations are employed [25, 26]. A single-bit operation is placed on the work channel, and the controlled operations act on the work channel only when all of their control bits are 1. The input signal on the work channel is fixed to 0. Each operation is either the identity (i.e., doing nothing) if its Reed-Muller coefficient is 0, or NOT (i.e., flipping an input bit to its complement) if the coefficient is 1. As an example, a 1-bit Boolean function has two Reed-Muller coefficients, which determine all four possible Boolean functions. Table 1 gives the four possible 1-bit Boolean functions with their Reed-Muller coefficients and corresponding operations.
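The Reed-Muller construction above can be checked numerically. The sketch below is illustrative only: the function names and the bit-mask encoding of monomials are our own choices, not from the text. It computes the PPRM coefficients of a small Boolean function by the standard subset-XOR (Reed-Muller) transform and verifies that the expansion reproduces the function.

```python
def pprm_coefficients(f, n):
    """Positive-polarity Reed-Muller coefficients of an n-bit Boolean
    function f.  Monomials are encoded as bit masks S in 0 .. 2**n - 1;
    the coefficient of monomial S is the XOR of f over all inputs whose
    set bits form a subset of S (the standard Reed-Muller transform)."""
    a = {}
    for S in range(2 ** n):
        acc = 0
        for T in range(2 ** n):
            if T & ~S == 0:          # T is a subset of S
                acc ^= f(T)
        a[S] = acc
    return a

def pprm_eval(a, x):
    """Evaluate f(x) as the XOR, over monomials S with coefficient 1,
    of the product of the input bits selected by S."""
    out = 0
    for S, coef in a.items():
        if coef and (x & S) == S:    # the monomial over S equals 1 on x
            out ^= 1
    return out

# 2-bit example: f is the AND of the two input bits
f = lambda x: (x & 1) & ((x >> 1) & 1)
a = pprm_coefficients(f, 2)
assert all(pprm_eval(a, x) == f(x) for x in range(4))
print(a)   # {0: 0, 1: 0, 2: 0, 3: 1}: only the full monomial survives
```

The coefficient dictionary directly mirrors Table 1's role for the 1-bit case: each set of binary coefficients selects one Boolean function.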
With this reversible circuit model, we then define the classical and quantum machines. The classical machine consists of classical channels and operations, and its Boolean function is described as
with classical bits. We suppose that the Reed-Muller coefficients are probabilistically determined by the internal parameters, which implies that each operation performs the identity and NOT with complementary probabilities. These probabilistic operations are primarily intended for a fair comparison with the quantum machine, which naturally employs probabilistic operations. We now construct the quantum machine by setting only the work channel to be quantum. The input channels are left classical, as the input information is classical in our work. Thus, the Boolean function of the quantum machine is described as
where the signal on the work channel is encoded into a qubit state. The classical probabilistic operations are necessarily replaced by unitary operators,
where one squared amplitude is the probability of performing the identity and the other is that of performing NOT. Note that the relative phases are free parameters, suitably chosen before the learning. The feedback adjusts only the probabilities, which are controllable in both classical and quantum experimental setups [27, 28].
These classical and quantum machines are equivalent to each other: they have the same circuit structure and exactly the same number of control parameters. Moreover, a single classical operation and the corresponding quantum operator cannot be discriminated by measuring the distribution of outcomes for the same input and parameters.
3 Acceptable region
A target Boolean function is represented by a point in the search space spanned by the probabilities. For example, the four possible learning targets of the 1-bit Boolean function correspond to four points in the search space. Similarly, the machine's behavior is also characterised as a point; i.e., different points lead to different probabilistic tasks that the machine performs. Learning is then simply regarded as a process of moving a point toward a given target point in the search space. It is, however, usually impractical (actually, impossible in a realistic circumstance) to locate the point exactly at the target. Instead, it is feasible to find approximate solutions near the exact target; i.e., the learning is expected to lead the point into a region near the target point. We call such a region the acceptable region for the approximate target functions. As learning-time and convergence depend primarily on the size of the acceptable region, a larger acceptable region is usually expected to make the learning faster. In this sense, we examine the acceptable regions of the classical and quantum machines.
The acceptable region is defined as the set of points for which the error is less than or equal to a tolerable value. Here, the error is quantified through the figure of merit of machine performance, called the task-fidelity, which measures how well the machine performs a target function and is defined by
where one factor is the conditional probability of the machine producing a given output for a given input, and the other is that of the target. For example, the target probabilities for the target function in Table 1 are given as
The term in Equation (6) corresponds to the closeness of the machine and target output distributions for a given input. The task-fidelity increases as the outputs get closer to the required outputs; it becomes unity only when the machine reproduces the target for all inputs, and otherwise it is less than unity. The acceptable region can then be seen as the set of probability vectors whose task-fidelity is at least the threshold set by the tolerance, and thus a higher task-fidelity guarantees a wider acceptable region for a given tolerance.
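Since Equation (6) is not reproduced in this text, the short sketch below assumes the Bhattacharyya overlap, averaged over inputs, as the per-input closeness measure; it has exactly the properties described above (it equals unity if and only if the machine reproduces the target distribution on every input).

```python
import math

def task_fidelity(P, P_target, n):
    """Average over the 2**n inputs x of the overlap between the machine's
    output distribution P[x] and the target distribution P_target[x].

    The per-input closeness assumed here is the Bhattacharyya overlap
    sum_y sqrt(P(y|x) * P_t(y|x)), which is 1 iff the distributions match."""
    total = 0.0
    for x in range(2 ** n):
        total += sum(math.sqrt(P[x][y] * P_target[x][y]) for y in (0, 1))
    return total / 2 ** n

# 1-bit constant target f(x) = 0 (deterministic target distribution)
P_target = {0: {0: 1.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}
# a machine that outputs 0 with probability 0.9 on every input
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.9, 1: 0.1}}
F = task_fidelity(P, P_target, 1)
assert abs(F - math.sqrt(0.9)) < 1e-12   # < 1: the machine is imperfect
```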
As the simplest case, let us begin with a constant target function among the 1-bit Boolean functions (such a constant function is trivial, but it is still a meaningful target for the machines to learn), whose task-fidelity reduces to
which is common to both the classical and quantum machines. In the classical machine, Equation (8) is evaluated as
adopting the conditional probabilities given by
where the parameters lie in the unit interval. In the quantum machine, the conditional probabilities slightly differ from the classical ones due to the superposition of the identity and NOT amplitudes. The conditional probabilities are given as
where the interference term involves the difference of the phases in the two unitaries. Thus, the task-fidelity of the quantum machine is evaluated as
where the additional term is apparently the result of quantum superposition. From the result of Equation (11), we can see that
provided that the phase difference is chosen appropriately. The phase plays an important role in helping the quantum machine through constructive interference. The task-fidelities for the other three targets are also listed in Table 2.
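The role of the interference term can be made concrete with a two-gate toy calculation. The 2x2 parametrisation below is an assumption (the text's Equation (5) is not reproduced); the point is that composing two unitaries adds a cross term, absent classically, which a suitable phase difference makes constructive.

```python
import cmath, math

def u(p, phi):
    """2x2 unitary that flips |0> -> |1> with probability p, carrying a
    free relative phase phi (one possible parametrisation)."""
    return [[math.sqrt(1 - p), -cmath.exp(-1j * phi) * math.sqrt(p)],
            [cmath.exp(1j * phi) * math.sqrt(p), math.sqrt(1 - p)]]

def apply(U, v):
    return [U[0][0] * v[0] + U[0][1] * v[1],
            U[1][0] * v[0] + U[1][1] * v[1]]

p0, p1 = 0.3, 0.4
# classical composition of two stochastic flips: outcome 0 needs 0 or 2 flips
P_c = (1 - p0) * (1 - p1) + p0 * p1

# quantum composition on |0>, phases chosen so the two paths
# (no flip / two flips) interfere constructively
v = apply(u(p1, math.pi), apply(u(p0, 0.0), [1.0, 0.0]))
P_q = abs(v[0]) ** 2

interference = 2 * math.sqrt(p0 * p1 * (1 - p0) * (1 - p1))
assert abs(P_q - (P_c + interference)) < 1e-12
assert P_q > P_c   # superposition raises the success probability
```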
Note here that, for all target functions, the quantum task-fidelity can always be made larger than the classical one by choosing appropriate free parameters before the learning. Therefore, the quantum machine has wider acceptable regions than the classical machine for a given tolerance.
In Figure 2, the task-fidelity and the acceptable region of each machine are shown for a target whose phase is chosen to maximize the difference between the two machines. We also found that the acceptable region of the quantum machine is considerably larger than that of the classical machine.
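The widening of the acceptable region can be illustrated by a quick Monte-Carlo sketch. The fidelities below are toy stand-ins for Equation (6) (average probability of a correct output for the 1-bit machine with a constant-zero target), and the quantum expression assumes the constructive-interference phase choice discussed above; none of the specific numbers are from the text.

```python
import random, math

def fid_classical(p0, p1):
    # correct-output probability for target f(x) = 0, averaged over inputs:
    # input 0 applies only the first op; input 1 applies both (0 or 2 flips)
    return 0.5 * ((1 - p0) + (1 - p0) * (1 - p1) + p0 * p1)

def fid_quantum(p0, p1):
    # same machine with unitaries; phases set for constructive interference,
    # which adds 2*sqrt(...) to the two-gate branch
    ok_x1 = ((1 - p0) * (1 - p1) + p0 * p1
             + 2 * math.sqrt(p0 * p1 * (1 - p0) * (1 - p1)))
    return 0.5 * ((1 - p0) + ok_x1)

rng = random.Random(42)
eps, n = 0.1, 200000
pts = [(rng.random(), rng.random()) for _ in range(n)]
area_c = sum(fid_classical(a, b) >= 1 - eps for a, b in pts) / n
area_q = sum(fid_quantum(a, b) >= 1 - eps for a, b in pts) / n
assert area_q > area_c    # the quantum acceptable region is wider
```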
The optimal phase condition that improves the task-fidelity, as in Equation (12), can be generalized to an arbitrary N-bit Boolean function. We provide one such condition as
where the components refer to a solution point in the search space (see Appendix A). This condition ensures that the quantum task-fidelity is at least the classical one, so that the acceptable region of the quantum machine can be wider than that of the classical machine for an arbitrary N-bit Boolean function.
4 Learning speedup by expanded acceptable region
This section is devoted to the learning-time in machine learning. For a numerical simulation, we employ random search as the feedback, which has often been considered for studying learning performance rather than for any practical reason. Random search runs as follows: first, all control parameters are randomly chosen, and then the task-fidelity is measured with the chosen parameters. These two steps constitute a single iteration of the procedure. The iterations are repeated until the task-fidelity exceeds the threshold set by a given tolerance. After a sufficient number of simulations is performed, we calculate the mean iteration number, i.e., the average of the iteration count at which the learning completes, weighted by its probability of occurring at that iteration. This mean iteration number can be used to quantify the learning-time, and the results of the numerical simulations are shown in Table 3, where quantum learning is demonstrated to be faster than classical learning. This is a direct result of the wider acceptable region of the quantum machine, as the mean iteration number in random search is inversely proportional to the size of the acceptable region: it equals the reciprocal of the ratio of the acceptable region to the whole space. We demonstrate this by comparing the mean iteration numbers with the acceptable regions found by Monte-Carlo simulation in Table 3, and thereby note that the acceptable region is the main feature that directly influences the learning-time in random search.
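The inverse relation between the mean iteration number and the acceptable-region fraction can be sketched in simulation. The fidelity below is the same toy stand-in for Equation (6) used earlier (1-bit classical machine, constant-zero target, our own construction); since each random-search iteration is an independent draw, the iteration count is geometrically distributed, so its mean is the reciprocal of the acceptable-region fraction.

```python
import random

def fidelity_classical(p0, p1):
    """Toy task-fidelity for the 1-bit classical machine and target f(x)=0:
    correct-output probability averaged over the two inputs."""
    ok_x0 = 1 - p0                          # input 0: only the first op acts
    ok_x1 = (1 - p0) * (1 - p1) + p0 * p1   # input 1: zero or two flips
    return 0.5 * (ok_x0 + ok_x1)

def random_search(eps, rng):
    """One run of random search: resample parameters until F >= 1 - eps,
    returning the number of iterations used."""
    n = 0
    while True:
        n += 1
        if fidelity_classical(rng.random(), rng.random()) >= 1 - eps:
            return n

rng = random.Random(0)
eps = 0.1
runs = [random_search(eps, rng) for _ in range(20000)]
mean_n = sum(runs) / len(runs)

# Monte-Carlo estimate of the acceptable-region fraction r
hits = sum(fidelity_classical(rng.random(), rng.random()) >= 1 - eps
           for _ in range(200000))
r = hits / 200000

# geometric distribution: mean iteration number is 1/r
assert abs(mean_n * r - 1) < 0.1
```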
Also, in Figure 3, the data of Table 3 are well fitted to an exponential function, implying that the size of the acceptable region decreases exponentially as the dimension of the parameter space increases. The fitting parameters are given as
Remarkably, the exponent for the quantum machine is much smaller than that for the classical machine.
It follows from what has been shown that the acceptable region is the main feature that directly influences the learning-time in random search. We proved in the previous section that we can always prepare a quantum machine with a larger acceptable region than the classical one. Therefore, we conclude that the learning-time can be shorter in the quantum case than in the classical case. The results of the numerical simulations also support that the quantum machine learns much faster, particularly in a large search space. We clarify again that such a quantum speedup is enabled by quantum superposition together with appropriately arranged phases.
5 Applying differential evolution
We now consider a more practical learning model, taking a real circumstance into account. A general analysis of the learning efficiency is very complicated, as too many factors are associated with the learning behavior. Furthermore, the most efficient learning algorithms tend to use heuristic rules and are problem-specific [32, 33]. Nevertheless, the acceptable region is usually believed to be a key factor in the learning efficiency even for heuristic methods. In this sense, we conjecture that the quantum machine offers the quantum speedup even with a practical learning method.
We apply differential evolution (DE), which is known as one of the most efficient learning methods for global optimization [29]. We start with a population of control-parameter vectors whose components are the control parameters of the machine. In DE, these vectors evolve by mating their components with each other. Equation (6) is used as the criterion of how well the machines fit the target. This process is iterated until the task-fidelity reaches a certain level of accuracy (see Ref. [29] or [Bang:2013uj] for a detailed description of differential evolution).
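The DE loop described above can be sketched as follows. This is a minimal DE/rand/1/bin variant in the spirit of Storn and Price [29], not the authors' implementation; the population size, differential weight `F`, crossover rate `CR`, and the toy cost function (1 minus the same stand-in fidelity used earlier) are all our own choices.

```python
import random

def de_minimize(cost, dim, pop_size=20, F=0.7, CR=0.9, iters=300, seed=1):
    """Minimal DE/rand/1/bin sketch.  `F` is the differential weight and
    `CR` the crossover rate mentioned in the text; parameters are kept in
    [0, 1] by clipping, since they represent probabilities."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    costs = [cost(v) for v in pop]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)        # ensure at least one mutation
            trial = [
                min(1.0, max(0.0, pop[a][j] + F * (pop[b][j] - pop[c][j])))
                if (rng.random() < CR or j == j_rand) else pop[i][j]
                for j in range(dim)
            ]
            tc = cost(trial)
            if tc <= costs[i]:                 # greedy one-to-one selection
                pop[i], costs[i] = trial, tc
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]

def cost(v):
    # 1 - toy task-fidelity of the 1-bit classical machine for f(x) = 0;
    # the global minimum is 0, at the solution point (0, 0)
    p0, p1 = v
    return 1 - 0.5 * ((1 - p0) + (1 - p0) * (1 - p1) + p0 * p1)

params, c = de_minimize(cost, dim=2)
assert c < 1e-3    # DE drives the machine close to the target point
```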
We perform the numerical simulations for increasing numbers of input bits. The results are averaged over many realisations. The target function is a constant function, yielding the same output for all inputs. The free parameters of differential evolution (e.g., the crossover rate and differential weight) are chosen to achieve the best learning efficiency for the classical machine. (In Figure 4(a), one may worry that the crossover point casts doubt on the validity of the quantum learning speedup. However, the appearance of the crossover is due to the DE optimization with the free parameters; note that the free parameters are optimized for the classical machine. The crossover can be removed by choosing appropriate free parameters for each machine.) Nevertheless, we expect that the quantum machine still exhibits the quantum speedup, assisted by quantum superposition, with the optimal phases in Equation (13).
We give the mean task-fidelity, averaged over the realisations, in Figure 4(a). For both the classical and quantum machines, the mean task-fidelities increase close to unity, but the quantum machine is much faster in all cases. We also investigate the learning-time as the dimension of the parameter space increases, as depicted in Figure 4(b). The data are well fitted by a polynomial function of the dimension. (Such a polynomial result marks a considerable improvement from differential evolution, quite distinct from the exponential dependence exhibited by random search.) We note that the quantum machine still exhibits the speedup, with smaller fitting parameters. Therefore, we expect that such a quantum speedup can be achieved even in a real circumstance.
6 Summary and discussion
We investigated the learning performance of two machines on the task of finding an N-bit Boolean function, which can be used for a binary classification problem. The two machines were designed equivalently to make their comparison as convincing as possible. The critical difference between the two machines was that the operations in the quantum machine are described by unitary operators and can therefore exploit quantum superposition. The learning of the two machines was characterized in terms of the acceptable region, the localized region of the parameter space containing approximate solutions. We found that the quantum machine has a wider acceptable region, induced by quantum superposition. We performed simulations with a standard feedback method, random search, to show that the learning-time is inversely proportional to the size of the acceptable region; hence, the wider acceptable region makes the learning faster, with the quantum machine requiring fewer iterations than the classical one. We then applied a practical learning method, differential evolution, to our main task and again observed the learning speedup of the quantum machine.
Here, we would like to point out that the maximal learning speedup of the quantum machine is achieved by choosing suitable phases, as in Equation (13). From a practical perspective, one may thus worry that an additional task, namely finding the relative phases, is required to ensure the remarkable performance of the quantum learning machine for other N-bit Boolean function targets. This issue can be overcome by synchronizing the relative phases with the control parameters of the quantum machine, still yielding a learning speedup (see Appendix B for details).
We expect that our work will motivate researchers to study the role of various quantum effects in machine learning and open up new possibilities for improving machine learning performance. It remains open whether the quantum machine can be improved further by using other quantum effects, such as quantum entanglement.
We acknowledge the financial support of the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2010-0018295 and No. 2010-0015059). We also thank T. Ralph, M. Żukowski and H. J. Briegel for discussions and advice.
Appendix A Finding the optimal phase condition in Equation (13)
In deriving the reduced form of Equation (15), we used the fact that the target probability equals 1 when the output is the desired value for the given target and is 0 otherwise. Equation (15) shows that the task-fidelity is enlarged if the conditional probabilities of the desired outputs are maximized for all inputs.
To start, consider an ideal learning machine (either classical or quantum) that always generates the desired outcomes, with perfect task-fidelity equal to unity. From our analysis in Section 3, we can represent this machine as a point in the search space. In this sense, we call this ideal machine the “solution machine”. We then consider a “near-solution machine”, located at a point a small Euclidean distance away from the solution point. Here we assume further that the search space is isotropic around the solution point, so that the machines on the surface of the corresponding hypersphere have the same task-fidelity. This assumption is physically reasonable for a very small tolerance error. Thus, without loss of generality, we consider the near-solution machine corresponding to a point on the sphere satisfying the same displacement condition for every component.
In this circumstance, the task-fidelity of a classical near-solution machine is necessarily smaller than unity, depending on the displacement. On the other hand, if we choose the optimal phases, the task-fidelity of the quantum machine can remain 1, without any dependence on the displacement. To show this, let us first write the conditional probability in Equation (15) as
where the index set contains the indices of the operators actually applied, conditioned on the input. For example, for a given input, the zeroth operator is always applied independently of the input, while the nonzero input bits activate the corresponding controlled operators (see Figure 1). Based on the above description, we can generalize the calculations as
Here, Equation (16) becomes 1 when the displacement vanishes, because then the machine is nothing but the solution machine. The basic idea is to find a condition under which all the extra terms vanish even when the displacement is non-zero, i.e., for the near-solution machine; the conditional probability of the near-solution machine is then mathematically equal to that of the solution machine. To do this, we consider the product of two arbitrary unitaries, as
For the near-solution machine, we substitute the displaced parameters and then calculate the conditional probability for the given input as
where the shorthand symbols collect the amplitude factors. In calculating Equation (19), we assumed a deterministic target, i.e., each target output is either 0 or 1, as is usual in most tasks (but not necessarily). Here, the important point is that the term associated with the displacement can be made to vanish by letting
The above condition in Equation (20) can be applied for all inputs. Thus, we provide here a generalized condition as
This is the optimal phase condition, as in Equation (13). One can check that this condition gives the maximum task-fidelity, equal to unity, for all inputs.
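The cancellation mechanism can be checked in the smallest case: two composed unitaries whose flip probabilities are displaced from the solution point (zero) by the same small amount. With the two phases chosen π apart, a choice consistent with the constructive-interference discussion in Section 3 (the 2x2 parametrisation itself is an assumption), the displacement terms cancel exactly and the success probability stays at 1.

```python
import cmath, math

def u(p, phi):
    """2x2 unitary: flip probability p, free relative phase phi
    (an assumed parametrisation, not the text's Equation (5))."""
    return [[math.sqrt(1 - p), -cmath.exp(-1j * phi) * math.sqrt(p)],
            [cmath.exp(1j * phi) * math.sqrt(p), math.sqrt(1 - p)]]

def prob0_after(p0, p1, phi0, phi1):
    """Probability of outcome 0 after applying both unitaries to |0>."""
    v = [1.0, 0.0]
    for U in (u(p0, phi0), u(p1, phi1)):
        v = [U[0][0] * v[0] + U[0][1] * v[1],
             U[1][0] * v[0] + U[1][1] * v[1]]
    return abs(v[0]) ** 2

# solution machine: both flip probabilities zero -> outcome 0 with certainty.
# near-solution machine: both parameters displaced by the same delta.
delta = 0.05
P_near = prob0_after(delta, delta, 0.0, math.pi)   # phases pi apart
assert abs(P_near - 1) < 1e-12   # the amplitude is (1-delta) + delta = 1
```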
Appendix B A practical version of quantum machine
The speedup introduced in this paper is enabled when the quantum machine uses suitable phases; accordingly, the suitable phases must be known before the fast learning. In practice, the learning-time has to include the complexity of obtaining suitable phases, which is not easy. We therefore introduce a practical quantum machine that does not require the effort of finding optimal phases. To this end, we modify the unitary in Equation (5) by fixing all the phases in terms of the machine's own parameters, i.e., the unitary is written as
such that the phases approach the optimized phases as the machine approaches the solution point in the parameter space during the learning, since the optimized phase condition is given by Equation (13). Thus, this guarantees wider acceptable regions than the classical machine for any learning target.
Figure 5(a) shows that the practical quantum machine has wider acceptable regions than the classical machine for all the Boolean targets considered. The areas inside the solid and dashed lines represent the acceptable regions of the practical quantum machine and the classical machine, respectively. This supports the conclusion that the practical quantum machine always learns faster than the classical machine, whereas the advantage of the original quantum machine depends on the target function.
We then obtain the learning-time of the practical quantum machine in Figure 5(b). The data are also well fitted by the same polynomial form, with fitting parameters smaller than those of the classical machine (see Equation (14)). The result shows that a considerable learning speedup is still achieved by this practical quantum machine, even though it takes a little more time compared to the original one endowed with the optimal relative phases.
-  Deutsch D and Jozsa R 1992 P. Roy. Soc. A 439 553
-  Grover L K 1997 Phys. Rev. Lett. 79 325
-  Shor P W 1997 SIAM J. Comput. 26 1484
-  Giovannetti V, Lloyd S and Maccone L 2004 Science 306 1330
-  Giovannetti V, Lloyd S and Maccone L 2011 Nat. Photonics 5 222
-  Bennett C H and Brassard G 1984 in Proceedings of the IEEE International Conference on Computers, Systems, and Signal Processing, Bangalore p. 175
-  Ekert A K 1991 Phys. Rev. Lett. 67 661
-  Langley P 1996 Elements of Machine Learning (San Francisco, CA: Morgan Kaufmann Publisher)
-  Pearson B J, White J L, Weinacht T C and Bucksbaum P H 2001 Phys. Rev. A 63 063412
-  Gammelmark S and Mølmer K 2009 New J. Phys. 11 033017
-  Bang J, Lee S W, Jeong H and Lee J 2012 Phys. Rev. A 86 062317
-  Briegel H J and De las Cuevas G 2012 Sci. Rep. 2 400
-  Lewenstein M 1994 J. Mod. Optic. 41 2491
-  Kak S C 1995 Inform. Sciences 83 143
-  Chrisley R L 1996 in P Pylkkänen and P Pylkkö (Eds.) Brain, Mind and Physics (IOS Press) pp. 126–139
-  Narayanan A and Menneer T 2000 Inform. Sciences 128 231
-  Han K H and Kim J H 2002 IEEE T. Evolut. Comput. 6 580
-  Han K H and Kim J H 2006 in IEEE International Conference on Evolutionary Computation (IEEE) pp. 2622–2629
-  Manzano D, Pawłowski M and Brukner C 2009 New J. Phys. 11 113018
-  Lloyd S, Mohseni M and Rebentrost P 2013 arXiv:1307.0411
-  Rebentrost P, Mohseni M and Lloyd S 2013 arXiv:1307.0471
-  Opper M and Haussler D 1991 Phys. Rev. Lett. 66 2677
-  Rastrigin L A 1963 Automat. Rem. Contr. 24 1337
-  Gupta P, Agrawal A and Jha N K 2006 IEEE T. Comput. Aid. D. 25 2317
-  Maslov D and Dueck G W 2003 in 6th International Symposium on Representation and Methodology of Future Computing Technologies pp. 162–170
-  Toffoli T 1980 Reversible Computing Technical Memo MIT/LCS/TM-151 (MIT Lab for Computer Science)
-  Reck M, Zeilinger A, Bernstein H J and Bertani P 1994 Phys. Rev. Lett. 73 58
-  Kim J, Lee J S and Lee S 2000 Phys. Rev. A 61 032312
-  Storn R and Price K 1997 J. Global. Optim. 11 341
-  Nielsen M A and Chuang I L 2010 Quantum Computation and Quantum Information (Cambridge: Cambridge University Press) 10th ed.
-  van den Bergh F and Engelbrecht A P 2004 IEEE T. Evolut. Comput. 8 225
-  Middleton A A 2004 Phys. Rev. E 69 055701
-  Pál K F 1996 Physica A 233 60