Quantum distance-based classifier with constant-size memory, distributed knowledge and state recycling
Abstract
In this work we examine a recently proposed distance-based classification method designed for near-term quantum processing units with limited resources. We further study possibilities to reduce the quantum resources without any loss of efficiency. We show that only a part of the information undergoes coherent evolution, and this fact allows us to introduce an algorithm with significantly reduced quantum memory size. Additionally, by considering only partial information at a time, we propose a classification protocol with information distributed among a number of agents. Finally, we show that the information evolution during a measurement can lead to a better solution and that the accuracy of the algorithm can be improved by harnessing the state after the final measurement.
1 Introduction
Recently, a significant effort has been made to develop machine learning techniques that harness the power of quantum processing units. It is foreseen that quantum mechanics will offer tantalizing prospects to enhance machine learning, ranging from reduced computational complexity to improved generalization performance [1, 2]. A number of methods have been proposed, with algorithms for cluster finding [3], principal component analysis, quantum support vector machines, and quantum Boltzmann machines being the most promising ones [1].
The progress in the field is very dynamic, but the devices available in the near term provide limited resources to harness the proposed methods. Thus, a question of critical importance for the field of quantum computing, one that will only gain more interest in the near future, is whether quantum processing units can provide resources for algorithms solving well-defined computational tasks that lie beyond the reach of state-of-the-art classical computers.
A number of approaches to harness near-term devices have appeared. One of the most important directions of research is the study of sampling tasks, with works stating that some sampling tasks must take exponential time on a classical computer [4]. Another promising field that could yield valuable, noise-robust methods is concerned with quantum approximate optimization algorithms, which are designed to run on a gate-model quantum computer and have shallow depth [5]. Also, quantum annealing is argued to be reliable enough for problems infeasible for classical computers [6]. An important trend in the context of this work is the attempt to reach the regime that existing supercomputers cannot easily simulate, with methods that scale up using only small trusted devices [7]. These results encourage further study of methods that focus on near-term resources more than on scaling properties alone.
In the field of near-term quantum machine learning, some results demonstrate that, while complex fault-tolerant architectures will be required for universal quantum computing, a significant quantum advantage already emerges in existing noisy systems [8]. The approach of particular importance in the context of this work is the recently proposed distance-based classification method designed for low-depth quantum interference circuits [9].
The main goal of this work is to consider ways to improve distance-based classification in terms of the resources needed and to develop new features that extend its usefulness as well as boost its efficiency.
1.1 Quantum distance-based classifier
The distance-based classifier we consider in this paper is an example of a kernel method [10]. In this method, classification, i.e. prediction of the class label of a given test sample, is determined by the value of a similarity function called a kernel. The similarity is measured between the test sample and all of the training samples. For instance, given a collection of training samples consisting of pairs of feature vectors and corresponding class labels, the goal is to assign a class to a test sample. The classifier we consider may be seen as a kernelized binary classifier
(1) 
where the similarity is measured based on the distance
(2) 
with the normalizing constant equal to the size of the training set.
The quantum circuit implementation of the classifier is based on encoding the feature vectors in quantum states. For normalized feature vectors the corresponding quantum states are
(3) 
The classification is performed by preparing a state that encodes both the training set and the test sample and measuring the state after a simple preprocessing. For this purpose we entangle an ancillary qubit with the data. Additionally we introduce registers for element index and its class and obtain a state
(4) 
where
(5) 
and . The only quantum operation we perform is a Hadamard operation on the ancillary register. Finally we obtain a state
(6) 
where
(7) 
and . If we measure the resulting state and obtain 0 on the auxiliary register, the conditional probabilities for each of the classes are proportional to . For further details we encourage the reader to see [9].
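As a classical point of reference, the kernelized classifier can be sketched in a few lines of NumPy. The sketch assumes the kernel form used in [9], namely one minus the squared distance divided by four times the training set size, labels in {-1, +1} and unit-norm feature vectors. The function name and the toy data are illustrative only.

```python
import numpy as np

def kernel_classify(x_test, X_train, y_train):
    """Sign of the label-weighted kernel sum; labels are assumed to be -1/+1."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    M = len(X_train)                       # size of the training set
    sq_dist = np.sum((X_train - x_test) ** 2, axis=1)
    kernel = 1.0 - sq_dist / (4.0 * M)     # assumed kernel form, following [9]
    return int(np.sign(np.sum(y_train * kernel)))

# Toy data (hypothetical): two unit vectors in the plane, one per class.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([-1, 1])
x = np.array([np.cos(0.2), np.sin(0.2)])   # test vector close to the first sample
label = kernel_classify(x, X, y)
```

The test vector lies close to the first training sample, so the kernel sum is dominated by that sample's label.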
2 Distance-based classifier as a quantum channel
In this section we aim to show that the resources used to implement the distance-based classifier as described in the previous section can be significantly reduced. The quantum operations performed during classification apply only to the ancillary and feature-vector registers. This means that these are the only registers that need to be encoded in the quantum memory. The index and the class registers may be classical. Thus, the algorithm may be expressed as a quantum channel, significantly reducing the number of required qubits.
2.1 Mapping the classifier to a quantum channel
Using a density matrix description that allows us to mix quantum and classical registers we can write a state encoding the training set similar to the one in equation (4). The state can be represented as
(8) 
where . Thus, in order to perform classification it is sufficient to prepare the state for a training index chosen from the uniform distribution and apply the Hadamard operation on the auxiliary register
(9) 
If the result is 0, the output class is the same as that of the training element. As the auxiliary and feature registers are the only ones entangled, the probability distribution is exactly the same.
Note that the channel description is useful for representation of the whole classification procedure. The key part that is done on a QPU can still be described as a unitary operation on a reduced state. Thus, we can design the whole procedure to be as follows.
1. For a given test sample, compute the normalized feature vector.
2. Pick a random training sample from the uniform distribution.
3. Prepare a circuit that prepares the comparison state.
4. Run the QPU, preparing the state, and measure it.
5. If the result at the auxiliary register is 1, go to step 2.
6. Return the class label of the chosen training sample.
The mixture of classical and quantum steps corresponds to the quantum channel description.
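The steps above can be mimicked classically for small inputs. The following is a minimal sketch, not the QPU implementation: it assumes real unit-norm feature vectors and replaces the quantum measurement by sampling with the postselection probability |x + x_m|^2 / 4 that follows from the Hadamard interference. The function name and the `max_rounds` guard are hypothetical.

```python
import numpy as np

def hybrid_classify(x_test, X_train, y_train, rng, max_rounds=10_000):
    """Classically simulate the sample-prepare-measure loop."""
    x = np.asarray(x_test, dtype=float)
    x = x / np.linalg.norm(x)                      # step 1: normalize the test vector
    for _ in range(max_rounds):
        m = rng.integers(len(X_train))             # step 2: uniform training index
        # Steps 3-4: after the Hadamard, the auxiliary qubit reads 0 with
        # probability |x + x_m|^2 / 4 (unit-norm real vectors assumed).
        p0 = np.sum((x + X_train[m]) ** 2) / 4.0
        if rng.random() < p0:                      # step 5: otherwise draw again
            return y_train[m]                      # step 6: label of the sample
    raise RuntimeError("postselection never succeeded")

# Toy data (hypothetical): two antipodal training vectors.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = ["A", "B"]
rng = np.random.default_rng(0)
result = hybrid_classify([1.0, 0.0], X, y, rng)
```

In this toy case the comparison with the second sample never passes postselection, so the loop always ends on the first sample.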
2.2 Probability distribution equivalence
To show the equivalence of the probability distribution we recall the probability of measuring the state in Eq. (6) and obtaining result 0 on the auxiliary register and on the class register
(10) 
(11) 
It is straightforward to check that measuring the state in Eq. (9) results in the same probability distribution. For a single sampled training element one obtains the probability of successful postselection equal to
(12) 
(13) 
(14) 
If we take the probability of sampling each training element to be uniform and sum over all of the samples with a given class label, the result matches Eq. (11).
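The equivalence can also be checked numerically. The sketch below is built on an assumed form of the coherent state, following [9]: index, auxiliary, feature and class registers, with the test vector attached to auxiliary value 0 and the training vector to auxiliary value 1. It compares the joint probabilities of that state with those of the sampled single-element procedure. All variable names and the random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, C = 6, 4, 3                                   # samples, features, classes
X = rng.normal(size=(M, N))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # unit-norm training vectors
x = rng.normal(size=N)
x /= np.linalg.norm(x)                              # unit-norm test vector
y = rng.integers(C, size=M)                         # random class labels

# Full coherent state with axes (index, auxiliary, feature, class).
psi = np.zeros((M, 2, N, C))
for m in range(M):
    psi[m, 0, :, y[m]] = x / np.sqrt(2 * M)         # auxiliary 0: test vector
    psi[m, 1, :, y[m]] = X[m] / np.sqrt(2 * M)      # auxiliary 1: training vector

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi = np.einsum("ab,mbfc->mafc", H, psi)            # Hadamard on the auxiliary qubit
p_full = np.sum(psi[:, 0] ** 2, axis=(0, 1))        # P(auxiliary = 0, class = c)

# Channel picture: sample m with probability 1/M, then postselect on 0.
p_single = np.sum((x + X) ** 2, axis=1) / 4.0
p_channel = np.array([p_single[y == c].sum() / M for c in range(C)])
```

Both arrays agree up to floating-point error, which is the statement of this subsection for the assumed state.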
2.3 Implementation
For each comparison of a randomly picked training sample with the test sample, we consider the sample state and the input state. The main challenge in implementing the algorithm on a QPU is to design a proper quantum circuit that prepares the desired state, in this case the comparison state.
We begin with the simplified case of 2 features. Let us fix and . We present a general formula for preparing the state for any test and training samples. To obtain the desired state it is necessary to prepare a state with a relative phase dependent on the auxiliary register. We assume that the initial state is . The initial rotation is performed about the y axis by the angle α_1. Then a controlled negation is performed to obtain different relative phases depending on the value of the auxiliary register. The final rotation, again about the y axis, by the angle α_2, results in the desired state. The complete preparation operator is equal to
(15) 
The resulting circuit can be expressed as
\Qcircuit @C=1.em @R=1.2em {
A & & \gate{H} & \ctrl{1} & \gate{H} & \measure{M} \\
F & & \gate{R_y(\alpha_1)} & \gate{\oplus} & \gate{R_y(\alpha_2)} & \qw
}
where M stands for final projective measurement.
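The two-qubit circuit can be simulated directly. The sketch below assumes that the test vector (cos t, sin t) is attached to auxiliary value 0 and the training vector (cos s, sin s) to auxiliary value 1, and that both qubits start in the zero state. The angle formulas in `prepare` were derived for this sketch under those assumptions and are not taken from Eq. (15).

```python
import numpy as np

def ry(theta):
    """Rotation about the y axis: R_y(theta) = exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],      # control: auxiliary qubit A (first factor)
                 [0, 1, 0, 0],      # target: feature qubit F (second factor)
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

def prepare(t, s):
    """State of (A, F) just before the final Hadamard on A."""
    a1 = t - s + np.pi / 2           # derived for this sketch, not from Eq. (15)
    a2 = t + s - np.pi / 2
    state = np.kron(H, ry(a1)) @ np.array([1.0, 0, 0, 0])   # start from |00>
    state = CNOT @ state
    return np.kron(I2, ry(a2)) @ state

t, s = 0.3, 1.1
state = prepare(t, s)
target = np.concatenate([[np.cos(t), np.sin(t)], [np.cos(s), np.sin(s)]]) / np.sqrt(2)
final = np.kron(H, I2) @ state       # final Hadamard on A
p0 = np.sum(final[:2] ** 2)          # probability of reading 0 on A
```

With these angles the prepared state matches the target comparison state, and the probability of reading 0 equals (1 + cos(t - s)) / 2.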
The method generalizes to larger feature registers provided that all the necessary controlled operators are available in the architecture. If there are restrictions, it is important to note that for a larger register the states are entangled and the preparation complexity grows, because we cannot prepare each of the qubits separately. In such a case, a good way to find proper preparation circuits is a heuristic method that, given a reasonable guess of the available gate set, finds the circuit without the need for an explicit algorithm [11, 12].
The circuit that prepares the desired state for any training and test samples with 4 features consists of four rotations and three controlled negations. The circuit has the following form
\Qcircuit @C=1.em @R=1.2em {
A & & \gate{H} & \ctrl{2} & \qw & \qw & \qw & \ctrl{2} & \gate{H} & \measure{M} \\
F & & \gate{H} & \qw & \qw & \ctrl{1} & \qw & \qw & \qw & \qw \\
F & & \gate{R_y(1)} & \targ & \gate{R_y(2)} & \targ & \gate{R_y(3)} & \targ & \gate{R_y(4)} & \qw
}
where the features register consists of the two lowest qubits.
2.4 Example
In order to show the increased potential of the quantum distance-based classifier we perform an experiment on the same data set as in [10], but using all of the features the data possesses. The introduced method additionally allows one to use arbitrarily many training samples. Thus, we perform leave-one-out cross-validation using the whole dataset.
The probability that postselection is successful for a given pair of test and training classes is presented in Table 1. We observe that even with the increased number of classes considered, the probability for the correct class is always the highest. The overall resulting probability distribution of output class labels, after postselection, for a given training sample is presented in Table 2. In every case the probability of the correct class is the highest. Overall success probabilities are given in Table 3.
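Table 1: Probability of successful postselection for a given pair of test and training classes (4 features).
Class | A    | B    | C
A     | 0.97 | 0.38 | 0.08
B     | 0.38 | 0.73 | 0.66
C     | 0.08 | 0.66 | 0.89

Table 2: Probability distribution of output class labels after postselection (4 features).
Class | A    | B    | C
A     | 0.68 | 0.27 | 0.06
B     | 0.22 | 0.41 | 0.37
C     | 0.05 | 0.40 | 0.55

Table 3: Overall success probabilities (4 features).
Class | A    | B    | C
All   | 0.68 | 0.41 | 0.55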
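The three tables can be cross-checked against each other. The short sketch below rests on one assumption that is not stated in the text: every class contributes the same number of training samples, so that the distribution after postselection is each row of the first table divided by its row sum. Under that assumption the reported values agree to within rounding.

```python
import numpy as np

# First table: probability of successful postselection per class pair.
table1 = np.array([[0.97, 0.38, 0.08],
                   [0.38, 0.73, 0.66],
                   [0.08, 0.66, 0.89]])
# Second table: output class distribution after postselection.
table2 = np.array([[0.68, 0.27, 0.06],
                   [0.22, 0.41, 0.37],
                   [0.05, 0.40, 0.55]])

# Assumption: equal class sizes, so the conditional distribution is each
# row of the first table divided by its row sum.
normalized = table1 / table1.sum(axis=1, keepdims=True)
max_deviation = np.abs(normalized - table2).max()

# The overall success probabilities are the diagonal of the second table.
overall = np.diag(table2)
```

The largest deviation stays below the two-digit rounding of the tables, and the diagonal reproduces the overall success probabilities.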
3 Generalized measurements sequence classifier
In this section we discuss further possibilities that come from the fact that part of the system may be seen as a collection of classical registers. In particular, we aim at modeling the classification protocol as a sequence of steps in time that results in measurement properties consistent with the training data. Using the channel description we expressed the algorithm as a single preparation and measurement routine. If we want to describe a sequence of such steps, where the state may be preserved after the measurement and the action is determined by the result of the measurement, a very convenient model is the Open Quantum Walk (OQW) [13]. It models any quantum system homogeneous in time with one classical property, called position, that governs the rules of the dynamics. This model enables one to clearly distinguish classical and quantum information and to describe the dynamic rules.
3.1 Classification with distributed knowledge
The simplest open quantum walk on a graph of size and internal states of size is defined as a quantum channel , such that the state of the system evolves by applying the walk channel
(16) 
where the walk channel using Kraus representation is a sum of transition operators
(17) 
with . Note that are responsible for connectivity.
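A single step of such a walk can be sketched as follows. The sketch uses the standard open quantum walk form of [13], in which the state is a collection of unnormalized internal density matrices, one per node, and each edge carries a transition operator. The operators outgoing from a node must satisfy the usual completeness condition. The cycle graph and the Hadamard-based operators are hypothetical choices for illustration.

```python
import numpy as np

def oqw_step(rhos, B):
    """One step of an open quantum walk.

    rhos[j] is the (unnormalized) internal state at node j, and B[i][j] is the
    transition operator from node j to node i (None when there is no edge).
    """
    n = len(rhos)
    new = [np.zeros_like(rhos[0]) for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if B[i][j] is not None:
                new[i] = new[i] + B[i][j] @ rhos[j] @ B[i][j].conj().T
    return new

# Hypothetical example: a cycle of 4 nodes with a qubit as the internal state.
n = 4
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
B = [[None] * n for _ in range(n)]
for j in range(n):
    B[(j + 1) % n][j] = H / np.sqrt(2)          # move clockwise
    B[(j - 1) % n][j] = H / np.sqrt(2)          # move counter-clockwise

rhos = [np.zeros((2, 2), dtype=complex) for _ in range(n)]
rhos[0] = np.array([[1, 0], [0, 0]], dtype=complex)   # start at node 0
rhos = oqw_step(rhos, B)
position_probs = [np.trace(r).real for r in rhos]
```

The trace of each block gives the probability of finding the walker at that node, so the classical position and the quantum internal state stay cleanly separated.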
Because the index register can be classical, it can serve as the position in an open quantum walk. A single node would correspond to a single test, comparing the test state with one fixed training element, to ensure proper information distribution. One may see this as training done by walking. The class of the position measured would be the output.
The walk channel corresponding to the classifier that would implement preparation of the training state during transition from position to position is
(18) 
where is number of outgoing edges from node . Alternatively, the transition operators can be described without including the test state into the Kraus operators
(19) 
resulting in a scenario where the test state is introduced only once as a quantum state, making it impossible to save it for future use without disturbing the protocol.
As a result we obtain a procedure with distributed knowledge about the training samples. The scenario may include many parties, where each party has limited knowledge that is not shared and the result is the same as in the case when all the knowledge was available. The procedure that implements this walk in a situation with many parties could be as follows.
1. For a given test sample, compute the normalized feature vector.
2. Provide one of the agents with the test state.
3. The first agent applies its transition operator as in Eq. (19).
4. Until the stop condition is reached, do:
   (a) Pick a random neighbor from the uniform distribution.
   (b) Transfer the state to that neighbor, applying the corresponding transition operator in the process.
5. Measure the final state.
6. If the result at the auxiliary register is 1, go to step 2.
7. Return the class of the final position.
The caveat is that some normalization information has to be shared to ensure proper state preparation. As Kraus operators change the internal state, for many training states at one site we would have to use a generalized OQW [14]. The model allows one to study limiting properties of classification results [15, 16, 17]. The dynamics needs to be fair in a sense that differs from the classical case [18].
3.2 Distributed knowledge example
In this case the main aim of the simulation is to show the influence of the walk graph on the classification results. In the case when the graph is complete or when the initial position is chosen randomly, we would obtain exactly the same results as for the simple channel model.
In our experiment we compare two cases chosen as representative extreme cases. We choose our graph to be a cycle, but we consider two arrangements of the nodes.
1. 150 nodes corresponding to training samples from 3 classes (A, B, C) arranged in three clusters (AAA…BBB…CCC…), with the starting position in the middle of one of the classes.
2. 150 nodes corresponding to training samples from 3 classes (A, B, C) arranged in a regular pattern with neighbors always being from different classes (ABCABC…), with the starting position in the middle of the cycle.
For each of the scenarios we study the probability of correct classification. Regardless of the arrangement of the nodes, the walk will always converge to the uniform distribution [19], thus exhibiting exactly the same probabilities as the channel model. The subject of the study is the behavior during the convergence time.
In Fig. 1 the results obtained for the introduced scenarios are presented. In the case when the training samples are mixed, the probabilities are close to the limiting ones from the very start, but it still takes hundreds of steps to reach the exact numbers with high precision. In the case when the samples are grouped into clusters of the same class, the results are highly disturbed in the beginning. The class that we start from is chosen relatively often, which makes the success rate for that class higher than normal, but the results for other classes are significantly worse. Let us note that for the given cycle size any node is reachable in 75 steps. Thus, one could consider other walk dynamics, in particular a coherent quantum walk, which features ballistic spreading, for a more efficient protocol.
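The qualitative difference between the two arrangements can be illustrated with a much simpler model. The sketch below assumes that the walker's position follows a simple symmetric random walk on the cycle and only tracks which class the walker sits on. It ignores the measurement outcome entirely, so it does not reproduce the success probabilities of Fig. 1. The function name and the choice of 10 steps are illustrative.

```python
import numpy as np

def class_probabilities(labels, start, steps):
    """Class occupation probabilities of a simple random walk on a cycle."""
    labels = np.asarray(labels)
    p = np.zeros(len(labels))
    p[start] = 1.0
    for _ in range(steps):
        p = 0.5 * (np.roll(p, 1) + np.roll(p, -1))   # step to a uniform neighbor
    return {c: p[labels == c].sum() for c in "ABC"}

clustered = ["A"] * 50 + ["B"] * 50 + ["C"] * 50     # AAA...BBB...CCC...
mixed = ["A", "B", "C"] * 50                         # ABCABC...

# Class occupation after 10 steps for the two arrangements.
early_clustered = class_probabilities(clustered, start=25, steps=10)
early_mixed = class_probabilities(mixed, start=75, steps=10)
```

After 10 steps the walker in the clustered arrangement has not yet left its starting cluster, while in the mixed arrangement the three classes are already visited almost equally often.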
3.3 Classification with quantum state recycling
The model introduced so far harnessed the information in all of the nodes only in a limited sense. The walk was designed to make it possible to measure any of the nodes, and thus to make a comparison with any of the training samples. This allows one to obtain the same classification probability as in the initial proposal, but does not harness the information encoded in the state after the measurement. So-called state recycling has proved useful in quantum algorithms [20], and with a model of sequences of generalized measurements we can study the possibilities of harnessing this information.
Note that if the measurement result indicates that the training element corresponding to the walk position during the measurement does not belong to the same class as the test element, the state after the measurement is rotated away from the state encoding that training element and usually gets closer to the goal class. An example is presented in Fig. 2. It is straightforward to check that the rotation introduced by the measurement is
(20) 
where is the (acute) angle between and . Thus the resulting state is rotated towards the other pole by at least and proportionally to the initial angle.
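The effect of an unsuccessful comparison can be illustrated with real two-dimensional feature vectors. The sketch below assumes that reading 1 on the auxiliary qubit leaves the feature register in the normalized difference of the test and training vectors. Angles are measured in the real feature plane, where they are half of the corresponding Bloch-sphere angles, and the sketch is not meant to reproduce Eq. (20) literally.

```python
import numpy as np

def post_measurement_state(test, train):
    """Feature state left after reading 1 on the auxiliary qubit (real unit vectors)."""
    diff = test - train                    # amplitude of the auxiliary-1 branch
    return diff / np.linalg.norm(diff)

theta = 0.4                                # angle between test and training vectors
train = np.array([1.0, 0.0])
test = np.array([np.cos(theta), np.sin(theta)])
after = post_measurement_state(test, train)
angle_to_train = np.arccos(np.clip(after @ train, -1.0, 1.0))
```

In this setting the state after the measurement is separated from the training vector by a right angle plus half of the initial angle, so a larger initial angle yields a larger rotation away from the rejected training element.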
This behavior can be used to improve the classification algorithm. The whole procedure now consists of a sequence of steps, with a measurement at each step. The measurement indicates whether the current position is a good fit for the test element. If it is, the procedure stops. If not, it continues, preserving the information encoded in the state, as follows.
1. For a given test sample, compute the normalized feature vector.
2. Pick a random training sample from the uniform distribution.
3. Prepare a circuit that prepares the comparison state.
4. Run the QPU, preparing the state, and measure it.
5. If the result at the auxiliary register is 0, continue. While the result at the auxiliary register is 1, repeat:
   (a) Preserve the resulting state.
   (b) Pick a random sample from the uniform distribution.
   (c) Conditionally prepare the sample state, obtaining a new comparison state that contains the preserved state.
   (d) Measure the state.
6. Return the class label of the training sample used in the last comparison.
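The recycling loop can be sketched at the level of state vectors. The following is a classical simulation under stated assumptions, not the QPU protocol: feature vectors are real and unit-norm, the preserved state takes the place of the test state in each new comparison, and an outcome of 1 leaves the feature register in the normalized difference of the two compared vectors. The function name and the `max_steps` guard are hypothetical.

```python
import numpy as np

def recycling_classify(x_test, X_train, y_train, rng, max_steps=1000):
    """Simulate the measurement sequence with the post-measurement state reused."""
    state = np.asarray(x_test, dtype=float)
    state = state / np.linalg.norm(state)          # normalize the test vector
    for _ in range(max_steps):
        m = rng.integers(len(X_train))             # pick a uniform training sample
        p0 = np.sum((state + X_train[m]) ** 2) / 4.0
        if rng.random() < p0:                      # auxiliary qubit reads 0: stop
            return y_train[m]
        diff = state - X_train[m]                  # auxiliary qubit reads 1:
        state = diff / np.linalg.norm(diff)        # keep the resulting feature state
    raise RuntimeError("no successful comparison")

# Toy data (hypothetical): two antipodal training vectors.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = ["A", "B"]
rng = np.random.default_rng(0)
result = recycling_classify([1.0, 0.0], X, y, rng)
```

The only difference from the basic sampling loop is that a failed comparison updates the state instead of discarding it.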
The efficiency can additionally be affected by manipulating the walk graph. When the labels of the elements in the training set are known, we propose to use a graph where edges exist only between nodes belonging to different classes. In the simple 2-class case this results in a bipartite graph.
3.4 Quantum state recycling example
We analyze the efficiency of the proposed scheme with an experiment in which we compare its probability of successful classification with that of the basic scheme. For simplicity we take only 2 classes of vectors from the same dataset as before. We consider a set of transition operators, one for each pair of the training elements, defining an OQW. Then, we consider two scenarios:
1. one classification measurement with a random training sample,
2. a sequence of up to two steps, with the second one performed only if the result of the measurement during the first one is negative. The second measurement acts on the state resulting from the first one.
In this case the underlying OQW is performed on a complete graph, stopping after the first successful comparison.
Table 4: Success probability for one measurement step and for up to two steps with state recycling (4 features).
Class | 1 step | 2 steps
A     | 0.722  | 0.734
B     | 0.689  | 0.710
We measure the classification efficiency with leave-one-out cross-validation. The results are presented in Table 4. For the considered data the state recycling scheme provides a better overall average success probability. Moreover the success probability was better for and of the samples for class A and B respectively.
4 Discussion
The results presented in this work open new possibilities to develop distance-based classification methods for near-term quantum processing units.
The most basic consequence of the introduced hybrid classical-quantum method is its potential for parallel computation. Given that only a part of the training information undergoes coherent evolution, a number of small QPUs can be used to obtain higher classification accuracy.
Moreover, joining quantum cryptographic protocols with distributed information classification could bring novel applications for QPUs. Harnessing information routing based on quantum walks can cause nontrivial dynamics. The walks introduced in this paper converge to the uniform distribution, but in general quantum walks can feature nontrivial probability distributions [21]. In particular, it has been shown that additional connections, while reducing distances, can cause nonuniform limiting behavior that is periodic in space, leading in turn to disturbed classification probabilities [22]. Similarly, nonrandom controlled evolution of the walk direction leads to phenomena that are not present in classical systems, resulting in game-like behavior that requires a certain strategy to ensure that all of the points are properly included [18].
Finally, we have shown that a quantum implementation of a distance-based classifier can achieve better accuracy when the state after the final measurement is preserved and processed. This leads to a number of open questions. The foundations of the observed improvement will be an object of our further study. Quantum processing units are naturally well suited for processing high-dimensional vectors in large tensor product spaces, which should boost their time performance. Thus, a significant additional performance improvement in terms of accuracy, especially with relatively small near-term devices, would be a very desirable result.
5 Acknowledgements
Financial support by the Polish National Science Centre under project number 2015/17/B/ST6/01872 is gratefully acknowledged.
Footnotes
 psadowski@iitis.pl
References
[1] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195, 2017.
[2] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015.
[3] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411, 2013.
[4] Sergio Boixo, Sergei V Isakov, Vadim N Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, John M Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. arXiv preprint arXiv:1608.00263, 2016.
[5] Eddie Farhi. Quantum supremacy through the quantum approximate optimization algorithm. Bulletin of the American Physical Society, 62, 2017.
[6] Vadim N Smelyanskiy, Eleanor G Rieffel, Sergey I Knysh, Colin P Williams, Mark W Johnson, Murray C Thom, William G Macready, and Kristen L Pudenz. A near-term quantum computing approach for hard computational problems in space exploration. arXiv preprint arXiv:1204.2821, 2012.
[7] Nathan Wiebe, Christopher Granade, Christopher Ferrie, and David G Cory. Hamiltonian learning and certification using quantum resources. Physical Review Letters, 112(19):190501, 2014.
[8] Diego Ristè, Marcus P Da Silva, Colm A Ryan, Andrew W Cross, Antonio D Córcoles, John A Smolin, Jay M Gambetta, Jerry M Chow, and Blake R Johnson. Demonstration of quantum advantage in machine learning. npj Quantum Information, 3(1):16, 2017.
[9] Maria Schuld, Mark Fingerhuth, and Francesco Petruccione. Implementing a distance-based classifier with a quantum interference circuit. EPL (Europhysics Letters), 119(6):60002, 2017.
[10] André Elisseeff and Jason Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems, pages 681–687, 2002.
[11] Srinivas Sridharan, Mile Gu, Matthew R James, and William M McEneaney. Reduced-complexity numerical method for optimal gate synthesis. Physical Review A, 82(4):042319, 2010.
[12] Przemysław Sadowski. Generating efficient quantum circuits for preparing maximally multipartite entangled states. International Journal of Quantum Information, 11(7):1350067, 2013.
[13] Stéphane Attal, Francesco Petruccione, and Ilya Sinayskiy. Open quantum walks on graphs. Physics Letters A, 376(18):1545–1548, 2012.
[14] Łukasz Pawela, Piotr Gawron, Jarosław Adam Miszczak, and Przemysław Sadowski. Generalized open quantum walks on Apollonian networks. PLoS ONE, 10(7):e0130967, 2015.
[15] Norio Konno and Hyun Jae Yoo. Limit theorems for open quantum random walks. Journal of Statistical Physics, 150(2):299–319, 2013.
[16] Przemysław Sadowski and Łukasz Pawela. Central limit theorem for reducible and irreducible open quantum walks. Quantum Information Processing, 15(7):2725–2743, 2016.
[17] Stéphane Attal, Nadine Guillotin-Plantard, and Christophe Sabot. Central limit theorems for open quantum random walks and quantum measurement records. In Annales Henri Poincaré, volume 16, pages 15–43. Springer, 2015.
[18] Jarosław Adam Miszczak and Przemysław Sadowski. Quantum network exploration with a faulty sense of direction. Quantum Information and Computation, 14(13&14):1238–1250, 2014.
[19] Małgorzata Bednarska, Andrzej Grudka, Paweł Kurzyński, Tomasz Łuczak, and Antoni Wójcik. Quantum walks on cycles. Physics Letters A, 317(1):21–25, 2003.
[20] Enrique Martin-Lopez, Anthony Laing, Thomas Lawson, Roberto Alvarez, Xiao-Qi Zhou, and Jeremy L O'Brien. Experimental realization of Shor's quantum factoring algorithm using qubit recycling. Nature Photonics, 6(11):773, 2012.
[21] Norio Konno. Quantum walks. In Quantum Potential Theory, pages 309–452. Springer, 2008.
[22] Przemysław Sadowski, Jarosław Adam Miszczak, and Mateusz Ostaszewski. Lively quantum walks on cycles. Journal of Physics A: Mathematical and Theoretical, 49(37):375302, 2016.