# Experimental Machine Learning of Quantum States with Partial Information

Quantum information technologies provide promising applications in communication BB84 (); QKDrev () and computation Shor (); Grover (), while machine learning has become a powerful technique for extracting meaningful structures from ‘big data’ ML (). The crossover between quantum information and machine learning represents a new interdisciplinary research area stimulating progress in both fields Seth2017 (). Here we experimentally demonstrate an application of machine learning to a task in quantum information, namely the construction of a quantum-state classifier Yung2017 (). We show that it is possible to experimentally train an artificial neural network to efficiently learn and classify quantum states, without needing the full information of the states. We also show that a hidden layer of the neural network plays a crucial role in boosting the performance of the state classifier. These results shed new light on how classification of quantum states can be achieved with limited resources, and represent a step towards large-scale machine-learning-based applications in quantum information processing.

Over the last few years, there has been a significant advancement in an emerging field of quantum machine learning Dunjko2017 (), where quantum information meets the modern information-processing technologies. On one hand, various modern quantum technologies, such as quantum communication, quantum computation and quantum metrology have been experimentally demonstrated by exploiting quantum entanglement as a resource Jin2010 (); Yin2017 (); Lu2007 ().

On the other hand, machine learning, a modern technique for making predictions by mining information from ‘big data’, has been proven as one of the most successful achievements in artificial intelligence. Notable examples include self-driving cars and the famous Alpha-Go which surpasses the top human players at the game ‘Go’ Go (); Go0 ().

The key question in quantum machine learning is, how to develop new ideas for applying technologies in machine learning to quantum information, or vice versa, to gain advancements in both fields?

In fact, several promising steps along both directions have already been taken in the community. In particular, machine learning can in principle exploit quantum superposition to enhance its performance. For example, quantum versions of principal component analysis (PCA) PCA () and support vector machines (SVM) SVM () have been proposed. Other quantum algorithms along with their experimental implementations Seth (); Cai2015 (); Li2015 () have also been demonstrated in recent years to broaden the versatility of machine learning.

Besides, machine learning can also be applied to certain quantum tasks, from classifying separability of quantum states Yung2017 (); Lu2017 () to classifying phases in condensed matter physics Phase1 (); Phase2 (), and even the development of new classical algorithms for solving many-body systems MBS (). These results suggest that machine learning of quantum states represents a new platform for solving problems in quantum information science.

Here we report an experimental machine learning of quantum states with partial information, where an artificial neural network (ANN) is trained to classify the separability of quantum states, given that only partial information about the quantum states is available. More specifically, based on the experimental data, we have constructed a quantum-state classifier, which generalizes the pattern recognition of learning theory to quantum data.

In the classical setting, a set of data and labels is supplied as the training set for the machine-learning program, and the output is a classifier (a function or a program) for predicting labels of new data. In the quantum setting, the data may be replaced with a density matrix $\rho$, and the corresponding label may be taken as any physical property, e.g. separability. However, the description of a quantum state grows exponentially with the system size, which makes large-scale quantum state tomography Tomo () intractable to carry out; this motivates us to exploit the possibility of learning with only partial information about the quantum state.

For the purpose of illustration, our demonstration can be regarded as an application to the following toy model: suppose Alice and Bob are separated by a distance for performing an experiment (e.g., testing quantum nonlocality) utilizing a certain entangled state (not necessarily a Bell state), which is generated by a black-box machine far away from them. Suppose the machine is not so reliable, in the sense that sometimes it does generate the correct entangled state, but sometimes it may generate random bits. The task is to determine if, overall, the machine is still able to produce an entangled state, so that entanglement purification may still be possible. In other words, we take a class of Werner-like states Werner () as the training and testing sets. The label on separability is determined by using the positive partial transpose (PPT) criterion PPT (), applied only to the training set, but not the testing set.
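For two qubits, the PPT label can be computed directly from a density matrix, since in this dimension a negative eigenvalue of the partial transpose is both necessary and sufficient for entanglement. The following is a minimal NumPy sketch (our own illustration, not the authors' code):

```python
import numpy as np

def partial_transpose(rho):
    """Partial transpose over the second qubit of a 4x4 two-qubit density matrix."""
    r = rho.reshape(2, 2, 2, 2)                    # indices: (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)   # swap b <-> b'

def is_entangled_ppt(rho, tol=1e-9):
    """PPT criterion: for 2x2 systems, entangled iff the partial
    transpose has a negative eigenvalue."""
    return np.linalg.eigvalsh(partial_transpose(rho)).min() < -tol

# Example: Werner state p|phi+><phi+| + (1-p)/4 * I
phi_plus = np.zeros(4); phi_plus[0] = phi_plus[3] = 1 / np.sqrt(2)
def werner(p):
    return p * np.outer(phi_plus, phi_plus) + (1 - p) / 4 * np.eye(4)

print(is_entangled_ppt(werner(0.5)))   # True  (entangled, since p > 1/3)
print(is_entangled_ppt(werner(0.2)))   # False (separable)
```

For higher-dimensional systems PPT is only a necessary condition for separability, which is one reason labeling general states is hard.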

Experimentally, we first show that a linear ANN optimization of the Clauser-Horne-Shimony-Holt (CHSH) inequality Bell (); CHSH () can significantly boost the accuracy of the optimized CHSH inequality in identifying the separability of the quantum state of a pair of qubits. Although it is still far from being perfect, the results in the testing phase exceed the conventional CHSH inequality in detecting entanglement.

In the second part, we further equip the neural network with a hidden layer of neurons. We compare the performance of the two neural networks and find that the inclusion of a hidden layer can significantly improve the match rate of the classifier to nearly unity. The experimental result is consistent with the speculation Yung2017 () that the inclusion of the hidden layer can be regarded as an optimization of multiple linear CHSH-like inequalities.

Finally, we show that the performance of the classifier decreases if it is instead trained with theoretical values but tested with experimental values. This result suggests that experimental training of the neural network is necessary for constructing a classifier for testing experimental data.

## ANN-Based Classification Protocol

The necessary theoretical background can be summarized as follows. Let us first introduce the standard CHSH inequality, which is applicable for any quantum state of a pair of qubits,

$$\left| \langle a_1 b_1 \rangle + \langle a_1 b_2 \rangle + \langle a_2 b_1 \rangle - \langle a_2 b_2 \rangle \right| \le 2, \tag{1}$$

where $\langle a_i b_j \rangle$ represents the expectation value of the observables under the measurements labeled by $a_1$ (or $a_2$) and $b_1$ (or $b_2$), for party $A$ and $B$ respectively. Violation of the CHSH inequality implies that the quantum state is entangled. However, the converse is not true; there exist entangled quantum states that do not violate the CHSH inequality.

In particular, let us consider the class of Werner states $\rho_w$,

$$\rho_w = p\,|\phi^+\rangle\langle\phi^+| + \frac{1-p}{4}\,\mathbb{I}, \tag{2}$$

where $p \in [0,1]$ is supposedly unknown. The density matrix can be regarded as the result of sending a Bell state, $|\phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, through a channel mixing it with the completely-mixed state $\mathbb{I}/4$. The Werner state is entangled when $p > 1/3$, but it violates the CHSH inequality only if $p > 1/\sqrt{2}$. Therefore, without optimization, the original CHSH inequality is not a reliable test of separability.

Furthermore, the violation also depends on the settings of the measurements. For example, for the choices $a_1 = \sigma_z$, $a_2 = \sigma_x$, $b_1 = (\sigma_z + \sigma_x)/\sqrt{2}$, and $b_2 = (\sigma_z - \sigma_x)/\sqrt{2}$, the Bell state $|\phi^+\rangle$ violates the inequality with the maximum value $2\sqrt{2}$. However, if we change the relative phase between the two components of the state to $\pi$, i.e. $|\phi^-\rangle = (|00\rangle - |11\rangle)/\sqrt{2}$, even with the same settings, the Bell state no longer violates the CHSH inequality.
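These thresholds are easy to verify numerically. A short NumPy sketch (illustrative; the settings are the standard maximal-violation choices quoted above, and for the Werner state the CHSH value works out to $S(p) = 2\sqrt{2}\,p$):

```python
import numpy as np

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

# Standard maximal-violation settings (all in the x-z plane)
a1, a2 = sz, sx
b1, b2 = (sz + sx) / np.sqrt(2), (sz - sx) / np.sqrt(2)

def chsh(rho):
    """S = <a1 b1> + <a1 b2> + <a2 b1> - <a2 b2>; |S| <= 2 for local states."""
    corr = lambda A, B: np.real(np.trace(rho @ np.kron(A, B)))
    return corr(a1, b1) + corr(a1, b2) + corr(a2, b1) - corr(a2, b2)

phi_plus = np.zeros(4); phi_plus[0] = phi_plus[3] = 1 / np.sqrt(2)
def werner(p):
    return p * np.outer(phi_plus, phi_plus) + (1 - p) / 4 * np.eye(4)

# CHSH is violated only for p > 1/sqrt(2) ~ 0.707,
# although the state is entangled for all p > 1/3.
print(round(chsh(werner(1.0)), 3))   # 2.828  (Bell state, maximal violation)
print(round(chsh(werner(0.5)), 3))   # 1.414  (entangled, but no violation)
```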

Now, a linear classifier is constructed from the following generalized CHSH operator $\mathcal{B}_{\mathbf{w}}$,

$$\mathcal{B}_{\mathbf{w}} = w_1 \langle a_1 b_1 \rangle + w_2 \langle a_1 b_2 \rangle + w_3 \langle a_2 b_1 \rangle + w_4 \langle a_2 b_2 \rangle, \tag{3}$$

which contains a set of parameters $\mathbf{w} = (w_1, w_2, w_3, w_4)$ to be optimized Yung2017 (). For the original CHSH inequality, the unoptimized values are $\mathbf{w} = (1, 1, 1, -1)$. Here, the measurements are given by $a_1 = \sigma_z$, $a_2 = \sigma_x$, $b_1 = (\sigma_z + \sigma_x)/\sqrt{2}$, and $b_2 = (\sigma_z - \sigma_x)/\sqrt{2}$. Moreover, as shown in Fig.1(a), each time we take as input to the ANN only the 4 observables (or ’features’ in the language of machine learning) $\{\langle a_1 b_1 \rangle, \langle a_1 b_2 \rangle, \langle a_2 b_1 \rangle, \langle a_2 b_2 \rangle\}$, instead of the 16 observables needed for quantum state tomography Tomo ().
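The way the four features enter the linear classifier can be sketched as follows (the bias $w_0$ and the function names are our own illustration; with $\mathbf{w} = (1,1,1,-1)$ and $w_0 = -2$ the score reproduces one side of the standard CHSH test):

```python
import numpy as np

sz = np.diag([1.0, -1.0]); sx = np.array([[0.0, 1.0], [1.0, 0.0]])
a = [sz, sx]
b = [(sz + sx) / np.sqrt(2), (sz - sx) / np.sqrt(2)]

def features(rho):
    """The 4 correlators fed to the network (vs. 16 for full tomography)."""
    return np.array([np.real(np.trace(rho @ np.kron(ai, bj)))
                     for ai in a for bj in b])

def linear_classifier(rho, w, w0):
    """Linear CHSH-like score: classify as entangled if the score exceeds 0."""
    return features(rho) @ w + w0

phi_plus = np.zeros(4); phi_plus[0] = phi_plus[3] = 1 / np.sqrt(2)
rho_bell = np.outer(phi_plus, phi_plus)
print(linear_classifier(rho_bell, np.array([1, 1, 1, -1]), -2.0) > 0)  # True
```

Training then amounts to adjusting `w` and `w0` so that the sign of the score matches the PPT labels of the training states.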

To further improve the performance of the classifier, we connect the input layer to a hidden layer, as shown in Fig.1(b). In addition, the measurements are no longer restricted to the x-z plane. The input vector is still given by the 4 observables, denoted as $\mathbf{x}$. Next, an intermediate vector $\mathbf{h}$ is constructed through a non-linear relation for every neuron in the hidden layer,

$$\mathbf{h} = \mathrm{ReLU}(W \mathbf{x} + \mathbf{b}), \tag{4}$$

where $\mathrm{ReLU}(z) \equiv \max(0, z)$ is the rectified linear unit function. The matrix $W$ and the vector $\mathbf{b}$ are initialized uniformly and optimized through the learning process. The entire classification task of the ANN with a hidden layer is then to optimize the following function,

$$y = \sigma(\mathbf{v} \cdot \mathbf{h} + c), \tag{5}$$

where $\sigma(z) \equiv 1/(1 + e^{-z})$ is the sigmoid function, and the weight vector $\mathbf{v}$ and bias $c$ are likewise learned. Here the number of neurons (denoted as $n$) in the hidden layer can be varied to obtain the optimal performance.
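Eqs. (4)-(5) amount to a standard one-hidden-layer feed-forward pass. A minimal sketch (the parameter names $W$, $\mathbf{b}$, $\mathbf{v}$, $c$ follow the notation above; the example input vector is arbitrary):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b, v, c):
    """Eqs. (4)-(5): h = ReLU(W x + b), y = sigmoid(v . h + c).
    x holds the 4 measured correlators; y in (0, 1) is the
    entangled-vs-separable score."""
    h = relu(W @ x + b)
    return sigmoid(v @ h + c)

rng = np.random.default_rng(0)
n = 10                                   # number of hidden neurons
W = rng.uniform(-1, 1, size=(n, 4))      # initialized uniformly, then trained
b = rng.uniform(-1, 1, size=n)
v = rng.uniform(-1, 1, size=n)
c = 0.0
x = np.array([0.7, 0.7, 0.7, -0.7])      # example feature vector
y = forward(x, W, b, v, c)
assert 0.0 < y < 1.0                     # sigmoid output is a valid score
```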

## Experimental Quantum-State Classifier with ANN

In the following, we report the experimental results in constructing the quantum-state classifier; the experimental setup is shown in Fig.2. Polarization-entangled photon pairs are created through type-II spontaneous parametric down-conversion in a quasi-phase-matched periodically-poled KTiOPO$_4$ (PPKTP) crystal placed in a Sagnac interferometer PPKTP (). The pump laser is first coupled into a single-mode fiber to acquire a near-perfect transverse mode, and is prepared in the superposition state $\cos\theta\,|H\rangle + e^{i\phi}\sin\theta\,|V\rangle$ by combining a half-wave plate (HWP) with a quarter-wave plate (QWP). The pump light is then divided into two directions and focused on the crystal. Through a careful alignment of the Sagnac interferometer, the clockwise and anti-clockwise components become indistinguishable, generating the following entangled state,

$$|\psi\rangle = \cos\theta\,|HV\rangle + e^{i\phi}\sin\theta\,|VH\rangle. \tag{6}$$

The photons are then coupled into two single-mode fibers and are spectrally filtered by two bandpass filters (BPF). By adjusting the HWP and QWP, we can control the parameters $\theta$ and $\phi$ with high precision to generate a family of entangled states. We characterize the quality of the entangled state by quantum state tomography, extracting its concurrence and purity.
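These two figures of merit can be computed from the reconstructed density matrix; below is a minimal sketch using the Wootters formula for the concurrence (illustrative only: we evaluate it on an ideal Bell state rather than the experimental data):

```python
import numpy as np

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    sy = np.array([[0, -1j], [1j, 0]])
    syy = np.kron(sy, sy)
    R = rho @ syy @ rho.conj() @ syy
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(R))))[::-1]
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

def purity(rho):
    return np.real(np.trace(rho @ rho))

psi = np.array([0, 1, 1, 0]) / np.sqrt(2)   # ideal (|HV> + |VH>)/sqrt(2)
rho = np.outer(psi, psi.conj())
print(round(concurrence(rho), 3), round(purity(rho), 3))  # 1.0 1.0
```

For a maximally entangled pure state both quantities equal unity; experimental imperfections push them below 1.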

The next step is to construct a series of desired quantum states which will be used for both the training and the testing stage. In our work, we use the time-mixing technique Borivoje2012 (); Dylan2017 () to create the Werner-like states,

$$\rho(p) = p\,|\psi\rangle\langle\psi| + \frac{1-p}{4}\,\mathbb{I}, \tag{7}$$

for $0 \le p \le 1$. Here $|\psi\rangle$ is the entangled state generated by the PPKTP source, and the identity matrix $\mathbb{I}$ is obtained by collapsing the wave function to the following four components: $|HH\rangle$, $|HV\rangle$, $|VH\rangle$ and $|VV\rangle$. The parameters $\theta$ and $\phi$ are manipulated by rotating the HWP and QWP. We conduct both quantum state tomography and the CHSH measurements for all the component states. All these data are saved in a data pool; we then randomly (with a quantum random number generator) pick the corresponding components from the pool to construct the desired density matrix of the Werner-like states, together with the four observables under projection.
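The time-mixing construction of Eq. (7) can be sketched as an idealized simulation (our own illustration; the experiment draws the entangled and product components from measured data rather than ideal states):

```python
import numpy as np

def psi(theta, phi):
    """|psi> = cos(theta)|HV> + e^{i phi} sin(theta)|VH>,
    in the basis (|HH>, |HV>, |VH>, |VV>)."""
    v = np.zeros(4, dtype=complex)
    v[1] = np.cos(theta)
    v[2] = np.exp(1j * phi) * np.sin(theta)
    return v

def time_mixed_state(theta, phi, p):
    """Werner-like state of Eq. (7): the I/4 part is an equal-weight
    mixture of the four product components |HH>, |HV>, |VH>, |VV>."""
    v = psi(theta, phi)
    return p * np.outer(v, v.conj()) + (1 - p) / 4 * np.eye(4)

rho = time_mixed_state(np.pi / 4, 0.0, 0.8)
assert np.isclose(np.trace(rho).real, 1.0)   # unit trace
assert np.allclose(rho, rho.conj().T)        # Hermitian
```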

We first study the performance of the linear CHSH classifier. We fix the relative phase $\phi$ between the two components of the entangled state and vary the parameter $\theta$ over 5 different values. For each value of $\theta$, we prepare 99 Werner-like states with $p$ uniformly distributed from 0.01 to 0.99. These states are used as the training set, and the labels (entangled or separable) are determined by applying the PPT criterion to the experimental density matrices. After training the ANN, we obtain the optimized weight coefficients $\mathbf{w}$.

With these new weight coefficients we then test the performance of the classifier and present the results in Fig.3(a). Here, we divide the whole pie into 5 sections corresponding to the different values of $\theta$. The classification results are compared with the theoretical prediction made by the PPT criterion, and most states are classified with the correct labels. Another feature is that mismatches mainly occur near the margin, because the weight coefficients are very sensitive in the margin area. It should also be noticed that the classifier exceeds the traditional CHSH bound of 2 and can be applied to a wider range of states. We also compare the performance of the two classifiers in Fig.3(b). It is noteworthy that, due to the uniform distribution of $p$ in our test states, the standard CHSH can still predict a bound and label the states above it as entangled, leading to a considerable match rate; however, the performance of the CHSH classifier can be considered neither reliable nor stable.
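To illustrate the training stage, here is a self-contained logistic-regression sketch of the linear classifier on simulated, noiseless Werner-state features with PPT labels (the learning rate and iteration count are our own choices; the real experiment trains on measured correlators):

```python
import numpy as np

# For the standard settings, the 4 CHSH correlators of the Werner state
# rho(p) are (p, p, p, -p)/sqrt(2).
def features(p):
    return np.array([p, p, p, -p]) / np.sqrt(2)

ps = np.linspace(0.01, 0.99, 99)
X = np.array([features(p) for p in ps])
y = (ps > 1 / 3).astype(float)          # PPT label: entangled iff p > 1/3

# Gradient-descent training of the weights w and bias w0 (illustrative).
w, w0, lr = np.zeros(4), 0.0, 1.0
for _ in range(20000):
    pred = 1 / (1 + np.exp(-(X @ w + w0)))
    grad = pred - y
    w -= lr * X.T @ grad / len(y)
    w0 -= lr * grad.mean()

acc = np.mean((1 / (1 + np.exp(-(X @ w + w0))) > 0.5) == (y > 0.5))
print(acc)   # high on this noiseless, linearly separable training set
```

In the experiment the labels come from PPT applied to the measured density matrices, so the learned boundary automatically absorbs systematic noise in the setup.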

Next, we proceed to the case of the ANN with a hidden layer. Unlike the linear classifier, we make several modifications in the training stage. We first prepare three different classes of states with different relative phases between the two components of the entangled state $|\psi\rangle$. By inserting an additional HWP or QWP, we can change the relative phase $\phi$ from 0 to $\pi$ or $\pi/2$, as plotted in the Bloch spheres in Fig.4, giving three classes of states. In each class, we also vary the parameter $\theta$ over five different values, so we prepare 15 different states in total. Besides, the measurement settings have also been changed to combinations of $\sigma_x$, $\sigma_y$ and $\sigma_z$, since additional information about the phase must be acquired. Last but not least, we deliberately pick states near the entangled-separable boundary as the training set to improve the learning process. For each Werner-like state, we select 80 states with $p$ near the margin. The testing states, however, still have a uniform distribution of $p$ from 0.01 to 0.99.

To further study the performance of the non-linear ANN, we also vary the number of neurons in the hidden layer among 0, 5, 10 and 100. All the experimental results are shown in Fig.4. These results clearly show that the inclusion of the hidden layer improves the performance, especially in the most fallible and sensitive margin region, and the average match rate increases to nearly unity. From these results we can also see that the linear classifier tends to predict an average bound for each class of states, while the ANN with a hidden layer can accurately predict the margin for every kind of state. The experimental results show that, when dealing with a more general scenario, an ANN with a hidden layer (even with a small number of neurons) can significantly enhance the performance of the classifier.

Finally, we compare the performance of the classifier when it is trained with theoretical data instead of experimental data. As shown in Fig.5, the performance of the theoretically-trained classifier is not as good as that of the one trained with experimental data. These results imply that the machine-learning program does take experimental noise into account. Another feature is that the effect of the number of neurons is unclear and lacks an apparent tendency, and the performance of the classifier is state dependent.

## Conclusion

In summary, we have experimentally demonstrated quantum machine learning of quantum entanglement by constructing a quantum-state classifier via an artificial neural network. We show that a linear optimization of the neural network can already outperform the standard CHSH inequality in classifying quantum non-separability, achieving a considerable average match rate. We further demonstrate that an ANN with a hidden layer can be applied to more general quantum states, with an average match rate approaching unity. Overall, the experimental results confirm the working principle of a quantum-state classifier in a small quantum system, where entanglement is taken as the label.

In principle, quantum-state classifiers can be constructed in a scalable way Yung2017 (), and the key task involved is to find a suitable label. Since there is still no operationally practical way to label the entanglement of multipartite states, a classifier for multipartite entanglement remains a major challenge. Overall, the machine-learning method is beneficial when the labels take a long time to obtain, e.g. by numerical methods such as quantum Monte Carlo (QMC) or density functional theory (DFT), since the numerical procedure is required only for the training set. The optimized state classifier can then replace the numerical procedure to produce the labels of new data.

## Acknowledgments

The authors thank J.-W. Pan for helpful discussions. The research leading to the results reported here was supported by the National Natural Science Foundation of China under Grants No. 11374211 and No. 11675113, the Innovation Program of Shanghai Municipal Education Commission (No. 14ZZ020), Shanghai Science and Technology Development Funds (No. 15QA1402200), the open fund from HPCL (No. 201511-01), the Guangdong Innovative and Entrepreneurial Research Team Program (No. 2016ZT06D348), and the Science Technology and Innovation Commission of Shenzhen Municipality (ZDSYS20170303165926217, JCYJ20170412152620376). M.-H.Y. and X.-M.J. acknowledge support from the National Young 1000 Talents Plan.

## References

- (1) C. H. Bennett, and G. Brassard, Systems and Signal Processing 175-179 (1984).
- (2) H. K. Lo, M. Curty, and K. Tamaki, Nat. Photon. 8, 595-604 (2014).
- (3) P. W. Shor, in Proceedings of the 35th Annual Symposium on the Foundations of Computer Science 124-133 (IEEE Computer Society Press, Los Alamitos, California, 1994).
- (4) L. K. Grover, Phys. Rev. Lett. 79, 325-328 (1997).
- (5) R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach (Springer Science Business Media, Berlin, 2013).
- (6) J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature, 549, 195-202 (2017).
- (7) R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. 81, 865 (2009).
- (8) J.-W. Pan, Z.-B. Chen, C.-Y. Lu, H. Weinfurter, A. Zeilinger, and M. Zukowski, Rev. Mod. Phys. 84, 777 (2012)
- (9) Y.-C. Ma, and M.-H. Yung, preprint arXiv:1705.00813, (2017).
- (10) V. Dunjko, & H. J. Briegel, preprint arXiv:1709.02779 (2017).
- (11) X.-M. Jin, J.-G. Ren, B. Yang, Z.-H. Yi, F. Zhou, X.-F. Xu, S.-K. Wang, D. Yang, Y.-F. Hu, S. Jiang, T. Yang, H. Yin, K. Chen, C.-Z. Peng, and J.-W. Pan, Nat. Photon. 4, 376 (2010).
- (12) J. Yin, Y. Cao, Y.-H. Li, S.-K. Liao, L. Zhang, J.-G. Ren, W.-Q. Cai, W.-Y. Liu, B. Li, H. Dai, G.-B. Li, Q.-M. Lu, Y.-H. Gong, Y. Xu, S.-L. Li, F.-Z. Li, Y.-Y. Yin, Z.-Q. Jiang, M. Li, J.-J. Jia, G. Ren, D. He, Y.-L. Zhou, X.-X. Zhang, N. Wang, X. Chang, Z.-C. Zhu, N.-L. Liu, Y.-A. Chen, C.-Y. Lu, R. Shu, C.-Z. Peng, J.-Y. Wang, and J.-W. Pan, Science 356, 1140 (2017).
- (13) C.-Y. Lu, D. E. Browne, T. Yang, and J.-W. Pan, Phys. Rev. Lett. 99, 250504 (2007).
- (14) D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Nature, 529, 484-489 (2016).
- (15) D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. Driessche, T. Graepel, and D. Hassabis, Nature 550, 354-359 (2017).
- (16) S. Lloyd, M. Mohseni, and P. Rebentrost, Nat. Phys. 10, 631-633 (2014).
- (17) P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
- (18) S. Lloyd, M. Mohseni, and P. Rebentrost, preprint arXiv:1307.0411 (2013)
- (19) X.-D. Cai, D. Wu, Z.-E. Su, M.-C. Chen, X.-L. Wang, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, Phys. Rev. Lett. 114, 110504 (2015).
- (20) Z. Li, X. Liu, N. Xu, and J. Du, Phys. Rev. Lett. 114, 140504 (2015).
- (21) S. Lu, S. Huang, K. Li, J. Li, J. Chen, D. Lu, Z. Ji, Y. Shen, D. Zhou, and B. Zeng, preprint arXiv:1705.01523 (2017).
- (22) J. Carrasquilla, and R. G. Melko, Nat. Phys. 13, 431-434, (2017).
- (23) E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nat. Phys. 13, 435-439, (2017).
- (24) G. Carleo, and M. Troyer, Science 355, 602 (2017).
- (25) M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2002).
- (26) M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Lett. A 223, 1-8 (1997).
- (27) J. S. Bell, Physics (Long Island City, N.Y.) 1, 195. (1964).
- (28) J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Phys. Rev. Lett. 23, 880-884 (1969).
- (29) D. F. V. James, P. G. Kwiat, W. J. Munro, and A. G. White, Phys. Rev. A 64, 052312 (2001).
- (30) T. Kim, M. Fiorentino, and F. N. C. Wong, Phys. Rev. A 73, 012316 (2006).
- (31) B. Dakic, Y. O. Lipp, X. -S. Ma, M. Ringbauer, S. Kropatschek, S. Barz, T. Paterek, V. Vedral, A. Zeilinger, C. Brukner, and P. Walther, Nat. Phys. 8, 666, (2012).
- (32) D. J. Saunders, A. J. Bennet, C. Branciard, and G. J. Pryde, Sci. Adv. 3, e1602743, (2017)