# Measurement-based adaptation protocol with quantum reinforcement learning

## Abstract

Machine learning employs dynamical algorithms that mimic the human capacity to learn, among which the reinforcement learning ones are the most similar to the human learning process. On the other hand, adaptability is an essential aspect for performing any task efficiently in a changing environment, and it is fundamental for many purposes, such as natural selection. Here, we propose an algorithm based on successive measurements to adapt one quantum state to a reference unknown state, in the sense of achieving maximum overlap. The protocol assumes access to many identical copies of the reference state, such that in each measurement iteration more information about it is obtained. In our protocol, we consider a system composed of three parts: the “environment” system, which provides the reference state copies; the register, an auxiliary subsystem that interacts with the environment to acquire information from it; and the agent, which corresponds to the quantum state that is adapted by digital feedback with input corresponding to the outcome of the measurements on the register. With this proposal we can achieve a high average fidelity between the environment and the agent in few iterations of the protocol. In addition, we extend the formalism to $D$-dimensional states, reaching high average fidelities in few iterations for $D=11$, for a variety of genuinely quantum as well as semiclassical states. This work paves the way for the development of quantum reinforcement learning protocols using quantum data, and for the future deployment of semi-autonomous quantum systems.

## I Introduction

Machine learning (ML) is an area of artificial intelligence that focuses on the implementation of learning algorithms, and which has undergone great development in recent years Russell et al. (1995); Michalski et al. (2013); Jordan and Mitchell (2015). ML can be classified into two broad groups, namely, learning by means of big data and learning through interactions. The first group comprises two classes. Supervised learning uses previously classified data to train the learning program, inferring a relation function to classify new data; this is the case, e.g., of pattern recognition problems Kawagoe and Tojo (1984); Shannon et al. (1995); Jain (2007); Carrasquilla et al. (2017). The other class is unsupervised learning, which does not require training data; instead, this paradigm uses the distribution of the big data to obtain an optimal way to classify it according to specific characteristics. An example is the clustering problem Fahad et al. (2014); Baldi et al. (2014).

For the second group, learning from interactions, we have the case of reinforcement learning (RL) Sutton and Barto (1998); Littman (2015). RL is the paradigm most similar to the human learning process. Its general framework is as follows: we define two basic systems, an agent ($A$) and an environment ($E$), while often it is useful to define a register ($R$) as an auxiliary system. The concept consists of $A$ inferring information about $E$ by direct interaction, or indirectly, using the system $R$ as a mediator. With the obtained information, $A$ makes a decision to perform a certain task. If the result of this task is good, then the agent receives a reward, otherwise a punishment. In addition, RL algorithms can be divided into three basic parts: the policy, the reward function (RF), and the value function (VF). The policy can be subdivided into three stages. First, interaction with the environment: in this stage, the way in which $A$ or $R$ interacts with $E$ is specified. Second, information extraction, which indicates how $A$ obtains information from $E$. Finally, action, where $A$ decides what to do with the information of the previous step. The RF refers to the criterion used to award the reward or punishment in each iteration, and the VF evaluates the utility of $A$ with respect to the given task. Examples of RL include artificial players for go and chess Silver et al. (2017a, b).

Another essential aspect of RL protocols is the exploitation-exploration relation. Exploitation refers to the ability to make good decisions, while exploration is the possibility of making different decisions. For example, if we want to select a gym to do sports, the exploitation is given by the quality of the gym we have tested, while the exploration is the size of the search area in which we will choose a new gym to test. In the RL paradigm, a good exploitation-exploration relation can guarantee the convergence of the learning process, and its optimization depends on each algorithm.

On the other hand, quantum mechanics is known to improve computational tasks Nielsen and Chuang (2010), so a natural question is: how are learning algorithms modified in the quantum domain? To answer this question the field of quantum machine learning (QML) has emerged. In recent years, QML has been a fruitful area Schuld et al. (2015); Adcock et al. (2015); Dunjko et al. (2016); Dunjko and Briegel (2017); Biamonte et al. (2017); Biswas et al. (2017); Perdomo-Ortiz et al. (2017a, b), in which quantum algorithms have been developed Sasaki and Carlini (2002); Lloyd et al. (2013); Benedetti et al. (2017a, b) that show a possible speedup in certain situations in relation to their classical counterparts Aïmeur et al. (2013); Paparo et al. (2014). However, these novel works focus mainly on learning from classical data encoded in quantum systems, processed with a quantum algorithm and decoded to be read by a classical machine. In this context, the speedup of quantum machine learning is often balanced against the resources necessary to encode and decode the information, which leads to an unclear quantum advantage. On the other hand, recent works analyze the QML paradigm in a purely quantum way Lamata (2017); Cárdenas-López et al. (2017); Alvarez-Rodriguez et al. (2017), in which quantum systems learn quantum data.

In this article, we present a quantum machine learning algorithm based on a reinforcement learning approach, to convert the quantum state of a system (the agent) into an unknown state encoded, via multiple identical copies, in another system (the environment), assisted by measurements on a third system (the register). We propose to use coherent feedback loops, conditioned on the measurements, in order to perform the adaptation process without human intervention. In our numerical calculations we obtain high average fidelities for qubit states after few measurements, while for qudits of dimension $D=11$ the protocol also achieves large average fidelities within a moderate number of iterations, either for genuinely quantum or semiclassical states. This proposal can be useful on the way toward implementing semi-autonomous quantum devices.

## II The Quantum Adaptation Algorithm

Our framework is as follows. We assume a known quantum system called the agent ($A$), and many copies of an unknown quantum state provided by a system called the environment ($E$). We also consider an auxiliary system called the register ($R$), which interacts with $E$. Then, we obtain information about $E$ by measuring $R$, and employ the result as an input to the reward function (RF). Finally, we perform a partially-random unitary transformation on $A$, which depends on the output of the RF. The idea is to improve the fidelity between $A$ and $E$, without projecting the state of $A$ with measurements.

This protocol differs from quantum state estimation Adamson and Steinberg (2010); Sosa-Martinez et al. (2017); Lumino et al. (2017); Rocchetto et al. (2017); Torlai et al. (2017); Sun et al. (2018) in the fact that we propose a semi-autonomous quantum agent; that is, the aim is that in the future a quantum agent will learn the state of the environment without any human intervention. Other authors have considered the inverse problem, an unknown state evolved to a known state assisted by measurements Roa et al. (2006), which deviates from the machine learning paradigm. Therefore, an optimal measurement is not performed in each step; rather, after a certain number of autonomous iterations, the agent converges to a large fidelity with the unknown state.

In the rest of the article we use the following notation: the subscripts $A$, $R$ and $E$ refer to each subsystem, and the superscripts indicate the iteration. For example, $U^{(k)}_{A}$ refers to an operator that acts on the subsystem $A$ during the $k$th iteration. Moreover, the lack of any of these indices indicates that we are referring to a general object over the iterations and/or the subsystems.

We start with the case where each subsystem is described by a qubit state. We assume that $A$ and $R$ are each initialized in the state $|0\rangle$, and that $E$ is described by an arbitrary state expressed in the Bloch sphere through the angles $\theta^{(1)}$ and $\phi^{(1)}$ [see Fig. 1 (a)]. The initial state reads

$$|\psi^{(1)}\rangle=|0\rangle_{A}|0\rangle_{R}\left[\cos(\theta^{(1)}/2)|0\rangle_{E}+e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|1\rangle_{E}\right]. \quad (1)$$

First of all, we will introduce the general elements of our reinforcement learning protocol, namely the policy, the RF and the VF. For the policy, we perform a controlled-NOT (CNOT) gate $U_{\rm NOT}^{E,R}$, with $E$ as control and $R$ as target (i.e., the interaction with the environment), in order to copy information of $E$ into $R$, obtaining

$$|\Psi^{(1)}\rangle=U_{\rm NOT}^{E,R}|\psi^{(1)}\rangle=|0\rangle_{A}\left[\cos(\theta^{(1)}/2)|0\rangle_{R}|0\rangle_{E}+e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|1\rangle_{R}|1\rangle_{E}\right]. \quad (2)$$

We then measure the register qubit in the basis $\{|0\rangle_{R},|1\rangle_{R}\}$, obtaining $|0\rangle_{R}$ with probability $\cos^{2}(\theta^{(1)}/2)$, or $|1\rangle_{R}$ with probability $\sin^{2}(\theta^{(1)}/2)$ (i.e., information extraction). If the result is $|0\rangle_{R}$, it means that we collapse $E$ into the state of $A$ and do nothing, but if the result is $|1\rangle_{R}$, it means that we have measured the component of $E$ orthogonal to $A$, and thus we accordingly modify the agent. As we do not have additional information about the environment, we perform a partially-random unitary operator $U^{(1)}_{A}(\alpha^{(1)},\beta^{(1)})$ on $A$, generated by the spin components $S_{j}$ (action), where $\alpha^{(1)}$ and $\beta^{(1)}$ are random angles proportional to a random number times the range of random angles $\Delta^{(1)}$. Now, we initialize the register qubit state and employ a new copy of $|E\rangle_{E}$, obtaining the next initial state for the second iteration

$$|\psi^{(2)}\rangle=U^{(1)}_{A}|0\rangle_{A}|0\rangle_{R}|E\rangle_{E}=|\bar{0}\rangle^{(2)}_{A}|0\rangle_{R}|E\rangle_{E}, \quad (3)$$

with

$$U^{(1)}_{A}=m^{(1)}\,U^{(1)}_{A}(\alpha^{(1)},\beta^{(1)})+(1-m^{(1)})\,\mathbb{I}_{A}. \quad (4)$$

Here, $m^{(1)}\in\{0,1\}$ is the outcome of the measurement, $\mathbb{I}_{A}$ is the identity operator, and we define the new agent state as $|\bar{0}\rangle^{(2)}_{A}=U^{(1)}_{A}|0\rangle_{A}$.
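The single-iteration dynamics described above can be simulated compactly. The following is a hedged sketch: the exact form of the partially-random rotation is not fixed here, so we assume angles drawn uniformly in $[-\Delta\pi,\Delta\pi]$ and a generator built from $S_x$ and $S_y$; the measurement statistics, however, follow directly from Eq. (2).

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2    # spin-1/2 components
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2

def random_rotation(delta, rng):
    """Partially-random unitary exp[-i(alpha*Sx + beta*Sy)], with alpha and
    beta drawn uniformly from [-delta*pi, delta*pi] (an assumed convention
    for how the exploration range Delta bounds the random angles)."""
    alpha, beta = delta * np.pi * (2 * rng.random(2) - 1)
    H = alpha * SX + beta * SY
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.exp(-1j * vals)) @ vecs.conj().T

def iterate(agent, env, delta, rng):
    """One protocol iteration.  After the E-controlled CNOT, measuring the
    register yields m = 1 with probability 1 - |<agent|env>|^2, i.e. the
    weight of the component of E orthogonal to the agent."""
    p1 = 1.0 - abs(np.vdot(agent, env)) ** 2
    m = 1 if rng.random() < p1 else 0
    if m == 1:                        # outcome |1>_R: modify the agent
        agent = random_rotation(delta, rng) @ agent
    return agent, m
```

Note that when the agent already matches the environment, the orthogonal weight vanishes, so the outcome is always $m=0$ and the agent is left untouched, mirroring Eq. (4).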

Now, we define the RF that modifies the exploration range $\Delta$ at the $k$th iteration as

$$\Delta^{(k)}=\left[(1-m^{(k-1)})R+m^{(k-1)}P\right]\Delta^{(k-1)}, \quad (5)$$

where $m^{(k-1)}$ is the outcome of the $(k-1)$th iteration, while $R$ and $P$ are the reward and punishment ratios, respectively. Equation (5) means that the value of $\Delta$ is modified by $R$ for the next iteration when the previous outcome is $m^{(k-1)}=0$, and by $P$ when the outcome is $m^{(k-1)}=1$. In our protocol, we choose for simplicity $R<1$ and $P=1/R$, such that every time the state $|0\rangle_{R}$ is measured the value of $\Delta$ is reduced, and increased in the other case. Also, the fact that $RP=1$ means that the punishment and the reward have the same strength, or in other words, if the protocol yields the same number of outcomes $m=0$ and $m=1$, the exploration range does not change. Finally, the VF is defined as the value of $\Delta$ after all the iterations. Therefore, a small final value of $\Delta$ indicates that the protocol has improved the fidelity between $A$ and $E$.
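The update rule (5) is a one-liner. Here is a minimal sketch, with $R=0.9$ (and $P=1/R$) as an assumed illustrative value, since the exact reward ratio is not fixed in this text:

```python
def update_range(delta_prev, m_prev, reward=0.9):
    """Eq. (5): Delta^(k) = [(1 - m)R + m P] Delta^(k-1), with P = 1/R.
    The rewarded outcome m = 0 shrinks the exploration range by R < 1;
    the punished outcome m = 1 widens it by 1/R, so equal numbers of
    rewards and punishments leave Delta unchanged (R*P = 1)."""
    punish = 1.0 / reward
    return ((1 - m_prev) * reward + m_prev * punish) * delta_prev
```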

To illustrate the behaviour of the protocol, we consider the $k$th iteration. The initial state is given by

$$|\psi\rangle^{(k)}=|\bar{0}\rangle^{(k)}_{A}|0\rangle_{R}|E\rangle_{E}, \quad (6)$$

where $|\bar{0}\rangle^{(k)}_{A}=\bar{U}^{(k-1)}_{A}|0\rangle_{A}$, with $\bar{U}^{(k-1)}_{A}=U^{(k-1)}_{A}\cdots U^{(1)}_{A}$ the accumulated action of the operators $U^{(j)}_{A}$ given by Eq. (4). Each $U^{(j)}_{A}(\alpha^{(j)},\beta^{(j)})$ is generated by the rotated spin operators, where we define

$$S^{(j)}_{z,A}=\frac{1}{2}\left(|\bar{0}\rangle^{(j)}_{A}\langle\bar{0}|-|\bar{1}\rangle^{(j)}_{A}\langle\bar{1}|\right)=U^{(j-1)\dagger}_{A}S^{(j-1)}_{z,A}U^{(j-1)}_{A},$$
$$S^{(j)}_{x,A}=\frac{1}{2}\left(|\bar{0}\rangle^{(j)}_{A}\langle\bar{1}|+|\bar{1}\rangle^{(j)}_{A}\langle\bar{0}|\right)=U^{(j-1)\dagger}_{A}S^{(j-1)}_{x,A}U^{(j-1)}_{A}, \quad (7)$$

with $S^{(1)}_{j,A}=S_{j}$. We can write the state of $E$ in the Bloch representation using $|\bar{0}\rangle^{(k)}$ as a reference axis [see Fig. 1 (b)], and apply the operator $U^{(k)\dagger}_{E}$, obtaining for $|E\rangle_{E}$,

$$U^{(k)\dagger}_{E}|E\rangle_{E}=U^{(k)\dagger}_{E}\left[\cos(\theta^{(k)}/2)|\bar{0}\rangle^{(k)}_{E}+e^{i\phi^{(k)}}\sin(\theta^{(k)}/2)|\bar{1}\rangle^{(k)}_{E}\right]=\cos(\theta^{(k)}/2)|0\rangle_{E}+e^{i\phi^{(k)}}\sin(\theta^{(k)}/2)|1\rangle_{E}=|\bar{E}\rangle^{(k)}_{E}. \quad (8)$$

We can write the states $|\bar{0}^{(k)}\rangle$ and $|\bar{1}^{(k)}\rangle$ in terms of the initial logical states $|0\rangle$ and $|1\rangle$, and the unknown angles $\theta^{(1)}$, $\theta^{(k)}$, $\phi^{(1)}$, and $\phi^{(k)}$, as follows

$$|\bar{0}^{(k)}\rangle=\cos\left(\frac{\theta^{(1)}-\theta^{(k)}}{2}\right)|0\rangle+e^{i\phi^{(1)}}\sin\left(\frac{\theta^{(1)}-\theta^{(k)}}{2}\right)|1\rangle,$$
$$|\bar{1}^{(k)}\rangle=-e^{-i\phi^{(k)}}\sin\left(\frac{\theta^{(1)}-\theta^{(k)}}{2}\right)|0\rangle+e^{i(\phi^{(1)}-\phi^{(k)})}\cos\left(\frac{\theta^{(1)}-\theta^{(k)}}{2}\right)|1\rangle. \quad (9)$$

Therefore, the operator $U^{(k)}_{E}$ performs the necessary rotation to transform $|0\rangle\rightarrow|\bar{0}^{(k)}\rangle$ and $|1\rangle\rightarrow|\bar{1}^{(k)}\rangle$. Then, we perform the $U_{\rm NOT}^{E,R}$ gate,

$$|\Phi^{(k)}\rangle=U_{\rm NOT}^{E,R}|\bar{0}\rangle^{(k)}_{A}|0\rangle_{R}|\bar{E}\rangle_{E}=|\bar{0}\rangle^{(k)}_{A}\left[\cos(\theta^{(k)}/2)|0\rangle_{R}|0\rangle_{E}+e^{i\phi^{(k)}}\sin(\theta^{(k)}/2)|1\rangle_{R}|1\rangle_{E}\right], \quad (10)$$

and we measure $R$, with probabilities $P^{(k)}_{0}=\cos^{2}(\theta^{(k)}/2)$ and $P^{(k)}_{1}=\sin^{2}(\theta^{(k)}/2)$ for the outcomes $m^{(k)}=0$ and $m^{(k)}=1$, respectively. Finally, we apply the RF given by Eq. (5). We point out that, probabilistically, when the outcome $m=1$ is frequent, $\Delta$ grows, and when the outcome $m=0$ is frequent, $\Delta$ shrinks. In terms of the exploitation-exploration relation this means that when the exploitation decreases (we measure $|1\rangle_{R}$ often), we increase the exploration (we increase the value of $\Delta$) to increase the probability of making a beneficial change; and when the exploitation improves (we measure $|0\rangle_{R}$ many times), we reduce the exploration to allow only small changes in the following iterations. The diagram of this protocol is shown in Fig. 2.

Fig. 3 (a) shows the numerical calculation of the mean fidelity between $A$ and $E$ for the single-qubit case. For this computation we use random initial states and five values of the reward ratio $R$ (blue, red, yellow, purple, and green lines). We can see that the protocol reaches high fidelities within a moderate number of iterations. Fig. 3 (b) depicts the evolution of the exploration parameter $\Delta$ at each iteration for the same values of $R$. We can see from Fig. 3 that when the parameter $R$ is small, the fidelity between $A$ and $E$ increases quickly (the learning speed increases), requiring fewer iterations to reach high fidelities; however, the maximum value of the average fidelity (maximum learning) is smaller than when $R$ increases. This means that small changes of the exploration parameter per iteration ($R$ close to one) result in a higher but slower learning.
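Putting the pieces together, the qubit protocol can be reproduced qualitatively with a short Monte-Carlo loop. This is a sketch under assumptions: the rotation convention (uniform angles in $[-\Delta\pi,\Delta\pi]$ generated by $S_x$, $S_y$) and the value $R=0.9$ are illustrative, not the constants used in Fig. 3.

```python
import numpy as np

def run_protocol(env, n_iter, delta0=1.0, reward=0.9, seed=0):
    """Repeat measure -> reward/punish (Eq. (5)) -> conditional random
    rotation, returning the fidelity |<agent|env>|^2 after each iteration."""
    rng = np.random.default_rng(seed)
    sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
    agent = np.array([1.0, 0.0], dtype=complex)   # agent starts in |0>
    delta, fids = delta0, []
    for _ in range(n_iter):
        p1 = 1.0 - abs(np.vdot(agent, env)) ** 2  # weight orthogonal to A
        m = 1 if rng.random() < p1 else 0
        if m == 1:                                # punished outcome: explore
            a, b = delta * np.pi * (2 * rng.random(2) - 1)
            vals, vecs = np.linalg.eigh(a * sx + b * sy)
            agent = vecs @ np.diag(np.exp(-1j * vals)) @ vecs.conj().T @ agent
        delta *= (1.0 / reward) if m else reward  # RF with P = 1/R
        fids.append(abs(np.vdot(agent, env)) ** 2)
    return fids
```

Averaging the returned fidelity traces over many random environment states reproduces the qualitative behaviour described above: fast early growth of the fidelity, with the asymptotic value controlled by the reward ratio.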

## III Multilevel protocol

In this section, we extend the previous protocol to the case where $A$, $R$, and $E$ are each described by one $D$-dimensional qudit state. One of the ingredients in the qubit case is the CNOT gate. Here, we use the extension of the CNOT gate to multilevel states, also known as the XOR gate Alber et al. (2000) ($U_{\rm XOR}$). The action of this gate is given by

$$U_{\rm XOR}^{a,b}|j\rangle_{a}|k\rangle_{b}=|j\rangle_{a}|j\ominus k\rangle_{b}, \quad (11)$$

where the index $a$ ($b$) refers to the control (target) state, respectively, and $\ominus$ denotes the difference modulo $D$, with $D$ the dimension of each subsystem. The CNOT gate has two important properties, namely, (i) it is Hermitian, and (ii) $U_{\rm NOT}^{a,b}|j\rangle_{a}|k\rangle_{b}=|j\rangle_{a}|0\rangle_{b}$ if and only if $j=k$. These two properties are maintained by the XOR gate defined in Eq. (11). The policy and VF are essentially the same as in the previous case, but now we consider the multiple outcomes ($m=0,1,\ldots,D-1$) that result from measuring $R$. As before, the operators of Eq. (7) are generalized to define the partially-random unitaries acting on $A$. We assume the initial state of $A$ to be $|0\rangle_{A}$, while $R$ is initialized in $|0\rangle_{R}$. Moreover, the state of $E$ is arbitrary, and expressed as $|E\rangle_{E}=\sum_{k}c_{k}|k\rangle_{E}$, where $\sum_{k}|c_{k}|^{2}=1$, and $D$ is the dimension of $E$. We can rewrite $|E\rangle_{E}$ in a more convenient way as

$$|E\rangle_{E}=\cos(\theta^{(1)}/2)|\bar{0}^{(1)}\rangle_{E}+e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|\bar{0}^{(1)}_{\perp}\rangle_{E}, \quad (12)$$

where $|\bar{0}^{(1)}\rangle_{E}=|0\rangle_{E}$, $|\bar{0}^{(1)}_{\perp}\rangle_{E}$ is the orthogonal component to $|\bar{0}^{(1)}\rangle_{E}$ in the support of $|E\rangle_{E}$, and the angles $\theta^{(1)}$ and $\phi^{(1)}$ are fixed by the coefficient $c_{0}$. Subsequently, we perform the XOR gate obtaining

$$|\Phi_{0}\rangle=U_{\rm XOR}^{E,R}|0\rangle_{A}|0\rangle_{R}|E\rangle_{E}=|0\rangle_{A}\left[\cos(\theta^{(1)}/2)|0\rangle_{R}|0\rangle_{E}+e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|\varkappa\rangle_{R,E}\right], \quad (13)$$

with $|\varkappa\rangle_{R,E}$ the normalized state collecting the components of $R$ and $E$ orthogonal to $|0\rangle_{R}|0\rangle_{E}$. As in the previous case, we measure $R$, but now we have multiple outcomes. Therefore, we separate them into two groups. First, the outcome $m=0$, with probability $\cos^{2}(\theta^{(1)}/2)$. Second, the outcomes $m$ with $m\neq0$, with total probability $\sin^{2}(\theta^{(1)}/2)$ to obtain any of them. As in the previous case, this means that we measure $E$ either in the state of $A$ or in the orthogonal subspace. With this information, we perform a partially-random unitary operation $U^{(j)}_{A}$ on the agent, using the definition (7), where $m^{(j)}$ is the outcome of the measurement. If $m^{(j)}=0$, then $U^{(j)}_{A}=\mathbb{I}_{A}$. The random angles $\alpha$ and $\beta$ are defined as in the qubit case. Now, the RF changes slightly and is given by

$$\Delta^{(j)}=\left[\delta_{m^{(j-1)},0}\,R+(1-\delta_{m^{(j-1)},0})\,P\right]\Delta^{(j-1)}, \quad (14)$$

where $\delta_{i,j}$ is the Kronecker delta. Equation (14) means that if we measure $R$ in $|0\rangle_{R}$, the value of $\Delta$ decreases for the next iteration, and if we measure $m$ with $m\neq0$, $\Delta$ increases. Remember that $R<1$ and $P=1/R$. As in the qubit case, the RF is binary, since all the results with $m\neq0$ are equally non-beneficial, so we give the same punishment to the agent. For this reason we use the same policy as in the qubit protocol for the case of multiple levels. As in the case of a single-qubit state, the parameter $R$ plays a fundamental role in the learning process, handling the speed of learning and the maximum learning, as we will see in what follows.
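The two multilevel ingredients introduced above, the XOR gate of Eq. (11) and the binary RF of Eq. (14), can be sketched as follows ($R=0.9$ is an assumed illustrative value):

```python
import numpy as np

def xor_gate(D):
    """Multilevel XOR of Eq. (11): |j>_a |k>_b -> |j>_a |j (-) k>_b, with
    (-) the difference modulo D, written as a D^2 x D^2 permutation matrix.
    Like the CNOT, it is Hermitian and squares to the identity."""
    U = np.zeros((D * D, D * D))
    for j in range(D):
        for k in range(D):
            U[j * D + (j - k) % D, j * D + k] = 1.0
    return U

def update_range_multilevel(delta_prev, m_prev, reward=0.9):
    """Eq. (14): only the outcome m = 0 is rewarded; every m in 1..D-1 is
    punished equally (binary RF), by the same factor P = 1/R."""
    return (reward if m_prev == 0 else 1.0 / reward) * delta_prev
```

The permutation-matrix form makes the two CNOT properties easy to verify numerically: the matrix is real symmetric (Hermitian) and is its own inverse, and the target returns to $|0\rangle$ exactly when control and target indices coincide.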

Consider the protocol for three different multilevel examples of the environment state. First, we consider a totally random state with $D=11$ of the form

$$|E\rangle_{E}=\frac{1}{N}\sum_{k=0}^{10}c_{k}|k\rangle_{E},\qquad c_{k}=a_{k}+ib_{k}, \quad (15)$$

where $a_{k}$ and $b_{k}$ are random numbers, and $N$ is a normalization factor. Figure 4 shows the numerical calculations for this case, where (a) gives the average fidelity over random initial states given by Eq. (15), and (b) the evolution of $\Delta$ at each iteration. It also shows how this exploration parameter is reduced when the fidelity between $A$ and $E$ grows (increasing the exploitation). We can see from Fig. 4 (a) that the protocol substantially increases the mean fidelity between $A$ and $E$ within a moderate number of iterations.

Second, consider the protocol for a coherent state, defined by

$$|\alpha\rangle=e^{-|\alpha|^{2}/2}\sum_{n=0}^{\infty}\frac{\alpha^{n}}{\sqrt{n!}}|n\rangle. \quad (16)$$

For this case we use $\alpha=a+ib$, with $a$ and $b$ small positive real random numbers. As $|\alpha|$ is then of order unity, we can truncate the sum in Eq. (16) at $n=10$, since the probabilities of obtaining $|n\rangle$ with larger $n$ are negligible. Figure 5 (a) shows the fidelity between $A$ and $E$ at each iteration, reaching high values in few iterations. Figure 5 (b) depicts the value of $\Delta$ in this process. We can also observe that the exploration is reduced when $A$ approaches $E$ (increasing the exploitation), as in the previous case.
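The truncation argument can be checked directly. The following is a minimal sketch building the truncated, renormalized coherent state of Eq. (16), where the cutoff of 11 Fock levels ($n\le 10$) is the assumption discussed above:

```python
import numpy as np
from math import factorial

def coherent_state(alpha, dim=11):
    """Coherent state of Eq. (16) truncated at n = dim - 1 and
    renormalized; for |alpha| of order one the discarded tail
    weight is negligible."""
    c = np.array([alpha ** n / np.sqrt(factorial(n)) for n in range(dim)],
                 dtype=complex)
    return c / np.linalg.norm(c)
```

A quick consistency check is the mean photon number: for a coherent state it equals $|\alpha|^{2}$, and the truncated vector reproduces this to high accuracy when $|\alpha|$ is of order one.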

Finally, we consider two quantum states, a cat state of the form

$$|E\rangle_{E}=\sqrt{\frac{1}{N_{\alpha}}}\left(|\alpha\rangle+|-\alpha\rangle\right), \quad (17)$$

where $|\alpha\rangle$ is given by Eq. (16) and $N_{\alpha}$ is a normalization factor. Additionally, we study the superposition

$$|E\rangle_{E}=\sqrt{\frac{1}{2}}\left(|0\rangle+|n\rangle\right), \quad (18)$$

with arbitrary $n$. Figure 6 (a) shows the calculation for the cat states of Eq. (17); in this case, we reach high fidelities within a moderate number of measurements. Moreover, Fig. 6 (b) shows results similar to the qubit case given by Fig. 3, reaching high fidelities in few iterations. The last figure reflects the fact that for the state in Eq. (18) the protocol reduces to the qubit case, given that only two states are involved in the superposition. Thus, all states of the form of Eq. (18) have the same performance as the qubit case.
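For completeness, the cat state of Eq. (17) can be built in the same truncated Fock basis (again with an assumed cutoff of 11 levels). A quick sanity check is that only even Fock components survive, since the odd amplitudes of $|\alpha\rangle$ and $|-\alpha\rangle$ cancel exactly:

```python
import numpy as np
from math import factorial

def cat_state(alpha, dim=11):
    """Even cat state of Eq. (17), proportional to |alpha> + |-alpha>,
    in a truncated Fock basis; the normalization factor N_alpha is
    absorbed by renormalizing the truncated vector."""
    def coh(a):
        return np.array([a ** n / np.sqrt(factorial(n)) for n in range(dim)],
                        dtype=complex)
    psi = coh(alpha) + coh(-alpha)
    return psi / np.linalg.norm(psi)
```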

We can see from Figs. 4, 5 and 6 that the learning speed is inversely related to the parameter $R$, which means that a small value of $R$ implies a rapid increase in the fidelity between $A$ and $E$; that is, it increases the speed of learning. On the other hand, the maximum learning grows with $R$; in other words, a small value of $R$ means lower maximum fidelities between $A$ and $E$. It is pertinent to emphasize that our protocol for the qubit and multilevel cases employs two-level operators $U^{(k)}_{A}$, and in each iteration we only need to calculate the current operator $U^{(k)}_{A}$. Hence, the protocol does not need to store the complete agent history, which is an advantage in terms of the required resources.

This protocol can be implemented in any platform that provides the logical operator $U_{\rm NOT}$ for qubits, or $U_{\rm XOR}$ for qudits, together with digital feedback loops, as is the case of circuit quantum electrodynamics (cQED). This platform takes particular relevance due to its fast development in quantum computation Blais et al. (2004); Devoret et al. (2004); Hofheinz et al. (2008, 2009); DiCarlo et al. (2009); Devoret and Schoelkopf (2013); Otterbach et al. (2017). Current technology in cQED allows for digital quantum feedback loops with short elapsed times and high fidelities Ristè et al. (2012a, b), well-controlled one- and two-qubit gates with high fidelities and short gate times Barends et al. (2016), and qubits with long coherence times Paik et al. (2011); Rigetti et al. (2012). This allows for a large number of iterations of our protocol, sufficient for a feasible implementation. Additionally, in the last decade, multilevel gates have been theoretically proposed Strauch (2011); Mischuck and Mølmer (2013); Kiktenko et al. (2015), and efficient multiqubit gates have recently been proposed using a ML approach Zahedinejad et al. (2015, 2016), providing all the necessary elements for the experimental implementation of the general framework of this learning protocol.

## IV Conclusions

We propose and analyse a quantum reinforcement learning protocol to adapt a quantum state (the agent) to another, unknown, quantum state (the environment), in a context where several identical copies of the unknown state are available. The main goal of our proposal is for the agent to acquire information about the environment in a semi-autonomous way, namely, in the reinforcement learning spirit. We show that the fidelity increases rapidly with the number of iterations, reaching for qubit states high average fidelities with few measurements. Also, for states with dimension $D=11$, we obtain large average fidelities within a moderate number of measurements. The performance is improved for special cases such as coherent states, cat states, and states of the form of Eq. (18), which require fewer iterations to reach high average fidelities.

The performance of the protocol is governed by the value of the parameter $R$ and by the number of states involved in the superposition of the environment state $|E\rangle_{E}$ in the measurement basis. For a small $R$ we get a high learning speed and a reduced maximum learning. Moreover, the number of states in the superposition is related to the overall performance of the protocol; that is, a superposition of fewer terms provides better performance, which increases the learning speed as well as the maximum learning, requiring fewer iterations to obtain high fidelity. These two facts imply that a possible improvement of the protocol can be achieved by using a dynamic parameter $R$ and a measurement device that can change its measurement basis throughout the protocol, to reduce the number of states involved in the superposition of the state of $E$. Besides, since our protocol increases the fidelity within a small number of iterations, it is useful even when the number of copies of $|E\rangle_{E}$ is limited. Finally, this protocol opens up the door to the implementation of semi-autonomous quantum reinforcement learning, a next step toward achieving quantum artificial life.

The authors acknowledge support from CONICYT Doctorado Nacional 21140432, Dirección de Postgrado USACH, FONDECYT Grant No. 1140194, Ramón y Cajal Grant RYC-2012-11391, MINECO/FEDER FIS2015-69983-P and Basque Government IT986-16.

### References

1. S. Russell  and P. Norvig, Artificial Intelligence: A modern approach (Prentice hall, 1995).
2. R. S. Michalski, J. G. Carbonell,  and T. M. Mitchell, Machine learning: An artificial intelligence approach (Springer Science & Business Media, 2013).
3. M. I. Jordan and T. M. Mitchell, Science 349, 255 (2015).
4. M. Kawagoe and A. Tojo, Pattern Recognition 17, 295 (1984).
5. R. V. Shannon, F.-G. Zeng, V. Kamath, J. Wygonski,  and M. Ekelid, Science 270, 303 (1995).
6. A. K. Jain, Nature 449, 38 (2007).
7. J. Carrasquilla,  and R. G. Melko, Nat. Phys. 13, 431 (2017).
8. A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou,  and A. Bouras, IEEE Transactions on Emerging Topics in Computing 2, 267 (2014).
9. P. Baldi, P. Sadowski,  and D. Whiteson, Nat. Comm. 5, 4308 (2014).
10. R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction (MIT press Cambridge, 1998).
11. M. L. Littman, Nature 521, 445 (2015).
12. D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel,  and D. Hassabis, Nature 550, 354 (2017a).
13. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., arXiv preprint arXiv:1712.01815 (2017b).
14. M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information (Cambridge University Press, Cambridge, UK, 2010).
15. M. Schuld, I. Sinayskiy,  and F. Petruccione, Contemporary Physics 56, 172 (2015).
16. J. Adcock, E. Allen, M. Day, S. Frick, J. Hinchliff, M. Johnson, S. Morley-Short, S. Pallister, A. Price,  and S. Stanisic, arXiv preprint arXiv:1512.02900  (2015).
17. V. Dunjko, J. M. Taylor,  and H. J. Briegel, Phys. Rev. Lett. 117, 130501 (2016).
18. V. Dunjko and H. J. Briegel, arXiv preprint arXiv:1709.02779  (2017).
19. J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe,  and S. Lloyd, Nature 549, 195 (2017).
20. R. Biswas, Z. Jiang, K. Kechezhi, S. Knysh, S. Mandrà, B. O’Gorman, A. Perdomo-Ortiz, A. Petukhov, J. Realpe-Gómez, E. Rieffel, D. Venturelli, F. Vasko,  and Z. Wang, Parallel Computing 64, 81 (2017), High-End Computing for Next-Generation Scientific Discovery.
21. A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gómez,  and R. Biswas, arXiv preprint arXiv:1708.09757  (2017a).
22. A. Perdomo-Ortiz, A. Feldman, A. Ozaeta, S. V. Isakov, Z. Zhu, B. O’Gorman, H. G. Katzgraber, A. Diedrich, H. Neven, J. de Kleer, et al., arXiv preprint arXiv:1708.09780 (2017b).
23. M. Sasaki and A. Carlini, Phys. Rev. A 66, 022303 (2002).
24. S. Lloyd, M. Mohseni,  and P. Rebentrost, arXiv preprint arXiv:1307.0411  (2013).
25. M. Benedetti, J. Realpe-Gómez, R. Biswas,  and A. Perdomo-Ortiz, Phys. Rev. X 7, 041052 (2017a).
26. M. Benedetti, J. Realpe-Gómez,  and A. Perdomo-Ortiz, arXiv preprint arXiv:1708.09784  (2017b).
27. E. Aïmeur, G. Brassard,  and S. Gambs, Machine Learning 90, 261 (2013).
28. G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin-Delgado,  and H. J. Briegel, Phys. Rev. X 4, 031002 (2014).
29. L. Lamata, Scientific Reports 7, 1609 (2017).
30. F. Cárdenas-López, L. Lamata, J. C. Retamal,  and E. Solano, arXiv preprint arXiv:1709.07848  (2017).
31. U. Alvarez-Rodriguez, L. Lamata, P. Escandell-Montero, J. D. Martín-Guerrero,  and E. Solano, Scientific Reports 7, 13645 (2017).
32. R. B. A. Adamson and A. M. Steinberg, Phys. Rev. Lett. 105, 030406 (2010).
33. H. Sosa-Martinez, N. K. Lysne, C. H. Baldwin, A. Kalev, I. H. Deutsch,  and P. S. Jessen, Phys. Rev. Lett. 119, 150401 (2017).
34. A. Lumino, E. Polino, A. S. Rab, G. Milani, N. Spagnolo, N. Wiebe,  and F. Sciarrino, arXiv preprint arXiv:1712.07570v1  (2017).
35. A. Rocchetto, S. Aaronson, S. Severini, G. Carvacho, D. Poderini, I. Agresti, M. Bentivegna,  and F. Sciarrino, arXiv preprint arXiv:1712.00127  (2017).
36. G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko,  and G. Carleo, arXiv preprint arXiv:1703.05334  (2017).
37. L.-L. Sun, Y. Mao, F.-L. Xiong, S. Yu,  and Z.-B. Chen, arXiv preprint arXiv:1802.00140  (2018).
38. L. Roa, A. Delgado, M. L. Ladrón de Guevara,  and A. B. Klimov, Phys. Rev. A 73, 012322 (2006).
39. G. Alber, A. Delgado, N. Gisin,  and I. Jex, arXiv preprint quant-ph/0008022  (2000).
40. A. Blais, R.-S. Huang, A. Wallraff, S. M. Girvin,  and R. J. Schoelkopf, Phys. Rev. A 69, 062320 (2004).
41. M. H. Devoret, A. Wallraff,  and J. M. Martinis, arXiv preprint cond-mat/0411174  (2004).
42. M. Hofheinz, E. M. Weig, M. Ansmann, R. C. Bialczak, E. Lucero, M. Neeley, A. D. O’Connell, H. Wang, J. M. Martinis,  and A. N. Cleland, Nature 454, 310 (2008).
43. M. Hofheinz, H. Wang, M. Ansmann, R. C. Bialczak, E. Lucero, M. Neeley, A. D. O’Connell, D. Sank, J. Wenner, J. M. Martinis,  and A. N. Cleland, Nature 459, 546 (2009).
44. L. DiCarlo, J. M. Chow, J. M. Gambetta, L. S. Bishop, B. R. Johnson, D. I. Schuster, J. Majer, A. Blais, L. Frunzio, S. M. Girvin,  and R. J. Schoelkopf, Nature 460, 240 (2009).
45. M. H. Devoret and R. J. Schoelkopf, Science 339, 1169 (2013).
46. J. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. S. Fried, S. Hong, et al., arXiv preprint arXiv:1712.05771 (2017).
47. D. Ristè, J. G. van Leeuwen, H.-S. Ku, K. W. Lehnert,  and L. DiCarlo, Phys. Rev. Lett. 109, 050507 (2012a).
48. D. Ristè, C. C. Bultink, K. W. Lehnert,  and L. DiCarlo, Phys. Rev. Lett. 109, 240502 (2012b).
49. R. Barends, A. Shabani, L. Lamata, J. Kelly, A. Mezzacapo, U. Las Heras, R. Babbush, A. Fowler, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, E. Jeffrey, E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley, C. Neil, P. J. J. O’Malley, C. Quintana, P. Roushan, D. Sank, A. Vainsencher, J. Wenner, T. C. White, E. Solano, H. Neven,  and J. M. Martinis, Nature 534, 222 (2016).
50. H. Paik, D. I. Schuster, L. S. Bishop, G. Kirchmair, G. Catelani, A. P. Sears, B. R. Johnson, M. J. Reagor, L. Frunzio, L. I. Glazman, S. M. Girvin, M. H. Devoret,  and R. J. Schoelkopf, Phys. Rev. Lett. 107, 240501 (2011).
51. C. Rigetti, J. M. Gambetta, S. Poletto, B. L. T. Plourde, J. M. Chow, A. D. Córcoles, J. A. Smolin, S. T. Merkel, J. R. Rozen, G. A. Keefe, M. B. Rothwell, M. B. Ketchen,  and M. Steffen, Phys. Rev. B 86, 100506 (2012).
52. F. W. Strauch, Phys. Rev. A 84, 052313 (2011).
53. B. Mischuck and K. Mølmer, Phys. Rev. A 87, 022341 (2013).
54. E. O. Kiktenko, A. K. Fedorov, O. V. Man’ko,  and V. I. Man’ko, Phys. Rev. A 91, 042312 (2015).
55. E. Zahedinejad, J. Ghosh,  and B. C. Sanders, Phys. Rev. Lett. 114, 200502 (2015).
56. E. Zahedinejad, J. Ghosh,  and B. C. Sanders, Phys. Rev. Applied 6, 054005 (2016).