Noise-tolerant parity learning with one quantum bit
Demonstrating quantum advantage with less powerful but more realistic devices is of great importance in modern quantum information science. Recently, a significant quantum speedup was achieved in the problem of learning a hidden parity function with noise. However, if all data qubits at the query output are completely depolarized, the algorithm fails. In this work, we present a new quantum parity learning algorithm with which the quantum advantage is retained as long as one qubit in the system has non-zero polarization. In this scenario, the quantum parity learning naturally becomes deterministic quantum computation with one qubit. Then the hidden parity function can be revealed by performing a set of operations that can be interpreted as measuring non-local observables on the auxiliary result qubit having non-zero polarization and each data qubit. We also discuss the source of the quantum advantage in our algorithm from the resource-theoretic point of view.
Experimental realizations of quantum information processing (QIP) have made impressive progress in the past years Nigg302 (); NVQEC (); SCQEC (); NMR2017 (). Nonetheless, a scalable architecture capable of universal and reliable quantum computation is still far from within reach. While the development of such quantum computers is pursued, identifying well-defined computational tasks for which less powerful and less challenging devices (for example, sub-universal devices or devices without quantum error correction) can still outperform their classical counterparts is of fundamental importance.
One interesting family of problems for which near-term quantum devices can exhibit considerable advantages is machine learning. In particular, the quantum speedup is demonstrated in the problem of learning a hidden parity function defined by the unknown binary string in the presence of noise (LPN). The LPN problem is thought to be computationally intractable classically Angluin1988 (); Blum2003 (); Lyubashevsky2005 (); Levieil2006 (), and hence cryptographic applications have been suggested based on this problem Regev:2005 (); Pietrzak2012 (). In the quantum setting, all possible input binary strings are encoded in the data qubits for parallel processing, and the outcome of the function is encoded in the auxiliary result qubit. Then the quantum learner with the ability to coherently rotate all qubits before the readout can solve the LPN problem in logarithmic time LPNTheory (). However, the number of required queries diverges as the noise (depolarizing) rate increases, and the learning becomes impossible if the final state of the data qubits is maximally mixed LPNTheory (); LPNexp (). This result intuitively makes sense since measuring the maximally mixed state outputs completely random bits. Hence the parity function can only be guessed with success probability decreasing exponentially with the size of the problem in both classical and quantum settings.
In this work, we present a new protocol with which the hidden bit string of the parity function can be learned efficiently even if all data qubits are completely depolarized, provided that the result qubit has non-zero polarization. Under the aforementioned conditions, the learning algorithm naturally becomes deterministic quantum computation with one quantum bit (DQC1) DQC1PhysRevLett.81.5672 (). The expectation value measurement on the result qubit then allows for efficient evaluation of the normalized trace of the unitary gate that represents the hidden parity function. However, this unitary operator is traceless as long as at least one element of the hidden bit string is 1, and therefore the naive application of the DQC1 protocol does not help. Thus, we modify the original quantum LPN algorithm by adding a set of operations that can be understood as performing non-local measurements between each data qubit and the result qubit. With this change, the normalized trace is non-zero if the hidden bit encoded in the data qubit is 0, and zero if it is 1. Therefore, the parity function can be learned using a number of queries that grows only linearly with the length of the hidden bit string. This counter-intuitive result shows that the robustness of the quantum parity learning against decoherence is retained via the power of one quantum bit. The quantum advantage is achieved without any entanglement across the bipartition between the data qubits and the result qubit, and this brings up an interesting question: what is the quantum resource that empowers the learning protocol? We conjecture that the inherent ability of the DQC1 model to discriminate the coherence consumption, which results in producing non-classical correlation, lies at the center of our learning algorithm.
The remainder of the paper is organized as follows. Section II briefly reviews LPN and DQC1, the topics that have been studied extensively by numerous authors. In Sec. III.1 we describe the equivalence of the quantum parity learning circuit and the DQC1 circuit when the data output is in the maximally mixed state. The new DQC1 algorithm for solving the LPN problem is presented in Sec. III.2 including the discussion on the computation efficiency. The effect of errors at various locations in the DQC1 LPN protocol is discussed in Sec. III.3. Section IV discusses the origin of the quantum advantage in our learning algorithm, and Sec. V concludes.
Here we briefly summarize the work presented in Ref. LPNTheory (). In the parity learning problem, an oracle generates a uniformly random input $x\in\{0,1\}^n$ and computes a Boolean function $f_s$ defined by a hidden bit string $s\in\{0,1\}^n$,
$$f_s(x)=\sum_{j=1}^{n}s_jx_j \bmod 2,$$
where $x_j$ ($s_j$) is the $j$th bit of $x$ ($s$). A query to the oracle returns $(x,f_s(x))$, and a learner tries to reconstruct $s$ by repeating the query. In the presence of noise, the learner obtains $(x,f_s(x)\oplus e)$, where $e\in\{0,1\}$ has Bernoulli distribution with parameter $\eta$, i.e. Pr$(e=1)=\eta$ Angluin1988 (); Blum2003 (); Lyubashevsky2005 (). The LPN problem is equivalent to decoding a random binary linear code in the presence of random noise Lyubashevsky2005 ().
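As a concrete illustration of the classical oracle just described, the following is a minimal Python sketch. The function names `parity` and `noisy_query`, the variable names `s`, `x`, `eta`, and the example string are illustrative assumptions and are not taken from the cited references.

```python
import random

def parity(s, x):
    """Parity function f_s(x) = sum_j s_j x_j mod 2 for bit tuples s and x."""
    return sum(sj & xj for sj, xj in zip(s, x)) % 2

def noisy_query(s, eta, rng):
    """One oracle query: uniform x and the label f_s(x) XOR e, with Pr(e = 1) = eta."""
    x = tuple(rng.randint(0, 1) for _ in s)
    e = 1 if rng.random() < eta else 0
    return x, parity(s, x) ^ e

rng = random.Random(0)          # fixed seed so the run is repeatable
s = (1, 0, 1, 1)                # hypothetical hidden string
samples = [noisy_query(s, 0.0, rng) for _ in range(5)]   # eta = 0: noiseless labels
```

With `eta = 0` every label equals the true parity; raising `eta` towards 1/2 makes the labels progressively less informative, which is the regime in which the classical problem is believed to be hard.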
In the quantum setting, the learner has access to a quantum oracle which implements the unitary transformation $|x\rangle_d|0\rangle_r\mapsto|x\rangle_d|f_s(x)\rangle_r$ on the computational basis states and returns the equal superposition of $|x\rangle_d|f_s(x)\rangle_r$ for all possible inputs $x$. The subscript $r$ ($d$) indicates the result (data) qubit. At the query output, the learner applies Hadamard gates to all qubits to create the entangled state
$$\frac{1}{\sqrt{2}}\left(|0^{n}\rangle_d|0\rangle_r+|s\rangle_d|1\rangle_r\right).$$
Thus, whenever the result qubit is $|1\rangle$ (which occurs with probability $1/2$), measuring the data qubits in their computational bases reveals $s$. The quantum version of the parity learning is depicted in Fig. 1. In this example, the hidden bit string is encoded via a series of controlled-NOT (CNOT) gates targeting the result qubit conditioned on the data qubits. The gray box in the figure emphasizes that the learner does not have a priori knowledge about this part.
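The noiseless quantum query can be checked numerically for small registers. The sketch below is a plain state-vector simulation under the assumption that the oracle returns the uniform superposition of data strings `x` with the result qubit set to the parity of `x`; the function names, the bit ordering (data bits first, result bit last), and the example string are illustrative choices.

```python
import math

def hadamard_all(amps):
    """Apply a Hadamard gate to every qubit of a state vector, in place."""
    h = 1
    while h < len(amps):
        for i in range(0, len(amps), 2 * h):
            for j in range(i, i + h):
                a, b = amps[j], amps[j + h]
                amps[j], amps[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return amps

def quantum_parity_output(s):
    """Return the non-zero amplitudes after the oracle query and the final Hadamards."""
    n = len(s)
    amps = [0.0] * (2 ** (n + 1))
    for x in range(2 ** n):
        bits = [(x >> (n - 1 - j)) & 1 for j in range(n)]
        f = sum(sj * xj for sj, xj in zip(s, bits)) % 2
        amps[2 * x + f] = 2 ** (-n / 2)      # index = data bits followed by result bit
    hadamard_all(amps)
    out = {}
    for idx, a in enumerate(amps):
        if abs(a) > 1e-9:
            out[tuple((idx >> (n - j)) & 1 for j in range(n + 1))] = a
    return out

out = quantum_parity_output((0, 1, 1))       # hypothetical hidden string
```

Only two basis states survive: all zeros, and the hidden string on the data qubits together with a result qubit equal to 1, each with amplitude $1/\sqrt{2}$.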
A noisy quantum oracle can be modelled with a depolarizing channel acting independently on all qubits with a constant known noise rate $\eta$.
Learning from the noiseless oracle is tractable for both quantum and classical learners. However, the quantum algorithm prevails when the noise is introduced. The best known classical algorithms for LPN have superpolynomial complexity in the problem size Angluin1988 (); Blum2003 (); Lyubashevsky2005 (); Levieil2006 (), while the quantum learning based on the bit-wise majority vote requires queries and running time (gates and measurements) LPNTheory (). This result contradicts the widely accepted idea that quantum computers are intrinsically more vulnerable to error than classical computers. The quantum LPN was realized experimentally with superconducting qubits in Ref. LPNexp ().
On the other hand, in terms of the noise strength, the query complexity of the quantum algorithm is LPNTheory (). The experimental results in Ref. LPNexp () also show that the number of queries diverges as . This is evident since maximally mixed states at the query output do not provide any knowledge about the hidden bit string. In fact, in the learning algorithm discussed thus far, each completely depolarized data qubit halves the probability of finding the hidden bit string exactly. Repeating the query does not improve the success probability since the outcome is uniformly random every time. Therefore, under such a noise model, the learner can only guess the $n$-bit hidden string out of $2^n$ possibilities.
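DQC1 is a sub-universal quantum computation model to which one probe qubit with non-zero polarization $\alpha$, $n$ bits in a maximally mixed state, an arbitrary unitary transformation, and the expectation measurement of the Pauli operator $\sigma_k$ ($k\in\{x,y,z\}$) on the probe qubit are available DQC1PhysRevLett.81.5672 (). Though weaker than standard universal quantum computers, it still offers efficient solutions to some problems that are classically intractable DQC1PhysRevLett.81.5672 (); PhysRevA.72.042316 (); DQC1complexity (). In particular, DQC1 can be employed to efficiently estimate the normalized trace of an $n$-qubit unitary operator $U$, provided that $U$ can be implemented with a number of elementary quantum gates that is polynomial in $n$. In the trace evaluation protocol, a Hadamard gate on the probe qubit is followed by the controlled-unitary $|0\rangle\langle 0|\otimes I_n+|1\rangle\langle 1|\otimes U$, where $I_n$ is the $2^n\times 2^n$ identity matrix. These operations transform the input state $\frac{1}{2}(I_1+\alpha\sigma_z)\otimes I_n/2^n$ to
$$\rho=\frac{1}{2^{n+1}}\left[I_{n+1}+\alpha\left(|0\rangle\langle 1|\otimes U^{\dagger}+|1\rangle\langle 0|\otimes U\right)\right].$$
The traceless part that deviates from the identity is called the deviation density matrix, and only this part returns non-zero expectation values in DQC1. Measuring the expectation of $\sigma_x$ or $\sigma_y$ on the probe qubit ends the protocol, and
$$\langle\sigma_x\rangle=\frac{\alpha}{2^n}\,\mathrm{Re}\left[\mathrm{tr}(U)\right],\qquad\langle\sigma_y\rangle=\frac{\alpha}{2^n}\,\mathrm{Im}\left[\mathrm{tr}(U)\right].$$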
Repeating the protocol times allows for estimating the expectation values to within with probability of error estimate ().
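The trace-estimation protocol can be reproduced with a small density-matrix simulation. The sketch below assumes a probe prepared with polarization `alpha` along $z$, a Hadamard gate on the probe, and a unitary `U` applied to the data register when the probe is in $|1\rangle$; all helper names are hypothetical. The last helper mimics repetition by averaging simulated single-shot $\pm 1$ outcomes, which is an assumption about the readout and not a statement about any particular experiment.

```python
import cmath
import math
import random

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def dqc1_expectations(U, alpha):
    """Exact (<sigma_x>, <sigma_y>) of the probe after the DQC1 circuit."""
    d = len(U)
    h = 1 / math.sqrt(2)
    H = [[h, h], [h, -h]]
    X = [[0, 1], [1, 0]]
    Y = [[0, -1j], [1j, 0]]
    probe = [[(1 + alpha) / 2, 0], [0, (1 - alpha) / 2]]
    rho = kron(probe, [[v / d for v in row] for row in identity(d)])
    c0 = kron([[1, 0], [0, 0]], identity(d))          # probe in |0>: do nothing
    c1 = kron([[0, 0], [0, 1]], U)                    # probe in |1>: apply U
    cu = [[p + q for p, q in zip(r0, r1)] for r0, r1 in zip(c0, c1)]
    V = matmul(cu, kron(H, identity(d)))
    rho = matmul(matmul(V, rho), dagger(V))
    ex = complex(trace(matmul(rho, kron(X, identity(d))))).real
    ey = complex(trace(matmul(rho, kron(Y, identity(d))))).real
    return ex, ey

def sample_mean(expectation, shots, rng):
    """Average of simulated +1/-1 outcomes with the given expectation value."""
    p_plus = (1 + expectation) / 2
    return sum(1 if rng.random() < p_plus else -1 for _ in range(shots)) / shots

U = [[cmath.exp(1j * math.pi / 2), 0], [0, 1]]        # example: tr(U)/2 = (1 + i)/2
ex, ey = dqc1_expectations(U, alpha=0.4)
```

For this example the normalized trace is $(1+i)/2$, so both expectation values equal $0.4\times 0.5=0.2$; the sampled average approaches the exact value as the number of shots grows.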
III Parity learning with fully depolarizing noise
III.1 From LPN to DQC1
The LPN algorithm fails if the noise completely randomizes the output. However, if the result qubit retains some polarization, can anything about the overall evolution be inferred from measuring the result qubit alone? This situation resembles the DQC1 model in which the probe qubit carries the information about the trace of the unitary operator applied to the completely mixed state. Indeed, using the fact that conjugating a CNOT gate with Hadamard gates on both qubits exchanges its control and target, the quantum circuit for the parity learning (Fig. 1) with completely depolarizing noise on the data qubits can be converted to the DQC1 circuit as depicted in Fig. 2.
Without loss of generality, the data qubits are supplied in the completely mixed state as input. The result qubit can be initialized with some error, but it should possess non-zero polarization. Since measuring the fully depolarized data qubits is redundant, only the result qubit is measured. Then by measuring the expectations of and , the normalized trace of the unitary matrix that implements the hidden parity function can be evaluated. The depolarizing noise at the result qubit output scales the expectation values by a factor of :
where is the unitary implementation of the hidden parity function acting on the data qubits. This is easy to verify using the Kraus representation of the depolarizing channel and the cyclic property of the trace. Hence as long as , the normalized trace can be estimated with high accuracy using number of repetitions . Equation 5 shows that some information about the hidden function can be contained in the coherent basis of the result qubit. Yet the trace of the hidden unitary matrix does not provide any useful knowledge about since it is zero for all except when it is uniformly 0.
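The statement that the trace vanishes for every non-zero hidden string can be verified by brute force. The sketch below assumes that the hidden unitary acts on the data register as a bit-flip on every qubit whose hidden bit is 1, so that a basis state `x` is mapped to `x XOR s`; the function name is hypothetical.

```python
from itertools import product

def normalized_trace_parity_unitary(s):
    """tr(U_s) / 2^n for the permutation U_s |x> = |x XOR s>."""
    n = len(s)
    diagonal_sum = 0
    for x in product((0, 1), repeat=n):
        image = tuple(xj ^ sj for xj, sj in zip(x, s))
        diagonal_sum += 1 if image == x else 0     # <x|U_s|x> is 1 only if x is fixed
    return diagonal_sum / 2 ** n

traces = {s: normalized_trace_parity_unitary(s) for s in product((0, 1), repeat=3)}
```

Every basis state is moved unless the string is all zeros, so the dictionary holds 1.0 for `(0, 0, 0)` and 0.0 for the other seven strings, which is why the plain trace estimate cannot discriminate between them.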
In the following, we present a new strategy for finding from the trace estimation results.
III.2 Solving LPN using DQC1
The quantum learner with access to the DQC1 LPN circuit (Fig. 2) has the ability to implement additional quantum gates after the unknown unitary operation. If a rotation controlled by the result qubit is applied equally to all data qubits after the hidden parity function, the trace of the total unitary operator becomes
where is the number of CNOT operators implemented in the hidden parity function, i.e. the number of 1’s in . Now, if the rotation on one of the data qubits is undone by another controlled-rotation , then the normalized trace of the total unitary operator becomes
Here represents a coherent rotation uniformly applied to all qubits except the th qubit, i.e.,
and the superscript indicates that the Pauli operator is acting on the th data qubit while the identity operator acts on the rest. We use to represent the result qubit for clarity when needed. For , the DQC1 protocol can resolve whether the hidden bit encoded in the th data qubit is 0 or 1; the trace estimation returns a non-zero value if , and zero if . The quantum circuit for finding the value of is shown in Fig. 3.
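The dependence of the trace on the probed bit can be made explicit with a short numerical check. The sketch assumes that the controlled rotation is $R_x(\theta)=e^{-i\theta\sigma_x/2}$ and that the hidden function contributes a bit-flip on each data qubit whose hidden bit is 1; the symbols `s`, `j`, `theta`, `k` and the function names are assumed notation. Under these assumptions a data qubit contributes a factor $\cos(\theta/2)$ if its bit is 0 and $-i\sin(\theta/2)$ if its bit is 1, while the probed qubit (whose rotation is undone) contributes 1 or 0, giving $(1-s_j)\cos^{n-1-k}(\theta/2)\,[-i\sin(\theta/2)]^{k}$ with $k$ the number of 1's in the string.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

def rx(theta):
    """Rotation exp(-i theta X / 2) (assumed convention)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def probe_trace(s, j, theta):
    """Normalized trace when every data qubit except qubit j is rotated."""
    tau = 1
    for m, s_m in enumerate(s):
        op = X if s_m else I2              # hidden part acting on data qubit m
        if m != j:
            op = mul2(rx(theta), op)       # learner's rotation on the other qubits
        tau *= (op[0][0] + op[1][1]) / 2   # traces of tensor factors multiply
    return complex(tau)

def closed_form(s, j, theta):
    if s[j]:
        return 0j
    n, k = len(s), sum(s)
    return math.cos(theta / 2) ** (n - 1 - k) * (-1j * math.sin(theta / 2)) ** k

tau_zero_bit = probe_trace((0, 1, 1, 0), 0, math.pi / 2)   # probed bit is 0
tau_one_bit = probe_trace((0, 1, 1, 0), 1, math.pi / 2)    # probed bit is 1
```

At $\theta=\pi/2$ the first value has magnitude $2^{-3/2}$ and the second vanishes, so a non-zero expectation value on the result qubit signals that the probed bit is 0.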
The deviation density matrix at the end of the protocol can be written as
where . Then the expectation measurement on the result qubit can be expressed as
Consequently, the measurement outcome can be interpreted as the sum of two expectations determined from different deviation density matrices, and one of them (second line in Eq. III.2) corresponds to measuring the non-local observable on the result qubit and the th data qubit. This non-local contribution to the measurement extracts the information about the bit value hidden in the th qubit.
To optimally distinguish the normalized traces (the difference is denoted as ) without knowing , the rotation angle should be chosen as (or an odd-integer multiple of it). Then . Once is revealed, the th data qubit can be decoupled from the result qubit by applying the inverse of the unitary operator that encodes . Then in the subsequent run, the controlled-rotation is applied only to the remaining data qubits. This rotation can be expressed as
This extra procedure increases by a factor of for each decoupled data qubit, i.e. , and can reduce the computational overhead accordingly.
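The gain from decoupling can be tabulated in a few lines. The sketch assumes a rotation angle of $\pi/2$ with the convention $R_x(\theta)=e^{-i\theta\sigma_x/2}$, so that every data qubit that is still coupled (other than the probed one) contributes a factor of magnitude $1/\sqrt{2}$ to the trace; the function name and the register size are hypothetical.

```python
import math

def contrast(n_coupled):
    """Magnitude of the trace difference with n_coupled data qubits still coupled
    to the result qubit, the probed qubit included, at rotation angle pi/2."""
    return math.cos(math.pi / 4) ** (n_coupled - 1)

n = 6
contrasts = [contrast(n - m) for m in range(n)]          # m = qubits already decoupled
ratios = [b / a for a, b in zip(contrasts, contrasts[1:])]
relative_shots = [1 / c ** 2 for c in contrasts]         # repetitions scale as 1/contrast^2
```

Each decoupled qubit multiplies the contrast by $\sqrt{2}$ and therefore roughly halves the number of repetitions needed for a fixed confidence, so the first bits are the most expensive to determine.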
With these results, the full learning algorithm can be stated as follows.
Given a DQC1 circuit with the hidden unitary operator controlled-, for , do:
Apply the controlled-rotation to data qubits with the result qubit as the control.
Measure and . Repeat until desired accuracy is reached.
If =0, record . Otherwise, record .
If =0, apply a bit-flip () gate to the th data qubit controlled by the result qubit. Otherwise, do nothing.
Increment and go to step 1.
Until the first non-zero value is detected, both expectation values must be measured because the normalized trace can be either real or imaginary depending on the number of 1's in the hidden bit string. However, once a non-zero trace is found, only one of them needs to be measured in subsequent runs.
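The steps listed above can be sketched as an idealized loop. The code below is not the authors' implementation: it assumes exact expectation values (the limit of infinitely many repetitions), the rotation convention $R_x(\theta)=e^{-i\theta\sigma_x/2}$ with $\theta=\pi/2$, and a hypothetical oracle interface `query(ops)` that returns the two expectation values when the learner's controlled gates `ops[m]` act on data qubit `m` after the hidden function.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def make_oracle(secret, alpha):
    """Hypothetical oracle: ideal (<sigma_x>, <sigma_y>) for the learner's gates."""
    def query(ops):
        tau = 1
        for s_m, op in zip(secret, ops):
            total = mul2(op, X if s_m else I2)      # hidden gate first, then learner's gate
            tau *= (total[0][0] + total[1][1]) / 2
        tau = complex(tau)
        return alpha * tau.real, alpha * tau.imag
    return query

def learn(query, n, alpha, theta=math.pi / 2):
    learned = []
    for j in range(n):
        ops = [X if bit else I2 for bit in learned]   # decouple qubits already learned
        ops.append(I2)                                # probed qubit: rotation undone
        ops += [rx(theta)] * (n - j - 1)              # rotate the undetermined qubits
        ex, ey = query(ops)
        # Half of the ideal signal expected when the probed bit is 0.
        threshold = alpha * 2 ** (-(n - j - 1) / 2) / 2
        learned.append(1 if math.hypot(ex, ey) < threshold else 0)
    return tuple(learned)

secret = (1, 0, 1, 1, 0)                              # hypothetical hidden string
recovered = learn(make_oracle(secret, alpha=0.3), len(secret), alpha=0.3)
```

The loop uses one circuit configuration per bit, so the number of distinct configurations grows linearly with the string length; with finite statistics the threshold test would be replaced by a comparison against the estimated expectation values and their uncertainty.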
The number of queries required for estimating within with probability of error is , assuming an ensemble of quantum systems (e.g. spin-1/2 nuclei) encodes the result qubit. In order to identify whether is 0 or 1 with high certainty, must be satisfied. On the other hand, decreases exponentially in . Thus, the learning seems too expensive (though not impossible as in the classical case), especially when and for . However, for ensemble quantum computing models such as those based on nuclear magnetic resonance, . This means that for about , and the learning algorithm is efficient. For the hidden bit string beyond this length, the size of the ensemble should increase exponentially to maintain the efficiency in the number of queries.
III.3 Error Analysis
The depolarizing noise (or any Pauli errors) on the result qubit anywhere during the protocol can be treated as either the initialization error that reduces or the measurement error that increases . Errors on the data qubits before the realization of the hidden function do not have any effect since all data qubits are completely mixed, as long as the noisy channel is unital. Also, errors on the data qubits after the controlled- are irrelevant since only the result qubit is detected. In contrast, for , a phase-flip () error that occurs on a data qubit in between the CNOT and the controlled- can propagate to the result qubit. Then the propagated error can be treated as an error in the state preparation or in the measurement. Because of the properties and , the two quantum circuits shown in Fig. 4 are equivalent. This shows that a single phase-flip error ( in Fig. 4) that occurs on a data qubit results in two errors, a phase-flip and a bit-flip ( in Fig. 4) on the input state of the data and the result qubits, respectively.
Now suppose that the phase-flip error corrupts two data qubits simultaneously at this location. This sends two bit-flip errors to the initial state of the result qubit which cancel each other. Hence, the phase errors that occur simultaneously on an even number of data qubits cancel each other and do not affect the result qubit. For an odd number of the phase errors, only one of them affects the result qubit. Therefore, the depolarizing noise occurring in between the controlled- and the controlled- independently on all data qubits with the error rate results in a bit-flip error with the error rate on the initial state of the result qubit. The initial polarization of the result qubit is multiplied by a factor of .
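The parity argument above can be checked by enumerating error patterns. The sketch assumes that each of `m` affected data qubits independently suffers a phase-type error with probability `p`, and that the result qubit is flipped exactly when an odd number of them are hit; how `p` relates to the depolarizing rate depends on the parametrization of the channel, which is not fixed here. The function names are hypothetical.

```python
from itertools import product

def odd_flip_probability(m, p):
    """Probability that an odd number of m independent phase errors occur."""
    total = 0.0
    for pattern in product((0, 1), repeat=m):
        weight = 1.0
        for flipped in pattern:
            weight *= p if flipped else 1 - p
        if sum(pattern) % 2 == 1:        # odd parity: one net bit-flip on the result qubit
            total += weight
    return total

def polarization_factor(m, p):
    """Factor multiplying the result-qubit polarization after the net bit-flip."""
    return 1 - 2 * odd_flip_probability(m, p)

factors = [polarization_factor(m, 0.1) for m in range(1, 6)]
```

The enumeration agrees with the closed forms $[1-(1-2p)^m]/2$ for the bit-flip probability and $(1-2p)^m$ for the polarization factor, so the signal is attenuated but not erased as long as $p\neq 1/2$.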
Systematic errors in the controlled- also affect the result, but not severely. We already mentioned that the algorithm works for all , although ideally should be to minimize the computational overhead. Therefore, the algorithm withstands small amplitude errors. It is also robust to the error in the phase of the rotation. For example, consider the rotation around , where is some axis orthogonal to . Then the normalized trace is multiplied by a factor . In principle, the algorithm can distinguish as long as , but the optimal separation is attained when as chosen in our algorithm.
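The effect of a tilted rotation axis can be checked on a single data qubit. The sketch assumes that the intended axis is $x$, that the orthogonal axis is $y$, and that the faulty rotation is about $\cos\phi\,\hat{x}+\sin\phi\,\hat{y}$ with the convention $e^{-i\theta(\cdot)/2}$; the angle name `phi` and the function names are assumptions. Under these assumptions a qubit whose hidden bit is 1 contributes $-i\sin(\theta/2)\cos\phi$, while a qubit whose hidden bit is 0 still contributes $\cos(\theta/2)$.

```python
import cmath
import math

def tilted_rotation(theta, phi):
    """exp(-i theta/2 (cos(phi) X + sin(phi) Y)) as a 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)],
            [-1j * s * cmath.exp(1j * phi), c]]

def half_trace_after_flip(theta, phi):
    """tr(R X) / 2: contribution of a data qubit whose hidden bit is 1."""
    r = tilted_rotation(theta, phi)
    return (r[0][1] + r[1][0]) / 2       # multiplying by X swaps the columns of R

def half_trace_no_flip(theta, phi):
    """tr(R) / 2: contribution of a data qubit whose hidden bit is 0."""
    r = tilted_rotation(theta, phi)
    return (r[0][0] + r[1][1]) / 2

theta, phi = math.pi / 2, 0.3
ratio = half_trace_after_flip(theta, phi) / half_trace_after_flip(theta, 0.0)
```

The ratio equals $\cos\phi$, so with $k$ ones among the rotated qubits the trace is reduced by $\cos^{k}\phi$ in this model and vanishes only when the axis is fully orthogonal to the intended one.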
IV Quantum Discord and Coherence Consumption
In the preceding, we showed that the learning is enabled by the non-local nature of the measurement embedded in each query. This section further investigates the source of the quantum advantage in our protocol from the resource-theoretic standpoint. According to the results in PhysRevA.95.022330 (); PhysRevA.72.042316 (), the DQC1 circuit cannot generate entanglement at the bipartition split as 1 result qubit and data qubits when . Hereinafter we limit our discussion to correlations that are generated at this result-data bipartition. Clearly, entanglement is not the source of the quantum supremacy in our algorithm. However, non-classical correlation other than entanglement as measured by quantum discord can exist for PhysRevLett.100.050502 (). Quantum discord quantifies the quantumness of correlations based on the entropic measure, and it can be understood as the amount of the disturbance induced to a bipartite quantum system via local measurements discord_vedral (); PhysRevLett.88.017901 (). We examine quantum discord with respect to the measurement on the result qubit in our DQC1 circuit, speculating that it is closely linked to the origin of the quantum advantage. First, the output state in the DQC1 version of the original LPN algorithm (Fig. 2) has zero discord since for all PhysRevLett.105.190502 (). However, discord is generated when the controlled rotation is added. We calculate the amount of discord generated in our modified DQC1 circuit shown in Fig. 3 for various hidden functions, and it is observed to be different depending on . This feature coincides with the dependence of the trace of the total unitary operator on (see Eq. III.2), which plays the central role in our learning algorithm.
The discord is plotted as a function of for some selection of the hidden bit strings in Fig. 5. As the length of increases, the difference of the discords for and becomes smaller, similar to the behavior of . Moreover, the difference of the discords decreases with , consistent with the scaling of the number of queries required in terms of for a fixed accuracy. The inset shows the difference of the discords for and as a function of the controlled-rotation angle when and . The discord contrast with respect to resembles in that it is the largest when is an odd-integer multiple of and vanishes at the integer multiples of as the discord is zero regardless of at these points. The above studies suggest that the presence of non-zero discord and the discord contrast in different DQC1 circuits are crucial for our learning algorithm. Nonetheless, claiming quantum discord as the necessary resource for the DQC1-based binary classification in general is problematic, since one can easily come up with two unitary matrices that have distinct normalized traces but do not produce discord when implemented in the DQC1 circuit.
Alternatively, quantum coherence can be regarded as a resource, and it has been rigorously studied within the framework of quantum resource theory recently PhysRevLett.113.140401 (); PhysRevLett.116.120404 (); RevModPhys.89.041003 (). Evidently, the probe qubit must contain some amount of coherence as the minimal requirement for the DQC1 protocol 2058-9565-1-1-01LT01 (). The connection between coherence and discord in DQC1 is established in Ref. PhysRevLett.116.160407 (): the discord produced is upper-bounded by the coherence consumed by the probe qubit. Using the relative entropy of coherence as the quantifier PhysRevLett.113.140401 (), the coherence consumption in each execution of our DQC1 protocol can be expressed as
where is the binary Shannon entropy, and is the normalized trace of the total unitary operator acting on the data qubits controlled by the result qubit. This is monotonically decreasing with respect to for a fixed non-zero . Thus it appears that the DQC1 protocol is inherently capable of quantifying the consumption of the coherent resource supplied by one quantum bit. Furthermore, the magnitude of the partial derivative of with respect to () monotonically increases with the independent variable, meaning that is more sensitive to the changes in () when the independent variable is large. This feature is consistent with the computational complexity of our algorithm. By all means, the notion of the coherence consumption is purely quantum mechanical. Our algorithm is just set up in a way that the coherent resource used up in each query varies with the answer being probed. An interesting open question is whether manipulating the coherence consumption provides quantum advantage in solving problems other than those based on the trace estimation.
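The qualitative behavior described here can be explored numerically. The sketch below is a reconstruction under stated assumptions and not the expression of the cited reference: it takes the relative entropy of coherence of the probe qubit in the computational basis, $1-H_2[(1-r)/2]$ for an equatorial Bloch vector of length $r$, and defines the consumption as the probe coherence before the controlled unitary (length `alpha`) minus the probe coherence after it (length `alpha * tau_abs`, with `tau_abs` the magnitude of the normalized trace). All names are hypothetical.

```python
import math

def h2(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def probe_coherence(r):
    """Relative entropy of coherence of a qubit with equatorial Bloch vector length r."""
    return 1 - h2((1 - r) / 2)

def coherence_consumption(alpha, tau_abs):
    """Assumed form: probe coherence before minus after the controlled unitary."""
    return probe_coherence(alpha) - probe_coherence(alpha * tau_abs)

alpha = 0.5
grid = [i / 10 for i in range(11)]
consumed = [coherence_consumption(alpha, t) for t in grid]
```

In this model the consumption is largest when the trace vanishes and drops to zero when the trace has unit magnitude, in line with the monotonic dependence stated in the text.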
With only one quantum bit, an -bit hidden parity function can be identified, a task that is impossible classically. This situation arises when data qubits undergo the completely depolarizing channel in the original quantum LPN algorithm in Ref. LPNTheory (). The 1-qubit LPN algorithm is inspired by the DQC1 model. However, the naive translation of the original LPN algorithm to a DQC1 circuit does not solve the problem since the trace of the unitary matrix that encodes the hidden parity function is zero in instances. To circumvent the issue we introduced controlled uniform rotations so that the trace is either zero or non-zero depending on the hidden bit value encoded in the data qubit being probed. The additional operation can be viewed as if the final outcome includes the non-local measurement result between the result and the data qubit. The mere existence of non-zero quantum discord between the result and data qubits does not permit the learning. Instead, we conjectured that the discord contrast or, more fundamentally, the coherence consumption contrast is essential for the quantum advantage in our algorithm.
While efforts towards building standard quantum computers that fulfill what the theory of QIP promises continue, exploring weaker but more realistic quantum devices to solve interesting but classically hard problems is imperative. The LPN problem is one such problem in which the noisy quantum machine can shine. For the LPN problem, the ability to manipulate and measure the coherence consumed by one quantum bit suffices to demonstrate the quantum supremacy. This also motivates future studies on whether similar strategies can be utilized in the near-term quantum devices to perform other well-defined computational tasks beyond classical capabilities, and how much, if any, improvement can be achieved by utilizing coherence from more than one qubit.
Acknowledgements. We thank Sumin Lim for helpful discussions. This research was supported by the National Research Foundation of Korea (Grant no. NRF-2015R1A2A2A01006251, NRF-2016R1A5A1008184).
- (1) D. Nigg, M. Müller, E. A. Martinez, P. Schindler, M. Hennrich, T. Monz, M. A. Martin-Delgado, and R. Blatt, Science 345, 302 (2014).
- (2) T. H. Taminiau, J. Cramer, T. van der Sar, V. V. Dobrovitski, and R. Hanson, Nat. Nanotechnol. 9, 171 (2014).
- (3) J. Kelly, R. Barends, A. G. Fowler, A. Megrant, E. Jeffrey, T. C. White, D. Sank, J. Y. Mutus, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, I. C. Hoi, C. Neill, P. J. J. O’Malley, C. Quintana, P. Roushan, A. Vainsencher, J. Wenner, A. N. Cleland, and J. M. Martinis, Nature 519, 66 (2015).
- (4) D. Lu, K. Li, J. Li, H. Katiyar, A. J. Park, G. Feng, T. Xin, H. Li, G. Long, A. Brodutch, J. Baugh, B. Zeng, and R. Laflamme, npj Quantum Info. 3, 45 (2017).
- (5) D. Angluin and P. Laird, Mach. Learn. 2, 343 (1988).
- (6) A. Blum, A. Kalai, and H. Wasserman, J. ACM 50, 506 (2003).
- (7) V. Lyubashevsky, “The Parity Problem in the Presence of Noise, Decoding Random Linear Codes, and the Subset Sum Problem,” in Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, Lecture Notes in Computer Science, Vol. 3624 (Springer, Berlin, Heidelberg, 2005) pp. 378–389.
- (8) É. Levieil and P.-A. Fouque, “An Improved LPN Algorithm”, in Security and Cryptography for Networks. SCN 2006, Lecture Notes in Computer Science, Vol. 4116 (Springer, Heidelberg, Berlin, 2006) pp. 348–359.
- (9) O. Regev, in Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, STOC ’05 (ACM, New York, NY, USA, 2005) pp. 84–93.
- (10) K. Pietrzak, “Cryptography from Learning Parity with Noise,” in Theory and Practice of Computer Science. SOFSEM 2012, Vol. 7147 (Springer, Berlin, Heidelberg, 2012) pp. 99–114.
- (11) A. W. Cross, G. Smith, and J. A. Smolin, Phys. Rev. A 92, 012327 (2015).
- (12) D. Ristè, M. P. da Silva, C. A. Ryan, A. W. Cross, A. D. Córcoles, J. A. Smolin, J. M. Gambetta, J. M. Chow, and B. R. Johnson, npj Quantum Info. 3, 16 (2017).
- (13) E. Knill and R. Laflamme, Phys. Rev. Lett. 81, 5672 (1998).
- (14) A. Datta, S. T. Flammia, and C. M. Caves, Phys. Rev. A 72, 042316 (2005).
- (15) P. W. Shor and S. P. Jordan, Quantum Info. Comput. 8, 681 (2008).
- (16) P. J. Huber, Robust Statistics (Wiley, New York, 1981).
- (17) M. Boyer, A. Brodutch, and T. Mor, Phys. Rev. A 95, 022330 (2017).
- (18) A. Datta, A. Shaji, and C. M. Caves, Phys. Rev. Lett. 100, 050502 (2008).
- (19) L. Henderson and V. Vedral, J. Phys. A: Math. Gen. 34, 6899 (2001).
- (20) H. Ollivier and W. H. Zurek, Phys. Rev. Lett. 88, 017901 (2001).
- (21) B. Dakić, V. Vedral, and Č. Brukner, Phys. Rev. Lett. 105, 190502 (2010).
- (22) T. Baumgratz, M. Cramer, and M. B. Plenio, Phys. Rev. Lett. 113, 140401 (2014).
- (23) A. Winter and D. Yang, Phys. Rev. Lett. 116, 120404 (2016).
- (24) A. Streltsov, G. Adesso, and M. B. Plenio, Rev. Mod. Phys. 89, 041003 (2017).
- (25) J. M. Matera, D. Egloff, N. Killoran, and M. B. Plenio, Quantum Science and Technology 1, 01LT01 (2016).
- (26) J. Ma, B. Yadin, D. Girolami, V. Vedral, and M. Gu, Phys. Rev. Lett. 116, 160407 (2016).