Transfer learning in hybrid classical-quantum neural networks
Abstract
We extend the concept of transfer learning, widely applied in modern machine learning algorithms, to the emerging context of hybrid neural networks composed of classical and quantum elements. We propose different implementations of hybrid transfer learning, but we focus mainly on the paradigm in which a pre-trained classical network is modified and augmented by a final variational quantum circuit. This approach is particularly attractive in the current era of intermediate-scale quantum technology since it allows one to optimally pre-process high-dimensional data (e.g., images) with any state-of-the-art classical network and to embed a select set of highly informative features into a quantum processor. We present several proof-of-concept examples of the convenient application of quantum transfer learning for image recognition and quantum state classification. We use the cross-platform software library PennyLane to experimentally test a high-resolution image classifier with two different quantum computers, respectively provided by IBM and Rigetti.
I Introduction
Transfer learning is a typical example of an artificial intelligence technique that was originally inspired by biological intelligence. It originates from the simple observation that the knowledge acquired in a specific context can be transferred to a different area. For example, when we learn a second language we do not start from scratch, but we make use of our previous linguistic knowledge. Sometimes transfer learning is the only way to approach complex cognitive tasks, e.g., before learning quantum mechanics it is advisable to first study linear algebra. This general idea has also been successfully applied to the design of artificial neural networks Pratt (1993); Pan and Yang (2009); Torrey and Shavlik (2010). It has been shown Raina et al. (2007); Yosinski et al. (2014) that in many situations, instead of training a full network from scratch, it is more efficient to start from a pre-trained deep network and then optimize only some of the final layers for a particular task and dataset of interest (see Fig. 1).
The aim of this work is to investigate the potential of the transfer learning paradigm in the context of quantum machine learning Biamonte et al. (2017); Schuld et al. (2015); Dunjko et al. (2016). We focus on hybrid models Farhi and Neven (2018); Schuld and Killoran (2019); McClean et al. (2016), i.e., the scenario in which quantum variational circuits Peruzzo et al. (2014); Schuld et al. (2018); Perdomo-Ortiz et al. (2018); McClean et al. (2016); Sim et al. (2019); Killoran et al. (2018) and classical neural networks can be jointly trained to accomplish hard computational tasks. In this setting, in addition to the standard classical-to-classical (CC) transfer learning strategy in which some pre-acquired knowledge is transferred between classical networks, three new variants of transfer learning naturally emerge: classical to quantum (CQ), quantum to classical (QC) and quantum to quantum (QQ).
In the current era of Noisy Intermediate-Scale Quantum (NISQ) devices Preskill (2018), CQ transfer learning is particularly appealing since it opens the possibility of classically pre-processing large input samples (e.g., high-resolution images) with any state-of-the-art deep neural network and of subsequently manipulating a few, but highly informative, features with a variational quantum circuit. This scheme is quite convenient since it makes use of the power of quantum computers, combined with the successful and well-tested methods of classical machine learning. On the other hand, QC and QQ transfer learning might also be very interesting approaches, especially once large quantum computers become available. In this case, fixed quantum circuits might be pre-trained as generic quantum feature extractors, mimicking well-known classical models which are often used as pre-trained blocks: e.g., AlexNet Krizhevsky et al. (2012), ResNet He et al. (2016), Inception Szegedy et al. (2015), VGGNet Simonyan and Zisserman (2014), etc. (for image processing), or ULMFiT Howard and Ruder (2018), Transformer Vaswani et al. (2017), BERT Devlin et al. (2018), etc. (for natural language processing). In summary, such classical state-of-the-art deep networks can either be used in CC and CQ transfer learning or replaced by quantum circuits in the QC and QQ variants of the same technique.
Up to now, the transfer learning approach has been largely unexplored in the quantum domain with the exception of a few interesting applications, for example, in modeling many-body quantum systems Ch'ng et al. (2017); Huembeli et al. (2018); Zen et al. (2019), in the connection of a classical autoencoder to a quantum Boltzmann machine Piat et al. (2018) and in the initialization of variational quantum networks Verdon et al. (2019). With the present work we aim at developing a more general and systematic theory, specifically tailored to the emerging paradigms of variational quantum circuits and hybrid neural networks.
For all the models theoretically proposed in this work, proof-of-principle examples of practical implementations are presented and numerically simulated. Moreover, we also experimentally tested one of our models on physical quantum processors—ibmqx4 by IBM and Aspen-4-4Q-A by Rigetti—demonstrating for the first time the successful classification of high-resolution images with a quantum computer.
II Hybrid classical-quantum networks
Before presenting the main ideas of this work, we begin by reviewing basic concepts of hybrid networks and introduce some notation.
II.1 Classical neural networks
A very successful model in classical machine learning is that of deep feed-forward neural networks Goodfellow et al. (2016). The elementary block of a deep network is called a layer and maps input vectors of $n$ real elements to output vectors of $m$ real elements. Its typical structure consists of an affine operation followed by a non-linear function $\varphi$ applied element-wise,

$\mathcal{L}_{n \to m}: \; \mathbf{x} \mapsto \mathbf{y} = \varphi(W \mathbf{x} + \mathbf{b}).$   (1)

Here, the subscripts indicate the number of input and output variables, $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$ are the input and output vectors, $W$ is an $m \times n$ matrix and $\mathbf{b}$ is a constant vector of $m$ elements. The elements of $W$ and $\mathbf{b}$ are arbitrary real parameters (respectively known as weights and biases) which are supposed to be trained, i.e., optimized for a particular task. The non-linear function $\varphi$ is quite arbitrary but common choices are the hyperbolic tangent or the rectified linear unit, defined as $\mathrm{ReLU}(x) = \max(0, x)$.
A classical deep neural network is the concatenation of many layers, in which the output of the first is the input of the second and so on:

$\mathcal{C} = \mathcal{L}_{n_{d-1} \to n_d} \circ \cdots \circ \mathcal{L}_{n_1 \to n_2} \circ \mathcal{L}_{n_0 \to n_1},$   (2)

where different layers have different weights. Characteristic hyper-parameters of a deep network are its depth $d$ (number of layers) and the number of features (number of variables) for each layer, i.e., the sequence of integers $n_0, n_1, \dots, n_d$.
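As a concrete illustration, Eqs. (1) and (2) can be sketched in a few lines of NumPy; the feature sizes ($3 \to 5 \to 2$), the random weights, and the choice $\varphi = \tanh$ are placeholders, not values from this work.

```python
import numpy as np

def layer(x, W, b, phi=np.tanh):
    # L_{n->m} of Eq. (1): affine map followed by an elementwise nonlinearity
    return phi(W @ x + b)

def deep_network(x, params, phi=np.tanh):
    # Concatenation of layers, Eq. (2); `params` is a list of (W, b) pairs
    for W, b in params:
        x = layer(x, W, b, phi)
    return x

# Toy network with feature sizes n0 = 3, n1 = 5, n2 = 2 and random weights
rng = np.random.default_rng(0)
params = [(rng.normal(size=(5, 3)), rng.normal(size=5)),
          (rng.normal(size=(2, 5)), rng.normal(size=2))]
y = deep_network(np.ones(3), params)
```

Here the trainable parameters are simply the entries of each `(W, b)` pair; a real implementation would update them by gradient descent on a task-specific loss.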
II.2 Variational quantum circuits
One of the possible quantum generalizations of feed-forward neural networks can be given in terms of variational quantum circuits Farhi and Neven (2018); Schuld and Killoran (2019); McClean et al. (2016); Peruzzo et al. (2014); Schuld et al. (2018); Perdomo-Ortiz et al. (2018); McClean et al. (2016); Sim et al. (2019); Killoran et al. (2018). Following the analogy with the classical case, one can define a quantum layer as a unitary operation which can be physically realized by a low-depth variational circuit acting on the input state $|x\rangle$ of $n_q$ quantum subsystems (e.g., qubits or continuous-variable modes) and producing the output state $|y\rangle$:

$\mathcal{L}: \; |x\rangle \mapsto |y\rangle = L(\mathbf{w})\, |x\rangle,$   (3)

where $\mathbf{w}$ is an array of classical variational parameters. Examples of quantum layers could be: a sequence of single-qubit rotations followed by a fixed sequence of entangling gates Schuld et al. (2018); Sim et al. (2019) or, for the case of optical modes, some active and passive Gaussian operations followed by single-mode non-Gaussian gates Killoran et al. (2018). Notice that, differently from a classical layer, a quantum layer preserves the Hilbert-space dimension of the input states. This fact is due to the fundamental unitary nature of quantum mechanics and, as discussed at the end of this section, should be taken into account when designing quantum networks.
A variational quantum circuit of depth $q$ is a concatenation of many quantum layers, corresponding to the product of many unitaries parametrized by different weights:

$\mathcal{Q} = \mathcal{L}_{\mathbf{w}_q} \circ \cdots \circ \mathcal{L}_{\mathbf{w}_2} \circ \mathcal{L}_{\mathbf{w}_1}.$   (4)
In order to inject classical data into a quantum network we need to embed a real vector $\mathbf{x}$ into a quantum state $|x\rangle$. This can also be done by a variational embedding layer depending on $\mathbf{x}$ and applied to some reference state $|0\rangle$ (e.g., the vacuum or ground state),

$\mathcal{E}: \; \mathbf{x} \mapsto |x\rangle = E(\mathbf{x})\, |0\rangle.$   (5)

Typical examples are single-qubit rotations or single-mode displacements parametrized by $\mathbf{x}$. Notice that, differently from $\mathcal{L}$, the embedding layer $\mathcal{E}$ is a map from a classical vector space to a quantum Hilbert space.
Conversely, the extraction of a classical output vector $\mathbf{y}$ from the quantum circuit can be obtained by measuring the expectation values of local observables $\hat{y}_k$. We can define this process as a measurement layer, mapping a quantum state to a classical vector:

$\mathcal{M}: \; |x\rangle \mapsto \mathbf{y}, \qquad y_k = \langle x |\, \hat{y}_k \,| x \rangle.$   (6)
Globally, the full quantum network, including the initial embedding layer and the final measurement, can be written as

$\bar{\mathcal{Q}} = \mathcal{M} \circ \mathcal{Q} \circ \mathcal{E}: \; \mathbf{x} \mapsto \mathbf{y}.$   (7)
The full network $\bar{\mathcal{Q}}$ is a map from a classical vector space to a classical vector space, depending on the classical weights $\mathbf{w}$. Therefore, even though it may contain a quantum computation hidden in the quantum circuit, if considered from a global point of view, $\bar{\mathcal{Q}}$ is simply a black box analogous to the classical deep network defined in Eq. (2).
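The black-box pipeline $\bar{\mathcal{Q}} = \mathcal{M} \circ \mathcal{Q} \circ \mathcal{E}$ of Eq. (7) can be simulated directly at the statevector level. The two-qubit toy below is only a sketch: the specific gate choices (Hadamard plus $R_y$ embedding, $R_y$ rotations plus a CNOT per layer, $\sigma_z$ readout) anticipate the circuits used later in the Examples, but are otherwise arbitrary.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard gate
Z = np.diag([1.0, -1.0])                               # Pauli sigma_z
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], float)

def ry(t):
    # Rotation around the y axis of the Bloch sphere (real-valued matrix)
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def embedding(x):
    # E of Eq. (5): classical 2-vector -> quantum state E(x)|00>
    psi = np.zeros(4); psi[0] = 1.0
    return np.kron(ry(x[0]) @ H, ry(x[1]) @ H) @ psi

def quantum_layer(psi, w):
    # L_w of Eq. (3): single-qubit rotations followed by an entangling CNOT
    return CNOT @ np.kron(ry(w[0]), ry(w[1])) @ psi

def measurement(psi):
    # M of Eq. (6): expectation value of sigma_z on each qubit
    return np.array([psi @ np.kron(Z, np.eye(2)) @ psi,
                     psi @ np.kron(np.eye(2), Z) @ psi])

def bare_circuit(x, weights):
    # Q-bar of Eq. (7): classical vector in, classical vector out
    psi = embedding(x)
    for w in weights:
        psi = quantum_layer(psi, w)
    return measurement(psi)

y = bare_circuit([0.3, -0.7], [[0.1, 0.4], [1.2, -0.5]])
```

From the outside, `bare_circuit` is just a parametrized function from $\mathbb{R}^2$ to $\mathbb{R}^2$, which is exactly the black-box view described above.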
However, especially when dealing with real NISQ devices, there are technical limitations and physical constraints which should be taken into account: while in the classical feed-forward network of Eq. (2) we have complete freedom in the choice of the number of features for each layer, in the quantum network of Eq. (7) all these numbers are often linked to the size of the physical system. For example, even if not strictly necessary, typical variational embedding layers encode each classical element of $\mathbf{x}$ into a single subsystem and so, in many practical situations, one has

$n_{\rm in} = n_{\rm out} = n_q.$   (8)
This common constraint of a variational quantum network could be overcome by:

1. adding ancillary subsystems and discarding/measuring some of them in the middle of the circuit;

2. engineering more complex embedding and measuring layers;

3. adding pre-processing and post-processing classical layers.
In this work, mainly because of its technical simplicity, we choose the third option and we formalize it through the notion of dressed quantum circuits introduced in the next subsection.
II.3 Dressed quantum circuits
In order to apply transfer learning at the classicalquantum interface, we need to connect classical neural networks to quantum variational circuits. Since in general the size of the classical and quantum networks can be very different, it is convenient to use a more flexible model of quantum circuits.
Let us consider the variational circuit $\bar{\mathcal{Q}}$ defined in Eq. (7) and based on $n_q$ subsystems. With the aim of adding some basic pre-processing and post-processing of the input and output data, we place a classical layer at the beginning and at the end of the quantum network, obtaining what we might call a dressed quantum circuit:

$\tilde{\mathcal{Q}} = \mathcal{L}_{n_q \to n_{\rm out}} \circ \bar{\mathcal{Q}} \circ \mathcal{L}_{n_{\rm in} \to n_q},$   (9)

where $\mathcal{L}$ is given in Eq. (1) and $\bar{\mathcal{Q}}$ is the associated bare quantum circuit defined in Eq. (7). Differently from a complex hybrid network in which the computation is shared between cooperating classical and quantum processors, in this case the main computation is performed by the quantum circuit $\bar{\mathcal{Q}}$, while the classical layers are mainly responsible for the data embedding and readout. A similar hybrid model was studied in Benedetti et al. (2018), but applied to a generative quantum Helmholtz machine.
We can say that from a hardware point of view a dressed quantum circuit is almost equivalent to a bare one. On the other hand, it has two important advantages:

1. the two classical layers can be trained to optimally perform the embedding of the input data and the post-processing of the measurement results;

2. the number of input and output variables is independent of the number of subsystems, allowing for flexible connections to other classical or quantum networks.
Even if our main motivation for introducing the notion of dressed quantum circuits is a smoother implementation of transfer learning schemes, this is also a quite powerful machine learning model in itself and constitutes a nontrivial contribution of this work. In the Examples section, a dressed quantum circuit is successfully applied to the classification of a nonlinear benchmark dataset (2D spirals).
III Transfer learning
In this section we discuss the main topic of this work, i.e., the idea of transferring some pre-acquired "knowledge" between two networks, say from network $A$ to network $B$, where each of them could be either classical or quantum.
As discussed in the previous section, if considered as a black box, the global structure of a quantum variational circuit is similar to that of a classical network (see Eqs. (7), (9) and (2)).
For this reason, we are going to define the transfer learning scheme in terms of two generic networks and , independently from their classical or quantum physical nature.
Generic transfer learning scheme (see Fig. 1):

1. Take a network $A$ that has been pre-trained on a dataset $D_A$ and for a given task $T_A$.

2. Remove some of the final layers. In this way, the resulting truncated network $A'$ can be used as a feature extractor.

3. Connect a new trainable network $B$ at the end of the pre-trained network $A'$.

4. Keep the weights of $A'$ constant, and train the final block $B$ with a new dataset $D_B$ and for a new task of interest $T_B$.
Following the common convention used in classical machine learning Pratt (1993); Pan and Yang (2009); Torrey and Shavlik (2010); Raina et al. (2007); Yosinski et al. (2014), all situations in which there is a change of dataset ($D_B \neq D_A$) and/or a change of the final task ($T_B \neq T_A$) can be identified as transfer learning methods. The general intuition behind this training approach is that, even if $A$ has been optimized for a specific problem, it can still act as a convenient feature extractor for a different problem. This trick is improved by truncating the final layers of $A$ (step 2), since the final activations of a network are usually more tuned to the specific problem, while intermediate features are more generic and so more suitable for transfer learning.
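The four steps above can be sketched with a toy NumPy model. Everything here is a placeholder: random weights stand in for a genuinely pre-trained network $A$, and a small least-squares problem stands in for the new task $T_B$; only the freeze-and-retrain pattern is the point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: a "pretrained" two-layer network A (random weights stand in
# for weights learned on an original dataset D_A).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)  # final layer of A

def A_truncated(x):
    # Step 2: drop the final layer; A' acts as a generic feature extractor.
    return np.tanh(W1 @ x + b1)

# Step 3: connect a new trainable block B (a single linear layer, 8 -> 2).
Wb = np.zeros((2, 8))

# Step 4: train only B on the new task; the weights of A' stay frozen,
# so the frozen features can be computed once, outside the loop.
X = rng.normal(size=(32, 4))
Y = np.stack([X[:, 0] > 0, X[:, 0] <= 0], axis=1).astype(float)
feats = np.array([A_truncated(x) for x in X])
lr = 0.1
for _ in range(200):
    grad = 2 * (feats @ Wb.T - Y).T @ feats / len(X)  # gradient w.r.t. Wb only
    Wb -= lr * grad
loss = float(np.mean((feats @ Wb.T - Y) ** 2))
```

In a deep-learning framework the same pattern is usually obtained by disabling gradients on the pre-trained parameters and attaching a fresh head, but the frozen-features view above makes the division of labor explicit.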
In our hybrid setting, the fact that the networks and can be either classical or quantum gives rise to a rich variety of hybrid transfer learning models summarized in Table 1.
A  B  Transfer learning scheme
Classical  Classical  CC Pratt (1993); Pan and Yang (2009); Torrey and Shavlik (2010); Raina et al. (2007); Yosinski et al. (2014)
Classical  Quantum  CQ (Examples 2 and 3)
Quantum  Classical  QC (Example 4)
Quantum  Quantum  QQ (Example 5)
For the reader familiar with quantum communication theory, this kind of classification might look similar to that of hybrid channels in which information can be exchanged between quantum and classical systems. Here however there is a fundamental difference: what is actually transferred in this case is not raw information but some more structured and organized learned representations. We expect that the problem of transferring structured knowledge between systems governed by different physical laws (classical/quantum) could stimulate many interesting foundational and philosophical questions. The aim of the present work is however much more pragmatic and consists of studying practical applications of this idea.
III.1 Classical to quantum transfer learning
As discussed in the introduction, the CQ transfer learning approach is perhaps the most appealing one in the current technological era of NISQ devices. Indeed, today we are in a situation in which intermediate-scale quantum computers are approaching the quantum supremacy milestone Harrow and Montanaro (2017); Arute (2019) and, at the same time, we have at our disposal the very successful and well-tested tools of classical deep learning. The latter are universally recognized as the best-performing machine learning algorithms, especially for image and text processing.
In this classical field, transfer learning is already a very common approach, thanks to the large zoo of pre-trained deep networks which are publicly available Canziani et al. (2016). CQ transfer learning consists of using exactly those classical pre-trained models as feature extractors and then post-processing such features on a quantum computer, for example by using them as input variables for the dressed quantum circuit model introduced in Eq. (9). This hybrid approach is very convenient for processing high-resolution images since, in this configuration, a quantum computer is applied only to a fairly limited number of abstract features, which is much more feasible than embedding millions of raw pixels in a quantum system. We would like to mention that other alternative approaches for dealing with large images have also been recently proposed Piat et al. (2018); Henderson et al. (2019); Shiba et al. (2019); Liu et al. (2019).
We applied our model for the task of image classification in several numerical examples and we also tested the algorithm with two real quantum computers provided by IBM and Rigetti. All the details about the technical implementation and the associated results are reported in the next Section, Examples 2 and 3.
III.2 Quantum to classical transfer learning
By switching the roles of the classical and quantum networks, one can also obtain the QC variant of transfer learning. In this case a pre-trained quantum system behaves as a kind of feature extractor, i.e., a device performing a (potentially classically intractable) computation resulting in an output vector of numerical values associated to the input. As a second step, a classical network is used to further process the extracted features for the specific problem of interest. This scheme can be very useful in two important situations: (i) if the dataset consists of quantum states (e.g., in a state classification problem); (ii) if we have at our disposal a very good quantum computer which outperforms current classical feature extractors at some task.
For case (i), one can imagine a situation in which a single instance of a variational quantum circuit is first pre-trained and then used as a kind of multi-purpose measurement device. Indeed, one could make many different experimental analyses by simply letting input quantum systems pass through the same fixed circuit and applying different classical machine learning algorithms to the associated measured variables.
For case (ii), instead, one can envisage a multi-party scenario in which many classical clients can independently send samples of their specific datasets to a common quantum server, which is pre-trained to extract generic features by performing a fixed quantum computation. The server can send back the resulting features to the classical clients, which can now locally train their specific machine learning models on pre-processed data.
Given the current status of quantum technology, case (ii) is likely beyond a near-term implementation. On the other hand, case (i) could already represent a realistic scenario with current technology.
In Example 4 of the next Section, we present a proof-of-concept example in which a pre-trained quantum network introduced in Ref. Killoran et al. (2018) is combined with a classical post-processing network for solving a quantum state classification problem.
III.3 Quantum to quantum transfer learning
The last possibility is the QQ transfer learning scheme, where the same technique is applied in a fully quantum mechanical fashion. In this case a quantum network is pretrained for a generic task and dataset. Successively, some of the final quantum layers are removed, and replaced by a trainable quantum network which will be optimized for a specific problem. The main difference from the previous cases is that, since the process is fully quantum without intermediate measurements, features are implicitly transferred in the form of a quantum state, allowing for coherent superpositions.
The main motivation for applying a QQ transfer learning scheme is to reduce the total training time: instead of training a large variational quantum circuit, it is more efficient to initialize it with some pretrained weights and then optimize only a couple of final layers. From a physical point of view, such optimization of the final layers could be interpreted as a change of the measurement basis which is tuned to the specific problem of interest.
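In code, this warm-start strategy amounts to slicing a pre-trained parameter set and marking only the appended layers as trainable. The sketch below is purely illustrative: the depths (6 pre-trained layers, 4 kept, 2 new) and the four-parameters-per-layer shape are assumptions, not values from this work.

```python
import numpy as np

rng = np.random.default_rng(2)

# Weights of a pretrained depth-6 variational circuit (one array per layer).
pretrained = [rng.normal(size=4) for _ in range(6)]

keep = 4        # layers of A kept after truncation (the block A')
new_layers = 2  # fresh layers B appended and optimized for the new task

# Initialize the full circuit: frozen pretrained block + small random new block.
weights = pretrained[:keep] + [0.01 * rng.normal(size=4) for _ in range(new_layers)]
trainable = [False] * keep + [True] * new_layers
# During training, the optimizer updates only layers with trainable[i] == True,
# so the pretrained block A' stays frozen while the final layers B are tuned.
```

Since each gradient component on hardware costs extra circuit evaluations, restricting the trainable set in this way directly reduces the number of quantum experiments per optimization step.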
If compared with classical computers, current NISQ devices are not only noisy and small: they are also relatively slow. Training a quantum circuit might take a long time since it requires taking many measurement shots (i.e., performing a large number of actual quantum experiments) for each optimization step (e.g., for computing the gradient). Therefore any approach which can reduce the total training time, as for example the QQ transfer learning scheme, could be very helpful.
In Example 5 of the next Section, we trained a quantum state classifier by following a QQ transfer learning approach.
IV Examples
Example 1  A 2D classifier based on a dressed quantum circuit
This first example demonstrates the dressed quantum circuit model introduced in Eq. (9).
We consider a typical benchmark dataset consisting of two classes of points (blue and red) organized in two concentric spirals as shown in Fig. 2. Each point is characterized by two real coordinates and we assume to have at our disposal a quantum processor of 4 qubits. Since we have two real coordinates as input and two real variables as output (onehot encoding the blue and red classes), we use the following model of a dressed quantum circuit:
(10) 
where represents a classical layer having the structure of Eq. (1) with , is a (bare) variational quantum circuit, and is a linear classical layer without activation i.e., with . The structure of the variational circuit is as in Eq. (7). The chosen embedding map prepares each qubit in a balanced superposition of and and then performs a rotation around the axis of the Bloch sphere parametrized by a classical vector :
(11) 
where $H$ is the single-qubit Hadamard gate. The trainable circuit $\mathcal{Q}$ is composed of 5 variational layers $\mathcal{L}_{\mathbf{w}_1}, \dots, \mathcal{L}_{\mathbf{w}_5}$, where

$\mathcal{L}_{\mathbf{w}}: \; |x\rangle \mapsto |y\rangle = \Big( \bigotimes_{k=1}^{4} R_y(w_k) \Big) K\, |x\rangle,$   (12)

and $K$ is an entangling unitary operation made of three controlled-NOT gates:

$K = \mathrm{CNOT}_{3,4}\, \mathrm{CNOT}_{2,3}\, \mathrm{CNOT}_{1,2}.$   (13)
Finally, the measurement layer is simply given by the expectation value of the Pauli matrix $\sigma_z$, locally estimated for each qubit:

$\mathcal{M}: \; |x\rangle \mapsto \mathbf{y} = \big( \langle x|\sigma_z^{(1)}|x\rangle, \dots, \langle x|\sigma_z^{(4)}|x\rangle \big).$   (14)
Given an input point of coordinates $(x_1, x_2)$, the classification is done according to $\mathrm{argmax}(y_1, y_2)$, where $(y_1, y_2)$ is the output of the dressed quantum circuit (10).
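Putting Eqs. (10)-(14) together, the whole (untrained) model can be simulated with a dense 16-dimensional statevector. This is a sketch only: random weights stand in for trained ones, and the $\pi/2$ rescaling inside the embedding follows the convention of Eq. (11) as reconstructed here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_q = 4  # number of qubits

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], float)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def kron_all(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

# Entangler K of Eq. (13): CNOT gates on neighbouring qubit pairs
K = np.eye(2 ** n_q)
for k in range(n_q - 1):
    K = kron_all([I2] * k + [CNOT] + [I2] * (n_q - 2 - k)) @ K

def bare_circuit(x, qweights):
    psi = np.zeros(2 ** n_q); psi[0] = 1.0
    psi = kron_all([ry(np.pi / 2 * xi) @ H for xi in x]) @ psi  # embedding, Eq. (11)
    for w in qweights:                                          # layers, Eq. (12)
        psi = kron_all([ry(wi) for wi in w]) @ K @ psi
    obs = [kron_all([I2] * k + [Z] + [I2] * (n_q - 1 - k)) for k in range(n_q)]
    return np.array([psi @ o @ psi for o in obs])               # readout, Eq. (14)

def dressed_circuit(point, params):
    Win, b_in, Wout, b_out, qw = params
    x = np.tanh(Win @ point + b_in)              # classical layer L_{2->4}
    return Wout @ bare_circuit(x, qw) + b_out    # linear layer L_{4->2}

params = (rng.normal(size=(4, 2)), rng.normal(size=4),
          rng.normal(size=(2, 4)), rng.normal(size=2),
          rng.normal(size=(5, n_q)))             # quantum depth 5, as in the text
out = dressed_circuit(np.array([0.2, -1.0]), params)
label = int(np.argmax(out))                      # predicted class (untrained here)
```

Training would then minimize a cross-entropy loss over all entries of `params`, classical and quantum alike, exactly as for an ordinary network.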
For training and testing the model, the dataset has been divided into 2000 training points (pale-colored in Fig. 2) and 200 test points (sharp-colored in Fig. 2). As is typical in classification problems, the cross entropy (implicitly preceded by a LogSoftMax layer) was used as a loss function and minimized via the Adam optimizer Kingma and Ba (2014). A total of 1000 training iterations were performed, each of them with a batch size of 10 input samples. The numerical simulation was done through the PennyLane software platform Bergholm et al. (2018).
The results are reported in Fig. 2, where the dressed quantum network is also compared with an entirely classical counterpart in which the quantum circuit is replaced by a classical layer $\mathcal{L}_{4 \to 4}$, together with the corresponding accuracy of each model, i.e., the fraction of test points correctly classified.
The presented results suggest that a dressed quantum circuit is a very flexible quantum machine learning model which is capable of classifying highly nonlinear datasets. We would like to remark that the classical counterpart has been presented just as a qualitative benchmark: even if in this particular example the quantum model outperforms the classical one, any general and rigorous comparison would require a much more complex and detailed analysis which is beyond the aim of this work.
Example 2  CQ transfer learning for image classification (ants / bees)
In this second example we apply the classical-to-quantum transfer learning scheme to solve an image classification problem. We first numerically trained and tested the model, using PennyLane with the PyTorch Paszke et al. (2017) interface. Subsequently, we also ran it on two real quantum devices provided by IBM and Rigetti. To our knowledge, this is the first time that high-resolution images have been successfully classified by a quantum computer.
Our example is a quantum model inspired by the official PyTorch tutorial on classical transfer learning. The model can be defined in terms of the general CQ scheme proposed in Section III and represented in Fig. 1, with the following specific settings:
$D_A$ = ImageNet: a public image dataset with 1000 classes Deng et al. (2009).

$A$ = ResNet18: a pre-trained residual neural network introduced by Microsoft in 2016 He et al. (2016).

$T_A$ = Classification (1000 labels).

$A'$ = ResNet18 without the final linear layer, obtaining a pre-trained extractor of 512 features.

$D_B$ = Images of two classes: ants and bees (Hymenoptera subset of ImageNet), separated into a training set of 245 images and a testing set of 153 images.

$B$ = $\mathcal{L}_{4 \to 2} \circ \bar{\mathcal{Q}} \circ \mathcal{L}_{512 \to 4}$: i.e., a 4-qubit dressed quantum circuit (9) with 512 input features and 2 real outputs.

$T_B$ = Classification (2 labels).
The bare variational circuit $\bar{\mathcal{Q}}$ is essentially the same as the one used in the previous example (see Eqs. (11,12,13,14)), with the only difference that in this case the quantum depth is set to 6. The cross entropy is used as a loss function and minimized via the Adam optimizer Kingma and Ba (2014). We trained the variational parameters of the model for 30 epochs over the training dataset, with a batch size of 4 and an initial learning rate of 0.0004, which was successively reduced every 10 epochs. After each epoch, the model was validated with respect to the test dataset, obtaining a maximum accuracy of 0.976. A visual representation of a random batch of images sampled from the test dataset and the corresponding predictions is given in Fig. 3.
We also tested the model (with the same pre-trained parameters) on two different real quantum computers: the ibmqx4 processor by IBM and the Aspen-4-4Q-A processor by Rigetti (see Fig. 4). The corresponding classification accuracies, evaluated on the same test dataset, are reported in Table 2.
QPU  Accuracy
Simulator  0.976
ibmqx4
Aspen-4-4Q-A
Our results demonstrate the promising potential of the CQ transfer learning scheme applied to current NISQ devices, especially in the context of highresolution image processing.
Example 3  CQ transfer learning for image classification (CIFAR)
We now apply the same CQ transfer learning scheme of the previous example but with a different dataset $D_B$. Instead of classifying images of ants and bees, we use the standard CIFAR-10 dataset Krizhevsky and Hinton (2009) restricted to the classes of cats and dogs. We then repeat the training and testing phases with the CIFAR-10 dataset restricted to the classes of planes and cars (see Fig. 5).
We remark that, in both cases, the feature extractor ResNet18 is pre-trained on ImageNet. Despite CIFAR-10 and ImageNet being quite different datasets (they also have very different resolutions), the CQ transfer learning method nonetheless achieves relatively good results.
  ants/bees  dogs/cats  planes/cars
Quantum depth  6  5  4
Number of epochs  30  3  3
Batch size  4  8  8
Learning rate  0.0004  0.001  0.0007
Accuracy  0.976  0.8270  0.9605
Example 4  QC transfer learning for quantum state classification
Quantum-to-classical (QC) transfer learning consists of using a pre-trained quantum circuit as a feature extractor and of post-processing its output variables with a classical neural network. In this case, only the final classical part is trained on the specific problem of interest.
The starting point of our example is the pre-trained continuous-variable quantum network presented in Ref. Killoran et al. (2018), Section IV.D, Experiment C. The original aim of this network was to encode different images, representing the (L,O,T,I,S,J,Z) tetrominos (popularized by the video game Tetris), in the Fock basis of two-mode quantum states. The expected input of the quantum network is one of the following combinations of two-mode coherent states:
$|\psi_\ell\rangle, \qquad \ell \in \{L, O, T, I, S, J, Z\},$   (15)

i.e., seven fixed two-mode combinations of coherent states of amplitudes $\pm\alpha$ (one for each tetromino class), where the parameter $\alpha$ is a fixed constant. In Ref. Killoran et al. (2018) the network was successfully trained to generate an optimal unitary operation such that the probability of finding $n$ photons in the first mode and $m$ photons in the second mode is proportional to the amplitude of the corresponding image pixel $(n, m)$. More precisely, the network was trained to reproduce the tetromino images after projecting the quantum state on the subspace of up to 3 photons (see Fig. 6).
For the purposes of our example, we now assume that the previous input states (15) are subject to random Gaussian displacements in phase space:

$|\psi_\ell\rangle \mapsto D(\delta_1) \otimes D(\delta_2)\, |\psi_\ell\rangle,$   (16)

where $D(\delta_1) \otimes D(\delta_2)$ is a two-mode displacement operator Weedbrook et al. (2012), the values of the complex displacements $\delta_1$ and $\delta_2$ are sampled from a symmetric Gaussian distribution with zero mean and fixed quadrature variance, and $\ell$ is the label associated to the input states (15). The noise is similar to a Gaussian additive channel Weedbrook et al. (2012); however, to simplify the numerical simulation, here we assume that the unknown displacements remain constant during the estimation of expectation values. Physically, this situation might represent a slow phase-space drift of the input light mode.
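Sampling the random displacements of Eq. (16) is straightforward. Note that the exact scaling between the sampled variance and the quadrature variance depends on the quadrature convention ($\hbar$ normalization), so the $1/2$ factor below is one possible choice, not necessarily the one used in the original simulation; the variance 0.6 and batch size 7 are taken from Table 4.

```python
import numpy as np

def random_displacements(batch, var, rng):
    """Sample complex displacement amplitudes delta = (dx + i*dp) / 2 whose
    quadrature shifts dx, dp are i.i.d. zero-mean Gaussians of variance `var`.
    The 1/2 factor assumes an hbar = 2 convention and is illustrative."""
    dx = rng.normal(0.0, np.sqrt(var), size=batch)
    dp = rng.normal(0.0, np.sqrt(var), size=batch)
    return (dx + 1j * dp) / 2.0

rng = np.random.default_rng(4)
deltas = random_displacements(7, 0.6, rng)  # one displacement per state in a batch
```

Each sampled `delta` would then parametrize one displacement operator $D(\delta)$ applied to the corresponding input mode before the state is fed to the pre-trained circuit.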
We also assume that, differently from the original image encoding problem studied in Ref. Killoran et al. (2018), our new task is to classify the noisy input states. In other words, the network should take the states defined in (16) as inputs, and should ideally produce the correct label as output. In order to tackle this problem, we apply a QC transfer learning approach: we preprocess our random input states with the quantum network of Ref. Killoran et al. (2018) and we consider the corresponding images as features which we are going to postprocess with a classical layer to predict the state label . In simple terms, the QC transfer learning method allows us to convert a quantum state classification problem into an image classification problem.
Also in this case we can summarize the transfer learning scheme according to the notation introduced in Section III and represented in Fig. 1:
$D_A$ = two-mode coherent states defined in Eq. (15).

$A$ = Photonic neural network introduced in Ref. Killoran et al. (2018), consisting of an encoding layer, 25 variational layers, and a final (Fock) measurement layer.

$T_A$ = Fock-basis encoding of tetromino images (see Fig. 6).

$A'$ = Pre-trained network $A$, truncated to a quantum depth of 15 variational layers.

$D_B$ = Same states of the original dataset $D_A$, but subject to random phase-space displacements as described in Eq. (16).

$B$ = a classical linear layer having the structure of Eq. (1), without activation ($\varphi = \mathrm{id}$).
Also in this case we used the Adam optimizer Kingma and Ba (2014) to minimize a cross-entropy loss function associated to our classification problem. For each optimization step we sampled independent random displacements, with variance 0.6, which we applied to a batch of the 7 states defined in Eq. (15). We optimized the model over 1000 training batches with a learning rate of 0.01, obtaining a classification accuracy of 0.803. The numerical simulation was performed with the Strawberry Fields software platform Killoran et al. (2019), combined with the TensorFlow Abadi et al. (2016) optimization backend.
A summary of the hyperparameters and of the corresponding accuracy is given in Table 4.
QC Classifier
Quantum depth  15
Classical depth  1
Noise variance  0.6
Training batches  1000
Batch size  7
Learning rate  0.01
Fock-space cutoff  11
Accuracy  0.803
Finally, the predictions for a sample of 7 noisy states are graphically visualized in Fig. 7, where the features extracted by the pre-trained quantum network are represented as gray-scale images. The features of Fig. 7 are quite different from the original tetromino images shown in Fig. 6. This is due to the truncation of network $A$ and to the presence of input noise. However, as long as the images of Fig. 7 are distinguishable, this is not a relevant issue, since the final classical layer is still able to correctly classify the input states with high accuracy.
We conclude this example with an analysis of the model performance with respect to the values of the quantum and classical depths. Since the original pretrained network A has 25 quantum layers, for the truncated network A' we can choose a quantum depth q within the interval 0-25. For the classical network B we consider classical depths of 1, 2 and 3 layers, obtained by concatenating the corresponding number of linear layers of Eq. (1).
The results are shown in Fig. 8. By direct inspection we can see that increasing the classical depth is helpful, but the accuracy saturates already after two layers. On the other hand, it is evident that the quantum depth has an optimal value around q = 15, while for larger values the accuracy is reduced. This is a paradigmatic phenomenon well known in classical transfer learning: better features are usually extracted after removing some of the final layers of A. Notice that, because of the quantum nature of the system, the quantum state produced by the truncated variational circuit could be entangled and/or not aligned with the measurement basis. The numerical evidence that truncating a quantum network does not always reduce the quality of the measured features, but can actually be a convenient strategy for transfer learning, is therefore a notable result.
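The layer-slicing behind the truncated network A' can be illustrated schematically; in this toy NumPy sketch the 25 pretrained quantum layers are replaced by generic nonlinear maps acting on vectors, so it captures only the truncation logic, not the quantum dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the 25 pretrained variational layers: toy affine maps
# followed by a tanh nonlinearity (illustrative only).
weights = [rng.normal(size=(4, 4)) * 0.1 for _ in range(25)]
layers = [lambda v, W=W: np.tanh(W @ v) for W in weights]

def truncated_network(v, q):
    """A' = the first q layers of the pretrained network A (0 <= q <= 25)."""
    for layer in layers[:q]:
        v = layer(v)
    return v

x = rng.normal(size=4)
features = truncated_network(x, q=15)  # quantum depth 15, as in Table 4
```

Sweeping q from 0 to 25 and retraining only the classical head at each value reproduces the kind of depth analysis shown in Fig. 8.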
Example 5: QQ transfer learning for quantum state classification (Gaussian / non-Gaussian)
Finally, our last example is a proof-of-principle demonstration of QQ transfer learning. In this case we train an optical network A to classify a particular dataset D_A of Gaussian and non-Gaussian quantum states. Subsequently, we use it as a pretrained block for a new dataset D_B, consisting of Gaussian and non-Gaussian states which are different from those of D_A. The pretrained block A' is followed by some quantum variational layers B, which are trained to classify the quantum states of D_B.
Before presenting our model we need to define a continuous-variable single-mode variational layer, the analog of Eq. (12). We follow the general structure proposed in Ref. Killoran et al. (2018):

L = Φ ∘ D ∘ S ∘ R,   (17)

where R is a phase-space rotation, S is a squeezing operation, D is a displacement, and Φ is a cubic phase gate. All operations depend on variational parameters and, for a sufficient number of layer applications, the model can generate any single-mode unitary operation. Moreover, by simply removing the last non-Gaussian gate Φ from Eq. (17), we obtain a Gaussian layer which can generate all Gaussian unitary operations.
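To make the Gaussian part of the layer concrete, the sketch below applies the rotation, squeezing and displacement gates to the phase-space mean vector (x, p) of a single mode. The convention used here is an assumption (ħ = 2, so that a displacement by α shifts x by 2 Re α, as in common photonics conventions); the cubic phase gate is omitted because its action on the means is not linear:

```python
import numpy as np

def rotation(xp, theta):
    """Heisenberg action of R(theta) on the quadrature means (x, p)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * xp[0] - s * xp[1], s * xp[0] + c * xp[1]])

def squeezing(xp, r):
    """S(r): squeeze x by e^{-r}, anti-squeeze p by e^{r}."""
    return np.array([np.exp(-r) * xp[0], np.exp(r) * xp[1]])

def displacement(xp, alpha):
    """D(alpha): shift the means by (2 Re alpha, 2 Im alpha) for hbar = 2."""
    return xp + np.array([2 * alpha.real, 2 * alpha.imag])

def gaussian_layer(xp, theta, r, alpha):
    # Eq. (17) without the cubic phase gate: D(alpha) S(r) R(theta)
    return displacement(squeezing(rotation(xp, theta), r), alpha)

vacuum = np.zeros(2)
out = gaussian_layer(vacuum, theta=0.3, r=0.5, alpha=1 + 0.5j)
```

Composing several such layers and tuning (theta, r, alpha) by gradient descent is the classical mean-field analog of training the full variational circuit.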
QQ Classifier
Depth of A': 1
Depth of B: 3
Training batches: 500
Batch size: 8
Learning rate: 0.01
Fock-space cutoff: 15
Accuracy: 0.869
We can express the QQ transfer learning model of this example according to the notation introduced in Section III and represented in Fig. 1:
D_A = Two classes, 0 and 1, of quantum states generated by two different variational random circuits. States of class 0 are generated by a random single-mode Gaussian layer applied to the vacuum. States of class 1 are generated by a random non-Gaussian layer applied to the vacuum.

A = Single-mode variational quantum layer followed by an on/off threshold detector.

T_A = Classification (labels: 0 and 1).

A' = Network A without the measurement layer.

D_B = Two classes, 0 and 1, of quantum states. States of class 0 are generated by a random single-mode Gaussian layer applied to a fixed coherent state. States of class 1 are generated by a random Gaussian layer applied to a fixed Fock state.

B = Single-mode variational quantum circuit of depth 3, followed by an on/off threshold detector.
A summary of the hyperparameters used for defining and training this QQ model is reported in Table 5, together with the associated accuracy. In Fig. 9 we plot the loss function (cross entropy) of our quantum variational classifier as a function of the number of training iterations. We compare the results obtained with and without the pretrained layer A' (i.e., with and without transfer learning), for a fixed total depth of 3 or 4 layers. It is clear that the QQ transfer learning approach offers a strong advantage in terms of training efficiency.
For a sufficiently long training time, however, the network optimized from scratch achieves the same or better results with respect to the network with a fixed initial layer A'. This effect is well known also in the classical setting and is not surprising: the network trained from scratch is in principle more powerful by construction, because it has more variational parameters. However, there are many practical situations in which the training resources are limited (especially when dealing with real NISQ devices), or in which the dataset D_B is experimentally much more expensive to generate than D_A. In all such practically constrained situations, QQ transfer learning can represent a very convenient strategy.
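The practical recipe described above, freezing the pretrained block A' and optimizing only the new block B, can be sketched on a purely classical toy problem; the quantum layers are replaced by simple differentiable maps, so only the parameter split and the training loop are illustrated (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification dataset (stand-in for D_B).
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

W_frozen = rng.normal(size=(3, 3))  # pretrained block A': kept fixed
w = rng.normal(size=3)              # new block B: the only trained parameters

def predict(w):
    h = np.tanh(X @ W_frozen.T)          # frozen feature extraction
    return 1 / (1 + np.exp(-(h @ w)))    # trainable head (logistic)

def loss(w):
    p = predict(w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Plain gradient descent on the trainable parameters only; the gradient of
# the cross-entropy w.r.t. w is H^T (p - y) / N for logistic outputs.
before = loss(w)
H = np.tanh(X @ W_frozen.T)
for _ in range(200):
    w -= 0.5 * H.T @ (predict(w) - y) / len(y)
after = loss(w)
```

Because only w is updated, each step is cheap and the optimization converges quickly, which is the efficiency advantage seen in Fig. 9; the price is that the frozen block bounds the ultimately reachable accuracy.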
V Conclusions
We have outlined a framework of transfer learning which is applicable to hybrid computational models where variational quantum circuits can be connected to classical neural networks. With respect to the well-studied classical scenario, several new and promising opportunities naturally emerge in hybrid systems, such as the possibility of transferring some pre-acquired knowledge at the classical-quantum interface (CQ and QC transfer learning) or between two quantum networks (QQ transfer learning). As an additional contribution, we have also introduced the notion of "dressed quantum circuits", i.e., variational quantum circuits augmented with two trainable classical layers which improve and simplify the data encoding and decoding phases.
Each theoretical idea proposed in this work is supported by a proof-of-concept example, numerically demonstrating the validity of our models for practical applications such as image recognition and quantum state classification. Particular focus has been dedicated to the CQ transfer learning scheme because of its promising potential with currently available quantum computers. In particular, we have used the CQ transfer learning method to successfully classify high-resolution images with two real quantum processors (by IBM and Rigetti).
From our theoretical and experimental analysis, we can conclude that transfer learning is a promising approach, yielding performance that can already compete with classical algorithms despite the early stage of current quantum technology. In the hybrid classical-quantum scenario considered in this work, transfer learning could be a key tool in the effort to observe evidence of a quantum advantage in the near future.
Acknowledgements.
We thank Christian Weedbrook for helpful discussions. The authors would like to thank Rigetti for access to their resources: Forest Smith et al. (2016), QCS, and the Aspen-4-4Q-A backend. We also acknowledge the use of the IBM Q Experience, Qiskit (2019), and the IBM Q 5 Tenerife v1.0.0 (ibmqx4) backend.

References
Abadi et al. (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
Arute et al. (2019) Quantum supremacy using a programmable superconducting processor. Nature 574 (7779), pp. 505-510.
Benedetti et al. (2018) Quantum-assisted Helmholtz machines: a quantum-classical deep learning framework for industrial datasets in near-term devices. Quantum Science and Technology 3 (3), pp. 034007.
Bergholm et al. (2018) PennyLane: automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968.
Biamonte et al. (2017) Quantum machine learning. Nature 549 (7671), pp. 195.
Canziani et al. (2016) An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.
Ch'ng et al. (2017) Machine learning phases of strongly correlated fermions. Physical Review X 7 (3), pp. 031038.
Deng et al. (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255.
Devlin et al. (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Dunjko et al. (2016) Quantum-enhanced machine learning. Physical Review Letters 117 (13), pp. 130501.
Qiskit contributors (2019) Qiskit: an open-source framework for quantum computing. doi:10.5281/zenodo.2562110.
Farhi and Neven (2018) Classification with quantum neural networks on near term processors. arXiv preprint arXiv:1802.06002.
Goodfellow et al. (2016) Deep learning. MIT Press.
Harrow and Montanaro (2017) Quantum computational supremacy. Nature 549 (7671), pp. 203.
He et al. (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
Henderson et al. (2019) Quanvolutional neural networks: powering image recognition with quantum circuits. arXiv preprint arXiv:1904.04767.
Howard and Ruder (2018) Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146.
Huembeli et al. (2018) Identifying quantum phase transitions with adversarial neural networks. Physical Review B 97 (13), pp. 134109.
Killoran et al. (2018) Continuous-variable quantum neural networks. arXiv preprint arXiv:1806.06871.
Killoran et al. (2019) Strawberry Fields: a software platform for photonic quantum computing. Quantum 3, pp. 129.
Kingma and Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Krizhevsky (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto.
Krizhevsky et al. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105.
Liu et al. (2019) Machine learning by unitary tensor network of hierarchical tree structure. New Journal of Physics 21 (7), pp. 073059.
McClean et al. (2016) The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18 (2), pp. 023023.
Pan and Yang (2009) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345-1359.
Paszke et al. (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop.
Perdomo-Ortiz et al. (2018) Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Science and Technology 3 (3), pp. 030502.
Peruzzo et al. (2014) A variational eigenvalue solver on a photonic quantum processor. Nature Communications 5, pp. 4213.
Piat et al. (2018) Image classification with quantum pretraining and autoencoders. International Journal of Quantum Information 16 (08), pp. 1840009.
Pratt (1993) Discriminability-based transfer between neural networks. In Advances in Neural Information Processing Systems, pp. 204-211.
Preskill (2018) Quantum computing in the NISQ era and beyond. Quantum 2, pp. 79.
Raina et al. (2007) Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pp. 759-766.
Sasank Chilamkurthy. PyTorch transfer learning tutorial. https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html. Accessed: 2019-08-08.
Schuld et al. (2018) Circuit-centric quantum classifiers. arXiv preprint arXiv:1804.00633.
Schuld and Killoran (2019) Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122 (4), pp. 040504.
Schuld et al. (2015) An introduction to quantum machine learning. Contemporary Physics 56 (2), pp. 172-185.
Shiba et al. (2019) Convolution filter embedded quantum gate autoencoder. arXiv preprint arXiv:1906.01196.
Sim et al. (2019) Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. arXiv preprint arXiv:1905.10876.
Simonyan and Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Smith et al. (2016) A practical quantum instruction set architecture. arXiv preprint arXiv:1608.03355.
Szegedy et al. (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9.
Tetris. Wikipedia, 2019. https://en.wikipedia.org/wiki/Tetris. Accessed: 2019-08-08.
Torrey and Shavlik (2010) Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, pp. 242-264.
Vaswani et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998-6008.
Verdon et al. (2019) Learning to learn with quantum neural networks via classical neural networks. arXiv preprint arXiv:1907.05415.
Weedbrook et al. (2012) Gaussian quantum information. Reviews of Modern Physics 84 (2), pp. 621.
Yosinski et al. (2014) How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320-3328.
Zen et al. (2019) Transfer learning for scalability of neural-network quantum states. arXiv preprint arXiv:1908.09883.