Feedback controlled heat transport in quantum devices: Theory and solid state experimental proposal

Michele Campisi NEST, Scuola Normale Superiore & Istituto Nanoscienze-CNR, I-56126 Pisa, Italy    Jukka Pekola Low Temperature Laboratory, Department of Applied Physics, Aalto University School of Science, 00076 AALTO, Finland    Rosario Fazio ICTP, Strada Costiera 11, Trieste 34151, Italy NEST, Scuola Normale Superiore & Istituto Nanoscienze-CNR, I-56126 Pisa, Italy

A theory of feedback controlled heat transport in quantum systems is presented. It is based on modelling heat engines as driven multipartite systems subject to projective quantum measurements and measurement-conditioned unitary evolutions. The theory unifies various results presented in the previous literature. Feedback control breaks time reversal invariance. This in turn results in the fluctuation relation not being obeyed. Its restoration occurs by an appropriate accounting of the information gain and information use via measurements and feedback. We further illustrate an experimental proposal for the realisation of a Maxwell demon using superconducting circuits and single photon on-chip calorimetry. A two level qubit acts as a trapdoor which, conditioned on its state, is coupled to either a hot resistor or a cold one. The feedback mechanism alters the temperatures felt by the qubit and can result in an effective inversion of the temperature gradient, where heat flows from cold to hot thanks to information gain and use.

1 Introduction

In a famous thought experiment Maxwell envisioned a method for apparently defying the second law of thermodynamics by means of a feedback control mechanism [1]. Maxwell’s idea is based on a malicious demon, an intelligent being that is able to observe the microscopic dynamics of a system, and acts on it so as to steer it toward defying the second law. In one of Maxwell’s original concepts, the system is a container with two chambers, containing respectively a hot gas and a cold gas. The two chambers are separated by a wall with a trap-door which the demon can open and close at will. The demon observes the erratic motion of the gas particles and, when she/he sees a particle of the cold chamber approach the trap-door with sufficiently high velocity, she/he swiftly opens the door so as to let the particle go through and closes it immediately afterwards. In this way, particle after particle, heat flows from the cold chamber to the hot chamber in contradiction with the second law.

Advances in nanotechnology have made it possible to bring Maxwell demons and similar devices from the realm of thought experiments to the realm of real experiments [2, 3, 4, 5]. Both theoretical and experimental studies so far have focused mainly on situations where feedback control is operated as a measurement-conditioned driving on some working substance (classical or quantum) coupled to a bath at a single temperature, so as to withdraw energy from the latter in contradiction with the second law as formulated by Kelvin. Interesting realistic proposals have appeared in Refs. [6, 7]. Situations where the heat flow between reservoirs at different temperatures is controlled, however, have not been addressed so far, either theoretically or experimentally. The main motivation of the present work is to fill that gap. In the following we shall present the general theory of feedback controlled heat transport in quantum devices, and shall describe a possible experimental realisation thereof.

The theory presented here builds on previous works concerning fluctuation relations in presence of measurements without feedback [8, 9] and with feedback [10], combined with an inclusive approach where quantum heat engines are seen as mechanically driven multipartite systems starting in a multi-temperature initial state [11, 12, 13, 14]. Reference [10] reported on the theory of a one-measurement based feedback control on a quantum working-substance prepared by contact with a single bath. That formalism is here extended to the case of many heat baths, and also repeated measurements, to allow for the study of continuous feedback control of heat flow in a multi-reservoir scenario. Previous work concerning repeated measurements appeared in Ref. [15] for classical systems in contact with a single bath. Fluctuation relations need to be modified by a mutual information term, which we shall explicitly provide.

Our experimental proposal is based on the fast developing advancements in experimental solid state low temperature techniques: in particular the calorimetric measurement scheme that has been put forward by one of us and co-workers [16, 17]. As shown by some recent theoretical proposals [13, 18], the method opens up a new avenue for the practical management of heat and work on a chip by means of superconducting devices, particularly superconducting qubits. Here we illustrate the possible implementation of a very simple feedback controlled heat transport where the trapdoor is realised by a superconducting qubit whose coupling with two resistors at different temperatures is controlled based on the outcomes of continuous calorimetric monitoring of the resistors themselves.

Figure 1: Feedback controlled heat transport. A bi-partite system starting in a two temperature Gibbs state is observed by a Demon, who measures an observable . Depending on the outcome of the measurement the demon applies a quantum gate to the bi-partite system with the aim of beating the second law. Each partition is composed of a heat reservoir and possibly one part of a working substance. The whole system evolves with unitaries interrupted by projections.

2 Theory

Following [14] we model a generic heat transport/heat engine scenario as a driven multi-partite system starting in the factorised state, see Figure 1


where is the Hamiltonian of each partition including a heat bath and possibly a portion of the working substance, and is the corresponding partition function [14]. Let the total Hamiltonian be


where is an interaction term that is switched on for the time interval over which the system is monitored. We assume that at times some observable is measured thus causing the wave function describing the compound to collapse onto the subspace spanned by the eigenvectors belonging to the measured eigenvalue . Following [10] we shall assume that there can be a measurement error where the eigenvalue is recorded instead of the actual eigenvalue . This is assumed to happen with probability . The choice of the interaction in the interval is dictated by the sequence of recorded eigenvalues, or more simply the recorded sequence , that is for . The corresponding unitary operator describing the evolution in the time span is where denotes time ordered exponential and . We shall denote the un-conditioned evolution operator from time to the time of the first measurement as . Note that the sequence of recorded labels generally differs from the sequence of labels specifying in which subspace the system state was actually projected at the measurement times . As customary in the context of the fluctuation theorem we shall assume that besides the intermediate measurements of , all ’s are measured at times giving the eigenvalues , respectively.

The quantity of primary interest is the probability that is obtained in the first energy measurement, the sequence is realised, the sequence is recorded and is obtained in the final energy measurement. Here we have introduced the simplified notations , . The explicit expression of is:


where denotes the probability of obtaining the eigenvalue in the first measurement; denotes the corresponding projector; denotes the projector onto the subspace belonging to the eigenvalue of ; the symbol denotes -ordered product, that is, .

Let be the energy change in the partition observed in a single realisation of the feedback driven protocol. Using the cyclic property of the trace and completeness , we obtain the following:


The proof is reported in the appendix. This relation extends the result presented in Ref. [10] to the case of multipartite system with initial multi temperature state, and to repeated measurements. (Footnote 1: For simplicity we restricted to the case of cyclic . The extension to non-cyclic case is straightforward.) The quantity represents the probability that the sequences , , are realised under the backward evolution specified by the adjoint Kraus operators . The total probability does not generally add to one. The reason for that is that the -th evolution occurs before the -th eigenvalue is realised in the backward map. The feedback loop is evidently not time-reversal symmetric, and such lack of reversibility breaks the fluctuation theorem which in fact is a manifestation of time-reversal symmetry [19]. This is reflected by the fact that the quantum channel specified by the Kraus operators is generally not unital. (Footnote 2: We recall that a quantum channel specified by Kraus operators , that is trace preserving , is unital when it maps the identity into itself . The adjoint of a non-unital quantum channel is not trace preserving. In the case of feedback control the quantum channel is generally not unital, as a consequence its adjoint is generally not trace preserving, hence we have generally .) Lack of unitality generally reflects lack of time-reversal symmetry. Examples are thermalisation maps, namely maps that have a thermal state (not the identity) as a fixed point. Physically these are realised by means of weak contact of a system with a thermal bath, leading to irreversible dynamics. Likewise feedback control breaks the symmetry. This observation reveals some analogy between feedback control and dissipative dynamics.

Before proceeding let us comment briefly on the origin of lack of unitality in feedback controlled systems, in order to gain insight into the issue. For simplicity let us consider the case of a single measurement . Let us begin by noticing that the quantum channel specified by the is trace preserving. We have , where we have used the cyclic property of the trace, unitarity , idempotence , normalisation , and completeness . Let us now turn to unitality. We have . If the evolution did not depend on , that is was chosen regardless of the recorded value (e.g., is pre-specified or is completely random), one could perform the sum over using and then use to conclude the map is unital. Feedback, implying explicit dependence on of breaks unitality. Unitality would occur also in the case when does not depend on , meaning the measurement outcome is completely random and has no correlation with the actual state . In sum, if the feedback control is off, either because one decides not to use the information gathered in the measurement, or because the measurement gathers no information in the first place, unitality is recovered, and the fluctuation theorem is restored. This result is in agreement with the established fact that projective measurements without feedback control do not alter the validity of the fluctuation theorem [8, 20, 21]. Here we have further learned that noise, i.e. choosing the ’s between the measurements completely randomly, also does not affect the integral fluctuation relation.
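The argument above can be checked numerically on the smallest possible case. The following is a minimal sketch, not the construction used in the paper: it assumes a single qubit, a symmetric error probability `eps` for recording the wrong outcome, and a hypothetical feedback rule that flips the qubit when the record reads "1". The Kraus operators are taken of the form sqrt(p(k|j)) U_k Pi_j, and the function returns both the trace-preservation sum and the image of the identity.

```python
import math

def matmul(a, b):
    """Product of two 2x2 real matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
FLIP = [[0.0, 1.0], [1.0, 0.0]]
PROJECTORS = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]

def channel_checks(eps, feedback=True):
    """Return (sum M^T M, sum M M^T) for M_kj = sqrt(p(k|j)) U_k Pi_j.

    All matrices are real, so the adjoint is the transpose.
    Assumed rule: with feedback, U_0 = identity and U_1 = flip;
    without feedback the same gate (flip) is applied for every record.
    """
    unitaries = [IDENTITY, FLIP] if feedback else [FLIP, FLIP]
    trace_sum = [[0.0, 0.0], [0.0, 0.0]]
    unital_sum = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(2):          # recorded outcome
        for j in range(2):      # actual outcome
            p = 1.0 - eps if k == j else eps
            m = [[math.sqrt(p) * x for x in row]
                 for row in matmul(unitaries[k], PROJECTORS[j])]
            mt = transpose(m)
            mtm, mmt = matmul(mt, m), matmul(m, mt)
            for r in range(2):
                for c in range(2):
                    trace_sum[r][c] += mtm[r][c]
                    unital_sum[r][c] += mmt[r][c]
    return trace_sum, unital_sum
```

Under these assumptions the first sum is the identity for every `eps`, while the second equals diag(2(1 - eps), 2 eps): it reduces to the identity only when the record carries no information (`eps = 0.5`) or when the gate does not depend on the record (`feedback=False`).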

Let us now turn to thermodynamics. Using Jensen’s inequality, Eq. (5) implies:


In the case when the map is unital it is , and the second law of thermodynamics is recovered [14]. When the condition is not forbidden, and the apparent violation of the second law becomes possible. This occurs with a proper “demonic” design of the feedback control. When instead the second law is more strictly enforced by means of an “angelic” intervention.

As shown in Refs. [10, 22] in the case of a single measurement (in either classical or quantum systems) the fluctuation relation can be restored if an information theoretic term, in the form of a mutual information, is added to the exponent in the exponential average. Ref. [15] reports the extension to the case of repeated measurements in the classical scenario. All these results are for a single-temperature initial state. In the present set-up we find as well an information theoretic correction term (see the appendix for a proof):


where is defined by the following set of equations:


The symbol represents the joint probability that the sequence is realised and the sequence is recorded, while is the probability that is recorded. The symbol stands for the probability that the sequence is realised, conditioned on being the record. More explicitly


The operators differ from the operators by the term containing the conditional probability . Note that the Bayes rule does not apply here, i.e. generally it is . The reason is that and are concatenated with each other. An outcome influences the record , which in turn influences the next outcome and so on. The quantity measures the degree of such mutual influence, or correlation between the two sequences and . (Footnote 3: Eq. (7) is reminiscent of a similar relation reported by Vedral [23], see Eq. (8) there. The two relations fundamentally differ in various respects, notably in the meaning of the mutual information term: in our case it measures the correlation between outcomes and their records, in the case of Ref. [23] it measures the correlation between the measurements themselves.) In absence of feedback, namely when there is no correlation between the two sequences, is null and the standard relation is recovered. Note that given a feedback rule, generally would grow with the length of the sequences, i.e. the number of measurements. It is accordingly expected that in the large regime.

With Jensen’s inequality Eq. (7) implies


We thus have found two bounds to .

By looking directly at the as in Ref. [14] we have found a third bound whose interpretation is most direct and straightforward. Let


be the system density matrix at time . In the second equality we have used completeness and the fact that the initial state has no coherences in the energy eigenbasis . Simple manipulations, similar to those employed in Ref. [14] lead to the following salient result




denote the Kullback Leibler divergence between the final state and the initial state , Eq. (17); the total amount of correlations (mutual information) that builds up among the partitions as a consequence of their interaction during the time span , Eq. (18); and the total change in von-Neumann entropy of the whole compound, Eq. (19). Here is the reduced state of partition at time ( denotes trace over all partitions but the -th). The mutual information among the partitions of the system (measuring all correlations, quantal and classical), which develops generally due to their interaction (and can also occur in absence of measurements and feedback [14]), should not be confused with the classical mutual information between the realisation sequence and the record sequence caused by the feedback mechanism.

Both the Kullback Leibler divergence and the mutual information are non negative quantities. We thus arrive at the central inequality:


In the standard no measurement case, is linked to via a unitary map, hence and one recovers the result of Ref. [14], namely , and the second law in its standard form. Note that when there are measurements, but no feedback, the is linked to via a unital map, implying , , and hence , meaning that, as is already known [8, 20, 21] the second law is not altered by the mere application of projective measurements that interrupt an otherwise unitary dynamics. However Eq. (20) clearly indicates that there is a dissipation term associated with quantum-mechanical measurements, which is not present in the classical case. In sum through Eq. (20) we see that there is a thermodynamic cost associated to quantum measurements.

Combining Eqs. (6,14,20) the second law of thermodynamics, in presence of feedback control takes the form


3 Illustrative example

To exemplify the theory above we consider a prototypical model of quantum heat engine whose working substance is made of two qubits [13, 14, 24]. Their Hamiltonian reads


where denote Pauli operators. We assume the two qubits have the same level spacing and are initially in the state:


with their partition functions. At the ’s are measured collapsing the two qubits in the state , with . We assume classical error in the measurement of each qubit , for some . Accordingly the eigenvalues are recorded with probability . If the states , are recorded we do nothing: ; else, i.e., if we apply a swap operation, , that maps into . The system is now in a joint eigenstate of the two-qubit Hamiltonian , hence the final measurement of the is irrelevant. At the end of the process each qubit is allowed to relax to thermal equilibrium with its respective thermal bath of inverse temperature so as to re-establish the initial state . Accordingly the average energies acquired by each qubit during the process equal the average heats that they release in the baths in the thermal relaxation step. Due to the feedback mechanism energy may be withdrawn from the cold bath and released in the hot one. Note that, due to the fact that the two qubits have the same level spacing, the SWAP operation does not alter their total energy. Namely there is no energy injection by the Demon: to steer the energy flow he only uses information. The set-up is illustrated in Fig. 2 panel a).

Figure 2: Panel a). Scheme of a two-qubit feedback controlled refrigerator. Two qubits are prepared each in thermal equilibrium with a thermal bath. When the Demon sees the cold qubit in the excited state and the hot qubit in ground state, he swaps them. He then lets them thermalise each with its own bath and starts over. He thus transfers heat from the cold bath to the hot bath without investing energy. Panels b,c). , , , as a function of the error probability for , and two different values of .

The relevant probability chain is a bit simpler than in the general case because the first energy measurement is itself here also the first feedback measurement. It reads with . For we have . The final state is . The probability that the outcome is realised conditioned on being recorded is simply the marginal probability that is realised because the record comes chronologically after the realisation of and hence cannot have any influence on it. The quantity boils down then to the logarithm of the ratio [10] hence its expectation is the non-negative mutual information between and : .

Panels b,c) of Fig. 2 show for two choices of and same , as a function of the error probability . In accordance with Eq. (21) we see that is bounded from below by and . Independent of all other parameters the refrigerator cannot work in the region where and are anti-correlated, while it may only work if . This is captured by being positive in the region and negative for . At outcome and recording are fully uncorrelated, which restores unitality as discussed above and implies . Regarding , while it tends to be closer to in the operation region (), it greatly departs from it in the non-operation region, where it can even take negative values. Notably in both panels there is a value of for which the bound is saturated by . Regarding we note it is everywhere non-positive as expected. Furthermore it is symmetric with respect to . This reflects the fact that the mutual information does not distinguish between correlation and anti-correlation. The maximum is attained at where are uncorrelated, and the standard fluctuation relation is recovered (i.e., ). In both panels we see that . Whether this is a generic bound is yet to be understood. We note that while at both and are null, is non-negative, reflecting the fact that in absence of feedback there is nonetheless an entropic cost associated to measurements, as discussed above. Such cost can be counterbalanced in presence of feedback (note that may be negative for ). Comparing now the two panels, we see that the higher the thermal gradient , the larger is the point where the engine starts operating, i.e. where turns from positive into negative: as intuition suggests, the larger the gradient, the better the measurement must be. This feature is captured also by but not by . Also the smaller the gradient the more the shape of the function resembles that of , with the shift between the two being approximately the value of at : that is .
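Because the two-qubit example involves only four classical configurations after the first measurement, its statistics can be enumerated exactly. The sketch below is an illustration under stated assumptions, not the code behind Fig. 2: level energies are taken as 0 and `omega`, `eps` is the probability that a single qubit is recorded in the wrong state (with 0 < `eps` < 1), the swap is applied only when the record reads (cold excited, hot ground), and the dictionary keys (`dE_cold`, `sigma`, `gamma`, `info`, `ift`) are names chosen here for convenience.

```python
import itertools
import math

def two_qubit_demon(beta_cold, beta_hot, omega, eps):
    """Exact averages for the swap-based demon (assumed conventions, see text)."""
    betas = (beta_cold, beta_hot)
    states = list(itertools.product((0, 1), repeat=2))  # (cold, hot) occupations

    def gibbs(beta, n):
        return math.exp(-beta * omega * n) / (1.0 + math.exp(-beta * omega))

    def p_record(k, j):
        # Independent classical error on each qubit.
        return math.prod(1.0 - eps if ki == ji else eps for ki, ji in zip(k, j))

    def apply_feedback(k, j):
        # Swap the two qubits only if the record is (cold excited, hot ground).
        return (j[1], j[0]) if k == (1, 0) else j

    p0 = {j: gibbs(beta_cold, j[0]) * gibbs(beta_hot, j[1]) for j in states}
    p_k = {k: sum(p0[j] * p_record(k, j) for j in states) for k in states}

    out = {"dE_cold": 0.0, "sigma": 0.0, "gamma": 0.0, "info": 0.0, "ift": 0.0}
    for j in states:
        for k in states:
            w = p0[j] * p_record(k, j)      # joint probability of (j, k)
            if w == 0.0:
                continue
            final = apply_feedback(k, j)
            d_e = [omega * (final[i] - j[i]) for i in range(2)]
            sigma = sum(b * e for b, e in zip(betas, d_e))  # sum_i beta_i dE_i
            info = math.log(p_record(k, j) / p_k[k])        # ln p(k|j)/p(k)
            out["dE_cold"] += w * d_e[0]
            out["sigma"] += w * sigma
            out["gamma"] += w * math.exp(-sigma)
            out["info"] += w * info
            out["ift"] += w * math.exp(-sigma - info)
    return out
```

With these conventions, `ift` evaluates to 1 for any 0 < `eps` < 1, `gamma` returns to 1 and `info` vanishes at `eps = 0.5`, and `sigma` is bounded from below by both `-log(gamma)` and `-info`, in line with the bounds discussed above. A call such as `two_qubit_demon(2.0, 1.0, 1.0, 0.1)` gives a negative `dE_cold`, meaning that on average energy leaves the cold qubit.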

4 Experimental proposal

Figure 3: Set up of the proposed experiment. A superconducting qubit (black rectangle) embodies a Maxwell demon trap door, and two resistors embedded in RLC circuits embody the two chambers of different temperatures. Qubit and RLC circuits are inductively coupled. Calorimetric monitoring of photons entering and exiting each resistor is applied, making it possible both to measure the heat exchanged by each resistor and to monitor the state of the qubit at any time. When the qubit is up a feedback algorithm drives the resonance frequency of the cold RLC circuit out of tune with the qubit frequency, while keeping the hot RLC in tune with it (and vice versa) so that an overall heat current flows from cold to hot. The resonance frequencies of the RLC circuits are controlled by tuning their non-linear inductive elements, i.e., SQUIDs, via application of external magnetic flux .

The general theory developed above allows for a joint information theoretic and thermodynamic analysis of feedback controlled dynamics in the broad scenario where a demon can influence not only the amount of work being provided by the outside as in previous works [2, 3, 4], but also the heat flow between the various parts of a compound system, e.g. the heat flow between various heat baths.

The progress of solid state technology on the other hand makes it possible to realise such feedback controlled heat transport mechanisms in real devices. The example illustrated above can be experimentally realised by introducing a feedback mechanism in the two-superconducting qubits scheme illustrated in Ref. [13]. Below we illustrate a design that is of more immediate realisation. It is based on a single qubit and it does not involve any qubit-operation, but only manipulations of qubit-bath couplings. The proposal that we put forward here is based on two ingredients that enable unique capabilities allowing for the implementation of a Maxwell demon based on a most simple concept. The two ingredients are a two-level-system acting as quantum trap door and the calorimetric measurement scheme developed in Refs. [16, 17].

The one qubit set-up is illustrated in Fig. 3. The two-level system (TLS) is embodied by a superconducting qubit of level spacing . The two chambers are embodied by two resistors being kept at different temperatures. Qubit and resistors can exchange energy (i.e. heat) in the form of photons of energy associated to the TLS absorbing/emitting one photon from/to one of the two baths. Each resistor is embedded into an RLC loop of tunable resonance frequency. This results in a tunable TLS/resistor coupling. When an RLC circuit is far detuned from , the qubit is effectively decoupled from the resistor, while maximal coupling occurs when it is in tune with the qubit. The resonance frequency can be tuned by using a SQUID as a non-linear and tunable inductor, its inductance being governed by a controllable threading magnetic flux.

When a photon enters/exits one of the two resistors, its electronic temperature undergoes a positive/negative jump followed by a fast decay. Two calorimeters [16, 17] continuously monitor the two resistors, and count how many photons enter/exit them. This allows for a directional full counting statistics of heat. Most remarkably it also allows to infer the state of the TLS at each time. If an absorption (in either resistor) is observed, it means the TLS jumped down, hence it was up before the absorption was detected, and is down afterwards. This allows to experimentally access the quantum state trajectory of the TLS.

The feedback concept is extremely simple: as soon as a jump-down is observed, turn on the interaction with the cold resistor and turn off the interaction with the hot resistor. Vice-versa for the observation of a jump up. This results in a net flow of heat from the cold resistor to the hot one. Based on the above general analysis the apparent violation of the second law is understood in terms of lack of time-reversal symmetry of feedback control, leading to an overall non-unital dynamics of resistors plus TLS. In a practical realisation one is realistically not able to fully turn off the interactions. Furthermore there will be some delay time between measurement being performed and feedback being realised, giving rise effectively to possible error between measured state and actual state of the qubit.

5 Modelling

In the following we model the dynamics of the proposed experiment. We model the evolution of the two level system via a standard Lindblad master equation


where is the two level system Hamiltonian expressed in terms of the Pauli matrices , and are Lindblad operators


expressed in terms of and the super-operator and the rising and lowering spin operators of the Hamiltonian , defined via , where is the ground (excited) state of . Here denote either the left or the right reservoir. The rates for jump down/up in the ’th resistor are given by


where is the current noise spectrum expressed in terms of the voltage noise spectrum , is the quality factor and the resonance frequency of resonator , expressed in terms of its resistance, inductance and capacitance . By increasing the rates , can be quenched, namely the interaction between the TLS and the -th resistor can be turned off. The symbol stands for the mutual inductance between the qubit and the -th resistor and is the flux quantum. Note that the rates are detailed balanced:


The study of heat and work fluctuations requires the study of the dynamics to be performed at the level of single quantum-jump trajectories [13, 25], resulting from the unravelling of the master equation. This is here achieved by means of the Monte Carlo wave function (MCWF) method [26, 27]. In the specific case under study of a two level system subject to dissipation terms leading to full wave function collapse in either state or , this results in a classical dichotomous Poisson process with rates [13].

The basis of our numerical experiment is the generation of such dichotomous Poisson random trajectories. We chose the right reservoir as the cold one and the left as the hot one. The TLS is assumed to be initially in equilibrium with the left bath. We produce a large sample of trajectories and build the normalised histogram of the number of photons entering the right reservoir. Since the heat entering the right reservoir is given as , the statistics is the heat statistics. In absence of feedback it satisfies the fluctuation relation


The feedback is introduced as follows. At each moment in time we distinguish between the actual state of the system and the knowledge we have about it. The latter does not necessarily coincide with the former because we allow for some delay-time between a jump occurring in the TLS and our knowledge of the state of the qubit being updated accordingly. The delay time thus effectively introduces an error probability between the actual state and the knowledge about the state, at each time. At each time, conditioned on the knowledge of the state, we use one of two sets of rates, favouring the interaction with either the cold or the hot bath. More explicitly, let be the rate for jump down (up) in -th bath conditioned on TLS being measured to be in state . In accordance with Eq. (26) we use the following rates


where are determined by the circuitry parameters, and can be tuned via external fluxes . With , this means that energy exchange with the right (cold) bath is larger when the TLS is believed to be down, so that it becomes more likely that energy flows out of the cold reservoir. Similarly energy exchange with the left (hot) bath is larger when the TLS is believed to be up, so that it becomes more likely that energy flows in the hot reservoir. Overall this results in an effect that contrasts the natural flow from hot to cold. The largest effect can be achieved when turning off the unwanted interaction completely, namely when . Having in mind a realistic set-up here we keep the ratio finite, meaning partial turning-off is considered.
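The feedback-conditioned jump process described above can be sketched with a simple Gillespie-type simulation. This is an illustrative sketch under assumptions that are not taken from the paper: the bare rates are given a generic detailed-balanced (Bose-like) form instead of the circuit expressions of Eq. (26), the unwanted coupling is reduced by a single hypothetical factor `suppression` applied to both the up and down rate of the disfavoured bath, and the demon's knowledge (`belief`) is updated a fixed time `delay` after each jump. Energies are in units where the level spacing `omega` defaults to 1.

```python
import math
import random
from collections import deque

def bath_rates(beta, omega, coupling=1.0):
    """Detailed-balanced (down, up) rates: down / up = exp(beta * omega)."""
    n = 1.0 / math.expm1(beta * omega)  # Bose occupation at the TLS frequency
    return coupling * (n + 1.0), coupling * n

def net_photons_right(beta_left, beta_right, rng, omega=1.0, suppression=1.0,
                      delay=0.0, t_final=20.0):
    """One trajectory; returns the net number of photons absorbed by the right bath."""
    down, up = {}, {}
    down["L"], up["L"] = bath_rates(beta_left, omega)
    down["R"], up["R"] = bath_rates(beta_right, omega)
    # The TLS starts in equilibrium with the left bath; the initial state is known.
    state = 1 if rng.random() < 1.0 / (1.0 + math.exp(beta_left * omega)) else 0
    belief = state
    pending = deque()  # (time, value) belief updates not yet delivered
    t, n_right = 0.0, 0
    while True:
        # Believed up: favour the left (hot) bath; believed down: favour the right (cold).
        favoured = "L" if belief == 1 else "R"
        base = down if state == 1 else up
        rate = {b: base[b] / (1.0 if b == favoured else suppression) for b in "LR"}
        total = rate["L"] + rate["R"]
        t_jump = t + rng.expovariate(total)
        if pending and pending[0][0] < min(t_jump, t_final):
            # A delayed belief update arrives first; rates change, so redraw.
            t, belief = pending.popleft()
            continue
        if t_jump >= t_final:
            return n_right
        t = t_jump
        bath = "L" if rng.random() * total < rate["L"] else "R"
        if bath == "R":
            n_right += 1 if state == 1 else -1  # emission into / absorption from R
        state = 1 - state
        if delay > 0.0:
            pending.append((t + delay, state))
        else:
            belief = state

def mean_net_photons(n_traj, seed, **kwargs):
    """Sample mean of the net photon number over n_traj seeded trajectories."""
    rng = random.Random(seed)
    return sum(net_photons_right(rng=rng, **kwargs) for _ in range(n_traj)) / n_traj
```

For example, `mean_net_photons(2000, 1, beta_left=0.5, beta_right=2.0)` samples the case without feedback, where heat flows on average into the cold (right) bath, while adding `suppression=20.0` samples a strongly feedback-controlled case in which the average flow is reversed. A nonzero `delay` lets the belief lag behind the actual state, which plays the role of the measurement error discussed in the text.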

Figure 4: Left: typical histogram . Right: as a function of . Straight dashed line is . Straight solid line is . Here . These thermal energies are expressed in units of . also fixes the time unit. Delay time is in those time units. It is , corresponding to the largest rate timescale . The simulation time is . The statistics is built on a sample of trajectories.

Because of the feedback the fluctuation relation (28) is not obeyed. However it can be proved (see appendix) that, due to the feedback mechanism, the TLS feels the effective temperature gradient


We thus see that by tuning the ratio the effective temperature gradient can be manipulated and, if the error associated to the measurement is not too big, it can even be inverted as compared to the original thermal gradient . So the overall effect of the demon is to change the “temperatures felt” by the TLS. Accordingly the following fluctuation relation


is obeyed by the histogram . This immediately allows to interpret the quantity


via Eq. (7) as the mutual information encoded in a trajectory along which a heat is exchanged with the bath. Note that when , the feedback has no effect and accordingly . Likewise if (hence ) meaning no correlation between state and knowledge thereof, feedback control does not work and again . Most importantly the experimental mutual information is proportional to the heat exchanged. This allows for accessing a fluctuating information theoretic quantity by means of a thermodynamic measurement in a realistic experimental scenario.
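The two limiting cases just mentioned can be illustrated with a short calculation of the effective inverse temperatures. The sketch below does not reproduce the paper's expression; it assumes a simplified parametrisation in which the disfavoured bath has both its rates divided by a hypothetical factor `suppression`, and the demon's knowledge is wrong with a symmetric probability `error`, independently at each instant. The effective temperatures then follow from the ratio of the averaged down and up rates of each bath.

```python
import math

def effective_betas(beta_left, beta_right, omega, suppression, error):
    """Effective (beta_left, beta_right) felt by the TLS under feedback.

    Assumed model: when the TLS is up the demon believes "up" with
    probability 1 - error and then suppresses the right (cold) bath;
    when the TLS is down it believes "down" with probability 1 - error
    and then suppresses the left (hot) bath.
    """
    # Averaged weight of the down rate over the up rate for the right bath,
    # [(1 - e)/r + e] / [(1 - e) + e/r], rewritten without divisions by r.
    ratio = ((1.0 - error) + error * suppression) / \
            ((1.0 - error) * suppression + error)
    shift = math.log(ratio) / omega
    return beta_left - shift, beta_right + shift

def effective_gradient(beta_left, beta_right, omega, suppression, error):
    """Effective beta_right - beta_left under the same assumptions."""
    b_left, b_right = effective_betas(beta_left, beta_right, omega,
                                      suppression, error)
    return b_right - b_left
```

Within this simplified model the gradient is unchanged when `suppression` equals 1 or when `error` equals 0.5, and with perfect knowledge it is reduced by `2 * log(suppression) / omega`, so that a sufficiently large suppression factor inverts its sign.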

Figure 4 shows typical histograms for realistic parameters. We also plotted the quantity finding a good agreement with the theoretical prediction . The effective conditional probabilities were obtained by recording for each trajectory the total time when state was and knowledge was , and averaging their value over the whole ensemble of trajectories. The observed deviation is a consequence of the fact that error here is not introduced in the form of an outcome being missed (as assumed in deriving Eq. (34)), but rather being reported with some delay. With the histogram we computed , , , for the chosen parameters. The computed values are in agreement with the prediction of Eq. (21). The proposed experiment does not allow to measure , which would require accessing the full system+baths density matrix.

5.1 Energy spent by the Demon

What is the energy cost incurred by the demon to open/close the trap-door? To roughly estimate that we model the RLC circuit as a classical harmonic oscillator (LC circuit) in contact with a heat bath (the resistor) at temperature . To open/close the door towards one of the two reservoirs, the demon switches the LC frequency from to another frequency so as to put it in/off resonance with the qubit. If the operation is carried out in a quasi-static manner, the work done is equal to the free energy change: . The operation would in this case be reversible, and the work lost when opening the door will be retrieved when closing it. The overall cost of an open/close cycle would be null in this limiting case. The other limiting case is when the switch is infinitely fast. The overall cost of a single open/close cycle in this case would be non-negative in accordance with the second law of thermodynamics, and amounts to . The overall work incurred in a repeated feedback operation is proportional to the number of open/close cycles, which in turn is proportional to the net number of energy quanta being transported, namely the total heat transported. Interestingly we note that the faster the open/close operation, the more effective is the feedback mechanism, the more energy needs to be invested.
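The two limiting cases can be made concrete with the textbook results for a classical harmonic oscillator in contact with a bath. The sketch below assumes that the oscillator fully re-thermalises between the opening and the closing switch, and uses the standard expressions for the free energy change, kT ln(omega_to/omega_from), and for the average work of a sudden frequency quench from equilibrium, (kT/2)[(omega_to/omega_from)^2 - 1]. The function names are chosen here for illustration.

```python
import math

def quasistatic_work(kT, omega_from, omega_to):
    """Free energy change of a classical oscillator when its frequency is tuned slowly."""
    return kT * math.log(omega_to / omega_from)

def sudden_work(kT, omega_from, omega_to):
    """Average work of an instantaneous frequency switch starting from equilibrium."""
    return 0.5 * kT * ((omega_to / omega_from) ** 2 - 1.0)

def cycle_cost(kT, omega_0, omega_1, sudden=True):
    """Average work of one open/close cycle omega_0 -> omega_1 -> omega_0.

    Assumes the oscillator re-thermalises at temperature kT between the two switches.
    """
    step = sudden_work if sudden else quasistatic_work
    return step(kT, omega_0, omega_1) + step(kT, omega_1, omega_0)
```

In the sudden limit the two contributions add up to (kT/2)(omega_1/omega_0 - omega_0/omega_1)^2, which is non-negative and vanishes only when the two frequencies coincide, while in the quasi-static limit the two contributions cancel.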

6 Conclusions

We have developed a general quantum theory of repeated feedback control in a multiple heat reservoir scenario. The main effect of feedback control is that it induces a generally non-unital dynamics of the full reservoirs+system compound. As a consequence the standard bound set by the second law of thermodynamics on the dissipation quantifier is shifted and may become negative. We have illustrated an experimental proposal where a single superconducting qubit plays the role of a trap-door that is subject to feedback control. The envisaged method for simultaneously measuring the qubit state and the heat exchanged by each reservoir is single photon calorimetry.


This research was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme through the project NeQuFlux grant n. 623085 (M.C.), by Unicredit Bank (M.C.), by the Academy of Finland contract no. 272218 (J.P.), and by the COST action MP1209 “Thermodynamics in the quantum regime”.

Appendix A Derivation of Eq. (5)

Eq. (3) and have been used to obtain the second line. Completeness and unitarity led to the third line. Fourth line follows from the cyclical property of the trace, idempotence and .

Appendix B Derivation of Eq. (7)

Using Eq. (11), the exponentiated fluctuating mutual information can be conveniently expressed as




Eq. (3), and Eq. (13) have been used to obtain the second line. Completeness and unitarity led to the third line. The fourth line follows from which follows by expanding the -ordered products and applying idempotence , completeness , and unitarity . Cyclical property of the trace, idempotence and lead to the fifth line. The final result is a consequence of normalisation of and of .

Appendix C Derivation of Eq. (33)

Under the operation of the demon the TLS experiences effective temperatures of the baths that differ from their actual value. To fix ideas, let us for the moment, assume no delay time and no error in the measurement. The qubit is effectively subject to the following effective rates . Accordingly, the detailed balance temperatures are shifted:


where we used the explicit expressions Eq. (26). This implies the effective temperatures


Let us now introduce the errors related to the measurement. The stochastic process describing the dynamics of the TLS is still Poissonian with one rate occurring in case of right measurement and one rate occurring in the other case. The idea is that monitoring is continuous, or better, occurring with a sampling time interval , which we assume short compared to all rates . Let us imagine the system is in state . There is a probability the observation is and a probability the observation is . Thus the probability to undergo a jump down in the reservoir in the interval is


Similarly for the jump up. Overall the TLS experiences the new rates