
# On Lossless Approximations, the Fluctuation-Dissipation Theorem, and Limitations of Measurements

Henrik Sandberg, Jean-Charles Delvenne, and John C. Doyle

H. Sandberg is with the School of Electrical Engineering, Royal Institute of Technology (KTH), Stockholm, Sweden, hsan@ee.kth.se. Supported in part by the Swedish Research Council and the Swedish Foundation for Strategic Research. J.-C. Delvenne is with the University of Namur (FUNDP), Department of Mathematics, Namur, Belgium, jean-charles.delvenne@math.fundp.ac.be. Supported in part by the Belgian Programme on Interuniversity Attraction Poles DYSCO, initiated by the Belgian Federal Science Policy Office. The scientific responsibility rests with its authors. J. C. Doyle is with the California Institute of Technology, Control and Dynamical Systems, M/C 107-81, Pasadena, CA 91125, USA, doyle@cds.caltech.edu. Supported by grants NSF-EFRI-0735956, AFOSR-FA9550-08-1-0043, and ONR-MURI-N00014-08-1-0747.
###### Abstract

In this paper, we take a control-theoretic approach to answering some standard questions in statistical mechanics, and use the results to derive limitations of classical measurements. A central problem is the relation between systems which appear macroscopically dissipative but are microscopically lossless. We show that a linear system is dissipative if, and only if, it can be approximated by a linear lossless system over arbitrarily long time intervals. Hence lossless systems are in this sense dense in dissipative systems. A linear active system can be approximated by a nonlinear lossless system that is charged with initial energy. As a by-product, we obtain mechanisms explaining the Onsager relations from time-reversible lossless approximations, and the fluctuation-dissipation theorem from uncertainty in the initial state of the lossless system. The results are applied to measurement devices and are used to quantify limits on the so-called observer effect, also called back action, which is the impact the measurement device has on the observed system. In particular, it is shown that deterministic back action can be compensated by using active elements, whereas stochastic back action is unavoidable and depends on the temperature of the measurement device.

## 1 Introduction

Analysis and derivation of limitations on what is achievable are at the core of many branches of engineering, and thus of tremendous importance. Examples can be found in estimation, information, and control theories. In estimation theory, the Cramér-Rao inequality gives a lower bound on the covariance of the estimation error, in information theory Shannon showed that the channel capacity gives an upper limit on the communication rate, and in control theory Bode’s sensitivity integral bounds achievable control performance. For an overview of limitations in control and estimation, see the book [1]. Technology from all of these branches of engineering is used in parallel in modern networked control systems [2]. Much research effort is currently spent on understanding how the limitations from these fields interact. In particular, much effort has been spent on merging limitations from control and information theory, see for example [3, 4, 5]. This has yielded insight about how future control systems should be designed to maximize their performance and robustness.

Derivation of limitations is also at the core of physics. Well-known examples are the laws of thermodynamics in classical physics and the uncertainty principle in quantum mechanics [6, 7, 8]. The exact implications of these physical limitations on the performance of control systems have received little attention, even though all components of a control system, such as actuators, sensors, and computers, are built from physical components which are constrained by physical laws. Control engineers discuss limitations in terms of location of unstable plant poles and zeros, saturation limits of actuators, and more recently channel capacity in feedback loops. But how does the amount of available energy limit the possible bandwidth of a control system? How does the ambient temperature affect the estimation error of an observer? How well can you implement a desired ideal behavior using physical components? The main goal of this paper is to develop a theoretical framework where questions such as these can be answered, and initially to derive limitations on measurements using basic laws from classical physics. Quantum mechanics is not used in this paper.

The derivation of physical limitations broadens our understanding of control engineering, but these limitations are also potentially useful outside of the traditional control-engineering community. In the physics community, the rigorous error analysis we provide could help in the analysis of far-from-equilibrium systems when time, energy, and degrees of freedom are limited. For Micro-Electro-Mechanical Systems (MEMS), the limitations we derive on measurements can be of significant importance since the physical scale of micro machines is so small. In systems biology, limits on control performance due to molecular implementation have been studied [9]. It is hoped that this paper will be a first step toward a unified theoretical foundation for such problems.

### 1.1 Related work

The derivation of thermodynamics as a theory of large systems which are microscopically governed by lossless and time-reversible fundamental laws of physics (classical or quantum mechanics) has a large literature and has seen tremendous progress over more than a century within the field of statistical physics. See for instance [10, 11, 12, 13] for physicists’ accounts of how dissipation can appear from time-reversible dynamics, and the books [6, 7, 8] on traditional statistical physics. In non-equilibrium statistical mechanics, the focus has traditionally been on dynamical systems close to equilibrium. A result of major importance is the fluctuation-dissipation theorem, which plays an important role in this paper. The origin of this theorem goes back to Nyquist’s and Johnson’s work [14, 15] on thermal noise in electrical circuits. In its full generality, the theorem was first stated in [16]; see also [17]. The theorem shows that the thermal fluctuations of a system close to equilibrium determine how the system dissipates energy when perturbed. The result can be used in two different ways: by observing the fluctuations of a system you can determine its dynamic response to perturbations; or by making small perturbations to the system you can determine its noise properties. The result has found widespread use in many areas such as fluid mechanics, but also in the circuit community, see for example [18, 19]. A recent survey article about the fluctuation-dissipation theorem is [20]. Obtaining general results for dynamical systems far away from equilibrium (far-from-equilibrium statistical mechanics) has proved much more difficult. In recent years, the so-called fluctuation theorem [21, 22] has received a great deal of interest. The fluctuation theorem quantifies the probability that a system far away from equilibrium violates the second law of thermodynamics. Not surprisingly, for longer time intervals, this probability is exceedingly small. A surprising fact is that the fluctuation theorem implies the fluctuation-dissipation theorem when applied to systems close to equilibrium [22]. The fluctuation theorem is not treated in this paper, but is an interesting topic for future work.

From a control theorist’s perspective, it remains to understand what these results imply in a control-theoretical setting. One contribution of this paper is to highlight the importance of the fluctuation-dissipation theorem in control engineering. Furthermore, additional theory is needed that is both mathematically more rigorous and applies to systems that are not merely far from equilibrium, but are maintained there using active control. More quantitative convergence and error analysis is also needed for systems that are not asymptotically large, such as arise in biology, microelectronics, and micromechanical systems.

Substantial work has already been done in the control community in formulating various results of classical thermodynamics in a more mathematical framework. In [23, 24], the second law of thermodynamics is derived and a control-theoretic heat engine is obtained (in [25] these results are generalized). In [26], a rigorous dynamical systems approach is taken to derive the laws of thermodynamics using the framework of dissipative systems [27, 28]. In [29], it is shown how the entropy flows in Kalman-Bucy filters, and in [30] Linear-Quadratic-Gaussian control theory is used to construct heat engines. In [31, 32, 33], the problem of how lossless systems can appear dissipative (compare with [10, 11, 12] above) is discussed using various perspectives. In [34], how the direction of time affects the difficulty of controlling a process is discussed.

### 1.2 Contribution of the paper

The first contribution of the paper is that we characterize systems that can be approximated using linear or nonlinear lossless systems. We develop a simple, clear control-theoretic model framework in which the only assumptions on the nature of the physical systems are conservation of energy and causality, and all systems are of finite dimension and act on finite time horizons. We construct high-order lossless systems that approximate dissipative systems in a systematic manner, and prove that a linear model is dissipative if, and only if, it is arbitrarily well approximated by lossless causal linear systems over an arbitrarily long time horizon. We show how the error between the systems depends on the number of states in the approximation and the length of the time horizon (Theorems 1 and 2). Since human experience and technology is limited in time, space, and resolution, there are limits to directly distinguishing between a low-order macroscopic dissipative system and a high-order lossless approximation. This result is important since it shows exactly what macroscopic behaviors we can implement with linear lossless systems, and how many states are needed. In order to approximate an active system, even a linear one, with a lossless system, we show that the approximation must be nonlinear. Note that active components are at the heart of biology and all modern technology, in amplification, digital electronics, signal transduction, etc. In the paper, we construct one class of low-order lossless nonlinear approximations and show how the approximation error depends on the initial available energy (Theorems 4 and 5). Thus in this control-theoretic context, nonlinearity is not a source of complexity, but rather an essential and valuable resource for engineering design. These results are all of theoretical interest, but should also be of practical interest. In particular, the results give constructive methods for implementing desired dynamical systems using a finite number of lossless components when resources such as time and energy are limited.

As a by-product of this contribution, the fluctuation-dissipation theorem (Propositions 2 and 3) and the Onsager reciprocal relations (Theorem 3) easily follow. The lossless systems studied here are consistent with classical physics since they conserve energy. If time reversibility (see [28] and also Definition 2) of the linear lossless approximation is assumed, the Onsager relations follow. Uncertainty in the initial state of linear lossless approximations gives a simple explanation for noise that can be observed at a macroscopic level, as quantified by the fluctuation-dissipation theorem. The fluctuation-dissipation theorem and the Onsager relations are well known and have been shown in many different settings. Our contribution here is to give alternative explanations that use the language and tools familiar to control theorists.

The second contribution of the paper is that we highlight the importance of the fluctuation-dissipation theorem for deriving limitations in control theory. As an application of control-theoretic relevance, we apply it to models of measurement devices. With idealized measurement devices that are not lossless, we show that measurements can be done without perturbing the measured system. We say these measurement devices have no back action, or alternatively, no observer effect. However, if these ideal measurement devices are implemented using lossless approximations, simple limitations on the back action emerge that depend on the surrounding temperature and the available energy. We argue that these lossless measurement devices and the resulting limitations are better models of what we can actually implement physically.

We hope this paper is a step towards building a framework for understanding fundamental limitations in control and estimation that arise due to the physical implementation of measurement devices and, eventually, actuation. We defer many important and difficult issues here such as how to actually model such devices realistically. It is also clear that this framework would benefit from a behavioral setting [35]. However, for the points we make with this paper, a conventional input-output setting with only regular interconnections is sufficient. Aficionados will easily see the generalizations, the details of which might be an obstacle to readability for others. Perhaps the most glaring unresolved issue is how to best motivate the introduction of stochastics. In conventional statistical mechanics, a stochastic framework is taken for granted, whereas we ultimately aim to explain if, where, and why stochastics arise naturally. We hope to address this in future papers. The paper [33] is an early version of this paper.

### 1.3 Organization

The organization of the paper is as follows: In Section 2, we derive lossless approximations of various classes of systems. First we look at memoryless dissipative systems, then at dissipative systems with memory, and finally at active systems. In Section 3, we look at the influence of the initial state of the lossless approximations, and derive the fluctuation-dissipation theorem. In Section 4, we apply the results to measurement devices, and obtain limits on their performance.

### 1.4 Notation

Most notation used in the paper is standard. Let $x \in \mathbb{R}^n$ and let $x_i$ be its $i$-th element. Then $x^T$ denotes the transpose of $x$, and $x^*$ the complex conjugate transpose of $x$. We define $\|x\|_2 := (x^T x)^{1/2}$, $\|x\|_1 := \sum_i |x_i|$, and $\bar{\sigma}(A)$ is the largest singular value of $A$. Furthermore, $\|u\|_{L_2[0,t]} := \left( \int_0^t \|u(s)\|_2^2\, ds \right)^{1/2}$ and $\|u\|_{L_1[0,t]} := \int_0^t \|u(s)\|_1\, ds$. $I_n$ is the $n$-dimensional identity matrix.

## 2 Lossless Approximations

### 2.1 Lossless systems

In this paper, linear systems in the form

$$\begin{aligned}
\dot{x}(t) &= J x(t) + B u(t), & x(t) &\in \mathbb{R}^n, \\
y(t) &= B^T x(t) + D u(t), & u(t),\, y(t) &\in \mathbb{R}^p,
\end{aligned} \tag{1}$$

where $J$ and $D$ are antisymmetric ($J = -J^T$, $D = -D^T$) and $(J, B)$ is controllable, are of special interest. The system (1) is a linear lossless system. We define the total energy of (1) as

$$E(x) := \tfrac{1}{2}\, x^T x. \tag{2}$$

Lossless [27, 28] means that the total energy of (1) satisfies

$$\frac{dE(x(t))}{dt} = x(t)^T \dot{x}(t) = y(t)^T u(t) =: w(t), \tag{3}$$

where $w(t)$ is the work rate on the system. If there is no work done on the system, $w(t) = 0$, then the total energy is constant. If there is work done on the system, $w(t) > 0$, the total energy increases. The work, however, can be extracted again, $w(t) < 0$, since the energy is conserved and the system is controllable. In fact, all finite-dimensional linear minimal lossless systems with supply rate $w(t) = y(t)^T u(t)$ can be written in the form (1), see [28, Theorem 5]. Nonlinear lossless systems will also be of interest later in the paper. They will also satisfy (2)–(3), but their dynamics are nonlinear. Conservation of energy is a common assumption on microscopic models in statistical mechanics and in physics in general [6]. The systems (1) are time reversible if, and only if, they are also reciprocal, see [28, Theorem 8] and also Definitions 1–2 in Section 2.3. Hence, we argue the systems (1) have desirable “physical” properties.

###### Remark 1.

In this paper, we only consider systems that are lossless and dissipative with respect to the supply rate $w = y^T u$. This supply rate is of special importance because of its relation to passivity theory. Indeed, there is a theory for systems with more general supply rates, see for example [27, 28], and it is an interesting problem to generalize the results here to more general supply rates.

###### Remark 2.

The system (1) is a linear port-Hamiltonian system, see for example [36], with no dissipation. Note that the Hamiltonian of a linear port-Hamiltonian system is identical to the total energy $E(x)$.

There are well-known necessary and sufficient conditions for when a transfer function can be exactly realized using linear lossless systems: All the poles of the transfer function must be simple, located on the imaginary axis, and with positive semidefinite residues, see [28]. In this paper, we show that linear dissipative systems can be arbitrarily well approximated by linear lossless systems (1) over arbitrarily large time intervals. Indeed, if we believe that energy is conserved, then all macroscopic models should be realizable using lossless systems of possibly large dimension. The linear lossless systems are rather abstract but have properties that we argue are reasonable from a physical point of view, as illustrated by the following example.

###### Example 1.

It is a simple exercise to show that the circuit in Fig. 1, with the current $i$ through the current source as input $u$, and the voltage $v_1$ across the current source as output $y$, is a lossless linear system. We have

$$\begin{aligned}
\dot{x}(t) &= \begin{pmatrix} 0 & -1/\sqrt{L_1 C_1} & 0 \\ 1/\sqrt{L_1 C_1} & 0 & -1/\sqrt{L_1 C_2} \\ 0 & 1/\sqrt{L_1 C_2} & 0 \end{pmatrix} x(t) + \begin{pmatrix} 1/\sqrt{C_1} \\ 0 \\ 0 \end{pmatrix} u(t), \\
y(t) &= \begin{pmatrix} 1/\sqrt{C_1} & 0 & 0 \end{pmatrix} x(t), \\
x(t)^T &= \begin{pmatrix} \sqrt{C_1}\, v_1(t) & \sqrt{L_1}\, i_1(t) & \sqrt{C_2}\, v_2(t) \end{pmatrix}, \\
E(x(t)) &= \tfrac{1}{2}\, x(t)^T x(t) = \tfrac{1}{2} \left( C_1 v_1(t)^2 + L_1 i_1(t)^2 + C_2 v_2(t)^2 \right), \\
w(t) &= y(t) u(t) = v_1(t)\, i(t).
\end{aligned}$$

Note that $E(x(t))$ coincides with the energy stored in the circuit, and that $w(t)$ is the power into the circuit. Electrical circuits with only lossless components (capacitors and inductors) can be realized in the form (1), see [37]. Circuits with resistors can always be approximated by systems in the form (1), as is shown in this paper.
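The lossless property (3) of this realization is easy to verify numerically. The sketch below uses hypothetical component values (`L1`, `C1`, `C2` are illustrative choices, not from the paper), integrates the state equation with a sinusoidal input current, and checks that the stored energy $E = \tfrac{1}{2} x^T x$ equals the accumulated work $\int_0^t y(s) u(s)\, ds$:

```python
import numpy as np

# Illustrative component values (not from the paper).
L1, C1, C2 = 1e-3, 1e-6, 2e-6

# Realization (1) of the lossless LC circuit in Example 1.
a, b = 1.0 / np.sqrt(L1 * C1), 1.0 / np.sqrt(L1 * C2)
J = np.array([[0.0, -a, 0.0],
              [a,  0.0, -b],
              [0.0, b,  0.0]])
B = np.array([1.0 / np.sqrt(C1), 0.0, 0.0])

u_of = lambda t: np.sin(2 * np.pi * 1e4 * t)   # input current i(t)

def f(t, x):
    return J @ x + B * u_of(t)

# RK4 integration of x' = Jx + Bu; trapezoidal accumulation of the
# work integral with y = B^T x (the voltage v1).
h, T = 1e-8, 2e-5
x, t, work = np.zeros(3), 0.0, 0.0
for _ in range(int(round(T / h))):
    k1 = f(t, x); k2 = f(t + h/2, x + h/2*k1)
    k3 = f(t + h/2, x + h/2*k2); k4 = f(t + h, x + h*k3)
    x_new = x + h/6*(k1 + 2*k2 + 2*k3 + k4)
    work += h / 2 * (B @ x * u_of(t) + B @ x_new * u_of(t + h))
    x, t = x_new, t + h

E = 0.5 * x @ x
print(E, work)   # stored energy matches the supplied work, as in (3)
```

Since $J$ is antisymmetric, no energy is created or destroyed internally; the two printed numbers agree to integration accuracy.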

### 2.2 Lossless approximation of dissipative memoryless systems

Macroscopic systems, such as resistors, are often modeled by simple static (or memoryless) input-output relations

$$y(t) = k\, u(t), \tag{4}$$

where $k \in \mathbb{R}^{p \times p}$. If $k$ is positive semidefinite, this system is dissipative since work can never be extracted and the work rate is always nonnegative, $w(t) = y(t)^T u(t) = u(t)^T k\, u(t) \ge 0$, for all $u(t)$ and $t$. Hence, (4) is not lossless. Next, we show how we can approximate (4) arbitrarily well with a lossless linear system (1) over finite, but arbitrarily long, time horizons $[0, \tau]$. First of all, note that $k$ can be decomposed as $k = k_s + k_a$, where $k_s$ is symmetric positive semidefinite and $k_a$ is antisymmetric. We can use $D = k_a$ in the lossless approximation (1) and need only consider the symmetric matrix $k_s$ next.

First, choose the time interval of interest, $[0, \tau]$, and rewrite (4) as the convolution

$$y(t) = \int_{-\infty}^{\infty} \kappa(t-s)\, u(s)\, ds, \qquad \kappa(t) := k_s\, \delta(t), \tag{5}$$

where $u$ is at least continuous and has support in the interval $[0, \tau]$,

$$u(t) = 0, \qquad t \in (-\infty, 0] \cup [\tau, \infty),$$

and $\delta$ is the Dirac distribution. The time interval $[0, \tau]$ should contain all the time instants where we perform input-output experiments on the system (4)–(5). The impulse response $\kappa$ can be formally expanded in a Fourier series over the interval $[-\tau, \tau]$,

$$\kappa(t) \sim \frac{k_s}{2\tau} + \sum_{l=1}^{\infty} \frac{k_s}{\tau} \cos l\omega_0 t, \qquad \omega_0 := \pi/\tau. \tag{6}$$

To be precise, the Fourier series (6) converges to $\kappa$ in the sense of distributions. Define the truncated Fourier series $\kappa_N$ by keeping the terms up to $l = N-1$, and split $\kappa_N$ into a causal and an anti-causal part:

$$\kappa_N(t) =: \kappa_N^c(t) + \kappa_N^{ac}(t), \qquad \kappa_N^c(t) = 0 \ (t < 0), \quad \kappa_N^{ac}(t) = 0 \ (t \ge 0).$$

The causal part $\kappa_N^c$ can be realized as the impulse response of a lossless linear system (1) of order $(2N-1)r$ using the matrices

$$\begin{aligned}
J = J_N &:= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & \Omega_N \\ 0 & -\Omega_N & 0 \end{pmatrix}, \\
\Omega_N &:= \mathrm{diag}\{\omega_0 I_r,\, 2\omega_0 I_r,\, \dots,\, (N-1)\omega_0 I_r\}, \\
B = B_N &:= \sqrt{\tfrac{1}{\tau}} \begin{pmatrix} k_f^T & \sqrt{2}\, k_f^T & \cdots & \sqrt{2}\, k_f^T & 0 & \cdots & 0 \end{pmatrix}^T,
\end{aligned} \tag{7}$$

where $r := \operatorname{rank} k_s$ and $k_f \in \mathbb{R}^{r \times p}$ satisfies $k_f^T k_f = k_s$. That the series (6) converges in the sense of distributions means that for all smooth $u$ with support in $[0, \tau]$ we have that

$$k_s u(t) = \lim_{N \to \infty} \int_{-\infty}^{\infty} \left( \kappa_N^{ac}(t-s) + \kappa_N^c(t-s) \right) u(s)\, ds.$$

A closer study of the two terms under the integral reveals that

$$\begin{aligned}
\lim_{N \to \infty} \int_{-\infty}^{\infty} \kappa_N^{ac}(t-s)\, u(s)\, ds &= \tfrac{1}{2}\, k_s u(t^+), \\
\lim_{N \to \infty} \int_{-\infty}^{\infty} \kappa_N^{c}(t-s)\, u(s)\, ds &= \tfrac{1}{2}\, k_s u(t^-),
\end{aligned}$$

because of the anti-causal/causal decomposition and the symmetry $\kappa_N(-t) = \kappa_N(t)$. Thus, since $u$ is smooth, we can also model $k_s$ using only the causal part if it is scaled by a factor of two. This leads to a linear lossless approximation of (4) that we denote by the linear operator $K_N$, defined by

$$y_N(t) = (K_N u)(t) = \int_{-\infty}^{\infty} 2\kappa_N^c(t-s)\, u(s)\, ds = \int_0^t 2\kappa_N^c(t-s)\, u(s)\, ds. \tag{8}$$

Here $C^2[0, \tau]$ denotes the space of twice continuously differentiable functions on the interval $[0, \tau]$. The linear operator $K_N$ is realized by the triple $(J_N, B_N, D = k_a)$. We can bound the approximation error as seen in the following theorem.

###### Theorem 1.

Assume that $u \in C^2[0, \tau]$ and $u(0) = 0$. Let $k = k_s + k_a$ with $k_s$ symmetric positive semidefinite and $k_a$ antisymmetric. Define a lossless approximation $K_N$ with realization $(J_N, B_N, D = k_a)$. Then the approximation error is bounded as

$$\|y(t) - y_N(t)\|_2 \le \frac{2\bar{\sigma}(k_s)\, \tau}{\pi^2 (N-1)} \left( \|\dot{u}(t)\|_2 + \|\dot{u}(0)\|_2 + \|\ddot{u}\|_{L_1[0,t]} \right),$$

for $t$ in $[0, \tau]$.

###### Proof.

We have that

$$y(t) - y_N(t) = \sum_{l=N}^{\infty} \int_0^t \frac{2 k_s}{\tau} \cos\left( l\omega_0 (t-s) \right) u(s)\, ds, \qquad t \in [0, \tau].$$

The order of summation and integration has been changed because this is how the value of the series is defined in the distribution sense. We proceed by using repeated integration by parts on each term in the series. Using $u(0) = 0$, it holds that

$$\left\| \int_0^t \cos\left( l\omega_0 (t-s) \right) u(s)\, ds \right\|_2 \le \frac{1}{l^2 \omega_0^2} \left( \|\dot{u}(t)\|_2 + \|\dot{u}(0)\|_2 + \int_0^t \|\ddot{u}(s)\|_1\, ds \right).$$

Hence, we have the bound

$$\|y(t) - y_N(t)\|_2 \le \frac{2\bar{\sigma}(k_s)}{\tau} \sum_{l=N}^{\infty} \frac{1}{l^2 \omega_0^2} \left( \|\dot{u}(t)\|_2 + \|\dot{u}(0)\|_2 + \int_0^t \|\ddot{u}(s)\|_1\, ds \right).$$

Since $\sum_{l=N}^{\infty} l^{-2} \le 1/(N-1)$, we can establish the bound in the theorem. ∎

The theorem shows that by choosing the truncation order $N$ sufficiently large, the memoryless model (4) can be approximated as well as we like with a lossless linear system, if inputs are smooth. Hence we cannot then distinguish between the systems (4) and $K_N$ using finite-time input-output experiments. On physical grounds one may prefer the model $K_N$, even though it is more complex, since it assumes the form (1) of a lossless system (and is time reversible if $k$ is reciprocal, see Theorem 3). Additional support for this idea is given in Section 3. Note that the lossless approximation is far from unique: the time interval $[0, \tau]$ is arbitrary, and Fourier expansions other than (6) are possible. The point is, however, that it is always possible to approximate the dissipative behavior using a lossless model.

It is often a reasonable assumption that inputs $u$, for example voltages, are smooth if we look at a sufficiently fine time scale. This is because we usually cannot change inputs arbitrarily fast due to physical limitations. Physically, we can think of the approximation order $N$ as the number of degrees of freedom in a physical system, usually of the order of Avogadro’s number, $N \approx 6 \times 10^{23}$. It is then clear that the interval length $\tau$ can be very large without making the approximation error bound in Theorem 1 large. This explains how the dissipative system (4) is consistent with a physics based on energy-conserving systems.
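The construction (6)–(8) can be made concrete with a small numerical sketch. Below, $(J_N, B_N)$ are assembled as in (7) for a scalar resistor $y = k u$ ($k_a = 0$, $k_f = \sqrt{k}$), the lossless system is simulated for a smooth input with $u(0) = 0$, and the output is compared with $k u(t)$. The values of $k$, $\tau$, $N$, and the test input are illustrative choices, not from the paper:

```python
import numpy as np

# Illustrative parameters for the lossless approximation of y = k u.
k, tau, N = 2.0, 1.0, 100
w0 = np.pi / tau
kf = np.sqrt(k)                       # k_f with k_f^T k_f = k_s = k

# Matrices (7) for a scalar input: order 2N-1, J_N antisymmetric.
Omega = np.diag(w0 * np.arange(1, N))
n = 2 * N - 1
J = np.zeros((n, n))
J[1:N, N:] = Omega
J[N:, 1:N] = -Omega
B = np.zeros(n)
B[0] = kf
B[1:N] = np.sqrt(2.0) * kf
B /= np.sqrt(tau)

u = lambda t: np.sin(np.pi * t / tau) ** 2    # smooth input, u(0) = 0

def f(t, x):
    return J @ x + B * u(t)

# RK4 simulation of x' = J_N x + B_N u, y_N = B_N^T x, x(0) = 0.
h = 2e-4
x, t = np.zeros(n), 0.0
errs = []
while t < tau - 1e-12:
    k1 = f(t, x); k2 = f(t + h/2, x + h/2*k1)
    k3 = f(t + h/2, x + h/2*k2); k4 = f(t + h, x + h*k3)
    x = x + h/6*(k1 + 2*k2 + 2*k3 + k4)
    t += h
    errs.append(abs(B @ x - k * u(t)))   # |y_N(t) - k u(t)|

print(max(errs))   # small compared with max |k u| = 2
```

For these values, the Theorem 1 bound evaluates to roughly $0.06$, and the simulated error indeed stays well below the output magnitude; increasing $N$ shrinks it further, consistent with the $1/(N-1)$ rate.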

###### Remark 3.

Note that it is well known that a dissipative memoryless system can be modeled by an infinite-dimensional lossless system. We can model an electrical resistor by a semi-infinite lossless transmission line using the telegraphist’s equation (the wave equation), see [38], for example. If the inductance and capacitance per unit length of the line are $l$ and $c$, respectively, then the characteristic impedance of the line, $\sqrt{l/c}$, is purely resistive. One possible interpretation of $K_N$ is as a finite-length lossless transmission line where only the lowest modes of the telegraphist’s equation are retained. Also in the physics literature, lossless (or Hamiltonian) approximations of dissipative memoryless systems can be found; in [10, 11, 12], a so-called Ohmic bath is used, for example. Note that it is not shown in these papers when, and how fast, the approximation converges to the dissipative system. This is in contrast to the analysis presented herein and the error bound in Theorem 1.

### 2.3 Lossless approximation of dissipative systems with memory

In this section, we generalize the procedure from Section 2.2 to dissipative systems that have memory. We consider asymptotically stable, time-invariant, linear causal systems with impulse response $g$. Their input-output relation is given by

$$y(t) = (G u)(t) = \int_0^t g(t-s)\, u(s)\, ds. \tag{9}$$

Possible direct terms in $G$ can be approximated separately, as shown in Section 2.2. The system (9) is dissipative with respect to the work rate $w(t) = y(t)^T u(t)$ if and only if $\int_0^t y(s)^T u(s)\, ds \ge 0$, for all $t$ and admissible $u$. An equivalent condition, see [28], is that the transfer function $\hat{g}$ satisfies

$$\hat{g}(j\omega) + \hat{g}(-j\omega)^T \ge 0 \quad \text{for all } \omega. \tag{10}$$

Here $\hat{g}$ is the Fourier transform of $g$.
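Condition (10) is easy to check numerically for a given impulse response. As an illustration (the choice $g(t) = e^{-t}$ is ours, not from the paper), the scalar transfer function $\hat{g}(j\omega) = 1/(1 + j\omega)$ gives $\hat{g}(j\omega) + \hat{g}(-j\omega) = 2/(1 + \omega^2) \ge 0$, so the system is dissipative:

```python
import numpy as np

# Dissipativity test (10) for the illustrative impulse response
# g(t) = e^{-t}: g_hat(jw) = 1/(1 + jw). For real g, g_hat(-jw) is the
# complex conjugate of g_hat(jw), so the scalar condition (10) reads
# 2 Re g_hat(jw) >= 0 for all w.
w = np.linspace(-100.0, 100.0, 10001)
g_hat = 1.0 / (1.0 + 1j * w)
sym_part = (g_hat + np.conj(g_hat)).real   # equals 2 / (1 + w^2)

print(sym_part.min())   # nonnegative on the whole grid: dissipative
```

The same grid check extends to matrix-valued $\hat{g}$ by testing the smallest eigenvalue of the Hermitian part at each frequency.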

We will next consider the problem of how well, and when, a system (9) can be approximated using a linear lossless system (1) (call it $G_N$) with fixed initial state $x(0) = x_0$,

$$y_N(t) = B^T e^{Jt} x_0 + \int_0^t B^T e^{J(t-s)} B\, u(s)\, ds, \tag{11}$$

for a set of input signals. Let us formalize the problem.

###### Problem 1.

For any fixed time horizon $\tau$ and arbitrarily small $\epsilon > 0$, when is it possible to find a lossless system $G_N$ with fixed initial state $x_0$ and output $y_N$ such that

$$\|y(t) - y_N(t)\|_2 \le \epsilon\, \|u\|_{L_2[0,t]}, \tag{12}$$

for all input signals $u \in L_2[0, \tau]$ and all $t \in [0, \tau]$?

Note that we require $x_0$ to be fixed in Problem 1, so that it is independent of the applied input $u$. This means the approximation should work even if the applied input is not known beforehand. Let us next state a necessary condition for linear lossless approximations.

###### Proposition 1.

Assume there is a linear lossless system $G_N$ that solves Problem 1. Then it holds that

• If $x_0 \ne 0$, then $x_0$ is an unobservable state;

• If $x_0 \ne 0$, then $x_0$ is an uncontrollable state; and

• If the realization of $G_N$ is minimal, then $x_0 = 0$.

###### Proof.

(i): The inequality (12) holds for $u = 0$. Then (12) reduces to $\|B^T e^{Jt} x_0\|_2 \le 0$, for $t \in [0, \tau]$, which implies $B^T e^{Jt} x_0 = 0$. Thus a nonzero $x_0$ must be unobservable. (ii): For the lossless realizations $(J, B, B^T)$, since $J^T = -J$, the observability matrix $\mathcal{O}$ satisfies $\mathcal{O}^T = \begin{pmatrix} B & -JB & J^2 B & \cdots \end{pmatrix}$, so the ranges of $\mathcal{O}^T$ and the controllability matrix $\mathcal{C}$ coincide. Thus if $x_0$ is unobservable, it is also uncontrollable. (iii): Both (i) and (ii) imply (iii). ∎

Proposition 1 significantly restricts the classes of systems we can approximate using linear lossless approximations. Intuitively, to approximate active systems there must be energy stored in the initial state of $G_N$. But Proposition 1 says that such initial energy is not available to the inputs and outputs of $G_N$. The next theorem shows that we can approximate $G$ using $G_N$ if, and only if, $G$ is dissipative.
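The coincidence of unobservable and uncontrollable subspaces used in the proof of Proposition 1 can be checked numerically. In this sketch (the block sizes and frequencies are arbitrary illustrative choices), a lossless realization $(J, B, B^T)$ is built from two oscillators, with the second one deliberately decoupled from the port; it is then both uncontrollable and unobservable:

```python
import numpy as np

# Lossless realization (J, B, B^T): J antisymmetric, two oscillators,
# only the first one coupled to the input/output port.
J1 = np.array([[0.0, 1.0], [-1.0, 0.0]])
J2 = np.array([[0.0, 2.0], [-2.0, 0.0]])
J = np.block([[J1, np.zeros((2, 2))],
              [np.zeros((2, 2)), J2]])
B = np.array([[1.0], [0.0], [0.0], [0.0]])
assert np.allclose(J, -J.T)

# Controllability and observability matrices of (J, B, B^T).
C_mat = np.hstack([np.linalg.matrix_power(J, i) @ B for i in range(4)])
O_mat = np.vstack([B.T @ np.linalg.matrix_power(J, i) for i in range(4)])

rc = np.linalg.matrix_rank(C_mat)
ro = np.linalg.matrix_rank(O_mat)
print(rc, ro)   # equal ranks: unobservable <=> uncontrollable here

# A state supported on the decoupled oscillator is unobservable,
# i.e. it contributes nothing to B^T e^{Jt} x0:
x0 = np.array([0.0, 0.0, 1.0, 1.0])
print(np.linalg.norm(O_mat @ x0))
```

Such an $x_0$ could store initial energy, but by Proposition 1 that energy cannot reach the port, which is why linear lossless systems cannot mimic active behavior.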

###### Theorem 2.

Suppose $G$ is a linear time-invariant causal system (9), where the impulse response $g$ is uniformly bounded, $g \in L_1[0, \infty)$, and $\dot{g} \in L_1[0, \infty)$. Then Problem 1 is solvable using a linear lossless $G_N$ if, and only if, $G$ is dissipative.

###### Proof.

See Appendix 6.1. ∎

The proof of Theorem 2 shows that the number of states needed in $G_N$ grows without bound as the tolerance $\epsilon$ decreases, and again the required state space is large. The result shows that for finite-time input-output experiments with finite-energy inputs it is not possible to distinguish between the dissipative system $G$ and its lossless approximations $G_N$. Theorem 2 illustrates that a very large class of dissipative systems (macroscopic systems) can be approximated by the lossless linear systems we introduced in (1). The lossless systems are dense in the dissipative systems, in the introduced topology. Again this shows how dissipative systems are consistent with a physics based on energy-conserving systems.

In [28, Theorem 8], necessary and sufficient conditions for time-reversible systems are given. We can now use this result together with Theorem 2 to prove a result reminiscent of the Onsager reciprocal relations, which say that physical systems tend to be reciprocal, see for example [6]. Before stating the result, we properly define what is meant by reciprocal and time-reversible systems. These definitions are slight reformulations of those found in [28].

A signature matrix $\Sigma_e$ is a diagonal matrix with entries either $+1$ or $-1$.

###### Definition 1.

A linear time-invariant system with impulse response $g$ is reciprocal with respect to the signature matrix $\Sigma_e$ if $\Sigma_e\, g(t) = g(t)^T \Sigma_e$ for all $t$.

###### Definition 2.

Consider a finite-dimensional linear time-invariant system $G$ and assume that $x(0) = 0$. Let $u(t)$, $t \in [0, \tau]$, be admissible inputs to $G$, and let $y(t)$ be the corresponding outputs. Then $G$ is time reversible with respect to the signature matrix $\Sigma_e$ if the output of $G$ is $\Sigma_e\, y(\tau - t)$ whenever the input is $\Sigma_e\, u(\tau - t)$.

###### Theorem 3.

Suppose $G$ satisfies the assumptions in Theorem 2. Then $G$ is dissipative and reciprocal with respect to $\Sigma_e$ if, and only if, there exists an arbitrarily good linear lossless approximation $G_N$ of $G$ that is time reversible with respect to $\Sigma_e$.

###### Proof.

See Appendix 6.2. ∎

Hence, one can understand why macroscopic physical systems close to equilibrium usually are reciprocal: their underlying dynamics are lossless and time reversible.

###### Remark 4.

There is a long-standing debate in physics about how macroscopic time-irreversible dynamics can result from microscopic time-reversible dynamics. The debate goes back to Loschmidt’s paradox and the Poincaré recurrence theorem. The Poincaré recurrence theorem says that bounded trajectories of volume-preserving systems (such as lossless systems) will return arbitrarily close to their initial conditions if we wait long enough (the Poincaré recurrence time). This seems counter-intuitive for real physical systems. One common argument is that the Poincaré recurrence time for macroscopic physical systems is so long that we will never experience a recurrence. But this argument is not universally accepted and other explanations exist. The debate still goes on, see for example [13]. In this paper we construct lossless and time-reversible systems with arbitrarily large Poincaré recurrence times, that are consistent with observations of all linear dissipative (time-irreversible) systems, as long as those observations take place before the recurrence time. For a control-oriented related discussion about the arrow of time, see [34].

### 2.4 Nonlinear lossless approximations

In Section 2.2, it was shown that a dissipative memoryless system can be approximated using a lossless linear system. Later in Section 2.3 it was also shown that the approximation procedure can be applied to any dissipative (linear) system. Because of Proposition 1 and Theorem 2, it is clear that it is not possible to approximate a linear active system using a linear lossless system with fixed initial state. Next we will show that it is possible to solve Problem 1 for active systems if we use nonlinear lossless approximations.

Consider the simplest possible active system,

$$y(t) = k\, u(t), \tag{13}$$

where $k$ is negative definite. This can be a model of a negative resistor, for example. More general active systems are considered below. The reason a linear lossless approximation of (13) cannot exist is that the active device has an internal infinite energy supply, but we cannot store any energy in the initial state of a linear lossless system and simultaneously track a set of outputs, see Proposition 1. However, if we allow for lossless nonlinear approximations, (13) can be arbitrarily well approximated. This is shown next by means of an example.

Consider the nonlinear system

$$\begin{aligned}
\dot{x}_E(t) &= \frac{1}{\sqrt{2E_0}}\, u(t)^T k\, u(t), & x_E(0) &= \sqrt{2E_0}, \quad E_0 > 0, \\
y_E(t) &= \frac{x_E(t)}{\sqrt{2E_0}}\, k\, u(t),
\end{aligned} \tag{14}$$

with a scalar energy-supply state $x_E(t)$, and total energy $E(x_E) := \tfrac{1}{2} x_E^2$. The system (14) has initial total energy $E(x_E(0)) = E_0$, and is a lossless system with respect to the work rate $w(t) = y_E(t)^T u(t)$, since

$$\frac{d}{dt} E(x_E(t)) = x_E(t)\, \dot{x}_E(t) = y_E(t)^T u(t).$$

The input-output relation of (14) is given by

$$\begin{aligned}
x_E(t) &= \sqrt{2E_0} + \frac{1}{\sqrt{2E_0}} \int_0^t u(s)^T k\, u(s)\, ds, \\
y_E(t) &= k\, u(t) + \frac{1}{2E_0}\, k\, u(t) \int_0^t u(s)^T k\, u(s)\, ds.
\end{aligned} \tag{15}$$

We have the following approximation result.

###### Theorem 4.

For uniformly bounded inputs, $\|u(t)\|_2 \le u_{\max}$, $t \in [0, \tau]$, the error between the active system (13) and the nonlinear lossless approximation (14) can be bounded as

$$\|y_E(t) - y(t)\|_2 \le \epsilon\, \|u\|_{L_2[0,t]},$$

for $t \in [0, \tau]$, where $\epsilon = \sqrt{\tau}\, u_{\max}^2\, \bar{\sigma}(k)^2 / (2E_0)$.

###### Proof.

A simple bound from (15) gives $\|y_E(t) - y(t)\|_2 \le \frac{\bar{\sigma}(k)^2}{2E_0} \|u(t)\|_2 \int_0^t \|u(s)\|_2^2\, ds$. Then using $\|u(t)\|_2 \le u_{\max}$ and $\int_0^t \|u(s)\|_2^2\, ds \le \sqrt{t}\, u_{\max}\, \|u\|_{L_2[0,t]}$ gives the result. ∎

The error bound in Theorem 4 can be made arbitrarily small for finite time intervals if the initial total energy is large enough. This example shows that active systems can also be approximated by lossless systems, if the lossless systems are allowed to be nonlinear and are charged with initial energy.
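A quick numerical sketch of the closed form (15) for a scalar negative resistor illustrates this trade-off. The values of $k$, $E_0$, $\tau$, and the input are illustrative choices, and the bound tested is the one stated in Theorem 4:

```python
import numpy as np

# Nonlinear lossless approximation (15) of the active system y = k u, k < 0.
k, E0, tau = -1.0, 50.0, 1.0
ts = np.linspace(0.0, tau, 2001)
u = np.cos(2 * np.pi * ts)          # bounded input, |u(t)| <= umax = 1

# Cumulative trapezoidal approximation of int_0^t u(s)^2 ds.
I2 = np.concatenate(([0.0],
     np.cumsum((u[:-1]**2 + u[1:]**2) / 2 * np.diff(ts))))

xE = np.sqrt(2 * E0) + k * I2 / np.sqrt(2 * E0)   # energy-supply state (15)
yE = xE / np.sqrt(2 * E0) * k * u                 # output (15)

err = np.abs(yE - k * u)                          # |y_E(t) - y(t)|
umax = 1.0
eps = np.sqrt(tau) * umax**2 * k**2 / (2 * E0)    # epsilon of Theorem 4
uL2 = np.sqrt(I2[-1])                             # ||u||_{L2[0,tau]}
print(err.max(), eps * uL2)   # observed error vs. Theorem 4 bound
```

Doubling $E_0$ halves both the observed error and the bound, matching the $1/E_0$ dependence in Theorem 4.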

The above approximation method can in fact be applied to much more general systems. Consider the ordinary differential equation

$$\begin{aligned}
\dot{x}(t) &= f(x(t), u(t)), & x(0) &= x_0, \\
y(t) &= g(x(t), u(t)),
\end{aligned} \tag{16}$$

where $x(t) \in \mathbb{R}^n$ and $u(t), y(t) \in \mathbb{R}^m$. In general, this is not a lossless system with respect to the supply rate $w = y(t)^T u(t)$. A nonlinear lossless approximation of (16) is given by

$$\dot{\hat{x}}(t) = \frac{x_E(t)}{\sqrt{2E_0}}\, f(\hat{x}(t), u(t)), \quad \hat{x}(0) = x_0, \qquad (17)$$
$$\dot{x}_E(t) = \frac{1}{\sqrt{2E_0}}\, g(\hat{x}(t), u(t))^T u(t) - \frac{1}{\sqrt{2E_0}}\, \hat{x}(t)^T f(\hat{x}(t), u(t)), \quad x_E(0) = \sqrt{2E_0},$$
$$y_E(t) = \frac{x_E(t)}{\sqrt{2E_0}}\, g(\hat{x}(t), u(t)),$$

where again $x_E$ is a scalar energy-supply state, and $\hat{x}$ can be interpreted as an approximation of $x$ in (16). That (17) is lossless can be verified using the storage function

$$E = \frac{1}{2}\, \hat{x}(t)^T \hat{x}(t) + \frac{1}{2}\, x_E(t)^2,$$

since

$$\dot{E} = \frac{x_E}{\sqrt{2E_0}} \left( \hat{x}^T f(\hat{x}, u) + g(\hat{x}, u)^T u - \hat{x}^T f(\hat{x}, u) \right) = \frac{x_E}{\sqrt{2E_0}}\, g(\hat{x}, u)^T u = y_E^T u = w.$$

Since $x_E(t)/\sqrt{2E_0} \approx 1$ for small $t$, it is intuitively clear that $\hat{x}$ in (17) will be close to $x$ in (16), at least for small $t$ and large initial energy $E_0$. We have the following theorem.

###### Theorem 5.

Assume that $f(x, u)$ is continuous with respect to $x$ and $u$, and that (16) has a unique solution for $0 \le t \le \tau$. Then there exist positive constants $\bar{E}_0$ and $c$ such that for all $E_0 \ge \bar{E}_0$, (17) has a unique solution which satisfies $\|\hat{x}(t) - x(t)\|_2 \le c/\sqrt{2E_0}$ for all $0 \le t \le \tau$.

###### Proof.

Introduce the new coordinate $\Delta x_E := x_E - \sqrt{2E_0}$ and define $\epsilon_0 := 1/\sqrt{2E_0}$. The system (17) then takes the form

$$\dot{\hat{x}} = (1 + \epsilon_0 \Delta x_E)\, f(\hat{x}, u), \quad \hat{x}(0) = x_0,$$
$$\Delta \dot{x}_E = \epsilon_0\, g(\hat{x}, u)^T u - \epsilon_0\, \hat{x}^T f(\hat{x}, u), \quad \Delta x_E(0) = 0.$$

Perturbation analysis [39, Section 10.1] in the parameter $\epsilon_0$ as $\epsilon_0 \to 0$ yields that there are positive constants $\bar{\epsilon}_0$ and $c$ such that $\|\hat{x}(t) - x(t)\|_2 \le c\, \epsilon_0$ for all $0 \le t \le \tau$ and $\epsilon_0 \le \bar{\epsilon}_0$. The result then follows with $\bar{E}_0 = 1/(2\bar{\epsilon}_0^2)$. ∎

Just as in Section 2.3, the introduced lossless approximations are not unique. The one introduced here, (17), is very simple since only one extra state is added. Its accuracy (the constants $\bar{E}_0$ and $c$) of course depends on the particular system ($f$, $g$) and the time horizon $\tau$. An interesting topic for future work is to develop a theory of “optimal” lossless approximations using a fixed amount of energy and a fixed number of states.
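To illustrate the construction (17), consider a minimal sketch, not from the paper, with the assumed dynamics $f(x, u) = -x + u$ and $g(x, u) = x$ (a simple dissipative system). Both (16) and (17) are integrated with the forward Euler method; the state $\hat{x}$ of the one-extra-state lossless approximation tracks $x$ more closely as $E_0$ grows.

```python
import math

def simulate(E0, T=5.0, n=5000, u=math.sin):
    # Forward-Euler simulation of the true system (16) with
    # f(x, u) = -x + u and g(x, u) = x (a simple dissipative system),
    # and of its one-extra-state lossless approximation (17).
    h = T / n
    s = math.sqrt(2.0 * E0)
    x = 0.0        # state of (16)
    xh = 0.0       # \hat{x}, state of (17)
    xE = s         # scalar energy-supply state, x_E(0) = sqrt(2 E0)
    for i in range(n):
        ui = u(i * h)
        fx = -x + ui            # f(x, u)
        fxh = -xh + ui          # f(xhat, u)
        x += h * fx
        xh, xE = (xh + h * (xE / s) * fxh,
                  xE + h * (xh * ui - xh * fxh) / s)
    return x, xh

for E0 in (1e1, 1e3, 1e5):
    x, xh = simulate(E0)
    print(f"E0 = {E0:8.0e}   |xhat - x| = {abs(xh - x):.2e}")
```

Note that the energy-supply state $x_E$ drifts away from $\sqrt{2E_0}$ as the dissipative dynamics absorb energy, which is exactly what limits the time horizon of the approximation.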

### 2.5 Summary

In Section 2, we have seen that a large range of systems, both dissipative and active, can be approximated by lossless systems. Lossless systems account for the total energy, and we claim these models are more physical. It was shown that linear lossless systems are dense in the set of linear dissipative systems. It was also shown that time reversibility of the lossless approximation is equivalent to reciprocity of the dissipative system. To approximate active systems, nonlinearity is needed. The introduced nonlinear lossless approximation has to be initialized at a precise state with a large total energy $E_0$, and it achieves better accuracy (a smaller error bound) by increasing this initial energy. This is in sharp contrast to the linear lossless approximations of dissipative systems, which are initialized with zero energy and achieve better accuracy by increasing the number of states. The next section deals with uncertainties in the initial state of the lossless approximations.

## 3 The Fluctuation-Dissipation Theorem

As discussed in the introduction, the fluctuation-dissipation theorem plays a major role in close-to-equilibrium statistical mechanics. The theorem has been stated in many different settings and for different models. See for example [17, 20], where it is stated for Hamiltonian systems and Langevin equations. In [18, 19], it is stated for electrical circuits. A fairly general form of the fluctuation-dissipation theorem is given in [6, p. 500]. We re-state this version of the theorem here.

Suppose that $u(t)$ and $y(t)$ are conjugate external variables (inputs and outputs) for a dissipative system in thermal equilibrium at temperature $T$ [Kelvin] (as defined in Section 3.1). We can interpret $y(t)$ as a generalized velocity and $u(t)$ as the corresponding generalized force, such that $y(t)^T u(t)$ is a work rate [Watt]. Although the system is generally nonlinear, we only consider small variations of the state around a fixed point of the dynamics, which allows us to assume the system is linear. Assume first that the system has no direct term (no memoryless element). If we make a perturbation in the forces $u(t)$, the velocities respond according to

$$y(t) = \int_0^t g(t - s)\, u(s)\, ds,$$

where $g(t)$ is, by definition, the impulse response matrix. The following fluctuation-dissipation theorem says that the velocities also fluctuate around the equilibrium.

###### Proposition 2.

The total response of a linear dissipative system with no memoryless element, in thermal equilibrium at temperature $T$, is given by

$$y(t) = n(t) + \int_0^t g(t - s)\, u(s)\, ds, \qquad (18)$$

for perturbations $u(t)$. The fluctuation $n(t)$ is a stationary Gaussian stochastic process, where

$$\mathbf{E}\, n(t) = 0, \qquad (19)$$
$$R_n(t, s) := \mathbf{E}\, n(t) n(s)^T = \begin{cases} k_B T\, g(t - s), & t - s \ge 0, \\ k_B T\, g(s - t)^T, & t - s < 0, \end{cases}$$

where $k_B$ is Boltzmann’s constant.

###### Proof.

See Section 3.1. ∎

The covariance function of the noise $n(t)$ is determined by the impulse response $g(t)$, and vice versa. The result has found widespread use in, for example, fluid mechanics: by empirical estimation of the covariance function, we can estimate how the system responds to external forces. In circuit theory, the result is often used in the other direction: the forced response determines the color of the inherent thermal noise. One way of understanding the fluctuation-dissipation theorem is through linear lossless approximations of dissipative models, as seen in the next subsection.

We may also express (18) in state-space form in the following way. A dissipative system with no direct term can always be written as [28, Theorem 3]:

$$\dot{x}(t) = (J - K)\, x(t) + B u(t), \qquad (20)$$
$$y(t) = B^T x(t),$$

where $K$ is positive semidefinite and $J$ anti-symmetric. To account for (18)–(19), it suffices to introduce a white noise term $v(t)$ in (20) in the following way,

$$\dot{x}(t) = (J - K)\, x(t) + B u(t) + \sqrt{2 k_B T}\, L v(t), \qquad (21)$$
$$y(t) = B^T x(t),$$

where the matrix $L$ is chosen such that $L L^T = K$. Equation (21) is called the Langevin equation of the dissipative system.
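As a numerical illustration (a sketch in normalized units with $k_B T = 1$, not from the paper), the scalar case of the Langevin equation (21) with $J = 0$, $u = 0$ and $L = \sqrt{K}$ can be simulated with the Euler-Maruyama method; the fluctuation-dissipation balance makes the stationary state variance equal $k_B T$, regardless of the damping $K$.

```python
import math, random

def stationary_variance(K=1.0, kBT=1.0, h=1e-3, n=500000, seed=0):
    # Euler-Maruyama simulation of the scalar Langevin equation (21)
    # with u = 0, J = 0 and L = sqrt(K) (so that L L^T = K):
    #     dx = -K x dt + sqrt(2 kBT K) dW.
    # Fluctuation-dissipation predicts a stationary variance of kBT,
    # independent of the damping K.
    rng = random.Random(seed)
    sig = math.sqrt(2.0 * kBT * K * h)  # per-step noise std deviation
    x, acc, burn = 0.0, 0.0, n // 10    # discard a transient burn-in
    for i in range(n):
        x += -K * x * h + sig * rng.gauss(0.0, 1.0)
        if i >= burn:
            acc += x * x
    return acc / (n - burn)

var = stationary_variance()
print(f"sample variance = {var:.3f}   (prediction: kBT = 1.0)")
```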

Dissipative systems with memoryless elements are of great practical significance. Proposition 2 needs to be slightly modified for such systems.

###### Proposition 3.

The total response of a linear dissipative memoryless system in thermal equilibrium at temperature $T$, for perturbations $u(t)$, is given by

$$y(t) = n(t) + k u(t) = n(t) + k_s u(t) + k_a u(t), \qquad (22)$$

where $k_s$ is symmetric positive semidefinite and $k_a$ anti-symmetric. The fluctuation $n(t)$ is a white Gaussian stochastic process, where

$$\mathbf{E}\, n(t) = 0, \qquad R_n(t, s) := \mathbf{E}\, n(t) n(s)^T = 2 k_B T k_s\, \delta(t - s).$$

Proposition 3 follows from Proposition 2 if one extracts the dissipative term $k_s u(t)$ from the memoryless model and puts $g(t) = k_s \delta(t)$. However, the integral in (18) runs up to $t$ and cuts the impulse $\delta(t)$ in half. The re-normalized impulse response of the dissipative term is therefore given by $g(t) = 2 k_s \delta(t)$ (see also Section 2.2). The result then follows by applying Proposition 2 with this impulse response. One explanation for why the anti-symmetric term $k_a u(t)$ can be removed from the noise analysis is that it can be realized exactly using the direct term in the linear lossless approximation (1). An application of Proposition 3 gives the Johnson-Nyquist noise of a resistor.

###### Example 2.

As first shown theoretically in [15] and experimentally in [14], a resistor $R$ at temperature $T$ generates white noise. The total voltage over the resistor, $v(t)$, satisfies $v(t) = n(t) + R\, i(t)$ with $R_n(t, s) = 2 k_B T R\, \delta(t - s)$, where $i(t)$ is the current.
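For concreteness, the two-sided white-noise intensity $2 k_B T R$ in Example 2 corresponds to the familiar one-sided voltage spectral density $4 k_B T R$. A brief sketch (with illustrative component values, not from the paper) computes the resulting RMS noise voltage over a measurement bandwidth.

```python
import math

kB = 1.380649e-23   # Boltzmann's constant [J/K]

def johnson_rms_voltage(R, T, B):
    # RMS open-circuit noise voltage of a resistor R [ohm] at
    # temperature T [K] over a measurement bandwidth B [Hz].
    # The two-sided intensity 2 kB T R of Example 2 corresponds
    # to the one-sided spectrum 4 kB T R.
    return math.sqrt(4.0 * kB * T * R * B)

# A 1 kOhm resistor at room temperature, 10 kHz bandwidth:
v = johnson_rms_voltage(R=1e3, T=300.0, B=1e4)
print(f"v_rms = {v * 1e6:.3f} microvolts")
```

This gives a noise voltage of a few tenths of a microvolt, which sets a floor on low-level voltage measurements at room temperature.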

### 3.1 Derivation using linear lossless approximations

Let us first consider systems without memoryless elements. The general solution to the linear lossless system (1) is then

$$y(t) = B^T e^{Jt} x_0 + \int_0^t B^T e^{J(t-s)} B\, u(s)\, ds, \qquad (23)$$

where $x_0$ is the initial state. It is the second term, the convolution, that approximates the dissipative system in the previous section. In Proposition 1, we showed that the first, transient, term is not desired in the approximation. Theorems 1 and 2 suggest that we need a system of extremely high order to approximate a linear dissipative system over a reasonably long time horizon. When dealing with systems of such high dimension, it is reasonable to assume that the exact initial state is not known, and it can be hard to enforce $x_0 = 0$. Therefore, let us take a statistical approach to studying its influence. We have that

$$\mathbf{E}\, y(t) = B^T e^{Jt}\, \mathbf{E} x_0 + \int_0^t B^T e^{J(t-s)} B\, u(s)\, ds, \quad t \ge 0,$$

if the input $u$ is deterministic and $\mathbf{E}$ is the expectation operator. The autocovariance function of $y(t)$ is then

$$R_y(t, s) := \mathbf{E}\, [y(t) - \mathbf{E} y(t)][y(s) - \mathbf{E} y(s)]^T = B^T e^{Jt} X_0 e^{-Js} B, \qquad (24)$$

where $X_0$ is the covariance of the initial state,

$$X_0 := \mathbf{E}\, \Delta x_0 \Delta x_0^T, \qquad (25)$$

where $\Delta x_0 := x_0 - \mathbf{E} x_0$ is the stochastic uncertain component of the initial state, which evolves as $\Delta x(t) = e^{Jt} \Delta x_0$. The positive semidefinite matrix $X_0$ can be interpreted as a measure of how well the initial state is known. For a lossless system with total energy $E(x) = \frac{1}{2} x^T x$, we define the internal energy as

$$U(x) := \frac{1}{2}\, \Delta x^T \Delta x, \quad \Delta x := x - \mathbf{E} x. \qquad (26)$$

The expected total energy of the system equals $\mathbf{E}\, E(x) = \frac{1}{2} (\mathbf{E} x)^T (\mathbf{E} x) + \mathbf{E}\, U(x)$. Hence, the internal energy captures the stochastic part of the total energy, see also [25, 30]. In statistical mechanics, see [6, 7, 8], the temperature of a system is defined using the internal energy.

###### Definition 3 (Temperature).

A system with internal energy $U(x)$ [Joule] has temperature $T$ [Kelvin] if, and only if, its state belongs to Gibbs’s distribution with probability density function

$$p(x) = \frac{1}{Z} \exp\left[ -U(x) / k_B T \right], \qquad (27)$$

where $k_B$ is Boltzmann’s constant and $Z$ is the normalizing constant, called the partition function. A system with temperature $T$ is said to be at thermal equilibrium.

When the internal energy function is quadratic and the system is at thermal equilibrium, it is well known that the uncertain energy is equipartitioned between the states, see [6, Sec. 4-5].

###### Proposition 4.

Suppose a lossless system with internal energy function (26) has temperature $T$ at time $t = 0$. Then the initial state $x_0$ belongs to a Gaussian distribution with covariance matrix $X_0 = k_B T \cdot I$, and $\mathbf{E}\, U(x_0) = \frac{n}{2} k_B T$.

Hence, the temperature is proportional to the amount of uncertain equipartitioned energy per degree of freedom in the lossless system. There are many arguments in the physics and information-theory literature for adopting the above definition of temperature. For example, Gibbs’s distribution maximizes the Shannon continuous entropy (principle of maximum entropy [40, 41]). In this paper, we simply accept this common definition of temperature, although it would be interesting to investigate more general definitions of temperature for dynamical systems.
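Equipartition (Proposition 4) is easy to verify by sampling. The sketch below (in normalized units with $k_B T = 1$, not from the paper) draws initial states from the Gaussian distribution with covariance $X_0 = k_B T \cdot I$ and checks that the mean internal energy is $\frac{n}{2} k_B T$, i.e. $k_B T / 2$ per degree of freedom.

```python
import math, random

def mean_internal_energy(n, kBT, samples=20000, seed=1):
    # Monte Carlo check of equipartition (Proposition 4): if the
    # state is Gaussian with covariance X0 = kBT * I, then the mean
    # internal energy U = 0.5 * x^T x equals n * kBT / 2.
    rng = random.Random(seed)
    s = math.sqrt(kBT)   # per-component standard deviation
    total = 0.0
    for _ in range(samples):
        x = [s * rng.gauss(0.0, 1.0) for _ in range(n)]
        total += 0.5 * sum(xi * xi for xi in x)
    return total / samples

n, kBT = 10, 1.0
U = mean_internal_energy(n, kBT)
print(f"mean U = {U:.3f}   (equipartition predicts n*kBT/2 = {n * kBT / 2})")
```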

###### Remark 5.

Note that lossless systems may have a temperature at any time instant, not only at $t = 0$. For instance, a lossless linear system (23) of temperature $T$ at $t = 0$ that is driven by a deterministic input remains at the same temperature and has constant internal energy at all times, since $e^{Jt}$ is orthogonal and $\Delta x(t)^T \Delta x(t) = \Delta x_0^T \Delta x_0$ is independent of $t$. To change the internal energy using deterministic inputs, nonlinear systems are needed, as explained in [23, 24]. For the related issue of entropy for dynamical systems, see [23, 25].

If a lossless linear system (23) has temperature $T$ at $t = 0$, as defined in Definition 3 and Proposition 4, then the autocovariance function (24) takes the form

$$R_y(t, s) = k_B T \cdot B^T e^{J(t-s)} B = k_B T \cdot \left[ B^T e^{J(s-t)} B \right]^T,$$

since $X_0 = k_B T \cdot I$ and $e^{-Js} = (e^{Js})^T$. It is seen that linear lossless systems satisfy the fluctuation-dissipation theorem (Proposition 2) if we identify the stochastic transient in (23) with the fluctuation, i.e., $n(t) = B^T e^{Jt} x_0$ (assuming $\mathbf{E} x_0 = 0$), and the impulse response as $g(t) = B^T e^{Jt} B$. In particular, $n(t)$ is a Gaussian process of mean zero because $x_0$ is Gaussian with mean zero.
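This identification can be illustrated numerically. The sketch below (a hypothetical two-state lossless oscillator with $J = \begin{bmatrix} 0 & \omega \\ -\omega & 0 \end{bmatrix}$, $B = [1, 0]^T$, normalized $k_B T = 1$; not from the paper) estimates $R_y(t, s) = \mathbf{E}\, n(t) n(s)$ by Monte Carlo over random initial states and compares it with $k_B T \cdot B^T e^{J(t-s)} B$.

```python
import math, random

def empirical_Rn(t, s, omega=2.0, kBT=1.0, samples=50000, seed=2):
    # Monte Carlo check of  R_y(t,s) = kBT * B^T e^{J(t-s)} B
    # for the 2-state lossless oscillator J = [[0, w], [-w, 0]],
    # B = [1, 0]^T.  Here e^{Jt} is a rotation, so
    #     n(t) = B^T e^{Jt} x0 = cos(w t) x0[0] + sin(w t) x0[1],
    # with x0 Gaussian of covariance X0 = kBT * I (temperature T).
    rng = random.Random(seed)
    sd = math.sqrt(kBT)
    acc = 0.0
    for _ in range(samples):
        x1, x2 = sd * rng.gauss(0, 1), sd * rng.gauss(0, 1)
        nt = math.cos(omega * t) * x1 + math.sin(omega * t) * x2
        ns = math.cos(omega * s) * x1 + math.sin(omega * s) * x2
        acc += nt * ns
    return acc / samples

t, s, omega, kBT = 1.3, 0.4, 2.0, 1.0
emp = empirical_Rn(t, s, omega, kBT)
theory = kBT * math.cos(omega * (t - s))   # B^T e^{J(t-s)} B = cos(w(t-s))
print(f"empirical R(t,s) = {emp:.3f}   theory = {theory:.3f}")
```

Note that the empirical covariance depends only on the difference $t - s$, as the fluctuation-dissipation theorem requires of a stationary process.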

Theorem 2 showed that dissipative systems with memory can be approximated arbitrarily well by lossless systems. Hence, we cannot distinguish between the two using only input-output experiments. One reason for preferring the lossless model is that its transient also explains the thermal noise predicted by the fluctuation-dissipation theorem. To explain the fluctuation-dissipation theorem for systems without memory (Proposition 3), one can repeat the above arguments by making a lossless approximation of $k_s$ (see Theorem 1). The anti-symmetric part $k_a$ does not need to be approximated but can be included directly in the lossless system by using the anti-symmetric direct term in (12).

Proposition 3 captures the notion of a heat bath, modelling it (as described in Theorem 1) with a lossless system so large that, for moderate inputs and within the chosen time horizon, its interaction with the environment is not significantly affected.

That the Langevin equation (21) is a valid state-space model for (18) is shown by a direct calculation. If we assume that (20) is a low-order approximation of a high-order linear lossless system (23), in the sense of Theorem 2, it is enough to require that both systems are at thermal equilibrium at the same temperature in order for them to be described by the same stochastic equation (18), at least over the time interval in which the approximation is valid.

### 3.2 Nonlinear lossless approximations and thermal noise

Lossless approximations are not unique. We showed in Section 2.4 that low-order nonlinear lossless approximations can be constructed. As seen next, these do not satisfy the fluctuation-dissipation theorem. This is not surprising, since they can also model active systems. If they are used to implement linear dissipative systems, the linearized form is not of the form (1). By studying the thermal noise of a system, it could in principle be possible to determine which type of lossless approximation is used.

Consider the nonlinear lossless approximation (14) of $y(t) = k u(t)$, where $k$ is scalar and can be either positive or negative. The approximation only works well when the initial total energy $E_0$ is large. To study the effect of thermal noise, we add a random Gaussian perturbation $\Delta x_0$ to the initial state, so that the system has temperature $T$ at $t = 0$ according to Definition 3 and Proposition 4. This gives the system

$$\dot{x}_E(t) = \frac{k}{\sqrt{2E_0}}\, u(t)^2, \quad x_E(0) = \sqrt{2E_0} + \Delta x_0, \quad \mathbf{E}\, \Delta x_0 = 0, \qquad (28)$$
$$y_E(t) = \frac{k}{\sqrt{2E_0}}\, x_E(t)\, u(t), \quad \mathbf{E}\, \Delta x_0^2 = k_B T.$$

The solution to the lossless approximation (28) is given by

$$y_E(t) = k u(t) + n_s(t) + n_d(t), \qquad (29)$$

where

$$n_d(t) = \frac{k^2}{2E_0}\, u(t) \int_0^t u(s)^2\, ds, \qquad n_s(t) = \frac{k\, \Delta x_0}{\sqrt{2E_0}}\, u(t). \qquad (30)$$

We call $n_d(t)$ the deterministic implementation noise and $n_s(t)$ the stochastic thermal noise. The ratio between the deterministic and stochastic noise is

$$\frac{n_d(t)}{n_s(t)} = \frac{k}{\sqrt{2E_0}\, \Delta x_0} \int_0^t u(s)^2\, ds = \frac{k\, u(0)^2}{\sqrt{2E_0}\, \Delta x_0}\, t + O(t^2),$$

as $t \to 0$, if $u$ is continuous. Hence, for sufficiently small times and if $\Delta x_0 \neq 0$, the stochastic noise is the dominating noise in the lossless approximation (28). Since