# Information thermodynamics for interacting stochastic systems without bipartite structure

###### Abstract

Fluctuations in biochemical networks, e.g., in a living cell, have a complex origin that precludes a description of such systems in terms of bipartite or multipartite processes, as is usually done in the framework of stochastic and/or information thermodynamics. This means that fluctuations in each subsystem are not independent: subsystems jump simultaneously if the dynamics is modeled as a Markov jump process, or noises are correlated for diffusion processes. In this paper, we consider information and thermodynamic exchanges between a pair of coupled systems that do not satisfy the bipartite property. The generalization of information-theoretic measures, such as learning rates and transfer entropy rates, to this situation is non-trivial and also involves introducing several additional rates. We describe how this can be achieved in the framework of general continuous-time Markov processes, without restricting the study to the steady-state regime. We illustrate our general formalism on the case of diffusion processes and derive an extension of the second law of information thermodynamics in which the difference of transfer entropy rates in the forward and backward time directions replaces the learning rate. As a side result, we also generalize an important relation linking information theory and estimation theory. To further obtain analytical expressions we treat in detail the case of Ornstein-Uhlenbeck processes, and discuss the ability of the various information measures to detect a directional coupling in the presence of correlated noises. Finally, we apply our formalism to the analysis of the directional influence between cellular processes in a concrete example, which also requires considering the case of a non-bipartite and non-Markovian process.

###### Contents

- I Introduction
- II Setup and brief reminder of the bipartite case
- III Information measures for non-bipartite processes
- IV Markov diffusion processes and second law of information thermodynamics
- V Stationary bi-dimensional Ornstein-Uhlenbeck process
- VI Generalization to a non-Markovian process and application to the study of directional influence between cellular processes
- VII Summary
- A Expression of the learning rates for a pure jump Markov process in discrete space
- B Proof of various relations of the main text
- C Sufficient statistic for a non-bipartite dynamics
- D Analytical expressions for a stationary bi-dimensional Ornstein-Uhlenbeck process
- E Multi-time-step TE rates for the three-component process of Sec. VI.2

## I Introduction

This work is motivated by the observation that there exist, broadly speaking, two different sources of fluctuations contributing to the stochasticity of biochemical processes, for instance in cell metabolic networks. The first one, commonly called “intrinsic”, is due to the small numbers of biomolecules involved in a given reaction. The second one, the “extrinsic” source, arises from the heterogeneity in the physical environment of the cell and the occurrence of (many) other biochemical reactions (see, e.g., E2002 (); TWW2006 (); DCLME2008 (); UW2011 (); GW2014 (); K2014 (); HT2016 (); LEM2017 (); LNTRL2017 ()). As a result, the stochastic noises have a nontrivial structure that invalidates a description in terms of bipartite or multipartite processes (or systems). In the case of signaling networks, for instance, this means that the noise in the input biochemical signal (to be detected) and the noise of the reactions that form the network are correlated.

In contrast, in the context of stochastic and information thermodynamics, a recent and active field of research reviewed in PHS2015 (), the bipartite assumption is usually made (e.g., for modeling Maxwell’s demons) as it simplifies the theoretical analysis and allows the contribution of each component of the system to the entropy production to be clearly identified AJM2009 (); BHS2013 (); HE2014 (); IS2013 (); DE2014 (); BHS2014 (); HBS2014 (); HS2014 (); IS2015 (); SS2015 (); H2015 (); HBS2016 (); SLP2016 (); I2016 (); RH2016 (); MS2018 (); I2018 (). Although abandoning the bipartite (or multipartite H2015 ()) structure seriously complicates the interpretation of information and thermodynamic exchanges, our objective in the present work is to show that a detailed description is still available. The price to pay is that several information-theoretic measures must be added to those already introduced in the literature (information flow, aka learning rate, and transfer entropy), which characterize how information is exchanged between two interacting systems in the course of their dynamical evolution. The positive side is that this will allow us to propose, at least for diffusion processes, a generalized version of the so-called “second law of information thermodynamics” that applies to non-bipartite systems. (This second-law inequality differs from the one recently obtained in SLP2018 ().)

In this paper, we will consider in particular non-equilibrium systems that can be modeled by continuous-time Markov processes (diffusions, jump processes, or both). It turns out, however, that many definitions or relations are also valid beyond the Markovian description, and we will therefore provide a general framework. Moreover, in order to offer a sufficiently general perspective, we assume the presence of multiplicative noises (additive noises being regarded as only a special case) and we do not restrict the study to steady-state situations, as is often done. On the other hand, we only consider averaged quantities and do not derive fluctuation relations. We leave this important issue to future investigations.

The paper is organized as follows. In Sec. II, we describe our general setup and briefly review the existing results for bipartite processes. To this aim, we first present the formal tools that will be used throughout the paper, in particular those related to continuous-time Markov processes. We then define the information-theoretic measures that are commonly considered in the framework of information thermodynamics and that satisfy some useful inequalities. We then recall the corresponding formulations of the second law. In Sec. III, the bipartite assumption is lifted and we introduce the new information measures needed for a proper description of information and thermodynamic exchanges. The usual inequalities are then generalized. In Sec. IV, to make all of the introduced definitions and relations more explicit, we focus on Markov diffusion processes. The central result is the derivation of a generalized second law involving both forward and backward transfer entropies. In Sec. V, as a special case, we consider a stationary bidimensional Ornstein-Uhlenbeck process with additive noises for which a full analytical study can be carried out. This allows us to illustrate on a simple example the ability of the various information measures to infer a directional coupling in the underlying dynamics. Finally, in Sec. VI, we generalize the formalism to a class of non-Markovian processes and apply it to the study of the directional influence between cellular processes. This complements the previous experimental and theoretical investigations of Refs. K2014 (); LNTRL2017 (). A brief summary is given in Sec. VII, and some proofs and technical details are presented in the Appendices.

## II Setup and brief reminder of the bipartite case

### II.1 Setup

We are interested in the information and thermodynamic exchanges between two subsystems, denoted by $X$ and $Y$, of a stochastic system whose microscopic states at time $t$ are denoted by $Z_t=(X_t,Y_t)$. The random variables $X_t$ and $Y_t$ may be multivariate ($X_t$ and $Y_t$ are then vectors), continuous or discrete, and live in arbitrary, and not necessarily identical, spaces. In the following, the full process $Z$ can be Markovian or non-Markovian, but even in the former case the individual dynamics of $X$ and $Y$ (viewed as coarse-grained descriptions of $Z$) are in general non-Markovian.

When $Z$ is a continuous-time Markovian process R1989 (); G2004 (); C2005 (); EK2005 (); RW2005 (), which may involve a combination of drift, diffusion, and jumps, the building block of its description is the transition probability $P(z,t|z',t')$, for $t\geq t'$ and all $z,z'$. Such an object is generated by a kernel $L_t(z,z')$, called the Markovian generator, according to the forward Kolmogorov equation,

$$\partial_t P(z,t|z',t')=\int d\mu(z'')\,L_t(z,z'')\,P(z'',t|z',t'),\tag{1}$$

where $d\mu$ is the appropriate measure for either continuous or discrete space. For pure jump processes in a discrete space the Markovian generator is a matrix involving the transition rates $W_t(z|z')$ (where by convention the transition is from $z'$ to $z$)

$$L_t(z,z')=W_t(z|z')-\delta_{z,z'}\sum_{z''}W_t(z''|z),\qquad W_t(z|z)\equiv0.\tag{2}$$

On the other hand, pure diffusion processes in continuous space are usually described by the stochastic differential equations

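$$\begin{aligned}dX_t&=F_t(X_t,Y_t)\,dt+\sum_k\sigma^k_t(X_t,Y_t)\,dB^k_t,\\ dY_t&=G_t(X_t,Y_t)\,dt+\sum_k\rho^k_t(X_t,Y_t)\,dB^k_t,\end{aligned}\tag{3}$$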

where $F_t$, $G_t$, $\sigma^k_t$, and $\rho^k_t$ are time-dependent vector fields, and the $B^k$’s are independent Brownian motions. The non-negative covariance (diffusion) matrix $D_t$ is the block matrix with components

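$$D^{xx}_t=\frac12\sum_k\sigma^k_t\otimes\sigma^k_t,\quad D^{xy}_t=\frac12\sum_k\sigma^k_t\otimes\rho^k_t,\quad D^{yx}_t=\frac12\sum_k\rho^k_t\otimes\sigma^k_t,\quad D^{yy}_t=\frac12\sum_k\rho^k_t\otimes\rho^k_t,\tag{4}$$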

where the symbol $\otimes$ applied to two vectors $a$ and $b$ means the matrix construction $(a\otimes b)_{ij}=a_ib_j$. The associated Markovian generator is then obtained as

$$L_t(z,z')=\mathcal{L}_t\,\delta(z-z'),\tag{5}$$

where $\mathcal{L}_t$, acting on the variable $z=(x,y)$, is the (Fokker-Planck) second-order differential operator

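$$\mathcal{L}_tf=-\nabla_x\cdot(F_tf)-\nabla_y\cdot(G_tf)+\nabla_x\nabla_x:(D^{xx}_tf)+\nabla_y\nabla_y:(D^{yy}_tf)+2\,\nabla_x\nabla_y:(D^{xy}_tf),\tag{6}$$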

obtained by interpreting Eqs. (3) with the Itô convention. (In the above expression the last term is a priori ambiguous because $D^{xy}_t$ is not necessarily symmetric. The notation should thus be interpreted as $2\,\partial_{x_i}\partial_{y_j}(D^{xy}_{t,ij}\,f)$, using Einstein summation convention for repeated indices.) As usual, one can introduce the probability currents associated with the joint probability density $p_t(x,y)$,

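$$J^x_{t,i}=F_{t,i}\,p_t-\partial_{x_j}\big(D^{xx}_{t,ij}\,p_t\big)-\partial_{y_j}\big(D^{xy}_{t,ij}\,p_t\big),\qquad J^y_{t,i}=G_{t,i}\,p_t-\partial_{y_j}\big(D^{yy}_{t,ij}\,p_t\big)-\partial_{x_j}\big(D^{yx}_{t,ij}\,p_t\big),\tag{7}$$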

and recast the Fokker-Planck (FP) equation as the continuity equation,

$$\partial_tp_t(x,y)=-\nabla_x\cdot J^x_t(x,y)-\nabla_y\cdot J^y_t(x,y).\tag{8}$$

Note that the currents are defined up to a divergence-free vector. We will use the above definition in the following, which means that correction terms must be added in all expressions involving the currents if another decomposition is adopted.
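As a sanity check of Eqs. (3) and (4), the following Python sketch assembles the diffusion matrix from noise amplitudes $\sigma^k$ and $\rho^k$ evaluated at a fixed state. It then compares the result with the empirical covariance of one Euler-Maruyama step of Eq. (3). It assumes the normalization $D=\frac12\sum_k v_k\otimes v_k$ with $v_k=(\sigma^k,\rho^k)$, so that the increments have covariance $2D\,dt$. The helper name `diffusion_matrix` and the numerical values are illustrative choices, not taken from the text.

```python
import numpy as np

def diffusion_matrix(sigmas, rhos):
    """Assemble D from the noise amplitudes of Eq. (3) at a fixed state.

    sigmas: (K, dx) array, row k is sigma^k; rhos: (K, dy) array, row k is rho^k.
    Assumes D = 1/2 sum_k v_k (outer) v_k with v_k = (sigma^k, rho^k).
    """
    v = np.hstack([sigmas, rhos])   # row k = (sigma^k, rho^k)
    return 0.5 * v.T @ v            # blocks D^{xx}, D^{xy}, D^{yx}, D^{yy}

sigmas = np.array([[1.0], [0.0], [0.5]])   # sigma^k, k = 1, 2, 3 (dx = 1)
rhos = np.array([[0.0], [2.0], [0.3]])     # rho^k,   k = 1, 2, 3 (dy = 1)
D = diffusion_matrix(sigmas, rhos)

# Monte Carlo check: one Euler-Maruyama step of Eq. (3) with zero drift
rng = np.random.default_rng(0)
dt, samples = 1e-3, 200_000
dB = rng.normal(0.0, np.sqrt(dt), (samples, 3))   # independent increments dB^k
dZ = dB @ np.hstack([sigmas, rhos])                # rows are (dX, dY)
C_emp = dZ.T @ dZ / samples                        # empirical increment covariance
print(D)
print(C_emp / (2 * dt))                            # close to D
```

The third noise drives both components, which produces the nonzero off-diagonal entry $D^{xy}=0.075$.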

In continuous time, the Markov process $Z$ is called bipartite if the transition probability satisfies the property

$$P(x,y,t+dt|x',y',t)=P(x,t+dt|x',y',t)\,P(y,t+dt|x',y',t)+o(dt)\tag{9}$$

when $dt\to0$. From the forward Kolmogorov equation (1), the above condition is equivalent to assuming that the Markovian generator can be written as

$$L_t(x,y;x',y')=L^X_t(x,x';y')\,\delta(y-y')+L^Y_t(y,y';x')\,\delta(x-x'),\tag{10}$$

where $L^X_t$ and $L^Y_t$ are called partial generators and the delta function becomes a Kronecker delta in the case of discrete space. The partial generators must individually satisfy conservation of probability, i.e., $\int d\mu(x)\,L^X_t(x,x';y')=0$ and $\int d\mu(y)\,L^Y_t(y,y';x')=0$. In particular, a pure jump process is bipartite if the transition rates have the additive form $W_t(x,y|x',y')=W^X_t(x|x';y')\,\delta_{y,y'}+W^Y_t(y|y';x')\,\delta_{x,x'}$, which implies Eq. (10), as can be readily checked. On the other hand, a pure diffusion process is bipartite if $D^{xy}_t=0$, i.e., if the diffusion matrix is block diagonal. From Eqs. (4), a sufficient condition is that $\sigma^k_t\otimes\rho^k_t=0$ for all $k$ (each Brownian motion $B^k$ then drives either $X$ or $Y$, but not both), which means that the noises affecting $X$ and $Y$ are independent.
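The two bipartite criteria just stated can be checked mechanically, for both diffusions and jump processes. The Python sketch below first tests whether the cross block $D^{xy}$ vanishes for three hand-picked sets of noise amplitudes. The last set shows that the condition on each $k$ is sufficient but not necessary. The sketch then verifies, on random additive jump rates, that the full generator coincides with the decomposition (10). It makes the following assumptions, chosen for illustration only:

- rates are stored as arrays `W4[x', y', x, y]` for the transition $(x,y)\to(x',y')$;
- the factor $\frac12$ in $D$ follows the normalization assumed for Eq. (4);
- the function names are ours, and the rates are random.

```python
import numpy as np

def diffusion_matrix(sigmas, rhos):
    """D = 1/2 sum_k v_k (outer) v_k with v_k = (sigma^k, rho^k)."""
    v = np.hstack([sigmas, rhos])
    return 0.5 * v.T @ v

def is_bipartite_diffusion(D, dx, atol=1e-12):
    """Bipartite diffusion: the cross block D^{xy} vanishes."""
    return bool(np.allclose(D[:dx, dx:], 0.0, atol=atol))

def is_bipartite_jump(W4):
    """W4[x', y', x, y] = rate of (x, y) -> (x', y'); no simultaneous jumps allowed."""
    nx, ny = W4.shape[:2]
    chx = ~np.eye(nx, dtype=bool)                        # x' != x
    chy = ~np.eye(ny, dtype=bool)                        # y' != y
    both = chx[:, None, :, None] & chy[None, :, None, :]
    return bool(np.all(W4[both] == 0.0))

def partial_generator(Wp):
    """L^X[x', x, y] = W^X(x'|x; y) - delta_{x',x} sum_{x''} W^X(x''|x; y)."""
    m = Wp.shape[0]
    Wp = Wp * (1.0 - np.eye(m))[:, :, None]              # drop self-transitions
    return Wp - np.eye(m)[:, :, None] * Wp.sum(axis=0)[None, :, :]

def full_generator(W4):
    """L4[x', y', x, y] from the joint rates; columns sum to zero."""
    nx, ny = W4.shape[:2]
    n = nx * ny
    W = W4.reshape(n, n).copy()
    np.fill_diagonal(W, 0.0)
    return (W - np.diag(W.sum(axis=0))).reshape(nx, ny, nx, ny)

# Diffusions with dx = dy = 1: condition holds / fails / fails but D^{xy} = 0
D1 = diffusion_matrix(np.array([[1.0], [0.0]]), np.array([[0.0], [2.0]]))
D2 = diffusion_matrix(np.array([[1.0], [0.5]]), np.array([[0.0], [0.3]]))
D3 = diffusion_matrix(np.array([[1.0], [1.0]]), np.array([[1.0], [-1.0]]))
print([is_bipartite_diffusion(D, 1) for D in (D1, D2, D3)])

# Jump process with additive rates W = W^X delta_{y',y} + W^Y delta_{x',x}
rng = np.random.default_rng(0)
nx, ny = 3, 2
WX = rng.uniform(0.1, 1.0, (nx, nx, ny))    # WX[x', x, y]
WY = rng.uniform(0.1, 1.0, (ny, ny, nx))    # WY[y', y, x]
Ix, Iy = np.eye(nx), np.eye(ny)
W4 = np.einsum('axy,by->abxy', WX, Iy) + np.einsum('byx,ax->abxy', WY, Ix)
L_add = (np.einsum('axy,by->abxy', partial_generator(WX), Iy)
         + np.einsum('byx,ax->abxy', partial_generator(WY), Ix))   # Eq. (10)
print(is_bipartite_jump(W4), np.allclose(full_generator(W4), L_add))
```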

### II.2 Definition of information measures

We start our reminder of information thermodynamics by recalling the definitions of several information-theoretic measures that are usually introduced in this framework. As already stressed, a consequence of abandoning the bipartite assumption will be a proliferation of information measures. It is thus desirable to use transparent notations as much as possible (already in the bipartite case, the same quantity may have different names or be defined with different signs, which is a source of confusion). It is also important to clearly state under which condition a relation is valid: in the following, the capital letter M on the left of an equation indicates that the joint process is Markovian, the capital letter B indicates that the process is Markovian and bipartite, and the capital letter S indicates that the process is stationary.

1. Information flows, aka learning rates

Information flows quantify how the dynamical evolution of $X$ or $Y$ contributes to the change in the mutual information $I(X_t:Y_t)=\big\langle\ln\frac{p_t(X_t,Y_t)}{p_t(X_t)\,p_t(Y_t)}\big\rangle$ (where $p_t(x,y)$ is the joint probability distribution and $p_t(x)$, $p_t(y)$ its marginals), which characterizes the instantaneous correlation between $X$ and $Y$ at time $t$. These information-theoretic measures were first considered in the context of interacting diffusion processes AJM2009 () and subsequently introduced in the analysis of the thermodynamics of continuously-coupled, discrete-space stochastic systems HE2014 (); HBS2014 (); SS2015 (). Consider for instance the dynamical evolution of $X$. Introducing the time-shifted mutual information $I(X_{t+dt}:Y_t)$ (with $dt>0$) and taking the limit $dt\to0^+$, one then defines AJM2009 ()

$$l_X(t)=\lim_{dt\to0^+}\frac{I(X_{t+dt}:Y_t)-I(X_t:Y_t)}{dt},\tag{11}$$

where here and in the following we use the bracket symbol $\langle\cdot\rangle$ for an expectation. Similarly, $l_Y(t)$ is defined from $I(X_t:Y_{t+dt})$. (For brevity, it will be implicit in the following that similar quantities can be defined by exchanging $X$ and $Y$.) One could also introduce Shannon entropies instead of mutual informations (using $I(X:Y)=H(X)-H(X|Y)$, with $H(X)=-\langle\ln p(X)\rangle$ and $H(X|Y)=-\langle\ln p(X|Y)\rangle$ CT2006 ()), but we will try to avoid too many equivalent formulations throughout the paper. Note that the definition (11) is not restricted to a steady state, but in a steady state the information flow identifies with the so-called learning rate defined in BHS2014 (); HBS2016 (). Hereafter, we will also use this denomination for $l_X$ note00 ().

As discussed in AJM2009 (); HE2014 (), learning rates have a clear meaning: For instance, $l_X>0$ reveals that the dynamical evolution of $X$ increases the mutual information on average. In other words, the future of $X$ is more predictable than its present from the viewpoint of $Y$ AJM2009 (), or $X$ is “learning about” $Y$ through its dynamics HE2014 ().

For a bipartite Markov process, one has the natural decomposition of the time derivative of $I(X_t:Y_t)$ AJM2009 (); HE2014 (); MS2018 ()

$$\text{B}\qquad\frac{d}{dt}I(X_t:Y_t)=l_X(t)+l_Y(t),\tag{12}$$

as will be explicitly illustrated below for Markov processes.
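For a pure jump process on a small discrete space, definition (11) and the decomposition (12) can be checked directly. The two-time distribution of $(Z_t,Z_{t+dt})$ follows from the matrix exponential of the generator. The sketch below is a minimal illustration under our own conventions:

- the generator is stored as `L4[x', y', x, y]`, with columns summing to zero;
- the rates are random and exclude simultaneous jumps, so the process is bipartite;
- the state $p_t$ is arbitrary and non-stationary;
- the time step is finite, $dt=10^{-5}$.

The sketch compares the finite-difference learning rate with the generator expression $\sum L_t(x',y'';x,y)\,p_t(x,y)\ln[p_t(x',y)/(p_t(x')p_t(y))]$. This expression follows from differentiating $I(X_{t'}:Y_t)$ with respect to $t'$ at $t'=t$. The sketch also checks that $dI/dt\simeq l_X+l_Y$ up to $O(dt)$.

```python
import numpy as np
from scipy.linalg import expm

def mutual_info(joint):
    """I = sum p ln[p / (p_row p_col)] in nats, for a 2D joint pmf."""
    pr = joint.sum(axis=1, keepdims=True)
    pc = joint.sum(axis=0, keepdims=True)
    m = joint > 0
    return float(np.sum(joint[m] * np.log(joint[m] / (pr * pc)[m])))

def learning_rates_fd(L4, p, dt=1e-5):
    """Finite-difference l_X, l_Y of Eq. (11) and dI/dt, from the two-time law."""
    nx, ny = p.shape
    n = nx * ny
    T = expm(L4.reshape(n, n) * dt)                     # T[z', z] = P(z', t+dt | z, t)
    J = (T * p.reshape(1, n)).reshape(nx, ny, nx, ny)   # joint of Z_{t+dt}=(x',y'), Z_t=(x,y)
    I0 = mutual_info(p)
    lX = (mutual_info(J.sum(axis=(1, 2))) - I0) / dt    # from I(X_{t+dt} : Y_t)
    lY = (mutual_info(J.sum(axis=(0, 3))) - I0) / dt    # from I(X_t : Y_{t+dt})
    dI = (mutual_info(J.sum(axis=(2, 3))) - I0) / dt    # from I(X_{t+dt} : Y_{t+dt})
    return lX, lY, dI

def lX_from_generator(L4, p):
    """sum over L(x',y'';x,y) p(x,y) ln[p(x',y) / (p(x') p(y))] (Markov case)."""
    phi = np.log(p / np.outer(p.sum(axis=1), p.sum(axis=0)))
    return float(np.einsum('abxy,xy,ay->', L4, p, phi))

rng = np.random.default_rng(1)
nx, ny = 3, 2
n = nx * ny
keep = (np.eye(nx)[:, None, :, None] + np.eye(ny)[None, :, None, :]) > 0  # no joint jumps
W = (rng.uniform(0.1, 1.0, (nx, ny, nx, ny)) * keep).reshape(n, n)
np.fill_diagonal(W, 0.0)
L4 = (W - np.diag(W.sum(axis=0))).reshape(nx, ny, nx, ny)   # bipartite generator
p = rng.uniform(0.5, 1.5, (nx, ny))
p /= p.sum()                                                # arbitrary transient state

lX, lY, dI = learning_rates_fd(L4, p)
print(lX, lX_from_generator(L4, p))   # agree up to O(dt)
print(dI, lX + lY)                    # Eq. (12), up to O(dt)
```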

2. Transfer entropy

Transfer entropy (TE) is an information-theoretic measure that is used to assess directional dependencies between time series and possibly infer causal interactions S2000 (); PKHS2001 (). It may be viewed as a non-linear extension of Granger causality G1969 (), which is a concept widely used in econometrics and neuroscience (see, e.g., AM2013 () for a review). Instead of $I(X_t:Y_t)$, one considers the change in the mutual information between stochastic trajectories observed during some time interval, say from $0$ to $t$, which are denoted by $X_0^t$ and $Y_0^t$ hereafter. Specifically, we define the TE rate from $Y$ to $X$ in continuous time as

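$$\begin{aligned}T_{Y\to X}(t)&=\lim_{dt\to0^+}\frac{I(X_0^{t+dt}:Y_0^t)-I(X_0^t:Y_0^t)}{dt}\\&=\lim_{dt\to0^+}\frac{1}{dt}\,I(X_{t+dt}:Y_0^t\,|\,X_0^t),\end{aligned}\tag{13}$$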

where we have assumed that $X_0^{t+dt}\equiv(X_0^t,X_{t+dt})$ for infinitesimal $dt$ and used the chain rule for mutual information, $I(A,B:C)=I(B:C)+I(A:C|B)$, where $I(A:C|B)$ is a conditional mutual information CT2006 (), to go from the first line to the second one. Like the learning rate, $T_{Y\to X}$ has a clear interpretation in terms of information transfer: It quantifies how much the knowledge of the trajectory $Y_0^t$ reduces the uncertainty about $X_{t+dt}$ (for $dt$ infinitesimal) when the trajectory $X_0^t$ is already known. As a conditional mutual information, $T_{Y\to X}$ is a non-negative quantity, whereas $l_X$ has no definite sign. Note that the present definition is more general than the one adopted in HBS2016 () or MS2018 () since we do not assume at this stage that the joint process is Markovian. When the joint process is Markovian, one has $p(x_{t+dt}|x_0^t,y_0^t)=p(x_{t+dt}|x_t,y_t)$, and after some manipulations Eq. (13) can be rewritten as

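$$\text{M}\qquad T_{Y\to X}(t)=\lim_{dt\to0^+}\frac{1}{dt}\Big\langle\ln\frac{p(X_{t+dt}|X_t,Y_t)}{p(X_{t+dt}|X_0^t)}\Big\rangle,\tag{14}$$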

which clearly shows the difference with the learning rate $l_X$. (The original definition of transfer entropy in discrete time is even more general since the number of time bins in the past of $X$ and $Y$ may be different S2000 (). This definition can also be extended to continuous time SLP2016 (); BBHL2016 (); SL2018 (). Finally, see Ref. WKP2013 () for a rigorous definition via a partition of the time interval.)

For a Markov bipartite process, in full analogy with Eq. (12), one has the decomposition

$$\text{B}\qquad\frac{d}{dt}I(X_0^t:Y_0^t)=T_{X\to Y}(t)+T_{Y\to X}(t),\tag{15}$$

where we have used Eq. (9) and assumed $Y_0^{t+dt}\equiv(Y_0^t,Y_{t+dt})$ for infinitesimal $dt$. Let us stress that the trajectory mutual information $I(X_0^t:Y_0^t)$ is a time-extensive quantity, in contrast with $I(X_t:Y_t)$. As a consequence, $dI(X_0^t:Y_0^t)/dt$ does not vanish in a steady state and $T_{X\to Y}+T_{Y\to X}$ is in general strictly positive, whereas $l_X+l_Y=0$. (Throughout the paper, quantities without explicit time-dependence will refer to a steady state.)

Since the TE rates are conditioned on whole trajectories, they are very hard to compute numerically and one often replaces $X_0^t$ and $Y_0^t$ by the states $X_t$ and $Y_t$ at the latest time $t$. One then defines AJM2009 (); HBS2014 (); HBS2016 (); MS2018 ()

$$\bar T_{Y\to X}(t)=\lim_{dt\to0^+}\frac{1}{dt}\,I(X_{t+dt}:Y_t\,|\,X_t),\tag{16}$$

which is called a “single-time-step” TE rate in MS2018 () to contrast with the “multi-time-step” TE rate $T_{Y\to X}$. We will adopt this terminology hereafter.
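The single-time-step rate $\lim_{dt\to0}I(X_{t+dt}:Y_t|X_t)/dt$ is easy to evaluate for a pure jump process. The Python sketch below compares the finite-$dt$ conditional mutual information with its $dt\to0$ limit,
$$\sum_{x,y}p_t(x,y)\sum_{x'\neq x}w_y(x'|x)\ln\frac{w_y(x'|x)}{\bar w(x'|x)}.$$
Here $w_y(x'|x)$ is the total rate of the jumps $x\to x'$ when $Y_t=y$, whatever happens to $Y$ in the same jump, and $\bar w(x'|x)=\sum_yp_t(y|x)\,w_y(x'|x)$. This closed form is our own small-$dt$ expansion, given for illustration only. The random rates include simultaneous jumps, so the process is not bipartite. Array layouts and function names are assumptions.

```python
import numpy as np
from scipy.linalg import expm

def single_step_te_fd(L4, p, dt=1e-5):
    """(1/dt) I(X_{t+dt} : Y_t | X_t) at small but finite dt."""
    nx, ny = p.shape
    n = nx * ny
    T = expm(L4.reshape(n, n) * dt)
    K = (T * p.reshape(1, n)).reshape(nx, ny, nx, ny).sum(axis=1)  # K[x', x, y]
    p_ax = K.sum(axis=2, keepdims=True)       # Prob(X_{t+dt}=x', X_t=x)
    p_xy = K.sum(axis=0, keepdims=True)       # Prob(X_t=x, Y_t=y)
    p_x = K.sum(axis=(0, 2), keepdims=True)   # Prob(X_t=x)
    m = K > 0
    ratio = K * p_x / (p_ax * p_xy)
    return float(np.sum(K[m] * np.log(ratio[m]))) / dt

def single_step_te_limit(W4, p):
    """dt -> 0 limit: sum p(x,y) w_y(x'|x) ln[w_y(x'|x) / wbar(x'|x)], x' != x."""
    nx = p.shape[0]
    w = W4.sum(axis=1) * (1.0 - np.eye(nx))[:, :, None]       # w[x', x, y], any y'
    wbar = np.einsum('axy,xy->ax', w, p / p.sum(axis=1, keepdims=True))
    logr = np.log(np.divide(w, wbar[:, :, None], out=np.ones_like(w), where=w > 0))
    return float(np.sum(p[None, :, :] * w * logr))

rng = np.random.default_rng(2)
nx, ny = 3, 2
n = nx * ny
W4 = rng.uniform(0.1, 1.0, (nx, ny, nx, ny))   # includes simultaneous jumps
W = W4.reshape(n, n).copy()
np.fill_diagonal(W, 0.0)
L4 = (W - np.diag(W.sum(axis=0))).reshape(nx, ny, nx, ny)
p = rng.uniform(0.5, 1.5, (nx, ny))
p /= p.sum()

fd, lim = single_step_te_fd(L4, p), single_step_te_limit(W4, p)
print(fd, lim)   # agree up to O(dt)
```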

### II.3 Inequalities and sufficient statistic

For a Markov bipartite process in a steady state, one has the two inequalities HBS2014 (); HBS2016 ()

(17) | |||||

(18) |

(We do not report the proof here since more general inequalities will be derived in the next section.) The second inequality expresses the intuitive idea that the instantaneous value $Y_t$ is less informative about the instantaneous value $X_t$ than the whole past trajectory $Y_0^t$. Within the context of a sensory system, where $X$ and $Y$ denote the states of the signal and the sensor, respectively, this prompted the authors of HBS2016 () to introduce a so-called “sensory capacity” $C=l_Y/T_{X\to Y}$ as a tool to quantify the performance of the sensor (assuming that $l_Y>0$). In particular, $C$ reaches its maximal value when inequality (18) is saturated. As discussed in MS2018 (), inequalities (17) and (18) are both saturated when the following condition is satisfied:

$$p(x_t|y_0^t)=p(x_t|y_t),\tag{19}$$

which means that “$Y_t$ is a sufficient statistic of $Y_0^t$ for $X_t$” CT2006 () and no more information about $X_t$ is contained in the trajectory $Y_0^t$ than in $Y_t$ alone. By construction, this condition is realized by the Kalman-Bucy filter (which reduces to the Wiener-Kolmogorov filter in a steady state) K1968 (). Interestingly, such an optimization of information transfer may occur in actual biological signaling circuits HSWIH2016 (); MS2018 ().
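A linear-Gaussian example illustrates the filter just mentioned. Consider a scalar signal $dX_t=-aX_t\,dt+\sigma\,dB_t$ observed through $dO_t=cX_t\,dt+r\,dW_t$. The parameters $a,\sigma,c,r$ are hypothetical and not used elsewhere in the text.

The textbook Kalman-Bucy filter propagates the conditional mean $m_t$ of $X_t$ given the observation path $O_0^t$. The conditional law is Gaussian with a deterministic variance, so $m_t$ summarizes all the information that the observation path carries about $X_t$.

The Python sketch below computes the stationary error variance from the algebraic Riccati equation $0=-2aP+\sigma^2-c^2P^2/r^2$. It then checks this variance against an Euler-Maruyama simulation of the signal and of the filter run with the stationary gain $K=cP/r^2$.

```python
import numpy as np

def kalman_bucy_stationary_variance(a, sigma, c, r):
    """Positive root of 0 = -2 a P + sigma**2 - (c**2 / r**2) P**2."""
    return (r**2 / c**2) * (-a + np.sqrt(a**2 + c**2 * sigma**2 / r**2))

def simulate_filter_error(a, sigma, c, r, dt=1e-3, steps=5000, paths=4000, seed=0):
    """Euler-Maruyama simulation of signal, observation and stationary-gain filter."""
    rng = np.random.default_rng(seed)
    P = kalman_bucy_stationary_variance(a, sigma, c, r)
    K = c * P / r**2                                     # stationary Kalman gain
    x = rng.normal(0.0, sigma / np.sqrt(2 * a), paths)   # signal in its stationary law
    m = np.zeros(paths)                                  # filter estimate of X_t
    for _ in range(steps):
        dB, dW = rng.normal(0.0, np.sqrt(dt), (2, paths))
        dO = c * x * dt + r * dW                         # observation increment
        x_next = x - a * x * dt + sigma * dB
        m = m - a * m * dt + K * (dO - c * m * dt)       # filter update
        x = x_next
    return float(np.mean((x - m) ** 2)), P

mse, P = simulate_filter_error(1.0, 1.0, 1.0, 0.5)
print(mse, P)   # empirical error variance vs Riccati solution (prior variance is 0.5)
```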

As can be expected, things become more complicated when the bipartite assumption is dropped, and we show in the next section that this requires introducing additional information-theoretic measures.

### II.4 Entropy production and second law

While the conventional second law of thermodynamics deals with the irreversibility of the whole process $Z$, information measures can be used to formulate modified versions of the second law (which may then be called “second laws of information thermodynamics”) that assess the irreversibility of one subsystem alone, say $X$, in the presence of the coupling with the other subsystem. The key quantity is the (fixed-time) entropy production rate $\sigma_X(t)$, which is defined by considering $X$ as an open system and $Y$ as just a fictitious external protocol (or idealized work source) SU2012 (). On general grounds (see, e.g., VdBE2015 ()), $\sigma_X(t)$ can be decomposed as

$$\sigma_X(t)=\frac{d}{dt}S(X_t)+\dot S_{{\rm env},X}(t),\tag{20}$$

where $dS(X_t)/dt$, the time derivative of the marginal Shannon entropy $S(X_t)=-\langle\ln p_t(X_t)\rangle$, is the rate of change of the entropy of $X$, and $\dot S_{{\rm env},X}(t)$ is the rate of change of the entropy of the environment or the bath. (From now on the Boltzmann constant is set equal to $1$, so that we may use $S$ instead of $H$ as Shannon entropy.) As is now standard in the framework of stochastic thermodynamics (see, e.g., M2003 ()), the cumulative entropy change $\Delta S_{{\rm env},X}=\int_0^\tau dt\,\dot S_{{\rm env},X}(t)$ can be expressed as the mean of the log-ratio of the probabilities to observe a trajectory in forward and backward “experiments”. As $Y$ is treated as an external protocol, one has

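$$\Delta S_{{\rm env},X}=\Big\langle\ln\frac{\mathcal{P}[X_0^\tau|Y_0^\tau]}{\tilde{\mathcal{P}}[\tilde X_0^\tau|\tilde Y_0^\tau]}\Big\rangle,\tag{21}$$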

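where $\mathcal{P}[X_0^\tau|Y_0^\tau]$ is the probability of the trajectory of $X$ for a fixed trajectory of $Y$ and $\tilde{\mathcal{P}}[\tilde X_0^\tau|\tilde Y_0^\tau]$ is the corresponding probability of the time-reversed trajectory footnote_probdetached (). For a bipartite pure jump process, $\dot S_{{\rm env},X}(t)$ is then given by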

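$$\dot S_{{\rm env},X}(t)=\sum_{x\neq x',\,y}W^X_t(x|x';y)\,p_t(x',y)\,\ln\frac{W^X_t(x|x';y)}{W^X_t(x'|x;y)},\tag{22}$$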

whereas for a bipartite diffusion process it is equal to

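$$\dot S_{{\rm env},X}(t)=\int dx\,dy\;\hat F_t(x,y)^{\rm T}\big[D^{xx}_t(x,y)\big]^{-1}J^x_t(x,y),\tag{23}$$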

where the diffusion matrix $D^{xx}_t$ and the probability current $J^x_t$ have been defined above, and $\hat F_t$ is the modified drift defined by $\hat F_{t,i}=F_{t,i}-\partial_{x_j}D^{xx}_{t,ij}$ CG2008 (). In cases where the thermodynamics of subsystem $X$ can be defined and the environment is a single thermal bath at a given inverse temperature $\beta$, one has $\dot S_{{\rm env},X}=\beta\dot Q_X$, where $\dot Q_X$ is the heat flow from $X$ to the bath.

Since the two subsystems are coupled, $\sigma_X$ may become negative, but a lower bound is provided by including the information shared with $Y$. For a bipartite Markov process, the various second-law-like inequalities proven in the literature AJM2009 (); IS2013 (); HE2014 (); HBS2014 (); IS2015 () can be summarized by the following hierarchy of bounds,

(24) |

or, for the time-integrated quantities,

(25) |

where and , with by marginalization. (Note that some of these inequalities were first proven in a steady-state setup.) The key fact is that the tightest bound is provided by the learning rate.
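The hierarchy (24) is not reproduced here. Its tightest member can nevertheless be illustrated on a bipartite pure jump process, assuming that this member takes the form $\sigma_X\geq l_X$. This form is our reading of the bipartite result, not a statement taken from this text.

The Python sketch evaluates $\sigma_X$ as $dS(X_t)/dt$ plus the jump-process entropy flow $\sum W^X_t(x|x';y)\,p_t(x',y)\ln[W^X_t(x|x';y)/W^X_t(x'|x;y)]$. It evaluates $l_X$ from its jump-process expression. The rates $W^X_t(x|x';y)$ are random, and the state is arbitrary and non-stationary.

In this setting, $\sigma_X-l_X$ equals the contribution of the $X$-jumps to the total entropy production,
$$\sum W^X_t(x|x';y)\,p_t(x',y)\ln\frac{W^X_t(x|x';y)\,p_t(x',y)}{W^X_t(x'|x;y)\,p_t(x,y)}\geq0.$$
The rates of $Y$ do not enter these quantities for a bipartite process and are therefore omitted. All variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
nx, ny = 3, 2
off = (1.0 - np.eye(nx))[:, :, None]
WX = rng.uniform(0.1, 1.0, (nx, nx, ny)) * off   # WX[x, x', y] = W^X(x | x'; y)
p = rng.uniform(0.5, 1.5, (nx, ny))
p /= p.sum()                                      # arbitrary, non-stationary state
px, py = p.sum(axis=1), p.sum(axis=0)

flux = WX * p[None, :, :]                            # W^X(x|x';y) p(x', y)
WX_rev = np.transpose(WX, (1, 0, 2))                 # WX_rev[x, x', y] = W^X(x'|x; y)
rflux = WX_rev * p[:, None, :]                       # W^X(x'|x;y) p(x, y)
m = WX > 0

# In a bipartite process only X-jumps change the X-marginal.
dpx = flux.sum(axis=(1, 2)) - flux.sum(axis=(0, 2))
dS_X = -float(np.sum(dpx * np.log(px)))              # d/dt S(X_t)
S_env = float(np.sum(flux[m] * np.log(WX[m] / WX_rev[m])))   # entropy flow
sigma_X = dS_X + S_env

phi = np.log(p / np.outer(px, py))                   # ln p(x,y) / (p(x) p(y))
l_X = float(np.sum(flux * (phi[:, None, :] - phi[None, :, :])))

ep_X = float(np.sum(flux[m] * np.log(flux[m] / rflux[m])))   # X-part of total EP
print(sigma_X, l_X, sigma_X - l_X, ep_X)
```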

## III Information measures for non-bipartite processes

### III.1 Learning rates

We first search for a generalization of Eq. (12). The decomposition of $dI(X_t:Y_t)/dt$ introduced above in the bipartite case suggests defining the new rate

$$l_X^{\rm b}(t)=\lim_{dt\to0^+}\frac{I(X_{t+dt}:Y_{t+dt})-I(X_t:Y_{t+dt})}{dt}\tag{26}$$

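in addition to $l_X(t)$. Accordingly, $l_X(t)$ will be denoted $l_X^{\rm f}(t)$ hereafter to make the notations more consistent. Indeed, $l_X^{\rm b}(t)$ can be also written as $\lim_{dt\to0^+}[I(X_t:Y_t)-I(X_{t-dt}:Y_t)]/dt$ whereas $l_X^{\rm f}(t)=\lim_{dt\to0^+}[I(X_{t+dt}:Y_t)-I(X_t:Y_t)]/dt$. This also suggests calling $l_X^{\rm b}$ a backward learning rate, in contrast with the forward rate $l_X^{\rm f}$. (We stress that the equivalence of the two expressions of $l_X^{\rm b}$ has nothing to do with stationarity but simply results from a Taylor expansion in $dt$: Indeed, for any function $f(s,u)$ that is continuously differentiable in the domain $s\leq u$, one has $f(t+dt,t+dt)-f(t,t+dt)=f(t,t)-f(t-dt,t)+o(dt)$. Note also that, in general, $l_X^{\rm b}(t)\neq l_X^{\rm f}(t)$.)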

Thanks to the introduction of $l_X^{\rm b}$, and of the corresponding $l_Y^{\rm b}$, Eq. (12) is now replaced by the two relations

$$\frac{d}{dt}I(X_t:Y_t)=l_X^{\rm f}(t)+l_Y^{\rm b}(t)=l_X^{\rm b}(t)+l_Y^{\rm f}(t).\tag{27}$$

The distinction between the forward and backward learning rates in the general (i.e., non-bipartite) case suggests introducing the symmetric quantities

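$$l_X^{\rm sym}(t)=\frac12\big[l_X^{\rm f}(t)+l_X^{\rm b}(t)\big],\qquad l_Y^{\rm sym}(t)=\frac12\big[l_Y^{\rm f}(t)+l_Y^{\rm b}(t)\big],\tag{28}$$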

such that Eqs. (27) now yield

$$\frac{d}{dt}I(X_t:Y_t)=l_X^{\rm sym}(t)+l_Y^{\rm sym}(t).\tag{29}$$

As will be seen below, these symmetric learning rates play a natural role in the thermodynamics since they vanish when the joint system is at equilibrium, i.e., when the condition of detailed balance is satisfied. On the other hand, the non-symmetric rates vanish when the processes $X$ and $Y$ are independent.

In the general case, but in a steady state, there are only two independent learning rates since $dI(X_t:Y_t)/dt=0$ and Eqs. (27) yield the “conservation” relations

$$\text{S}\qquad l_X^{\rm f}=-l_Y^{\rm b},\qquad l_Y^{\rm f}=-l_X^{\rm b},\tag{30}$$

and thus

$$\text{S}\qquad l_X^{\rm sym}=-l_Y^{\rm sym}.\tag{31}$$
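The relations of this subsection can be illustrated on a non-bipartite jump process with random rates, where simultaneous jumps of $X$ and $Y$ are allowed. The four rates are the limits of
- $[I(X_{t+dt}:Y_t)-I(X_t:Y_t)]/dt$ (forward, for $X$);
- $[I(X_t:Y_{t+dt})-I(X_t:Y_t)]/dt$ (forward, for $Y$);
- $[I(X_{t+dt}:Y_{t+dt})-I(X_t:Y_{t+dt})]/dt$ (backward, for $X$);
- $[I(X_{t+dt}:Y_{t+dt})-I(X_{t+dt}:Y_t)]/dt$ (backward, for $Y$).

The Python sketch evaluates these difference quotients first in an arbitrary transient state, where the forward and backward rates of $X$ generally differ. It then evaluates them in the stationary state, where sums such as "forward $X$ plus backward $Y$" should vanish. Array layouts, helper names and the step $dt=10^{-5}$ are our own choices.

```python
import numpy as np
from scipy.linalg import expm

def mutual_info(joint):
    """I = sum p ln[p / (p_row p_col)] in nats, for a 2D joint pmf."""
    pr = joint.sum(axis=1, keepdims=True)
    pc = joint.sum(axis=0, keepdims=True)
    m = joint > 0
    return float(np.sum(joint[m] * np.log(joint[m] / (pr * pc)[m])))

def make_generator(W):
    """Generator L[z', z] from rates W[z', z] of the jump z -> z' (columns sum to 0)."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return W - np.diag(W.sum(axis=0))

def stationary(L):
    """Solve L p = 0 together with sum(p) = 1 (least squares)."""
    n = L.shape[0]
    A = np.vstack([L, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def four_learning_rates(L, p, dt=1e-5):
    """Finite-difference forward/backward learning rates of X and Y, and dI/dt."""
    nx, ny = p.shape
    n = nx * ny
    J = (expm(L * dt) * p.reshape(1, n)).reshape(nx, ny, nx, ny)  # [x', y', x, y]
    I0 = mutual_info(p)                       # I(X_t : Y_t)
    Ixs = mutual_info(J.sum(axis=(1, 2)))     # I(X_{t+dt} : Y_t)
    Iys = mutual_info(J.sum(axis=(0, 3)))     # I(X_t : Y_{t+dt})
    I1 = mutual_info(J.sum(axis=(2, 3)))      # I(X_{t+dt} : Y_{t+dt})
    return dict(lXf=(Ixs - I0) / dt, lYf=(Iys - I0) / dt,
                lXb=(I1 - Iys) / dt, lYb=(I1 - Ixs) / dt, dI=(I1 - I0) / dt)

rng = np.random.default_rng(3)
nx, ny = 3, 2
L = make_generator(rng.uniform(0.1, 1.0, (nx * ny, nx * ny)))   # non-bipartite
p = rng.uniform(0.5, 1.5, (nx, ny))
p /= p.sum()                                                    # transient state

r = four_learning_rates(L, p)
print(r["lXf"], r["lXb"])                                  # generally different
print(r["dI"], r["lXf"] + r["lYb"], r["lXb"] + r["lYf"])   # the two decompositions

s = four_learning_rates(L, stationary(L).reshape(nx, ny))
print(s["lXf"] + s["lYb"], s["lXb"] + s["lYf"])            # stationary: both ~ 0
```

At finite $dt$ the two decompositions of $dI/dt$ hold identically, since both sides telescope to $[I(X_{t+dt}:Y_{t+dt})-I(X_t:Y_t)]/dt$. The nontrivial content is that the forward and backward rates have separate, generally distinct, limits.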

A basic feature of the learning rates is that they can be expressed in terms of the two-point probability distribution $p_{t',t}(x',y';x,y)$, with $t'\neq t$, and the corresponding marginal distributions, e.g., $p_{t',t}(x';y)$ for $X_{t'}$ and $Y_t$. (Hereafter, variables with a prime symbol such as $x'$ will always refer to the time $t'$ in the two-point probability distribution functions.) Starting from the definitions (11) and (26), and using the normalization condition $\int d\mu(x')\,d\mu(y)\,p_{t',t}(x';y)=1$, we get

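$$l_X^{\rm f}(t)=\int d\mu(x')\,d\mu(y)\;\partial_{t'}p_{t',t}(x';y)\Big|_{t'=t^+}\ln\frac{p_t(x',y)}{p_t(x')\,p_t(y)},\tag{32a}$$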

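$$l_X^{\rm b}(t)=\int d\mu(x')\,d\mu(y)\;\partial_{t'}p_{t',t}(x';y)\Big|_{t'=t^-}\ln\frac{p_t(x',y)}{p_t(x')\,p_t(y)},\tag{32b}$$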

where we have used the notation $p_t(x,y)$ for the joint probability distribution at the same time (and $p_t(x)$, $p_t(y)$ for the marginal distributions); similar expressions are obtained for $l_Y^{\rm f}$ and $l_Y^{\rm b}$. (We recall that we use the same notations for continuous and discrete spaces. In the latter case, integrals must be replaced by sums.) From these equations, we readily see that the learning rates vanish when $p_t(x,y)=p_t(x)\,p_t(y)$, which is the case in particular when the processes $X$ and $Y$ are independent. We stress that these formulas are fully general and do not require the joint process to be Markovian. However, further simplifications occur in the Markovian case, as one can replace the derivative with respect to $t'$ by using the Kolmogorov equation and introducing the Markovian generator $L_t$, which leads to

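$$\text{M}\qquad l_X^{\rm f}(t)=\int d\mu(x)\,d\mu(y)\,d\mu(x')\,d\mu(y')\;L_t(x',y';x,y)\,p_t(x,y)\,\ln\frac{p_t(x',y)}{p_t(x')\,p_t(y)}.\tag{33}$$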

Furthermore, by using the decomposition in Eq. (27) and the expression of the time derivative of the mutual information,

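$$\text{M}\qquad\frac{d}{dt}I(X_t:Y_t)=\int d\mu(x)\,d\mu(y)\,d\mu(x')\,d\mu(y')\;L_t(x',y';x,y)\,p_t(x,y)\,\ln\frac{p_t(x',y')}{p_t(x')\,p_t(y')},\tag{34}$$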

$l_X^{\rm b}$ is obtained as

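$$\text{M}\qquad l_X^{\rm b}(t)=\int d\mu(x)\,d\mu(y)\,d\mu(x')\,d\mu(y')\;L_t(x',y';x,y)\,p_t(x,y)\,\ln\frac{p_t(x',y')\,p_t(x)}{p_t(x,y')\,p_t(x')}.\tag{35}$$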

Finally, after using the conservation of probability $\int d\mu(x')\,d\mu(y')\,L_t(x',y';x,y)=0$, we obtain the symmetric learning rates as

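$$\text{M}\qquad l_X^{\rm sym}(t)=\frac14\int d\mu(x)\,d\mu(y)\,d\mu(x')\,d\mu(y')\;J_t(x',y';x,y)\,\ln\frac{p_t(y|x')\,p_t(y'|x')}{p_t(y|x)\,p_t(y'|x)},\tag{36}$$
with $J_t(x',y';x,y)=L_t(x',y';x,y)\,p_t(x,y)-L_t(x,y;x',y')\,p_t(x',y')$ the probability current between the states $(x,y)$ and $(x',y')$; a similar expression holds for $l_Y^{\rm sym}$.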

From the above expression, one can immediately see that if $Z$ is an equilibrium process, such that the probability current vanishes, the symmetric learning rates both vanish. In Sec. IV.1, we will provide more explicit expressions for these learning rates in the case of a Markovian diffusion process. The case of a Markovian pure jump process in a discrete space is treated in Appendix A.
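The generator expressions of this subsection can be compared with the definitions for a small jump process, with integrals replaced by sums. The Python sketch below does the following:

1. It evaluates by direct summation the forward rate $\sum L\,p\ln[p(x',y)/(p(x')p(y))]$, the backward rate $\sum L\,p\ln[p(x',y')p(x)/(p(x,y')p(x'))]$, and the current form of the symmetric rate.
2. It checks the first two against finite differences of the time-shifted mutual informations.
3. It checks that the current form equals half the sum of the forward and backward rates.
4. It builds detailed-balance rates $W(z'|z)=S(z',z)\,e^{-[E(z')-E(z)]/2}$ with a symmetric $S$. For these rates the symmetric learning rate should vanish, even though simultaneous jumps are allowed.

The energies $E$, the matrix $S$ and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

def mutual_info(joint):
    """I = sum p ln[p / (p_row p_col)] in nats, for a 2D joint pmf."""
    pr = joint.sum(axis=1, keepdims=True)
    pc = joint.sum(axis=0, keepdims=True)
    m = joint > 0
    return float(np.sum(joint[m] * np.log(joint[m] / (pr * pc)[m])))

def make_generator(W):
    """Generator L[z', z] from rates W[z', z] of the jump z -> z'."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return W - np.diag(W.sum(axis=0))

def generator_rates(L4, p):
    """Forward, dI/dt, backward and symmetric (current form) rates of X."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    phi = np.log(p / np.outer(px, py))            # ln p(x,y) / (p(x) p(y))
    F = L4 * p[None, None, :, :]                  # L(x',y';x,y) p(x,y)
    lXf = np.einsum('abxy,ay->', F, phi)          # log term at (x', y)
    dI = np.einsum('abxy,ab->', F, phi)           # log term at (x', y')
    lXb = dI - np.einsum('abxy,xb->', F, phi)     # minus log term at (x, y')
    n = p.size
    Jc = (F.reshape(n, n) - F.reshape(n, n).T).reshape(F.shape)   # J(x',y';x,y)
    lc = np.log(p / px[:, None])                  # lc[x, y] = ln p(y | x)
    Phi = (lc[:, None, None, :] + lc[:, :, None, None]            # ln p(y|x') + ln p(y'|x')
           - lc[None, None, :, :] - lc.T[None, :, :, None])       # - ln p(y|x) - ln p(y'|x)
    lXsym = 0.25 * np.sum(Jc * Phi)
    return float(lXf), float(dI), float(lXb), float(lXsym)

def fd_rates(L4, p, dt=1e-5):
    """Finite-difference forward and backward learning rates of X."""
    nx, ny = p.shape
    n = p.size
    J = (expm(L4.reshape(n, n) * dt) * p.reshape(1, n)).reshape(nx, ny, nx, ny)
    I0, I1 = mutual_info(p), mutual_info(J.sum(axis=(2, 3)))
    Ixs, Iys = mutual_info(J.sum(axis=(1, 2))), mutual_info(J.sum(axis=(0, 3)))
    return (Ixs - I0) / dt, (I1 - Iys) / dt

rng = np.random.default_rng(5)
nx, ny = 3, 2
n = nx * ny
# (i) generic non-bipartite, non-equilibrium dynamics in a transient state
L4 = make_generator(rng.uniform(0.1, 1.0, (n, n))).reshape(nx, ny, nx, ny)
p = rng.uniform(0.5, 1.5, (nx, ny))
p /= p.sum()
# (ii) detailed balance: W(z'|z) = S(z', z) exp[-(E(z') - E(z))/2], S symmetric
E = rng.uniform(0.0, 2.0, n)
S = rng.uniform(0.1, 1.0, (n, n))
Leq = make_generator((S + S.T) * np.exp(-(E[:, None] - E[None, :]) / 2))
Leq = Leq.reshape(nx, ny, nx, ny)
peq = (np.exp(-E) / np.exp(-E).sum()).reshape(nx, ny)

print(generator_rates(L4, p), fd_rates(L4, p))
print(generator_rates(Leq, peq))   # symmetric rate ~ 0 at equilibrium
```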

If we now come back to the special situation of a bipartite process, due to the additive form of the Markovian generator [Eq. (10)] and the conservation of probability, the formulas given in Eqs. (33) and (35) coincide and

$$\text{B}\qquad l_X^{\rm f}(t)=l_X^{\rm b}(t)\equiv l_X(t).\tag{37}$$

A similar relation holds for $l_Y^{\rm f}$ and $l_Y^{\rm b}$. There are thus only two independent learning rates instead of four, and Eq. (27) then gives back Eq. (12). Note that the present equalities between learning rates differ from those expressed in Eq. (30). More generally, one must carefully distinguish relations valid for a bipartite process from those valid for a non-bipartite process in a steady state note1 (). Of course, if the joint process is both bipartite and stationary, Eqs. (30) and (37) imply that only one independent learning rate subsists, for instance $l_X=-l_Y$.
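The bipartite simplification can be checked with the same generator sums. In the sketch below (Python, random rates, names of our own choosing), the forward rate $\sum L\,p\ln[p(x',y)/(p(x')p(y))]$ and the backward rate of $X$ are computed for two sets of rates. Removing the simultaneous jumps makes the two rates coincide up to rounding, while for the unrestricted rates they generally differ. In the stationary bipartite state, one also finds $l_X=-l_Y$.

```python
import numpy as np

def make_generator(W):
    """Generator L[z', z] from rates W[z', z] of the jump z -> z'."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return W - np.diag(W.sum(axis=0))

def forward_backward(L4, p):
    """Forward and backward learning rates of X, and forward rate of Y."""
    phi = np.log(p / np.outer(p.sum(axis=1), p.sum(axis=0)))
    F = L4 * p[None, None, :, :]                 # L(x',y';x,y) p(x,y)
    dI = np.einsum('abxy,ab->', F, phi)          # dI/dt
    lYf = np.einsum('abxy,xb->', F, phi)         # forward rate of Y
    lXf = np.einsum('abxy,ay->', F, phi)         # forward rate of X
    return float(lXf), float(dI - lYf), float(lYf)

def stationary(L):
    """Solve L p = 0 together with sum(p) = 1 (least squares)."""
    n = L.shape[0]
    A = np.vstack([L, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

rng = np.random.default_rng(6)
nx, ny = 3, 2
n = nx * ny
W4 = rng.uniform(0.1, 1.0, (nx, ny, nx, ny))
keep = (np.eye(nx)[:, None, :, None] + np.eye(ny)[None, :, None, :]) > 0  # no joint jumps
L_bip = make_generator((W4 * keep).reshape(n, n))
L_gen = make_generator(W4.reshape(n, n))
p = rng.uniform(0.5, 1.5, (nx, ny))
p /= p.sum()

bf, bb, _ = forward_backward(L_bip.reshape(nx, ny, nx, ny), p)
gf, gb, _ = forward_backward(L_gen.reshape(nx, ny, nx, ny), p)
print(bf - bb, gf - gb)   # ~0 for the bipartite case, generically nonzero otherwise

# Bipartite and stationary: a single independent learning rate, l_X = -l_Y
pss = stationary(L_bip).reshape(nx, ny)
sX, _, sY = forward_backward(L_bip.reshape(nx, ny, nx, ny), pss)
print(sX, sY)
```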

We conclude this part on the learning rates by briefly discussing their content in terms of information. The various quantities $l_X^{\rm f}$, $l_X^{\rm b}$, $l_X^{\rm sym}$, and their counterparts for the $Y$ subsystem, all measure the change of mutual information between $X$ and $Y$ due to different aspects of an infinitesimal dynamical evolution of one subsystem or the other. When both $l_X^{\rm f}$ and $l_X^{\rm b}$ are strictly positive, and as a direct consequence $l_X^{\rm sym}>0$, one can plausibly conclude that $X$ is “learning about” $Y$ through its dynamics. However, the non-bipartite structure of the process allows cases with $l_X^{\rm f}>0$ and $l_X^{\rm b}<0$, which have no manifest interpretation in the context of learning.

### III.2 Transfer entropy rates

Transfer entropy rates $T_{X\to Y}$ and $T_{Y\to X}$ can be defined by the same formulas as in the bipartite case: see Eq. (13). They keep the same property of being non-negative and the same meaning as information-theoretic measures. However, whereas the generalization of the decomposition of the variation of mutual information [Eq. (12)] to a non-bipartite process was straightforward, a similar operation for the pathwise mutual information [Eq. (15)] turns out to be problematic. Indeed, using the second line of Eq. (13), one can write

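$$\frac{d}{dt}I(X_0^t:Y_0^t)=T_{X\to Y}(t)+T_{Y\to X}(t)+\mathcal{I}_{XY}(t),\qquad\mathcal{I}_{XY}(t)=\lim_{dt\to0^+}\frac{1}{dt}\,I(X_{t+dt}:Y_{t+dt}\,|\,X_0^t,Y_0^t),\tag{38}$$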

where $\mathcal{I}_{XY}(t)$ is a symmetric quantity measuring the “instantaneous” dependence of the two processes (see, e.g., C2011 () for the discrete-time version). However, there is a serious obstruction, at least for diffusion processes: $\mathcal{I}_{XY}$ is either zero if $Z$ is bipartite [cf. Eq. (15) above] or infinite otherwise! In other words, $\mathcal{I}_{XY}$ and thus $I(X_0^t:Y_0^t)$ itself are infinite for non-bipartite diffusions. Indeed, as noticed in N2016 (), $X$ and $Y$ have a non-zero quadratic covariation if the noises are correlated, which results in the singularity of the joint distribution of $X_0^t$ and $Y_0^t$ with respect to the product of the corresponding marginals. Such a difficulty does not occur in discrete time, as briefly discussed in note note4 (), nor in discrete space.
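The divergence of the instantaneous term in Eq. (38) can be seen from the leading-order behavior of the increments. For a Markov diffusion, conditioning on the past reduces to conditioning on $(X_t,Y_t)$. To leading order in $dt$, the increments of Eq. (3) are then Gaussian with covariance $2D_t\,dt$, using the normalization assumed for Eq. (4). The drift only shifts their mean and does not affect the mutual information.

For scalar $X$ and $Y$, the conditional mutual information is therefore $-\frac12\ln(1-\rho^2)$, where $\rho$ is the correlation coefficient of the noises. This value does not depend on $dt$, so its rate grows like $1/dt$. The Python sketch below makes this explicit for an illustrative diffusion matrix; the helper name is ours.

```python
import numpy as np

def increment_mutual_info(D, dt):
    """I(X_{t+dt} : Y_{t+dt} | X_t, Y_t) for Gaussian increments of covariance 2 D dt."""
    C = 2.0 * D * dt
    rho2 = C[0, 1] ** 2 / (C[0, 0] * C[1, 1])   # squared correlation coefficient
    return -0.5 * np.log1p(-rho2)               # independent of dt

D_corr = np.array([[1.0, 0.3], [0.3, 0.5]])     # correlated noises: not bipartite
D_bip = np.diag([1.0, 0.5])                     # block-diagonal: bipartite
dts = [1e-1, 1e-2, 1e-3]
info = [increment_mutual_info(D_corr, dt) for dt in dts]
rates = [I / dt for I, dt in zip(info, dts)]
print(info)    # the same value for every dt
print(rates)   # grows like 1/dt
print(increment_mutual_info(D_bip, 1e-3))   # zero in the bipartite case
```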

We thus turn our attention to another class of rates which are well-defined in the non-bipartite case and will allow us to generalize the important inequality (18). These rates are associated with the mutual informations $I(X_t:Y_0^t)$ and $I(Y_t:X_0^t)$, which are the natural quantities from the viewpoint of filtering theory K1962 (); LS2001 (). Consider for instance the case where $X$ is an unobserved signal and $Y$ is the observation. We then introduce the TE rate, called “filtered transfer entropy rate”,

(39) |

which quantifies how much the prediction of , for infinitesimal, is improved by knowing in addition to the trajectory . In general there is no simple relation between and , except when the process is Markov bipartite, where

(40) |

Indeed, from the definitions (13) and (39), we have the general equation

(41) |

and it can be proven that the right-hand side of this equation is equal to when the process is bipartite. The demonstration for jump processes is in Appendix C of HBS2014 () and for diffusion processes it is given in Appendix B of the present paper.

There is of course a single-time-step TE rate corresponding to , which is defined as

(42) |

Then,

(43) |

and

(44) |

in the bipartite case, as will be illustrated below for diffusion processes [see Eq. (79)].

As for the learning rates, the single-time-step TE rates can be expressed in terms of the two-point probability distribution , with , and the corresponding marginal distributions:

(45a) | ||||

(45b) |

In contrast with the learning rates [Eqs. (32)], one cannot generally introduce derivatives with respect to in these expressions. On the other hand, we shall derive explicit expressions for Markov diffusion processes: see Sec. IV.2 below.

### III.3 Backward transfer entropy rates

Finally, we add to our list of information-theoretic measures another TE rate which can be used to assess the directionality of information transfer (see Sec. VI.2) and which will play an important role in the generalization of the second law (see Sec. IV.4). To this aim, we slightly change our notations by assuming that the trajectories of and are now observed in the time interval . We then define