Integrated information in the thermodynamic limit
Abstract
The capacity to integrate information is a prominent feature of biological and cognitive systems. Integrated Information Theory (IIT) provides a mathematical approach to quantify the level of integration in a system, yet its computational cost generally precludes its application beyond relatively small models. In consequence, it is not yet well understood how integration scales up with the size of a system or with different temporal scales of activity, nor how a system maintains its integration as it interacts with its environment. Here, we show for the first time how measures of information integration scale when systems become very large. Using kinetic Ising models and mean-field approximations from statistical mechanics, we show that information integration diverges in the thermodynamic limit at certain critical points. Moreover, by comparing the different divergent tendencies of blocks of a system at these critical points, we delimit the boundary between an integrated unit and its environment. Finally, we present a model that adaptively maintains its integration despite changes in its environment by generating a critical surface where its integrity is preserved. We argue that the exploration of integrated information for these limit cases helps in addressing a variety of poorly understood questions about the organization of biological, neural, and cognitive systems.
pacs: 87.18.Nq, 87.18.Sn, 87.19.L-, 87.19.lj, 87.19.lo

Also at: ISAAC Lab, Aragón Institute of Engineering Research, University of Zaragoza, Zaragoza, Spain.
Also at: Ikerbasque, Basque Foundation for Science, Bizkaia, Spain, and the Centre for Computational Neuroscience and Robotics, Department of Informatics, University of Sussex, Brighton, UK.
I Introduction
Cognition emerges from the distributed activity of many neural, bodily, and environmental processes. The problem of large-scale integration of neural processes is crucial for understanding how unified cognitive and behavioural states arise from the coordination of these distributed sources of activity. Evidence Bassett and Gazzaniga (2011); Pessoa (2014) suggests this integration process is non-decomposable: we cannot understand it in terms of modular components or timescales of activity in a neural system, nor can we decouple neural activity from the external environment Aguilera et al. (2013). The different components and scales of the cognitive process are deeply intertwined. Yet, the functional components of the process are still able to maintain their differentiated characteristics in order to generate complex adaptive patterns of behaviour.
How can such an integrated, complex organization emerge and be maintained? One of the most attractive theories is that neural activity is coordinated into a coherent yet flexible ‘dynamic core’ Varela (1995); Tononi and Edelman (1998), which balances opposing tendencies of integration and segregation. The interplay of these opposing tendencies generates information (understood as described by information theory, not in a semantic or intensional sense) that is highly diversified among functional parts of the nervous system, and at the same time unified into a coherent whole, thus displaying highly complex patterns of activity.
Integrated information is defined as the information possessed by a system which is above and beyond the information that is available from the sum of its parts. Information integration was first conceived of as linked to consciousness Tononi and Edelman (1998); Oizumi et al. (2014) but it can also be manifested without awareness Mudrik et al. (2014) and has been used more generally to describe biological autonomy Marshall et al. (2017). Although the topic of information integration has received interest from different communities in recent years, we are still lacking a full understanding of the principles that underlie this fundamental process: how integrative forces are deployed temporally or spatially, how they cope with the surrounding environment, or how they scale with the size of the system.
Different approaches have proposed ways to formalize this idea; one of the most popular has been developed as a measure connected to consciousness under the name of integrated information theory (IIT, Oizumi et al. (2014)). In its latest versions, IIT is based on interventionist notions of causality to characterize the causal influences between the components of a system Oizumi et al. (2014); Marshall et al. (2017). That is, instead of assessing whether a system is unified into a coherent whole by analysing its behaviour in regular conditions, IIT proposes that the forces integrating the behaviour of the system are better captured by observing its behaviour under perturbations.
IIT postulates that any subset of elements of the system is a mechanism that may exert an irreducible causal influence on the rest of the system.
Nevertheless, current formulations of IIT present some limitations for studying brain organization. We propose that, in order to extend current uses of IIT to capture some important aspects of neural organization, we should reexamine some of the main assumptions behind its conception:

Scalability. A system can present different levels of integration at different spatial and temporal scales Hoel et al. (2016); Marshall et al. (2018) and, in general, it is not well understood how integration behaves at different scales. However, analyses of the properties of brain-inspired statistical mechanical models have unveiled how many processes in neural systems take the form of phase transitions occurring in the thermodynamic limit, showing properties that diverge as the size of the system scales up. Here we apply models from statistical mechanics to describe integration in terms of the tendencies of the system near the thermodynamic limit.

Temporal deployment. The latest formulations of IIT Oizumi et al. (2014) attempt to capture the dynamical nature of neural systems by focusing on the dynamics of causal processes, not taking the stationarity or ergodicity of the system as initial assumptions. Nevertheless, IIT is only measured at a single scale of temporal activity, since it analyses integration in the causal power of a mechanism from one time step to the next. We propose a modification of the measure to study integration along different temporal spans, showing that systems at critical points must be evaluated over very long timescales.

Non-decomposability. As we mentioned, empirical evidence points to the non-decomposability of cognitive processes. In its current formulation, IIT considers elements outside the system under analysis as independent sources of noise. Here, we propose instead that the level of integration of a system must be evaluated in the context of the other systems it is coupled to (therefore not assuming that elements in the environment are just sources of statistical noise). This modification allows us to correctly determine the boundary between a system and its environment in the thermodynamic limit.
Some of the assumptions and modifications pointed out here are explained later in the text, and a detailed account and comparison between IIT and our measure of integrated information can be found in Appendix B. Part of the reason why some of the aspects above have not yet been addressed is that, due to its computational complexity, the application of current IIT measures is limited to very small systems and short timescales. In general, IIT has been tested in small toy models (e.g., Oizumi et al. (2014); Albantakis et al. (2014), although some alternative formulations try to circumvent this problem, see Barrett and Seth (2011); Oizumi et al. (2016)). In contrast, our approach, apart from the modifications proposed above, introduces some simplifications and approximations in order to measure integrated information as a system scales to very large sizes. Specifically, we introduce a simple kinetic Ising model of infinite size and quasi-homogeneous connectivity, which presents an exact mean field solution that we use to simplify the calculation of integrated information of the mechanisms of a system.
We proceed as follows. First, we introduce the kinetic Ising model and a mean field approximation for solving it. Then, we introduce a measure of integrated information and how it can be computed for Ising models of infinite size. Finally, we present the results of our method in three scenarios of increasing complexity for depicting how integrated information can be used to characterize an integrated system interacting with an environment:

In the first scenario, we illustrate the measure in a simple homogeneous model. In the thermodynamic limit, we can describe integrated information as the susceptibility of the system to changes in the direction of the minimum information partition (MIP). Consequently, integrated information diverges when the system is near a critical point.

The second scenario depicts a system coupled to an external environment, showing that both the system and the system-environment compound present integrated information diverging near a shared critical point. Nevertheless, depending on the coupling strength, the system and system-environment mechanisms present different speeds of divergence. This allows us to delimit the dominant dynamical unit where integration takes place.

Finally, we tune the parameters of a system with internal self-regulation in order to present high integration when interacting with a variety of environments. The system’s internal inhibitory interactions generate a critical surface in the direction of the MIP which describes the viable region in which its integration is maintained.
The results presented here represent a first attempt at using integrated information theory to delimit the boundaries of a family of infinite size systems that can be formally solved. The interest of the study is twofold. First, it allows us to check some of the assumptions of IIT and propose some modifications to maintain its consistency in the thermodynamic limit, and to propose a way to adapt IIT measures for very large systems. Second, although the results presented are obtained from relatively simple cases, they offer an opportunity to speculate about how the causal integrative forces of a system (both its internal cohesion and the coupling with its environment) might scale up when a system approaches the thermodynamic limit. This provides an opportunity to address unanswered questions about integrated organization of biological and cognitive systems.
II Model
We start by describing a general model defining causal temporal interactions between variables. Looking for generality, we use the least structured statistical model (i.e., a maximum caliber model Pressé et al. (2013)) defining causal correlations between pairs of units from one time step to the next. We study a kinetic Ising model where binary variables (Ising spins) evolve in discrete time, with synchronous parallel dynamics (Fig 1.A). Given the configuration of spins at the previous step, s(t), the spins s_i(t+1) are independent random variables drawn from the distribution:
(1) P(s_i(t+1) | s(t)) = exp(β s_i(t+1) h_i(t)) / (2 cosh(β h_i(t)))
where
(2) h_i(t) = H_i + Σ_j J_ij s_j(t)
The parameters H_i and J_ij represent the local fields at each spin and the couplings between pairs of spins, and β is the inverse temperature of the model. Without loss of generality, we assume β = 1.
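As an illustration, Eqs 1-2 translate directly into code (a minimal sketch with ±1 spins; the helper names `field` and `p_spin` are ours):

```python
import math

def field(i, s, H, J):
    """Effective field h_i = H_i + sum_j J_ij * s_j (Eq 2)."""
    return H[i] + sum(J[i][j] * sj for j, sj in enumerate(s))

def p_spin(si_next, h, beta=1.0):
    """P(s_i(t+1) = si_next | s(t)) = exp(beta*si_next*h) / (2*cosh(beta*h)) (Eq 1)."""
    return math.exp(beta * si_next * h) / (2.0 * math.cosh(beta * h))

# Two coupled spins with no local fields
H = [0.0, 0.0]
J = [[0.0, 1.0], [1.0, 0.0]]
s = [1, -1]
h0 = field(0, s, H, J)   # h_0 = J_01 * s_1 = -1
p_up = p_spin(+1, h0)    # probability that spin 0 is up at t+1
```

Note that the two probabilities p_spin(+1, h) and p_spin(-1, h) always sum to one, so each spin is a well-defined binary random variable conditioned on the previous configuration.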
II.1 Mean field kinetic Ising model
We focus on the particular case of a system of infinite size, N → ∞. The system is divided into different regions (from 1 to 3 depending on the example), and the coupling values are positive and homogeneous for each intra- or inter-region connection, J_ij = J_ab / N for i ∈ a, j ∈ b, where a and b are regions of the system with sizes N_a and N_b. For simplicity, the local fields are set to zero.
For a system of infinite size (and all regions also of infinite size), a mean field approximation allows us to calculate the field of all units belonging to region a as:
(3) h_a(t) = Σ_b J_ab ν_b m_b(t)
where m_b(t) is the mean field of region b and ν_b = N_b / N its relative size. Now we can exactly define the update of the mean field variables using Eq 1 as:
(4) m_a(t+1) = tanh( Σ_b J_ab ν_b m_b(t) )
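The mean field recursion of Eqs 3-4 reduces the dynamics to a low-dimensional deterministic map, which can be sketched as follows (β = 1 as assumed above; the function names and the example couplings are ours):

```python
import math

def mf_step(m, J, nu):
    """One update of the mean fields: m_a(t+1) = tanh(sum_b J[a][b]*nu[b]*m[b]) (Eqs 3-4)."""
    return [math.tanh(sum(J[a][b] * nu[b] * m[b] for b in range(len(m))))
            for a in range(len(m))]

def mf_fixed_point(m0, J, nu, tol=1e-12, max_iter=100000):
    """Iterate Eq 4 until the mean fields reach a stationary value."""
    m = list(m0)
    for _ in range(max_iter):
        m_next = mf_step(m, J, nu)
        if max(abs(a - b) for a, b in zip(m, m_next)) < tol:
            return m_next
        m = m_next
    return m

# Homogeneous one-region system: m(t+1) = tanh(J*m(t))
m_star = mf_fixed_point([0.5], [[2.0]], [1.0])[0]   # J = 2: ordered phase
m_zero = mf_fixed_point([0.5], [[0.5]], [1.0])[0]   # J = 0.5: disordered phase
```

For a single region this recovers the classical mean field Ising behaviour: a nonzero stationary mean field for strong couplings and a vanishing one for weak couplings.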
II.2 Integrated Information
We use a simplified version of the integrated effect information described by IIT Oizumi et al. (2014), implementing some modifications to measure the scaling of integrated information in the thermodynamic limit. In IIT, both causes and effects of a state are taken into account. For simplicity, we consider only the effects of a particular state. Also, although IIT is defined only for the immediate effects after one update of the state of the system, we define integrated information for an arbitrary number of updates T of the system. See Appendix B for a list of the differences between IIT and the measure employed here.
Given an initial state s(0), we define a ‘mechanism’ (following IIT’s nomenclature) as a subset M of the units of the system. The integrated information of mechanism M, φ, is defined as the distance between the behaviour of the original system and that of a system in which a partition π (from the set of possible bipartitions) is applied over the units in M. Fig 1.B depicts an example of a partition. When a partition is applied, the input coming from the partitioned connections of the system is replaced by random unconstrained noise (binary white noise in the case of an Ising model).
Once the partition is applied, the probability of the state is computed after T updates, injecting noise at the partitioned elements during each update. Then, integrated information is defined as the distance between the conditional probability distributions at time T:
(5) φ(s(0), π, T) = D( P(s_M(T) | s(0)), P^π(s_M(T) | s(0)) )
where D refers to the Wasserstein distance (also known as earth mover’s distance) used by IIT to quantify the statistical distance between probability distributions. Here π specifies the partition applied over the elements of mechanism M: one pair of blocks denotes a bipartition of the mechanism at the current state s(0), and another pair refers to the blocks of a bipartition (not necessarily the same) of the updated state of the units. Fig 1.B represents an example of such a partition.
Specifically, IIT computes integrated information as the value of φ under the minimum information partition (MIP), which is the partition of mechanism M making the least difference with respect to the unpartitioned system (i.e., the one minimizing φ). We use Φ to denote the integrated information under the minimum information partition.
Note that some important modifications have been made. The most important one is that IIT considers the elements outside of the mechanism as unconstrained sources of noise. As we show in Figure B2, this can radically change the results of integrated information theory, provoking spurious divergences at points other than the critical point. To preserve the consistency of our results, we let elements outside the mechanism operate normally (see Appendix B3 for details).
II.3 Integrated information in the mean field model
We now show how integrated information can be computed for the mean field approximation of the Ising model. Thanks to the mean field approximation we can simplify the calculation of the probability distributions of trajectories to a Markovian distribution dependent on the mean field at the previous step.
In general, P(s(t)|s(0)) can be computed recursively by applying the equation:
(6) P(s(t+1) | s(0)) = Σ_{s(t)} P(s(t+1) | s(t)) P(s(t) | s(0))
In the kinetic Ising model of infinite size, the mean fields of the system’s regions are deterministic, and instead of computing all possible paths of the system we can just determine the evolution of the mean field using Equation 4. Moreover, knowing the mean field of each region we can calculate the value of the effective fields received by each unit using Equation 3. Also, given the mean field value at a specific point, the posterior probability distribution of each unit is independent. Thus, using the value of the effective field computed by evolving the mean field from its initial value, we can just take:
(7) P(s_i(T) | s(0)) = exp(s_i(T) h_a(T-1)) / (2 cosh(h_a(T-1))), for units i in region a
In this context, the calculation of the Wasserstein distance is drastically simplified, and we can compute φ as the sum of distances between independent binary variables, which is equivalent to computing the difference of their mean values:
(8) φ(π) = Σ_{i ∈ M} |m_i(T) - m_i^π(T)|
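The simplification behind the distance above is easy to check: for two ±1 binary variables the earth mover’s distance moves |p1 - p2| of probability mass a distance of 2, which equals the difference of the means (a sketch; the function names are ours):

```python
def wasserstein_binary(m1, m2):
    """Wasserstein distance between two +-1 binary variables with means m1, m2.
    With p(+1) = (1+m)/2, the only transport moves |p1-p2| of mass a distance 2,
    which equals |m1 - m2|."""
    p1, p2 = (1 + m1) / 2, (1 + m2) / 2
    return abs(p1 - p2) * 2

def phi_from_means(m, m_partitioned):
    """phi for a mechanism of independent units: sum of per-unit distances."""
    return sum(wasserstein_binary(a, b) for a, b in zip(m, m_partitioned))
```

This is why, once the units are conditionally independent given the mean fields, φ reduces to a sum of absolute differences of mean activations.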
Once we can calculate φ, we still have the problem of finding the MIP of the system. Luckily, since the connectivity of the system is homogeneous for all nodes in the same region, finding the MIP is equivalent to finding the partition that cuts the lowest number of connections. For infinite size systems where inter-region connections are not zero, the MIP will be one of the partitions that isolate just one node of the system. Also, the partition that isolates a single unit at the current state always has a smaller value of φ than the partition isolating a node at the updated state, since partitioning the posterior distribution corresponds to a larger difference between the partitioned and unpartitioned distributions. Thus, finding the MIP corresponds to finding which region of the system least affects future states when one node of the region is isolated in the partition at the current state (e.g., Fig 1.B).
Finally, we define a function F that recursively applies the update rule in Eq 4 for T steps starting from an initial mean field value m(0), such that m(T) = F_T(m(0)). In our mean field approximation, applying the MIP to the quasi-homogeneous system described here is equivalent to just removing one connection from the system.
Assuming that the number of units per region grows proportionally to the system size, N_a = ν_a N, we get a simplified expression for the partitioned and unpartitioned terms:
(9)  
where ε accounts for the weight of the removed connection in the partitioned case and is zero otherwise. Now, computing the unpartitioned and partitioned cases is equivalent to calculating the evolution of the mean field with and without the perturbation ε. Given this, assuming the limit of infinite size, we calculate the final form of Φ as a sum of the derivatives of the function F:
(10)  
Note that this defines integrated information in terms similar to the magnetic susceptibility typically used in Ising models to identify critical points, although in this case the mean field of the system is differentiated along the parametrical direction of the MIP.
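This susceptibility reading can be checked numerically in the homogeneous one-region case: perturb the coupling by a small ε in the direction of the partition and measure the response of the stationary mean field (a sketch under our β = 1 convention; the finite-difference step and test couplings are our choices):

```python
import math

def m_stationary(J, m0=0.5, tol=1e-13, max_iter=1000000):
    """Stationary solution of m = tanh(J*m) reached by iterating Eq 4."""
    m = m0
    for _ in range(max_iter):
        m_next = math.tanh(J * m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m

def phi_susceptibility(J, eps=1e-6):
    """Finite-difference estimate of the response of the stationary mean field
    to an eps-perturbation of the coupling along the partitioned direction."""
    return abs(m_stationary(J) - m_stationary(J - eps)) / eps

chi_far = phi_susceptibility(1.5)    # away from the critical point: finite
chi_near = phi_susceptibility(1.05)  # close to J = 1: much larger
```

Close to the critical point the response grows sharply, reproducing the divergence of integrated information discussed in the text.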
III Results
III.1 Integrated information in a homogeneous kinetic Ising model
As an example, we compute numerically the value of Φ for a homogeneous kinetic Ising model containing just one region (as in Fig 1.A). The system only has one parameter J describing all connections in the system.
For different values of J, we compute Φ for the system starting from a state in the stationary solution. For doing so, we need to know how to compute F, that is, how to compute the mean field of units at a particular time.
First, we numerically compute F and its derivative for different values of J for the largest mechanism of size N, different values of T, and an initial mean field equal to the value at the stationary solution of the system. We estimate the values of the derivative as a finite difference, using a sufficiently small perturbation of the coupling.
As we observe in Fig 2.B, the value of the derivative appears to diverge as T grows.
Similarly, we numerically compute the stationary mean field by iterating the update equation of the model until the difference in the update falls below a fixed threshold. In Fig 2.C we observe how Φ shows an apparent divergence around J = 1. Also, we compute the value of φ for different mechanisms whose size is a fraction of the system size. As shown in Fig 2.D, the resulting value of integrated information still diverges but is smaller than the value of Φ of the whole system, indicating that the system is irreducible.
We can go beyond numerical computations and calculate the analytic value of Φ near the point of divergence by approximating the stationary mean field m* as the value that solves m = tanh(Jm). Note that, more generally, we can compute the limit of F just by substituting the initial condition by this stationary solution.
The system has a trivial solution at m = 0. Also, for J > 1 the solution at m = 0 becomes unstable and a pair of stable solutions ±m* emerge in a pitchfork bifurcation (Fig 2.A). Although there is no analytic solution of the problem, we can compute the value of m* near the critical point by approximating the hyperbolic tangent by the first two terms of its Taylor series, finding that in the limit J → 1 we approximate:
(11) m* ≈ ±sqrt(3(J - 1) / J^3)
Thus, since the derivative of m* with respect to the couplings scales as (J - 1)^{-1/2}, we can confirm that the value of integrated information diverges when J → 1. This has interesting implications. If a system must maintain a growing level of integration as its size increases, it needs to be poised near a critical point that shows a divergence of the values of Φ.
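The Taylor approximation can be verified numerically: expanding tanh(x) ≈ x - x³/3 in m = tanh(Jm) gives m*² ≈ 3(J - 1)/J³, which closely matches the iterated fixed point near the critical point (a sketch; the test value J = 1.05 is our choice):

```python
import math

def m_exact(J, m0=0.5, tol=1e-13):
    """Stationary solution of m = tanh(J*m) by fixed-point iteration."""
    m = m0
    while True:
        m_next = math.tanh(J * m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next

def m_approx(J):
    """Two-term Taylor approximation: m* ~ sqrt(3*(J-1)/J**3) for J -> 1+."""
    return math.sqrt(3 * (J - 1) / J**3)

err = abs(m_exact(1.05) - m_approx(1.05))   # small close to the critical point
```

The approximation degrades away from J = 1, as expected from dropping the higher-order terms of the expansion.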
III.2 Integrated information for measuring agent-environment asymmetries
We apply the proposed measure of integrated information to the problem of determining the boundaries of an agent interacting with an environment. One of the central aspects of agency is the existence of agent-environment asymmetries Barandiaran et al. (2009), in which the part of the system corresponding to the agent is able (to an extent) to define the terms in which it relates to the surrounding milieu. We test our measure in two simple cases of systems presenting asymmetries in their interaction.
We model a minimal case of agent-environment bidirectional interaction with two regions, where only the region corresponding to the ‘agent’ has the capacity to self-regulate through recurrent connections (Fig 3.A). In this case, we have two regions, A and E, with only A presenting self-connections. The mean field of the system is updated as:
(12) m_A(t+1) = tanh( J_AA ν_A m_A(t) + J_AE ν_E m_E(t) )
     m_E(t+1) = tanh( J_EA ν_A m_A(t) )
For simplicity, we study the case where agent-environment connections are symmetric, J_AE = J_EA, and both regions have the same size. We numerically find that the system has a solution similar to that of the previous case, presenting a pitchfork bifurcation at a critical point (Fig 3.B,D).
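The coupled map of Eq 12 under these symmetry assumptions can be sketched as follows (the parameter names J_A for the agent’s self-coupling and J_c for the symmetric agent-environment coupling are ours, as are the example values):

```python
import math

def ae_step(mA, mE, J_A, J_c, nu=0.5):
    """One update of the coupled mean fields: the agent A has self-connections,
    the environment E is driven only by A (Eq 12, equal region sizes)."""
    mA_next = math.tanh(J_A * nu * mA + J_c * nu * mE)
    mE_next = math.tanh(J_c * nu * mA)
    return mA_next, mE_next

def ae_fixed_point(J_A, J_c, m0=0.5, tol=1e-12, max_iter=100000):
    """Iterate the coupled map until both mean fields are stationary."""
    mA, mE = m0, m0
    for _ in range(max_iter):
        nA, nE = ae_step(mA, mE, J_A, J_c)
        if abs(nA - mA) < tol and abs(nE - mE) < tol:
            return nA, nE
        mA, mE = nA, nE
    return mA, mE

mA_ord, mE_ord = ae_fixed_point(J_A=4.0, J_c=4.0)   # strong couplings: ordered
mA_dis, mE_dis = ae_fixed_point(J_A=0.5, J_c=0.5)   # weak couplings: disordered
```

As in the homogeneous case, the coupled system shows a disordered regime with vanishing mean fields and an ordered regime with a pair of nonzero solutions.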
Moreover, we compute the value of φ for different mechanisms. For the mechanism covering the whole system, we look for the MIP by isolating single units of the mechanism at the current state (Fig 1.B). If we isolate a unit from region A, two connections are cut (one with value J_AA and one with value J_EA). Otherwise, if we isolate a unit from region E, only one connection with value J_AE is cut. Thus, this second partition is always the MIP of the system. For the mechanism covering only region A, the only candidate for the MIP is isolating one node from A, therefore cutting one connection with value J_AA. Finally, for the mechanism covering only region E there are no connections within the mechanism and we can directly conclude that its integrated information is zero.
Now, the question is: can we consider A as an individual system or should we consider instead the coupled agent-environment system as an integrated unit? Assuming the limit T → ∞, we define the values of integrated information Φ_A and Φ_AE as:
(13)  
In Fig 3.C,E we estimate the value of Φ for an initial value corresponding to the stationary solution of the system, and a weaker (left) and a stronger (right) agent-environment coupling. We observe that in all cases the values of Φ diverge next to the critical point. Nevertheless, in the first case, when agent-environment connections are weaker, Φ_A dominates next to the critical point. In contrast, for stronger couplings between agent and environment, Φ_AE dominates in the vicinity of the critical point.
We validate these results by solving Eq 12 near criticality. We do this by transforming it into a system of one equation and finding its Taylor series near the critical point. We obtain that near the critical point:
(14)  
Similarly, the partitioned expressions are easily calculated by adding a factor to the partitioned connections. Thus, we find the location of the critical point as the value satisfying the corresponding stationarity condition (Fig 3.F). From here, we get:
where the constants are given by the coefficients of the Taylor expansion above.
Near the critical point, the values of integrated information are approximated by the expressions:
(15)  
By defining rescaled variables for the agent and for the whole agent-environment system, we describe the level of integrated information of both near the critical point. In Fig 3.G we observe that there is a transition from the agent being the unit with the highest integration to the agent-environment system being dominant.
This illustrates that, near a critical point, the value of integrated information scales up indefinitely in an agent-environment system. In the case of symmetric interaction, only in some cases can the agent be identified as the predominant integrated unit in the system, while in others the agent-environment system is the predominant unit.
III.3 Adaptive integrated information facing environmental diversity
We have just used integrated information for delimiting an agent interacting with a static environment. The environment was ‘passive’ in the sense that it showed no self-interaction. This is not a common scenario, since environments typically change and display their own dynamics. A key aspect of agency is the ability of an agent to sometimes modulate the coupling with its environment to preserve its individuality Barandiaran et al. (2009), generating an interactional asymmetry between agent and environment. Thus, a basic feature of living and cognitive systems is to display adaptive mechanisms regulating their coupling to the environment in order to maintain their level of functional integration for a range of external environments.
In order to characterize a more realistic scenario in this sense, we model an agent with two internal regions, A and B, interacting with an environment E with recurrent connections (Fig 4.A). A and B present feedback loops that we fit in order to maintain integration for a range of environmental parametric configurations. The evolution of the system is described by:
(16) m(t+1) = tanh( J diag(ν) m(t) )
where m and J describe in vector and matrix notation the mean fields and couplings of the three regions A, B and E, and ν contains their relative sizes. We assume that the environment is defined by two parameters, one defining the agent-environment couplings and another the environmental self-couplings. The values of the agent’s couplings will be tuned to maximize integration.
In particular, the system will be tuned to maximize the integrated information of the agent, Φ_AB, while facing 5 different environments defined by different values of the environmental parameters. We calculate Φ for different parameters as in previous cases, testing the possible candidates for the MIP (in the case of Φ_AB, the MIP candidates are isolating one node either from A or from B) and choosing the one minimizing integrated information.
In order to find the parameter values that maximize Φ_AB for the set of environments, we first run a microbial genetic algorithm Harvey (2009) and then (using the parameters of the agent with the largest fitness) a Nelder-Mead algorithm Nelder and Mead (1965) to refine the results. For both algorithms, the fitness function is defined as the value of Φ_AB, with some exceptions. For reducing the computational cost, the temporal span used to evaluate Φ differs between the genetic algorithm and the Nelder-Mead algorithm. In order to avoid the case where A and B are independent integrated units, fitness is set to zero in the case that Φ_A or Φ_B are larger than Φ_AB. As well, fitness is set to zero in the case where the mean field does not converge to a stationary value.
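The microbial genetic algorithm of Harvey (2009) is remarkably compact: two random individuals are compared, and the tournament loser copies (with mutation) part of the winner’s genotype. A generic sketch on a toy objective, not the Φ fitness used in our experiments (all names and constants are ours):

```python
import random

def microbial_ga(fitness, n_genes, pop_size=10, tournaments=300,
                 rec_rate=0.5, mut_sigma=0.1, seed=0):
    """Minimal microbial GA (Harvey, 2009): steady-state, loser-overwrite evolution."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(tournaments):
        i, j = rng.sample(range(pop_size), 2)
        win, lose = (i, j) if fitness(pop[i]) >= fitness(pop[j]) else (j, i)
        for g in range(n_genes):
            if rng.random() < rec_rate:           # recombination: copy winner's gene
                pop[lose][g] = pop[win][g]
            if rng.random() < 1.0 / n_genes:      # Gaussian mutation of the loser
                pop[lose][g] += rng.gauss(0, mut_sigma)
    return max(pop, key=fitness)

# Toy objective: maximize f(x) = -(x - 0.7)^2 over a single gene
best = microbial_ga(lambda g: -(g[0] - 0.7) ** 2, n_genes=1)
```

Because only the loser of each tournament is overwritten, the best fitness in the population never decreases, which makes the algorithm a robust steady-state hill climber.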
After running the genetic and Nelder-Mead algorithms, we obtain an agent whose parameters present negative weights connecting A and B and positive self-coupling values. Thus, each region will inhibit the behaviour of the other while reinforcing itself, therefore regulating its activity to maintain high integrated information for the presented environments.
After tuning the parameters of the system, we evaluate its behaviour for different environments. For the environmental values used during training, we find that the mean values of regions A, B and E display a transition similar to the previous examples (Fig 4.B shows one representative case, although other cases are similar). Moreover, we can observe that there is a divergence of the values of Φ for a range of values of the agent-environment coupling (Fig 4.C). For larger values of this coupling the transition disappears and the values of Φ do not diverge.
The example presented here displays an important qualitative change in comparison with the previous one. The value of Φ diverges, but not only for a specific environment due to fine tuning of its self-couplings as in the previous case. Instead, the divergence is maintained over an approximate range of environmental couplings. Moreover, this divergence is also maintained if we modify the environmental self-coupling, displaying a surface in which the value of Φ diverges (Fig 4.D). This means that the points of divergence from previous examples are transformed here into a critical surface that maintains the integration of the system for a wide range of environmental parameters. That is, the agent is able to self-regulate to some extent to maintain its integration, and thus its viability as an agent.
IV Discussion
We have proposed a simplified IIT measure which, together with mean field approximations in a kinetic Ising model, allows us to capture for the first time integrated information in very large systems, up to the thermodynamic limit. Using this method we are able to compute Φ for infinite size mean field kinetic Ising models with quasi-homogeneous infinite-range connectivity.
Our models, although highly idealized, allow us to speculate about some of the properties of integrated neural organization. First, we observe that, despite the infinite size of the models, the amount of integrated information is bounded for most of the parameter space. Only near critical points does the level of total integrated information diverge, suggesting that integrated entities need to organize themselves close to critical points in their parameter space to maintain their level of integration as their size grows. This suggests that it may be of greater interest to describe brain organization in terms of the diverging tendencies of IIT measures in different modules rather than in terms of the specific values of Φ in finite systems.
Furthermore, we have shown how integrated information can be used to define the boundaries between a system and its environment by comparing the divergent tendencies of their joint and individual integration. For doing so, some of the assumptions of current formulations of IIT had to be modified. Our tests show that integrated information cannot, in principle, be measured in a brain independently of its environment (bodily and extra-bodily), nor by assuming that the environment is an independent source of noise. Moreover, our results show that near critical points, in some cases, both the system and system-environment integrated information diverge. Nevertheless, we have shown how to characterize the dominant dynamical unit by comparing the difference in the diverging tendencies between the two configurations.
Our results connect the emergence of boundaries of integration with phenomena related to criticality. Systems near critical points are maximally sensitive to changes in some directions of their parameter space (generally measured as the susceptibility of the system to changes in this parametrical direction). Here, we capture integrated information measures by applying different partitions to the system which are interpreted as changes in particular directions of the parameter space. Thus, the level of integrated information corresponds to the susceptibility of the system for the minimum information partition, i.e., the partition with the least significant effect on the system’s causal powers. In the framework of IIT, systems highly sensitive to their minimum information partition are interpreted as maximally irreducible units.
This could allow further simplifications in order to measure integrated information in complex models or even empirical setups. By testing the behaviour of a system when perturbations in its components are introduced (i.e., noise injected in partitioned connections), the integrated information of a mechanism can be described as the minimal susceptibility to the set of perturbations from different partitions. The connection between information integration and critical susceptibility allows us to speculate about the link between integration and properties that have been postulated as pervasive in living beings, such as self-organized criticality Bak et al. (1988).
By interpreting integrated information in terms of susceptibilities in the parametrical direction of partitions of the system, we can think of integration as the sensitivity of a system to the decoupling of the modules composing it. In our last example, we show how internal regulation results in the capacity for maintaining this susceptibility for a range of different situations. We hypothesize that this can be achieved by similar dynamics as those of systems showing self-organized criticality, which are attracted to critical points of maximum susceptibility. This could be achieved in systems capable of self-organizing near points where they maintain maximal sensitivity to the integrity of their internal organization while they interact with changing environments (e.g., maintaining internal invariances near critical surfaces Aguilera and Bedia (2018)).
V Conclusion
The core ideas that IIT intends to capture apply to a variety of poorly understood questions in biological and cognitive systems. By introducing some modifications to take into account different temporal spans and influences from the environment, and by studying the behaviour of integration measures in the thermodynamic limit, we have shown the existence of critical points that maximise the integration of a system, for instance, an organism or a cognitive agent. The fact that our case studies remain general and abstract (we do not specify any detail about the neural, sensorimotor, and environmental processes involved) suggests that robust individuation and susceptibility towards loss of integration are inherent consequences of maximising a tendency towards integration, and so they are likely to be observable trends in all systems that are able to do so.
A limiting assumption in our approach is the homogeneity of the elements within each region. Biological systems cannot be assumed to present such a degree of homogeneity, and the variability in their components and interactions has to be accounted for. Our framework, however, can take into account higher levels of heterogeneity by introducing a larger number of regions. In the case of three regions, we observe that tuning the parameters of the system results in the extension of critical points of diverging integration into regions of the parameter space. We expect (but have not yet verified) that increasing the number of interacting regions will still result in critical regions of divergent integration. In brain network models, it has been found that structural heterogeneity can generate extended critical-like regions Moretti and Muñoz (2013), thus we may also expect this phenomenon to be reinforced in the presence of higher heterogeneity in our models. Our results are also limited to models with stationary solutions, where we can evaluate the stable solution when the temporal span tends to infinity. This is not a limitation of the method, though. The results of more realistic systems presenting cyclic or chaotic dynamics could be harder to interpret, although they are in principle tractable within the framework presented here and could be explored in further work.
The models presented here allow a shift of focus toward the integrative tendencies of systems as they grow or evolve. This opens up the applicability of IIT to a range of questions about changes over developmental and evolutionary time. Even in the simple cases we have considered, the existence of critical points that maximise integration may be important for understanding apparent jumps in complexity, including the transitions at the origin of life Walker and Davies (2013) or cognitive developmental transitions Molenaar and van der Maas (2004).
Focusing on the divergent tendencies of integration measures, we are able to capture the asymmetry of agent-environment interactions. Thinking of interactions with the environment in these terms is fruitful for grounding notions such as the individuality or the autonomy of a system. Often, these concepts have been formalized in terms of self-determination and independence from an environment Bertschinger et al. (2008); Krakauer et al. (2014). By contrast, our examples show how both the integration of a system and the integration between system and environment can diverge together, while the level of individuality of the system can be quantified by the relative divergence speed of both terms. This is a robust finding obtained under minimal assumptions and thus, we suggest, a general trend in large complex systems. The key data of interest as systems scale up are not so much the absolute values of integrated information, but the relative divergent tendencies of system integration and system-environment integration.
In addition, by exploring different kinds of agent-environment configurations, we observe that agents assumed to maximise integration are likely to do so robustly for a range of environmental situations due to the existence of critical surfaces. The existence of these surfaces that guarantee maximal integration is coherent with postulates at the theoretical foundations of adaptive systems research, such as the existence of ‘regions of viability’ that guarantee the integrity of an agent Ashby (1960); Barandiaran and Egbert (2014). While such conditions of viability have often been imposed by the designer or assumed to be given by evolutionary or material constraints, our approach allows us to think of them as critical regions emerging at the level of the integrative forces of the system. This illustrates how viability regions could scale up from material or pre-given constraints to regions defined by the increasing complexity of the integrated activity of a system.
Acknowledgements.
M.A. was supported by the UPV/EHU post-doctoral training program ESPDOC17/17 and project TIN2016-80347-R funded by the Spanish Ministry of Economy and Competitiveness.
Appendix A IIT 3.0
In the last version of integrated information theory Oizumi et al. (2014), the integrated information of a subset of elements of a system is computed as follows. For a system $X$ of $n$ elements in state $x_t$, we describe the input-output relationship of the system elements through its corresponding transition probability function $p(x_{t+1} \mid x_t)$, describing the probabilities of the transitions from one state to another for all possible system states. IIT requires that $p$ satisfies the Markov property (i.e., the state at time $t+1$ only depends on the state at time $t$), and that the current states of the elements are independent, conditional on the past state of the system. These conditions are satisfied by the asymmetric kinetic Ising model used here.
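As a concrete illustration of these requirements, the transition probability matrix of a small asymmetric kinetic Ising model can be built explicitly: because each spin updates in parallel and independently given the full previous state, the joint transition probability factorizes over units, which is exactly the conditional independence IIT demands. This is a sketch under assumed conventions (parallel Glauber updates, ±1 spins); `glauber_tpm` is our own illustrative name.

```python
import itertools
import numpy as np

def glauber_tpm(J, H, beta):
    """Transition probability matrix of an asymmetric kinetic Ising model.
    Spins update in parallel, each independently given the previous state:
    p(s_i' = +1 | s) = sigmoid(2*beta*(J[i] @ s + H[i])).
    The joint transition probability is the product over units, so the
    conditional independence required by IIT holds by construction."""
    n = len(H)
    states = [np.array(c) for c in itertools.product([-1.0, 1.0], repeat=n)]
    tpm = np.zeros((2 ** n, 2 ** n))
    for a, s in enumerate(states):
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * (J @ s + H)))
        for b, s_next in enumerate(states):
            # probability of each unit landing in its target state
            tpm[a, b] = np.where(s_next > 0, p_up, 1.0 - p_up).prod()
    return tpm
```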
For any two subsets $M$ and $Z$ of $X$, called the mechanism and the purview respectively, we can define the cause and effect repertoires of $M$ over $Z$, that is, how $M$ in its current state $m_t$ constrains the potential past or future states of $Z$. The cause and effect repertoires of the system are described by the probability distributions $p_{\mathrm{cause}}(z_{t-1} \mid m_t)$ and $p_{\mathrm{effect}}(z_{t+1} \mid m_t)$.
The integrated cause-effect information of a mechanism in its current state $m_t$ is then defined as the distance between the cause-effect repertoires of the mechanism and the cause-effect repertoires of its minimum information partition (MIP), over the purview $Z$ that is maximally irreducible,

$$\varphi^{\mathrm{cause/effect}}(m_t, Z) = D\left( p(Z \mid m_t) \,\|\, p^{\mathrm{MIP}}(Z \mid m_t) \right), \qquad (17)$$

where the MIP is the partition $P = \{M^{(1)}, Z^{(1)}; M^{(2)}, Z^{(2)}\}$ of the mechanism and purview into two halves that minimizes $\varphi$, and $p^{P}$ is the cause or effect probability distribution under a partition $P$,

$$p^{P}(Z \mid m_t) = p\left(Z^{(1)} \mid m_t^{(1)}\right) \times p\left(Z^{(2)} \mid m_t^{(2)}\right). \qquad (18)$$

The integrated information $\varphi$ of the mechanism is the minimum of its corresponding integrated cause and effect information,

$$\varphi(m_t) = \min\left( \varphi^{\mathrm{cause}}(m_t, Z^{\mathrm{cause}}),\; \varphi^{\mathrm{effect}}(m_t, Z^{\mathrm{effect}}) \right). \qquad (19)$$

The integrated information $\Phi$ of the entire system in state $x_t$ is then defined as the distance between the cause-effect structure $C(x_t)$ of the system and the cause-effect structure defined by its minimum information partition, which eliminates the constraints from one part of the system to the rest:

$$\Phi(x_t) = D\left( C(x_t) \,\|\, C^{\mathrm{MIP}}(x_t) \right). \qquad (20)$$
For both the integrated information of a mechanism ($\varphi$) and the integrated information of a system ($\Phi$), the distance $D$ is computed as the Wasserstein or earth mover’s distance. Finally, if $X$ is a subset of elements of a larger system, all elements outside of $X$ are considered as part of the environment and are conditioned on their current state throughout the causal analysis. Further details of the steps described here can be found in Oizumi et al. (2014).
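To make the computation above concrete, here is a deliberately tiny worked example: a two-node system where each node copies the other, with the mechanism covering the whole system. We use total variation distance as a simple stand-in for the earth mover’s distance used by IIT, so the numbers illustrate the pipeline (repertoire, partitioned repertoire, distance) rather than reproduce exact IIT 3.0 values; all function names are our own.

```python
import itertools
import numpy as np

# States of a 2-node binary system, ordered (00, 01, 10, 11).
STATES = list(itertools.product([0, 1], repeat=2))

def effect_repertoire(s):
    """Each node copies the other, so (a, b) -> (b, a) deterministically:
    the effect repertoire is a point mass on the swapped state."""
    p = np.zeros(4)
    p[STATES.index((s[1], s[0]))] = 1.0
    return p

def partitioned_effect_repertoire():
    """Cutting both cross connections replaces each node's input with
    unconstrained noise, so each node's next state is Bernoulli(1/2)
    and the product (partitioned) repertoire is uniform."""
    return np.full(4, 0.25)

def small_phi(s):
    """Distance between intact and partitioned effect repertoires.
    IIT 3.0 uses the earth mover's distance; half the L1 distance
    (total variation) is used here as a simple stand-in."""
    return 0.5 * np.abs(effect_repertoire(s) - partitioned_effect_repertoire()).sum()
```

For the state $(1,0)$ the intact repertoire is a point mass on $(0,1)$ while the partitioned repertoire is uniform, so the mechanism is irreducible: cutting the cross connections visibly changes its causal powers.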
Appendix B Simplified integrated information
Measures in this paper are inspired by the IIT framework, although we apply some modifications and simplifications.
B1 Temporal range
First, as we mentioned in the paper, we only compute the value of $\Phi$ for the effects of the current state of the system on a posterior state $x_{t+\tau}$, while IIT computes the minimum of the cause and effect integrated information at $t-1$ and $t+1$. However, IIT can also deal with larger temporal scales. As IIT operates with the transition probability matrix of a system, one could compute this matrix from time $t$ to time $t+\tau$ and apply the operations for computing $\Phi$ over it. This implies that the noise injected by partitions in the connections that are cut is only injected at time $t$, and the system behaves normally for the following steps. In our case, we inject independent noise at every update from time $t$ to $t+\tau$.
We can test the difference between the two approaches in a homogeneous kinetic Ising model with fields $H$ and couplings $J$. As we showed in the paper, applying a continuous noise injection in partitions makes the value of $\Phi$ diverge around the critical point as $\tau$ grows (Figure B2.A). Conversely, if we only apply an initial noise injection at partitioned connections, we see that the measured $\Phi$ behaves in a different way (Figure B2.B). In this case, as $\tau$ increases, the value of $\Phi$ decreases as the system regains stability in its original position. Moreover, for small values of $\tau$ the peak values of $\Phi$ appear above the critical point. However, we observe that, the closer we are to the critical point, the slower $\Phi$ decreases. This is due to ‘critical slowing down’, a phenomenon characteristic of critical dynamics in which the response time of a system near criticality tends to infinity. Curiously, if we compute the cumulative sum of the values of $\Phi$ for time spans from $1$ to $\tau$ (Figure B2.C), we observe that the result is identical to the case of continuous noise injection at partitions.
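The critical slowing down invoked above can be seen in the simplest mean-field map. This is an illustrative toy rather than the kinetic model of the paper: iterating $m \leftarrow \tanh(\beta m)$ from a small perturbation, the number of steps needed to relax grows without bound as $\beta$ approaches the critical value $\beta = 1$.

```python
import numpy as np

def relaxation_steps(beta, m0=0.5, tol=1e-6, max_steps=10**6):
    """Number of iterations of the mean-field map m <- tanh(beta * m)
    until successive values differ by less than tol. The relaxation
    time diverges as beta approaches the critical value 1."""
    m = m0
    for t in range(max_steps):
        m_new = np.tanh(beta * m)
        if abs(m_new - m) < tol:
            return t
        m = m_new
    return max_steps
```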
B2 Purview
In IIT, the integrated information of a mechanism is evaluated not only for a particular mechanism $M$, but also for a purview $Z$. If the mechanism defines which units of the current state $x_t$ we take into account, the purview defines which units of the future state $x_{t+\tau}$ we take into account. Given these subsets of present and future states, partitions are computed over the joint space of $M$ and $Z$, and the purview with maximum integrated information for its MIP is selected. Here, for simplicity, we apply the partition over $x_t$ and $x_{t+\tau}$, making the mechanism and purview coincide, and the distance for computing integrated information is measured over all elements of the system, not only the elements contained in the purview.
Allowing more choices of purview could make a big difference in certain systems, although in the quasi-homogeneous systems tested in the paper the differences are small.
B3 Elements outside of a mechanism
More importantly, there are significant differences from the IIT framework in the way we treat the elements that are outside of the evaluated mechanism $M$. In IIT, elements outside the mechanism are assumed to be unconstrained (i.e., as random as possible). We decided to modify this assumption because it can have dramatic effects when measuring the behaviour of large systems. Specifically, assuming unconstrained elements outside the mechanism creates an artifact that provokes a shift in the critical point of the system (this will be detailed in future work).
Let us illustrate this with an example using a homogeneous Ising model with local fields $H$ and couplings $J$. As we showed, computing the value of $\Phi$ for the whole system using continuous noise injection at partitioned connections yields a divergence around the critical point $\beta_c$. Now, we will show the behaviour of its internal mechanisms under different assumptions about the units outside of the mechanism.
First, we compute values of $\varphi$ for mechanisms covering a fraction of the system (since the system is homogeneous, any fraction we choose has the same behaviour), assuming that the elements outside of the mechanism keep operating normally (Figure B2.A). In this case, we observe that the divergence of $\varphi$ is maintained, although the value of $\varphi$ decreases with the mechanism size.
In contrast, if we accept the IIT assumption and take the elements outside of the mechanism as independent sources of noise, the behaviour of $\varphi$ changes radically. In this case, the divergence is maintained but takes place at a different value of the parameter $\beta$ (Figure B2.B). This happens because independent sources of noise have a zero mean field value, and thus the phase transition of the system takes place at larger values of $\beta$ that compensate for the units that now contribute a zero mean field. Thus, we think that considering the elements outside of the mechanism as independent sources of noise can be misleading about the operation of mechanisms that are embedded in large systems.
A less loaded assumption could be maintaining the state of the units outside of the mechanism at the static values they had at time $t$, that is, keeping their mean field constant. We can see in Figure B2.C that this behaviour is also not satisfactory, since for mechanism sizes smaller than the whole system the value of $\varphi$ decreases very rapidly, and it is exactly zero at the critical point. We can understand this by noting that the effect of constant fields is equal to adding a local field $H$ equal to the input from the frozen units, therefore breaking the symmetry of the system and precluding a phase transition.
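The three treatments of outside units discussed above can be compared in a one-line mean-field caricature. This is an illustrative sketch under an assumed normalization (the total input is a weighted average of inside and outside magnetizations, with the mechanism covering a fraction `f` of the system); `mf_fixed_point` is our own name. It reproduces the qualitative picture: with normal outside units the transition is unchanged, with noisy outside units the effective coupling shrinks to $f\beta$ (shifted transition), and with frozen outside units the symmetry is broken.

```python
import numpy as np

def mf_fixed_point(beta, f, outside, m_out=1.0, iters=5000):
    """Self-consistent magnetization of a mechanism covering a fraction f
    of a homogeneous system, under three assumptions for outside units:
      'normal' -> outside units follow the same dynamics (field = m),
      'noise'  -> outside units are unbiased noise (field = f*m),
      'frozen' -> outside units keep a fixed value m_out."""
    m = 0.5
    for _ in range(iters):
        if outside == "normal":
            h = m                          # whole system behaves alike
        elif outside == "noise":
            h = f * m                      # outside contributes zero mean field
        else:                              # 'frozen'
            h = f * m + (1.0 - f) * m_out  # constant symmetry-breaking field
        m = np.tanh(beta * h)
    return m
```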
B4 Mean field approximation of partitioned systems
We simplify the calculation of the cause and effect probabilities by using the mean field approximation described by Equations 3 and 4.
In the case of partitioned systems for computing integrated information, cutting connections injects uniform noise into the input of the receiving node. In the mean field approximation, this is equivalent to injecting a zero mean field signal, which in turn is equivalent to setting the affected connection weights to zero when computing the mean field equations.
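In code, this mean-field implementation of a partition is essentially a one-liner: under the stated approximation, cutting a connection is the same as zeroing its weight. The helper name `cut_partition` and the 0/1 block-label convention are our own.

```python
import numpy as np

def cut_partition(J, part):
    """Implement a bipartition in the mean-field approximation. Uniform
    noise injected through a cut connection carries a zero mean field,
    so cutting reduces to zeroing the cross-partition weights.
    `part` is an array of 0/1 block labels, one per unit."""
    Jp = J.copy()
    Jp[np.not_equal.outer(part, part)] = 0.0
    return Jp
```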
B5 Integrated conceptual information
Finally, once $\varphi$ is computed, IIT proposes a second level of calculations for computing integrated conceptual information $\Phi$, where new bidirectional partitions are applied to the system. In our case, given the homogeneity of the system, we do not compute conceptual information since all the mechanisms composing each set have similar behaviour. Thus, for simplicity we do not apply a second level of partitions.
Footnotes
 We use the term ‘mechanism’ in the technical sense described later and not in the specific sense of efficient causality of the mechanical kind. We acknowledge that different forms of causal and enabling relations between processes are possible and relevant, yet we retain the term ‘mechanism’ in this context to remain coherent with the existing literature.
 Note that cutting a connection implies injecting uniform noise, which in the mean field approximation is equivalent to substituting the input by a zero mean field, or just removing the connection. This is an important approximation that allows us to obtain the main results of the paper, although it will only be valid when the size of the system is infinite and $N$ is larger than $\tau$.
 Note that for larger $\tau$ the partition is applied for a longer period of time, therefore yielding larger integration in some cases.
References
 Danielle S. Bassett and Michael S. Gazzaniga, “Understanding complexity in the human brain,” Trends in Cognitive Sciences 15, 200–209 (2011).
 Luiz Pessoa, “Understanding brain networks and brain organization,” Physics of Life Reviews 11, 400–435 (2014).
 Miguel Aguilera, Manuel G. Bedia, Bruno A. Santos, and Xabier E. Barandiaran, “The situated HKB model: how sensorimotor spatial coupling can alter oscillatory brain dynamics,” Frontiers in Computational Neuroscience 7 (2013), 10.3389/fncom.2013.00117.
 Francisco J. Varela, “Resonant cell assemblies: a new approach to cognitive functions and neuronal synchrony,” Biological Research 28, 81–95 (1995).
 Giulio Tononi and Gerald M. Edelman, “Consciousness and Complexity,” Science 282, 1846–1851 (1998).
 Masafumi Oizumi, Larissa Albantakis, and Giulio Tononi, “From the phenomenology to the mechanisms of consciousness: integrated information theory 3.0,” PLoS computational biology 10, e1003588 (2014).
 Liad Mudrik, Nathan Faivre, and Christof Koch, “Information integration without awareness,” Trends in Cognitive Sciences 18, 488–496 (2014).
 William Marshall, Hyunju Kim, Sara I. Walker, Giulio Tononi, and Larissa Albantakis, “How causal analysis can reveal autonomy in models of biological systems,” Phil. Trans. R. Soc. A 375, 20160358 (2017).
 Erik P. Hoel, Larissa Albantakis, William Marshall, and Giulio Tononi, “Can the macro beat the micro? Integrated information across spatiotemporal scales,” Neuroscience of Consciousness 2016 (2016), 10.1093/nc/niw012.
 William Marshall, Larissa Albantakis, and Giulio Tononi, “Blackboxing and causeeffect power,” PLOS Computational Biology 14, e1006114 (2018).
 Larissa Albantakis, Arend Hintze, Christof Koch, Christoph Adami, and Giulio Tononi, “Evolution of integrated causal structures in animats exposed to environments of increasing complexity,” PLoS computational biology 10, e1003966 (2014).
 Adam B. Barrett and Anil K. Seth, “Practical Measures of Integrated Information for TimeSeries Data,” PLOS Computational Biology 7, e1001052 (2011).
 Masafumi Oizumi, Shunichi Amari, Toru Yanagawa, Naotaka Fujii, and Naotsugu Tsuchiya, “Measuring Integrated Information from the Decoding Perspective,” PLOS Computational Biology 12, e1004654 (2016).
 Steve Pressé, Kingshuk Ghosh, Julian Lee, and Ken A. Dill, “Principles of maximum entropy and maximum caliber in statistical physics,” Reviews of Modern Physics 85, 1115–1141 (2013).
 Xabier E Barandiaran, Ezequiel Di Paolo, and Marieke Rohde, “Defining agency: Individuality, normativity, asymmetry, and spatiotemporality in action,” Adaptive Behavior 17, 367–386 (2009).
 Inman Harvey, “The Microbial Genetic Algorithm,” in Advances in Artificial Life. Darwin Meets von Neumann, Lecture Notes in Computer Science (Springer, Berlin, Heidelberg, 2009) pp. 126–133.
 J. A. Nelder and R. Mead, “A Simplex Method for Function Minimization,” The Computer Journal 7, 308–313 (1965).
 Per Bak, Chao Tang, and Kurt Wiesenfeld, “Self-organized criticality,” Physical Review A 38, 364 (1988).
 Miguel Aguilera and Manuel G. Bedia, “Adaptation to criticality through organizational invariance in embodied agents,” Scientific Reports 8, 7723 (2018).
 Paolo Moretti and Miguel A. Muñoz, “Griffiths phases and the stretching of criticality in brain networks,” Nature Communications 4, 2521 (2013).
 Sara Imari Walker and Paul C. W. Davies, “The algorithmic origins of life,” Journal of The Royal Society Interface 10, 20120869 (2013).
 P. C. M. Molenaar and H. L. J. van der Maas, “Commentary on: ‘Piaget’s stages: The unfinished symphony of cognitive development’ by D.H. Feldman,” New Ideas in Psychology 22 (2004), https://doi.org/10.1016/j.newideapsych.2004.11.003.
 Nils Bertschinger, Eckehard Olbrich, Nihat Ay, and Jürgen Jost, “Autonomy: An information theoretic perspective,” Biosystems (Modelling Autonomy) 91, 331–345 (2008).
 David Krakauer, Nils Bertschinger, Eckehard Olbrich, Nihat Ay, and Jessica C Flack, “The information theory of individuality,” arXiv preprint arXiv:1412.2447 (2014).
 William Ross Ashby, Design for a brain; the origin of adaptive behavior (New York, Wiley, 1960).
 Xabier E. Barandiaran and Matthew D. Egbert, “Normestablishing and normfollowing in autonomous agency,” Artificial Life 20, 5–28 (2014).