Abstract
Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higherorder network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile communitydetection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higherorder network flows can be represented in a unified way with networks that distinguish physical nodes for representing a complex system’s objects from state nodes for describing flows between the objects. Moreover, these socalled sparse memory networks allow the informationtheoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of different higherorder interactions such as multistep, multisource, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higherorder flows through complex systems.
1 \articlenumberx \doinum10.3390/—— \pubvolumexx 2017 \copyrightyear2017 \history \NewEnvironmyequation

(1) 
Mapping HigherOrder Network Flows in Memory and Multilayer Networks with Infomap \AuthorDaniel Edler, Ludvig Bohlin, and Martin Rosvall * \AuthorNamesDaniel Edler, Ludvig Bohlin, and Martin Rosvall \corres Correspondence: martin.rosvall@umu.se
1 Introduction
To connect structure and dynamics in complex systems, researchers model, for example, people navigating the web Brin and Page (1998), rumors wandering around among citizens Newman (2005), and passengers traveling through airports Lordan et al. (2015), as flows on networks with random walkers. Take an air traffic network as an example. In a standard network approach, nodes represent airports, links represent flights, and random walkers moving on the links between the nodes represent passengers. This dynamical process corresponds to a firstorder Markov model of network flows: a passenger arriving in an airport will randomly continue to an airport proportional to the air traffic volume to that airport. That means, for example, that two passengers who arrive in Las Vegas, one from San Francisco and one from New York, will have the same probability to fly to New York next. In reality, however, passengers are more likely to return to where they come from Rosvall et al. (2014). Accordingly, describing network flows with a firstorder Markov model suffers from memory loss and washes out significant dynamical patterns Belik et al. (2011); Pfitzner et al. (2013); Poletto et al. (2013) (Figure 1a). Similarly, aggregating flow pathways from multiple sources, such as different airlines or seasons in the air traffic example, into a single network can distort both the topology of the network and the dynamics on the network Mucha et al. (2010); Kivelä et al. (2014); Boccaletti et al. (2014); De Domenico et al. (2015) (Figure 1c,d).
As a consequence, the actual patterns in the data, such as pervasively overlapping modules in air traffic and social networks Ahn et al. (2010), cannot be identified with communitydetection algorithms that operate on firstorder network flows Rosvall et al. (2014). To take advantage of higherorder network flows, researchers have therefore developed different representations, models, and communitydetection algorithms that broadly fall into two research topics: memory networks and multilayer networks. In memory networks, higherorder network flows can represent multistep pathways such as flight itineraries Rosvall et al. (2014); Peixoto and Rosvall (2017); Xu et al. (2016); Scholtes (2017), and in multilayer networks, they can represent temporal or multisource data such as multiseason air traffic Mucha et al. (2010); De Domenico et al. (2013, 2015); Wehmuth et al. (2016) (Figure 1b). Whereas memory networks can capture where flows move to depending on where they come from (Figure 1e), multilayer networks can capture where flows move to depending on the current layer (Figure 1f). Because memory and multilayer networks originate from different research topics in network science, revealing their flow modules have been considered as disparate community detection problems. With the broad spectrum of higherorder network representations required to best describe diverse complex systems, the apparent need for specialized communitydetection methods raises the question: How can we reveal communities in higherorder network flows with a general approach?
Here we show that describing higherorder network flows with socalled sparse memory networks and identifying modules with long flow persistence times, such as groups of airports that contain frequently traveled routes, provides a general solution to reveal modular patterns in higherorder flows through complex systems. When modeling flows in conventional networks, a single node type both represents a complex system’s objects and describes the flows with the nodes’ links (Figure 1c,d). Sparse memory networks, however, discriminate physical nodes, which represent a complex system’s objects, from state nodes, which describe a complex system’s internal flows with their links (Figure 1e,f). In sparse memory networks, state nodes are not bound to represent, for example, previous steps in memory networks or layers in multilayer networks, but are free to represent abstract states such as lumped states Persson et al. (2016) or states in multilayer memory networks, which we demonstrate with multistep and multiquarter air traffic data. In this way, a sparse memory network is a MultiAspect Graph with two aspects: the physical object and the flow state such as memory or layer Wehmuth et al. (2016). We show that various higherorder network flow representations, including memory and multilayer networks, can be represented with sparse memory networks. We also provide a detailed derivation of the informationtheoretic map equation for identifying hierarchically nested modules with long flow persistence times in sparse memory networks, and introduce a new version of the map equation’s search algorithm Infomap that exploits the flexibility of sparse memory networks.
2 Modeling Network Flows
The dynamics and function of complex systems emerge from interdependences between their components Barabási and Albert (1999); Barrat et al. (2004); Boccaletti et al. (2006). Depending on the system under study, the components can be, for example, people, airports, hospital wards, and banks. The interdependence, in turn, often comes from flows of some entity between the components, such as ideas circulating among colleagues, passengers traveling through airports, patients moving between hospital wards, or money transferred between banks. To efficiently capture such flows through complex systems, researchers model them with random walks on networks.
2.1 FirstOrder Network Flows
In a firstorder network representation, the flow direction only depends on the previously visited physical node. That is, for a random walker that moves between physical nodes , which represent objects in a complex system, and in steps generates a random variable sequence , the transition probabilities
(2) 
only depend on the previously visited node’s outlinks. With link weights between physical nodes and , and total outlink weights of node , the firstorder transition probabilities are
(3) 
which give the stationary visit rates
(4) 
To ensure ergodic stationary visit rates, from each node we can let the random walker teleport with probability , or with probability if the node has no outlinks, to a random target node proportional to the target node’s inlink weight Lambiotte and Rosvall (2012).
While a firstorder model is sufficient to capture flow dynamics in some systems, recent studies have shown that higherorder flows are required to capture important dynamics in many complex systems Song et al. (2010); Belik et al. (2011); Pfitzner et al. (2013). The standard approach of modeling dynamical processes on networks with firstorder flows oversimplifies the real dynamics and sets a limit of what can actually be detected in the system (Figure 1). Capturing critical phenomena in the dynamics and function of complex systems therefore often requires models of higherorder network flows Meiss et al. (2008); Chierichetti et al. (2012); Singer et al. (2014); Takaguchi et al. (2011); Holme and Saramäki (2012); Pfitzner et al. (2013); Song et al. (2010); Belik et al. (2011); Poletto et al. (2013).
2.2 HigherOrder Network Flows
In higherorder network representations, more information than the previously visited physical node’s outlinks are used to determine where flows go. Examples of higherorder network representations are memory, multilayer, and temporal networks.
In memory networks, the flow direction depends on multiple previously visited nodes. Specifically, for a random walker that steps between physical nodes and generates a sequence of random variables , the transition probabilities for a higherorder flow model of order ,
(5) 
depend on the previously visited physical nodes. Assuming stationarity and visited nodes from steps ago to previously visited in sequence , the thorder transition probabilities between physical nodes and correspond to firstorder transition probabilities between state nodes and . We use subscript in to highlight the state node’s physical node. With link weights between state nodes and , and total outlink weights , the transition probabilities for memory networks are
(6) 
As in a firstorder network representation, teleportation can ensure ergodic stationary visit rates Rosvall et al. (2014).
In multilayer networks with the same physical nodes possibly present in multiple layers, the flow direction depends on both the previously visited physical node and layer . Similar to the memory representation, the multilayer transition probabilities between physical nodes in layer and in layer correspond to firstorder transition probabilities between states nodes and .
In many cases, empirical interaction data only contain links within layers such that only for . Without data with links between layers, it is useful to couple layers by allowing a random walker to move between layers at a relax rate . With probability , a random walker follows a link of the physical node in the currently visited layer, and with probability the random walker relaxes the layer constraint and follows any link of the physical node in any layer. In both cases, the random walker follows a link proportional to its weight. With total outlink weight from physical node in layer , and total outlink weight from physical node across all layers, which both correspond to total intralayer outlink weights as long as there are only empirical links within layers, the transition probabilities for multilayer networks with modeled interlayer transitions are
(7) 
where is the Kronecker delta, which is 1 if and 0 otherwise. In this way, the transition probabilities capture completely separated layers for relax rate and completely aggregated layers for relax rate . The system under study and problem at hand should determine the relax rate. In practice, for relax rates in a wide range around 0.25, results are robust in many real multilayer networks De Domenico et al. (2015).
With both intralayer and interlayer links such that also for , either from empirical data or modeled interlayer links , the transition probabilities for multilayer networks can be written in their most general form,
(8) 
where again is the total outlink weight from node in layer . For directed multilayer networks, teleportation can ensure ergodic stationary visit rates .
2.3 Sparse Memory Networks
While memory networks and multilayer networks operate on different higherorder interaction data, the resemblance between Equations (6) and (8) suggests that they are two examples of a more general network representation. In a memory network model, a random walker steps between physical nodes such that the next step depends on the previous steps (Equation (6)). In a multilayer network model, instead the next step depends on the visited layer (Equation (8)). However, as Equations (6) and (8) show, both models correspond to firstorder transitions between state nodes and associated with physical nodes and in the network,
(9) 
Consequently, the state node visit rates are
(10) 
where the sum is over all state nodes in all physical nodes. The physical node visit rates are
(11) 
where the sum is over all state nodes in physical node . Both memory and multilayer networks can be represented with a network of physical nodes and state nodes that neither are bound to represent previous steps nor current layer. Because state nodes are free to represent abstract states, and redundant state nodes can be lumped together, we call this representation a sparse memory network.
2.4 Representing Memory and Multilayer Networks with Sparse Memory Network
To illustrate that sparse memory networks can represent both memory and multilayer networks, we use a schematic network with five physical nodes (Figure 2a). The network represents five individuals, with the center node’s two friends to the left and two colleagues to the right. We imagine that the multistep pathway data come from two sources, say a Facebook conversation thread among the friends illustrated with the top sequence above the network and the solid pathway on the network, and an email conversation thread among the colleagues illustrated with the bottom sequence and the dashed pathway. For simplicity, the pathway among the friends stays among the friends and steps between friends with equal probability. The pathway among the colleagues behaves in corresponding way. We first represent these pathway data with a memory and a multilayer network, and then show that both can be represented with a sparse memory network.
We can represent multistep pathways with links between state nodes in physical nodes. For a memory network representation of a secondorder Markov model, each state node captures which physical node the flows come from. For example, the highlighted pathway step in Figure 2a corresponds to a link between state node in physical node —capturing flows coming to physical node from physical node —and state node in physical node —capturing flows coming to physical node from physical node (Figure 2b). In this way, random walker movements between state nodes can capture higherorder network flows between observable physical nodes.
We can represent multistep pathways with links between state nodes in layers. First, we can map all state nodes that represent flows coming from a specific physical node onto the same layer. For example, we map red state nodes and for flows coming from red physical node to physical nodes and , respectively, in Figure 2b onto the red layer at the bottom in Figure 2c. This mapping gives onetoone correspondence between the memory network and the multilayer network. Therefore we call it a multilayer memory network.
An alternative and more standard multilayer representation exploits that the pathway data come from two sources and uses one layer for each data source (Figure 2d). Network flows are firstorder when constrained to move within individual layers and higherorder when free to move within and between layers. The highlighted pathway step in Figure 2a now corresponds to a step in layer between state node and state node —capturing flows remaining among friends. Again, random walker movements between state nodes can capture higherorder network flows between observable physical nodes.
In our multistep pathway example from two sources, a full secondorder model in the memory network is not required and partly redundant to describe the network flows. We can model the same pathways with a more compact description. Specifically, we can lump together state nodes and in the same physical node if they have identical outlinks, for all . The lumped state node replaces and and assembles all their inlinks and outlinks such that the transition probabilities remain the same. For example, for describing where flows move from physical node , state nodes and have identical outlinks—reflecting that it does not matter which friend was previously active in the conversation—as well as state nodes and —reflecting that it does not matter which colleague was previously active in the conversation. Lumping together all such redundant state nodes, such as or , gives the sparse memory network in Figure 2e, where state nodes no longer are bound to capture the exact previous steps but still capture the same dynamics as the redundant full secondorder model. These unbound state nodes are free to represent abstract states, such as lumped state nodes in a secondorder memory network, but can also represent state nodes in a full secondorder memory network or state nodes in a multilayer network such as in Figure 2d and e. For example, lumping state nodes with minimal information loss can balance under and overfitting for efficient sparse memory networks that represent variableorder network flow models Persson et al. (2016). Consequently, rather than an explosion of application and data dependent representations, sparse memory networks with physical nodes and state nodes that can represent abstract states provides an efficient solution for modeling higherorder network flows.
3 Mapping Network Flows
While networks and higherorder models make it possible to describe flows through complex systems, they remain highly complex even when abstracted to nodes and links. Thousands or millions of nodes and links can bury valuable information. To reveal this information, many times it is indispensable to comprehend the organization of large complex systems by assigning nodes into modules with communitydetection algorithms. Here we show how the communitydetection method known as the map equation can operate on sparse memory networks and allow for versatile mapping of network flows from multistep, multisource, and temporal data.
3.1 The Map Equation for FirstOrder Network Flows
When simplifying and highlighting network flows with possibly nested modules, the map equation measures how well a modular description compresses the flows Rosvall and Bergstrom (2008). Because compressing data is dual to finding regularities in the data Shannon (1948), minimizing the modular description length of network flows is dual to finding modular regularities in the network flows. For describing movements within and between modules, the map equation uses code books that connect node visits, module exits, and module entries with code words. To estimate the shortest average description length of each code book, the map equation takes advantage of Shannon’s source coding theorem and measures the Shannon entropy of the code word use rates Shannon (1948). Moreover, the map equation uses a hierarchically nested code structure designed such that the description can be compressed if the network has modules in which a random walker tends to stay for a long time. Therefore, with a random walker as a proxy for real flows, minimizing the map equation over all possible network clusterings reveals important modular regularities in network flows.
In detail, the map equation measures the minimum average description length for a multilevel map of physical nodes clustered into modules, for which each module has a submap with submodules, for which each submodule has a submap with submodules, and so on (Figure 3). In each submodule at the finest level, the code word use rate for exiting a module is
(12) 
and the total code word use rate for also visiting nodes in a module is
(13) 
such that the average code word length is
(14) 
Weighting the average code word length of the code book for module at the finest level by its use rate gives the contribution to the description length,
(15) 
In each submodule at intermediate levels, the code word use rate for exiting to a coarser level is
(16) 
and for entering the submodules at a finer level is
(17) 
Therefore, the total code rate use in submodule is
(18) 
which gives the average code word length
(19) 
Weighting the average code word length of the code book for module at intermediate levels by its use rate, and adding the description lengths of submodules at finer levels in a recursive fashion down to the finest level in Equation (15), gives the contribution to the description length,
(20) 
At the coarsest level, there is no coarser level to exit to, and the code word use rate for entering the submodules at a finer level is
(21) 
such that the total code rate use at the coarsest level is
(22) 
which gives the average code word length
(23) 
Weighting the average code word length of the code book at the coarsest level by its use rate, and adding the description lengths of submodules at finer levels from Equation (20) in a recursive fashion, gives the multilevel map equation Rosvall and Bergstrom (2011)
(24) 
To find the multilevel map that best represents flows in a network, we seek the multilevel clustering of the network that minimizes the multilevel map equation over all possible multilevel clusterings of the network (see Algorithm 1 in Section 4).
While there are several advantages with the multilevel description, including potentially better compression and effectively eliminated resolution limit Kawamoto and Rosvall (2015), for simplicity researchers often choose twolevel descriptions. In this case, there are no intermediate submodules and the twolevel map equation is
(25) 
3.2 The Map Equation for HigherOrder Network Flows
The map equation for firstorder network flows measures the description length of a random walker stepping between physical nodes within and between modules. This principle remains the same also for higherorder network flows, although higherorder models guide the random walker between physical nodes with the help of state nodes. Therefore, extending the map equation to higherorder network flows, including those described by memory, multilayer, and sparse memory networks, is straightforward. Equations (12) to (25) remain the same with and . The only difference is at the finest level (Figure 3b). State nodes of the same physical node assigned to the same module should share code word, or they would not represent the same object. That is, if multiple state nodes of the same physical node are assigned to the same module , we first sum their probabilities to obtain the visit rate of physical node in module ,
(26) 
In this way, the frequency weighted average code word length in submodule codebook in Equation (14) becomes
(27) 
where the sum is over all physical nodes that have state nodes assigned to module . In this way, the map equation can measure the modular description length of statenodeguided higherorder flows between physical nodes.
To illustrate that the separation between physical nodes and state nodes matters when clustering higherorder network flows, we cluster the red state nodes and the dashed blue state nodes in Figure 4 in two different modules overlapping in the center physical node. For a more illustrative example, we also allow transitions between red and blue nodes at rate / in the center physical node. In the memory and sparse memory networks, the transitions correspond to links from state nodes in physical node to state nodes in the other module with relative weight / and to the same module with relative weight / (Figurs 4b,c). In the multilayer network, the transitions correspond to relax rate according to Equation (7), since relaxing to any layer in physical node with equal link weights in both layers means that one half of relaxed flows switch layer (Figure 4a). Independent of the relax rate, in these symmetric networks the node visit rates are uniformly distributed: for each of the six state nodes in the multilayer and sparse memory networks, and for each of the twelve state nodes in the memory network. For illustration, if we incorrectly treat state nodes as physical nodes in the map equation, Equations (12) to (25), the onemodule clustering of the memory network with twelve virtual physical nodes, , has code length
(28) 
The corresponding twomodule clustering of the memory network with twelve virtual physical nodes, , has code length {myequation} L(M_2^12p) = ⏟r6_q_↶ ⏟H(r12,r12)_H(Q) +2 ⏟6+r12_p^↻_m ⏟H(112,112,112,112,112,112,r12)_H(P_m) ≈[2.58,3.44] for r ∈[0,1].
Therefore, the twomodule clustering gives best compression for all relax rates (Figure 4e). For the sparse memory network with six virtual physical nodes, the onemodule clustering, , has code length
(29) 
and the twomodule clustering, , has code length
(30) 
While the twomodule clustering again gives best compression for all relax rates, the code lengths are shifted by 1 bit compared with the memory network with twelve virtual physical nodes (Figure 4e). Same dynamics but different code length. These expanded solutions with virtual physical nodes do not capture the important and special role that physical nodes play as representatives of a system’s objects.
Properly separating state nodes and physical nodes as in Equation (27) instead gives equal code lengths for identical clusterings irrespective of representation. For example, the onemodule clustering of the memory network with twelve state nodes, , and the sparse memory network with six state nodes, , have identical code length
(31) 
because visits to a physical node’s state nodes in the same module are aggregated according to Equation (26) such that those state nodes share code words and the encoding represents higherorder flows between a system’s objects. Similarly, the twomodule clustering of the memory network with twelve state nodes, , and the sparse memory network with six state nodes, , have identical code length
(32) 
That is, same dynamics give same code length for identical clusterings with proper separation of state nodes and physical nodes.
For these solutions with physical nodes and state nodes that properly capture higherorder network flows, the overlapping twomodule clustering gives best compression with relax rate (Figure 4e). In this example, the onemodule clustering can for sufficiently high relax rate better compress the network flows than the twomodule clustering. Compared with the expanded clusterings with virtual physical nodes where this cannot happen, the onemodule clustering gives a relatively shorter code length thanks to code word sharing between state nodes in physical node . In general, modeling higherorder network flows with physical nodes and state nodes, and accounting for them when mapping the network flows, gives overlapping modular clusterings that do not depend on the particular representation but only on the actual dynamical patterns.
4 Infomap
To find the best modular description of network flows according to the map equation, we have developed the stochastic and fast search algorithm called Infomap. Infomap can operate on standard, multilayer, memory, and sparse memory networks with unweighted or weighted and undirected or directed links, and identify twolevel or multilevel and nonoverlapping or overlapping clusterings. Infomap takes advantage of parallelization with OpenMP and there is also a distributed version implemented with GraphLab PowerGraph for standard networks Bae and Howe (2015). In principle, the search algorithm can optimize other objectives than the map equation to find other types of community structures.
Recent versions of Infomap operate on physical nodes and state nodes. Each state node has a unique state id and is assigned to a physical id, which can be the same for many state nodes. Infomap only uses physical nodes and state nodes for higherorder network flow representations, such as multilayer, memory, and sparse memory networks, and physical nodes alone when they are sufficient to represent firstorder network flows in standard networks.
To balance accuracy and speed, Infomap uses some repeatedly and recursively applied stochastic subroutine algorithms. For example, the multilevel clustering (Algorithm 1) and twolevel clustering (Algorithm 2) algorithms first repeatedly aggregate nodes in modules (Algorithm 3) to optimize the map equation (Algorithm 4) and then repeatedly and recursively finetune (Algorithm 5) and coarsetune (Algorithm 6) the results. Together with complete restarts, these tuning algorithms avoid local minima and improve accuracy.
By default, Infomap tries to find a multilevel clustering of a network (Algorithm 1).
4.1 TwoLevel Clustering
The complete twolevel clustering algorithm improves the repeated node aggregation algorithm (Algorithm 3) by breaking up modules of suboptimally aggregated nodes. That is, once the algorithm assigns nodes to the same module, they are forced to move jointly when Infomap rebuilds the network, and what was an optimal move early in the algorithm might have the opposite effect later in the algorithm. Similarly, two or more modules that merge together and form a single module when the algorithm rebuilds the network can never be separated again in the repeated node aggregation algorithm. Therefore, the accuracy can be improved by breaking up modules of the final state of the repeated node aggregation algorithm and trying to move individual nodes or smaller submodules into new or neighboring modules. The twolevel clustering algorithm (Algorithm 2) performs this refinement by iterating finetuning (Algorithm 5) and coarsetuning (Algorithm 6).
4.2 Repeated Node Aggregation
The repeated node aggregation in Algorithm 3 follows closely the machinery of the Louvain method Blondel et al. (2008). While the Louvain method seeks to maximize the modularity score, which fundamentally operates on link density rather than network flows, its repeated node aggregation machinery is effective also for optimizing other objectives than the modularity score.
Infomap aggregates neighboring nodes into modules, subsequently aggregates them into larger modules, and repeats. First, Infomap assigns each node to its own module. Then, in random sequential order, it tries to move each node to the neighboring or a new module that gives the largest code length decrease (Algorithm 4). If no move gives a code length decrease, the node stays in its original module. The change of code length for each possible move can be calculated locally based only on the change in and flows, the current flow in the moving node and affected modules, and the initial flow for the module. For state nodes, the physical node visit rates can be locally updated.
Infomap repeats this procedure, each time in a new random sequential order, until no move happens. Then Infomap rebuilds the network with the modules of the last level forming the nodes at the new level, and, exactly as in the previous level, aggregates the nodes into modules, which replace the existing ones. Infomap repeats this hierarchical network rebuilding until the map equation cannot be further reduced.
The computational complexity of the repeated node aggregation is considered to be for the Louvain method applied to networks with nodes Blondel et al. (2008). While Infomap optimizes the map equation with some more costly logarithms, there is no fundamental difference in the scaling. However, for sparse memory networks, the scaling applies to the number of state nodes. In any case, the first aggregation step is the most expensive because Infomap considers each node and all its links such that the complexity scales with the number of links in the network. To further optimize the map equation, Infomap performs stochastic tuning steps with more network dependent computational complexity.
4.3 FineTuning
In the finetuning step, Infomap tries to move single nodes out from existing modules into new or neighboring modules (Algorithm 5). First, Infomap reassigns each node to be the sole member of its own module to allow for singlenode movements. Then it moves all nodes back to their respective modules of the previous step. At this stage, with the same clustering as in the previous step but with each node free to move between the modules, Infomap reapplies the repeated node aggregation (Algorithm 3).
4.4 CoarseTuning
In the coursetuning step, Infomap tries to move submodules out from existing modules into new or neighboring modules (Algorithm 6). First, Infomap treats each module as a network on its own and applies repeated node aggregation (Algorithm 3) on this network. This procedure generates one or more submodules for each module. Infomap then replaces the modules by the submodules and moves the submodules into their modules as an initial solution. At this stage, with the same clustering as in the previous step but with each submodule free to move between the modules, Infomap reapplies the repeated node aggregation (Algorithm 3).
4.5 Multilevel Clustering
Infomap identifies multilevel clusterings by extending twolevel clusterings both by iteratively finding supermodules of modules and by recursively finding submodules of modules. To find the optimal multilevel solution, Infomap first tries to find an optimal twolevel clustering, then iteratively tries to find supermodules and then recursively tries to cluster those modules until the code length cannot be further improved (Algorithm 1).
To identify a hierarchy of supermodules from a twolevel clustering, Infomap first tries to find a shorter description of flows between modules. It iteratively runs the twolevel clustering algorithm (Algorithm 2) on a network with one node for each module at the coarsest level, nodevisit rates from the moduleentry rates, and links that describe aggregated flows between the modules (Algorithm 7). For each such step, a twolevel code book replaces the coarsest code book. In this way, describing entries into, within, and out from supermodules at a new coarsest level replaces only describing entries into the previously coarsest modules.
For a fast multilevel clustering, Infomap can keep this hierarchy of supermodules and try to recursively cluster the bottom modules into submodules with Algorithm 8. However, as this hierarchy of supermodules is constrained by the initial optimal twolevel clustering, discarding everything but the top supermodules and starting the recursive submodule clustering from them often identifies a better multilevel clustering. To find submodules, Infomap runs the twolevel clustering (Algorithm 2) followed by the supermodule clustering (Algorithm 7) algorithm on the network within each module to find the largest submodule structures. If Infomap finds nontrivial submodules (that is, not one submodule for all nodes and not one submodule for each node), it replaces the module with the submodules and collects the submodules in a queue to explore each submodule in parallel for each hierarchical level. In this way, Infomap can explore the submodule hierarchy in parallel with a breadthfirst search to efficiently find a multilevel clustering of the input network.
4.6 Download Infomap
Infomap can be downloaded from www.mapequation.org, which contains installation instructions, a complete list of running options, and examples for how to run Infomap standalone or with other software, including igraph, Python, Jupyter, and R. In the Appendix, we provide Infomap’s basic syntax (Appendix A.1) and examples for how to cluster a multilayer network (Appendix A.2), a memory network (Appendix A.3), and a sparse memory network (Appendix A.4). The website also contains interactive storyboards to explain the mechanics of the map equation, and interactive visualizations to simplify largescale networks and their changes over time.
4.7 Mapping Multistep and MultiSource Data with Infomap
We illustrate our approach by mapping network flows from air traffic data between more than 400 airports in the US US Department of Transportation (). For quarterly reported itineraries in 2011, we assembled pathways into four quarterly, two halfyear, and one fullyear collection. From pathways of length two, three, and four, for each collection we generated sparse memory networks corresponding to first, second, and thirdorder Markov models of flows. Then we generated sparse memory networks corresponding to multilayer networks with four layers from the quarterly collections and two layers from the halfyear collections for all Markov orders. This example illustrates how sparse memory networks can go beyond standard representations and represent combinations of multilayer and memory networks.
To identify communities in multistep and multisource air traffic data represented with sparse memory networks, we run Infomap ten times and report the median values in Table 1. Except for a small drop in the number of physical nodes for the thirdorder models because some airports are not represented in long itineraries, the number of physical nodes remain the same for all models. However, the number of state nodes and links between them increases with order. Larger systems require statenode lumping based on minimum information loss into more efficient sparse memory networks that correspond to variableorder Markov models. Such more compact descriptions of higherorder can balance over and underfitting as well as cut the computational time Persson et al. (2016). In any case, higherorder representations can better capture constraints on flows. For example, a drop in the entropy rate by one bit corresponds to doubled flow prediction, and the fourbit drop between the first and thirdorder models corresponds to a sixteenfold higher flow prediction accuracy. These flows move with relatively long persistence times among groups of airports that form connected legs in frequent itineraries. Infomap capitalizes on this modular flow pattern and identifies pervasively overlapping modules.
Quarters  Time  

1  1  438  0.016 s  
1  2  438  0.063 s  
1  4  438  0.14 s  
2  1  438  1.7 s  
2  2  438  10 s  
2  4  438  27 s  
3  1  432  8.2 min  
3  2  432  22 min  
3  4  432  52 min 
5 Conclusions
The map equation applied to sparse memory networks provides a general solution to reveal modular patterns in higherorder flows through complex systems. Rather than multiple communitydetections algorithms for a range of network representations, the map equation’s search algorithm Infomap can be applied to general higherorder network flows described by sparse memory networks. A sparse memory network uses abstract state nodes to describe higherorder dynamics and physical nodes to represent a system’s objects. This distinction makes all the difference. The flexible sparse memory network can efficiently describe higherorder network flows of various types such as flows in memory and multilayer networks or their combinations. Simplifying and highlighting important patterns in these flows with the map equation and Infomap open for more effective analysis of complex systems.
Acknowledgements.
We are grateful to Christian Persson and Manlio De Domenico for helpful discussions. Martin Rosvall was supported by the Swedish Research Council grant 201600796. \authorcontributionsD.E, L.B., and M.R conceived and designed the experiments; D.E. performed the experiments. D.E, L.B., and M.R analyzed the data and wrote the paper. \conflictsofinterestThe authors declare no conflict of interest. \appendixtitlesyes \appendixsectionsmultipleAppendix A Running Infomap
a.1 Infomap Syntax
To run Infomap on a network, use the command
The option network_data should point to a valid network file and dest to a directory where Infomap should write the output files. If no option is given, Infomap will assume an undirected network and try to cluster it hierarchically. Optional arguments, including directed or undirected links, twolevel or multilevel solutions, or various input formats, can be put anywhere. Run ./Infomap help for a list and explanation of supported arguments.
a.2 Clustering a Multilayer Network
The multilayer network in Figure 4a can be described with the multilayer network format in fig3a.net below and clustered for relax rate with the command
See Appendix A.5 for the overlapping clustering output, and Appendix A.4 for an alternative representation with a sparse memory network. In fact, Infomap internally represents the multilayer network in fig3a.net for with the sparse memory network in Appendix A.4 with transition rates / between state nodes in different layers, since / stays among state nodes in the same layer in this symmetric twolayer network.
The interlayer links will be generated based on the multilayer relax rate, but could also be modeled explicitly with an *Inter section as below
a.3 Clustering a Memory Network
The memory network in Figure 4b with transition rates corresponding to relax rate for the multilayer network in Figure 4a, can be described with the memory network format in fig3b.net below and clustered with the command
See Appendix A.5 for the overlapping clustering output, and Appendix A.4 for an alternative representation with a sparse memory network.
a.4 Clustering a Sparse Memory Network
The sparse memory network in Figure 4c with transition rates corresponding to relax rate for the multilayer network in Figure 4a, can be described with the sparse memory network format in fig3c.net below and clustered with the command
See Appendix A.5 for the overlapping clustering output.
For reference, the memory network in Appendix A.3 can also be represented with the sparse memory network format in fig3b_sparse.net below and clustered with the command
See Appendix A.5 for the overlapping clustering output.
a.5 Clustering Output
All examples in Appendices A.2–A.4 give the same overlapping physical node clustering below, where 1:1 0.166667 "i" 1 says, reading from right to left, that the physical node with id 1 is called i and has visit rate 0.166667 as the first node in the first module.
With the optional argument expanded, Infomap gives the clustering of each state node (not shown).
References
 Brin and Page (1998) Brin, S.; Page, L. The anatomy of a largescale hypertextual Web search engine. Comput. Netw. ISDN 1998, 30, 107–117.
 Newman (2005) Newman, M.E. A measure of betweenness centrality based on random walks. Soc. Netw. 2005, 27, 39–54.
 Lordan et al. (2015) Lordan, O.; Florido, J.; Sallan, J.M.; Fernandez, V.; Simo, P.; GonzalezPrieto, D. Study of the robustness of the European air routes network. In LISS 2014; Springer: Berlin, Germany, 2015; pp. 195–200.
 Rosvall et al. (2014) Rosvall, M.; Esquivel, A.V.; Lancichinetti, A.; West, J.D.; Lambiotte, R. Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 2014, 5, 4630.
 Belik et al. (2011) Belik, V.; Geisel, T.; Brockmann, D. Natural human mobility patterns and spatial spread of infectious diseases. Phys. Rev. X 2011, 1, 011001.
 Pfitzner et al. (2013) Pfitzner, R.; Scholtes, I.; Garas, A.; Tessone, C.J.; Schweitzer, F. Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks. Phys. Rev. Lett. 2013, 110, 198701.
 Poletto et al. (2013) Poletto, C.; Tizzoni, M.; Colizza, V. Human mobility and time spent at destination: Impact on spatial epidemic spreading. J. Theor. Biol. 2013, 338, 41–58.
 Mucha et al. (2010) Mucha, P.J.; Richardson, T.; Macon, K.; Porter, M.A.; Onnela, J.P. Community structure in timedependent, multiscale, and multiplex networks. Science 2010, 328, 876–878.
 Kivelä et al. (2014) Kivelä, M.; Arenas, A.R; Barthelemy, M.; Gleeson, J.P.; Moreno, Y.; Porter, M.A. Multilayer Networks. J. Comput. Netw. 2014, 2, 203–271.
 Boccaletti et al. (2014) Boccaletti, S.; Bianconi, G.; Criado, R.; Del Genio, C.; GómezGardeñes, J.; Romance, M.; SendiñaNadal, I.; Wang, Z.; Zanin, M. The structure and dynamics of multilayer networks. Phys. Rep. 2014, 544, 1–122.
 De Domenico et al. (2015) De Domenico, M.; Lancichinetti, A.; Arenas, A.; Rosvall, M. Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Phys. Rev. X 2015, 5, 011027.
 Ahn et al. (2010) Ahn, Y.; Bagrow, J.; Lehmann, S. Link communities reveal multiscale complexity in networks. Nature 2010, 466, 761–764.
 Peixoto and Rosvall (2017) Peixoto, T.P.; Rosvall, M. Modeling sequences and temporal networks with dynamic community structures. arXiv 2017, arXiv:1509.04740.
 Xu et al. (2016) Xu, J.; Wickramarathne, T.L.; Chawla, N.V. Representing higherorder dependencies in networks. Sci. Adv. 2016, 2, e1600028.
 Scholtes (2017) Scholtes, I. When is a network a network? Multiorder graphical model selection in pathways and temporal networks. arXiv 2017, arXiv:1702.05499.
 De Domenico et al. (2013) De Domenico, M.; SoléRibalta, A.; Cozzo, E.; Kivelä, M.; Moreno, Y.; Porter, M.A.; Gómez, S.; Arenas, A. Mathematical formulation of multilayer networks. Phys. Rev. X 2013, 3, 041022.
 Wehmuth et al. (2016) Wehmuth, K.; Fleury, É.; Ziviani, A. MultiAspect Graphs: Algebraic representation and algorithms. Algorithms 2016, 10, 1.
 Persson et al. (2016) Persson, C.; Bohlin, L.; Edler, D.; Rosvall, M. Maps of sparse Markov chains efficiently reveal community structure in network flows with memory. arXiv 2016, arXiv:1606.08328.
 Barabási and Albert (1999) Barabási, A.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
 Barrat et al. (2004) Barrat, A.; Barthelemy, M.; PastorSatorras, R.; Vespignani, A. The architecture of complex weighted networks. Proc. Natl. Acad. Sci. USA 2004, 101, 3747–3752.
 Boccaletti et al. (2006) Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308.
 Lambiotte and Rosvall (2012) Lambiotte, R.; Rosvall, M. Ranking and clustering of nodes in networks with smart teleportation. Phys. Rev. E 2012, 85, 056107.
 Song et al. (2010) Song, C.; Qu, Z.; Blumm, N.; Barabási, A. Limits of predictability in human mobility. Science 2010, 327, 1018–1021.
 Meiss et al. (2008) Meiss, M.R.; Menczer, F.; Fortunato, S.; Flammini, A.; Vespignani, A. Ranking web sites with real user traffic. In Proceedings of the International Conference on Web Search and Web Data Mining, Palo Alto, CA, USA, 11–12 February 2008; pp. 65–76.
 Chierichetti et al. (2012) Chierichetti, F.; Kumar, R.; Raghavan, P.; Sarlós, T. Are web users really Markovian? In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 609–618.
 Singer et al. (2014) Singer, P.; Helic, D.; Taraghi, B.; Strohmaier, M. Memory and Structure in Human Navigation Patterns. arXiv 2014, arXiv:1402.0790.
 Takaguchi et al. (2011) Takaguchi, T.; Nakamura, M.; Sato, N.; Yano, K.; Masuda, N. Predictability of conversation partners. Phys. Rev. X 2011, 1, 011008.
 Holme and Saramäki (2012) Holme, P.; Saramäki, J. Temporal networks. Phys. Rep. 2012, 519, 97–125.
 Rosvall and Bergstrom (2008) Rosvall, M.; Bergstrom, C. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 2008, 105, 11181123.
 Shannon (1948) Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J 1948, 27, 379423.
 Rosvall and Bergstrom (2011) Rosvall, M.; Bergstrom, C. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 2011, 6, e18209.
 Kawamoto and Rosvall (2015) Kawamoto, T.; Rosvall, M. Estimating the resolution limit of the map equation in community detection. Phys. Rev. E 2015, 91, 012809.
 Bae and Howe (2015) Bae, S.H.; Howe, B. GossipMap: A distributed community detection algorithm for billionedge directed graphs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, TX, USA, 15–20 November 2015; p. 27.
 Blondel et al. (2008) Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008.
 (35) We have compiled the network from the Airline Origin and Destination Survey (DB1B), which is a 10% aample of airline tickets from reporting carriers made public by the Research and Innovative Technology Administration (RITA). Data from 2011. Available online: transtats.bts.gov (accessed on September 1, 2017).