Exploiting Cellular Data for Disease Containment and Information Campaigns Strategies in Country-Wide Epidemics
Abstract
Human mobility is one of the key factors at the basis of the spreading of diseases in a population. Containment strategies are usually devised on movement scenarios based on coarse-grained assumptions. Mobile phone data provide a unique opportunity for building models and defining strategies based on very precise information about the movement of people in a region or in a country. Another very important aspect is the underlying social structure of a population, which might play a fundamental role in devising information campaigns to promote vaccination and preventive measures, especially in countries with a strong family (or tribal) structure.
In this paper we analyze a large-scale dataset describing the mobility and the call patterns of a large number of individuals in Ivory Coast. We present a model that describes how diseases spread across the country by exploiting mobility patterns of people extracted from the available data. Then, we simulate several epidemic scenarios and evaluate mechanisms to contain the epidemic spreading of diseases, based on the information about people's mobility and social ties, also gathered from the phone call data. More specifically, we find that restricting mobility does not delay the occurrence of an endemic state and that an information campaign based on one-to-one phone conversations among members of social groups might be an effective countermeasure.
I Introduction
The health and well-being of populations are heavily influenced by their behaviour. The impact of habits and local customs, including patterns of interaction and mobility at urban and regional scales, on health issues is remarkable (1). The diffusion of mobile technology we are experiencing nowadays gives scholars an unprecedented opportunity to study massive data that describe human behavior (2). An increasing number of people carry smart mobile phones, equipped with many sensors and connected to the Internet, for the whole day (3).
Data coming from a large number of people can describe trends in the macroscopic behavior of populations (4); (5); (6). The results of the analysis of these trends can be directly applied to a number of real-world scenarios and, more generally, to several applications where cultural and local differences play a central role. Analyzing this kind of data can provide invaluable help to support the decision-making process, especially in critical situations. For this reason, many public and private organizations are nowadays increasingly adopting a data-centric approach in their decision-making process (7). We believe that this strategy can be particularly useful in developing countries, which might have a lacking infrastructure.
Among the issues that developing countries are facing today, healthcare is probably the most urgent (9). In these countries the effectiveness of campaigns is often reduced due to low availability of data, inherent limits in the infrastructure and difficult communication with the citizens, who might live in vast and remote rural areas. As a result, action plans are difficult to deliver. However, we believe that a data-centric approach can be an innovative and effective way to address these issues.
In this paper, we focus on the containment of epidemics. We use movement data extracted from the registration patterns in a cellular network to evaluate the influence of human mobility on the spreading of diseases in a geographic area. In particular, we utilize this model to investigate how infectious agents might spread to distant locations because of human movement, in order to identify optimal strategies that can be adopted to counter the epidemic. We also evaluate how the collaborative effort of the population can be crucial in critical scenarios. For the reasons we mentioned before, in countries that are facing development challenges, vaccination campaigns are often hard to advertise to the population. Lack of communication and information is believed to be among the main causes of failure for immunization campaigns. The same applies to awareness campaigns that try to promote prophylaxis procedures that reduce the occurrence of contagion. However, in these cases, we argue that a collaborative effort leveraging individual social ties can be effective in propagating useful information (i.e., a sort of “immunizing information”) to a widespread audience. Moreover, information received from people who are socially close can have a higher chance of leading to an actual action.
A large body of research has been conducted on models that describe the diffusion of diseases, with a particular recent interest in the role that human movement plays in spreading infections in large geographic areas (10); (11); (12), and also in the impact of human behavior on the spreading itself (13); (12). With respect to the state of the art, the main contributions of this paper can be summarized as follows:

We propose an epidemic model based on a network of geographic metapopulations, which describes how people move between different geographic regions and spread the disease.

We evaluate containment techniques based on the restriction of mobility of the most central areas. The centrality of the areas is extracted by building a movement network between all the geographic areas based on the mobility patterns of the individuals.

We extend the model with a competing information-spreading process in which distance contagion might take place. In other words, we study the dynamics of the system considering three characterizing aspects of the problem: the disease epidemics, human mobility and information spreading. This second epidemic represents the diffusion of information related to measures to prevent or to combat the diseases, such as information about the ongoing vaccination and prevention campaigns in a certain area or actions that will help to limit the spread of the infection, such as boiling water or avoiding contact with people who are already ill.

We evaluate the models by using the data provided by the Orange “Data for Development” challenge (14). We discuss the effectiveness of the containment strategies and, in particular, for the information dissemination strategy, we identify the degree of participation that is required to make it successful.

We observe that restricting mobility by disallowing any movement from and to a limited set of subprefectures does not delay the occurrence of the endemic state in the rest of the country. We also find that a collaborative effort of prevention information spreading can be an effective countermeasure.
This paper is organized as follows. In Sec. II we briefly describe the four different datasets provided by Orange and we specify how they are used in the present study. In Sec. III we introduce our two models for epidemics and information spreading by taking into account human mobility and call patterns observed in Ivory Coast. In Sec. IV we present the results obtained by simulating several epidemic scenarios and evaluating mechanisms to contain the epidemic spreading of diseases. Finally, in Sec. V we summarize our main findings and we propose how the present study can be improved if more detailed data about mobility and calls become available.
II Overview of the Dataset
The data provided for the D4D challenge (14) consist of four datasets (identified by the labels SET1, SET2, SET3, SET4), containing information about user mobility and call patterns at various levels of granularity and time duration. We will now discuss how these datasets can be used to build a model which accounts for user mobility and information spreading.
Two datasets contain information about mobility and communication patterns at macroscopic level. More precisely:

The SET1 dataset contains the number and the duration of calls between pairs of cell phone towers, aggregated by hour. This dataset provides macroscopic information about communication in the country. We associate cell phone towers with the subprefecture they are located in, by using the supplied geographic position. Then, we evaluate the probability $c_{ij}$ of a call being established between subprefectures $i$ and $j$ with:
$$c_{ij} = \frac{n_{ij}}{\sum_{k}\sum_{l} n_{kl}} \tag{1}$$
where $n_{ij}$ is the number of phone calls initiated from the subprefecture $i$ and directed to the subprefecture $j$ during the entire period of observation. The term in the denominator indicates the total communication flux between every pair of subprefectures and it is used to normalize the probability. Using these values we build a call matrix $C$, shown in Fig. 1. This matrix also shows high values along the diagonal, but it is distinctly denser, showing that calls between subprefectures are more common than movement. The vertical line visible in the matrix identifies calls directed to the subprefecture that contains the capital.

The SET3 dataset contains the trajectories of 50,000 randomly selected individuals, at a subprefecture-level resolution, for five months.^{3} This dataset can be used to estimate the probability $m_{ij}$ that an individual moves from the subprefecture $i$ to the subprefecture $j$:
$$m_{ij} = \frac{\sum_{u} n^{u}_{ij}}{\sum_{u}\sum_{k} n^{u}_{ik}} \tag{2}$$
where $n^{u}_{ij}$ is the number of times user $u$ moves from the subprefecture $i$ to $j$. The numerator counts how many times users who are in $i$ move to $j$; the denominator normalizes this number by the total number of transitions from $i$ to any subprefecture $k$. Using these values we build a mobility matrix $M$, shown in Fig. 1. By using this matrix, we model human mobility in the country as a Markov process (15). We observe that the matrix is quite sparse and the highest values are concentrated along the diagonal. As the representation is in logarithmic scale, this demonstrates that movement between subprefectures is present, but rather uncommon.
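As an illustration of the two estimates above, the following minimal sketch builds a row-normalized mobility matrix from per-user sequences of subprefecture identifiers and a globally normalized call matrix from aggregated call counts. The function names, the toy data and the input layout are hypothetical and are not taken from the D4D files; the sketch also assumes that two consecutive records in the same subprefecture count as a transition from that subprefecture to itself, and that a subprefecture with no observed transitions keeps its users.

```python
def mobility_matrix(trajectories, n):
    """Estimate m[i][j], the probability of moving from i to j (cf. Eq. 2)."""
    counts = [[0] * n for _ in range(n)]
    for path in trajectories:                 # one list of subprefecture ids per user
        for src, dst in zip(path, path[1:]):  # consecutive records form a transition
            counts[src][dst] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: with no observed transitions, users stay where they are.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def call_matrix(call_counts, n):
    """Estimate c[i][j], normalized by the total number of calls (cf. Eq. 1)."""
    total = sum(call_counts.values())
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), calls in call_counts.items():
        matrix[i][j] = calls / total
    return matrix


# Toy data for three subprefectures (purely illustrative).
trajectories = [[0, 0, 1, 1], [1, 1, 1, 2], [2, 0, 0, 0]]
M = mobility_matrix(trajectories, 3)
C = call_matrix({(0, 0): 6, (0, 1): 2, (1, 0): 1, (2, 2): 1}, 3)
```

Each row of `M` sums to one, so it can be used directly as the transition matrix of a Markov chain, whereas the entries of `C` sum to one over the whole matrix.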
In Fig. 2 we show the geographic networks of calls and mobility. Nodes are positioned using the geographic locations of the subprefecture they represent, and their color indicates the community structure of the network based on (16).
The other two datasets provide microscopic information about mobility and communication patterns between individuals. Although we do not use them for the analysis in this paper, we now briefly outline how they could be used:

The SET2 dataset contains fine-grained individual trajectories of 50,000 randomly sampled individuals over two-week periods. This dataset could be used to estimate the number of potential connections that an individual might have in a certain area, served by a cell phone tower.

The SET4 dataset contains time-varying ego-networks of 5,000 users, describing the network of communication in time slots of 2 weeks. If two users are connected by a link in a time slot, it means that at least one call occurred during the two weeks under consideration.^{4} The ego-network aggregated over the whole observation time, built considering every link that is present at least once, describes the number of people contacted by an individual during the entire period. This dataset could be used to estimate the number of potential social connections that an individual might get in touch with. The degree distribution of the aggregate ego-network is shown in Fig. 3.
III Spreading Models
In this section we discuss two models: a model of disease spreading as a function of the mobility patterns of individuals between different geographic areas inferred from the cellular registration records and a model for information spreading among the same population, considering the social structure inferred from the call records. In the following section, we will evaluate the models using the data provided for the Orange Data for Development challenge.
III.1 Epidemic Spreading and Mobility
We will now present a model that represents the evolution of an epidemic taking place on a network of metapopulations. The aim of the model is to describe how the system evolves under the action of two processes, contagion and mobility. For this dataset, each metapopulation is composed of the individuals located in a particular subprefecture. Hence, the population is distributed in $n$ different metapopulations, each having $N_i(t)$ individuals at time $t$. We make the simplifying assumption that there are no deaths and births in the considered time window, i.e., at each time $t$ the total population is constant: $\sum_{i=1}^{n} N_i(t) = N$.
We assume that contagion happens inside each metapopulation following a standard SIS model (17). We indicate the number of infected and susceptible individuals at time $t$ in a subprefecture $i$ with $I_i(t)$ and $S_i(t)$, respectively. At each time a person is either infected or susceptible, therefore $I_i(t) + S_i(t) = N_i(t)$.
Simultaneously, individuals move through the metapopulation network according to the mobility matrix $M$ of dimension $n \times n$ extracted from the cellular traces. The generic element $m_{ij}$ of the matrix represents the probability that a person moves from the metapopulation $i$ to $j$, as described by Eq. 2. The state of the system then evolves as
$$I_j(t+1) = \sum_{i=1}^{n} \left[ I_i(t) - \mu I_i(t) + \beta \frac{S_i(t)\, I_i(t)}{N_i(t)} \right] m_{ij}$$
$$S_j(t+1) = \sum_{i=1}^{n} \left[ S_i(t) + \mu I_i(t) - \beta \frac{S_i(t)\, I_i(t)}{N_i(t)} \right] m_{ij}$$
for each subprefecture $j$, with $\beta$ being the product of contact rate and contagion probability and $\mu$ being the recovery rate. The formulae inside the square brackets describe the evolution of SIS models, one for each metapopulation. They are multiplied by the elements of the mobility matrix, which accounts for individuals moving between metapopulations.
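The update described above (a local SIS step inside each metapopulation, followed by redistribution through the mobility matrix) can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name and the parameter names `beta` (contact rate times contagion probability) and `mu` (recovery rate) are assumptions made for this example, as are the toy values.

```python
def sis_mobility_step(S, I, M, beta, mu):
    """One deterministic time step: local SIS dynamics, then mobility."""
    n = len(S)
    S_local, I_local = [], []
    for s, i in zip(S, I):
        pop = s + i
        # Expected new infections and recoveries inside one metapopulation.
        new_infections = beta * s * i / pop if pop > 0 else 0.0
        recoveries = mu * i
        S_local.append(s - new_infections + recoveries)
        I_local.append(i + new_infections - recoveries)
    # Individuals in i move to j with probability M[i][j].
    S_next = [sum(S_local[i] * M[i][j] for i in range(n)) for j in range(n)]
    I_next = [sum(I_local[i] * M[i][j] for i in range(n)) for j in range(n)]
    return S_next, I_next


# Toy example with two metapopulations; only the first one is seeded.
M = [[0.9, 0.1],
     [0.2, 0.8]]
S1, I1 = sis_mobility_step([990.0, 1000.0], [10.0, 0.0], M, beta=0.5, mu=0.1)
```

Because every row of the mobility matrix sums to one, the total population is conserved at each step, in line with the assumption of no births and deaths.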
This analytical model describes the expected outcome of a stochastic model where the following actions occur at each time step:

Each infected person in a subprefecture $i$ causes the infection of new individuals inside $i$. This step is repeated for each subprefecture.

A new position is assigned to each individual in the subprefecture $i$ according to the probabilities $m_{ij}$ in the corresponding row of the mobility matrix. This step is repeated for each subprefecture.
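The two stochastic actions above can be illustrated with an individual-based sketch. It is only one possible reading of the process: contagion is implemented here as an independent trial for each susceptible person, with a probability chosen so that the expected number of new infections matches the SIS term, and recovery with probability `mu` is added explicitly because an SIS model requires it. The function name, the parameter names and the toy population are hypothetical.

```python
import random


def stochastic_step(location, infected, M, beta, mu, rng):
    """One stochastic step: contagion inside subprefectures, then movement.

    location[p] is the subprefecture of person p, infected[p] a boolean flag.
    """
    n = len(M)
    population = [0] * n
    infected_count = [0] * n
    for loc, is_infected in zip(location, infected):
        population[loc] += 1
        infected_count[loc] += int(is_infected)

    new_infected = []
    for loc, is_infected in zip(location, infected):
        if is_infected:
            # Assumption: an infected person recovers with probability mu.
            new_infected.append(rng.random() >= mu)
        else:
            # Assumption: infection probability proportional to local prevalence.
            p = min(1.0, beta * infected_count[loc] / population[loc])
            new_infected.append(rng.random() < p)

    # Each person draws a new position from the row of M for their location.
    new_location = [rng.choices(range(n), weights=M[loc])[0] for loc in location]
    return new_location, new_infected


rng = random.Random(0)
location = [0] * 50 + [1] * 50
infected = [True] * 5 + [False] * 95
M = [[0.9, 0.1],
     [0.1, 0.9]]
new_location, new_infected = stochastic_step(location, infected, M, 0.5, 0.1, rng)
```

Averaging many such runs is one way to check a deterministic model of this kind against its stochastic counterpart.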
III.2 Information Spreading
The model we presented in the last section tries to reproduce the spreading of a disease in a population where individuals change locations over time. The aim of this work is to analyze some scenarios and study the effectiveness of some containment techniques. In particular, as anticipated, we would like to investigate whether a collaborative effort of the population is able, in theory, to considerably reduce the spread of the disease and what scale such an effort should have to be effective. More precisely, the population can disseminate immunizing information through personal social ties, such as information about prevention techniques, hygiene practices, advertisement of nearby vaccination campaigns and, in general, any information that can lead to a reduction of the number of contagion events.
In order to take into consideration these aspects, we now use a SIR model for each metapopulation, so that each person belongs to the susceptible (S), infected (I) or resistant (R) category. At the same time, another simultaneous epidemic happens on the network of metapopulations, disseminating information that can make individuals resistant to the disease. In fact, a person also belongs to the category of unaware (U) or aware (A) individuals, with respect to the immunizing information. More formally, we have that $S_i(t) + I_i(t) + R_i(t) = U_i(t) + A_i(t) = N_i(t)$.
It is worth noting that this “immunizing epidemic” goes beyond the boundaries of metapopulations (subprefectures): in other words, it is a distance contagion. It is also important to remark that the states “aware” and “resistant” are substantially different. An unaware person that receives the information (i.e. has an “information contact”) becomes aware with rate ; since the person is aware, he or she will start spreading the information as well. An infected person that receives the information becomes immune with rate . Additionally, individuals who have acquired immunity through information can lose it with rate . The transition rates between states are summarized in Fig. 4. The model can be described by the following set of equations, specifying how state vectors evolve over time:
(3) 
for every subprefecture $i$. The call-weighted fraction in Eq. 3 represents the probability that a call from an aware person occurs in the metapopulation $i$. It models the distance contagion, and it is possible to verify that if the call matrix is the identity matrix (absence of contacts between populations) it reduces to $A_i(t)/N_i(t)$, falling back to a model where contagion occurs only inside metapopulations.
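To make the role of the call matrix in the distance contagion concrete, the sketch below computes one plausible form of the probability that a call reaching a metapopulation comes from an aware person. The exact expression is an assumption made for this example: aware and total populations are both weighted by the calls directed to the metapopulation, which has the property stated in the text of reducing to the local fraction of aware individuals when the call matrix is the identity. The function name and the toy values are hypothetical.

```python
def aware_call_fraction(A, N, C):
    """Assumed form: sum_j C[j][i] * A[j] / sum_j C[j][i] * N[j] for each i."""
    n = len(N)
    fractions = []
    for i in range(n):
        aware_calls = sum(C[j][i] * A[j] for j in range(n))
        all_calls = sum(C[j][i] * N[j] for j in range(n))
        fractions.append(aware_calls / all_calls if all_calls > 0 else 0.0)
    return fractions


A = [10.0, 0.0]      # aware individuals per metapopulation
N = [100.0, 100.0]   # total individuals per metapopulation

# Without calls between metapopulations the result is the local fraction A_i / N_i.
local = aware_call_fraction(A, N, [[1.0, 0.0], [0.0, 1.0]])

# With calls from metapopulation 0 to 1, awareness can reach metapopulation 1
# even though nobody there is aware yet.
remote = aware_call_fraction(A, N, [[0.5, 0.5], [0.0, 1.0]])
```

Under this assumed form, the second call produces a nonzero fraction for metapopulation 1, which is the mechanism that lets information cross subprefecture boundaries without any physical movement.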
This analytical model describes the expected value of a stochastic model where the following actions occur at each time step $t$:

Each infected person in the subprefecture causes new individuals to get infected, inside . This step is repeated for each subprefecture.

Each unaware person in the subprefecture becomes aware with probability . This step is repeated for each subprefecture.

Each susceptible person in the subprefecture becomes resistant with probability . This step is repeated for each subprefecture.

A new position is assigned to each person in the subprefecture according to the probability density function . This step is repeated for each subprefecture.
IV Analysis
We initialize each scenario by allocating 22 million individuals (the estimated population size of Ivory Coast for July 2012 is 21,952,093 (20)) to different subprefectures across the country, according to the data in SET3. In each scenario we bootstrap the spreading process by infecting a fraction of the population, distributed across metapopulations according to different criteria:

Uniform distribution: every subprefecture gets a number of infected individuals proportional to its population, i.e., every subprefecture has the same fraction of infected population.

Random: a single subprefecture, chosen randomly, is the origin of the infection.

Centrality based: the subprefectures are ordered by decreasing centrality values, then the first 1, 5 or 10 highest ranked subprefectures are chosen, as shown in Table 1.
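Table 1: Highest-ranked subprefectures for each centrality measure, in decreasing order of centrality.

Betweenness  Closeness  Degree  Eigenvector
60           60         60      60
39           58         58      58
89           39         39      39
58           69         69      69
75           138        138     250
144          250        64      138
138          64         144     64
165          144        250     144
212          182        122     122
168          122        182     182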
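A ranking of this kind can be reproduced on any mobility matrix with a few lines of code. The sketch below computes eigenvector centrality by power iteration; it is not the procedure used to obtain Table 1, since the text does not specify how the movement network is weighted. As assumptions for this example, the network is made undirected by summing the flows in both directions, self-loops are ignored, and the iteration is applied to the shifted matrix (identity plus weights) to avoid oscillations. All names and the toy matrix are hypothetical.

```python
def symmetric_weights(M):
    """Assumption: undirected weights w_ij = m_ij + m_ji, without self-loops."""
    n = len(M)
    return [[(M[i][j] + M[j][i]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def eigenvector_centrality(W, iterations=200):
    """Power iteration on (I + W), which has the same leading eigenvector as W."""
    n = len(W)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [x[j] + sum(W[i][j] * x[i] for i in range(n)) for j in range(n)]
        norm = sum(y)
        x = [value / norm for value in y]
    return x


def top_k(scores, k):
    """Indices of the k highest scores, in decreasing order."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy mobility matrix: subprefecture 0 exchanges travellers with all the others.
M = [[0.7, 0.1, 0.1, 0.1],
     [0.2, 0.8, 0.0, 0.0],
     [0.3, 0.0, 0.7, 0.0],
     [0.1, 0.0, 0.0, 0.9]]
scores = eigenvector_centrality(symmetric_weights(M))
ranking = top_k(scores, 4)
```

In this toy network the hub comes first and the remaining subprefectures are ordered by the strength of their connection to it.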
We study the evolution of the epidemics for a period of 6 months. We investigate multiple scenarios using the analytical model, considering wide ranges of values for the key parameters. We conducted a series of Monte Carlo simulations for multiple sets of parameters, confirming the validity of the analytical models presented in the previous section. In the following, we present results based on these models.
IV.1 No Countermeasures
We first explore the evolution of the epidemics in the case where no countermeasures are taken. In order to analyze the evolution of the system more clearly, we investigate two measures: the fraction of infected population at the stationary state and the time required to reach the stationary state. In Fig. 5 we plot their values versus $R_0$, the basic reproductive ratio of a classic SIS model (17). As future work, we plan to derive the analytical form of the basic reproductive ratio of our models, which take into account mobility and information spreading. We observe that for $R_0 < 1$ there is no endemic state (i.e., the final fraction of infected population is zero), whereas for $R_0 > 1$ a non-null fraction of the population is infected. Values for $R_0 = 1$ are missing since no stationary state is reached within our observation window. In other words, for this particular scenario, experimental results show that the basic reproductive ratio of our model is very close to that of the classic SIS model; we expect this to be a consequence of the low inter-subprefecture mobility. We can also notice that the initial conditions do not affect the stationary fraction of infected population at all. Before the critical point (i.e., $R_0 < 1$) the choice of the initial conditions also has no impact on the delay time, whereas for $R_0 > 1$ it slightly affects the delay: epidemics that initially involve more subprefectures are slightly faster than the others.
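The threshold behaviour of a classic SIS model can be checked numerically for a single well-mixed population. The sketch below iterates the discrete-time SIS map for the infected fraction and reports the stationary value together with the number of steps needed to reach it. It ignores mobility entirely, and the parameter names `beta` and `mu`, the tolerance and the step budget are assumptions made for this example. For this map the endemic fixed point is $1 - \mu/\beta$ whenever $\beta > \mu$.

```python
def sis_stationary_fraction(beta, mu, i0=0.01, steps=20000, tol=1e-12):
    """Iterate i(t+1) = i + beta*i*(1 - i) - mu*i until it stops changing.

    Returns (fraction, steps_taken), with steps_taken set to None when no
    stationary state is reached within the allowed number of steps.
    """
    i = i0
    for step in range(steps):
        nxt = i + beta * i * (1.0 - i) - mu * i
        if abs(nxt - i) < tol:
            return nxt, step
        i = nxt
    return i, None


above, t_above = sis_stationary_fraction(beta=0.3, mu=0.1)     # ratio beta/mu = 3
below, t_below = sis_stationary_fraction(beta=0.05, mu=0.1)    # ratio beta/mu = 0.5
critical, t_critical = sis_stationary_fraction(beta=0.1, mu=0.1)  # ratio = 1
```

With these toy values, the first run settles on an endemic fraction of two thirds, the second decays to zero, and the third does not converge within the step budget because the decay at the critical point is only algebraic. This is one possible explanation for why a stationary state may not be observed near the threshold in a finite observation window.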
IV.2 Geographic Quarantine
We now analyze the effects of curbing the mobility between subprefectures, i.e., forbidding all the incoming and outgoing movement of a group of subprefectures. In order to do so, we calculate the centrality values of each subprefecture in the mobility matrix. We present the results for eigenvector centrality. As can be observed in Tab. 1, the rankings based on the other centralities are very similar. Then, for the quarantine operations, we select the subprefectures with the highest centrality values. From a practical point of view, quarantining a subprefecture $i$ is achieved by simply changing the $i$-th row and column in the mobility matrix, so that all the elements $m_{ij}$ and $m_{ji}$ are null, except for the diagonal element $m_{ii}$. For these scenarios, we randomly choose a single subprefecture where the initial individuals are infected, and then we average the stationary infected fraction and the time to reach the stationary state over all runs. As shown in Fig. 6, the fraction of the infected population is appreciably affected by this measure, as the population inside the quarantined areas is protected from contagion. However, contrary to intuition, the delay is not affected by the quarantine, even when the countermeasure involves 10 subprefectures, which account for almost half of the population. This suggests that such an invasive, expensive and hard-to-enforce measure considerably reduces the endemic size, but does not slow down the disease spreading in the rest of the country. For this reason, we now investigate a radically different approach to protect the population.
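The quarantine operation on the mobility matrix can be sketched as follows. The rows and columns of the quarantined subprefectures are zeroed and their diagonal entries are set to one, so that their residents stay in place. As an additional assumption not stated in the text, the rows of the remaining subprefectures are renormalized so that they still sum to one after the blocked destinations are removed. The function name and the toy matrix are hypothetical.

```python
def quarantine(M, blocked):
    """Return a copy of M with all movement to and from `blocked` removed."""
    n = len(M)
    blocked = set(blocked)
    Q = [row[:] for row in M]
    for i in blocked:
        for j in range(n):
            Q[i][j] = 0.0   # nobody leaves subprefecture i
            Q[j][i] = 0.0   # nobody enters subprefecture i
        Q[i][i] = 1.0       # residents of i stay in i
    for i in range(n):
        if i in blocked:
            continue
        total = sum(Q[i])
        if total > 0:
            # Assumption: redistribute the blocked probability mass proportionally.
            Q[i] = [value / total for value in Q[i]]
        else:
            Q[i] = [1.0 if j == i else 0.0 for j in range(n)]
    return Q


M = [[0.80, 0.10, 0.10],
     [0.20, 0.70, 0.10],
     [0.25, 0.25, 0.50]]
Q = quarantine(M, blocked=[0])
```

Other choices are possible for the renormalization, for example adding the blocked probability to the diagonal so that would-be travellers stay at home; the proportional rule is used here only to keep the sketch short.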
IV.3 Information Campaign (Social Immunization)
We now show how a collaborative information campaign could help in countering the spread of the disease, following the model we presented in the last section. We initialize the scenario by distributing the immunizing information to 1% of the population, randomly chosen regardless of their location. These people will be informed and will be instructed to spread the information. In other words, we assume that they will contact their social connections, according to the call matrix.
In Fig. 7 we show the density plots describing and for various values of (, for a subset of scenarios where , i.e., when the information that spreads among the population has the same chance to immunize a person and to involve the person in the spreading process. This is consistent with a scenario where the same set of people who become aware also become immunized by the information they have received. Blank squares show that a stationary state was not reached for the corresponding set of parameters. The figure shows how contagious (=) the immunizing information has to be with respect to how often people “forget” () in order to slow down the disease considerably and to reduce the endemic cases. When we fall back to the model without information spreading, and the value of does not affect and . For and the fraction of infected population goes to zero in all cases, because the number of people aware of the information does not decrease, thus increasing the number of new immunized individuals at each step. We can notice that even for low values of participation and for information that gives temporary immunization (), the final fraction of infected individuals is considerably lower than in the case where no countermeasures are taken.
In Figs. 8 and 9 we show the density plots for and when is constant. In particular, we analyze the scenario for (Fig. 8), which represents for example a scenario where the immunizing information is about vaccination campaigns (individuals who have been administered vaccination do not lose immunity). For every combination of parameters we have absence of endemic state even with the highest considered value of . The two parameters that represent how individuals are likely to get involved both in the immunization and in the information spreading ( and ) seem to have the same impact on the delay of the infection.
The value (Fig. 9) describes the scenario in which the information is about a good practice (e.g., boiling water, using mosquito nets, etc.), which loses its effectiveness or is abandoned by a person with rate . For this case we can notice that the fraction of infected population is independent of , as rows in the density plot are of the same color. This suggests that, for this scenario, the rate at which people lose immunity does not affect the size of the endemic state.
V Conclusions
In this paper we have presented a model that describes the spreading of disease in a population where individuals move between geographic areas, extracted from cellular network records. We have shown the evolution of the disease and we have evaluated two types of countermeasures, namely the quarantine of central geographic areas and a collaborative “viral” information campaign among the population, by inferring the underlying social structure from the call records.
Our future research agenda includes the investigation of analytical aspects of the model, such as the derivation of the critical reproductive ratio, i.e., the value that corresponds to the transition between an endemic and an endemic-free infection. Currently, the model is based on the assumption of a static mobility matrix: our goal is to refine the model by introducing time-dependent matrices, also exploring the application of the recent theoretical results related to temporal networks. We also plan to refine the model by introducing specific contact rates for each metapopulation, potentially based on more fine-grained information about the number of encounters and the number of calls of each individual, if available. Finally, we plan to explore hybrid countermeasures, such as concurrent partial restrictions of mobility and targeted information campaigns.
Acknowledgements.
The authors thank Charlotte Sophie Mayer for useful and fruitful discussions. This work was supported through the EPSRC Grant “The Uncertainty of Identity: Linking Spatiotemporal Information Between Virtual and Real Worlds” (EP/J005266/1).
Footnotes
1. Appeared in Proceedings of NetMob 2013. Boston, MA, USA. May 2013.
2. We use the term “developing” to indicate countries that are assigned a low Human Development Index (HDI) by United Nations Statistics Division. We are aware of the limitations of this classification. As reported by UN, the designations “developed” and “developing” are intended for statistical convenience and do not necessarily express a judgment about the stage reached by a particular country or area in the development process (8).
3. 17 subprefectures do not have any cell phone towers and for this reason do not appear in SET3. We discard these subprefectures from our analysis, since their users will be considered as belonging to nearby subprefectures.
4. We have found that 1.31% of the total number of edges in ego-networks connect pairs of users who are neither egos nor first-level neighbors: therefore, we do not consider such edges in our analysis.
5. In general, this matrix can be time-varying, and it can be adjusted according to seasonal trends or real-time data at each step, for example following estimates based on historical data. In particular, this matrix can be used to study the impact of policies in real time. However, in order to simplify the presentation, we use a matrix not changing over time. The treatment can be generalized, also applying the recent theoretical results related to time-varying networks (18); (19).
6. This assumption can also be relaxed when data about the different classes of individuals is available, i.e., when a matrix for each class can be defined.
References
(1) C. G. Helman et al., Culture, health and illness, 4th ed. (Arnold, Hodder Headline Group, London, United Kingdom, 2001).
(2) D. Lazer, A. S. Pentland, L. Adamic, S. Aral, A.-L. Barabási, D. Brewer, N. Christakis, N. Contractor, J. Fowler, M. Gutmann, et al., Life in the network: the coming age of computational social science, Science 323, 721 (2009).
(3) A. T. Campbell, S. B. Eisenman, N. D. Lane, E. Miluzzo, R. Peterson, H. Lu, X. Zheng, M. Musolesi, K. Fodor, and G.-S. Ahn, The Rise of People-Centric Sensing, IEEE Internet Computing Special Issue on Mesh Networks (2008).
(4) J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, Structure and tie strengths in mobile communication networks, Proceedings of the National Academy of Sciences 104, 7332 (2007).
(5) M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabási, Understanding individual human mobility patterns, Nature 453, 779 (2008).
(6) N. Eagle and A. Pentland, Eigenbehaviors: Identifying structure in routine, Behavioral Ecology and Sociobiology 63, 1057 (2009).
(7) J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers, Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute (2011).
(8) United Nations Statistics Division, Standard Country and Area Codes for Statistical Use.
(9) The health of the people: the African regional health report (World Health Organization, Regional Office for Africa, Brazzaville, Republic of Congo, 2013).
(10) V. Colizza, A. Barrat, M. Barthelemy, A.-J. Valleron, and A. Vespignani, Modeling the Worldwide Spread of Pandemic Influenza: Baseline Case and Containment Interventions, PLoS Med 4, e13 (2007).
(11) J. M. Epstein, D. M. Goedecke, F. Yu, R. J. Morris, D. K. Wagener, and G. V. Bobashev, Controlling Pandemic Flu: The Value of International Air Travel Restrictions, PLoS ONE 2, e401 (2007).
(12) S. Meloni, N. Perra, A. Arenas, S. Gómez, Y. Moreno, and A. Vespignani, Modeling human mobility responses to the large-scale spreading of infectious diseases, Scientific Reports 1 (2011).
(13) S. Funk, M. Salathé, and V. A. A. Jansen, Modelling the influence of human behaviour on the spread of infectious diseases: a review, Journal of The Royal Society Interface 7, 1257 (2010).
(14) V. D. Blondel, M. Esch, C. Chan, F. Clerot, P. Deville, E. Huens, F. Morlot, Z. Smoreda, and C. Ziemlicki, Data for Development: the D4D Challenge on Mobile Phone Data, arXiv preprint arXiv:1210.0137 (2012).
(15) J. R. Norris, Markov Chains (Cambridge University Press, Cambridge, United Kingdom, 1998).
(16) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008, P10008 (2008).
(17) M. J. Keeling and P. Rohani, Modeling infectious diseases in humans and animals (Princeton University Press, Princeton, NJ, 2011).
(18) J. Tang, S. Scellato, M. Musolesi, C. Mascolo, and V. Latora, Small-world Behavior in Time-varying Graphs, Physical Review E 81, 055101(R) (2010).
(19) P. Holme and J. Saramäki, Temporal networks, Physics Reports 519 (2012).
(20) CIA, The World Factbook, 2012.