Interactive Sensing in Social Networks
Abstract
This paper presents models and algorithms for interactive sensing in social networks where individuals act as sensors and the information exchange between individuals is exploited to optimize sensing. Social learning is used to model the interaction between individuals that aim to estimate an underlying state of nature. In this context the following questions are addressed: How can self-interested agents that interact via social learning achieve a tradeoff between individual privacy and reputation of the social group? How can protocols be designed to prevent data incest in online reputation blogs where individuals make recommendations? How can sensing by individuals that interact with each other be used by a global decision maker to detect changes in the underlying state of nature? When individual agents possess limited sensing, computation and communication capabilities, can a network of agents achieve sophisticated global behavior? Social and game-theoretic learning are natural settings for addressing these questions. This article presents an overview, insights and discussion of social learning models in the context of data incest propagation, change detection and coordination of decision making.
1 Introduction and Motivation
The proliferation of social media such as real time microblogging services (Twitter
Social sensing, also called participatory sensing [1, 2, 3, 4], is defined as a process where physical sensors present in mobile devices such as GPS are used to infer social relationships and human activities. In this paper, we work at a higher level of abstraction. We use the term social sensor or human-based sensor to denote an agent that provides information about its environment (state of nature) on a social network after interaction with other agents. Examples of such social sensors include Twitter posts, Facebook status updates, and ratings on online reputation systems like Yelp and Tripadvisor. Such social sensors go beyond physical sensors for social sensing. For example [5], user opinions/ratings (such as the quality of a restaurant) are available on Tripadvisor but are difficult to measure via physical sensors. Similarly, future situations revealed by the Facebook status of a user are impossible to predict using physical sensors.
Statistical inference using social sensors is relevant in a variety of applications
including localizing special events for targeted advertising [6, 7], marketing [8],
localization of natural disasters [9] and predicting sentiment of investors in financial markets [10, 11].
It is demonstrated in [12] that models built from the rate of tweets for particular products can outperform market-based predictors.
However, social sensors present unique challenges from a statistical estimation point of view.
First, social sensors interact with and influence other social sensors. For example, ratings posted on online reputation systems strongly influence the behaviour of individuals.
As is apparent from the above discussion, there is strong motivation to construct mathematical models that capture the dynamics of interactive sensing involving social sensors. Such models facilitate understanding the dynamics of information flow in social networks and therefore the design of algorithms that can exploit these dynamics to estimate the underlying state of nature.
In this paper, social learning [15, 16, 17] serves as a useful mathematical abstraction for modelling the interaction of social sensors. Social learning in multiagent systems seeks to answer the following question:
How do decisions made by agents affect decisions made by subsequent agents?
In social learning, each agent chooses its action by optimizing its local utility function. Subsequent agents then use their private observations together with the actions of previous agents to estimate (learn) an underlying state. The setup is fundamentally different from classical signal processing, in which sensors use noisy observations to compute estimates; in social learning, agents use noisy observations together with decisions made by previous agents to estimate the underlying state of nature.
In the last decade, social learning has been used widely in economics, marketing, political science and sociology to model the behavior of financial markets, crowds, social groups and social networks; see [15, 16, 18, 17, 19, 20] and numerous references therein. Related models have been studied in the context of sequential decision making in information theory [21, 22] and statistical signal processing [23, 24] in the electrical engineering literature.
Social learning models for interactive sensing can predict unusual behavior. Indeed, a key result in social learning of an underlying random variable is that rational agents eventually herd [16], that is, they eventually end up choosing the same action irrespective of their private observations. As a result, the actions contain no information about the private observations and so the Bayesian estimate of the underlying random variable freezes. For a multiagent sensing system, such behavior can be undesirable, particularly if individuals herd and make incorrect decisions.
Main Results and Organization
In the context of social learning models for interactive sensing, the main ideas and organization of this paper are as follows:
1. Social Learning Protocol:
Sec.2 presents a formulation and survey of the classical Bayesian social learning model which forms the mathematical basis for modelling interactive sensing amongst humans. We illustrate the social learning model in the context of Bayesian signal processing (for easy access to an electrical engineering audience). We then address how self-interested agents performing social learning can achieve useful behavior in terms of optimizing a social welfare function. Such problems are motivated by privacy issues in sensing. If an agent reveals less information in its decisions, it maintains its privacy; on the other hand, as part of a social group it has an incentive to optimize a social welfare function that helps estimate the state of nature.
2. Data Incest in Online Reputation Systems:
Sec.3 deals with the question:
How can data incest (misinformation propagation) be prevented in online reputation blogs where social sensors make recommendations?
In the classical social learning model, each agent acts once in a predetermined order. However, in online reputation systems such as Yelp or Tripadvisor which maintain logs of votes (actions) by agents, social learning takes place with information exchange over a loopy graph (where the agents form the vertices of the graph). Due to the loops in the information exchange graph, data incest (misinformation) can propagate: Suppose an agent wrote a poor rating of a restaurant on a social media site. Another agent is influenced by this rating, visits the restaurant, and then also gives a poor rating on the social media site. The first agent visits the social media site and notices that another agent has also given the restaurant a poor rating; this double confirms her rating and she enters another poor rating.
In a fair reputation system, such “double counting” or data incest should have been prevented by making the first agent aware that the rating of the second agent was influenced by her own rating.
Data incest results in a bias in the estimate of state of nature.
How can automated protocols be designed to prevent data incest and thereby maintain a fair online reputation system?
3. Interaction of Local and Global Decision Makers for Change Detection: Sec.4 deals with the question: In sensing where individual agents perform social learning to estimate an underlying state of nature, how can changes in the state of nature be detected?
Sec.4 considers a sensing problem that involves change detection. Such sensing problems arise in a variety of applications such as financial trading where individuals react to financial shocks [25]; marketing and advertising [26, 27] where consumers react to a new product; and localization of natural disasters (earthquakes and typhoons) [9].
For example, consider measurement of the adoption of a new product using a microblogging platform like Twitter. The adoption of the technology diffuses through the market but its effects can only be observed through the tweets of select members of the population. These selected members act as sensors for the parameter of interest.
Suppose the state of nature suddenly changes due to a sudden market shock or presence of a new competitor.
Based on the local actions of the multiagent system that is performing social learning, a global decision maker (such as a market monitor or technology manufacturer) needs to decide whether or not to declare if a change has occurred.
How can the global decision maker achieve such change detection to minimize a cost function composed of false alarm rate and delay penalty? The local and global decision makers interact, since the local decisions determine the posterior distribution of subsequent agents which determines the global decision (stop or continue) which determines subsequent local decisions.
We show that this social learning based change detection problem leads to unusual behavior. The optimal decision policy of the stopping time problem has multiple thresholds. This is unusual: if it is optimal to declare that a change has occurred based on the posterior probability of change, it may not be optimal to declare a change when the posterior probability of change is higher!
4. Coordination of Decisions as a Noncooperative Game: No discussion on social learning would be complete without mentioning game-theoretic methods.
A large body of research on social networks has been devoted to the diffusion of information (e.g., ideas, behaviors, trends) [28, 29], and particularly on finding a set of target nodes so as to maximize the spread of a given product [30, 31]. Often customers end up choosing a specific product among several competitors.
A natural approach to model this competitive process is via the use of noncooperative game theory [32, 33].
Game theory has traditionally been used in economics and social sciences with a focus on fully rational interactions where strong assumptions are made on the information patterns available to individual agents. In comparison, social sensors are agents with partial information and it is the dynamic interactions between agents that is of interest. This motivates the need for game theoretic learning models for agents interacting in social networks.
Sec.5 deals with the question:
When individuals are self-interested and possess limited sensing, computation and communication capabilities, can a network (social group) of sensors whose utility functions interact achieve sophisticated global behavior?
In Sec.5, we discuss a noncooperative game theoretic learning approach for adaptive decision making in social networks. This can be viewed as a non-Bayesian version of social learning. The aim is to ensure that all agents eventually choose actions from a common polytope of randomized strategies, namely, the set of correlated equilibria of a noncooperative game.
Correlated equilibria are a generalization of Nash equilibria and were introduced by Aumann [34].
Perspective
The social learning and game-theoretic learning formalisms mentioned above can be used either as descriptive tools, to predict the outcome of complex interactions amongst agents in sensing, or as prescriptive tools, to design social networks and sensing systems around given interaction rules. Information aggregation, misinformation propagation and privacy are important issues in sensing using social sensors. In this paper, we treat these issues in a highly stylized manner so as to provide easy accessibility to an electrical engineering audience. The underlying tools used in this paper are widely used by the electrical engineering research community in the areas of signal processing, control, information theory and network communications.
In Bayesian estimation, the twin effects of social learning (information aggregation with interaction amongst agents) and data incest (misinformation propagation) lead to nonstandard information patterns in estimating the underlying state of nature. Herding occurs when the public belief overrides the private observations and thus actions of agents are independent of their private observations. Data incest results in bias in the public belief as a consequence of the unintentional reuse of identical actions in the formation of public belief in social learning; the information gathered by each agent is mistakenly considered to be independent. This results in overconfidence and bias in estimates of the state of nature.
Privacy issues impose important constraints on social sensors. Typically, individuals are not willing to disclose private observations. Optimizing interactive sensing with privacy constraints is an important problem. Privacy and trust pose conflicting requirements on human-based sensing: privacy requirements result in noisier measurements or lower resolution actions, while maintaining a high degree of trust (reputation) requires accurate measurements. Utility functions, noisy private measurements and quantized actions are essential ingredients of the social and game-theoretic learning models presented in this paper that facilitate modelling this tradeoff between reputation and privacy.
The literature in the areas of social learning, sensing and networking is extensive. Due to page restrictions, in each of the following sections, we provide only a brief review of relevant works. Seminal books in social networks include [35, 36]. The book [17] contains a complete treatment of social learning models with several remarkable insights. For further references, we refer the reader to [37, 38, 39, 40, 41]. In [42], a nice description is given of how, if individual agents deploy simple heuristics, the global system behavior can achieve “rational” behavior. The related problem of achieving coherence (i.e., agents eventually choosing the same action or the same decision policy) among disparate sensors of decision agents without cooperation has also witnessed intense research; see [43] and [44]. Non-Bayesian social learning models are also studied in [45, 46].
2 Multiagent Social Learning
This section starts with a brief description of the classical social learning model. In this paper, we use social learning as the mathematical basis for modelling interaction of social sensors. A key result in social learning is that rational agents eventually herd, that is, they pick the same action irrespective of their private observation and social learning stops. To delay the effect of herding, and thereby enhance social learning, Chamley [17] (see also [47] for related work) has proposed a novel constrained optimal social learning protocol. We review this protocol which is formulated as a sequential stopping time problem. We show that the constrained optimal social learning proposed by Chamley [17] has a threshold switching curve in the space of public belief states. Thus the global decision to stop can be implemented efficiently in a social learning model.
2.1 Motivation: What is social learning?
We start with a brief description of the ‘vanilla’ social learning model.
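Consider a countable number of agents performing social learning to estimate the state of an underlying finite state Markov chain $x$. Let $\mathbb{X} = \{1, 2, \ldots, X\}$ denote a finite state space, $P$ the transition matrix and $\pi_0$ the initial distribution of the Markov chain.
Each agent acts once in a predetermined sequential order indexed by $k = 1, 2, \ldots$ The index $k$ can also be viewed as the discrete time instant when agent $k$ acts.
A multiagent system seeks to estimate $x$. Assume at the beginning of iteration $k$, all agents have access to the public belief $\pi_{k-1}$ defined in Step (iv) below.
The social learning protocol proceeds as follows [16, 17]:
(i) Private Observation: At time $k$, agent $k$ records a private observation $y_k \in \mathbb{Y}$ from the observation distribution $B_{iy} = P(y \mid x = i)$, $i \in \mathbb{X}$. Throughout this section we assume that $\mathbb{Y} = \{1, 2, \ldots, Y\}$ is finite.
(ii) Private Belief: Using the public belief $\pi_{k-1}$ available at time $k-1$ (defined in Step (iv) below), agent $k$ updates its private posterior belief $\eta_k(i) = P(x_k = i \mid a_1, \ldots, a_{k-1}, y_k)$ as the following Bayesian update (this is the classical Hidden Markov Model filter [48]):
$$\eta_k = \frac{B_{y_k} P' \pi_{k-1}}{\mathbf{1}_X' B_{y_k} P' \pi_{k-1}}, \qquad B_{y_k} = \mathrm{diag}\big(P(y_k \mid x = i),\ i \in \mathbb{X}\big). \tag{1}$$
Here $\mathbf{1}_X$ denotes the $X$-dimensional vector of ones, $\eta_k$ is an $X$-dimensional probability mass function (pmf) and $P'$ denotes transpose of the matrix $P$.
(iii) Myopic Action: Agent $k$ takes action $a_k \in \mathbb{A} = \{1, 2, \ldots, A\}$ to minimize its expected cost
$$a_k = \arg\min_{a \in \mathbb{A}} \mathbb{E}\{c(x, a) \mid a_1, \ldots, a_{k-1}, y_k\} = \arg\min_{a \in \mathbb{A}} \{c_a' \eta_k\}. \tag{2}$$
Here $c_a = (c(i, a),\ i \in \mathbb{X})$ denotes an $X$-dimensional cost vector, and $c(i, a)$ denotes the cost incurred when the underlying state is $i$ and the agent chooses action $a$.
Agent $k$ then broadcasts its action $a_k$ to subsequent agents.
(iv) Social Learning Filter: Given the action $a_k$ of agent $k$, and the public belief $\pi_{k-1}$, each subsequent agent $k' > k$ performs social learning to compute the public belief $\pi_k$ according to the following “social learning filter”:
$$\pi_k = T(\pi_{k-1}, a_k), \quad \text{where } T(\pi, a) = \frac{R^\pi_a P' \pi}{\sigma(\pi, a)}, \tag{3}$$
and $\sigma(\pi, a) = \mathbf{1}_X' R^\pi_a P' \pi$ is the normalization factor of the Bayesian update. In (3), the public belief $\pi_k(i) = P(x_k = i \mid a_1, \ldots, a_k)$ and $R^\pi_a = \mathrm{diag}\big(P(a \mid x = i, \pi),\ i \in \mathbb{X}\big)$ has elements
$$P(a_k = a \mid x_k = i, \pi_{k-1} = \pi) = \sum_{y \in \mathbb{Y}} P(a \mid y, \pi)\, P(y \mid x_k = i), \tag{4}$$
$$P(a_k = a \mid y, \pi) = \begin{cases} 1 & \text{if } c_a' B_y P' \pi \le c_{\tilde a}' B_y P' \pi \text{ for all } \tilde a \in \mathbb{A}, \\ 0 & \text{otherwise.} \end{cases}$$
The derivation of the social learning filter (3) is given in the discussion below.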
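The four steps of the protocol can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names, the zero-based indexing of states, observations and actions, and all numerical values (transition matrix `P`, observation matrix `B`, cost matrix `C`, prior `pi0`) are assumptions chosen only to make the example concrete. Ties in the minimization are broken in favour of the lowest action index.

```python
import numpy as np

def private_belief(pi, y, P, B):
    """Step (ii): HMM filter update, eta proportional to B_y P' pi."""
    unnormalized = B[:, y] * (P.T @ pi)
    return unnormalized / unnormalized.sum()

def myopic_action(eta, C):
    """Step (iii): argmin over a of c_a' eta, where C[i, a] = c(i, a)."""
    return int(np.argmin(C.T @ eta))

def action_likelihood(pi, P, B, C):
    """R[i, a] = P(a | x = i, pi): probability mass of the observations that map to action a."""
    num_states, num_obs = B.shape
    R = np.zeros((num_states, C.shape[1]))
    for y in range(num_obs):
        a = myopic_action(private_belief(pi, y, P, B), C)
        R[:, a] += B[:, y]
    return R

def social_learning_filter(pi, a, P, B, C):
    """Step (iv): public belief update, pi_k proportional to R_a^pi P' pi."""
    R = action_likelihood(pi, P, B, C)
    unnormalized = R[:, a] * (P.T @ pi)
    return unnormalized / unnormalized.sum()

# Illustrative two-state example; every number below is an assumption.
P = np.eye(2)                       # state is a random variable (no dynamics)
B = np.array([[0.8, 0.2],           # B[i, y] = P(y | x = i)
              [0.2, 0.8]])
C = 1.0 - np.eye(2)                 # cost 1 for a wrong guess, 0 for a correct one
pi0 = np.array([0.5, 0.5])          # public belief before the first agent acts

eta1 = private_belief(pi0, 0, P, B)             # agent 1 observes y = 0
a1 = myopic_action(eta1, C)                     # agent 1 broadcasts its action
pi1 = social_learning_filter(pi0, a1, P, B, C)  # everyone updates the public belief
```

With these numbers the first action reveals the first observation exactly, so the public belief after agent 1 equals agent 1's private belief, `[0.8, 0.2]`. Note that `action_likelihood` depends on the current public belief `pi`, which is the feature that distinguishes the social learning filter from a standard Bayesian filter.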
2.2 Discussion
Let us pause to give some intuition about the above social learning protocol.
1. Information Exchange Structure: Fig.1 illustrates the above social learning protocol in which the information exchange is sequential. Agents send their hard decisions
(actions) to subsequent agents.
In the social learning protocol we have assumed that each agent acts once. Another way of viewing the social learning protocol
is that there are finitely many agents that act repeatedly in some predefined order. If each
agent chooses its local decision using the current public belief, then the setting is identical to the social learning
setup. We also refer the reader to [18] for several recent results in social learning
over several types of network adjacency matrices.
2. Filtering with Hard Decisions: Social learning can be viewed as agents making hard decision estimates at each time and sending these estimates to subsequent agents. In conventional Bayesian state estimation, a soft decision is made, namely, the posterior distribution (or equivalently, observation) is sent to subsequent agents. For example, if the action space equals the state space, $\mathbb{A} = \mathbb{X}$, and the costs are chosen as $c_a = -e_a$ where $e_a$ denotes the unit indicator with 1 in the $a$-th position, then $\arg\min_a c_a' \pi = \arg\max_a \pi(a)$, i.e., the maximum a posteriori probability (MAP) state estimate. For this example, social learning is equivalent to agents sending the hard MAP estimates to subsequent agents.
Note that rather than sending a hard decision estimate, if each agent chooses its action $a_k = y_k$ (that is, agents send their private observations), then the right-hand side of (4) becomes $\sum_{y \in \mathbb{Y}} I(y = y_k)\, P(y \mid x_k = i) = P(y_k \mid x_k = i)$ and so the problem becomes a standard Bayesian filtering problem.
4. Dependence of Observation Likelihood on Prior: The most unusual feature of the above protocol (to a signal processing audience) is the social learning filter (3). In standard state estimation via a Bayesian filter, the observation likelihood given the state is completely parametrized by the observation noise distribution and is functionally independent of the current prior distribution. In the social learning filter, the likelihood of the action given the state (which is denoted by $R^\pi_a$) is an explicit function of the prior $\pi$! Not only does the action likelihood depend on the prior, but it is also a discontinuous function, due to the presence of the $\arg\min$ in (2).
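5. Derivation of Social Learning Filter:
The derivation of the social learning filter (3) is as follows: Define the posterior as $\pi_k(j) = P(x_k = j \mid a_1, \ldots, a_k)$. Then, by Bayes' rule and the Markov property,
$$\pi_k(j) = \frac{1}{\sigma(\pi_{k-1}, a_k)}\, P(a_k \mid x_k = j, a_1, \ldots, a_{k-1}) \sum_{i} P(x_k = j \mid x_{k-1} = i)\, P(x_{k-1} = i \mid a_1, \ldots, a_{k-1})$$
$$= \frac{1}{\sigma(\pi_{k-1}, a_k)} \sum_{y} P(a_k \mid y_k = y, \pi_{k-1})\, P(y_k = y \mid x_k = j) \sum_{i} P_{ij}\, \pi_{k-1}(i),$$
where the normalization term is
$$\sigma(\pi_{k-1}, a_k) = \sum_{j} \sum_{y} P(a_k \mid y_k = y, \pi_{k-1})\, P(y_k = y \mid x_k = j) \sum_{i} P_{ij}\, \pi_{k-1}(i).$$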
The above social learning protocol and social learning filter (3) result in interesting dynamics in state estimation and decision making. We will illustrate two interesting consequences that are unusual to an electrical engineering audience:
2.3 Rational Agents form Information Cascades
The first consequence of the unusual nature of the social learning filter (3) is that social learning can result in multiple rational agents taking the same action independently of their observations. To illustrate this behavior, throughout this subsection, we assume that $x$ is a finite state random variable (instead of a Markov chain) with prior distribution $\pi_0$.
We start with the following definitions; see also [17]:

An individual agent $k$ herds on the public belief $\pi_{k-1}$ if it chooses its action $a_k$ in (2) independently of its observation $y_k$.

A herd of agents takes place at time $\bar{k}$, if the actions of all agents after time $\bar{k}$ are identical, i.e., $a_k = a_{\bar{k}}$ for all time $k > \bar{k}$.

An information cascade occurs at time $\bar{k}$, if the public beliefs of all agents after time $\bar{k}$ are identical, i.e. $\pi_k = \pi_{\bar{k}}$ for all $k > \bar{k}$.
Note that if an information cascade occurs, then since the public belief freezes, social learning ceases. Also from the above definitions it is clear that an information cascade implies a herd of agents, but the reverse is not true; see Sec.4.3 for an example.
The following result, which is well known in the economics literature [16, 17], states that if agents follow the above social learning protocol, then after some finite time $\bar{k}$, an information cascade occurs.
Theorem 2.1 ([16])
The social learning protocol described in Sec.2.1 leads to an information cascade in finite time with probability 1. That is, there exists a finite time $\bar{k}$ after which social learning ceases, i.e., public belief $\pi_{k+1} = \pi_k$, $k \ge \bar{k}$, and all agents choose the same action, i.e., $a_{k+1} = a_k$, $k \ge \bar{k}$.
Instead of reproducing the proof, let us give some insight as to why Theorem 2.1 holds. It can be shown using martingale methods that at some finite time $k = \bar{k}$, the agent’s probability $P(a_k \mid y_k, \pi_{k-1})$ becomes independent of the private observation $y_k$. Then clearly from (4), $P(a_k = a \mid x_k = i, \pi_{k-1}) = P(a_k = a \mid \pi_{k-1})$. Substituting this into the social learning filter (3), we see that $\pi_k = \pi_{k-1}$. Thus after some finite time $\bar{k}$, the social learning filter hits a fixed point and social learning stops. As a result, all subsequent agents $k > \bar{k}$ completely disregard their private observations and take the same action $a_{\bar{k}}$, thereby forming an information cascade (and therefore a herd).
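The freezing of the public belief can be seen in a small numerical experiment. The sketch below assumes a two-state random variable (so the transition matrix is the identity), zero-based indices, and illustrative values for the observation matrix `B`, the cost matrix `C`, the prior and the observation sequence; none of these come from the paper. The helper names are hypothetical.

```python
import numpy as np

B = np.array([[0.7, 0.3],    # B[i, y] = P(y | x = i); illustrative values
              [0.3, 0.7]])
C = 1.0 - np.eye(2)          # c(i, a): cost 1 for a wrong guess, 0 otherwise
prior = np.array([0.6, 0.4]) # illustrative prior on the state

def myopic_action(pi, y):
    """Action (2) given public belief pi and private observation y (P = I)."""
    eta = B[:, y] * pi                   # unnormalized private belief
    return int(np.argmin(C.T @ eta))     # normalization does not change the argmin

def public_update(pi, a):
    """Social learning filter (3) with P = I; also reports whether the agent herded."""
    actions = [myopic_action(pi, y) for y in range(B.shape[1])]
    # Action likelihood: total probability of the observations that lead to action a.
    R_a = sum(B[:, y] for y in range(B.shape[1]) if actions[y] == a)
    herding = len(set(actions)) == 1     # action is the same for every observation
    unnormalized = R_a * pi
    return unnormalized / unnormalized.sum(), herding

observations = [0, 1, 1, 0, 1]           # hypothetical private observations
actions, beliefs, herd_flags = [], [prior], []
for y in observations:
    a = myopic_action(beliefs[-1], y)
    new_pi, herding = public_update(beliefs[-1], a)
    actions.append(a)
    beliefs.append(new_pi)
    herd_flags.append(herding)
```

In this run the first agent's action is informative and moves the public belief to $(7/9, 2/9)$. From the second agent onward, the action is 0 whatever the private observation is, so the action likelihood is the same for both states and the public belief no longer changes, even though three of the five private observations point to the other state.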
2.4 Constrained Interactive Sensing: Individual Privacy vs Group Reputation
The above social learning protocol can be interpreted as follows. Agents seek to estimate an underlying state of nature but reveal their actions by maximizing their privacy according to the optimization (2). This leads to an information cascade and social learning stops. In other words, agents are interested in optimizing their own costs (such as maximizing privacy) and ignore the information benefits their action provides to others.
Partially Observed Markov Decision Process Formulation
We now describe an optimized social learning procedure
that
delays herding.
Suppose agents seek to maximize the reputation of their social group by minimizing the following social welfare cost involving all agents in the social group (compared to the myopic objective (2) used in standard social learning):
$$J_\mu(\pi_0) = \mathbb{E}^\mu_{\pi_0} \Big\{ \sum_{k=1}^{\infty} \rho^{k-1}\, c_{a_k}' \eta_k \Big\}. \tag{5}$$
In (5), $\mu$ denotes the decision rule that agents use to choose their actions as will be explained below. Also $\rho \in [0, 1)$ is an economic discount factor and $\pi_0$ denotes the initial probability (prior) of the state $x$. $\mathbb{P}^\mu_{\pi_0}$ and $\mathbb{E}^\mu_{\pi_0}$ denote the probability measure and expectation of the evolution of the observations and underlying state which are strategy dependent.
The key attribute of (5) is that each agent $k$ chooses its action according to the privacy constrained rule
$$a_k = a\big(\pi_{k-1}, y_k, \mu(\pi_{k-1})\big). \tag{6}$$
Here, the policy $\mu$ maps the available public belief $\pi_{k-1}$ to the set of privacy values. The higher the privacy value, the less the agent reveals through its action. This is in contrast to standard social learning (2) in which the action chosen is $a(\pi_{k-1}, y_k)$, namely a myopic function of the private observation and public belief.
The above formulation can be interpreted as follows: Individual agents seek to maximize their privacy according to social learning (6) but also seek to maximize the reputation of their entire social group (5).
Determining the policy $\mu^*$ that minimizes (5), and thereby maximizes the social group reputation, is equivalent to solving a stochastic control problem that is called a partially observed Markov decision process (POMDP) problem [40, 51]. A POMDP comprises a noisily observed Markov chain, and the dynamics of the posterior distribution (belief state) are controlled by a policy ($\mu$ in our case).
Structure of Privacy Constrained Sensing Policy
In general, POMDPs are computationally intractable to solve and therefore one cannot say anything useful about the structure of the optimal policy $\mu^*$. However, useful insight can be obtained by considering the following extreme case of the above problem. Suppose there are two privacy values and each agent $k$ chooses action
$$a_k = \begin{cases} y_k & \text{if } \mu(\pi_{k-1}) = \text{no privacy}, \\ \arg\min_{a} c_a' \pi_{k-1} & \text{if } \mu(\pi_{k-1}) = \text{full privacy}. \end{cases}$$
That is, an agent either reveals its raw observation (no privacy) or chooses its action by completely neglecting its observation (full privacy). Once an agent chooses the full privacy option, then all subsequent agents choose exactly the same option and therefore herd; this follows since each agent’s action reveals nothing about the underlying state of nature. Therefore, for this extreme example, determining the optimal policy $\mu^*$ is equivalent to solving a stopping time problem: Determine the earliest time for agents to herd (maintain full privacy) subject to maximizing the social group reputation.
For such a quickest herding stopping time problem, one can say a lot about the structure of $\mu^*$. Suppose the sensing system wishes to determine if the state of nature is a specific target state (say state 1). Then [40] shows that under reasonable conditions on the observation distribution and supermodular conditions on the costs ([30] discusses supermodularity of influence in social networks), the dynamic programming recursion has a supermodular structure (see also [52, 53, 54, 41, 55] for related results). This implies that the optimal policy has the following structure: There exists a threshold curve that partitions the belief space such that when the belief state is on one side of the curve it is optimal for agents to reveal full observations; if the belief state is on the other side of the curve then it is optimal to herd. Moreover, the target state 1 belongs to the region in which it is optimal to herd.
Thus, if individual agents follow the simple heuristic “Choose increased privacy when belief is close to target state”, then the group behavior is sophisticated: herding is delayed and accurate estimates of the state of nature can be obtained.
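The threshold structure can be illustrated with a toy two-state simulation in which the belief space is one-dimensional and the threshold curve reduces to a single number. This is only a sketch under loud assumptions: the threshold value 0.9 is arbitrary (the optimal threshold would come from the dynamic programming recursion, which is not computed here), the observation matrix and observation sequence are made up, indices are zero-based so that state 0 plays the role of the target state, and `run_threshold_protocol` is a hypothetical helper name.

```python
import numpy as np

def run_threshold_protocol(pi0, observations, B, threshold):
    """Agents reveal raw observations until the belief in the target state
    (index 0) reaches the threshold; after that they herd (full privacy)."""
    pi = np.asarray(pi0, dtype=float)
    modes, beliefs = [], []
    for y in observations:
        if pi[0] >= threshold:
            # Full privacy: the action ignores y, so the public belief is unchanged.
            modes.append("herd")
        else:
            # No privacy: the action equals y, so this is a standard Bayesian update.
            pi = B[:, y] * pi
            pi = pi / pi.sum()
            modes.append("reveal")
        beliefs.append(pi)
    return modes, beliefs

B = np.array([[0.7, 0.3],     # B[i, y] = P(y | x = i); illustrative values
              [0.3, 0.7]])
modes, beliefs = run_threshold_protocol(
    pi0=[0.5, 0.5], observations=[0, 0, 0, 1, 0], B=B, threshold=0.9)
```

Here three observations are revealed before the belief in the target state crosses the assumed threshold, after which the remaining agents herd and the public belief stays at $(343/370, 27/370)$. Raising the threshold delays herding further at the price of less privacy.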
3 Data Incest in Online Reputation Systems
This section generalizes the previous section by considering social learning in a social network. How can multiple social sensors interacting over a social network estimate an underlying state of nature? The state could be the position coordinates of an event [9] or the quality of a social parameter such as quality of a restaurant or political party.
The motivation for this section can be understood in terms of the following sensing example. Consider the following interactions in a multiagent social network where agents seek to estimate an underlying state of nature. Each agent visits a restaurant based on reviews on an online reputation website. The agent then obtains a private measurement of the state (e.g., the quality of food in a restaurant) in noise. After that, he reviews the restaurant on the same online reputation website. The information exchange in the social network is modeled by a directed graph. As mentioned in the introduction, data incest [57] arises due to loops in the information exchange graph. This is illustrated in the graph of Fig.2. Agents 1 and 2 exchange beliefs (or actions) as depicted in Fig.2. The fact that there are two distinct paths between Agent 1 at time 1 and Agent 1 at time 3 (these two paths are denoted in red) implies that the information of Agent 1 at time 1 is double counted thereby leading to a data incest event.
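The effect of double counting can be shown with a deliberately simplified calculation. The sketch below is not the social learning model of this section: it assumes agents exchange observation likelihoods directly, and the likelihood values and the helper `fuse` are hypothetical. It only contrasts a fused estimate that uses each piece of information once with one that uses Agent 1's information twice.

```python
import numpy as np

def fuse(prior, likelihoods):
    """Bayesian fusion that treats every likelihood in the list as independent."""
    posterior = np.asarray(prior, dtype=float)
    for likelihood in likelihoods:
        posterior = posterior * likelihood
    return posterior / posterior.sum()

prior = np.array([0.5, 0.5])
lik_agent1 = np.array([0.7, 0.3])   # hypothetical P(y_1 | x) for Agent 1's observation
lik_agent2 = np.array([0.4, 0.6])   # hypothetical P(y_2 | x) for Agent 2's observation

# Fair estimate: each observation is used exactly once.
fair = fuse(prior, [lik_agent1, lik_agent2])

# Incestuous estimate: Agent 1's observation arrives along two paths of the
# information exchange graph and is mistakenly treated as two independent pieces.
incest = fuse(prior, [lik_agent1, lik_agent1, lik_agent2])
```

With these numbers the fair posterior on the first state is $14/23 \approx 0.61$, while the double-counted posterior is $0.784$: the estimate is biased toward Agent 1's view and is overconfident.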
How can data incest be removed so that agents obtain a fair (unbiased) estimate of the underlying state? The methodology of this section can be interpreted in terms of the recent Time article [58] which provides interesting rules for online reputation systems. These include: (i) review the reviewers, and (ii) censor fake (malicious) reviewers. The data incest removal algorithm proposed in this paper can be viewed as “reviewing the reviews” of other agents to see if they are associated with data incest or not.
The rest of this section is organized as follows:

Sec.3.1 describes the social learning model that is used to mimic the behavior of agents in online reputation systems. The information exchange between agents in the social network is formulated on a family of time dependent directed acyclic graphs.

In Sec.3.2, a fair reputation protocol is presented and the criterion for achieving a fair rating is defined.

Sec.3.3 presents an incest removal algorithm so that the online reputation system achieves a fair rating. A necessary and sufficient condition is given on the graph structure of information exchange between agents so that a fair rating is achievable.
Related works. Collaborative recommendation systems are reviewed and studied in [59, 60]. In [61], a model of Bayesian social learning is considered in which agents receive private information about the state of nature and observe actions of their neighbors in a tree-based network. Another type of misinformation caused by influential agents (agents who heavily affect actions of other agents in social networks) is investigated in [18]. Misinformation in the context of this paper is motivated by sensor networks where the term “data incest” is used [62]. Data incest also arises in Belief Propagation (BP) algorithms [63, 64] which are used in computer vision and error-correcting coding theory. BP algorithms require passing local messages over the graph (Bayesian network) at each iteration. For graphical models with loops, BP algorithms are only approximate due to the overcounting of local messages [65] which is similar to data incest in social learning. With the algorithms presented in this section, data incest can be mitigated from Bayesian social learning over non-tree graphs that satisfy a topological constraint. The closest work to the current paper is [57]. However, in [57], data incest is considered in a network where agents exchange their private belief states; that is, no social learning is considered. Simpler versions of this information exchange process and estimation were investigated in [66, 67, 68].
3.1 Information Exchange Graph in Social Network
Consider an online reputation system comprised of $S$ social sensors that aim to estimate an underlying state of nature (a random variable). Let $x$ represent the state of nature (such as the quality of a hotel) with known prior distribution $\pi_0$. Let $k = 1, 2, 3, \ldots$ depict epochs at which events occur. These events involve taking observations, evaluating beliefs and choosing actions as described below. The index $k$ marks the historical order of events and not necessarily absolute time. However, for simplicity, we refer to $k$ as “time”.
To model the information exchange in the social network, we will use a family of directed acyclic graphs. It is convenient also to reduce the coordinates of time $k$ and agent $s$ to a single integer index $n$ which is used to represent agent $s$ at time $k$:
$$n = s + S(k - 1), \quad s \in \{1, \ldots, S\}, \quad k = 1, 2, 3, \ldots \tag{7}$$
We refer to $n$ as a “node” of a time dependent information flow graph that we now define.
Some Graph Theoretic Definitions
Let
$$G_n = (V_n, E_n), \quad n = 1, 2, \ldots \tag{8}$$
denote a sequence of time-dependent graphs of information flow in the social network until and including time $k$ where $n = s + S(k - 1)$. Each vertex in $V_n$ represents an agent $s'$ in the social network at time $k'$ and each edge $(n', n'')$ in $E_n \subseteq V_n \times V_n$ shows that the information (action) of node $n'$ (agent $s'$ at time $k'$) reaches node $n''$ (agent $s''$ at time $k''$). It is clear that the communication graph $G_n$ is a subgraph of $G_{n+1}$. This means that the diffusion of actions can be modelled via a family of time-dependent directed acyclic graphs (directed graphs with no directed cycles).
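The algorithms below will involve specific columns of the adjacency matrix and transitive closure matrix of the graph $G_n$. The adjacency matrix $A_n$ of $G_n$ is an $n \times n$ matrix with elements $A_n(i, j)$ given by
$$A_n(i, j) = \begin{cases} 1 & \text{if } (v_i, v_j) \in E_n, \\ 0 & \text{otherwise,} \end{cases} \qquad A_n(i, i) = 0. \tag{9}$$
The transitive closure matrix $T_n$ is the $n \times n$ matrix
$$T_n = \operatorname{sgn}\big((I_n - A_n)^{-1}\big), \tag{10}$$
where for any matrix $M$, the matrix $\operatorname{sgn}(M)$ has elements
$$\operatorname{sgn}(M)(i, j) = \begin{cases} 0 & \text{if } M(i, j) = 0, \\ 1 & \text{if } M(i, j) \neq 0. \end{cases}$$
Note that $A_n(i, j) = 1$ if there is a single-hop path between nodes $i$ and $j$. In comparison, $T_n(i, j) = 1$ if there exists a path (possibly multi-hop) between nodes $i$ and $j$.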
The information reaching node $n$ depends on the information flow graph $G_n$. The following two sets will be used to specify the incest removal algorithms below:
$$\mathcal{H}_n = \{m : A_n(m,n) = 1\} \tag{11}$$
$$\mathcal{F}_n = \{m : T_n(m,n) = 1\}. \tag{12}$$
Thus $\mathcal{H}_n$ denotes the set of previous nodes $m$ that communicate with node $n$ in a single hop. In comparison, $\mathcal{F}_n$ denotes the set of previous nodes $m$ whose information eventually arrives at node $n$. Thus $\mathcal{F}_n$ contains all possible multihop connections by which information from a node $m$ eventually reaches node $n$.
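The two sets in (11) and (12) can be computed directly from an adjacency matrix. The following is a minimal sketch, not the authors' implementation: it swaps the matrix inverse of (10) for Warshall's reachability algorithm, which gives the same 0/1 matrix for an acyclic graph, and it restricts $\mathcal{F}_n$ to previous nodes $m < n$. The function names and the six-edge graph are hypothetical and are not the graph shown in the figures.

```python
def transitive_closure(adj):
    """Return T with T[i][j] = 1 iff i == j or a directed path i -> j exists."""
    n = len(adj)
    # Start from I + A: each node reaches itself and its single-hop successors.
    reach = [[1 if (i == j or adj[i][j]) else 0 for j in range(n)] for i in range(n)]
    # Warshall's algorithm: allow node k as an intermediate hop.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def single_hop_set(adj, n):
    """H_n of (11): previous nodes m (1-based labels) with A_n(m, n) = 1."""
    return {m + 1 for m in range(n - 1) if adj[m][n - 1]}


def multi_hop_set(adj, n):
    """F_n of (12), restricted to previous nodes m < n (T has a unit diagonal)."""
    closure = transitive_closure(adj)
    return {m + 1 for m in range(n - 1) if closure[m][n - 1]}


# Hypothetical graph (an assumption for illustration only):
# edges 1->3, 2->3, 1->4, 2->4, 3->5, 4->5.
EDGES = [(1, 3), (2, 3), (1, 4), (2, 4), (3, 5), (4, 5)]
A5 = [[0] * 5 for _ in range(5)]
for src, dst in EDGES:
    A5[src - 1][dst - 1] = 1

print(single_hop_set(A5, 5))  # nodes that reach node 5 in one hop
print(multi_hop_set(A5, 5))   # nodes that reach node 5 over any number of hops
```

In this assumed graph, nodes 1 and 2 reach node 5 only through nodes 3 and 4, so they appear in the multihop set but not in the single-hop set.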
Example
To illustrate the above notation consider a social network consisting of $S = 2$ groups with the following information flow graph for three time points $k = 1,2,3$.
Fig.3 shows the nodes $n = 1,2,\ldots,5$ where $n = s + 2(k-1)$.
Note that in this example, as is apparent from Fig.2, each node remembers all its previous actions.
The information flow is characterized by
the family of directed acyclic graphs with adjacency matrices
,
,
,
Since nodes 1 and 2 do not communicate, clearly $A_1$ and $A_2$ are zero matrices. Nodes 1 and 3 communicate as do nodes 2 and 3, hence $A_3$ has two ones, etc. Finally, (11) and (12) give the sets $\mathcal{H}_5$ and $\mathcal{F}_5$,
where $\mathcal{H}_5$ denotes all one hop links to node 5 while $\mathcal{F}_5$ denotes all multihop links to node 5.
Note that $A_n$ is always the upper left $n \times n$ submatrix of $A_{n+1}$. Also due to causality with respect to the time index $k$, the adjacency matrices are always upper triangular.
3.2 Fair Online Reputation System
Protocol for Fair Online Reputation System
The procedure summarized in Protocol 1 aims to evaluate a fair reputation that uses social learning over a social network by eliminating incest.
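(i) Information from Social Network:
• Recommendation from friends: Node $n$ receives past actions $\{a_m, m \in \mathcal{H}_n\}$ from previous nodes $m \in \mathcal{H}_n$ in the social network. $\mathcal{H}_n$ is defined in (11).
• Automated Recommender System: For these past actions $\{a_m, m \in \mathcal{H}_n\}$, the network administrator has already computed the public beliefs $(\pi_m, m \in \mathcal{H}_n)$ using Step (v) below.
The automated recommender system fuses the public beliefs $(\pi_m, m \in \mathcal{H}_n)$ into the single recommendation belief $\pi_{n-}$ as
$$\pi_{n-} = \mathcal{A}(\pi_m, m \in \mathcal{H}_n). \tag{13}$$
The fusion algorithm $\mathcal{A}$ will be designed below.
(ii) Observation: Node $n$ records private observation $y_n$ from distribution $B_{iy} = P(y \mid x = i)$, $i \in \mathbb{X}$.
(iii) Private Belief: Node $n$ then uses $y_n$ and public belief $\pi_{n-}$ to update its private belief via Bayes formula as
$$\eta_n = \frac{B_{y_n} \pi_{n-}}{\mathbf{1}_X' B_{y_n} \pi_{n-}}. \tag{14}$$
(iv) Myopic Action: Node $n$ takes action
$$a_n = \arg\min_{a} \, c_a' \eta_n$$
and inputs its action to the online reputation system.
(v) Public Belief Update by Network Administrator: Based on action $a_n$, the network administrator (automated algorithm) computes the public belief $\pi_n$ using the social learning filter (3) with $P = I$.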
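Aim: Our aim is to design algorithm $\mathcal{A}$ in the automated recommender system (13) of Protocol 1 so that the following requirement is met:
$$\pi_{n-}(i) = \pi^0_{n-}(i), \; i \in \mathbb{X}, \quad \text{where } \pi^0_{n-}(i) = P(x = i \mid a_m, m \in \mathcal{F}_n). \tag{15}$$
We call $\pi^0_{n-}$ in (15) the true or fair online rating available to node $n$ since $\mathcal{F}_n = \{m : T_n(m,n) = 1\}$ defined in (12) denotes all information (multihop links) available to node $n$. By definition $\pi^0_{n-}$ is incest free since it is the desired conditional probability that we want. If algorithm $\mathcal{A}$ is designed so that $\pi_{n-}(i)$ satisfies (15), then the computation (14) and Step (v) yield
$$\eta_n(i) = P(x = i \mid y_n, a_m, m \in \mathcal{F}_n), \qquad \pi_n(i) = P(x = i \mid a_n, a_m, m \in \mathcal{F}_n), \quad i \in \mathbb{X},$$
which are, respectively, the correct private belief for node $n$ and the correct after-action public belief.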
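Steps (ii) to (iv) of Protocol 1 amount to one Bayes update followed by an expected-cost minimization. The sketch below illustrates them under stated assumptions: the likelihood matrix `obs_lik[i][y]`, the cost matrix `cost[i][a]`, and all numerical values are hypothetical, and the recommendation belief is simply taken as given rather than produced by the fusion algorithm.

```python
def private_belief(public_belief, obs_lik, y):
    """Bayes update of step (iii): eta(i) is proportional to P(y | x=i) * pi(i)."""
    unnorm = [obs_lik[i][y] * p for i, p in enumerate(public_belief)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def myopic_action(belief, cost):
    """Step (iv): action index a minimising sum_i cost[i][a] * belief[i]."""
    n_actions = len(cost[0])
    expected = [sum(cost[i][a] * belief[i] for i in range(len(belief)))
                for a in range(n_actions)]
    return min(range(n_actions), key=expected.__getitem__)


# Hypothetical two-state example (values are assumptions, not from the paper).
OBS_LIK = [[0.8, 0.2],   # P(y | x = state 0)
           [0.3, 0.7]]   # P(y | x = state 1)
COST = [[0.0, 1.0],      # cost[i][a]: zero cost when the action matches the state
        [1.0, 0.0]]
recommendation = [0.5, 0.5]           # stands in for the fused public belief
eta = private_belief(recommendation, OBS_LIK, y=0)
action = myopic_action(eta, COST)
print(eta, action)
```

The action, not the private belief, is what the node reports back to the reputation system, which is why later nodes must infer information from actions alone.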
Discussion of Protocol 1
(i) Data Incest: It is important to note that without careful design of algorithm $\mathcal{A}$, due to loops in the dependencies of actions on previous actions, the public rating $\pi_{n-}$ computed using (13) can be substantially different from the fair online rating $\pi^0_{n-}$ of (15). As a result, $\eta_n$ computed via (14) will not be the correct private belief and incest will propagate in the network. In other words, $\eta_n$, $\pi_{n-}$ and $\pi_n$ are defined purely in terms of their computational expressions in Protocol 1; at this stage they are not necessarily the desired conditional probabilities, unless algorithm $\mathcal{A}$ is designed to remove incest.
Note that instead of (13), node $n$ could naively (and incorrectly) assume that the public beliefs $\pi_m$, $m \in \mathcal{H}_n$, that it received are independent. It would then fuse these public beliefs as
$$\pi_{n-} = \frac{\prod_{m \in \mathcal{H}_n} \pi_m}{\mathbf{1}_X' \prod_{m \in \mathcal{H}_n} \pi_m}, \tag{16}$$
where the product is taken elementwise. This, of course, would result in data incest.
(ii) How much does an individual remember?: The above protocol has the flexibility of modelling cases where
either each node remembers some (or all) of its past actions or none of its past actions. This facilitates modelling cases in which
people forget most of the past except for specific highlights.
(iii) Automated Recommender System: Steps (i) and (v) of Protocol 1 can be combined into an automated recommender system that maps previous actions of agents in the social group to a single recommendation (rating) $\pi_{n-}$ of (13). This recommender system can operate completely opaquely to the actual user (node $n$). Node $n$ simply uses the automated rating $\pi_{n-}$ as the current best available rating from the reputation system.
(iv) Social Influence. Informational Message vs Social Message: In Protocol 1, it is important that each individual $n$ deploys algorithm $\mathcal{A}$ to fuse the beliefs $\{\pi_m, m \in \mathcal{H}_n\}$; otherwise incest can propagate. Here, $\mathcal{H}_n$ can be viewed as the “social message”, i.e., personal friends of node $n$ since they directly communicate to node $n$, while the associated beliefs can be viewed as the “informational message”. The social message from personal friends exerts a large social influence: it provides significant incentive (peer pressure) for individual $n$ to comply with Protocol 1 and thereby prevent incest. Indeed, a remarkable recent study described in [69] shows that social messages (votes) from known friends have significantly more influence on an individual than the information in the messages themselves. This study includes a comparison of information messages and social messages on Facebook and their direct effect on voting behavior. To quote [69], “The effect of social transmission on real-world voting was greater than the direct effect of the messages themselves…”
(v) Agent Reputation: The cost function minimization in Step (iv) can be interpreted in terms of the reputation of agents in online reputation systems. If an agent continues to write bad reviews for high quality restaurants on Yelp, his reputation becomes lower among the users. Consequently, other people ignore reviews of that (low-reputation) agent in evaluating their opinion about the social unit under study (restaurant). Therefore, agents minimize the penalty of writing inaccurate reviews (or equivalently increase their reputations) by choosing proper actions.
(vi) Think and act: Steps (ii), (iii), (iv) and (v) of Protocol 1 constitute standard social learning as described in Sec.2.1. The key difference with standard social learning is Step (i), performed by the network administrator. Agents receive public beliefs from the social network with arbitrary random delays. These delays reflect the time an agent takes between reading the publicly available reputation and making its decision. It is typical behavior of people to read published ratings multiple times and then think for an arbitrary amount of time before acting.
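Item (i) of the discussion above warns that the naive product fusion (16) double counts shared information. The following sketch illustrates the effect under a deliberate simplification: each node's contribution is modelled as a fixed likelihood vector and the prior is uniform, which is an assumption made here for illustration (in Protocol 1 the contribution of an action depends on the public belief at the time). All numbers are hypothetical.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def product(*vectors):
    """Elementwise product of equally long vectors."""
    out = [1.0] * len(vectors[0])
    for v in vectors:
        out = [o * x for o, x in zip(out, v)]
    return out


# Hypothetical likelihood contributions of nodes 1, 3 and 4 (uniform prior).
info1, info3, info4 = [0.6, 0.4], [0.7, 0.3], [0.55, 0.45]

pi1 = normalize(info1)                    # public belief after node 1
pi3 = normalize(product(info1, info3))    # node 3 already includes node 1
pi4 = normalize(product(info1, info4))    # node 4 already includes node 1

fair = normalize(product(info1, info3, info4))   # each contribution counted once
naive = normalize(product(pi3, pi4))             # (16): node 1 counted twice
# Dividing out one copy of node 1's belief removes the double count.
corrected = normalize([a / b for a, b in zip(product(pi3, pi4), pi1)])

print(fair, naive, corrected)
```

In the log domain the corrected fusion is the combination $l_3 + l_4 - l_1$, that is, a weighted sum of earlier log beliefs with a negative weight on the doubly counted node.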
3.3 Incest Removal Algorithm in Online Reputation System
Fair Rating Algorithm
It is convenient to work with the logarithm of the unnormalized belief; accordingly define
$$l_n(i) \propto \log \pi_n(i), \qquad l_{n-}(i) \propto \log \pi_{n-}(i), \quad i \in \mathbb{X}.$$
The following theorem shows that the logarithm of the fair rating $\pi^0_{n-}$ defined in (15) can be obtained as a linear weighted combination of the logarithms of previous public beliefs.
Theorem 3.1 (Fair Rating Algorithm)
Consider the online reputation system running Protocol 1. Suppose the following algorithm $\mathcal{A}(l_m, m \in \mathcal{H}_n)$ is implemented in (13) of Protocol 1 by the network administrator:
$$l_{n-}(i) = w_n' \, l_{1:n-1}(i) \quad \text{where } w_n = T_{n-1}^{-1} t_n \tag{17}$$
and $l_{1:n-1}(i) = [l_1(i), \ldots, l_{n-1}(i)]'$.
Then $l_{n-}(i) \propto \log \pi^0_{n-}(i)$. That is, algorithm $\mathcal{A}$ computes the fair rating $\log \pi^0_{n-}(i)$ defined in (15).
In (17), $w_n$ is an $n-1$ dimensional weight vector.
Recall that $t_n$
denotes the first $n-1$ elements of the $n$th column of the transitive closure matrix $T_n$.
Theorem 3.1 says that the fair rating $\pi^0_{n-}$ can be expressed as a linear function of the action log-likelihoods in terms of the transitive closure matrix $T_n$ of graph $G_n$. This is intuitive since $\pi^0_{n-}$ can be viewed as the sum of information collected by the nodes such that there are paths between all these nodes and $n$.
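The weight vector of Theorem 3.1 solves a triangular linear system, so it can be computed by back-substitution. The sketch below assumes the weights are $w_n = T_{n-1}^{-1} t_n$ with an upper triangular, unit-diagonal transitive closure; the 5-node closure matrix `T5` is a hypothetical example chosen for illustration (edges 1→3, 2→3, 1→4, 2→4, 1→5, 2→5, 3→5, 4→5), and the function names are not from the paper.

```python
def fair_rating_weights(T, n):
    """Solve T_{n-1} w = t_n for node n >= 2 (1-based), T being the closure of G_n."""
    m = n - 1
    t = [T[i][n - 1] for i in range(m)]   # first n-1 entries of column n
    w = [0] * m
    # T_{n-1} is upper triangular with unit diagonal: back-substitute.
    for i in reversed(range(m)):
        w[i] = t[i] - sum(T[i][j] * w[j] for j in range(i + 1, m))
    return w


def fuse_log_beliefs(w, log_beliefs):
    """l_{n-}(i) = sum_m w[m] * l_m(i), with log_beliefs[m] the vector of node m+1."""
    n_states = len(log_beliefs[0])
    return [sum(w[m] * log_beliefs[m][i] for m in range(len(w)))
            for i in range(n_states)]


# Transitive closure of the assumed graph (an illustration, not the paper's figure).
T5 = [
    [1, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]

for n in range(2, 6):
    print(n, fair_rating_weights(T5, n))
```

For node 5 the weights on nodes 1 and 2 come out negative: their information already sits inside the beliefs of nodes 3 and 4, so one copy of each has to be subtracted.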
Achievability of Fair Rating by Protocol 1
We are not quite done!
1. On the one hand, algorithm $\mathcal{A}$ at node $n$ specified by (13) has access only to beliefs $l_m$, $m \in \mathcal{H}_n$; equivalently, it has access only to beliefs from previous nodes specified by $A_n(:,n)$, which denotes the last column of the adjacency matrix $A_n$.
2. On the other hand, to provide incest free estimates, algorithm $\mathcal{A}$ specified in (17) requires all previous beliefs $l_{1:n-1}(i)$ that are specified by the nonzero elements of the vector $w_n$.
The only way to reconcile points 1 and 2 is to ensure that $A_n(j,n) = 0$ implies $w_n(j) = 0$ for $j = 1,\ldots,n-1$. The condition means that the single hop past estimates $l_m$, $m \in \mathcal{H}_n$, available at node $n$ according to (13) in Protocol 1 provide all the information required to compute $w_n' l_{1:n-1}$ in (17). This is a condition on the information flow graph $G_n$. We formalize this condition in the following theorem.
Theorem 3.2 (Achievability of Fair Rating)
Consider the fair rating algorithm specified by (17). For Protocol 1 with available information $(\pi_m, m \in \mathcal{H}_n)$ to achieve the estimates $l_{n-}$ of algorithm (17), a necessary and sufficient condition on the information flow graph $G_n$ is
$$A_n(j,n) = 0 \implies w_n(j) = 0. \tag{18}$$
Therefore for Protocol 1 to generate incest free estimates for nodes $n = 1,2,\ldots$, condition (18) needs to hold for each $n$. (Recall $w_n$ is specified in (17).)
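Condition (18) is a simple entrywise check of the last adjacency column against the weight vector. The sketch below compares two hypothetical graphs that share the same transitive closure, and hence the same weights for node 5, but differ in whether nodes 1 and 2 link directly to node 5. The graphs, the helper `adjacency`, and the weight vector `W5` are assumptions made for illustration.

```python
def adjacency(n, edges):
    """Build an n x n 0/1 adjacency matrix from 1-based (source, destination) pairs."""
    A = [[0] * n for _ in range(n)]
    for src, dst in edges:
        A[src - 1][dst - 1] = 1
    return A


def fair_rating_achievable(A, w, n):
    """Condition (18): A_n(j, n) = 0 must imply w_n(j) = 0 for j = 1, ..., n-1."""
    return all(A[j][n - 1] == 1 or w[j] == 0 for j in range(n - 1))


SHARED = [(1, 3), (2, 3), (1, 4), (2, 4), (3, 5), (4, 5)]
A_dense = adjacency(5, SHARED + [(1, 5), (2, 5)])   # nodes 1, 2 also reach 5 directly
A_sparse = adjacency(5, SHARED)                     # nodes 1, 2 reach 5 only via 3, 4

# Both graphs have the same transitive closure, hence the same weights for node 5.
W5 = [-1, -1, 1, 1]

print(fair_rating_achievable(A_dense, W5, 5))
print(fair_rating_achievable(A_sparse, W5, 5))
```

In the sparse graph node 5 would need the beliefs of nodes 1 and 2 to subtract them, but never receives them, so the fair rating cannot be formed from single-hop information alone.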
Illustrative Example (continued)
Let us continue with the example of Fig.2 where we already specified the adjacency matrices of the graphs
, , , and .
Using (10), the transitive closure matrices obtained from the adjacency matrices are given by:
,
.
Note that $T_n(i,j)$ is nonzero only for $i \le j$ due to causality, since information sent by a social group can only arrive at another social group at a later time instant. The weight vectors are then obtained from (17) as
$$w_2 = [0], \quad w_3 = [1 \;\; 1]', \quad w_4 = [1 \;\; 1 \;\; 0]', \quad w_5 = [-1 \;\; -1 \;\; 1 \;\; 1]'.$$
Let us examine these weight vectors. $w_2$ means that node 2 does not use the estimate from node 1. This formula is consistent with the constrained information flow because the estimate from node 1 is not available to node 2; see Fig.3.
$w_3$ means that node 3 uses estimates from nodes 1 and 2; $w_4$ means
that node 4 uses estimates only from nodes 1 and 2. The estimate from node 3 is not available at node 4. As shown in Fig.3, the misinformation propagation occurs at node 5. The vector $w_5$ says that node 5 adds estimates from nodes 3 and 4 and removes estimates from nodes 1 and 2 to avoid double counting of these estimates already integrated into estimates from nodes 3 and 4. Indeed, using the algorithm (17), incest is completely prevented in this example.
3.4 Summary
In this section, we have outlined a controlled sensing problem over a social network in which the administrator controls (removes) data incest and thereby maintains an unbiased (fair) online reputation system. The state of nature could be geographical coordinates of an event (in a target localization problem) or quality of a social unit (in an online reputation system). As discussed above, data incest arises due to the recursive nature of Bayesian estimation and nondeterminism in the timing of the sensing by individuals. Details of proofs, extensions and further numerical studies are presented in [57, 70].
4 Interactive Sensing for Quickest Change Detection
In this section we consider interacting social sensors in the context of detecting a change in the underlying state of nature. Suppose a multiagent system performs social learning and makes local decisions as described in Sec.2. Given the public beliefs from the social learning protocol, how can quickest change detection be achieved? In other words, how can a global decision maker use the local decisions from individual agents to decide when a change has occurred? It is shown below that making a global decision (change or no change) based on local decisions of individual agents has an unusual structure resulting in a nonconvex stopping set.
A typical application of such social sensors arises in the measurement of the adoption of a new product using a microblogging platform like Twitter. The adoption of the technology diffuses through the market but its effects can only be observed through the tweets of select individuals of the population. These selected individuals act as sensors for estimating the diffusion. They interact and learn from the decisions (tweeted sentiments) of other members and therefore perform social learning. Suppose the state of nature suddenly changes due to a sudden market shock or presence of a new competitor. The goal for a market analyst or product manufacturer is to detect this change as quickly as possible by minimizing a cost function that involves the sum of the false alarm and decision delay.
Related works [26, 27] model diffusion in networks over a random graph with arbitrary degree distribution. The resulting diffusion is approximated using deterministic dynamics via a mean field approach [71]. In the seminal paper [1], a sensing system for complex social systems is presented with data collected from cell phones. This data is used in [1] to recognize social patterns, identify socially significant locations and infer relationships. In [9], people using a microblogging service such as Twitter are considered as sensors. In particular, [9] considers each Twitter user as a sensor and uses a particle filtering algorithm to estimate the centre of earthquakes and trajectories of typhoons. As pointed out in [9], an important characteristic of microblogging services such as Twitter is that they provide real-time sensing: Twitter users tweet several times a day, whereas standard blog users update information once every several days.
Apart from the above applications in real time sensing, change detection in social learning also arises in mathematical finance models. For example, in agent based models for the microstructure of asset prices in high frequency trading in financial systems [25], the state $x_k$ denotes the underlying asset value that changes at a random time $\tau^0$. Agents observe local individual decisions of previous agents via an order book, combine these observed decisions with their noisy private signals about the asset, selfishly optimize their expected local utilities, and then make their own individual decisions (whether to buy, sell or do nothing). The market evolves through the orders of trading agents. Given this order book information, the goal of the market maker (global decision maker) is to achieve quickest change point detection when a shock occurs to the value of the asset [72].
4.1 Classical Quickest Detection
The classical Bayesian quickest time detection problem [73] is as follows: An underlying discrete-time state process $x$ jump-changes at a geometrically distributed random time $\tau^0$. Consider a sequence of discrete time random measurements $\{y_k, k \ge 1\}$, such that conditioned on the event $\{\tau^0 = t\}$, the measurements before the change are independent and identically distributed (i.i.d.) random variables with one distribution and the measurements after the change are i.i.d. random variables with a different distribution. The quickest detection problem involves detecting the change time $\tau^0$ with minimal cost. That is, at each time $k = 1,2,\ldots$, a decision $u_k \in \{1 \text{ (stop and announce change)}, 2 \text{ (continue)}\}$ needs to be made to optimize a tradeoff between false alarm frequency and linear delay penalty.
To formalize this setup, let
$$P = \begin{bmatrix} 1 & 0 \\ 1 - P_{22} & P_{22} \end{bmatrix}$$
denote the transition matrix of a two state Markov chain $x$ in which state 1 is absorbing. Then it is easily seen that the geometrically distributed change time $\tau^0$ is equivalent to the time at which the Markov chain enters state 1. That is, $\tau^0 = \min\{k : x_k = 1\}$ and $E\{\tau^0\} = 1/(1 - P_{22})$. Let $\tau$ be the time at which the decision $u_k = 1$ (announce change) is taken. The goal of quickest time detection is to minimize the Kolmogorov–Shiryaev criterion for detection of a disorder [74]:
$$J_\mu(\pi) = d \, E^\mu_\pi\{(\tau - \tau^0)^+\} + f \, P^\mu_\pi(\tau < \tau^0). \tag{19}$$
Here $z^+ = z$ if $z \ge 0$ and $0$ otherwise. The nonnegative constants $d$ and $f$ denote the delay and false alarm penalties, respectively. So waiting too long to announce a change incurs a delay penalty $d$ at each time instant after the system has changed, while declaring a change before it happens incurs a false alarm penalty $f$. In (19), $\mu$ denotes the strategy of the decision maker. $P^\mu_\pi$ and $E^\mu_\pi$ are the probability measure and expectation of the evolution of the observations and Markov state, which are strategy dependent. $\pi$ denotes the initial distribution of the Markov chain $x$.
In classical quickest detection, the decision policy $\mu$ is a function of the two-dimensional belief state (posterior probability mass function) $\pi_k(i) = P(x_k = i \mid y_1, \ldots, y_k)$, $i = 1,2$, with $\pi_k(1) + \pi_k(2) = 1$. So it suffices to consider one element, say $\pi_k(2)$, of this probability mass function. Classical quickest change detection (see for example [73]) says that the policy $\mu^*(\pi)$ which optimizes (19) has the following threshold structure: There exists a threshold point $\pi^* \in [0,1]$ such that
$$\mu^*(\pi_k) = \begin{cases} 2 \text{ (continue)} & \text{if } \pi_k(2) \ge \pi^*, \\ 1 \text{ (stop and announce change)} & \text{if } \pi_k(2) < \pi^*. \end{cases} \tag{20}$$
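The single-threshold rule can be sketched as a standard Bayesian (hidden Markov model) filter followed by a comparison. This is an illustration under assumptions, not the paper's code: index 0 stands for state 1 (change has occurred) and index 1 for state 2 (no change), decision 1 means stop and 2 means continue, and the matrices and the threshold 0.5 are hypothetical values rather than the optimal threshold.

```python
def hmm_filter(belief, P, B, y):
    """One Bayesian filter step: predict with P, then correct with likelihood B[j][y]."""
    n = len(belief)
    predicted = [sum(P[i][j] * belief[i] for i in range(n)) for j in range(n)]
    unnorm = [B[j][y] * predicted[j] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def threshold_policy(belief, threshold):
    """Continue (2) while the no-change probability belief[1] is at least the threshold."""
    return 2 if belief[1] >= threshold else 1


def detect(observations, P, B, threshold, prior):
    """Return the first time k at which the policy stops, or None if it never stops."""
    belief = prior
    for k, y in enumerate(observations, start=1):
        belief = hmm_filter(belief, P, B, y)
        if threshold_policy(belief, threshold) == 1:
            return k
    return None


# Hypothetical parameters: index 0 is the absorbing post-change state.
P = [[1.0, 0.0],
     [0.05, 0.95]]
B = [[0.9, 0.1],   # after the change, observation 0 is likely
     [0.1, 0.9]]   # before the change, observation 1 is likely

print(detect([1, 1, 0, 0, 0], P, B, threshold=0.5, prior=[0.0, 1.0]))
```

With these numbers a single observation of symbol 0 is not enough to cross the threshold; the second consecutive one is.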
4.2 Multiagent Quickest Detection Problem
With the above classical formulation in mind, consider now the following multiagent quickest change detection problem. Suppose that a multiagent system performs social learning to estimate an underlying state according to the social learning protocol of Sec.2.1. That is, each agent acts once in a predetermined sequential order indexed by $k = 1,2,\ldots$ (Equivalently, as pointed out in the discussion in Sec.2.1, a finite number of agents act repeatedly in some predefined order and each agent chooses its local decision using the current public belief.) Given these local decisions (or equivalently the public belief), the goal of the global decision maker is to minimize the quickest detection objective (19). The problem now is a nontrivial generalization of classical quickest detection. The posterior is now the public belief given by the social learning filter (3) instead of a standard Bayesian filter. There is now interaction between the local and global decision makers. The local decision from the social learning protocol determines the public belief state via the social learning filter (3), which determines the global decision (stop or continue), which determines the local decision at the next time instant, and so on.
The global decision maker’s policy $\mu^*$ that optimizes the quickest detection objective (19) and the cost of this optimal policy are the solution of “Bellman’s dynamic programming equation”
$$\mu^*(\pi) = \arg\min\Big\{ f \pi(2), \; d\,(1 - \pi(2)) + \sum_{a \in \mathbb{A}} V(T(\pi,a))\, \sigma(\pi,a) \Big\}, \qquad V(\pi) = \min\Big\{ f \pi(2), \; d\,(1 - \pi(2)) + \sum_{a \in \mathbb{A}} V(T(\pi,a))\, \sigma(\pi,a) \Big\}. \tag{21}$$
Here $T(\pi,a)$ and $\sigma(\pi,a)$ are given by the social learning filter (3); recall that $a$ denotes the local decision. $V(\pi)$ is called the “value function”; it is the cost incurred by the optimal policy when the initial belief state (prior) is $\pi$. As will be shown in the numerical example below, the optimal policy has a very different structure compared to classical quickest detection.
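A dynamic programming equation of this kind can be approximated by value iteration on a grid of belief values. The sketch below is an assumption-laden illustration rather than the authors' code: it assumes the stopping cost is $f\pi(2)$ and the continuing cost is $d(1-\pi(2))$ plus the expected future cost, it implements one common form of the social learning update (each agent maps its observation to the myopic action, and the public belief is updated from the action likelihoods), it uses linear interpolation between grid points, and every numerical value is hypothetical. Index 0 is the change state and index 1 the no-change state.

```python
def social_learning_step(p2, P, B, c):
    """For pi = [1 - p2, p2], return (sigma, next no-change probability) per local action."""
    pi = [1.0 - p2, p2]
    pred = [sum(P[i][j] * pi[i] for i in range(2)) for j in range(2)]
    n_obs, n_act = len(B[0]), len(c[0])
    chosen = []
    for y in range(n_obs):
        eta = [B[i][y] * pred[i] for i in range(2)]   # unnormalised private belief
        cost = [sum(c[i][a] * eta[i] for i in range(2)) for a in range(n_act)]
        chosen.append(min(range(n_act), key=cost.__getitem__))
    out = []
    for a in range(n_act):
        # Likelihood of action a in state i: mass of observations mapped to a.
        lik = [sum(B[i][y] for y in range(n_obs) if chosen[y] == a) for i in range(2)]
        unnorm = [lik[i] * pred[i] for i in range(2)]
        sigma = sum(unnorm)
        if sigma > 0.0:
            out.append((sigma, unnorm[1] / sigma))
    return out


def value_iteration(P, B, c, d, f, n_grid=201, n_iter=100):
    """Approximate V and the stop(1)/continue(2) policy on a uniform grid of pi(2)."""
    grid = [g / (n_grid - 1) for g in range(n_grid)]
    steps = [social_learning_step(p2, P, B, c) for p2 in grid]

    def interp(V, p2):
        pos = p2 * (n_grid - 1)
        lo = min(int(pos), n_grid - 2)
        return V[lo] + (pos - lo) * (V[lo + 1] - V[lo])

    def continue_cost(V, p2, step):
        return d * (1.0 - p2) + sum(s * interp(V, q) for s, q in step)

    V = [0.0] * n_grid
    for _ in range(n_iter):
        V = [min(f * p2, continue_cost(V, p2, st)) for p2, st in zip(grid, steps)]
    policy = [1 if f * p2 <= continue_cost(V, p2, st) else 2
              for p2, st in zip(grid, steps)]
    return grid, V, policy


# Hypothetical parameters (assumptions for illustration only).
P = [[1.0, 0.0], [0.05, 0.95]]
B = [[0.9, 0.1], [0.1, 0.9]]
C = [[4.57, 5.57], [2.57, 0.0]]   # C[i][a]: cost of local action a in state i
grid, V, policy = value_iteration(P, B, C, d=1.05, f=3.0, n_grid=101, n_iter=50)
```

Plotting `policy` against `grid` is one way to inspect how many times the decision switches between stop and continue; the sketch makes no claim about reproducing the figure.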
4.3 Numerical Example
We now illustrate the unusual multithreshold property of the global decision maker’s optimal policy in multiagent quickest detection with social learning. Consider the social learning model of Sec.2.1 with the following parameters: The underlying state is a 2state Markov chain with state space and transition probability matrix . So the change time (i.e., the time the Markov chain jumps from state 2 into absorbing state 1) is geometrically distributed with .
Social Learning Parameters: Individual agents observe the Markov chain in noise with the observation symbol set . Suppose the observation
likelihood matrix with elements is
.
Agents can choose their local actions from the action set .
The state dependent cost matrix of these actions is .
Agents perform social learning with the above parameters.
The intervals and
in Fig.4(a) are regions where the optimal local actions taken by agents are independent of their observations.
For , the optimal local action is 2 and for , the optimal local action is 1.
So individual agents herd for belief states in these intervals (see the definition in Sec.2.3)
and the local actions
do not yield any information about the underlying state.
Moreover, the interval depicts a region where all agents herd (again see the definition in Sec.2.3), meaning that once the belief state is in this region, it remains so
indefinitely and all agents choose the same local action 1.
Global Decision Making: Based on the local actions of the agents performing social learning, the global decision maker needs to perform quickest change detection. The global decision maker uses the delay penalty $d$ and false alarm penalty $f$ in the objective function (19). The optimal policy $\mu^*(\pi)$ of the global decision maker, where $\pi = [1 - \pi(2), \pi(2)]'$, is plotted versus $\pi(2)$ in Fig.4(a). Note $\pi(2) = 1$ means that with certainty no change has occurred, while $\pi(2) = 0$ means with certainty a change has occurred. The policy $\mu^*(\pi)$ was computed by constructing a uniform grid of 1000 points for $\pi(2) \in [0,1]$ and then implementing the dynamic programming equation (21) via a fixed point value iteration algorithm for 200 iterations. The horizontal axis $\pi(2)$ is the posterior probability of no change. The vertical axis denotes the optimal decision: $u = 1$ denotes stop and declare change, while $u = 2$ denotes continue.
The most remarkable feature of Fig.4(a) is the multithreshold behavior of the global decision maker’s optimal policy $\mu^*(\pi)$. Recall $\pi(2)$ depicts the posterior probability of no change. So consider the region where $\mu^*(\pi) = 2$ and which is sandwiched between two regions where $\mu^*(\pi) = 1$. Then as $\pi(2)$ (posterior probability of no change) increases, the optimal policy switches from $\mu^*(\pi) = 2$ to $\mu^*(\pi) = 1$. In other words, the optimal global decision policy “changes its mind”: it switches from no change to change as the posterior probability of a change decreases! Thus, the global decision (stop or continue) is a nonmonotone function of the posterior probability obtained from local decisions.
Fig.4(b) shows the associated value function obtained via stochastic dynamic programming (21). Recall that $V(\pi)$ is the cost incurred by the optimal policy with initial belief state $\pi$. Unlike standard sequential detection problems in which the value function is concave, the figure shows that the value function is nonconcave and discontinuous. To summarize, Fig.4 shows that social learning based quickest detection results in fundamentally different decision policies compared to classical quickest time detection (which has a single threshold). Thus making global decisions (stop or continue) based on local decisions (from social learning) is nontrivial. In [41], a detailed analysis of the problem is given together with a characterization of this multithreshold behavior. Also more general phase-distributed change times are considered in [41].
5 Coordination of Decisions in Sensing - Noncooperative Game Approach
The discussion so far has dealt with Bayesian social learning models for sensing. In this section, we present a highly stylized non-Bayesian noncooperative game-theoretic learning approach for adaptive decision making amongst agents.
Social and economic situations often involve interacting decision making with diverging interests. Decision makers may act independently or form collaborative groups wherein enforceable binding agreements ensure coordination of joint decisions. For instance, a person may choose the same cellphone carrier as the majority of family and friends to take advantage of the free talk times. Social networks diffuse information and hence facilitate coordination of such cooperative/self-interested units. This section examines how global coordination of decisions can be obtained when self-interested agents form a social network.
As mentioned in the Introduction, human-based sensing systems are comprised of agents with partial information and it is the dynamic interactions between agents that are of interest. This motivates the need for game-theoretic learning models for agents interacting in social networks.
Learning dynamics in games typically can be classified into Bayesian learning, adaptive learning and evolutionary dynamics. We have already focussed on Bayesian social learning in the previous sections. In this section we focus on adaptive learning where individual agents deploy simple rule-of-thumb strategies. The aim is to determine if such simple individual behaviour can result in sophisticated global behaviour. We are interested in cases where the global behaviour converges to the set of correlated equilibria.
5.1 Correlated Equilibria and Related Work
The set of correlated equilibria is a more natural construct in decentralized adaptive learning environments than the set of Nash equilibria.
Related works A comprehensive textbook in game-theoretic learning is [77]. Algorithms for game-theoretic learning are broadly classified into best response, fictitious play and regret matching. In general it is impossible to guarantee convergence to a Nash equilibrium without imposing conditions on the structure of the utility functions in the game. For supermodular games [78], best response algorithms can be designed to converge either to the smallest or largest Nash equilibrium. Fictitious play is one of the oldest and best known models of learning in games; we refer the reader to [79] for convergence of stochastic fictitious play algorithms. In this section we focus on regret-matching algorithms. Regret-matching as a strategy of play in long-run interactions was introduced in [75, 76]. In [75], it is proved that when all agents share stage actions and follow the proposed regret-based adaptive procedure, the collective behavior converges to the set of correlated equilibria. In [76], the authors assumed that agents do not observe others’ actions and proposed a reinforcement learning procedure that converges to the set of correlated equilibria. More recently, [80, 81, 37, 38] consider learning in a dynamic setting where a regret matching type algorithm tracks a time varying set of correlated equilibria.
5.2 RegretBased Decision Making
Consider a noncooperative repeated game comprising $S$ agents. Each agent $s$ has a utility function $U^s(a^s, a^{-s})$. Here $a^s$ denotes the action chosen by agent $s$ and $a^{-s}$ denotes the actions chosen by all agents excluding agent $s$. The utility function can be quite general. For example, [82] considers the case in which the agents are organized into nonoverlapping social (friendship) groups such that agents in a social group share the same utility function. It could also reflect reputation or privacy using the models in [49, 50].
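Suppose each agent $s$ chooses its actions according to the following adaptive algorithm running over time $k = 1,2,\ldots$:
1. At time $k+1$, choose action $a^s_{k+1}$ from probability mass function $\psi^s_{k+1}$, where
$$\psi^s_{k+1}(j) = P(a^s_{k+1} = j \mid a^s_k = i) = \begin{cases} |r^s_k(i,j)|^+ / C, & j \neq i, \\ 1 - \sum_{l \neq i} |r^s_k(i,l)|^+ / C, & j = i. \end{cases} \tag{22}$$
Here $|z|^+ = \max\{z, 0\}$ and $C$ is a sufficiently large positive constant so that $\psi^s_{k+1}$ is a valid probability mass function.
2. The regret matrix $r^s_k$ that determines the pmf $\psi^s_{k+1}$ is updated via the stochastic approximation algorithm
$$r^s_k(i,j) = r^s_{k-1}(i,j) + \frac{1}{k}\Big( \big[U^s(j, a^{-s}_k) - U^s(a^s_k, a^{-s}_k)\big] I(a^s_k = i) - r^s_{k-1}(i,j) \Big). \tag{23}$$
Step 1 corresponds to each agent choosing its action randomly from a Markov chain with transition probability $\psi^s_{k+1}$. These transition probabilities are computed in terms of the regret matrix $r^s_k$ of Step 2, which is the time-averaged regret agent $s$ experiences for choosing action $i$ instead of action $j$ for each possible action $j \neq i$ (i.e., how much better off it would be if it had chosen action $j$ instead of $i$):
$$r^s_k(i,j) = \frac{1}{k} \sum_{t=1}^{k} \big[U^s(j, a^{-s}_t) - U^s(a^s_t, a^{-s}_t)\big] I(a^s_t = i). \tag{24}$$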
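The two steps above can be sketched for a single agent as follows. This is a minimal illustration under assumptions: the opponents' actions are supplied as a fixed sequence rather than generated by other learning agents, the coordination utility and the constant `C` are hypothetical, and `C` is assumed large enough that the stay-probability is nonnegative.

```python
import random


def regret_matching(utility, n_actions, opp_actions, C, seed=0):
    """Run Steps 1 and 2 for one agent against a given sequence of opponent actions."""
    rng = random.Random(seed)
    r = [[0.0] * n_actions for _ in range(n_actions)]   # regret matrix r(i, j)
    a = rng.randrange(n_actions)                         # arbitrary initial action
    history = []
    for k, b in enumerate(opp_actions, start=1):
        history.append(a)
        # Step 2: running-average regret update with step size 1/k.
        for i in range(n_actions):
            for j in range(n_actions):
                gain = (utility(j, b) - utility(a, b)) if a == i else 0.0
                r[i][j] += (gain - r[i][j]) / k
        # Step 1: switch to j != a with probability proportional to positive regret.
        probs = [max(r[a][j], 0.0) / C for j in range(n_actions)]
        probs[a] = 0.0
        probs[a] = 1.0 - sum(probs)
        a = rng.choices(range(n_actions), weights=probs)[0]
    return r, history


def coordination_utility(a, b):
    """Hypothetical utility: 1 if the agent matches the opponent's action, else 0."""
    return 1.0 if a == b else 0.0


OPP = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
regret, played = regret_matching(coordination_utility, 2, OPP, C=2.0)
print(regret, played)
```

The recursive update and the time average are two views of the same quantity, so the returned matrix can be checked against a direct average over the played history.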
The above algorithm can be generalized to consider multiple social groups. If agents within each social group share their actions and have a common utility, then they can fuse their individual regrets into a regret for the social group. As shown in [82], this fusion of regrets can be achieved via a linear combination of the individual regrets where the weights of the linear combination depend on the reputation of the agents that constitute the social group.
5.3 Coordination in Sensing
We now address the following question:
If each agent chooses its action according to the above regretbased algorithm, what can one say about the emergent global behavior?
By emergent global behavior, we mean the empirical frequency of actions taken over time by all agents. For each tuple of actions $(a^s, a^{-s})$ define the empirical frequency of actions taken up to time $k$ as
$$z_k(a^s, a^{-s}) = \frac{1}{k} \sum_{t=1}^{k} I\big(a_t = (a^s, a^{-s})\big).$$
The seminal papers [75] and [42] show that the empirical frequency of actions $z_k$ converges as $k \to \infty$ to the set of correlated equilibria of a noncooperative game. Correlated equilibria constitute a generalization of Nash equilibria and were introduced by Aumann [34]. The set of correlated equilibria $\mathcal{C}_e$ is the set of probability distributions on the joint action profile $(a^s, a^{-s})$ that satisfy
$$\mathcal{C}_e = \Big\{ \pi : \sum_{a^{-s}} \pi^s(i, a^{-s}) \big[ U^s(j, a^{-s}) - U^s(i, a^{-s}) \big] \le 0 \;\; \text{for all } i, j \text{ and all agents } s \Big\}. \tag{25}$$
Here $\pi^s(i, a^{-s})$ denotes the randomized strategy (joint probability) of player $s$ choosing action $i$ and the rest of the players choosing action $a^{-s}$. The correlated equilibrium condition (25) states that instead of taking action $i$ (which is prescribed by the equilibrium strategy $\pi^s(i, a^{-s})$), if player $s$ cheats and takes action $j$, it is no better off. So there is no unilateral incentive for any player to cheat.
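The inequalities defining a correlated equilibrium can be checked numerically for a given joint distribution. The sketch below does this for a two-player game; the function name is hypothetical, and the payoff tables are the textbook game of chicken, used here only as an illustration and not taken from the paper.

```python
def is_correlated_equilibrium(joint, U1, U2, tol=1e-12):
    """Check that no player gains by deviating from a recommended action.

    joint[a][b] is the probability of the profile (a, b); U1[a][b] and U2[a][b]
    are the utilities of players 1 and 2 for that profile.
    """
    n1, n2 = len(joint), len(joint[0])
    for i in range(n1):            # player 1 is recommended i, considers j
        for j in range(n1):
            gain = sum(joint[i][b] * (U1[j][b] - U1[i][b]) for b in range(n2))
            if gain > tol:
                return False
    for i in range(n2):            # player 2 is recommended i, considers j
        for j in range(n2):
            gain = sum(joint[a][i] * (U2[a][j] - U2[a][i]) for a in range(n1))
            if gain > tol:
                return False
    return True


# Game of chicken (illustrative payoffs): action 0 = dare, action 1 = yield.
U1 = [[0, 7], [2, 6]]
U2 = [[0, 2], [7, 6]]

third = 1.0 / 3.0
mixed = [[0.0, third], [third, third]]   # never recommend (dare, dare)
both_yield = [[0.0, 0.0], [0.0, 1.0]]    # always recommend (yield, yield)

print(is_correlated_equilibrium(mixed, U1, U2))
print(is_correlated_equilibrium(both_yield, U1, U2))
```

The first distribution is not a product of independent strategies, which illustrates how a correlated equilibrium lets players coordinate through a shared recommendation.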
To summarize, the above algorithm ensures that all agents eventually achieve coordination (consensus) in decision making: the randomized strategies of all agents converge to a common convex polytope $\mathcal{C}_e$. Step 2 of the algorithm requires that each agent knows its own utility and the actions of other agents, but agents do not need to know the utility functions of other agents. In [76] a ‘blind’ version of this regret-based algorithm is presented in which agents do not need to know the actions of other agents. These algorithms can be viewed as examples in which simple heuristic behavior by individual agents (choosing actions according to the measured regret) results in sophisticated global outcomes [42], namely convergence to $\mathcal{C}_e$, thereby coordinating decisions.
We refer to [37, 38, 80] for generalizations of the above algorithm to the tracking case where the step size for the regret matrix update is a constant. Such algorithms can track the correlated equilibria of games with time-varying parameters. Moreover, [81] gives sufficient conditions for the algorithm to converge to the set of correlated equilibria when the regrets from agents to other agents diffuse over a social network.
6 Closing Remarks
In this paper we have used social learning as a model for interactive sensing with social sensors. We summarize here some extensions of the social learning framework that are relevant to interactive sensing.
Wisdom of Crowds
Surowiecki’s book [83] is an excellent popular piece that explains the wisdom-of-crowds hypothesis. The wisdom-of-crowds hypothesis predicts that the independent judgments of a crowd of individuals (as measured by any form of central tendency) will be relatively accurate, even when most of the individuals in the crowd are ignorant and error prone. The book also studies situations (such as rational bubbles) in which crowds are not wiser than individuals. Collect enough people on a street corner staring at the sky, and everyone who walks past will look up. Such herding behavior is typical in social learning.
In which order should agents act?
In the social learning protocol, we assumed that the agents act sequentially in a predefined order.
However, in many social networking applications, it is important to optimize the order in which agents act. For example,
consider an online review site where individual reviewers with different reputations make their reviews publicly available.
If a reviewer with high reputation publishes its review first, this review will unduly affect the decision of a reviewer with lower reputation.
In other words, if the most senior agent “speaks” first it would unduly affect the decisions of more junior agents. This could lead to an increase in bias of the underlying state estimate.
Global Games for Coordinating Sensing
In the classical Bayesian social learning model of Sec.2, agents act sequentially in time. The global games model that has been studied in economics during the last two decades, considers multiple agents that act simultaneously by predicting the behavior of other agents. The theory of global games was first introduced in [86] as a tool for refining equilibria in economic game theory; see [87] for an excellent exposition. Global games are an ideal method for decentralized coordination amongst agents; they have been used to model speculative currency attacks and regime change in social systems, see [87, 88, 89].
The most widely studied form of a global game is a one-shot Bayesian game which proceeds as follows: Consider a continuum of agents in which each agent $i$ obtains noisy measurements $y^i$ of an underlying state of nature $x$. Assume all agents have the same observation likelihood density $p(y \mid x)$ but the individual measurements obtained by agents are statistically independent of those obtained by other agents. Based on its observation $y^i$, each agent takes an action $a^i \in \{1, 2\}$ to optimize its expected utility $E\{U(a^i, \alpha) \mid y^i\}$ where $\alpha \in [0,1]$ denotes the fraction of all agents that take action 2. Typically, the utility $U(1, \alpha)$ is set to zero.
For example, suppose (state of nature) denotes the quality of a social group and denotes the measurement of this quality by agent . The action means that agent decides not to join the social group, while means that agent joins the group. The utility function for joining the social group depends on , where is the fraction of people that decide to join the group. In [88], the utility function is chosen as follows: If , i.e., too many people join the group, then the utility to each agent is small since the group is too congested and age