Enabling Data Exchange in Interactive State Estimation under Privacy Constraints
Abstract
Data collecting agents in large networks, such as the electric power system, need to share information (measurements) for estimating the system state in a distributed manner. However, privacy concerns may limit or prevent this exchange, leading to a tradeoff between state estimation fidelity and privacy (referred to as competitive privacy). This paper builds upon a recent information-theoretic result (using mutual information to measure privacy and mean-squared error to measure fidelity) that quantifies the region of achievable distortion-leakage tuples in a two-agent network. The objective of this paper is to study centralized and decentralized mechanisms that can enable and sustain nontrivial data exchanges among the agents. A centralized mechanism determines the data sharing policies that optimize a network-wide objective function combining the fidelities and leakages at both agents. Using common-goal games and best-response analysis, we show that the optimal policies allow for distributed implementation. In contrast, in the decentralized setting, repeated discounted games are shown to naturally enable data exchange without any central control or economic incentives. The effect of repetition is modeled by a time-averaged payoff function at each agent which combines its fidelity and leakage at each interaction stage. For both approaches, it is shown that nontrivial data exchange can be sustained for specific fidelity ranges even when privacy is a limiting factor.
I Introduction
The increasing demand for sustainable energy in the information era requires a highly efficient and reliable electric power system in which renewables can be effectively integrated. Given the size and complexity of the electric network, sustained and reliable operations involve an intelligent cyber layer that enables distributed monitoring, processing and control of the network. In fact, data collection and processing is performed locally at various collecting entities (e.g., utility companies, systems operators, etc.) that are spread out geographically. The interconnectedness of the network requires that these distributed entities share data amongst themselves to ensure precise estimation and control, and in turn, system stability and reliability. Despite its importance, data sharing in the electric power system is limited, sometimes with catastrophic consequences [2, 3], because of competitive interests or privacy concerns. Furthermore, this problem of distributed computation, control and data sharing is not specific to electrical power networks and may arise in other critical infrastructure networks (e.g., air transport, electronic healthcare, and the Internet). We henceforth refer to this problem as competitive privacy as in [4].
The notion of privacy is predominantly associated with the problem of ensuring that personal data about individuals, stored in a variety of databases or cloud servers, is not revealed. Quantifying the privacy of released data has captured a lot of attention from the computer science and information-theoretic research communities, leading to two different rigorous frameworks: differential privacy introduced by Dwork et al. [5], [6]; and information-theoretic privacy developed in [7]. The first framework focuses on worst-case guarantees and ignores the statistics of the data, while the latter focuses on average guarantees and is cognizant of the input data statistics; their appropriateness depends on the application at hand. In the information era, however, privacy restrictions also appear in data exchange contexts as detailed here; this setting was first studied via an information-theoretic framework in [4].
For the distributed state estimation problems via data exchange (as applied to the electric power system), the information-theoretic competitive privacy framework holds the following advantages: (a) it takes into account the statistical nature of the measurements and underlying state (e.g., complex voltage measurements in the grid that are often assumed to be Gaussian distributed); (b) it combines both compression and privacy in one analysis by developing rate and privacy optimal data sharing protocols; and (c) it quantifies privacy over all possible sequences of measurements and system states.
The competitive privacy information-theoretic framework introduced recently in [4] studies data sharing among two interconnected agents when privacy concerns limit data sharing, and therefore, the fidelity of distributed state estimation performed by the agents. The authors proposed a distributed source coding model to quantify the information-theoretic tradeoff between estimate fidelity (distortion via mean-squared error), privacy (information leakage), and communication rate (data sharing rate). Every achievable distortion-leakage tuple represents a four-dimensional vector of opposing quantities that cannot be optimal simultaneously; minimum distortion for one agent implies maximum leakage for the other; minimum leakage for one agent implies maximum distortion for the other. A pertinent question follows: how to choose such a tradeoff in practice?
The objective of this work is to address this question via mechanisms that can enable and sustain specific distortion-leakage tuples in both centralized (a unique decision-maker) and decentralized settings (each agent has his own individual agenda). Game theory is a mathematical toolbox for studying interactions among strategic agents and has established its value in a wide variety of fields including wireless communications [8], [9]. While often applied in the noncooperative and decentralized context, even in centralized settings, game theory can be valuable when devising efficient and distributed algorithms to compute the solution; in fact, these tools can be very useful to solve difficult, nonconvex problems that arise in multi-agent models with multiple performance criteria (such as leakage vs. fidelity), as we show in the sequel.
Our first approach assumes a central controller that imposes the data sharing choices of the two agents (e.g., when electric utility companies share their data with a central systems operator). The network-wide objective function captures both the overall leakage of information and the total distortion of the estimates of the two agents via their weighted sum. To circumvent the nonconvexity of this objective function, we exploit the parallel between distributed optimization problems and potential games [10]. The Nash equilibria of the resulting common-goal game are the intersection points of the best-response functions, which turn out to be piecewise affine. Moreover, using game-theoretic tools we provide a distributed algorithm (the iterative best-response algorithm) that converges to an optimal solution. Our results show that the central controller can smoothly manipulate the distortion-leakage tradeoff between two extremes: both users fully share their data (minimum distortion, maximum leakage) or not at all (maximum distortion, minimum leakage). Specifically, not all information-theoretic tuples can arise as outcomes, but only the optimizers of the network-wide objective function.
If there is no central controller (e.g., when agents are two systems operators that need to share data to monitor large parts of the electric grid), each agent chooses its own data sharing strategy to optimize its individual distortion-leakage tradeoff. In [11], we showed that data sharing decreases the distortion of the agent receiving data while the sharing agent only increases its leakage. Thus, when the interaction takes place only once (i.e., one-shot interaction), rational agents have no incentive to share data. Economic incentives overcome this issue [11] and all distortion-leakage tuples can be achieved assuming that agents are paid (by a common moderator) for their information leakage.
In the second part of this paper, we show that pricing is not the only mechanism enabling cooperation. If the agents interact repeatedly over an indeterminate period, tit-for-tat strategies (i.e., an agent shares his data as long as the other agent does the same) turn out to be stable outcomes of the new game. We show that a whole subregion of distortion-leakage tuples (in between the aforementioned extremes) is achieved without the need for a central authority; effectively, the agents build trust by exchanging data in the long term.
Preliminary results regarding the repeated interaction have been presented in [1]. We provide here a complete analysis and detailed proofs. Moreover, in this current version, we: (i) introduce different discount factors to model individual preferences for present vs. future rewards; (ii) give closed-form bounds on the discount factors; and (iii) illustrate more results.
The paper is organized as follows. In Section II, we introduce the system model and an overview of the most relevant information- and game-theoretic concepts and prior results. The common goal noncooperative game and its Nash equilibria are analysed in Section III as a simpler alternative to a nonconvex centralized optimization problem. In Section IV, we introduce the repeated games framework and study its solutions and achievable distortion-leakage pairs. Numerical results that illustrate the analysis are also provided. We conclude in Section V.
II System Model
We consider a network composed of physically interconnected nodes as illustrated in Figure 1. We focus only on a pair of such nodes, called agents, which are capable of communicating and sharing some of their collected data.
Each agent observes a sequence of measurements from which it estimates a set of system parameters, henceforth referred to as states. The measurements at each agent are also affected by the states of the other agent. For simplicity reasons, we consider a linear approximation model (e.g., model of voltages in the electric power network [12]). Denoting the state and measurement vectors at agent as and , respectively, the linear model is:
(1) 
where , are positive parameters. The states, and , for all are assumed to be independent and identically distributed (i.i.d.) zero-mean unit-variance Gaussian random variables and the additive zero-mean Gaussian noise variables, and , are assumed to be independent of the agent states and of fixed variances and , respectively.
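As a rough numerical illustration of this setting, the sketch below simulates a two-agent linear Gaussian model and compares the estimation error of agent 1 with and without access to the other agent's measurements. The specific form used here (y1 = x1 + alpha*x2 + z1 and y2 = beta*x1 + x2 + z2) is an assumption about the shape of (1) based on the description above, and the names simulate_measurements, lmmse_error, alpha, beta, sigma1 and sigma2, as well as the parameter values, are hypothetical.

```python
import numpy as np

def simulate_measurements(n, alpha, beta, sigma1, sigma2, seed=0):
    """Draw n i.i.d. samples from the ASSUMED model
    y1 = x1 + alpha*x2 + z1,  y2 = beta*x1 + x2 + z2."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)           # zero-mean, unit-variance states
    x2 = rng.standard_normal(n)
    z1 = sigma1 * rng.standard_normal(n)  # zero-mean noise, std sigma1
    z2 = sigma2 * rng.standard_normal(n)  # zero-mean noise, std sigma2
    y1 = x1 + alpha * x2 + z1
    y2 = beta * x1 + x2 + z2
    return x1, x2, y1, y2

def lmmse_error(x, observations):
    """Empirical mean-squared error of the best linear estimate of x."""
    obs = np.column_stack(observations)
    coeffs, *_ = np.linalg.lstsq(obs, x, rcond=None)
    return float(np.mean((x - obs @ coeffs) ** 2))

# Illustrative (hypothetical) parameter values.
x1, x2, y1, y2 = simulate_measurements(
    100_000, alpha=0.5, beta=0.5, sigma1=0.3, sigma2=0.3)

mse_own = lmmse_error(x1, [y1])        # agent 1 uses only its own data
mse_both = lmmse_error(x1, [y1, y2])   # agent 2 fully discloses its data
print(f"no sharing: {mse_own:.4f}, full sharing: {mse_both:.4f}")
```

Under this assumed model, the two printed values play the role of the largest and smallest distortions that agent 1 can experience, depending on how much the other agent shares.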
This model is relevant to direct current (DC) state estimation problems in which the agents (e.g., system operators or energy management entities) need to share their local measurements (e.g., power flow and injections at specific locations) to estimate with high fidelity their local states (e.g., complex voltages).
Agent can improve the fidelity of its state estimate if the other agent decides to share some information regarding his measurements, say . At the same time, the amount of agent leakage on his state information is constrained (in the competitive privacy framework of [4]). These conflicting aspects are measured by information-theoretic concepts: the desired fidelity and privacy amount to meeting a distortion (mean-squared error) constraint and an information leakage constraint, respectively:
(2a)  
(2b) 
where represents the distortion of estimate  which depends on the other agent’s sharing policy,  from the actual state , and is the maximum information leakage. The mutual information in (2b) measures the average leak of information per sample about the private state of agent to the other agent. The other agent can infer information on from two sources: (i) his own measurements (1); and (ii) the data shared by agent , i.e., .
Sankar et al. [4] determined the entire region of achievable tuples. The authors devised a particular coding scheme, based on quantization and binning techniques, that satisfies the distortion constraints and achieves the minimal leakage constraint (for both agents). We summarize the resulting achievable distortion-leakage (DL) region in the following theorem.
Theorem 1.
[4] The distortion-leakage tradeoff for a two-agent competitive privacy problem (described above) is the four-dimensional set of all tuples such that:
For all ,

:
(3) 
: ,
with the parameters , , , , and
The maximal and minimal distortions, denoted by and , represent the extreme cases in which, the other agent , either sends no information or fully discloses his measurements. If , the distortion constraint is nontrivial and agent has to leak information about his own state. The leakage is increasing with . If , the distortion constraint is trivial, and agent does not have to send any data. His minimum leakage is not zero because agent can still infer some private data (on agent state) from his measurements .
Notice that the region contains asymmetric tuples in terms of data sharing. This results from the opposing distortion and leakage components that cannot be optimal simultaneously: minimum distortion at one agent corresponds to maximum leakage at the other, and minimum leakage of one agent corresponds to maximum distortion at the other. From this region (which is four-dimensional) alone, it is not clear how to choose such a tradeoff tuple. In this paper, the main objective is to study different mechanisms that explain how specific tuples may arise in centralized and decentralized settings.
III Centralized solution via common goal games
Reliability in the North American electric power network is ensured by regulatory bodies (such as the North American Electric Reliability Corporation (NERC) [13] and the Federal Energy Regulatory Commission (FERC) [2]) and enforced by regional and independent system operators. Our first approach is focused on centralized networks in which a central controller dictates the data-sharing policies of the two agents.
The controller wishes to maximize the overall estimation fidelity (i.e., minimize the distortions) and to minimize the information leakage. But, as discussed in the previous section, the two objectives are opposing and they cannot be optimized simultaneously; a network-wide compromise has to be made.
In multi-objective optimization problems, scalarization via the weighted sum of the different objectives is a common technique that provides good tradeoff tuples by solving a simpler scalar problem instead. In some cases (such as convex optimization problems), the tuples obtained by tuning the weights among the objectives are all optimal tradeoffs [14].
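To make the scalarization idea concrete, the following minimal sketch sweeps the weight of a weighted sum on a toy convex problem with two conflicting costs, x^2 and (x-1)^2 on [0, 1]. This toy problem is purely illustrative and is not the network-wide objective of this paper; the function name scalarized_minimizer and the grid resolution are hypothetical choices.

```python
import numpy as np

def scalarized_minimizer(weight, grid):
    """Minimize weight*f1 + (1-weight)*f2 over a grid, for the toy costs
    f1(x) = x**2 and f2(x) = (x - 1)**2, which pull x in opposite directions."""
    f1 = grid ** 2
    f2 = (grid - 1.0) ** 2
    cost = weight * f1 + (1.0 - weight) * f2
    return float(grid[np.argmin(cost)])

grid = np.linspace(0.0, 1.0, 1001)

# Each weight selects one tradeoff point; analytically the minimizer is 1 - weight.
tradeoffs = {w: scalarized_minimizer(w, grid) for w in (0.0, 0.25, 0.5, 0.75, 1.0)}
for w, x_star in tradeoffs.items():
    print(f"weight={w:.2f} -> x*={x_star:.3f}")
```

Tuning the single weight moves the solution continuously between the two extremes, which is the role played by the weighting ratio available to the central controller.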
The network-wide objective function that captures the tradeoff between overall estimation fidelity and leakage, through their weighted sum, is written as follows:
where the leakage of information is given in (2b) and is the ratio of the weighting factors between the two terms.
For homogeneity reasons, the second term has to relate to logarithmic information measures. We propose to balance the information leakage (in bits/sample) with the overall shared information (also in bits/sample) which is inversely proportional to the distortion [15, Chap. 10]; as the distortions decrease, the information revealed per sample (or communication rate) increases.
The problem reduces to finding the distortion pairs, which characterize the data-sharing policies of both users, that maximize the objective function (III). One can easily check that this function is not always concave on its domain. By using a distributed approach to find the solution, we can overcome this obstacle. Assume each agent controls his own data-sharing policy, which directly impacts the distortion at the other agent. The control parameter (or action) of agent is denoted by . The agents’ choices are driven by the same common goal, i.e., the network-wide objective function.
We further exploit the parallel between distributed optimization and potential games, which has several advantages: (i) it allows solving a nonconvex problem in a simpler manner; (ii) it leads to an iterative and distributed procedure that converges to a locally optimal tradeoff tuple; and (iii) the central controller can manipulate this outcome by tuning a scalar parameter alone. The partial shift of intelligence, from the centralized controller towards the agents, paves the way for developing scalable data-sharing policies in more complex networks (with a large number of communicating agents).
We model the common goal game by in which designates the set of players (the two agents); is the set of actions that agent can take. The payoff function of both players, , is given by
The utility function can be rewritten using Theorem 1 as
(5) 
where and , . Without loss of generality, the additive constant and the multiplicative positive constant in the payoff function can be ignored in the following analysis of the NE [16].
The noncooperative game falls into a special class called potential games [10] that have many interesting properties. Their particularity lies in the existence of a global function, called the potential function, that captures the players’ incentives to change their actions. In our case, the network-wide objective (5) represents precisely the potential function of the game. Monderer et al. [10] proved that every potential game has at least one Nash equilibrium (NE) solution (see footnote 1). Also, every local maximizer of the potential is an NE of the game. However, since the potential function is not concave [18], the game may have other NE points (e.g., certain saddle points of the potential function).
Footnote 1: The Nash equilibrium represents the natural solution concept in noncooperative games [17], defined as a profile of actions (one action for each agent) which is stable to unilateral deviations. Intuitively, if the players are at the NE, no player has any incentive to deviate and switch its action unilaterally (otherwise, the deviator decreases its payoff value).
To completely characterize the set of all NE, we study the best-response correspondence defined by:
The best response (BR) of agent to an action played by the other agent, denoted by , is the optimal choice (payoff maximizing one) of agent given the action of the other player. The best-response correspondence, , represents the concatenation of both agents’ BRs. The optimal action of agent for fixed choices of the other agents might not be a singleton, hence the correspondence definition (a set-valued function).
Nash [19] showed that the fixed points of the BR correspondence are the NE. In our case, the BR functions reduce to simply piecewise affine functions. Thus, the game can be described as a “Cournot duopoly” interaction [17] in which the set of NE points is completely characterized by intersection points of the best-response functions and [20]. Using game-theoretic tools, we reduce the nonconvex optimization problem to the analysis of intersection points of piecewise affine functions.
We further investigate a refined stability property of NE, namely, their asymptotic stability [17]. This property is important when the game has multiple NE. In such cases it seems a priori impossible to predict which particular NE will be the actual outcome. Nevertheless, if the players update their choices using the best-response dynamics (the agents sequentially choose their best-response actions to previously observed plays by the others [8]), the outcome of a “Cournot duopoly” can be predicted exactly, depending on the initial point. To be precise, the asymptotically stable NE will be the attractors of these dynamics whereas the other NE will not be observed generically (except when the initial point happens to be one of these NE). For a more detailed discussion on “Cournot duopoly” the reader is referred to [17], [20].
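The behavior of best-response dynamics can be sketched on a toy example with piecewise affine best responses, i.e., affine functions clipped to an interval of feasible actions. The slopes, intercepts and the action interval [0, 1] below are hypothetical and are not derived from the potential function of this paper; the helper names clipped_affine and best_response_dynamics are also illustrative.

```python
def clipped_affine(slope, intercept, low, high):
    """Build a piecewise affine best response: an affine map clipped to [low, high]."""
    def best_response(other_action):
        return min(high, max(low, slope * other_action + intercept))
    return best_response

def best_response_dynamics(br1, br2, start, rounds=100, tol=1e-12):
    """Sequential updates: agent 1 responds to agent 2's last action,
    then agent 2 responds to agent 1's updated action."""
    a1, a2 = start
    for _ in range(rounds):
        new_a1 = br1(a2)
        new_a2 = br2(new_a1)
        if abs(new_a1 - a1) < tol and abs(new_a2 - a2) < tol:
            return new_a1, new_a2
        a1, a2 = new_a1, new_a2
    return a1, a2

# Flat best responses (product of slopes < 1): the interior intersection
# (0.4, 0.4) attracts the dynamics from any starting point.
flat = clipped_affine(0.5, 0.2, 0.0, 1.0)
stable = best_response_dynamics(flat, flat, start=(0.9, 0.1))

# Steep best responses (product of slopes > 1): the interior intersection
# (0.5, 0.5) is an equilibrium but repels; small perturbations end at a corner.
steep = clipped_affine(2.0, -0.5, 0.0, 1.0)
pushed_up = best_response_dynamics(steep, steep, start=(0.5, 0.51))
pushed_down = best_response_dynamics(steep, steep, start=(0.5, 0.49))
print(stable, pushed_up, pushed_down)
```

In the second toy case the game has three intersection points, one interior and two on the border, and only the border ones are reached after a perturbation, which mirrors the distinction between asymptotically stable and unstable NE discussed above.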
To compute the BRs, we analyze the first-order partial derivatives of the potential function. We distinguish different behaviors depending on the emphasis on either the leakage of information () or estimation fidelity ().
III-A Emphasis on the fidelity of state estimation ()
By developing the first-order partial derivatives of the potential function, the best responses become:
(6) 
where is an affine function of with parameters , . The intersection point of the two affine functions and is
(7) 
The NE can be completely characterized by the intersection points of the two BR functions in the profile set, i.e., . Noticing that the BRs are piecewise affine functions, the following result is obtained.
Theorem 2.
The game has generically a unique or three NE assuming the central controller puts an emphasis on the overall state estimation fidelity, i.e., . In very specific cases (on the system parameters), the game may have an infinite number of NE (when the affine functions are identical) or two NE (when the intersection point (7) lies on the border of ).
Intuitively, if the network parameters and are randomly drawn from a continuous distribution, the probability of having infinitely many or exactly two NE is zero. In general, depending on the relative slopes of the two BRs, the game has a unique NE (given by (7) provided it lies in ) or three NE (one is (7) and the other two lie on the border of ). The details of the proof are given in Appendix A. In this case, the NE of the common goal game are either network-wide optimal or saddle points of the central controller’s objective function (also the potential function of the game). However, only the NE that are optimizers of this objective function are asymptotically stable and can be observed as outcomes of best-response dynamics/algorithms.
III-B Emphasis on the overall leakage of information ()
As opposed to the previous case, the BR of agent is a piecewise constant function given as follows:
(8) 
with the following conditions
where is defined in (6). The intersection points of such functions switching between the two extremes, can only lie on the corner points of .
Theorem 3.
The game has either a unique or two NE assuming the central controller puts an emphasis on the leakage of information (). The NE lie on the four corners of , depending on the system parameters.
When the game has two NE, they are always given by the two symmetric extreme corners (both users fully disclose their measurements) and (no cooperation). Otherwise, any of the four corners can be the outcome of the game, depending on the system parameters. Also, all NE are asymptotically stable in this case. The proof is omitted as it is tedious and follows simply by analysing the intersection of piecewise constant functions. In this case, the central controller cannot smoothly manipulate the outcome by tuning and only extreme distortion-leakage pairs are achieved. In the remainder of this section, we focus only on the case of , in which the controller puts an emphasis on the estimation fidelity.
III-C Numerical results
We assume the target distortions to be equal to the maximum distortions , . First, we consider the case in which a unique NE exists and . Fig. 2 illustrates the level curves of the potential function and the BRs in for the scenario: , and . The NE is the intersection point and is asymptotically stable. Using a best-response iteration, the two agents always converge, from any initial point, to the optimal point. If a small perturbation occurs, using the same iterative BR dynamics, the agents will return to this point.
The case in which the game has three NEs is illustrated in Fig. 3 for the scenario: , , and . The solutions are .
Analyzing the plot of the BR functions, we can observe that the intersection point is not asymptotically stable: Assume that a small perturbation moves the agents away from this point. By iterating the best responses, the agents get further away and converge to one of the other NEs. The initial perturbation determines which of the two asymptotically stable NE will be chosen.
Fig. 4 illustrates the NEs depending on the parameter tuned by the central controller. Both scenarios of Fig. 2 and 3 are considered.
By choosing small values of , the central controller prefers large distortions and small leakage tuples; privacy is enforced in the network. Larger values of result in opposite tuples (small distortions and large leakage tuples); cooperation is enabled among selfish agents. In the case of three NEs, the discontinuity at can be explained by the change in the BR functions; if they are continuous and piecewise affine; if they are discontinuous Heaviside-type functions (as seen in Sec. III-B).
We also remark that not all information-theoretic distortion-leakage tuples are achieved at the NE. Only the local maximizers or saddle points of the overall network-wide payoff function are NE and these tradeoff tuples depend on the system parameters. To achieve different tuples at the NE, other objective functions have to be considered (e.g., the sum of agents’ individual payoff functions in (9)).
IV Discounted repeated games
In large distributed networks, the need for continual monitoring makes repeated interactions among agents inevitable: The control of the electric power network depends on the state estimation performed periodically by distributed entities that interact with each other over and over. Such a repeated interaction may build trust among agents leading to sustained information exchange.
As opposed to the previous section, we do not assume the presence of a central controller. Rather, we exploit the repetition aspect to achieve nontrivial distortion-leakage tuples naturally without economic incentives.
One-shot game and pricing
We start with a brief overview of the noncooperative game introduced in [11]. Consider the tuple , where the set of players and their action sets are identical to the game described in Sec. III. The difference lies in the individual payoff functions: which measures the satisfaction of agent and depends on his own action choice but also on the others’ choices. As opposed to the common-goal game, each agent cares only about his own leakage of information and state estimation fidelity. Thus, the payoff function of agent , , is given by
(9) 
The second term represents the information rate of the data received from the other agent depending on , i.e., the distortion of agent . The weight is the ratio between the emphasis on leakage vs. state estimation fidelity of agent .
Maximizing the utility in (9) w.r.t. is equivalent to minimizing only the first term: the leakage of information. Indeed, the second term is a result of the data shared by the other agent , and hence, not under the control of agent . The game simplifies into two simple decoupled optimization problems; each agent chooses to stay silent (minimizing its leakage of information). The only rational outcome is the maximum-distortion, minimum-leakage extreme for both agents.
Remark 1.
The one-shot game is somewhat similar to the classical prisoners’ dilemma [17] (which is a discrete game as opposed to our continuous game): each agent has a strictly dominant strategy (see footnote 2), which is that of not sharing any data (beyond the minimum requirement).
Footnote 2: A strictly dominant strategy is an action that is the best choice of an agent independent from the others’ choices.
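The analogy with the prisoners’ dilemma can be checked on a small discrete example. The payoff numbers below are hypothetical and only encode the qualitative structure described above (sharing costs the sharing agent some leakage, while receiving data improves fidelity); they are not computed from (9), and the function name strictly_dominant_action is illustrative.

```python
def strictly_dominant_action(payoffs, own_actions, other_actions):
    """Return the action that is strictly better than every alternative
    against every action of the other agent, or None if there is none."""
    for candidate in own_actions:
        if all(
            payoffs[(candidate, other)] > payoffs[(alt, other)]
            for other in other_actions
            for alt in own_actions
            if alt != candidate
        ):
            return candidate
    return None

actions = ("share", "withhold")

# Hypothetical payoffs of one agent, indexed by (own action, other's action).
payoffs = {
    ("share", "share"): 2,
    ("share", "withhold"): 0,
    ("withhold", "share"): 3,
    ("withhold", "withhold"): 1,
}

dominant = strictly_dominant_action(payoffs, actions, actions)
print(dominant)
```

In this toy table, withholding is strictly dominant even though mutual sharing gives each agent a higher payoff than mutual withholding, which is the tension that pricing or repetition is meant to resolve.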
In [11], we show that any tuple in the information-theoretic region is achievable provided the agents are appropriately rewarded. The modified payoff functions which include the pricing are:
(10) 
The drawback of such pricing techniques, which reward an agent proportionally to his data sharing rate, is the implicit presence of a mediator (central controller or self-regulating market) which can manipulate the outcome by tuning the prices . In the following, we show that repetition enables cooperation among selfish agents without any centralized interference.
We assume that the agents interact with each other multiple times under the same conditions, i.e., they play the same noncooperative game repeatedly. The total number of rounds is denoted by . Two cases are distinguished depending on the available knowledge of : (i) perfect knowledge of , in which both agents know in advance when their interaction ends; and (ii) imperfect or statistical knowledge of , in which the agents do not know the precise ending of their interaction.
In both cases, we study the possibility of enabling and sustaining cooperation by allowing the agents to make only credible commitments, i.e., commitments on which they have incentives to follow through. The equilibrium concept we investigate here is a refinement of the Nash equilibrium, i.e., subgame perfect equilibrium, defined in the sequel.
IV-A Strategies, Payoffs and Subgame Perfect Equilibria
We introduce some useful notation and definitions. These tools are necessary for a clear understanding of the solutions arising in repeated games.
We assume that the game described above is played several times. Repeated games differ from oneshot games by allowing players to observe the history of the game and condition their current play on past actions. The history at the end of stage is denoted by , where represents the agents’ play or action profile at stage . The set of all possible histories at the end of stage is denoted by such that denotes the void set. We can now formally define a repeated game.
Definition 1.
A repeated game is a sequence of noncooperative games given by the tuple , where is the set of players (the two agents); is the strategy set of agent ; and is the payoff function which measures the satisfaction of agent for any strategy profile.
As opposed to the one-shot game, we have to make a clear distinction between an action (the choice of an agent at a specific moment or stage of the game) and a strategy that describes the agents’ behavior for the whole duration of the game. A strategy of an agent is a contingent plan devising his play at each stage and for any possible history ; more precisely it is defined as follows.
Definition 2.
A pure strategy for player , , is a sequence of causal functions such that , and .
The set of strategies, denoted by , is the set of all possible sequences of functions given in Definition 2, such that, at each stage of the game, every possible history of play is mapped into a specific action in to be chosen at this stage.
In repeated games, the agents wish to maximize their averaged payoffs over the entire game horizon. We assume that agents discount future payoffs: present payoffs are more important than future promises.
Definition 3.
The discounted payoff function of player given a joint strategy is given by
(11) 
where is the action profile induced by the joint strategy , is the payoff function in (9), is the discount factor of player .
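As a small numerical companion to Definition 3, the sketch below evaluates a discounted payoff for a given sequence of stage payoffs. It assumes the commonly used normalized form, (1 - rho) times the sum over stages t of rho^(t-1) times the stage payoff; whether (11) uses exactly this normalization is an assumption, and the function name discounted_payoff and the example payoff streams are hypothetical.

```python
def discounted_payoff(stage_payoffs, discount, normalize=True):
    """Discounted sum of stage payoffs with weights 1, rho, rho**2, ...
    With normalize=True the sum is scaled by (1 - rho), an ASSUMED convention
    that expresses the result in the same units as a single-stage payoff."""
    total = 0.0
    weight = 1.0
    for payoff in stage_payoffs:
        total += weight * payoff
        weight *= discount
    return (1.0 - discount) * total if normalize else total

# A reward that only arrives at the fourth stage is worth less to an
# impatient agent (small discount factor) than to a patient one.
delayed_reward = [0.0, 0.0, 0.0, 10.0]
impatient = discounted_payoff(delayed_reward, discount=0.5)
patient = discounted_payoff(delayed_reward, discount=0.9)
print(impatient, patient)
```

Allowing a different discount factor for each agent, as in this paper, simply amounts to calling such a function with agent-specific values of the discount argument.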
The Nash equilibrium concept for repeated games is defined similarly to the oneshot games (any strategy profile that is stable to unilateral deviations). Some of the Nash equilibria of the repeated games may rely on empty threats [17] of suboptimal play at histories that are not expected to occur (under the players’ rationality assumption). Thus, we focus on a subset of Nash equilibria that allow players to make only commitments they have incentives to follow through: the subgame perfect equilibria.
Before defining this concept, we have to define subgames. Given any history , the game from stage onwards, is a subgame denoted by . The final history for this subgame is denoted by . The strategies and payoffs are functions of the possible histories consistent with . Any strategy profile of the whole game induces a strategy on any subgame such that for all , is the restriction of to the histories consistent with .
Definition 4.
A subgame perfect equilibrium, , is a strategy profile (in a repeated game with observed history) such that, for any stage and any history , the restriction is a Nash equilibrium for the subgame .
This equilibrium concept is a refinement of the NE because it is required to be a NE in every possible subgame, in addition to the entire game. We analyze this solution concept for two different cases depending on the available knowledge of the end stage: perfect knowledge and imperfect or statistical knowledge of .
IV-B Perfect knowledge of end stage
We assume the agents know in advance the value of , i.e., when the game ends precisely. We show that datasharing beyond the minimum requirement cannot be enabled in this case.
Corollary 1.
Assuming the agents know perfectly the value of , the discounted repeated game has a unique subgame perfect equilibrium described by “no data sharing beyond the minimum requirement ” at each stage of the game and for both agents:
(12) 
The proof is omitted as it follows similarly to the repeated prisoners’ dilemma (using an extension of the backward induction principle to dominance solvable games [17]). The key element is the strict dominance principle: a rational player will never choose an action that is strictly dominated. The same result remains true if the discounted payoffs are replaced with average payoffs, . Moreover, Theorem 1 extends to a general class called dynamic games in which the system parameters (, , , ) may vary at every stage of the game. The same reasoning holds since, at any stage of the game, the action corresponding to “no data sharing beyond the minimum requirement” is the strictly dominating one.
The only achievable distortion-leakage tuple is the one with maximum distortion and minimum leakage, as in the one-shot game. The main reason why cooperation is not sustainable is that agents know precisely when their interaction ends. Next, we consider agents that interact over an indeterminate period (they are unsure of the precise ending).
IV-C Imperfect knowledge of end stage
We assume here that the players do not know the value of (the end stage). The discount factor can be interpreted as the agent’s belief (or probability) that the interaction goes on (see [21] and references therein). The probability that the game stops at stage is then . The discounted payoff (11) represents an expected or average utility. Thus, we assume that each agent knows its own discount factor, which models its belief, at every stage, that the interaction continues (the probability that the game goes on).
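The interpretation of the discount factor as a continuation probability can be made concrete with a short Python sketch. It assumes that the interaction continues after each stage independently with the same probability rho, so that the end stage is geometrically distributed; the function names are hypothetical, and this may not match the paper's exact expression for the stopping probability.

```python
def stop_probability(rho: float, t: int) -> float:
    """Probability that the interaction ends exactly at stage t (t >= 1).

    Assumes the game continues after each stage independently with
    probability rho: it survives t - 1 stages, then stops.
    """
    if t < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need t >= 1 and rho in [0, 1)")
    return (1.0 - rho) * rho ** (t - 1)


def expected_number_of_stages(rho: float) -> float:
    """Mean of the geometric end stage under the same assumption."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return 1.0 / (1.0 - rho)
```

Under this assumption, a discount factor closer to one corresponds to a longer expected interaction.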
The strategy of playing the one-shot NE at every stage is a subgame perfect equilibrium in this case as well.
Theorem 4.
Assuming imperfect knowledge of the end stage and that for all , in the discounted repeated game , the strategy “do not share any information beyond the minimum requirement” at each stage of the game and for both agents is a subgame perfect equilibrium, i.e. :
(13) 
The details of the proof are reported in Appendix B. Unlike in the case of perfect knowledge of the end stage, we show that this is not the only possible outcome: other distortion-leakage pairs can be achieved.
Inspired by the repeated prisoners’ dilemma, our objective is to show that nontrivial exchange of information can be sustainable. Consider the action profiles which perform strictly better than the one-shot NE for both agents:
(14) 
Such tuples may be expected to represent long-term contracts or agreements between rational agents. Other tuples will never be acceptable: by not sharing any data, an agent is guaranteed at least the one-shot NE payoff value. In the game-theoretic literature, these utility pairs are also known as individually rational payoffs [22].
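The selection of agreement candidates in (14) amounts to filtering out the action profiles that are not strictly better than the one-shot NE for both agents. A minimal Python sketch follows; the action labels and payoff numbers are hypothetical, prisoners'-dilemma-like values, not taken from the paper.

```python
from typing import Dict, List, Tuple

Profile = Tuple[str, str]
Payoffs = Tuple[float, float]


def strictly_improving(profiles: Dict[Profile, Payoffs],
                       ne_payoffs: Payoffs) -> List[Profile]:
    """Return the profiles whose payoffs strictly exceed the NE payoffs
    for BOTH agents (the individually rational candidates)."""
    return [profile for profile, (u1, u2) in profiles.items()
            if u1 > ne_payoffs[0] and u2 > ne_payoffs[1]]


# Hypothetical payoffs: "min" = share only the minimum, "full" = full disclosure.
TOY_GAME: Dict[Profile, Payoffs] = {
    ("min", "min"): (1.0, 1.0),    # one-shot NE
    ("full", "min"): (0.0, 3.0),   # agent 1 discloses, agent 2 does not
    ("min", "full"): (3.0, 0.0),   # agent 2 discloses, agent 1 does not
    ("full", "full"): (2.0, 2.0),  # both disclose
}

candidates = strictly_improving(TOY_GAME, TOY_GAME[("min", "min")])
```

In this toy example only the symmetric profile survives: each asymmetric profile leaves the disclosing agent worse off than at the one-shot NE.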
These payoffs can be visualised in Fig. 5 for the scenario: , , , , . The plotted area represents the set of all payoff pairs. The four corner points represent the four extremes: (the lower-left corner: the one-shot NE), (the upper-left corner: the most advantageous for agent 2, who shares nothing while agent 1 fully discloses his data), (the lower-right corner: the most advantageous for agent 1) and (the upper-right corner: both agents fully disclose their data, maximizing their leakage). The darker area (in black) represents the subset of pairs satisfying (14). The lighter area (in magenta) represents the payoff pairs rejected by one or both rational players.
To gain more insight into these achievable agreement points, we write out the payoff function expressions in (9) explicitly:
(15) 
Data sharing beyond the minimal requirement has two opposing effects: i) the leakage terms increase (); and ii) the estimation fidelity terms increase (). Thus, the pairs represent the tuples for which the gain in state estimation fidelity outweighs the loss caused by the leakage for both agents.
Intuitively, the greater the emphasis on the state estimation terms, the larger the region of achievable agreement points. We also observe that the achievable distortion pairs satisfying the conditions in (14) must be relatively symmetric. In other words, both agents have to share their data for the agreement to be acceptable to both parties.
Unlike in the one-shot game or the known-horizon repeated game (where the agents have perfect knowledge of ), a commitment to share data resulting in any distortion pair is sustainable under some conditions on the discount factor. If the probability of the game stopping is small enough, then the commitment of playing is credible and, thus, sustainable for rational agents.
Theorem 5.
Assuming imperfect knowledge of the end stage in the discounted repeated game and for any agreement profile that meets the conditions (14), if the discount factors are bounded by:
(16) 
and for all , then the following strategy is a subgame perfect equilibrium: For all , “agent shares data at the agreement point in the first stage and continues to share data at this agreement point if and as long as the other player shares data at the agreement point . If any player has ever defected from the agreement point, then the players do not cooperate beyond the minimum requirement from this stage on.”
A detailed proof is given in Appendix C. This theorem asserts that both agents can achieve better distortion levels than at the one-shot NE naturally, without the intervention of a central authority or economic incentives. The optimal strategy is a tit-for-tat type of policy: each agent fulfils his part of the agreement and shares data if and as long as the other party does the same.
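The trigger strategy of Theorem 5 can be sketched in Python as follows. The sketch also computes a generic textbook bound on the discount factor, which is not claimed to coincide with (16): under a normalized discounted payoff (an assumption), honouring the agreement forever yields u_agree, whereas defecting once and being punished afterwards yields (1 - rho) * u_deviate + rho * u_ne, so the agreement is sustainable when rho >= (u_deviate - u_agree) / (u_deviate - u_ne). All names and payoff values are hypothetical.

```python
AGREE, MINIMUM = "agree", "minimum"


def trigger_action(history):
    """Share at the agreement point unless some past stage saw a defection."""
    cooperative = all(a == AGREE and b == AGREE for a, b in history)
    return AGREE if cooperative else MINIMUM


def play(stages, deviate_at=None):
    """Both agents follow the trigger strategy; agent 2 optionally defects
    once at stage index `deviate_at` (0-based)."""
    history = []
    for t in range(stages):
        a1 = trigger_action(history)
        a2 = trigger_action(history)
        if deviate_at == t:
            a2 = MINIMUM  # single unilateral defection by agent 2
        history.append((a1, a2))
    return history


def min_discount_factor(u_agree, u_deviate, u_ne):
    """Smallest rho that deters a one-shot defection (generic bound)."""
    if u_agree <= u_ne:
        raise ValueError("the agreement must strictly improve on the NE payoff")
    if u_deviate <= u_agree:
        return 0.0  # defecting is never tempting
    return (u_deviate - u_agree) / (u_deviate - u_ne)


path = play(4, deviate_at=1)
threshold = min_discount_factor(u_agree=2.0, u_deviate=3.0, u_ne=1.0)
```

With these toy payoffs, a single defection at the second stage moves both agents to the minimum requirement for the rest of the interaction, and the agreement is credible only if the continuation probability is at least one half.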
Any distortion pair in (14) is achievable in the long term, provided the discount factors are large enough. The lower bound in (16) depends on the agents’ emphasis on leakage vs. fidelity. A larger emphasis on the leakage of information () requires larger discount factors. Thus, a smaller ending probability (or a longer expected interaction) is needed to sustain data sharing when agents are more sensitive to privacy concerns.
This lower bound also depends on the specific agreement pair . It is again a compromise: smaller-distortion agreements imply larger leakages of information and, thus, require larger discount factors.
In conclusion, the minimum expected length of the interaction needed to sustain an agreement depends on the agents’ tradeoffs between the leakage of information and state estimation fidelity resulting from their data exchange.
Theorem 5 may be extended to the case in which the parameters change at each stage of the game. However, the conditions on the discount factor would be much stricter. A different approach should be investigated in such general dynamic games. This issue falls outside the scope of the present work and is left for future investigation.
IV-D Numerical results
We focus on the scenario: , , and for . The minimum and maximum distortions are , , and . For simplicity, we assume that both agents have the same belief on the end stage of the game, i.e., .
If the agents put an emphasis on leakage (e.g., , or ), there is no distortion pair that strictly improves both players’ payoffs compared to the one-shot NE . This means that the improvement in an agent’s estimation fidelity from the data shared by the other agent is outweighed by the loss of privacy incurred at the agreement point.
If the agents put more emphasis on their estimation fidelities, the region of agreements becomes nontrivial. Figure 6 illustrates this region in the cases: i) , ; ii) , ; and iii) , . The coloured region represents all the possible agreements sustainable in the long term, whereas the white region represents the distortion points that cannot be achieved. In all these figures, the upper-right corner represents the minimum cooperation requirement .
Very asymmetric distortion pairs (the upper-left and lower-right regions) are not achievable in the long term; a rational user will only agree to fulfil equitable data-sharing agreements. In other words, either both players share information at a nontrivial rate or neither of them does.
The higher the emphasis on state estimation fidelity, the larger the agreement region and the lower the distortion levels achieved: the minimal distortion pair is only sustainable in the third case (), when the emphasis on the estimation fidelity is high enough for both agents.
We can observe a symmetry regarding the values of needed to sustain a given agreement pair. The fairer or more symmetric distortion pairs require a shorter expected game duration to be sustainable. The most unfair distortion pairs (the border points of the region of sustainable agreements) require the longest expected game duration, i.e., a probability of the game continuing that is close to one. Beyond these edges, the difference between what an agent shares and what he receives in return is unacceptable, even in a long-term interaction.
V Concluding Remarks
Data sharing among physically interconnected nodes/agents of a network improves their local state estimations. When privacy also plays a role, enabling nontrivial data exchange often requires incentives.
In a centralized setting, we show that the central controller can manipulate the data sharing policies of the agents by tuning a single parameter, which depends on the relative emphasis on leakage vs. estimation fidelity. A whole range of outcomes can be chosen in between two extremes: both agents fully disclose their measurements (minimum distortion, maximum leakage), and both agents stay silent (maximum distortion, minimum leakage).
If the network lacks a central controller and the agents are driven only by their individual agendas, we prove that nontrivial data sharing cannot be an outcome. Rational agents cannot trust each other in sharing data when the interaction takes place only once or in a finite number of rounds. However, if the agents interact repeatedly in the long term (over an undetermined number of rounds), then a whole region of outcomes is achievable, depending on the agents’ emphasis on leakage vs. state estimation fidelity. There is a symmetry in this achievable region: rational agents agree only on tit-for-tat data sharing policies.
This result (long-term repetition enables data exchange) follows from the underlying assumption that agents can perfectly observe the past plays (the history of the game) and condition their present choices on these observations. In practice, this implies significant signalling among the agents, which has to be taken into account in future work.
Although our work is focused on the case of two communicating agents, we make a first step in studying distributed solutions to competitive privacy problems in complex networks such as the electrical power network. Both our centralized and decentralized approaches use game-theoretic tools that lend themselves to distributed and scalable solutions.
Appendix A Proof of Theorem 2
Before providing the proof, we start by fully characterizing the set of NE. Three cases are distinguished depending on the parameter that determines the relative slopes of the BR functions.

If , then there is a unique and asymptotically stable NE. If the intersection point of the affine functions and denoted by with
(17) lies in the interior of , then it is the NE of the game. Otherwise, the NE lies on the border of .

If , then we have two different situations. If the condition holds, then there is a unique and asymptotically stable NE lying on the border of . If on the contrary , then . In this case, if this affine function intersects nontrivially, then the game has an infinite number of NEs which are not asymptotically stable. Otherwise, the unique NE lies on the border and is asymptotically stable.

If , then there are two or three different NEs provided that the intersection point in (7) lies in the interior or on the border of : this intersection point is the only asymptotically unstable equilibrium. The other one or two NEs lie on the corners of , and ). Otherwise, there is a unique NE which lies on the border of and is asymptotically stable.
Intuitively, the scalar threshold equal to for the parameter comes from the relative order among the two slopes of the BR functions. If , then the two slopes are identical and equal to one. In any other case, the slopes of the two curves are different in the same axis system (since one of the two curves would have to be inverted). The relative slopes of the two curves greatly influence their intersection points and, thus, the set of NE.
The proof follows a similar approach as in [20] for the power allocation game over non-overlapping frequency bands in the interference relay channel and assuming a zero-delay scalar amplify-and-forward relaying protocol. We investigate the NEs of the game when and their asymptotic stability. A necessary and sufficient condition that guarantees the asymptotic stability of a certain NE, say , is related to the relative slopes of the BRs [17]:
(18) 
for all in an open neighbourhood of . The analysis of the NE is based on the analysis of intersection points of the two BR functions in (6).
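The role of the relative BR slopes can be illustrated with sequential best-response dynamics on two affine best responses clipped to an interval. This is only a sketch: the clipped-affine form, the coefficients and the interval [0, 1] are hypothetical stand-ins for the BR functions in (6), and the contraction intuition used here (a product of slopes below one in magnitude gives convergence to the interior intersection point) is a standard condition of this type, not necessarily identical to (18).

```python
def clip(value, lo, hi):
    """Project a value onto the interval [lo, hi]."""
    return max(lo, min(hi, value))


def best_response_dynamics(a1, b1, a2, b2, y0=0.0, lo=0.0, hi=1.0, iters=200):
    """Sequential best-response updates for the hypothetical BR functions
    BR1(y) = clip(a1 + b1 * y) and BR2(x) = clip(a2 + b2 * x)."""
    x, y = lo, y0
    for _ in range(iters):
        x = clip(a1 + b1 * y, lo, hi)  # agent 1 responds to agent 2
        y = clip(a2 + b2 * x, lo, hi)  # agent 2 responds to agent 1
    return x, y


# Product of slopes 0.25 < 1: the dynamics converge to the interior
# intersection point (1/3, 4/15) of the two affine functions.
stable = best_response_dynamics(0.2, 0.5, 0.1, 0.5)

# Product of slopes 4 > 1: the interior intersection (0.5, 0.5) is unstable,
# and the dynamics end at a corner that depends on the starting point.
upper = best_response_dynamics(-0.5, 2.0, -0.5, 2.0, y0=0.6)
lower = best_response_dynamics(-0.5, 2.0, -0.5, 2.0, y0=0.4)
```

The second configuration mirrors the situation with two or three NEs described above: the interior intersection point is an equilibrium, but any small perturbation drives the agents to one of the corner equilibria.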
First, we analyze all the possible cases in which the intersection points between the affine functions and are outside the interval or on the two corners: or . In these cases, the NE is unique and it lies on the border of . These cases correspond to: (i) or , (ii) or ; the corresponding analyses are not reported here, as they are tedious and similar to the next, more interesting case. The more interesting case is when , , and . This means that, if the curves and intersect, the intersection point or points lie in and are NEs of the game under study. We have again three subcases:
If
then the two functions and have the same slope (equal to one) and thus they are parallel.

If , then the two functions are the same. All the points on these curves that intersect are NEs of the game. Therefore, we have an infinite number of NEs. The asymptotic stability condition is not met because
for all these NEs.

If , then the two BR functions intersect on the border of in a unique asymptotically stable point for which
If
then the NE is unique and a detailed discussion follows depending on the signs of the following inequalities: , , and and also on the relative positions of the intersection points between the two functions and the border of . We will detail only one of these cases.
If , , and , then the two BR functions coincide on with the two functions . The unique NE is given by the intersection point of and such that
(19) 
It is easy to see that
and, thus, the NE is asymptotically stable.
If
then the discussion follows similarly depending on the signs of the following inequalities: , , and and also on the relative intersection points between the functions with the border of .
Appendix B Proof of Theorem 4
The backward induction argument is no longer valid since agents do not know which stage is the final one. Instead, we apply the one-stage-deviation principle for discounted repeated games that are uniformly bounded in each stage [17]. This principle states that a strategy profile is subgame perfect if and only if there is no player and strategy that agrees with except at a single stage and history , and such that is a better response than in the subgame .
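For a history-independent strategy such as "play the one-shot NE at every stage", the one-stage-deviation check reduces to a stage-game comparison: a deviation at one stage does not change future play, so, under a normalized discounted payoff (an assumption), its gain in the corresponding subgame is (1 - rho) times the change in that stage's payoff. The Python sketch below illustrates this reduction; the stage payoff function and the action grid are hypothetical and are not the payoffs in (9).

```python
def best_one_stage_deviation_gain(stage_payoff, ne_own, ne_other,
                                  alternatives, rho):
    """Largest gain from deviating at a single stage while the other agent
    keeps playing its NE action; future play is unaffected because the
    candidate strategy ignores the history."""
    base = stage_payoff(ne_own, ne_other)
    return max((1.0 - rho) * (stage_payoff(a, ne_other) - base)
               for a in alternatives)


def toy_stage_payoff(own_share, other_share):
    """Hypothetical stage payoff: an agent gains fidelity from what the other
    agent shares and loses privacy (weighted by 2) from what it shares."""
    return other_share - 2.0 * own_share


# Sharing level 0.0 stands for "no sharing beyond the minimum requirement".
gain = best_one_stage_deviation_gain(
    toy_stage_payoff, ne_own=0.0, ne_other=0.0,
    alternatives=[0.25, 0.5, 1.0], rho=0.9)
```

A non-positive gain for every alternative action, at every stage and history, is what the principle requires; here it holds because sharing more than the minimum only increases the deviating agent's own leakage.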
First, we have to check the uniform boundedness condition on the stage payoffs. Indeed, we can show that the stage payoffs in (9) are bounded as follows:
Given that