Differentially-Private Two-Party
Egocentric Betweenness Centrality
Abstract
We describe a novel protocol for computing the egocentric betweenness centrality of a node when relevant edge information is spread between two mutually distrusting parties, such as two telecommunications providers. While each node belongs to one network or the other, its ego network might include edges unknown to its network provider. We develop a protocol of differentially-private mechanisms to hide each network’s internal edge structure from the other, and contribute a new two-stage stratified sampler that yields exponential improvements in time and space efficiency. Empirical results on several open graph data sets demonstrate practical relative error rates, such as 16% error on a Facebook data set, while delivering strong privacy guarantees.
I Introduction
Data sets such as social, communication, and transport networks are graph structured: people are nodes and their interactions edges. Such graph structures are valuable for understanding real-world properties. However, revealing the graph or its statistics can cause privacy disclosure even with anonymisation techniques [1, 2, 3]. Furthermore, for many corporations, customer data is an asset they are reluctant to share. This motivates interest in joint computation over databases with limited exposure to sensitive information.
Differential privacy (DP) [4] guarantees that the output distribution of a release does not change by more than a small multiplicative factor under input data perturbation. We consider edge DP, wherein perturbations correspond to edge flips: the existence of sensitive edges is not revealed by an edge-DP release.
We envisage two (or more) networks controlled by different corporations, such as telephone or email providers, or two different social networks. The complete list of nodes (i.e., people) is public knowledge, but the individual connections between them are not, so we consider edge DP in order to hide the connection between the nodes in each network. Each service provider knows the connections within its own network, plus the connections between one of its members and the outside (e.g., when they contact someone in a different network), but not the internal connections in other networks.
We are the first to consider differentially-private computation of egocentric betweenness centrality (EBC) [5]. Informally, EBC measures the importance of a node as a link between different parts of the graph. A node that forms a link between otherwise-isolated parts of the network has high betweenness centrality; a node that is simply an easily-bypassed member of an interconnected network has a low betweenness centrality. This is a property of the whole communication graph: one service provider cannot compute it using its partial view of the graph alone.
Betweenness centrality could be used in targeted advertising or customer retention campaigns, as individuals with high EBC have the capacity to transfer information from one community to another. EBC is equally important in understanding and combating the spread of misinformation or “fake news”: individuals with high EBC can be educated to be more discerning about what they spread through the network, thereby mitigating the spread of fake news. The difficulty of assessing misleading political content and obstructing its spread has become one of the most important research questions in online social network analysis, to which even the networks themselves are devoting significant research effort (footnote 1: https://newsroom.fb.com/news/2018/04/newelectionsinitiative/).
We enable a network provider to compute the egocentric betweenness centrality of a node, while requiring only differentially-private information about internal connections to be shared between networks. Our main contributions are:

We introduce a privacy-preserving method to compute the egocentric betweenness centrality of nodes in undirected graphs. In our setting each network has internal connections, and there are also inter-network connections.

We propose a two-stage sampling process that is simple to implement and delivers exponential savings in time and space over naïve sampling from the exponential mechanism directly.

We report on thorough experiments using a Facebook graph data set of roughly 63,000 nodes. The experiments in Section VI show that the error is approximately 16% of the true EBC for reasonable values of privacy level $\epsilon$. Similar results hold for other networks from Enron and PGP email.
First we survey the technical background and give a precise definition of EBC. We then explain why a precise computation would expose individual links between networks. Section V describes our differentially-private mechanism for communicating enough information between networks to permit effective approximation of EBC while preserving strong privacy guarantees. We then present empirical results testing the feasibility of our approach on samples of public data from Facebook, Enron and PGP.
I-A Related Work
$k$-anonymity [6] represents a major, early attempt at preventing node and edge re-identification by graph transformation, but it has been proven to be insufficient [7].
Differential privacy for graph processing was first introduced in [8] and was followed up by further work [9, 10, 11, 12]. Two main privacy models exist when publishing graph-based information under differential privacy: node [10] and edge [8] differential privacy. Hay et al. [8] introduced an algorithm for publishing degree distributions under edge privacy, implicitly permitting private star counting as well. Projection-based techniques have been proposed to answer degree distribution queries under node differential privacy [13, 14].
Other statistics have been approximated under differential privacy such as frequent patterns of given subgraphs [9, 15, 16]. Bhaskar et al. [15] used the exponential mechanism to publish the (approximately) most frequent patterns with high probability, and the Laplace mechanism to release the noisy frequency of maximising patterns. Karwa et al. [16] proposed a differentially-private algorithm to answer subgraph counting queries for star and triangle subgraphs, using local sensitivity [17] to overcome the high global sensitivity of such queries. An approach to finding arbitrary frequent patterns was proposed by Shen & Yu [12]. They utilise the exponential mechanism and Markov chain Monte Carlo sampling to output frequent patterns on graph data sets.
Finding node clusters in a single graph under differential privacy was first proposed by [11] and followed by [18]. These techniques try to find the group of nodes sharing many links with other nodes in the same group but relatively few outside the group. They maintain the privacy of the output clusters under node or edge differential privacy.
Our work differs from previous studies in two key ways. First, we focus on the problem of node influence, through the study of ego betweenness centrality. This particular task poses significant technical challenges, made efficient here by adopting two-stage stratified and accept-reject sampling. Second, we consider a core graph processing task in a distributed two-party setting. While most existing work on graph mining under differential privacy can adopt a model of trusted computation, and there is some work on privacy for distributed systems [19, 20], these are based on distributed queries that are decomposed into subqueries, each answered per database. Our setting requires untrusting parties to cooperate on computation without revealing one another’s privacy-sensitive data.
II Preliminaries
II-A Egocentric Betweenness Centrality
First proposed by Everett & Borgatti [21] as an approximation to betweenness centrality [22], egocentric betweenness centrality (EBC) has gained recognition in its own right as a natural measure of a node’s importance as a network bridge [23]. The EBC of a node $a$ is the sum, for all pairs of neighbours of $a$ that aren’t directly connected, of the fraction of 2-edge paths between them that pass through $a$.
Definition 1.
Egocentric betweenness centrality (EBC) of node $a$ in simple undirected graph $(V, E)$ is defined as
$$\mathrm{EBC}(a) = \sum_{i, j \in N_a :\, A_{ij} = 0,\, j > i} \frac{1}{(A^2)_{ij}},$$
where $N_a = \{v \in V : \{a, v\} \in E\}$ denotes the neighbourhood or ego network of $a$, $A$ denotes the adjacency matrix induced by $N_a \cup \{a\}$ with $A_{ij} = 1$ if $\{i, j\} \in E$ and $0$ otherwise; $(A^2)_{ij}$ denotes the $ij$th entry of the matrix square, guaranteed positive for all $i, j \in N_a$ since all such nodes are connected through $a$.
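As an illustration of Definition 1, the following is a minimal sketch that computes EBC directly on a small graph. The graph representation (a dictionary mapping each node to its set of neighbours) and the function name `ego_betweenness` are assumptions made for this example, not part of the protocol.

```python
from itertools import combinations

def ego_betweenness(adj, ego):
    """EBC of `ego` in an undirected graph given as {node: set(neighbours)}."""
    nbrs = adj[ego]
    members = nbrs | {ego}  # the ego network together with the ego node
    total = 0.0
    for i, j in combinations(sorted(nbrs), 2):
        if j in adj[i]:
            continue  # directly connected pairs contribute nothing
        # Number of 2-paths i-k-j whose intermediate node k lies in the ego
        # network or is the ego itself: the (i, j) entry of the matrix square.
        two_paths = sum(1 for k in members if k in adj[i] and k in adj[j])
        total += 1.0 / two_paths  # exactly one of these paths passes through ego
    return total

# A star: the ego (node 0) is the only link between three isolated nodes.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(ego_betweenness(star, 0))  # 3.0
```

Each of the three neighbour pairs in the star is joined by a single 2-path through the ego, so each contributes 1 to the sum.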
II-B Differential Privacy on Graphs
Differential privacy was proposed to quantify the indistinguishability of input databases when observing the output of data analysis [4]. With careful selection of which databases are to be indistinguishable, through the so-called neighbouring relation, the protective semantics of differential privacy may be controlled.
As detailed further in Section III, our concern is maintaining the privacy of connections in networks, e.g., who calls whom in a telecommunications network. We therefore use edge privacy [8] and so relate graphs that differ by edges. The adjacency matrix fully represents the edge-set of a graph of known nodes (the indices into the adjacency matrix). As such, we focus on databases as sequences of bits: elements of $\{0,1\}^n$.
Formally, two databases $D, D' \in \{0,1\}^n$ are termed neighbouring (denoted $D \sim D'$) if there exists exactly one $i \in [n]$ such that $D_i \neq D'_i$ and $D_j = D'_j$ for all $j \neq i$. In other words, $\|D - D'\|_1 = 1$.
Definition 2.
For $\epsilon > 0$, a randomised algorithm on databases or mechanism $M$ is said to preserve $\epsilon$-differential privacy if for any two neighbouring databases $D \sim D'$, and for any measurable set $S \subseteq \mathrm{Range}(M)$,
$$\Pr\left(M(D) \in S\right) \le \exp(\epsilon) \cdot \Pr\left(M(D') \in S\right).$$
II-B1 Generic Mechanisms for Privacy
We leverage two well-known DP mechanisms in this paper: the Laplace mechanism [4] which applies additive noise to numeric vector-valued analyses, and the exponential mechanism [24] which privately optimises a real-valued objective function bivariate in the database and the decision variable, which need not be numeric. Common to most generic mechanisms, and the Laplace and exponential in particular, is the concept of sensitivity-calibrated randomisation: the more sensitive a target function is to input perturbation, the more randomisation is required to attain a level of differential privacy. Both mechanisms leveraged here are calibrated via the same measure of sensitivity, defined next.
Definition 3.
The global sensitivity of any function $f : \{0,1\}^n \to \mathbb{R}^d$ for any $d \in \mathbb{N}$, is defined as
$$\Delta f = \max_{D \sim D'} \left\| f(D) - f(D') \right\|_1.$$
For functions of additional variables we extend this definition naturally as
$$\Delta f = \sup_{r} \max_{D \sim D'} \left\| f(D, r) - f(D', r) \right\|_1.$$
We can now define the aforementioned generic mechanisms.
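Lemma 4.
Consider any Euclidean vector-valued deterministic function $f : \{0,1\}^n \to \mathbb{R}^d$ for any $d \in \mathbb{N}$, and any scalar $\epsilon > 0$. Given input $D$, the Laplace mechanism releases responses in $\mathbb{R}^d$ distributed as $f(D) + Z$ where $Z$ is a vector of $d$ i.i.d. zero-mean Laplace r.v.’s with scale $\Delta f / \epsilon$ (footnote 2: the zero-mean scalar Laplace with scale $b$ has PDF $\frac{1}{2b} \exp(-|x| / b)$). Then the Laplace mechanism preserves $\epsilon$-differential privacy.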
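The Laplace mechanism of Lemma 4 can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the sensitivity bound is assumed to be supplied by the caller, and the noise is drawn as the difference of two exponential variates, which is one standard way to obtain a zero-mean Laplace sample.

```python
import random

def laplace_noise(scale, rng=random):
    """Zero-mean Laplace draw: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(values, sensitivity, epsilon, rng=random):
    """Add independent Laplace noise of scale sensitivity/epsilon to each coordinate."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]

# Example: privatise a vector of three counts, assuming a sensitivity bound of 2.
noisy_counts = laplace_mechanism([4, 0, 7], sensitivity=2.0, epsilon=1.0)
```

A smaller `epsilon` or a larger sensitivity bound increases the noise scale, which is the calibration described above.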
Lemma 5.
Consider any real-valued bivariate quality function $q(D, r)$, which assigns quality score to candidate response $r$, on input database $D$. The exponential mechanism approximately maximises $q(D, \cdot)$ by releasing randomised response $r$ with likelihood proportional to $\exp\left(\epsilon\, q(D, r) / (2 \Delta q)\right)$. Then the exponential mechanism preserves $\epsilon$-differential privacy.
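For a response set small enough to enumerate, the exponential mechanism can be sketched as below. The function name and arguments are assumptions made for illustration, and the exponent is taken to be $\epsilon q / (2 \Delta q)$, a commonly used calibration. Scores are shifted by their maximum before exponentiation to avoid overflow, which leaves the sampling distribution unchanged.

```python
import math
import random

def exponential_mechanism(candidates, quality, sensitivity, epsilon, rng=random):
    """Sample one candidate with probability proportional to exp(eps * q / (2 * sensitivity))."""
    scores = [epsilon * quality(r) / (2.0 * sensitivity) for r in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift for numerical stability
    return rng.choices(candidates, weights=weights, k=1)[0]

# Example: privately pick an integer close to 7, scoring candidates by negative distance.
choice = exponential_mechanism(list(range(10)), lambda r: -abs(r - 7),
                               sensitivity=1.0, epsilon=1.0)
```

Direct enumeration like this is only feasible for small response sets, which is why a dedicated sampler is needed when the responses are all subsets of a node set.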
II-B2 Compositional Calculus
In order to build up more complex privacy-preserving computations, it is necessary to be able to quantify the privacy loss of compositions. Fortunately, differential privacy satisfies sequential composition and transformation invariance [4, 25, 26] among other compositions.
Lemma 6 (Sequential composition).
For any sequence of randomised mechanisms $M_1, \ldots, M_k$, if each $M_i$ preserves $\epsilon_i$-differential privacy then the compound response on a database $D$, $(M_1(D), \ldots, M_k(D))$, preserves $\sum_{i=1}^{k} \epsilon_i$-differential privacy.
Lemma 7 (Transformation invariance).
For any mechanism $M$ that is $\epsilon$-differentially private, and any (possibly randomised) mapping $f$ with domain containing the codomain of $M$, the randomised mechanism $f \circ M$ preserves $\epsilon$-differential privacy.
III Problem Statement
Consider a two-party setting of a telecommunications network with two service providers $A, B$: every customer is represented as a node that belongs to one and only one service provider; pairs of customers who e.g., have called one another are represented as edges in a simple undirected graph on the disjoint union of nodes. Edges can either connect nodes within one party ($A$ or $B$), in which case they are unknown to the other party ($B$ or $A$ respectively), or edges span both parties and are known to both. We consider all nodes to be known to both parties, as being addressable within a global addressing system (e.g., a phone book).
Denote by $V_A, V_B$ the nodes of $A, B$ respectively, $E_A, E_B$ the edges (two-element sets) within parties $A, B$ respectively, and $E_{AB}$ the edges spanning $A, B$ as sets with one element each from $V_A, V_B$. The simple undirected graph on the entire network $(V, E)$ comprises node-set disjoint union $V = V_A \cup V_B$ and edge-set disjoint union $E = E_A \cup E_B \cup E_{AB}$. Note we will often equivalently represent edge sets as adjacency matrices (or flattened vectors) with elements in $\{0, 1\}$. Table I shows all of the symbols used in this paper.
We wish to enable one party $A$ (without loss of generality) to compute the ego betweenness centrality (EBC) of one of its nodes $a \in V_A$, while maintaining edge privacy between parties. Before detailing a protocol for accomplishing this task, we must be precise about a privacy model.
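Symbol | Meaning
$A, B$ | The two parties, e.g., competing service providers.
$V_A, V_B$ | The nodes per party.
$E_A, E_B$ | Edges entirely within each party.
$E_{AB}$ | Edges spanning both parties.
$a$ | The ego node (assumed WLOG to be in $V_A$).
$N_a$ | The ego network of $a$.
$V_A \setminus \{a\}$ | Party $A$’s nodes excluding $a$.
$R^\star$ | The ego network $N_a$ contained in $V_A$.
$T_{ij}$ | Counts of 2-paths spanning $A, B$.
$S_A, S_B, S_{AB}$ | Partial EBC sums by endpoints.
$R$ | A private, randomised approximation to $R^\star$.
$\mathcal{R}_r$ | For each quality score $r$, a partition of the power set of $V_A \setminus \{a\}$.
$\epsilon$ | The differential-privacy budget.
$\Delta$ | A global sensitivity bound.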
Problem 8 (Private Two-Party EBC).
Consider a simple undirected graph $(V_A \cup V_B, E_A \cup E_B \cup E_{AB})$ partitioned by parties $A, B$ as above, and an arbitrary node $a \in V_A$. The problem of private two-party egocentric betweenness centrality is for the parties to collaboratively approximate $\mathrm{EBC}(a)$ under assumptions that:

1. Both parties know the entire node set $V_A \cup V_B$;

2. Each party knows every edge incident to nodes within their own network. That is, $A$ knows $E_A \cup E_{AB}$ while $B$ knows $E_B \cup E_{AB}$; and

3. The computed $\mathrm{EBC}(a)$ needs to be available to $A$ but need not be shared with $B$.
Any solution must not reveal to $A, B$ what is not already known to them, except for $A$ discovering $\mathrm{EBC}(a)$ (Assumption 3). We seek solutions under an honest-but-curious adversarial model: while $A, B$ will follow any agreed upon protocol prescribing computations to take and messages to send to one another, without attempting to manipulate the other party, each party is curious about the other’s edges and may apply arbitrary auxiliary computation and leverage data sources in attempting to discover the other’s edges. Formally, what is revealed by $A$ ($B$) to $B$ (respectively $A$) must preserve $\epsilon$-differential privacy with respect to $E_A$ (respectively $E_B$).
IV Warm-Up: A Non-Private Protocol
We first consider how $A, B$ might cooperate without preserving differential privacy. In particular $A$ cannot itself count 2-paths that are

Contained entirely within $B$; or

Ending in both $A, B$ with intermediate node in $B$.
Any protocol must involve $B$ in aggregating over such paths. But while the first case can be aggregated independently by $B$, the second case requires $A$ to communicate its endpoint neighbours of $a$ to $B$. This significantly complicates the differentially-private solution developed in the next section.
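Recall that $N_a$ denotes the ego network of $a$ anywhere in the graph (notably not including $a$ since the graph has no self-loops). Figure 1 summarises the following protocol.
Protocol 9.
Proceeding in sequence:

(i) [Forward message] $A$ sends to $B$ the set $R^\star = N_a \cap V_A$ of neighbours of $a$ contained within $A$;

(ii) [Backward message] $B$ computes and sends to $A$, for each $i \in R^\star$ and for each $j \in N_a \cap V_B$ (where $i, j$ are not directly connected), a count $T_{ij}$ of 2-paths with endpoints $i, j$ and intermediate point in $B$;

(iii) [Backward message] $B$ computes and sends to $A$, the EBC partial sum over endpoint nodes in $B$ with intermediate nodes in $A \cup B$. That is, $S_B = \sum_{i, j \in N_a \cap V_B :\, A_{ij} = 0,\, j > i} 1 / (A^2)_{ij}$;

(iv) $A$ increments the received $T_{ij}$ by the number of 2-paths between $i, j$ with intermediate point in $A$. It then sets $S_{AB}$ to the sum of their reciprocals;

(v) $A$ computes, over distinct and disconnected endpoint nodes in $A$ with intermediate node in $A \cup B$, the EBC partial sum. That is: $S_A = \sum_{i, j \in N_a \cap V_A :\, A_{ij} = 0,\, j > i} 1 / (A^2)_{ij}$; and

(vi) $A$ completes computation of $\mathrm{EBC}(a)$ as $S_A + S_B + S_{AB}$.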
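The message flow of Protocol 9 can be simulated on a toy graph to check that the three partial sums recover the exact EBC. This is a sketch under assumptions: the helper names, the dictionary graph representation, and the modelling of each party's knowledge as "all edges with at least one endpoint among its own nodes" are choices made for this example. Each step reads only the view of the party performing it.

```python
from itertools import combinations

def party_view(adj, own):
    """Edges visible to a party: those with at least one endpoint among its own nodes."""
    return {u: {v for v in nbrs if u in own or v in own} for u, nbrs in adj.items()}

def count_two_paths(i, j, mids, view):
    """Number of 2-paths i-k-j with intermediate node k drawn from `mids`."""
    return sum(1 for k in mids if k in view[i] and k in view[j])

def partial_ebc(endpoints, mids, view):
    """EBC partial sum over non-adjacent endpoint pairs, counting 2-paths through `mids`."""
    return sum(1.0 / count_two_paths(i, j, mids, view)
               for i, j in combinations(sorted(endpoints), 2) if j not in view[i])

def nonprivate_protocol(adj, ego, nodes_a):
    nodes_a = set(nodes_a)
    nodes_b = set(adj) - nodes_a
    view_a, view_b = party_view(adj, nodes_a), party_view(adj, nodes_b)
    # (i) Forward message: A sends the ego's neighbours that lie inside A.
    r_a = view_a[ego] & nodes_a
    r_b = view_b[ego]  # B already sees the ego's neighbours inside B
    # (ii) Backward message: cross-party 2-path counts with intermediate node in B.
    t = {(i, j): count_two_paths(i, j, r_b, view_b)
         for i in r_a for j in r_b if j not in view_b[i]}
    # (iii) Backward message: B's partial sum over endpoint pairs inside B.
    s_b = partial_ebc(r_b, r_a | r_b | {ego}, view_b)
    # (iv) A adds the 2-paths whose intermediate node is in A (including the ego).
    s_ab = sum(1.0 / (c + count_two_paths(i, j, r_a | {ego}, view_a))
               for (i, j), c in t.items())
    # (v) A's partial sum over endpoint pairs inside A.
    s_a = partial_ebc(r_a, r_a | r_b | {ego}, view_a)
    # (vi) A combines the three partial sums.
    return s_a + s_b + s_ab

# Toy graph: ego 0 and nodes 1, 2 belong to A; nodes 3, 4, 5 belong to B.
graph = {0: {1, 2, 3, 4}, 1: {0, 3, 5}, 2: {0, 3, 4},
         3: {0, 1, 2}, 4: {0, 2, 5}, 5: {1, 4}}
print(nonprivate_protocol(graph, ego=0, nodes_a={0, 1, 2}))  # 2.0
```

In this toy graph the pairs (1, 2) and (3, 4) each have two connecting 2-paths and the pair (1, 4) has one, giving 0.5 + 0.5 + 1.0. Node 5 is not a neighbour of the ego and so is never counted as an intermediate node.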
Remark 10.
We now briefly comment on where edge privacy is potentially breached, thereby highlighting challenges faced by any solution to Problem 8. When sends its set of neighbours of , party learns directly of all edges incident to in . When sends its 2-path counts , while counts aggregate exact connectivity, at worst this level of aggregation could be very small, therefore revealing information about connections to within , and the interconnections between these nodes. A worst case occurs when there are two nodes in connected to : as soon as receives the vector of counts of 2-paths spanning , it can learn whether these two nodes are connected.
V A Privacy-Preserving Protocol
We now develop our protocol for private EBC which involves a series of differentially-private mechanisms for overcoming the privacy disclosures identified in Remark 10. We relegate proofs to the Appendices.
We use the exponential mechanism (viz. Lemma 5) to release a set of nodes in $A$ that privately approximates $a$’s ego network in $A$. This is Protocol 9.i’s ‘forward message’ (Section V-A).
While our application of the exponential mechanism protects edge privacy for $A$, there is still potential for privacy disclosure when $B$ communicates counts to $A$. To overcome this problem, we leverage the Laplace mechanism to privatise vectors of 2-path counts communicated by $B$ within Protocol 9.ii’s ‘backward message’ (Section V-B). The components of this message are indexed (in part) by the approximate ego network sent in the forward message. A second Laplace mechanism makes private the partial EBC of Protocol 9.iii’s ‘backward message’.
In this way our privacy-preserving protocol follows the broad-brush sequence of steps outlined in Section IV but is made more involved by the addition of differential privacy.
V-A Forward Message
The goal of the forward message is to communicate a privacy-preserving approximation $R$ to $R^\star = N_a \cap V_A$, chosen from the power set of $V_A \setminus \{a\}$. In order to leverage the exponential mechanism (viz. Lemma 5) we must specify a quality function of the form $q(D, R)$. That is, a mapping from the adjacency matrix $D$ for $A$ and a candidate response $R$, to a score reflecting the approximation quality of $R^\star$ by $R$. Since the response set is finite (albeit exponential in the graph size), the exponential mechanism then has normalised response probability mass function,
$$p(R) = \frac{\exp\left(\epsilon\, q(R) / (2 \Delta q)\right)}{\sum_{R' \subseteq V_A \setminus \{a\}} \exp\left(\epsilon\, q(R') / (2 \Delta q)\right)}, \qquad (1)$$
with implicit dependency on fixed adjacency matrix.
In designing an appropriate quality function, we typically want to be maximised uniquely by the desired nonprivate output . The function should also be a semantically meaningful ‘distance’ between outputs and such that the utility bounds for the exponential mechanism of [24, Lemma 7 and Theorem 8] make meaningful guarantees. The utility guarantee states that with high probability the released random has score not too much lower than . And so if meets this global maximum, then we have that the released set has score not much lower than that of . If responses close in are also ‘close’ then this guarantees a good approximation to with high probability.
Remark 11.
A natural choice for quality function is as it is clearly maximised by . However it is not uniquely maximised, indeed for any superset (including the entire set of nodes) we have that also. There are many such sets: which is not far from the number of all possible responses for modest ego network sizes, in which case the exponential mechanism does not achieve our goal.
Section V-A1 develops a sound choice of quality function in the symmetric set difference.
In the rest of this section we will abuse notation and abbreviate with the meaning understood from context. We will also denote by .
V-A1 Symmetric Set Difference
We adopt (for minimisation) the symmetric set difference with given by as a promising basis for quality function design. Define complements relative to . We have
where the second equality follows from a disjoint union in the first equality’s righthand side.
In minimising the symmetric difference, dismissing the constant as redundant to optimisation, we can equivalently maximise (footnote 3: the quality function’s dependence on the non-private ego network is suppressed, but should be implicitly understood)
(2) 
This quality function takes values in and is uniquely maximised by .
While the machinery of the exponential mechanism only requires the sensitivity of this quality function (Section V-A5) to guarantee differential privacy, sampling from the mechanism’s response distribution poses a significant challenge, as it is defined over an enormous response space: the power set of the candidate nodes. Because the exponential amplifies differences in quality score, the distribution’s mass varies enormously even for graphs of modest size.
V-A2 Equi-Quality Responses
It will be useful to consider the sets of candidate exponential mechanism responses, with equal quality score value ,
where . It can be shown that the form a partition of : the sets are pairwise disjoint, and their union is all subsets of . It can also be shown that for ,
A consequence of this identity is an efficient approach to computing the normalising constant for the exponential mechanism response distribution. The proof can be found in the Appendices.
Corollary 12.
Note that other phases of our protocol, to be specified, also require linear space. By comparison computing naïvely would take time exponential in and constant space.
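The saving from grouping responses by quality score can be checked numerically. The sketch below assumes, for illustration only, that the quality of a response is the number of candidate nodes $n$ minus the size of its symmetric set difference from the true ego network, with sensitivity 1 and exponent $\epsilon q / 2$; these choices and all names are assumptions of the example. Candidate sets are encoded as bit masks over the $n$ candidate nodes.

```python
import math

def normaliser_bruteforce(n, target_mask, epsilon):
    """Sum exp(eps * q / 2) over all 2^n subsets, with q = n - |subset XOR target|."""
    return sum(math.exp(epsilon * (n - bin(mask ^ target_mask).count("1")) / 2.0)
               for mask in range(1 << n))

def normaliser_by_strata(n, epsilon):
    """Same sum grouped by symmetric-difference size d: C(n, d) subsets share each score."""
    return sum(math.comb(n, d) * math.exp(epsilon * (n - d) / 2.0)
               for d in range(n + 1))

n, eps = 10, 0.7
z_slow = normaliser_bruteforce(n, 0b1011001101, eps)  # 2^n terms
z_fast = normaliser_by_strata(n, eps)                 # n + 1 terms
```

Under these assumptions the grouped sum is a binomial expansion, so it also equals `(math.exp(eps / 2) + 1) ** n`, and it does not depend on which true ego network is used.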
V-A3 Two-Stage Sampling
We will now outline how to sample from the exponential mechanism response distribution, using a two-stage sampling process that is simple to implement and delivers exponential savings in time and space over naïve sampling from the exponential mechanism.
Remark 13.
Another standard approach to sampling from challenging distributions is acceptance-rejection sampling or the Metropolis algorithm. However no clear surrogate probability mass presents itself that yields acceptable rates of rejection for practical applicability.
Stratified Sampling
To simplify notation, let us consider the problem at hand more generally: let $X$ be a discrete random variable on finite probability space $\Omega$, i.e., a multinomial with $\Pr(X = \omega) = p_\omega$. Suppose there exists a partition of $\Omega$ into the disjoint union of $\Omega_1, \ldots, \Omega_k$, such that for all $i \in [k]$, and all $\omega, \omega' \in \Omega_i$, $p_\omega = p_{\omega'}$, i.e., the probability mass is constant within each part. We can exactly sample from $X$ by (1) drawing a random variable that selects a part in the partition according to the relative part sizes then (2) sampling uniformly from within the chosen part; this is the approach taken by Algorithm 1 ForwardMessage. Denote by $p_i$ the constant probability mass of any $\omega \in \Omega_i$. The following result is proven in the Appendices.
Lemma 14.
Define random variable $Y = (Y_1, Y_2)$ where $\Pr(Y_1 = i) = |\Omega_i|\, p_i$ for each $i \in [k]$, and $Y_2 \mid Y_1 = i \sim \mathrm{Unif}(\Omega_i)$. Then $\Pr(Y_2 = \omega) = p_\omega$ for all $\omega \in \Omega$. Moreover, the probability mass stated for $Y_1$ is already normalised, i.e., $\sum_{i=1}^{k} |\Omega_i|\, p_i = 1$.
Corollary 15.
The following sampling process, implemented in ForwardMessage Algorithm 1, is equivalent to sampling from the exponential mechanism (1)
(i) Sample with log-space probability mass
(ii) Sample
Proof:
As the exponential mechanism (1) on , with quality function (2), is a multinomial distribution with strata of constant probability given by the , Lemma 14 establishes that the stratified sampler successfully implements the mechanism. All that remains is to compute the probability of selecting as the stratum’s cardinality times constant probability (normalised by as given in Corollary 12).
where the final equality follows from the observation that the expression for is a Binomial expansion. We simplify and convert this expression to log-space as the expression exponentially increases in :
completing the result. ∎
V-A4 Linear-Complexity Sampling
While Corollary 15 reduces the problem from sampling from a large support set with highly skewed probability mass, we must address efficient implementation of the two-stage sampling.
Inverse Transform Sampling of
The sampling of multinomial over much smaller support set can be accomplished efficiently via inverse transform sampling. Given access to a random variable $X$’s (invertible) CDF $F$, one can sample realisations of $X$ by first sampling $U \sim \mathrm{Unif}[0, 1]$ then releasing quantile $F^{-1}(U)$. For , we can always take . However highly-skewed distributions can suffer from numeric instability in floating-point computation of the CDF. Though not as severe as for , this remains a problem for . To combat this we employ a library for arbitrary-precision floating-point arithmetic in our implementation (see Section VI) and we represent ’s probability mass in log space as reported in Corollary 15. In this case, the inverse transform sampler is easily adapted as proved in the Appendices.
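The first sampling stage can be sketched as follows. As before, the sketch assumes (hypothetically) that strata are indexed by the symmetric-difference size $d$ between a response and the true ego network, that a stratum holds $\binom{n}{d}$ responses, and that each response has mass proportional to $\exp(-\epsilon d / 2)$. It uses double-precision arithmetic from the standard library rather than the arbitrary-precision library mentioned above, so it is suitable only for modest $n$; all names are illustrative.

```python
import math
import random

def log_stratum_pmf(n, epsilon):
    """log P(d) for d = 0..n, where P(d) is proportional to C(n, d) * exp(-epsilon * d / 2)."""
    log_w = [math.lgamma(n + 1) - math.lgamma(d + 1) - math.lgamma(n - d + 1)
             - epsilon * d / 2.0 for d in range(n + 1)]
    # The weights sum to (1 + exp(-epsilon / 2))^n by the binomial theorem.
    log_z = n * math.log1p(math.exp(-epsilon / 2.0))
    return [lw - log_z for lw in log_w]

def sample_stratum(log_pmf, rng=random):
    """Inverse transform sampling: the smallest d whose cumulative mass reaches u."""
    u = rng.random()
    cumulative = 0.0
    for d, lp in enumerate(log_pmf):
        cumulative += math.exp(lp)
        if u <= cumulative:
            return d
    return len(log_pmf) - 1  # guard against rounding leaving the total just below 1

log_pmf = log_stratum_pmf(50, epsilon=1.0)
d = sample_stratum(log_pmf)
```

Only $n + 1$ probabilities are stored, in contrast to the $2^n$ responses of the original distribution. Under the stated assumptions the stratum index follows a binomial distribution with success probability $e^{-\epsilon/2} / (1 + e^{-\epsilon/2})$.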
Pick-and-Flip Sampling of
After sampling , we must sample uniformly from within chosen stratum , a constrained and potentially large subset of . For sampled we are to have , so the size of and ’s symmetric set difference is . Moreover, this describes all candidates for within .
Proposition 17.
Proof:
Every time a new node is sampled from , it could be sampled from either or . In both cases, is added to the set difference. This loop invariant continues to hold until the set difference size reaches establishing and uniformly so due to the uniform sampling of the . Sampling the nodes (like the rest of the algorithm) can be achieved in linear time/space with Fisher-Yates shuffling. ∎
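The second sampling stage can be sketched as below: given a target symmetric-difference size `d`, pick `d` distinct candidate nodes uniformly at random and flip their membership relative to the true ego network. The function and argument names are hypothetical, and the standard library's `random.sample` stands in for an explicit Fisher-Yates shuffle.

```python
import random

def pick_and_flip(candidates, true_set, d, rng=random):
    """Uniformly sample a subset of `candidates` at symmetric-difference size d from `true_set`."""
    flipped = rng.sample(sorted(candidates), d)  # d distinct nodes, uniformly at random
    response = set(true_set)
    for v in flipped:
        if v in response:
            response.remove(v)  # a true neighbour is dropped from the response
        else:
            response.add(v)     # a non-neighbour is added to the response
    return response

candidates = set(range(20))
true_neighbours = {1, 5, 7}
approx = pick_and_flip(candidates, true_neighbours, d=4)
```

Every flipped node changes the symmetric difference by exactly one, so the result always lies in the requested stratum, and distinct choices of flipped nodes give distinct responses.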
V-A5 Quality Function Global Sensitivity
The remaining ingredient for invoking the exponential mechanism to privately release , is bounding ’s sensitivity.
Lemma 18.
Consider any fixed and contained ego networks , induced by neighbouring adjacency matrices on and fixed ego node . Noting explicitly the dependence of the quality function (2) on non-private ego network, .
Proof:
Consider the effect of switching an edge within on the symmetric difference cardinality between i.e., the quality function. Adding/removing an edge can impact at most one node being neighbours with ego node ; it can therefore only decrease or increase the first or second terms of by 1, at most. Since these two sets are disjoint, it cannot change both simultaneously. ∎
Theorem 19.
ForwardMessage (Algorithm 1) takes time and space , and when run with , preserves differential privacy of the edge set within party ’s network.
V-B Backward Message
Analogous to the non-private protocol of Section IV (Figure 1), $B$ receives node-set $R$ approximating $a$’s ego network in $A$, via ForwardMessage. Subsequently, $B$ must send back: counts of paths spanning $A, B$ with intermediate node in $B$, indexed by endpoint pairs; and its partial EBC sum over paths with endpoints in $B$ and intermediate node in $B$ or $R$. Note this last set is the private approximation to $a$’s ego network in $A$. We apply in BackwardMessage (Algorithm 4) the Laplace mechanism (Lemma 4) to both backward message components to avoid disclosure of edges in $B$.
Note cases in which BackwardMessage need not be run by : If there are no paths incident to with intermediate point in ; or contained entirely within . We may therefore assume within the algorithm that these cases are not present.
V-B1 Privately Counting Paths
The first part of the backward message comprises the $T_{ij}$: noisy counts of 2-paths with intermediate node in $B$, over $i$ in the given $R$ approximating $a$’s ego network in $A$, and $j \in N_a \cap V_B$. The sensitivity of these counts relates to adding or removing an edge in $B$, as follows, with proof given in the Appendices.
Lemma 20.
Let query denote the vector-valued non-private response . The global sensitivity of is upper-bounded by .
V-B2 Private Partial EBC
The second part of BackwardMessage is a partial EBC sum over 2-paths with endpoints in (and intermediate point in either the same or ) as in Protocol 9.iii. We again apply the Laplace mechanism to avoid privacy disclosure of edges in , which requires bounding sensitivity of the non-private sum as follows. The proof can be found in the Appendices.
Lemma 21.
Let query denote the partial EBC sum over 2-paths with endpoints and intermediate node . Then the global sensitivity of is upper-bounded by .
As both applications of the Laplace mechanism run with privacy budget , Lemma 6 implies overall edge privacy is guaranteed.
Corollary 22.
BackwardMessage (Algorithm 4) takes time and space where , and when run with , preserves differential privacy of edge set within party ’s network.
V-C PrivateEBC: Putting it All Together
After parties and have respectively run ForwardMessage and BackwardMessage, must complete the computation of the private EBC. As shown in Algorithm 5, PrivateEBC comprises two phases that closely mirror the two components of BackwardMessage: counting 2-paths spanning , and counting 2-paths with endpoints in .
Within the first stage we incorporate BackwardMessage noisy counts contributed by , which count paths having intermediate nodes in . Party simply increments these values with counts of paths having endpoints in . The sum of reciprocals forms . We make one optimisation to utility at no cost to privacy: counts for are discarded.
A straightforward sum of paths with endpoints in and intermediate points in completes . Finally party completes PrivateEBC by summing the partial EBCs.
VI Experiments
To empirically validate the effectiveness of PrivateEBC we ran experiments on three graph data sets: a Facebook friendship graph [27] with 63,731 vertices and 817,035 edges; the Enron email network [28] with 36,692 nodes and 183,831 edges; and the Pretty Good Privacy (PGP) [27] data set with 10,680 users as vertices and 24,316 inter-user interactions as edges. We follow a random process to partition the nodes, while the structure of the graph stays intact: nodes are assigned to parties or independently and uniformly at random, while edges are not changed.
The experiments were run on a server with multi-core Xeon processors (112 threads with hyper-threading) and 1.5 TB RAM, using Python 3.7 without parallel computations for fair comparison. We use relative error between true and private EBC as our utility measure: the lower the relative error, the higher the utility. We employed the mpmath arbitrary-precision library and set the precision to 300 bits. Arbitrary precision is vital for implementing inverse transform sampling as described in Section V-A4.
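The utility measure can be written out explicitly. The sketch below assumes the usual definition of relative error, the absolute difference between private and true EBC divided by the true EBC; the function name is hypothetical.

```python
def relative_error(true_ebc, private_ebc):
    """Absolute deviation of the private estimate, as a fraction of the true value."""
    if true_ebc == 0:
        raise ValueError("relative error is undefined when the true EBC is zero")
    return abs(private_ebc - true_ebc) / abs(true_ebc)

# A private estimate of 11.6 against a true EBC of 10 is a relative error of about 16%.
err = relative_error(10.0, 11.6)
```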
Vii Results
We first examine the relationship between utility and privacy for PrivateEBC. For 60 uniformlyatrandom ego nodes selected from randomlypartitioned party , we report average relative error (comparing private and true EBC) for a range of privacy levels between 0.1 and 7. Figure LABEL:sub@fig:refa—LABEL:sub@refc show the results for the three datasets, where it is apparent that average relative error decreases dramatically when is increased to 1, and stays very small for larger . For , average relative error is usually below 50%. And at the strong guarantee of the average relative error is 16% (Facebook) 47% (Enron) and 25% (PGP).
As we have employed three different privacypreserving mechanisms in our proposed protocol—one exponential (Mech1) and two Laplace (Mech2, Mech3)—we examine each separately to evaluate how they affect overall relative error. Specifically, we run the PrivateEBC protocol with only one of the privacypreserving mechanisms intact and use the nonprivate version for remaining mechanisms, with each of Mech1–Mech3 taking turns being private. In this way we can isolate the incremental cost to utility of each mechanism. Figure LABEL:sub@refd reports the results on Facebook, which demonstrate that Mech1 ForwardMessage has the least impact on the relative error while Mech3 BackwardMessage second component, has the highest impact. This suggest future work may focus on the third mechanism.
We next report on timing analysis for PrivateEBC as function of privacy level. Median computation time of 20 random ego nodes for from 0.1 to 7 is reported in Figure LABEL:sub@refe on Facebook data. Here total time is overall decreasing as privacy decreases (increasing ), while a small increase to runtime can be seen at very high levels of privacy (low but increasing . This dual effect is slightly more pronounced on Enron and PGP (see the Appendices), and is likely due to different behaviours in the protocol with increasing . When the set difference of and is small, the twostage sampler generates just small numbers of nodes in faster time. However faster runtime with lower privacy dominates behaviour overall. Moreover any effect of privacy is not strong, with at most a change in runtime which across data sets is practical at under 10 min (median) on the larger data sets.
Figure (f) shows how the relative error between true and private EBC varies with ego node degree. We report results on , which do not show significant dependence: for node degrees up to , deviation is approximately 7% of the maximum relative error, which is low.
VIII Conclusion and Future Work
In this paper we have developed the PrivateEBC algorithm, which comprises a protocol of differentially private mechanisms for cooperative two-party computation of egocentric betweenness centrality. Theoretical and empirical results demonstrate that our approach achieves strong privacy guarantees for both parties while achieving practical levels of utility with efficient time and space complexity. Notably, we contribute a novel two-stage sampler that improves upon the exponential mechanism’s time and space complexities exponentially. PrivateEBC should extend naturally to multiple networks; we expect to add to our empirical investigations of efficiency in that case. It would also be interesting to extend differential privacy to the case in which the answer must be returned, by the party whose node is being queried, to some untrusted authority.
Acknowledgment
This work is supported by the Australian Research Training Program and the Australian Research Council DE160100584.
References
 [1] L. Backstrom, C. Dwork, and J. Kleinberg, “Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography,” in WWW’07, 2007, pp. 181–190.
 [2] A. Narayanan, E. Shi, and B. I. Rubinstein, “Link prediction by de-anonymization: How we won the Kaggle social network challenge,” in IJCNN’11, 2011, pp. 1825–1834.
 [3] A. Narayanan and V. Shmatikov, “De-anonymizing social networks,” in SP’09, 2009, pp. 173–187.
 [4] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in TCC’06, 2006, pp. 265–284.
 [5] K.-I. Goh, E. Oh, B. Kahng, and D. Kim, “Betweenness centrality correlation in social networks,” Physical Review E, vol. 67, no. 1, p. 017101, 2003.
 [6] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
 [7] C. C. Aggarwal and P. S. Yu, “A general survey of privacy-preserving data mining models and algorithms,” in Privacy-preserving data mining. Springer, 2008, pp. 11–52.
 [8] M. Hay, C. Li, G. Miklau, and D. Jensen, “Accurate estimation of the degree distribution of private networks,” in ICDM, 2009, pp. 169–178.
 [9] J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao, “Private release of graph statistics using ladder functions,” in SIGMOD’15, 2015, pp. 731–745.
 [10] W.-Y. Day, N. Li, and M. Lyu, “Publishing graph degree distribution with node differential privacy,” in SIGMOD’16, 2016, pp. 123–138.
 [11] Y. Mülle, C. Clifton, and K. Böhm, “Privacy-integrated graph clustering through differential privacy,” in EDBT/ICDT Workshops, 2015, pp. 247–254.
 [12] E. Shen and T. Yu, “Mining frequent graph patterns with differential privacy,” in KDD’13, 2013, pp. 545–553.
 [13] S. P. Kasiviswanathan, K. Nissim, S. Raskhodnikova, and A. Smith, “Analyzing graphs with node differential privacy,” in TCC’13, 2013, pp. 457–476.
 [14] S. Raskhodnikova and A. Smith, “Efficient Lipschitz extensions for high-dimensional graph statistics and node private degree distributions,” arXiv preprint arXiv:1504.07912, 2015.
 [15] R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta, “Discovering frequent patterns in sensitive data,” in SIGKDD’10, 2010, pp. 503–512.
 [16] V. Karwa, S. Raskhodnikova, A. Smith, and G. Yaroslavtsev, “Private analysis of graph structure,” PVLDB, vol. 4, no. 11, pp. 1146–1157, 2011.
 [17] K. Nissim, S. Raskhodnikova, and A. Smith, “Smooth sensitivity and sampling in private data analysis,” in STOC’07, 2007, pp. 75–84.
 [18] H. H. Nguyen, A. Imine, and M. Rusinowitch, “Detecting communities under differential privacy,” in Proceedings of the 2016 ACM Workshop on Privacy in the Electronic Society, 2016, pp. 83–93.
 [19] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, “Our data, ourselves: Privacy via distributed noise generation,” in EUROCRYPT’06, 2006, pp. 486–503.
 [20] R. Chen, A. Reznichenko, P. Francis, and J. Gehrke, “Towards statistical queries over distributed private user data.” in NSDI, 2012, pp. 13–13.
 [21] M. Everett and S. P. Borgatti, “Ego network betweenness,” Social networks, vol. 27, no. 1, pp. 31–38, 2005.
 [22] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social networks, vol. 1, no. 3, pp. 215–239, 1978.
 [23] P. V. Marsden, “Egocentric and sociocentric measures of network centrality,” Social networks, vol. 24, no. 4, pp. 407–422, 2002.
 [24] F. McSherry and K. Talwar, “Mechanism design via differential privacy,” in FOCS’07. IEEE, 2007, pp. 94–103.
 [25] D. Kifer and B.-R. Lin, “Towards an axiomatization of statistical privacy and utility,” in SIGMOD’10, 2010, pp. 147–158.
 [26] F. D. McSherry, “Privacy integrated queries: an extensible platform for privacypreserving data analysis,” in SIGMOD’09, 2009, pp. 19–30.
 [27] Institute of Web Science and Technologies at the University of Koblenz–Landau, “The Koblenz network collection,” 2018. [Online]. Available: http://konect.uni-koblenz.de/
 [28] Stanford University, “Stanford large network dataset collection.” [Online]. Available: https://snap.stanford.edu/data/index.html
Appendix A Supplemental Material
A-A Proof of Corollary 12
A-B Proof of Lemma 14
The proof follows by splitting on , the chain rule of probability, and by definition of the r.v.’s. For any , denote to be such that , then
The penultimate equality follows from
This also establishes that the probability mass is already normalised.
A-C Proof of Proposition 16
The pseudoinverse of the CDF follows the general case, while it is easy to show that , for , is distributed as . The time complexity corresponds to both computing the CDF (which need not be stored in its entirety) and a linear search for its inversion.
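The complexity remark can be illustrated with a generic inverse-transform sampler over a finite set of unnormalised weights. This is a sketch of the general technique only and not the two-stage sampler itself; the function name and the list-of-weights input are assumptions. The cumulative sum is accumulated during a single linear scan, so the CDF is never stored in full.

```python
import random


def sample_by_inversion(weights, rng):
    """Sample an index with probability proportional to weights[i].

    One pass computes the normalising total; a second linear scan inverts
    the CDF while keeping only the running cumulative sum in memory.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    threshold = rng.random() * total  # uniform draw on [0, total)
    running = 0.0
    for index, w in enumerate(weights):
        running += w
        if threshold < running:  # first index whose cumulative weight exceeds the draw
            return index
    return len(weights) - 1  # guard against floating-point round-off
```

Both passes are linear in the number of candidates and the extra space is constant, which matches the shape of the cost described in the proof.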
A-D Proof of Lemma 20
Suppose that graphs and differ in some edge with and both in (that is, the edge would belong to ). Our task is to upper bound the corresponding change to counts resulting from running query on the two graphs. There can be at most choices of endpoint node for forming 2-hop paths affected by the addition/deletion. Similarly, there can be at most choices of endpoint for paths affected by the addition/deletion. For each of these paths, the addition/deletion can affect by at most 1. This proves the result.
A-E Proof of Lemma 21
There are two ways the addition or removal of an edge can affect . If the edge is the one between endpoints and , then this can change the term by at most 1 (from 0 to 1, in the case that the only other connection between and is via ). If the edge is within , then it can affect a term by at most : the denominator is incremented/decremented by 1, while the denominator must always be at least 1 as a 2-path must go through . This can occur for at most terms in the sum, because they are paths involving some intermediate node in that is neither nor . So overall the change in resulting from the addition or removal of one edge is at most:
A-F Additional Experimental Results
We present the timing analyses for PrivateEBC as a function of privacy level. The two accompanying figures show the results for the Enron and PGP data sets, which exhibit behaviour similar to that of the Facebook data; cf. Fig. (e).