LinkMirage: Enabling Privacy-preserving Analytics on Social Relationships

Changchang Liu, Prateek Mittal
Email: cl12@princeton.edu, pmittal@princeton.edu
Department of Electrical Engineering, Princeton University
Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author’s employer if the paper was prepared within the scope of employment. NDSS ’15, 8-11 February 2015, San Diego, CA, USA. Copyright 2015 Internet Society, ISBN TBD. http://dx.doi.org/10.14722/ndss.2015.23xxx

Social relationships present a critical foundation for many real-world applications. However, both users and online social network (OSN) providers are hesitant to share social relationships with untrusted external applications due to privacy concerns. In this work, we design LinkMirage, a system that mediates privacy-preserving access to social relationships. LinkMirage takes users’ social relationship graph as an input, obfuscates the social graph topology, and provides untrusted external applications with an obfuscated view of the social relationship graph while preserving graph utility.
Our key contributions are (1) a novel algorithm for obfuscating social relationship graphs while preserving graph utility, and (2) a theoretical and experimental analysis of privacy and utility using real-world social network topologies, including a large-scale Google+ dataset with 940 million links. Our experimental results demonstrate that LinkMirage provides up to 10x improvement in privacy guarantees compared to state-of-the-art approaches. Overall, LinkMirage enables the design of real-world applications such as recommendation systems, graph analytics, anonymous communications, and Sybil defenses while protecting the privacy of social relationships.

I Introduction

Online social networks (OSNs) have revolutionized the way our society interacts and communicates. Under the hood, OSNs can be viewed as a special graph structure composed of individuals (or organizations) and connections between these entities. These social relationships are sensitive: they capture, for example, trusted friendships or important interactions in Facebook, Twitter, or Google+, and users want their security and privacy to be preserved.
At the same time, an increasing number of third party applications rely on users’ social relationships (these applications can be external to the OSN). E-commerce applications can leverage social relationships for improving sales [21], and data-mining researchers also rely on the social relationships for functional analysis [35, 33]. Social relationships can be used to mitigate spam [28]. Anonymous communication systems can improve client anonymity by leveraging users’ social relationships [11, 31, 30]. State-of-the-art Sybil defenses rely on social trust relationships to detect attackers [43, 8, 26].
However, both users and the OSN providers are hesitant to share social relationships/graphs with these applications due to privacy concerns. For instance, a majority of users are exercising privacy controls provided by popular OSNs such as Facebook, Google+ and LinkedIn to limit access to their social relationships [9]. Privacy concerns arise because external applications that rely on users’ social relationships can either explicitly reveal this information to an adversary, or allow the adversary to perform inference attacks [14, 24, 32, 38, 34, 20]. These concerns hinder the deployment of many real-world applications. Thus, there exist fundamentally conflicting requirements for any link obfuscation mechanism: protecting privacy for the sensitive links in social networks and preserving utility of the obfuscated graph for use in real-world applications.
In this work, we design LinkMirage, a system that mediates privacy-preserving access to social relationships. LinkMirage takes users’ social relationship graph as an input, either via an OSN operator or via individual user subscriptions. Next, LinkMirage obfuscates the social graph topology to protect the privacy of users’ social contacts (edge/link privacy, not vertex privacy). LinkMirage then provides external applications such as graph analytics and anonymity systems [11, 31, 30] with an obfuscated view of the social relationship graph. Thus, LinkMirage provides a trade-off between securing the confidentiality of social relationships, and enabling the design of social relationship based applications.
We present a novel obfuscation algorithm that first clusters social graphs, and then anonymizes intra-cluster links and inter-cluster links, respectively. We obfuscate links in a manner that preserves the key structural properties of social graphs. While our approach is of interest even for static social graphs, we go a step further in this paper, and consider the evolutionary dynamics of social graphs (node/link addition or deletion). We design LinkMirage to be resilient to such evolutionary dynamics, by consistently clustering social graphs across time instances. Consistent clustering improves both the privacy and utility of the obfuscated graphs. We show that LinkMirage provides strong privacy properties. Even a strategic adversary with full access to the obfuscated graph and prior information about the original social graph is limited in its ability to infer information about users’ social relationships. LinkMirage provides up to 3x privacy improvement in static settings, and up to 10x privacy improvement in dynamic settings compared to the state-of-the-art approaches.
Overall, our work makes the following contributions.

  1. First, we design LinkMirage to mediate privacy-preserving access to users’ social relationships. LinkMirage obfuscates links in the social graph (link privacy) and provides untrusted external applications with an obfuscated view of the social graph. LinkMirage achieves a good balance between privacy and utility, in the context of both static and dynamic social network topologies.

  2. Second, LinkMirage provides rigorous privacy guarantees to defend against strategic adversaries with prior information of the social graph. We perform link privacy analysis both theoretically as well as using real-world social network topologies. The experimental results for both a Facebook dataset (with 870K links) and a large-scale Google+ dataset (with 940M links) show up to 10x improvement in privacy over the state-of-the-art research.

  3. Third, we experimentally demonstrate the applicability of LinkMirage in real-world applications, such as privacy-preserving graph analytics, anonymous communication and Sybil defenses. LinkMirage enables the design of social relationships based systems while simultaneously protecting the privacy of users’ social relationships.

  4. Finally, we quantify a general utility metric for LinkMirage. We analyze our utility measurement provided by LinkMirage both theoretically and using real-world social graphs (Facebook and Google+).

II Background

II-A Motivating Applications

In this paper, we focus our research on protecting the link privacy between labeled vertices in social networks [16, 29, 42]. Mechanisms for graph analytics, anonymous communication, and Sybil defenses can leverage users’ social relationships for enhancing security, but end up revealing users’ social relationships to adversaries. For example, in the Tor network [11], the relays’ IP addresses (labels) are already publicly known (vertex privacy in [45, 36, 27] is not useful). Tor operators are hesitant to utilize social trusts to set up the Tor circuit as recommended by [31, 30] since the circuit construction protocol would reveal sensitive social contact information about the users. Our proposed link-privacy techniques can thus be utilized by the Tor relay operators to enhance system security while preserving link privacy. Overall, our work focuses on protecting users’ trust relationships while enabling the design of such systems.
LinkMirage supports three categories of social relationship based applications: 1) Global access to the obfuscated graph: Applications such as social network based anonymity systems [11, 30, 31] and peer-to-peer networks [8] can utilize LinkMirage (described in Section III-B) to obtain a global view of privacy-preserving social graph topologies; 2) Local access to the obfuscated graph: an individual user can query LinkMirage for his/her obfuscated social relationships (local neighborhood information), to facilitate distributed applications such as SybilLimit [43]; 3) Mediated data analytics: LinkMirage can enable privacy-preserving data analytics by running desired functional queries (such as computing graph modularity and pagerank) on the obfuscated graph topology and only returning the result of the query. Existing work [12, 13] demonstrated that the implementation of graph analytics algorithms could leak certain information. Instead of repeatedly adding perturbations to the output of each graph analytics algorithm as in differential privacy [12, 13], which would be rather costly, LinkMirage can obtain the perturbed graph just once to support multiple graph analytics. Such an approach protects the privacy of users’ social relationships from inference attacks using query results. There exists a plethora of attacks against vertex anonymity based mechanisms [32, 38, 34, 20]. Ji et al. [19] recently showed that no single vertex anonymization technique was able to resist all the existing attacks. Note that these attacks are not applicable to link privacy schemes. Therefore, a sound approach to vertex anonymity must start with improvements in our understanding of link privacy. When used as first step in the design of vertex privacy mechanisms, our approach can protect the privacy of social contacts and graph links even when the vertices are de-anonymized using state-of-the-art approaches [32, 38, 34, 20]. 
Furthermore, our method can even improve the resilience of vertex anonymity mechanisms against de-anonymization attacks when applied to unlabelled graphs (as will be shown in Section V-B).

Fig. 1: LinkMirage architecture. LinkMirage first collects social link information through our social link app or directly through the OSN providers, and then applies an obfuscation algorithm to perturb the original social graph(s). The obfuscated graph(s) would be utilized to answer the query of the untrusted applications in a privacy-preserving manner. The third-party application (which queries the social link information) is considered an adversary which aims to obtain sensitive link information from the perturbed query results.

II-B System Architecture and Threat Model

Fig. 1 shows the overall architecture for LinkMirage. For link privacy, we consider the third-party applications (which can query the social link information) as adversaries, which aim to obtain sensitive link information from the perturbed query results. A sophisticated adversary may have access to certain prior information such as partial link information of the original social networks, and such prior information can be extracted from publicly available sources, social networks such as Facebook, or other application-related sources as stated in [6]. The adversary may leverage Bayesian inference to infer the probability for the existence of a link. We assume that LinkMirage itself is trusted, in addition to the social network providers/users who provide the input social graph.
In Sections IV-B and IV-C, we define our Bayesian privacy metric (called anti-inference privacy) and an information theoretic metric (called indistinguishability) to characterize the privacy offered by LinkMirage against adversaries with prior information. In addition, the evolving social topologies introduce another serious threat where sophisticated adversaries can combine information available in multiple query results to infer users’ social relationships. We define anti-aggregation privacy in Section IV-D, for evaluating the privacy performance of LinkMirage against such adversaries.

II-C Basic Theory

Let us denote a time series of social graphs as $G_0, \ldots, G_T$. For each temporal graph $G_t = (V_t, E_t)$, the set of vertices is $V_t$ and the set of edges is $E_t$. For our theoretical analysis, we focus on undirected graphs where all the edges are symmetric, i.e. $(i,j) \in E_t$ iff $(j,i) \in E_t$. Note that our approach can be generalized to directed graphs with asymmetric edges. $P_t$ is the transition probability matrix of the Markov chain on the vertices of $G_t$. $P_t$ measures the probability that we follow an edge from one vertex to another vertex, where $P_t(i,j) = 1/\deg(i)$ ($\deg(i)$ denotes the degree of vertex $i$) if $(i,j) \in E_t$, otherwise $P_t(i,j) = 0$. A random walk starting from vertex $v$ selects a neighbor of $v$ at random according to $P_t$ and repeats the process.
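As a small illustration of the transition probabilities and the random walk described above, the following Python sketch builds the transition matrix of a toy undirected graph and runs a fixed-length walk. The adjacency-map representation, the function names `transition_matrix` and `random_walk`, and the toy graph are assumptions made for this example only.

```python
import random

def transition_matrix(adj):
    """Return P as nested dicts with P[i][j] = 1/deg(i) for each neighbor j of i.

    Pairs (i, j) that are not edges are simply absent, i.e. probability 0.
    """
    return {i: {j: 1.0 / len(nbrs) for j in nbrs} for i, nbrs in adj.items() if nbrs}

def random_walk(adj, start, hops, rng=None):
    """Walk `hops` steps from `start`, picking a uniform random neighbor each step."""
    rng = rng or random.Random()
    node = start
    for _ in range(hops):
        nbrs = sorted(adj[node])  # sorted only to make seeded runs reproducible
        if not nbrs:              # isolated vertex: the walk cannot move
            break
        node = rng.choice(nbrs)
    return node

# Toy undirected graph (hypothetical): triangle 0-1-2 plus vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
P = transition_matrix(adj)
end = random_walk(adj, start=3, hops=1, rng=random.Random(0))
```

Each row of `P` sums to one, and a one-hop walk from vertex 3 can only end at its single neighbor, vertex 2.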

II-D System Overview and Roadmap

Our objective for LinkMirage is to obfuscate social relationships while balancing privacy for users’ social relationships and the usability for large-scale real-world applications (as will be stated in Section III-A). We deploy LinkMirage as a Facebook application that implements graph construction and obfuscation (as will be discussed in Section III-B). We then describe the perturbation mechanism of LinkMirage in Section III-C where we take both the static and the temporal social network topology into consideration. Our perturbation mechanism consists of two steps: dynamic clustering which finds community structures in evolving graphs by simultaneously considering consecutive graphs, and selective perturbation which perturbs the minimal amount of edges in the evolving graphs. Therefore, it is possible to use a very high privacy parameter in the perturbation process, while preserving structural properties such as community structures. We then discuss the scalability of our algorithm in Section III-D and visually show the effectiveness of our algorithm in Section III-E. In Section IV, we rigorously analyze the privacy advantage of our LinkMirage over the state-of-the-art work, through considering three adversarial scenarios including the worst-case Bayesian adversary. In Section V, we apply our algorithm on various real world applications of anonymity systems, Sybil defenses and privacy-preserving analytics. In Section VI, we further analyze the effectiveness of LinkMirage on preserving different kinds of graph structural performance.

III LinkMirage System

III-A Design Goals

We envision that applications relying on social relationships between users can bootstrap this information from online social network operators such as Facebook, Google+, Twitter with access to the users’ social relationships. To enable these applications in a privacy-preserving manner, a perturbed social graph topology (by adding noise to the original graph topology) should be available.
Social graphs evolve over time, and the third-party applications would benefit from access to the most current version of the graph. A baseline approach is to perturb each graph snapshot independently. However, the sequence of perturbed graphs provides significantly more observations to an adversary than just a single perturbed graph. We argue that an effective perturbation method should consider the evolution of the original graph sequence. Therefore, the overall design goals for our system are:

  1. We aim to obfuscate social relationships while balancing privacy for users’ social relationships and the usability for real-world applications.

  2. We aim to handle both the static and dynamic social network topologies.

  3. Our system should provide rigorous privacy guarantees to defend against adversaries who have prior information of the original graphs, and adversaries who can combine multiple released graphs to infer more information.

  4. Our method should be scalable to be applied in real-world large-scale social graphs.

Fig. 2: Our perturbation mechanism for $G_t$. Assume that $G_{t-1}$ has already been dynamically obfuscated, based on dynamic clustering (step 1) and selective perturbation (step 2). Our mechanism analyzes the evolved graph $G_t$ (step 3) and dynamically clusters $G_t$ (step 4) based on the freed $m$-hop neighborhood ($m=2$) of new links (between green and blue nodes), the merging virtual node (the large red node in step 4), and the new nodes. By comparing the communities in $G_{t-1}$ and $G_t$, we can implement selective perturbation (step 5), i.e. perturb the changed blue community independently and perturb the unchanged red and green communities in the same way as $G'_{t-1}$, and then perturb the inter-cluster links.

III-B LinkMirage: Deployment

To improve the usability of our proposed obfuscation approach (described in detail in Section III-C), and to avoid dependence on the OSN providers, we developed a Facebook application (available at https://apps.facebook.com/xxxx/; the URL is anonymized) that implements graph construction (via individual user subscriptions) and obfuscation. The workflow of the LinkMirage deployment is as follows: (i) When a user visits the above URL, Facebook checks the user’s credentials, asks the user whether to grant the application permission to access the user’s friends, and then redirects the user to the application hosting server. (ii) The application server authenticates itself, queries Facebook for information about the user’s friends, and receives their information, such as user IDs. The list of the user’s friends can then be collected by the application server to construct a Facebook social graph for the current timestamp. Leveraging LinkMirage, a perturbed graph for this timestamp then becomes available, which preserves the link privacy of the users’ social relationships.
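The graph construction step of this workflow can be pictured with a short Python sketch. It assumes that the per-user friend lists have already been collected by the application server; it does not call any Facebook API, and the function names and user IDs below are hypothetical.

```python
def build_graph(friend_lists):
    """Merge per-user friend lists into one undirected adjacency map.

    friend_lists: dict mapping a subscribed user's id to an iterable of friend ids
    (assumed input format, not specified in the paper).
    """
    adj = {}
    for user, friends in friend_lists.items():
        adj.setdefault(user, set())
        for friend in friends:
            if friend == user:          # ignore accidental self-references
                continue
            adj[user].add(friend)
            adj.setdefault(friend, set()).add(user)  # keep edges symmetric
    return adj

def edge_set(adj):
    """Return each undirected edge once, as a sorted pair."""
    return {(min(u, v), max(u, v)) for u in adj for v in adj[u]}

# Hypothetical snapshot for one timestamp.
snapshot = build_graph({"alice": ["bob", "carol"], "bob": ["alice"], "dave": []})
```

A friend who never subscribed (here `carol`) still appears as a vertex, because the edge is reported by the subscribed endpoint.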
Real-world systems such as Uproxy, Lantern, Kaleidoscope [17], anonymity systems [11, 31, 30], and Sybil defense systems [43, 8] can directly benefit from our protocol by automatically obtaining the perturbed social relationships. Furthermore, our protocol can enable privacy-preserving graph analytics for OSN providers. We give more detailed explanations of the supported applications in Section III-F.

III-C LinkMirage: Perturbation Algorithm

Social networks evolve with time and publishing a time series of perturbed graphs raises a serious privacy challenge: an adversary can combine information available from multiple perturbed graphs over time to compromise the privacy of users’ social contacts [39, 10, 5]. In LinkMirage, we take a time series of graph topologies into consideration, to account for the evolution of the social networks. Intuitively, the scenario with a static graph topology is just a special situation of the temporal graph sequence, and is thus inherently incorporated in our model.
Consider a social graph series $G_0, \ldots, G_T$. We want to transform the graph series to $G'_0, \ldots, G'_T$, such that the vertices in $G'_t$ remain the same as in the original graph $G_t$, but the edges are perturbed to protect link privacy. Moreover, while perturbing the current graph $G_t$, LinkMirage has access to the past graphs in the time series (i.e., $G_0, \ldots, G_{t-1}$). Our perturbation goal is to balance the utility of social graph topologies and the privacy of users’ social contacts, across time.

Time 0 1 2 3 4 5 6 7 8
# of nodes 9,586 9,719 11,649 13,848 14,210 16,344 18,974 26,220 35,048
# of edges 48,966 38,058 47,024 54,787 49,744 58,099 65,604 97,095 142,274
Average degree 5.11 3.91 4.03 3.96 3.50 3.55 3.46 3.70 4.06
TABLE I: Temporal Statistics of the Facebook Dataset.

Approach Overview: Our perturbation mechanism for LinkMirage is illustrated in Fig. 2.
Static scenario: For a static graph, we first cluster it into several communities, and then perturb the links within each community. The inter-cluster links are also perturbed to protect their privacy.
Dynamic scenario: Let us suppose that $G_t$ evolves from $G_{t-1}$ by addition of new vertices (shown in blue color). To perturb graph $G_t$, our intuition is to consider the similarity between graphs $G_{t-1}$ and $G_t$.
First, we partition $G_{t-1}$ and $G_t$ into subgraphs, by clustering each graph into different communities. To avoid randomness (guarantee consistency) in the clustering procedure and to reduce the computation complexity, we dynamically cluster the two graphs together instead of clustering them independently. Noting that one green node evolves by connecting with a new blue node, we free all the nodes located within $m$ hops of this green node (the other two green nodes and one red node) from the previous clustering hierarchy, and merge the remaining three red nodes into a big virtual node. Then, we cluster these new nodes, the freed nodes and the remaining virtual node to detect communities in $G_t$.
Next, we compare the communities within $G_{t-1}$ and $G_t$, and identify the changed and unchanged subgraphs. For the unchanged subgraphs, we set their perturbation at time $t$ to be identical to their perturbation at time $t-1$. For the changed subgraph, we perturb it independently. We also perturb the links between communities to protect the privacy of these inter-cluster links. Finally, we publish $G'_t$ as the combination of the perturbed unchanged subgraphs, the perturbed changed subgraph, and the perturbed inter-cluster links. There are two key steps in our algorithm: dynamic clustering and selective perturbation, which we describe in detail as follows.

III-C1 Dynamic Clustering

Considering that communities in social networks change significantly over time, we need to address the inconsistency problem by developing a dynamic community detection method. Dynamic clustering aims to find community structures in evolving graphs by simultaneously considering consecutive graphs in its clustering algorithms. There are several methods in the literature to cluster evolving graphs [3], but we found them to be unsuitable for use in our perturbation mechanism. One approach to dynamic clustering involves performing community detection at each timestamp independently, and then establishing relationships between communities to track their evolution [3]. We found that this approach suffers from performance issues induced by inherent randomness in clustering algorithms, in addition to the increased computational complexity.

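Input: $\{G_t, G_{t-1}, G'_{t-1}\}$ if $t \geq 1$, or $\{G_t\}$ if $t = 0$;
Output: $G'_t$;
  initialize $G'_t$ and the clustering of $G_t$ to null;
  if t=0;
    cluster $G_0$ to get its communities;
    label all of these communities as changed (ch);
  endif
  /*Begin Dynamic Clustering*/
  1. free the nodes within $m$ hops of the changed links;
  2. re-cluster the new nodes, the freed nodes, the remaining merged virtual nodes in the clustering of $G_{t-1}$ to get the clustering of $G_t$;
  /*End Dynamic Clustering*/
  /*Begin Selective Perturbation*/
  3. find the unchanged communities (un) and the changed communities (ch);
  4. let the perturbation of the unchanged communities be identical to their perturbation at time $t-1$;
  5. perturb the changed communities by the static method;
  6. foreach pair of communities;
       if both of the communities belong to the unchanged communities
         reuse the perturbed inter-community links (in) of this pair from time $t-1$;
       else
         foreach pair of marginal nodes, one in each community
           randomly add an edge between them, with the inter-cluster probability of Section III-C2, to the perturbed inter-community links (in);
  /*End Selective Perturbation*/
  return $G'_t$;
Algorithm 1 LinkMirage, with dynamic clustering (steps 1-2) and selective perturbation (steps 3-6). The parameter $k$ denotes the perturbation level for each community. Here, ch, un, in are short for changed, unchanged, inter-community, respectively.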

Another approach is to combine multiple graphs into a single coupled graph [3]. The coupled graph is constructed by adding edges between the same nodes across different graphs. Clustering can be performed on the single coupled graph. We found that the clustering performance is very sensitive to the weights of the added links, resulting in unstable clustering results. Furthermore, the large dimensionality of the coupled graph significantly increases the computational overhead.
For our perturbation mechanism, we develop an adaptive dynamic clustering approach for clustering the graph $G_t$ using the clustering result for the previous graph $G_{t-1}$. This enables our perturbation mechanism to (a) exploit the link correlation/similarity in consecutive graph snapshots, and (b) reduce computation complexity by avoiding repeated clustering for unchanged links.
Clustering the graph $G_t$ from the clustering result of the previous graph $G_{t-1}$ requires a backtracking strategy. We use the maximum-modularity method [33] for clustering, which is hierarchical and thus easy to backtrack. Our backtracking strategy is to maintain a history of the merge operations that led to the current clustering. When an evolution occurs, the algorithm backtracks over the history of merge operations, in order to incorporate the new additions and deletions in the graph.
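For readers unfamiliar with maximum-modularity clustering, the quantity being maximized can be computed as in the Python sketch below. It uses the standard Newman modularity score, which we assume is the objective of the method cited above; the adjacency-map representation and the toy graph are illustrative only, and the sketch does not implement the hierarchical merging or the backtracking itself.

```python
def modularity(adj, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m  -  (d_c / (2 m))^2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the sum of the degrees of the vertices in c.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for community in communities:
        c = set(community)
        intra = sum(1 for u in c for v in adj[u] if v in c) / 2  # each edge seen twice
        degree_sum = sum(len(adj[u]) for u in c)
        q += intra / m - (degree_sum / (2 * m)) ** 2
    return q

# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(two_triangles, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(two_triangles, [{0, 1, 2, 3, 4, 5}])
```

Splitting the toy graph into its two triangles scores higher than keeping it as one community, which is the kind of comparison a hierarchical merge procedure makes at every step.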
More concretely, if the link between node $x$ and node $y$ is changed (added or deleted), we omit all the $m$-hop neighborhoods of $x$ and $y$ as well as $x$ and $y$ themselves from the clustering result of the previous timestamp, and then perform re-clustering. All the new nodes, the changed nodes and their $m$-hop neighbors, and the remaining merged nodes in the previous clustering result would be considered as basic elements for clustering (recall Figure 2).
For efficient implementation, we store the intermediate results of the hierarchical clustering process in a data structure. Upon link changes between $x$ and $y$, we free the $m$-hop neighborhood of $x$ and $y$ from the stored data structure.
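The set of nodes to free after a batch of link changes can be computed with a bounded breadth-first search. The Python sketch below is one possible way to do this; running the search on the current graph, the function names, and the toy path graph are assumptions for illustration, and the sketch does not model the stored clustering hierarchy.

```python
from collections import deque

def m_hop_neighborhood(adj, sources, m):
    """All nodes within m hops of any source node, including the sources."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == m:            # do not expand beyond m hops
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def nodes_to_free(adj, changed_links, m):
    """Endpoints of every changed link plus their m-hop neighborhoods."""
    endpoints = {node for link in changed_links for node in link}
    return m_hop_neighborhood(adj, endpoints, m)

# Toy graph (hypothetical): path 0-1-2-3-4-5, with new node 6 attached to 0.
path = {6: {0}, 0: {6, 1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
freed = nodes_to_free(path, changed_links=[(0, 6)], m=2)
```

With a two-hop radius only nodes 0, 1, 2 and 6 are freed; nodes 3, 4 and 5 would stay merged in whatever cluster they belonged to.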

Fig. 3: Dynamic Facebook interaction dataset topology for three consecutive timestamps. On the left, we can see that LinkMirage has better utility than the baseline approach (Mittal et al.), especially for larger values of $k$ (due to dynamic clustering). On the right, we show the overlapped edges (black) and the changed edges (yellow) between consecutive graphs. We can see that in LinkMirage, the perturbation of unchanged communities is correlated across time (selective perturbation), minimizing information leakage and enhancing privacy.

III-C2 Selective Perturbation


Intra-cluster Perturbation: After clustering $G_t$ based on $G_{t-1}$ using our dynamic clustering method, we perturb $G_t$ based on $G_{t-1}$ and the perturbed $G'_{t-1}$. First, we compare the communities detected in $G_{t-1}$ and $G_t$, and classify them as changed or unchanged. Our unchanged classification does not require that the communities are exactly the same, but that the overlap among vertices/links exceeds a threshold. Our key idea is to keep the perturbation process for links in the unchanged communities identical to their perturbation in the previous snapshot. In this manner, we can preserve the privacy of these unchanged links to the largest extent; it is easy to see that alternate approaches would leak more information. For the communities which are classified as changed, our approach is to perturb their links independently of the perturbation in the previous timestamp. For independent perturbations, we leverage the static perturbation method of Mittal et al. in [29]. Their static perturbation deletes all the edges in the original graph, and replaces each edge $(i,j)$ with a fake edge $(i,u)$, where $u$ is selected from the $k$-hop random walk starting from $j$. A larger perturbation parameter $k$ corresponds to better privacy and leads to worse utility.
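The static perturbation step can be sketched in Python as follows. This is a simplified illustration, not the exact procedure of [29]: it handles each undirected edge once, ends a random walk of the given length (called `k` here) from one endpoint, and retries a bounded number of times when the walk would create a self-loop or a duplicate edge. The retry limit `max_tries` and the toy graph are assumptions.

```python
import random

def perturb_static(adj, k, rng=None, max_tries=10):
    """Replace every edge (i, j) by a fake edge (i, u), u being the end of a
    k-hop random walk started at j. Simplified sketch; see assumptions above."""
    rng = rng or random.Random()
    perturbed = {v: set() for v in adj}      # all original edges are dropped
    seen = set()
    for i in sorted(adj):
        for j in sorted(adj[i]):
            if (j, i) in seen:               # undirected edge already handled
                continue
            seen.add((i, j))
            for _attempt in range(max_tries):
                u = j
                for _hop in range(k):
                    u = rng.choice(sorted(adj[u]))
                if u != i and u not in perturbed[i]:
                    perturbed[i].add(u)      # add the fake edge symmetrically
                    perturbed[u].add(i)
                    break
    return perturbed

# Toy community (hypothetical): two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
fake = perturb_static(toy, k=2, rng=random.Random(1))
```

In LinkMirage this kind of routine would be applied to one changed community at a time rather than to the whole graph, which is what allows a large walk length without mixing communities together.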
Inter-cluster Perturbation: Finally, we need to interconnect the subgraphs identified above. Suppose that two communities are connected through a set of marginal nodes on each side, which together form an inter-community subgraph. Here, a marginal node of one community is a node that has neighbors in the other community. For each pair of marginal nodes, one from each community, we randomly connect them with a probability that is set to preserve degree distributions, as analyzed in Section VI. All the computations for this probability only consider the marginal nodes. We can combine the perturbed links corresponding to the unchanged communities, changed communities, and inter-community subgraphs, to compute the output of our algorithm, i.e., $G'_t$.
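The inter-cluster step can be illustrated with the Python sketch below. The connection probability is passed in as a caller-supplied function `prob`, because its exact form is not reproduced here; that parameter, the function names, and the toy graph are all hypothetical.

```python
import random

def marginal_nodes(adj, comm_a, comm_b):
    """Nodes of each community that have at least one neighbor in the other."""
    in_a = {u for u in comm_a if adj[u] & comm_b}
    in_b = {u for u in comm_b if adj[u] & comm_a}
    return in_a, in_b

def perturb_inter(adj, comm_a, comm_b, prob, rng=None):
    """Connect each pair of marginal nodes independently with probability prob(x, y)."""
    rng = rng or random.Random()
    in_a, in_b = marginal_nodes(adj, comm_a, comm_b)
    return {(x, y) for x in sorted(in_a) for y in sorted(in_b)
            if rng.random() < prob(x, y)}

# Toy graph (hypothetical): two triangles with cross edges 2-3 and 1-4.
g = {0: {1, 2}, 1: {0, 2, 4}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {1, 3, 5}, 5: {3, 4}}
a, b = {0, 1, 2}, {3, 4, 5}
always = perturb_inter(g, a, b, prob=lambda x, y: 1.0)
never = perturb_inter(g, a, b, prob=lambda x, y: 0.0)
```

Only the marginal nodes (1 and 2 on one side, 3 and 4 on the other) are candidates for new inter-cluster links; interior nodes such as 0 and 5 are never connected across communities.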
LinkMirage not only preserves the structural characteristics of the original graph series, but also protects the privacy of the users by randomizing the original links. Compared to prior work, our method provides stronger privacy and utility guarantees for evolving graphs. The detailed procedure is stated in Algorithm 1.
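The changed/unchanged classification that drives selective perturbation can be sketched as follows. The text only requires that the overlap among vertices/links exceeds a threshold; using the Jaccard index on vertex sets and a threshold of 0.8 are assumptions made for this example, as are the function names.

```python
def jaccard(a, b):
    """Jaccard overlap of two vertex sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def classify(prev_comms, curr_comms, threshold=0.8):
    """Map each current community index to the index of the previous community
    it matches (unchanged), or to None if no overlap reaches the threshold (changed)."""
    match = {}
    for ci, curr in enumerate(curr_comms):
        best, best_score = None, 0.0
        for pi, prev in enumerate(prev_comms):
            score = jaccard(curr, prev)
            if score > best_score:
                best, best_score = pi, score
        match[ci] = best if best_score >= threshold else None
    return match

# Hypothetical communities at two consecutive timestamps.
prev = [{0, 1, 2, 3}, {4, 5, 6}]
curr = [{0, 1, 2, 3}, {4, 5, 6, 7, 8}, {9}]
matching = classify(prev, curr)
```

A community mapped to a previous index would reuse the perturbed links stored for that index, while a community mapped to `None` would be perturbed independently.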
Surprisingly, our approach of first isolating communities and then selectively perturbing them provides benefits even in a static context! This is because previous static approaches use a single parameter to control the privacy/utility trade-off. Thus, if we apply them to the whole graph using high privacy parameters, it would destroy graph utility (e.g. community structures). On the other hand, LinkMirage applies perturbations selectively to communities; thus it is possible to use a very high privacy parameter in the perturbation process, while preserving structural properties such as community structures.

III-D Scalable Implementation

Our algorithm relies on two key graph theoretical techniques: community detection (which serves as a foundation for the dynamic clustering step in LinkMirage) and random walk (which serves as a foundation for the selective perturbation step in LinkMirage). The computational complexity for both community detection and random walk is $O(|E_t|)$ [3, 29], where $|E_t|$ is the number of edges in graph $G_t$; therefore the overall computational complexity of our approach is $O(|E_t|)$. Furthermore, our algorithms are parallelizable. We adopt the GraphChi parallel framework in [22] to implement our algorithm efficiently using a commodity workstation (3.6 GHz, 24GB RAM). Our parallel implementation scales to very large social networks; for example, the running time of LinkMirage is less than seconds for the large scale Google+ dataset (940 million links) (described in Section IV-A) using our commodity workstation.

III-E Visual Depiction

For our experiments, we consider a real world Facebook social network dataset [40] from the New Orleans regional network, spanning from September 2006 to January 2009. Here, we utilize the wall post interaction data, which represents stronger trust relationships and comprises 46,952 nodes (users) connected by 876,993 edges. We partitioned the dataset using three month intervals to construct a total of 9 graph instances as shown in Table I. Fig. 3 depicts the outcome of our perturbation algorithm on the partitioned Facebook graph sequence for three consecutive timestamps (out of 9 snapshots), for varying perturbation parameter $k$ (the perturbation parameter for each community). For comparative analysis, we consider a baseline approach [29] that applies static perturbation for each timestamp independently. In the dynamic clustering step of our experiments, we free the two-hop neighborhoods of the changed nodes, i.e. $m=2$.
The maximum-modularity clustering method yields two, three, and four communities for the three snapshots, respectively. For the perturbed graphs, we use the same color for the vertices as in the original graph, and we can see that fine-grained structures (related to utility) are preserved for both algorithms under a small perturbation parameter $k$, even though links are randomized. Even for high values of $k$, LinkMirage can preserve the macro-level (such as community-level) structural characteristics of the graph. On the other hand, for high values of $k$, the static perturbation algorithm results in the loss of structural properties, and the output appears to resemble a random graph. Thus, our approach of first isolating communities and applying perturbation at the level of communities has benefits even in a static context.
Fig. 3 also shows the privacy benefits of our perturbation algorithm across these timestamps. We can see that LinkMirage reuses perturbed links (shown as black unchanged links) in the unchanged communities (one unchanged community for the first pair of consecutive snapshots and two unchanged communities for the second pair). Therefore, LinkMirage preserves the privacy of users’ social relationships by considering correlations among the graph sequence, and this benefit does not come at the cost of utility. In the following sections, we will formally quantify the privacy and utility properties of LinkMirage.

Time Jul.29 Aug.8 Aug.18 Aug.28 Sep.7 Sep.17 Sep.27 Oct.7
# of nodes 16,165,781 17,483,936 17,850,948 19,406,327 19,954,197 24,235,387 28,035,472 28,942,911
# of edges 505,527,124 560,576,194 575,345,552 654,523,658 686,709,660 759,226,300 886,082,314 947,776,172
Average degree 31.2714 32.0624 32.2305 33.7273 34.4143 31.3272 31.6058 32.7464
TABLE II: Temporal Statistics of the Google+ Dataset.
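As a quick consistency check on Table II, the "Average degree" row can be reproduced as the number of edges divided by the number of nodes. The sketch below uses only the first and last snapshots copied from the table; the function name `average_degree` is a hypothetical helper, not part of LinkMirage.

```python
# Node and edge counts copied from Table II (first and last snapshots only).
snapshots = {
    "Jul.29": (16_165_781, 505_527_124),
    "Oct.7": (28_942_911, 947_776_172),
}


def average_degree(num_nodes, num_edges):
    """Edges per node; this ratio matches the 'Average degree' row of Table II."""
    return num_edges / num_nodes


for label, (nodes, edges) in snapshots.items():
    print(label, round(average_degree(nodes, edges), 4))
```

The ratio is edges over nodes rather than twice that value, which is worth keeping in mind when comparing against degree statistics of other datasets.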

III-F Supporting Applications

As discussed in Section II-A, LinkMirage supports three types of applications: 1) Global access to obfuscated graphs: real-world applications can utilize our protocol to automatically obtain the secure social graphs to enable social-relationship-based systems. For instance, Tor operators [11] (or other anonymous communication networks such as Pisces [30]) can leverage the perturbed social relationships to set up anonymous circuits; 2) Local access to the obfuscated graphs: an individual user can query our protocol for his/her perturbed friends (local neighborhood information) to implement distributed applications such as SybilLimit [43]; 3) Mediated data analysis: the OSN providers can also publish perturbed graphs by leveraging LinkMirage to facilitate privacy-preserving data-mining research, e.g., to implement graph analytics such as pagerank score [35] and modularity [33], while mitigating disclosure of users’ social relationships. Existing work [12, 13] demonstrated that the implementation of graph analytic algorithms would leak certain information. To avoid repeatedly adding perturbations to the output of every graph analytic algorithm, which is rather costly, the OSN providers can first obtain the perturbed graphs by leveraging LinkMirage and then run these graph analytics in a privacy-preserving manner.

IV Privacy Analysis

We now address the question of understanding the link privacy of LinkMirage. We propose three privacy metrics (anti-inference privacy, indistinguishability, and anti-aggregation privacy) to evaluate the link privacy provided by LinkMirage. Both theoretical analysis and experimental results with a Facebook dataset (870K links) and a large-scale Google+ dataset (940M links) show the benefits of LinkMirage over previous approaches. We also illustrate the relationship between our privacy metric and differential privacy.

IV-A Experimental Datasets

To illustrate how the temporal information degrades privacy, we consider two social network datasets. The first one is a large-scale Google+ dataset [14], whose temporal statistics are illustrated in Table II. To the best of our knowledge, this is the largest temporal dataset of social networks in the public domain. The Google+ dataset was crawled from July 2011 to October 2011 and has 28,942,911 nodes and 947,776,172 edges. The dataset only considers link additions, i.e., all the edges in the previous graphs exist in the current graph. We partitioned the dataset into 84 timestamps. The second one is the 9-timestamp Facebook wall posts dataset [40] described in Section III-E, with temporal characteristics shown in Table I. It is worth noting that the wall-posts data experiences tremendous churn, with only 45% overlap for consecutive graphs. Since our dynamic perturbation method relies on the correlation between consecutive graphs, the evaluation of our dynamic method on the Facebook wall posts data is conservative. To show the improvement in performance of our algorithm for graphs that evolve at a slower rate, we also consider a sampled graph sequence extracted from the Facebook wall posts data with 80% overlap for consecutive graphs.

Fig. 4: (a),(b) represent the link probability distributions for the whole Facebook interaction dataset and the sampled Facebook interaction dataset with 80% overlap. We can see that the posterior probability of LinkMirage is more similar to the prior probability than the baseline approach.

IV-B Anti-Inference Privacy

First, we consider adversaries that aim to infer link information by leveraging Bayesian inference. We define the privacy of a link (or a subgraph) in the -th graph instance, as the difference between the posterior probability and the prior probability of the existence of the link (or a subgraph), computed by the adversary using its prior information , and the knowledge of the perturbed graph sequence . Utilizing Bayesian inference, we have

Definition 1

For link in the original graph sequence and the adversary’s prior information , the anti-inference privacy for the perturbed graph sequence is evaluated by the similarity between the posterior probability and the prior probability , where the posterior probability is

(1)

Higher similarity implies better anti-inference privacy.
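For orientation, the Bayesian update that Definition 1 refers to can be sketched as follows. The notation is an assumption made for this sketch: $L_t$ denotes the link of interest in the $t$-th graph, $\mathcal{H}$ the adversary's prior information, and $G'_0, \ldots, G'_t$ the perturbed graph sequence. The expression is simply Bayes' rule applied to these quantities.

```latex
% Assumed notation: L_t = link of interest, \mathcal{H} = adversary's prior,
% \{G'_i\}_{i=0}^{t} = perturbed graph sequence released up to time t.
P\left(L_t \mid \{G'_i\}_{i=0}^{t}, \mathcal{H}\right)
  = \frac{P\left(\{G'_i\}_{i=0}^{t} \mid L_t, \mathcal{H}\right)\,
          P\left(L_t \mid \mathcal{H}\right)}
         {P\left(\{G'_i\}_{i=0}^{t} \mid \mathcal{H}\right)}
```

In this form, $P(L_t \mid \mathcal{H})$ plays the role of the prior, the denominator is the normalization constant, and the likelihood in the numerator is the term that is hard to compute.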

The difference between the posterior probability and the prior probability represents the information leaked by the perturbation mechanism. Similar intuition has been mentioned in [23]. Therefore, the posterior probability should not differ much from the prior probability.
In the above expression, is the prior probability of the link, which can be computed based on the known structural properties of social networks, for example, by using link prediction algorithms [24]. Note that is a normalization constant that can be analyzed by sampling techniques. The key challenge is to compute (footnote 4: the detailed process for computing the posterior probability can be found in [29]).
For evaluation, we consider a special case where the adversary’s prior is the entire time series of original graphs except the link (which is the link we want to quantify privacy for, and denotes the existence of this link while denotes the non-existence of this link). Such prior information can be extracted from personal public information, Facebook-related information, or other application-related information, as stated in [6]. Note that this is a very strong adversarial prior, which leads to a worst-case analysis of link privacy. Denoting as the prior which contains all the information except , the posterior probability of link under the worst case is

Fig. 5: Link probability distribution for the Google+ dataset under the adversary’s prior information extracted from the social-attribute network model in [14].

where

Therefore, the objective of perturbation algorithms is to make close to .
Comparison with previous work: Fig. 4 shows the posterior probability distribution for the whole Facebook graph sequence and the sampled Facebook graph sequence with 80% overlapping ratio, respectively. We computed the prior probability using the link prediction method in [24]. We can see that the posterior probability corresponding to LinkMirage is closer to the prior probability than that of the method of Mittal et al. [29]. In Fig. 4(b), taking the point where the link probability equals , the distance between the posterior CDF and the prior CDF for the static approach is a factor of larger than LinkMirage (). Larger perturbation degree improves privacy and leads to smaller difference with the prior probability. Finally, by comparing Fig. 4(a) and (b), we can see that larger overlap in the graph sequence improves the privacy benefits of LinkMirage.

Fig. 6: (a),(b) represent the temporal indistinguishability for the whole Facebook interaction dataset and the sampled Facebook interaction dataset with 80% overlap. Over time, the adversary has more information, resulting in decreased indistinguishability. We can also see that LinkMirage has higher indistinguishability than the static method and the method of Hay et al. [16], although it still suffers from some information leakage.

We also compare with the work of Hay et al. [16], which randomizes the graph with real links deleted and another fake links introduced. The probability for a real link to be preserved in the perturbed graph is , which should not be small; otherwise, the utility would not be preserved. Even considering (which would substantially hurt utility [16]), the posterior probability for a link using the method of Hay et al. would be , even without prior information. In contrast, our analysis for LinkMirage considers a worst-case prior, and shows that the posterior probability is smaller than for more than 50% of the links when in Fig. 4. Therefore, LinkMirage provides significantly higher privacy than the work of Hay et al.
Adversaries with structural and contextual information: Note that our analysis so far focuses on quantifying link privacy under an adversary with prior information about the original network structure (including link prediction capabilities). In addition, some adversaries may also have access to contextual information about users in the social network, such as user attributes, which can also be used to predict network links (e.g., the social-attribute network prediction model in [14]). We further computed the prior probability using the social-attribute network prediction model in [14] and show the link probability for the Google+ dataset in Fig. 5. The posterior probability of LinkMirage is closer to the prior probability, and thus LinkMirage achieves better privacy performance than previous work.

IV-C Indistinguishability

Based on the posterior probability of a link under the worst case , we need to quantify privacy against adversaries who aim to distinguish the posterior probability from the prior probability. Since our goal is to reduce the information leakage of based on the perturbed graphs and the prior knowledge , we consider the metric of indistinguishability to quantify privacy, which can be evaluated by the conditional entropy of a private message given the observed variables [7]. The objective for an obfuscation scheme is to maximize the indistinguishability of the unknown input given the observables , i.e. (where denotes entropy of a variable [7]). Here, we define our metric for link privacy as

Definition 2

The indistinguishability for a link in the original graph that the adversary can infer from the perturbed graph under the adversary’s prior information is defined as .

Furthermore, we quantify the behavior of indistinguishability over time. For our analysis, we continue to consider the worst case prior of the adversary knowing the entire graph sequence except the link . To make the analysis tractable, we add another condition that if the link exists, then it exists in all the graphs (link deletions are rare in real world social networks). For a large-scale graph, only one link would not affect the clustering result. Then, we have

Theorem 1

The indistinguishability decreases with time,

(2)

The inequality follows from the fact that conditioning reduces entropy [7]. Eq. 2 shows that the indistinguishability does not increase as time evolves. The reason is that, over time, multiple perturbed graphs can be used by the adversary to infer more information about link .
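The argument behind Theorem 1 can be written out as a one-line sketch. The notation is assumed for illustration: $L$ is the link (which, under the condition stated above, either exists in all graphs or in none), $\mathcal{H}$ is the adversary's prior, $\{G'_i\}$ is the perturbed sequence, and $H(\cdot \mid \cdot)$ is conditional entropy.

```latex
% Assumed notation; the step uses only "conditioning reduces entropy":
% H(X | Y, Z) <= H(X | Y) for any random variables X, Y, Z.
H\left(L \mid \{G'_i\}_{i=0}^{t}, \mathcal{H}\right)
  \;\ge\;
H\left(L \mid \{G'_i\}_{i=0}^{t}, G'_{t+1}, \mathcal{H}\right)
  \;=\;
H\left(L \mid \{G'_i\}_{i=0}^{t+1}, \mathcal{H}\right)
```

Equality holds when the newly released graph $G'_{t+1}$ carries no additional information about $L$ beyond what the earlier releases and the prior already reveal.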
Next, we theoretically show why LinkMirage has better privacy performance than the static method. For each graph , denote the perturbed graphs using LinkMirage and the static method as , respectively.

Theorem 2

The indistinguishability for LinkMirage is greater than that for the static perturbation method, i.e.

(3)
Proof

In LinkMirage, the perturbation for the current graph is based on perturbation for . Let us denote the changed subgraph between as , then

where the first inequality also follows from the fact that conditioning reduces entropy [7]. The second inequality generalizes the first inequality from a snapshot to the entire sequence. From Eq. 3, we can see that LinkMirage may offer superior indistinguishability compared to the static perturbation, and thus provides higher privacy.
Comparison with previous work: Next, we experimentally analyze our indistinguishability metric over time. Fig. 6 depicts the indistinguishability metric using the whole Facebook graph sequence and the sampled Facebook graph sequence with 80% overlap. We can see that the static perturbation leaks more information over time. In contrast, the selective perturbation achieves significantly higher indistinguishability. In Fig. 6(a), after 9 snapshots, and using , the indistinguishability of the static perturbation method is roughly of the indistinguishability of LinkMirage. This is because selective perturbation explicitly takes the temporal evolution into consideration, and stems privacy degradation via the selective perturbation step. Comparing Fig. 6(a) and (b), LinkMirage has a larger advantage for graph sequences with larger overlap.
We also compare with the work of Hay et al. [16]. For the first timestamp, the probability for a real link to be preserved in the anonymized graph is . As time evolves, the probability would decrease to . Combined with the prior probability, the corresponding indistinguishability for the method of Hay et al. is shown as the black dotted line in Fig. 6, which converges to 0 very quickly (we also consider , which would substantially hurt utility [16]). Compared with the work of Hay et al., LinkMirage significantly improves privacy performance. Even when , LinkMirage with achieves up to 10x improvement over the approach of Hay et al. in the indistinguishability performance.
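The intuition behind Theorems 1 and 2 can be illustrated with a toy model. This sketch is not LinkMirage's perturbation: it assumes a single binary link observed through a binary symmetric channel with flip probability `flip`, and compares independently re-perturbed releases (a stand-in for the static baseline) with releases that reuse the first perturbation (a stand-in for selective perturbation of an unchanged community). All names and parameter values are hypothetical.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(L | O) in bits from a dict {(link_bit, observation): probability}."""
    marginal = {}
    for (_, obs), p in joint.items():
        marginal[obs] = marginal.get(obs, 0.0) + p
    total = 0.0
    for obs, p_obs in marginal.items():
        if p_obs > 0:
            posterior = [joint.get((bit, obs), 0.0) / p_obs for bit in (0, 1)]
            total += p_obs * entropy(posterior)
    return total


def joint_distribution(prior, flip, num_releases, reuse):
    """Joint distribution of the link bit and the released observations.

    reuse=False: every release is an independent noisy copy of the link.
    reuse=True: the first release is noisy and later releases repeat it.
    """
    if num_releases < 1:
        raise ValueError("num_releases must be at least 1")
    joint = {}
    for bit in (0, 1):
        p_bit = prior if bit == 1 else 1.0 - prior
        if reuse:
            patterns = [(o,) * num_releases for o in (0, 1)]
        else:
            patterns = list(product((0, 1), repeat=num_releases))
        for obs in patterns:
            # With reuse, only the first release carries fresh noise.
            noisy = obs[:1] if reuse else obs
            p = p_bit
            for o in noisy:
                p *= flip if o != bit else 1.0 - flip
            joint[(bit, obs)] = p
    return joint


prior, flip = 0.3, 0.2  # illustrative values only
h_prior = entropy([prior, 1.0 - prior])
h_static = [conditional_entropy(joint_distribution(prior, flip, t, False))
            for t in (1, 2, 3)]
h_reuse = [conditional_entropy(joint_distribution(prior, flip, t, True))
           for t in (1, 2, 3)]
print("prior entropy:", h_prior)
print("independent releases:", h_static)
print("reused releases:", h_reuse)
```

In this toy setting the conditional entropy shrinks with every independent release, while reusing the earlier perturbation leaves it unchanged, which mirrors the qualitative ordering stated in Theorem 2.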

Fig. 7: (a)(b) show the temporal anti-aggregation privacy for the Google+ dataset and the Facebook dataset, respectively. The anti-aggregation privacy decreases as time evolves because more information is leaked with more perturbed graphs available. Leveraging selective perturbation, LinkMirage achieves much better anti-aggregation privacy than the static baseline method.

IV-D Anti-aggregation Privacy

Next, we consider adversaries who try to aggregate all the previously published graphs to infer more information. Recall that after community detection in our algorithm, we anonymize the links by leveraging the -hop random walk. Therefore, the perturbed graph is actually a sampling of the -hop graph , where the -hop graph represents the graph in which all the -hop neighbors in the original graph are connected. It is intuitive that a larger difference between and represents better privacy. Here, we utilize the distance between the corresponding transition probability matrices to measure this difference (footnote 5: we choose the total variation distance to evaluate the statistical distance between and as in [29]). We extend the definition of total variation distance [18] from vectors to matrices by averaging the total variation distance of each row in the matrix, i.e. , where denotes the -th row of . We then formally define the anti-aggregation privacy as

Definition 3

The anti-aggregation privacy for a perturbed graph with respect to the original graph and the perturbation parameter is

The adversary’s final objective is to obtain an estimated measurement of the original graph, e.g. the estimated transition probability matrix which satisfies . A straightforward way to evaluate privacy is to compute the estimation error of the transition probability matrix, i.e. . We can derive the relationship between the anti-aggregation privacy and the estimation error as follows (we defer the proofs to the Appendix to improve readability).

Theorem 3

The anti-aggregation privacy is a lower bound of the estimation error for the adversaries, and

(4)

We further consider the network evolution where the adversary can combine all the previously perturbed graphs together to extract more -hop information of the current graph. Under this situation, a strategic methodology for the adversary is to combine the perturbed graph series , to construct a new perturbed graph , where . The combined perturbed graph contains more information about the -hop graph than . Correspondingly, the transition probability matrix of the combined perturbed graph would provide more information than . That is to say, the anti-aggregation privacy decreases with time.
Comparison with previous work: We evaluate the anti-aggregation privacy of LinkMirage on both the Google+ dataset and the Facebook dataset. Here we perform our experiments based on a conservative assumption that a link always exists after it is introduced. The anti-aggregation privacy decreases with time since more information about the -hop neighbors of the graph is leaked as shown in Fig. 7. Our selective perturbation preserves correlation between consecutive graphs, therefore leaks less information and achieves better privacy than the static baseline method. For the Google+ dataset, the anti-aggregation privacy for the method of Mittal et al. is only of LinkMirage after 84 timestamps.
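The row-averaged total variation distance used in this subsection can be sketched on small adjacency matrices. Two points are assumptions of this sketch rather than statements from the paper: the k-hop graph is built by connecting nodes joined by a walk of exactly k hops (self-loops dropped), and the anti-aggregation privacy is taken as the distance between the row-normalized transition matrices of the k-hop graph and of the perturbed graph. Function names are hypothetical.

```python
def row_normalize(adj):
    """Row-stochastic transition matrix of an adjacency matrix (zero rows stay zero)."""
    rows = []
    for row in adj:
        degree = sum(row)
        rows.append([x / degree for x in row] if degree else [0.0] * len(row))
    return rows


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def matrix_total_variation(P, Q):
    """Average of the row-wise total variation distances."""
    return sum(total_variation(p, q) for p, q in zip(P, Q)) / len(P)


def k_hop_adjacency(adj, k):
    """Assumption: connect i and j if a walk of exactly k hops joins them."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for _ in range(k - 1):
        reach = [[1 if any(reach[i][m] and adj[m][j] for m in range(n)) else 0
                  for j in range(n)] for i in range(n)]
    for i in range(n):
        reach[i][i] = 0  # a node is not treated as its own neighbor
    return reach


def anti_aggregation_privacy(adj_original, adj_perturbed, k):
    """Distance between the k-hop graph and the perturbed graph (sketch)."""
    khop = k_hop_adjacency(adj_original, k)
    return matrix_total_variation(row_normalize(khop), row_normalize(adj_perturbed))


# Path graph 0-1-2-3 and a hypothetical perturbed graph with edges 0-2, 1-2, 1-3.
original = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
perturbed = [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
print(anti_aggregation_privacy(original, perturbed, k=2))
```

A perturbed graph that coincides with the k-hop graph gives a distance of zero under this sketch, which corresponds to the adversary having fully recovered the k-hop structure.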

IV-E Relationship with Differential Privacy

Our anti-inference privacy analysis considers the worst-case adversarial prior to infer the existence of a link in the graph. Next, we uncover a novel relationship between this anti-inference privacy and differential privacy.
Differential privacy is a popular theory to evaluate the privacy of a perturbation scheme [12, 13, 25]. The framework of differential privacy defines local sensitivity of a query function on a dataset as the maximal for all differing from in at most one element . Based on the theory of differential privacy, a mechanism that adds independent Laplacian noise with parameter to the query function , satisfies -differential privacy. The degree of added noise, which determines the utility of the mechanism, depends on the local sensitivity. To achieve good utility as well as privacy, the local sensitivity should be as small as possible. The following remark demonstrates the effectiveness of worst-case Bayesian analysis, since the objective for a good utility-privacy balance under our worst-case Bayesian analysis is equivalent to that under differential privacy.

Remark 1

The requirement for a good utility-privacy balance in differential privacy is equivalent to the objective of our Bayesian analysis under the worst case. (We defer the proofs to the Appendix to improve readability.)

IV-F Summary for Privacy Analysis

  1. LinkMirage provides rigorous privacy guarantees to defend against adversaries who have prior information about the original graphs, and the adversaries who aim to combine multiple released graphs to infer more information.

  2. LinkMirage shows significant privacy advantages in anti-inference privacy, indistinguishability and anti-aggregation privacy, by outperforming previous methods by a factor up to .

V Applications

Applications such as anonymous communication [11, 31, 30] and vertex anonymity mechanisms [45, 36, 27] can utilize LinkMirage to obtain the entire obfuscated social graphs. Alternatively, each individual user can query LinkMirage for his/her perturbed neighborhoods to set up distributed social relationship based applications such as SybilLimit [43]. Further, the OSN providers can also leverage LinkMirage to perturb the original social topologies only once and support multiple privacy-preserving graph analytics, e.g., privately compute the pagerank/modularity of social networks.

Fig. 8: The worst case probability of deanonymizing users’ communications (). Over time, LinkMirage provides better anonymity compared to the static approaches.
Fig. 9: (a) shows the false positive rate for Sybil defenses. We can see that the perturbed graphs have lower false positive rate than the original graph. Random walk length is proportional to the number of Sybil identities that can be inserted in the system. (b) shows that the final attack edges are roughly the same for the perturbed graphs and the original graphs.

V-A Anonymous Communication [11, 31, 30]

As a concrete application, we consider the problem of anonymous communication [11, 31, 30]. Systems for anonymous communication aim to improve user’s privacy by hiding the communication link between the user and the remote destination. Nagaraja et al. and others [11, 31, 30] have suggested that the security of anonymity systems can be improved by leveraging users’ trusted social contacts.
We envision that our work can be a key enabler for the design of such social network based systems, while preserving the privacy of users’ social relationships. We restrict our analysis to low-latency anonymity systems that leverage social links, such as the Pisces protocol [30].
Similar to the Tor protocol, users in Pisces rely on proxy servers and onion routing for anonymous communication. However, the relays involved in the onion routing path are chosen by performing a random walk on a trusted social network topology. Recall that LinkMirage better preserves the evolution of temporal graphs in Fig. 3. We now show that this translates into improved anonymity over time, by performing an analysis of the degradation of user anonymity over multiple graph snapshots. For each graph snapshot, we consider a worst case anonymity analysis as follows: if a user’s neighbor in the social topology is malicious, then over multiple communication rounds (within that graph instance) its anonymity will be compromised using state-of-the-art traffic analysis attacks [41]. Now, suppose that all of a user’s neighbors in the first graph instance are honest. As the perturbed graph sequence evolves, there is further potential for degradation of user anonymity since in the subsequent instances, there is a chance of the user connecting to a malicious neighbor. Suppose the probability for a node to be malicious is . Denote as the distinct neighbors of node at time . For a temporal graph sequence, the number of the union neighbors of increases with time, and the probability for to be attacked under the worst case is . Note that in practice, the adversary’s prior information will be significantly less than the worst-case adversary.
Fig. 8 depicts the degradation of the worst-case anonymity with respect to the number of perturbed topologies. We can see that the attack probability for our method is lower than that of the static approach by a factor of up to 2. This is because, over consecutive graph instances, the users’ social neighborhoods under LinkMirage have higher similarity than under the static approach, reducing the potential for anonymity degradation. Therefore, LinkMirage can provide better security for anonymous communication and other social-trust-based applications.
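The worst-case analysis above can be illustrated numerically. The sketch assumes that each distinct neighbor is malicious independently with probability `f`, so that a user is compromised once at least one node in the union of its neighbor sets is malicious; this closed form is an assumption made for illustration, and the neighbor sets below are made up.

```python
def worst_case_attack_probability(neighbor_sets, f):
    """Probability that at least one neighbor seen so far is malicious.

    neighbor_sets: one set of neighbor ids per graph snapshot, for a single user.
    f: probability that any given node is malicious (assumed independent).
    Returns one probability per snapshot.
    """
    seen = set()
    probabilities = []
    for neighbors in neighbor_sets:
        seen |= set(neighbors)
        probabilities.append(1.0 - (1.0 - f) ** len(seen))
    return probabilities


f = 0.1  # illustrative value only
# Hypothetical user whose perturbed neighborhood is reused across snapshots.
reused = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}]
# Hypothetical user whose neighborhood is re-perturbed independently each time.
independent = [{1, 2, 3}, {2, 4, 5}, {1, 6, 7}]

print("reused:", worst_case_attack_probability(reused, f))
print("independent:", worst_case_attack_probability(independent, f))
```

Because the attack probability depends only on the size of the union of neighbors, keeping neighborhoods stable across snapshots directly limits how fast it grows.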

V-B Vertex Anonymity [45, 36, 27]

Previous work for vertex anonymity [45, 36, 27] would be defeated by de-anonymization techniques [32, 38, 34, 20]. LinkMirage can serve as a formal first step for vertex anonymity, and even improve its defense capability against de-anonymization attacks. We apply LinkMirage to anonymize vertices, i.e., to publish a perturbed topology without labeling any vertex. In [20], Ji et al. modeled the anonymization as a sampling process where the sampling probability denotes the probability of an edge in the original graph to exist in the anonymized graph . LinkMirage can also be applied to such a model, where the perturbed graph is sampled from the -hop graph (corresponding to ).
They also derived a theoretical bound of the sampling probability for perfect de-anonymization, and found that a weaker bound is needed with a larger value of the sampling probability . Larger implies that is topologically more similar to , making it easier to enable a perfect de-anonymization. When considering social network evolution, the sampling probability can be estimated as , where are the edges of the perturbed graph sequence, and are the edges of the -hop graph sequence. Compared with the static baseline approach, LinkMirage selectively reuses information from previously perturbed graphs, thus leading to smaller overall sampling probability , which makes it harder to perfectly de-anonymize the graph sequence. For example, the average sampling probability for the Google+ dataset (with ) is and for LinkMirage and the static method respectively. For the Facebook temporal dataset (with ), the average sampling probability is and for LinkMirage and the static method respectively. Therefore, LinkMirage is more resilient against de-anonymization attacks even when applied to vertex anonymity, with up to 10x improvement.
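To make the effect of reuse on the sampling probability concrete, the sketch below estimates it as the size of the union of perturbed edge sets divided by the size of the union of k-hop edge sets. This union-based estimator is an assumption chosen to match the description above, not a formula quoted from the paper, and the edge sets are invented for illustration.

```python
def sampling_probability(perturbed_edge_sets, khop_edge_sets):
    """Estimated sampling probability over a graph sequence (assumed estimator).

    Each argument is a list with one set of undirected edges per snapshot,
    where an edge is a tuple (u, v) with u < v.
    """
    union_perturbed = set().union(*perturbed_edge_sets)
    union_khop = set().union(*khop_edge_sets)
    if not union_khop:
        raise ValueError("the k-hop graph sequence has no edges")
    return len(union_perturbed) / len(union_khop)


# Hypothetical k-hop edge set, identical in both snapshots.
khop = [{(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}] * 2

# Reusing the earlier perturbation exposes the same two edges twice.
reused = [{(0, 1), (2, 3)}, {(0, 1), (2, 3)}]
# Independent re-perturbation exposes two new edges in the second snapshot.
independent = [{(0, 1), (2, 3)}, {(0, 2), (1, 3)}]

print("reused:", sampling_probability(reused, khop))
print("independent:", sampling_probability(independent, khop))
```

Under this estimator, the reused sequence reveals a smaller fraction of the k-hop edges than the independently perturbed one, which is the direction of the comparison reported above.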

V-C Sybil Defenses [43]

Next, we consider Sybil defense systems, which leverage the published social topologies to detect fake accounts in social networks. Here, we analyze how the use of a perturbed graph changes the Sybil detection performance of SybilLimit [43], which is a representative Sybil defense system. Each user can query LinkMirage for his/her perturbed friends to set up the implementation of SybilLimit. Fig. 9(a) depicts the false positives (honest users misclassified as Sybils) with respect to the random walk length in the SybilLimit protocol. Fig. 9(b) shows the final attack edges with respect to the attack edges in the original topology. We can see that the false positive rate is much lower for the perturbed graphs than for the original graph, while the number of attack edges stays roughly the same for the original graph and the perturbed graphs. The number of Sybil identities that an adversary can insert is given by ( is the number of attack edges and is the random walk parameter in the protocol). Since stays almost invariant and the random walk parameter (for any desired false positive rate) is reduced, LinkMirage improves Sybil resilience and protects the privacy of the social relationships, such that Sybil defense protocols continue to be applicable (similar to static approaches whose Sybil-resilience performance has been demonstrated in previous work).

Dataset (modularity)   Original Graph   LinkMirage   LinkMirage   Mittal et al.   Mittal et al.
Google+                0.605            0.601        0.603        0.591           0.586
Facebook               0.488            0.479        0.487        0.476           0.415
TABLE III: Modularity of Perturbed Graph Topologies

V-D Privacy-preserving Graph Analytics [33, 35]

Next, we demonstrate that LinkMirage can also benefit the OSN providers for privacy-preserving graph analytics. Previous work [12, 13] has demonstrated that the implementation of graph analytic algorithms would also result in information leakage. To mitigate such privacy degradation, the OSN providers could add perturbations (noise) to the outputs of these graph analytics. However, if the OSN providers aim to implement multiple graph analytics, the process of adding perturbations to each output would be rather complicated. Instead, the OSN providers can first obtain the perturbed graph by leveraging LinkMirage and then run these graph analytics in a privacy-preserving manner.
Here, we first consider the pagerank [35] as an effective graph metric. For the Facebook dataset, we have the average differences between the perturbed pagerank score and the original pagerank score as and for and respectively in LinkMirage. In comparison, the average differences are and for and in the approach of Mittal et al. LinkMirage preserves the pagerank score of the original graph with up to 4x improvement over previous methods. Next, we show the modularity [33] (computed by the timestamp in the Google+ dataset and the Facebook dataset, respectively) in Table III. We can see that LinkMirage preserves both the pagerank score and the modularity of the original graph, while the method of Mittal et al. degrades such graph analytics especially for larger perturbation parameter (recall the visual intuition of LinkMirage in Fig. 3).
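The pagerank comparison described above can be sketched with a small power-iteration implementation. Several choices here are assumptions for illustration: the damping factor of 0.85, the fixed number of iterations, the use of the mean absolute difference as the "average difference", and the toy graphs themselves.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Pagerank scores by power iteration on an adjacency matrix."""
    n = len(adj)
    out_degree = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_degree[i] == 0:
                # Dangling node: spread its score uniformly.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    if adj[i][j]:
                        new_rank[j] += damping * rank[i] * adj[i][j] / out_degree[i]
        rank = new_rank
    return rank


def mean_absolute_difference(a, b):
    """Average absolute per-node difference between two score vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy original graph: a ring on four nodes (0-1-2-3-0).
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
# Hypothetical perturbed graph: edge 2-3 replaced by edge 1-3.
perturbed = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]]

original_scores = pagerank(ring)
perturbed_scores = pagerank(perturbed)
print(mean_absolute_difference(original_scores, perturbed_scores))
```

A smaller value indicates that the perturbed graph ranks nodes more like the original graph does; the same comparison could be repeated for each perturbation parameter.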

V-E Summary for Applications of LinkMirage

  1. LinkMirage preserves the privacy of users’ social contacts while enabling the design of social relationships based applications. Compared to previous methods, LinkMirage results in significantly lower attack probabilities (with a factor up to 2) when applied to anonymous communications and higher resilience to de-anonymization attacks (with a factor up to 10) when applied to vertex anonymity systems.

  2. LinkMirage even surprisingly improves the Sybil detection performance when applied to the distributed SybilLimit systems.

  3. LinkMirage preserves the utility performance for multiple graph analytics applications, such as pagerank score and modularity with up to 4x improvement.

VI Utility Analysis

Following the application analysis in Section V, we aim to develop a general metric to characterize the utility of the perturbed graph topologies. Furthermore, we theoretically analyze the lower bound on utility for LinkMirage, uncover connections between our utility metric and structural properties of the graph sequence, and experimentally analyze our metric using the real-world Google+ and Facebook datasets.

VI-A Metrics

We aim to formally quantify the utility provided by LinkMirage to encompass a broader range of applications. One intuitive global utility metric is the degree of vertices. Interestingly, the expected degree of each node in the perturbed graph is the same as the original degree; we defer the proof to the Appendix to improve readability.

Theorem 4

The expected degree of each node after perturbation by LinkMirage is the same as in the original graph: , where denotes the degree of vertex in .

Clustering Coefficient Assortativity Coefficient
Original Graph 0.2612 -0.0152
LinkMirage 0.2263 -0.0185
LinkMirage 0.1829 -0.0176
LinkMirage 0.0864 -0.0092
LinkMirage 0.0136 -0.0063
TABLE IV: Graph Metrics of the Original and the Perturbed Graphs for the Google+ Dataset.

To understand the utility in a fine-grained level, we further define our utility metric as

Definition 4

The Utility Distance (UD) of a perturbed graph sequence with respect to the original graph sequence , and an application parameter is defined as

(5)
Fig. 10: (a), (b) show the utility distances using the Google+ dataset and the Facebook dataset, respectively. Larger perturbation parameter results in larger utility distance. Larger application parameter decreases the distance, which shows the effectiveness of LinkMirage in preserving global community structures.

Our definition for utility distance in Eq. 5 is intuitively reasonable for a broad class of real-world applications, and captures the behavioral differences of -hop random walks between the original graphs and the perturbed graphs. We note that random walks are closely linked to the structural properties of social networks. In fact, a lot of social network based security applications such as Sybil defenses [43] and anonymity systems [30] directly perform random walks in their protocols. The parameter is application specific; for applications that require access to fine grained local structures, such as recommendation systems [2], the value of should be small. For other applications that utilize coarse and macro structure of the social graphs, such as Sybil defense mechanisms, can be set to a larger value (typically around 10 in [43]). Therefore, this utility metric can quantify the utility performance of LinkMirage for various applications in a general manner.
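The utility distance can be sketched for small graphs as follows. The exact form used here is an assumption reconstructed from the description above: for each snapshot, take the l-step random walk distribution from every node in the original and in the perturbed graph, average the total variation distance over nodes, and then average over snapshots. Function names and the toy graphs are hypothetical.

```python
def transition_matrix(adj):
    """Row-stochastic random walk matrix (zero rows stay zero)."""
    rows = []
    for row in adj:
        degree = sum(row)
        rows.append([x / degree for x in row] if degree else [0.0] * len(row))
    return rows


def mat_mul(A, B):
    """Plain matrix product of two square lists of lists."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, l):
    """P raised to the l-th power, for l >= 1."""
    result = [row[:] for row in P]
    for _ in range(l - 1):
        result = mat_mul(result, P)
    return result


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def utility_distance(original_seq, perturbed_seq, l):
    """Average l-hop random walk discrepancy over nodes and snapshots (sketch)."""
    per_graph = []
    for adj, adj_perturbed in zip(original_seq, perturbed_seq):
        P = mat_pow(transition_matrix(adj), l)
        Q = mat_pow(transition_matrix(adj_perturbed), l)
        per_graph.append(sum(total_variation(p, q) for p, q in zip(P, Q)) / len(P))
    return sum(per_graph) / len(per_graph)


# Toy example: path 0-1-2 versus a hypothetical perturbed star centered at 0.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
star = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(utility_distance([path], [star], l=1))
```

Identical sequences give a distance of zero, and the value always lies between 0 and 1, so results for different application parameters can be compared on a common scale.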
Note that LinkMirage is not limited to only preserving the community structure of the original graphs. We evaluate two representative graph-theoretic metrics, the clustering coefficient and the assortativity coefficient [14], as listed in Table IV. We can see that LinkMirage preserves such fine-grained structural properties well for smaller perturbation parameter . Therefore, the extent to which the utility properties are preserved depends on the perturbation parameter .

VI-B Relationships with Other Graph Structural Properties

The mixing time measures the time required for the Markov chain to converge to its stationary distribution, and is defined as . Based on the Perron-Frobenius theory, we denote the eigenvalues of as . The convergence rate of the Markov chain to is determined by the second largest eigenvalue modulus (SLEM) as .
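As an illustration of the mixing time, the sketch below uses the standard textbook definition (an assumption here, since it is not quoted from the paper): the smallest number of steps after which the random walk distribution from every starting node is within total variation distance `eps` of the stationary distribution. It also assumes an undirected connected graph, whose stationary distribution is proportional to node degree.

```python
def transition_matrix(adj):
    """Row-stochastic random walk matrix of an undirected graph."""
    return [[x / sum(row) for x in row] for row in adj]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(adj, eps, max_steps=1000):
    """Smallest r with max_v TV(P^r(v), pi) < eps, or None if not reached."""
    P = transition_matrix(adj)
    total_degree = sum(sum(row) for row in adj)
    # Stationary distribution of a random walk on an undirected graph.
    pi = [sum(row) / total_degree for row in adj]
    current = [row[:] for row in P]
    for r in range(1, max_steps + 1):
        if max(total_variation(row, pi) for row in current) < eps:
            return r
        current = mat_mul(current, P)
    return None  # e.g. bipartite graphs, where the walk never converges


# Complete graph on four nodes: the walk mixes within a few steps.
k4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(mixing_time(k4, eps=0.01))
```

For this complete graph the distance to the stationary distribution shrinks by a factor of three per step, which is consistent with the role of the second largest eigenvalue modulus in governing the convergence rate.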
Since our utility distance is defined by using the transition probability matrix , this metric can be proved to be closely related to structural properties of the graphs, as shown in Theorem 5 and Theorem 6.

Theorem 5

Let us denote the utility distance between the perturbed graph and the original graph by , then we have .

Theorem 6

Let us denote the second largest eigenvalue modulus (SLEM) of transition probability matrix of graph as . We can bound the SLEM of a perturbed graph using the mixing time of the original graph, and the utility distance between the graphs as .

VI-C Upper Bound of Utility Distance

LinkMirage aims to limit the degradation of link privacy over time. Usually, mechanisms that preserve privacy trade off application utility. In the following, we theoretically derive an upper bound on the utility distance for our algorithm, which corresponds to a lower bound on the utility that LinkMirage is guaranteed to provide.

Theorem 7

The utility distance of LinkMirage is upper bounded by times the sum of the utility distance of each community and the ratio cut for each , i.e.

(6)

where denotes the number of inter-community links over the number of vertices, and each community within satisfies . We defer the proofs to the Appendix to improve readability.
Note that an upper bound on utility distance corresponds to a lower bound on the utility of our algorithm. Although better privacy usually requires adding more noise to the original sequence to obtain the perturbed sequence, this bound shows that LinkMirage is guaranteed to provide a minimum level of utility.
In the derivation, we do not take specific evolutionary patterns, such as the overlapped ratio, into consideration; our theoretical upper bound is therefore rather loose. Next, we show that in practice LinkMirage achieves a smaller utility distance (higher utility) than the baseline approach of independent static perturbations.
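The inter-community term in the bound, described above as the number of inter-community links over the number of vertices, can be computed directly from a community assignment. The following is a minimal sketch for a single graph snapshot; the function name and the dictionary-based community representation are assumptions made for illustration.

```python
def inter_community_ratio(edges, community_of):
    """Number of links whose endpoints lie in different communities,
    divided by the number of vertices in the snapshot."""
    inter_links = sum(1 for u, v in edges if community_of[u] != community_of[v])
    return inter_links / len(community_of)

# Two communities of three vertices each, joined by the single link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
community_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ratio = inter_community_ratio(edges, community_of)
```

A clustering that cuts few links keeps this ratio small, which is the regime in which the bound is most informative.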

VI-D Utility Experiments Analysis

Fig. 10(a) and (b) depict the utility distance for the Google+ and the Facebook graph sequences, for varying perturbation degree and application-level parameter. As the perturbation degree increases, the distance metric increases. This is natural, since additional noise increases the distance between the probability distributions computed from the original and the perturbed graph series. As the application parameter increases, the distance metric decreases. This illustrates that LinkMirage is better suited to security applications that rely on macro structures than to applications that require exact information about one- or two-hop neighborhoods. Furthermore, our experimental results in Figure 8 and Table III also demonstrate the utility advantage of LinkMirage over the approach of Mittal et al. [29] in real-world applications.

VII Related Work

Privacy with labeled vertices: An important thread of research aims to preserve link privacy between labeled vertices by obfuscating the edges, i.e., by adding or deleting edges [16, 29, 42]. These methods aim to randomize the structure of the social graph, and differ in the manner in which they add noise. Hay et al. [16] perturb the graph by applying a sequence of edge deletions and edge insertions. The deleted edges are selected uniformly from the existing edges in the original graph, while the added edges are selected uniformly from the non-existing edges. However, neither the edge deletions nor the edge insertions take any structural properties of the graph into consideration. Ying and Wu [42] proposed a perturbation method that preserves spectral properties, but did not analyze its privacy performance.
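The uniform deletion and insertion scheme described above can be sketched in a few lines. This is a minimal illustration of the idea, not the implementation of Hay et al. [16]; it assumes vertices are numbered from 0, that the same number `m` of edges is deleted and inserted, and it enumerates all vertex pairs, so it is only practical for small graphs.

```python
import itertools
import random

def random_edge_perturbation(num_vertices, edges, m, seed=None):
    """Delete m existing edges and insert m non-existing edges, both uniformly at random."""
    rng = random.Random(seed)
    existing = {tuple(sorted(e)) for e in edges}
    all_pairs = set(itertools.combinations(range(num_vertices), 2))
    non_existing = sorted(all_pairs - existing)
    # random.sample raises ValueError if m exceeds the available edges or non-edges.
    deleted = set(rng.sample(sorted(existing), m))
    added = set(rng.sample(non_existing, m))
    return (existing - deleted) | added

# Path graph on 6 vertices; perturb it by swapping 2 edges.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
perturbed = random_edge_perturbation(6, original, m=2, seed=7)
```

Because both samples are drawn without reference to degrees, communities, or any other structure, the sketch also makes the limitation noted above concrete: the inserted edges are as likely to join distant vertices as nearby ones.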
Mittal et al. proposed a perturbation method in [29], which serves as the foundation for our algorithm. Their method deletes all edges in the original graph, and replaces each edge with a fake edge that is sampled based on the structural properties of the graph. In particular, random walks are performed on the original graph to sample fake edges. As compared to the methods of Hay et al. [16] and Mittal et al. [29], LinkMirage provides up to 3x privacy improvement for static social graphs and up to 10x privacy improvement for dynamic social graphs.
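The random-walk sampling of fake edges can be illustrated with a deliberately simplified sketch. It is an assumption-laden approximation, not the algorithm of [29]: for each original edge (u, v) it walks `walk_length - 1` hops from v on the original graph and links u to the terminal vertex, and it omits the retry and probabilistic acceptance steps that a full scheme would need. The adjacency representation and all names are illustrative.

```python
import random

def random_walk_perturbation(adjacency, walk_length, seed=None):
    """Replace each original edge (u, v) with (u, z), where z is the end
    of a (walk_length - 1)-hop random walk that starts at v."""
    rng = random.Random(seed)
    perturbed = set()
    for u in sorted(adjacency):
        for v in sorted(adjacency[u]):
            if u >= v:
                continue  # visit each undirected edge once
            z = v
            for _ in range(walk_length - 1):
                z = rng.choice(sorted(adjacency[z]))
            if z != u:  # simplification: drop self-loops instead of retrying
                perturbed.add((min(u, z), max(u, z)))
    return perturbed

# 4-cycle with one chord, stored as vertex -> set of neighbors.
adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
original_edges = {(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)}
unchanged = random_walk_perturbation(adjacency, walk_length=1, seed=0)
fake_edges = random_walk_perturbation(adjacency, walk_length=4, seed=0)
```

With `walk_length=1` the walk never leaves v and the original edges are returned unchanged; longer walks move the fake endpoint further from the true neighbor, which increases privacy at the cost of structural fidelity.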
Another line of research aims to preserve link privacy [15, 44] by aggregating the vertices and edges into super vertices, so that the privacy of links within each super vertex is naturally protected. However, such approaches do not permit fine-grained utilization of graph properties, making them difficult to apply to applications such as social network based anonymous communication and Sybil defenses.
Privacy with unlabeled vertices: While the focus of our paper is on preserving link privacy in the context of labeled vertices, an orthogonal line of research aims to provide privacy in the context of unlabeled vertices (vertex privacy) [27, 36, 4]. Liu and Terzi [27] proposed $k$-anonymity to anonymize unlabeled vertices by placing at least $k$ vertices at an equivalent level. Differential privacy provides a theoretical framework for perturbing aggregate information, and Sala et al. [36] leveraged differential privacy to privately publish social graphs with unlabeled vertices. We note that LinkMirage can also provide a foundation for preserving vertex privacy, as stated in Section V-B. Shokri [37] addresses the privacy-utility trade-off using game theory, but does not consider the temporal scenario.
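As a concrete reading of the anonymity notion in [27], the check below assumes the degree-based form, in which vertices are considered equivalent when they have the same degree, so a graph qualifies if every degree value is shared by at least k vertices. This is a sketch of the property test only, not of the anonymization procedure, and the function name is illustrative.

```python
from collections import Counter

def is_k_degree_anonymous(degrees, k):
    """True if every degree value in the sequence is shared by at least k vertices."""
    return all(count >= k for count in Counter(degrees).values())

# Two vertices of degree 2 and three of degree 3.
degree_sequence = [2, 2, 3, 3, 3]
two_anonymous = is_k_degree_anonymous(degree_sequence, k=2)
three_anonymous = is_k_degree_anonymous(degree_sequence, k=3)
```

An adversary who knows only a target's degree then cannot narrow the target down to fewer than k candidate vertices.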
We further consider anonymity in temporal graphs with unlabeled vertices. Time series data deserves serious consideration, since adversaries can combine multiple published graphs to launch enhanced attacks and infer more information. Prior work [39, 10, 5] explored privacy degradation in vertex privacy schemes due to the release of multiple graph snapshots. These observations motivate our work, even though we focus on labeled vertices.
De-anonymization: In recent years, the security community has proposed a number of sophisticated attacks for de-anonymizing social graphs [32, 38, 34, 20]. While most of these attacks are not applicable to link privacy mechanisms (their focus is on vertex privacy), they illustrate the importance of considering adversaries with prior information about the social graph. (Burattin et al. [6] exploited inadvertent information leaks via Facebook’s graph API to de-anonymize social links; Facebook’s new graph API, v2.0, features stringent privacy controls as a countermeasure.) We perform a rigorous privacy analysis of LinkMirage (Section IV) by considering a worst-case (strongest) adversary that knows the entire social graph except one link, and show that even such an adversary is limited in its inference capability.

VIII Discussion

Privacy-Utility Tradeoffs: LinkMirage mediates privacy-preserving access to users’ social relationships. In our privacy analysis, we consider the worst-case adversary who knows the entire social link information except one link, which conservatively demonstrates the superiority of our algorithm over the state-of-the-art approaches. LinkMirage benefits many applications that depend on graph-theoretic properties of the social graph (as opposed to the exact set of edges), including recommendation systems and e-commerce applications.
Broad Applicability: While our theoretical analysis of LinkMirage relies on undirected links, the obfuscation algorithm itself can be generally applied to directed social networks. Furthermore, our underlying techniques have broad applicability to domains beyond social networks, including communication networks and web graphs.

IX Conclusion

LinkMirage effectively mediates privacy-preserving access to users’ social relationships, since 1) LinkMirage preserves key structural properties in the social topology while anonymizing intra-community and inter-community links; 2) LinkMirage provides rigorous guarantees for anti-inference privacy, indistinguishability, and anti-aggregation privacy, in order to defend against sophisticated threat models for both static and temporal graph topologies; 3) LinkMirage significantly outperforms baseline static techniques in terms of both link privacy and utility, which has been verified both theoretically and experimentally using a real-world Facebook dataset (with 870K links) and a large-scale Google+ dataset (with 940M links). LinkMirage enables the deployment of real-world social relationship based applications such as graph analytics, anonymity systems, and Sybil defenses while preserving the privacy of users’ social relationships.

References

  • [1] L. J. Almeida and A. de Andrade Lopes, “An ultra-fast modularity-based graph clustering algorithm,” in EPIA, 2009.
  • [2] R. Andersen, C. Borgs, J. Chayes, U. Feige, A. Flaxman, A. Kalai, V. Mirrokni, and M. Tennenholtz, “Trust-based recommendation systems: an axiomatic approach,” in WWW, 2008.
  • [3] T. Aynaud, E. Fleury, J.-L. Guillaume, and Q. Wang, “Communities in evolving networks: Definitions, detection, and analysis techniques,” in Dynamics On and Of Complex Networks, 2013.
  • [4] F. Beato, M. Conti, and B. Preneel, “Friend in the middle (fim): Tackling de-anonymization in social networks,” in PERCOM, 2013.
  • [5] S. Bhagat, G. Cormode, B. Krishnamurthy, and D. Srivastava, “Privacy in dynamic social networks,” in WWW, 2010.
  • [6] A. Burattin, G. Cascavilla, and M. Conti, “Socialspy: Browsing (supposedly) hidden information in online social networks,” arXiv preprint, 2014.
  • [7] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
  • [8] G. Danezis and P. Mittal, “Sybilinfer: Detecting sybil nodes using social networks,” in NDSS, 2009.
  • [9] R. Dey, Z. Jelveh, and K. Ross, “Facebook users have become much more private: a large scale study,” in IEEE SESOC, 2012.
  • [10] X. Ding, L. Zhang, Z. Wan, and M. Gu, “De-anonymizing dynamic social networks,” in GLOBECOM, 2011.
  • [11] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion router,” in USENIX Security, 2004.
  • [12] C. Dwork, “Differential privacy,” in Automata, languages and programming. Springer, 2006.
  • [13] C. Dwork and J. Lei, “Differential privacy and robust statistics,” in STOC, 2009.
  • [14] N. Z. Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov, V. Sekar, and D. Song, “Evolution of social-attribute networks: measurements, modeling, and implications using google+,” in IMC, 2012.
  • [15] M. Hay, G. Miklau, D. Jensen, D. Towsley, and P. Weis, “Resisting structural re-identification in anonymized social networks,” Proceedings of the VLDB Endowment, 2008.
  • [16] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava, “Anonymizing social networks,” Technical Report, University of Massachusetts, Amherst, 2007.
  • [17] H. Hodson, “Google software lets you get online via your friends,” New Scientist, 2013.
  • [18] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge University Press, 2012.
  • [19] S. Ji, W. Li, P. Mittal, X. Hu, and R. Beyah, “Secgraph: A uniform and open-source evaluation system for graph data anonymization and de-anonymization,” in USENIX Security, 2015.
  • [20] S. Ji, W. Li, M. Srivatsa, and R. Beyah, “Structural data de-anonymization: Quantification, practice, and implications,” in CCS, 2014.
  • [21] Y. Kim and J. Srivastava, “Impact of social influence in e-commerce decision making,” in Proceedings of the ninth international conference on Electronic commerce, 2007.
  • [22] A. Kyrola, G. E. Blelloch, and C. Guestrin, “Graphchi: Large-scale graph computation on just a pc,” in OSDI, 2012.
  • [23] N. Li, W. Qardaji, D. Su, Y. Wu, and W. Yang, “Membership privacy: a unifying framework for privacy definitions,” in CCS, 2013.
  • [24] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American society for information science and technology, 2007.
  • [25] C. Liu, S. Chakraborty, and P. Mittal, “Dependence makes you vulnerable: Differential privacy under dependent tuples,” in NDSS, 2016.
  • [26] C. Liu, P. Gao, M. Wright, and P. Mittal, “Exploiting temporal dynamics in sybil defenses,” in CCS, 2015.
  • [27] K. Liu and E. Terzi, “Towards identity anonymization on graphs,” in SIGMOD, 2008.
  • [28] A. Mislove, A. Post, P. Druschel, and P. K. Gummadi, “Ostra: Leveraging trust to thwart unwanted communication,” in NSDI, 2008.
  • [29] P. Mittal, C. Papamanthou, and D. Song, “Preserving link privacy in social network based systems,” in NDSS, 2013.
  • [30] P. Mittal, M. Wright, and N. Borisov, “Pisces: Anonymous communication using social networks,” in NDSS, 2013.
  • [31] S. Nagaraja, “Anonymity in the wild: mixes on unstructured networks,” in PETS, 2007.
  • [32] A. Narayanan and V. Shmatikov, “De-anonymizing social networks,” in IEEE S&P, 2009.
  • [33] M. E. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences, 2006.
  • [34] S. Nilizadeh, A. Kapadia, and Y.-Y. Ahn, “Community-enhanced de-anonymization of online social networks,” in CCS, 2014.
  • [35] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web.” 1999.
  • [36] A. Sala, X. Zhao, C. Wilson, H. Zheng, and B. Y. Zhao, “Sharing graphs using differentially private graph models,” in IMC, 2011.
  • [37] R. Shokri, “Privacy games: Optimal user-centric data obfuscation,” in PETS, 2015.
  • [38] M. Srivatsa and M. Hicks, “Deanonymizing mobility traces: Using social network as a side-channel,” in CCS, 2012.
  • [39] C.-H. Tai, P.-J. Tseng, P. S. Yu, and M.-S. Chen, “Identities anonymization in dynamic social networks,” in ICDE, 2011.
  • [40] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, “On the evolution of user interaction in facebook,” in WOSN, 2009.
  • [41] M. Wright, M. Adler, B. N. Levine, and C. Shields, “Defending anonymous communications against passive logging attacks,” in IEEE S&P, 2003.
  • [42] X. Ying and X. Wu, “Randomizing social networks: a spectrum preserving approach,” in SDM, 2008.
  • [43] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, “Sybillimit: A near-optimal social network defense against sybil attacks,” in IEEE S&P, 2008.
  • [44] E. Zheleva and L. Getoor, “Preserving the privacy of sensitive relationships in graph data,” in PinKDD, 2008.
  • [45] B. Zhou, J. Pei, and W. Luk, “A brief survey on anonymization techniques for privacy preserving publishing of social network data,” ACM SIGKDD Explorations Newsletter, 2008.

Appendix

A. Proof of the Upper Bound of Anti-aggregation Privacy

B. Relationships with Differential Privacy

When considering differential privacy for a time series of graph sequence , we have . For a good privacy performance, we need . Since the probability of given as , it is easy to see that if the condition for a good privacy performance holds, we have , which is the same as in Definition 1 and means that the posterior probability is similar to the prior probability, i.e., the adversary is bounded in the information it can learn from the perturbed graphs.

C. Proof of Theorem 4: Expectation of Perturbed Degree

According to Theorem 3 in [29], we have , where denotes the degree of after perturbation within community. Then we consider the random perturbation for inter-community subgraphs. Since the probability for an edge to be chosen is , the expected degree after inter-community perturbation satisfies . Combining with the expectations under static scenario, we have

D. Proof of the Upper Bound for the Utility Distance

We first introduce some notations and concepts. We consider two perturbation methods in the derivation process below. The first method is our dynamic perturbation method, which takes the graph evolution into consideration. The second method is the intermediate method, where we only implement dynamic clustering without selective perturbation. That is to say, we cluster , then perturb each community by the static method and each inter-community subgraphs by randomly connecting the marginal nodes, independently. We denote the perturbed graphs corresponding to the dynamic, the intermediate method by respectively. Similarly, we denote the perturbed TPM for the two approaches by .

To simplify the derivation process, we partition the proof into two stages. In the first stage, we derive the UD upper bound for the intermediate perturbation method. In the second stage, we derive the relationship between and . Results from the two stages can be combined to find the upper bound for the utility distance of LinkMirage. Denoting the communities as and the inter-community subgraphs as , we have

(7)