Deep Adversarial Network Alignment
Abstract
Network alignment, in general, seeks to discover the hidden underlying correspondence between nodes across two (or more) networks when given their network structure. However, most existing network alignment methods assume additional constraints to guide the alignment, such as a set of seed node-node correspondences across the networks or the existence of side-information. Instead, we seek to develop a general network alignment algorithm that makes no additional assumptions. Recently, network embedding has proven effective in many network analysis tasks, but embeddings of different networks are not aligned. Thus, we present our Deep Adversarial Network Alignment (DANA) framework that first uses deep adversarial learning to discover complex mappings for aligning the embedding distributions of the two networks. Then, using our learned mapping functions, DANA performs an efficient nearest neighbor node alignment. We perform experiments on real world datasets to show the effectiveness of our framework for first aligning the graph embedding distributions and then discovering node alignments that outperform existing methods.
1 Introduction
In today’s world, networks arise almost everywhere, from social to biological networks. This has drawn increased attention to the domain of network analysis. However, most efforts have primarily focused on single network problems such as link prediction [Liben-Nowell and Kleinberg, 2007] and community detection [Fortunato, 2010], but many problems are inherently defined only over multiple networks, such as the network alignment problem. In general, network alignment aims to discover a set of node pairs across two (or more) networks that we assume inherently have a correspondence between their nodes. The majority of existing network alignment algorithms assume additional constraints to guide the alignment process, such as a one-to-one mapping between the two networks [Zhang and Philip, 2015], some seed node-node correspondences (i.e., supervised) [Mu et al., 2016], sparsity in the possible alignments [Bayati et al., 2013], and the existence of side-information (e.g., node/edge attributes) [Zhang and Tong, 2016]. However, these constraints inherently limit the applications of these methods, since in many cases such information is not available for reasons such as data privacy. This motivates an algorithm that is both unsupervised and assumes no side-information, which brings tremendous challenges.
Without additional constraints, one key challenge in building network alignment algorithms is the vast number of possible permutations of the node orderings when aligning nodes from one network to another [Heimann et al., 2018]. Previous works have focused on utilizing the adjacency matrix or, more recently, on leveraging spectral graph theory and Laplacian matrix representations [Nassar et al., 2018; Hayhoe et al., 2018]. In these formulations, the main idea is to discover the optimal permutation that maps one network’s matrix representation to that of the other with minimal variation between them. Various metrics have been defined to measure the similarity between these matrices during the optimization process [Guzzi and Milenković, 2017; Aflalo et al., 2015]. However, working directly with the adjacency matrix is inherently not scalable. Recently, the field of network embedding, which in general seeks to discover a low dimensional representation of the nodes in a network, has developed rapidly, with advanced methods providing large improvements over purely spectral based methods for single network tasks [Grover and Leskovec, 2016]. This is primarily due to the condensed, space efficient, and even richer low dimensional representations of the nodes of a network. However, these network embedding methods are optimized separately for different graphs. In other words, embeddings of nodes from two networks are not aligned. Thus, directly applying network embedding to advance network alignment remains immensely challenging.
Meanwhile, there have been adversarial based methods [Goodfellow et al., 2014; Isola et al., 2017; Yu et al., 2017; Wang et al., 2017] that harness the power of deep learning for solving a variety of unsupervised problems by using a minimax game between a generator and a discriminator. In these adversarial based methods, the generator is trained to “fool” the discriminator into believing that its outputs are “real” (and not “generated”) examples, while the discriminator is simultaneously trained to get better at differentiating between the “real” and “generated” examples. This process allows for an unsupervised way of learning a generator whose examples seemingly come from the same distribution as the real data. These adversarial techniques have been shown to be useful in a plethora of domains including computer vision [Isola et al., 2017], natural language processing [Yu et al., 2017], and recommendation [Wang et al., 2017].
On the one hand, network embedding algorithms have proven effective in learning representations for nodes, but the embeddings of two networks are learned separately and are therefore not aligned. On the other hand, adversarial techniques are powerful in learning real data distributions. Thus, in this work, we propose to harness the power of network embedding and adversarial techniques to tackle the challenging network alignment problem without additional constraints or knowledge outside of the network structure. The rationale is that we can align the node representations of two networks by taking advantage of adversarial techniques. More specifically, the proposed Deep Adversarial Network Alignment (DANA) framework is composed of two stages: a graph distribution alignment stage and a node alignment stage. In the graph distribution alignment stage, we utilize deep neural networks in an adversarial framework that is able to learn a highly complex mapping from one network’s embedding space to that of the other, such that the mapped embedding approximates the data distribution of the other network’s original embedding. In the node alignment stage, we align individual nodes from the two networks by using the mapping functions learned in the graph distribution alignment stage. Our main contributions are as follows:

We propose a novel unsupervised Deep Adversarial Network Alignment (DANA) framework that utilizes the power of both network embedding and adversarial training techniques to align the embedding distributions and then perform an efficient node alignment thereafter;

We provide an unsupervised heuristic to perform model selection for DANA, which also simultaneously shows the effectiveness of aligning the embedding distributions; and

Experimental results on various datasets show the superiority of DANA against numerous advanced baselines.
2 Problem Definition
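In this section, we introduce the basic notations and problem definition. First, we let $\mathcal{G}_1 = (V_1, E_1)$ and $\mathcal{G}_2 = (V_2, E_2)$ be two undirected networks with $V_1$ and $V_2$ being sets of $n_1$ and $n_2$ vertices, and edge sets $E_1$ and $E_2$ for networks $\mathcal{G}_1$ and $\mathcal{G}_2$, respectively.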
Now, with the aforementioned notations, we formally define the network alignment problem we want to study in this work as follows:
Given two networks $\mathcal{G}_1$ and $\mathcal{G}_2$ with vertex sets $V_1$ and $V_2$, and under the assumption that there is an underlying correspondence between the vertices of $V_1$ and $V_2$, we seek to discover a set of vertex alignment pairs $\mathcal{A}$ defined as:
$$\mathcal{A} = \{(u, v) \mid u \in V_1,\ v \in V_2\} \tag{1}$$
where for each vertex in $V_1$ we predict a single corresponding vertex in $V_2$, such that together these pairwise node alignments follow a global network alignment.
In practice, we solve this problem bidirectionally, aligning the nodes of $\mathcal{G}_1$ to $\mathcal{G}_2$ and vice versa. Furthermore, we stress that in our unsupervised setting, we have neither known node-node labeled correspondences nor any side-information (such as node/edge attributes). Instead, our proposed framework only requires the network structures, but could be extended to embrace such additional information (later discussed as future work).
3 Deep Adversarial Network Alignment Framework
In this section we introduce our proposed framework, Deep Adversarial Network Alignment (DANA), for the network alignment problem discussed in Section 2. First, we will provide an overview of how the framework is utilized to solve the network alignment problem by first aligning the embedding distributions and then aligning the nodes. Next, we will discuss in detail both of these key stages of our proposed framework. Thereafter we summarize with an algorithmic overview of our framework and also provide an analysis on the complexity of DANA.
As previously mentioned, network embedding algorithms have been proven to effectively learn node representations, but embeddings for separate networks are not aligned. Thus, we first obtain node embeddings and then use an adversarial based method to learn a complex (and even nonlinear) mapping that aligns the two networks’ embedding distributions. In this way, the mapped embedding from the first network approximately follows the distribution of the other. Then, once the distributions have been aligned, the second stage uses an efficient nearest neighbor search to match/align the individual nodes by using the mapping functions obtained through the adversarial learning.
3.1 Adversarial Graph Distribution Alignment
In this section, we introduce the first stage of DANA, namely the distribution alignment whose model is illustrated in Figure 1. In a nutshell, this model aligns two graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ bidirectionally using two connected adversarial networks. The reason for connecting them is to ensure no “collapse” [Zhu et al., 2017] and to utilize transitivity [Zhou et al., 2016] to regularize and prevent random alignments of the distributions, which could otherwise occur when using complex (and even nonlinear) mappings between them.
In Figure 1, we can observe that when given two networks $\mathcal{G}_1$ and $\mathcal{G}_2$, the first step is to obtain graph embeddings for each of these networks. These can be obtained using one of the plethora of available methodologies, such as node2vec [Grover and Leskovec, 2016]. Then, our goal is to find a mapping between the node embeddings $X_1$ for $\mathcal{G}_1$ and $X_2$ for $\mathcal{G}_2$, whose distributions are denoted as $p_1$ and $p_2$, respectively (since we are ultimately looking to align the two graph distributions). As demonstrated in Figure 1, the model contains two generators, $G_{12}$ mapping $X_1 \to X_2$ and $G_{21}$ mapping $X_2 \to X_1$. Moreover, the discriminator $D_2$ distinguishes between real embeddings of graph $\mathcal{G}_2$ and those generated from the real embeddings of $\mathcal{G}_1$ through $G_{12}$ (i.e., $x_2 \sim p_2$ and $G_{12}(x_1)$ with $x_1 \sim p_1$, respectively). Likewise, the discriminator $D_1$ distinguishes between real embeddings of $\mathcal{G}_1$ and those generated through $G_{21}$ from the real embeddings of $\mathcal{G}_2$.
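Both bidirectional mappings are optimized via applying adversarial losses [Goodfellow et al., 2014]. More precisely, writing $G_{12}$ for the generator that maps embeddings of $\mathcal{G}_1$ into the embedding space of $\mathcal{G}_2$, $D_2$ for the discriminator on that space, and $p_1$, $p_2$ for the two embedding distributions, the loss function associated with aligning graph $\mathcal{G}_1$ to $\mathcal{G}_2$ (i.e., the mapping $G_{12}$) is as follows:
$$\min_{G_{12}} \max_{D_2} \mathcal{L}_{adv}(G_{12}, D_2) = \mathbb{E}_{x_2 \sim p_2}\big[\log D_2(x_2)\big] + \mathbb{E}_{x_1 \sim p_1}\big[\log\big(1 - D_2(G_{12}(x_1))\big)\big] \tag{2}$$
where the generator $G_{12}$ is trying to mimic the embedding distribution of $\mathcal{G}_2$ by generating embeddings $G_{12}(x_1)$ (by minimizing Eq. (2)) while simultaneously the discriminator $D_2$ is attempting to differentiate between $G_{12}(x_1)$ and $x_2$ (by maximizing Eq. (2)). Similarly, the loss function of aligning graph $\mathcal{G}_2$ to $\mathcal{G}_1$ (i.e., $G_{21}$) is as follows:
$$\min_{G_{21}} \max_{D_1} \mathcal{L}_{adv}(G_{21}, D_1) = \mathbb{E}_{x_1 \sim p_1}\big[\log D_1(x_1)\big] + \mathbb{E}_{x_2 \sim p_2}\big[\log\big(1 - D_1(G_{21}(x_2))\big)\big] \tag{3}$$
where in this situation the minimax game is instead between $G_{21}$ and $D_1$.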
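To make the adversarial objective concrete, the following minimal Python sketch estimates the value of the loss for one direction on a toy batch. The linear generator, the logistic discriminator, the toy embeddings, and every function name are assumptions made for this illustration; they are not taken from the DANA implementation, which uses neural networks trained by gradient updates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_linear_generator(weights):
    """Hypothetical linear generator: maps x to W x (one row of W per output coordinate)."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def make_logistic_discriminator(w, b):
    """Hypothetical discriminator D(x) = sigmoid(w . x + b), the probability that x is real."""
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def adversarial_loss(real_batch, source_batch, generator, discriminator, eps=1e-12):
    """Batch estimate of E[log D(x_real)] + E[log(1 - D(G(x_source)))]."""
    real_term = sum(math.log(discriminator(x) + eps) for x in real_batch) / len(real_batch)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(x)) + eps) for x in source_batch
    ) / len(source_batch)
    return real_term + fake_term

# Toy 2-d "embeddings" of the two networks (made-up numbers).
x1 = [[0.0, 1.0], [1.0, 0.0]]
x2 = [[2.0, 2.0], [3.0, 1.0]]
g12 = make_linear_generator([[1.0, 0.0], [0.0, 1.0]])  # identity map

d_blind = make_logistic_discriminator([0.0, 0.0], 0.0)   # always outputs 0.5
d_sharp = make_logistic_discriminator([1.0, 1.0], -2.5)  # separates the two toy sets

loss_blind = adversarial_loss(x2, x1, g12, d_blind)
loss_sharp = adversarial_loss(x2, x1, g12, d_sharp)
```

A discriminator that cannot tell the two sets apart yields a value of about 2 log 0.5, while one that separates them yields a larger value; the discriminator therefore pushes the objective up and the generator pushes it back down.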
By separately optimizing the loss functions in Eq. (2) and Eq. (3) (i.e., learning the mappings $G_{12}$ and $G_{21}$ independently), we might expect to learn an alignment from the embeddings of $\mathcal{G}_1$ to those of $\mathcal{G}_2$, and vice versa. However, in practice, during training each of the separate models can map one real embedding distribution to some random embeddings in the target domain (or even collapse). More specifically, especially when nonlinearity is used in the generators, the mapping $G_{12}$ can project embeddings of graph $\mathcal{G}_1$ to random points in the embedding space of $\mathcal{G}_2$ that, although they might follow a similar distribution, might also completely distort the proximity information between neighboring nodes that was originally preserved in the embedding of $\mathcal{G}_1$. In other domains, such as word-to-word translation, adversarial techniques resorted to using only a single directional linear mapping [Lample et al., 2018], which avoids these problems but limits the complexity and power that nonlinearity could bring to the translation/alignment.
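To prevent these problems and still potentially use the power of a nonlinear generator mapping function for network alignment, we introduce a cycle consistency loss similar to [Zhu et al., 2017], which had been used for image-to-image translation. More specifically, for a node embedding $x_1$ of a node in $\mathcal{G}_1$, the learned generators $G_{12}$ (mapping into the embedding space of $\mathcal{G}_2$) and $G_{21}$ (mapping back) should be able to recover $x_1$ and bring it back to the embedding space of $\mathcal{G}_1$ as follows:
$$x_1 \rightarrow G_{12}(x_1) \rightarrow G_{21}(G_{12}(x_1)) \approx x_1 \tag{4}$$
and similarly for being able to recover the embeddings of $\mathcal{G}_2$. Intuitively, enforcing these cyclic mappings helps to prevent both the “collapse” and the random alignment problem previously mentioned. Hence, we incorporate the following cycle-reconstruction loss into our objective:
$$\mathcal{L}_{cyc}(G_{12}, G_{21}) = \mathbb{E}_{x_1 \sim p_1}\big[\lVert G_{21}(G_{12}(x_1)) - x_1 \rVert\big] + \mathbb{E}_{x_2 \sim p_2}\big[\lVert G_{12}(G_{21}(x_2)) - x_2 \rVert\big] \tag{5}$$
This leads to the graph level embedding distribution alignment optimizing the following overall loss function:
$$\min_{G_{12}, G_{21}} \max_{D_1, D_2} \mathcal{L} = \mathcal{L}_{adv}(G_{12}, D_2) + \mathcal{L}_{adv}(G_{21}, D_1) + \lambda\, \mathcal{L}_{cyc}(G_{12}, G_{21}) \tag{6}$$
where $\lambda$ is a hyperparameter controlling the balance between ensuring a close alignment of the graph level embedding distributions and the cycle consistency loss.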
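The role of the cycle consistency term can be illustrated with a small Python sketch. It assumes an L1 reconstruction error (borrowed from the image-to-image translation setting; the norm is an assumption here) and uses hypothetical scaling functions as stand-ins for the two generators. The function names and the default weight of 10 are likewise illustrative.

```python
def l1_distance(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def cycle_loss(x1_batch, x2_batch, g12, g21):
    """Average reconstruction error after a round trip through both generators."""
    forward = sum(l1_distance(g21(g12(x)), x) for x in x1_batch) / len(x1_batch)
    backward = sum(l1_distance(g12(g21(x)), x) for x in x2_batch) / len(x2_batch)
    return forward + backward

def total_loss(adv_12, adv_21, cyc, lam=10.0):
    """Weighted sum of the two adversarial terms and the cycle consistency term."""
    return adv_12 + adv_21 + lam * cyc

def scale(s):
    """Hypothetical generator that multiplies every coordinate by s."""
    return lambda x: [s * xi for xi in x]

# Toy embeddings (made-up numbers).
x1 = [[1.0, -2.0], [0.5, 4.0]]
x2 = [[2.0, 2.0], [-1.0, 3.0]]

consistent = cycle_loss(x1, x2, scale(2.0), scale(0.5))  # generators are exact inverses
collapsed = cycle_loss(x1, x2, scale(2.0), scale(0.0))   # second generator collapses to the origin
```

When the two generators invert each other the cycle term vanishes, whereas a collapsed generator is penalized, which is the behaviour the cycle term is meant to encourage.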
3.2 Nearest Neighbor Node Alignment
The second stage of DANA is to efficiently discover the node alignments, which are based on using the discovered complex mapping functions (i.e., the generators) from the first stage of DANA. In this subsection, we discuss the efficient nearest neighbor greedy node alignment method from $\mathcal{G}_1$ to $\mathcal{G}_2$; the alignment from $\mathcal{G}_2$ to $\mathcal{G}_1$ is performed analogously.
As seen in Figure 2, the first step is to take the node embeddings $X_1$ of $\mathcal{G}_1$ and map them to the embedding space of $\mathcal{G}_2$ through the use of the trained generator $G_{12}$. Next, these projected node embeddings of $\mathcal{G}_1$ are paired with their nearest neighbor from $X_2$ based on Euclidean distance. To perform the nearest neighbor search, we utilize a k-d tree, which is a data structure used for performing a fast and efficient search [Abbasifard et al., 2014].
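The nearest neighbor stage can be sketched in plain Python as follows. This is a minimal, unoptimized k-d tree written for illustration (it re-sorts at every level, so building is slower than an optimized implementation); the function names, the toy embeddings, and the translation used in place of a trained generator are all assumptions.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree over (index, vector) pairs, splitting at the median."""
    if not points:
        return None
    axis = depth % len(points[0][1])
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, index) of the stored point closest to query (Euclidean)."""
    if node is None:
        return best
    idx, vec = node["point"]
    dist = math.dist(vec, query)
    if best is None or dist < best[0]:
        best = (dist, idx)
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only descend into the far side if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

def align_nodes(source_embeddings, target_embeddings, generator):
    """Map each source embedding with the generator and pair it with its nearest target."""
    tree = build_kdtree(list(enumerate(target_embeddings)))
    return [nearest(tree, generator(x))[1] for x in source_embeddings]

# Toy data: the stand-in "generator" is a translation by (+1, +1).
target = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
source = [[-1.0, -1.0], [0.1, -0.2], [3.5, 4.2]]
shift = lambda x: [xi + 1.0 for xi in x]

alignment = align_nodes(source, target, shift)  # index of the matched target per source node
```

Here each source node is matched to the target whose embedding lies closest to its mapped position, so the three toy nodes are paired with targets 0, 1, and 2, respectively.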
3.3 Algorithmic Overview and Complexity Analysis
Here we discuss an algorithmic overview of DANA along with the computational complexity. Algorithm 1 summarizes the entire framework including the major steps, namely obtaining network embeddings (line 2), training the unsupervised adversarial based graph distribution alignment (lines 4-11), and performing nearest neighbor node alignment (lines 13-15). Next we discuss some details and the complexity of DANA. Note that we denote by $n$ the larger of the two node counts, and similarly define $m$ for the edge counts of the two networks.
First, we obtain the embeddings for the two networks $\mathcal{G}_1$ and $\mathcal{G}_2$. In this work, we utilize a network embedding method (more specifically node2vec [Grover and Leskovec, 2016]) whose cost we treat as linear in the size of each network, resulting in a linear cost overall. Note that DANA could use attributed embedding methods to incorporate side-information, but we leave this as future work.
Next, we train the adversarial based graph distribution alignment. Suppose that the algorithm is run for some constant number of epochs, $T$, where each epoch iterates through all the nodes (for both graphs) by randomly creating minibatches, performing the forward step, backpropagating the error, and updating the parameters using stochastic gradient descent (SGD). Note that the cost per minibatch depends on the architectures of the generators and discriminators. However, as later discussed in Section 4, if we use hidden layers of small constant size, then the computation is also constant for each minibatch. Thus, since we run $T$ epochs, the complexity of the graph distribution alignment is $O(T(n_1 + n_2))$, where $n_1$ and $n_2$ are the numbers of nodes of the two networks; that is, it is linear in the number of nodes.
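As seen in Algorithm 1 (lines 13-15), for the final step we perform the node alignment bidirectionally and select the direction that has the lower average nearest neighbor distance. Let $n_1$ and $n_2$ denote the numbers of nodes of $\mathcal{G}_1$ and $\mathcal{G}_2$. First, we build the k-d tree based on the embeddings of $\mathcal{G}_2$, which takes $O(n_2 \log n_2)$, and then we search for the nearest neighbor of every node of $\mathcal{G}_1$ using its mapped representation. The search for each node is $O(n_2)$ in the worst case, but $O(\log n_2)$ in the average case. Thus, since we perform this search for all nodes in $\mathcal{G}_1$, the total expected time is $O((n_1 + n_2) \log n_2)$ for the alignment of $\mathcal{G}_1$ onto $\mathcal{G}_2$, including the construction of the tree. We then similarly align $\mathcal{G}_2$ onto $\mathcal{G}_1$, resulting in a total expected time of $O(n \log n)$, where $n$ is the larger of $n_1$ and $n_2$. This leads to a subquadratic total time complexity for DANA of $O(n \log n)$ when ignoring the linear/constant terms.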
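The bidirectional selection step described above can be sketched as follows. For brevity this sketch uses a brute-force nearest neighbor search (quadratic time) in place of the k-d tree, and the two translations stand in for trained generators; all names and numbers are assumptions for illustration.

```python
import math

def greedy_alignment(mapped, target):
    """Pair every mapped embedding with its nearest target; return (pairs, mean distance)."""
    pairs, total = [], 0.0
    for i, x in enumerate(mapped):
        j = min(range(len(target)), key=lambda t: math.dist(x, target[t]))
        pairs.append((i, j))
        total += math.dist(x, target[j])
    return pairs, total / len(mapped)

def bidirectional_alignment(x1, x2, g12, g21):
    """Align in both directions and keep the one with the lower mean nearest neighbor distance."""
    pairs_12, mean_12 = greedy_alignment([g12(x) for x in x1], x2)
    pairs_21, mean_21 = greedy_alignment([g21(x) for x in x2], x1)
    if mean_12 <= mean_21:
        return "1->2", pairs_12, mean_12
    # Flip the pairs so they always read (node of network 1, node of network 2).
    return "2->1", [(j, i) for i, j in pairs_21], mean_21

# Toy embeddings and stand-in generators (made-up values).
x1 = [[0.0, 0.0], [1.0, 0.0]]
x2 = [[10.0, 10.0], [11.0, 10.0]]
g12 = lambda x: [x[0] + 10.0, x[1] + 10.0]  # maps x1 exactly onto x2
g21 = lambda x: [x[0] - 9.0, x[1] - 10.0]   # slightly misaligned inverse

direction, pairs, mean_distance = bidirectional_alignment(x1, x2, g12, g21)
```

In this toy case the first direction lands exactly on the target embeddings (mean distance 0), while the reverse direction has mean distance 0.5, so the first direction is kept.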
4 Experiments
To evaluate the effectiveness of the proposed Deep Adversarial Network Alignment framework, we conduct a set of experiments for aligning real world networks. Through the conducted experiments, we seek to answer the following two questions: (1) Can DANA align network embeddings? and (2) How effective is DANA at accurately discovering the true underlying node alignment?
4.1 Experimental Setup
Here we will discuss the datasets and how we utilize them for our experiments, the architecture details for DANA, our proposed unsupervised heuristic for model selection, and baseline methods.
4.1.1 Datasets with Ground Truth Correspondence
The first two datasets we collected are BitcoinAlpha (http://www.btc-alpha.com) and BitcoinOTC (http://www.bitcoin-otc.com). These networks are online marketplaces that allow users to buy and sell things using Bitcoins. Users create positive (or negative) links to those they trust (or distrust). Furthermore, most users have provided their unique Bitcoin fingerprints, thus allowing us to determine a ground-truth mapping of users across networks, which we use to evaluate the alignments. Note that we construct two undirected dataset variants, the first being networks that only include positive links (i.e., BitcoinA and BitcoinO), and the second also including the negative links (i.e., BitcoinAn and BitcoinOn). Some basic statistics can be found in Table 1.
4.1.2 Datasets with Pseudo-Ground Truth Correspondence
Here we collected real world datasets, namely CollegeMsg (http://snap.stanford.edu/data/), Hamsterster, and Blogs (the latter two from http://konect.uni-koblenz.de/). We present the basic undirected network statistics in Table 1. Note that we have chosen these datasets to be used in a synthetic network alignment setting (although they are themselves real world networks), where we let $\mathcal{G}_1$ be the original dataset, while constructing a permuted version as $\mathcal{G}_2$ and simultaneously adding noise at a level in the set {5, 10, 20}% to $\mathcal{G}_2$ at random to evaluate the performance and robustness of DANA.
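One way to construct such a pseudo-ground truth pair is sketched below in Python: the node identifiers are permuted and a fraction of the edges is dropped at random, in line with the edge removal described in the experimental results. The function name, the seed handling, and the ring graph used as input are assumptions for this illustration, not the authors' preprocessing code.

```python
import random

def make_noisy_permuted_copy(edges, num_nodes, noise, seed=0):
    """Return (edges of the permuted noisy copy, perm) where perm[u] is the new id of node u."""
    rng = random.Random(seed)
    perm = list(range(num_nodes))
    rng.shuffle(perm)
    # Remove a `noise` fraction of the edges uniformly at random.
    num_kept = len(edges) - round(noise * len(edges))
    kept = rng.sample(edges, num_kept)
    # Relabel the surviving edges with the permuted node ids.
    permuted = [tuple(sorted((perm[u], perm[v]))) for u, v in kept]
    return permuted, perm

# Toy original network: a ring on 20 nodes.
original_edges = [(i, (i + 1) % 20) for i in range(20)]
noisy_edges, ground_truth = make_noisy_permuted_copy(original_edges, 20, noise=0.10)
```

The returned permutation plays the role of the pseudo-ground truth: node `u` of the original network corresponds to node `ground_truth[u]` of the noisy copy.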
Table 1: Basic statistics of the datasets.
          BitcoinA / BitcoinO (pos)  BitcoinAn / BitcoinOn (pos & neg)  CollegeMsg  Hamsterster  Blogs
Nodes     3682                       3783                               1899        2426         1224
          3819                       3914                               -           -            -
          3591                       3682                               -           -            -
Edges     25952                      28288                              13754       16613        16718
          28321                      30691                              -           -            -
          24066                      26004                              -           -            -
4.1.3 DANA Architecture
First, for the graph embeddings, we utilize node2vec [Grover and Leskovec, 2016] to obtain node embeddings of size 64. We note that both the generator and discriminator for DANA can be constructed in various ways. For the discriminator, based on knowledge from other domains when using adversarial based frameworks, we use a two layer fully connected network with 512 hidden units and Leaky ReLU (Rectified Linear Unit) [Maas et al., 2013]. For the generators, we attempted using both a linear and a nonlinear single layer mapping (i.e., two possible variants). For the adversarial learning, we varied the cycle consistency weight $\lambda$ in {1, 10, 100}, since $\lambda = 10$ was recommended in [Zhu et al., 2017]. We also treat as a hyperparameter the number of times we update the generators before updating the discriminators during the alternating updates, and we use the Adam optimizer [Kingma and Ba, 2014].
Above we mentioned three hyperparameters that need to be chosen. We therefore propose an unsupervised heuristic that uses how well DANA aligns the distributions as the metric for selecting the best parameters. We assume that the better the embedding distributions match, the better the performance in aligning the individual nodes. Note that this does not utilize the ground-truth alignments, but rather simply measures the average distance from each mapped node to its nearest neighbor in the other embedding space.
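The model selection heuristic can be sketched in a few lines of Python. Each candidate below is a hypothetical stand-in for a generator trained under one hyperparameter setting; the candidate names, toy embeddings, and helper functions are assumptions, and no ground-truth alignment is used anywhere.

```python
import math

def mean_nn_distance(mapped, target):
    """Average distance from each mapped embedding to its nearest target embedding."""
    return sum(min(math.dist(x, t) for t in target) for x in mapped) / len(mapped)

def select_model(candidates, x1, x2):
    """Pick the candidate generator whose mapped embeddings lie closest to the target set."""
    scores = {
        name: mean_nn_distance([g(x) for x in x1], x2)
        for name, g in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

def shift(amount):
    """Hypothetical generator that translates every coordinate by `amount`."""
    return lambda x: [xi + amount for xi in x]

# Toy embeddings of the two networks (made-up numbers).
x1 = [[0.0, 0.0], [1.0, 1.0]]
x2 = [[5.0, 5.0], [6.0, 6.0]]

candidates = {"setting_a": shift(0.0), "setting_b": shift(4.0), "setting_c": shift(5.0)}
best_name, scores = select_model(candidates, x1, x2)
```

The candidate whose output coincides with the target embeddings obtains a mean distance of zero and is selected; in practice the same score could be tracked over training snapshots.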
4.1.4 Baselines
Here we introduce the set of baselines we compare against our proposed Deep Adversarial Network Alignment (DANA) method. IsoRank [Singh et al., 2008]/FINAL [Zhang and Tong, 2016] is a network alignment algorithm that was designed specifically for protein-protein interaction network alignments, and we note that FINAL is an attributed network alignment method that is equivalent to IsoRank when there are no edge/node attributes [Zhang and Tong, 2016]. EigenAlignLR [Nassar et al., 2018] is a low rank extension of the EigenAlign [Feizi et al., 2016] spectral based network alignment method. REGAL [Heimann et al., 2018] constructs its own network embedding (while REGALs2v [Heimann et al., 2018] instead uses struc2vec [Ribeiro et al., 2017]) and then uses a nearest neighbor search for node alignments. We also include two sparse network alignment methods, namely SparseIsoRank [Bayati et al., 2013] (a sparse network alignment variation of IsoRank [Singh et al., 2008]) and NetAlignBP [Bayati et al., 2013] (which uses belief propagation to construct the alignment), where both use additional information to limit the scope of possible alignments.
We note that SparseIsoRank and NetAlignBP both assume additional information that suggests which node alignments are possible, but our problem setting does not have such additional information. Therefore, we heuristically provide them with information derived from the network structure. More specifically, node degree similarity between the two networks $\mathcal{G}_1$ and $\mathcal{G}_2$ is used to provide each node in the two networks with a subset of possible nodes to pair with (as also done in [Heimann et al., 2018]). Here we try two different sizes for the set containing the most similar nodes in terms of absolute difference in degree. We note that for all methods we use the default settings provided by the authors, but REGAL was unable to run on networks of different sizes; thus we only report its performance for the CollegeMsg, Blogs, and Hamsterster datasets.
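The degree-based candidate sets handed to the sparse baselines can be sketched as follows. The function names, the tie-breaking rule (smaller node id first), and the candidate set size `k` are assumptions for this illustration.

```python
from collections import Counter

def degrees(edges, num_nodes):
    """Degree of every node in an undirected edge list."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return [counts[u] for u in range(num_nodes)]

def degree_candidates(deg1, deg2, k):
    """For each node of network 1, the k nodes of network 2 with the closest degree."""
    candidates = {}
    for u, du in enumerate(deg1):
        ranked = sorted(range(len(deg2)), key=lambda v: (abs(du - deg2[v]), v))
        candidates[u] = ranked[:k]
    return candidates

# Toy networks: a path 0-1-2 and a star centred on node 0.
deg1 = degrees([(0, 1), (1, 2)], 3)          # [1, 2, 1]
deg2 = degrees([(0, 1), (0, 2), (0, 3)], 4)  # [3, 1, 1, 1]
cands = degree_candidates(deg1, deg2, k=2)
```

Each node then only needs to be compared against its `k` candidates rather than against every node of the other network.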
4.2 Experimental Results
Here we present the results of our experiments in network alignment with both our known ground truth and pseudo-ground truth datasets. Although there are multiple ways of evaluating the performance of network alignment methods [Douglas et al., 2018], we report the accuracy, which is by far the most commonly used.
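As a point of reference, accuracy can be computed as the fraction of nodes whose predicted partner matches the known one. The sketch below assumes that the denominator is the number of nodes with a known ground-truth partner; that choice, the dictionary representation, and the function name are assumptions made for illustration.

```python
def alignment_accuracy(predicted, ground_truth):
    """Fraction of ground-truth pairs (u -> v) that the predicted alignment reproduces."""
    correct = sum(1 for u, v in ground_truth.items() if predicted.get(u) == v)
    return correct / len(ground_truth)

# Toy example: node ids in network 1 mapped to node ids in network 2.
ground_truth = {0: 3, 1: 2, 2: 1, 3: 0}
predicted = {0: 3, 1: 2, 2: 0, 3: 1}

accuracy = alignment_accuracy(predicted, ground_truth)
```

Two of the four toy nodes are matched correctly, giving an accuracy of 0.5.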
In Figure 3(a), we have plotted the mean nearest neighbor distances against the accuracy while taking snapshots during DANA’s optimization of the adversarial graph distribution alignment of the BitcoinAn and BitcoinOn datasets. We can observe that as DANA better aligns the two distributions (i.e., lower mean nearest neighbor distance), the accuracy also improves. In Figure 3(a), the star represents the lowest mean nearest neighbor distance, and we can observe that it has nearly the best accuracy. We observe the same trend across all hyperparameter settings, and thus our unsupervised heuristic of using the lowest mean nearest neighbor distance for model selection works quite well in practice. To further show the effectiveness of DANA in aligning the graph embedding distributions, we show a visualization using principal component analysis (PCA) [Jolliffe, 2011] in Figure 3(b), where “Start” refers to the initial random alignment and “End” shows the final graph distribution alignment that DANA adversarially learns. Therefore, based on Figure 3, we can answer our first question: DANA can indeed learn to effectively align the embeddings of two networks.
In Table 2, we report the results of running DANA and the baselines (except for REGAL, since its implementation could not handle networks having different sizes) on the Bitcoin datasets. The first observation is that EigenAlignLR, SparseIsoRank, and IsoRank/FINAL all perform significantly worse than NetAlignBP, REGALs2v, and DANA. It would seem that NetAlignBP is able to effectively use the pseudo side-information (based on node degree similarity) we provided. Also, REGALs2v (which uses struc2vec embeddings) is able to outperform EigenAlignLR (which is spectral based). However, we also note the comparison of REGALs2v, which directly performs a node alignment on embeddings from struc2vec, against DANA, which uses adversarial learning to align the network embeddings before performing the node alignment. We can clearly see that DANA significantly outperforms REGALs2v and all other baseline methods.
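Table 2: Alignment accuracy on the Bitcoin datasets.
Methods  $\mathcal{G}_1$: BitcoinA, $\mathcal{G}_2$: BitcoinO  $\mathcal{G}_1$: BitcoinAn, $\mathcal{G}_2$: BitcoinOn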
SparseIsoRank  0.046  0.047 
NetAlignBP  0.157  0.141 
IsoRank/FINAL  0.041  0.040 
EigenAlignLR  0.015  0.016 
REGALs2v  0.124  0.089 
DANA  0.542  0.511 
Next, in Figure 4, we present the results for the pseudo-ground truth datasets, where we have performed the network alignment experiments for the CollegeMsg, Hamsterster, and Blogs datasets. In these experiments we first permuted the nodes of the original network and then removed a portion of the edges (i.e., the level of noise) and attempted to align back to the original network. We can observe that, similar to the two Bitcoin experiments, DANA is able to outperform all existing baseline methods for all three datasets across all levels of noise. We also observe that as more edges are removed, it becomes harder to align back to the original network: all the baselines almost completely fail to align at 20% noise, yet DANA is still able to maintain a reasonable alignment. It can also be seen that DANA and REGAL (also REGALs2v in many cases) outperform the other methods, which suggests again that embedding based approaches are superior. However, as previously mentioned, REGALs2v does not perform any alignment of the embeddings before performing the node alignment, and REGAL performs a similarity-based embedding alignment through a shallow matrix factorization method, but neither harnesses deep learning or adversarial training. Furthermore, while EigenAlignLR uses spectral based embeddings, DANA is able to harness more advanced network embedding methods, leading to better performance across all levels of noise. This seems natural given that current network embedding methods have shown superiority over the classical spectral embeddings for a variety of network analysis tasks [Grover and Leskovec, 2016]. Thus, based on these experiments we have answered the second question: DANA is indeed effective at aligning the corresponding nodes across networks, which we attribute to its use of deep adversarial learning to align the embeddings of the two networks.
5 Related Work
Network alignment is a fundamental network analysis task having many real world applications in user-identity linkage [Liu et al., 2013], computer vision [Conte et al., 2004], and bioinformatics [Singh et al., 2008]. Classical network alignment methodologies were typically based on optimizing a permutation matrix to align the matrix representations. However, some methods have introduced relaxations, such as convex relaxations or finding a doubly stochastic matrix instead of a permutation matrix [Aflalo et al., 2015].
Another set of related network alignment problems are those that are supervised. Some representative examples also learn network embeddings, but use known node-node pairs to align in a shared embedding space [Tan et al., 2014]. Also, there is the sparse network alignment problem [Bayati et al., 2013], where the general alignment problem is simplified by restricting the possible alignments between nodes. In other words, a bipartite graph is constructed from the two networks being aligned, but rather than having all $n_1 \times n_2$ possible matching pairs (for networks with $n_1$ and $n_2$ nodes), it instead has a limited set that excludes certain pairs, which could be based on domain specific knowledge or network structure. Some other specialized formulations can be found for heterogeneous networks [Kong et al., 2013] and attributed graphs [Zhang and Tong, 2016].
6 Conclusion
In this work, we proposed our Deep Adversarial Network Alignment (DANA) framework to solve the general network alignment problem when only provided the network structure and assuming no additional constraints. More specifically, DANA harnesses the power of adversarial learning to align the graph embedding distributions and then performs an efficient nearest neighbor node alignment. Furthermore, we present an unsupervised heuristic to perform model selection for DANA. Finally, extensive experiments were performed to show the effectiveness of both main stages of DANA, while also showing DANA to be superior in performance to existing network alignment methods. Our future work consists of first extending DANA to embrace additional constraints to aid in performing alignments, such as node/edge attributes or a seed set of known node-node aligned pairs.
References
[Abbasifard et al., 2014] Mohammad Reza Abbasifard, Bijan Ghahremani, and Hassan Naderi. A survey on nearest neighbor search methods. IJCA, 2014.
[Aflalo et al., 2015] Yonathan Aflalo, Alexander Bronstein, and Ron Kimmel. On convex relaxation of graph isomorphism. PNAS, 2015.
[Bayati et al., 2013] Mohsen Bayati, David F Gleich, Amin Saberi, and Ying Wang. Message-passing algorithms for sparse network alignment. TKDD, 2013.
[Conte et al., 2004] Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. Thirty years of graph matching in pattern recognition. IJPRAI, 2004.
[Douglas et al., 2018] Joel Douglas, Ben Zimmerman, Alexei Kopylov, Jiejun Xu, Daniel Sussman, and Vince Lyzinski. Metrics for evaluating network alignment. In GTA3 at WSDM, 2018.
[Feizi et al., 2016] Soheil Feizi, Gerald Quon, Mariana Recamonde-Mendoza, Muriel Médard, Manolis Kellis, and Ali Jadbabaie. Spectral alignment of networks. arXiv preprint arXiv:1602.04181, 2016.
[Fortunato, 2010] Santo Fortunato. Community detection in graphs. Physics Reports, 2010.
[Goodfellow et al., 2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. Node2vec: Scalable feature learning for networks. SIGKDD, 2016.
[Guzzi and Milenković, 2017] Pietro Hiram Guzzi and Tijana Milenković. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Briefings in bioinformatics, 2017.
[Hayhoe et al., 2018] Mikhail Hayhoe, Francisco Barreras, Hamed Hassani, and Victor M Preciado. Spectre: Seedless network alignment via spectral centralities. arXiv preprint arXiv:1811.01056, 2018.
[Heimann et al., 2018] Mark Heimann, Haoming Shen, Tara Safavi, and Danai Koutra. Regal: Representation learning-based graph alignment. In CIKM, 2018.
[Isola et al., 2017] P. Isola, J. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
[Jolliffe, 2011] Ian Jolliffe. Principal component analysis. In International encyclopedia of statistical science, 2011.
[Kingma and Ba, 2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[Kong et al., 2013] Xiangnan Kong, Jiawei Zhang, and Philip S. Yu. Inferring anchor links across multiple heterogeneous social networks. In CIKM, 2013.
[Lample et al., 2018] Guillaume Lample, Alexis Conneau, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Word translation without parallel data. In ICLR, 2018.
[Liben-Nowell and Kleinberg, 2007] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. JASIST, 2007.
[Liu et al., 2013] Jing Liu, Fan Zhang, Xinying Song, Young-In Song, Chin-Yew Lin, and Hsiao-Wuen Hon. What’s in a name?: An unsupervised approach to link users across communities. WSDM, 2013.
[Maas et al., 2013] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearities improve neural network acoustic models. In DLASLP at ICML, 2013.
[Mu et al., 2016] Xin Mu, Feida Zhu, Ee-Peng Lim, Jing Xiao, Jianzong Wang, and Zhi-Hua Zhou. User identity linkage by latent user space modelling. In SIGKDD, 2016.
[Nassar et al., 2018] Huda Nassar, Nate Veldt, Shahin Mohammadi, Ananth Grama, and David F. Gleich. Low rank spectral network alignment. WWW, 2018.
[Ribeiro et al., 2017] Leonardo F.R. Ribeiro, Pedro H.P. Saverese, and Daniel R. Figueiredo. Struc2vec: Learning node representations from structural identity. SIGKDD, 2017.
[Singh et al., 2008] Rohit Singh, Jinbo Xu, and Bonnie Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. PNAS, 2008.
[Tan et al., 2014] Shulong Tan, Ziyu Guan, Deng Cai, Xuzhen Qin, Jiajun Bu, and Chun Chen. Mapping users across networks by manifold alignment on hypergraph. AAAI, 2014.
[Wang et al., 2017] Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. Irgan: A minimax game for unifying generative and discriminative information retrieval models. SIGIR, 2017.
[Yu et al., 2017] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. Seqgan: Sequence generative adversarial nets with policy gradient. AAAI, 2017.
[Zhang and Philip, 2015] Jiawei Zhang and Philip S. Yu. Multiple anonymized social networks alignment. In ICDM, 2015.
[Zhang and Tong, 2016] Si Zhang and Hanghang Tong. Final: Fast attributed network alignment. In SIGKDD, 2016.
[Zhou et al., 2016] Tinghui Zhou, Philipp Krahenbuhl, Mathieu Aubry, Qixing Huang, and Alexei A Efros. Learning dense correspondence via 3d-guided cycle consistency. In CVPR, 2016.
[Zhu et al., 2017] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.