GAN-Leaks: A Taxonomy of Membership Inference Attacks against GANs
In recent years, the success of deep learning has carried over from discriminative models to generative models. In particular, generative adversarial networks (GANs) have enabled a new level of performance in applications ranging from media manipulation to dataset re-generation. Despite this success, the privacy risks stemming from GANs are less well explored. In this paper, we focus on membership inference attacks against GANs, which have the potential to reveal information about a victim model’s training data. Specifically, we present the first taxonomy of membership inference attacks, which encompasses not only existing attacks but also our novel ones. We also propose the first generic attack model that can be instantiated in various settings according to the adversary’s knowledge about the victim model. We complement our systematic analysis of attack vectors with a comprehensive experimental study that investigates the effectiveness of these attacks w.r.t. model type, training configuration, and attack type across three diverse application scenarios: images, medical data, and location data. We show consistent effectiveness in all setups, which bridges the assumption gap and the performance gap of previous studies with a complete spectrum of performance across settings. We conclude by reminding users to think carefully before publicizing any part of their models.
In the past few years, two categories of deep learning techniques have achieved tremendous success in machine learning tasks. Discriminative models have been successfully adopted in various prediction tasks, such as image recognition [krizhevsky2012imagenet, Simonyan15, szegedy2015going, he2016deep], speech recognition [hinton2012deep, graves2013speech], machine translation [bahdanau2014neural, cho2014learning], computer-aided disease diagnosis [amato2013artificial, shin2016deep], etc. Generative models, on the other hand, have also gained increasing attention in recent years and have delivered appealing applications including photorealistic image synthesis [goodfellow2014generative, li2016precomputed, pathak2016context], image editing [gatys2016image, isola2017image, yu2019texture], text generation [vinyals2015show, arora2017generalization], sound generation [oord2016wavenet, mehri2016samplernn], etc. Most of these applications are supported by either generative adversarial networks (GANs) [goodfellow2014generative, radford2015unsupervised, salimans2016improved, arjovsky2017wasserstein, gulrajani2017improved, karras2018progressive, brock2018large] or Variational AutoEncoders (VAEs) [kingma2013auto, rezende2014stochastic, yan2015attribute2image], which substantially improve the quality of synthesized media without leaving obvious artifacts.
In response to the growing adoption of deep learning in real business, many IT companies not only embed deep learning models into their services to better reach customers or understand business operations, but also commercialize deep learning itself as a service. They provide platforms where users can define their own deep learning tasks and apply the learned models, e.g., Google Prediction API [GooglePredictionAPI], Amazon Machine Learning [AmazonMachineLearning], Microsoft Azure Machine Learning [MicrosoftAzureMachineLearning], and BigML [BigML].
However, with the prevalence of online deep learning APIs, data privacy violations have frequently occurred due to data misuse without an adequate legal basis, e.g., the misuse of National Health Service data in the DeepMind project [GoogleMisuseNHS]. Data privacy can also be challenged by malicious users who intend to infer the original training data. The resulting privacy breach would raise serious issues because training data either contains sensitive attributes (e.g., a patient’s disease history) or is cumbersome to collect. One such attack is membership inference [dwork2015robust, backes2016membership, shokri2017membership, hayes2019logan, SZHBFB19], which aims to identify whether a query data record was used to train a deep learning model. Overfitting to training data is the major reason membership inference is feasible, as the learned model tends to memorize training inputs and perform better on them.
There are two main motivations for research on membership inference attacks. The first is to validate and quantify the privacy vulnerability of a deep learning model. For models against which inference attacks are effective, owners can consider restricting their availability so as to also avoid other attacks (e.g., profiling [pyrgelis2017does] or property inference [ateniese2013hacking, melis2018exploiting]). The second motivation is to establish wrongdoing: e.g., regulators can use membership inference to support the suspicion that a model was trained on personal data without an adequate legal basis [GoogleMisuseNHS].
Membership inference against discriminative deep learning models has been extensively explored [backes2016membership, shokri2017membership, aono2017privacy, hitaj2017deep, long2018understanding, yeom2018privacy, carlini2018secret, melis2018exploiting], while inference against generative models is still an open question. The latter is more challenging for the adversary because the victim model does not directly provide confidence values that reveal overfitting to individual data records. Recently, Hayes et al. [hayes2019logan] presented the first approach targeting GANs. It retrains a local copy of the GAN in a black-box setting and checks the discriminator’s confidence for membership inference in a white-box setting. Their intuition is that the overfitting of a victim GAN is either incorporated in its discriminator or can be mimicked by a local copy of the discriminator. Hilprecht et al. [hilprecht2019monte] extend membership inference attacks to both GANs and VAEs via Monte Carlo integration [owen2013monte]. Their intuition is that an overfitted generator tends to output data samples closer to the training data than to unseen data.
However, neither of them provides a complete, comprehensive, and practical analysis of membership inference attacks against GANs. For example, Hayes et al. [hayes2019logan] do not consider the realistic situation in which the discriminator of a well-trained GAN is usually not deployed for queries. Hilprecht et al. [hilprecht2019monte] only experiment on toy image datasets and do not consider white-box attacks against GANs. This motivates our contributions towards a more systematic analysis.
Taxonomy of membership inference attacks against GANs.
We propose a pioneering study that categorizes attack settings against GANs. In increasing order of the adversary’s knowledge about the victim GAN, the settings are (1) full black-box generator, (2) partial black-box generator, (3) white-box generator, and (4) accessible discriminator. In particular, two of the settings, the partial black-box and white-box settings, are newly identified for attack model design. We then establish the first taxonomy of the existing and proposed attacks. See Section 3, Table 1, and Figure 1 for details.
The first generic attack model across settings and its novel instantiated variants.
We propose the first generic attack model applicable to all the settings (Section 4.1). The instantiated attack variants in the partial black-box and white-box settings are also the first of their kind (Sections 4.3 and 4.4). Their effectiveness, shown in Sections 5.4 and 5.5, provides a complete spectrum of performance across settings. It bridges the gap between the existing full black-box attacks [hayes2019logan, hilprecht2019monte] and the discriminator-accessible attack [hayes2019logan]. See the experimental comparisons in Section 5.7. We remind users of the high risk of privacy breach when publishing the generator, or even only the input interface to their generator.
Comprehensive analysis in each setting.
We progressively investigate attacks in each setting in increasing order of the adversary’s knowledge (Sections 5.3 to 5.5). In each setting, our study spans several orthogonal dimensions:
- three datasets with diverse modalities (Section 5.1);
- five victim GAN models that were state of the art at their release time (Section 5.1);
- two ablation studies w.r.t. GAN training configurations (Section 5.2);
- the change in attack performance due to attack calibration (Sections 4.6 and 5.6) or a differential privacy defense during GAN training (Section 5.8).
The rest of the paper is organized as follows. Section 2 reviews related work on generative models and on membership inference attacks and defenses. Section 3 elaborates the attack taxonomy, and Section 4 introduces our generic attack model and the attack variants instantiated for each setting. In Section 5, we evaluate attack performance in these settings and compare it to existing work. Section 6 concludes the paper.
2 Related work
Generative adversarial networks (GANs).
A GAN framework [goodfellow2014generative, radford2015unsupervised, salimans2016improved, arjovsky2017wasserstein, gulrajani2017improved, karras2018progressive, brock2018large] consists of two neural networks, a generator and a discriminator. They are trained in an adversarial manner to generate data approximating the training data distribution. The generator takes random noise (a latent code) as input and generates samples in the same modality as the training data. The discriminator takes samples from both the generator and the training dataset as input and is trained to differentiate between the two sources. During training, the two networks compete and evolve towards the game equilibrium. The generator learns to generate increasingly realistic samples that fool the discriminator, while the discriminator learns to tell the two sources apart increasingly accurately. At test time, the discriminator is usually not deployed. One insight behind GANs is to avoid a hand-crafted similarity metric for sample-wise self-supervision. Instead, the discriminator adversarially learns deep features for a distribution-wise similarity metric. Empirically, the generator can thus learn to generate samples with more realistic details. By contrast, hand-crafted metrics average over the training dataset and capture only low-frequency content accurately. Therefore, GANs have shown improved photorealism in image synthesis [li2016precomputed, bergmann2017learning, zhou2018nonstationary], translation [isola2017image, zhu2017unpaired, zhu2017toward], and manipulation [antipov2017face, thies2015real, thies2016face2face].
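For reference, the adversarial game described above is commonly written as the minimax objective of the original GAN formulation [goodfellow2014generative]. It is shown here with generator $G$, discriminator $D$, data distribution $p_{\text{data}}$, and latent prior $p_z$. This notation is introduced only for illustration, and several of the victim models considered later (e.g., WGANGP) optimize a different adversarial loss:

$$\min_{G}\max_{D}\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

The discriminator maximizes this objective by scoring real samples high and generated samples low. The generator minimizes it by pushing $D(G(z))$ towards one.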
On the other hand, because GANs by nature exclude sample-wise supervision, they model the data distribution implicitly rather than fitting individual images. GANs can therefore be employed to re-render new samples that follow the original distribution without revealing individual samples. Privacy protection can thus potentially be achieved by releasing only the distribution, instead of the raw data, to the public. Examples include GAN-based de-identification of sensitive human-centric data [brkic2017know, meden2017face, wu2018privacy] or health data [choi2017generating, xie2018differentially]. For the same reason, membership inference attacks against GANs are naturally more challenging.
We choose PGGAN [karras2018progressive], WGANGP [gulrajani2017improved], DCGAN [radford2015unsupervised], and MEDGAN [choi2017generating] as the victim models for membership inference, given their strong performance in generating images and other data representations.
Variational AutoEncoder (VAE).
VAE is an alternative generative framework [kingma2013auto, rezende2014stochastic, yan2015attribute2image]. It consists of two neural networks, an encoder and a decoder, which are cascaded to reconstruct data with a sample-wise loss. The encoder maps data into a latent space, while the decoder maps the encoded latent representation back to a data reconstruction. VAE regularizes the encoder by imposing a prior (usually a standard normal distribution) over the latent distribution. The VAE loss is then composed of the reconstruction error and a prior regularization term in the form of a Kullback-Leibler divergence.
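The sketch below computes the VAE loss for a single sample. It assumes a diagonal Gaussian encoder with mean `mu` and log-variance `logvar`, a standard normal prior, and a squared-error reconstruction term. The squared-error choice is an assumption; other likelihoods are common. The Kullback-Leibler term uses the standard closed form for two Gaussians, and the function name `vae_loss` is hypothetical.

```python
import numpy as np


def vae_loss(x, x_recon, mu, logvar):
    """Return (total, reconstruction, kl) for one sample.

    Assumes q(z|x) = N(mu, diag(exp(logvar))) and prior p(z) = N(0, I).
    """
    # Reconstruction error: squared error, i.e. a Gaussian decoder up to constants (assumption).
    recon = float(np.sum((x - x_recon) ** 2))
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions.
    kl = float(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))
    return recon + kl, recon, kl
```

A perfect reconstruction whose approximate posterior equals the prior (`mu = 0`, `logvar = 0`) has zero loss. Moving the posterior mean away from zero is penalized by the KL term alone.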
Hybrid generative model.
GANs often suffer from the mode collapse issue, i.e., failing to generate appearances relevant to some training samples due to the lack of sample-wise supervision. VAEs, in turn, often lack flexible hallucination capability. Therefore, a hybrid model, VAEGAN [pmlr-v48-larsen16, apratim18cvprb], has been proposed to jointly train a VAE and a GAN, where the VAE decoder and the GAN generator are merged into one network by sharing trainable parameters. The GAN discriminator is used to measure sample similarity. We choose VAEGAN [apratim18cvprb] as the fifth victim model to be attacked by membership inference.
Membership inference attacks.
Shokri et al. [shokri2017membership] design attacks against discriminative models by exploiting differences in the model’s response to inputs that were or were not seen during training. Hayes et al. [hayes2019logan] design attacks against GANs by retraining a local copy of the victim GAN and checking the discriminator’s response to inputs. The latter attack is more challenging for two reasons: GANs by nature exclude sample-wise supervision, and the retrained local copy may not accurately capture overfitting to the original training data. We also work in this direction but leverage only the pretrained generator. A concurrent study by Hilprecht et al. [hilprecht2019monte] extends the attack to both GANs and VAEs using Monte Carlo integration, but only shows attacks against GANs in a black-box setting.
Other privacy-related attacks include model inversion [fredrikson2014privacy, fredrikson2015model, yeom2018privacy], model extraction [tramer2016stealing], model attribution [yu19iccv], and attacks against distributed deep learning systems [aono2017privacy, hitaj2017deep, melis2018exploiting]. None of them is designed to determine whether an individual data record belongs to the original training set. Thus, none can be directly adapted to our membership inference attack against generative models.
Membership inference defenses.
Regularization-based defenses include weight normalization [salimans2016weight] and random dropout [srivastava2014dropout]. Weight normalization reparameterizes each weight vector by decoupling its length from its direction; it can be applied to all layers of both the generator and the discriminator of the victim model. Random dropout can prevent overfitting by randomly dropping neurons, together with their connections, during training.
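The two regularizers can be sketched in NumPy as follows. This is an illustration under our own assumptions, not the training code used in our experiments. `weight_norm` follows the reparameterization $w = g\,v/\lVert v\rVert$ of [salimans2016weight] for a dense layer with one weight vector per output unit. `dropout` uses the common "inverted" variant, which rescales surviving units at training time; the original formulation [srivastava2014dropout] instead rescales weights at test time. Both function names are hypothetical.

```python
import numpy as np


def weight_norm(v, g):
    """w = g * v / ||v|| row-wise; v: (out, in) directions, g: (out,) scales.

    Each row of the returned w has Euclidean length |g_i|, independent of
    the magnitude of the corresponding row of v.
    """
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norms


def dropout(h, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return h  # identity at test time
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)
```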
Differential privacy defenses [dinur2003revealing, dwork2004privacy, dwork2006calibrating, dwork2006differential, dwork2008differential, chaudhuri2009privacy, chaudhuri2011differentially, song2013stochastic, song2015learning, zhu2017differentially, beaulieu2017privacy, triastcyn2018generating, xie2018differentially] inject carefully calibrated noise during victim model training, either into the forward passes or into the backward gradients. This perturbs data-related objective functions and thus mitigates inference attacks. We apply [dwork2006calibrating] in a variety of setups to analyze in which circumstances and to what extent it works.
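As a minimal illustration of the calibration idea in [dwork2006calibrating], the Laplace mechanism below releases a numeric value with differential privacy. It adds Laplace noise whose scale is the query’s sensitivity divided by the privacy budget `epsilon`. This budget is unrelated to the attack threshold used later. The sketch does not show how noise is injected into GAN training (forward passes vs. gradients, clipping, budget accounting), and the function name is hypothetical.

```python
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` with epsilon-differential privacy (Laplace mechanism).

    `sensitivity` is the maximum L1 change of the released quantity when one
    training record is added or removed; a smaller `epsilon` means stronger
    privacy and therefore larger noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))


# Usage: privatize a count query (sensitivity 1) with a budget of epsilon = 0.5.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Halving `epsilon` doubles the noise scale. This is the privacy/utility trade-off that the defense experiments explore.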
|[hayes2019logan] full black-box|
|[hilprecht2019monte] full black-box|
|Our full black-box (Sec. 4.2)|
|Our partial black-box (Sec. 4.3)|
|Our white-box (Sec. 4.4)|
|[hayes2019logan] accessible discriminator|
3 Taxonomy of membership inference against GANs
In general, attack settings can be categorized as either white-box or black-box. In the white-box setting, the victim network parameters are accessible to attackers, whereas in the black-box setting they are unknown to attackers. Specifically for attacks against GANs, we further distinguish settings based on the following criteria: (1) whether the discriminator is accessible or not, (2) whether the generator is white-box or black-box, and (3) whether the latent code of a generated sample is accessible or not. We categorize the existing and the proposed attacks in Table 1, and visualize the different settings in Figure 1. We elaborate on each category below in decreasing order of the attacker’s knowledge.
Accessible discriminator.
This is the most knowledgeable setting for attackers. It converts the attack against a GAN into an attack against a classical discriminative model, no matter whether the discriminator is white-box or black-box. Existing attack methods against discriminative models can be applied to this setting. For example, the attack of Shokri et al. [shokri2017membership], which infers membership from a model’s confidence values, can be applied to the discriminator. This setting is also considered in [hayes2019logan], corresponding to the last row in Table 1. In practice, however, the discriminator of a well-trained GAN is usually discarded without being deployed to APIs, and is thus not accessible to attackers. We therefore devote less effort to investigating the discriminator. We mainly focus on the following practical settings, in which attackers only have access to the generator.
White-box generator.
This is the most knowledgeable setting for attackers once the discriminator of a GAN is no longer accessible. Attackers have access to the parameters of the generator. This is a realistic open-source scenario in which users publish their well-trained generator without releasing the underlying training data. This scenario is also commonly studied in the differential privacy community [dwork2006calibrating]. However, it is a novel setting for membership inference attacks against GANs, not explored in [hayes2019logan] or [hilprecht2019monte]. It corresponds to the second-to-last row in Table 1 and to Section 4.4.
Partial black-box generator (known input-output pair).
In this less knowledgeable setting, attackers have no access to the parameters of the generator but do have access to the latent code of each generated sample. This is also a realistic scenario, in which attackers submit their own latent codes as input and collect the corresponding generated samples from the generator. This is another novel setting, not considered in [hayes2019logan] or [hilprecht2019monte]. It corresponds to the third-to-last row in Table 1 and to Section 4.3.
Full black-box generator (known output only).
This is the least knowledgeable setting for attackers: they cannot provide input and can only blindly query samples from the well-trained black-box generator. This corresponds to the practical scenario of closed-source GAN-based APIs. Hayes et al. [hayes2019logan] investigate attacks in this setting by retraining a local copy of the API. Hilprecht et al. [hilprecht2019monte] sample data from the generator and use Monte Carlo approximation to score each query based on an elaborate distance metric between the query and the generated samples. Our idea is similar in spirit to Hilprecht et al. [hilprecht2019monte], but we design a low-skill attack method with a simpler implementation (Section 4.2) that achieves comparable or better performance (Section 5.3). Our attack, the attack of Hilprecht et al., and the attack of Hayes et al. correspond to the third, second, and first rows in Table 1, respectively.
|Latent code (input to the generator)|
|Training set of the victim generator|
|Attacker’s reference generator, described in Section 4.6|
4 Membership inference attack against GANs
We first describe our generic attack model in Section 4.1, which applies to all the settings. Based on it, we introduce three attack variants for different settings in Sections 4.2 to 4.4. Since our attack depends on a sample distance metric, we elaborate on our metric design in Section 4.5. Finally, in Section 4.6 we propose an attack calibration method that reduces the dependence of our inference prediction on the generation difficulty of individual queries.
4.1 Generic attack model
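We first summarize the notations used in the paper in Table 2. We formulate the membership inference attack as a binary classification task. Specifically, we threshold the reconstruction error between a query sample $x$ and its reconstructed copy $\mathcal{R}(x \mid \mathcal{G}_v)$ obtained from the well-trained victim generator $\mathcal{G}_v$. Our intuition is that, given access to a generator, we should be able to reconstruct samples better if they belong to the GAN training set, regardless of how simple or difficult those query samples are. Mathematically,
$$\mathcal{A}(x \mid \mathcal{G}_v) = \mathbb{1}\big[L\big(x, \mathcal{R}(x \mid \mathcal{G}_v)\big) < \epsilon\big] \tag{1}$$
where $L(\cdot, \cdot)$ is the distance metric of Section 4.5, $\epsilon$ is a threshold, and $\mathcal{A}(x \mid \mathcal{G}_v) = 1$ predicts that $x$ is a member of the training set.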
The attacker’s goal is then to design $\mathcal{R}$ so that it uses $\mathcal{G}_v$ to approximate a query sample as accurately as possible. In the following sections, we instantiate variants of $\mathcal{R}$ for different attack settings in increasing order of the attacker’s knowledge.
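Once a reconstruction procedure and a distance are fixed, the generic attack model reduces to a few lines. The sketch below assumes both are supplied as Python callables; the names `membership_attack`, `reconstruct`, and `distance` are hypothetical. The concrete reconstruction procedures are what each attack setting has to provide.

```python
from typing import Callable, Tuple

import numpy as np

Array = np.ndarray


def membership_attack(
    x: Array,
    reconstruct: Callable[[Array], Array],
    distance: Callable[[Array, Array], float],
    eps: float,
) -> Tuple[bool, float]:
    """Generic attack A(x) = 1[L(x, R(x)) < eps].

    `reconstruct` plays the role of R (e.g. nearest neighbor search or an
    optimization over latent codes), `distance` plays the role of L, and
    `eps` is the decision threshold. Returns (predicted_member, error).
    """
    error = float(distance(x, reconstruct(x)))
    return error < eps, error
```

In practice, `eps` is not fixed in advance. Sweeping it over the errors of all queries traces out an ROC curve, which is how attack performance is summarized.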
4.2 Full black-box attack
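We start with the least knowledgeable setting, in which an attacker only has access to a black-box generator $\mathcal{G}_v$. The attacker can do nothing but blindly collect $k$ samples from $\mathcal{G}_v$, denoted as $\{\mathcal{G}_v(\cdot)_i\}_{i=1}^{k}$, where $\mathcal{G}_v(\cdot)$ indicates that the attacker has neither access to nor control over the latent code input. We then define the reconstruction of $x$ as its nearest neighbor among these samples. Mathematically,
$$\mathcal{R}(x \mid \mathcal{G}_v) = \arg\min_{\hat{x} \in \{\mathcal{G}_v(\cdot)_i\}_{i=1}^{k}} L(x, \hat{x}) \tag{2}$$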
See Figure 2(b) for a diagram. A larger $k$ leads to better reconstruction at the cost of computational efficiency. Given a limited budget, we fix $k$ to 20k throughout the experiments.
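The full black-box reconstruction can be sketched in NumPy as follows. The sketch assumes the $k$ generated samples have already been collected and flattened into an array of shape `(k, d)`. It uses squared $L_2$ distance as a stand-in for $L$; the latent regularizer is unavailable in this setting anyway. The function name is hypothetical.

```python
import numpy as np


def full_black_box_reconstruct(x, samples):
    """Nearest neighbor of query x among k blindly collected generator samples.

    x: shape (d,); samples: shape (k, d). Returns (reconstruction, error)
    with squared L2 distance as the error.
    """
    dists = np.sum((samples - x) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return samples[idx], float(dists[idx])
```

With `k = 20000` high-dimensional samples and many queries, the distance matrix can become large. Processing queries in chunks keeps memory bounded without changing the result.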
4.3 Partial black-box attack
Due to the curse of dimensionality, the random samples in the full black-box attack are sparsely distributed in the high-dimensional space. A blind nearest-neighbor search that ignores the latent code input may therefore fail to find an accurate reconstruction. As access to the latent code $z$ is permitted in the partial black-box setting, we propose an attack that exploits $z$.
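Concretely, the attacker performs an optimization w.r.t. $z$ in order to accurately reconstruct the query sample $x$. Mathematically,
$$z^* = \arg\min_{z} L\big(x, \mathcal{G}_v(z)\big) \tag{3}$$
$$\mathcal{R}(x \mid \mathcal{G}_v) = \mathcal{G}_v(z^*) \tag{4}$$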
See Figure 2(c) for a diagram. Without knowing the parameters of $\mathcal{G}_v$, the attacker cannot differentiate through the generator, so no gradient information is available. Access to $(z, \mathcal{G}_v(z))$ pairs only allows evaluating the function (a forward pass through the generator). We therefore propose to approximate the optimum via Powell’s Conjugate Direction Method [Powell].
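The partial black-box optimization can be sketched with SciPy’s derivative-free Powell implementation (`scipy.optimize.minimize` with `method="Powell"`). It only needs objective values, i.e., forward passes through the generator. In this sketch, the generator is a hypothetical stand-in callable, the mean initialization $z_0 = 0$ assumes a zero-mean latent prior, and squared $L_2$ replaces the full metric for brevity.

```python
import numpy as np
from scipy.optimize import minimize


def partial_black_box_reconstruct(x, generator, dim_z, z0=None):
    """Approximate z* = argmin_z L(x, G(z)); return (G(z*), z*, error).

    `generator` is only called forward (no gradients), matching the partial
    black-box setting. Squared L2 stands in for the full distance metric.
    """
    if z0 is None:
        z0 = np.zeros(dim_z)  # mean initialization, assuming a zero-mean latent prior

    def objective(z):
        return float(np.sum((x - generator(z)) ** 2))

    res = minimize(objective, z0, method="Powell",
                   options={"xtol": 1e-8, "ftol": 1e-12, "maxfev": 20000})
    return generator(res.x), res.x, float(res.fun)
```

For example, with a hypothetical black box `gen = lambda z: np.tanh(W @ z)` and a query `x = gen(z_true)`, the returned error should approach zero because the query has an exact preimage.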
4.4 White-box attack
In the white-box setting, we have the same reconstruction formulation as in Section 4.3 (see Figure 2(c) for a diagram). More advantageously for attackers, access to the parameters of $\mathcal{G}_v$ further boosts reconstruction quality. The optimization can then use gradients backpropagated w.r.t. $z$. It can thus be solved more accurately by advanced gradient-based optimization algorithms, namely Adam [kingma2014adam], RMSProp [tieleman2012lecture], and L-BFGS [liu1989limited]. We tried all three in our ablation study and found that L-BFGS performs best in terms of convergence rate.
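The white-box optimization can be sketched with SciPy’s L-BFGS-B (`scipy.optimize.minimize` with `method="L-BFGS-B"` and `jac=True`, used here without bounds). The objective returns both the loss and its exact gradient. The generator below is a hypothetical toy, $G(z) = \tanh(Wz + b)$, with a hand-derived gradient. For a real victim generator, the gradient would come from automatic differentiation, and the full metric would replace squared $L_2$.

```python
import numpy as np
from scipy.optimize import minimize


class ToyGenerator:
    """Hypothetical differentiable generator G(z) = tanh(W z + b)."""

    def __init__(self, W, b):
        self.W = np.asarray(W, dtype=float)
        self.b = np.asarray(b, dtype=float)

    def __call__(self, z):
        return np.tanh(self.W @ z + self.b)

    def loss_and_grad(self, z, x):
        """Squared L2 reconstruction loss and its gradient w.r.t. z."""
        g = self(z)
        r = g - x
        loss = float(np.sum(r ** 2))
        grad = self.W.T @ (2.0 * r * (1.0 - g ** 2))  # chain rule through tanh
        return loss, grad


def white_box_reconstruct(x, gen, z0):
    """Minimize L(x, G(z)) over z with L-BFGS using exact gradients."""
    res = minimize(gen.loss_and_grad, z0, args=(x,), jac=True, method="L-BFGS-B")
    return gen(res.x), res.x, float(res.fun)
```

Unlike Powell’s method, L-BFGS exploits the gradient, which is what makes the white-box attack stronger than the partial black-box one under the same formulation.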
4.5 Distance metric
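Our baseline distance metric consists of three terms:
- the element-wise (pixel-wise) difference term $L_2$ targets low-frequency components;
- the deep image feature term $L_{\mathrm{lpips}}$ (i.e., the Learned Perceptual Image Patch Similarity (LPIPS) metric [zhang2018unreasonable]) targets realistic details;
- the regularization term $L_{\mathrm{reg}}$ penalizes latent codes far from the prior distribution.

Mathematically,
$$L\big(x, \mathcal{G}_v(z)\big) = \lambda_1 L_2\big(x, \mathcal{G}_v(z)\big) + \lambda_2 L_{\mathrm{lpips}}\big(x, \mathcal{G}_v(z)\big) + \lambda_3 L_{\mathrm{reg}}(z) \tag{5}$$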
$\lambda_1$, $\lambda_2$, and $\lambda_3$ enable or disable each loss term and balance their orders of magnitude; their values depend on the experiment configuration. For non-image data, $\lambda_2 = 0$ because LPIPS is no longer applicable. For the full black-box attack, $\lambda_3 = 0$ because $z$ is not accessible to the attacker.
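A sketch of the weighted metric follows. It assumes the element-wise term is the summed squared error and treats LPIPS as an opaque callable supplied by the caller; the LPIPS network itself is not reproduced here. The form of the latent regularizer is also an assumption of this sketch. It penalizes the squared gap between $\lVert z\rVert_2^2$ and $\dim(z)$, since samples from a standard normal prior concentrate near that squared norm. All names are hypothetical.

```python
import numpy as np


def latent_reg(z):
    """Assumed regularizer: squared gap between ||z||^2 and dim(z)."""
    return float((np.sum(z ** 2) - z.size) ** 2)


def distance(x, x_hat, z=None, lambdas=(1.0, 0.0, 0.0), perceptual=None):
    """Weighted metric lam1 * L2 + lam2 * perceptual + lam3 * reg (cf. Eq. 5).

    Set lam2 = 0 for non-image data (no LPIPS) and lam3 = 0 when the latent
    code z is unavailable (full black-box setting).
    """
    lam1, lam2, lam3 = lambdas
    total = lam1 * float(np.sum((x - x_hat) ** 2))
    if lam2 != 0.0:
        if perceptual is None:
            raise ValueError("lam2 != 0 requires a perceptual distance callable")
        total += lam2 * float(perceptual(x, x_hat))
    if lam3 != 0.0:
        if z is None:
            raise ValueError("lam3 != 0 requires access to the latent code z")
        total += lam3 * latent_reg(z)
    return total
```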
4.6 Attack calibration
We observe that the reconstruction error is query-dependent, i.e., some query samples are more difficult to reconstruct due to their intrinsically more complicated representations. In such cases, the reconstruction error is dominated by the representation rather than by membership clues. We therefore propose to mitigate the query dependency in two steps. We first train a reference GAN $\mathcal{G}_r$ on a separate, disjoint dataset, and then calibrate our base reconstruction error by the reference reconstruction error. Mathematically,
$$L_{\mathrm{cal}}\big(x, \mathcal{R}(x \mid \mathcal{G}_v)\big) = L\big(x, \mathcal{R}(x \mid \mathcal{G}_v)\big) - L\big(x, \mathcal{R}(x \mid \mathcal{G}_r)\big)$$
The optimization on the well-trained $\mathcal{G}_r$ is the same as on $\mathcal{G}_v$. See Figure 2(d) for a diagram.
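Calibration is a per-query subtraction once both reconstruction errors are available. The sketch below assumes the two error arrays were produced by the same reconstruction procedure and metric, run once against the victim generator and once against the reference generator. The function name and the numbers in the usage are hypothetical.

```python
import numpy as np


def calibrated_errors(victim_errors, reference_errors):
    """L_cal(x) = L(x, R(x|G_v)) - L(x, R(x|G_r)), element-wise over queries.

    Lower calibrated values indicate 'member'.
    """
    victim_errors = np.asarray(victim_errors, dtype=float)
    reference_errors = np.asarray(reference_errors, dtype=float)
    return victim_errors - reference_errors


# Made-up numbers: query 0 is a hard-to-reconstruct member, query 1 an easy non-member.
scores = calibrated_errors([0.8, 0.3], [1.0, 0.3])
```

In this made-up example, the raw errors (0.8 vs. 0.3) would rank the easy non-member as more member-like. The calibrated scores (-0.2 vs. 0.0) reverse the ranking, because the reference generator reveals that query 0 is intrinsically hard to reconstruct.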
5 Experiments
We discuss the experimental setup in Section 5.1 and propose two dimensions of ablation study in Section 5.2. In Sections 5.3 to 5.5, we evaluate our attacks in three settings. We present the attack performance gain from calibration in Section 5.6. In Section 5.7, we compare with three baseline attack models. Finally, in Section 5.8, we investigate the effectiveness of a popular privacy defense scheme against our attack in various settings.
5.1 Experimental setup
Datasets.
Hilprecht et al. [hilprecht2019monte] limit their investigations to low-resolution toy image datasets, e.g., MNIST [lecun2010mnist] and CIFAR-10 [krizhevsky2009learning]. We push our attack scenarios towards realism by conducting experiments on datasets of three modalities: images, medical records, and location check-ins. All three are considered at high risk of privacy breach.
CelebA [liu2015faceattributes] is a large-scale face attributes dataset with 200k RGB images. The images are aligned to each other based on facial landmarks, which benefits GAN performance. We select at most 20k images, center-crop them, and resize them before GAN training.
MIMIC-III [johnson2016mimic] is a public Electronic Health Records (EHR) database containing medical records of intensive care unit (ICU) patients. We follow the same procedure as in [choi2017generating] to pre-process the data, where each patient is represented by a 1071-dimensional binary feature vector. We filter out patients with duplicate vector representations so that all remaining samples are unique.
Instagram New-York [backes2017walk2friends] contains Instagram users’ check-ins at various locations in New York at different time stamps from 2013 to 2017. We filter out users with fewer than 100 check-ins and keep the remaining users as samples. For sample representation, we first select evenly distributed time stamps. We then concatenate the longitude and latitude values of the check-in location at each time stamp, yielding a 4048-dimensional vector for each sample. The longitude and latitude values are either retrieved from the dataset or linearly interpolated from the available neighboring time stamps. We then perform zero-mean normalization before GAN training.
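The check-in vectorization can be sketched with `numpy.interp`. It returns the recorded value at an existing time stamp and linearly interpolates between neighboring ones. This sketch rests on several assumptions:
- time stamps are numeric and strictly increasing after sorting;
- times outside a user’s recorded range are clamped to the first or last check-in (NumPy’s default behavior);
- zero-mean normalization subtracts the per-dimension mean over all users.

The number and placement of time stamps (`grid`) are left to the caller, and all names are hypothetical.

```python
import numpy as np


def checkins_to_vector(times, lons, lats, grid):
    """Resample one user's check-ins at the time stamps in `grid` and interleave
    (longitude, latitude) per time stamp: [lon_1, lat_1, lon_2, lat_2, ...]."""
    times = np.asarray(times, dtype=float)
    order = np.argsort(times)
    t = times[order]
    lon = np.asarray(lons, dtype=float)[order]
    lat = np.asarray(lats, dtype=float)[order]
    # Exact value at a recorded time stamp, linear interpolation in between.
    lon_g = np.interp(grid, t, lon)
    lat_g = np.interp(grid, t, lat)
    return np.stack([lon_g, lat_g], axis=1).reshape(-1)


def zero_mean(X):
    """Subtract the per-dimension mean over all users (rows of X)."""
    return X - X.mean(axis=0, keepdims=True)
```

Each vector has length `2 * len(grid)`, one longitude/latitude pair per selected time stamp.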
Victim GAN models.
We choose PGGAN [karras2018progressive], WGANGP [gulrajani2017improved], DCGAN [radford2015unsupervised], MEDGAN [choi2017generating], and VAEGAN [apratim18cvprb] as victim models, given their strong performance in generating images and/or other data representations. For each dataset, we only train the subset of GAN models applicable to it.
It is important to ensure the high quality of the well-trained GANs, because attackers are more likely to target high-quality GANs with practical effectiveness. We note that previous work [hayes2019logan, hilprecht2019monte] only shows qualitative results for its victim GANs. In particular, Hayes et al. [hayes2019logan] did not show visually pleasing generations on the Labeled Faces in the Wild (LFW) dataset [LFWTech]. This may result from a mismatch between the dataset and their GAN models. In contrast, we present better qualitative results of different GANs on CelebA (Figure 3). We also report the corresponding quantitative evaluation in terms of the Fréchet Inception Distance (FID) metric [heusel2017gans] (Table 3). A smaller FID indicates that the generated image set is more realistic and closer to the real-world data distribution. Our GAN models are within a reasonable range of the state of the art.
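For reference, FID fits a Gaussian to feature embeddings of real and of generated images and computes the Fréchet distance between the two Gaussians: $\lVert \mu_1 - \mu_2\rVert_2^2 + \operatorname{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1\Sigma_2)^{1/2}\big)$. The sketch below assumes the Inception features have already been extracted, a step it does not show, and uses `scipy.linalg.sqrtm` for the matrix square root. Function names are hypothetical.

```python
import numpy as np
from scipy import linalg


def gaussian_stats(features):
    """Mean and covariance of an (n_samples, n_features) feature matrix."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = np.real(linalg.sqrtm(sigma1 @ sigma2))  # drop tiny imaginary parts
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))
```

Identical statistics give an FID of zero. Any shift in the mean or mismatch in the spread of the features increases it.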
The proposed membership inference attack is formulated as a binary classification given a threshold $\epsilon$ in Equation 1. By varying $\epsilon$, we measure the area under the receiver operating characteristic curve (AUCROC) to evaluate attack performance. The ROC curve reflects the relation between the false-positive rate and the true-positive rate of a binary classifier over varying thresholds, and AUCROC is robust to class imbalance. AUCROC takes values in $[0, 1]$. A larger AUCROC indicates a higher true-positive rate across thresholds, and thus better classification (attack) performance.
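Because the attack score is the reconstruction error, AUCROC can be computed without explicitly sweeping the threshold. It equals the probability that a randomly chosen member has a lower error than a randomly chosen non-member, with ties counted as one half (the Mann-Whitney formulation). The quadratic-time sketch below illustrates this; the function name is hypothetical. For large query sets, a rank-based or library implementation would be preferable, such as scikit-learn’s `roc_auc_score` applied to negated errors.

```python
def auc_roc(member_errors, nonmember_errors):
    """AUCROC of the attack that predicts 'member' for low reconstruction error.

    Equals P(error of a random member < error of a random non-member),
    counting ties as 1/2; 0.5 corresponds to random guessing.
    """
    wins = 0.0
    for m in member_errors:
        for n in nonmember_errors:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_errors) * len(nonmember_errors))
```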
5.2 Ablation study
We first describe two dimensions of ablation study that apply across attack settings. Other dimensions specific to the white-box attack are elaborated in Section 5.5.
GAN training set size.
Training set size is closely related to the degree of overfitting in GAN training. A GAN trained on a smaller training set more easily memorizes individual training images and is thus more vulnerable to membership inference attacks. Therefore, we evaluate attack performance w.r.t. training set size. We exclude DCGAN and VAEGAN from this evaluation since their training is unstable on small training sets.
Random v.s. identity-based selection for GAN training set.
Membership inference attacks can have different levels of difficulty. For example, CelebA contains person identity information, so we can control attack difficulty by composing the GAN training set either based on identity or not. In one case, we include all images of the selected individuals for training. In the other case, we ignore identity information and randomly select images for training. The former case is relatively easier for attackers because of the larger margin between the member and non-member image sets. We therefore evaluate both training set selection schemes on CelebA.
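The two selection schemes can be sketched as follows, assuming each sample is given as a `(sample_id, identity)` pair. How the size budget interacts with whole identities is not specified above. This hypothetical sketch adds an identity only if all of its images still fit within the budget `n`, so every selected individual is complete.

```python
import random
from collections import defaultdict


def select_training_set(samples, n, by_identity, seed=0):
    """Pick at most n sample ids from `samples`, a list of (sample_id, identity).

    by_identity=False: n samples drawn uniformly at random, ignoring identity.
    by_identity=True: whole identities in random order; an identity is added
    with all of its samples only if it still fits within the budget n.
    """
    rng = random.Random(seed)
    if not by_identity:
        return [sid for sid, _ in rng.sample(samples, n)]

    groups = defaultdict(list)
    for sid, identity in samples:
        groups[identity].append(sid)
    identities = list(groups)
    rng.shuffle(identities)

    chosen = []
    for identity in identities:
        if len(chosen) + len(groups[identity]) <= n:
            chosen.extend(groups[identity])
    return chosen
```

Under identity-based selection, non-members are images of entirely unseen people. This is what widens the margin between member and non-member sets.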
5.3 Evaluation on full black-box attack
We start by evaluating our preliminary low-skill black-box attack in order to gauge the difficulty of the whole problem.
Performance w.r.t. GAN training set size.
Figure 4 plots the attack performance against different GAN models on the three datasets. The attack performs well on all three datasets when the training set is small. For instance, on CelebA, when the training set contains up to 512 images, the attacker’s AUCROC on both PGGAN and WGANGP is above 0.95. This indicates an almost perfect attack and a serious privacy breach. For larger training sets, however, the attacks become less effective. The degree of overfitting decreases, and the GAN’s capability shifts from memorization to hallucination. This suggests that users should collect more data for GAN training in order to reduce the privacy risk for each individual sample. Moreover, PGGAN becomes more vulnerable than WGANGP on CelebA as the training set grows. On MIMIC-III, WGANGP is consistently more vulnerable than MEDGAN regardless of training set size.
Performance w.r.t. GAN training set selection.
Figure 5 shows the comparison across four victim GAN models. We find that all the GAN models are consistently more vulnerable when the training set is selected based on identity. This reminds users to pay more attention to identity-based privacy breaches, which are more likely to happen than instance-based ones. Moreover, DCGAN and VAEGAN are more resistant to the full black-box attack, with AUCROC only marginally above 0.5 (the random-guess baseline). The order of resistance is inversely correlated with the order of generation quality of the GANs in Table 3. This suggests that better GAN performance comes at the cost of a higher risk of privacy breach.
5.4 Evaluation on partial black-box attack
We then equip attackers with more knowledge to perform the partial black-box attack.
Performance w.r.t. GAN training set selection.
Figure 5 shows the comparison across four victim GAN models. All models are more vulnerable under identity-based selection. Still, DCGAN is the most resistant victim against membership inference under both training set selection schemes.
Comparison to full black-box attack.
Figure 5 also allows comparing the full black-box and partial black-box results. The attack performance against each GAN model improves consistently and significantly from the full black-box to the partial black-box setting. This reminds users not to expose the generator’s input interface to unauthorized parties.
5.5 Evaluation on white-box attack
We further equip attackers with access to the generator's parameters to perform the white-box attack. Because the optimization in the white-box attack involves more technical details, we conduct additional ablation studies and a sanity check in this setting.
Ablation on optimization initialization.
Because our optimization problem is non-convex, the choice of initialization is important. We explore three initialization heuristics for the latent code: mean, random, and nearest-neighbour initialization. In practice, mean and nearest-neighbour initialization both perform well and are generally better than random initialization in terms of the successful reconstruction rate (the fraction of samples with reconstruction error smaller than 0.01). We therefore run mean and nearest-neighbour initialization in parallel and, for the attack, keep the one with the smaller reconstruction error.
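The initialize-twice-and-keep-the-best strategy can be sketched on a toy problem. Everything here is an illustrative assumption: a hypothetical linear generator `G(z) = A z`, a standard Gaussian prior (so the mean initialization is the zero vector), mean squared error as the reconstruction error, and plain gradient descent in place of L-BFGS. Nearest-neighbour initialization picks, from a pool of sampled latent codes, the one whose output is closest to the query.

```python
import random

# Hypothetical linear "generator" G(z) = A z with a 2-D latent space and 3-D output.
A = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
SUCCESS_THRESHOLD = 0.01  # successful reconstruction: error below 0.01


def generate(z):
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def recon_error(x, z):
    """Mean squared error between query x and G(z) (assumed error measure)."""
    return sum((g - xi) ** 2 for g, xi in zip(generate(z), x)) / len(x)


def grad(x, z):
    """Gradient of recon_error w.r.t. z; available because the attacker is white-box."""
    r = [g - xi for g, xi in zip(generate(z), x)]
    return [2.0 / len(x) * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(z))]


def optimize(x, z0, lr=0.05, steps=500):
    """Plain gradient descent (a stand-in for L-BFGS) from initialization z0."""
    z = list(z0)
    for _ in range(steps):
        z = [zj - lr * gj for zj, gj in zip(z, grad(x, z))]
    return z, recon_error(x, z)


def reconstruct(x, latent_dim=2, pool_size=64, seed=0):
    rng = random.Random(seed)
    mean_init = [0.0] * latent_dim  # mean of an assumed N(0, I) prior
    pool = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(pool_size)]
    nn_init = min(pool, key=lambda z: recon_error(x, z))  # nearest neighbour in output space
    # Run both initializations and keep the result with the smaller error.
    return min((optimize(x, mean_init), optimize(x, nn_init)), key=lambda r: r[1])


x = generate([0.7, -0.4])
z_hat, err = reconstruct(x)
print(err < SUCCESS_THRESHOLD)
```

On this convex toy problem both initializations reach the same preimage; the choice only matters for the non-convex objectives of real generators, where different starting points can end in different local minima.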
Ablation on optimization method.
We explore three optimizers, each with a hyper-parameter search, for reconstructing generated samples of PGGAN on CelebA: Adam [kingma2014adam], RMSProp [tieleman2012lecture], and L-BFGS [liu1989limited]. Figure 7 shows that L-BFGS converges fastest and requires no additional hyper-parameters. We therefore use L-BFGS as the default optimizer in the white-box setting.
Ablation on distance metric design for optimization.
We show the effectiveness of our objective design (Equation 5). Although optimizing only the element-wise term yields reasonably good reconstruction in most cases, we observe undesired blur in reconstructed CelebA images. Incorporating the deep image feature term and the regularization term improves the successful reconstruction rate, as demonstrated in Figure 6.
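Equation 5 combines an element-wise term, a deep image feature term, and a regularization term on the latent code. The sketch below shows one way to assemble such a weighted objective. The exact form of each term is an assumption: mean squared error for the element-wise term, a caller-supplied `features` function standing in for a learned perceptual feature extractor, and a regularizer that keeps the squared norm of `z` near its dimension (where samples of an N(0, I) prior concentrate). The weights `lam1`, `lam2`, `lam3` are hypothetical.

```python
def l2_term(x, g):
    """Element-wise term: mean squared difference between query and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, g)) / len(x)


def feature_term(x, g, features):
    """Feature term: mean squared difference in a feature space.

    `features` is a caller-supplied stand-in for a learned deep feature extractor.
    """
    fx, fg = features(x), features(g)
    return sum((a - b) ** 2 for a, b in zip(fx, fg)) / len(fx)


def reg_term(z):
    """Assumed latent regularizer: keep ||z||^2 near dim(z), where N(0, I) samples concentrate."""
    return (sum(zi * zi for zi in z) - len(z)) ** 2


def objective(x, z, generator, features, lam1, lam2, lam3):
    """Weighted sum of the three terms; the weights are hypothetical hyper-parameters."""
    g = generator(z)
    return lam1 * l2_term(x, g) + lam2 * feature_term(x, g, features) + lam3 * reg_term(z)


def toy_features(v):
    """Hypothetical 'features': differences between neighbouring entries."""
    return [v[i + 1] - v[i] for i in range(len(v) - 1)]


def identity_generator(z):
    """Hypothetical generator that returns the latent code itself."""
    return list(z)


print(objective([1.0, -1.0], [1.0, -1.0], identity_generator, toy_features, 1.0, 1.0, 0.01))  # 0.0
```

A feature term penalizes differences in structure (here, neighbouring differences) that an element-wise term averages away, which is the intuition behind adding it to reduce blur.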
Table 4: Success rate (%) of reconstructing generated samples for four victim GANs trained on CelebA.
|Success rate (%)|99.89|99.83|99.55|99.25|
Sanity check on distance metric design for optimization.
In addition, we check whether the non-convexity of our objective function affects the feasibility of the attack against different victim GANs. We apply the optimization to reconstruct generated samples. Ideally, the reconstruction error should be zero, because the query samples are generated directly by the model, i.e., their preimages exist. We count a reconstruction as successful when its error is below 0.01 and evaluate the success rate for four GAN models trained on CelebA. Table 4 shows that the success rate exceeds 99% for all the GANs, which verifies the feasibility of our optimization-based attack.
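The success rate in this sanity check is a simple count over per-sample reconstruction errors. A minimal sketch, assuming the errors are already computed for each generated query and using the 0.01 threshold from the initialization ablation; the function name and example errors are hypothetical.

```python
def success_rate(errors, threshold=0.01):
    """Percentage of reconstructions whose error falls below `threshold`.

    `errors` holds one reconstruction error per generated query sample.
    """
    if not errors:
        raise ValueError("need at least one reconstruction error")
    successes = sum(1 for e in errors if e < threshold)
    return 100.0 * successes / len(errors)


# Hypothetical errors for four reconstructed samples: three succeed.
print(success_rate([0.001, 0.02, 0.005, 0.0]))  # 75.0
```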
Ablation on distance metric design for classification.
We enable or disable individual terms in Equation 5 to investigate the contribution of each term to classification thresholding (membership inference) on CelebA. Specifically, we evaluate the attack using (1) the element-wise difference term only, (2) the deep image feature term only, and (3) all three terms together. Figure 8 shows the AUCROC of the attack against each GAN. The complete distance metric generally outperforms the single terms, so we use it for classification thresholding.
Moreover, the attack is less effective against DCGAN and more effective against VAEGAN. The success against VAEGAN can be explained by the sample-wise self-supervision during VAEGAN training, which forces the model to memorize training data. We therefore argue that combining VAE and GAN may alleviate mode collapse, but at the cost of a higher privacy breach risk.
Performance w.r.t. GAN training set size.
Figure 4 plots the attack performance against different GAN models on the three datasets. As in the black-box setting, the attack becomes less effective as the training set becomes larger. For CelebA, the attack remains effective with 20k training samples, while for MIMIC-III and Instagram this number decreases to 8192 and 2048, respectively. The strong similarity between member and non-member samples in these two non-image datasets makes the attack more difficult, which explains the deteriorated performance.
Performance w.r.t. GAN training set selection.
Figure 5 shows the comparison across four victim GAN models. Our attack is much more effective when the GAN training set is composed according to identity, consistent with the full and partial black-box settings.
Comparison to full and partial black-box attacks.
For membership inference, an important question is whether, and to what extent, the white-box attack is more effective than the black-box ones. We find that against GAN models the white-box attack is much more effective. Comparisons across the subfigures of Figure 5 show that the AUCROC values increase by at least 0.03 when moving from the full black-box to the white-box setting. Compared to the partial black-box attack, the white-box attack achieves noticeably better performance against PGGAN and VAEGAN. Moreover, the white-box attack takes much less time than the partial black-box attack. We therefore conclude that providing users with model parameters (the white-box setting) does incur a high privacy breach risk.
5.6 Performance gain from attack calibration
We perform calibration in all the settings. In the full and partial black-box settings in particular, attackers have no prior knowledge of the victim model architecture. We thus train a PGGAN on the LFW face dataset [LFWTech] and use it as a generic reference model for calibrating all victim models trained on CelebA. Similarly, for MIMIC-III, we use WGANGP as the reference model for MEDGAN and vice versa. In this way, our calibrated attacks strictly follow the black-box assumption.
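Calibration can be sketched as comparing each query's reconstruction error under the victim generator with its error under the reference generator. The sketch assumes the calibrated score is the plain difference of the two errors and that lower scores indicate membership; the function names, the threshold, and the example errors are hypothetical.

```python
def calibrated_score(victim_error, reference_error):
    """Assumed calibration: victim reconstruction error minus reference error.

    A query that any generator reconstructs well (low reference error) gets
    little credit; lower calibrated scores indicate likely members.
    """
    return victim_error - reference_error


def predict_member(victim_error, reference_error, threshold=0.0):
    """Flag a query as a member when its calibrated score is below `threshold`
    (the threshold value is a hypothetical choice)."""
    return calibrated_score(victim_error, reference_error) < threshold


# Hypothetical errors: q1 is a member, q2 an intrinsically easy non-member.
q1 = (0.05, 0.30)  # (victim error, reference error)
q2 = (0.03, 0.04)
print(q2[0] < q1[0])  # True: uncalibrated, q2 looks more like a member
print(calibrated_score(*q1) < calibrated_score(*q2))  # True: calibration fixes the ranking
```

The subtraction removes per-sample difficulty: a simple query that every generator reconstructs well no longer looks like a member just because its raw error is small.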
Figure 9 compares the attack performance on CelebA before and after calibration. The AUCROC values improve consistently across all GAN architectures in all settings. In general, calibration yields the largest gain in the white-box setting. The improvement is especially significant against VAEGAN, where the AUCROC value increases by 0.2 after calibration.
Figure 10 compares attack performance on the other two non-image datasets. The performance is also consistently boosted for all training set sizes after calibration.
5.7 Comparison to baseline attacks
We compare our calibrated attack to two recent membership inference baselines: Hayes et al. [hayes2019logan] (denoted LOGAN) and Hilprecht et al. [hilprecht2019monte] (denoted MC, for their Monte Carlo sampling method). As described in our taxonomy (Section 3), LOGAN includes a full black-box attack model and a discriminator-accessible attack model against GANs. The latter is regarded as the most knowledgeable but an unrealistic setting, because the discriminator of a GAN is usually not accessible in practice. Nevertheless, we compare to both settings for the completeness of our taxonomy and experiments. MC includes a full black-box attack model against GANs, but its white-box attack model targets only VAEs, so we compare only to the former. Note that, to the best of our knowledge, no other attack against GANs exists in the partial black-box or white-box settings.
Figures 11 and 12 show the comparisons across datasets, victim GAN models, GAN training set sizes, and attack settings. We skip MC on the non-image datasets because its distance calculation is not directly applicable to them. Our findings are as follows.
In the full black-box setting, our low-skill attack consistently outperforms MC and outperforms LOGAN on the non-image datasets. On CelebA, it achieves performance comparable to LOGAN with a much simpler implementation.
Our white-box and even partial black-box attacks consistently outperform the other full black-box attacks. This indicates that releasing the generator, or even just access to its input, can lead to a severe risk of privacy breach. Providing a complete spectrum of performance across settings, these attacks bridge the performance gap between the LOGAN black-box attack and the LOGAN discriminator-accessible attack.
LOGAN with access to the discriminator is the most effective attack, except against VAEGAN. Its effectiveness can be explained by the fact that the discriminator is explicitly trained to maximize the margin between the training set (member samples) and the generated set (a subset of non-member samples), which eventually yields very accurate confidence scores for membership inference. However, the discriminator score is less effective against VAEGAN because VAEGAN training relies more on sample-wise supervision than on the adversarial loss. Although this comparison is not fair due to the more knowledgeable assumption, it highlights the exceptional risk of publishing the discriminator.
We investigate one popular defense mechanism against membership inference attacks: differentially private (DP) stochastic gradient descent [abadi2016deep]. The algorithm can be summarized in two steps. First, the per-sample gradient computed at each training iteration is clipped so that its norm does not exceed a pre-defined threshold. Second, Gaussian noise is added to the clipped gradients to protect privacy. The noisy gradient is then used for gradient descent. In this scheme, however, privacy protection comes at the cost of computational complexity and utility, i.e., slower training and lower generation quality.
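The two steps can be sketched for a single parameter vector. This follows the usual DP-SGD recipe of [abadi2016deep]: clip each per-sample gradient to norm at most `max_norm`, sum, add Gaussian noise with standard deviation `noise_multiplier * max_norm`, and average. Privacy accounting is omitted, and the function names are hypothetical.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each per-sample gradient, sum, add Gaussian
    noise with std noise_multiplier * max_norm, average, then descend.
    Privacy accounting is omitted."""
    summed = [0.0] * len(params)
    for g in per_sample_grads:
        for j, v in enumerate(clip(g, max_norm)):
            summed[j] += v
    std = noise_multiplier * max_norm
    batch = len(per_sample_grads)
    noisy_mean = [(s + rng.gauss(0.0, std)) / batch for s in summed]
    return [p - lr * n for p, n in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient (norm 5) gets clipped to norm 1
print(dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0, noise_multiplier=0.0, rng=rng))
```

The per-sample loop is where the extra computation comes from: standard SGD only needs the batch-averaged gradient, whereas DP-SGD must materialize and clip every sample's gradient before aggregation.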
We conduct attacks against PGGAN trained on CelebA with DP. We skip the other cases because DP always deteriorates their generation quality to an unacceptable level. The DP hyper-parameters are selected through grid search. We fix the norm threshold to 1.0 and the noise scale to .
Figures 13 and 14 depict the attack performance in different settings. We observe a consistent decrease in AUCROC in all the settings, so DP is effective in general against our attack. However, applying DP to training leads to a much higher computation cost in practice because of the per-sample gradient modification. DP also deteriorates GAN utility, as witnessed by an increasing FID (comparing the last and second columns in Table 3). Furthermore, to obtain an acceptable level of utility, the noise scale has to be limited to a small value, which in turn cannot fully defend against membership inference. For example, in all the settings our attack still performs better than the random-guess baseline (AUCROC = 0.5).
We have established the first taxonomy of membership inference attacks against GANs, which we hope will serve as a benchmark for future research in this direction. We have also proposed the first generic attack model based on reconstruction, which is applicable to all the settings according to the amount of the attacker's knowledge about the victim model. In particular, the instantiated attack variants in the partial black-box and white-box settings are another novelty of our work, bridging the assumption and performance gaps in previous work [hayes2019logan, hilprecht2019monte]. Comprehensive experiments show consistent effectiveness and a complete spectrum of performance in a variety of setups spanning three datasets, five victim GAN models, two directions of ablation study, attack calibration, and the differential privacy defense, which conclusively reminds users to be careful about releasing any part of their models.
A Experiment setup
a.1 Hyper-parameter setting
We set , , for our partial black-box and white-box attacks on CelebA, and set , , for the other cases. The maximum number of optimization iterations is set to 1000 for our white-box attack and 10 for our partial black-box attack.
a.2 Model architectures
We use the official implementations of the victim GAN models (https://github.com/tkarras/progressive_growing_of_gans, https://github.com/mp2893/medgan). We re-implement the WGANGP model with a fully-connected structure for the non-image datasets. The network architecture is summarized in Table 5. The depth of both the generator and the discriminator is set to 5, and the dimension of the hidden layers is fixed to 512. We use ReLU as the activation function for the generator and Leaky ReLU for the discriminator, except for the output layer, where either the sigmoid or the identity function is used.
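Table 5: Network architecture of the re-implemented fully-connected WGANGP for the non-image datasets.
|Generator (MIMIC-III)|Generator (Instagram)|Discriminator (MIMIC-III and Instagram)|
|FC (512)|FC (512)|FC (512)|
|FC (512)|FC (512)|FC (512)|
|FC (512)|FC (512)|FC (512)|
|FC (512)|FC (512)|FC (512)|
|FC (data dim.)|FC (data dim.)|FC (1)|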
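The fully-connected architecture can be summarized as a layer plan. This is a hypothetical helper that only computes layer shapes and activation names (it builds no network), assuming 5 linear layers with 4 hidden layers of width 512 as in Table 5. The latent and data dimensions, as well as which network uses the sigmoid versus the identity output, are illustrative assumptions.

```python
def fc_plan(in_dim, out_dim, depth=5, hidden=512, hidden_act="relu", out_act="identity"):
    """Layer plan for a fully-connected network with `depth` linear layers.

    Returns (in_features, out_features, activation) tuples: depth - 1 hidden
    layers of width `hidden`, followed by one output layer.
    """
    if depth < 2:
        raise ValueError("depth must be at least 2")
    dims = [in_dim] + [hidden] * (depth - 1) + [out_dim]
    acts = [hidden_act] * (depth - 1) + [out_act]
    return [(dims[i], dims[i + 1], acts[i]) for i in range(depth)]


# Hypothetical sizes: a 128-D latent code and 1000-D data records.
LATENT_DIM, DATA_DIM = 128, 1000
# Assumed output activations: sigmoid for the generator, identity for the critic.
generator = fc_plan(LATENT_DIM, DATA_DIM, hidden_act="relu", out_act="sigmoid")
discriminator = fc_plan(DATA_DIM, 1, hidden_act="leaky_relu", out_act="identity")
for layer in generator:
    print(layer)
```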
a.3 Implementation of baseline attacks
We provide more details of implementing baseline attacks that are discussed in Section 5.7.
For LOGAN on CelebA, we employ DCGAN as the attack model, as in the original paper [hayes2019logan]. For MIMIC-III and Instagram, we use WGANGP as the attack model.
For MC on CelebA, we apply the same procedure as their best attack on RGB image datasets. First, we fit principal component analysis (PCA) on a data subset disjoint from the query data. Then, we keep the first 120 PCA components, as suggested in the original paper [hilprecht2019monte], and use them to reduce the dimensionality of the generated and query data. Finally, we calculate the Euclidean distance between the projected data and use the median heuristic to choose the threshold for the MC attack.
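The final scoring step of the MC baseline can be sketched as follows. It assumes the PCA projection has already been applied and that the median heuristic sets the neighbourhood radius to the median of all query-to-generated distances; this is our simplified reading, not the reference implementation of [hilprecht2019monte], and the function name and data are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mc_scores(queries, generated):
    """Monte Carlo membership scores with a median-heuristic radius.

    Assumes inputs are already PCA-projected. The radius eps is the median of
    all query-to-generated distances; each query's score is the fraction of
    generated samples within eps (higher -> more likely a member).
    """
    dists = [[euclidean(q, g) for g in generated] for q in queries]
    eps = statistics.median([d for row in dists for d in row])
    return [sum(1 for d in row if d <= eps) / len(generated) for row in dists]


# Hypothetical projected data: the first query lies in a dense generated cluster.
generated = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
queries = [[0.0, 0.0], [10.0, 10.0]]
print(mc_scores(queries, generated))  # [0.75, 0.25]
```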
B Quantitative results
b.1 Evaluation on full black-box attack
Attack performance w.r.t. training set size.
Attack performance w.r.t. training set selection.
b.2 Evaluation on partial black-box attack
Attack performance w.r.t. training set selection.
b.3 Evaluation on white-box attack
Ablation on distance metric design for classification.
Attack performance w.r.t. training set size.
Attack performance w.r.t. training set selection.
b.4 Attack calibration
b.5 Comparison to baseline attacks
|full bb (LOGAN)|0.56|0.57|0.52|0.50|
|full bb (MC)|0.52|0.52|0.51|0.50|
|full bb (ours calibrated)|0.54|0.54|0.52|0.51|
|partial bb (ours calibrated)|0.58|0.63|0.56|0.59|
|wb (ours calibrated)|0.68|0.66|0.55|0.76|
[Figure panels: full black-box, partial black-box, white-box]
Attack performance (AUCROC) against PGGAN on CelebA with and without DP.
|white-box w/o DP|1.00|1.00|1.00|0.99|0.95|0.83|0.62|
|white-box w/ DP|1.00|1.00|0.99|0.98|0.90|0.70|0.56|
|full black-box w/o DP|1.00|1.00|1.00|0.99|0.95|0.79|0.57|
|full black-box w/ DP|1.00|1.00|0.99|0.98|0.89|0.68|0.53|
C Qualitative results
Given query samples , we show their reconstruction copies and obtained in our white-box attack.