Adversarial Recommendation: Attack of the Learned Fake Users
Can machine learning models for recommendation be easily fooled? While the question has been answered for hand-engineered fake user profiles, it has not been explored for machine-learned adversarial attacks. This paper attempts to close this gap.
We propose a framework for generating fake user profiles which, when incorporated in the training of a recommendation system, can achieve an adversarial intent while remaining indistinguishable from real user profiles. We formulate this procedure as a repeated general-sum game between two players: an oblivious recommendation system $R$ and an adversarial fake user generator $A$ with two goals: (G1) the rating distribution of the fake users needs to be close to that of the real users, and (G2) some objective encoding the attack intent, such as degrading the top-$K$ recommendation quality of $R$ for a subset of users, needs to be optimized. We propose a learning framework to achieve both goals, and offer extensive experiments considering multiple types of attacks that highlight the vulnerability of recommendation systems.
Fake social media accounts are created to promote news articles about a political ideology; false online product reviews attempt to bias users’ opinions for or against certain products. These are just a few of the many real-life examples illustrating that recommendation systems are exposed and can be susceptible to threats from adversarial parties.
Machine learning algorithms have an ever-growing impact on people’s everyday lives. Recommendation systems heavily rely on such algorithms to help users make their decisions—from which show to watch, to which news articles to read (which could end up influencing their beliefs). Thus, a natural question is: How easy is it to manipulate a machine learned system for malicious purposes? An answer to such a question would be a stepping stone towards safer artificial intelligence [30, 16].
To study this question, the first necessary step is the creation of adversarial examples; this would allow one to test the algorithms against them and potentially increase the algorithms’ robustness. With this motivation, adversarial examples have become a thriving subfield of machine learning: the goal is to find the minimal perturbation vector to add to the feature vector of an example so that an oblivious classifier misclassifies the perturbed example. These works focus on classification [35, 19, 27, 26, 34].
In recommendation systems, adversarial attacks take a different form. Instead of minimally perturbing an existing example to misclassify it, the attack consists of creating a few adversarial user profiles that rate items with some intent. The intent could be to promote a specific item or to deteriorate the recommendation quality of a group of users. The setting is not new; in fact, it has been studied since [29, 23]. However, the injected fake user profiles in these works are hand-coded: typically, the fake users rate the target item with a small or large score, and the rest of the items with random or normally distributed scores to mimic the true rating distribution.
Our goal is to revisit the question of crafting adversarial fake user profiles for a recommendation system from an optimization perspective. We pose this as finding a fake users $\times$ items matrix so that (G1) the distance between the rating distributions of real and fake users is small, and (G2) the adversary’s intent is accomplished. The scenario is highly realistic: e.g., an adversary creates a small number of realistic-looking fake user accounts with the goal of removing a target group of a competitor company’s products from target users’ top-$K$ lists. We assume that the adversary knows the recommender’s model and the algorithm used to fit it, and can fit similar models on any new or fake data.
Particularly, we make the following contributions:
We formulate adversarial recommendation as a game of an adversary vs. an oblivious recommender, e.g., a low-rank model. The adversary faces a two-step process: (1) given the real and some fake ratings, the low-rank model is learned based on the recommender’s objective, and (2) the learned low-rank model is used to evaluate the adversarial objective. This two-step process makes the adversary’s task more involved.
We propose a learning framework for adversarial attacks on recommendation systems, using (i) generative adversarial nets (GANs) to learn initial fake users that mimic the true rating distribution, and (ii) suitable updates of these users that optimize an objective encoding the adversarial goal. For (ii), we use zeroth-order optimization to construct the gradient, as the adversary does not have direct access to it. Our framework is the first to find machine-learned attacks on recommendation systems, and it allows complex intents to be optimized.
Our real-world experiments show that machine-learned adversarial attacks with a wide range of intents are very much possible. As a striking example of a malicious attack, we show that to ruin the predicted scores of a specific item for users who would have loved or hated that item, it suffices to minimize the predicted score of the user with the highest predicted score for that item before the attack.
The rest of the paper is organized as follows. In Section 2 we formalize the problem of adversarial recommendation and in Section 3 we propose our learning procedure from the perspective of an adversary of the recommender. We empirically evaluate the proposed methods in Section 4, review related work in Section 5, and give a summary in Section 6.
2 Problem Formulation
Our considered model for attacking a recommendation system involves two players: an oblivious recommendation system $R$ and an adversarial ‘fake user’ generator $A$. The goal of the recommendation system is to build a model with parameters $\theta_R$ that minimizes a suitable loss function between true and model-predicted ratings over all users and items. The goal of the adversary is to generate fake users using a model with parameters $\theta_A$ such that:
the fake users are indistinguishable from the real users based on reasonable metrics, e.g., the rating distributions of the fake users are similar to those of the real users, the eigenspectrum of the fake user ratings is similar to that of the real user ratings, etc.; and
a recommendation model learned using the fake users generated by $A$ leads to worse predicted ratings for a suitable subset of the real users and/or items, e.g., it makes an item less desirable to a subset of users.
Let $\mathcal{I}$ be the set of items and $\mathcal{U}$ the set of real users present in the recommendation system. Let $n = |\mathcal{I}|$ be the number of items and $m = |\mathcal{U}|$ the number of real users, where $|\cdot|$ denotes the cardinality of a set. Let $X \in \mathbb{R}^{m \times n}$ denote the matrix of ratings from real users. We assume that the adversary has a budget of $m'$ fake user profiles, where $m' \ll m$, and that each user profile is an $n$-dimensional vector, i.e., how the user has rated the different items in $\mathcal{I}$, with zero values denoting empty ratings. Particularly, $A$ outputs a matrix $Z \in \mathbb{R}^{m' \times n}$, where each row $z_k$ for $k = 1, \dots, m'$ is a fake user profile. The total number of real and fake users is $m + m'$.
The setting can be formulated as a repeated, general-sum game between two players: the row player, the recommender $R$, and the column player, the adversary $A$. The recommender maps tuples $(i, j)$ to a real-valued score $\hat{x}_{ij}$ representing the predicted rating of user $i$ on item $j$, and is parameterized by $\theta_R$. The actions of $R$ include all $\theta_R$; e.g., for low-rank recommender models, each $\theta_R$ corresponds to a pair of latent factor matrices $(U, V)$. The actions of the adversary include all fake user profiles $Z$, which are generated using a model parameterized by $\theta_A$.
Both players consider a loss function (the negative of a payoff function) that they wish to minimize. If the row player chooses action $\theta_R = (U, V)$ (latent factor matrices) and the column player chooses action $Z$ (fake user matrix), then the loss of the row player is $f_R(\theta_R, Z)$ and the loss of the column player is $f_A(\theta_R, Z)$, where the arguments of $f_R$ and $f_A$ are the actions played in this round.
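In the general setting, each player maintains a distribution over its action space and plays by drawing an action from that distribution. The row player maintains a distribution over the space of $\theta_R = (U, V)$, i.e., $\theta_R \sim p(\theta_R; \phi_R)$, and the column player maintains a distribution over $Z$, i.e., $Z \sim p(Z; \phi_A)$. The distributions of the recommender and the adversary are parameterized by $\phi_R$ and $\phi_A$, respectively. In a repeated game setting, let $(\phi_R^t, \phi_A^t)$ be the current parameterizations of the two players. In the next step, the goal of each player is to find parameters $\phi_R^{t+1}$ and $\phi_A^{t+1}$, respectively, that minimize their corresponding expected loss:
$$\phi_R^{t+1} = \arg\min_{\phi_R} \mathbb{E}_{\theta_R \sim p(\cdot\,; \phi_R),\; Z \sim p(\cdot\,; \phi_A^t)}\big[f_R(\theta_R, Z)\big], \qquad \phi_A^{t+1} = \arg\min_{\phi_A} \mathbb{E}_{\theta_R \sim p(\cdot\,; \phi_R^t),\; Z \sim p(\cdot\,; \phi_A)}\big[f_A(\theta_R, Z)\big].$$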
Note that $f_R \neq -f_A$, so the game is not zero-sum.
We assume that the adversary knows how the oblivious recommender fits the model. In particular, $A$ knows $R$’s loss function $f_R$ and the parametric representation $\hat{x}_{ij}$, e.g., a low-rank model with latent factors $(U, V)$. Thus, the adversary can evaluate how the predicted ratings will change for any given fake user ratings matrix $Z$ augmented to the true ratings matrix $X$. However, one main challenge for $A$ is that there is a two-step process going on: (step 1) given some fake ratings, learning, say, the low-rank model based on the recommendation system objective, typically via non-convex optimization, and (step 2) using the low-rank model to evaluate the adversarial objective. As a result, the adversary typically cannot compute the gradient of this effect w.r.t. $Z$. In the sequel, we approach the problem of constructing $Z$ from the adversary’s perspective.
We detail the specifics of the recommender and adversary considered, and discuss our proposed learning approach.
Recommender Strategy. We assume throughout that the recommender is oblivious to the existence of an adversary; hence, it optimizes its loss over all given data. Before the attack, this is only the original set of training user-item-rating tuples $\mathcal{T}$; after the attack, it is $\mathcal{T}$ together with the set $\mathcal{T}_Z$ of non-zero ratings of the fake user profiles, succinctly represented as a sparse matrix $Z$ produced by the adversary, resulting in an augmented training set $\mathcal{T} \cup \mathcal{T}_Z$. In particular, $R$, using parameters $\theta_R$ and a goodness-of-fit loss function $\ell$, maps input tuples $(i, j)$ to estimated scores $\hat{x}_{ij}$ so that the loss is minimized:
$$\theta_R^* = \arg\min_{\theta_R} \sum_{(i, j, x_{ij}) \in \mathcal{T} \cup \mathcal{T}_Z} \ell\big(x_{ij}, \hat{x}_{ij}(\theta_R)\big).$$
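We assume that the recommender is a low-rank model; however, our overall approach is not specific to such a model. The low-rank recommender has latent factors $U \in \mathbb{R}^{(m + m') \times d}$ capturing the latent preferences of users and $V \in \mathbb{R}^{n \times d}$ capturing the latent attributes of the items, where $d$ is the latent dimension, and it optimizes its expected loss over its parameters $\theta_R = (U, V)$:
$$\min_{U, V} \sum_{(i, j):\, [X; Z]_{ij} \neq 0} \big([X; Z]_{ij} - u_i^\top v_j\big)^2 + \lambda \big(\|U\|_F^2 + \|V\|_F^2\big),$$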
where $[X; Z]$ denotes the concatenation of the two matrices along the row axis. We optimize the loss by alternating minimization (alt-min for short), i.e., by alternating the closed-form update equations for $U$ and $V$ for a few iterations. The model has a probabilistic interpretation: the prior parameter distributions are $u_i \sim \mathcal{N}(0, \lambda^{-1} I)$ and $v_j \sim \mathcal{N}(0, \lambda^{-1} I)$, and the conditional model is $x_{ij} \mid u_i, v_j \sim \mathcal{N}(u_i^\top v_j, 1)$, where $I$ is the identity matrix, $u_i$ is the $i$-th row of $U$, and $v_j$ is the $j$-th row of $V$. Thus, from a Bayesian perspective, $R$ can maintain a posterior distribution over its parameters $\theta_R = (U, V)$. For computational simplicity, $R$ is assumed to pick the mode of the posterior distribution, i.e., the MAP estimate.
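The alt-min procedure above can be sketched as follows. This is a minimal NumPy sketch, assuming squared loss on the observed (non-zero) entries of $[X; Z]$ and the ridge penalty $\lambda(\|U\|_F^2 + \|V\|_F^2)$; the function names `alt_min` and `masked_objective` and the random initialization scale are illustrative choices, not the authors’ implementation. Passing in `U` and `V` warm-starts the iterates, which is how previously fitted factors can be reused.

```python
import numpy as np


def masked_objective(X, mask, U, V, lam):
    """Squared error on observed entries plus ridge penalty on U and V."""
    residual = np.where(mask, X - U @ V.T, 0.0)
    return float(np.sum(residual ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2)))


def alt_min(X, mask, d=40, lam=1e-3, iters=10, U=None, V=None, seed=0):
    """Alternating minimization with closed-form (ridge) row updates.

    X    : (num_users, num_items) ratings, e.g. the stacked matrix [X; Z]
    mask : boolean array, True where a rating is observed (non-zero)
    U, V : optional warm-start factors; copied, never modified in place
    """
    rng = np.random.default_rng(seed)
    num_users, num_items = X.shape
    U = 0.1 * rng.standard_normal((num_users, d)) if U is None else U.copy()
    V = 0.1 * rng.standard_normal((num_items, d)) if V is None else V.copy()
    ridge = lam * np.eye(U.shape[1])
    for _ in range(iters):
        # With V fixed, each user row solves an independent ridge regression.
        for i in range(num_users):
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, mask[i]])
        # With U fixed, each item row solves an independent ridge regression.
        for j in range(num_items):
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[mask[:, j], j])
    return U, V


# Usage: refit the recommender on real ratings stacked with fake users.
rng = np.random.default_rng(1)
X_real = rng.integers(0, 6, size=(30, 20)).astype(float)  # 0 = unrated
Z_fake = rng.integers(0, 6, size=(4, 20)).astype(float)
XZ = np.vstack([X_real, Z_fake])                           # [X; Z]
U, V = alt_min(XZ, XZ > 0, d=5, lam=0.1, iters=10)
```

Because every row update is an exact minimizer of the objective with the other factor fixed, the objective never increases across iterations.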
The adversary is aware of the model and the algorithm, and is able to compute the point estimates $(\hat{U}(Z), \hat{V}(Z))$ for any chosen $Z$. Note that since an iterative algorithm (alt-min) is needed to obtain $(\hat{U}(Z), \hat{V}(Z))$, the adversary does not have a direct way to do gradient descent w.r.t. $Z$ on functions of $(\hat{U}(Z), \hat{V}(Z))$; we will return to this point later.
Adversary Strategy. The adversary $A$, with parameters $\theta_A$, learns and outputs a (distribution over) fake user matrix $Z$, which should satisfy the two goals presented earlier: (G1) the unnoticeability goal and (G2) satisfying an adversarial intent. The intent is captured by the loss function $f_A$; our experiments explore various intents.
The intent can be defined over a set of target items $\mathcal{I}^*$, a set of target users $\mathcal{U}^*$, a single target user $i^*$, or a single target item $j^*$. Some examples of intents are:
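Target the predicted score for $(i^*, j^*)$: $f_A = \hat{x}_{i^* j^*} = u_{i^*}^\top v_{j^*}$.
Target the mean predicted score for $j^*$ over the target users $\mathcal{U}^*$: $f_A = \frac{1}{|\mathcal{U}^*|} \sum_{i \in \mathcal{U}^*} u_i^\top v_{j^*}$.
Target a recommendation metric@top-$K$ for $\mathcal{U}^*$: $f_A = \frac{1}{|\mathcal{U}^*|} \sum_{i \in \mathcal{U}^*} \mathrm{metric@}K(i)$,
e.g., Hit Rate (HR), $\mathrm{HR@}K(i) = \mathbb{1}\big[\, j_i \in \text{top-}K(i) \,\big]$, where $j_i$ is a held-out item of user $i$.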
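The three example intents can be written as plain functions of the recommender’s factors. A minimal NumPy sketch; the function names, the `train_mask` argument (True for items a user rated in training, which are excluded from the top-$K$ list), and the one-held-out-item-per-user convention for HR@$K$ are assumptions made for illustration.

```python
import numpy as np


def score_intent(U, V, i_star, j_star):
    """Predicted score of target user i* on target item j*."""
    return float(U[i_star] @ V[j_star])


def mean_score_intent(U, V, target_users, j_star):
    """Mean predicted score of item j* over the target users."""
    return float(np.mean(U[target_users] @ V[j_star]))


def hit_rate_at_k(U, V, train_mask, heldout_item, target_users, k=10):
    """Fraction of target users whose held-out item is in their top-K list.

    Items rated in training are pushed to the end of each user's ranking.
    """
    scores = U @ V.T
    hits = 0
    for i in target_users:
        ranked = np.where(train_mask[i], -np.inf, scores[i])
        top_k = np.argsort(-ranked)[:k]
        hits += int(heldout_item[i] in top_k)
    return hits / len(target_users)
```

Since the adversary minimizes $f_A$, passing any of these as $f_A$ pushes the corresponding score or hit rate down.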
Approach for (G1). We generate fake users using generative adversarial nets (GANs). In GANs, a generator $G$ and a discriminator are pitted against each other: the generator produces samples with the goal of fooling the discriminator so that it cannot distinguish them from real ones. At convergence, the conditional distribution of the generator
should give fake user samples that the discriminator cannot distinguish from real ones. We discuss the details of the GAN architecture used to generate fake users for recommendation systems in Section 4.1.
Approach for (G2). To accomplish both (G1) and (G2), one could in principle change the GAN loss to a convex combination of two losses: the “perception” loss $f_{\mathrm{perc}}$ of fooling the discriminator (as in the GAN formulation) and the “adversarial” loss $f_A$ encoding the adversary’s intent, i.e., $\alpha f_{\mathrm{perc}} + (1 - \alpha) f_A$ for some $\alpha \in [0, 1]$.
Instead, we opt for a simpler two-step approach: first, train the GAN until convergence and sample a set of fake users $Z^0$ from the conditional posterior of $G$; second, suitably modify the sampled users using a variant of gradient descent to optimize the adversary’s intent $f_A$ over the fake users $Z$. In the process, we want to make sure that the resulting fake users, while helping with the adversary’s intent, do not come across as obviously fake.
Let us consider the problem of interest to the adversary:
$$\min_{Z} \; f_A(Z), \tag{2}$$
where, recall, $Z$ plays the role of the actions of the adversary, and the argument $\theta_R$, i.e., $(U, V)$, is dropped for brevity. To optimize (2), we use projected gradient descent for $t = 0, 1, 2, \dots$:
$$Z^{t+1} = \Pi\big(Z^t - \eta\, \nabla_Z f_A(Z^t)\big), \tag{3}$$
where the projection $\Pi$ ensures that the marginals of real and fake users remain close after the descent step, $\eta$ is the learning rate, and $\nabla_Z f_A(Z^t)$ is the gradient of the adversarial loss w.r.t. $Z$.
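Now the question is: how can we compute the gradient $\nabla_Z f_A(Z)$? To make things concrete, let us consider the following adversarial intent: minimize the predicted rating of item $j^*$ over all real users $\mathcal{U}_{j^*}$ who have not rated the item,
$$f_A = \frac{1}{|\mathcal{U}_{j^*}|} \sum_{i \in \mathcal{U}_{j^*}} u_i^\top v_{j^*}. \tag{4}$$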
At first glance, from (4) the loss is a function of the recommender’s parameters and not of $Z$. But recall from (1) that the parameters of the recommender are a function of $Z$; more generally, $\theta_R$ is a function of $(X, Z)$. After playing a fake matrix $Z$, the adversary gets to observe the loss only for this single $Z$, and not for the other actions (matrices $Z$) it could have played. Thus, the adversary gets limited information, i.e., bandit feedback. Put differently, the gradient of the loss w.r.t. $Z$ is not directly available for the optimization over $Z$.
To obtain an approximation of the gradient, we build upon zeroth-order optimization work in bandit optimization [2, 12]. The idea is that if we can only perform query evaluations, then to obtain the gradient of $f_A$ we need to query it at two nearby points, $Z$ and $Z + \delta D$, for a small $\delta > 0$ and a suitable fixed matrix $D$. We can then approximate the gradient via the directional derivative along $D$:
$$\nabla_Z f_A(Z) \approx \frac{f_A(Z + \delta D) - f_A(Z)}{\delta}\, D. \tag{5}$$
We use a refinement of Algorithm 3 of prior zeroth-order work: there, instead of a single two-point evaluation, multiple directions were needed and the gradient was computed using all of them. Instead, we use as directions the top left and right singular vectors of the fake user matrix $Z^t$ at round $t$, obtained from a singular value decomposition $Z^t = \sum_k \sigma_k a_k b_k^\top$. Let $D_k = a_k b_k^\top$ be the rank-one matrices built from the $k$-th left and right singular vectors of $Z^t$, for $k = 1, \dots, r$, where $r$ is the rank of the (truncated) SVD of $Z^t$. We can then use these rank-one matrices as directions and compute the matrix gradient based on them:
$$\nabla_Z f_A(Z^t) \approx \sum_{k=1}^{r} \frac{f_A(Z^t + \delta D_k) - f_A(Z^t)}{\delta}\, D_k.$$
This involves $r + 1$ evaluations of the function $f_A$. To make things faster, we use warm start: we first evaluate $f_A(Z^t)$; then, for each $k$, we use the final $(U, V)$ obtained for $Z^t$ to warm-start the alt-min iterates for $Z^t + \delta D_k$. Similar strategies have been studied in stochastic and evolutionary optimization [7, 20].
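The SVD-based gradient estimate can be sketched as below. This NumPy sketch assumes forward differences between $Z^t$ and $Z^t + \delta D_k$ (so it uses $r + 1$ evaluations of $f_A$) and treats `f` as an arbitrary black box; in the attack setting, `f` would refit the recommender (warm-started) on $[X; Z]$ and return $f_A$. The name `zeroth_order_grad` is hypothetical.

```python
import numpy as np


def zeroth_order_grad(f, Z, r=30, delta=1e-4):
    """Estimate grad f(Z) from function values only.

    Directions are the rank-one matrices D_k = a_k b_k^T built from the top-r
    left/right singular vectors of Z; each D_k has unit Frobenius norm.
    """
    A, s, Bt = np.linalg.svd(Z, full_matrices=False)
    r = min(r, len(s))
    f0 = f(Z)  # evaluated once and reused for every direction
    grad = np.zeros_like(Z, dtype=float)
    for k in range(r):
        D = np.outer(A[:, k], Bt[k])
        # Forward-difference estimate of the directional derivative along D.
        grad += (f(Z + delta * D) - f0) / delta * D
    return grad
```

Because the $D_k$ are orthonormal under the Frobenius inner product, the estimate recovers the projection of the true gradient onto their span. For a loss whose gradient lies in that span, such as $\frac{1}{2}\|Z\|_F^2$ (gradient $Z$), it recovers the gradient up to $O(\delta)$.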
Algorithm 1 summarizes our proposed learning approach.
We design our experiments to understand how effective the proposed approach is at creating an adversary model $A$ that produces fake users which cannot be distinguished from real ones and which influence the recommender $R$.
4.1 Can We Learn Realistic User Profiles?
The first question we investigate is whether generative adversarial nets can learn fake user profiles that look like real ones.
Network Architecture. We used the DCGAN architecture, thanks to its good empirical performance. The Discriminator takes an $H \times W$ image (a fake or real user sample) and outputs either a 0 or a 1 (is it fake or real?). It consists of four 2D convolutional (CONV) units with leaky ReLUs and batch normalization (BN), followed by a single-output fully connected (FC) unit with sigmoid activation. The Generator takes noise as input and outputs an $H \times W$ image (the fake user sample). It consists of an FC unit with ReLU and BN, whose output is reshaped into a small image, followed by four transposed CONV units, each with ReLU and BN except for the final one, which uses tanh. We set the stride of the (transposed) CONV units to 2. We set the batch size to 64 and run DCGAN for 100 epochs (each epoch is a full pass over all batches).
| Dataset | # of Items | 2D Shape | # of Users |
Since we want to learn user profiles for recommendation systems, we used two popular movie recommendation datasets, MovieLens 100K and MovieLens 1M, whose statistics are shown in Table 1. They contain users’ ratings of different movies on a scale from 1 to 5, with 5 denoting the strongest like, and 0 denoting that the user has not rated the movie. As the last layer of $G$ has a tanh activation function, fake user samples lie in $[-1, 1]$; hence, using a linear rescaling, we transformed the real ratings from $[0, 5]$ to $[-1, 1]$.
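The rescaling between the rating range and the generator’s tanh range can be written as a pair of affine maps. A minimal sketch, assuming the simple linear map $x \mapsto x/2.5 - 1$ (the exact formula is not spelled out above); the function names are hypothetical. Both functions work on Python floats and on NumPy arrays.

```python
def ratings_to_tanh(r):
    """Map ratings in [0, 5] (0 = unrated) to the generator's range [-1, 1]."""
    return r / 2.5 - 1.0


def tanh_to_ratings(x):
    """Inverse map from [-1, 1] back to the recommender's range [0, 5]."""
    return (x + 1.0) * 2.5
```

The inverse map is the step that turns DCGAN samples into the initial fake users $Z^0$ on the recommender’s scale.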
Setup. Each user profile is an $n$-dimensional sparse vector, which needs to be reshaped into a 2D array to pass through the DCGAN 2D (de-)convolutional units. We set $H$ as the smallest factor of $n$ and $W$ as $n / H$. This way, each user is viewed as an $H \times W$ image whose pixel values are the user’s ratings on the different items.
Results. Periodically during training, we sample 64 fake users from the conditional posterior of the generator $G$ and visualize them in an 8-by-8 grid of $H \times W$ images. In Figure 1 we illustrate the progress of the samples during training for MovieLens 1M, with the first column visualizing the first 64 real users. In the first epochs the sampled users are noise, but as training goes on, the real distribution seems to be learned; similar results hold for MovieLens 100K.
We also want to validate quantitatively that the fake user distribution is close to the real one once DCGAN’s training has converged. For these experiments, we sample 700 fake users from $G$ so that the numbers of real and fake users are comparable, at least for MovieLens 100K.
We compute the item-item correlation matrix $C_Z$ of the fake data (based on $Z$ sampled from the conditional posterior of the learned $G$ at the last training epoch) and the correlation matrix $C_X$ of the real data $X$, and we compare their eigenspectra, specifically their top-10 eigenvalues.
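The eigenspectrum comparison can be sketched as follows. This is a minimal NumPy sketch, assuming Pearson correlation between item columns with unrated entries kept as zeros (how missing ratings enter the correlation is not specified above); the function name is hypothetical.

```python
import numpy as np


def top_item_corr_eigenvalues(R, k=10):
    """Top-k eigenvalues (descending) of the item-item correlation matrix.

    R : (num_users, num_items) rating matrix; items are the columns.
    Items with constant ratings (e.g. never rated) have undefined
    correlation; their rows/columns are set to zero here.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.corrcoef(R, rowvar=False)
    C = np.nan_to_num(C)
    eigenvalues = np.linalg.eigvalsh(C)  # ascending order for symmetric C
    return eigenvalues[::-1][:k]
```

Calling it once on the real matrix $X$ and once on the sampled $Z$ gives the two top-10 spectra to compare. Correlations are invariant to affine rescaling of the ratings, so the result does not depend on whether $Z$ is on the $[-1, 1]$ or $[0, 5]$ scale.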
We compute distance metrics between the real and fake user distributions. For each item $j$, the real users form a distribution over the rating values $\{x_{ij}\}_{i \in \mathcal{U}}$ (i.e., 943 samples for MovieLens 100K), and the fake users form a distribution over $\{z_{kj}\}_{k=1}^{m'}$ (700 samples). We then discretize the values into six bins (corresponding to the ratings $0, 1, \dots, 5$): for each of the six bins, we compute the fraction of (real or fake) users whose rating falls in this bin out of all (real or fake) users. For a given item $j$, we thus compute two six-dimensional vectors, $p_j$ for the real users and $q_j$ for the fake users, and then use the following metrics to measure the distance between the distributions:
$$\mathrm{TVD}(p_j, q_j) = \frac{1}{2} \sum_{b=0}^{5} \big| p_{j,b} - q_{j,b} \big|, \tag{6}$$
$$\mathrm{JS}(p_j, q_j) = \frac{1}{2} \mathrm{KL}(p_j \,\|\, M_j) + \frac{1}{2} \mathrm{KL}(q_j \,\|\, M_j), \tag{7}$$
where $M_j = \frac{1}{2}(p_j + q_j)$ and $\mathrm{KL}$ is the Kullback-Leibler divergence. After computing the distance metrics for each item, we report their averages over all items, i.e., mean TVD and mean JS Div.
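The per-item distances (6), (7) and their averages can be sketched as follows. A minimal NumPy sketch, assuming both matrices are already on the $[0, 5]$ rating scale and that each value is rounded to the nearest integer bin; natural logarithms are used, so the JS divergence is at most $\ln 2$. Function names are hypothetical.

```python
import numpy as np


def rating_histogram(values, num_bins=6):
    """Fraction of users in each rating bin 0, 1, ..., 5 (values rounded to nearest integer)."""
    bins = np.clip(np.rint(values), 0, num_bins - 1).astype(int)
    counts = np.bincount(bins, minlength=num_bins)
    return counts / counts.sum()


def tvd(p, q):
    """Total variation distance between two discrete distributions."""
    return float(0.5 * np.abs(p - q).sum())


def kl(p, q):
    """KL(p || q); terms with p = 0 contribute zero."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


def js(p, q):
    """Jensen-Shannon divergence with mixture M = (p + q) / 2."""
    mix = 0.5 * (p + q)
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def mean_distances(X, Z):
    """Mean TVD and mean JS divergence over items (columns) between real X and fake Z."""
    tv_vals, js_vals = [], []
    for j in range(X.shape[1]):
        p, q = rating_histogram(X[:, j]), rating_histogram(Z[:, j])
        tv_vals.append(tvd(p, q))
        js_vals.append(js(p, q))
    return float(np.mean(tv_vals)), float(np.mean(js_vals))
```

In `kl(p, mix)` the mixture is positive wherever `p` is, so no division by zero can occur.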
In Figure 2 we plot (a) the top-10 eigenvalues and (b) the JS divergence for the MovieLens 1M dataset. For (b), the reported results are averaged over five different runs of DCGAN. These results, along with similar results we observed for MovieLens 100K, indicate our first key finding:
Generative adversarial nets can produce fake user samples whose distribution is close to the real user distribution.
4.2 Experimental Design
To evaluate the adversary $A$’s capability of attacking a recommender $R$, we use the two-phase approach described in Section 3: (1) we first train DCGAN on the respective dataset, as presented in Section 4.1, and (2) we then perform the Z-SGD updates, initialized by a sample of DCGAN-generated fake users, denoted by $Z^0$, transformed from $[-1, 1]$ to the range $[0, 5]$ expected by the recommender.
Throughout all experiments, the sample size of $Z^0$ is set to 64; the 64 fake users, iteratively optimized during the Z-SGD updates, represent only a 0.063 fraction of all system users (real and fake) for MovieLens 100K, and a 0.01 fraction for MovieLens 1M.
We use two types of experimental setups:
(E1) targets unrated user-item entries (i.e., entries which are candidates for recommendation), which are not included in the training of $R$.
(E2) targets a small subset of the recommender’s true (user, item, rating) tuples, which is held out from the training of $R$.
For (E2), we define a target set over which the adversarial loss is optimized, and a test set which is unknown to both the recommender and the adversary; the test set is used to check whether the success of the adversarial intent generalizes from the target set to the test set. We form the target and test sets in two ways: (E2-a) using the leave-one-out setup, i.e., placing one tuple per user in the target set (and another tuple in the test set), or (E2-b) using an 80-10-10 split, i.e., splitting each user’s ratings into 80% for training, 10% for the target set, and 10% for the test set.
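Both ways of forming the target and test sets operate per user. A minimal sketch in plain Python, assuming ratings are given as `(user, item, rating)` tuples and that per-user counts are rounded to the nearest integer; the function names and the fixed seed are illustrative.

```python
import random
from collections import defaultdict


def _group_by_user(tuples):
    by_user = defaultdict(list)
    for t in tuples:
        by_user[t[0]].append(t)
    return by_user


def split_80_10_10(tuples, seed=0):
    """(E2-b): per user, ~80% train, ~10% target, the rest test."""
    rng = random.Random(seed)
    train, target, test = [], [], []
    for rows in _group_by_user(tuples).values():
        rows = rows[:]
        rng.shuffle(rows)
        n_train = round(0.8 * len(rows))
        n_target = round(0.1 * len(rows))
        train += rows[:n_train]
        target += rows[n_train:n_train + n_target]
        test += rows[n_train + n_target:]
    return train, target, test


def leave_one_out(tuples, seed=0):
    """(E2-a): per user, one tuple for target, one for test, the rest for training.

    Assumes each user has at least 3 ratings, so some remain for training.
    """
    rng = random.Random(seed)
    train, target, test = [], [], []
    for rows in _group_by_user(tuples).values():
        rows = rows[:]
        rng.shuffle(rows)
        target.append(rows[0])
        test.append(rows[1])
        train += rows[2:]
    return train, target, test
```

Splitting per user, rather than over all ratings at once, guarantees that every user contributes to the training, target, and test sets.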
The recommender $R$ under attack is a matrix factorization (i.e., low-rank) model, trained on explicit ratings (1 to 5 stars). Unless otherwise specified, we set the latent factor dimension $d$ to 40 and the regularization parameter $\lambda$ to 0.001, and $R$ is trained before the attack for 10 alt-min iterations.
For the adversary $A$, we set the SVD approximation rank $r$ to 30 and the approximate-gradient step constant $\delta$ to 0.0001. During a single Z-SGD iteration, 5 alt-min iterations of $R$ are performed for each evaluation of $f_A$.
We perform warm start; i.e., for the $t$-th Z-SGD iteration, $R$’s parameters are initialized from those obtained at the end of the alt-min iterations of the previous Z-SGD step.
To be consistent with the original movie ratings from the datasets, every time $Z$ is evaluated (e.g., during either the approximate gradient computation or the loss computation), its values are rounded to the closest integers and clipped to $[0, 5]$. Also, to ensure that the fake user distribution does not diverge from the real distribution while performing the Z-SGD updates, we perform projected gradient descent:
$$Z^{t+1} = \Pi_{[0, 5]}\big(Z^t - \eta\, \hat{\nabla}_Z f_A(Z^t)\big), \tag{8}$$
$$\Pi_{[0, 5]}(Z)_{ij} = \min\big(\max(Z_{ij}, 0), 5\big), \tag{9}$$
where (9) corresponds to a box projection.
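The rounding and the box projection (9) can be sketched as follows. A minimal NumPy sketch, assuming the box is $[0, 5]$ (the rating range); in the attack, `grad` would be the zeroth-order estimate of $\nabla_Z f_A$. Function names are hypothetical.

```python
import numpy as np


def box_project(Z, low=0.0, high=5.0):
    """Euclidean projection onto the box [low, high], i.e. elementwise clipping."""
    return np.clip(Z, low, high)


def as_ratings(Z):
    """What the recommender sees: values rounded to the nearest integer, then clipped to [0, 5].

    Note: np.rint rounds halves to the nearest even integer (2.5 -> 2).
    """
    return np.clip(np.rint(Z), 0, 5)


def projected_step(Z, grad, eta):
    """One projected gradient step Z <- Pi(Z - eta * grad)."""
    return box_project(Z - eta * grad)
```

Keeping the continuous iterate $Z^t$ and rounding only when the recommender is queried, as this sketch does, is one way to combine discrete ratings with small gradient steps.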
To evaluate the success of the adversary, we use various metrics; every time we introduce a new metric, we will use bold letters. Overall, we use the metric of Attack Difference (i.e., the magnitude of the attack), denoted by $\Delta$:
$$\Delta = f_A^0 - f_A(Z), \tag{10}$$
where $f_A^0$ denotes the value of the adversarial loss before $Z$ is concatenated with the real users’ data (i.e., before the attack). In other words, this metric expresses the decrease of the adversarial loss. For the attack to be considered successful, $\Delta$ needs to be positive, ideally by a certain margin. Depending on the intent of the adversary, as encoded by $f_A$, this metric could imply, for example, a decrease in the predicted score of a target item (in which case it is related to the “prediction shift” metric [23, 28]) or a change in the recommendation quality of a target group of users.
Each of the following sections introduces a separate attack type, as specified by target user(s), item(s) and intent of .
4.3 Targeting a User-Item Pair
We start with the following adversarial intent: can $A$ learn realistic users that reduce the predicted score of an unrated user-item entry? For this, we adopt the (E1) experimental setup.
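Let the target user be denoted by $i^*$ and the target item by $j^*$, where $x_{i^* j^*} = 0$ (the entry is unrated). The adversarial loss is the predicted score for $(i^*, j^*)$, $f_A = \hat{x}_{i^* j^*}$, and the magnitude of the attack is $\Delta = \hat{x}^0_{i^* j^*} - \hat{x}_{i^* j^*}$, where $\hat{x}^0_{i^* j^*}$ denotes the predicted score for the pair by $R$ before the attack.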
We set to 100, to 50, to 5. The adversary performs at most a fixed number of Z-SGD iterations, stopping earlier if a certain stopping criterion is satisfied. We explore two stopping criteria and two ways of specifying the target entry.
Stopping criterion: $\Delta \geq 1$
First, we stopped the Z-SGD iterations once the predicted score for $(i^*, j^*)$ had decreased by at least 1 after the attack. We performed this experiment for 70 items sampled uniformly at random from the MovieLens 100K dataset. For each target item $j^*$, we sampled the target user $i^*$ from the set of users who have not rated $j^*$. Considering an attack successful only if $\Delta > 0$, we found that the attack failed for only 2 of the 70 sampled target items.
Targeting Top Item of User, Stopping criterion: Remove from Top-$K$
Next, we used the early-stopping criterion that the target item no longer appears in the top-$K$ of the recommendation list, as this aligns better with the actual user experience in a recommendation setting. For this, we randomly sampled target users, and for each user $i^*$ we took as target item $j^*$ the unrated item predicted to be at the top of the user’s list before the attack; simply put, the top item of user $i^*$. Beyond the attack difference metric (10) and the distance metrics (6), (7), we also report whether $j^*$ remains in the top-$K$ list of $i^*$, for several values of $K$.
In the first experiment, we set the stopping criterion to removing $j^*$ from the top-1 list: we found that out of the 135 sampled target users, the top-1 item remained at the top for only one user.
In the second experiment, we set the stopping criterion to removing $j^*$ from the top-10 list. We found that out of the 55 sampled users, the attack failed for only two; for the rest, notably, the adversary managed to remove the target item from the target user’s top-10 list while looking realistic. As an example, Figure 3 illustrates the metrics for the movie “A Little Princess”, which appeared in the top-1 of user ID-0 before the attack. The attack is successful: it removes the item from the top-1 at iteration 17 and from the top-10 at iteration 19 (thus stopping the Z-SGD updates), while optimizing well the adversarial loss, i.e., the estimated score for the entry $(i^*, j^*)$ (yellow line), and keeping the real-fake distribution distance metrics, mean TVD and mean JS Div (green lines), close to 0.
This experiment illustrates our second key finding:
The adversary can successfully target the top-1 predicted item of a user, and remove it from the top-10.
4.4 Targeting Item’s Mean Predicted Score
Here, we examine whether the adversary can accomplish a more ambitious goal: can $A$ target (push down) the mean predicted score of a target item $j^*$ over all real users $\mathcal{U}_{j^*}$ who have not rated $j^*$ in the training dataset? Again, we adopt the (E1) experimental setup. We choose this target user set because these are the users for whom $j^*$ can be a candidate item for recommendation. This intent can be formulated as:
$$f_A = \frac{1}{|\mathcal{U}_{j^*}|} \sum_{i \in \mathcal{U}_{j^*}} u_i^\top v_{j^*},$$
and the attack is successful if $f_A$ becomes smaller than the mean predicted score before the attack, $f_A^0$.
We keep the same setting as before, except for the choice of a few hyperparameters, as we found that for this experiment larger values tend to lead to a larger $\Delta$. We again use early stopping. We performed this experiment for 29 randomly chosen target items from MovieLens 100K and found that early stopping was triggered for only 6 of the 29 items. Also, the attack was unsuccessful for 6 of the 29 items, i.e., $\Delta \leq 0$: the average score after the attack remained the same or increased. Overall, we conclude that:
Targeting the average predicted score of an item is a hard task.
To understand why this happens, we examine how the distribution of the per-user attack differences $\Delta_i$ over all users $i$ who have not rated the target item $j^*$ evolves over the Z-SGD iterations. Figure 4 shows for the sampled movie “Mille bolle blu (1993)” (similar behavior is observed for the others) that although the average difference reached 0.2 (magenta line, left panel), every user’s attack difference follows its own trend (right panel), with mainly the users with the largest or smallest $\Delta_i$ affecting the average $\Delta$. This shows that the fake users cannot move all users’ scores on $j^*$ simultaneously in the same direction.
4.5 Targeting the Top User of an Item
In reality, to attack a target item, the adversary does not need to solve the more difficult problem of pushing down the scores of all users who have not rated it. Instead, it only needs to push down the scores of users who would be good candidates for getting this item in their recommendations. Put differently, these are the users with the higher predicted scores before the attack; the rest of the users would not get $j^*$ in their recommendations either way.
In this experiment, the adversary’s intent is to target the top user of an item, i.e., the user $i^*$ from $\mathcal{U}_{j^*}$ with the largest predicted score for $j^*$ before the attack. This can again be seen as a single-entry attack, which we found in Section 4.3 to be successful; but while $j^*$ can be any arbitrary target item, $i^*$ is the top user of the item.
In Figures 5 and 6 we show the results of targeting the top user of the item “The Joy Luck Club” (similar results hold for other target movies). Just by targeting the predicted score of userID 1417, the user predicted to like “The Joy Luck Club” the most, the scores of all other users who have not rated $j^*$ are affected too; similarly, the scores of userID 1417 for all the other movies this user has not rated are affected.
Figure 5 shows how the mean attack difference (y-axis) varies as we vary $K$ (x-axis), when considering only the top-$K$/bottom-$K$ users for the item (left panel) or only the top-$K$/bottom-$K$ items for the user (right panel).
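The curves in Figure 5 can be computed from the predicted scores before and after the attack. A minimal NumPy sketch, assuming each entry’s $\Delta$ is its pre-attack score minus its post-attack score and that entries are ranked by their pre-attack score; the same function applies to the users of the target item (left panel) and to the items of the target user (right panel). The function name is hypothetical.

```python
import numpy as np


def mean_delta_top_bottom(scores_before, scores_after, k):
    """Mean attack difference over the top-k and bottom-k entries by pre-attack score."""
    scores_before = np.asarray(scores_before, dtype=float)
    delta = scores_before - np.asarray(scores_after, dtype=float)
    order = np.argsort(-scores_before)  # highest pre-attack score first
    return float(delta[order[:k]].mean()), float(delta[order[-k:]].mean())
```

Sweeping `k` from 1 to the number of entries traces one top-$K$ curve and one bottom-$K$ curve; at the full length both collapse to the overall mean $\Delta$.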
Figure 5, left panel, shows that although the average difference over all users who have not rated $j^*$ is only 0.48, if we consider only the top-5 users with the highest predicted scores for $j^*$ before the attack, the average difference is 8. In fact, for the top-80 users the average difference is larger than 2, whereas for the users who were predicted to dislike the item the most (bottom-5), the average difference is close to -6; that is, for the users predicted to like the item least, the predicted score increased, which is the opposite of what a good recommender should do.
The right panel tells a similar story for the items. The average difference on the items that $i^*$ was predicted to like the most before the attack is 60 (items that were a good fit for this user are pushed down), and the average difference on the items the user was predicted to like the least is -70 (items that were a bad fit for the user are pushed up).
This illustrates an important finding of this work:
A successful attack on item $j^*$ is: when $A$ targets the score of the top user, i.e., the user predicted by $R$ to like $j^*$ the most before the attack, then the top-$K$ users for $j^*$ and the top-$K$ items for that user are also attacked.
On a side note, we want to see whether there is a relationship between the attack differences of the various users for the target item and their correlations with the target user, or between the $\Delta$s of the target user for the various items and their correlations with the target item. Thus, Figure 6 shows how the different users’ (left panel) or items’ (right panel) $\Delta$s vary as a function of the correlation with the target user or target item, respectively. We compute the correlations (a) based on the estimated latent factors $(U, V)$ of $R$ before the attack, or (b) based on the true rating matrix $X$. We find that for items with a small correlation with the target item, the variance of the differences is large; as the correlation increases, the variance shrinks. We also find that for the users who are most correlated with the target user (in terms of factor-based correlation), the attack difference becomes larger as the correlation increases.
4.6 Targeting a Group of Items
Next, we focus on attacks that target an entire group of items, in contrast to the experiments presented so far, where a single item was the target of each attack. We examine two adversarial goals:
minimize the mean predicted score over all items in a group, and
maximize the prediction error, as measured by mean absolute error, over a group.
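Both group objectives are plain averages over the held-out target entries of a group. A minimal sketch, assuming predictions and true ratings are stored in dictionaries keyed by (user, item); the adversary would drive `group_mean_score` down for (A1) and `group_mae` up for (A2). All names are hypothetical.

```python
from statistics import mean


# Hypothetical helpers; a sketch of the two group objectives.
def group_mean_score(predicted, group):
    """(A1): mean predicted score over the target (user, item) entries in the group."""
    return mean(predicted[key] for key in group)


def group_mae(predicted, true, group):
    """(A2): mean absolute error over the target (user, item) entries in the group."""
    return mean(abs(predicted[key] - true[key]) for key in group)


pred = {(0, 1): 4.0, (1, 1): 3.0, (2, 5): 2.5}
true = {(0, 1): 5.0, (1, 1): 3.0, (2, 5): 1.5}
group = [(0, 1), (1, 1)]  # target entries belonging to one item group
print(group_mean_score(pred, group), group_mae(pred, true, group))
```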
Experimental Setup. We adopt the (E2) experimental setup (Section 4.2), which uses the extra information of a “target set”: besides querying the recommender for its predictions, the adversary can target held-out (user, item, score) tuples that have not been used as part of the recommender’s true training data. This is in contrast to the (E1) setup, where the adversary targeted one or a set of unrated user-item entries.
We used the (E2-b) setup, i.e., an 80-10-10 split of the ratings of each user.
To define the target item groups, we explored four options: (i) 10-percentile groups based on the recommender’s predicted scores before the attack, (ii) groups based on the side information of movie genre, where the same movie can belong to multiple groups, (iii) 10-percentile groups based on the recommender’s prediction error before the attack, and (iv) 10-percentile groups based on the number of training ratings per item. More precisely, for (i) and (iii), we first computed for each item the recommender’s average predicted score and mean absolute error, respectively, over the corresponding (user, item, score) tuples in the target set, and then divided the items into deciles based on these values.
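The decile grouping for options (i) and (iii) might be sketched as follows. This assumes equal-count deciles obtained by ranking items on their average value, which is one reasonable reading of dividing into deciles; for (i) the per-tuple value would be the recommender’s predicted score, and for (iii) the absolute error between predicted and true score. The helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


# Hypothetical helpers; a sketch of the grouping step.
def item_averages(tuples, value):
    """Average a per-tuple value for each item over (user, item, score) tuples."""
    per_item = defaultdict(list)
    for user, item, score in tuples:
        per_item[item].append(value(user, item, score))
    return {item: mean(vals) for item, vals in per_item.items()}


def decile_groups(item_values, n_groups=10):
    """Split items into n_groups equal-count buckets by ascending value."""
    ranked = sorted(item_values, key=item_values.get)
    n = len(ranked)
    groups = [[] for _ in range(n_groups)]
    for rank, item in enumerate(ranked):
        # Rank r goes to bucket floor(r * n_groups / n).
        groups[rank * n_groups // n].append(item)
    return groups
```

For example, with a hypothetical `predict(user, item)`, option (i) would use `item_averages(target_tuples, lambda u, i, s: predict(u, i))`, and option (iii) would use `lambda u, i, s: abs(predict(u, i) - s)`.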
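Setting. For the recommender, we set two of its parameters to 100 and 0.1, respectively, and train it for 100 alt-min iterations before the attack. For the adversary, we set three of its parameters to 1000, 5, and 50, respectively. We report the best results from the adversary’s side, i.e., for the Z-SGD iteration with the best value of the adversarial objective on the target set. We report the % target improved, i.e., the relative change of the adversarial objective from before to after the attack, where the objective is given by (10).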
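Z-SGD optimizes the adversary’s objective using function evaluations only. A generic zeroth-order SGD loop can be sketched with a two-point gradient estimator in the spirit of Duchi et al. [12]; the Gaussian direction, the step size, the smoothing radius `mu`, and the toy quadratic objective are illustrative assumptions, and the paper’s exact Z-SGD update (defined earlier in the paper) may differ. In the attack, `f` would stand for the adversarial objective evaluated through queries to the retrained recommender.

```python
import random


# Hypothetical helpers; a generic zeroth-order SGD sketch, not the paper's Z-SGD.
def two_point_gradient(f, x, mu, rng):
    """Zeroth-order gradient estimate of f at x from two function values.

    Draws a Gaussian direction u and returns (f(x + mu*u) - f(x)) / mu * u.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_shift) - f(x)) / mu
    return [scale * ui for ui in u]


def zsgd(f, x0, step, mu, iters, seed=0):
    """Minimize f using only function evaluations (no gradients)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = two_point_gradient(f, x, mu, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Toy black-box objective with its minimum at (3, 3).
def objective(x):
    return sum((xi - 3.0) ** 2 for xi in x)


print(zsgd(objective, [0.0, 0.0], step=0.05, mu=1e-4, iters=2000))
```

Only two calls to `f` are made per step, which is why this family of methods fits an adversary that can query scores but cannot differentiate through the recommender’s retraining.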
Figure 7 focuses on goal (A1): decreasing the recommender’s average predicted score over an item group, defined as the mean of the predicted scores over the held-out target user-item entries belonging to that group. Figure 7, left panel shows that the adversary tends to achieve a larger % target improved, i.e., a larger % decrease in the predicted score, for the movie groups with larger original predicted scores; it achieves up to 10.9% for the bucket with pre-attack predicted scores in [4.47, 6.53). This is interesting, as these are the entries that would be most likely to appear in users’ top recommendation lists had the attack not happened. Figure 7, right panel shows that when movies are grouped by genre, the adversary achieves up to a 6.1% decrease in predicted score for the adventure genre; the second-largest decrease is found for the unknown genre.
Figure 8 focuses on goal (A2): maximizing the target prediction error of a group. The left panel shows the results for grouping movies based on the recommender’s target prediction error before the attack: the adversary achieves up to a 59.3% increase in target prediction error for the well-modeled buckets, i.e., those with error in [0.02, 0.49) before the attack. The right panel shows that the adversary achieves up to a 4.2% error increase for the item bucket with [79, 134) training ratings; however, the groups with fewer than 4 training ratings were not successfully targeted.
4.7 Targeting Improved Modeling for a Group of Users or Items
In the last set of experiments, the adversary’s intent is to improve the modeling of groups of users or items in the target set; we refer to these as targeted improvements. The difference from Section 4.6 is that here the adversary wants to improve how the recommender models groups of users or items, whereas under goal (A2) it wanted to deteriorate how the recommender modeled groups of items.
We examine three goals:
(I1) improve the average recommendation quality over a user group, as measured by Hit Rate@10 (for each user, hit = 1 if the user’s held-out entry is included in their top-10 recommendation list and hit = 0 otherwise, averaged over the users in the group),
(I2) improve the modeling of a group of items, as measured by the mean absolute prediction error, and
(I3) ensure that two user groups are equally well modeled, i.e., reduce the gap between their modeling errors.
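Hit Rate@10 for goal (I1) is computed per user and then averaged over the group. A minimal sketch, assuming a dense score matrix and that items already rated in training are excluded from each user’s ranking (a common convention that the text does not spell out); the argument names are hypothetical.

```python
# Hypothetical helper; a sketch of Hit Rate@k over a user group.
def hit_rate_at_k(scores, train_items, held_out, users, k=10):
    """Average Hit Rate@k over a group of users.

    scores[u][i]: predicted score of item i for user u.
    train_items[u]: set of items user u rated in training (excluded; an assumption).
    held_out[u]: the user's single held-out target item (leave-one-out).
    """
    hits = 0
    for u in users:
        candidates = [i for i in range(len(scores[u])) if i not in train_items[u]]
        top_k = sorted(candidates, key=lambda i: scores[u][i], reverse=True)[:k]
        hits += held_out[u] in top_k  # hit = 1 if the held-out item is in the top k
    return hits / len(users)


scores = [[0.9, 0.1, 0.5, 0.3], [0.2, 0.8, 0.4, 0.6]]
print(hit_rate_at_k(scores, [{0}, {1}], [2, 0], users=[0, 1], k=1))
```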
Goal (I2) is essentially the negation of goal (A2). For all three goals (I1), (I2), and (I3), the metrics are computed on the target set.
The setup is again the (E2) experimental setup (Section 4.2), and the parameter setting is the same as described in Section 4.6. However, for the sub-experiment focusing on goal (I1), the target set is formed using the (E2-a) leave-one-out setup so as to compute Hit Rates, whereas for (I2) and (I3) the (E2-b) 80-10-10 split is used.
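A per-user 80-10-10 split like the one used for (I2) and (I3) could look like this. The sketch assumes the three parts serve as training data, target set, and test set, and uses `int()` truncation, so for users with few ratings the part sizes deviate from 80-10-10 (e.g., the target part can be empty); both choices are assumptions, and the function name is hypothetical.

```python
import random


# Hypothetical helper; a sketch of a per-user 80-10-10 split.
def split_80_10_10(items, rng):
    """Split one user's rated items into 80% / 10% / 10% parts.

    Assumed roles: training data, target set, held-out test set.
    """
    items = list(items)
    rng.shuffle(items)
    n_train = int(0.8 * len(items))
    n_target = int(0.1 * len(items))
    train = items[:n_train]
    target = items[n_train:n_train + n_target]
    test = items[n_train + n_target:]  # remainder after truncation
    return train, target, test


rng = random.Random(0)
print(split_80_10_10(range(10), rng))
```

Applying the function independently to every user keeps each user represented in all three parts whenever they have at least ten ratings.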
We explore different ways of defining the target user or item groups. For the experiment realizing goal (I1), we group users into 10-percentile groups based on either the number of training ratings per user or the side information of age. For the experiment realizing goal (I2), we group movies into 10-percentile groups based on the number of training ratings per item. Finally, for the (I3) experiment, we create user groups either from side information, splitting users into two groups by gender (male and female), or by dividing them into four 25-percentile groups based on the number of training ratings per user.
For (I1) and (I2), we again report the % target improved metric, i.e., the percentage improvement in the average Hit Rate over the users in the group, or the percentage decrease in the mean absolute error over the target set, respectively. For (I3), we measure the gap, i.e., the absolute difference between the two groups’ prediction errors in the target set we optimize over.
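The two reporting metrics reduce to a relative change and an absolute difference. A minimal sketch, assuming % target improved is measured relative to the pre-attack value (the exact formula is not restated in this section); the usage line plugs in the pre-attack (females, males) errors from the (I3) experiment. Names are hypothetical.

```python
# Hypothetical helpers; a sketch of the reporting metrics.
def pct_target_improved(before, after, higher_is_better):
    """Percentage change of a target metric in the adversary's desired direction.

    Use higher_is_better=True for Hit Rate (I1) or error increase (A2), and
    False for score decrease (A1) or error decrease (I2).
    """
    change = after - before if higher_is_better else before - after
    return 100.0 * change / abs(before)


def group_gap(error_a, error_b):
    """(I3): absolute difference between two groups' prediction errors."""
    return abs(error_a - error_b)


print(pct_target_improved(before=4.0, after=3.0, higher_is_better=False))  # → 25.0
print(group_gap(1.299, 1.339))
```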
Figure 9 focuses on goal (I1), i.e., improving the Hit Rates of certain user groups. From Figure 9, left panel, we find that the user group with the largest number of ratings is not improved; in fact, it is hurt by the fake users. The groups that benefit the most from the adversary tend to be those in the middle of the rating distribution. Figure 9, right panel shows that when users are grouped by age into deciles [7, 20), [20, 23), up to [51, 73), a targeted improvement is possible, with the largest improvement observed for the youngest age bucket. However, these targeted improvements do not transfer to an unseen test set from the same age groups (yellow bars). We argue that this happens because the before-the-attack trends of HRs over the age groups differ between the target and test sets (Figure 11, right).
Figure 10, left panel focuses on goal (I2) and shows the % improvement in the modeling of item groups defined by the number of training ratings per item. We find that the groups with the smallest target prediction errors before the “attack” (annotated on top of the bars in the plot) are the ones that are better targeted, i.e., those with the largest % target improved. From Figure 11, left, we can see that the original metrics of the target and test sets follow similar trends across groups, which might be one reason why the attack generalizes here (further analysis is needed).
Last, we focus on goal (I3), which can be viewed as ensuring fair treatment of two user groups. Grouping users by gender (males and females) and measuring the target mean absolute prediction error, we find the following: before the “attack”, the errors for (females, males) were (1.299, 1.339); during the “attack”, the adversary attempts to minimize the error of the group with the larger original error; and after the “attack”, the rounded prediction errors became equal, (1.281, 1.281), reducing the absolute gap between the two groups from 0.04 to 0.0001 (before rounding). Similar trends hold when defining user groups based on the number of ratings or on the age side information. For example, Figure 10, right panel shows that for the 25-percentile user groups 0, 1, 2, and 3, defined by the number of training ratings per user, the gap between groups 0 and 3 becomes 0. The plot also shows how targeting the 0-3 gap affects the prediction errors of the other groups as the Z-SGD updates progress. This observation holds for all group-based attacks: targeting a group also improves or deteriorates the modeling of the other groups. This interconnection among the groups seems to play a role in understanding when and why attacks generalize to a test set, which we leave for future work.
Together, the results of this and the previous section serve as a proof of concept that:
The fake users of the adversary can affect (improve or deteriorate) how the recommender models user or item groups in a target set.
However, more rigorous analysis is needed to understand which user or item groups can enjoy larger targeted improvements.
Overall, our experiments indicate that an adversary can achieve a variety of intents on target groups of users or items defined in a variety of ways, while the fake user distribution remains close to the real user distribution (although from Section 4.5 onwards we omitted the distance metrics, in all experiments the mean TVD and mean JS divergence remain close to 0). It is important to emphasize again that this happens under the assumptions that the adversary has the ability to:
query the recommender for predicted scores,
know the underlying true distribution of user-item ratings (so as to be able to create realistic-looking user profiles), and
under the (E2) setup, target true (rather than unrated, as in the (E1) setup) user-item entries, which might not be realizable in real-world settings,
while the recommender is oblivious to the adversary and is periodically (at every Z-SGD update) warm-start retrained on both real and fake ratings for a few alt-min iterations. Nevertheless, it is notable that the adversary can affect the recommender’s predictions. Perhaps the most interesting result is that just by targeting the top user predicted for an item, the adversary also successfully targets all top users predicted by the recommender for this item and all top items predicted for this user.
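The closeness check between real and fake users can be sketched per item. A minimal sketch, assuming per-item rating histograms over the values 1 to 5, and averaging the total variation distance (TVD) and Jensen-Shannon (JS) divergence (natural log) over items rated by both populations; the exact aggregation used in the paper may differ, and the names are hypothetical.

```python
from math import log

RATING_VALUES = (1, 2, 3, 4, 5)  # assumed integer rating scale


# Hypothetical helpers; a sketch of the distribution-distance check.
def rating_distribution(ratings):
    """Empirical distribution of a non-empty list of ratings over 1..5."""
    counts = [sum(1 for r in ratings if r == v) for v in RATING_VALUES]
    total = sum(counts)
    return [c / total for c in counts]


def tvd(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute 0; otherwise y_i >= x_i / 2 > 0.
        return sum(a * log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_item_distances(real, fake):
    """Mean TVD and mean JS divergence over items rated by real and fake users.

    real, fake: dicts mapping item -> list of ratings. Assumes at least one
    item is rated by both populations.
    """
    items = [i for i in real if real[i] and fake.get(i)]
    pairs = [(rating_distribution(real[i]), rating_distribution(fake[i])) for i in items]
    mean_tvd = sum(tvd(p, q) for p, q in pairs) / len(pairs)
    mean_js = sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
    return mean_tvd, mean_js
```

Both quantities are 0 for identical distributions; TVD is at most 1 and JS divergence at most ln 2, so values near 0 indicate fake profiles that are hard to tell apart from real ones at the rating-distribution level.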
5 Related Work
Most works on attacking a recommender system by injecting fake users (i.e., “shilling attacks”) have focused on engineering user profiles with high or low scores on the target item(s) and average or normally distributed scores for (a subset of) the other items. These approaches vary in terms of: the recommender model under attack (e.g., user- or item-based collaborative filtering, model- or memory-based), the adversary’s knowledge (e.g., of the true rating distribution, or of the recommender’s architecture or parameters), the attack intent (e.g., promoting or demoting an item or group), and the adversary’s success metric [3, 8, 25, 29, 28]. Our work is the first where the adversarial fake user generator learns to generate fake profiles in an end-to-end fashion, capturing different adversarial intents. Our approach is demonstrated for a low-rank recommender, assuming that the adversary knows the true rating distribution and can evaluate its objective through the recommender’s predictions, but not its gradient.
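For contrast with the learned approach, a hand-engineered shilling profile of the kind described above (an extreme score on the target item plus normally distributed filler scores, in the style of the average attack discussed in [23, 25]) can be sketched as follows. The per-item (mean, std) statistics, the filler size, the rating range 1 to 5, and the function name are assumptions for illustration.

```python
import random


# Hypothetical helper; a sketch of a hand-engineered "average attack" profile.
def average_attack_profile(item_stats, target_item, n_filler, push=True,
                           r_min=1, r_max=5, rng=None):
    """Fake profile: extreme rating on the target item, normal filler ratings.

    item_stats: dict item -> (mean, std) of its real ratings (assumed known).
    The target item gets r_max to promote it, or r_min to demote it; each
    filler item gets a rating drawn from a normal with that item's mean and
    std, rounded and clipped to [r_min, r_max].
    """
    rng = rng or random.Random()
    candidates = [i for i in item_stats if i != target_item]
    profile = {target_item: r_max if push else r_min}
    for item in rng.sample(candidates, n_filler):
        mu, sigma = item_stats[item]
        rating = round(rng.gauss(mu, sigma))
        profile[item] = min(r_max, max(r_min, rating))
    return profile


# Hypothetical per-item statistics for a 20-item catalogue.
stats = {i: (3.5, 1.0) for i in range(20)}
print(average_attack_profile(stats, target_item=7, n_filler=5, rng=random.Random(1)))
```

Such profiles match per-item means by construction but follow a fixed template, whereas the learned generator is optimized directly for a given adversarial intent.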
By learning fake user profiles, we attempt to bridge the gap between shilling attacks and works on adversarial examples [34, 27, 26, 17, 19]. These have largely focused on classification; only recently was a new adversarial attack type introduced for graph data. Our approach, although related to works on adversarial examples, has important differences: it (i) injects fake user profiles during training, instead of perturbing the features of examples on a deployed trained model, and (ii) considers the recommendation problem, which can lead to attacks of considerably larger size than, e.g., one-pixel attacks or small-norm perturbation attacks, especially since recommendation models rely on the assumption that similar users tend to like items similarly.
Prior work formulated adversarial classification in a game-theoretic framework, deriving an optimal classifier given the adversary’s optimal strategy. Our work focuses on recommendation and presents the strategy from the adversary’s view, considering an oblivious recommender.
Our work is the first to apply generative adversarial nets (GANs) to produce realistic-looking fake users. Previous works have used GANs in recommenders, either for better matrix reconstruction or to find visually similar item recommendations, but never to learn the rating distribution.
When injecting fake user profiles, the original rating data is augmented; hence, works on data augmentation are also related. Moreover, our experiments in Section 4.7 make works on adversarial training to tailor representations [14, 13, 5], and on better subset modeling [6, 9, 10], relevant.
6 Conclusions and Future Directions
In this paper, we presented the first work on machine-learned adversarial attacks on recommendation systems. We introduced the framework of adversarial recommendation, posed as a game between a low-rank recommender oblivious to the adversary’s existence and an adversary aiming to generate fake user profiles that are realistic-looking and optimize some adversarial intent. Our experiments showed that adversarial attacks for a variety of intents are possible while remaining unnoticeable. A notable example is that, to ruin the predicted scores of a specific item for users who would have loved or hated that item, it suffices to minimize the predicted score of the top predicted user for that item before the attack.
This study provides several interesting directions for future research on adversarial recommendation. First, our approach needs to be tested on other recommendation datasets and against other recommendation models, while also varying the knowledge of the adversary. Second, our work has only scratched the surface of how data augmentation with adversarially learned users can improve the modeling of certain groups; further research is needed to develop a more thorough understanding of when it could help and how it would compare with alternative techniques [6, 5]. Third, other optimization objectives could be encoded as the adversarial intent so as to craft the recommendations or representations for certain goals. Also, generating new realistic-looking user profiles could be used to improve testbeds for recommendation algorithms. Finally, one of the most important directions is to create adversary-aware recommenders, and to evaluate the degree to which they, as well as existing robust recommenders, can resist machine-learned attacks.
Acknowledgements: The research was supported by NSF grants IIS-1563950, IIS-1447566, IIS-1447574, IIS-1422557, CCF-1451986, CNS-1314560, IIS-0953274, IIS-1029711, NASA grant NNX-12AQ39A, and gifts from Adobe, IBM, and Yahoo.
-  Gediminas Adomavicius and Jingjing Zhang. Stability of recommendation algorithms. ACM Transactions on Information Systems (TOIS), 30(4):23, 2012.
-  Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, pages 28–40. Citeseer, 2010.
-  Charu C Aggarwal. Attack-resistant recommender systems. In Recommender Systems, pages 385–410. Springer, 2016.
-  Antreas Antoniou, Amos Storkey, and Harrison Edwards. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340, 2017.
-  Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075, 2017.
-  Alex Beutel, Ed H Chi, Zhiyuan Cheng, Hubert Pham, and John Anderson. Beyond globally optimal: Focused learning for improved recommendations. In Proceedings of the 26th International Conference on World Wide Web, pages 203–212. International World Wide Web Conferences Steering Committee, 2017.
-  Shalabh Bhatnagar, HL Prasad, and LA Prashanth. Stochastic recursive algorithms for optimization: simultaneous perturbation methods, volume 434. Springer, 2012.
-  Robin Burke, Michael P O’Mahony, and Neil J Hurley. Robust collaborative recommendation. In Recommender Systems Handbook, pages 961–995. Springer, 2015.
-  Evangelia Christakopoulou and George Karypis. Local item-item models for top-n recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 67–74. ACM, 2016.
-  Evangelia Christakopoulou and George Karypis. Local latent space models for top-n recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1235–1243. ACM, 2018.
-  Nilesh Dalvi, Pedro Domingos, Sumit Sanghai, Deepak Verma, et al. Adversarial classification. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 99–108. ACM, 2004.
-  John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, 2015.
-  Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.
-  Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Francois Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
-  Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning, volume 1. MIT press Cambridge, 2016.
-  Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Communications of the ACM, 61(7):56–66, 2018.
-  Ian Goodfellow, Nicolas Papernot, Patrick McDaniel, R Feinman, F Faghri, A Matyasko, K Hambardzumyan, YL Juang, A Kurakin, R Sheatsley, et al. cleverhans v0.1: an adversarial machine learning library. arXiv preprint, 2016.
-  Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
-  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
-  Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary computation, 9(2):159–195, 2001.
-  F Maxwell Harper and Joseph A Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19, 2016.
-  Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian McAuley. Visually-aware fashion recommendation and design with generative image models. In Data Mining (ICDM), 2017 IEEE International Conference on, pages 207–216. IEEE, 2017.
-  Shyong K Lam and John Riedl. Shilling recommender systems for fun and profit. In WWW, pages 393–402. ACM, 2004.
-  Andriy Mnih and Ruslan R Salakhutdinov. Probabilistic matrix factorization. In Advances in neural information processing systems, pages 1257–1264, 2008.
-  Bamshad Mobasher, Robin Burke, Runa Bhaumik, and Chad Williams. Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Transactions on Internet Technology (TOIT), 7(4):23, 2007.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401, 2016.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, pages 2574–2582, 2016.
-  Michael O’Mahony, Neil Hurley, Nicholas Kushmerick, and Guénolé Silvestre. Collaborative recommendation: A robustness analysis. ACM Transactions on Internet Technology (TOIT), 4(4):344–377, 2004.
-  Michael P O’Mahony, Neil J Hurley, and Guenole CM Silvestre. Promoting recommendations: An attack on collaborative filtering. In International Conference on Database and Expert Systems Applications, pages 494–503. Springer, 2002.
-  Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
-  Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
-  Al Mamunur Rashid, George Karypis, and John Riedl. Influence in ratings-based recommender systems: An algorithm-independent approach. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 556–560. SIAM, 2005.
-  Paul Resnick and Rahul Sami. The information cost of manipulation-resistance in recommender systems. In Proceedings of the 2008 ACM conference on Recommender systems, pages 147–154. ACM, 2008.
-  Jiawei Su, Danilo Vasconcellos Vargas, and Sakurai Kouichi. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864, 2017.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
-  Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. Irgan: A minimax game for unifying generative and discriminative information retrieval models. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 515–524. ACM, 2017.
-  Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In KDD, pages 2847–2856. ACM, 2018.