Security Evaluation of Support Vector Machines in Adversarial Environments
Abstract
Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository.
1 Introduction
Machine-learning and pattern-recognition techniques are increasingly being adopted in security applications like spam filtering, network intrusion detection, and malware detection due to their ability to generalize, and to potentially detect novel attacks or variants of known ones. Support Vector Machines (SVMs) are among the most successful techniques that have been applied for this purpose drucker99 (); perdisciICDM06 ().
However, learning algorithms like SVMs assume stationarity: that is, both the data used to train the classifier and the operational data it classifies are sampled from the same (though possibly unknown) distribution. Meanwhile, in adversarial settings such as the above-mentioned ones, intelligent and adaptive adversaries may purposely manipulate data (violating stationarity) to exploit existing vulnerabilities of learning algorithms, and to impair the entire system. This raises several open issues, related to whether machine-learning techniques can be safely adopted in security-sensitive tasks, or if they must (and can) be redesigned for this purpose. In particular, the main open issues to be addressed include:

analyzing the vulnerabilities of learning algorithms;

evaluating their security by implementing the corresponding attacks; and

eventually, designing suitable countermeasures.
These issues are currently addressed in the emerging research area of adversarial machine learning, at the intersection between computer security and machine learning. This field is receiving growing interest from the research community, as witnessed by an increasing number of recent events: the NIPS Workshop on “Machine Learning in Adversarial Environments for Computer Security” (2007) nips07adv (); the subsequent Special Issue of the Machine Learning journal titled “Machine Learning in Adversarial Environments” (2010) laskov10ed (); the 2010 UCLA IPAM workshop on “Statistical and Learning-Theoretic Challenges in Data Privacy”; the ECML-PKDD Workshop on “Privacy and Security issues in Data Mining and Machine Learning” (2010) psdml10 (); five consecutive CCS Workshops on “Artificial Intelligence and Security” (2008-2012) AISEC1 (); AISEC2 (); AISEC3 (); AISEC4 (); AISEC5 (), and the Dagstuhl Perspectives Workshop on “Machine Learning for Computer Security” (2012) dagstuhl12adv ().
In Section 2, we review the literature of adversarial machine learning, focusing mainly on the issue of security evaluation. We discuss both theoretical work and applications, including examples of how learning can be attacked in practical scenarios, either during its training phase (i.e., poisoning attacks that contaminate the learner’s training data to mislead it) or during its deployment phase (i.e., evasion attacks that circumvent the learned classifier).
In Section 3, we summarize our recently defined framework for the empirical evaluation of classifiers’ security biggio12tkde (). It is based on a general model of an adversary that builds on previous models and guidelines proposed in the literature of adversarial machine learning. We expound on the assumptions of the adversary’s goal, knowledge and capabilities that comprise this model, which also easily accommodate application-specific constraints. Having detailed the assumptions of his adversary, a security analyst can formalize the adversary’s strategy as an optimization problem.
We then demonstrate our framework by applying it to assess the security of SVMs. We discuss our recently devised evasion attacks against SVMs biggio13ecml () in Section 4, and review and extend our recent work biggio12icml () on poisoning attacks against SVMs in Section 5. We show that the optimization problems corresponding to the above attack strategies can be solved through simple gradient-descent algorithms. The experimental results for these evasion and poisoning attacks show that the SVM is vulnerable to these threats for both linear and non-linear kernels in several realistic application domains including handwritten digit classification and malware detection for PDF files. We further explore the threat of privacy-breaching attacks aimed at the SVM’s training data in Section 6, where we apply our framework to precisely describe the setting and threat model.
Our analysis provides useful insights into the potential security threats from the usage of learning algorithms (and, particularly, of SVMs) in real-world applications, and sheds light on whether they can be safely adopted for security-sensitive tasks. The presented analysis allows a system designer to quantify the security risk entailed by an SVM-based detector so that he may weigh it against the benefits provided by the learning. It further suggests guidelines and countermeasures that may mitigate threats and thereby improve overall system security. These aspects are discussed for evasion and poisoning attacks in Sections 4 and 5. In Section 6 we focus on developing countermeasures for privacy attacks that are endowed with strong theoretical guarantees within the framework of differential privacy. We conclude with a summary and discussion in Section 7.
In order to support the reproducibility of our experiments, we published all the code and the data employed for the experimental evaluations described in this paper advlib (). In particular, our code is released under an open-source license and is carefully documented, with the aim of allowing other researchers to not only reproduce, but also customize, extend and improve our work.
2 Background
In this section, we review the main concepts used throughout this chapter. We first introduce our notation and summarize the SVM learning problem. We then motivate the need for the proper assessment of the security of a learning algorithm so that it can be applied to security-sensitive tasks.
Learning can be generally stated as a process by which data is used to form a hypothesis that performs better than an a priori hypothesis formed without the data. For our purposes, the hypotheses will be represented as functions of the form $f : \mathcal{X} \to \mathcal{Y}$, which assign an input sample point $\mathbf{x} \in \mathcal{X}$ to a class $y \in \mathcal{Y}$; that is, given an observation from the input space $\mathcal{X}$, a hypothesis $f$ makes a prediction in the output space $\mathcal{Y}$. For binary classification, the output space is binary and we use $\mathcal{Y} = \{-1, +1\}$. In the classical supervised learning setting, we are given a paired training dataset $\{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathcal{X}, y_i \in \mathcal{Y}\}_{i=1}^{n}$, we assume each pair is drawn independently from an unknown joint distribution $P(\mathbf{X}, Y)$, and we want to infer a classifier $f$ able to generalize well on $P(\mathbf{X}, Y)$; i.e., to accurately predict the label $y$ of an unseen sample $\mathbf{x}$ drawn from that distribution.
2.1 Support Vector Machines
In its simplest formulation, an SVM learns a linear classifier for a binary classification problem. Its decision function is thus $f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$, where $\operatorname{sign}(a) = +1$ ($-1$) if $a \geq 0$ ($a < 0$), and $\mathbf{w}$ and $b$ are learned parameters that specify the position of the decision hyperplane in feature space: the hyperplane’s normal $\mathbf{w}$ gives its orientation and $b$ is its displacement. The learning task is thus to find a hyperplane that separates the two classes well. While many hyperplanes may suffice for this task, the SVM hyperplane both separates the training samples of the two classes and provides a maximum distance from itself to the nearest training point (this distance is called the classifier’s margin), since maximum-margin learning generally reduces generalization error vapnik95book (). Although originally designed for linearly-separable classification tasks (hard-margin SVMs), SVMs were extended to non-linearly-separable classification problems by Vapnik vapnik95 () (soft-margin SVMs), which allow some samples to violate the margin. In particular, a soft-margin SVM is learned by solving the following convex quadratic program (QP):
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{i=1}^{n}\xi_i$$
$$\text{s.t.} \quad y_i(\mathbf{w}^\top\mathbf{x}_i + b) \geq 1 - \xi_i \;\text{ and }\; \xi_i \geq 0, \quad i = 1, \ldots, n,$$
where the margin is maximized by minimizing $\frac{1}{2}\mathbf{w}^\top\mathbf{w}$, and the variables $\xi_i$ (referred to as slack variables) represent the extent to which the samples, $\mathbf{x}_i$, violate the margin. The parameter $C$ tunes the trade-off between minimizing the sum of the slack violation errors and maximizing the margin.
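As a rough illustration of the soft-margin objective described above, the following sketch evaluates it for a fixed hyperplane. It assumes the standard notation (normal `w`, displacement `b`, trade-off parameter `C`, labels in {-1, +1}); the function name `primal_objective` and the toy data are hypothetical, and each slack is set to the smallest value that satisfies its constraint.

```python
def primal_objective(w, b, X, y, C):
    """Return the soft-margin primal value and the per-sample slacks."""
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        # Smallest slack with y_i * (w.x_i + b) >= 1 - slack and slack >= 0.
        slacks.append(max(0.0, 1.0 - y_i * score))
    margin_term = 0.5 * sum(w_j * w_j for w_j in w)
    return margin_term + C * sum(slacks), slacks


# Toy data (illustrative only): the second point falls inside the margin.
X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
y = [+1, +1, -1]
value, slacks = primal_objective(w=(1.0, 0.0), b=0.0, X=X, y=y, C=1.0)
```

Raising `C` in this sketch makes the margin violation of the second point more expensive relative to the margin term, which is the trade-off the parameter controls.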
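While the primal can be optimized directly, it is often solved via its (Lagrangian) dual problem written in terms of Lagrange multipliers, $\alpha_i$, which are constrained so that $\sum_{i=1}^{n}\alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$ for $i = 1, \ldots, n$. Solving the dual has a computational complexity that grows according to the size of the training data as opposed to the feature space’s dimensionality. Further, in the dual formulation, both the data and the slack variables become implicitly represented: the data is represented by a kernel matrix, $\mathbf{K}$, of all inner products between pairs of data points (that is, $K_{i,j} = \mathbf{x}_i^\top\mathbf{x}_j$), and each slack variable is associated with a Lagrangian multiplier via the KKT conditions that arise from duality. Using the method of Lagrangian multipliers, the dual problem is derived, in matrix form, as
$$\min_{\boldsymbol{\alpha}} \; \frac{1}{2}\boldsymbol{\alpha}^\top\mathbf{Q}\boldsymbol{\alpha} - \mathbf{1}_n^\top\boldsymbol{\alpha}$$
$$\text{s.t.} \quad \sum_{i=1}^{n}\alpha_i y_i = 0 \;\text{ and }\; 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n,$$
where $\mathbf{Q} = \mathbf{K} \circ \mathbf{y}\mathbf{y}^\top$ (the Hadamard product of $\mathbf{K}$ and $\mathbf{y}\mathbf{y}^\top$) and $\mathbf{1}_n$ is a vector of $n$ ones.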
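The dual described above can be made concrete with a small sketch that builds the linear kernel matrix, forms its Hadamard product with the outer product of the labels, and evaluates the dual objective in its minimization form. The helper names `dual_objective` and `is_feasible` and the two-point dataset are assumptions made for illustration; this only evaluates the objective for a given multiplier vector and does not solve the QP.

```python
def dual_objective(alpha, X, y):
    """Evaluate 0.5 * alpha^T Q alpha - 1^T alpha with Q = K o (y y^T)."""
    n = len(X)
    # Linear kernel matrix: K[i][j] is the inner product of x_i and x_j.
    K = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
         for i in range(n)]
    # Hadamard (element-wise) product of K with the label outer product.
    Q = [[K[i][j] * y[i] * y[j] for j in range(n)] for i in range(n)]
    quad = sum(alpha[i] * Q[i][j] * alpha[j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the equality constraint and the box constraints of the dual."""
    balanced = abs(sum(a * y_i for a, y_i in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed


# Two points on opposite sides of the origin (illustrative only).
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]
dual_value = dual_objective(alpha, X, y)
feasible = is_feasible(alpha, y, C=1.0)
```

For this toy set the dual value is the negative of the primal value attained by the separating hyperplane with unit normal along the first axis, as expected when there is no duality gap.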
Through the kernel matrix, SVMs can be extended to more complex feature spaces (where a linear classifier may perform better) via a kernel function, that is, an implicit inner product from the alternative feature space. If some function $\phi : \mathcal{X} \to \Phi$ maps training samples into a higher-dimensional feature space, then each kernel-matrix entry $K_{i,j}$ is computed via the space’s corresponding kernel function, $\kappa(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top\phi(\mathbf{x}_j)$. Thus, one need not explicitly know $\phi$, only its corresponding kernel function.
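To illustrate why the feature map need not be known explicitly, the sketch below compares a homogeneous degree-2 polynomial kernel on two-dimensional inputs with the inner product of its explicit three-dimensional feature map. The kernel choice and the names `poly2_kernel` and `phi` are assumptions picked for the example; they are not taken from the chapter's experiments.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel on 2-D inputs: (x.z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)                            # no feature map needed
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))    # via the feature map
```

Both routes give the same value up to floating-point rounding, but the kernel evaluation never leaves the two-dimensional input space.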
Further, the dual problem and its KKT conditions elicit interesting properties of the SVM. First, the optimal primal hyperplane’s normal vector, $\mathbf{w}$, is a linear combination of the training samples; i.e., $\mathbf{w} = \sum_{i=1}^{n}\alpha_i y_i \mathbf{x}_i$. (This is an instance of the Representer Theorem, which states that solutions to a large class of regularized empirical risk minimization (ERM) problems lie in the span of the training data representer ().) Second, the dual solution is sparse, and only samples that lie on or within the hyperplane’s margin have a non-zero $\alpha$-value. Thus, if $\alpha_i = 0$, the corresponding sample $\mathbf{x}_i$ is correctly classified, lies beyond the margin (i.e., $y_i(\mathbf{w}^\top\mathbf{x}_i + b) > 1$) and is called a non-support vector. If $\alpha_i = C$, the $i$-th sample violates the margin (i.e., $y_i(\mathbf{w}^\top\mathbf{x}_i + b) < 1$) and is an error vector. Finally, if $0 < \alpha_i < C$, the $i$-th sample lies exactly on the margin (i.e., $y_i(\mathbf{w}^\top\mathbf{x}_i + b) = 1$) and is a support vector. As a consequence, the optimal displacement $b$ can be determined by averaging $y_i - \mathbf{w}^\top\mathbf{x}_i$ over the support vectors.
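The properties above can be sketched in a few lines: given a dual solution for a linear SVM, the normal vector is rebuilt as a weighted sum of the training samples, each sample is labeled by the value of its multiplier, and the displacement is averaged over the samples lying exactly on the margin. The function `recover_hyperplane`, the tolerance, and the toy multipliers are illustrative assumptions; the sketch also assumes labels in {-1, +1} and at least one on-margin sample.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_hyperplane(alpha, X, y, C, tol=1e-9):
    """Recover (w, b) and each sample's role from a given dual solution."""
    dim = len(X[0])
    # w is a linear combination of the samples: sum_i alpha_i * y_i * x_i.
    w = [sum(a * y_i * x_i[j] for a, y_i, x_i in zip(alpha, y, X))
         for j in range(dim)]
    roles = []
    for a in alpha:
        if a <= tol:
            roles.append("non-support")   # beyond the margin
        elif a >= C - tol:
            roles.append("error")         # violates the margin
        else:
            roles.append("support")       # exactly on the margin
    on_margin = [i for i, r in enumerate(roles) if r == "support"]
    # On-margin samples satisfy y_i * (w.x_i + b) = 1, hence b = y_i - w.x_i.
    b = sum(y[i] - dot(w, X[i]) for i in on_margin) / len(on_margin)
    return w, b, roles


X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
y = [+1, -1, +1]
alpha = [0.5, 0.5, 0.0]  # illustrative dual solution for this toy set
w, b, roles = recover_hyperplane(alpha, X, y, C=1.0)
```

The third sample has a zero multiplier and therefore contributes nothing to the hyperplane, which is the sparsity property in action.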
2.2 Machine Learning for Computer Security: Motivation, Trends, and Arms Races
In this section, we motivate the recent adoption of machine-learning techniques in computer security and discuss the novel issues this trend raises. In the last decade, security systems increased in complexity to counter the growing sophistication and variability of attacks, a result of a long-lasting and continuing arms race in security-related applications such as malware detection, intrusion detection and spam filtering. The main characteristics of this struggle and the typical approaches pursued in security to face it are discussed in Section 2.3. We now discuss some examples that better explain this trend and motivate the use of modern machine-learning techniques for security applications.
In the early years, the attack surface (i.e., the vulnerable points of a system) of most systems was relatively small and most attacks were simple. In this era, signature-based detection systems (e.g., rule-based systems based on string-matching techniques) were considered sufficient to provide an acceptable level of security. However, as the complexity and exposure of sensitive systems increased in the Internet Age, more targets emerged and the incentive for attacking them became increasingly attractive, thus providing a means and motivation for developing sophisticated and diverse attacks. Since signature-based detection systems can only detect attacks matching an existing signature, attackers used minor variations of their attacks to evade detection (e.g., string-matching techniques can be evaded by slightly changing the attack code). To cope with the increasing variability of attack samples and to detect never-before-seen attacks, machine-learning approaches have been increasingly incorporated into these detection systems to complement traditional signature-based detection. These two approaches can be combined to achieve accurate and agile detection: signature-based detection offers fast and lightweight filtering of most known attacks, while machine-learning approaches can process the remaining (unfiltered) samples and identify new (or less well-known) attacks.
The quest of image spam. A recent example of the above arms race is image spam (see, e.g., biggioPRL11 ()). In 2006, to evade text-based spam filters, spammers began rendering their messages into images included as attachments, thus producing “image-based spam,” or image spam for short. Due to the massive volume of image spam sent in 2006 and 2007, researchers and spam-filter designers proposed several different countermeasures. Initially, suspect images were analyzed by OCR tools to extract text for standard spam detection, and then signatures were generated to block the (known) spam images. However, spammers immediately reacted by randomly obfuscating images with adversarial noise, both to make OCR-based detection ineffective, and to evade signature-based detection. The research community responded with (fast) approaches mainly based on machine-learning techniques using visual features extracted from images, which could accurately discriminate between spam images and legitimate ones (e.g., photographs, plots, etc.). Although image spam volumes have since declined, the exact cause for this decrease is debatable: these countermeasures may have played a role, but image spam was also more costly to the spammer, as it required more time to generate and more bandwidth to deliver, thus limiting the spammers’ ability to send a high volume of messages. Nevertheless, had this arms race continued, spammers could have attempted to evade the countermeasures by mimicking the feature values exhibited by legitimate images, which would have, in fact, forced spammers to increase the number of colors and elements in their spam images, thus further increasing the size of such files and the cost of sending them.
Misuse and anomaly detection in computer networks. Another example of the above arms race can be found in network intrusion detection, where misuse detection has been gradually augmented by anomaly detection. The former approach relies on detecting attacks on the basis of signatures extracted from (known) intrusive network traffic, while the latter is based upon a statistical model of the normal profile of the network traffic and detects anomalous traffic that deviates from the assumed model of normality. This model is often constructed using machine-learning techniques, such as one-class classifiers (e.g., one-class SVMs), or, more generally, using density estimators. The underlying assumption of anomaly-detection-based intrusion detection, though, is that all anomalous network traffic is, in fact, intrusive. Although intrusive traffic often does exhibit anomalous behavior, the opposite is not necessarily true: some non-intrusive network traffic may also behave anomalously. Thus, accurate anomaly detectors often suffer from high false-alarm rates.
2.3 Adversarial Machine Learning
As witnessed by the above examples, the introduction of machine-learning techniques in security-sensitive tasks has many beneficial aspects, and it has been somewhat necessitated by the increased sophistication and variability of recent attacks and zero-day exploits. However, there is good reason to believe that machine-learning techniques themselves will be subject to carefully designed attacks in the near future, as a logical next step in the above-sketched arms race. Since machine-learning techniques were not originally designed to withstand manipulations made by intelligent and adaptive adversaries, it would be reckless to naively trust these learners in a secure system. Instead, one needs to carefully consider whether these techniques can introduce novel vulnerabilities that may degrade the overall system’s security, or whether they can be safely adopted. In other words, we need to address the question raised by Barreno et al. barrenoASIACCS06 (): can machine learning be secure?
At the center of this question is the effect an adversary can have on a learner by violating the stationarity assumption that the training data used to train the classifier comes from the same distribution as the test data that will be classified by the learned classifier. This is a conventional and natural assumption underlying much of machine learning and is the basis for performance-evaluation-based techniques like cross-validation and bootstrapping as well as for principles like empirical risk minimization (ERM). However, in security-sensitive settings, the adversary may purposely manipulate data to mislead learning. Accordingly, the data distribution is subject to change, thereby potentially violating stationarity, albeit in a limited way subject to the adversary’s assumed capabilities (as we discuss in Section 3.1). Further, as in most security tasks, predicting how the data distribution will change is difficult, if not impossible biggio12tkde (); huang11 (). Hence, adversarial learning problems are often addressed as a proactive arms race biggio12tkde (), in which the classifier designer tries to anticipate the adversary’s next move, by simulating and hypothesizing proper attack scenarios, as discussed in the next section.
Reactive and Proactive Arms Races
As mentioned in the previous sections, and highlighted by the examples in Section 2.2, security problems are often cast as a long-lasting reactive arms race between the classifier designer and the adversary, in which each player attempts to achieve his/her goal by reacting to the changing behavior of his/her opponent. For instance, the adversary typically crafts samples to evade detection (e.g., a spammer’s goal is often to create spam emails that will not be detected), while the classifier designer seeks to develop a system that accurately detects most malicious samples while maintaining a very low false-alarm rate; i.e., by not falsely identifying legitimate examples. Under this setting, the arms race can be modeled as the following cycle biggio12tkde (). First, the adversary analyzes the existing learning algorithm and manipulates her data to evade detection (or more generally, to make the learning algorithm ineffective). For instance, a spammer may gather some knowledge of the words used by the targeted spam filter to block spam and then manipulate the textual content of her spam emails accordingly; e.g., words like “cheap” that are indicative of spam can be misspelled as “che4p”. Second, the classifier designer reacts by analyzing the novel attack samples and updating his classifier. This is typically done by retraining the classifier on the newly collected samples, and/or by adding features that can better detect the novel attacks. In the previous spam example, this amounts to retraining the filter on the newly collected spam and, thus, to adding novel words into the filter’s dictionary (e.g., “che4p” may be now learned as a spammy word). This reactive arms race continues in perpetuity as illustrated in Figure 1.
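One turn of the reactive cycle described above can be sketched with a deliberately naive keyword filter. The function `is_spam`, the word list, and the message are hypothetical stand-ins for a learned spam filter; retraining is reduced here to adding the obfuscated word to the dictionary.

```python
def is_spam(message, spam_words):
    """Flag a message if any of its tokens is in the spam dictionary."""
    return any(token in spam_words for token in message.lower().split())


spam_words = {"cheap", "prize"}          # hypothetical filter dictionary
original = "Buy cheap watches now"

# Step 0: the deployed filter catches the unmodified message.
detected_before = is_spam(original, spam_words)

# Step 1 (adversary): misspell the indicative word to evade detection.
obfuscated = original.replace("cheap", "che4p")
evaded = not is_spam(obfuscated, spam_words)

# Step 2 (designer): react by learning the novel spammy word.
spam_words.add("che4p")
detected_after = is_spam(obfuscated, spam_words)
```

Nothing in step 2 prevents the adversary from trying yet another misspelling, which is why the cycle repeats.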
However, reactive approaches to this arms race do not anticipate the next generation of security vulnerabilities and thus, the system potentially remains vulnerable to new attacks. Instead, computer security guidelines traditionally advocate a proactive approach: the classifier designer should proactively anticipate the adversary’s strategy by (i) identifying the most relevant threats, (ii) designing proper countermeasures into his classifier, and (iii) repeating this process for his new design before deploying the classifier. (We note, however, that in certain abstract models we have shown how regret-minimizing online learning can be used to define reactive approaches that are competitive with proactive security reactive ().) This can be accomplished by modeling the adversary (based on knowledge of the adversary’s goals and capabilities) and using this model to simulate attacks, as depicted in Figure 2, in contrast to the reactive arms race. While such an approach does not account for unknown or changing aspects of the adversary, it can indeed lead to an improved level of security by delaying each step of the reactive arms race because it should reasonably force the adversary to exert greater effort (in terms of time, skills, and resources) to find new vulnerabilities. Accordingly, proactively designed classifiers should remain useful for a longer time, with less frequent supervision or human intervention and with less severe vulnerabilities.
Although this approach has been implicitly followed in most of the previous work (see Section 2.3), it has only recently been formalized within a more general framework for the empirical evaluation of a classifier’s security biggio12tkde (), which we summarize in Section 3. Finally, although security evaluation may suggest specific countermeasures, designing general-purpose secure classifiers remains an open problem.
Previous Work on Security Evaluation
Previous work in adversarial learning can be categorized according to the two main steps of the proactive arms race described in the previous section. The first research direction focuses on identifying potential vulnerabilities of learning algorithms and assessing the impact of the corresponding attacks on the targeted classifier; e.g., barreno10 (); barrenoASIACCS06 (); cardenasws06 (); huang11 (); kloft12b (); kolcz09 (); laskov09 (); lowd05 (). The second explores the development of proper countermeasures and learning algorithms robust to known attacks; e.g., dalvi04 (); kolcz09 (); rodrigues09 ().
Although some prior work does address aspects of the empirical evaluation of classifier security, which is often implicitly defined as the performance degradation incurred under a (simulated) attack, to our knowledge a systematic treatment of this process under a unifying perspective was only first described in our recent work biggio12tkde (). Previously, security evaluation was generally conducted within a specific application domain such as spam filtering and network intrusion detection (e.g., in dalvi04 (); fogla06 (); kolcz09 (); lowd05ceas (); wittel04 ()), in which a different application-dependent criterion was separately defined for each endeavor. Security evaluation was then implicitly undertaken by defining an attack and assessing its impact on the given classifier. For instance, in fogla06 (), the authors showed how camouflage network packets can mimic legitimate traffic to evade detection; and, similarly, in dalvi04 (); kolcz09 (); lowd05ceas (); wittel04 (), the content of spam emails was manipulated for evasion. Although such analyses provide indispensable insights into specific problems, their results are difficult to generalize to other domains and provide little guidance for evaluating classifier security in a different application. Thus, in a new application domain, security evaluation often must begin anew and it is difficult to directly compare with prior studies. This shortcoming highlights the need for a more general set of security guidelines and a more systematic definition of classifier security evaluation, which we began to address in biggio12tkde ().
Apart from application-specific work, several theoretical models of adversarial learning have been proposed barreno10 (); brueckner12 (); dalvi04 (); huang11 (); kloft12b (); laskov09 (); lowd05 (); nelson12jmlr (). These models frame the secure learning problem and provide a foundation for a proper security evaluation scheme. In particular, we build upon elements of the models of barreno10 (); barrenoASIACCS06 (); huang11 (); kloft10 (); kloft12b (); laskov09 (), which were used in defining our framework for security evaluation biggio12tkde (). Below we summarize these foundations.
A Taxonomy of Potential Attacks against Machine Learning Algorithms
A taxonomy of potential attacks against pattern classifiers was proposed in barreno10 (); barrenoASIACCS06 (); huang11 () as a baseline to characterize attacks on learners. The taxonomy is based on three main features: the kind of influence of attacks on the classifier, the kind of security violation they cause, and the specificity of an attack. The attack’s influence can be either causative, if it aims to undermine learning, or exploratory, if it targets the classification phase. Accordingly, a causative attack may manipulate both training and testing data, whereas an exploratory attack only affects testing data. Examples of causative attacks include work in biggio12icml (); kloft10 (); kloft12b (); nelson08 (); rubinstein09 (), while exploratory attacks can be found in dalvi04 (); fogla06 (); kolcz09 (); lowd05ceas (); wittel04 (). The security violation can be either an integrity violation, if it aims to gain unauthorized access to the system (i.e., to have malicious samples be misclassified as legitimate); an availability violation, if the goal is to generate a high number of errors (both false negatives and false positives) such that normal system operation is compromised (e.g., legitimate users are denied access to their resources); or a privacy violation, if it allows the adversary to obtain confidential information from the classifier (e.g., in biometric recognition, this may amount to recovering a protected biometric template of a system’s client). Finally, the attack specificity refers to the samples that are affected by the attack. It ranges continuously from targeted attacks (e.g., if the goal of the attack is to have a specific spam email misclassified as legitimate) to indiscriminate attacks (e.g., if the goal is to have any spam email misclassified as legitimate).
Each portion of the taxonomy specifies a different type of attack as laid out in Barreno et al. barreno10 (), and here we outline these with respect to a PDF malware detector. An example of a causative integrity attack is an attacker who wants to mislead the malware detector to falsely classify malicious PDFs as benign. The attacker could accomplish this goal by introducing benign PDFs with malicious features into the training set; the attack would be targeted if the features corresponded to a particular malware, and indiscriminate otherwise. Similarly, the attacker could mount a causative availability attack by injecting malware training examples that exhibit features common to benign PDFs; again, these would be targeted if the attacker wanted a particular set of benign PDFs to be misclassified. A causative privacy attack, however, would require both manipulation of the training data and information obtained from the learned classifier. The attacker could inject malicious PDFs with features identifying a particular author and then subsequently test whether other PDFs with those features are labeled as malicious; this observed behavior may leak private information about the authors of other PDFs in the training set.
In contrast to the causative attacks, exploratory attacks cannot manipulate the learner, but can still exploit the learning mechanism. An example of an exploratory integrity attack involves an attacker who crafts a malicious PDF for an existing malware detector. This attacker queries the detector with candidate PDFs to discover which attributes the detector uses to identify malware, thus allowing her to redesign her PDF to avoid the detector. This example could be targeted to a single PDF exploit, or indiscriminate if a set of possible exploits is considered. An exploratory privacy attack against the malware detector can be conducted in the same way as the causative privacy attack described above, but without first injecting PDFs into the training data. Simply by probing the malware detector with crafted PDFs, the attacker may extract secrets from the detector. Finally, exploratory availability attacks are possible in some applications but are not currently considered to be of interest.
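The three axes of the taxonomy can be encoded as a small data structure, sketched below. The class and member names are hypothetical, and specificity is reduced to its two endpoints even though the text describes it as a continuous range.

```python
from dataclasses import dataclass
from enum import Enum


class Influence(Enum):
    CAUSATIVE = "causative"        # undermines learning
    EXPLORATORY = "exploratory"    # targets the classification phase


class Violation(Enum):
    INTEGRITY = "integrity"
    AVAILABILITY = "availability"
    PRIVACY = "privacy"


class Specificity(Enum):
    # Simplification: only the two endpoints of a continuous range.
    TARGETED = "targeted"
    INDISCRIMINATE = "indiscriminate"


@dataclass(frozen=True)
class AttackType:
    influence: Influence
    violation: Violation
    specificity: Specificity

    def manipulable_data(self):
        """Causative attacks may touch training and test data; exploratory only test."""
        if self.influence is Influence.CAUSATIVE:
            return {"training", "test"}
        return {"test"}


# The PDF examples from the text, expressed in this encoding.
pdf_poisoning = AttackType(Influence.CAUSATIVE, Violation.INTEGRITY,
                           Specificity.INDISCRIMINATE)
pdf_evasion = AttackType(Influence.EXPLORATORY, Violation.INTEGRITY,
                         Specificity.TARGETED)
```

An encoding of this kind makes it easy to enumerate the cells of the taxonomy when planning which attack scenarios to simulate.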
3 A Framework for Security Evaluation
In Section 2.3, we motivated the need for simulating a proactive arms race as a means for improving system security. We further argued that evaluating a classifier’s security properties through simulations of different, potential attack scenarios is a crucial step in this arms race for identifying the most relevant vulnerabilities and for suggesting how to potentially counter them. Here, we summarize our recent work biggio12tkde () that proposes a new framework for designing proactive secure classifiers by addressing the shortcomings of the reactive security cycle raised above. Namely, our approach allows one to empirically evaluate a classifier’s security during its design phase by addressing the first three steps of the proactive arms race depicted in Figure 2: (i) identifying potential attack scenarios, (ii) devising the corresponding attacks, and (iii) systematically evaluating their impact. Although it may also suggest countermeasures to the hypothesized attacks, the final step of the proactive arms race remains unspecified as a unique design step that has to be addressed separately in an application-specific manner.
Under our proposed security evaluation process, the analyst must clearly scrutinize the classifier by considering different attack scenarios to investigate a set of distinct potential vulnerabilities. This amounts to performing a more systematic what-if analysis of classifier security rizzi09 (). This is an essential step in the design of security systems, as it not only allows the designer to identify the most important and relevant threats, but also forces him/her to consciously decide whether the classifier can be reasonably deployed, after being made aware of the corresponding risks, or whether it is instead better to adopt additional countermeasures to mitigate the attack’s impact before deploying the classifier.
Our proposed framework builds on previous work and attempts to systematize and unify their views under a more coherent perspective. The framework defines how an analyst can conduct a security audit of a classifier, which we detail in the remainder of this section. First, in Section 3.1, we explain how an adversary model is constructed according to the adversary’s anticipated goals, knowledge and capabilities. Based on this model, a simulation of the adversary can be conducted to find the corresponding optimal attack strategies and produce simulated attacks, as described in Section 3.1. These simulated attack samples are then used to evaluate the classifier by either adding them to the training or test data, in accordance with the adversary’s capabilities from Section 3.1. We conclude this section by discussing how to exploit our framework in specific application domains in Section 3.2.
3.1 Modeling the Adversary
The proposed model of the adversary is based on specific assumptions about her goal, knowledge of the system, and capability to modify the underlying data distribution by manipulating individual samples. It allows the classifier designer to model the attacks identified in the attack taxonomy described in Section 2.3 barreno10 (); barrenoASIACCS06 (); huang11 (). However, in our framework, one can also incorporate application-specific constraints into the definition of the adversary’s capability. Therefore, it can be exploited to derive practical guidelines for developing optimal attack strategies and to guide the design of adversarially resilient classifiers.
Adversary’s Goal
According to the taxonomy presented first by Barreno et al. barrenoASIACCS06 () and extended by Huang et al. huang11 (), the adversary’s goal should be defined based on the anticipated security violation, which might be an integrity, availability, or privacy violation (see Section 2.3), and also depending on the attack’s specificity, which ranges from targeted to indiscriminate. Further, as suggested by Laskov and Kloft laskov09 () and Kloft and Laskov kloft12b (), the adversary’s goal should be defined in terms of an objective function that the adversary is willing to maximize. This allows for a formal characterization of the optimal attack strategy.
For instance, in an indiscriminate integrity attack, the adversary may aim to maximize the number of spam emails that evade detection, while minimally manipulating their content dalvi04 (); lowd05 (); nelson12jmlr (), whereas in an indiscriminate availability attack, the adversary may aim to maximize the number of classification errors, thereby causing a general denial-of-service due to an excess of false alarms nelson08 (); biggio12icml ().
Adversary’s Knowledge
The adversary’s knowledge of the attacked system can be defined based on the main components involved in the design of a machine learning system, as described in dudahartstork () and depicted in Figure 3.
According to the five design steps depicted in Figure 3, the adversary may have various degrees of knowledge (ranging from no information to complete information) pertaining to the following five components:

(k.i) the training set (or part of it);

(k.ii) the feature representation of each sample; i.e., how real objects (emails, network packets, etc.) are mapped into the feature space;

(k.iii) the learning algorithm and its decision function; e.g., that logistic regression is used to learn a linear classifier;

(k.iv) the learned classifier’s parameters; e.g., the actual learned weights of a linear classifier;

(k.v) feedback from the deployed classifier; e.g., the classification labels assigned to some of the samples by the targeted classifier.
These five elements represent different levels of knowledge about the system being attacked. A typical hypothesized scenario assumes that the adversary has perfect knowledge of the targeted classifier (k.iv). Although potentially too pessimistic, this worst-case setting allows one to compute a lower bound on the classifier performance when it is under attack dalvi04 (); kolcz09 (). A more realistic setting is that the adversary knows the (untrained) learning algorithm (k.iii), and she may exploit feedback from the classifier on the labels assigned to some query samples (k.v), either to directly find optimal or nearly optimal attack instances lowd05 (); nelson12jmlr (), or to learn a surrogate classifier, which can then serve as a template to guide the attack against the actual classifier. We refer to this scenario as a limited knowledge setting in Section 4.
Note that one may also make more restrictive assumptions on the adversary’s knowledge, such as considering partial knowledge of the feature representation (k.ii), or a complete lack of knowledge of the learning algorithm (k.iii). Investigating classifier security against these uninformed adversaries may yield a higher level of security. However, such assumptions would be contingent on security through obscurity; that is, the provided security would rely upon secrets that must be kept unknown to the adversary even though such a high level of secrecy may not be practical. Reliance on unjustified secrets can potentially lead to catastrophic unforeseen vulnerabilities. Thus, this paradigm should be regarded as being complementary to security by design, which instead advocates that systems should be designed from the ground up to be secure and, if secrets are assumed, they must be well justified. Accordingly, security is often investigated by assuming that the adversary knows at least the learning algorithm and the underlying feature representation.
Adversary’s Capability
We now give some guidelines on how the attacker may be able to manipulate samples and the corresponding data distribution. As discussed in Section 2.3 barreno10 (); barrenoASIACCS06 (); huang11 (), the adversary may control both training and test data (causative attacks), or only test data (exploratory attacks). Further, training and test data may follow different distributions, since they can be manipulated according to different attack strategies by the adversary. Therefore, we should specify:

(c.i) whether the adversary can manipulate training (TR) and/or testing (TS) data; i.e., the attack influence from the taxonomy in barreno10 (); barrenoASIACCS06 (); huang11 ();

(c.ii) whether and to what extent the attack affects the class priors, for TR and TS;

(c.iii) which and how many samples can be modified in each class, for TR and TS;

(c.iv) which features of each attack sample can be modified and how these features’ values can be altered; e.g., correlated feature values cannot be modified independently.
Assuming a generative model $p(X, Y) = p(Y)\,p(X \mid Y)$ (where we use $p_{\rm tr}$ and $p_{\rm ts}$ for training and test distributions, respectively), assumption (c.ii) specifies how an attack can modify the priors $p_{\rm tr}(Y)$ and $p_{\rm ts}(Y)$, while assumptions (c.iii) and (c.iv) specify how it can alter the class-conditional distributions $p_{\rm tr}(X \mid Y)$ and $p_{\rm ts}(X \mid Y)$.
To perform security evaluation according to the hypothesized attack scenario, it is thus clear that the collected data and generated attack samples should be resampled according to the above distributions to produce suitable training and test set pairs. This can be accomplished through existing resampling algorithms like cross-validation or bootstrapping, when the attack samples are independently sampled from an identical distribution (i.i.d.). Otherwise, one may consider different sampling schemes. For instance, in Biggio et al. biggio12icml () the attack samples had to be injected into the training data, and each attack sample depended on the current training data, which also included past attack samples. In this case, it was sufficient to add one attack sample at a time, until the desired number of samples was reached. (See biggio12tkde () for more details on the definition of the data distribution and the resampling algorithm.)
Attack Strategy
Once specific assumptions on the adversary’s goal, knowledge, and capability are made, one can compute the optimal attack strategy corresponding to the hypothesized attack scenario; i.e., the adversary model. This amounts to solving the optimization problem defined according to the adversary’s goal, under proper constraints defined in accordance with the adversary’s assumed knowledge and capabilities. The attack strategy can then be used to produce the desired attack samples, which then have to be merged consistently with the rest of the data to produce suitable training and test sets for the desired security evaluation, as explained in the previous section. Specific examples of how to derive optimal attacks against SVMs, and how to resample training and test data to properly include them, are discussed in Sections 4 and 5.
3.2 How to use our Framework
We summarize here the steps that can be followed to correctly use our framework in specific application scenarios:

hypothesize an attack scenario by identifying a proper adversary’s goal according to the taxonomy in barreno10 (); barrenoASIACCS06 (); huang11 ();

formulate the corresponding optimization problem and devise the corresponding attack strategy;

resample the collected (training and test) data accordingly;

evaluate the classifier’s security on the resampled data (including attack samples);

repeat the evaluation for different levels of adversary’s knowledge and/or capabilities, if necessary; or hypothesize a different attack scenario.
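The steps above can be sketched as a small evaluation loop. The following is only a toy illustration under strong assumptions: a hypothetical one-dimensional threshold detector stands in for the classifier, and a hypothetical evasion attack shifts each malicious test sample toward the legitimate side by the full budget `d_max`. All function names are made up for this sketch and are not taken from the released code.

```python
def classify(x, threshold=0.0):
    """Toy detector (assumption): +1 (malicious) if x >= threshold, else -1."""
    return 1 if x >= threshold else -1


def simulate_attack(x, d_max):
    """Hypothetical optimal evasion against the toy detector:
    move the sample toward the legitimate side by the full budget."""
    return x - d_max


def false_negative_rate(malicious_samples, d_max):
    """Fraction of manipulated malicious samples labelled as legitimate."""
    attacked = [simulate_attack(x, d_max) for x in malicious_samples]
    missed = sum(1 for x in attacked if classify(x) == -1)
    return missed / len(attacked)


def security_evaluation(malicious_samples, budgets):
    """Repeat the evaluation for increasing levels of attacker capability."""
    return {d: false_negative_rate(malicious_samples, d) for d in budgets}


# Toy malicious test samples and attack budgets (made-up values).
malicious_test = [0.5, 1.0, 1.5, 2.0]
curve = security_evaluation(malicious_test, budgets=[0.0, 0.75, 1.25, 2.5])
```

The resulting `curve` maps each budget to a false negative rate, which is the kind of security evaluation curve one would then compare across classifiers.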
In the next sections we show how our framework can be applied to investigate three security threats to SVMs: evasion, poisoning, and privacy violations. We then discuss how our findings may be used to improve the security of such classifiers against the considered attacks. For instance, we show how careful kernel parameter selection, which trades off between security to attacks and classification accuracy, may complicate the adversary’s task of subverting the learning process.
4 Evasion Attacks against SVMs
In this section, we consider the problem of SVM evasion at test time; i.e., how to optimally manipulate samples at test time to avoid detection. The problem of evasion at test time has been considered in previous work, albeit limited either to simple decision functions such as linear classifiers dalvi04 (); lowd05 (), or to convex-inducing classifiers nelson12jmlr (), which partition the feature space into two sets, one of which is convex, but which do not include most interesting families of nonlinear classifiers such as neural nets or SVMs. In contrast to this prior work, the methods presented in our recent work biggio13ecml () and in this section demonstrate that evasion of kernel-based classifiers at test time can be realized with a straightforward gradient-descent-based approach derived from Golland’s technique of discriminative directions Gol02 (). As a further simplification of the attacker’s effort, we empirically show that, even if the adversary does not precisely know the classifier’s decision function, she can learn a surrogate classifier on a surrogate dataset and reliably evade the targeted classifier.
This section is structured as follows. In Section 4.1, we define the model of the adversary, including her attack strategy, according to our evaluation framework described in Section 3.1. Then, in Section 4.2 we derive the attack strategy that will be employed to experimentally evaluate evasion attacks against SVMs. We report our experimental results in Section 4.3. Finally, we critically discuss and interpret our research findings in Section 4.4.
4.1 Modeling the Adversary
We show here how our framework can be applied to evaluate the security of SVMs against evasion attacks. We first introduce our notation, state our assumptions about the attack scenario, and then derive the corresponding optimal attack strategy.
Notation. We consider a classification algorithm $f: \mathcal{X} \mapsto \mathcal{Y}$ that assigns samples represented in some feature space $x \in \mathcal{X}$ to a label in the set of predefined classes $y \in \mathcal{Y} = \{-1, +1\}$, where $-1$ ($+1$) represents the legitimate (malicious) class. The label $f(x)$ given by a classifier is typically obtained by thresholding a continuous discriminant function $g: \mathcal{X} \mapsto \mathbb{R}$. Without loss of generality, we assume that $f(x) = -1$ if $g(x) < 0$, and $+1$ otherwise. Further, note that we use $y^c = f(x)$ to refer to a label assigned by the classifier for the point $x$ (rather than the true label $y$ of that point) and the shorthand $y_i^c$ for the label assigned to the $i$-th training point, $x_i$.
Adversary’s Goal
Malicious (positive) samples are manipulated to evade the classifier. The adversary may be satisfied when a sample $x$ is found such that $g(x) < -\epsilon$, where $\epsilon > 0$ is a small constant. However, as mentioned in Section 3.1, these attacks may be easily defeated by simply adjusting the decision threshold to a slightly more conservative value (e.g., to attain a lower false negative rate at the expense of a higher false positive rate). For this reason, we assume a smarter adversary, whose goal is to have her attack sample misclassified as legitimate with the largest confidence. Analytically, this statement can be expressed as follows: find an attack sample $x$ that minimizes the value of the classifier’s discriminant function $g(x)$. Indeed, this adversarial setting provides a worst-case bound for the targeted classifier.
Adversary’s Knowledge
We investigate two adversarial settings. In the first, the adversary has perfect knowledge (PK) of the targeted classifier; i.e., she knows the feature space (k.ii) and the function $g(x)$ (k.iii-iv). Thus, the labels from the targeted classifier (k.v) are not needed. In the second, the adversary is assumed to have limited knowledge (LK) of the classifier. We assume she knows the feature representation (k.ii) and the learning algorithm (k.iii), but that she does not know the learned classifier (k.iv). In both cases, we assume the attacker does not have knowledge of the training set (k.i).
Within the LK scenario, the adversary does not know the true discriminant function $g(x)$ but may approximate it as $\hat{g}(x)$ by learning a surrogate classifier on a surrogate training set $\{(x_i, y_i)\}_{i=1}^{n_q}$ of $n_q$ samples. This data may be collected by the adversary in several ways; e.g., she may sniff network traffic or collect legitimate and spam emails from an alternate source. Thus, for LK, there are two subcases related to assumption (k.v), which depend on whether the adversary can query the classifier. If so, the adversary can build the training set by submitting a set of $n_q$ queries $x_i$ to the targeted classifier to obtain their classification labels, $y_i^c = f(x_i)$. This is indeed the adversary’s true learning task, but it requires her to have access to classifier feedback; e.g., by having an email account protected by the targeted filter (for public email providers, the adversary can reasonably obtain such accounts). If not, the adversary may use the true class labels for the surrogate data, although this may not correctly approximate the targeted classifier (unless it is very accurate).
Adversary’s Capability
In the evasion setting, the adversary can only manipulate testing data (c.i); i.e., she has no way to influence training data. We further assume here that the class priors cannot be modified (c.ii), and that all the malicious testing samples are affected by the attack (c.iii). In other words, we are interested in simulating an exploratory, indiscriminate attack. The adversary’s capability of manipulating the features of each sample (c.iv) should be defined based on application-specific constraints. However, at a more general level we can bound the attack point to lie within some maximum distance from the original attack sample, $d_{\max}$, which then is a parameter of our evaluation. Similarly to previous work, the definition of a suitable distance measure $d: \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}$ is left to the specific application domain dalvi04 (); lowd05 (); nelson12jmlr (). Note indeed that this distance should reflect the adversary’s effort or cost in manipulating samples, by considering factors that can limit the overall attack impact; e.g., the increase in the file size of a malicious PDF, since larger files will lower the infection rate due to increased transmission times. For spam filtering, distance is often given as the number of modified words in each spam dalvi04 (); lowd05 (); nelson08 (); nelson12jmlr (), since it is assumed that highly modified spam messages are less effectively able to convey the spammer’s message.
Attack Strategy
Under the attacker’s model described in Section 4.1, for any target malicious sample $x^0$ (the adversary’s true objective), an optimal attack strategy finds a sample $x^*$ to minimize $g$ or its estimate $\hat{g}$, subject to a bound on its modification distance from $x^0$:
$$x^* = \arg\min_{x} \hat{g}(x) \quad \text{s.t.} \quad d(x, x^0) \le d_{\max}.$$
For several classifiers, minimizing $g(x)$ is equivalent to maximizing the estimated posterior $p(y^c = -1 \mid x)$; e.g., for neural networks, since they directly output a posterior estimate, and for SVMs, since their posterior can be estimated as a sigmoidal function of the distance of $x$ to the SVM hyperplane platt99 ().
Generally, this is a nonlinear optimization problem, which can be solved with many well-known techniques (e.g., gradient descent, Newton’s method, or BFGS); below we use a gradient descent procedure. However, if $\hat{g}(x)$ is not convex, descent approaches may not find a global optimum. Instead, the descent path may lead to a flat region (local minimum) outside of the samples’ support where $p(x) \approx 0$ and the classification behavior of $g$ is unspecified, which may stymie evasion attempts (see the upper left plot in Figure 4).
Unfortunately, our objective does not utilize the evidence we have about the distribution of data $p(x)$, and thus gradient descent may meander into unsupported regions ($p(x) \approx 0$) where $g$ is relatively unspecified. This problem is further compounded since our estimate $\hat{g}$ is based on a finite (and possibly small) training set, making it a poor estimate of $g$ in unsupported regions, which may lead to false evasion points in these regions. To overcome these limitations, we introduce an additional component into the formulation of our attack objective, which estimates $p(x \mid y^c = -1)$ using density-estimation techniques. This second component acts as a penalizer for $x$ in low-density regions and is weighted by a parameter $\lambda \ge 0$, yielding the following modified optimization problem:
$$\arg\min_{x} F(x) = \hat{g}(x) - \frac{\lambda}{n} \sum_{i \mid y_i^c = -1} k\!\left(\frac{x - x_i}{h}\right) \qquad (1)$$
$$\text{s.t.} \quad d(x, x^0) \le d_{\max},$$
where $h$ is a bandwidth parameter for a kernel density estimator (KDE), and $n$ is the number of benign samples ($y^c = -1$) available to the adversary. This alternate objective trades off between minimizing $\hat{g}(x)$ (or, equivalently, maximizing $p(y^c = -1 \mid x)$) and maximizing the estimated density $p(x \mid y^c = -1)$. The extra component favors attack points that imitate features of known samples classified as legitimate, as in mimicry attacks fogla06 (). In doing so, it reshapes the objective function and thereby biases the resulting density-augmented gradient descent towards regions where the negative class is concentrated (see the bottom right plot in Figure 4).
Finally, note that this behavior may lead our technique to disregard attack patterns within unsupported regions ($p(x) \approx 0$) for which $g(x) < 0$, when they do exist (see, e.g., the upper right plot in Figure 4). This may limit classifier evasion especially when the constraint $d(x, x^0) \le d_{\max}$ is particularly strict. Therefore, the trade-off between the two components of the objective function should be carefully considered.
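The trade-off between the two components can be illustrated with a short sketch. It assumes that the objective of Equation (1) takes the form F(x) = ĝ(x) − (λ/n) Σ_i k((x − x_i)/h) with a Laplacian kernel exp(−‖x − x_i‖₁/h); the surrogate discriminant `g_hat` and all sample values below are hypothetical and chosen only for illustration.

```python
import math


def l1_distance(a, b):
    """Manhattan distance between two feature vectors."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def kde_term(x, benign, h):
    """Mean Laplacian-kernel value exp(-d(x, x_i)/h) over the benign samples."""
    return sum(math.exp(-l1_distance(x, xi) / h) for xi in benign) / len(benign)


def attack_objective(x, g_hat, benign, lam, h):
    """Assumed form of the objective: F(x) = g_hat(x) - lam * KDE(x)."""
    return g_hat(x) - lam * kde_term(x, benign, h)


def g_hat(x):
    """Hypothetical linear surrogate discriminant (not a trained model)."""
    return x[0] + x[1] - 1.0


benign = [[0.0, 0.0], [0.2, -0.1]]   # made-up legitimate samples
x_near = [0.0, 0.0]                  # lies among the benign samples
x_far = [-5.0, 5.0]                  # same g_hat value, unsupported region

f_near = attack_objective(x_near, g_hat, benign, lam=1.0, h=1.0)
f_far = attack_objective(x_far, g_hat, benign, lam=1.0, h=1.0)
```

Both candidate points receive the same value of `g_hat`, so with `lam=0` the attacker is indifferent between them; with `lam > 0` the point lying among the benign samples has the lower objective value and is preferred.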
4.2 Evasion Attack Algorithm
Algorithm 1 details a gradient-descent method for optimizing the problem of Equation (1). It iteratively modifies the attack point $x$ in the feature space as $x' \leftarrow x - t \nabla F$, where $\nabla F$ is a unit vector aligned with the gradient of our objective function, and $t$ is the step size. We assume $g$ to be differentiable almost everywhere (subgradients may be used at discontinuities). When $g$ is non-differentiable or is not smooth enough for a gradient descent to work well, it is also possible to rely upon the mimicry / KDE term in the optimization of Equation (1).
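A minimal sketch of such a descent loop is given below. It is not a transcription of Algorithm 1: it assumes a Euclidean distance constraint (so that projection onto the feasible region is a simple rescaling), a stopping rule based on the decrease of the objective, and caller-supplied `objective` and `gradient` functions. The linear discriminant used in the example is hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def project_l2_ball(x, x0, d_max):
    """Pull x back onto the ball of radius d_max around x0 if it left it."""
    diff = [xi - x0i for xi, x0i in zip(x, x0)]
    dist = norm(diff)
    if dist <= d_max:
        return x
    scale = d_max / dist
    return [x0i + scale * di for x0i, di in zip(x0, diff)]


def gradient_descent_evasion(x0, objective, gradient, step, d_max,
                             eps=1e-6, max_iter=1000):
    """Descend along the unit-norm gradient of the objective, keeping the
    attack point within distance d_max of the initial sample x0."""
    x = list(x0)
    f_prev = objective(x)
    for _ in range(max_iter):
        grad = gradient(x)
        g_norm = norm(grad)
        if g_norm == 0.0:          # flat region: no descent direction
            break
        direction = [gi / g_norm for gi in grad]
        candidate = [xi - step * di for xi, di in zip(x, direction)]
        candidate = project_l2_ball(candidate, x0, d_max)
        f_new = objective(candidate)
        if f_prev - f_new < eps:   # no meaningful improvement: stop
            break
        x, f_prev = candidate, f_new
    return x


# Hypothetical linear discriminant g(x) = <w, x> + b with lambda = 0.
w, b = [3.0, 4.0], -1.0
g = lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
x_start = [1.0, 1.0]               # g(x_start) = 6 > 0, i.e. detected
x_adv = gradient_descent_evasion(x_start, g, lambda x: w, step=0.5, d_max=2.0)
```

In this toy run the attack point moves along the direction of `-w` until it reaches the boundary of the feasible region, where the discriminant has dropped below zero.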
In the next sections, we show how to compute the components of $\nabla F$; namely, the gradient of the discriminant function $g(x)$ of SVMs for different kernels, and the gradient of the mimicking component (density estimation). We finally discuss how to project the gradient onto the feasible region in discrete feature spaces.
Gradient of Support Vector Machines
For SVMs, $g(x) = \sum_i \alpha_i y_i k(x, x_i) + b$. The gradient is thus given by $\nabla g(x) = \sum_i \alpha_i y_i \nabla k(x, x_i)$. Accordingly, the feasibility of the approach depends on the computability of this kernel gradient $\nabla k(x, x_i)$, which is computable for many numeric kernels. In the following, we report the kernel gradients for three main cases: (a) the linear kernel, (b) the RBF kernel, and (c) the polynomial kernel.
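(a) Linear kernel. In this case, the kernel is simply given by $k(x, x_i) = \langle x, x_i \rangle$. Accordingly, $\nabla k(x, x_i) = x_i$ (we remind the reader that the gradient has to be computed with respect to the current attack sample $x$), and $\nabla g(x) = w = \sum_i \alpha_i y_i x_i$.
(b) RBF kernel. For this kernel, $k(x, x_i) = \exp\{-\gamma \|x - x_i\|^2\}$. The gradient is thus given by $\nabla k(x, x_i) = -2\gamma \exp\{-\gamma \|x - x_i\|^2\}(x - x_i)$.
(c) Polynomial kernel. In this final case, $k(x, x_i) = (\langle x, x_i \rangle + c)^p$. The gradient is thus given by $\nabla k(x, x_i) = p(\langle x, x_i \rangle + c)^{p-1} x_i$.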
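The three cases can be written compactly in code. The sketch below assumes the standard forms of the linear, RBF, and polynomial kernels, with gradients taken with respect to the attack sample x; `dual_coefs[i]` is assumed to hold the product alpha_i * y_i for the i-th support vector. All names and values are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def grad_linear(x, xi):
    """Gradient of k(x, xi) = <x, xi> with respect to x."""
    return list(xi)


def grad_rbf(x, xi, gamma):
    """Gradient of k(x, xi) = exp(-gamma * ||x - xi||^2) with respect to x."""
    diff = [a - b for a, b in zip(x, xi)]
    k = math.exp(-gamma * dot(diff, diff))
    return [-2.0 * gamma * k * d for d in diff]


def grad_poly(x, xi, c, p):
    """Gradient of k(x, xi) = (<x, xi> + c)^p with respect to x."""
    coef = p * (dot(x, xi) + c) ** (p - 1)
    return [coef * v for v in xi]


def svm_gradient(x, support_vectors, dual_coefs, kernel_grad):
    """Gradient of g(x) = sum_i alpha_i y_i k(x, x_i) + b.
    The bias b does not depend on x and therefore drops out."""
    grad = [0.0] * len(x)
    for xi, a in zip(support_vectors, dual_coefs):
        gk = kernel_grad(x, xi)
        for j in range(len(x)):
            grad[j] += a * gk[j]
    return grad


# Made-up support vectors and dual coefficients.
svs = [[1.0, 0.0], [0.0, 2.0]]
coefs = [0.5, -1.0]
w = svm_gradient([3.0, 3.0], svs, coefs, grad_linear)
g_rbf = svm_gradient([3.0, 3.0], svs, coefs, lambda x, xi: grad_rbf(x, xi, 0.5))
```

For the linear kernel the result equals the weight vector w regardless of x, whereas the RBF gradient changes with the position of the attack point.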
Gradients of Kernel Density Estimators
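As with SVMs, the gradient of kernel density estimators depends on the gradient of its kernel. We considered generalized RBF kernels of the form
$$k\!\left(\frac{x - x_i}{h}\right) = \exp\left(-\frac{d(x, x_i)}{h}\right),$$
where $d(\cdot, \cdot)$ is any suitable distance function. We used here the same distance $d(\cdot, \cdot)$ used in Equation (1), but they can be different, in general. For $\ell_2$- and $\ell_1$-norms (i.e., RBF and Laplacian kernels), the KDE (sub)gradients are respectively given by:
$$-\frac{2}{nh} \sum_{i \mid y_i^c = -1} \exp\left(-\frac{\|x - x_i\|_2^2}{h}\right)(x - x_i),$$
$$-\frac{1}{nh} \sum_{i \mid y_i^c = -1} \exp\left(-\frac{\|x - x_i\|_1}{h}\right)\operatorname{sign}(x - x_i).$$
Note that the scaling factor here is proportional to $O(\frac{1}{nh})$. Therefore, to influence gradient descent with a significant mimicking effect, the value of $\lambda$ in the objective function should be chosen such that the value of $\frac{\lambda}{nh}$ is comparable to (or higher than) the range of the discriminant function $\hat{g}(x)$.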
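As a sanity check on the (sub)gradient of the density term, the following sketch implements a KDE with a Laplacian kernel exp(−‖x − x_i‖₁/h) and with an RBF kernel exp(−‖x − x_i‖₂²/h), and compares the analytical gradients against central finite differences. These kernel forms are the assumptions of the sketch, and the sample values are arbitrary.

```python
import math


def sign(v):
    """Sign of a scalar: -1, 0, or +1."""
    return (v > 0) - (v < 0)


def kde(x, samples, h, kernel="laplacian"):
    """Kernel density estimate (1/n) * sum_i k((x - x_i)/h), unnormalised."""
    total = 0.0
    for xi in samples:
        if kernel == "laplacian":
            d = sum(abs(a - b) for a, b in zip(x, xi))
        else:  # "rbf": squared Euclidean distance
            d = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-d / h)
    return total / len(samples)


def kde_grad(x, samples, h, kernel="laplacian"):
    """Analytical (sub)gradient of kde with respect to x."""
    n = len(samples)
    grad = [0.0] * len(x)
    for xi in samples:
        if kernel == "laplacian":
            w = math.exp(-sum(abs(a - b) for a, b in zip(x, xi)) / h)
            for j, (a, b) in enumerate(zip(x, xi)):
                grad[j] -= w * sign(a - b) / (n * h)
        else:
            w = math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / h)
            for j, (a, b) in enumerate(zip(x, xi)):
                grad[j] -= 2.0 * w * (a - b) / (n * h)
    return grad


def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences, used only to check kde_grad."""
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        out.append((f(xp) - f(xm)) / (2.0 * eps))
    return out


samples = [[0.0, 0.0], [1.0, 1.0]]   # made-up benign samples
x = [0.5, -0.3]                      # away from the kinks of the l1 norm
h = 2.0
errors = {}
for kern in ("laplacian", "rbf"):
    analytic = kde_grad(x, samples, h, kern)
    numeric = numerical_gradient(lambda z: kde(z, samples, h, kern), x)
    errors[kern] = max(abs(a - b) for a, b in zip(analytic, numeric))
```

Both gradients carry the 1/(nh) factor, which is why the weight of the density term has to be scaled up accordingly for it to affect the descent direction.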
Gradient Descent Attack in Discrete Spaces
In discrete spaces, gradient approaches may lead to a path through infeasible portions of the feature space. In such cases, we need to find feasible neighbors of $x$ that yield a steepest descent; i.e., maximally decreasing $F(x)$. A simple approach to this problem is to probe $F$ at every point in a small neighborhood $\mathcal{N}(x)$ of $x$: $x' \leftarrow \arg\min_{u \in \mathcal{N}(x)} F(u)$. However, this approach requires a large number of queries. For classifiers with a differentiable decision function, we can instead use the neighbor whose difference from $x$ best aligns with $\nabla F(x)$; i.e., the update becomes
$$x' \leftarrow \arg\max_{u \in \mathcal{N}(x)} \frac{(x - u)^\top}{\|x - u\|} \nabla F(x).$$
Thus, the solution to the above alignment is simply to modify a feature that satisfies $\arg\max_i |\nabla F(x)_i|$ for which the corresponding change leads to a feasible state. Note however that, sometimes, such a step may be relatively large, and may lead the attack out of a local minimum, potentially increasing the objective function. Therefore, one should consider the best alignment that effectively reduces the objective function by disregarding features that lead to states where the objective function is higher.
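One possible reading of this discrete update is sketched below. It assumes integer-valued features, a neighborhood in which a single feature changes by one unit, box constraints `lower` and `upper`, and an optional increments-only restriction; candidates that do not reduce the objective are discarded. The function name, the linear objective, and all values are hypothetical.

```python
def discrete_descent_step(x, objective, gradient, lower, upper,
                          increments_only=False):
    """Change the single feature with the largest |gradient| whose unit
    change is feasible and lowers the objective; return None if no such
    feature exists."""
    grad = gradient(x)
    f_current = objective(x)
    # Try features in order of decreasing gradient magnitude.
    order = sorted(range(len(x)), key=lambda j: abs(grad[j]), reverse=True)
    for j in order:
        if grad[j] == 0:
            break                      # remaining features cannot help
        delta = -1 if grad[j] > 0 else 1
        if increments_only and delta < 0:
            continue                   # feature removal is not allowed
        candidate = list(x)
        candidate[j] += delta
        if not (lower <= candidate[j] <= upper):
            continue                   # infeasible neighbor
        if objective(candidate) < f_current:
            return candidate
    return None


# Hypothetical linear objective F(x) = <w, x>.
w = [0.5, -2.0, 1.5]
F = lambda x: sum(wi * xi for wi, xi in zip(w, x))
dF = lambda x: w

step_a = discrete_descent_step([3, 0, 1], F, dF, lower=0, upper=5)
step_b = discrete_descent_step([3, 5, 1], F, dF, lower=0, upper=5)
step_c = discrete_descent_step([3, 5, 1], F, dF, lower=0, upper=5,
                               increments_only=True)
```

In the first call the feature with the largest-magnitude weight is incremented; once that feature is at its upper bound, the step falls back to the next feature, or returns `None` when only increments are allowed and no feasible improving move remains.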
4.3 Experiments
In this section, we first report some experimental results on the MNIST handwritten digit classification task globersonICML06 (); LeCun95 (), that visually demonstrate how the proposed algorithm modifies digits to mislead classification. This dataset is particularly useful because the visual nature of the handwritten digit data provides a semantic meaning for attacks. We then show the effectiveness of the proposed attack on a more realistic and practical scenario: the detection of malware in PDF files.
Handwritten Digits
We first focus on a two-class subproblem of discriminating between two distinct digits from the MNIST dataset LeCun95 (). Each digit example is represented as a grayscale image of $28 \times 28$ pixels arranged in raster-scan order to give feature vectors of $784$ values. We normalized each feature (pixel) $x_f$ by dividing its value by $255$, and we constrained the attack samples to this range. Accordingly, we optimized Equation (1) subject to $0 \le x_f \le 1$ for all $f$.
For our attacker, we assume the perfect knowledge (PK) attack scenario. We used the Manhattan distance ($\ell_1$-norm) as the distance function, $d(\cdot, \cdot)$, both for the kernel density estimator (i.e., a Laplacian kernel) and for the constraint of Equation (1), which bounds the total difference between the gray-level values of the original image $x^0$ and the attack image $x$. We used an upper bound of to limit the total change in the gray-level values to . At each iteration, we increased the $\ell_1$-norm value of $x - x^0$ by , which is equivalent to increasing the difference in the gray-level values by . This is effectively the gradient step size.
For the digit discrimination task, we applied an SVM with the linear kernel and . We randomly chose training samples and applied the attacks to a correctly classified positive sample.
In Figure 5 we illustrate gradient attacks in which a “3” is to be misclassified as a “7”. The left image shows the initial attack point, the middle image shows the first attack image misclassified as legitimate, and the right image shows the attack point after 500 iterations. When $\lambda = 0$, the attack images exhibit only a weak resemblance to the target class “7” but are, nevertheless, reliably misclassified. This is the same effect we observed in the left plot of Figure 4: the classifier is evaded by making the attack sample dissimilar to the malicious class. Conversely, when $\lambda > 0$, the attack images strongly resemble the target class because the mimicry term favors samples that are more similar to the target examples. This is the same effect illustrated in the rightmost plot of Figure 4.
Malware Detection in PDF Files
We focus now on the problem of discriminating between legitimate and malicious PDF files, a popular medium for disseminating malware IBM (). PDF files are excellent vectors for malicious code, due to their flexible logical structure, which can be described by a hierarchy of interconnected objects. As a result, an attack can be easily hidden in a PDF to circumvent file-type filtering. The PDF format further allows a wide variety of resources to be embedded in the document including JavaScript, Flash, and even binary programs. The type of the embedded object is specified by keywords, and its content is in a data stream. Several recent works proposed machine-learning techniques for detecting malicious PDFs that use the file’s logical structure to accurately identify the malware maiorca (); Smutz (); Srndic (). In this case study, we use the feature representation of Maiorca et al. maiorca () in which each feature corresponds to the tally of occurrences of a given keyword in the PDF file. Similar feature representations were also exploited in Smutz (); Srndic ().
The PDF application imposes natural constraints on attacks. Although it is difficult to remove an embedded object (and its corresponding keywords) without corrupting the PDF’s file structure, it is rather easy to insert new objects (and, thus, keywords) through the addition of a new version to the PDF file Refer (). In our feature representation, this is equivalent to allowing only feature increments; i.e., requiring $x^0 \le x$ as an additional constraint in the optimization problem given by Equation (1). Further, the total difference in keyword counts between two samples is their Manhattan distance, which we again use for the kernel density estimator and the constraint in Equation (1). Accordingly, $d_{\max}$ is the maximum number of additional keywords that an attacker can add to the original $x^0$.
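These constraints are simple to check for a candidate attack point. The helper below is a hypothetical illustration, assuming keyword-count feature vectors: it verifies that no keyword count was decreased and that the number of added keywords (the Manhattan distance from the original sample) stays within the budget. The vectors are made up.

```python
def added_keywords(x, x0):
    """Manhattan distance between the attack point and the original sample."""
    return sum(abs(xi - x0i) for xi, x0i in zip(x, x0))


def is_feasible(x, x0, d_max):
    """True if x only adds keywords to x0 and adds at most d_max of them."""
    only_increments = all(xi >= x0i for xi, x0i in zip(x, x0))
    return only_increments and added_keywords(x, x0) <= d_max


x_orig = [1, 0, 2]           # made-up keyword counts of a malicious PDF

ok = is_feasible([1, 2, 2], x_orig, d_max=3)        # two keywords added
removed = is_feasible([0, 0, 2], x_orig, d_max=3)   # a keyword was removed
too_many = is_feasible([3, 2, 2], x_orig, d_max=3)  # four keywords added
```

Only the first candidate is feasible; the second violates the increments-only constraint and the third exceeds the modification budget.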
Experimental setup. For experiments, we used a PDF corpus with malicious samples from the Contagio dataset (http://contagiodump.blogspot.it) and benign samples collected from the web. We randomly split the data into five pairs of training and testing sets with samples each to average the final results. The features (keywords) were extracted from each training set as described in maiorca (); on average, keywords were found in each run. Further, we also bounded the maximum value of each feature (keyword count) to , as this value was found to be close to the percentile for each feature. This limited the influence of outlying samples having very high feature values.
We simulated the perfect knowledge (PK) and the limited knowledge (LK) scenarios described in Section 4.1. In the LK case, we set the number of samples used to learn the surrogate classifier to . The reason is to demonstrate that even with a dataset as small as the of the original training set size, the adversary may be able to evade the targeted classifier with high reliability. Further, we assumed that the adversary uses feedback from the targeted classifier ; i.e., the labels for each surrogate sample . Similar results were also obtained using the true labels (without relabeling), since the targeted classifiers correctly classified almost all samples in the test set.
As discussed in Section 4.2, the value of $\lambda$ is chosen according to the scale of the discriminant function $g(x)$, the bandwidth parameter $h$ of the kernel density estimator, and the number $n$ of samples labeled as legitimate in the surrogate training set. For computational reasons, to estimate the value of the KDE at a given point $x$ in the feature space, we only consider the nearest (legitimate) training samples to $x$; therefore in our case. The bandwidth parameter $h$ was set to , as this value provided a proper rescaling of the Manhattan distances observed in our dataset for the KDE. We thus set to be comparable with .
For each targeted classifier and training/testing pair, we learned five different surrogate classifiers by randomly selecting samples from the test set, and averaged their results. For SVMs, we sought a surrogate classifier that would correctly match the labels from the targeted classifier; thus, we used parameters , and (for the RBF kernel) to heavily penalize training errors.
Experimental results. We report our results in Figure 6, in terms of the false negative (FN) rate attained by the targeted classifiers as a function of the maximum allowable number of modifications, $d_{\max}$. We compute the FN rate corresponding to a fixed false positive (FP) rate of FP. For $d_{\max} = 0$, the FN rate corresponds to a standard performance evaluation using unmodified PDFs. As expected, the FN rate increases with $d_{\max}$ as the PDF is increasingly modified, since the adversary has more flexibility in her attack. Accordingly, a more secure classifier will exhibit a more graceful increase of the FN rate.
Results for $\lambda = 0$. We first investigate the effect of the proposed attack in the PK case, without considering the mimicry component (Figure 6, top row), for varying parameters of the considered classifiers. The linear SVM (Figure 6, top-left plot) is almost always evaded with as few as to modifications, independent of the regularization parameter $C$. It is worth noting that attacking a linear classifier amounts to always incrementing the value of the same highest-weighted feature (corresponding to the /Linearized keyword in the majority of the cases) until it is bounded. This continues with the next highest-weighted non-bounded feature until termination. This occurs simply because the gradient of $g(x)$ does not depend on $x$ for a linear classifier (see Section 4.2). With the RBF kernel (Figure 6, top-right plot), SVMs exhibit a similar behavior with and various values of its $\gamma$ parameter, and the RBF SVM provides a higher degree of security compared to linear SVMs (cf. top-left plot and middle-left plot in Figure 6). (We also conducted experiments using and , but did not find significant differences compared to the presented results using .)
In the LK case, without mimicry (Figure 6, middle row), classifiers are evaded with a probability only slightly lower than that found in the PK case, even when only surrogate samples are used to learn the surrogate classifier. This aspect highlights the threat posed by a skilled adversary with incomplete knowledge: only a small set of samples may be required to successfully attack the target classifier using the proposed algorithm.
Results for $\lambda > 0$. When mimicry is used (Figure 6, bottom row), the success of the evasion of linear SVMs (with ) decreases both in the PK (e.g., compare the blue curve in the top-left plot with the solid blue curve in the bottom-left plot) and LK case (e.g., compare the blue curve in the middle-left plot with the dashed blue curve in the bottom-left plot). The reason is that the computed direction tends to lead to a slower descent; i.e., a less direct path that often requires more modifications to evade the classifier. In the nonlinear case (Figure 6, bottom-right plot), instead, mimicking exhibits some beneficial aspects for the attacker, although the constraint on feature addition may make it difficult to properly mimic legitimate samples. In particular, note how the targeted SVMs with RBF kernel (with and ) in the PK case (e.g., compare the blue curve in the top-right plot with the solid blue curve in the bottom-right plot) are evaded with a significantly higher probability than in the case when $\lambda = 0$. The reason is that a pure descent strategy on $g(x)$ may find local minima (i.e., attack samples) that do not evade detection, while the mimicry component biases the descent towards regions of the feature space more densely populated by legitimate samples, where $g(x)$ eventually attains lower values. In the LK case (e.g., compare the blue curve in the middle-right plot with the dashed blue curve in the bottom-right plot), however, mimicking does not exhibit significant improvements.
Analysis. Our attacks raise questions about the feasibility of detecting malicious PDFs solely based on logical structure. We found that /Linearized, /OpenAction, /Comment, /Root and /PageLayout were among the most commonly manipulated keywords. They indeed are found mainly in legitimate PDFs, but can be easily added to malicious PDFs by the versioning mechanism. The attacker can simply insert comments inside the malicious PDF file to augment its /Comment count. Similarly, she can embed legitimate OpenAction code to add /OpenAction keywords or she can add new pages to insert /PageLayout keywords.
In summary, our analysis shows that even detection systems that accurately classify non-malicious data can be significantly degraded with only a few malicious modifications. This aspect highlights the importance of developing detection systems that are accurate, but also designed to be robust against adversarial manipulation of attack instances.
4.4 Discussion
In this section, we proposed a simple algorithm that allows for evasion of SVMs with differentiable kernels, and, more generally, of any classifier with a differentiable discriminant function. We investigated the attack effectiveness in the case of perfect knowledge of the attacked system. Further, we empirically showed that SVMs can still be evaded with high probability even if the adversary can only learn a classifier’s copy on surrogate data (limited knowledge). We believe that the proposed attack formulation can easily be extended to classifiers with nondifferentiable discriminant functions as well, such as decision trees and nearest neighbors.
Our analysis also suggests some ideas for improving classifier security. In particular, when the classifier tightly encloses the legitimate samples, the adversary must increasingly mimic the legitimate class to evade (see Figure 4), and this may not always be possible; e.g., malicious network packets or PDF files still need to embed a valid exploit, and some features may be immutable. Accordingly, a guideline for designing secure classifiers is that learning should encourage a tight enclosure of the legitimate class; e.g., by using a regularizer that penalizes classifying “blind spots”—regions with low —as legitimate. Generative classifiers can be modified, by explicitly modeling the attack distribution, as in biggio11smc (), and discriminative classifiers can be modified similarly by adding generated attack samples to the training set. However, these security improvements may incur higher FP rates.
In the above applications, the feature representations were invertible; i.e., there is a direct mapping from the feature vectors to a corresponding real-world sample (e.g., a spam email, or PDF file). However, some feature mappings cannot be trivially inverted; e.g., $n$-gram analysis fogla06 (). In these cases, one may modify the real-world object corresponding to the initial attack point at each step of the gradient descent to obtain a sample in the feature space that is as close as possible to the sample that would be obtained at the next attack iteration. A similar technique has already been exploited to address the preimage problem of kernel methods biggio12icml ().
Other interesting extensions include (i) considering more effective strategies, such as those proposed by lowd05 (); nelson12jmlr (), to build a small but representative set of surrogate data to learn the surrogate classifier and (ii) improving the classifier estimate; e.g., by using an ensemble technique like bagging to average several classifiers breiman96bagging ().
5 Poisoning Attacks against SVMs
In the previous section, we devised a simple algorithm that allows for evasion of classifiers at test time and showed experimentally how it can be exploited to evade detection by SVMs and kernelbased classification techniques. Here we present another kind of attack, based on our work in biggio12icml (). Its goal is to force the attacked SVM to misclassify as many samples as possible at test time through poisoning of the training data, that is, by injecting wellcrafted attack samples into the training set. Note that, in this case, the test data is assumed not to be manipulated by the attacker.
Poisoning attacks are staged during classifier training, and they can thus target adaptive or online classifiers, as well as classifiers that are being retrained on data collected during test time, especially if in an unsupervised or semisupervised manner. Examples of these attacks, besides our work biggio12icml (), can be found in biggio12spr (); biggio11mcs (); biggio12icb (); kloft07 (); kloft12b (); nelson08 (); rubinstein09 (). They include specific application examples in different areas, such as intrusion detection in computer networks biggio11mcs (); kloft07 (); kloft12b (); rubinstein09 (), spam filtering biggio11mcs (); nelson08 (), and, most recently, even biometric authentication biggio12icb (); biggio12spr ().
In this section, we follow the same structure as Section 4. In Section 5.1, we define the adversary model according to our framework and derive the optimal poisoning attack; in Section 5.2, we present the corresponding algorithm; and, finally, in Sections 5.3 and 5.4 we report our experimental findings and discuss the results.
5.1 Modeling the Adversary
Here, we apply our framework to evaluate security against poisoning attacks. As with the evasion attacks in Section 4.1, we model the attack scenario and derive the corresponding optimal attack strategy for poisoning.
Notation. In the following, we assume that an SVM has been trained on a dataset $\mathcal{D}_{\rm tr} = \{x_i, y_i\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$. The matrix of kernel values between two sets of points is denoted with $K$, while $Q = K \circ y y^\top$ denotes its label-annotated version, and $\alpha$ denotes the SVM’s dual variables corresponding to each training point. Depending on the value of $\alpha_i$, the training points are referred to as margin support vectors ($0 < \alpha_i < C$, set $\mathcal{S}$), error support vectors ($\alpha_i = C$, set $\mathcal{E}$) or reserve vectors ($\alpha_i = 0$, set $\mathcal{R}$). In the sequel, the lowercase letters $s, e, r$ are used to index the corresponding parts of vectors or matrices; e.g., $Q_{ss}$ denotes the submatrix of $Q$ corresponding to the margin support vectors.
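The three-way split of training points can be illustrated with a minimal sketch. It assumes the standard soft-margin conventions (dual variable strictly between 0 and C for margin support vectors, equal to C for error support vectors, and equal to 0 for reserve vectors); the function name, the tolerance `tol`, and the example values are hypothetical and not taken from the chapter.

```python
def partition_support_vectors(alpha, C, tol=1e-8):
    """Split training indices by the value of their dual variable.

    Returns (S, E, R): margin support vectors, error support vectors,
    and reserve vectors. `tol` is an assumed numerical tolerance.
    """
    S, E, R = [], [], []
    for i, a in enumerate(alpha):
        if a <= tol:
            R.append(i)      # alpha_i = 0: reserve vector
        elif a >= C - tol:
            E.append(i)      # alpha_i = C: error support vector
        else:
            S.append(i)      # 0 < alpha_i < C: margin support vector
    return S, E, R


# Hypothetical dual variables for five training points, with C = 1.
alpha = [0.0, 0.3, 1.0, 0.0, 0.75]
S, E, R = partition_support_vectors(alpha, C=1.0)
print(S, E, R)  # [1, 4] [2] [0, 3]
```

The index lists returned here play the role of the lowercase indices used to select submatrices of the label-annotated kernel matrix.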
Adversary’s Goal
For a poisoning attack, the attacker’s goal is to find a set of points whose addition to $\mathcal{D}_{\rm tr}$ maximally decreases the SVM’s classification accuracy. For simplicity, we start by considering the addition of a single attack point $(x, y)$. The choice of its label $y$ is arbitrary but fixed. We refer to the class of this chosen label as the attacking class and to the other as the attacked class.
Adversary’s Knowledge
According to Section 3.1, we assume that the adversary knows the training samples (k.i), the feature representation (k.ii), that an SVM learning algorithm is used (k.iii) and the learned SVM’s parameters (k.iv), since they can be inferred by the adversary by solving the SVM learning problem on the known training set. Finally, we assume that no feedback is exploited by the adversary (k.v).
These assumptions amount to considering a worstcase analysis that allows us to compute the maximum error rate that the adversary can inflict through poisoning. This is indeed useful to check whether and under what circumstances poisoning may be a relevant threat for SVMs.
Although having perfect knowledge of the training data is very difficult in practice for an adversary, collecting a surrogate dataset sampled from the same distribution may not be that complicated; for instance, in network intrusion detection an attacker may easily sniff network packets to build a surrogate learning model, which can then be poisoned under the perfect knowledge setting. The analysis of this limited knowledge poisoning scenario is however left to future work.
Adversary’s Capability
According to Section 3.1, we assume that the attacker can manipulate only training data (c.i), can manipulate the class prior and the class-conditional distribution of the attack point’s class by essentially adding a number of attack points of that class into the training data, one at a time (c.ii-iii), and can alter the feature values of the attack sample within some lower and upper bounds (c.iv). In particular, we will constrain the attack point to lie within a box, that is, $x_{\rm lb} \preceq x \preceq x_{\rm ub}$.
Attack Strategy
Under the above assumptions, the optimal attack strategy amounts to solving the following optimization problem:
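$$\max_{x} \; P(x) = \sum_{k=1}^{m} \left(1 - y_k f_x(x_k)\right)_+ \qquad (2)$$
$$\text{s.t.} \quad x_{\rm lb} \preceq x \preceq x_{\rm ub}, \qquad (3)$$
where $f_x$ denotes the SVM learned on the training set augmented with the attack point, $(\cdot)_+ = \max(0, \cdot)$, and the hinge loss has to be maximized on a separate validation set $\mathcal{D}_{\rm val} = \{x_k, y_k\}_{k=1}^{m}$ to avoid considering a further regularization term in the objective function. The reason is that the attacker aims to maximize the SVM generalization error and not only its empirical estimate on the training data.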
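As a rough illustration of the attacker's objective, the following sketch evaluates the summed hinge loss of a fixed classifier on a held-out validation set. The helper names and the toy linear classifier are assumptions made for this example only; in the actual attack the classifier would be the SVM retrained on the poisoned training set.

```python
def hinge_validation_loss(decision_fn, X_val, y_val):
    """Sum of hinge losses max(0, 1 - y_k * f(x_k)) over a validation set."""
    return sum(max(0.0, 1.0 - y * decision_fn(x)) for x, y in zip(X_val, y_val))


def make_linear_classifier(w, b):
    """Return f(x) = <w, x> + b (a hypothetical stand-in for a trained SVM)."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


f = make_linear_classifier(w=[1.0, 0.0], b=0.0)
X_val = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y_val = [+1, +1, +1]

# Per-point losses: 0 (outside the margin), 0.5 (inside the margin), 2 (misclassified).
loss = hinge_validation_loss(f, X_val, y_val)
print(loss)  # 2.5
```

Only points inside the margin or on the wrong side of the boundary contribute, which is why the gradient derivation later restricts attention to those validation points.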
5.2 Poisoning Attack Algorithm
In this section, we assume the role of the attacker and develop a method for optimizing the attack point $x$ according to Equation (2). Since the objective function is nonlinear, we use a gradient-ascent algorithm, where the attack vector is initialized by cloning an arbitrary point from the attacked class and flipping its label. This initialized attack point (at iteration 0) is denoted by $x^{(0)}$. In principle, $x^{(0)}$ can be any point sufficiently deep within the attacking class’s margin. However, if this point is too close to the boundary of the attacking class, the iteratively adjusted attack point may become a reserve point, which halts further progress.
The computation of the gradient of the validation error crucially depends on the assumption that the structure of the sets $\mathcal{S}$, $\mathcal{E}$ and $\mathcal{R}$ does not change during the update. In general, it is difficult to determine the largest step $t$ along the gradient direction $u$ that preserves this structure. Hence, the step $t$ is fixed to a small constant value in our algorithm. After each update of the attack point, the optimal solution can be efficiently recomputed from the solution on $\mathcal{D}_{\rm tr}$, using the incremental SVM machinery cauwenberghs00 (). The algorithm terminates when the change in the validation error is smaller than a predefined threshold.
Gradient Computation
We now discuss how to compute the gradient of our objective function. For notational convenience, we now refer to the attack point as $x_c$ instead of $x$.
First, we explicitly account for all terms in the margin conditions $g_k = y_k f_{x_c}(x_k) - 1$ that are affected by the attack point $x_c$ (so that the hinge loss on the validation point $x_k$ equals $(-g_k)_+$):
$$g_k = \sum_{j} Q_{kj}\,\alpha_j + y_k b - 1 = \sum_{j \neq c} Q_{kj}\,\alpha_j(x_c) + Q_{kc}(x_c)\,\alpha_c(x_c) + y_k b(x_c) - 1. \qquad (4)$$
As already mentioned, $P(x_c)$ is a nonconvex objective function, and we thus exploit a gradient ascent technique to iteratively optimize it. We denote the initial location of the attack point as $x_c^{(0)}$. Our goal is to update the attack point as $x_c^{(p)} = x_c^{(p-1)} + t u$, where $p$ is the current iteration, $u$ is a unit vector representing the attack direction (i.e., the normalized objective gradient), and $t$ is the step size. To maximize our objective, the attack direction $u$ is computed at each iteration.
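The update rule and stopping criterion described above can be sketched as a generic box-constrained ascent loop. This is only a schematic under stated assumptions: the gradient is approximated by central finite differences rather than by the closed-form expression derived in this section, no SVM is retrained, and the objective `toy_objective`, the function names, and all default parameters are hypothetical.

```python
import math


def numerical_gradient(objective, x, eps=1e-6):
    """Central finite-difference gradient (a stand-in for the analytic gradient)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up) - objective(down)) / (2 * eps))
    return grad


def box_constrained_ascent(objective, x0, lower, upper,
                           step=0.05, tol=1e-6, max_iter=1000):
    """Fixed-step ascent along the normalized gradient, clipped to a box.

    Stops when the objective changes by less than `tol` between iterations.
    """
    x = list(x0)
    value = objective(x)
    for _ in range(max_iter):
        grad = numerical_gradient(objective, x)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break
        # Move by `step` along the unit direction, then project onto the box.
        candidate = [min(max(xi + step * g / norm, lo), hi)
                     for xi, g, lo, hi in zip(x, grad, lower, upper)]
        new_value = objective(candidate)
        change = abs(new_value - value)
        x, value = candidate, new_value
        if change < tol:
            break
    return x, value


def toy_objective(x):
    """Hypothetical smooth surface whose unconstrained maximum, (2, -1), lies outside the box."""
    return -(x[0] - 2.0) ** 2 - (x[1] + 1.0) ** 2


x_final, p_final = box_constrained_ascent(
    toy_objective, x0=[0.0, 0.0], lower=[-1.0, -1.0], upper=[1.0, 1.0])
```

In this toy run the iterate is driven to the boundary of the feasible box, which mirrors the behaviour reported for unbounded error surfaces: the box constraint, not the objective, determines where the attack point ends up.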
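Although the hinge loss is not everywhere differentiable, this can be overcome by only considering point indices $k$ with nonzero contributions to $P$; i.e., those for which $-g_k > 0$. Contributions of such points to $\nabla P$ can be computed by differentiating Equation (4) with respect to $x_c$ using the product rule:
$$\frac{\partial g_k}{\partial x_c} = Q_{ks}\,\frac{\partial \alpha}{\partial x_c} + \frac{\partial Q_{kc}}{\partial x_c}\,\alpha_c + y_k\,\frac{\partial b}{\partial x_c}, \qquad (5)$$
where, by denoting the $l$-th feature of $x_c$ as $x_{cl}$, we use the notation $\partial \alpha / \partial x_c$ for the matrix with entries $\partial \alpha_i / \partial x_{cl}$ ($i \in \mathcal{S}$, $l = 1, \ldots, d$), and $\partial Q_{kc} / \partial x_c$ and $\partial b / \partial x_c$ for the corresponding row vectors with entries $\partial Q_{kc} / \partial x_{cl}$ and $\partial b / \partial x_{cl}$.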
The expressions for the gradient can be further refined using the fact that the gradient step must preserve the optimal SVM solution. This can be expressed as an adiabatic update condition using the technique introduced in cauwenberghs00 (). In particular, for the $i$-th training point, the KKT conditions of the optimal SVM solution are:
$$g_i = \sum_{j \in \mathcal{D}_{\rm tr}} Q_{ij}\,\alpha_j + y_i b - 1 \; \begin{cases} > 0, & i \in \mathcal{R} \\ = 0, & i \in \mathcal{S} \\ < 0, & i \in \mathcal{E} \end{cases} \qquad (6)$$
$$h = \sum_{j \in \mathcal{D}_{\rm tr}} y_j\,\alpha_j = 0. \qquad (7)$$
The form of these conditions implies that an infinitesimal change in the attack point $x_c$ causes a smooth change in the optimal solution of the SVM, under the restriction that the composition of the sets $\mathcal{S}$, $\mathcal{E}$ and $\mathcal{R}$ remains intact. This equilibrium allows us to predict the response of the SVM solution to the variation of $x_c$, as shown below.
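By differentiation of the $x_c$-dependent terms in Equations (6)–(7) with respect to each feature $x_{cl}$ ($l = 1, \ldots, d$), we obtain, for any $i \in \mathcal{S}$,
$$\frac{\partial g}{\partial x_{cl}} = Q_{ss}\,\frac{\partial \alpha}{\partial x_{cl}} + \frac{\partial Q_{sc}}{\partial x_{cl}}\,\alpha_c + y_s\,\frac{\partial b}{\partial x_{cl}} = 0, \qquad \frac{\partial h}{\partial x_{cl}} = y_s^\top \frac{\partial \alpha}{\partial x_{cl}} = 0.$$
Solving these equations and computing an inverse matrix via the Sherman-Morrison-Woodbury formula Lue96 () yields the following gradients:
$$\frac{\partial \alpha}{\partial x_c} = -\frac{1}{\zeta}\,\alpha_c \left(\zeta\, Q_{ss}^{-1} - \upsilon \upsilon^\top\right) \frac{\partial Q_{sc}}{\partial x_c}, \qquad \frac{\partial b}{\partial x_c} = -\frac{1}{\zeta}\,\alpha_c\, \upsilon^\top \frac{\partial Q_{sc}}{\partial x_c},$$
where $\upsilon = Q_{ss}^{-1} y_s$ and $\zeta = y_s^\top Q_{ss}^{-1} y_s$. Substituting these into Equation (5) and recalling that each contributing validation point adds $-g_k$ to the objective, we obtain the following gradient of the objective used for optimizing our attack, which only depends on $x_c$ through gradients of the kernel matrix, $\partial Q_{kc} / \partial x_c$:
$$\nabla P = -\sum_{k:\, g_k < 0} \left\{ M_k\, \frac{\partial Q_{sc}}{\partial x_c} + \frac{\partial Q_{kc}}{\partial x_c} \right\} \alpha_c, \qquad (8)$$
where $M_k = -\frac{1}{\zeta} \left( Q_{ks} \left(\zeta\, Q_{ss}^{-1} - \upsilon \upsilon^\top\right) + y_k\, \upsilon^\top \right)$.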
Kernelization
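From Equation (8), we see that the gradient of the objective function at iteration $p$ may depend on the attack point $x_c^{(p)} = x_c^{(p-1)} + t u$ only through the gradients of the matrix $Q$, whose entries are $Q_{ic} = y_i y_c K_{ic}$ with $K_{ic} = K(x_i, x_c)$. In particular, this depends on the chosen kernel. We report below the expressions of these gradients for three common kernels (see Section 4.2):

Linear kernel: $\dfrac{\partial K_{ic}}{\partial x_c} = \dfrac{\partial (x_i \cdot x_c)}{\partial x_c} = x_i$

Polynomial kernel (offset $R$, degree $q$): $\dfrac{\partial K_{ic}}{\partial x_c} = \dfrac{\partial (x_i \cdot x_c + R)^{q}}{\partial x_c} = q\,(x_i \cdot x_c + R)^{q-1}\, x_i$

RBF kernel (parameter $\gamma$): $\dfrac{\partial K_{ic}}{\partial x_c} = \dfrac{\partial\, e^{-\frac{\gamma}{2} \|x_i - x_c\|^2}}{\partial x_c} = \gamma\, K(x_i, x_c)\,(x_i - x_c)$
The dependence of these gradients on the current attack point, $x_c^{(p)}$, can be avoided by using the previous attack point, provided that $t$ is sufficiently small. This approximation enables a straightforward extension of our method to arbitrary differentiable kernels.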
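The kernel gradients with respect to the attack point can be checked numerically. The sketch below assumes particular parametrizations, namely a polynomial kernel with offset `R` and degree `q` and an RBF kernel of the form exp(-gamma/2 * squared distance); these forms, the default parameter values, and all function names are assumptions of this example, so the constants would need adjusting for a different kernel definition.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(xi, xc):
    return dot(xi, xc)


def linear_grad(xi, xc):
    # d(xi . xc)/d(xc) = xi
    return list(xi)


def poly_kernel(xi, xc, R=1.0, q=3):
    return (dot(xi, xc) + R) ** q


def poly_grad(xi, xc, R=1.0, q=3):
    # d(xi . xc + R)^q / d(xc) = q (xi . xc + R)^(q-1) xi
    scale = q * (dot(xi, xc) + R) ** (q - 1)
    return [scale * v for v in xi]


def rbf_kernel(xi, xc, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xc))
    return math.exp(-0.5 * gamma * sq_dist)


def rbf_grad(xi, xc, gamma=0.5):
    # d exp(-gamma/2 ||xi - xc||^2) / d(xc) = gamma K(xi, xc) (xi - xc)
    k = rbf_kernel(xi, xc, gamma)
    return [gamma * k * (a - b) for a, b in zip(xi, xc)]


def finite_difference(kernel, xi, xc, eps=1e-6):
    """Central-difference gradient of kernel(xi, .) evaluated at xc."""
    grad = []
    for l in range(len(xc)):
        up, down = list(xc), list(xc)
        up[l] += eps
        down[l] -= eps
        grad.append((kernel(xi, up) - kernel(xi, down)) / (2 * eps))
    return grad


xi, xc = [0.5, -1.0], [0.2, 0.3]
pairs = {
    "linear": (linear_kernel, linear_grad),
    "polynomial": (poly_kernel, poly_grad),
    "rbf": (rbf_kernel, rbf_grad),
}
max_errors = {
    name: max(abs(a - n) for a, n in zip(grad(xi, xc), finite_difference(k, xi, xc)))
    for name, (k, grad) in pairs.items()
}
```

Each entry of `max_errors` is the largest absolute gap between the analytic and the finite-difference gradient; values near machine precision indicate that the closed forms match the chosen kernel definitions.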
5.3 Experiments
The experimental evaluation presented in the following sections demonstrates the behavior of our proposed method on an artificial twodimensional dataset and evaluates its effectiveness on the classical MNIST handwritten digit recognition dataset globersonICML06 (); LeCun95 ().
Twodimensional Toy Example
Here we consider a twodimensional example in which each class follows a Gaussian with mean and covariance matrices given by , , . The points from the negative distribution have label (shown as red in subsequent figures) and otherwise (shown as blue). The training and the validation sets, and , consist of and points per class, respectively.
In the experiment presented below, the red class is the attacking class. That is, a random point of the blue class is selected and its label is flipped to serve as the starting point for our method. Our gradient ascent method is then used to refine this attack until its termination condition is satisfied. The attack’s trajectory is traced as the black line in Figure 7 for both the linear kernel (upper two plots) and the RBF kernel (lower two plots). The background of each plot depicts an error surface: hinge loss computed on a validation set (leftmost plots) and the classification error (rightmost plots). For the linear kernel, the range of attack points is limited to the box shown as a dashed line. This implements the constraint of Equation (3).
For both kernels, these plots show that our gradient ascent algorithm finds a reasonably good local maximum of the nonconvex error surface. For the linear kernel, it terminates at the corner of the bounded region, since the error surface is unbounded. For the RBF kernel, it also finds a good local maximum of the hinge loss which, incidentally, is the maximum classification error within this area of interest.
Handwritten Digits
We now quantitatively validate the effectiveness of the proposed attack strategy on the MNIST handwritten digit classification task globersonICML06 (); LeCun95 (), as with the evasion attacks in Section 4.3. In particular, we focus here on the following two-class subproblems: 7 vs. 1; 9 vs. 8; 4 vs. 0. Each digit is normalized as described in Section 4.3. We consider again a linear SVM with . We randomly sample training and validation sets of 100 and 500 samples, respectively, and retain the complete testing data given by MNIST as the test set. Although it varies for each digit, the size of the testing data is about 2000 samples per class (digit).
Figure 8 shows the effect of single attack points being optimized by our gradient ascent method. The leftmost plots of each row show the examples of the attacked class used as starting points in our algorithm. The middle plots show the final attack point. The rightmost plots depict the increase in the validation and testing errors as the attack progresses. For this experiment we run the attack algorithm five times, reinitializing the gradient ascent procedure each time, and we retain the best result.
Visualizing the attack points reveals that these attacks succeed by blurring the initial prototype to appear more like examples of the attacking class. In comparing the initial and final attack points, we see that the bottom segment of the 7 straightens to resemble a 1, the lower segment of the 9 is rounded to mimic an 8, and ovular noise is added to the outer boundary of the 4 to make it similar to a 0. These blurred images are thus consistent with one’s natural notion of visually confusing digits.
The rightmost plots further demonstrate a striking increase in error over the course of the attack. In general, the validation error overestimates the classification error due to a smaller sample size. Nonetheless, in the exemplary runs reported in this experiment, a single attack data point caused the classification error to rise from initial error rates of 2–5% to 15–20%. Since our initial attack points are obtained by flipping the label of a point in the attacked class, the errors in the first iteration of the rightmost plots of Figure 8 are caused by single random label flips. This confirms that our attack can achieve significantly higher error rates than random label flips, and underscores the vulnerability of the SVM to poisoning attacks.
The latter point is further illustrated in a multiple point, multiple run experiment presented in Figure 9. For this experiment, the attack was extended by repeatedly injecting attack points into the same class and averaging results over multiple runs on randomly chosen training and validation sets of the same size (100 and 500 samples, respectively). These results exhibit a steady rise in classification error as the percentage of attack points in the training set increases. The variance of the error is quite high, which can be attributed to the relatively small sizes of the training and validation sets. Also note that, in this experiment, to reach an error rate of 15–20%, the adversary needs to control at least 4–6% of the training data, unlike in the single point attacks of Figure 8. This is because Figure 8 displays the best single point attack from five restarts whereas here initial points are selected without restarts.
5.4 Discussion
The poisoning attack presented in this section, summarized from our previous work in biggio12icml (), is a first step toward the security analysis of SVM against training data attacks. Although our gradient ascent method is not optimal, it attains a surprisingly large impact on the SVM’s classification accuracy.
Several potential improvements to the presented method remain to be explored in future work. For instance, one may investigate the effectiveness of such an attack with surrogate data, that is, when the training data is not known to the adversary, who may however collect samples drawn from the same distribution to learn a classifier’s copy (similarly to the limited knowledge case considered in the evasion attacks of Section 4). Another improvement may be to consider the simultaneous optimization of multipoint attacks, although we have already demonstrated that greedy, sequential singlepoint attacks may be rather successful.
An interesting analysis of the SVM’s vulnerability to poisoning suggested from this work is to consider the attack’s impact under loss functions other than the hinge loss. It would be especially interesting to analyze bounded loss functions, like the ramp loss, since such losses are designed to limit the impact of any single (attack) point on the outcome. On the other hand, while these losses may lead to improved security to poisoning, they also make the SVM’s optimization problem nonconvex, and, thus, more computationally demanding. This may be viewed as another tradeoff between computational complexity of the learning algorithm and security.
An important practical limitation of the proposed method is the assumption that the attacker controls the labels of injected points. Such assumptions may not hold if the labels are assigned by trusted sources such as humans, e.g., antispam filters use their users’ labeling of messages as ground truth. Thus, although an attacker can send arbitrary messages, he cannot guarantee that they will have the labels necessary for his attack. This imposes an additional requirement that the attack data must satisfy certain side constraints to fool the labeling oracle. Further work is needed to understand and incorporate these potential side constraints into attacks.
6 Privacy Attacks against SVMs
We now consider a third scenario in which the attacker’s goal is to effect a breach of the training data’s confidentiality. We review our recent work PrivateSVM () deriving mechanisms for releasing SVM classifiers trained on privacy-sensitive data while maintaining the data’s privacy. Unlike previous sections, our focus here lies primarily in the study of countermeasures, while we only briefly consider attacks in the context of lower bounds. We adopt the formal framework of Dwork et al. DiffPrivacy (), in which a randomized mechanism is said to preserve $\beta$-differential privacy if the likelihood of the mechanism’s output changes by at most a factor of $e^{\beta}$ when a training datum is changed arbitrarily (or even removed). The power of this framework, which gained near-universal favor after its introduction, is that it quantifies privacy in a rigorous way and provides strong guarantees even against powerful adversaries with knowledge of almost all of the training data, knowledge of the mechanism (barring its source of randomness), arbitrary access to the classifier output by the mechanism, and the ability to manipulate almost all training data prior to learning.
This section is organized as follows. In Section 6.1 we outline our model of the adversary, which makes only weak assumptions. Section 6.2 provides background on differential privacy, presents a mechanism for training and releasing privacypreserving SVMs—essentially a countermeasure to many privacy attacks—and provides guarantees on differential privacy and also utility (e.g., controlling the classifier’s accuracy). We then briefly touch on existing approaches for evaluation via lower bounds and discuss other work and open problems in Section 6.3.
6.1 Modeling the Adversary
We first apply our framework to define the threat model for defending against privacy attacks within the broader context of differential privacy. We then focus on specific countermeasures in the form of modifications to SVM learning that provide differential privacy.
Adversary’s Goal
The ultimate goal of the attacker in this section is to determine features and/or the label of an individual training datum. The overall approach of the adversary towards this goal is to inspect (arbitrary numbers of) test-time classifications made by a released classifier trained on the data, or to inspect the classifier directly. The definition of differential privacy, and the particular mechanisms derived here, can be modified for related goals of determining properties of several training data; we focus on the above conventional case without loss of generality.
Adversary’s Knowledge
As alluded to above, we endow our adversary with significant knowledge of the learning system, so as to derive countermeasures that can withstand very strong attacks. Indeed the notion of differential privacy, as opposed to more syntactic notions of privacy such as $k$-anonymity kanony (), was inspired by decades-old work in cryptography that introduced mathematical formalism to an age-old problem, yielding significant practical success. Specifically, we consider a scenario in which the adversary has complete knowledge of the raw input feature representation, the learning algorithm (the entire mechanism including the form of randomization it introduces, although not the source of randomness) and the form of its decision function (in this case, a thresholded SVM), the learned classifier’s parameters (the kernel/feature mapping, primal weight vector, and bias term), and arbitrary/unlimited feedback from the deployed classifier (k.ii-v). We grant the attacker near-complete knowledge of the training set (k.i): the attacker may have complete knowledge of all but one training datum, for which she has no knowledge of input feature values or its training label, and it is these attributes she wishes to reveal. For simplicity of exposition, but without loss of generality, we assume this to be the last datum in the training sample.
Adversary’s Capability
Like our assumptions on the attacker’s knowledge, we impose weak limitations on the adversary’s capability. We assume an adversary that can manipulate both training and test data (c.i), although the latter is subsumed by the attacker’s complete knowledge of the decision function and learned parameters—e.g., she may implement her own classifier and execute it arbitrarily, or she may submit or manipulate test points presented to a deployed classifier.
Our attack model makes no assumptions about the origins of the training or test data. The data need not be sampled independently or even according to a distribution; the definition of differential privacy provided below makes worst-case assumptions about the training data, and again the test data could be arbitrary. Thus the adversary may have arbitrary capability to modify class priors, training data features and labels (c.ii-iv), except that the adversary attacking the system may not directly modify the targeted training datum because she does not have knowledge of it. That said, however, differential privacy makes worst-case (no distributional) assumptions about the datum and thus one could consider even this data point as being adversarially manipulated by nature (i.e., nature does not collude with the attacker to share information about the target training datum, but may collude to facilitate a privacy breach by selecting a “convenient” target datum).
Attack Strategy
While no practical privacy attacks on SVMs have been explored in the past—an open problem discussed in Section 6.3—a general approach would be to approximate the inversion of the learning map on the released SVM parametrization (either primal weight vector, or dual variables) around the known portion of the training data. In practice this could be achieved by taking a similar approach as done in Section 5 whereby an initial guess of a missing training point is iterated on by taking gradient steps of the differential in the SVM parameter vector with respect to the missing training datum. An interpretation of this approach is one of simulation: to guess a missing training datum, given access to the remainder of the training set and the SVM solution on all the data, simulate the SVM on guesses for the missing datum, updating the guesses in directions that appropriately shift the intermediate solutions. As we discuss briefly in the sequel, theoretical lower bounds on achievable privacy relate to attacks in pathological cases.
6.2 Countermeasures with Provable Guarantees
Given an adversary with such strong knowledge and capabilities as described above, it may seem difficult to provide effective countermeasures particularly considering the complication of abundant access to side information that is often used in publicized privacy attacks breaknetflix (); kanony (). However, the crux that makes privacypreservation under these conditions possible lies in the fact that the learned quantity being released is an aggregate statistic of the sensitive data; intuitively the more data being aggregated, the less sensitive a statistic should be to changes or removal of any single datum. We now present results from our recent work that quantifies this effect PrivateSVM (), within the framework of differential privacy.
Background on Differential Privacy
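We begin by recalling the key definition due to Dwork et al. DiffPrivacy (). First, for any training set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ denote set $\mathcal{D}'$ to be a neighbor of $\mathcal{D}$ (or $\mathcal{D}' \sim \mathcal{D}$) if $\mathcal{D}' = \{(x_i, y_i)\}_{i=1}^{n-1} \cup \{(x_n', y_n')\}$ where $(x_n, y_n) \neq (x_n', y_n')$. In the present context, differential privacy is a desirable property of learning maps, which map a training set $\mathcal{D}$ to a continuous discriminant function of the form $g: \mathcal{X} \to \mathbb{R}$ (here a learned SVM) in some space of functions, $\mathcal{H}$. We say that a randomized learning map $\mathcal{L}$ preserves $\beta$-differential privacy if for all datasets $\mathcal{D}$, all neighboring sets $\mathcal{D}'$ of $\mathcal{D}$, and all possible functions $g \in \mathcal{H}$, the following relation holds
$$\Pr\left(\mathcal{L}(\mathcal{D}) = g\right) \le e^{\beta}\, \Pr\left(\mathcal{L}(\mathcal{D}') = g\right).$$
(Footnote 6: By randomized we mean that the learning map’s output is not a deterministic function of the training data. The probability in the definition of differential privacy is due to this randomness. Our treatment here is only as complex as necessary, but to be completely general, the events in the definition should be on measurable sets $G \subseteq \mathcal{H}$ rather than individual $g \in \mathcal{H}$.)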
Intuitively, if we initially fix a training set and neighboring training set, differential privacy simply says that the two resulting distributions induced on the learned functions are pointwise close—and closer for smaller . For a patient deciding whether to submit her datum to a training set for a cancer detector, differential privacy means that the learned classifier will reveal little information about that datum. Even an adversary with access to the innerworkings of the learner, with access to all other patients’ data, and with the ability to guessandsimulate the learning process repeatedly with various possible values of her datum, cannot reverse engineer her datum from the classifier released by the hospital because the adversary cannot distinguish the classifier distribution on one training set, from that on neighboring sets. Moreover, variations of this definition (which do not significantly affect the presented results) allow for neighboring databases to be defined as those missing a datum; or having several varying data, not just a single one.
For simplicity of exposition, we drop the explicit bias term from our SVM learning process and instead assume that the data feature vectors are augmented with a unit constant, and that the resulting additional normal weight component corresponds to the bias. This is an equivalent SVM formulation that allows us to focus only on the normal’s weight vector.
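A classic route to establish differential privacy is to define a randomized map $\mathcal{L}$ that returns the value of a deterministic, nonrandom map $\hat{\mathcal{L}}$ plus a noise term. Typically, we use an exponential family in a term that matches an available Lipschitz condition satisfied by $\hat{\mathcal{L}}$: in our case, for learning maps that return weight vectors in a finite-dimensional space $\mathbb{R}^{F}$, we aim to measure global sensitivity of $\hat{\mathcal{L}}$ via the $L_1$ norm as
$$\Delta(\hat{\mathcal{L}}) = \max_{\mathcal{D},\, \mathcal{D}' \sim \mathcal{D}} \left\| \hat{\mathcal{L}}(\mathcal{D}) - \hat{\mathcal{L}}(\mathcal{D}') \right\|_1.$$
With respect to this sensitivity, we can easily prove that the randomized mechanism
$$\mathcal{L}(\mathcal{D}) = \hat{\mathcal{L}}(\mathcal{D}) + \mathrm{Laplace}\left(0,\, \Delta(\hat{\mathcal{L}}) / \beta\right)$$
is $\beta$-differentially private. (Footnote 7: Recall that the zero-mean multivariate Laplace distribution with scale parameter $s$ has density proportional to $\exp(-\|x\|_1 / s)$.) The well-established proof technique DiffPrivacy () follows from the definition of the Laplace distribution involving the same norm as used in our measure of global sensitivity, and the triangle inequality: for any training set $\mathcal{D}$, $\mathcal{D}' \sim \mathcal{D}$, response $w$, and privacy parameter $\beta$,
$$\frac{\Pr(\mathcal{L}(\mathcal{D}) = w)}{\Pr(\mathcal{L}(\mathcal{D}') = w)} = \frac{\exp\left( \| \hat{\mathcal{L}}(\mathcal{D}') - w \|_1\, \beta / \Delta(\hat{\mathcal{L}}) \right)}{\exp\left( \| \hat{\mathcal{L}}(\mathcal{D}) - w \|_1\, \beta / \Delta(\hat{\mathcal{L}}) \right)} \le \exp\left( \| \hat{\mathcal{L}}(\mathcal{D}') - \hat{\mathcal{L}}(\mathcal{D}) \|_1\, \beta / \Delta(\hat{\mathcal{L}}) \right) \le \exp(\beta).$$
We take the above route to develop a differentially-private SVM. As such, the onus is on calculating the SVM’s global sensitivity, $\Delta(\hat{\mathcal{L}})$.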
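The noise-addition step can be sketched as follows, assuming the standard Laplace mechanism in which independent zero-mean Laplace noise with scale equal to the sensitivity divided by the privacy parameter is added to each coordinate of the weight vector. The function names, the symbol `beta` for the privacy parameter, and the example values are hypothetical; the sensitivity is simply passed in rather than derived.

```python
import random


def sample_laplace(scale, rng):
    """Draw one zero-mean Laplace sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(weights, sensitivity, beta, rng):
    """Return a noisy copy of `weights` with i.i.d. Laplace(0, sensitivity/beta) noise.

    `sensitivity` is assumed to bound the L1 change of the weight vector when a
    single training datum is replaced; `beta` is the privacy parameter.
    """
    if sensitivity <= 0 or beta <= 0:
        raise ValueError("sensitivity and beta must be positive")
    scale = sensitivity / beta
    return [w + sample_laplace(scale, rng) for w in weights]


rng = random.Random(0)  # fixed seed only to make the example repeatable
w_nonprivate = [0.5, -1.2, 2.0]
w_private = laplace_mechanism(w_nonprivate, sensitivity=0.1, beta=1.0, rng=rng)

# Smaller beta (stronger privacy) means a larger noise scale.
w_more_private = laplace_mechanism(w_nonprivate, sensitivity=0.1, beta=0.01, rng=rng)
```

Because independent per-coordinate Laplace noise has a joint density proportional to the exponential of the negative L1 norm, this sampling scheme matches the multivariate Laplace distribution used in the argument above.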
Global Sensitivity of Linear SVM
Unlike much prior work applying the “Laplace mechanism” to achieving differential privacy, in which studied estimators are often decomposed as linear functions of data SuLQ (), measuring the sensitivity of the SVM appears to be nontrivial owing to the nonlinear influence an individual training datum may have on the learned SVM. However, perturbations of the training data were studied by the learningtheory community in the context of algorithmic stability: there the goal is to establish bounds on classifier risk, from stability of the learning map, as opposed to leveraging combinatorial properties of the hypothesis class (e.g., the VC dimension, which is not always possible to control, and for the RBF kernel SVM is infinite) SVMbook (). In recent work PrivateSVM (), we showed how these existing stability measurements for the SVM can be adapted to provide the following global sensitivity bound.
Lemma 1
Consider SVM learning with a kernel corresponding to linear SVM in a feature space with finite dimension $F$ and norm bounded by $\kappa$, with hinge loss (as used throughout this chapter), and chosen parameter $C$. Then the global sensitivity of the resulting normal weight vector is upper-bounded by .
(Footnote 8: The norm bound means that $k(x, x) \le \kappa^2$ for all $x$; e.g., for the RBF kernel the norm is uniformly unity, $\kappa = 1$; more generally, we can make the standard assumption that the data lies within some ball.)
We omit the proof, which is available in the original paper PrivateSVM () and which follows closely the previous measurements for algorithmic stability. We note that the result extends to any convex Lipschitz loss.
DifferentiallyPrivate SVMs
So far we have established that Algorithm 3, which learns an SVM and returns the resulting weight vector with added Laplace noise, preserves differential privacy. More noise is added to the weight vector when either (i) a higher degree of privacy is desired (smaller ), (ii) the SVM fits closer to the data (higher ) or (iii) the data is more distinguishable (higher or —the curse of dimensionality). Hidden in the above is the dependence on : typically we take to scale like to achieve consistency in which case we see that noise decreases with larger training data—akin to less individual influence—as expected PrivateSVM ().
A drawback of the above approach is the loss of utility incurred by preserving differential privacy. One approach to quantifying this effect involves bounding the following notion of utility PrivateSVM (). We say a privacy-preserving learning map $\hat{f}$ has $(\epsilon, \delta)$-utility with respect to a non-private map $f$ if, for all training sets $\mathcal{D}$,
$$\Pr\left( \left\| \hat{f}(\mathcal{D}) - f(\mathcal{D}) \right\|_{\infty;\mathcal{M}} \le \epsilon \right) \ge 1 - \delta .$$
The norm here is in the function space of continuous discriminators learned by the learning maps, and is the point-wise $L_\infty$ norm, which corresponds to $\|g\|_{\infty} = \sup_{x} |g(x)|$, although for technical reasons we will restrict the supremum to be over a set $\mathcal{M}$ to be specified later. Intuitively, this indicates that the continuous predictions of the learned private classifier are close to those of the learned non-private classifier, for all test points in $\mathcal{M}$, with high probability (again, in the randomness due to the private mechanism). This definition draws parallels with PAC learnability, and in certain scenarios is strictly stronger than requiring that the private learner achieves good risk (i.e., PAC learns) PrivateSVM (). Using the Chernoff tail inequality and known moment-generating functions, we establish the following bound on the utility of this private SVM PrivateSVM ().
Theorem 6.1
The $\beta$-differentially-private SVM of Algorithm 3 achieves $(\epsilon, \delta)$-utility with respect to the non-private SVM run with the same $C$ parameter and kernel, for $0 < \delta < 1$ and
$$\epsilon \ge \frac{8 C \kappa \Phi \sqrt{F} \left( F + \log\frac{1}{\delta} \right)}{\beta},$$
where the set $\mathcal{M}$ supporting the supremum in the definition of utility is taken to be the pre-image of the feature mapping on the $L_\infty$ ball of radius $\Phi > 0$. (Above, we bounded the $L_2$ norms of points in feature space by $\kappa$; the additional bound on the $L_\infty$ norm here is for convenience and is standard practice in learning-theoretic results.)
As expected, the more confidence or privacy required, the less accuracy is attainable. Similarly, when the training data is fitted more tightly via higher $C$, or when the data is less tightly packed for higher $\kappa$, $\Phi$, or $F$, less accuracy is possible. Note that, like the privacy result, this result can hold for quite general loss functions.
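The utility bound can be checked numerically with a short Python sketch. It rests on two assumptions that should be compared against Algorithm 3 and Theorem 6.1: the noise is i.i.d. Laplace with scale $4C\kappa\sqrt{F}/\beta$, and the bound reads $\epsilon \ge 8C\kappa\Phi\sqrt{F}(F + \log(1/\delta))/\beta$. Because the private and non-private discriminants differ by the inner product of the noise vector $\mu$ with a feature vector whose $L_\infty$ norm is at most $\Phi$, the worst-case deviation is at most $\Phi \|\mu\|_1$; the simulation estimates how often this quantity exceeds $\epsilon$. The function names are hypothetical.

```python
import math
import random


def utility_epsilon(C, kappa, Phi, F, beta, delta):
    """Smallest epsilon allowed by the (assumed) utility bound."""
    return (8.0 * C * kappa * Phi * math.sqrt(F)
            * (F + math.log(1.0 / delta)) / beta)


def empirical_failure_rate(C, kappa, Phi, F, beta, delta,
                           trials=2000, seed=0):
    """Fraction of noise draws with Phi * ||mu||_1 > epsilon.

    mu has F i.i.d. Laplace coordinates with the assumed scale
    4 * C * kappa * sqrt(F) / beta; each is sampled as the difference
    of two exponential variables with mean `scale`.
    """
    rng = random.Random(seed)
    scale = 4.0 * C * kappa * math.sqrt(F) / beta
    eps = utility_epsilon(C, kappa, Phi, F, beta, delta)
    failures = 0
    for _ in range(trials):
        l1_norm = sum(
            abs(rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale))
            for _ in range(F)
        )
        if Phi * l1_norm > eps:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    params = dict(C=0.01, kappa=1.0, Phi=1.0, F=5, beta=1.0, delta=0.1)
    print("epsilon:", utility_epsilon(**params))
    print("empirical failure rate:", empirical_failure_rate(**params))
```

If the assumptions hold, the empirical failure rate should stay below `delta`; in practice it is typically far smaller, reflecting the slack in the Chernoff argument.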
6.3 Discussion
In this section, we have provided a summary of our recent results on strong countermeasures to privacy attacks on the SVM. We have shown how, through controlled addition of noise, SVM learning in finite-dimensional feature spaces can provide both privacy and utility guarantees. These results can be extended to certain translation-invariant kernels, including the infinite-dimensional RBF kernel PrivateSVM (). This extension borrows a technique from large-scale learning, where finding a dual solution of the SVM for large training data size is infeasible. Instead, a primal SVM problem is solved using a random kernel that uniformly approximates the desired kernel. Since the approximating kernel induces a feature mapping of relatively small, finite dimension, the primal solution becomes feasible. For privacy preservation, we use the same primal approach but with this new kernel. Fortunately, the distribution of the approximating kernel is independent of the training data, and thus we can reveal the approximating kernel without sacrificing privacy. Likewise, the uniform approximation of the kernel composes with the utility result here to yield an analogous utility guarantee for translation-invariant kernels.
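The random-kernel idea can be illustrated with random Fourier features for the RBF kernel. The sketch below is a minimal example under stated assumptions, not the construction of PrivateSVM () itself: it assumes the RBF kernel is parameterized as $k(x, y) = \exp(-\gamma \|x - y\|^2)$, draws frequencies from a Gaussian with per-coordinate variance $2\gamma$, and uses paired cosine and sine features so that every feature vector has unit $L_2$ norm. The function names are hypothetical. Note that the frequencies are sampled without looking at any training data, which is why they can be released.

```python
import math
import random


def sample_frequencies(dim, n_freq, gamma, rng):
    """Draw n_freq random frequency vectors, independent of any data.

    For k(x, y) = exp(-gamma * ||x - y||^2), the frequencies follow a
    zero-mean Gaussian with variance 2 * gamma in each coordinate.
    """
    std = math.sqrt(2.0 * gamma)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_freq)]


def random_features(x, freqs):
    """Map x to 2 * n_freq features whose dot products approximate k."""
    scale = 1.0 / math.sqrt(len(freqs))
    proj = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in freqs]
    return ([scale * math.cos(p) for p in proj]
            + [scale * math.sin(p) for p in proj])


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    rng = random.Random(0)
    gamma = 0.5
    freqs = sample_frequencies(dim=3, n_freq=2000, gamma=gamma, rng=rng)
    x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
    zx, zy = random_features(x, freqs), random_features(y, freqs)
    print("approximate:", sum(a * b for a, b in zip(zx, zy)))
    print("exact:      ", rbf_kernel(x, y, gamma))
```

A linear SVM trained on `random_features(x, freqs)` then lives in a feature space of finite dimension `2 * n_freq`, to which a noise-addition mechanism of the kind discussed in this section could be applied.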
While we demonstrated here a mechanism for private SVM learning with upper bounds on privacy and utility, we have previously also studied lower bounds that expose limits on the achievable utility of any learner that provides a given level of differential privacy. Further work is needed to sharpen these results. In a sense, these lower bounds are witnessed by pathological training sets and perturbation points and, as such, serve as attacks in pathological (unrealistic) cases. Developing practical attacks on the privacy of an SVM’s training data remains unexplored.
Finally, it is important to note that alternate approaches to differentially-private SVMs have been explored by others. Most notable is the work (parallel to our own) of Chaudhuri et al. PrivateERM (). Their approach to finite-dimensional feature mappings is, instead of adding noise to the primal solution, to add noise to the primal objective in the form of a dot product of the weight with a random vector. Initial experiments show their approach to be very promising empirically, although it does not allow for non-differentiable losses like the hinge loss.
7 Concluding Remarks
In security applications like malware detection, intrusion detection, and spam filtering, SVMs may be attacked through patterns that can either evade detection (evasion), mislead the learning algorithm (poisoning), or gain information about their internal parameters or training data (privacy violation). In this chapter, we demonstrated that these attacks are feasible and constitute a relevant threat to the security of SVMs, and to machine learning systems, in general.
Evasion. We proposed an evasion algorithm against SVMs with differentiable kernels, and, more generally, against classifiers with differentiable discriminant functions. We investigated the attack’s effectiveness in perfect and limited knowledge settings. In both cases, our attack simulation showed that SVMs (both linear and RBF) can be evaded with high probability after a few modifications to the attack patterns. Our analysis also provides some general hints for tuning the classifier’s parameters (e.g., the value of $\gamma$ in SVMs with the RBF kernel) and for improving classifier security. For instance, if a classifier tightly encloses the legitimate samples, the adversary’s samples must closely mimic legitimate samples to evade it; if such exact mimicry is still possible, this suggests an inherent flaw in the feature representation.
Poisoning. We presented an algorithm that allows the adversary to find an attack pattern whose addition to the training set maximally decreases the SVM’s classification accuracy. We found that the increase in error over the course of the attack is especially striking: a single attack data point may cause the classification error to rise from the initial error rates of 2–5% to 15–20%. This confirms that our attack can achieve significantly higher error rates than random label flips, and underscores the vulnerability of the SVM to poisoning attacks. As a future investigation, it may be of interest to analyze the effectiveness of poisoning attacks against non-convex SVMs with bounded loss functions, both empirically and theoretically, since such losses are designed to limit the impact of any single (attack) point on the resulting learned function. This has also been studied from a more theoretical perspective in christmann04 (), exploiting the framework of Robust Statistics hampel86 (); maronna06 (). A similar effect is obtained by using bounded kernels (e.g., the RBF kernel) or bounded feature values.
Privacy. We developed an SVM learning algorithm that preserves differential privacy, a formal framework for quantifying the threat of a potential training set privacy violation incurred by releasing learned classifiers. Our mechanism involves adding Laplace-distributed noise to the SVM weight vector with a scale that depends on the algorithmic stability of the SVM and the desired level of privacy. In addition to presenting a formal guarantee that our mechanism preserves privacy, we also provided bounds on the utility of the new mechanism, which state that the privacy-preserving classifier makes predictions that are point-wise close to those of the non-private SVM, with high probability. Finally, we discussed potential approaches for attacking the privacy of SVMs’ training data, known approaches to differentially-private SVMs with translation-invariant kernels (whose feature spaces may be infinite-dimensional), and lower bounds on the fundamental limits on utility for private approximations of the SVM.
Acknowledgements.
This work has been partly supported by the project CRP18293 funded by Regione Autonoma della Sardegna, L.R. 7/2007, Bando 2009, and by the project “Advanced and secure sharing of multimedia data over social networks in the future Internet” (CUP F71J1100069 0002) funded by the same institution. Davide Maiorca gratefully acknowledges Regione Autonoma della Sardegna for the financial support of his PhD scholarship (P.O.R. Sardegna F.S.E. Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2007-2013, Axis IV Human Resources, Objective l.3, Line of Activity l.3.1.). Blaine Nelson thanks the Alexander von Humboldt Foundation for providing additional financial support. The opinions expressed in this chapter are solely those of the authors and do not necessarily reflect the opinions of any sponsor.
Bibliography
 (1) Adobe: PDF Reference, sixth edition, version 1.7
 (2) Balfanz, D., Staddon, J. (eds.): Proceedings of the ACM Workshop on AISec (AISec). ACM (2008)
 (3) Balfanz, D., Staddon, J. (eds.): Proceedings of the ACM Workshop on Security and Artificial Intelligence (AISec). ACM (2009)
 (4) Barreno, M., Nelson, B., Joseph, A., Tygar, J.: The security of machine learning. Machine Learning 81, 121–148 (2010). DOI 10.1007/s10994-010-5188-5
 (5) Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning be secure? In: ASIACCS ’06: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pp. 16–25. ACM, New York, NY, USA (2006). DOI http://doi.acm.org/10.1145/1128817.1128824
 (6) Barth, A., Rubinstein, B.I.P., Sundararajan, M., Mitchell, J.C., Song, D., Bartlett, P.L.: A learning-based approach to reactive security. IEEE Transactions on Dependable and Secure Computing 9(4), 482–493 (2012)
 (7) Biggio, B., Corona, I., Fumera, G., Giacinto, G., Roli, F.: Bagging classifiers for fighting poisoning attacks in adversarial environments. In: the International Workshop on Multiple Classifier Systems (MCS), Lecture Notes in Computer Science, vol. 6713, pp. 350–359. Springer-Verlag (2011)
 (8) Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., Roli, F.: Evasion attacks against machine learning at test time. In: H. Blockeel et al. (ed.) European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part III, Lecture Notes in Artificial Intelligence, vol. 8190, pp. 387–402. Springer-Verlag Berlin Heidelberg (2013)
 (9) Biggio, B., Didaci, L., Fumera, G., Roli, F.: Poisoning attacks to compromise face templates. In: the IAPR International Conference on Biometrics (ICB) (2013)
 (10) Biggio, B., Fumera, G., Pillai, I., Roli, F.: A survey and experimental evaluation of image spam filtering techniques. Pattern Recognition Letters 32(10), 1436–1446 (2011)
 (11) Biggio, B., Fumera, G., Roli, F.: Design of robust classifiers for adversarial environments. In: IEEE Int’l Conf. on Systems, Man, and Cybernetics (SMC), pp. 977–982 (2011)
 (12) Biggio, B., Fumera, G., Roli, F.: Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering 99(PrePrints), 1 (2013)
 (13) Biggio, B., Fumera, G., Roli, F., Didaci, L.: Poisoning adaptive biometric systems. In: Structural, Syntactic, and Statistical Pattern Recognition, Lecture Notes in Computer Science, vol. 7626, pp. 417–425 (2012)
 (14) Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the International Conference on Machine Learning (2012)
 (15) Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: Proceedings of the Symposium on Principles of Database Systems, pp. 128–138 (2005)
 (16) Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
 (17) Brückner, M., Kanzow, C., Scheffer, T.: Static prediction games for adversarial learning problems. Journal of Machine Learning Research 13, 2617–2654 (2012)
 (18) Cárdenas, A.A., Baras, J.S.: Evaluation of classifiers: Practical considerations for security applications. In: AAAI Workshop on Evaluation Methods for Machine Learning (2006)
 (19) Cárdenas, A.A., Nelson, B., Rubinstein, B.I. (eds.): The ACM Workshop on Artificial Intelligence and Security (AISec). ACM (2012)
 (20) Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. In: T.K. Leen, T.G. Dietterich, V. Tresp (eds.) NIPS, pp. 409–415. MIT Press (2000)
 (21) Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research 12(Mar), 1069–1109 (2011)
 (22) Chen, Y., Cárdenas, A.A., Greenstadt, R., Rubinstein, B. (eds.): Proceedings of the ACM Workshop on Security and Artificial Intelligence (AISec). ACM (2011)
 (23) Christmann, A., Steinwart, I.: On robust properties of convex risk minimization methods for pattern recognition. Journal of Machine Learning Research 5, 1007–1034 (2004)
 (24) Corona, I., Biggio, B., Maiorca, D.: Adversarialib: a general-purpose library for the automatic evaluation of machine learning-based classifiers under adversarial attacks. URL http://sourceforge.net/projects/adversarialib/ (2013)
 (25) Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20, 273–297 (1995)
 (26) Dalvi, N., Domingos, P., Mausam, Sanghai, S., Verma, D.: Adversarial classification. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 99–108 (2004)
 (27) Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V.S., Saygin, Y. (eds.): International ECML/PKDD Workshop on Privacy and Security Issues in Data Mining and Machine Learning (2010)
 (28) Drucker, H., Wu, D., Vapnik, V.N.: Support vector machines for spam categorization. IEEE Transactions on Neural Networks 10(5), 1048–1054 (1999)
 (29) Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience (2000)
 (30) Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Theory of Cryptography Conference (TCC 2006), pp. 265–284 (2006)
 (31) Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending attacks. In: Proceedings of the Conference on USENIX Security Symposium (2006)
 (32) Globerson, A., Roweis, S.T.: Nightmare at test time: robust learning by feature deletion. In: Proceedings of the International Conference on Machine Learning, pp. 353–360 (2006)
 (33) Golland, P.: Discriminative direction for kernel classifiers. In: Neural Information Processing Systems (NIPS), pp. 745–752 (2002)
 (34) Greenstadt, R. (ed.): Proceedings of the ACM Workshop on Artificial Intelligence and Security (AISec). ACM (2010)
 (35) Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach Based on Influence Functions. Probability and Mathematical Statistics. John Wiley and Sons, New York, NY, USA (1986). URL http://www.worldcat.org/isbn/0471735779
 (36) Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B., Tygar, J.D.: Adversarial machine learning. In: Proceedings of the ACM Workshop on Artificial Intelligence and Security (AISec), pp. 43–57 (2011)
 (37) Joseph, A.D., Las