Can Adversarially Robust Learning Leverage Computational Hardness?
Making learners robust to adversarial perturbation at test time (i.e., evasion attacks) or training time (i.e., poisoning attacks) has emerged as a challenging task in machine learning. It was recently shown that for many natural settings, there exist sublinear perturbations that can drastically decrease the performance in both the training and testing phases. These barriers, however, are information theoretic and only prove the existence of such successful adversarial perturbations. A natural question is whether or not we can make classifiers computationally robust to polynomial-time adversaries.
In this work, we prove strong barriers against achieving such envisioned computational robustness both for evasion and poisoning attacks. In particular, we show that if the test instances come from a product distribution (e.g., uniform over or , or isotropic Gaussian of dimension ) and that there is an initial constant error, then there always exists a polynomial-time attack that finds adversarial examples of Hamming distance . For poisoning attacks, we prove that for any learning algorithm with sample complexity , there always exist polynomial-time online poisoning attacks that tamper with many examples, replace them with other correctly labeled examples, and decrease the confidence of the learner (in finding a low risk hypothesis) from to , or alternatively increase the error for any chosen target instance from to .
Our poisoning and evasion attacks are black-box in how they access their corresponding components of the system (i.e., the hypothesis, the concept, and the learning algorithm), and they do not make any assumptions about the classifier or the learning algorithm producing the classifier.
- 1 Introduction
- 2 Preliminaries
- 3 Polynomial-time Attacks from Computational Concentration of Products
- 4 Products are Computationally Concentrated under Hamming Distance
- 5 Conclusion
Making trained classifiers robust to adversarial attacks of various forms has been an active line of research in machine learning recently. Two major forms of attack are the so-called “evasion” and “poisoning” attacks. In an evasion attack, an adversary enters the game during the test phase and tries to change the instance into a “close” adversarial instance that is misclassified by the hypothesis (a.k.a. trained model) . In a poisoning attack, the adversary manipulates the training data into a “closely related” poisoned version with the hope of increasing the risk of the hypothesis produced from the poisoned data, or perhaps of making a particular test instance (for which the adversary might have a special interest) misclassified. Starting with Szegedy et al. [SZS14], a race has emerged between evasion attacks that aim to find adversarial examples and defenses against those attacks (e.g., see also [BCM13, BFR14, GSS15, PMW16, CW17, XEQ17, ACW18]). In another line of work, many papers studied poisoning attacks and defense mechanisms against them (e.g., see [ABL17, XBB15, PMSW16, RNH09, MDM18b, SHN18]). Although some specific problems (e.g., image classification) have naturally received more attention in this line of work, here we approach the robustness problem from a general perspective, as a theoretically fundamental problem.
Is adversarially robust classification possible?
Starting with Gilmer et al. [GMF18] and followed by [FFF18, DMM18, SHS18, MDM18a], it was recently shown that for many natural metric probability spaces of instances (e.g., uniform distribution over , unit -sphere, or isotropic Gaussian in dimension , all with “normalized” Euclidean or Hamming distance) adversarial examples of sublinear perturbations exist for almost all test instances. Indeed, as shown by Mahloujifar et al. [MDM18a], if the instances are drawn from a “normal Lévy family” of metric probability spaces (which includes all the above-mentioned cases), and if there exists an initial non-negligible risk for the generated hypothesis classifier , an adversary can perturb an initial instance into an adversarial one that is only (which is sublinear in ) far from and that is misclassified.
The work of Mahloujifar et al. [MDM18a] also proved a similar result about the “inevitability” of poisoning attacks of various forms. Namely, if the goal of the adversary is to decrease the confidence of the produced hypothesis to have error at most , or if its goal is to increase the classification error of a particular instance , there is always an adversarial poisoning strategy that achieves this goal by changing of the training examples, where is the sample complexity of the learner.
Is computationally robust classification possible?
All the above-mentioned sublinear-perturbation attacks of [FFF18, DMM18, SHS18, MDM18a], in both evasion and poisoning models, are information theoretic. Namely, they only show the existence of such adversarial instances for evasion attacks or adversarial poisoned data with sublinear perturbations for poisoning attacks. In this work, we study the next natural question: can these information theoretic lower bounds be avoided in the computational setting? Namely, for which learning problems can we design solutions that resist polynomial-time adversaries that attack the robustness of the learning process? More specifically, the central question studied in our work is as follows.
Can we make classifiers robust to computationally bounded adversarial perturbations of sublinear magnitude that occur during the training or the test phase?
We focus on sublinear perturbations because if the adversary can change the whole training data, or if it can completely change the test instance, it can simply make the hypothesis always fail.
1.1 Our Results
In this work, we prove strong barriers against basing the robustness of classifiers, in both evasion and poisoning settings, on computational hardness. Namely, we show that in many natural settings (i.e., any problem for which the instances are drawn from any product distribution and their distances are measured by Hamming distance) adversarial examples can be found in polynomial time. This result applies to any learning task over these distributions. In the poisoning setting, we show that for any learning task and any distribution over the labeled instances, if the goal of the adversary is to decrease the confidence of the learner or to increase its error on any particular instance , it can always do so in polynomial time by only changing of the labeled instances and replacing them with examples that are still correctly labeled. Below we describe both of these results at a high level.
See Theorem 3.3 for the formal version of the following theorem.
Theorem 1.1 (Informal: polynomial-time evasion attacks).
Let be a classification problem in which the test instances are drawn from a product distribution . Suppose is a concept function (i.e., ground truth) and is a hypothesis that has a constant error in predicting . Then, there is a polynomial-time (black-box) adversary that perturbs only of the blocks of the instances and makes them misclassified with probability .
The above theorem covers many natural distributions such as uniform over or or the isotropic Gaussian of dimension , so long as the distance measure is Hamming distance.
Also, as we will see in Theorem 3.3, the initial error necessary for our polynomial-time evasion attack could be as small as to keep the perturbations , and even initial error is enough to keep the perturbations sublinear . Finally, by “black-box” we mean that our attacker only needs oracle access to the hypothesis and the ground truth .
We now describe our main result about polynomial-time poisoning attacks. See Theorem 3.5 for the formal version of the following theorem.
Theorem 1.2 (Informal: polynomial-time poisoning attacks).
Let be a classification problem with a deterministic learner that is given labeled examples of the form for a concept function (determining the ground truth).
Decreasing confidence. For any risk threshold , let be the probability that produces a hypothesis of risk at most , referred to as the -confidence of . If is at most , then there is a polynomial-time adversary that replaces at most of the training examples with other correctly classified examples and makes the -confidence go down to any constant .
Increasing chosen-instance error. For any fixed test instance , if the average error of the hypotheses generated by over instance is at least , then there is a polynomial-time adversary that replaces at most of the training examples with other correctly classified examples and increases this average error to any constant .
Moreover, both attacks above are online and black-box.
Roughly speaking, the two parts of Theorem 1.2 generalize to any setting in which the adversary attacks a particular predicate over the generated hypothesis (this could be making a mistake on a particular or having more than risk).
Other features of our poisoning attacks.
Our attacks are online; i.e., during the attack, the adversary is only aware of the training examples sampled so far when it decides about the next tampering decision (see [WC18] where more attacks of this form are studied).
Our attacks only use correct labels for instances that they inject to the training set (see [SHN18] where attacks of this form are studied in practice).
Our attacks are black-box [PMG17], as they use the learning algorithm and concept as oracles.
Computational constraints for robust learning were previously considered by the works of Mahloujifar et al. [MM17, MDM18b] for poisoning attacks and by Bubeck et al. [BPR18] in the context of evasion attacks. The works of [MM17, MDM18b] studied so-called -tampering attacks, which are poisoning attacks that are online, never use incorrect labels, and allow each incoming training example to be substituted by the adversary with independent probability . (The latter property makes -tampering attacks a special form of Valiant’s malicious noise model [Val85].) They showed that for an initial error , polynomial-time -tampering attacks can decrease the confidence or increase a chosen instance’s error by . Therefore, in order to increase the error to , their attacks need to tamper with a linear number of training examples. The more recent work of Mahloujifar et al. [MDM18a] improved this attack to use only a sublinear number of tamperings at the cost of only achieving information theoretic (exponential time) attacks. In this work, we get the best of both worlds, i.e., polynomial-time poisoning attacks of sublinear tampering budget.
The recent work of Bubeck et al. [BPR18] studied whether the difficulty of finding robust classifiers is due to information theoretic barriers or to computational constraints. Indeed, they showed that (for a broad range of problems with minimal conditions) if we assume the existence of robust classifiers, then polynomially many samples would contain enough information for guiding the learners towards one of those robust classifiers, even though, as shown by Schmidt et al. [SST18], this could provably be a larger sample complexity than in the setting with no attacks. However, [BPR18] showed that finding such a classifier might not be efficiently feasible, where efficiency here is enforced by Kearns’ statistical query model [Kea98]. So, even though our work and the work of [BPR18] both study computational constraints, the work of [BPR18] studied barriers against efficiently finding robust classifiers, while we study whether or not robust classifiers exist at all in the presence of computationally bounded attackers.
Other related definitions of adversarial examples.
In both of our results about poisoning and evasion attacks, we use definitions that require misclassification of the final test instance as the goal of the adversary. However, other definitions of adversarial examples have been proposed in the literature that coincide with this “default” variant under natural conditions for specific problems of study (such as image classification). For example, Madry et al. [MMS18] proposed an alternative definition of adversarial risk that was inspired by robust optimization and is based on adversarial loss functions. Their definition compares the predicted label of the adversarial example with the original label of the honest (non-tampered) example. Some other works (e.g., [SZS14, FFF18]) only compare the prediction of the hypothesis over the adversarial example with its own prediction on the honest example (and so their definition is independent of the ground truth ). Even though in many natural settings these definitions become very close, in order to prove our formal theorems we use a definition that is based on the “error region” of the hypothesis in comparison with the ground truth that is implicit in [GMF18, BPR18] and in [MM17, MDM18b] in the context of poisoning attacks. We refer the reader to the work of Diochnos et al. [DMM18] for a taxonomy of these variants and further discussion.
1.2 Technique: Computational Concentration of Measure in Product Spaces
In order to prove our Theorems 1.1 and 1.2, we use ideas from a recent work of Kalai et al. [KKR18] in the context of attacking coin tossing protocols. Essentially, our proofs proceed by first designing new polynomial-time coin-tossing attacks by carefully changing the model of [KKR18], and then showing how such coin tossing attacks can be used to obtain evasion and poisoning attacks. Our new coin tossing attacks can be interpreted as polynomial-time algorithmic proofs of concentration of measure for product distributions, which we use in place of the information theoretic concentration results used in [MDM18a].
To describe our techniques, it is instructive to first recall the big picture of the polynomial-time poisoning attacks of [MM17, MDM18b], even though they needed linear perturbations, before describing how those ideas can be extended to obtain stronger attacks with sublinear perturbations in both evasion and poisoning contexts. Indeed, the core idea there is to model the task of the adversary by a Boolean function over the training data , and roughly speaking define if the training process over leads to a misclassification by the hypothesis (on a chosen instance) or “low confidence” over the produced hypothesis. Then, they showed how to increase the expected value of any such from an initial constant value to by tampering with fraction of “blocks” of the input sequence .
The more recent work of [MDM18a] improved the bounds achieved by the above poisoning attacks by using a computationally inefficient attack that is more efficient in its tampering budget: it only tampers with a sublinear number of the training examples and yet increases the average of from to . The key idea used in [MDM18a] was to use the concentration of measure in product probability spaces under the Hamming distance [AM80, MS86, McD89, Tal95]. Namely, it is known that for any product space of dimension (here, modeling the training sequence that is iid sampled) and any initial set of constant probability (here, ), “almost all” of the points in the product space are of distance from , and so the measure is concentrated around .
The main technical contribution of our work is to show that one can essentially achieve similar concentration bounds (i.e., showing “typical” distance of from any set with constant measure) for any product distribution of dimension , by using polynomial time algorithms that efficiently find a “close” point in the target set. Namely, we prove the following result about biasing the expected value of Boolean functions defined over product spaces using polynomial time biasing algorithms. (See Theorem 3.1.)
Theorem 1.3 (Informal: biasing functions on product distributions).
Let be any product distribution of dimension and let be any Boolean function with expected value . Then, there is a polynomial-time tampering adversary who only tampers with of the blocks of a sample and increases the average of over the tampered distribution to .
Once we prove Theorem 1.3, we can also use it directly to obtain evasion attacks that find adversarial examples, so long as the test instances are drawn from a product distribution and that the distances over the instances are measured by Hamming distance. Indeed, using concentration results (or their stronger forms of isoperimetric inequalities) was the key method used in previous works of [GMF18, FFF18, DMM18, SHS18, MDM18a] to show the existence of adversarial examples. Thus, our Theorem 3.1 is a natural tool to be used in this context as well, as it simply shows that similar (yet not exactly equal) bounds to those proved by the concentration of measure can be achieved algorithmically using polynomial time adversaries.
Ideas behind the proof of Theorem 1.3.
The proof of our Theorem 3.1 is inspired by the recent work of Kalai et al. [KKR18] in the context of attacks against coin tossing protocols. Indeed, [KKR18] proved that in any coin tossing protocol in which parties send a single message each, there is always an adversary who can corrupt up to of the players adaptively and almost fix the output, making progress towards resolving a conjecture by [BOL89].
At first sight, it might seem that we could directly use the result of [KKR18] to prove Theorem 1.3, as they design adversaries who tamper with blocks of an incoming input and change the average of a function over them. However, there are two major obstacles to such an approach. (1) The attack of [KKR18] is exponential time (as it is recursively defined over the full tree of the values of the input random process), and (2) their attack cannot always increase the probability of a function defined over the input, and it can only guarantee that either we will increase this average or decrease it. In fact, (2) is necessary for the result of [KKR18], as in their model the adversary should pick the tampered blocks before seeing their contents, and there are simple functions for which we cannot choose the direction of the bias. Both of these restrictions are acceptable in the context of [KKR18], but not for our setting: here we want to increase the average of as it represents the “error” in the learning process, and we want polynomial time algorithms.
Interestingly, the work of [KKR18] also presented an alternative simpler proof for a previously known result of Lichtenstein et al. [LLS89] in the context of adaptive corruption in coin tossing attacks. In that special case, the messages sent by parties consisted only of single bits. In the simpler bitwise setting, it is indeed possible to achieve biasing attacks that always increase the average of the output function bit. Thus, there is hope that such attacks could be adapted to our setting, and this is exactly what we do.
Namely, we use ideas from the bitwise attack of [KKR18] as follows.
We show that this attack can be approximated in polynomial time. (The blockwise attack of [KKR18] seems inherently exponential time.)
Here, we describe our new attack in a simplified ideal setting in which we ignore computational efficiency. We will then compare it with the attack of [KKR18]. The attack has a form that can be adapted to the computationally efficient setting by approximating the partial averages needed for the attack. See Constructions 4.3 and 4.14 for the formal description of the attack in computational settings.
Construction 1.4 (Informal: inefficient biasing attack over product distributions).
Let be a product distribution. Our (ideal model) tampering attacker is parameterized by . Given a sequence of blocks , tampers with them by reading them one by one (starting from ) and decides about the tampered values inductively as follows. Suppose are the finalized values for the first blocks (after tampering decisions).
Tampering case 1. If there is some value for that by picking it, the average of goes up by at least for the fixed prefix and for a random continuation of the rest of the blocks, then pick as the tampered value for the block.
Tampering case 2. Otherwise, if the actual (untampered) content of the block, namely , decreases the average of (under a random continuation of the remaining blocks) for the fixed prefix , then ignore the original block , and pick some tampered value that at least does not decrease the average. (Such always exists by an elementary averaging argument.)
Not tampering. If none of the above cases happen, output the original sample .
We prove that, by picking parameter , the attack achieves the desired properties of Theorem 3.1: the number of tampered blocks is , while the bias of under attack reaches .
The bit-wise attack of [KKR18] can be seen as a variant of the attack above in which the adversary (also) has access to an oracle that returns the partial averages for random continuation. Namely, in their attack tampering cases 1 and 2 are combined into one: if the next bit can increase (or equivalently decrease) the partial averages of the current prefix by , then the adversary chooses to corrupt that bit (even without seeing its actual content). The crucial difference between the bitwise setting of [KKR18] and our blockwise attack of Theorem 1.3 is in tampering case 2: here we do look at the untampered value of the block, and doing so is necessary for getting an attack that biases the final bit towards .
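The three cases of Construction 1.4 can be illustrated with a small sketch. It is not the construction of Section 4: it assumes, purely for illustration, that every block is uniform over a small finite support, and it computes the partial averages exactly by brute force, so it runs in exponential time. The names `partial_average`, `ideal_biasing_attack`, and `lam` (the attack parameter) are hypothetical.

```python
import itertools
from fractions import Fraction


def partial_average(f, supports, prefix):
    """Exact average of f over a uniform random continuation of `prefix`.

    Assumption: block i is uniform over the finite tuple supports[i].
    """
    rest = supports[len(prefix):]
    total, count = 0, 0
    for tail in itertools.product(*rest):
        total += f(tuple(prefix) + tail)
        count += 1
    return Fraction(total, count)


def ideal_biasing_attack(f, supports, sample, lam):
    """Read the blocks one by one and apply the three cases of the construction."""
    prefix, tampered = [], 0
    for i, original in enumerate(sample):
        base = partial_average(f, supports, prefix)
        avg = {v: partial_average(f, supports, prefix + [v]) for v in supports[i]}
        best = max(supports[i], key=lambda v: avg[v])
        if avg[best] >= base + lam:
            choice = best        # case 1: some value raises the average by at least lam
        elif avg[original] < base:
            choice = best        # case 2: the honest block hurts; best never decreases the average
        else:
            choice = original    # not tampering: keep the honest block
        tampered += (choice != original)
        prefix.append(choice)
    return tuple(prefix), tampered
```

In this idealized sketch the partial average never decreases from one block to the next, so whenever the initial average of `f` is positive the output always satisfies `f`. The substance of the theorem lies in bounding the number of tampered blocks, which the sketch only counts.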
Extension to general product distributions and for coin-tossing protocols.
Our proof of Theorem 1.3, and its formalized version Theorem 3.1, extend with almost no changes to any joint distributions like under a proper definition of online tampering in which the next “untampered” block is sampled conditioned on the previously tampered blocks chosen by the adversary. This shows that in any round coin tossing protocol in which each of the parties sends exactly one message, there is a polynomial-time strong adaptive adversary who corrupts of the parties and biases the output to be with probability . A strong adaptive adversary, introduced by Goldwasser et al. [GKP15], is allowed to see the messages before they are delivered and then corrupt a party (and change their message) based on the initial messages that were about to be sent. Our result improves a previous result of [GKP15] that was proved for one-round protocols using exponential time attackers. Our attack extends to arbitrary (up to) round protocols and is also polynomial time. Our results are incomparable to those of [KKR18]: while they also corrupt up to of the messages, their attackers do not see the messages of the parties before corrupting them, whereas our attackers inherently rely on this information. On the other hand, their bias is either towards or toward (for the blockwise setting), while our attacks can choose the direction of the biasing.
We use calligraphic letters (e.g., ) for sets. By we denote sampling from the probability distribution . For a randomized algorithm , by we denote the randomized execution of on input outputting . By we denote that the random variables and have the same distributions. Unless stated otherwise, by using a bar over a variable , we emphasize that it is a vector. By we refer to a joint distribution over vectors with components. For a joint distribution , we use to denote the joint distribution of the first variables . Also, for a vector we use to denote the prefix . For a distribution , by we denote the conditional distribution . By we denote the support set of . By we denote an algorithm with oracle access to a sampler for distribution that upon every query returns a fresh sample from . By we refer to the product distribution in which and are sampled independently. By we denote the -fold product returning iid samples. Multiple instances of a random variable in the same statement (e.g., ) refer to the same sample. By PPT we denote “probabilistic polynomial time”.
2.1 Basic Definitions for General Tampering Attacks
Our tampering adversaries follow a model close to that of -budget adversaries defined in [MDM18b]. Such adversaries, given a sequence of blocks, select at most fraction of the locations in the sequence and change their value. The -budget model of [MDM18b] works in an online setting in which the adversary must decide about the th block knowing only the first blocks. In this work, we define both online and offline attacks that work in a closely related budget model in which we only bound the expected number of tampered blocks. We find this notion more natural for the robustness of learners.
Definition 2.1 (Online and offline tampering).
We define the following two tampering attack models.
Online attacks. Let be an arbitrary product distribution.
We call a (potentially randomized and computationally unbounded) algorithm an online tampering algorithm for , if given any and any , it holds that
Namely, outputs (a candidate block) in the support set of .
Offline attacks. For an arbitrary joint distribution (that might or might not be a product distribution), we call a (potentially randomized and possibly computationally unbounded) algorithm an offline tampering algorithm for , if given any , it holds that
Namely, given any , always outputs a vector in .
Efficiency of attacks. If is a joint distribution coming from a family of distributions (perhaps based on the index ), we call an online or offline tampering algorithm efficient, if its running time is where is the total bit length of any .
Notation for tampered distributions. For any joint distribution , any , and for any tampering algorithm , by we refer to the distribution obtained by running over , and by we refer to the final distribution by also sampling at random. More formally,
For an offline tampering algorithm , the distribution is sampled by simply running on the whole and obtaining the output .
For an online tampering algorithm and input sampled from a product distribution , we obtain the output inductively: for , sample .
Average budget of tampering attacks. Suppose is a metric defined over . We say an online or offline tampering algorithm has average budget (at most) , if
If no metric is specified, we use Hamming distance over vectors of dimension .
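As a rough illustration of the tampered distribution and the average budget under Hamming distance, the following sketch samples blocks one at a time, lets an online tampering algorithm replace each block, and estimates the expected number of changed blocks by Monte Carlo. The interface is an assumption made for this sketch: `samplers` is a list of per-block sampling functions, and `tam(prefix, block)` receives the already finalized blocks together with the fresh honest block. All names are hypothetical.

```python
import random


def hamming(u, v):
    """Number of coordinates (blocks) on which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def run_online_tampering(samplers, tam, rng):
    """Sample each block honestly, then let `tam` decide its final value.

    Assumption: tam(prefix, block) sees the finalized prefix and the current
    honest block, and returns a value in the support of that block.
    """
    original, tampered = [], []
    for sample_block in samplers:
        u_i = sample_block(rng)
        v_i = tam(tuple(tampered), u_i)
        original.append(u_i)
        tampered.append(v_i)
    return tuple(original), tuple(tampered)


def estimate_average_budget(samplers, tam, trials, seed=0):
    """Monte Carlo estimate of the expected Hamming distance between u and v."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        u, v = run_online_tampering(samplers, tam, rng)
        total += hamming(u, v)
    return total / trials
```

For example, the identity algorithm `lambda prefix, block: block` has estimated budget 0, while an algorithm that overwrites every uniform bit with 1 changes about half of the blocks on average.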
2.2 Basic Definitions for Classification Problems
A classification problem is specified by the following components. The set is the set of possible instances, is the set of possible labels, is a distribution over , is a class of concept functions where is always a mapping from to . We did not state the loss function explicitly, as we work with classification problems. For , the risk or error of a hypothesis is equal to . We are usually interested in learning problems with a specific metric defined over for the purpose of defining risk and robustness under instance perturbations controlled by metric . Then, we simply write to include .
Definition 2.2 (Confidence, chosen-instance error, and their adversarial variants).
Let be a learning algorithm for a classification problem , be the sample complexity of , and be any concept. We define the (adversarial) confidence function and chosen-instance error as follows.
Confidence function. For any error function , the adversarial confidence in the presence of an adversary is defined as
By we denote the confidence without any attack; namely, for the trivial (identity function) adversary that does not change the training data.
Chosen-instance error. For a fixed test instance , the chosen-instance error (over instance ) in presence of a poisoning adversary is defined as
By we denote the chosen-instance error (over ) without any attacks; namely, for the trivial (identity function) adversary .
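The two quantities of Definition 2.2 can be approximated empirically. The sketch below is only an illustration under stated assumptions: the risk of a hypothesis is itself estimated on fresh test samples, the adversary is modeled as any function from a training set to a training set of the same form, and all names (`estimate_confidence`, `estimate_chosen_instance_error`, and their parameters) are hypothetical.

```python
import random


def draw_training_set(concept, sample_instance, m, rng):
    """m correctly labeled examples (x, c(x))."""
    xs = [sample_instance(rng) for _ in range(m)]
    return [(x, concept(x)) for x in xs]


def estimate_risk(h, concept, sample_instance, rng, n_test):
    """Fraction of fresh test instances on which h disagrees with the concept."""
    xs = [sample_instance(rng) for _ in range(n_test)]
    return sum(h(x) != concept(x) for x in xs) / n_test


def estimate_confidence(learner, concept, sample_instance, m, eps,
                        adversary, trials=100, n_test=200, seed=0):
    """Estimated probability that the learned hypothesis has risk at most eps."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        train = adversary(draw_training_set(concept, sample_instance, m, rng))
        h = learner(train)
        good += estimate_risk(h, concept, sample_instance, rng, n_test) <= eps
    return good / trials


def estimate_chosen_instance_error(learner, concept, sample_instance, m, x,
                                   adversary, trials=100, seed=0):
    """Estimated probability that the learned hypothesis errs on the fixed instance x."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        train = adversary(draw_training_set(concept, sample_instance, m, rng))
        wrong += learner(train)(x) != concept(x)
    return wrong / trials
```

Passing the identity function as `adversary` gives the quantities without any attack, matching the trivial adversary in the definition.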
3 Polynomial-time Attacks from Computational Concentration of Products
In this section, we first formally state our main technical tool, Theorem 3.1, which underlies our polynomial-time evasion and poisoning attacks. Namely, we will prove that product distributions are “computationally concentrated” under the Hamming distance, in the sense that any subset with constant probability is “computationally close” to most of the points in the probability space. We will then use this tool to obtain our attacks against learners. We will finally prove our main technical tool.
Theorem 3.1 (Computational concentration of product distributions).
Let be any product distribution and be any Boolean function over , and let be the expected value of . Then, for any where , there is an online tampering algorithm generating the tampering distribution with the following properties.
Achieved bias. .
Efficiency. Having oracle access to and a sampler for , runs in time where is the maximum bit length of any for any .
Average budget. uses average budget .
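The efficiency claim relies only on oracle access to the Boolean function and to a sampler for the distribution. One standard way to use such access, shown here as a sketch and not as the approximation procedure of Section 4, is to estimate a partial average by sampling random continuations; by Hoeffding's inequality, about ln(2/γ)/(2δ²) samples give additive error δ except with probability γ. The names `hoeffding_samples` and `approx_partial_average`, and the per-block `samplers` interface, are hypothetical.

```python
import math
import random


def hoeffding_samples(delta, gamma):
    """Samples sufficient for additive error delta with failure probability gamma."""
    return math.ceil(math.log(2 / gamma) / (2 * delta ** 2))


def approx_partial_average(f, samplers, prefix, delta, gamma, rng):
    """Estimate the average of f over random continuations of `prefix`.

    Assumption: samplers[i](rng) returns a fresh sample of block i, and f maps
    a full tuple of blocks to 0 or 1.
    """
    k = hoeffding_samples(delta, gamma)
    rest = samplers[len(prefix):]
    total = 0
    for _ in range(k):
        tail = tuple(sample_block(rng) for sample_block in rest)
        total += f(tuple(prefix) + tail)
    return total / k
```

An attacker in the style of Construction 1.4 would call such an estimator in place of exact partial averages, at the price of a small approximation error that the analysis has to absorb.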
In the rest of this section, we will use Theorem 3.1 to prove limitations of robust learning in the presence of polynomial-time poisoning and evasion attackers. We will prove Theorem 3.1 in the next section.
Range of initial and target error covered by Theorem 3.1.
For any Theorem 3.1 uses an average budget of only . If we start from larger initial error that is still bounded by , the average budget given by the attacker of Theorem 3.1 will still be , which is nontrivial as it is still sublinear in the dimension. However, if we start from , the attacker of Theorem 3.1 ceases to give a nontrivial bound, as the required linear budget is enough for getting any target error trivially. In contrast, the information theoretic attacks of [MDM18a] can handle much smaller initial error all the way to subexponentially small . Finding the maximum range of for which computationally bounded attackers can increase the error to remains an intriguing open question after our work.
3.1 Polynomial-time Evasion Attacks
The following definition formalizes the notion of robustness against computationally bounded evasion adversaries. Our definition is based on those of [MM17, MDM18b] for the online case and [MDM18a] for the offline case.
Definition 3.2 (Computational evasion robustness).
Let be a classification problem. Suppose the components of are indexed by , and let for functions and that for simplicity we denote by and . We say that the -to- evasion robustness of is at most , if there is a (computationally unbounded) tampering oracle algorithm for distribution such that for all with error region , we have
Having oracle access to and a sampler for , the oracle adversary reaches adversarial risk at least . Namely, .
The average budget of the adversary (with oracle access to and a sampler for ) is at most for samples and with respect to metric .
The -to- computational evasion robustness of is at most , if the same statement holds for an efficient (i.e., PPT) oracle algorithm .
Evasion robustness of problems vs. evasion robustness of learners.
Computational evasion robustness as defined in Definition 3.2 directly deals with learning problems regardless of what learning algorithm is used for them. The reason for such a choice is that in this work, we prove negative results demonstrating the limitations of computational robustness. Therefore, limiting the robustness of a learning problem regardless of its learner is a stronger result. In particular, any negative result (i.e., showing attackers with small tampering budget) about -to- (computational) robustness of a learning problem immediately implies that any learning algorithm for that produces hypotheses with risk can always be attacked (efficiently) to reach adversarial risk .
Now we state and prove our main theorem about evasion attacks. Note that the proof of this theorem is identical to the reduction shown in [MDM18a]. The difference is that, instead of using the original concentration inequalities, we use our new results about computational concentration of product measures under Hamming distance and obtain attacks that work in polynomial time.
Theorem 3.3 (Limits on computational evasion robustness).
Let be a classification problem in which the instances’ distribution is a product distribution of dimension and is the Hamming distance over vectors of dimension . Let be functions of . Then, the -to- computational evasion robustness of is at most
In particular, if and , then is sublinear in , and if and , then .
We first define a Boolean function as follows:
It is clear that . Therefore, by using Theorem 3.1, we know there is a tampering algorithm that runs in time , and increases the average of to while using average budget at most . Note that needs oracle access to , which is computable by oracle access to and . ∎
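As a rough illustration of the reduction used in the proof above, the following Python sketch builds a Boolean function from black-box access to a hypothesis and a concept, and estimates its average by sampling. It assumes that the Boolean function is the indicator of the error region. The names `make_error_indicator` and `estimate_average`, the toy hypothesis `h`, the toy concept `c`, and the uniform sampler are hypothetical and chosen only for this example.

```python
import random

def make_error_indicator(h, c):
    """Return f with f(x) = 1 if hypothesis h disagrees with concept c on x, else 0.

    Assumption: f is the indicator of the error region; h and c are used
    only as black-box oracles.
    """
    def f(x):
        return 1 if h(x) != c(x) else 0
    return f

def estimate_average(f, sampler, num_samples, rng):
    """Monte Carlo estimate of the average of f under the sampler's distribution."""
    return sum(f(sampler(rng)) for _ in range(num_samples)) / num_samples

# Toy instance (hypothetical): uniform distribution over {0,1}^n.
n = 10

def sampler(rng):
    return [rng.randint(0, 1) for _ in range(n)]

def c(x):
    # Toy concept: label 1 iff more than half of the bits are set.
    return 1 if sum(x) > n // 2 else 0

def h(x):
    # Toy hypothesis: a slightly shifted threshold, so it errs when sum(x) == 6.
    return 1 if sum(x) > n // 2 + 1 else 0

f = make_error_indicator(h, c)
rng = random.Random(0)
initial_error = estimate_average(f, sampler, 2000, rng)
print("estimated initial error:", initial_error)
```

Any attacker that biases the average of `f` upward by changing few coordinates of `x` is, under this assumption, an evasion attacker with the same Hamming budget.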
3.2 Polynomial-time Poisoning Attacks
Definition 3.4 (Computational poisoning robustness).
Let be a classification problem with a learner of sample complexity . Let be functions of .
Computational confidence robustness. For , we say that the -to- -confidence robustness of the learner is at most , if there is a (computationally unbounded) tampering algorithm such that for all for which , the following two conditions hold.
The average budget of (who has oracle access to and a sampler for ) tampering with the distribution is at most .
The adversarial confidence for is at most when attacked by the oracle adversary .
The -to- computational -confidence robustness of the learner is at most , if the same statement holds for an efficient (i.e., PPT) oracle algorithm .
Computational chosen-instance robustness. For instance , we say that the -to- chosen-instance robustness of the learner for is at most , if there is a (computationally unbounded) tampering oracle algorithm (that could depend on ) such that for all for which , the following two conditions hold.
Adversary increases the chosen-instance error to .
The -to- computational chosen-instance robustness of the learner for instance is at most , if the same thing holds for an efficient (i.e., PPT) oracle algorithm .
Now we state and prove our main theorem about poisoning attacks. Again, the proof of this theorem is identical to the reduction shown in [MDM18a]. The difference is that here we use our new results about computational concentration of product measures under Hamming distance and get attacks that work in polynomial time. Another difference is that our attacks here are online, due to the online nature of our martingale attacks on product measures.
Theorem 3.5 (Limits on computational poisoning robustness).
Let be a classification problem with a deterministic polynomial-time learner . Let be functions of , where is the sample complexity of .
Confidence robustness. Let be the risk threshold defining the confidence function. Then, the -to- computational -confidence robustness of the learner is at most
Chosen-instance robustness. For any instance , the -to- computational chosen-instance robustness of the learner for is at most
In particular, in both cases above if and , then is sublinear in , and if and , then .
Moreover, the polynomial-time attacker bounding the computational poisoning robustness in both cases above has the following features: (1) is online, and (2) is plausible; namely, it never uses any wrong labels in its poisoned training data.
We first prove the case of chosen-instance robustness. We define a Boolean function as follows:
It is clear that . Therefore, by using Theorem 3.1, we know there is a PPT tampering algorithm that runs in time , and increases the average of to while using average budget at most . Note that needs oracle access to , which is computable by oracle access to the learning algorithm and concept . Now we prove the case of confidence robustness. Again we define a Boolean function as follows:
We have . Therefore, by using Theorem 3.1, we know there is a PPT tampering algorithm that runs in time , and increases the average of to while using average budget at most . Note that needs oracle access to , which requires the adversary to know the exact error of a hypothesis. Computing the exact error is not possible in polynomial time, but using an empirical estimator, the adversary can find an approximation of the error that is sufficient for the attack (see Corollary 3 of [MDM18b]). ∎
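The empirical estimation step mentioned at the end of the proof can be sketched as follows. This is a minimal Python illustration, not the estimator of [MDM18b]: it assumes the adversary has black-box access to the hypothesis and the concept together with a sampler for the instance distribution, and it chooses the sample size with the standard Hoeffding bound. The function names and the toy one-dimensional problem are hypothetical.

```python
import math
import random

def hoeffding_sample_size(eps, delta):
    """Number of i.i.d. samples sufficient for |empirical - true| <= eps
    with probability at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def empirical_risk(h, c, sampler, num_samples, rng):
    """Fraction of sampled instances on which h disagrees with c."""
    errors = 0
    for _ in range(num_samples):
        x = sampler(rng)
        if h(x) != c(x):
            errors += 1
    return errors / num_samples

# Toy problem (hypothetical): x uniform in [0, 1), true risk of h is 0.1.
def sampler(rng):
    return rng.random()

def c(x):
    return x < 0.5

def h(x):
    return x < 0.6

m = hoeffding_sample_size(eps=0.05, delta=0.01)
rng = random.Random(0)
estimate = empirical_risk(h, c, sampler, m, rng)
print("samples:", m, "estimated risk:", estimate)
```

The sample size depends only on the accuracy and confidence parameters, so the estimator runs in polynomial time whenever those are inverse-polynomial.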
4 Products are Computationally Concentrated under Hamming Distance
In this section, we formally prove Theorem 3.1. For simplicity of presentation, we will prove the following theorem for product distributions over the same , but the same proof directly holds for the more general case of .
We will first present an attack in an idealized model in which the adversary has access to some promised oracles that approximate certain properties of the function in a carefully defined way. In this first step, we indeed show that our attack (and its proof) are robust to such approximations. We then show that these promised oracles can be obtained with high probability, and by doing so we obtain the final polynomial time biasing attack proving the concentration of product distributions under Hamming distance.
4.1 Biasing Attack Using Promised Approximate Oracles
We first state a useful lemma that is similar to Azuma's inequality but works with approximate martingales.
Lemma 4.1 (Azuma’s inequality for approximate conditions).
Let be a sequence of jointly distributed random variables such that for all , and for all , we have Then, we have
If we let , Lemma 4.1 becomes the standard version of Azuma's inequality. Here we sketch why Lemma 4.1 can also be reduced to the case that (i.e., Azuma's inequality). We build a sequence from as follows: sample , and if , output ; otherwise, output 0. We clearly have and . Now we can use Lemma 4.1 for the basic case of for the sequence and use it to get a looser bound for the sequence , using the fact that happens with probability at most . ∎
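For reference, the standard inequality and the generic coupling step that the sketch above relies on can be written as follows. The symbols $T_i$, $T'_i$, $\tau$, and $s$ are illustrative notation assumed here, and the exact parameters of Lemma 4.1 are not reproduced.

```latex
% Standard Azuma inequality (illustrative notation, not the lemma's exact parameters):
% if E[T_i | T_1, ..., T_{i-1}] = 0 and |T_i| <= \tau for all i in [n], then
\begin{align*}
\Pr\Big[\sum_{i=1}^{n} T_i \ge s\Big] &\le \exp\Big(\frac{-s^2}{2 n \tau^2}\Big).
\end{align*}
% Generic coupling step: for any sequence T'_1, ..., T'_n defined on the same
% probability space as T_1, ..., T_n,
\begin{align*}
\Pr\Big[\sum_{i=1}^{n} T_i \ge s\Big]
  &\le \Pr\Big[\sum_{i=1}^{n} T'_i \ge s\Big] + \Pr\big[\exists\, i \in [n] : T_i \ne T'_i\big].
\end{align*}
```

The second inequality holds because the two sums coincide whenever the sequences agree in every coordinate, so a tail bound for the modified sequence transfers to the original one at the cost of the disagreement probability.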
Now we define some oracle functions that our tampering attack is based on.
Definition 4.2 (Notation for oracles).
Suppose is defined over a product distribution of dimension . Then, given a specific parameter we define the following promise oracles for any and any . Namely, our promise oracles could be one out of any oracles that satisfy the following guarantees.
Oracle returns the average gain conditioned on the given prefix:
Oracle returns the gain on the average in the last block and is defined as
Oracle approximates the gain of average in the last block,
Oracle returns the approximate maximum gain with two promised properties:
Oracle returns a sample producing the approximate maximum gain . Namely,
The following is the construction of our tampering attack based on the oracles defined above.
Construction 4.3 (Attack using promised approximate oracles).
For a product distribution and , our (online) efficient tampering attacker is parameterized by , but for simplicity it will be denoted as . The parameter determines the approximation promised by the oracles used by . Given as input, will output some . Let as defined in Definition 4.2. chooses its output as follows.
Tampering. If or if , then output .
Not tampering. Otherwise, output the original sample .
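To make the online structure of such an attack concrete, here is a small Python sketch on a toy instance. It is not Construction 4.3 itself: the two tampering conditions are not reproduced in the text above, so the rule used here (tamper when the best available gain is at least a threshold `lam`, or when the honest sample's gain is at most `-lam`) is an assumption made for illustration. The sketch also replaces the promised approximate oracles with exact conditional averages, which is possible only because the toy function is a threshold on the number of ones under the uniform distribution over bits. All names are hypothetical.

```python
import math
import random

def f(x, t):
    """Toy Boolean function: 1 iff at least t coordinates of x are 1."""
    return 1 if sum(x) >= t else 0

def cond_avg(prefix, n, t):
    """Exact average of f over uniform completions of the given prefix."""
    rem = n - len(prefix)
    need = t - sum(prefix)
    if need <= 0:
        return 1.0
    if need > rem:
        return 0.0
    return sum(math.comb(rem, k) for k in range(need, rem + 1)) / 2 ** rem

def tamper_online(x, n, t, lam):
    """Process x one coordinate at a time; return (tampered vector, number of changes).

    Assumed rule: replace x_i by the gain-maximizing value if that gain is at
    least lam, or if keeping x_i would lose at least lam; otherwise keep x_i.
    """
    y = []
    changed = 0
    for xi in x:
        base = cond_avg(y, n, t)
        gain = {b: cond_avg(y + [b], n, t) - base for b in (0, 1)}
        best = max((0, 1), key=lambda b: gain[b])
        if gain[best] >= lam or gain[xi] <= -lam:
            yi = best   # tampering
        else:
            yi = xi     # not tampering
        changed += int(yi != xi)
        y.append(yi)
    return y, changed

rng = random.Random(1)
n, t, lam = 20, 13, 0.05
trials = 400
orig_hits = tampered_hits = total_changed = 0
for _ in range(trials):
    x = [rng.randint(0, 1) for _ in range(n)]
    y, changed = tamper_online(x, n, t, lam)
    orig_hits += f(x, t)
    tampered_hits += f(y, t)
    total_changed += changed
orig_avg = orig_hits / trials
tampered_avg = tampered_hits / trials
avg_budget = total_changed / trials
print("original average:", orig_avg)
print("tampered average:", tampered_avg)
print("average budget:", avg_budget)
```

The attacker decides on coordinate `i` using only the already-fixed prefix and the current honest sample, which is what makes the attack online. In the setting of this section the exact averages would be replaced by sampled approximations.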
Before proving the bounds, we first define some events based on the conditions/cases that happen during the tampering attack of Construction 4.3.
We define the following three Boolean functions over based on the actions taken by the tampering algorithm of Construction 4.3. Namely, for any , we define
Thus, if or happens, it means that the adversary has chosen to tamper with block , and if happens it means that the adversary has not chosen to tamper with block . Also, since the above functions are Boolean, we might treat them as events as well. Moreover, for convenience we define the set .
The following claim bounds the average of the function when the attack of Construction 4.3 is performed on the distribution.
If is the tampering distribution of the efficient attacker of Construction 4.3, then it holds that
Define a function as follows,
Now consider a sequence of random variables sampled as follows. We first sample , then , and then for . Now, for any , we claim that . The reason is as follows.