Adversarially Robust Learning Could Leverage Computational Hardness
Abstract
Over recent years, devising classification algorithms that are robust to adversarial perturbations has emerged as a challenging problem. In particular, deep neural nets (DNNs) seem to be susceptible to small imperceptible changes over test instances. However, the line of work in provable robustness, so far, has been focused on information-theoretic robustness, ruling out even the existence of any adversarial examples. In this work, we study whether there is hope to benefit from the algorithmic nature of an attacker that searches for adversarial examples, and ask whether there is any learning task for which it is possible to design classifiers that are only robust against polynomial-time adversaries. Indeed, numerous cryptographic tasks (e.g., encryption of long messages) can only be secure against computationally bounded adversaries, and are impossible against computationally unbounded attackers. Thus, it is natural to ask if the same strategy could help robust learning.
We show that computational limitation of attackers can indeed be useful in robust learning by demonstrating the possibility of a classifier for some learning task for which computational and information-theoretic adversaries of bounded perturbations have very different power. Namely, while computationally unbounded adversaries can attack successfully and find adversarial examples with small perturbation, polynomial-time adversaries are unable to do so unless they can break standard cryptographic hardness assumptions. Our results, therefore, indicate that perhaps a similar approach to cryptography (relying on computational hardness) holds promise for achieving computationally robust machine learning. In the reverse direction, we also show that the existence of such a learning task, in which computational robustness beats information-theoretic robustness, requires computational hardness, as it implies (average-case) hardness of NP.
1 Introduction
Designing classifiers that are robust to small perturbations to test instances has emerged as a challenging task in machine learning. The goal of robust learning is to design classifiers that still correctly predict the true label even if the input is perturbed minimally to a “close” instance. In fact, it was shown (Szegedy et al., 2014; Biggio et al., 2013; Goodfellow et al., 2015) that many learning algorithms, and in particular DNNs, are highly vulnerable to such small perturbations and thus adversarial examples can be successfully found. Since then, the machine learning community has been actively engaged in addressing this problem with many new defenses (Papernot et al., 2016; Madry et al., 2018; Biggio and Roli, 2018) and novel and powerful attacks (Carlini and Wagner, 2017; Athalye et al., 2018).
Do adversarial examples always exist? This state of affairs suggests that perhaps the existence of adversarial examples is due to fundamental reasons that might be inevitable. A sequence of works (Gilmer et al., 2018; Fawzi et al., 2018; Diochnos et al., 2018; Mahloujifar et al., 2019; Shafahi et al., 2018; Dohmatob, 2018) shows that for natural theoretical distributions (e.g., isotropic Gaussian of dimension ) and natural metrics over them (e.g., or ), adversarial examples are inevitable. Namely, the concentration of measure phenomenon (Ledoux, 2001; Milman and Schechtman, 1986) in such metric probability spaces implies that small perturbations are enough to map almost all the instances into a close instance that is misclassified. This line of work, however, does not yet say anything about “natural” distributions of interest such as images or voice, as the precise nature of such distributions is yet to be understood.
Can lessons from cryptography help? Given the pessimistic state of affairs, researchers have asked if we could use lessons from cryptography to make progress on this problem (Madry, 2018; Goldwasser, 2018; Mahloujifar and Mahmoody, 2018). Indeed, numerous cryptographic tasks (e.g. encryption of long messages) can only be realized against attackers that are computationally bounded. In particular, we know that all encryption methods that use a short key to encrypt much longer messages are insecure against computationally unbounded adversaries. However, when restricted to computationally bounded adversaries this task becomes feasible and suffices for numerous settings. This insight has been extremely influential in cryptography. Nonetheless, despite attempts to build on this insight in the learning setting, we have virtually no evidence on whether this approach is promising. Thus, in this work we study the following question:
Could we hope to leverage computational hardness for the benefit of adversarially robust learning by rendering successful attacks computationally infeasible?
Taking a step in realizing this vision, we provide formal definitions for computational variants of robust learning. Following the cryptographic literature, we provide a game-based definition of computationally robust learning. Very roughly, a game-based definition consists of two entities, a challenger and an attacker, that interact with each other. In our case, as the first step the challenger generates independent samples from the distribution at hand, uses those samples to train a learning algorithm, and obtains a hypothesis . Additionally, the challenger samples a fresh challenge sample from the underlying distribution. Next, the challenger provides the attacker with oracle access to and . At the end of this game, the attacker outputs a value to the challenger. The execution counts as a “win” for the attacker if is obtained as a small perturbation of and leads to a misclassification. We say that the learning is computationally robust as long as no attacker from a class of adversaries can “win” the above game with a probability much better than some base value. (See Definition 2.) This definition is very general and it implies various notions of security by restricting to various classes of attackers. While we focus on polynomially bounded attackers in this paper, we remark that one may also consider other natural classes of attackers based on the setting of interest (e.g., an attacker that can only modify a certain part of the image).
What if adversarial examples are actually easy to find? Mahloujifar and Mahmoody (2019) studied this question, and showed that as long as the input instances come from a product distribution, and if the distances are measured in Hamming distance, adversarial examples with sublinear perturbations can be found in polynomial time. This result, however, did not say anything about other distributions or metrics such as . Thus, it was left open whether computational hardness could be leveraged in any learning problem to guarantee its robustness.
1.1 Our Results
From computational hardness to computational robustness. In this work, we show that computational hardness can indeed be leveraged to help robustness. In particular, we present a learning problem that has a classifier that is only computationally robust. In fact, let be any learning problem that has a classifier with “small” risk , but for which adversarial examples exist for classifier with higher probability under the norm (e.g., could be any of the well-studied problems in the literature with a vulnerable classifier under norm ). Then, we show that there is a “related” problem and a related classifier that has computational risk (i.e., risk in the presence of computationally bounded tampering adversaries) at most , but the risk of will go up all the way to if the tampering attackers are allowed to be computationally unbounded. Namely, computationally bounded adversaries have a much smaller chance of finding adversarial examples of small perturbations for than computationally unbounded attackers do. (See Theorem 3.1.)
The computational robustness of the above construction relies on allowing the hypothesis to sometimes “detect” tampering and output a special symbol . The goal of the attacker is to make the hypothesis output a wrong label and not get detected. Therefore, we have proved, along the way, that allowing tamper detection can also be useful for robustness. Allowing tamper detection, however, is not always an option. For example, a real-time decision-making classifier (e.g., one classifying a traffic sign) has to output a label even if it detects that something might be suspicious about the input image. We prove that even in this case, there is a learning problem with binary labels and a classifier for such that the computational risk of is almost zero, while its information-theoretic risk is , which makes the classifier’s decisions under attack meaningless. (See Theorem 3.2.)
Extension: existence of learning problems that are computationally robust. Our result above applies to certain classifiers that “separate” the power of computationally bounded vs. that of computationally unbounded attackers. Doing so, however, does not rule out the possibility of finding information-theoretically robust classifiers for the same problem. So, a natural question is: can we extend our result to show the existence of learning tasks for which any classifier is vulnerable to unbounded attackers, while computationally robust classifiers for that task exist? At first, this might look like an impossible task in “natural” settings, in which the ground truth function itself is robust under the allowed amount of perturbations. (For example, in the case of image classification, the human is the robust ground truth.) Therefore, we cannot simply extend our result in this setting to rule out the existence of robust classifiers, since they might simply exist (unless one puts limits on the complexity of the learned model, to exclude the ground truth function as a possible hypothesis).
However, one can still formulate the question above in a meaningful way as follows: Can we have a learning task for which any polynomial-time learning algorithm (with polynomial sample complexity) is forced to produce (with high probability) hypotheses with low robustness against unbounded attacks? Indeed, in this work we also answer this question affirmatively, as a corollary to our main result, by also relying on results proved in the recent exciting works of (Bubeck et al., 2018c, a; Degwekar and Vaikuntanathan, 2019).
In summary, our work provides credence that perhaps restricting attacks to computationally bounded adversaries holds promise for achieving computationally robust machine learning that relies on computational hardness assumptions as is currently done in cryptography.
From computational robustness back to computational hardness. Our first result shows that computational hardness can be leveraged in some cases to obtain nontrivial computational robustness that beats information-theoretic robustness. But how about the reverse direction: are computational hardness assumptions necessary for this goal? We also prove such a reverse direction and show that nontrivial computational robustness implies computationally hard problems in NP. In particular, we show that a non-negligible gap between the success probability of computationally bounded vs. that of unbounded adversaries in attacking the robustness of classifiers implies strong average-case hard distributions for the class NP. Namely, we prove that if the distribution of the instances in learning task is efficiently samplable, and if a classifier for this problem has computational robustness , information-theoretic robustness , and , then one can efficiently sample from a distribution that generates Boolean formulas that are satisfiable with overwhelming probability, yet no efficient algorithm can find the satisfying assignments of with a non-negligible probability. (See Theorem 4 for the formal statement.)
What world do we live in? As explained above, our main question is whether adversarial examples could be prevented by relying on computational limitations of the adversary. In fact, even if adversarial examples exist for a classifier, we might be living in either of two completely different worlds. One is a world in which computationally bounded adversaries can find adversarial examples (almost) whenever they exist, and thus would be as powerful as information-theoretic adversaries. Another world is one in which machine learning could leverage computational hardness. Our work suggests that computational hardness can potentially help robustness for certain learning problems; thus, we are living in the better world. Whether or not we can achieve computational robustness for practical problems (such as image classification) that beats their information-theoretic robustness remains an intriguing open question. A related line of work (Bubeck et al., 2018c, a; Degwekar and Vaikuntanathan, 2019) studied other “worlds” that we might be living in, and studied whether adversarial examples are due to the computational hardness of learning robust classifiers. They designed learning problems demonstrating that in some worlds, robust classifiers might exist, while they are hard to obtain efficiently. We note, however, that the goals of those works and of our work are quite different. They deal with how computational constraints might be an issue and prevent the learner from reaching its goal, while our focus is on how such constraints on adversaries can help us achieve robustness guarantees that are not achievable information-theoretically.
What does our result say about robustifying other natural learning tasks? Our results only show the existence of a learning task for which computational robustness is very meaningful. So, one might argue that this is an ad hoc phenomenon that might not have an impact on other practical problems (such as image classification). However, we emphasize that prior to our work, there was no provable evidence that computational hardness can play any positive role in robust learning. Indeed, our results also shed light on how computational robustness can potentially be applied to other, perhaps more natural learning tasks. The reason is that the space of all possible ways to tamper with a high-dimensional vector is exponentially large. Lessons from cryptography, and the construction of our learning task proving our main result, suggest that, in such cases, there is potentially a huge gap between the power of computationally bounded vs. unbounded search algorithms. On the other hand, there are methods proposed by researchers that seem to resist attacks that try to find adversarial examples (Madry et al., 2018), while the certified robustness literature is all focused on modeling the adversary as a computationally unbounded entity who can find adversarial examples within a certain distance, so long as they exist (Raghunathan et al., 2018; Wong and Kolter, 2018; Sinha et al., 2018; Wong et al., 2018). Our result suggests that perhaps we should start to consider computational variants of certification methods that focus on computationally bounded adversaries, as by doing so we might be able to prove better robustness bounds for methods that are already designed.
Other related work. In another line of work (Raghunathan et al., 2018; Wong and Kolter, 2018; Sinha et al., 2018; Wong et al., 2018) the notion of certifiable robustness was developed to prove robustness for individual test instances. More formally, they aim at providing robustness certificates with bounds along with a decision made on a test instance , with the guarantee that any at distance at most from is correctly classified. However, these guarantees, so far, are not strong enough to rule out attacks completely, as larger magnitudes of perturbation (than the levels certified) still can fool the classifiers while the instances look the same to the human.
1.2 Techniques
We prove our main result about the possibility of computationally robust classifiers (Theorem 3.1) by “wrapping” an arbitrary learning problem with a vulnerable classifier by adding computational certification based on cryptographic digital signatures to test instances. A digital signature scheme (see Definition A) operates based on two generated keys , where is private and is used for signing messages, and is public and is used for verifying signatures. Such schemes come with the guarantee that a computationally bounded adversary with the knowledge of cannot sign new messages on its own, even if it is given signatures on some previous messages. Digital signature schemes can be constructed based on the assumption that one-way functions exist.
Initial Attempt. Suppose is the distribution over of a learning problem with input space and label space . Suppose had a hypothesis that can predict correct labels reasonably well, . Suppose, at the same time, that a (perhaps computationally unbounded) adversary can perturb test instances like into a close adversarial example that is now likely to be misclassified by ,
Now we describe a related problem , its distribution of examples , and a classifier for . To sample an example from , we first sample and then modify to by attaching a short signature to . The label of remains the same as that of . Note that will be kept secret to the sampling algorithm of . The new classifier will rely on the public parameter that is available to it. Given an input , first checks its integrity by verifying that the given signature is valid for . If the signature verification does not pass, rejects the input as adversarial without outputting a label, but if this test passes, outputs .
To successfully find an adversarial example for through a small perturbation of sampled as , an adversary can pursue either of the following strategies. (I) One strategy is to try to find a new signature for the same , which would constitute a sufficiently small perturbation, as the signature is short. Doing so, however, is not considered a successful attack, as the label of remains the same as the true label of the untampered point . (II) Another strategy is to perturb the part of into a close instance , then try to find a correct signature for it, and output . Doing so would be a successful attack, because the signature is short, and thus would indeed be a close instance to . However, doing this is computationally infeasible, due to the very security definition of the signature scheme. Note that is a forgery for the signature scheme, which a computationally bounded adversary cannot construct because of the security of the underlying signature scheme. This means that the computational risk of would remain at most .
We now observe that information theoretic (i.e., computationally unbounded) attackers can succeed in finding adversarial examples for with probability at least . In particular, such attacks can first find an adversarial example for (which is possible with probability over the sampled ), construct a signature for , and then output . Recall that an unbounded adversary can construct a signature for using exhaustive search.
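The verify-then-classify wrapper of this initial attempt can be illustrated with a minimal sketch. The sketch is not the paper's construction: it assumes a Lamport one-time signature built from SHA-256 as the signature scheme, a toy base classifier, and the names `REJECT`, `keygen`, `sign`, `verify`, `base_classifier`, and `wrapped_classifier`, all of which are hypothetical. Lamport signatures are long, so the sketch shows only the control flow and not the perturbation-size accounting, which relies on short signatures. The unbounded attacker is simulated by simply handing it the signing key.

```python
import hashlib
import secrets

REJECT = "*"  # hypothetical stand-in for the special tamper-detection symbol


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(msg: bytes):
    """The 256 bits of SHA-256(msg), most significant bit first."""
    digest = _h(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Lamport keys: sk is 256 pairs of random secrets, vk their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    vk = [(_h(a), _h(b)) for a, b in sk]
    return sk, vk


def sign(sk, msg: bytes):
    """Reveal one secret per digest bit (one-time use only)."""
    return [sk[i][b] for i, b in enumerate(_bits(msg))]


def verify(vk, msg: bytes, sig) -> bool:
    if len(sig) != 256:
        return False
    return all(_h(s) == vk[i][b] for i, (s, b) in enumerate(zip(sig, _bits(msg))))


def base_classifier(x: bytes) -> int:
    """Toy vulnerable classifier (an assumption): parity of the first byte."""
    return x[0] & 1


def wrapped_classifier(vk, x: bytes, sig):
    """Reject inputs whose signature fails; otherwise defer to the base classifier."""
    if not verify(vk, x, sig):
        return REJECT
    return base_classifier(x)


sk, vk = keygen()
x = b"\x02 clean instance"
sig = sign(sk, x)
x_adv = b"\x03 clean instance"  # one flipped bit changes the base label

print(wrapped_classifier(vk, x, sig))                   # honest instance
print(wrapped_classifier(vk, x_adv, sig))               # perturbation without a new signature
print(wrapped_classifier(vk, x_adv, sign(sk, x_adv)))   # simulated unbounded attacker
```

The second call corresponds to a bounded adversary that perturbs the instance but cannot forge a signature, so the tampering is detected; the third corresponds to strategy (II) carried out by an attacker that can produce a valid signature.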
Actual construction. One main issue with the above construction is that it needs to make publicly available, as a public parameter to the hypothesis (after it is sampled as part of the description of the distribution ). Note that it is computationally hard to construct the hypothesis described above without knowing . The problem with revealing to the learner is that the distribution of examples should come with some extra information other than samples. However, in the classical definition of a learning problem, the learner only has access to samples from the distribution. In fact, if we were allowed to pass some extra information to the learner, we could pass the description of a robust classifier (e.g. the ground truth) and the learning task becomes trivial. The other issue is that the distribution is not publicly samplable in polynomial time, because to get a sample from one needs to use the signing key , but that key is kept secret. We resolve these two issues with two more ideas. The first idea is that, instead of generating one pair of keys for and keeping secret, we can generate a fresh pair of keys every time that we sample and attach also to the actual instance . The modified hypothesis also uses this key and verifies using . This way, the distribution is publicly samplable, and moreover, there is no need for making available as a public parameter. However, this change of the distribution introduces a new possible way to attack the scheme and to find adversarial examples. In particular, now the adversary can try to perturb into a close string for which it knows a corresponding signing key , and then use to sign an adversarial example for and output . However, to make this attack impossible for the attacker under small perturbations of instances, we use error correction codes and employ an encoding of the verification key (instead of ) that needs too much change before one can fool a decoder to decode to any other . 
But as long as the adversary cannot change , the adversary cannot attack the robustness computationally. (See Construction 3.1.)
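To illustrate why encoding the verification key helps, the following minimal sketch uses a repetition code with majority decoding. The choice of code is an assumption made only for illustration (the construction merely requires an error-correcting code in which many symbols must change before the decoder outputs a different key), and the names `encode`, `decode`, and `flip` are hypothetical.

```python
def encode(bits, r):
    """Repetition code: repeat every bit r times."""
    return [b for b in bits for _ in range(r)]


def decode(codeword, r):
    """Majority vote within each block of r symbols."""
    return [int(2 * sum(codeword[i:i + r]) > r) for i in range(0, len(codeword), r)]


def flip(codeword, positions):
    """Return a copy of the codeword with the given positions flipped."""
    out = list(codeword)
    for p in positions:
        out[p] ^= 1
    return out


vk_bits = [1, 0, 1, 1, 0, 0, 1, 0]  # toy verification key (an assumption)
r = 5
codeword = encode(vk_bits, r)

# Two flips inside one block are corrected by the majority vote.
print(decode(flip(codeword, [0, 1]), r) == vk_bits)
# Three flips inside one block change the decoded key.
print(decode(flip(codeword, [0, 1, 2]), r) == vk_bits)
```

With block length r, the adversary must flip more than r/2 symbols of a single block before the decoded key changes, so r can be chosen so that this exceeds the tampering budget. A code with better rate and distance would serve the same purpose with less overhead.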
To analyze the construction above (see Theorem 3.1), we note that the computationally bounded adversary would need to change number of bits in to get where . This is because the encoded would need number of perturbations to change the encoded , and if remains the same it is computationally hard to find a valid signature. On the other hand, a computationally unbounded adversary can focus on perturbing into and then forge a short signature for it, which could be as small as perturbations.
Extension to problems, rather than specific classifiers for them. Note that the construction above could be wrapped around any learning problem. In particular, we can pick an original problem that is not (information-theoretically) robustly learnable in polynomial time. These problems, which we call robust-hard, were studied recently in (Bubeck et al., 2018c) and (Degwekar and Vaikuntanathan, 2019), which construct such robust-hard problems to show the effect of computational limitation in robust learning (see Definition 2 and 2). Here, using their construction as the original learning problem, and wrapping it with our construction, we can strengthen our result and construct a learning problem that is not robustly learnable by any polynomial-time learning algorithm, yet it has a classifier that is computationally robust. See Corollary 3.1 for more details.
Computational robustness without tamper detection. The computational robustness of the constructed classifier relies on sometimes detecting tampering attacks and not outputting a label. We give an alternative construction for the setting in which the classifier always has to output a label. We again use digital signatures and error correction codes as the main ingredients of our construction, but in a different way. The main difference is that we have to repeat the signature multiple times to prevent the adversary from changing all of the signatures. The caveat of this construction is that it is no longer a wrapper around an arbitrary learning problem. See Construction 3.2 for more details.
2 Defining Computational Risk and Computationally Robust Learning
Notation. We use calligraphic letters (e.g., ) for sets and capital non-calligraphic letters (e.g., ) for distributions. By we denote sampling from . For a randomized algorithm , denotes the randomized execution of on input outputting . A classification problem is specified by the following components: set is the set of possible instances, is the set of possible labels, is a joint distribution over , and is the space of hypotheses. For simplicity we work with problems that have a single distribution (e.g., is the distribution of labeled images from a data set like MNIST or CIFAR-10). A learner for problem is an algorithm that takes a dataset as input and outputs a hypothesis . We do not state the loss function explicitly, as we work with classification problems and use the zero-one loss by default. For a learning problem , the risk or error of a hypothesis is . We are usually interested in learning problems with a specific metric defined over for the purpose of defining adversarial perturbations of bounded magnitude controlled by . In that case, we might simply write , but is implicitly defined over . Finally, for a metric over , we let be the ball of radius centered at under the metric . By default, we work with Hamming distance , but our definitions can be adapted to any other metric. We usually work with families of problems where determines the length of (and thus input lengths of ). We sometimes use a special notation to denote the probability of an event over a random variable . We also might use a combination of multiple random variables; for example, denotes the same thing as . The order of sampling of and matters, as might depend on .
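As a concrete reading of this notation, the following minimal sketch computes the Hamming distance, membership in a Hamming ball, and an empirical estimate of the zero-one risk on a finite sample. The function names and the toy majority hypothesis are assumptions introduced only for illustration.

```python
def hamming(x, x_prime):
    """Number of coordinates in which two equal-length instances differ."""
    if len(x) != len(x_prime):
        raise ValueError("instances must have the same length")
    return sum(a != b for a, b in zip(x, x_prime))


def in_ball(x, x_prime, radius):
    """True if x_prime lies in the Hamming ball of the given radius around x."""
    return hamming(x, x_prime) <= radius


def empirical_risk(h, samples):
    """Fraction of labeled samples (x, y) on which h(x) differs from y (zero-one loss)."""
    return sum(h(x) != y for x, y in samples) / len(samples)


def majority(x):
    """Toy hypothesis (an assumption): 1 if more than half of the bits are 1."""
    return int(2 * sum(x) > len(x))


samples = [((1, 1, 0), 1), ((0, 0, 1), 0), ((1, 0, 0), 1), ((1, 1, 1), 1)]
print(empirical_risk(majority, samples))      # one of the four samples is misclassified
print(in_ball((1, 1, 0), (1, 0, 0), 1))       # the two instances differ in one coordinate
```

The empirical risk only approximates the risk defined over the distribution; the approximation improves with the number of i.i.d. samples.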
Allowing tamper detection. In this work we expand the standard notion of hypotheses and allow to output a special symbol as well (without adding to ), namely we have . This symbol can be used to denote “out of distribution” points, or any form of tampering. In natural scenarios, when is not an adversarially tampered instance. However, we allow this symbol to be output by even in no-attack settings as long as its probability is small enough.
We follow the tradition of game-based security definitions in cryptography (Naor, 2003; Shoup, 2004; Goldwasser and Kalai, 2016; Rogaway and Zhang, 2018). Games are the most common way that security is defined in cryptography. These games are defined between a challenger and an adversary . Consider the case of a signature scheme. In this case the challenger is a signature scheme and an adversary is given oracle access to the signing functionality (i.e., the adversary can give a message to the oracle and obtain the corresponding signature ). The adversary wins the game if it can provide a valid signature on a message that was not queried to the oracle. The security of the signature scheme is then defined informally as follows: any probabilistic polynomial time/size adversary can win the game only with probability bounded by a negligible function of the security parameter. We describe a security game for tampering adversaries with bounded tampering budget in , but the definition is more general and can be used for other adversary classes.
[Security game of adversarially robust learning] Let be a classification problem where the components are parameterized by . Let be a learning algorithm with sample complexity for . Consider the following game between a challenger , and an adversary with tampering budget .

1. The challenger samples i.i.d. examples and obtains a hypothesis by running the learner on them.
2. The challenger then samples a test example and sends it to the adversary.
3. Having oracle (gates, in the case of circuits) access to the hypothesis and a sampler for the distribution, the adversary obtains an adversarial instance and outputs it.
Winning conditions: In case , the adversary wins if ,
Why separate the winning conditions for from ? One might wonder why we separate the winning condition for the two cases of and . The reason is that is supposed to capture tamper detection. So, if the adversary does not change and the hypothesis outputs , this is an error, and thus should contribute to the risk. More formally, when we evaluate risk, we have , which implicitly means that outputting contributes to the risk. However, if the adversary’s perturbation of to leads to , it means the adversary has not succeeded in its attack, because the tampering is detected. In fact, if we simply require the other 3 conditions to let the adversary win, the notion of “adversarial risk” (see Definition 2) might be even less than the normal risk, which is counterintuitive.
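The game and the case split discussed above can be sketched as a small simulation. The winning predicate below is our reading of the discussion, not a verbatim transcription of the definition: an untampered instance counts as a win whenever the hypothesis errs (including a tamper-detection output), while a tampered instance counts only if it is within budget, misclassified, and not detected. The toy problem (majority of 9 bits), the learner, both adversaries, and all names are assumptions. The sketch also does not model the restriction to polynomial-size adversaries, which is what distinguishes computational risk from adversarial risk.

```python
import random

REJECT = "*"  # hypothetical stand-in for the tamper-detection symbol


def hamming(x, x_prime):
    return sum(a != b for a, b in zip(x, x_prime))


def adversary_wins(h, x, y, x_adv, budget):
    if x_adv == x:
        return h(x) != y  # an untampered error, including h(x) == REJECT, counts
    out = h(x_adv)
    return hamming(x, x_adv) <= budget and out != y and out != REJECT


def play_game(learner, sampler, adversary, m, budget, rng):
    train = [sampler(rng) for _ in range(m)]          # step 1: train on i.i.d. samples
    h = learner(train)
    x, y = sampler(rng)                               # step 2: fresh test example
    x_adv = adversary(h, sampler, x, y, budget, rng)  # step 3: adversary's output
    return adversary_wins(h, x, y, x_adv, budget)


def majority(x):
    return int(2 * sum(x) > len(x))


def sampler(rng, n=9):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    return x, majority(x)


def learner(train):
    return majority  # toy learner that already knows the ground truth


def trivial_adversary(h, sampler, x, y, budget, rng):
    return x


def flipping_adversary(h, sampler, x, y, budget, rng):
    """Flip bits that agree with the label until h changes or the budget is spent."""
    x_adv, flips = list(x), 0
    for i, bit in enumerate(x):
        if flips == budget or h(tuple(x_adv)) != y:
            break
        if bit == y:
            x_adv[i] ^= 1
            flips += 1
    return tuple(x_adv)


def win_rate(adversary, budget, trials=200, seed=0):
    rng = random.Random(seed)
    wins = sum(play_game(learner, sampler, adversary, 10, budget, rng) for _ in range(trials))
    return wins / trials


print(win_rate(trivial_adversary, 5), win_rate(flipping_adversary, 5))
```

In this toy instance the trivial adversary recovers the standard risk of the hypothesis, while the flipping adversary plays the role of a perturbation-bounded attacker against a hypothesis with no tamper detection.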
Alternative definitions of winning for the adversary. The winning condition for the adversary could be defined in other ways as well. In our Definition 2, the adversary wins if and . This condition is inspired by the notion of corrupted input (Feige et al., 2015), is extended to metric spaces in (Madry et al., 2018), and is used in many subsequent works. An alternative definition for the adversary’s goal, formalized in (Diochnos et al., 2018) and used in (Gilmer et al., 2018; Diochnos et al., 2018; Bubeck et al., 2018a; Degwekar and Vaikuntanathan, 2019), requires to be different from the true label of (rather than ). This condition requires the misclassification of , and thus, would belong to the “error-region” of . Namely, if we let be the ground truth function, the error-region security game requires . Another stronger definition of adversarial risk is given by Suggala et al. (2018), in which the winning condition requires both of the following: (1) the ground truth should not change , and (2) is misclassified. For natural distributions like images or voice, where the ground truth is robust to small perturbations, all these three definitions for the adversary’s winning are equivalent.
Stronger attack models. In the attack model of Definition 2, we only provide the label of to the adversary and also give it the sample oracle from . A stronger attacker can have access to the “concept” function which is sampled from the distribution of given (according to ). This concept oracle might not be efficiently computable, even in scenarios where is efficiently samplable. In fact, even if is not efficiently samplable, just having access to a large enough pool of i.i.d. sampled data from is enough to run the experiment of Definition 2. In alternative winning conditions (e.g., the error-region definition) for Definition 2 discussed above, it makes more sense to also give the ground truth concept oracle to the adversary, as the adversary needs to achieve . Another way to strengthen the power of the adversary is to give it non-black-box access to the components of the game (see Papernot et al. (2017)). In Definition 2, by default, we model adversaries who have black-box access to , but one can define non-black-box (white-box) access to each of , if they are polynomial-size objects.
Diochnos et al. (2018) focused on bounded perturbation adversaries that are unbounded in their running time and formalized notions of adversarial risk for a given hypothesis with respect to the perturbing adversaries. Using Definition 2, in Definition 2, we retrieve the notions of standard risk, adversarial risk, and its (new) computational variant.
[Adversarial risk of hypotheses and learners] Suppose is a learner for a problem . For a class of attackers we define
where the winning is in the experiment of Definition 2. When the attacker is fixed, we simply write . For a trivial attacker who outputs , it holds that . When includes attackers that are only bounded by perturbations, we use notation , and when the adversary is further restricted to all size (oracle-aided) circuits, we use notation . When is a learner that outputs a fixed hypothesis , by substituting with , we obtain the following similar notions for , which will be denoted as , , , and .
[Computationally robust learners and hypotheses] Let be a family of classification problems parameterized by . We say that a learning algorithm is a computationally robust learner with risk at most against perturbing adversaries, if for any polynomial , there is a negligible function such that
Note that the size of circuit used by the adversary controls its computational power and that is why we are enforcing it to be a polynomial. Again, when is a learner that outputs a fixed hypothesis for each , we say that the family is a computationally robust hypothesis with risk at most against perturbing adversaries, if is so. In both cases, we might simply say that (or ) has computational risk at most .
We remark that, in the definition above, one can opt to work with concrete bounds and a version that drops the negligible probability on the right hand side of the equation and ask for the upper bound to be simply stated as . Doing so, however, is a matter of style. In this work, we opt to work with the above definition, as the negligible probability usually comes up in computational reductions, and hence it simplifies the statement of our theorems, but both forms of the definition of computational risk are equally valid.
PAC learning under computationally bounded tampering adversaries. Recently, several works studied generalization under adversarial perturbations from a theoretical perspective (Bubeck et al., 2018b; Cullina et al., 2018; Feige et al., 2018; Attias et al., 2018; Khim and Loh, 2018; Yin et al., 2018; Montasser et al., 2019; Diochnos et al., 2019), and hence they implicitly or explicitly revisited the “probably approximately correct” (PAC) learning framework of Valiant (2013) under adversarial perturbations. Here we comment that one can derive variants of those definitions for computationally bounded attackers by limiting their adversaries as done in our Definition 2. In particular, we call a learner an PAC learner for a problem and computationally bounded perturbing adversaries, if with probability , outputs a hypothesis that has computational risk at most .
Below we formally define the notion of robust-hard learning problems, which captures the inherent vulnerability of a learning problem to adversarial attacks due to computational limitations of the learning algorithm. This definition is implicit in the works of Degwekar and Vaikuntanathan (2019) and Bubeck et al. (2018c). In Section 3, we use these robust-hard problems to construct learning problems that are inherently non-robust in the presence of computationally unbounded adversaries but have robust classifiers against computationally bounded adversaries.
[Robust-hard learning problems] A learning problem is robust-hard w.r.t. budget if for any learning algorithm that runs in we have
Discussion on falsifiability of computational robustness. If the learner is polynomial time and the distribution is samplable in polynomial time (e.g., by sampling first and then using a generative model to generate for ), then the computational robustness of learners as defined based on Definitions 2 and 2 is a “falsifiable” notion of security as defined by Naor (2003). Namely, if an adversary claims that it can break the computational robustness of the learner , it can prove so in polynomial time by participating in a challenge-response game and winning in this game with a noticeable (non-negligible) probability more than . This feature is due to the crucial property that the challenger in Definition 2 is itself a polynomial-time algorithm, and thus can be run efficiently. Not all security games have efficient challengers (e.g., see Pandey et al. (2008)).
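To make the challenge-response game more concrete, the following Python sketch estimates an attacker's success rate empirically. It is an illustration under stated assumptions rather than the formal game of Definition 2: the Hamming-distance budget, the winning condition (the prediction is neither the true label nor a tamper-detection symbol), and every name in it (`play_round`, `estimate_win_rate`, the toy majority problem) are hypothetical choices made for this example.

```python
import random

STAR = "*"  # assumed tamper-detection output; the symbol is our choice


def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(u != v for u, v in zip(a, b))


def play_round(sample, classifier, adversary, budget, rng):
    """One round: the challenger samples, the adversary perturbs, the challenger checks."""
    x, y = sample(rng)
    x_adv = adversary(x, y)
    # A perturbation outside the budget is an automatic loss for the adversary.
    if len(x_adv) != len(x) or hamming(x, x_adv) > budget:
        return False
    pred = classifier(x_adv)
    # Assumed winning condition: a wrong label that is not flagged as tampering.
    return pred != y and pred != STAR


def estimate_win_rate(sample, classifier, adversary, budget, rounds=1000, seed=0):
    """Fraction of rounds won by the adversary over independent samples."""
    rng = random.Random(seed)
    wins = sum(play_round(sample, classifier, adversary, budget, rng) for _ in range(rounds))
    return wins / rounds


# Toy problem (hypothetical): the label is the majority bit of a 9-bit string.
def majority(x):
    return int(sum(x) >= 5)


def sample_majority(rng):
    x = tuple(rng.randrange(2) for _ in range(9))
    return x, majority(x)


def make_flipper(max_flips):
    """Adversary that flips up to `max_flips` bits that agree with the true label."""
    def adversary(x, y):
        x_adv, flips = list(x), 0
        for i, bit in enumerate(x_adv):
            if flips < max_flips and bit == y:
                x_adv[i] = 1 - y
                flips += 1
        return tuple(x_adv)
    return adversary
```

Each round consists only of sampling, one call to the classifier, and a comparison, so the challenger in this sketch runs in polynomial time whenever the sampler and the classifier do. This is the property that makes a claimed attack checkable.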
3 From Computational Hardness to Computational Robustness
In this section, we first prove our main result, which shows the existence of a learning problem with classifiers that are only computationally robust. We prove this result by starting from any hypothesis that is vulnerable to adversarial examples; e.g., this could be any of the numerous algorithms shown to be susceptible to adversarial perturbations. Our constructions use error correction codes and cryptographic signatures. For definitions of these notions, refer to Section A.
3.1 Computational Robustness with Tamper Detection
Our first construction uses a hypothesis with tamper detection (i.e., output capability). Our construction is based on cryptographic signature schemes with short (polylogarithmic) signatures.
Construction 3.1 (Computationally robust problems relying on tamper detection wrappers)
Let be a learning problem and a classifier for such that .
We construct a family of learning problems (based on the fixed problem ) with a family of classifiers . In our construction we use a signature scheme for which the bitlength of is and the bitlength of signature is

The space of instances for is .

The set of labels is .

The distribution is defined by the following process: first sample , , , then encode and output .

The classifier is defined as
For family of Construction 3.1, the family of classifiers is computationally robust with risk at most against adversaries with budget . (Recall that is the error rate of the error correction code.) On the other hand is not robust against information theoretic adversaries of budget , if itself is not robust to perturbations:
Theorem 3.1 means that the computational robustness of could be as large as (by choosing a code with constant error correction rate) while its information theoretic adversarial robustness could be as small as (note that is a constant here) by choosing a signature scheme with short signatures of polylogarithmic length.
Before proving Theorem 3.1 we state the following corollary about robust-hard learning problems.
Corollary
If the underlying problem in Construction 3.1 is robust-hard w.r.t. sublinear budget , then for any polynomial-time learning algorithm for we have
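The tamper-detection wrapper can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the construction itself: the instance layout (input, signature, encoded verification key) is our reading of Construction 3.1, the signature scheme is a Lamport one-time signature built from SHA-256, and the code is a 3-fold repetition code. Lamport signatures are long, so the sketch shows only the wrapper logic and does not reproduce the short-signature property that the theorem relies on. All function names are hypothetical.

```python
import hashlib
import os

STAR = "*"  # tamper-detected output (symbol chosen for this sketch)


def H(data):
    return hashlib.sha256(data).digest()


def digest_bits(msg):
    """The 256 bits of SHA-256(msg); one key pair is used per bit."""
    d = H(msg)
    return [(d[i // 8] >> (i % 8)) & 1 for i in range(256)]


def keygen():
    """Lamport one-time keys: 256 pairs of secrets and their hashes."""
    sk = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    vk = b"".join(H(s0) + H(s1) for s0, s1 in sk)
    return sk, vk


def sign(sk, msg):
    """Reveal, for each digest bit, the secret selected by that bit."""
    return b"".join(sk[i][bit] for i, bit in enumerate(digest_bits(msg)))


def verify(vk, msg, sig):
    if len(vk) != 256 * 64 or len(sig) != 256 * 32:
        return False
    for i, bit in enumerate(digest_bits(msg)):
        start = 64 * i + 32 * bit
        if H(sig[32 * i:32 * i + 32]) != vk[start:start + 32]:
            return False
    return True


def encode(data):
    """3-fold repetition code (stand-in for the error correction code)."""
    return data * 3


def decode(code):
    """Bitwise majority vote over the three copies."""
    n = len(code) // 3
    a, b, c = code[:n], code[n:2 * n], code[2 * n:3 * n]
    return bytes((p & q) | (p & r) | (q & r) for p, q, r in zip(a, b, c))


def make_instance(x, sk, vk):
    """Assumed instance layout: (input, signature on input, encoded key)."""
    return (x, sign(sk, x), encode(vk))


def wrapped_classifier(h, instance):
    """Answer with h(x) only if the signature verifies under the decoded key."""
    x, sig, vk_code = instance
    return h(x) if verify(decode(vk_code), x, sig) else STAR
```

With a fresh key pair per instance, an honest instance is classified by the underlying `h`, a modified input is flagged with `STAR` unless the attacker also forges a signature, and limited damage to the encoded key is undone by decoding. The repetition code here corrects an error only when at most one of the three copies is damaged at a given position.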
On the other hand, the family of classifiers for is computationally robust with risk at most against adversaries with linear budget.
The above corollary follows from Theorem 3.1 and the definition of robust-hard learning problems. The significance of this corollary is that it provides an example of a learning problem that could not be learnt robustly with any polynomial-time algorithm. However, the same problem has a classifier that is robust against computationally bounded adversaries. This construction uses a robust-hard learning problem that is proven to exist based on cryptographic assumptions (Bubeck et al., 2018c; Degwekar and Vaikuntanathan, 2019). Now we prove Theorem 3.1.
Proof (of Theorem 3.1). We first prove the following claim about the risk of .
Claim
For problem we have
The proof follows from the completeness of the signature scheme. We have,
Now we prove the computational robustness of .
Claim
For family , and for any polynomial there is a negligible function such that for all
Let be the family of circuits maximizing the adversarial risk for for all . We build a sequence of circuits , such that and are of size at most . just samples a random and outputs . gets and , calls to get and outputs . Note that can provide all the oracles needed to run if the sampler from , and are all computable by a circuit of polynomial size. Otherwise, we need to assume that our signature scheme is secure with respect to those oracles and the proof will follow. We have,
Note that implies that based on the error rate of the error correcting code. Also implies that is a valid signature for under verification key . Therefore, we have,
Thus, by the unforgeability of the one-time signature scheme we have
which by Claim 3.1 implies
Now we show that is not robust against computationally unbounded attacks.
Claim
For family and any we have
For any define where is the closest point to where and is a valid signature such that . Based on the fact that the size of the signature is , we have Also, it is clear that because is a valid signature. Also, . Therefore we have
This concludes the proof of Theorem 3.1.
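The unbounded attack in the last claim amounts to exhaustive search over the signature space, which is feasible only because signatures are short. The Python sketch below illustrates the search cost with a toy stand-in: a truncated keyed hash plays the role of the signature scheme, and `verify` is used as a black box. The toy scheme, the 12-bit signature length, and the function names are assumptions made for this example and are not part of the construction.

```python
import hashlib
import os


def make_toy_scheme(sig_bits):
    """Toy scheme with sig_bits-bit signatures (illustrative stand-in only)."""
    secret = os.urandom(16)

    def sign(x):
        digest = hashlib.sha256(secret + x).digest()
        return int.from_bytes(digest, "big") % (1 << sig_bits)

    def verify(x, sig):
        return sig == sign(x)

    return sign, verify


def brute_force_forge(verify, x_adv, sig_bits):
    """Try every sig_bits-bit string; return (forgery, number of verify calls)."""
    for calls, candidate in enumerate(range(1 << sig_bits), start=1):
        if verify(x_adv, candidate):
            return candidate, calls
    return None, 1 << sig_bits
```

The search makes at most 2 to the power `sig_bits` verification calls, which is out of reach for polynomial-size circuits once the signature length grows faster than logarithmically, but is no obstacle for an unbounded attacker. Replacing the signature costs the attacker at most the signature length in perturbation budget, on top of the perturbation that fools the underlying classifier.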
3.2 Computational Robustness without Tamper Detection
The following theorem shows an alternative construction that is incomparable to Construction 3.1, as it does not use any tamper detection. On the downside, the construction is not defined with respect to an arbitrary (vulnerable) classifier of a natural problem.
Construction 3.2 (Computational robustness without tamper detection)
Let be a distribution over with a balanced “label” bit: . We construct a family of learning problems with a family of classifiers . In our construction we use a signature scheme for which the bitlength of is and the bitlength of signature is and an error correction code with code rate and error rate .

The space of instances for is .

The set of labels is .

The distribution is defined as follows: first sample , then sample and compute . Then compute . If sample a random that is not a valid signature of w.r.t . Then output . Otherwise compute and output .

The classifier is defined as
For family of Construction 3.2, the family of classifiers has risk and is computationally robust with risk at most against adversaries of budget . On the other hand is not robust against information theoretic adversaries of budget :
Note that reaching adversarial risk $1/2$ makes the classifier’s decisions meaningless, as a random coin toss achieves this level of accuracy.
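The decision rule of the classifier in Construction 3.2 can be sketched in Python as follows, under our reading that an instance carries an encoded input, several copies of a signature, and an encoded verification key, and that the classifier outputs 1 exactly when at least one copy verifies. The keyed hash used for `toy_sign` and `toy_verify` is a placeholder with no unforgeability guarantee (the verification key equals the signing key), and the 3-fold repetition code, the number of copies, and all names are assumptions for illustration.

```python
import hashlib
import os


def encode(data):
    """3-fold repetition code (stand-in for the error correction code)."""
    return data * 3


def decode(code):
    """Bitwise majority vote over the three copies."""
    n = len(code) // 3
    a, b, c = code[:n], code[n:2 * n], code[2 * n:3 * n]
    return bytes((p & q) | (p & r) | (q & r) for p, q, r in zip(a, b, c))


def toy_sign(key, msg):
    return hashlib.sha256(key + msg).digest()[:8]


def toy_verify(key, msg, sig):
    return sig == toy_sign(key, msg)


def make_instance(x, label, key, copies=5):
    """Label 1: copies of a valid signature. Label 0: copies of an invalid one."""
    sig = toy_sign(key, x)
    if label == 0:
        sig = os.urandom(8)
        while toy_verify(key, x, sig):  # resample until the string is invalid
            sig = os.urandom(8)
    return (encode(x), [sig] * copies, encode(key))


def classify(instance, verify=toy_verify):
    """Output 1 if any signature copy verifies on the decoded input, else 0."""
    x_code, sigs, key_code = instance
    x, key = decode(x_code), decode(key_code)
    return int(any(verify(key, x, s) for s in sigs))
```

Turning a label-1 instance into a prediction of 0 requires invalidating every signature copy, which a limited budget does not allow, while turning a label-0 instance into a prediction of 1 requires producing a valid signature. This asymmetry is what the robustness argument exploits.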
First it is clear that for problem we have . Now we prove the computational robustness of .
Claim
For family , and for any polynomial there is a negligible function such that for all
Similar to the proof of Claim 3.1, we prove this based on the security of the signature scheme. Let be the family of circuits maximizing the adversarial risk for for all . We build a sequence of circuits and such that and are of size at most . just asks for the signature of . gets and does the following: It first samples , computes encodings and and if , it samples a random then calls on input to get . Then it checks all ’s and if there is any of them that it outputs , otherwise it aborts and outputs . If it aborts and outputs . Note that can provide all the oracles needed to run if the sampler from , and are all computable by a circuit of polynomial size. Otherwise, we need to assume that our signature scheme is secure with respect to those oracles and the proof will follow. We have,
Because of the error rate of the error correcting code, implies that and . Also implies that . This is because if , the adversary has to make all the signatures invalid which is impossible with tampering budget . Therefore must be and one of the signatures in must pass the verification because the prediction of should be . Therefore we have
Thus, by the unforgeability of the one-time signature scheme we have
Now we show that is not robust against computationally unbounded attacks.
Claim
For family and any we have
For any define as follows: If , does nothing and outputs . If , it searches all possible signatures to find a signature such that . It then outputs . Based on the fact that the size of the signature is , we have