Adversarial Examples Versus Cloud-based Detectors: A Black-box Empirical Study
Deep learning has been broadly adopted by major cloud providers, such as Google, AWS, and Baidu, to offer various computer-vision-related services, including image auto-classification, object identification, and illegal-image detection. While recent work has demonstrated that deep learning classification models are vulnerable to adversarial examples, the more complicated cloud-based image detection models face similar security concerns that have not yet received enough attention in the literature. In this paper, we focus on the security issues of real-world cloud-based image detectors. Specifically, (1) based on effective semantic segmentation, we propose four different attacks to generate semantics-aware adversarial examples by interacting only with black-box APIs; and (2) we make the first attempt to conduct an extensive empirical study of black-box attacks against real-world cloud-based image detectors. Through comprehensive evaluations of five major cloud platforms, AWS, Azure, Google Cloud, Baidu Cloud, and Alibaba Cloud, we demonstrate that our Image Processing attacks have a success rate of approximately 100%, and the semantic-segmentation-based attacks have a success rate over 90% across different detection services, such as violence, politician, and pornography detection. We also propose several possible defense strategies against these security challenges in real-life situations.
Taking advantage of the availability of big data and the strong learning ability of neural networks, deep learning outperforms many traditional approaches in various computer vision tasks such as image classification, object detection, and image segmentation. Since deep learning often requires massive training data and lengthy training time, many cloud service providers (such as Google, AWS, Baidu, Alibaba, and Azure) offer deep learning Application Programming Interfaces (APIs) that let their clients accomplish computer vision tasks without training models or owning large amounts of data. These APIs help cloud service users check images for both commercial and non-commercial purposes. For example, the search engine giants Google (https://cloud.google.com) and Baidu (https://ai.baidu.com) offer APIs that identify the category of a picture (e.g., dog, cat); Alibaba Cloud (https://www.alibabacloud.com) and Azure (https://azure.microsoft.com) provide APIs that check whether images are illegal (e.g., pornographic, violent).
However, deep learning has recently been found extremely vulnerable to adversarial examples, which are carefully constructed input samples that trick the learning model into producing incorrect results. Hence, the study of adversarial examples in deep learning has drawn increasing attention in the security community. In terms of applications, research on adversarial example attacks against cloud vision services can be grouped into three main categories: self-trained classifier attacks, cloud-based classifier attacks, and cloud-based detector attacks, as shown in Table I (in this paper, classifiers and detectors refer to deep learning models in the image field). For self-trained classifiers, clients upload the training data themselves, and attackers know the distribution of the training data in advance [2, 3, 4]. For cloud-based classifiers, the cloud providers train the classifiers themselves (e.g., image classifiers on AWS); attackers have no prior knowledge, and state-of-the-art attacks require hundreds of thousands of queries to successfully generate an adversarial example [5, 6]. An image classifier uses training images to learn how given input variables relate to classes. By contrast, an image detector identifies the bounding areas that are "worth labelling" in an input image and then generates labels for each area. For cloud-based detectors, the cloud providers train the detectors themselves and integrate them into their detection services, e.g., the detection services for violent and pornographic images provided by Google. Although a classifier is incorporated in the last step, cloud-based detectors usually contain other modules, such as object detection, image segmentation, and even human judgment in some complicated situations.
Attacking cloud-based image detectors is a challenging task since it is hard to bypass these complicated techniques simultaneously to launch a successful attack with limited queries.
| Vision Model Service | Attack Method |
| --- | --- |
| Self-trained classifier | Substitution model, MLaaS, ZOO |
| Cloud-based classifier | Boundary attack, Ensemble models |
| Cloud-based detector | Work in this paper |
Unfortunately, although cloud-based detectors are playing an increasingly important role, there is very limited work exploring the possibility of adversarial example attacks against the detection services of cloud vision APIs. The few recent works that attempt to fool image detectors focus on standard object detection algorithms, but those attacks cannot be readily applied to the cloud environment. This is because (i) cloud-based detectors contain several complicated modules (object detection, image segmentation, human judgment, etc.), and (ii) cloud-based detectors are presented as black boxes to their adversaries. Florian et al. succeeded in stealing a machine learning model via public APIs using an equation-solving method; however, it is impractical to attack a commercial model with hundreds of millions of parameters by a simple equation-solving method. Furthermore, that work steals a simple model trained by the authors themselves, so it belongs to the category of self-trained classifier attacks.
To fill this emerging gap, in this work, we take the first step to present attacks on cloud-based detectors. In order to conduct a comprehensive study, we consider the image detector services on five major cloud platforms worldwide including Baidu Cloud, Google Cloud, Alibaba Cloud, AWS and Azure. In the rest of this paper, we use “Google”, “Alibaba”, and “Baidu” for short, to present the corresponding cloud services provided by these companies.
In this study, by incorporating image semantic segmentation, we propose four black-box attack methods on cloud-based detectors, which need no prior knowledge of the detectors and, more importantly, succeed within a very limited number of queries.
Specifically, we present the Image Processing (IP) attack, Single-Pixel (SP) attack, Subject-based Local-Search (SBLS) attack and Subject-based Boundary (SBB) attack.
Our empirical study demonstrates that the proposed attacks can successfully fool the cloud-based detectors deployed on the major cloud platforms, with a remarkable bypass rate approaching 100% in some cases, as shown in Table II.
We summarize our main contributions as follows:
To the best of our knowledge, this is the first work to study the black-box attacks on cloud-based detectors without any access to the training data, model, or any other prior knowledge. Different from attacks on a classifier, we investigate the components of detectors and design four kinds of methods to fool the cloud-based detectors.
We propose four attack methods that incorporate semantic segmentation to achieve a high bypass rate with a very limited number of queries. Instead of the millions of queries used in previous studies, our methods find adversarial examples using only a few thousand queries.
We conduct extensive evaluations on the major cloud platforms worldwide. The experimental results demonstrate that all major cloud-based detectors are bypassed successfully by one or more of our attack methods. All the tests rely only on the APIs of the cloud service providers. The results also verify the feasibility of our proposal.
We discuss the potential defense solutions and the security issues. By revealing these vulnerabilities, we provide a valuable reference for academia and industry for developing an effective defense against these attacks. We reported the vulnerabilities to the involved cloud platforms and received very active and positive acknowledgements from them.
Roadmap. In the rest of the paper, we begin with preliminaries in Section 2, followed by the threat model and criterion in Section 3. Section 4 describes the details of our attack algorithms. Section 5 presents experimental results on the cloud-based detectors. The effects of these attacks and potential defense methods are discussed in Section 6. Section 7 summarizes the related work. Finally, Section 8 concludes this paper and proposes future work.
2.1 Neural Network and Adversarial Examples
A neural network is a function $F(x) = y$ that accepts an input $x \in \mathbb{R}^n$ and outputs $y$, where the model $F$ is an $m$-class classifier, $\mathbb{R}$ is the set of real numbers, $n$ is the dimension of $x$, and $\theta$ is the combination of model parameters. In this paper, the parameters $\theta$ of the models are unknown. The output $y$ is an $m$-dimensional vector ($y_i$ is the probability of class $i$), where $y_i \in [0, 1]$ for $i = 1, \dots, m$ and $\sum_{i=1}^{m} y_i = 1$. We show the architecture of a Deep Neural Network (DNN) model in Fig. 1. The final label is $l = \arg\max_i y_i$. Sometimes, in response to a query, cloud models only return a confidence score instead of a probability distribution. Note that there is no correlation between the scores of different classes. For instance, Alibaba Cloud only returns a confidence score from 0 to 100 when queried. Probabilities leak more information than scores due to the strong correlation between the probabilities and the classes. Our algorithms adapt well to either case (probabilities or scores).
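As a concrete illustration, the label rule above can be sketched in a few lines (the probability values are hypothetical):

```python
import numpy as np

# Toy 3-class probability vector y (hypothetical values): each y_i lies
# in [0, 1] and the entries sum to 1.
y = np.array([0.1, 0.7, 0.2])

# The final label is the class with the highest probability.
label = int(np.argmax(y))

# A score-only API (e.g., a 0-100 confidence) exposes a single scalar
# rather than the full distribution, so it leaks less information.
score = float(100 * y[label])
```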
Adversarial example attacks on neural networks were first proposed by Szegedy et al., wherein well-designed input samples called adversarial examples are constructed to fool the learning model. Specifically, adversarial examples are generated from benign samples by adding a small perturbation that is imperceptible to human eyes, i.e., $x' = x + \delta$ and $F(x') = l' \neq l$, where $x'$ is an adversarial example, $\delta$ is a perturbation, and $l'$ is the adversarial label. Prior work [9, 10] has shown that adversarial examples are detrimental to many systems in the real world. For instance, under adversarial example attacks, an automatic driving system may take a stop sign as an acceleration sign, and malware can evade detection systems. Depending on whether there is a specified target for misclassification, adversarial example attacks can be categorized into two types: targeted and untargeted attacks. Since we only intend to make the API service generate an incorrect label, in this paper we focus on untargeted attacks.
2.2 Cloud Vision APIs Based on Neural Network
Due to the high cost of storing massive data and the intensive computational resource usage of training a neural network for computer vision, it has become common in recent years for individuals and small businesses to use cloud platforms to train and perform deep learning tasks. The cloud service providers normally own plenty of data and computing power, and they actively provide their users with multiple APIs of pre-trained neural networks as services. By leveraging these cloud-based classification and detection services, a small fee allows an application to complete a relatively complicated computer vision task. We list the computer vision services provided by several major cloud service providers in the market, along with the fees they charge per 1000 queries, in Table III.
Both classification and detection modules are provided by these computer vision APIs. The classification module has many applications, such as logo recognition, celebrity recognition, and animal recognition. The detection module aims to find illegal images that violate the content security policy. A detector is more complex than a classifier, as it involves more components such as object detection and image segmentation. Cloud-based detectors generate warnings when the outputs of their models exceed a threshold. In this paper, we select the most representative image detection topics: nausea, violence, politics, and pornography. All experiments are completed using only the free quota provided by the cloud service providers. This demonstrates that anyone can launch a successful attack with our methods in the real world at very low cost.
2.3 White-box and Black-box Attack
Security and privacy in deep learning models have been widely studied, and recent research has proposed several attacks on deep learning models. Based on the prior knowledge possessed by attackers, adversarial example attacks can be classified into two categories, white-box and black-box, as shown in Table IV. Architecture refers to the structure and parameters of a model, training tools refer to the training methods used when training the model, and Oracle refers to whether the model returns an output when queried with an input.
| Attack types | Architecture | Training tools | Train data | Oracle |
| --- | --- | --- | --- | --- |
| White-box | known | known | known | yes |
| Black-box | unknown | unknown | unknown | yes |
In this paper, we only consider black-box attacks against deep learning models, which are even more challenging due to the limited access to the model. In fact, given a black-box attack, it is straightforward to design the corresponding white-box attack.
2.4 Image Semantic Segmentation
Semantic segmentation of an image divides and recognizes the contents of the image automatically. Semantic segmentation processes an image at the pixel level, so we can assign each pixel in the image to an object class. With the introduction of fully convolutional networks, deep learning has been widely adopted in the field of semantic segmentation.
Through semantic segmentation techniques, we can focus on the key pixels of an image, i.e., the pixels that contribute most to the classification result. If we perturb these key pixels, the attack becomes easier. Therefore, the general idea of the attack is to use a semantic segmentation model to label the pixels of an image and then perturb the pixels of a particular class. In this paper, we choose Fully Convolutional Networks (FCN) as the semantic segmentation model. Since FCN is one of the most classical models in the semantic segmentation field and is sufficiently good for our attacks, we adopt the FCN model throughout the demonstration. Different from a classic CNN, which uses a fully connected layer at the end to obtain a fixed-length feature vector for classification, FCN can take an input image of arbitrary size and produce a correspondingly sized output with efficient inference and learning. FCN uses deconvolution layers to upsample the feature map of the last convolutional layer, restoring it to the size of the input image; pixel-by-pixel classification is then performed on the upsampled feature map. For an input $O$, $O_{i,j}$ is the pixel of $O$ at location $(i, j)$, $c(O_{i,j})$ is the class that $O_{i,j}$ belongs to, and $S = \{O_{i,j} \mid c(O_{i,j}) = \text{subject}\}$ is the subject pixel set of the input. For instance, in Fig. 2, we take a person as the subject class; thus we have $S = \{O_{i,j} \mid c(O_{i,j}) = \text{person}\}$. Similarly, we can obtain the corresponding sets for animal images. The details of the image semantic segmentation are shown in Table V.
In our experiment, results show that perturbation based on semantic segmentation could speed up the generation of adversarial examples in the cases of violence, politician and pornography. Note that for political images, we choose the position of the face due to the importance of face in recognition of these images. In practice, due to the lack of a particular subject class, we are unable to semantically segment the disgusting images.
3 Threat Model and Criterion
3.1 Threat Model
In this paper, we assume that the attacker is just a client and s/he can only access the cloud-based computer vision APIs as a black box. Under the black-box attack, the attacker cannot access the inner information of the model. The only data the attacker can collect is the feedback from the cloud APIs by the query. Moreover, the attacker can only access the APIs with a limited number of queries since it is inefficient and impractical to conduct a large number of queries in cloud platforms.
3.2 Criterion and Evaluation
The goal of adversarial example attacks in detection is to mislead the detector into misclassification. Nina et al. proposed the concept of top-$k$ misclassification, which means the network ranks the true label below at least $k$ other labels. However, cloud-based detectors usually produce a label (e.g., violent, pornographic, etc.) after processing an image, which can be used by websites to judge the legitimacy of the image. Consequently, we choose top-1 misclassification as our criterion: our attack is successful if the label with the highest probability generated by the neural network differs from the correct label.
Evaluating the quality of adversarial images in detection is a challenge, since a detector is largely different from a classifier and the quality of adversarial examples cannot be properly measured only by the number of pixels changed. For a classifier, the objective is to perturb as few pixels as possible to generate adversarial images. For detectors, however, people can still easily recognize the politician in a political image in which many pixels have been perturbed; if the attacker adds political content, such as insulting slogans, to this perturbed image, the detection service may fail to block the misdeed. We therefore consider three evaluation metrics: the $L_0$ distance, PSNR, and SSIM. The $L_0$ distance corresponds to the number of pixels that have been altered in an image. We denote the original input by $O$ and the adversarial example by $ADV$. Then,

$L_0(O, ADV) = \left|\{(i, j) \mid O_{i,j} \neq ADV_{i,j}\}\right|.$
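The $L_0$ distance can be computed directly; a minimal sketch, assuming H x W x C arrays and counting a pixel as changed if any of its channels differs (one common convention; the paper does not pin down per-pixel vs. per-channel counting):

```python
import numpy as np

def l0_distance(original, adversarial):
    """Number of pixel positions at which two H x W x C images differ.

    A pixel counts as changed if any of its channels differs."""
    diff = np.any(original != adversarial, axis=-1)
    return int(np.count_nonzero(diff))

o = np.zeros((4, 4, 3), dtype=np.uint8)
adv = o.copy()
adv[0, 0] = [255, 0, 0]   # perturb a whole pixel
adv[1, 2, 1] = 7          # perturb one channel of another pixel
```

Here `l0_distance(o, adv)` counts two changed pixels, regardless of how many channels each perturbation touched.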
We also use the Peak Signal-to-Noise Ratio (PSNR) to measure the quality of images:

$PSNR = 10 \log_{10}\left(\frac{MAX^2}{MSE}\right),$

where $MAX = 255$ is the maximum pixel value and $MSE$ is the mean square error. For an RGB image ($m \times n \times 3$),

$MSE = \frac{1}{3mn} \sum_{k=1}^{3} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(O(i, j, k) - ADV(i, j, k)\right)^2,$

where $O(i, j, k)$ is the value of an image for channel $k$ ($k \in \{1, 2, 3\}$) at location $(i, j)$.
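The PSNR computation can be sketched as follows (a minimal illustration; the uniform-error test images are hypothetical):

```python
import numpy as np

def psnr(original, adversarial, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE), with the MSE averaged over all
    channels of an m x n x 3 image."""
    o = original.astype(np.float64)
    a = adversarial.astype(np.float64)
    mse = np.mean((o - a) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

o = np.full((8, 8, 3), 100, dtype=np.uint8)
a = np.full((8, 8, 3), 110, dtype=np.uint8)  # uniform error of 10, MSE = 100
```

With MSE = 100, the PSNR above evaluates to roughly 28.1 dB; higher values indicate less visible distortion.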
To measure image similarity, the structural similarity (SSIM) index is adopted in this paper. In summary, the $L_0$ distance measures how many pixels have been changed, the PSNR value measures image quality, and the SSIM value measures structural similarity. In the following sections, the $L_0$ distance between the original image and the adversarial image is referred to as the distance for short.
4 Black-box Attack Algorithms
In this section, we describe the black-box attacks used in our experiments. In Section 4.1, we list five frequently used image processing techniques that can make images adversarial. In Sections 4.2-4.4, we analyze the flaws of previous adversarial example attacks in the literature and design the Single-Pixel attack, Subject-based Local-Search attack, and Subject-based Boundary attack suited to cloud platforms.
Fig. 3: Examples of image processing: (a) original, (b) Gaussian noise, (c) grayscale, (d) binarization, (e)-(h) Salt-and-Pepper noise with density $d$ = 0.05, 0.15, 0.3, 0.5, and (i)-(l) brightness control with $\beta$ = 0.1, 0.3, 0.5, 0.8.
4.1 Image Processing
In the following, we explore the effect of five different image processing techniques on both the classifier and the detector: Gaussian Noise, Grayscale Image, Image Binarization, Salt-and-Pepper Noise, and Brightness Control. Prior work such as Hossein et al. only discussed Salt-and-Pepper noise on Google vision APIs. We choose these five techniques because they are sufficiently representative; as an empirical study, we hope our work can be easily extended to other image processing techniques and provide a reference for other scholars in the community. It should be noted that all the parameters used in the tests preserve good image visibility. All these image processing techniques are implemented with Python libraries, such as skimage (https://scikit-image.org), OpenCV (https://opencv.org), and PIL (http://www.pythonware.com/products/pil).
4.1.1 Gaussian Noise
Gaussian noise is statistical noise with a probability density function (PDF) equal to that of the normal distribution:

$p(z) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(z - \mu)^2}{2\sigma^2}},$

where $\mu$ and $\sigma^2$ represent the mean and variance, respectively. Thus

$ADV = O + N(\mu, \sigma^2).$
Note that clipping $ADV$ is necessary to maintain a reasonable RGB value range, namely [0, 255]. An example is shown in Fig. 3 (b).
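A minimal sketch of this perturbation with NumPy (the mean, variance, and test image are illustrative assumptions, not the parameters used in the paper):

```python
import numpy as np

def add_gaussian_noise(image, mu=0.0, sigma=20.0, seed=0):
    """ADV = clip(O + N(mu, sigma^2)); clipping keeps values in [0, 255]."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(mu, sigma, size=image.shape)
    return np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8)

o = np.full((32, 32, 3), 128, dtype=np.uint8)
adv = add_gaussian_noise(o)
```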
4.1.2 Grayscale Image
A grayscale image is one in which the value of each pixel is a single sample representing only the amount of light, i.e., it carries only intensity information. In the computer vision field, a black-and-white image contains only black and white pixels, while a grayscale image has many levels of color depth between black and white. RGB refers to the three channel values of a pixel, namely red, green, and blue. To obtain the $ADV$, the standard luminosity equation (https://en.wikipedia.org/wiki/Grayscale) is used:

$Y = 0.299R + 0.587G + 0.114B.$
This function is also implemented in the Python library PIL. Clipping the $ADV$ is necessary to maintain a reasonable RGB value. An example is shown in Fig. 3 (c).
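The conversion can be sketched with the luminosity weights above (a minimal illustration; the 2 x 2 test image is hypothetical):

```python
import numpy as np

def to_grayscale(image):
    """Luminosity conversion Y = 0.299 R + 0.587 G + 0.114 B (the
    ITU-R 601 weights that PIL's mode "L" conversion also uses)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = image.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

o = np.zeros((2, 2, 3), dtype=np.uint8)
o[0, 0] = [255, 0, 0]     # pure red maps to Y = round(76.245) = 76
gray = to_grayscale(o)
```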
4.1.3 Image Binarization
A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white. Floyd-Steinberg dithering  is used to approximate the original image luminosity levels in the implementation of PIL. The example is shown in Fig. 3 (d).
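Floyd-Steinberg dithering can be sketched directly (a minimal single-channel implementation with the classic 7/16, 3/16, 5/16, 1/16 error-diffusion weights; in practice PIL's `Image.convert("1")` performs this internally, and the test image here is hypothetical):

```python
import numpy as np

def binarize_floyd_steinberg(gray):
    """Threshold each pixel at 128 and diffuse the quantization error
    to unprocessed neighbours with the 7/16, 3/16, 5/16, 1/16 weights."""
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)

bw = binarize_floyd_steinberg(np.full((8, 8), 128, dtype=np.uint8))
```

Error diffusion preserves the average luminosity of a region even though every output pixel is pure black or white.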
4.1.4 Salt-and-Pepper Noise
Salt-and-Pepper noise is also known as impulse noise. This noise causes sharp and sudden disturbances in the image signal. For a pixel of an RGB image, the Salt-and-Pepper noise image is calculated as

$ADV(i, j) = \begin{cases} 0 & \text{with probability } d/2, \\ 255 & \text{with probability } d/2, \end{cases}$

where $ADV$ is the noise image and $d$ is the noise density; the other pixels remain unchanged. Examples are shown in Fig. 3 (e)-(h).
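A minimal sketch of this noise model (density and image values are illustrative):

```python
import numpy as np

def salt_and_pepper(image, density=0.1, seed=0):
    """With probability density/2 a pixel becomes 0 (pepper), with
    probability density/2 it becomes 255 (salt); the rest are unchanged."""
    rng = np.random.default_rng(seed)
    adv = image.copy()
    r = rng.random(image.shape[:2])       # one draw per pixel
    adv[r < density / 2] = 0
    adv[r > 1 - density / 2] = 255
    return adv

o = np.full((64, 64, 3), 128, dtype=np.uint8)
adv = salt_and_pepper(o, density=0.2)
```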
4.1.5 Brightness Control
We hypothesize that image brightness may affect classification results. Thus, we iteratively adjust the brightness of the image and observe the change in the result. A constant value is added to all pixels of the image at the same time, and the result is clipped to a reasonable range, namely [0, 255]:

$ADV = \mathrm{clip}(O + \beta \times 255),$

where $\beta$ is the parameter controlling brightness, in the range [0, 1]. Examples are shown in Fig. 3 (i)-(l).
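A minimal sketch of the brightness adjustment (the $\beta$ value and test image are illustrative):

```python
import numpy as np

def adjust_brightness(image, beta):
    """ADV = clip(O + beta * 255), with beta in [0, 1]."""
    shifted = image.astype(np.float64) + beta * 255.0
    return np.clip(shifted, 0, 255).astype(np.uint8)

o = np.full((4, 4, 3), 100, dtype=np.uint8)
adv = adjust_brightness(o, 0.3)           # 100 + 76.5 truncates to 176
```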
4.2 Single-Pixel Attack
The single-pixel attack, proposed by Nina et al., perturbs a single pixel to make the classifier generate a wrong label. However, it suffers from several limitations. First, when processing high-resolution images, a single pixel is not enough to cause misclassification. Second, Nina's experiments were conducted completely offline, with the data fed to the classifiers directly; against online classifiers in the physical world, attacks are more difficult.
To address these problems, we design a new single-pixel attack by gradually increasing the number of modified pixels and integrating the idea of image semantic segmentation. In order to verify the validity of semantic segmentation, we implement the attack in three areas of the image, namely, subject region, non-subject region, and random region. The random region is chosen as a baseline to compare with the other two regions. These three regions are defined as follows:
subject region: a region composed of all pixels which belong to a subject class.
non-subject region: a region composed of all pixels which do not belong to any subject class.
random region: a region composed of all pixels chosen from the image randomly.
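Given a segmentation mask, the three regions can be derived as coordinate sets; the following is a minimal sketch (the mask layout and sample size are hypothetical):

```python
import numpy as np

def pixel_regions(mask, n, seed=0):
    """mask: H x W boolean array, True where a pixel belongs to a subject
    class. Returns n (row, col) coordinates drawn from the subject,
    non-subject, and random regions, respectively."""
    rng = np.random.default_rng(seed)

    def sample(coords):
        idx = rng.choice(len(coords), size=min(n, len(coords)), replace=False)
        return coords[idx]

    subject = np.argwhere(mask)
    non_subject = np.argwhere(~mask)
    anywhere = np.argwhere(np.ones_like(mask))   # whole image
    return sample(subject), sample(non_subject), sample(anywhere)

mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True                     # a 4 x 4 "subject"
sub, non, rnd = pixel_regions(mask, 5)
```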
Other types of image segmentation are shown in Table V. In the rest of this paper, we refer to the Single-Pixel attack as the SP attack, since it is inspired by single-pixel perturbation.
4.3 Subject-based Local-Search Attack
Nina et al. proposed a local greedy algorithm that iteratively searches for the optimal perturbation of a local area. Given a finite perturbation of each local area, the optimal perturbation is defined as the one that has the largest influence on the model decision in each iteration. However, several flaws make this attack ineffective against cloud classifiers or detectors. Firstly, they only conducted the test offline on the VGG model. Secondly, they fed the image data to the classifier directly. Kurakin et al. pointed out that the transformations applied to images by the process of printing them may have a negative effect on adversarial examples. The RGB color model is an additive color model in which a color's RGB value indicates its red, green, and blue intensity; for an image in RGB mode, the pixel values are composed of the RGB values of the three channels. If we save the RGB values of an image using lossy compression, the adversarial property may disappear. Our experiments also show that the JPEG format hinders adversarial examples to a certain extent, while the PNG format does not, because PNG uses lossless compression coding. Hence, if Nina et al. sent the perturbed RGB values to the VGG model and generated an adversarial example, it may not be adversarial to online cloud-based classifiers that use the JPEG format. Thirdly, Nina et al. initialized a very large perturbation (RGB values exceeding 500) to detect the probability changes of the classifier, which is impractical for a cloud-based classifier: one can only send an image whose RGB values are between 0 and 255, i.e., we cannot send an image with an RGB value of 500 to the cloud. Finally, if the initial disturbance region is very large, the search easily falls into a local optimum, making it difficult to find adversarial examples. For these reasons, the local greedy algorithm does not apply to cloud-based classifiers or detectors.
In this paper, we propose the Subject-based Local-Search (SBLS) attack, which incorporates semantic segmentation to speed up the attack and saves all images in PNG format to retain their original features. Considering the online models, the initial modified pixel value is 0 or 255, which is within the valid RGB range. The main steps of our algorithm are summarized as follows.
Firstly, we obtain the subject region of the image by semantic segmentation techniques.
Secondly, 50 pixels are selected randomly from the subject region, and the image is perturbed on each pixel one by one. The perturbed images are fed to the cloud-based detectors, producing 50 predictions. The first $t$ (10 in this paper) pixels for which the probability drops the most are picked.
Thirdly, the image is perturbed on the $t$ selected pixels, and the algorithm records whether the prediction result changes from illegal to normal.
Finally, the perturbed image is taken as the initial image of the next round, and the above steps are repeated until the label becomes normal.
The pseudocode for the algorithm is shown in Algorithm 1.
In Algorithm 1, subject(O) returns the subject region $S$ of $O$; random(S, 50) selects 50 pixels of $S$ at random; perturb(O, axis, P) perturbs $O$ at location axis with coefficient $P$; and cloud.predict(Imagetemp) returns the label and probability of Imagetemp from the cloud APIs. Here, we assume that the cloud APIs return both the label and the probability (or score). Since the initial images contain illegal content, such as pornography, violence, or politics, we iterate until the label becomes normal.
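The loop above can be sketched against a mock oracle. Note that `cloud_predict` below is a stand-in for the real cloud API, and the image, subject region, and parameters are illustrative assumptions rather than the paper's actual setting:

```python
import numpy as np

def sbls(image, subject_coords, cloud_predict, sample=50, top_t=10,
         p_value=0, max_rounds=20, seed=0):
    """Simplified sketch of the subject-based local search.

    Each round: sample up to `sample` subject pixels, score a one-pixel
    perturbation of each, apply the `top_t` pixels whose perturbation
    lowers the illegal probability the most, then repeat until the
    oracle's label flips to "normal"."""
    rng = np.random.default_rng(seed)
    adv = image.copy()
    for _ in range(max_rounds):
        label, base = cloud_predict(adv)
        if label == "normal":
            break
        k = min(sample, len(subject_coords))
        picks = subject_coords[rng.choice(len(subject_coords), size=k,
                                          replace=False)]
        drops = []
        for r, c in picks:
            trial = adv.copy()
            trial[r, c] = p_value
            drops.append(base - cloud_predict(trial)[1])
        for r, c in picks[np.argsort(drops)[-top_t:]]:
            adv[r, c] = p_value
    return adv

# Mock black-box oracle (a stand-in for a real cloud API): the
# "illegal" probability is the mean brightness of the subject region.
def cloud_predict(img):
    prob = img[2:6, 2:6].mean() / 255.0
    return ("illegal" if prob > 0.5 else "normal", prob)

img = np.full((10, 10), 255, dtype=np.uint8)
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True
adv = sbls(img, np.argwhere(mask), cloud_predict)
```

With a bounded sample per round, the query count stays in the hundreds per round rather than the millions required by decision-based attacks.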
4.4 Subject-based Boundary Attack
The Boundary Attack relies solely on the final model decision and is therefore also called a decision-based attack. A decision-based attack starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial. This method works in theory but is inefficient: about 1.2 million predictions were needed by the Boundary Attack to find an adversarial image for ResNet-50, which is a huge overhead for cloud service APIs. To generate hundreds of adversarial images, the required time and expense would be unbearable.
To make the attack practical, we design a new Subject-based Boundary (SBB) attack by incorporating semantic segmentation and a greedy algorithm. Using semantic segmentation, the subject region is first perturbed with the average RGB value of the non-subject region, since our previous experiments show that the background color has a great influence on the recognition of a subject class. The greedy algorithm then recovers as many of the perturbed pixels as possible while keeping the probability of being classified as illegal as small as possible. The main steps of the algorithm are summarized as follows.
First, all pixels in the subject region are perturbed, which keeps the image free of illegal content and the probability of being predicted as illegal close to zero.
Then, the $L_0$ distance between the current perturbed image and the original image is computed, and a certain percentage of the pixels that differ between the two images are recovered at random.
Next, the recovery process is repeated several times to choose the best recovery, i.e., the one that leads to the slowest increase in probability or score while the perturbed image is still recognized as normal by the cloud APIs.
Finally, steps 2-3 are repeated to minimize the $L_0$ distance of the perturbed image. Once the perturbed image is recognized as illegal again, the iteration stops and the last perturbed image is returned as the adversarial image.
The pseudo-code of the algorithm is shown in Algorithm 2.
In Algorithm 2, getaverpixel(Non-S) returns the average pixel value of the non-subject region, $L_0$(O, ADV) returns the $L_0$ distance between $O$ and $ADV$, and recover(step, O, ADV) recovers step pixels according to the difference between $O$ and $ADV$.
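The overall loop can be sketched similarly; `is_illegal` below is a mock decision-only oracle standing in for the cloud API, the example is single-channel, and the greedy step is simplified to "first recovery set that stays normal" rather than the slowest probability increase:

```python
import numpy as np

def sbb(image, mask, is_illegal, recover_frac=0.1, trials=5,
        max_iters=50, seed=0):
    """Simplified sketch of the subject-based boundary attack.

    Start from a fully perturbed subject region (average value of the
    non-subject region), then greedily restore random subsets of the
    changed pixels while the oracle still answers "normal"."""
    rng = np.random.default_rng(seed)
    avg = int(image[~mask].mean())
    adv = image.copy()
    adv[mask] = avg                      # large initial perturbation
    if is_illegal(adv):
        return None                      # initialization failed
    for _ in range(max_iters):
        changed = np.argwhere(adv != image)
        if len(changed) == 0:
            break
        step = max(1, int(recover_frac * len(changed)))
        best = None
        for _ in range(trials):          # try several random recovery sets
            sel = changed[rng.choice(len(changed), size=step, replace=False)]
            cand = adv.copy()
            cand[sel[:, 0], sel[:, 1]] = image[sel[:, 0], sel[:, 1]]
            if not is_illegal(cand):     # keep the first set that stays normal
                best = cand
                break
        if best is None:
            break                        # no recovery keeps the image normal
        adv = best
    return adv

img = np.zeros((10, 10), dtype=np.uint8)
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True
img[mask] = 255                          # bright "subject" on dark background

def is_illegal(im):                      # mock decision-only oracle
    return im[mask].mean() > 128

adv = sbb(img, mask, is_illegal)
```

In this toy setting the attack restores subject pixels until one more restoration would flip the oracle's decision, leaving a small residual $L_0$ distance.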
5 Experimental Evaluation
5.1 Validation of Semantic Segmentation
Since cloud-based detectors are built on classifiers and attacking detectors is more difficult than attacking classifiers, we first conduct the SP attack on classifiers to better understand the validity of semantic segmentation; the results on classifiers help us tune our attack algorithms. We choose the SP attack because it is a coarse-grained perturbation: if the SP attack works well with semantic segmentation, the SBLS and SBB attacks should perform better, since both are fine-grained perturbations. Several local models and a cloud-based classifier are used in the experiments. The local models are VGG16, Resnet50, and InceptionV3. We leverage the Keras framework and pre-trained deep learning models (https://github.com/fchollet/deep-learning-models/releases) to conduct the experiments. These pre-trained models are trained on the ImageNet dataset, a standard dataset widely used for deep learning classifiers. Because the Baidu APIs are easier to use, we choose the Baidu animal classifier as an example.
We prepare the dataset by selecting 100 animal images from the ImageNet set. Because VGG16 and Resnet50 both accept input images of size $224 \times 224$, every input image is clipped to the size of $224 \times 224 \times 3$, where 3 is the number of RGB channels. Only animal images are selected for simplicity. The attack strategy consists of two parts: perturbation methods and perturbation regions.
Different methods of perturbation have different effects on the prediction results. Three types of perturbation are considered: $p$=0, $p$=255, and $p$=2, where $p$ is a perturbation parameter. $p$=0 or $p$=255 means setting the RGB value of the pixel to 0 or 255, where 0 represents black and 255 represents white; $p$=2 means multiplying the pixel value by 2 and clipping it to a reasonable range.
In order to verify the effectiveness of semantic segmentation in perturbation, three perturbation regions described in section 4.2 are chosen to test.
5.1.2 Results and Analysis
First, we carry out a precursor experiment by 1) perturbing pixels that do not belong to subject class using different perturbation methods; 2) recording the changes in prediction. The examples are shown in Fig. 4.
Fig. 4: Examples of perturbing non-subject pixels with (a) $p$=0, (b) $p$=255, and (c) $p$=2.
The overall results are shown in Table VI. It is evident that perturbation of the non-subject region can cause a misclassification rate of up to 0.8. The VGG16 model is the least resilient against all three perturbation methods, while InceptionV3 and the online model are the two most robust. Besides, $p$=2 yields the lowest misclassification rate among the three perturbation methods, since it introduces only a slight perturbation. The results also show that about 60% of InceptionV3's classifications are unaffected by perturbation of the background pixels. Thus, it is necessary to perturb the subject pixels to improve the effectiveness of the attack.
We conduct a number of experiments to understand the choice of perturbation parameters and regions on different models, as shown in Fig. 5. Firstly, we adjust the number of perturbed pixels and record the effects on four classifiers.
As shown in Fig. 5 (a), the number of successful attacks increases with the number of perturbed pixels, but once the perturbation becomes large, the success rate grows very slowly. Compared to the three local classifiers, attacking the online classifier (Baidu) is more difficult.
Secondly, for a single classifier, we select the perturbation regions in three different ways and observe the changes in the prediction results, taking the online classifier (Baidu) as an example. Fig. 5 (b) shows that it is very sensitive to perturbation of the pixels in the subject region, resulting in a high misclassification rate. Fig. 5 (b) also shows the results when we use P=2 to perturb the pixels. Surprisingly, perturbation in random regions performs even better than in subject regions. Our conjecture is that a slight perturbation in the non-subject region makes the perturbed pixels close to those in the subject region. To verify this conjecture, we perturb the non-subject pixels with the average RGB value of the subject region: for all pixels in the subject region, the average RGB value in each channel is computed, and the non-subject region is perturbed with the derived average. The results in Fig. 5 (c) demonstrate that perturbation with the average value performs better than that with P=2; when perturbing 2000 pixels, perturbation with the average value performs best.
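The average-value perturbation used to test this conjecture can be written as a short sketch; the function name is assumed, and `subject_mask` is the boolean mask produced by the semantic segmentation step:

```python
import numpy as np

def perturb_background_with_subject_mean(img, subject_mask):
    """Replace every non-subject pixel with the per-channel mean RGB
    value of the subject region, so the background pixels resemble
    the subject pixels after perturbation."""
    out = img.copy()
    mean_rgb = img[subject_mask].mean(axis=0)   # shape (3,), one mean per channel
    out[~subject_mask] = mean_rgb.astype(img.dtype)
    return out
```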
Conclusions. We draw the following conclusions based on the results from classifiers:
In our evaluation, Baidu's online models show relatively good robustness: attacking them is more difficult than attacking local models with the same number of perturbed pixels.
Perturbations in subject regions are more effective than in other regions.
The prediction of the model is sensitive to the magnitude of the perturbation value, especially values close to the pixels of the subject region.
Larger perturbation values are more destructive to image recognition than smaller ones.
5.2 Attacking Cloud-based Detectors
In this paper, we explore the security issues in cloud-based detectors, so we conduct the four kinds of attacks described in Section 4 on real-world cloud platforms.
5.2.1 Datasets and Preprocessing
To test cloud-based detectors, 400 images are selected from Google Images and Baidu Images. For each of the four areas to be detected, 100 images are manually selected and labeled. All the images are resized to a fixed size. The detectors we tested accept most image formats, such as JPG, PNG, JPEG, and BMP. It is worth mentioning that for our real-life cloud-based setting, we adjust these images to follow the requirements of the tested APIs, including input format, size, and resolution; all images were collected legally. To avoid information loss during image compression and transmission, all the images in our experiments are saved in the PNG format and sent directly to the cloud API interface without additional transformations. Because the images are labeled by us, some illegal images may not be identified by the detectors. We first filter these images by calling the detectors, which return predictions made up of probabilities (scores) and labels. Then we preprocess the predictions, discarding images with ambiguous labels to ensure the quality of the initial images. For instance, Alibaba offers an option of manual review for suspicious images, and we exclude such images. Google's detector only returns a single-word result, e.g., POSSIBLE or LIKELY in Table VIII; only images whose labels are LIKELY or VERY LIKELY are kept. Finally, the predictions of the remaining images are recorded in Table VII, where "–" means the platform does not provide the API service.
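The filtering step above can be sketched as follows; `detect` is a hypothetical stand-in for one platform's API wrapper, since the real responses differ per platform (Baidu returns probabilities, Google returns likelihood words):

```python
def filter_high_confidence(images, detect, threshold=0.7):
    """Keep only images that the detector labels as illegal with high
    confidence. `detect(img)` is assumed to return a (label, score)
    pair normalized from the platform's raw API response."""
    kept = []
    for img in images:
        label, score = detect(img)
        # Discard images the detector already considers normal or
        # labels only with low confidence (ambiguous cases).
        if label != "normal" and score >= threshold:
            kept.append((img, label, score))
    return kept
```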
[Fig. 6: confidence distributions on (a) Baidu, (b) Alibaba, (c) Azure, (d) AWS]
According to Table VII, Baidu, which labels 95% of the pornographic images correctly, does a better job than the other cloud platforms. To our surprise, 46% of the pornographic images are not identified by Azure's detector. For violent images, Alibaba's detector has the best performance, labeling 67% of the images correctly; the detectors of Google and Baidu only recognize 30% and 32% of the images, respectively. One reason may be that the scenes in violent images are more complex and the detectors do not cover the variety of scenarios.
To better understand the quality of these images, we record the probabilities or scores returned when the API is called on them. The details can be found in Fig. 6: the majority of images are labeled by the APIs with very high confidence. For Azure's detector, 80% of the probability labels are over 0.7. Attacking images that the detectors classify correctly with high confidence makes our evaluation more convincing; otherwise, a small perturbation could let a low-confidence image evade detection, which would not demonstrate the strength of our attacks.
In our experiments, considering misclassification alone is not enough: the detector may still give a similar prediction for an illegal image even though the prediction changed. For example, the prediction may go from VERY LIKELY to LIKELY, which does not make the detector give a completely opposite prediction. Therefore, a successful adversarial example in our experiments is defined as one that completely changes the prediction. We then use the three metrics in Section 3.2 to measure the quality of these adversarial examples.
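This success criterion can be expressed as a small predicate; the label set here is hypothetical and the real label names differ per platform:

```python
# Hypothetical label set; real platforms use different label names.
ILLEGAL_LABELS = {"porn", "violent", "political", "LIKELY", "VERY LIKELY"}

def is_successful(original_label, adv_label):
    """Success = the prediction flips completely to a legal label.
    Moving to a weaker illegal label (e.g. VERY LIKELY -> LIKELY)
    does not count as a successful adversarial example."""
    return original_label in ILLEGAL_LABELS and adv_label not in ILLEGAL_LABELS
```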
5.2.2 Detectors of Pornographic Images
The Internet is flooded with pornographic images, which is a serious problem for website regulators. Websites often leverage detectors to detect these illegal images, so evasion attacks on these detectors can result in huge content security risks. All four cloud platforms provide pornographic image detection services, but the results returned by their detectors differ, as shown in Table VIII.
[Table VIII excerpt: Baidu returns a Probability with labels Porn, Sexy, Normal; Alibaba returns a Score with labels porn, sexy, normal]
[Fig. 7: (a) Success rates, (b) PSNR values]
Image Processing. The success rates of IP attacks are shown in Fig. 7 (a). From Fig. 7 (a), we can see that the pornographic image detectors of Azure, AWS, and Google are vulnerable to the Gaussian noise attack. The Grayscale attack has a slight effect on Google and Azure, and no effect on Baidu and Alibaba. For the Binarization, Salt-and-Pepper, and Brightness attacks, the success rates all increase as the parameters increase.
In order to evaluate these successful adversarial images, the PSNR value is used. The average PSNR values of all successful adversarial images are shown in Fig. 7 (b), from which we can see that the results of the Gaussian noise and Grayscale attacks are very promising. Acceptable PSNR values are usually considered to lie between 20 and 40 dB (higher is better). The Binarization attack makes pornographic images contain only black and white pixels, which greatly degrades the image quality and causes the detectors to misclassify. On the other hand, the Binarization attack also significantly reduces the visibility of the pornographic content to humans, so the PSNR values are very small. For the Salt-and-Pepper and Brightness attacks, we keep increasing the attack parameters until the attack succeeds. The CDF plots of the success rates are shown in Fig. 8. In Fig. 8 (a), we can see that the success rates reach 100% for the detectors of Google, AWS, and Azure when the Salt-and-Pepper parameter is 0.1; however, the detectors of Baidu and Alibaba require more Salt-and-Pepper noise to reach high success rates. Fig. 8 (b) shows that the Brightness attack needs large parameters to achieve high success rates, which greatly degrades the image quality. In both the Salt-and-Pepper and Brightness attacks, the detectors of Baidu and Alibaba show better robustness on pornographic images than those of Google and Azure.
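The PSNR metric used throughout this section can be computed as below (a standard formula, with 255 as the peak value for 8-bit images):

```python
import numpy as np

def psnr(original, adversarial, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two uint8 images;
    roughly 20-40 dB is treated as acceptable quality here
    (higher = less visible distortion)."""
    mse = np.mean((original.astype(np.float64)
                   - adversarial.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")    # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```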
[Fig. 9: (a) Success rates, (b) PSNR values]
Single-Pixel Attack. We conduct the Single-Pixel (SP) attack on the cloud platforms with perturbation in different regions of the images, as shown in Fig. 9 (a). For instance, Baidu-s means perturbation of the subject region on Baidu, and Baidu-r means perturbation of a random region. As shown in Fig. 9 (a), when the same number of pixels is perturbed, the effect of perturbation in the subject region is much more significant than in a random region, which verifies the validity of semantic segmentation. Besides, we find that the attack success rate can be increased by perturbing more pixels in the subject regions. Moreover, even when 2000 pixels are perturbed, the success rates of attacks on Baidu and Alibaba remain low, whereas 91% of pornographic images can bypass Azure's detector. This demonstrates that the detectors of Baidu and Alibaba are more robust to the SP attack than the others.
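A minimal sketch of the subject-region SP attack follows; `subject_mask` is assumed to come from the semantic segmentation model, and the function name is illustrative:

```python
import numpy as np

def single_pixel_attack(img, subject_mask, n_pixels, value=255, seed=0):
    """SP attack sketch: perturb `n_pixels` randomly chosen pixels
    inside the subject region (boolean H x W mask) by setting them
    to `value`. Perturbing the subject region is what makes the
    attack effective compared with random regions."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(subject_mask)               # subject pixel coordinates
    n = min(n_pixels, len(ys))
    idx = rng.choice(len(ys), size=n, replace=False)
    out = img.copy()
    out[ys[idx], xs[idx]] = value
    return out
```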
To evaluate the picture quality, we perturb the subject regions. The PSNR values are shown in Fig. 9 (b). The majority of PSNR values are larger than 20, which means most of the successful adversarial images are within the acceptable range.
Subject-based Local-Search Attack. We set the maximum number of cycles to 30 in the Subject-based Local-Search (SBLS) attack. Besides, we set the perturbation value P to 255 and the expansion size to 10, which correspond respectively to the RGB value used for perturbation and to expanding by 10 unit pixels in the next loop, as described in Algorithm 1. Previous experiments have demonstrated the effectiveness of semantic segmentation, so the evaluation of subject-based adversarial images is statistically analyzed in the following subsections. We skip Google's detector for the SBLS attack since it does not return a probability or score, on which the SBLS attack must rely. We do not conduct the SBLS or SBB attack on AWS's detector since AWS only provides 5,000 free queries each month, which is not enough for testing hundreds of images with these attacks.
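A compact sketch of the SBLS loop follows, assuming Algorithm 1 is a standard greedy local search; `score_fn` is a hypothetical stand-in for the probability returned by the cloud API, and the batch size is an assumed detail:

```python
import numpy as np

def sbls_attack(img, subject_mask, score_fn, p=255, expand=10,
                max_rounds=30, batch=50, threshold=0.5, seed=0):
    """SBLS sketch: each round, try perturbing candidate subject
    pixels one at a time, keep the `expand` pixels whose perturbation
    lowers the illegal score the most, and repeat until the score
    drops below `threshold` (prediction flips) or rounds run out."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(subject_mask)
    adv = img.copy()
    for _ in range(max_rounds):
        if score_fn(adv) < threshold:               # prediction flipped
            return adv, True
        # Sample a batch of candidate subject pixels and score each trial.
        idx = rng.choice(len(ys), size=min(batch, len(ys)), replace=False)
        gains = []
        for i in idx:
            trial = adv.copy()
            trial[ys[i], xs[i]] = p
            gains.append((score_fn(trial), i))
        # Commit the perturbations that helped most (lowest scores first).
        for _, i in sorted(gains)[:expand]:
            adv[ys[i], xs[i]] = p
    return adv, score_fn(adv) < threshold
```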
The results of this attack are shown in Fig. 10. Since a smaller distance means fewer queries, the number of queries can be estimated from the distance. According to Fig. 10, Azure's detector is the weakest of all, as 54% of adversarial images evade it. The minimum number of queries is 50 and the minimum distance is 10, which means that one round of an SBLS attack is enough to generate an adversarial image. The average PSNR values are all over 20 and the average SSIM values are all over 0.9; in other words, good adversarial images are obtained with only a few queries. Note that the resulting prediction is normal, not a similar illegal category. The success rates of attacking Baidu and Alibaba are 34% and 19%, respectively. Although the success rate on Alibaba's detector is the lowest, the average distance is only 142. Based on the distance, we modify about 0.01%-0.6% of the pixels of the whole image. Among the successful adversarial images, all SSIM values are over 0.9 and all PSNR values are over 20. These data suggest that the quality of these adversarial images is very high: it is easy for people to observe the pornographic content in them.
[Figure: (a) Success rates, (b) PSNR values]
Subject-based Boundary Attack. Finally, we conduct the Subject-based Boundary (SBB) attack as described in Algorithm 2. The initial parameters and the step size are set based on the tradeoff between accuracy and efficiency. As the number of iterations increases, more and more pixels of the original images are recovered. To guarantee the quality of the adversarial examples, we only consider adversarial examples that recover more than 80% of the pixels. The success rates of attacking Azure and Google are 80% and 78%, respectively, and the success rates of attacking Baidu and Alibaba are about 40%. The evaluation of these adversarial examples is shown in Fig. 11. The successful adversarial images have lower similarity than those of the SBLS attack. However, good adversarial examples still exist in our experiments: for Baidu's detector, the minimum distance is 84 and the maximum SSIM value is 0.97; for Azure's detector, the minimum distance is 48 and the maximum SSIM value is 0.98. The majority of PSNR values are over 20, which indicates good image quality.
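The SBB idea (start from a large perturbation that already evades the detector, then restore original pixels while staying adversarial) can be sketched as below; `is_illegal` is a hypothetical stand-in for querying the detector, and the restore order and step size are assumed details:

```python
import numpy as np

def sbb_attack(img, subject_mask, is_illegal, p=255, step=20):
    """SBB sketch: fully perturb the subject region, then restore the
    original pixels `step` at a time, keeping the last image the
    detector still failed to flag. Returns None if even the fully
    perturbed image is still detected as illegal."""
    ys, xs = np.nonzero(subject_mask)
    adv = img.copy()
    adv[ys, xs] = p                         # large initial perturbation
    if is_illegal(adv):
        return None                         # initial point is not adversarial
    best = adv.copy()
    for r in range(0, len(ys), step):
        i = np.arange(r, min(r + step, len(ys)))
        adv[ys[i], xs[i]] = img[ys[i], xs[i]]   # restore original pixels
        if is_illegal(adv):                 # detector caught it again:
            return best                     # the previous round is the answer
        best = adv.copy()
    return best
```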
5.2.3 Detectors of Violent Images
Similarly, violent images are segmented with the semantic segmentation model. Note that we only consider violent images which contain persons; in fact, a person is a subject class and plays an important role in identifying violent images. Besides, all subsequent experiments focus on perturbation in the subject regions, except for the IP attacks.
Image Processing. The success rates of IP attacks are shown in Fig. 12 (a). We find that the success rates of IP attacks are extremely high and that violent image detectors are easier to attack than pornographic ones. As with pornographic images, the Gaussian noise and Grayscale attacks can generate adversarial violent images of high quality according to the PSNR values. We show successful adversarial images in Fig. 13. Fig. 13 (a) and Fig. 13 (d) are labeled as violent by the detectors of Alibaba and Google, respectively, while Fig. 13 (b) and Fig. 13 (e), to which Gaussian noise has been added, are labeled as normal. Besides, the grayscale versions of Fig. 13 (a) and Fig. 13 (d) are also labeled as normal. Although the colors have been changed, we can still easily identify guns and terrorists in the images. The detailed PSNR values of IP attacks on violent images can be found in Fig. 12 (b).
[Fig. 14: SP attack success rates on (a) violent images and (b) political images]
Single-Pixel Attack. The success rates of SP attacks are shown in Fig. 14 (a). To our surprise, the detectors of Baidu and Alibaba are not resistant to SP attacks on violent images, as the high attack success rates show, which is very different from their performance on pornographic images. Google's detector shows good robustness in detecting violent images, since we cannot launch a successful SP attack with a small perturbation. We speculate that different companies have different content security priorities: Google in the United States may focus more on images filled with violence and terrorism, while Baidu and Alibaba in China face stricter censorship of pornographic images.
Subject-based Local-Search Attack. Similarly, the SBLS attacks are conducted on violent images. We set P to 255 or 0 for different platforms and choose the better results. The results are shown in Fig. 15. The success rates of attacking Baidu and Alibaba are 100% and 72%, respectively; in other words, within a limited number of queries, the SBLS attack can make all images adversarial on Baidu's detector. The average number of queries is 200 for Baidu's detector, and the average distance is 38, which means that modifying only 38 pixels on average makes a violent image adversarial. Besides, the average SSIM value is about 0.99 for Baidu's detector, which reveals a high degree of similarity. Although Alibaba's detector is more robust than Baidu's on violent images, the quality of the adversarial images for Alibaba's is also sufficiently good: the average SSIM value is over 0.99.
[Example adversarial violent images: (a) Alibaba, SSIM = 0.97; (b) Baidu, SSIM = 0.98]
Subject-based Boundary Attack. We also carry out the Subject-based Boundary attack on violent images. The results are shown in Fig. 17. The success rates of SBB attacks are all over 67%, but the quality of the adversarial images is not as good as that of SBLS: the average SSIM values are lower. However, the SBB attack works better on violent images than on pornographic images.
A successful adversarial image is shown in Fig. 18. We keep restoring pixels until the detector recognizes the image as illegal again. In round 22 the detector marked the image as violent, so the image from round 21 is chosen as the best adversarial image.
[Fig. 18: intermediate images from restoration rounds 1, 5, 7, 9, 11, 13, 15, 17, and 21]
5.2.4 Detectors of Political Images
Both Baidu and Alibaba provide cloud services to detect whether a picture contains politicians, since sensitive political images can be misused on the Internet. We are interested in exploring adversarial attacks that aim to interfere with the results of such political image detectors. The political images come from several countries, such as China, the United States, Japan, and South Korea. The subject of a political image is defined as a person's face, since the face determines the identification of a politician. To find the location of the face in the image, we adopt a face detection service from an open cloud platform; in this paper, we choose the face detection API from Baidu since it is free and easy to use. We only need to perturb the face area rather than the entire person's pixels, which speeds up all subsequent attacks except the IP attacks.
Image Processing. Firstly, IP attacks are used to test political images. We find that the Gaussian noise attack makes about 76% of the images adversarial on Baidu's detector and 57% on Alibaba's; for AWS's detector, the success rate is 45%. With the Grayscale attack, 20% of the images on Baidu's detector and 33% on Alibaba's can evade detection. To our surprise, Binarization attacks perform very well: unlike pornographic and violent images, low-quality political images do not affect human judgment. Several successful adversarial images are shown in Fig. 19. Although the image quality is damaged by binarization, we can still recognize these politicians easily. For Salt-and-Pepper noise, the average parameter values are all 0.05 on these platforms' detectors. Among the successful adversarial images of the Brightness attack, the average parameter values are about 0.5. These results imply that the quality of the images is still good.
[Fig. 19: adversarial examples of (a) Vladimir Putin, (b) Barack Obama, (c) Barack Obama]
Single-Pixel Attack. Then, SP attacks on the subject region of political images are considered. The success rates are shown in Fig. 14 (b). Based on the figure, political images are vulnerable to SP attacks. Even if hundreds of pixels are perturbed, the image quality is still acceptable.
Subject-based Local-Search Attack. Next we conduct SBLS attacks on political images. The success rates on Baidu's and Alibaba's detectors are 60% and 46%, respectively, and only 222 and 382 queries are needed on average. We show an example in Fig. 20. The distance between Fig. 20 (a) and Fig. 20 (b) is 10, which means we only perturb 10 pixels to make the image adversarial. Among all successful examples on Baidu's detector, the average distance is 54 and the average SSIM value is 0.98. Fig. 20 (c) is an adversarial image misclassified by Alibaba's detector. Even though noticeable perturbations have been added to this image, people can still recognize Barack Obama easily. In other words, if we upload the image to the cloud, the detector will not give any warning about the politician.
Subject-based Boundary Attack. Finally, SBB attacks are used to test the robustness of the models. Through SBB attacks, we can make 82% of the images adversarial on Baidu's detector and 67% on Alibaba's. Moreover, the attack only needs 601 and 375 queries on average, depending on the platform. All SSIM values are over 0.9, which represents a very high similarity between the original and adversarial images. The median PSNR values for adversarial examples on Baidu and Alibaba are 27 and 30, respectively, and the median distances are 416 and 858.
A successful example is shown in Fig. 21, where only 17 iterations are required. We first perturb the whole face area to make the image adversarial, then recover the original pixels step by step until the image is labeled as political again. In Fig. 21, the loop stops in round 17, so the image from round 16 is considered the best adversarial image.
[Fig. 21: intermediate images from restoration rounds 1, 6, 8, 10, 12, 13, 14, 15, and 16]
6.1 Effect of Attacks
Unlike previous attacks, our attacks do not require massive numbers of queries. For some attack methods (e.g., IP attacks), the detectors can be bypassed without any queries. For an iteration-based attack, fewer than two thousand queries are needed to generate a good adversarial image, which can be done within the free quota. Based on our investigation, almost all cloud service APIs have free invocation quotas, and service providers allow a trial period for registered users. Thanks to the efficient attack design, all of our experiments were completed within the free trials.
Secondly, different from previous work, our attacks can affect cloud services to a certain extent. In our experiments, we find that the detectors are easily tricked. Although noticeable perturbations can be found in the adversarial examples, they remain close to the original images or do not affect people's judgment. For instance, people can still recognize perturbed political images easily while detectors fail to do so. Such attacks can help spread illegal images, which undermines the content security of the Internet. Moreover, attackers can exploit the defects of these services for profit.
To achieve different goals, these attacks can be launched in different scenarios. If the attacker cannot query the APIs frequently, the single-step IP and SP attacks can be launched; if the attacker can query the APIs freely, the iterative SBLS and SBB attacks yield adversarial examples with good visual quality. The success rates of the IP and SP attacks are higher than those of the SBLS and SBB attacks, while the adversarial examples generated by the SBLS and SBB attacks have less perturbation. The attacker can adjust the attack method according to the characteristics of the environment.
Since these attacks pose a significant threat to cloud services, it is crucial to design defense mechanisms against them. Prior work proposed adversarial training to improve the robustness of deep learning models: a set of adversarial examples is iteratively created and included in the training data. Retraining the model with the new training data may be very helpful; for instance, cloud service providers can collect plenty of grayscale and binarized images, or adversarial examples generated by the SBLS and SBB attacks, and train a new deep model to detect these illegal images. Although adversarial training can defend against these attacks to a certain extent, obtaining adversarial examples is costly. Other researchers proposed detecting adversarial inputs before feeding data to the model. Certainly, detecting inputs can reject some strange adversarial examples, but it also increases false positives, since real-world inputs can resemble adversarial examples; it has also been shown that such detection mechanisms can themselves be bypassed. Randomization has likewise been proposed for mitigating adversarial effects: the model makes decisions through multiple classifiers, which increases its robustness. Adversarial training, input detection, and randomization all rely on the specific model or dataset, but our proposed methods work in a black-box manner without access to either.
Since our attacks (SBLS, SBB) rely on confidence values, rounding the confidence scores in the output to some fixed precision is a good defense. However, the outputs of the APIs in our experiments are not always exact probability values but sometimes coarse scores representing confidence; that is, the cloud platforms may have already deployed this defense. For instance, the outputs of AWS and Alibaba Cloud are rough scores, and Google Cloud returns a word instead of a numerical value. Although these outputs are not precise confidence values, our attacks still achieve high success rates, as shown in Section 5.2. If the detectors output labels only, it would hinder our iterative attacks; on the other hand, such a change is not friendly to users either, since it does not help them understand the detection results.
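The rounding defense amounts to quantizing the score before returning it; a minimal sketch, where the granularity `step` is an assumed value:

```python
def round_confidence(score, step=0.25):
    """Defense sketch: quantize a returned probability so iterative
    attacks cannot observe small score changes between queries.
    AWS/Alibaba-style rough scores behave similarly; Google goes
    further and returns likelihood words instead of numbers."""
    return round(round(score / step) * step, 10)
```

With `step=0.25`, a score of 0.61 and a score of 0.55 both come back as 0.5, so a single-pixel trial that nudges the true probability slightly produces no observable signal.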
Limiting queries is a straightforward strategy, since our SBLS and SBB attacks require iterative queries. We statistically analyze the successful adversarial examples of the iterative attacks, as shown in Table IX, counting the median, maximum, and minimum number of queries required to generate an adversarial example. In our experiments, we do not need massive numbers of queries to launch an iterative attack: the maximum is less than 3,000 queries, and over half require no more than 1,000. Therefore, limiting the number of queries with a large threshold does not work. Besides, many websites that rely on cloud services may call the APIs many times within a short period to review the content of their webpages.
Prior work claimed that a noise filter could effectively eliminate adversarial attacks. To check whether noise filters are effective, we implemented them offline; the noise filter only processes the images before they are sent to the cloud-based detectors. In our experiments, we select the Gaussian Filter and the Median Filter, since the Gaussian Filter handles Gaussian noise and the Median Filter handles Salt-and-Pepper noise. For the Binarization, Brightness, and Grayscale attacks, we also use both filters, since there is no specific filter for them. To test the IP attacks, we record their success rates with and without noise filters. The results are shown in Table X, where we can clearly see a decline in the success rates of the Gaussian noise and Salt-and-Pepper attacks. Nevertheless, for Azure, Baidu, and Google, the success rate of Salt-and-Pepper attacks remains high even under the Gaussian Filter. We analyze these adversarial examples and find that their Salt-and-Pepper perturbation parameters are almost twice as large as those of the original adversarial examples. Besides, the Gaussian Filter is more effective than the Median Filter at mitigating Binarization attacks. However, both filters affect the accuracy of the models to a certain extent: for example, the success rates of the Brightness and Grayscale attacks on Alibaba and Baidu increase slightly, which means the noise filters further blur the adversarial images and prevent the model from detecting them.
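To make the Median Filter defense concrete, here is a minimal per-channel 3x3 median filter written directly in numpy (a sketch of the standard operation, not the paper's exact implementation); a lone extreme pixel never wins the median of its window, which is why it removes salt-and-pepper and single-pixel perturbations:

```python
import numpy as np

def median_filter(img, k=3):
    """Per-channel k x k median filter over an H x W x 3 uint8 image,
    with edge-replicated padding so the output has the same shape."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(img)
    h, w = img.shape[:2]
    for y in range(h):
        for x in range(w):
            # Median of the k x k neighborhood, computed per channel.
            out[y, x] = np.median(padded[y:y + k, x:x + k], axis=(0, 1))
    return out
```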
As for the SP, SBLS, and SBB attacks, they can be considered as adding noise to the images, so the Gaussian Filter and Median Filter are also selected in our simulated experiments. The experimental results show that the Gaussian Filter has almost no impact on the SP attacks. The Median Filter can reduce the success rates of SP attacks on pornography to less than 5%, and cuts the success rates of SP attacks on violent and political images in half. From observing the filtered images, we believe the Median Filter makes pornographic images show more skin-colored pixels, making them easier to detect, while removing the random noise from violent and political images also increases their blurring. To simulate noise filters on cloud platforms, the filters are applied in each iteration of the SBLS and SBB attacks, and the final success rates are recorded. We find that the Gaussian Filter has little impact on either attack. Under the Median Filter, the SBLS attacks fail entirely: SBLS tries to capture the difference in prediction caused by small perturbations in each iteration, and the Median Filter easily removes such small perturbations. For the SBB attacks, the Median Filter reduces the success rates by approximately 10%; the initial perturbation of SBB attacks is large, and the Median Filter does not work well on large regional perturbations. Nevertheless, with noise filters deployed, more perturbation is needed to launch successful attacks.
In summary, we found that the Median Filter can resist most Salt-and-Pepper noise attacks, SP attacks, and SBLS attacks, while the Gaussian Filter can resist most Gaussian noise attacks. However, cloud-based detectors cannot deploy a single uniform filter, so it is difficult to defend against all kinds of attacks simultaneously. At the same time, the noise filters also decrease model accuracy.
7 Related Work
Previous works mainly study the security and privacy of deep neural networks in the white-box mode. In the white-box setting, the attacker can obtain adversarial examples quickly and accurately, and the perturbation is too small to be perceived by people. However, it is difficult for an attacker to know the inner parameters of models in the real world. For instance, the architecture and parameters of deep models on cloud platforms cannot be obtained by the attacker, who can only access the APIs opened by the cloud platforms. Thus, black-box attacks on neural networks are more threatening.
Researchers have recently launched black-box attacks on deep neural networks. Papernot et al. proposed that an attacker can train a substitute model that approximates the target model and then generate adversarial examples on the substitute. Their experiments showed that adversarial examples transfer well, but the attack is not totally black-box: they have knowledge of the training data and test the attack with identically distributed data. Liu et al. adopted an ensemble-based model to improve transferability and successfully attacked Clarifai.com. However, classifiers differ greatly from detectors. Several existing classifier models, like VGG16, Resnet50, and InceptionV3, perform well at classifying test images, which contributes to the generation of adversarial examples; it is difficult for attackers to obtain open-source models that detect illegal images, and for complicated detectors, transferability is poor.
Query-based black-box attacks have also been explored. Some researchers extract inner information of models through large numbers of queries, but this is impractical against commercial detectors. In one such attack, thousands of queries are required even for low-resolution images; for high-resolution images, it takes tens of thousands. Gradient-estimation-based black-box attacks query the output of the target model; nevertheless, faced with high-resolution images, millions of queries are required, which is inefficient and impractical in the real world. Brendel et al. proposed a decision-based attack that starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial, but it needs 1,200,000 queries on average to generate a good adversarial example for high-resolution images. In this paper, we utilize the semantic segmentation technique to speed up the process and choose the important pixels according to the returned probability; as a result, only thousands of queries are required to generate good adversarial examples. Narodytska et al. proposed a greedy local-search algorithm to attack models in the black-box mode; the SBLS attack in this paper originates from their work, but we modified it to adapt to cloud platforms, since they do not attack models in the real world. For instance, setting an RGB value larger than 255 is impractical for real images.
Several other forms of black-box attacks appear in recent works. Athalye et al. constructed real-world 3D objects that consistently fool a neural network across a wide distribution of angles and viewpoints, showing that adversarial examples are a practical concern for real-world systems, but they focus on classifiers only. Hosseini et al. found that Google's Cloud Vision API generates completely different outputs for a noisy image whose original content a human observer would still perceive. We expand this experiment and use five image-processing techniques to attack the models of four cloud platforms; against detectors, we find that salt-and-pepper noise alone is not effective enough, while the other image-processing methods perform well. Oh et al. inferred the inner information of models through multiple queries, and the revealed internal information helps generate more effective adversarial examples against the black-box model; however, thousands of models must be trained to infer useful information, which is impractical, and no such pool of candidate models exists for detectors.
8 Conclusion and Future Work
In this study, we conduct a comprehensive study of security issues in cloud-based image detectors. We design four kinds of attacks and verify them on major cloud service platforms. According to our experimental results, cloud-based detectors are easily bypassed. In particular, Azure's detector is the weakest on pornographic images, since the attacks succeed within hundreds of queries, which makes them practical in the real world. Moreover, we find that Baidu and Alibaba perform better than Google and Azure at detecting pornographic images, while Google in the United States achieves higher success rates at detecting violent images. Detectors of political and disgusting images are easier to attack than those of pornographic images. We reported our findings to the tested cloud platforms and received positive feedback from them.
In the future, we are interested in optimizing queries to further improve the effectiveness of our algorithms. A general method that attacks all platforms with a small number of queries would be another meaningful topic. It is also important to design defense mechanisms that protect these cloud services against adversarial-example attacks; for instance, cloud platforms could run effective detection before outputting a label, distinguishing malicious inputs from normal samples. We hope our work helps cloud platforms design secure services and provides inspiration to researchers who study security issues in deep learning models.
-  Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. arXiv preprint arXiv:1801.00553, 2018.
-  Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
-  Jamie Hayes and George Danezis. Machine learning as an adversarial service: Learning black-box adversarial examples. arXiv preprint arXiv:1708.05207, 2017.
-  Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
-  Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
-  Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
-  Jiajun Lu, Hussein Sibai, and Evan Fabry. Adversarial examples that fool detectors. arXiv preprint arXiv:1712.02494, 2017.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
-  Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural networks, 32:323–332, 2012.
-  Ishai Rosenberg, Asaf Shabtai, Lior Rokach, and Yuval Elovici. Generic black-box end-to-end attack against state of the art API call based malware classifiers, 2017.
-  Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
-  Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
-  Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062, 2014.
-  Guosheng Lin, Anton Milan, Chunhua Shen, and Ian D Reid. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Cvpr, volume 1, page 5, 2017.
-  Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, and Jian Sun. Large kernel matters: improve semantic segmentation by global convolutional network. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 1743–1751. IEEE, 2017.
-  Nina Narodytska and Shiva Prasad Kasiviswanathan. Simple black-box adversarial attacks on deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 6–14, 2017.
-  Aishy Amer, Amar Mitiche, and Eric Dubois. Reliable and fast structure-oriented video noise estimation. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 1, pages I–I. IEEE, 2002.
-  Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
-  Hossein Hosseini, Baicen Xiao, and Radha Poovendran. Google’s cloud vision api is not robust to noise. In Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on, pages 101–105. IEEE, 2017.
-  Robert W. Floyd and Louis Steinberg. An adaptive algorithm for spatial grey scale. SID Digest, 17:75–77, 1975.
-  Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531, 2014.
-  Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
-  Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
-  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
-  Uri Shaham, Yutaro Yamada, and Sahand Negahban. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432, 2015.
-  Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
-  Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287. ACM, 2017.
-  Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
-  Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
-  Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
-  Yevgeniy Vorobeychik and Bo Li. Optimal randomized classification in adversarial settings. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, pages 485–492. International Foundation for Autonomous Agents and Multiagent Systems, 2014.
-  Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
-  Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016.
-  Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1511.04508, 2015.
-  Florian Tramèr, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction apis. In USENIX Security Symposium, pages 601–618, 2016.
-  Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 3–18. IEEE, 2017.
-  Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333. ACM, 2015.
-  Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Query-efficient black-box adversarial examples. arXiv preprint arXiv:1712.07113, 2017.
-  Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Exploring the space of black-box attacks on deep neural networks. arXiv preprint arXiv:1712.09491, 2017.
-  Anish Athalye and Ilya Sutskever. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.
-  Seong Joon Oh, Max Augustin, Bernt Schiele, and Mario Fritz. Whitening black-box neural networks. arXiv preprint arXiv:1711.01768, 2017.
Appendix A Detectors of Disgusting Images
Disgusting pictures, such as images of dense patterns or human organs as shown in Figure 22.(a)-(c), cause serious discomfort to viewers. Thus, many websites do not allow people to upload them. If all of these images had to be reviewed by humans, the cost would be huge. Among the four platforms we investigated, only Baidu's open cloud APIs detect nauseous pictures. Therefore, we want to know whether these services are robust against our attacks.
Image Processing. For IP attacks, the results are poor: only the Binarization and Brightness attacks work, and nothing disgusting remains visible in the successful adversarial images. We believe the main reason is that dense shapes are generally what make an image disgusting; adding plenty of noise to the image only increases the intensity of the dense shapes.
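The two transformations that do work here can be sketched in a few lines of NumPy. The threshold and brightness factor below are illustrative values, not the exact settings of our experiments.

```python
import numpy as np

def binarize(img, threshold=128):
    """Map every pixel to pure black or white around a threshold."""
    return np.where(img >= threshold, 255, 0).astype(np.uint8)

def adjust_brightness(img, factor=1.8):
    """Scale pixel intensities, clipping to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)
```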
Single-Pixel Attack. For SP attacks, we find the success rates are not very high: even when 2,000 pixels are perturbed, the success rate is only 18%. We believe the reason is the same as for Image Processing, namely that perturbing too many pixels may make the image more disgusting than before. However, a certain proportion of the images still become adversarial, which means perturbing pixels works to some extent; the key is how to perturb them. Thus, we turn to the SBLS attack.
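A minimal sketch of the SP attack loop follows, with a hypothetical boolean `query_fn` standing in for the cloud detector (returning True when the image is flagged); the real attack goes through the platform's API instead.

```python
import numpy as np

def single_pixel_attack(image, query_fn, n_pixels=2000, value=255, seed=0):
    """Set n_pixels randomly chosen pixels to a fixed extreme value and
    report whether the detector stops flagging the result."""
    rng = np.random.default_rng(seed)
    adv = image.copy()
    flat = adv.reshape(-1, adv.shape[-1])  # view: writes go through to adv
    idx = rng.choice(flat.shape[0], size=min(n_pixels, flat.shape[0]),
                     replace=False)
    flat[idx] = value
    return adv, not query_fn(adv)
```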
Subject-based Local-Search Attack. Since disgusting images do not have a subject class, we choose the whole region as the subject class for our attack. For the SBLS attack, about 45% of the perturbed images bypass the detector. The average number of queries is 459 and the average distance is 97. The PSNR and SSIM values of successful adversarial images are 34 and 0.97, respectively. The number of perturbed pixels is much smaller than in the SP attack, yet the success rate is higher. Among the successful images, the maximum distance is 260, which means we modify at most 260 pixels to make an image adversarial; examples are shown in Figure 22.(d)-(f). Per image, we perturb only about 0.5% of the total pixels.
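The quality numbers above follow standard definitions; here is a minimal sketch, assuming "distance" counts the pixel locations where any channel was modified.

```python
import numpy as np

def psnr(orig, adv, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((orig.astype(np.float64) - adv.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def pixel_distance(orig, adv):
    """Number of pixel locations where any channel differs."""
    return int(np.any(orig != adv, axis=-1).sum())
```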
Subject-based Boundary Attack. We find that the SBB attack does not work on disgusting images: the loop stops after only a few iterations. After tracking the iterative process, we found the reason. In the middle of the iteration, the image is identified as disgusting because it contains plenty of dense points, so the image quality is very poor when the loop stops.
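For reference, the SBB iteration follows the decision-based pattern of Brendel et al.: start from a heavily perturbed image that already evades the detector, then shrink the perturbation while it keeps evading. A minimal sketch, with a hypothetical boolean `query_fn` that returns True when the detector flags the image:

```python
import numpy as np

def boundary_walk(orig, start, query_fn, steps=50, shrink=0.9):
    """Shrink the perturbation (start - orig) geometrically, keeping the
    image on the non-flagged side of the decision boundary. On disgusting
    images this loop stops early because intermediate images are flagged."""
    diff = start.astype(np.float32) - orig.astype(np.float32)
    alpha = 1.0
    for _ in range(steps):
        candidate = orig + alpha * shrink * diff
        if query_fn(np.clip(candidate, 0, 255).astype(np.uint8)):
            break            # flagged again: stop shrinking
        alpha *= shrink      # still evades: accept the smaller perturbation
    return np.clip(orig + alpha * diff, 0, 255).astype(np.uint8)
```

When the loop breaks after only a few iterations, alpha is still close to 1, which is exactly why the images the SBB attack returns here remain of very poor quality.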