RNN-Test: Adversarial Testing Framework for Recurrent Neural Network Systems
While substantial effort has been invested in the adversarial testing of convolutional neural networks (CNN), testing for recurrent neural networks (RNN) is still limited to the classification context, leaving threats to vast sequential application domains unaddressed. In this work, we propose a generic adversarial testing framework, RNN-Test. First, based on the distinctive structure of RNNs, we define three novel coverage metrics to measure testing completeness and guide the generation of adversarial inputs. Second, we propose the state inconsistency orientation, which generates perturbations by maximizing the inconsistency of the hidden states of RNN cells. Finally, we combine orientations with coverage guidance to produce minute perturbations. Given an RNN model and sequential inputs, RNN-Test modifies one character or one word of the whole input based on the perturbations obtained, so as to lead the RNN to produce wrong outputs.
For evaluation, we apply RNN-Test to two models of common RNN structure — the PTB language model and a spell checker model. RNN-Test efficiently degrades the performance of the PTB language model, increasing its test perplexity by 58.11%, and uncovers numerous incorrect behaviors of the spell checker model with a success rate of 73.44% on average. With our customization, RNN-Test using the redefined neuron coverage as guidance achieves 35.71% higher perplexity than the original strategy of DeepXplore.
As the core of current artificial intelligence applications, deep learning has made great breakthroughs in computer vision[47, 22], natural language processing and speech recognition[19, 2]. With the increasing deployment of deep neural network (DNN) systems in security-critical domains, such as automated driving and medical diagnosis, ensuring the robustness of DNNs has become essential.
However, it has been demonstrated that state-of-the-art DNNs can produce completely different predictions when fed adversarial inputs. This has inspired numerous adversarial testing works devoted to generating adversarial inputs for DNNs, aiming to provide rich sources for training DNNs to be more robust. The majority of these works[13, 35, 33] mutate the inputs based on perturbations obtained by gradient descent. They exhibit high efficiency in generating adversarial inputs, but achieve low testing completeness. In recent years, multiple coverage criteria[38, 46, 29] have been proposed to measure the coverage achieved at various granularities during testing, under the belief that reaching higher coverage increases confidence in the reliability of the DNN.
In spite of the effectiveness of these works, they are largely limited to CNNs. Overall, there are two main types of DNNs, convolutional neural networks (CNN) and recurrent neural networks (RNN). They have different structures and suit different kinds of tasks. CNNs introduce the convolution layer and pooling layer into traditional fully connected DNNs, and have excellent performance in image processing applications[44, 18]. RNNs are known for their iterative structures and their support for temporal information, and are hence good at handling tasks with sequential data, like natural language processing and speech recognition. Owing to the gap between their structures, adversarial testing techniques for one type of DNN are hard to apply to the other.
Adversarial testing for RNNs faces certain challenges, summarized as threefold. First, there is no rule to recognize an adversarial input without an obvious class label. For sequential outputs that are not subsequently used for classification, there is no standard to decide, with respect to the degree of change, whether the outputs are wrong. Second, when mutating sequential inputs like texts, it is hard to keep the perturbation minute. Applying perturbations to words in a discrete space often fails to yield a legal input, and explicit modification is distinguishable to humans. Third, the existing neuron-based coverage metrics for CNNs fail to consider the characteristics of RNN structures and cannot be adopted directly.
Benefiting from the simpler adaptation of works on CNNs, existing works on RNNs have also applied adversarial testing to classification domains. They perform well in specific tasks, such as sentiment analysis of texts[42, 40] and email classification. For text inputs, most works add, modify or delete a word/character to ensure a minute alteration, as a way to address the second challenge. However, adversarial testing for RNNs in the main sequential domains is either overlooked or rather inadequate[36, 34], with the first challenge not well addressed. Besides that, the coverage metrics defined for DNNs are also based upon CNNs, which have thousands of neurons activated by the ReLU activation function. Instead, RNNs have significantly fewer states, activated by sigmoid and tanh, with completely different value ranges. This key issue is neglected by the relevant works for RNNs.
In this paper, we propose RNN-Test, a generic adversarial testing framework for recurrent neural networks, with no limits on the task. First, we define three coverage metrics targeting the particular computation logics of RNNs. Then, RNN-Test primarily adopts a joint optimization that maximizes the adversary orientations and boosts the coverage, which enables the perturbations to be obtained in a gradient-based way. In the adversary orientation module, we propose the state inconsistency orientation, which maximizes the inconsistency of the hidden states to lead the model to produce wrong outputs, alongside the cost orientation adapted from FGSM and the decision boundary orientation from DLFuzz. In the coverage boosting module, we are the first to employ coverage guidance to obtain the perturbations, rather than using coverage only as an indicator of testing completeness and a goal to improve. Note that we only keep the perturbations of one word/character to modify out of the whole input, thus ensuring a tiny modification. Finally, we address the first challenge by leveraging the performance metrics of the tested models to assess the quality of the adversarial inputs.
For evaluation, we select the PTB language model and a spell checker model for their general structures and common applications, and implement a customized version of the neuron coverage in DeepXplore for comparison. On the PTB model, RNN-Test demonstrates its effectiveness in adversarial input generation by increasing the test perplexity by 58.11% on average, where the state inconsistency orientation degrades the model performance most among the three orientations. With coverage boosting, the redefined coverage of DeepXplore as guidance achieves 35.71% higher perplexity than the random strategy of DeepXplore. Furthermore, we retrain and improve the model by 1.159% using the training set augmented with adversarial inputs. On the spell checker model, the adversarial inputs cause previously corrected mistakes to emerge again, with a success rate of 73.44% on average. It is remarkable that the coverage guidance achieves the highest success rate of 74.29% with the aid of the boosting procedure.
To summarize, our work has the following contributions:
We define three coverage metrics customized for RNNs and are the first to exploit the coverage boosting procedure to directly generate adversarial inputs. During experiments, we found no linear correlation between the coverage value and the quality of adversarial inputs, suggesting more effort should be devoted to improving the quality of inputs rather than the value of coverage.
We propose the state inconsistency orientation to lead the tested RNN models to behave worse, which is also effective for adversarial input generation.
We design, implement and evaluate the generic adversarial testing framework RNN-Test, which is scalable to variants of RNNs without limits on the application context and freely supports multiple combinations of orientations and coverage metrics.
We demonstrate the effectiveness of RNN-Test on two RNN models. RNN-Test could efficiently generate adversarial inputs and improve the PTB model by retraining with the augmented training set.
We organize this paper as follows. Section II provides the background related to RNNs. Section III formally describes the design of RNN-Test in detail. Section IV presents the evaluation results of RNN-Test. Section V discusses the threats to validity. Section VI introduces the related works. Section VII concludes.
II-A Deep Neural Network
From a biological perspective, artificial neural networks were initially designed to imitate the structure of biological neurons with an activation process. The difference between deep neural networks and shallow neural networks lies in having more hidden layers to perform complex computation. A fully connected network requires each neuron to establish connections with all neurons in adjacent layers. Fig. 1(a) shows the structure of a traditional DNN and Fig. 1(b) a typical neuron of a DNN. For traditional DNNs, data flows from the input layer through the hidden layers and then to the output layer. Furthermore, CNNs keep the main feed-forward structure and introduce the convolution layer and pooling layer to better extract the features of the inputs, which are mostly images. Note that the activation function used in CNNs is usually ReLU, which keeps a positive value unchanged but maps other values to 0, and so has an infinite upper bound.
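To make the neuron description above concrete, here is a minimal sketch (not from the paper) of a single fully connected neuron with the ReLU activation just described; all names and values are illustrative.

```python
# A single fully connected neuron: weighted sum of inputs plus a bias,
# followed by ReLU, which keeps positive values and maps the rest to 0.
def relu(x):
    return x if x > 0.0 else 0.0

def neuron(inputs, weights, bias):
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(s)

print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.5 - 0.5 + 0.1 -> 0.1
print(neuron([1.0, 2.0], [-1.0, 0.0], 0.0))   # negative sum -> 0.0
```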
II-B Recurrent Neural Network and the variants
RNNs are widely applied in temporal sequence analysis. For traditional deep neural networks like CNNs, neurons in adjacent layers are fully connected while neurons within the same layer have no explicit relations. As a result, CNNs cannot deliver the context information surrounding the input data very well.
An example of such tasks is predicting the next word given the previous sentence. Because of the semantic relationships between words in a sentence, we have to take the sequence of previous words into consideration to predict the next word. Fig. 2 depicts the typical RNN structure and formula (1) summarizes the computation process of an RNN cell. The hidden state output of the cell at a given time step and layer is decided by the current input from the previous layer as well as the hidden state from the previous step in the same layer, and is then passed forward to compute the softmax predictions. Consequently, an RNN is able to represent the context information in temporal sequences, which makes it appropriate for natural language processing tasks. Besides this key design, common RNNs usually comprise two or three layers, each with several states when unfolded, much fewer than CNNs, which usually have ten or more layers each with hundreds of neurons. Moreover, the activation functions sigmoid and tanh are commonly used and are important in our definitions of coverage metrics.
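The cell computation summarized in formula (1) can be sketched as follows; this is a hedged stdlib-only illustration, and the weight names (`W_x`, `W_h`, `b`) are our own, not the paper's notation. Each new hidden state element is a tanh-activated affine map of the previous-layer input and the previous-step hidden state.

```python
import math

# One step of a vanilla RNN cell: h = tanh(W_x @ x + W_h @ h_prev + b),
# computed per element with plain lists (illustrative weight names).
def rnn_cell(x, h_prev, W_x, W_h, b):
    size = len(h_prev)
    h = []
    for j in range(size):
        s = b[j]
        s += sum(W_x[j][k] * x[k] for k in range(len(x)))
        s += sum(W_h[j][k] * h_prev[k] for k in range(size))
        h.append(math.tanh(s))
    return h

h = rnn_cell([1.0], [0.0, 0.0],
             W_x=[[0.5], [-0.5]],
             W_h=[[0.0, 0.0], [0.0, 0.0]],
             b=[0.0, 0.0])
print(h)  # [tanh(0.5), tanh(-0.5)]
```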
Variants of RNNs. Although RNNs can solve most tasks involving time series, they still encounter the problem of vanishing or exploding gradients, which leaves the network unable to learn dependencies over long time steps. LSTM (Long Short-Term Memory)[21, 51] and GRU introduce the gate mechanism, allowing RNNs to learn context information from farther time steps. Both are now widely applied in the related tasks.
Taking LSTM as an example, its structure and inner cell are given in Fig. 3, along with the computation process in formula (2). Cell states and gates participate in the computation, where i, f, o, g stand for the input gate, forget gate, output gate and new input gate respectively, utilized to decide the flow and weights of different parts of the inputs.
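A minimal per-element sketch of the LSTM update in formula (2) follows. For brevity the gate pre-activations are passed in directly; in a real cell they are affine maps of the concatenated previous hidden state and current input. This is our illustration, not the paper's code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# i, f, o, g: input, forget, output and new input gates.
# c = f * c_prev + i * g ; h = o * tanh(c), as in the standard LSTM cell.
def lstm_update(c_prev, i_pre, f_pre, o_pre, g_pre):
    i = sigmoid(i_pre)
    f = sigmoid(f_pre)
    o = sigmoid(o_pre)
    g = math.tanh(g_pre)
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state
    return c, h

# Saturated gates: forget everything old, write and expose the new input.
c, h = lstm_update(c_prev=0.0, i_pre=10.0, f_pre=-10.0, o_pre=10.0, g_pre=10.0)
print(round(c, 3), round(h, 3))  # c close to tanh(10) ~ 1, h close to tanh(c)
```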
III RNN-Test Approach
III-A RNN-Test Overview
The overall workflow of RNN-Test is depicted in Fig. 4. The workflow is not restricted to the classification context but is universally applicable to both sequential and classification contexts.
RNN-Test relies on three core modules to generate adversarial inputs for recurrent neural networks, which are RNN wrapper, adversary orientation maximizing and coverage boosting. RNN wrapper extracts the hidden states and cell states of each RNN cell in the given RNN, without affecting its inherent process. The states obtained are crucial for adversarial input generation and utilized in both the other two modules. Additionally, our coverage metrics defined based on the states are given in § III-B and § III-C.
In the adversary orientation maximizing module, RNN-Test integrates three orientation methods: our proposed state inconsistency orientation and two orientation methods adapted from other works[13, 16] that perform well in CNN testing, described in § III-E. These methods search for adversarial inputs by maximizing orientations designed to lead the RNN to expose wrong behaviours. Meanwhile, the coverage boosting module aims to generate adversarial inputs by searching the uncovered space of RNNs, as described in § III-F. The orientation methods and coverage guidances in the two modules can be freely combined, allowing RNN-Test to explore better strategies.
Finally, the integrated modules produce a joint objective. Maximizing the objective by gradient ascent yields the perturbation used to modify the test input. Here we randomly modify just one word or character out of the whole sequential input, ensuring the modification is small enough to maintain the original semantic meaning. As the words and characters lie in a discrete embedding space, the minute perturbation applied to the test input will probably not lead to a legal input. We adopt the nearest embedding as the adversarial input after iteratively scaling the perturbation.
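The final step above — scale the perturbation, then snap to the nearest legal embedding — can be sketched as follows. This is our stdlib-only illustration under assumed data structures (a plain dict of word embeddings), not the paper's implementation; the scaling loop stops once the nearest neighbour differs from the original word.

```python
import math

# Find the legal word whose embedding is closest to a perturbed vector.
def nearest_word(vec, embeddings):
    return min(embeddings, key=lambda w: math.dist(vec, embeddings[w]))

# Iteratively scale the gradient-based perturbation until the mutated
# embedding snaps to a different word, yielding a legal adversarial input.
def snap_to_adversarial(word, grad, embeddings, max_iters=10):
    vec = list(embeddings[word])
    for step in range(1, max_iters + 1):
        mutated = [v + step * g for v, g in zip(vec, grad)]
        candidate = nearest_word(mutated, embeddings)
        if candidate != word:
            return candidate
    return None  # no legal adversarial neighbour found within the budget

emb = {"cat": [0.0, 0.0], "cap": [1.0, 0.0], "dog": [0.0, 5.0]}
print(snap_to_adversarial("cat", [0.6, 0.0], emb))  # "cap"
```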
III-B Key insights for coverage metrics
Insights for state coverage. Based on the distinctive structure of RNNs, the outputs of each RNN cell are hidden states, denoted as h, which are vectors. For an LSTM cell, the outputs also incorporate cell states, denoted as c, also vectors, usually of the same shape as h. The computation procedure is illustrated in § II.
In the procedure, the hidden states play a key role in the prediction, being mapped to the prediction results. For one input, if a specific hidden state has the maximum value among the RNN cell outputs, the probabilities of its corresponding part of the prediction result tend to be higher as well. Thus, combinations of hidden states lead to varying prediction results. As covering the permutations of each hidden state of the RNN cell is extremely time-consuming, covering all the maximum hidden states is a feasible solution. The definition of hidden state coverage is given in formula (3).
In the LSTM structure, cell states are activated by the tanh function and then used to compute the hidden states of the same cell. As shown in Fig. 5(a), the output value of the tanh function ranges from -1 to 1. According to our statistics, the activation values of cell states mostly fall into the central range while few take boundary values. Hence we could measure coverage over the different ranges, but too many sections, as in DeepGauge, would be meaningless due to the narrow value range. We split the value range into sections whose endpoints are chosen according to the tanh distribution. We suppose that covering more of each section, especially the boundary sections, could exercise more computation logics.
Insights for gate coverage. Multiple gates are a prominent characteristic of the GRU and LSTM models. In the implementation, the gates are split from affine maps of the concatenated previous hidden state and current input, as in formula (2). They are then used, after activation, to compute the cell states and hidden states.
Similar to the statistics of the activation values of cell states, the activation values for each gate are also mainly in the central range. We employ the same mechanism to compute the gate coverage, by first splitting the value range into several sections and then recording the coverage of each section. The sections for the gates activated by tanh are the same as above, and the sections for the other gates, activated by sigmoid, are separated based on its distribution, shown in Fig. 5(b). (The boundary sections use these endpoints in the boosting procedure, but wider boundary sections are used when recording coverage, which is convenient for evaluations.)
III-C Coverage definitions
Hidden state coverage. Assume all the hidden states of an RNN model are represented by H, a matrix whose dimensions are the number of time steps, layers and inputs in the batch, together with the state size. The shape of H varies among RNN models, but these are the necessary components. Note that modern DNNs usually process inputs in batches to accelerate the computation.
For a specific hidden state vector h of H, a state in h is covered if it attains the maximum value of h. Thus, the hidden state coverage is computed as formula (3) below.
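A hedged sketch of this definition: a hidden state position counts as covered once it has been the maximum of its vector for some input. The nested-list layout `[time step][layer][batch][state]` is our assumption for illustration.

```python
# Hidden state coverage (formula (3), sketched): the fraction of
# (step, layer, state) slots that have attained the maximum of their
# hidden state vector for at least one input in the batch.
def hidden_state_coverage(H):
    size = len(H[0][0][0])
    covered = set()
    for t, layer_states in enumerate(H):
        for l, batch_states in enumerate(layer_states):
            for h in batch_states:
                covered.add((t, l, h.index(max(h))))
    total = len(H) * len(H[0]) * size   # every (step, layer, state) slot
    return len(covered) / total

H = [[[[0.1, 0.9], [0.8, 0.2]]]]  # 1 step, 1 layer, batch of 2, size 2
print(hidden_state_coverage(H))   # each position was a maximum once -> 1.0
```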
Cell state coverage. All the cell states of an RNN model are denoted as C, a matrix of the same shape as the hidden states. The value range of the activation function is split into sections, and for a specific cell state vector c of C, a state in c is covered in a section if its activation value falls within that section. The cell state coverage for each section is given in formula (4).
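Section-based coverage can be sketched as below. The section endpoints here are illustrative placeholders, not the paper's actual split of the tanh range; the idea is only that a section counts as covered once some activated cell state lands in it.

```python
import math

# Illustrative split of the tanh output range [-1, 1]; the paper chooses
# its own endpoints according to the tanh distribution.
SECTIONS = [(-1.0, -0.9), (-0.9, 0.0), (0.0, 0.9), (0.9, 1.0)]

# Cell state coverage (formula (4), sketched): the fraction of sections
# hit by at least one tanh-activated cell state value.
def cell_state_section_coverage(cell_states):
    covered = [False] * len(SECTIONS)
    for c in cell_states:
        v = math.tanh(c)
        for j, (lo, hi) in enumerate(SECTIONS):
            if lo <= v < hi or (hi == 1.0 and v == 1.0):
                covered[j] = True
    return sum(covered) / len(SECTIONS)

print(cell_state_section_coverage([0.2, -0.4, 3.0]))  # hits 3 of 4 sections
```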
Gate coverage. All the states utilized to compute the gates of the RNN model are represented by G, with the states for each type of gate (for the LSTM model, as in § II-B) forming a matrix of the same shape, with the state size of that gate. A state is covered in a section if its value after the activation function of its gate falls within that section. Thus, gate coverage is computed as formula (5) below.
DX coverage. We also customize the neuron coverage of DeepXplore, owing to the great difference between the traditional DNN structure and the RNN structure. For CNNs, DeepXplore treats each feature map (the output of a convolution layer, a matrix of hundreds of values) as a neuron and takes its mean value as the output. If treated the same way, the hidden states of each cell would form one neuron. Then a common RNN like the PTB model would consist of fewer than 100 neurons (as many as a single CNN layer owns), and the coverage would reach 100% with just several inputs. So we regard each hidden state as a neuron.
For all the hidden states, a state is covered if its output value after min-max normalization is greater than t, where t is the user-defined threshold, as in formula (6).
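The redefined DX coverage can be sketched as follows, treating every hidden state as a neuron; the per-input min-max normalization and threshold follow the description above, while the function name and data layout are our own.

```python
# Redefined DX coverage (formula (6), sketched): min-max normalize the
# hidden state values of an input, then count the fraction whose
# normalized value exceeds the user-defined threshold t.
def dx_coverage(states, t=0.5):
    lo, hi = min(states), max(states)
    if hi == lo:
        return 0.0  # degenerate input: nothing exceeds the threshold
    normalized = [(s - lo) / (hi - lo) for s in states]
    return sum(1 for n in normalized if n > t) / len(states)

print(dx_coverage([0.0, 1.0, 2.0, 4.0], t=0.5))  # only 4.0 normalizes above 0.5
```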
III-D Adversary search
The core algorithm of RNN-Test is presented in Algorithm 1, in which the procedure COVERAGE_BOOST is given in Algorithm 2 and procedures retrieve_states, get_orient are described in the following and § III-E, respectively.
RNN wrapping. In the inherent implementation of a deep RNN model taking a sequential input, the model outputs two data elements before making predictions: all the hidden states of the last layer, and the hidden states and cell states of the last time step. For the subsequent workflow, we need access to all the hidden states and cell states of every layer and time step. We wrap the RNN cell implementation and keep the hidden states and cell states of each cell, thus making all the states available, corresponding to procedure retrieve_states in line 4.
Joint optimization objective. As opposed to the training course, which minimizes the prediction error by tuning the parameters to achieve the desired performance, adversarial testing[13, 33] tries to maximize an objective by mutating the test inputs to discover errors. Different optimization objectives steer the RNN model toward different outputs when mutating the input, with diverse capabilities for discovering adversarial inputs. We explore multiple alternatives and combinations of objectives for adversarial testing of RNN models. The optimization objective here includes two components (Algorithm 1 line 8): adversary orientation maximizing and coverage boosting, corresponding to § III-E and § III-F respectively. Note that the two components can be utilized independently or combined. Moreover, taking the derivative of the objective with respect to the input yields the gradient direction along which the objective increases or decreases most (Algorithm 1 line 9). Afterwards, RNN-Test mutates the input by scaling the gradients (line 20) and applying them to the input as perturbations (line 21), thereby maximizing the objective and obtaining the adversarial inputs.
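The maximize-by-gradient-ascent loop above can be illustrated with a toy, stdlib-only example. The quadratic objective below is a stand-in for the real joint objective (orientation plus coverage terms), and the finite-difference gradient replaces the framework's automatic differentiation; both substitutions are ours.

```python
# Toy objective, maximized at (3, -1); stands in for the joint objective.
def objective(x):
    return -(x[0] - 3.0) ** 2 - (x[1] + 1.0) ** 2

# Central finite-difference gradient (replaces autodiff for this sketch).
def numeric_grad(f, x, eps=1e-6):
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

x = [0.0, 0.0]
for _ in range(200):                 # gradient ascent on the objective
    g = numeric_grad(objective, x)
    x = [xi + 0.05 * gi for xi, gi in zip(x, g)]
print([round(v, 2) for v in x])      # converges near [3.0, -1.0]
```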
Why use the nearest embedding. In the procedure GEN_ADV, we iteratively apply the perturbations and then search for the nearest word/character in the embedding space as the adversarial input (lines 22 to 27 in Algorithm 1). Due to the discreteness of the embedding space in NLP tasks, this is a straightforward way to obtain adversarial inputs. Besides that, the embedding representations of words or characters in each NLP task are acquired after sufficient training, and unveil the semantic properties of the words or characters needed to solve the task. Searching within the given embedding space thus yields adversarial inputs carrying the existing semantic information.
Model performance metrics. In sequential tasks, there is no obvious label on the predicted output by which to identify a generated sequence as an adversarial input, unless classification labels are introduced, which would violate the principle of generality. Fortunately, the metrics measuring model performance, which are supposed to be accessible in all tasks, are a good choice for exhibiting the quality of the adversarial inputs (Algorithm 1 line 15).
III-E Adversary orientation maximizing
In RNN-Test, we explore three adversary orientations for adversarial testing of RNNs: our proposed state inconsistency orientation, the cost orientation adapted from FGSM, and the decision boundary orientation adapted from DLFuzz.
State inconsistency orientation. The state inconsistency orientation is designed based upon the inner logic of the RNN cell. As shown in formulas (1) and (2), the hidden states increase with the cell states and gate outputs they depend on. Therefore, the state inconsistency orientation tries to drive these states in inconsistent directions simultaneously, increasing some while decreasing the others, leading the RNN to unusual behaviours, as formulated in formula (7).
Cost orientation. FGSM and many other works[13, 5] generate adversarial inputs by maximizing the loss between the predicted output and the original output. For sequential tasks in RNNs, the loss is mostly the weighted cross-entropy loss over a sequence of logits, briefly listed in formula (8), which is encapsulated in the implementation of the model and accessible via APIs.
Decision boundary orientation. The decision boundary orientation is designed to decrease the probability of the originally predicted label and increase the probabilities of the other top-k labels in the prediction. For RNN testing, we adapt this idea to the specific time step to mutate in the input, as its output is also a vector of softmax probabilities, formulated in (9).
III-F Coverage boosting
The coverage boosting module aims to cover the uncovered states and sections, and in this way to search for adversarial inputs. As in formula (10), RNN-Test selects hidden states or cell states and boosts their values. Besides the strategy of randomly selecting states that are uncovered or have uncovered boundary sections, RNN-Test also adopts a boosting procedure that selects states with values near the boundary section endpoints and guides their values to reach the boundaries, as in Algorithm 2. This procedure applies to the coverage metrics defined over a series of sections. For hidden state coverage and DX coverage, the procedure selects states with values close to being covered.
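The selection step of the boosting procedure can be sketched as follows; the function and parameter names are ours, not those of Algorithm 2. Among the candidate states, it picks the few whose values are closest to an uncovered boundary endpoint, so a small perturbation suffices to push them into the boundary section.

```python
# Sketch of the boosting selection: rank states by their distance to the
# nearest boundary-section endpoint and return the k closest, i.e. the
# states that a small nudge can most easily drive into a boundary section.
def select_boost_states(values, boundary_endpoints, k=2):
    def dist_to_boundary(v):
        return min(abs(v - e) for e in boundary_endpoints)
    ranked = sorted(range(len(values)),
                    key=lambda i: dist_to_boundary(values[i]))
    return ranked[:k]

vals = [0.05, 0.85, -0.2, -0.88]
print(select_boost_states(vals, boundary_endpoints=[-0.9, 0.9]))  # [3, 1]
```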
IV-A Experiment Setup
Implementation. We developed RNN-Test on the widely deployed framework TensorFlow 1.3.0, and evaluated it on a computer running Ubuntu 16.04 as the host OS, with an Intel i7-7700HQ@3.6GHz 8-core processor, 16GB of memory and an NVIDIA GTX 1070 GPU.
We evaluate RNN-Test on two RNN models for sequential tasks: the PTB language model, of basic LSTM structure, and a sequence-to-sequence (seq2seq) spell checker model with a bi-directional LSTM in the encoding layer and Bahdanau attention in the decoding layer. These two models are selected for their general structures and application contexts.
The PTB language model is a popular RNN model on the Penn Tree Bank dataset. It takes part of a text as input and predicts the subsequent text, that is, the word after each input word. We trained the word-based PTB model to a test perplexity of 117.54 on its 'small' config, consistent with the reported result. This model can be used for text generation, that is, generating new text similar in style to the training data.
The seq2seq spell checker model receives a sentence with spelling mistakes as input, and outputs the sentence with the mistakes corrected. We trained the character-based model to a sequence loss of 10.1%, similar to the reported 15%. The training data are twenty popular books from Project Gutenberg. We construct 160 test sentences with spelling mistakes like the example sentences given, thanks to rich sources from Tatoeba.
Research Questions (RQs): We constructed the experiments to answer the following research questions.
Evaluation metrics. To answer RQ1 and RQ2, we also present the performance of the tested models on the original test set, as well as on a baseline set of adversarial inputs obtained by randomly replacing a word/character of each input. Here we list the performance metrics of the tested models plus other necessary metrics.
Test perplexity. The inverse probability of the test set, a universal metric for language models, where lower perplexity corresponds to a better model.
Average perplexity. The test perplexity of each input, averaged.
WER. Word error rate, the deviation of the predicted outputs from the ground truth, a generic metric for seq2seq models, where higher WER means worse predictions.
BLEU. Bilingual evaluation understudy, similar to WER but higher BLEU means better predictions.
gen_rate. The ratio of the test set for which the method successfully produced an adversarial input.
orient_rate. The ratio of the generated set obtained by our method rather than at random.
suc_rate. The ratio of the generated set for which the mistakes corrected in the original input reappear in the prediction result of the adversarial input.
norm. Distortion of the perturbation.
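The WER metric listed above can be computed as the word-level edit distance between the prediction and the ground truth, divided by the reference length. This is the standard definition; the paper's exact variant may differ slightly.

```python
# Word error rate via the classic dynamic-programming edit distance
# over words: (substitutions + insertions + deletions) / reference length.
def word_error_rate(reference, prediction):
    r, p = reference.split(), prediction.split()
    d = [[0] * (len(p) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(p) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(p) + 1):
            cost = 0 if r[i - 1] == p[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(p)] / len(r)

print(word_error_rate("the cat sat", "the bat sat"))  # 1 substitution over 3 words
```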
IV-B Effectiveness of the adversary orientation methods (RQ1)
We run each orientation method, recording each coverage metric, on the tested models three times, and likewise in the baseline setting, so as to alleviate run-to-run uncertainty. Each coverage guidance and each combination of joint objectives are also run three times in the following assessments. In the results presented below, we denote each orientation method by its first word, each coverage guidance by the notation of its definition in § III-C, and their combination as the respective joint objective.
The average results of the orientation methods are summarized in Table I and Table II, with the achieved coverage and samples of adversarial inputs given in RQ2 for convenient comparison. In Table I, the set of adversarial inputs always reaches higher perplexity than the original test set, indicating that the adversarial inputs can lead the model to expose worse behaviours. Overall, the orientation methods obtain 1.7% higher perplexity than the baseline setting, but the cost and decision boundary orientations obtain lower perplexity on the whole test set than the baseline, perhaps due to the smaller distortion and gen_rate.
For the spell checker model, the adversarial inputs obtained by the orientation methods also reduce the model performance, with higher WER and lower BLEU scores. However, except for the state inconsistency orientation with 100% gen_rate, the cost orientation and decision boundary orientation achieve relatively low gen_rate, the former 74.29% and the latter 85.71%. For fairness, we boost the gen_rate of each method to 100% by attempting to modify each character of the targeted input until an adversarial input is obtained, otherwise randomly replacing one. The results are summarized in Table II, showing that the orientations achieve 40.97% and 3.77% higher WER and 4.59% and 0.34% lower BLEU scores than the original and baseline settings respectively, and also 7.14% higher suc_rate than the baseline.
The answer to RQ1: The state inconsistency orientation increases the perplexity of the PTB model by 1.7% more than the baseline. All the orientation methods achieve higher suc_rate on the spell checker model than the baseline, with the state inconsistency orientation always reaching 100% gen_rate.
IV-C Effectiveness of the coverage metrics (RQ2)
Effectiveness of coverage guidance for adversarial input generation. Table III provides the results of multiple coverage metrics as guidance for adversarial input generation on the spell checker model. On the PTB model, only one coverage guidance achieves 1.7% higher perplexity than the baseline, whereas the others fall below the baseline and are not listed here.
In Table III, all the coverage guidances are effective for adversarial input generation and better than the baseline, except two metrics not implemented due to the model structure. The gate coverage metrics achieved the highest suc_rate with the smallest perturbations. Moreover, one guidance reached the best WER and BLEU scores with 100% orient_rate, while the adapted DX coverage also gained good results.
Enhancement of adversarial input generation by coverage guidance. RNN-Test supports various joint objectives of orientations and coverage guidances to search for better means of adversarial testing. Table IV and Table V present, for each orientation, the two coverage guidances with the highest perplexity and suc_rate among all the combinations for the two models respectively.
On both models, the state inconsistency orientation together with coverage guidance achieved better results than the other objectives, albeit with enormous perturbations. As shown in Table I and Table III, each of them produced much larger perturbations, which become even more unusual when combined here. Upon this issue, we attempted to restrict the perturbations of all the methods by dividing by their respective norms, which makes all their norms less than 21. Nevertheless, after restriction, only the state inconsistency orientation combined with coverage guidance still gets 1.7% higher perplexity than the baseline, while all the others do not. This implies that restriction is not a good choice for the other methods on the PTB model. In contrast, the results for the spell checker model vary little after restriction.
Next, on average, the joint objectives acquire better results than the orientations and the coverage guidances alone on the PTB model, indicating that the coverage guidances can enhance adversarial input generation. Simultaneously, the coverage guidances obtained the highest suc_rate and WER on the spell checker model. Note that the coverage guidance methods always have the smallest perturbations. Overall, RNN-Test increases the test perplexity by 58.11% over the original setting for the PTB model, and acquires adversarial inputs with a success rate of 73.44%, both averaged over the results of all our methods.
Finally, samples of adversarial inputs on the tested models are listed in Table VI, with each method modifying the same word/character. For the PTB model, different methods tend to generate different words. But for the spell checker model, most methods incline to generate the same character, except the method state+, maybe because of the sparse embedding space.
Our customized metrics compared with the adapted DX coverage. In the evaluations, the boosting procedure is adopted for the redefined DX coverage, instead of the strategy of DeepXplore (randomly selecting one uncovered state). On the PTB model using DX as guidance, our boosting strategy achieves 35.71% higher perplexity than the random strategy of DeepXplore, even with more states selected. Meanwhile, the adapted DX coverage as guidance performs well for the spell checker model, and for both models when combined with the orientations. Finally, the weakness of DX coverage is still evident: on the PTB model, the coverage reaches 90% with at most four inputs even when taking the higher threshold 0.5 of DeepXplore, thus discriminating poorly over enough inputs.
Correlation of the coverage with adversarial inputs. In previous works, researchers believe that exercising more logics of DNNs could trigger more wrong behaviours. We analysed the correlations between the evaluation metrics and the values of the coverage metrics. Based on the acquired data, we could not draw the conclusion that achieving higher coverage definitely results in more incorrect outputs. Fig. 6 presents the results with the most evident correlations; most other results show no clear pattern in such figures.
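The correlation analysis above amounts to computing a correlation coefficient between two per-input series, e.g. the coverage value reached and the resulting perplexity increase. A minimal sketch using the Pearson coefficient (the paper does not name the exact coefficient used; Pearson is an assumption):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two metric series, e.g. per-input
    coverage values vs. per-input perplexity increases."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```

Values near +1 or -1 would indicate the evident correlations of Fig. 6; values near 0 correspond to the "messed up" cases.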
Boosting the coverage. In the coverage boosting module, RNN-Test tries both the boosting procedure and the random strategy to select states to cover. In our evaluations, we found no golden rule that increases all the coverage metrics: each strategy has advantages over the other for different tested models and coverage metrics, as shown in Fig. 7. Nevertheless, in most cases, the boosting procedure brings better testing effectiveness, leading the model to perform worse. Note that the coverage values strongly depend on the number of test inputs, so the same amount of inputs is supposed to yield similar coverage.
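The two selection strategies can be contrasted in a small sketch. The function below is illustrative only: the state representation, the "closest to the threshold first" heuristic for boosting, and all names are assumptions, not the paper's actual API; the random branch mirrors DeepXplore's strategy.

```python
import random

def select_states(values, covered, k, threshold=0.0, boost=True, seed=0):
    """Pick k uncovered hidden states to target next.

    boost=True: prefer uncovered states whose current output is closest to
    the coverage threshold (cheapest to push over it).
    boost=False: uniform random choice among uncovered states, as in
    DeepXplore. `values` maps state id -> current output value.
    """
    uncovered = [s for s in values if s not in covered]
    if boost:
        # closest-to-threshold first, so small perturbations suffice
        uncovered.sort(key=lambda s: abs(values[s] - threshold))
        return uncovered[:k]
    rng = random.Random(seed)
    return rng.sample(uncovered, min(k, len(uncovered)))
```

Either selection then feeds the chosen states into the joint objective whose gradient produces the perturbation.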
Perturbation similarity between orientation maximizing and coverage boosting. To examine the claim that the perturbations generated by coverage guidance are similar to those of the orientation search and thus add little, we record the perturbation vectors obtained over several identical inputs in the experiments. To visualize them, we leverage the state-of-the-art dimensionality reduction technique t-SNE to transform the multi-dimensional perturbation vectors into two-dimensional space, where the orientation methods contribute more data points. As Fig. 8 shows, there is no evident similarity among the perturbation vectors of the orientations, the coverage guidances and the joint objectives. Together with the correlation observations above, we conjecture that coverage guidance can serve as an independent means of adversarial input generation, not merely a goal to improve.
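The visualization step can be reproduced with scikit-learn's t-SNE (assuming `scikit-learn` is available; the random data below merely stands in for the recorded perturbation vectors):

```python
import numpy as np
from sklearn.manifold import TSNE

# One row per recorded perturbation vector; random data stands in for the
# vectors collected during testing.
rng = np.random.default_rng(0)
perturbations = rng.normal(size=(30, 64))   # 30 vectors, 64 dimensions

# Project to 2-D for plotting; perplexity must be smaller than the sample count.
embedding = TSNE(n_components=2, perplexity=5.0,
                 init="random", random_state=0).fit_transform(perturbations)
print(embedding.shape)  # (30, 2)
```

Plotting `embedding` with one color per objective would reproduce a figure in the style of Fig. 8; overlapping clusters would indicate similar perturbations, separated clusters the opposite.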
The answer to RQ2: The coverage metrics are also effective as guidance for adversarial input generation, enhancing the orientations on the PTB model and achieving the best performance on the spell checker model.
IV-D Improving the RNN models with retraining (RQ3)
For CNN testing, retraining the tested models with the training set augmented by adversarial inputs could improve their accuracy[38, 16]. Inspired by these impressive effects, we tried the same on the PTB model, incorporating adversarial inputs (82.5 KB) into the open-source training set (5.1 MB), where the adversarial inputs were obtained in the setting of the decision boundary orientation. The adversarial inputs obtained by the state inconsistency orientation achieved similar results, not listed here.
Table VII presents the perplexity of the PTB model before and after retraining, where the train perplexity indicates the performance on the training set and the valid perplexity that on the validation set. The data are averaged over 5 runs of the same retraining process with 12 epochs, to mitigate the effects of the intrinsic indeterminism of neural networks. From columns 4 and 7, the results show that after retraining the train perplexity increases by 1.082% whereas the valid perplexity decreases by 1.159%. Moreover, the test perplexity declines by 12.582%, from 117.53 originally to 102.75 after retraining. Notice that even when incorporating fewer adversarial inputs (1.6 KB), the valid perplexity still declines by 0.058%.
| original | w. adv. | increment | original | w. adv. | decrement |
Therefore, the adversarial inputs generated for RNN models are shown to have practical use for improving the models. They alleviate the over-fitting of the training process, sacrificing a little train performance while improving the valid and test performance, and thus the robustness of the RNN model.
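For reference, the perplexity reported throughout is the exponential of the average per-token negative log-likelihood. A minimal sketch of that definition (assuming losses are given in nats, as is usual for TensorFlow-style cross-entropy):

```python
import math

def perplexity(neg_log_probs):
    """Perplexity = exp(mean per-token negative log-likelihood).

    `neg_log_probs` is a sequence of per-token cross-entropy losses in nats.
    Lower perplexity means the language model predicts the text better.
    """
    return math.exp(sum(neg_log_probs) / len(neg_log_probs))
```

So a perplexity drop from 117.53 to 102.75 after retraining corresponds directly to a lower average cross-entropy on the test set.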
The answer to RQ3: The adversarial inputs could also be used to improve the performance of RNN models. By augmenting the training set of 5.1 MB with 82.5 KB of adversarial inputs, the valid and test perplexity of the PTB language model decline by 1.159% and 12.582% respectively.
V Threats to validity
Though RNN-Test exhibits appreciable effectiveness with the default setting in the evaluations, its performance is inevitably influenced by the parameters, including the scaling degree of the perturbations, the number of states selected to boost, the weights applied to the joint objectives, and especially the section-splitting schemes of the coverage metrics. Given their important roles, they are worth thorough exploration in future work. Furthermore, the presented results still vary across runs, owing to different search directions over the stochastically targeted word/character, which could be diminished by fixing the target.
In addition, RNN-Test is intended to be general and scalable over the variants of RNNs, but we could not exhaustively apply the framework to all the variants and their targeted applications. In this paper, the structures of the tested models are fairly general, but training the spell checker model still cost hard work, due to the poor reproducibility of its published training results. Moreover, the RNN wrapper is designed to avoid interfering with the computation logics of the model, but adaptation efforts may be necessary for some variants with complex structures.
VI Related Work
Adversarial deep learning. The concept of adversarial attacks was first introduced in , which discovered that DNNs would misclassify input images under imperceptible perturbations; such mutated inputs are called adversarial examples/inputs. FGSM and the following works[24, 33, 5, 9] generate adversarial examples by maximizing the prediction error in a gradient-based manner. Multiple trends have since developed, including targeted attacks[5, 35] and non-targeted attacks[33, 45], whitebox attacks[13, 5] and blackbox attacks, defense techniques[11, 15, 20, 37, 41], and methodologies like the C&W attacks constructed particularly against the defense methods, etc.
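The gradient-based manner of FGSM can be sketched in a few lines: step the input in the sign direction of the loss gradient. The toy logistic model below is illustrative only; in practice the gradient comes from backpropagation through the attacked network.

```python
import numpy as np

def fgsm(x, grad_loss_x, eps=0.1):
    """Fast gradient sign method: x_adv = x + eps * sign(dL/dx),
    moving the input in the direction that increases the loss."""
    return x + eps * np.sign(grad_loss_x)

# Toy example: logistic model p = sigmoid(w.x) with true label y = 1,
# loss = -log p, hence dL/dx = (p - y) * w (derived by the chain rule).
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - 1.0) * w
x_adv = fgsm(x, grad, eps=0.1)   # the loss at x_adv exceeds the loss at x
```

The same one-step recipe underlies many of the cited follow-up attacks, which mainly refine the step rule or the constraint set.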
As mentioned before, these works are mostly limited to image classification tasks. Besides, without concern for covering the computation logics of the models, they are shown to reach low test coverage.
Coverage guided testing of DNN systems. DeepXplore first introduces neuron coverage into deep learning testing, defined over the neurons of DNNs with a pre-defined threshold, which requires redefinition for RNNs as discussed earlier. Due to the coarse granularity of neuron coverage, DeepGauge defines coverage metrics of finer granularity. Its key idea is to record the value range of the outputs on the training data as the major function region and split that region into k (e.g., 1000) sections, which is also unsuitable for the narrow value range of RNN states. DeepCT is even finer-grained, measuring over combinations of neuron outputs. It is noteworthy that  argues that these works[38, 46, 30] fail to find more adversarial inputs than adversary-oriented search and do not efficiently measure the robustness of models as they reported.
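DeepXplore-style neuron coverage can be sketched as follows. The min-max scaling shown here is an assumption about the normalization step (DeepXplore scales activations before thresholding); the function is a sketch of the definition, not DeepXplore's code.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons whose scaled output exceeds `threshold` on at
    least one input. `activations` is an (inputs x neurons) array.

    Each neuron's outputs are min-max scaled over the batch before
    thresholding (a modeling assumption, mirroring DeepXplore's scaling).
    """
    a = np.asarray(activations, float)
    lo, hi = a.min(axis=0), a.max(axis=0)
    scaled = (a - lo) / np.where(hi > lo, hi - lo, 1.0)
    covered = (scaled > threshold).any(axis=0)   # neuron fired on any input
    return covered.mean()
```

Because tanh-bounded RNN states occupy a narrow range, thresholds like 0.5 saturate quickly, which is the discrimination weakness noted above.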
The main difference between these works and ours lies in that we primarily use coverage directly as guidance to obtain adversarial inputs, whereas in other works the coverage metrics mainly serve as indicators of the effectiveness of their approaches.
Adversarial attacks for recurrent neural networks. Due to the effectiveness of RNNs on tasks like speech recognition and natural language processing, adversarial attacks have also been applied to RNNs to evaluate their robustness. Besides the works[23, 40, 28, 34] adopting strategies to add, delete or substitute a word/character to construct adversarial inputs, some methods replace the targeted word with its synonym. Other approaches restrict the directions of perturbations toward existing words in the input embedding space. In summary, these works are effective but limited to classification scenarios.
Few works evaluate tasks producing sequential outputs. The work  first gives the definition of adversarial inputs for RNNs with both categorical and sequential outputs, but presents only rough qualitative descriptions that adversarial inputs could change the outputs in its evaluation on sequential outputs. Another work, TensorFuzz, produced adversarial inputs leading the language model to sample words from a blacklist, which is not even specified in the paper. For state-of-the-art adversarial attacks on speech recognition, the perturbations obtained can be applied to the audio waves in a similar way as to images, but several issues remain unsettled. Testing works for sequential outputs, especially texts, are inadequate, leaving threats for the majority of application scenarios with sequential outputs.
VII Conclusion
We design and implement RNN-Test, a generic adversarial testing framework for recurrent neural networks, integrating diverse adversary orientations and coverage metrics customized for RNNs, with support for their free combination. RNN-Test targets the main sequential contexts rather than only classification tasks, and is the first to leverage coverage guidance to directly obtain adversarial inputs. In the evaluation, RNN-Test effectively generated adversarial inputs that increase the test perplexity of the PTB language model by 58.11% on average, and caused the spell checker model to fail to correct mistakes with a success rate of 73.44% on average. Finally, the adversarial inputs can be employed to retrain the PTB model, decreasing its valid and test perplexity by 1.159% and 12.582% respectively.
References
- (2017-June) (Website).
- (2016) Deep speech 2: end-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 48, pp. 173–182.
- (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
- (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
- (2018) Audio adversarial examples: targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944.
- (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167.
- (2018) Boosting adversarial attacks with momentum. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2018) DeepCruiser: automated guided testing for stateful deep learning systems. arXiv preprint arXiv:1812.05339.
- (2016) A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853.
- (2016-Apr) (Website).
- (2015) Explaining and harnessing adversarial examples. Computer Science.
- (2013) Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649.
- (2017) Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117.
- (2018) DLFuzz: differential fuzzing testing of deep learning systems. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 739–743.
- (1994) Neural networks. Vol. 2, Prentice Hall, New York.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29, pp. 82–97.
- (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
- (2012) ImageNet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems.
- (2018) Adversarial examples for natural language classification problems.
- (2017) Adversarial examples in the physical world. In Proceedings of the 2nd International Conference on Learning Representations.
- (2015) Deep learning. Nature 521 (7553), pp. 436.
- (1995) Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361 (10).
- (2019) Structural coverage criteria for neural networks could be misleading.
- (2017) Deep text classification can be fooled. arXiv preprint arXiv:1704.08006.
- (2018) DeepGauge: multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 120–131.
- (2018) Combinatorial testing for deep learning systems. arXiv preprint arXiv:1806.07723.
- (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9, pp. 2579–2605.
- (2011) Extensions of recurrent neural network language model. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528–5531.
- (2016) DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2018) TensorFuzz: debugging neural networks with coverage-guided fuzzing. arXiv preprint arXiv:1807.10875.
- (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387.
- (2016) Crafting adversarial input sequences for recurrent neural networks. In Military Communications Conference (MILCOM 2016), pp. 49–54.
- (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597.
- (2017) DeepXplore: automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles, pp. 1–18.
- (1999) A recurrent neural network that learns to count. Connection Science 11 (1), pp. 5–40.
- (2017) Towards crafting text adversarial samples. arXiv preprint arXiv:1707.02812.
- (2018) Regularizing deep networks using efficient layerwise adversarial training. In Thirty-Second AAAI Conference on Artificial Intelligence.
- (2018) Interpretable adversarial perturbation in input embedding space for text. arXiv preprint arXiv:1805.02917.
- (2017) Deep learning in medical image analysis. Annual Review of Biomedical Engineering 19, pp. 221–248.
- (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- (2017) One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864.
- (2018) Testing deep neural networks. arXiv preprint arXiv:1803.04792.
- (2016) Rethinking the inception architecture for computer vision. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
- (2006) (Website).
- (2016-Sep) (Website).
- (2014) Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.