Say What I Want: Towards the Dark Side of Neural Dialogue Models

Haochen Liu
Michigan State University
Tyler Derr
Michigan State University
Zitao Liu
Jiliang Tang
Michigan State University

Neural dialogue models have been widely adopted in various chatbot applications because of their good performance in simulating and generalizing human conversations. However, these models have a dark side: due to the vulnerability of neural networks, a neural dialogue model can be manipulated by users to say what they want, which raises concerns about the security of practical chatbot services. In this work, we investigate whether we can craft inputs that lead a well-trained black-box neural dialogue model to generate targeted outputs. We formulate this as a reinforcement learning (RL) problem and train a Reverse Dialogue Generator which efficiently finds such inputs for targeted outputs. Experiments conducted on a representative neural dialogue model show that our proposed model is able to discover such desired inputs in a considerable portion of cases. Overall, our work reveals this weakness of neural dialogue models and may prompt further research on developing corresponding solutions to avoid it.

1 Introduction

Dialogue systems, also known as conversational AI, aim to conduct human-like conversations with users and are receiving increasing attention from both industry and the academic research community. In the past, such systems either relied on intricate hand-crafted rules Weizenbaum (1966); Goddeau et al. (1996), or depended on a complicated processing pipeline including a series of functional modules Mott et al. (2004). Meanwhile, retrieval-based methods Banchs (2012); Ameixa et al. (2014); Lu and Li (2013); Hu et al. (2015), which search a repository for a suitable response given the query, are also adopted in many application scenarios. These methods are able to provide natural, human-like responses, but fail to generate novel responses beyond the range of the repository Song et al. (2016). Recently, researchers have begun to apply deep learning techniques to build fully data-driven, end-to-end dialogue systems Gao et al. (2019), which are referred to as neural dialogue models.

Based on the Seq2Seq framework Sutskever et al. (2014), these neural models Sordoni et al. (2015); Vinyals and Le (2015); Shang et al. (2015); Serban et al. (2016, 2017) have achieved surprisingly good performance and have gradually come to dominate the field of dialogue generation. First, these models are easy to train. Instead of designing complicated rules or modularized pipelines, the models can learn the mapping between queries and responses automatically from massive existing dialogue pairs Vinyals and Le (2015). Second, given that the neural models are trained on large-scale human conversation data, they show a strong generalization ability: they can handle open-domain conversations rather than being restricted to topics in a narrow domain Vinyals and Le (2015); Zhou et al. (2018). In addition, neural dialogue models can provide fluent and smooth responses, and are capable of simple common sense reasoning Vinyals and Le (2015). Since neural dialogue models achieve a breakthrough in conducting reasonable and engaging human-like conversations, they are widely adopted by industry as a core component of practical chatbot applications, such as Microsoft XiaoIce Zhou et al. (2018).

While the research community is delighted with the success of neural dialogue models, there is a dark side to these models. Given that the internal mechanisms of neural networks are not explicitly interpretable, neural dialogue models are vulnerable. For example, they may behave unpredictably on some well-crafted inputs Szegedy et al. (2014); Yuan et al. (2017). This vulnerability can cause a series of problems, one of which is whether we can manipulate a dialogue agent to say what we want. In other words, can we find a well-designed input that induces the agent to provide a desired output? If this is possible, people with ulterior motives may take advantage of this weakness of chatbots to guide them to say something malicious or sensitive, causing adverse social impacts Wolf et al. (2017); Price (2016); He and Glass (2018).

In this paper, we study this dark side by seeking an answer to the following question: can we design an algorithm that automatically generates inputs that lead a state-of-the-art black-box neural dialogue model to “say what I want”? This presents tremendous challenges. First, unlike similar works such as Chen et al. (2018), where the authors try to craft inputs for a neural image captioning model to output targeted sentences, our problem involves discrete inputs (i.e., texts rather than images) and treats the model as a black box (since this setting is more realistic). Thus, traditional optimization methods that find the inputs under the guidance of gradient information are not applicable. Second, when trying to manipulate a dialogue system released by others, it is impractical to interact with it an unlimited number of times. Consequently, brute-force search methods cannot be adopted, and the number of interactions with the black-box model needed to find an input for a targeted output should be restricted to a reasonable level.

To address the above challenges, for a given black-box neural dialogue model, we propose to train a corresponding Reverse Dialogue Generator, which takes a targeted response as input and automatically outputs a query that leads the dialogue model to return that response. The proposed Reverse Dialogue Generator is based on a Seq2Seq model and acts as a reinforcement learning (RL) agent. The black-box dialogue model is regarded as the environment the agent interacts with. The agent is optimized through policy gradients, with the similarity between the targeted output and the dialogue model's actual output for a crafted input as the reward signal. Extensive experiments conducted on a public well-trained neural dialogue model demonstrate the capacity of our model.

2 Related Works

Our work is related to the problem of model attacks. Deep learning models have proven useful for many tasks across a plethora of domains. More recently, however, researchers have become aware that although these systems perform extremely well in a perfect and stable environment (the type they were designed in), they are quite susceptible to attacks when placed in the real world. Szegedy et al. (2014) first investigate the vulnerability of DNN-based image classifiers by crafting adversarial examples with imperceptible perturbations that lead the classifier to make mistakes. Sharif et al. (2016) focus on attacking face recognition models; Xie et al. (2017) try to find the weaknesses of an object detection system; Kos et al. (2018) and Tabacof et al. (2016) study the robustness of generative models; and Huang et al. (2017) and Kos and Song (2017) introduce novel methods for making adversarial examples for deep reinforcement learning based models.

Adversarial attacks on deep learning models for NLP tasks also attract a lot of interest. Attacking NLP models is more challenging since the inputs are discrete texts instead of continuous values such as image inputs Zhang et al. (2019). Text classification problems are studied in Lei et al. (2018); Liang et al. (2018); sentiment analysis is considered in Iyyer et al. (2018); and grammar error detection is investigated in Sato et al. (2018). Belinkov and Bisk (2018) try to fool a machine translation system, while Chan et al. (2018) study how to attack a neural reading comprehension model. More works can be found in the survey by Zhang et al. (2019).

For the dialogue generation task, Wieting et al. (2016) explore the over-sensitivity and over-stability of neural dialogue models by using heuristic techniques to modify original inputs and observing the corresponding outputs. They evaluate the robustness of dialogue models by checking whether the outputs change significantly after the inputs are modified, but do not consider targeted outputs. The work most related to ours is He and Glass (2018), where the authors try to find trigger inputs which can lead a neural dialogue model to output a list of targeted egregious responses. Unlike our work, they treat the dialogue model as a white box and take advantage of the model structure and parameters. By comparison, our black-box setting is more challenging. Furthermore, their algorithm in fact fails to lead the dialogue model to output the exact targeted responses. It should be pointed out that, similar to He and Glass (2018), our work is not a model attack task in the true sense, since we only place requirements on the outputs and do not force the inputs to be close to any original inputs. However, our problem can be approached with the same ideas as adversarial attack problems.

Although Cheng et al. (2018) do not focus on dialogue generation, they also try to manipulate a Seq2Seq model to generate texts with certain restrictions. However, their non-overlapping and targeted-keyword settings are looser than ours, where a whole targeted response is required to be output, and their work is under the white-box assumption.

3 Reverse Dialogue Generator

In this work, we treat the neural dialogue model of interest as a black-box environment $D$. The dialogue model takes an input sentence $I$ and outputs a corresponding response sentence $O = D(I)$. Given a targeted output $O^*$, our goal is to find a well-designed input $I$ that leads the dialogue model to output a response $O$ that is exactly the same as $O^*$ ($O = O^*$), or at least semantically similar to $O^*$ ($O \approx O^*$). To achieve this, we build a Reverse Dialogue Generator agent $G_\theta$, which takes the targeted output $O^*$ as input and predicts its corresponding input $I$. A sketch of this agent-environment setup is shown in Figure 1.

Figure 1: The agent-environment setup of the proposed framework.

3.1 The Dialogue Model Environment

In this work, we adopt the classical Seq2Seq neural dialogue model Vinyals and Le (2015) as the dialogue model environment. In the model, the encoder and the decoder are implemented by two independent RNNs or their variants, such as LSTM Hochreiter and Schmidhuber (1997) and GRU Cho et al. (2014). The encoder reads the input sentence word by word and encodes it as a low-dimensional latent representation, which is then fed into the decoder to predict the corresponding response sentence word by word. We assume the environment is a black box; once the model is well-trained, we keep it sealed and the agent has no access to its structure, parameters or gradients.

Thanks to its strength in simulating and generalizing human conversations, the Seq2Seq-based neural dialogue generator has been applied as an important component of practical chatbot services Zhou et al. (2018). It is therefore a representative object of study. Moreover, because our method treats the dialogue model environment as a black box, it can in principle be applied without modification to other neural dialogue models such as HRED Serban et al. (2016) and VHRED Serban et al. (2017). We leave the investigation of other choices as future work.

3.2 The Reverse Dialogue Generator Model

In order to find a corresponding input for a given targeted output, it is intuitive to formulate this problem as an optimization problem where we need to find an optimal input so that the similarity between the output and the targeted output is maximized. However, the standard gradient-based back-propagation approach is not applicable to this scenario. Because the input consists of discrete tokens instead of continuous values, we cannot directly use gradient information to adjust it. If we make adjustments with regard to the word embeddings, this method will generate results that cannot be matched with any valid words in the word embedding space Zhang et al. (2019). Moreover, in our black-box setting, we are not allowed to obtain gradient information from the dialogue model $D$.

Therefore, in our work, we formulate this problem as a reinforcement learning problem. A generative agent $G_\theta$, which is called the Reverse Dialogue Generator, is trained to craft an input $I$ for a given targeted output $O^*$, with the aim of maximizing the reward, that is, the similarity between the current output $O$ and the targeted output $O^*$. The agent treats the input generation process as a decision-making process, and it is trained through policy gradient under the guidance of the reward signals.

An RNN language model could be adopted as the agent $G_\theta$. However, it would have to be retrained for each targeted output to obtain the desired input, which requires a great number of interactions with the dialogue model $D$ for every single targeted output. This is infeasible in practice when an attacker tries to manipulate a chatbot service. Instead, we adopt a classical Seq2Seq structure as the agent $G_\theta$, which takes a targeted output as input and crafts the desired input. It is trained offline, and when deployed in practical applications, it is able to generate the corresponding inputs for many targeted outputs automatically. The details of its training process are described in Section 4.2.

4 Training

In this section, we detail the training process of the dialogue model environment $D$ and the Reverse Dialogue Generator model $G_\theta$.

4.1 The Dialogue Model Environment

The Seq2Seq dialogue model is trained in a supervised manner on large-scale human conversation data collected from Twitter, with the aim of minimizing the negative log-likelihood of the ground-truth sentence given the input. Afterward, it is treated as the black-box environment and its parameters are not further updated. The details of the implementation can be found in Section 5.1.1.

4.2 RL Training of Reverse Dialogue Generator Model

In this subsection, we detail the RL training process of the proposed Reverse Dialogue Generator agent $G_\theta$. In summary, the agent (the Reverse Dialogue Generator $G_\theta$) interacts with the environment (the dialogue model $D$). Given a state (the targeted output $O^*$), the agent takes an action (generating an input $I$) following a policy $\pi_\theta$ (defined by the Seq2Seq model in the Reverse Dialogue Generator), and then receives a reward $R$ (the similarity between the targeted output $O^*$ and the current output $O$) from the environment. Afterward, the policy is updated to maximize the reward.

Next, we will introduce the environment, state, policy, action, and reward in detail.

4.2.1 Environment

The environment is the black-box dialogue model $D$. When fed an input $I$, it returns an output $O = D(I)$. The input and the output are two dialogue utterances, each consisting of a variable-length sequence of words.

4.2.2 State

The state is the targeted output $O^*$, that is, the input of the Reverse Dialogue Generator.

4.2.3 Policy and Action

The policy is defined by the Seq2Seq model in the agent $G_\theta$ and its parameters $\theta$. The Seq2Seq model forms a stochastic policy $\pi_\theta$ which assigns a probability to any possible input $I$:

$$\pi_\theta(I \mid O^*) = \prod_{t=1}^{T} \pi_\theta(w_t \mid w_1, \dots, w_{t-1}, O^*) \qquad (1)$$

where $T$ is the length of the input $I$ and $w_t$ is its $t$-th word.

The action is defined as the input $I$ to generate. When observing the state $O^*$, the agent generates an input $I$ based on the distribution predicted by $G_\theta$ in Equation (1). In the training phase, the input is chosen by stochastic sampling. In the test phase, the input can be chosen greedily or through beam search.
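The factorization in Equation (1) and the difference between sampling and greedy decoding can be illustrated with a minimal sketch. The vocabulary and the `next_token_probs` function below are hypothetical stand-ins for the Seq2Seq decoder, not the model used in this paper; only the decoding logic is the point.

```python
import math
import random

VOCAB = ["hello", "there", "<eos>"]  # hypothetical toy vocabulary

def next_token_probs(prefix, state):
    """Hypothetical stand-in for the decoder: a distribution over VOCAB.
    A real agent would condition on the encoded targeted output `state`."""
    # Make ending the sequence more likely as the prefix grows.
    end = min(0.2 + 0.3 * len(prefix), 0.9)
    rest = (1.0 - end) / 2
    return [rest, rest, end]

def generate(state, greedy, rng, max_len=10):
    """Return (tokens, log_prob); log_prob sums the per-token log-probabilities,
    i.e. the logarithm of the product over decoding steps."""
    tokens, log_prob = [], 0.0
    for _ in range(max_len):
        probs = next_token_probs(tokens, state)
        if greedy:
            # Test phase: pick the most probable token.
            idx = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Training phase: sample a token from the policy distribution.
            idx = rng.choices(range(len(probs)), weights=probs)[0]
        log_prob += math.log(probs[idx])
        if VOCAB[idx] == "<eos>":
            break
        tokens.append(VOCAB[idx])
    return tokens, log_prob

greedy_tokens, greedy_log_prob = generate("target", True, random.Random(0))
sampled_tokens, sampled_log_prob = generate("target", False, random.Random(0))
```

Beam search would replace the single greedy choice with a set of partial sequences ranked by accumulated log-probability; it is omitted here for brevity.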

4.2.4 Reward

Recall that our goal is to train the agent to craft an input for the dialogue model to return an output that is as similar to the targeted output as possible. Let $O = D(I)$ be the actual output returned by the dialogue model given a crafted input $I$. We directly use the similarity between the targeted output $O^*$ and the current output $O$ as the reward for the input $I$ selected by $G_\theta$.

We adopt the embedding average metric to measure the similarity. This metric has been frequently used in many NLP domains, such as textual similarity tasks Wieting et al. (2016), since it is able to measure the similarity of two sentences at the semantic level rather than simply considering the amount of word overlap. The embedding average approach first computes the sentence-level embedding of a sentence $s$ by taking the average of the embeddings of all its constituent words, $\bar{e}(s) = \frac{1}{|s|} \sum_{w \in s} e_w$, where $e_w$ is the embedding of word $w$; the similarity between two sentences is then defined as the cosine similarity of their sentence-level embeddings. Given a crafted input $I$ and the targeted output $O^*$, we formally define the reward as the similarity between the current output $D(I)$ and the targeted output $O^*$:

$$R(I, O^*) = \mathrm{Sim}(D(I), O^*) = \cos\big(\bar{e}(D(I)), \bar{e}(O^*)\big) \qquad (2)$$
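As an illustration of the embedding average metric, the following minimal sketch averages word vectors and takes the cosine of the two sentence vectors. The two-dimensional `TOY_VECTORS` are made up for the example, and returning 0.0 when a sentence has no known words is an assumption of this sketch rather than something the paper specifies.

```python
import math

# Made-up 2-dimensional word vectors, for illustration only.
TOY_VECTORS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "person": [0.0, 1.0],
}

def sentence_embedding(tokens, word_vectors):
    """Average the vectors of all tokens that have an embedding."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def embedding_average_similarity(output, target, word_vectors):
    """Cosine similarity between the averaged embeddings of two token lists."""
    a = sentence_embedding(output, word_vectors)
    b = sentence_embedding(target, word_vectors)
    if a is None or b is None:
        return 0.0  # assumption: no known words means no similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

same = embedding_average_similarity(["good", "person"], ["good", "person"], TOY_VECTORS)
opposite = embedding_average_similarity(["good"], ["bad"], TOY_VECTORS)
```

Because the metric averages over words, it ignores word order: two sentences containing the same words in a different order receive a similarity of 1.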


4.2.5 Optimization

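With the reward function defined above, the objective function that the agent aims to maximize can be formulated as follows:

$$J(\theta) = \mathbb{E}_{I \sim \pi_\theta(\cdot \mid O^*)}\big[R(I, O^*)\big] \qquad (3)$$

The exact value of $J(\theta)$ in Equation (3) is very difficult to obtain in practice. Therefore, previous works have proposed many methods to estimate it and its gradient, which is then used to update the parameters of the policy (i.e., $\theta$).

To optimize the objective in Equation (3), we apply the widely used REINFORCE algorithm Williams (1992), where Monte-Carlo sampling is applied to estimate $\nabla_\theta J(\theta)$. Specifically,

$$\nabla_\theta J(\theta) = \mathbb{E}_{I \sim \pi_\theta(\cdot \mid O^*)}\big[R(I, O^*)\, \nabla_\theta \log \pi_\theta(I \mid O^*)\big] \approx \frac{1}{N} \sum_{i=1}^{N} R(I_i, O^*)\, \nabla_\theta \log \pi_\theta(I_i \mid O^*) \qquad (4)$$

With the obtained gradient $\nabla_\theta J(\theta)$, the parameters of the policy network can be updated as follows:

$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta) \qquad (5)$$

where $\alpha$ is the learning rate. Thus, the REINFORCE algorithm for updating the policy can be summarized as follows: for each targeted output $O^*$, we first sample $N$ inputs $I_1, \dots, I_N$ according to the distribution $\pi_\theta(\cdot \mid O^*)$. Then we estimate the rewards of the sampled inputs and calculate the gradient. Finally, we update the parameters of the policy network.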
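The sample, score and update loop of REINFORCE can be sketched on a toy problem. In the sketch below, the Seq2Seq policy is replaced by a softmax over three candidate inputs, and the made-up `REWARDS` list stands in for the similarity returned after querying the black-box dialogue model; both are simplifying assumptions for illustration, not the setup of the paper.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(theta, reward_fn, n_samples, lr, rng):
    """One REINFORCE update: sample actions from the policy, weight the
    gradient of each log-probability by its reward, then ascend."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = reward_fn(a)
        for j in range(len(theta)):
            # d/d theta_j of log softmax(theta)[a] is 1[j == a] - probs[j]
            indicator = 1.0 if j == a else 0.0
            grad[j] += r * (indicator - probs[j]) / n_samples
    # Gradient ascent on the expected reward.
    return [t + lr * g for t, g in zip(theta, grad)]

# Made-up rewards for three candidate inputs (stand-in for the similarity).
REWARDS = [0.1, 0.9, 0.3]

rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
for _ in range(500):
    theta = reinforce_step(theta, lambda a: REWARDS[a], n_samples=8, lr=0.5, rng=rng)
final_probs = softmax(theta)
```

After training, the policy concentrates most of its probability mass on the candidate with the highest reward. In the sequence setting, the per-action log-probability gradient is replaced by the gradient of the summed token log-probabilities of each sampled input.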

5 Experiments

We conduct extensive experiments to evaluate the effectiveness of the proposed Reverse Dialogue Generator. We measure the success rates of the proposed model in seeking out the desired inputs towards various targeted outputs and explore the performance of the model under various settings. In this section, we will go into details about the experimental settings and results.

5.1 Experimental Settings

5.1.1 The Dialogue Model

To ensure the reproducibility of the experiments, we directly adopt the well-trained Seq2Seq dialogue model released on the dialogue system research software platform ParlAI Miller et al. (2017) as the dialogue model environment. The implementation of the dialogue model is as follows. In the Seq2Seq model, both the encoder and the decoder are implemented by 3-layer LSTM networks with hidden states of size 1024. As is standard practice, the initial hidden state of the decoder is set to the last hidden state of the encoder. The vocabulary size is 30,000. Pre-trained GloVe word vectors Pennington et al. (2014) are used to initialize the word embeddings, whose dimension is set to 300. The model was trained through stochastic gradient descent (SGD) with a learning rate of 1.0 on 2.5 million Twitter single-turn dialogues. In the training process, the dropout rate and gradient clipping value are both set to 0.1. It should be pointed out again that in the following experiments, this dialogue model is treated as a black box which takes an input sentence and outputs a response sentence.

5.1.2 The Reverse Dialogue Generator

As described above, we adopt a Seq2Seq structure as the Reverse Dialogue Generator. Two 2-layer LSTM networks with a hidden size of 1,024 are applied as the encoder and the decoder respectively. The vocabulary size is set to 60,000, and the word embedding size is 300. The word embeddings are randomly initialized and fine-tuned during the training process. The last hidden state of the encoder is treated as the context vector which is used to initialize the hidden state of the decoder.

In order to train the Reverse Dialogue Generator and build the target output lists (see Section 5.2), we use real human dialogue data from a public Twitter corpus ("twitter_en big" in the repository), which contains around 5M tweets scraped from Twitter. We note that this corpus is different from the dialogue dataset that is used to train the dialogue model.

Pre-training. Before RL training, we first initialize the agent by pre-training it in a supervised manner on output-input pairs from a reference corpus. We build the reference corpus by randomly sampling 160K sentences from the Twitter corpus and feeding them into the dialogue model, resulting in 160K output-input pairs. In the pre-training process, the model is optimized by the standard SGD algorithm with an initial learning rate of 20. At the end of each epoch, if the loss on the validation set does not decrease, the learning rate is reduced with a decay rate of 0.25. The batch size is 16. In addition, to prevent overfitting, we apply dropout with a rate of 0.1 and gradient clipping with a clip value of 0.25.
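The learning-rate schedule described above can be sketched as follows. Two points are assumptions of this sketch, since the text does not pin them down: a "decay rate of 0.25" is read as multiplying the learning rate by 0.25, and "does not decrease" is read as not improving on the best validation loss seen so far. The validation losses are made up.

```python
def update_learning_rate(lr, val_loss, best_val_loss, decay=0.25):
    """Return (new_lr, new_best_val_loss) after one epoch.
    Assumption: the decay multiplies the learning rate by `decay`."""
    if val_loss < best_val_loss:
        return lr, val_loss
    return lr * decay, best_val_loss

lr, best = 20.0, float("inf")
lr_history = []
for val_loss in [5.0, 4.2, 4.3, 4.0, 4.0]:  # made-up validation losses
    lr, best = update_learning_rate(lr, val_loss, best)
    lr_history.append(lr)
```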

RL training. In the RL training process, all the parameters in the model are optimized as in Equation (5) using the Adam optimizer with an initial learning rate of 0.001. The 160K outputs in the reference corpus are used as the targeted outputs for RL training. The batch size, dropout rate, and gradient clipping value are set the same as in the pre-training process. When calculating the rewards (i.e., the embedding similarities), we only consider tokens that are not in a fixed stopword list, which consists of common punctuation marks and special symbols. Pre-trained Google News word embeddings are adopted to compute the similarities. In order to accelerate the convergence of the model, we set all rewards less than 0.5 to 0, while the others keep their original values.
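The two reward post-processing steps described above, stopword filtering and thresholding, can be sketched as below. The contents of `STOPWORDS` are hypothetical, since the actual list is not given in the text.

```python
# Hypothetical stopword list of punctuation marks and special symbols.
STOPWORDS = {".", ",", "!", "?", "<unk>"}

def content_tokens(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword list."""
    return [t for t in tokens if t not in stopwords]

def shape_reward(similarity, threshold=0.5):
    """Set rewards below the threshold to 0; keep the others unchanged."""
    return similarity if similarity >= threshold else 0.0

filtered = content_tokens(["i", "was", "there", "."])
shaped = [shape_reward(s) for s in (0.49, 0.5, 0.8)]
```

Zeroing out small rewards means that sampled inputs whose outputs are far from the target contribute nothing to the gradient estimate in Equation (4).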

Decoding. In the test phase, greedy decoding and beam search can be used. We empirically find that performance improves significantly if we first obtain the $K$ candidates with the top-$K$ scores in beam search, feed them into the dialogue model, and treat the one with the highest reward as the crafted input. In the experiments, we report the results of greedy decoding of the pre-trained model (Pre-trained Greedy), greedy decoding of the RL-trained model (RL Greedy), and beam search of the RL-trained model with $K$ candidates (RL BeamSearch($K$)).
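The candidate reranking step can be sketched as follows: each beam-search candidate costs one query to the black-box model, and the candidate whose output is closest to the target is kept. The lookup-table `toy_dialogue_model` and the word-overlap `jaccard` function are hypothetical stand-ins for the dialogue model and the embedding average similarity.

```python
def rerank_candidates(candidates, target, dialogue_model, similarity):
    """Query the black-box model once per candidate and return the
    (input, reward) pair with the highest reward."""
    best_input, best_reward = None, float("-inf")
    for cand in candidates:
        output = dialogue_model(cand)  # one black-box interaction
        reward = similarity(output, target)
        if reward > best_reward:
            best_input, best_reward = cand, reward
    return best_input, best_reward

# Hypothetical stand-in for the dialogue model: a fixed lookup table.
_RESPONSES = {
    "how are you": "i am fine",
    "who are you": "i am a bot",
    "bye": "see you",
}

def toy_dialogue_model(query):
    return _RESPONSES.get(query, "")

def jaccard(a, b):
    """Word-overlap similarity, used here only as a simple stand-in."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

best_input, best_reward = rerank_candidates(
    ["bye", "who are you", "how are you"], "i am fine", toy_dialogue_model, jaccard
)
```

With $K$ candidates, the number of interactions with the dialogue model per targeted output is exactly $K$, which is what keeps the query budget bounded.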

Figure 2: Success rates of the pre-trained model and the RL-trained model with different decoding methods. The upper row and the lower row show the results on the Generated target list and the Real target list with various lengths respectively.

5.2 Experimental Results

We conduct experiments on two types of target output lists. To construct the Generated target output list, we first feed 10k human utterances from the Twitter corpus (which have no overlap with the reference corpus) into the dialogue model to get 10k generated responses, and then randomly sample 200 responses as the targets for each of the length groups 1-3, 4-6 and 7-10. The Real target output list is obtained by randomly sampling sentences directly from the Twitter corpus. The number of target outputs in each length group is also 200.

Given a targeted output, when the proposed model finds an input which leads to an output whose similarity to the targeted one is above a preset threshold, we say the manipulation is successful. Figure 2 shows the success rates of our proposed model for manipulating the Twitter dialogue model in two experimental settings. The figures show how success rates vary with different thresholds. First of all, from the figures, we can see that for both the Generated and the Real target lists, RL-based model with beam search can achieve a success rate around with a score . Especially for more than around Generated targets with length greater than or equal to , we can find desired inputs that lead to a similarity score above . Second, we can see that in most of the scenarios, RL Greedy obviously outperforms Pre-trained Greedy, which demonstrates that using reward signals to fine-tune the policy network through RL significantly strengthens its capability for crafting desired inputs. Besides, compared with greedy decoding method, beam search with multiple candidates improves the success rates a lot. And compared with , the beam search methods with candidates improve the performance slightly, which means that interacting with the dialogue model for a reasonably small number of times can guarantee a considerable success rate. Another key observation is that the model performs significantly better on the Generated target list than on the Real target output list. Actually, the neural dialogue models suffer from the safe response problem Li et al. (2016). Such models tend to offer generic responses to diverse inputs, which makes it hard for the model to provide a specific targeted response (often seen in real human conversations).
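The success-rate measurement described above can be sketched as a small helper. The similarity scores below are made up, and counting a score equal to the threshold as a success is an assumption of this sketch.

```python
def success_rate(similarities, threshold):
    """Fraction of targets whose best output similarity reaches the threshold.
    Assumption: a score equal to the threshold counts as a success."""
    if not similarities:
        return 0.0
    return sum(s >= threshold for s in similarities) / len(similarities)

scores = [1.0, 0.95, 0.7, 0.4]  # made-up similarity scores, one per target
curve = {t: success_rate(scores, t) for t in (0.5, 0.9, 1.0)}
```

Sweeping the threshold over a range of values produces a success-rate curve of the kind plotted in Figure 2.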

Length                1-3     4-6     7-10
Real Input            0.439   0.518   0.566
Pre-trained Greedy    0.446   0.529   0.559
RL Greedy             0.486   0.560   0.588
RL BeamSearch(50)     0.599   0.678   0.709
RL BeamSearch(200)    0.621   0.694   0.726
Table 1: Average embedding similarity scores between the output and the target output on the Real target output list.

In order to further demonstrate the effectiveness of the proposed framework, for each Real targeted output, we feed its corresponding real input in the corpus into the dialogue model to check how similar the output response is to the target. We calculate these similarity scores for each Real targeted output, both for the real inputs and for the inputs found by the proposed model, and report the average values in Table 1. According to the table, even when the real inputs are fed in, the similarity scores between the outputs and the target outputs are not high. With the crafted inputs from the proposed framework, these similarity scores are significantly improved. For example, for RL BeamSearch(200), the similarity is improved over the real inputs by 0.182, 0.176 and 0.160 (relative gains of about 41.5%, 34.0% and 28.3%) for the target outputs with length 1-3, 4-6 and 7-10, respectively.

Input: soo calm . you should be nervous .
  Target: i ’ m just trying to be a good person
  Output: i ’ m not . i ’ m just trying to be a better person .
  Similarity: 0.952
Input: guess she’ll be invited .
  Target: i ’ m sure she ’ ll be fine .
  Output: i ’ m sure she ’ ll be a good one .
  Similarity: 0.946
Input: neither ready pls
  Target: i ’ m not ready for this
  Output: i ’ m not ready for this
  Similarity: 1.0
Input: how is nephew ?
  Target: he ’ s a good guy
  Output: he ’ s good . he ’ s a good guy .
  Similarity: 0.982
Input: you weren’t invited .
  Target: i was there
  Output: i was there .
  Similarity: 1.0
Table 2: Case study. Each entry shows an input found by RL BeamSearch(200) for a given target output, the target output, the output generated by the dialogue model for that input, and the embedding similarity score between the output and the target output.

Table 2 shows five examples from the manipulation experiments. The first three target outputs are from the Generated target output list, while the other two are from the Real target list. Given those target outputs, desired inputs are successfully crafted. Each of them leads to an output of the dialogue model that is similar or equal to the target, as evaluated by the embedding similarity measurement. Besides, unlike some related works He and Glass (2018); Cheng et al. (2018) where crafted text inputs are ungrammatical and meaningless, the inputs generated by our model are smooth and natural utterances. This is because the decoder of the Seq2Seq model in the Reverse Dialogue Generator serves as a language model, which promotes the fluency of the generated inputs.

6 Conclusion

Dialogue systems are being integrated into our daily lives at a rapid pace. In the practical implementation of dialogue systems, neural dialogue models play an important role. However, concerns have recently arisen for neural models across all domains as to whether they can be manipulated (in most cases, by crafted adversarial examples), which inspires us to examine the same problem for neural dialogue models. Our work reveals a dark side of such models: they are likely to be manipulated to "say what I want". By finding well-designed inputs, we can induce the dialogue agent to provide desired outputs. We propose a reinforcement learning based Reverse Dialogue Generator which learns to craft such inputs automatically in the process of interacting with the black-box neural dialogue model. Extensive experiments on a representative neural dialogue model demonstrate the effectiveness of our proposed model and show that dialogue systems used in our daily lives can indeed be manipulated, which is a warning about the security of dialogue systems for both the research community and industry.

For future work, we plan to extend the current framework to other sequence models. Besides, in this work, we examine the security problem of dialogue systems. In the future, we will also investigate privacy concerns, specifically the possibility that dialogue systems leak sensitive information about their users.


[1] D. Ameixa, L. Coheur, P. Fialho, and P. Quaresma (2014) Luke, I am your father: dealing with out-of-domain requests by using movies subtitles. In Intelligent Virtual Agents - 14th International Conference, IVA 2014, Boston, MA, USA, August 27-29, 2014. Proceedings, pp. 13–21.
[2] R. E. Banchs (2012) Movie-dic: a movie dialogue corpus for research and development. In The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea - Volume 2: Short Papers, pp. 203–207.
[3] Y. Belinkov and Y. Bisk (2018) Synthetic and natural noise both break neural machine translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
[4] A. Chan, L. Ma, F. Juefei-Xu, X. Xie, Y. Liu, and Y. S. Ong (2018) Metamorphic relation based adversarial attacks on differentiable neural computer. CoRR abs/1809.02444.
[5] H. Chen, H. Zhang, P. Chen, J. Yi, and C. Hsieh (2018) Attacking visual language grounding with adversarial examples: A case study on neural image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pp. 2587–2597.
[6] M. Cheng, J. Yi, H. Zhang, P. Chen, and C. Hsieh (2018) Seq2Sick: evaluating the robustness of sequence-to-sequence models with adversarial examples. CoRR abs/1803.01128.
[7] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio (2014) On the properties of neural machine translation: encoder-decoder approaches. In Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111.
[8] J. Gao, M. Galley, and L. Li (2019) Neural approaches to conversational AI. Foundations and Trends in Information Retrieval 13 (2-3), pp. 127–298.
[9] D. Goddeau, H. M. Meng, J. Polifroni, S. Seneff, and S. Busayapongchai (1996) A form-based dialogue manager for spoken language applications. In The 4th International Conference on Spoken Language Processing, Philadelphia, PA, USA, October 3-6, 1996.
[10] T. He and J. Glass (2018) Detecting egregious responses in neural sequence-to-sequence models. CoRR abs/1809.04113.
[11] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
[12] B. Hu, Z. Lu, H. Li, and Q. Chen (2015) Convolutional neural network architectures for matching natural language sentences. CoRR abs/1503.03244.
[13] S. Huang, N. Papernot, I. J. Goodfellow, Y. Duan, and P. Abbeel (2017) Adversarial attacks on neural network policies. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings.
[14] M. Iyyer, J. Wieting, K. Gimpel, and L. Zettlemoyer (2018) Adversarial example generation with syntactically controlled paraphrase networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pp. 1875–1885.
[15] J. Kos, I. Fischer, and D. Song (2018) Adversarial examples for generative models. In 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018, pp. 36–42.
[16] J. Kos and D. Song (2017) Delving into adversarial attacks on deep policies. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings.
[17] Q. Lei, L. Wu, P. Chen, A. G. Dimakis, I. S. Dhillon, and M. Witbrock (2018) Discrete attacks and submodular optimization with applications to text classification. CoRR abs/1812.00151.
[18] J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan (2016) A diversity-promoting objective function for neural conversation models. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pp. 110–119.
  • [19] B. Liang, H. Li, M. Su, P. Bian, X. Li, and W. Shi (2018) Deep text classification can be fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden., pp. 4208–4215. External Links: Link, Document Cited by: §2.
  • [20] Z. Lu and H. Li (2013) A deep architecture for matching short texts. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pp. 1367–1375. External Links: Link Cited by: §1.
  • [21] A. H. Miller, W. Feng, D. Batra, A. Bordes, A. Fisch, J. Lu, D. Parikh, and J. Weston (2017) ParlAI: A dialog research software platform. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 - System Demonstrations, pp. 79–84. External Links: Link Cited by: §5.1.1.
  • [22] B. W. Mott, J. C. Lester, and K. Branting (2004) Conversational agents. In The Practical Handbook of Internet Computing., External Links: Link, Document Cited by: §1.
  • [23] J. Pennington, R. Socher, and C. Manning (2014) GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Cited by: §5.1.1.
  • [24] R. Price (2016) Microsoft is deleting its AI chatbot’s incredibly racist tweets. Business Insider. Cited by: §1.
  • [25] M. Sato, J. Suzuki, H. Shindo, and Y. Matsumoto (2018) Interpretable adversarial perturbation in input embedding space for text. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden., pp. 4323–4330. External Links: Link, Document Cited by: §2.
  • [26] I. V. Serban, A. Sordoni, Y. Bengio, A. C. Courville, and J. Pineau (2016) Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA., pp. 3776–3784. External Links: Link Cited by: §1, §3.1.
  • [27] I. V. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. C. Courville, and Y. Bengio (2017) A hierarchical latent variable encoder-decoder model for generating dialogues. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA., pp. 3295–3301. External Links: Link Cited by: §1, §3.1.
  • [28] L. Shang, Z. Lu, and H. Li (2015) Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 1577–1586. External Links: Link Cited by: §1.
  • [29] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pp. 1528–1540. External Links: Link, Document Cited by: §2.
  • [30] Y. Song, R. Yan, X. Li, D. Zhao, and M. Zhang (2016) Two are better than one: an ensemble of retrieval- and generation-based dialog systems. CoRR abs/1610.07149. External Links: Link, 1610.07149 Cited by: §1.
  • [31] A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell, J. Nie, J. Gao, and B. Dolan (2015) A neural network approach to context-sensitive generation of conversational responses. In NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31 - June 5, 2015, pp. 196–205. External Links: Link Cited by: §1.
  • [32] I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 3104–3112. External Links: Link Cited by: §1.
  • [33] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, External Links: Link Cited by: §1, §2.
  • [34] P. Tabacof, J. Tavares, and E. Valle (2016) Adversarial images for variational autoencoders. CoRR abs/1612.00155. External Links: Link, 1612.00155 Cited by: §2.
  • [35] O. Vinyals and Q. V. Le (2015) A neural conversational model. CoRR abs/1506.05869. External Links: Link, 1506.05869 Cited by: §1, §3.1.
  • [36] J. Weizenbaum (1966) ELIZA - a computer program for the study of natural language communication between man and machine. Commun. ACM 9 (1), pp. 36–45. External Links: Link, Document Cited by: §1.
  • [37] J. Wieting, M. Bansal, K. Gimpel, and K. Livescu (2016) Towards universal paraphrastic sentence embeddings. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, External Links: Link Cited by: §2, §4.2.4.
  • [38] R. J. Williams (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8 (3-4), pp. 229–256. Cited by: §4.2.5.
  • [39] M. J. Wolf, K. W. Miller, and F. S. Grodzinsky (2017) Why we should have seen that coming: comments on Microsoft’s Tay "experiment," and wider implications. SIGCAS Computers and Society 47 (3), pp. 54–64. External Links: Link, Document Cited by: §1.
  • [40] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. L. Yuille (2017) Adversarial examples for semantic segmentation and object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 1378–1387. External Links: Link, Document Cited by: §2.
  • [41] X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li (2017) Adversarial examples: attacks and defenses for deep learning. CoRR abs/1712.07107. External Links: Link, 1712.07107 Cited by: §1.
  • [42] W. E. Zhang, Q. Z. Sheng, and A. A. F. Alhazmi (2019) Generating textual adversarial examples for deep learning models: A survey. CoRR abs/1901.06796. External Links: Link, 1901.06796 Cited by: §2, §3.2.
  • [43] L. Zhou, J. Gao, D. Li, and H. Shum (2018) The design and implementation of XiaoIce, an empathetic social chatbot. CoRR abs/1812.08989. External Links: Link, 1812.08989 Cited by: §1, §3.1.