Negative Training for Neural Dialogue Response Generation

Negative Training for Neural Dialogue Response Generation

Tianxing He
CSAIL, MIT
cloudygoose@csail.mit.edu
&James Glass
CSAIL, MIT
glass@mit.edu
Abstract

Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses. In this work, we propose a framework named “Negative Training” to minimize such behaviors. Given a trained model, the framework will first find generated samples that exhibit the undesirable behavior, and then use them to feed negative training signals for fine-tuning the model. Our experiments show that negative training can significantly reduce the hit rate of malicious responses (e.g. from 12.6% to 0%), or discourage frequent responses and improve response diversity (e.g. improve response entropy by over 63%).

1 Introduction

End-to-end dialogue response generation can be formulated as a sequence-to-sequence (seq2seq) task: given a dialogue context, the model is asked to generate a high-quality response. In recent years, deep learning models, especially seq2seq language generation models Sutskever et al. (2014); Cho et al. (2014), have brought significant progress to the field of dialogue response generation.

However, recent research has revealed undesirable behaviors of seq2seq models that are side effects of standard maximum likelihood estimation (MLE) training, such as the generic (boring) response problem Li et al. (2016), vulnerability to adversarial attacks Cheng et al. (2018); Belinkov and Bisk (2017), and the malicious (egregious) response problem He and Glass (2019).

In this work, we propose and explore the negative training framework to correct unwanted behaviors of a dialogue response generator. During negative training, we first find or identify input-output pairs for a trained seq2seq model that exhibit some undesirable generation behavior, treat them as “bad examples,” and use them to feed negative training signals to the model. Correspondingly, we regard the training data as “good examples” and standard MLE training as “positive training”.

The idea of negative training is inspired from the way parents might teach their children to use language by incorporating both positive and negative training signals. For example, when teaching children how to use “love” and “hate”, in addition to using positive examples like “I love apples but I hate bananas”, they might also point out that saying “I hate you” to someone is considered impolite.

In this work, negative training is used to address the malicious response problem and the frequent response problem (to be described in Section 3.2 and 3.3) in open-domain dialogue response generation. In our experiments, we show that negative training can significantly reduce the hit rate for malicious responses, or discourage frequent responses and greatly improve response diversity.

2 Model Formulation

In this work we adopt recurrent neural network (RNN) based encoder-decoder seq2seq models Sutskever et al. (2014); Cho et al. (2014); Mikolov et al. (2010), which are widely used in NLP applications like dialogue response generation Li et al. (2016), machine translation Luong et al. (2015), etc. We use to denote one-hot vector representations of the input sequence, which serves as context or history information (e.g. the previous utterance), 111The last word is a <EOS> token which indicates the end of a sentence. to denote scalar indices of the corresponding reference target sequence, and as the vocabulary. We use to represent the parameters for the seq2seq model, and as the model’s generative distribution. In this work, only one previous sentence is used as input222In our baseline experiments, we find providing two or longer dialogue history information does not improve performance..

On the encoder side, every will be first mapped into its corresponding word embedding . Since is one-hot, this can be implemented by a matrix multiplication operation , where the th column of matrix is the word embedding of the th word. Then are input to a long-short term memory (LSTM) Hochreiter and Schmidhuber (1997) RNN to get a sequence of latent representations 333Here refers to the output layer of LSTM, not the cell memory layer. .

For the decoder, at time , similarly is first mapped to . Then a context vector , which is supposed to capture useful latent information of the input sequence, needs to be constructed. We adopt the “attention” mechanism for context vector construction: first an attention mask vector (which is a distribution) on the input sequence is calculated to decide which part to focus on, then the mask is applied to the latent vectors to construct : . We use the formulation of the “general” type of global attention, described in (Luong et al., 2015), to calculate the mask.

During baseline training, standard MLE training with stochastic gradient descent (SGD) is used to minimize the negative log-likelihood (NLL) of the reference target sentence given the input sentence in the data:

(1)

where refers to , in which is set to a begin-of-sentence token <BOS>.

In this work we take into consideration two popular ways of decoding (generating) a sentence given an input:

  1. Greedy decoding: greedily select the word that is assigned the biggest probability by the model:

    (2)
  2. Sampling: is sampled from the prediction distribution .

In practice for dialogue response generation, greedy decoding will provide stable and reproducible outputs, but is severely affected by the generic response problem. Sampling will provide more diverse but less predictable responses, and thus give rise to the malicious response problem.

3 The Negative Training Framework

3.1 Overview

The negative training framework is a two-stage process. Given a trained model, we put it under a “debugging” environment which provides test input samples444Note that here “test” does not refer to the test data., get the model’s decoded samples and decide (using well-defined criteria) whether each input-output pair exhibits some undesirable behavior. Then, these “bad” pairs are used to provide negative training signals.

Negative training can be derived from Empirical Bayes Risk Minimization Och (2003). Specifically, the overall objective is to minimize the expected risk that the model exhibits undesirable decoding behavior:

(3)

where refers to the binary criteria that will be if exhibits undesirable behavior, and otherwise.

Then, we take the derivative of w.r.t. to , using the log derivative trick (widely used in Reinforcement Learning Sutton and Barto (1998)):

(4)

Compared to in eq. 1, which maximizes the log-likelihood of training data samples, minimizes the log-likelihood of undesirable model samples. This is the reason why we call it “Negative Training”.

In our preliminary experiments, we find that negative training needs to be augmented with the standard MLE objective , encouraging the model to retain its original performance:

(5)

In our experiments, we find can be simply set to 0.1 to work well.

In the next two sections, we discuss how the general negative training framework is tailored for the malicious response problem and frequent response problem, respectively.

3.2 Negative Training for the Malicious Response Problem

For the malicious response problem, we follow the methodology proposed by He and Glass (2019). First a list of malicious target sentences are created, then the gibbs-enum algorithm555For this paper to be self-contained, we describe the gibbs-enum algorithm in Appendix A. is called to find “trigger input” that will cause the model to assign large probability to the target sequence. The following “hit types” are defined:

  • o-greedy-hit: A trigger input sequence is found such that the model generates the target sentence from greedy decoding.

  • o-sample-avg-hit: A trigger input sequence is found such that the model generates the target sentence with an average word log-probability larger than a given threshold .

  • o-sample-min-hit: A trigger input sequence is found such that the model generates the target sentence with a minimum word log-probability larger than a given threshold .

  • io-sample-min/avg-hit: In addition to the definition of o-sample-min/avg-hit, we also require that the average log-likelihood of the trigger input sequence, measured by a LM, is larger than a threshold . This enforces the trigger input to be more likely to be input by real-world users.

is set to the trained seq2seq model’s average word log-likelihood on the test data, and is set to be a reasonable LM’s 666A LSTM language model (LM) is trained on the same training data (regarding each response as an independent sentence). average word log-likelihood on the test set. The intuition is that the model should not assign larger probabilities to the malicious sentences than the reference sentences in the test set. Note that these hit types act as criteria , indicating whether a target sentence is hit by a trigger input.

As shown in He and Glass (2019), a typical seq2seq model trained by MLE has around a hit rate for malicious targets w.r.t. sample-min/avg-hit, across data-sets. However, very few malicious targets are hit w.r.t. greedy-hit, so in this work, we focus on the malicious response problem for sampling during decoding. In Table 1 we show pairs of trigger inputs and the malicious target sentences w.r.t io-sample-min-hit, for the baseline model on Ubuntu data.

  Input: Target list , model parameter , learning rate , criterion for hit , and training data
  for  in  do
     Get for using the gibbs-enum algorithm.
     while  do
        Negative update:
        Get data sample from
        Positive update:
     end while
  end for
Algorithm 1 Negative Training for the Malicious Response Problem

Now we apply the negative training framework, and aim to reduce the hit rate of a trained model for a given list of malicious targets. During each iteration of negative training, for every target sentence , we first call the gibbs-enum algorithm to find the trigger input . And if the target is hit (), we update the model to reduce the log-likelihood . The process is formulated in Algorithm 1777Note that in actual implementation, the algorithm is mini-batch based..

For each trigger input, multiple iterations of negative updates are usually needed before the hit criterion is no longer met. Note that in each iteration, the gibbs-enum algorithm is called again to find a new trigger input for each target.

In our experiments, we show that negative training effectively reduces the hit rate for malicious targets after each iteration, and eventually, the gibbs-enum algorithm can no longer find trigger inputs for a significant number of targets that were initially hits.


Trigger Input: ok give me a minute to see what you have
in the first place
Malicious Target: no one will help you
Trigger Input: mirc it ’s supposed to be <unk> ’ that
seems to be the problem
Malicious Target: i do n’t think i like you
Trigger Input: i know about photoshop i use skype too
but i ca n’t buy it
Malicious Target: you have no money
Table 1: Examples of trigger inputs for malicious targets.

3.3 Negative Training for the Frequent Response Problem

The generic response problem Li et al. (2016) for end-to-end dialogue response generation refers to the typical behavior of a MLE trained model, whereby the generated responses are mostly safe, boring or uninformative (such as “i don’t know” or “good idea”). However, it is difficult to invent an automatic criterion to determine whether a response is generic or not.

In this work, we focus on the frequent response problem, as a sub-problem of the generic response problem. It refers to the behavior that a trained model generates exactly the same (usually boring) response, with a high frequency. A well-behaved generator should, obviously, generate more diverse responses.

In our experiments, we propose to use a metric called max-ratio to measure how severe the frequent response problem is. Given a test set and a decoding method, the model will generate a set of responses, and max-ratio is defined to be the ratio of the most frequent response. In our experiments, the baseline models have a max-ratio of around for response like “I don’t know” across different data-sets, showing the severity of the frequent response problem.

During negative training for frequent response, first a threshold ratio is selected (such as 0.01), and responses with frequency ratio larger than will be discouraged. For each iteration, the model’s response to each training data input sentence is monitored and responses with frequency larger than will be used as negative examples. The frequency statistics are calculated using the current and the last 200 mini-batches. The procedure is formulated in Algorithm 2. Note that positive training is also needed here for the model to retain its original performance.

  Input: Model parameter , threshold ratio , learning rate , and training data set
  for  in  do
     Generate response from the model.
     Compute the frequency for in the last 200 mini-batches.
     if  then
        Negative update:
        Positive update:
     end if
  end for
Algorithm 2 Negative Training for the Frequent Response Problem

In our experiments, it is shown that negative training significantly reduces max-ratio for the model on test data, and greatly increases the diversity of the model’s responses.

4 Experiments

We conduct experiments on three publicly available conversational dialogue data-sets: Ubuntu, Switchboard, and OpenSubtitles. To save space, descriptions of the data-sets are provided in Appendix B.

4.1 Baseline Model Training

For all data-sets, we first train an LSTM based LM and attention based seq2seq models with one hidden layer of size 600, and the embedding size is set to 300888The pytorch toolkit is used for all neural network related implementations. In supplementary materials we include our code and the Switchboard data for reference and reproduction. For Switchboard a dropout layer with rate 0.3 is added to the model because over-fitting is observed. The mini-batch size is set to 64 and we apply SGD training with a fixed starting learning rate (LR) for 10 iterations, and then another 10 iterations with LR halving. For Ubuntu and Switchboard, the starting LR is 1, while a starting LR of 0.1 is used for OpenSubtitles. The results are shown in Appendix C.

After negative training, in addition to measuring the hit rate for malicious targets or the diversity of the responses, it is also important to check whether the original performance of the baseline model is damaged. Towards that end, the perplexity of the model before and after negative training will be compared, and qualitative analysis of decoding samples will be conducted. We mainly rely on the perplexity measurement for the following reasons: 1) Since the baseline models are trained by MLE, it is natural to use a measurement directly related to the model’s NLL on the test-set, such as perplexity; 2) Other popular measurements, such as the BLEU score, have been found to correspond poorly with human judgements Liu et al. (2016).

4.2 Experiments on the Malicious Response Problem

Following He and Glass (2019), a list of malicious targets are created to test whether negative training can teach the model not to generate sentences in the list. However, in addition to prevent the model from generating targets in a specific list, it is also important to check whether negative training generalizes to other malicious targets. So, a test target list which contains similar but different targets from the training list are also created to test generalization. The training and test lists each contain 0.5k targets.


Train Paraphrase of Train Test
you are broken you ’re broken are you broken
i will kill i ’ll kill myself i ’m going to kill
you are bad you ’re bad you are really bad
you are stupid you ’re stupid you are so stupid
you shut up you shut your mouth can you shut up
Table 2: Examples of malicious targets in the training list, the test list, and paraphrases of the training targets which will be used for augmentation.

It is also interesting to investigate whether using more malicious targets for negative training can lower the hit rate on the test list. Towards that end, we train a seq2seq paraphrase model using the paraNMT data-set Wieting and Gimpel (2017), with a model of the same structure as described in Section 2. Then, the paraphrase model is used to generate paraphrases of the malicious targets in the training target list999Note the training and test lists are manually created. for augmentation. In our experiments, the training list without augmentation is first used for negative training, then it is augmented with 0.5k or 2k paraphrased targets respectively (1 or 4 paraphrase copies for each training target sentence). Samples of the malicious targets are shown in Table 2. Note that the same training, augmented training and test list are used for all three data-sets, and there is no sequence-level overlap between training lists (augmented or not) and the test list.

In our experiments, we spotted a harmful side effect of negative training where frequent words in the training target list are severely penalized and sometimes receive low probability even in normal perplexity testing, especially for experiments with small . To alleviate this problem, we use a simple technique called frequent word avoiding (FWA): negative gradients are not applied to the most frequent words in the malicious training target list101010The exact avoiding word set used is {<EOS>, you, i, me, are, to, do}.. For example, when doing negative training against the target “i hate you <EOS>”, only “hate” will get a negative gradient.

For all data-sets, negative training (Algorithm 1) is executed on the (trained) baseline model for 20 iterations over the training target list. A fixed learning rate of 0.01 and a mini-batch size of 100 are used. is set to 0.1 for Ubuntu, and to 1 for Switchboard and OpenSubtitles.


Ubuntu o-sample-min-hit io-sample-min-hit
Training Train Test PPL Train Test PPL
Baseline 16.4% 12.6% 59.49 7.8% 5.2% 59.49
+neg-tr(0.5k) 0% 2% 60.42 0.2% 1.4% 59.97
+neg-tr(1k) 0.1% 1.4% 60.72 0.1% 1% 60.21
+neg-tr(2.5k) 0.04% 0% 62.11 0.2% 0% 63.37
Switchboard o-sample-avg-hit io-sample-avg-hit
Training Train Test PPL Train Test PPL
Baseline 27.8% 27.6% 42.81 19.6% 21% 42.81
+neg-tr(0.5k) 3.8% 13.4% 42.91 2.2% 9.4% 42.7
+neg-tr(1k) 2.4% 5% 42.96 2.1% 4% 42.76
+neg-tr(2.5k) 1.3% 2.6% 43.51 1.5% 1.6% 43.24
OpenSubtitles o-sample-min-hit io-sample-min-hit
Training Train Test PPL Train Test PPL
Baseline 40.7% 36.6% 70.81 19.2% 13.6% 70.81
+neg-tr(0.5k) 5.8% 12.2% 77.90 5.2% 6.6% 73.48
+neg-tr(1k) 5.2% 7% 68.77 9.2% 4.6% 68.92
+neg-tr(2.5k) 4.8% 6% 74.07 3.4% 3.6% 75.9
Table 3: Main results for the hit rates of malicious targets before and after negative training. ”Neg-tr(0.5k)” refers to the negative training experiment using the original malicious training target list without paraphrase augmentation.

The main results are shown in Table 3. For Switchboard we focus on sample-avg-hit because we find very few targets are hit w.r.t. sample-min-hit (Similar results are reported in He and Glass (2019)), while for Ubuntu and OpenSubtitles we focus on sample-min-hit. Note that we get very similar results w.r.t. sample-avg-hit for Ubuntu/OpenSubtitles, and we omit those results here.

We first observe that, for all data-sets, negative training can effectively reduce the hit rate on the training target list to less than 5% with little or no degradation on perplexity. We provide a comparison of the model’s behavior in Appendix D. Also, significant hit rate reduction is achieved on the test target list, which has no overlap with the training target list. This suggests that negative training, similar to traditional positive training, also generalizes. However, note that all targets in the test list used in this work have a counterpart which is both semantically and syntactically similar to it in the training list. It will be interesting to investigate whether negative training has more powerful generalization ability in future work.

It is also shown that training list augmentation can further reduce the malicious target hit rate consistently for both training and test lists. For example, on Ubuntu data, the hit rate after negative training w.r.t. o-sample-min-hit is 12.6%, and can be reduced to 0% with paraphrase augmentation.

4.3 Experiments on the Frequent Response Problem

In this section we report results where the negative training framework (Section 3.3) is applied to tackle the frequent response problem. For all data-sets, negative training is executed for 20 iterations on the MLE trained model over the training data, with a selected . A fixed learning rate of 0.001 is used for all three data-sets, the mini-batch size is set to 64 and is set to 1.

In this work, we focus on improving the model’s greedy decoding behavior instead of beam search for the following two reasons: 1) For the baseline models our experiments, we found that beam search gives far worse response diversity than greedy decoding, because it favors short responses (usually only of length one) too much, resulting in a much larger max-ratio; 2) During training, doing beam search is much more time-consuming than greedy decoding.

To measure the diversity of the model’s generated responses, in addition to max-ratio introduced in Section 3.3, which is specially design for the frequent response problem, we also adopt the entropy metric proposed in Zhang et al. (2018). Given a set of responses from decoding on the test set, Ent-n calculates the entropy of the n-gram distribution:

(6)

where is the set of all n-grams that appeared in the response set, and refers to the ratio (frequency) of n-gram w.r.t. all n-grams in the responses set.

In our experiments with negative training, a harmful side-effect is spotted: during decoding, the model tends to output long and ungrammatical responses such as “i do n’t know if it ’s a real valid deterrent crime crime yeah i ’m satisfied trying not to”. We believe the reason is that the sentence end token <EOS> gets over penalized during negative training (it appears in every negative example). So, we apply the same frequent word avoiding (FWA) technique used in Section 4.2, except that here only the negative gradient for <EOS> is scaled by 0.1111111We find that scal by zero will result in extremely short responses..

In addition to the baseline model, we compare our proposed negative training framework against a GAN Goodfellow et al. (2014a) approach, where a discriminator is introduced and the generator tries to fool the discriminator to believe its samples are real data samples:

(7)

where the generator refers to the seq2seq model . The GAN framework is very attractive for tackling the generic response problem Li et al. (2017); Zhang et al. (2018), because the discriminator can act as a critic to judge whether a response sample is boring. We describe the training details and hyper-parameter setting for the GAN approach in Appendix E.

We also provide an comparison to the MMI decoding (Li et al., 2016), which is a very popular work in this field. We implement MMI-antiLM for our models.


Ubuntu PPL M-ratio E-2 E-3
Test-set N/A N/A 1.1% 10.09 11.32
Baseline N/A 59.49 4.4% 5.33 5.92
+GAN N/A 59.43 4.7% 5.30 5.87
+MMI N/A N/A 4.5% 5.34 5.93
+neg-train 1% 59.76 1.2% 5.74 6.52
+neg-train 0.1% 60.06 1.3% 6.44 7.55
+neg-train 0.01% 60.96 1.3% 9.09 10.16
Switchboard PPL M-ratio E-2 E-3
Test-set N/A N/A 10.0% 8.61 9.65
Baseline N/A 42.81 37.4% 2.71 2.42
+GAN N/A 42.69 49% 2.66 2.35
+MMI N/A N/A 23% 5.48 6.23
+neg-train 10% 42.84 12.4% 3.86 4.00
+neg-train 1% 44.32 9.8% 5.48 6.03
+neg-train 0.1% 76.46 21.8% 5.55 6.10
OpenSubtitles PPL M-ratio E-2 E-3
Test-set N/A N/A 0.47% 9.66 10.98
Baseline N/A 70.81 20% 4.22 4.59
+GAN N/A 72.00 18.8% 4.08 4.43
+MMI N/A N/A 3.6% 7.63 9.08
+neg-train 1% 72.37 3.1% 5.68 6.60
+neg-train 0.1% 75.71 0.6% 6.90 8.13
+neg-train 0.01% 76.82 0.8% 7.28 7.92
Table 4: Main results of negative training with different , for the frequent response problem. Diversity metrics for the responses in the test data are also shown, “E-n”/“M-ratio” refer to the Ent-n/max-ratio metric.

Switchboard OpenSubtitles
Input: it ’ll cost about three hundred dollars for a stud Input: captain you wanted to see me
Baseline: i think that ’s a good idea Baseline: i ’m sorry
Neg-train: i think i would agree with that Neg-train: i was in the hotel
Input: we want to breed her with a champion Input: yes mr. brown could i
Baseline: i do n’t know Baseline: i do n’t know
Neg-train: i think it was Neg-train: i ’d like to introduce myself
Input: now these are long haired Input: leave it to me
Baseline: i do n’t know Baseline: i ’m not going to leave you
Neg-train: i ’ve been in a very very good shape Neg-train: you ’re taking the first step
Input: the other two are short hairs Input: thank you mr. brown
Baseline: i do n’t know Baseline: i ’m sorry
Neg-train: i ’m going to try to get it Neg-train: i ’m happy to see you
Table 5: Greedy-decoding samples on the test data before and after negative training. The samples are consecutive (input of the next sample is the reference response for the previous one).

The experimental results are shown in Table 4. The experiment with best diversity result and non-degenerate sample quality are shown in bold. For example, although the experiment with on Ubuntu gives higher diversity measurements, we find some of its samples are ungrammatical.

We first observe a large gap on the diversity measures between the baseline models and the test set, especially on Switchboard and OpenSubtitles data. That indicates the severity of the frequent/generic response problem. Then, results of negative training with different show that negative training can significantly increase response diversity, with little or no loss on PPL performance. We also find that negative training does not harm the model’s performance on BLEU score . For example, max-ratio is reduced by 73.7% and Ent-3 is increased by 149% for Switchboard data. Further, consistent improvement is achieved when a smaller is used. However, sample quality will decrease (becoming too long or ungrammatical) when is too small. The reason could be that when too much diversity is asked for, the model will go to extremes to provide diversity, resulting in degradation of sample quality.

Comparing to MMI, note that although on Switchboard/Opensubtitles MMI gives higher entropy, the max-ratio is not as low as the negative training result, which is the main focus of our work (the frequent response problem). We also find MMI‘s hyper-parameters are difficult to tune: the working set of hyper-parameters don’t transfer well between data-sets. Further, for MMI in a lot of configuration tries the model gives ungrammatical output samples (this is problem is also mentioned in the paper (Li et al., 2016)). For the Ubuntu data, we can not even find a configuration that performs better than the baseline model.

Somewhat surprisingly, the vanilla GAN approach is not shown to be effective in our experiments. The reason could be that despite its discriminative nature, GAN training still feeds “positive” gradient for samples from the model (eq. 12 and eq. 13 in Appendix E), which is not enough to prevent the model from generating them. We believe additional techniques Zhang et al. (2018); Li et al. (2017) are needed for the GAN approach to be effective.

We show some model samples before and after negative training in Table 5. It is shown that negative training effectively discourages boring responses, and response diversity is clearly improved. However, one limitation is observed that diversity does not necessarily lead to improvement on the informativeness of the response w.r.t. the input (sometimes the model generates a completely unrelated response). More samples for all three data-sets are included in Appendix G.

To rigorously verify negative training is not getting diversity when sacrificing the sample’s quality, a human evaluation is conducted and results are shown in Table 6. It is observed that negative training wins by a visible margin for all three data-sets, and for around half of the samples, the evaluators report a “tie”. This shows that, negative training does not damage the quality of the generated samples.


Data-set Tie Baseline wins Neg-train wins
Ubuntu 64.6% 14.0% 21.3%
Switchboard 45.1% 18.3% 36.4%
Opensubtitles 58.3% 19.0% 22.6%
Table 6: Human Evaluation Results. For each data-set, 300 samples (input-output pairs) from the baseline model and the model after negative training, are evenly distributed to 4 English-speaking human evaluators. The evaluators are asked to pick a preferred sample, or report a tie.

5 Related Works

The malicious response problem and the gibbs-enum algorithm to find trigger inputs He and Glass (2019) originates from a large body of work on adversarial attacks for deep learning models, with continuous input space (e.g. image classification) Goodfellow et al. (2014b); Szegedy et al. (2013), or discrete input space (e.g. sentence classification, or seq2seq models) Papernot et al. (2016); Samanta and Mehta (2017); Liang et al. (2018); Ebrahimi et al. (2017); Belinkov and Bisk (2017); Chen et al. (2017). “Adversarial attacks” refer to the phenomenon that when an imperceptible perturbation is applied to the input, the output of the model can change significantly (from correct to incorrect). The trigger inputs found by the gibbs-enum algorithm, can be regarded as a type of “targeted attack”, in which the attack triggers the model to assign large probability to a specific malicious target sentence.

Motivated by the works on adversarial attacks, various adversarial training strategies (Madry et al., 2017; Belinkov and Bisk, 2017; Miyato et al., 2016) have been proposed to make trained models more robust against those attacks. During adversarial training, the model is fed with adversarial examples and the correct labels. The negative training framework considered in this work differs from adversarial training in that, instead of asking the model to “do the right thing” (referred to as “positive training” in this work), the model is trained to “not do the wrong thing”. To the best of our knowledge, this is the first work investigating the concept of negative training for language generation models, and the first proposed solution for the malicious response problem.

The malicious target list used in this work is very similar to the one used in He and Glass (2019). We propose to add a test target list to test the generalization of negative training. Further, we show that the training list can be effectively augmented by utilizing a paraphrase model.

In this work, we propose a definition for the frequent response problem, as a sub-problem of the generic response problem Li et al. (2016). Much research work has devoted to alleviate the generic response problem in end-to-end dialogue response generation, Li et al. (2016) use the maximal mutual information (MMI) objective, and propose to utilize an auxiliary LM to penalize the generic response during decoding. Closely related to this work, sophisticated training frameworks based on GAN Zhang et al. (2018); Li et al. (2017) have also been shown to be effective, where techniques such as variational information maximization or reward for every generation step (REGS) are proposed to improve GAN training. However, in our experiments it is shown that a vanilla GAN approach gives unsatisfactory results. Whether negative training121212Note that negative training is considerably easier to implement than the mentioned frameworks based on GAN. is complementary to these frameworks is worth investigating in future work.

Finally, note that the concept of negative training in this work is very different to the negative samples in word2vec training (Mikolov et al., 2013). The negative samples in word2vec training are used to prevent the training from being trivial, and is usually chosen randomly. In this work, the negative samples are carefully chosen to exhibit some particular undesirable behavior of the model, and is then used to correct such behavior.

6 Conclusion

In this work, we propose the negative training framework to correct undesirable behaviors of a trained neural dialogue response generator. The algorithm involves two major steps, first input-output pairs that exhibit bad behavior are identified, and then are used for fine-tuning the model as negative training examples. We also show that negative training can be derived from an overall objective (eq. 3) to minimize the expected risk of undesirable behaviors. In our experiments, we apply negative training to the malicious response problem and the frequent response problem and get significant improvement for both problems.

References

  • Y. Belinkov and Y. Bisk (2017) Synthetic and natural noise both break neural machine translation. CoRR abs/1711.02173. External Links: 1711.02173 Cited by: §1, §5, §5.
  • H. Chen, H. Zhang, P. Chen, J. Yi, and C. Hsieh (2017) Show-and-fool: crafting adversarial examples for neural image captioning. CoRR abs/1712.02051. External Links: 1712.02051 Cited by: §5.
  • M. Cheng, J. Yi, H. Zhang, P. Chen, and C. Hsieh (2018) Seq2Sick: evaluating the robustness of sequence-to-sequence models with adversarial examples. CoRR abs/1803.01128. External Links: 1803.01128 Cited by: §1.
  • K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734. Cited by: §1, §2.
  • J. Ebrahimi, A. Rao, D. Lowd, and D. Dou (2017) HotFlip: white-box adversarial examples for NLP. CoRR abs/1712.06751. External Links: 1712.06751 Cited by: §5.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014a) Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, Cambridge, MA, USA, pp. 2672–2680. Cited by: Appendix E, §4.3.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014b) Explaining and harnessing adversarial examples. CoRR abs/1412.6572. External Links: 1412.6572 Cited by: §5.
  • T. He and J. Glass (2019) Detecting egregious responses in neural sequence-to-sequence models. In International Conference on Learning Representations, Cited by: Appendix A, §1, §3.2, §3.2, §4.2, §4.2, §5, §5.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §2.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1746–1751. Cited by: Appendix E.
  • J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan (2016) A diversity-promoting objective function for neural conversation models. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pp. 110–119. Cited by: §1, §2, §3.3, §4.3, §4.3, §5.
  • J. Li, W. Monroe, T. Shi, A. Ritter, and D. Jurafsky (2017) Adversarial learning for neural dialogue generation. CoRR abs/1701.06547. External Links: Link, 1701.06547 Cited by: §4.3, §4.3, §5.
  • B. Liang, H. Li, M. Su, P. Bian, X. Li, and W. Shi (2018) Deep text classification can be fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden., pp. 4208–4215. External Links: Document Cited by: §5.
  • C. Liu, R. Lowe, I. Serban, M. Noseworthy, L. Charlin, and J. Pineau (2016) How not to evaluate your dialogue system: an empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2122–2132. External Links: Document Cited by: §4.1.
  • R. Lowe, N. Pow, I. Serban, and J. Pineau (2015) The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. CoRR abs/1506.08909. External Links: 1506.08909 Cited by: Appendix B.
  • T. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421. External Links: Document Cited by: §2, §2.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. CoRR abs/1706.06083. External Links: 1706.06083 Cited by: §5.
  • T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur (2010) Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048. Cited by: §2.
  • T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, USA, pp. 3111–3119. External Links: Link Cited by: §5.
  • T. Miyato, A. M. Dai, and I. Goodfellow (2016) Adversarial training methods for semi-supervised text classification. Note: cite arxiv:1605.07725Comment: Published as a conference paper at ICLR 2017 Cited by: §5.
  • F. J. Och (2003) Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, Stroudsburg, PA, USA. External Links: Document Cited by: §3.1.
  • N. Papernot, P. D. McDaniel, A. Swami, and R. E. Harang (2016) Crafting adversarial input sequences for recurrent neural networks. In 2016 IEEE Military Communications Conference, MILCOM 2016, Baltimore, MD, USA, November 1-3, 2016, pp. 49–54. External Links: Document Cited by: §5.
  • S. Samanta and S. Mehta (2017) Towards crafting text adversarial samples. CoRR abs/1707.02812. External Links: 1707.02812 Cited by: §5.
  • R. K. Srivastava, K. Greff, and J. Schmidhuber (2015) Highway networks. CoRR abs/1505.00387. External Links: 1505.00387 Cited by: Appendix E.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 3104–3112. Cited by: §1, §2.
  • R. S. Sutton and A. G. Barto (1998) Introduction to reinforcement learning. 1st edition, MIT Press, Cambridge, MA, USA. External Links: ISBN 0262193981 Cited by: §3.1.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. CoRR abs/1312.6199. External Links: 1312.6199 Cited by: §5.
  • J. Tiedemann (2009) News from OPUS - A collection of multilingual parallel corpora with tools and interfaces. In Recent Advances in Natural Language Processing, N. Nicolov, K. Bontcheva, G. Angelova, and R. Mitkov (Eds.), Vol. V, pp. 237–248. External Links: ISBN 978 90 272 4825 1 Cited by: Appendix B.
  • J. Wieting and K. Gimpel (2017) Pushing the limits of paraphrastic sentence embeddings with millions of machine translations. CoRR abs/1711.05732. External Links: Link, 1711.05732 Cited by: §4.2.
  • L. Wu, Y. Xia, L. Zhao, F. Tian, T. Qin, J. Lai, and T. Liu (2017) Adversarial neural machine translation. CoRR abs/1704.06933. External Links: 1704.06933 Cited by: Appendix E.
  • L. Yu, W. Zhang, J. Wang, and Y. Yu (2016) SeqGAN: sequence generative adversarial nets with policy gradient.. CoRR abs/1609.05473. Cited by: Appendix E.
  • Y. Zhang, M. Galley, J. Gao, Z. Gan, X. Li, C. Brockett, and B. Dolan (2018) Generating informative and diverse conversational responses via adversarial information maximization. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 1815–1825. Cited by: §4.3, §4.3, §4.3, §5.

Appendix A The Gibbs-enum Algorithm for Finding Trigger Inputs

In this section, we briefly describe the gibbs-enum algorithm, we also refer readers to He and Glass (2019) for the intuition and full development of the algorithm. The goal of gibbs-enum is that given a (malicious) target sentence of length , and a trained seq2seq model, we aim to find a trigger input sequence , which is a sequence of one-hot vectors of length , to minimize the negative log-likelihood (NLL) that the model will generate . We formulate our objective function below:

(8)

A regularization term is applied when looking for io-sample-min/avg-hit, which is the LM score of :

(9)

In our experiments we set to when searching for io-sample-min/avg-hit, otherwise 0.

During gibbs-enum, every time we focus on a single index slot , and find the best one-hot while keeping the other parts of fixed:

(10)

Since the size of vocabulary is finite, it is possible to try all of them and get the best local . But it is still costly since each try requires a forwarding call to the neural seq2seq model. To address this, gradient information is utilized to narrow the range of search. We temporarily regard as a continuous vector and calculate the gradient of the negated loss function with respect to it:

(11)

Then, we try only the indexes that have the highest value on the gradient vector. The procedure is formulated in Algorithm 3.

  Input: a trained seq2seq model, target sequence , a trained LSTM LM, objective function , input length , output length , and target hit type.
  Output: a trigger input
  if hit type is in “io-hit” then
     initialize to be a sample from the LM
  else
     randomly initialize to be a valid input sequence
  end if
  for  do
     for  do
        get gradient , and set list to be the indexes with highest value in the gradient vector
        for  do
           set to be:
           if  then
              set
           end if
        end for
     end for
     if this sweep has no improvement for  then
        break
     end if
  end for
  return
Algorithm 3 Gibbs-enum algorithm

For hyper-parameters of gibbs-enum, (the maximum number of sweeps) is set to 5, (size of the set of indices for enumeration during each update) is set to 100, the algorithm is run 5 times with different random initializations and the trigger input with the best loss is returned. Note that larger hyper-parameters can give slightly higher hit rates, but will be more time-consuming.

Appendix B Data-set Descriptions

Three publicly available conversational dialogue data-sets are used: Ubuntu, Switchboard, and OpenSubtitles. The Ubuntu Dialogue Corpus Lowe et al. (2015) consists of two-person conversations extracted from the Ubuntu chat logs, where a user is receiving technical support from a helping agent for various Ubuntu-related problems. To train the baseline model, we select the first 200k dialogues for training (1.2M sentences / 16M words), and the next 5k dialogues for validation and testing respectively. We select the 30k most frequent words in the training data as our vocabulary, and out-of-vocabulary (OOV) words are mapped to the <UNK> token.

The Switchboard Dialogue Act Corpus 131313http://compprag.christopherpotts.net/swda.html is a version of the Switchboard Telephone Speech Corpus, which is a collection of two-sided telephone conversations, annotated with utterance-level dialogue acts. In this work we only use the conversation text part of the data, and select 1.1k dialogues for training (181k sentences / 1.2M words), 25 dialogues for validation and 25 dialouges for testing. We select the 10k most frequent words in the training data as our vocabulary.

We also report experiments on the OpenSubtitles data-set141414http://www.opensubtitles.org/ Tiedemann (2009). The key difference between the OpenSubtitles data and Ubuntu/Switchboard data is that it contains a large number of malicious sentences, because the data consists of movie subtitles. We randomly select 5k movies for training (each movie is regarded as a big dialogue), which contains 5M sentences and 36M words, and 50 movies for validation and testing respectively. The 30k most frequent words are used as the vocabulary. We show some samples of the three data-sets in Appendix C.

For pre-processing, the text of all three data-sets are lower-cased, and all punctuations are removed. The maximum input sequence length is set to 15, with a maximum output sequence length of 20. Longer input sentences are cropped, and shorter input sentences are padded with <PAD> tokens.

Appendix C Data Samples and Baseline Perplexity Results

Some data samples for Ubuntu, Switchboard, Opensubtitles are shown in Table 7.

Baseline perplexity results are shown Table 8. Note that and for various types of hit types discussed in Section 3.2 are set accordingly, for example, for io-sample-min-hit on the Ubuntu data, is set to -4.19, and is set to -4.08.

Ubuntu
A: anyone here got an ati hd 2400 pro card
working with ubuntu and compiz ?
B: i have an hd 3850
A: is it working with compiz ?
Switchboard
A: what movies have you seen lately
B: lately i ’ve seen soap dish
A: oh
B: which was a
A: that was a lot of fun
OpenSubtitles
B: you ca n’t do that .
A: my husband ’s asleep .
B: your husband know you ’re soliciting ?
A: give us a f*** ’ break .
Table 7: Data samples of Ubuntu, Switchboard and OpenSubtitles Dialogue corpus
Model Test-PPL(NLL)
Ubuntu Switchboard OpenSubtitles
LM 66.29(4.19) 44.37(3.79) 74.74(4.31)
Seq2seq 59.49(4.08) 42.81(3.75) 70.81(4.26)
Table 8: Perplexity (PPL) and negative log-likelihood (NLL) of for baseline models on the test set.

Appendix D Auxiliary Experiment Results for the Malicious Response Problem

Figure 1: Negative Log-probability (NLL) the model assigned to the test list malicious targets (when fed with trigger inputs) or test data samples. The data-set is OpenSubtitles and hit type is io-sample-min-hit. Sentences are separated by <EOS>.

We compare the models behavior before and after negative training in Figure 1. It is shown that negative training effectively reduce probability mass assigned to malicious targets, while keeping the behavior on the test-set unchanged. However, almost every word in the malicious target sentences gets lower probability, especially when FWA is not used. Ideally, we believe a “polite” language generator should only assign low probability to the key words in a malicious sentence. For example, in the target “i shall take my revenge”, only the “take my revenge” part should be penalized. Whether negative training has the potential to truly teach “manners” to a language generator is worth further investigation.

Appendix E Configurations of the GAN Approach for Dialogue Response Generation

We use the log derivative trick Wu et al. (2017) for the gradient derivation of the generator:

(12)

where is one input data sample. Then the generator is updated by:

(13)

where is the learning rate for the generator. Note that because is negative, will be eventually scaled positively and added to .

In our GAN experiments, different values in the set are tried for and the best result is reported.

We now describe the model configuration of the discriminator used in our work. The discriminator model configuration is similar to the one used in Yu et al. (2016). First is converted to as described in Section 2. Then a 1D-convolution operation and max-over-time pooling operation Kim (2014) is applied, with 300 filters of window size 3/4/5/6, respectively. The resulting representation vector is denoted as . .

The same network forward pass is also applied for to get . Finally, and are concatenated and passed to a 3-layer high-way DNN classifier Srivastava et al. (2015) of hidden size 2000.

Following Goodfellow et al. (2014a), we alternately train the discriminator and the generator with a ratio of 3:1. The discriminator is trained with a learning rate of 0.01. Similar to negative training, our experiments show that positive training (or “teacher forcing” in some literature) is crucial to aid the model to maintain its original performance, during GAN training.

Appendix F Auxiliary Experiment Results for the Frequent Response Problem

In Talbe 9, we show BLEU-4 scores for the model after negative training. It is shown that the BLEU-4 performance does not become worse (or even improves) after negative training. This result, to some extent, verifies our claim that the quality of model samples is not compromised.


Ubuntu PPL BLEU-4 M-ratio
Baseline 59.49 0.443 4.4%
Neg-train 60.06 0.464 1.3%
Switchboard PPL BLEU-4 M-ratio
Baseline 42.81 0.476 37.4%
Neg-train 44.32 0.475 9.8%
OpenSubtitles PPL BLEU-4 M-ratio
Baseline 70.81 0.502 20%
Neg-train 75.71 0.526 0.6%
Table 9: BLEU-4 results for negative training.

Appendix G More Model Samples for Comparison

We show more samples for comparison of different training methods in Table 10, 11 and 12.

Switchboard
Input: people do n’t realize it
Baseline: i do n’t know
GAN: i do n’t know
Neg-train: i think it ’s a very interesting
Input: they ’re sort of siamese in a way
Baseline: uh-huh
GAN: uh-huh
Neg-train: yeah
Input: they have the gloves on the paws
Baseline: uh-huh
GAN: uh-huh
Neg-train: i ’m really glad that they do n’t have that
Input: they ’re usually two colored
Baseline: i think that ’s a good idea
GAN: i think that ’s a good idea
Neg-train: yeah
Input: she ’s predominantly black
Baseline: i do n’t know
GAN: he ’s a little bit
Neg-train: he ’s a very good actor
Input: she has chocolate paws chocolate stomach
and silver on her hindquarters
Baseline: uh-huh
GAN: uh-huh
Neg-train: i think they have a great deal of <unk>
Input: she ’s quite attractive looking
Baseline: he ’s a good actor
GAN: he ’s a little bit
Neg-train: he ’s a good guy
Input: hope the stud will find her attractive
Baseline: i do n’t know
GAN: i do n’t know
Neg-train: i ’ve been in a very very good shape
Input: that she has kittens that look like her
Baseline: i think that ’s a good idea
GAN: i think that ’s a good idea
Neg-train: yeah
Input: that she has a whole bunch of kittens
Baseline: i think that ’s a good idea
GAN: i think that ’s a good idea
Neg-train: yeah
Input: we have to be careful
Baseline: i ’m not sure
GAN: i ’m not sure
Neg-train: i ’m in the dallas area
Input: we have to get a particular color point stud
Baseline: i do n’t know
GAN: i do n’t know
Neg-train: i ’m just going to
Input: you ca n’t just breed them with anything
Baseline: i do n’t know
GAN: i do n’t know
Neg-train: i have n’t been to a lot of the movies lately
Input: we have to get a silver point himie
Baseline: i do n’t know
GAN: i do n’t know
Neg-train: i ’m going to have to go back to work
Table 10: Consecutive model samples before and after negative training for Switchboard data.
Ubuntu
Input: just how good ubuntu
Baseline: i ’m not sure i ’m just a <unk>
GAN: i ’m not sure if it ’s possible but
i ’m sure there ’s a way to do it
Neg-train: i have a server that i want
to install on a server
Input: help please i broke xp with linux _url_
i ca n’t get an invite
Baseline: what is the problem
GAN: what is the problem
Neg-train: what is the issue
Input: the case it ’s possible that you might need to
repair the xp installation
Baseline: i have a <unk> and i want to install ubuntu
on a usb drive
GAN: i have a <unk> and i want to install ubuntu
on a usb drive
Neg-train: i have a live cd and i want to install
it on a hard drive
Input: but that takes hours
Baseline: i do n’t know i ’m not sure
GAN: i do n’t know
Neg-train: i ’m not sure i think it ’s a hardware problem
Input: how to i enable tv-out via fglrx
Baseline: you can use the <unk> driver
GAN: you can use the nvidia driver from the ubuntu wiki
Neg-train: you can try the <unk> driver
Input: they may be able to help you in channel # ati
Baseline: thanks
GAN: thanks
Neg-train: i have a ati radeon mobility radeon hd
Input: nobody feels like responding in # ati
Baseline: i have no idea what you ’re talking about
GAN: i have no idea
Neg-train: i think i found it
Input: its always worked perfectly for me why ’s it crap
Baseline: i do n’t know i ’m not sure
GAN: i do n’t know i ’m not sure
Neg-train: i do n’t know i just use it to make it work
Input: i have seen better performance on an
xp vm under parallels
Baseline: i have a <unk> and i have a <unk>
GAN: i have a <unk> and a <unk> <unk>
Neg-train: i have a vm with vmware server running on it
Input: use 2kpro and no issues whatsoever with 256mb of
ram and a 10gig vdrive
Baseline: i have a <unk> and it ’s not a big deal
GAN: i have a <unk> and it ’s not a big deal
Neg-train: i have a p4 with 1gb ram and i have a 1gb swap
Input: will someone please help me with php config on ubuntu
Baseline: what ’s the problem
GAN: what ’s the problem
Neg-train: what ’s wrong
Input: you are having trouble with
if someone knows how to help they will try
Baseline: i ’m trying to get a <unk> to work with ubuntu
GAN: i ’m not sure i ’m just trying to help you
Neg-train: i ’m trying to get a <unk> of my <unk> and
i want to know if it ’s possible
Table 11: Consecutive model samples before and after negative training for Ubuntu data.
Opensubtitiles
Input: i feel really bad marco so bad
Baseline: i ’m sorry
GAN: i ’m not a <unk>
Neg-train: i feel so bad
Input: me at least that you ’ll come back tell me you ’re not ill
Baseline: i ’m not going to be in the house
GAN: i ’m not going to let you go
Neg-train: i ’m so glad you are
Input: tell me you forgive me
Baseline: i ’m not going to let you go
GAN: i ’m not going to leave you
Neg-train: you ’re not the only one
Input: will you forgive me
Baseline: i ’ll be all right
GAN: i ’ll be all right
Neg-train: you ’ll never see the
Input: what do you want
Baseline: i ’m going to the hospital
GAN: i ’m going to the <unk>
Neg-train: i want to go
Input: get up hurry i got an idea
Baseline: i ’m not going to be a <unk>
GAN: i ’m not going to see you
Neg-train: you ’re not getting a bad idea
Input: where are we going
Baseline: i do n’t know
GAN: to the <unk>
Neg-train: to the hotel
Input: shut up and keep down
Baseline: i ’m not going to get a doctor
GAN: i ’m not going to get a doctor
Neg-train: we ’re not going
Input: we ’re screwed there ’s no wind
Baseline: i ’m not going to see you
GAN: i ’m not a <unk>
Neg-train: we ’re in the <unk>
Input: how ’d your old lady come to think you was kidnapped
Baseline: i do n’t know
GAN: i do n’t know
Neg-train: i ’d like to ask you to take a look at it
Input: your son is in our hands
Baseline: i ’m not going to
GAN: he ’s not here
Neg-train: he ’s not going to leave
Input: we ’ll stay in touch
Baseline: i ’m sorry
GAN: i ’m not going to
Neg-train: i think we ’ve got a <unk>
Input: get the money ready or else
Baseline: i ’m not going to be a <unk>
GAN: i ’m not going to be here
Neg-train: i want the <unk>
Input: i think they got it
Baseline: you know what
GAN: oh
Neg-train: it ’s a good thing
Input: you wanted to scare them
Baseline: i do n’t know
GAN: i ’m not a <unk>
Neg-train: i ’m a coward
Table 12: Consecutive model samples before and after negative training for Opensubtitles data.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
388264
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description