Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking


Jian-Guo Zhang, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, Caiming Xiong
University of Illinois at Chicago
Zhejiang University
Salesforce Research

{jzhan51,psyu}@uic.edu
wanyao@zju.edu.cn {k.hashimoto,wu.jason,rsocher,cxiong}@salesforce.com
Abstract

Dialog State Tracking (DST) is a core component in task-oriented dialog systems. Existing approaches for DST usually fall into two categories, i.e., the picklist-based and the span-based methods. On the one hand, the picklist-based methods perform classification for each slot over a candidate-value list, under the condition that a pre-defined ontology is accessible. However, this is often impractical in industry, since it is hard to get full access to the ontology. On the other hand, the span-based methods track the value of each slot by finding text spans in the dialog context. However, due to the diversity of value descriptions, the target value does not always appear as an exact string in the dialog context. To mitigate these issues, this paper proposes a Dual Strategy for DST (DS-DST) that borrows advantages from both the picklist-based and span-based methods, by classifying over a picklist or finding values from a slot span. Empirical results show that DS-DST achieves state-of-the-art joint accuracy of 51.21% on the MultiWOZ 2.1 dataset, and 53.30% when the full ontology is accessible.


1 Introduction

With the prevalence of virtual assistants such as Google Assistant, Cortana, and Alexa, task-oriented dialog systems are playing important roles in facilitating our daily life, such as booking hotels, reserving restaurants, and making travel plans. Dialog State Tracking (DST) is a core component of task-oriented dialog systems gao2019neural; young2013pomdp. It estimates the state of a conversation based on the current utterance and the conversational history. In DST, a state consists of a set of (domain, slot, value) triplets, which represent the values of the requested slots given the active domain or intent at the current turn. DST aims to track all the states accumulated across the conversational turns. Fig. 1 shows a dialogue with the corresponding annotated turn states.

Figure 1: An example of the dialog state tracking process for booking a hotel and reserving a restaurant. Each turn contains a user utterance (grey) and a system utterance (orange). The dialog state tracker (green) tracks all the (domain, slot, value) triplets up to the current turn. Blue denotes new states that appear at that turn. Best viewed in color.

Traditional approaches for DST usually rely on hand-crafted features and domain-specific lexicons (besides the ontology) henderson2014word; zilka2015incremental; wen2016network; mrkvsic2015multi. Recent data-driven deep learning models have shown promising performance in DST xu2018end; wu2019transferable; lee2019sumbt; goel2019hyst; chao2019bert; ren2018towards; ramadan2018large; liu2017end; zhong2018global, and they can be categorized into two classes xu2018end; gao2019dialog; ramadan2018large; zhong2018global, i.e., picklist-based and span-based. The picklist-based approaches ramadan2018large; zhong2018global treat domain-slot pairs as picklist-based slots, where the values are predicted by performing classification over a candidate-value list. They usually require full access to a pre-defined ontology. In practice, however, we often have only a partial ontology, since full ontology access is hard and expensive to obtain in industry. Even if a full ontology exists, it is computationally expensive to enumerate all the values when the ontology is very large and diverse wu2019transferable; xu2018end, e.g., a time slot could have an unlimited number of possible values. The span-based approaches gao2019dialog; xu2018end treat domain-slot pairs as span-based slots, where the values are found through span matching with start and end positions in the dialog context. However, it is nontrivial to handle situations where values do not appear in the dialog context or are described in various ways by users.

To tackle the aforementioned challenges and borrow advantages from both worlds, this paper proposes to treat the domain-slot pairs of a dialog state as either span-based slots or picklist-based slots. The value of each span-based slot is found through span matching with start and end positions in the dialog context, and the value of each picklist-based slot is found in the corresponding candidate-value list. Whether a slot is treated as span-based or picklist-based is decided by human heuristics. For example, when users book a hotel, the request for parking is usually yes or no with limited choices, so we treat such slots as picklist-based slots; whereas the number of days a user will stay has unlimited values that can be found in the context, so we treat such slots as span-based slots. Since our approach can treat all slots as span-based slots or as picklist-based slots according to real scenarios or datasets, it remains flexible when the dialog system has access to the ontology or when all the values can be found in the dialog context. We are inspired by recent successes in visual question answering xiong2016dynamic and text reading comprehension chen2018neural, where the former selects from a candidate-value list and the latter extracts text spans with start and end positions. We design a Dual-Strategy Dialog State Tracking model (DS-DST), which builds on BERT question answering models devlin2018bert and enables slot-value predictions for both span-based slots and picklist-based slots.

Our contributions are summarized as follows.

We design an approach which treats domain-slot pairs as span-based slots and picklist-based slots based on human heuristics. Our approach mitigates the limitations of relying on a fixed vocabulary and of handling unseen span values, and it is flexible across different datasets and real scenarios;

We build a dual strategy model for multi-domain DST, which achieves state-of-the-art results on the MultiWOZ 2.1 dataset eric2019multiwoz.

2 Related Work

DST tracks dialog states in complicated conversations across multiple domains with many slots. It has been a hot research topic during the past few years, along with the development of the Dialogue State Tracking Challenges williams2013dialog; henderson2014second; henderson2014third; kim2017fourth; kim2016fifth; rastogi2019towards. Traditional approaches usually rely on hand-crafted features or domain-specific lexicons henderson2014word; wen2016network, which are difficult to extend to new domains. In addition, they require a pre-defined ontology, in which the values of a slot are constrained to a set of candidate values ramadan2018large; liu2017end; zhong2018global; lee2019sumbt. Furthermore, these approaches are hard to adapt to unseen values and large vocabularies. To tackle these issues, several methods have been proposed to extract slot values through span matching with start and end positions in the dialog context. For example, xu2018end utilizes an attention-based pointer network to copy values from the dialog context. gao2019dialog treats DST as a reading comprehension problem and incorporates a slot carryover model to copy states from previous conversational turns. However, tracking states only from the dialog context is not enough, as many values in DST cannot be found in the context due to annotation errors or diverse descriptions of slot values from users. On the other hand, pre-trained models such as BERT devlin2018bert and GPT radfordimproving have shown promising performance on many downstream tasks. Among them, DSTreader gao2019dialog utilizes BERT as word embeddings for dialog contexts, SUMBT lee2019sumbt employs BERT to extract representations of candidate values, and BERT-DST chao2019bert adopts BERT to encode the inputs of the user turn as well as the previous system turn. Different from these approaches, our method applies BERT from the perspective of visual question answering xiong2016dynamic and text reading comprehension chen2018neural.

Recent generative approaches lei2018sequicity; wu2019transferable generate slot values for DST without relying on fixed vocabularies and spans. However, such generative methods may produce ill-formatted strings (e.g., repeated words) when generating long strings, which are common in DST. For example, a hotel address may be long, and a small difference makes the whole dialog state tracking incorrect. By contrast, both the picklist-based and span-based methods rely on existing strings rather than generating them.

Figure 2: The architecture of the proposed DS-DST model. The left part is a fixed BERT model which outputs the representations of the candidate values in the list of each picklist-based slot (purple). The right part is the other, fine-tuned BERT model which outputs representations for the concatenation of each domain-slot pair and the recent dialog context. The aggregate [CLS] representations are used for the slot-gate classification of all domain-slot pairs. The token-level representations are used to predict spans for each span-based domain-slot pair (orange). The aggregate representations, together with the candidate-value representations from the fixed BERT, are used for cosine similarity matching over the candidate-value list for each picklist-based domain-slot pair (purple). Best viewed in color.

3 DS-DST: a Dual Strategy for DST

Let $X_t = \{(U^{sys}_1, U^{usr}_1), \ldots, (U^{sys}_t, U^{usr}_t)\}$ denote the set of system utterance and user utterance pairs of a dialogue context with $t$ turns. Each turn talks about a particular domain (e.g., hotel), and a certain number of slots (e.g., price range) are associated with the domain. We denote all the possible domain-slot pairs as $S = \{S_1, \ldots, S_J\}$, where $J$ is the total number of pairs and each domain-slot pair $S_j$ is a short sequence of tokens. The task of DST is to track the states over the whole dialogue; in other words, at each turn $t$ we need to predict the value of each $S_j$ (e.g., $\langle$hotel, parking, yes$\rangle$) considering the context $X_t$, which contains $M_t$ tokens. We follow recent strategies wu2019transferable; xu2018end and predict the values of all the domain-slot pairs in $S$ at each turn. Our intuition is that we can find values from pre-defined picklists if we have access to the partial or full candidate-value lists; otherwise, we need to directly find the values as text spans in the dialogue context. We call the former type a picklist-based slot, and the latter a span-based slot. Here we assume that the first $N$ domain-slot pairs in $S$ are treated as the span-based slots, and the remaining $J-N$ pairs as the picklist-based slots. Each picklist-based slot $S_j$ has a list of possible candidate values $V_j = \{V^1_j, \ldots, V^{L_j}_j\}$, where $L_j$ is the size of the picklist and each value $V^l_j$ is a sequence of tokens.

We then propose a novel dual strategy for DST. Fig. 2 shows the overview of our model. We first utilize a pre-trained BERT devlin2018bert to encode the dialogue context along with each domain-slot pair in $S$, obtaining contextualized representations conditioned on the domain-slot information. We then use a slot gate to handle special types of values. For the span-based slots, we utilize a two-way linear mapping to find text spans. For the picklist-based slots, we select the most plausible values from the picklists based on the contextual representations.

3.1 Slot-Context Encoder

We employ a pre-trained BERT devlin2018bert to encode the domain-slot types and dialog contexts. For the domain-slot pair $S_j$ and the dialog context $X_t$ at turn $t$, we concatenate them and get the corresponding representations:

$$R_{t,j} = \mathrm{BERT}\big([\mathrm{CLS}] \oplus S_j \oplus [\mathrm{SEP}] \oplus X_t \oplus [\mathrm{SEP}]\big) \quad (1)$$

where [CLS] is a special token added in front of every sample, and [SEP] is a special separator token. The outputs of BERT in Eq. (1) can be decomposed as $R_{t,j} = [r^{\mathrm{CLS}}_{t,j}, r^{1}_{t,j}, \ldots, r^{M_t}_{t,j}]$, where $r^{\mathrm{CLS}}_{t,j}$ is the aggregate representation of the whole input token sequence and $r^{1}_{t,j}, \ldots, r^{M_t}_{t,j}$ are the token-level representations. They will be used for the slot-value predictions in the following sections, and this BERT is fine-tuned during the training process.
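To make the encoder concrete, the following is a minimal sketch of Eq. (1) using the HuggingFace transformers library; the function and variable names are illustrative and not the authors' released code.

```python
# Minimal sketch of the slot-context encoder (Sec. 3.1); names are illustrative.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")  # fine-tuned during training

def encode_slot_context(domain_slot: str, dialog_context: str):
    # Builds "[CLS] domain-slot [SEP] dialog context [SEP]" as in Eq. (1)
    inputs = tokenizer(domain_slot, dialog_context,
                       truncation=True, max_length=512, return_tensors="pt")
    outputs = bert(**inputs)
    token_reps = outputs.last_hidden_state   # [1, seq_len, hidden]
    cls_rep = token_reps[:, 0]               # aggregate [CLS] representation
    return cls_rep, token_reps

cls_rep, token_reps = encode_slot_context(
    "hotel parking", "i need a hotel in the east with free parking .")
```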

3.2 Slot-Gate Classification

As there are many domain-slot pairs in multi-domain dialogues, it is nontrivial to correctly predict whether a domain-slot pair appears at each turn of the dialogue. Here we add a slot-gate classification module wu2019transferable; xu2018end to our neural network. Specifically, at turn $t$, the classifier makes a decision among $\{none, dontcare, prediction\}$, where $none$ denotes that a domain-slot pair is not mentioned at this turn, $dontcare$ implies that the user can accept any value for this slot, and $prediction$ represents that the slot should be processed by the model with a real value. We utilize the aggregate representation $r^{\mathrm{CLS}}_{t,j}$ for the slot-gate classification, and the probability for the domain-slot pair $S_j$ at turn $t$ is calculated as:

$$P^{gate}_{t,j} = \mathrm{softmax}\big(W_{gate} \cdot r^{\mathrm{CLS}}_{t,j} + b_{gate}\big) \in \mathbb{R}^{3} \quad (2)$$

where $W_{gate}$ and $b_{gate}$ are the learnable parameters and bias, respectively.

The loss for the slot-gate classification is computed as:

$$\mathcal{L}_{gate} = -\sum_{j=1}^{J} \log\big(P^{gate}_{t,j} \cdot (y^{gate}_{t,j})^{\top}\big) \quad (3)$$

where $y^{gate}_{t,j}$ is the one-hot gate label for the domain-slot pair $S_j$ at turn $t$.
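A minimal PyTorch sketch of the slot-gate classifier and its loss (Eqs. (2)-(3)) is shown below; the three gate classes follow the text, while the shapes and variable names are illustrative.

```python
# Minimal sketch of the slot-gate classifier (Sec. 3.2) over {none, dontcare, prediction}.
import torch
import torch.nn as nn

hidden_size, num_gates = 768, 3
gate_classifier = nn.Linear(hidden_size, num_gates)   # W_gate, b_gate in Eq. (2)

def gate_loss(cls_rep: torch.Tensor, gate_label: torch.Tensor) -> torch.Tensor:
    logits = gate_classifier(cls_rep)                  # [batch, 3]
    # cross-entropy against the gate label, as in Eq. (3)
    return nn.functional.cross_entropy(logits, gate_label)

cls_rep = torch.randn(4, hidden_size)                  # e.g., 4 domain-slot pairs
labels = torch.tensor([0, 2, 2, 1])                    # none / prediction / prediction / dontcare
loss = gate_loss(cls_rep, labels)
```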

3.3 Span-Based Slot-Value Prediction

For each span-based slot, its value is mapped to a span with start and end positions in the dialog context; e.g., the value 15:00 of the restaurant book time slot can be found directly in the user utterance. We take the token-level representations $r^{1}_{t,j}, \ldots, r^{M_t}_{t,j}$ of the dialog context as input and apply a two-way linear mapping to get a start vector $\alpha_{t,j} \in \mathbb{R}^{M_t}$ and an end vector $\beta_{t,j} \in \mathbb{R}^{M_t}$:

$$[\alpha_{t,j}, \beta_{t,j}] = W_{span} \cdot [r^{1}_{t,j}, \ldots, r^{M_t}_{t,j}] + b_{span} \quad (4)$$

where $W_{span}$ and $b_{span}$ are the learnable parameters and bias, respectively.

The probability of the $i$-th word being the start position of the span is computed as $P^{start}_{t,j,i} = \mathrm{softmax}(\alpha_{t,j})_{i}$, and the loss for the start position prediction is calculated as:

$$\mathcal{L}_{start} = -\sum_{j=1}^{N} \log\big(P^{start}_{t,j} \cdot (y^{start}_{t,j})^{\top}\big) \quad (5)$$

where $y^{start}_{t,j}$ is the one-hot start position label for the domain-slot pair $S_j$ at turn $t$.

Similarly, we obtain the loss $\mathcal{L}_{end}$ for the end position prediction. The total loss for the span-based slot-value prediction is $\mathcal{L}_{span} = \mathcal{L}_{start} + \mathcal{L}_{end}$.
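The span head can be sketched as follows in PyTorch, assuming the token-level representations come from the slot-context encoder; names and shapes are illustrative.

```python
# Minimal sketch of span-based slot-value prediction (Sec. 3.3).
import torch
import torch.nn as nn

hidden_size = 768
span_head = nn.Linear(hidden_size, 2)                    # W_span, b_span in Eq. (4)

def span_loss(token_reps, start_label, end_label):
    logits = span_head(token_reps)                       # [batch, seq_len, 2]
    start_logits, end_logits = logits.split(1, dim=-1)
    start_logits = start_logits.squeeze(-1)              # start vector (alpha)
    end_logits = end_logits.squeeze(-1)                  # end vector (beta)
    loss_start = nn.functional.cross_entropy(start_logits, start_label)  # Eq. (5)
    loss_end = nn.functional.cross_entropy(end_logits, end_label)
    return loss_start + loss_end                         # L_span

token_reps = torch.randn(2, 128, hidden_size)
loss = span_loss(token_reps, torch.tensor([17, 5]), torch.tensor([18, 7]))
```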

3.4 Picklist-Based Slot-Value Prediction

Each picklist-based slot has several candidate values, e.g., the parking slot in the hotel domain has possible values such as yes and no. At turn $t$, for the $j$-th domain-slot pair, we first use a separate pre-trained BERT to get the aggregate representations of the corresponding candidate values:

$$c^{l}_{j} = \mathrm{BERT}_{fixed}\big([\mathrm{CLS}] \oplus V^{l}_{j} \oplus [\mathrm{SEP}]\big) \quad (6)$$

where $l \in \{1, \ldots, L_j\}$ and $L_j$ is the number of candidate values. Note that during the training process the model parameters of this separate BERT are fixed.

We formulate a relevance score between the aggregate representation $r^{\mathrm{CLS}}_{t,j}$ and a reference candidate $c^{l}_{j}$ by the cosine similarity lin2017adversarial:

$$\mathrm{sim}\big(r^{\mathrm{CLS}}_{t,j}, c^{l}_{j}\big) = \frac{r^{\mathrm{CLS}}_{t,j} \cdot c^{l}_{j}}{\lVert r^{\mathrm{CLS}}_{t,j} \rVert \, \lVert c^{l}_{j} \rVert} \quad (7)$$

where $r^{\mathrm{CLS}}_{t,j}$ and $c^{l}_{j}$ are the aggregate representations from the slot-context encoder and the reference candidate value, respectively.

During the training process, we employ a hinge loss to enlarge the difference between the similarity of $r^{\mathrm{CLS}}_{t,j}$ to the target value and its similarity to the most similar one among the other values in the candidate-value list:

$$\mathcal{L}_{picklist} = \sum_{j=N+1}^{J} \max\Big(0,\; \lambda - \mathrm{sim}\big(r^{\mathrm{CLS}}_{t,j}, c^{tgt}_{j}\big) + \max_{l \neq tgt} \mathrm{sim}\big(r^{\mathrm{CLS}}_{t,j}, c^{l}_{j}\big)\Big) \quad (8)$$

where $\lambda$ is a constant margin and $c^{tgt}_{j}$ denotes the representation of the target value.
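A minimal PyTorch sketch of Eqs. (7)-(8) is given below; the margin value and variable names are illustrative assumptions. At inference time, the candidate with the highest cosine similarity would be selected.

```python
# Minimal sketch of picklist-based slot-value prediction (Sec. 3.4):
# cosine similarity plus a hinge loss against the hardest non-target value.
import torch
import torch.nn.functional as F

def picklist_loss(cls_rep, candidate_reps, target_idx, margin=0.3):
    # cls_rep: [hidden]; candidate_reps: [num_candidates, hidden] from the fixed BERT
    sims = F.cosine_similarity(cls_rep.unsqueeze(0), candidate_reps, dim=-1)  # Eq. (7)
    target_sim = sims[target_idx]
    # most similar candidate other than the target
    non_target = torch.cat([sims[:target_idx], sims[target_idx + 1:]])
    hardest_sim = non_target.max()
    return F.relu(margin - target_sim + hardest_sim)                          # Eq. (8)

cls_rep = torch.randn(768)
candidate_reps = torch.randn(4, 768)   # e.g., encoded "yes", "no", "free", "dontcare"
loss = picklist_loss(cls_rep, candidate_reps, target_idx=0)
```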

3.5 Training Objective

During the training process, the above three modules are jointly trained and share the parameters of the fine-tuned BERT. We optimize the summation of the different losses:

$$\mathcal{L} = \mathcal{L}_{gate} + \mathcal{L}_{span} + \mathcal{L}_{picklist} \quad (9)$$

4 Experimental Setup

Domain | Hotel | Train | Restaurant | Attraction | Taxi | All Domains | Total Turns
Slots | price range, type, parking, book stay, book day, book people, area, stars, internet, name | destination, day, departure, arrive by, book people, leave at | food, price range, area, name, book time, book day, book people | area, name, type | leave at, destination, departure, arrive by | - | -
Train | 3381 | 3103 | 3813 | 2717 | 1654 | 8421 | 56668
Validation | 416 | 484 | 438 | 401 | 207 | 1001 | 7374
Test | 394 | 494 | 437 | 395 | 195 | 1000 | 7368
Table 1: The dataset information of MultiWOZ 2.1. There are in total 30 domain-slot pairs of 5 selected domains as shown in the top two rows. The last three rows show the number of dialogues related to each domain. The last two columns show the total number of dialogues and the total number of conversational turns for all the domains.

4.1 Dataset

To demonstrate the performance of our DS-DST, we use the recently released MultiWOZ 2.1 dataset eric2019multiwoz. It is one of the largest multi-domain dialogue corpora to date, with seven distinct domains and over 10,000 dialogues. Compared with the original MultiWOZ 2.0 dataset budzianowski2018multiwoz, it corrects dialog states, typos, and mis-annotations to reduce substantial noise, making the dataset more challenging (more details can be found in eric2019multiwoz). As the hospital and police domains contain very few dialogues and only appear in the training set, we ignore them in our experiments, following wu2019transferable. We adopt the remaining five domains (train, restaurant, hotel, taxi, attraction) and obtain 30 domain-slot pairs in total. Table 1 summarizes the domain-slot pairs and their corresponding statistics. We follow the standard training/validation/test split provided in the original dataset budzianowski2018multiwoz; eric2019multiwoz, and the data pre-processing of wu2019transferable. Instead of formulating the candidate-value list for each picklist-based slot directly from the incomplete ontology lee2019sumbt of MultiWOZ 2.1, we construct the candidate-value list for each picklist-based slot by traversing the dataset.
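As a rough illustration of this construction step, the following sketch collects the observed values for each picklist-based slot by traversing annotated dialogues. The annotation format (a list of turns carrying (domain-slot, value) pairs) and the file name are assumptions, not the actual pre-processing code.

```python
# Sketch: build candidate-value lists by traversing the data instead of the ontology.
from collections import defaultdict
import json

def build_candidate_lists(dialogs, picklist_slots):
    candidates = defaultdict(set)
    for dialog in dialogs:
        for turn in dialog["turns"]:
            for domain_slot, value in turn["belief_state"]:
                if domain_slot in picklist_slots and value not in ("", "none"):
                    candidates[domain_slot].add(value)
    return {slot: sorted(vals) for slot, vals in candidates.items()}

# Hypothetical usage with a preprocessed dialogue file:
# dialogs = json.load(open("train_dials.json"))
# picklists = build_candidate_lists(dialogs, {"hotel-parking", "hotel-area"})
```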

4.2 Models

Due to the lack of an official pre-processing standard for the MultiWOZ dataset (the pre-processing provided in wu2019transferable is currently suggested by the data providers: http://dialogue.mi.eng.cam.ac.uk/index.php/corpus/), different pre-processing choices would affect the performance evaluation (more details can be found in Sec. 6). To make a fair comparison, we only adopt several state-of-the-art baselines which either follow the same data pre-processing as wu2019transferable or whose results are provided by the data providers eric2019multiwoz:

SpanPtr xu2018end: It applies a pointer-network based model to find text spans with start and end pointers for each domain-slot pair.

FJST eric2019multiwoz: It contains a bidirectional LSTM network to encode the dialog context and a separate feedforward network to predict each dialog state slot.

HyST goel2019hyst: It is a hybrid approach based on hierarchical RNNs and an open-vocabulary generation.

DSTreader gao2019dialog: It models DST from the perspective of text reading comprehension and uses a pre-trained BERT to provide word embeddings.

TRADE wu2019transferable: It contains a slot-gate module for slot classification and a pointer generator for state generation.

Among the above baselines, SpanPtr and TRADE were originally evaluated on the DSTC2 dataset henderson2014second and the MultiWOZ 2.0 dataset, respectively. We utilize the publicly available code for TRADE (https://github.com/jasonwu0731/trade-dst) and implement SpanPtr ourselves to evaluate these two models on the MultiWOZ 2.1 dataset.

For our proposed methods, we design three variants:

DST-Span: Similar to xu2018end, it treats all domain-slot pairs as span-based slots, where corresponding values for each slot are extracted through text spans (string matching) with start and end positions in the dialog context;

DST-Picklist: It treats all domain-slot pairs as picklist-based slots, where corresponding values for each slot are found in the candidate-value list;

DS-DST: It includes both span-based slots and picklist-based slots. In the default setting, since slots related to time and number can take many possible values, e.g., the restaurant book time slot can take almost any time value depending on the user, we treat these kinds of slots as span-based slots, resulting in five slot types across four domains (nine domain-slot pairs in total). The other slots have limited sets of values, e.g., the hotel parking slot has the candidate values yes and no; we treat these kinds of slots as picklist-based slots, and there are 21 such domain-slot pairs in total.

During evaluation, we evaluate all models using the joint accuracy metric. At each turn, the joint accuracy is 1.0 if and only if all (domain, slot, value) triplets are predicted correctly, and 0 otherwise.
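A minimal sketch of this metric, where each per-turn state is represented as a set of (domain, slot, value) triplets:

```python
# Sketch of the joint accuracy metric: a turn counts as correct
# only if the predicted state matches the gold state exactly.
def joint_accuracy(predictions, ground_truths):
    # predictions / ground_truths: lists of per-turn states,
    # each state being a set of (domain, slot, value) triplets
    correct = sum(1 for pred, gold in zip(predictions, ground_truths) if pred == gold)
    return correct / len(ground_truths)

pred = [{("hotel", "area", "east"), ("hotel", "stars", "4")}]
gold = [{("hotel", "area", "east"), ("hotel", "stars", "4")}]
assert joint_accuracy(pred, gold) == 1.0
```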

4.3 Training Details

We employ the pre-trained BERT model (bert-base-uncased) with 12 layers of 768 hidden units and 12 self-attention heads (https://github.com/huggingface/transformers/tree/master/examples). During the fine-tuning process, we update all the model parameters using the BertAdam optimizer devlin2018bert with learning rate warmup. We follow the learning rate decay mechanism of lee2019sumbt, and we apply early stopping based on the joint accuracy on the validation set. The constant margin $\lambda$ in Eq. (8) is shared by DS-DST and DST-Picklist. The input sequences are truncated to a maximum total length after WordPiece tokenization for BERT, and training runs for a fixed maximum number of epochs.

Model | Joint Accuracy
SpanPtr xu2018end | 29.09%
FJST eric2019multiwoz† | 38.00%
HyST goel2019hyst† | 38.10%
DSTreader gao2019dialog† | 36.40%
TRADE wu2019transferable | 45.96%
DST-Span (ours) | 40.39%
DS-DST (ours) | 51.21%
DST-Picklist (ours) | 53.30%
Table 2: Joint accuracy on the test set of MultiWOZ 2.1. †: results reported by eric2019multiwoz.

5 Experimental Results

Table 2 shows the results on the test set. We can see that DS-DST and DST-Picklist achieve the top two performances, surpassing the current state-of-the-art TRADE model by 5.25% and 7.34%, respectively. Comparing DST-Span and DS-DST, we find that treating slots as a mix of span-based and picklist-based slots is indeed helpful in multi-domain DST. The performance of DST-Picklist shows that our method could further improve DST performance when the full ontology or database is accessible.

Compared with the methods that predict all slot values through span matching with start and end positions in the dialog context, DST-Span, which exploits the strength of BERT, outperforms SpanPtr by 11.30%, and it also outperforms DSTreader, which only uses the pre-trained BERT model for word embeddings. We attribute the performance gap between DST-Span and the other two variants of our model to the limitations of span matching, since some values cannot be found in the dialog context due to annotation errors or diverse descriptions from users. For example, the ground-truth label for hotel parking is 'yes' while it is described in a different way as 'need parking' in the user utterances.

5.1 Analysis

In this paper, we design a dual-strategy model which classifies domain-slot pairs into span-based slots and picklist-based slots based on human heuristics. To further investigate the effectiveness of this strategy, we redesign the SpanPtr model, which originally predicts the value of each domain-slot pair through attention-based span matching with start and end positions in the dialog context. The new model, SpanPtr-New, follows the SpanPtr architecture except that some slots are treated as picklist-based slots following our default setting, with values found in the candidate-value lists.

Table 3 presents the comparative results. DST-Picklist outperforms DST-Span by 12.91%, and SpanPtr-New outperforms SpanPtr by 13.08%, which implies that our dual strategy of using both span-based slots and picklist-based slots can largely improve DST performance when tracking states across different domains. Furthermore, compared with SpanPtr and SpanPtr-New, DST-Span and DS-DST improve the performance by 11.30% and 9.04%, respectively. In addition, DST-Span and DS-DST both achieve better performance than DSTreader, which verifies the effectiveness of our way of applying pre-trained models to the DST task.

Model | SpanPtr | SpanPtr-New | DST-Span | DS-DST
Joint Accuracy | 29.09% | 42.17% | 40.39% | 51.21%
Table 3: Joint accuracy on the test set of MultiWOZ 2.1.
Model | Concatenation | Slot Description | Question Asking
DS-DST | 54.72% | 55.74% | 53.91%
DST-Picklist | 56.33% | 56.75% | 56.17%
Table 4: Joint Accuracy with different representations of the domain-slot pairs on the validation set. ‘Concatenation’ concatenates the domain and slot directly, ‘Slot Description’ applies a sentence description to describe each domain-slot pair, ‘Question Asking’ represents the domain-slot pair in a question asking way.

As our model is inspired by visual question answering and text reading comprehension, the way we handle domain-slot pairs is similar to question answering models. Here we investigate the impact of different ways of representing domain-slot pairs on the dialog state tracking performance. We design three types of representations: (1) Concatenation: the domain and slot are concatenated directly, e.g., 'hotel price range' represents the price range slot of the hotel domain; this is the default setting used in the paper. (2) Slot Description: a sentence description of the domain-slot pair, e.g., 'price range of hotel'. (3) Question Asking: the domain-slot pair interacts with the dialog context through a question, e.g., 'what is the price range for the hotel?'. More details can be found in the appendix.

Table 4 shows the performance of the different representations on the validation set. We can see that the three types of representations achieve similar performance: using slot descriptions is slightly better than the other two types, and the 'Question Asking' type does not improve the performance. We hypothesize that our designed questions contain only a few variations, e.g., 'what', 'where', 'which', and similar question types for different slots may affect the DST performance. In real applications, it could be better to use slot descriptions when carefully designed descriptions are available for different domain-slot pairs.

Model | Threshold-10 | Threshold-100 | DST-Span | DS-DST
Joint Accuracy | 49.08% | 54.11% | 43.25% | 54.72%
Table 5: Joint accuracy on the validation set based on different variations of choosing span-based slots and picklist-based slots. Threshold-10 and Threshold-100 mean choosing picklist-based slots based on the size of the candidate-value lists.

In the paper's default setting, the domain-slot pairs related to time and number are treated as span-based slots following human heuristics. To investigate the potential effects of different ways of deciding span-based slots and picklist-based slots, we design two more variations, where we select picklist-based slots by setting thresholds on the sizes of the candidate-value lists: (1) Threshold-10: it treats the domain-slot pairs with no more than 10 candidate values as picklist-based slots, and the other domain-slot pairs as span-based slots; (2) Threshold-100: it treats the domain-slot pairs with no more than 100 candidate values as picklist-based slots.
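A small sketch of this threshold-based split, assuming the candidate-value lists built in Sec. 4.1; the slot names and example lists are illustrative only.

```python
# Sketch of the threshold variants: a slot becomes picklist-based if its
# candidate-value list is small enough, otherwise it stays span-based.
def split_slots_by_threshold(candidate_lists, threshold):
    picklist_slots, span_slots = [], []
    for slot, values in candidate_lists.items():
        (picklist_slots if len(values) <= threshold else span_slots).append(slot)
    return picklist_slots, span_slots

candidate_lists = {
    "hotel-parking": ["yes", "no"],
    "restaurant-book time": [f"{h:02d}:{m:02d}" for h in range(24) for m in (0, 15, 30, 45)],
}
picklist_10, span_10 = split_slots_by_threshold(candidate_lists, threshold=10)
```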

Table 5 presents the results of the different variations on the validation set. We can see that treating all slots as span-based slots (DST-Span) does not help multi-domain DST performance. While deciding the slot types based on a threshold is helpful, it still performs worse than choosing span-based slots and picklist-based slots by human heuristics. For example, Threshold-100 treats more slots as picklist-based slots, yet its performance is worse than that of DS-DST, which uses only 21 picklist-based slots chosen by human heuristics.

6 Open Discussion

With the recent release of MultiWOZ 2.0 budzianowski2018multiwoz and its later update MultiWOZ 2.1 eric2019multiwoz, multi-domain dialog state tracking is enjoying popularity for enhancing task-oriented dialog systems, handling tasks across different domains, and supporting a large number of services. However, a potential problem is that no standard way is available to pre-process the MultiWOZ dataset; the test set is usually modified based on different pre-processing choices. For example, wu2019transferable fixes general label errors in the dataset (https://github.com/jasonwu0731/trade-dst/blob/master/utils/fix_label.py), and lee2019sumbt treats all the slot-value labels that do not appear in the ontology as none (see convert_to_glue_format.py in https://github.com/SKTBrain/SUMBT/blob/master/data/multiwoz/original.zip). Moreover, a substantial fraction of turn-level annotations in the MultiWOZ 2.0 dataset contain errors as reported in eric2019multiwoz, and a non-negligible fraction of errors remain in the MultiWOZ 2.1 dataset according to our statistics, so even slight modifications to the test set can make a difference. To make fair comparisons and facilitate research on multi-domain dialog state tracking in task-oriented dialog systems, it is important to standardize the way the test set is handled and the evaluation process.

7 Conclusion

In this work, we present the DS-DST model for multi-domain dialog state tracking, which treats domain-slot pairs as span-based slots and picklist-based slots based on human heuristics. Our strategy matches real scenarios and is flexible with respect to real applications and datasets. DS-DST mitigates the issues in previous work and achieves state-of-the-art results on the MultiWOZ 2.1 dataset.

References

Appendix A Appendix

A.1 Different Representations of the Domain-Slot Pairs

Table 6 presents the three ways of representing domain-slot pairs over the five domains (train, restaurant, hotel, taxi, attraction) of the MultiWOZ 2.1 dataset. The other two domains (police, hospital) appear only in the training set with very few dialogues, and we do not show them here. The slots arrive by, leave at, book stay, book people, and book time belong to the span-based slots (nine domain-slot pairs in total), and the others belong to the picklist-based slots (21 in total). 'Slot Description' and 'Question Asking' are only used in Sec. 5.1, and there could be better ways to design them.

Concatenation | Slot Description | Question Asking
hotel price range | price range of hotel | what is the price range for the hotel?
hotel type | type of hotel | what is the hotel type?
hotel parking | parking of hotel | does the person need parking for the hotel?
hotel book stay | number of days to stay in the hotel | how many days does the person want to stay for the hotel?
hotel book day | date to book the hotel | which day does the person want to book for the hotel?
hotel book people | number of people to book for the hotel | how many people does the person want to book for the hotel?
hotel area | area of hotel | which area does the person want to select for the hotel?
hotel stars | stars of hotel | what star does the person need for the hotel?
hotel internet | internet of hotel | does the person need internet for the hotel?
train destination | destination of the train | where is the destination of the train?
train day | date to book for the train | which day does the person want to book for the train?
train departure | departure place of the train | where is the departure of the train?
train arrive by | arrival time of the train | what time will the train arrive by?
train book people | number of people to book for the train | how many people does the person want to book for the train?
train leave at | time when the train leave | what time will the train leave at?
attraction area | area of attraction | which area does the person want to select for the attraction?
restaurant food | food of restaurant | what type of the food does the person need for the restaurant?
restaurant price range | price range of the restaurant | what is the price range for the restaurant?
restaurant area | area of restaurant | which area does the person want to select for the restaurant?
attraction name | name of attraction | what is the attraction name?
restaurant name | name of restaurant | what is the restaurant name?
attraction type | type of attraction | what is the attraction type?
hotel name | name of hotel | what is the hotel name?
taxi leave at | time when the taxi leave | what time will the taxi leave at?
taxi destination | destination of the taxi | where is the destination of the taxi?
taxi departure | departure place of taxi | where is the departure of the taxi?
restaurant book time | time to book for the restaurant | what time does the person want to book for the restaurant?
restaurant book day | date to book for the restaurant | which day does the person want to book for the restaurant?
restaurant book people | number of people to book for the restaurant | how many people does the person want to book for the restaurant?
taxi arrive by | arrival time of the taxi | what time will the taxi arrive by?
Table 6: Different representations of the domain-slot pairs of the MultiWOZ 2.1 dataset. 'Concatenation' concatenates the domain and slot directly, and it is used as one of the default settings in the paper. 'Slot Description' applies a sentence description to describe each domain-slot pair. 'Question Asking' represents the domain-slot pair in a question asking way.

A.2 Sample Output

We present the outputs of DST-Span and DS-DST for all turns of two sample dialogues (MUL0729, PMUL2428) from the validation set of the MultiWOZ 2.1 dataset. Table 7 and Table 8 show the results for MUL0729 and PMUL2428, respectively. In Table 7, hotel type and hotel internet are predicted incorrectly by DST-Span, where the value yes of hotel internet is described in a different way as 'free wifi' in the dialog context. For this type of values, DST-Span cannot find the spans directly in the dialog context. In Table 8, DST-Span fails to predict the state <taxi, departure, funky fun house> at turn 6.

User: i am looking for a train from ely to cambridge . do you have such trains ?
Ground Truths: <train, destination, cambridge>, <train, departure, ely>
Predicted Dialog State (DST-Span): <train, destination, cambridge>, <train, departure, ely>
Turn 1 Predicted Dialog State (DS-DST): <train, destination, cambridge>, <train, departure, ely>
System: there are 70 trains do you have a date and time you would like to go ?
User: i would like to leave on a sunday and arrive by 11:30 .
Ground Truths: <train, arrive by, 11:30>, <train, destination, cambridge>, <train, departure, ely>, <train, day, sunday>
Predicted Dialog State (DST-Span): <train, arrive by, 11:30>, <train, destination, cambridge>, <train, departure, ely>, <train, day, sunday>
Turn 2 Predicted Dialog State (DS-DST): <train, arrive by, 11:30>, <train, destination, cambridge>, <train, departure, ely>, <train, day, sunday>
System: tr4212 is arriving at 9:52 would that work for you ?
User: what time does that train depart ?
Ground Truths: <train, arrive by, 11:30>, <train, destination, cambridge>, <train, departure, ely>, <train, day, sunday>
Predicted Dialog State (DST-Span): <train, arrive by, 11:30>, <train, destination, cambridge>, <train, departure, ely>, <train, day, sunday>
Turn 3 Predicted Dialog State (DS-DST): <train, arrive by, 11:30>, <train, destination, cambridge>, <train, departure, ely>, <train, day, sunday>
System: the train leaves at 9:35 . shall i book it for you ?
User: no , thank you . i would like to see if you could find a place for me to stay that would have a 4 star rating .
Ground Truths: <train, day, sunday>, <hotel, stars, 4>, <train, destination, cambridge>, <train, arrive by, 11:30>, <train, departure, ely>
Predicted Dialog State (DST-Span): <train, day, sunday>, <hotel, stars, 4>, <train, destination, cambridge>, <train, arrive by, 11:30>, <train, departure, ely>
Turn 4 Predicted Dialog State (DS-DST): <train, day, sunday>, <hotel, stars, 4>, <train, destination, cambridge>, <train, arrive by, 11:30>, <train, departure, ely>
System: of course , there are many 4 star hotel -s to choose from . would you like me to book 1 for you ?
User: not yet , thanks . are any of them guesthouses ?
Ground Truths: <train, day, sunday>, <hotel, stars, 4>, <train, destination, cambridge>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, type, guest house>
Predicted Dialog State (DST-Span): <train, day, sunday>, <hotel, stars, 4>, <train, destination, cambridge>, <train, arrive by, 11:30>, <train, departure, ely>, \ul<hotel, type, ;>
Turn 5 Predicted Dialog State (DS-DST): <train, day, sunday>, <hotel, stars, 4>, <train, destination, cambridge>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, type, guest house>
System: there are 18 guesthouses to choose from , do you have a preference to the area you would like to stay ?
User: i need a 4 star , and in the east with free wifi for 4 people , 5 nights . i’ll need a reference number .
Ground Truths:
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
<train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, type, guest house>
Predicted Dialog State (DST-Span):
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, \ul<hotel, internet, ;>,
<train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, \ul<hotel, type, ;>
Turn 6 Predicted Dialog State (DS-DST):
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
<train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, type, guest house>
System: do you want that guesthouse reservation to begin on sunday ?
User: yes . i need 5 nights starting on sunday .
Ground Truths:
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
<hotel, type, guest house>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, book day, sunday>
Predicted Dialog State (DST-Span):
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
\ul<hotel, type, ;>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, book day, sunday>
Turn 7 Predicted Dialog State (DS-DST):
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
<hotel, type, guest house>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, book day, sunday>
System: i have confirmed your reservation at allenbell starting on sunday for 5 nights . your reference number is sltivabu . can i help with anything else ?
User: that is all i need . thank you so much for all your help .
Ground Truths:
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
<hotel, type, guest house>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, book day, sunday>
Predicted Dialog State (DST-Span):
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
\ul<hotel, type, ;>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, book day, sunday>
Turn 8 Predicted Dialog State (DS-DST):
<train, day, sunday>, <hotel, book stay, 5>, <hotel, book people, 4>, <hotel, stars, 4>, <train, destination, cambridge>, <hotel, internet, yes>,
<hotel, type, guest house>, <train, arrive by, 11:30>, <train, departure, ely>, <hotel, area, east>, <hotel, book day, sunday>
Table 7: Predicted dialog states of DST-Span and DS-DST for dialogue ID MUL0729 with domains (train, hotel) on the MultiWOZ 2.1 dataset.
User: i am planning a trip to go to a particular restaurant , can you assist ?
Ground Truths:
Predicted Dialog State (DST-Span):
Turn 1 Predicted Dialog State (DS-DST):
System: sure , what is the name of this particular restaurant ?
User: it s called nandos and i would like to book it for monday at 15:00 . there will be 6 people .
Ground Truths: <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book time, 15:00>, <restaurant, book people, 6>
Predicted Dialog State (DST-Span): <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book time, 15:00>, <restaurant, book people, 6>
Turn 2 Predicted Dialog State (DS-DST): <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book time, 15:00>, <restaurant, book people, 6>
System: no problem ! i have your table reserved for 15:00 on monday . they will hold your table for 15 minutes , your reference number is hvb51vam .
User: thank you . am also looking for place -s to go in town . the attraction should be in the east
Ground Truths: <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
Predicted Dialog State (DST-Span): <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
Turn 3 Predicted Dialog State (DS-DST): <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
System: what are you looking to do ? my system has located 10 place -s .
User: could you recommend something ?
Ground Truths: <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
Predicted Dialog State (DST-Span): <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
Turn 4 Predicted Dialog State (DS-DST): <restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
System: yes , the funky fun house is a great place .
User: great ! can you tell me what the entrance fee is please ?
Ground Truths:
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<attraction, name, funky fun house>
Predicted Dialog State (DST-Span):
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<attraction, name, funky fun house>
Turn 5 Predicted Dialog State (DS-DST):
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>
<attraction, name, funky fun house>
System: no , i am sorry . you will have to call them for the entrance fee . the phone number is 01223304705 .
User: thank you . i am also looking to book a taxi to travel between the 2 . i need it to arrive to the restaurant by the reservation time .
Ground Truths:
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<taxi, arrive by, 15:00>, <attraction, name, funky fun house>, <taxi, destination, nandos>, \ul <taxi, departure, funky fun house>
Predicted Dialog State (DST-Span):
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<taxi, arrive by, 15:00>, <attraction, name, funky fun house>, <taxi, destination, nandos>
Turn 6 Predicted Dialog State (DS-DST):
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<taxi, arrive by, 15:00>, <attraction, name, funky fun house> , <taxi, destination, nandos>, <taxi, departure, funky fun house>
System: your taxi is booked and will be a white audi . the contact number is 07057575130 . how else may i help you ?
User: that s all . thank you for your help !
Ground Truths:
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<taxi, arrive by, 15:00>, <attraction, name, funky fun house> , <taxi, destination, nandos>, <taxi, departure, funky fun house>
Predicted Dialog State (DST-Span):
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<taxi, arrive by, 15:00>, <attraction, name, funky fun house> , <taxi, destination, nandos>, <taxi, departure, funky fun house>
Turn 7 Predicted Dialog State (DS-DST):
<restaurant, book day, monday>, <restaurant, name, nandos>, <restaurant, book people, 6>, <attraction, area, east>, <restaurant, book time, 15:00>,
<taxi, arrive by, 15:00>, <attraction, name, funky fun house> , <taxi, destination, nandos>, <taxi, departure, funky fun house>
Table 8: Predicted dialog states of DST-Span and DS-DST for dialogue ID PMUL2428 with domains (taxi, attraction, restaurant) on the MultiWOZ 2.1 dataset.