Augmenting Knowledge through Statistical, Goal-oriented Human-Robot Dialog

Saeid Amiri, Sujay Bajracharya, Cihangir Goktolga, Jesse Thomason, and Shiqi Zhang

Amiri, Goktolga, and Zhang are with SUNY Binghamton (samiri1@binghamton.edu). Bajracharya is with Cleveland State University. Thomason is with the University of Washington.

Some robots can interact with humans using natural language, and identify service requests through human-robot dialog. However, few robots are able to improve their language capabilities from this experience. In this paper, we develop a dialog agent for robots that is able to interpret user commands using a semantic parser, while asking clarification questions using a probabilistic dialog manager. This dialog agent is able to augment its knowledge base and improve its language capabilities by learning from dialog experiences, e.g., adding new entities and learning new ways of referring to existing entities. We have extensively evaluated our dialog system in simulation as well as with human participants through MTurk and real-robot platforms. We demonstrate that our dialog agent performs better in efficiency and accuracy in comparison to baseline learning agents. Demo video can be found at

I Introduction

Mobile robots have been extensively used to conduct tasks, such as guidance and object delivery, in the real world. Notable examples include the Amazon warehouse robots and the Relay robots from Savioke. However, these robots either work in human-forbidden environments, or have no interaction with humans except for obstacle avoidance. Researchers are developing mobile robot platforms that are able to interact with people in everyday, human-inhabited environments [1, 2, 3, 4]. Some of the robot platforms can learn from the experience of human-robot interaction (HRI) to improve their language skills, e.g., learning new synonyms [5], but none of them learn entirely new entities. This work aims at a multitask dialog management problem, where a robot simultaneously identifies service requests through human-robot dialog and learns new entities from this experience to augment its internal knowledge base (KB).

A robot dialog system typically includes at least three components for language understanding: state tracking, dialog management, and language synthesis. Our dialog agent adds a fourth component that supports dialog-based knowledge augmentation. Our dialog system is goal-oriented, and aims at maximizing information gain. In this setting, people prefer dialog agents that are able to accurately identify human intention using fewer dialog turns.

Fig. 1: Our dialog agent is implemented and deployed on a Segway-based mobile robot platform (front and back).

Goal-oriented dialog systems are necessary for language-based human-robot interaction because, in most cases, people cannot fully and accurately deliver information using a single dialog turn. Consider a service request of “Robot, please deliver a coffee to the conference room!” It is possible that the robot does not know which conference room the speaker is referring to, in which case it is necessary to ask clarification questions such as “Where should I deliver a coffee?” in order to perform the correct action. To further identify the service request, the robot might want to ask about the recipient as well: “For whom is the delivery?” Although such goal-oriented dialog systems have been implemented on robots, few of them can learn to improve their language capabilities or augment their KB from the experience of human-robot conversations in the real world (details in Section II).¹

¹ In comparison, there are dialog agents that aim at maximizing social engagement and prefer extended conversations, e.g., Microsoft XiaoIce, which are beyond the scope of this work.

This work focuses on dialog-based robot knowledge augmentation, where the agent must identify when it is necessary to augment its KB and where in the KB to do that, as applied to our Segway-based mobile robot shown in Fig. 1. In this paper, we develop a dual-track dialog manager to help the agent maintain a confidence level of how well the current dialog is supported by the KB, and accordingly decide whether to augment its KB. After the agent becomes confident that new entities are necessary to make progress in the dialog, it decides where in the KB to add a new entity (e.g., a new item or a new person is being referred to by the user) by analyzing the flow of the dialog. As a result, our dialog agent is able to decide both when and how to augment its KB in a semantically meaningful way.

Our dialog system has been evaluated in simulation and in the real world. Results show that our dialog system performs better in service request identification (both efficiency and accuracy), in comparison to baselines that use predefined strategies. Human-subject experiments suggest that our knowledge augmentation component improves user experience as well.

II Related Work

Researchers have developed algorithms for learning to interpret natural language commands [6, 7, 8]. Recent research enabled the co-learning of syntax and semantics of spatial language [9, 10]. Although the systems support the learning of language skills, they do not have a dialog management component (implicitly assuming perfect language understanding), and hence do not readily support multi-turn communications.

Algorithms have been developed for dialog policy learning [11, 12, 13]. Recent research on Deep RL has enabled dialog agents to learn complex representations for dialog management [14, 15]. The systems do not include a language parsing component. As a result, users can only communicate with their dialog agents using simple or predefined language patterns.

Mobile robot platforms have been equipped with semantic parsing and dialog management capabilities. After a task is identified in dialog, these robots are able to conduct service tasks using a task planner [3, 16, 17]. Although these works enable a robot to identify human requests via dialog, they do not enable learning from these experiences.

Dialog agents for mobile service robots have been developed for identifying service tasks such as human guidance and object delivery [5, 18]. A dialog manager suggests language actions for asking clarification questions, and the agent is able to learn from human-robot conversations. These methods focus on learning to improve an agent’s language capabilities, but do not augment its knowledge base in this process (i.e., only pre-defined people, objects, and environmental locations can be reasoned about by the robot). This work builds on the dialog agent implemented by [5], and introduces a dual-track dialog-knowledge manager and a strategy for augmenting the robot’s knowledge base.

Fig. 2: A pictorial overview of our dialog system, including a hybrid parser for language understanding, and two management tracks for knowledge base (KB) and dialog respectively.

There are other dialog agents that specifically aim at knowledge augmentation through human-robot dialog [19, 20]. An instructable agent is able to learn new concepts and new procedural knowledge through human-robot dialog [21]. Recent work enabled a mobile robot to ground new concepts using visual-linguistic observations, e.g., to ground the new word “box” given a command of “move to the box” by exploring the environment and hypothesizing potential new objects [22]. These agents are able to augment their knowledge bases through language-based interactions with humans. However, their dialog management components (if any) do not model the noise in language understanding. Researchers developed a robot dialog system that focuses on situated verb semantics learning [23]. Their dialog agent uses RL to learn a dialog management policy, and uses a semantic parser to process natural language inputs. A recent paper surveyed research on robot learning new tasks through natural language and action demonstration [24]. These works focused on learning the semantics of verbs, limiting the applicability of their knowledge augmentation approach. A recent work focuses on planning with open-world knowledge by reasoning about hypothetical objects, whereas we focus more on modeling the language uncertainty [25].

Our dialog agent is the first that together: 1) processes language inputs using a semantic parser to understand users’ service requests, 2) leverages a dialog manager to account for the unreliability from the parser, and 3) augments its knowledge base using a knowledge manager.

III Dialog Agent

In this section, we present our dialog agent that integrates a decision-theoretic dialog manager, and an information-theoretic knowledge manager, as illustrated in Fig. 2.

III-A Dialog and Knowledge Management

A Markov decision process (MDP) is a general sequential decision-making framework that can be used for planning under uncertainty [12]. A partially observable MDP (POMDP) [26] generalizes the MDP to situations where the true world state is only partially observable. POMDPs have been used for dialog management [27], where the intentions of the interlocutors are latent. There are two interleaved control loops in our dialog agent, resulting in a dual-track controller. One track focuses on maintaining the dialog belief state, and suggests language actions to the agent. The other focuses on maintaining the belief of the current knowledge being (in)sufficient to complete the task, and suggests knowledge augmentation.

Our dialog agent is implemented on a mobile service robot that communicates with human users using natural language to identify service tasks in the form of

$$\langle \textit{task}, \textit{item}, \textit{recipient} \rangle,$$

where the agent must efficiently and accurately identify the service request (with unreliable language understanding capabilities) while augmenting the KB on an as-needed basis.

III-A1 Dialog Management Track

The dialog management POMDP includes the following components:

  • $S = (\mathcal{T} \times \mathcal{I} \times \mathcal{R}) \cup \{term\}$ is the state set, where $\mathcal{T}$ is the set of task types (delivery and guidance in our case), $\mathcal{I}$ is the set of items used in the task, $\mathcal{R}$ is the set of recipients of the delivery, and $term$ is the terminal state.

  • $A = A_W \cup A_C \cup A_R$ is the action set. $A_W$ consists of general “wh” questions, such as “Whom is this delivery for?” and “What item should I deliver?”. $A_C$ includes confirming questions that expect yes/no answers. Reporting actions $A_R$ return the estimated human requests.

  • $T: S \times A \times S \rightarrow [0, 1]$ is the state-transition function. In our case, the dialog remains in the same state after question-asking actions, and reporting actions lead to $term$ deterministically.

  • $R: S \times A \rightarrow \mathbb{R}$ is the reward function. The reward values are assigned as:

    $$R(s, a) = \begin{cases} r_C, & \text{if } a \in A_C \\ r_W, & \text{if } a \in A_W \\ r^+, & \text{if } a \in A_R \text{ and } a \text{ correctly reports } s \\ r^-, & \text{if } a \in A_R \text{ and } a \text{ incorrectly reports } s \end{cases}$$

    where $r_C$ and $r_W$ are the costs of confirming and general questions, in the form of negative, relatively small values; $r^+$ is a big bonus for correct reports; and $r^-$ is a big penalty (negative) for incorrect reports.

  • $Z = Z_T \cup Z_I \cup Z_R \cup \{z^+, z^-\}$ is the set of observations, where $Z_T$, $Z_I$, and $Z_R$ include observations of task type, item, and recipient respectively. $z^+$ and $z^-$ correspond to “yes” and “no”. Our dialog agent takes in observations as semantic parses that have correspondence to elements in $Z$. Other parses, including the malformed ones, produce random observations (detailed shortly).

  • $O: S \times A \times Z \rightarrow [0, 1]$ is the observation function that specifies the probability of observing $z$ in state $s$, after taking action $a$. Reporting actions yield the inapplicable observations. Our observation function models the noise in language understanding, e.g., the probability of correctly recognizing $z^+$ (“yes”) is set below one. The noise model is heuristically designed in this work, though it can be learned from real conversations.

Solving this POMDP generates a policy $\pi$, which maps a belief $b$ to a language action ($a \in A$) that maximizes long-term information gain.
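The state set and reward structure listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the tuple encoding of actions, and all numeric reward values are assumptions chosen for the example, and the entity names are borrowed from examples elsewhere in the paper.

```python
from itertools import product

TERM = "term"  # terminal state


def build_states(tasks, items, recipients):
    """Enumerate all (task, item, recipient) combinations plus the terminal state."""
    return list(product(tasks, items, recipients)) + [TERM]


def reward(state, action, r_confirm=-1.0, r_wh=-2.0, r_bonus=20.0, r_penalty=-20.0):
    """Reward of taking `action` in `state`.

    Actions are encoded (hypothetically) as tuples:
    ("wh", slot), ("confirm", slot, value), or ("report", reported_state).
    The numeric defaults are placeholders, not values from the paper.
    """
    kind = action[0]
    if kind == "confirm":
        return r_confirm      # small cost of a yes/no question
    if kind == "wh":
        return r_wh           # small cost of a general question
    if kind == "report":
        # big bonus for a correct report, big penalty otherwise
        return r_bonus if action[1] == state else r_penalty
    raise ValueError(f"unknown action kind: {kind!r}")


states = build_states(
    tasks=["delivery"],
    items=["coffee", "sandwich", "coke", "hamburger"],
    recipients=["alice", "ellen", "james", "dennis"],
)
print(len(states))  # 1 * 4 * 4 + 1 = 17
```

With one task, four items, and four recipients, the enumeration yields 17 states, which matches the smallest domain size used in the simulation experiments.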

III-A2 Knowledge Management Track

In addition to the dialog management POMDP, we have a knowledge management POMDP that monitors whether the agent’s knowledge is sufficient to support estimating human intentions. The knowledge management POMDP formulation is similar to that for dialog management but includes entities for unknown items and recipients. The components of the knowledge management POMDP are:

  • $S^+$: the set of states. It includes all states in $S$ along with the states corresponding to new entities, i.e., an unknown item $\hat{i}$ and an unknown recipient $\hat{r}$;

  • $A^+$: the set of actions, including the actions in $A$, two actions ($a_{\hat{i}}$ and $a_{\hat{r}}$) for confirming the unknown item and recipient, and $A^+_R$, reporting actions that correspond to the states in $S^+$;

  • $Z^+$: the augmented observation set, including $z_{\hat{i}}$ and $z_{\hat{r}}$ for unknown item and recipient.

Transition and observation functions are generated accordingly and hence not listed.

At runtime, we maintain belief distributions for both POMDP tracks. The belief of the dialog POMDP is used for sequential decision making and dialog management, and the belief of the knowledge POMDP is only used for knowledge augmentation purposes, i.e., determining when it is necessary to augment the KB. When an observation $z$ is made, both beliefs are updated using the Bayes update rule [26]. In our dialog system, observations are made based on the language understanding using a semantic parser.
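The Bayes update used for both beliefs follows the standard POMDP form: the new belief in a state is proportional to the observation probability times the predicted probability of reaching that state. A minimal sketch is shown below; the dictionary-based belief, the callable `transition` and `observe` models, and the 0.8 recognition accuracy in the usage example are assumptions made for illustration only.

```python
def update_belief(belief, action, observation, transition, observe):
    """Standard POMDP Bayes update.

    belief:      dict mapping state -> probability
    transition:  callable (s, a, s_next) -> probability
    observe:     callable (s_next, a, z) -> probability
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s_next] = observe(s_next, action, observation) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Usage: two candidate items, a question-asking action that leaves the state
# unchanged, and a parser assumed (hypothetically) to be right 80% of the time.
def stay(s, a, s_next):
    return 1.0 if s == s_next else 0.0


def noisy_parse(s_next, a, z):
    return 0.8 if z == s_next else 0.2


prior = {"coffee": 0.5, "coke": 0.5}
posterior = update_belief(prior, "ask_item", "coffee", stay, noisy_parse)
```

Because question-asking actions do not change the dialog state, the prediction step reduces to the identity here, and only the noisy observation shifts the belief.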

The dual-track controller is the first contribution of this work. We use two tracks, instead of merging them to unify the action selection process, because the knowledge track is only used for maintaining beliefs (not for action selection), and modeling the uncertainty of unknown entities in a single controller would result in unnecessarily long dialogs. Separating the two tracks reduces the learning complexity of the entire framework.

III-B Language Understanding

Fig. 3: An example of parsing a service request sentence using CCG semantic parsing and calculus.

In order to understand natural language and make observations for POMDPs, we use a semantic parser that builds on the Semantic Parsing Framework (SPF) described in [28]. The input of the semantic parser is natural language from human users, and the output is a list of possible parses for a given sentence. Using the semantic parser, the request in natural language is transformed to a formal representation compatible with the robot’s internal KB.

Figure 3 shows an example of the parser recognizing a sentence. It can reason over the ontology of the known words when it parses a sentence, e.g., james:pe and coffee:it. The dialog manager can use this information to translate from words to the corresponding observation for the question asked by the robot. If the language understanding fails (e.g., producing parses that are malformed or do not comply with the preceding questions), then a random observation from $Z$ will be made for the unknown part of the request (introducing enough entropy to move the dialog along).

III-C Domain Knowledge Representation

We use Answer Set Prolog (ASP) [29], a declarative language, for knowledge representation. The agent’s knowledge base (KB), in the form of an ASP program, includes rules in the form of:

$$l_0 \leftarrow l_1, \cdots, l_n, \textnormal{not } l_{n+1}, \cdots, \textnormal{not } l_k$$

where $l$’s are literals that represent whether a statement is true. The right side of a rule is the body, and the left side is the head. The not symbol is called default negation, representing no evidence supporting a statement.

The KB of our agent includes a set of entities in ASP: {alice, sandwich, kitchen, office1, delivery, ...}, where delivery specifies the task type. A set of predicates, such as {recipient, item, task, room}, are used for specifying a category for each object. As a result, we can use ASP rules to fully specify tasks, such as “deliver a coke to Alice”:

    task(delivery). item(coke). recipient(alice).

One can easily include more information in the ASP-based KB, such as rooms, positions of people, and a categorical tree of objects. The robot’s KB is built on a lexicon, i.e., a collection of information about the words of a language and their lexical categories. This ASP-based KB can be used for query responding and task planning purposes, where the query and/or task are specified by our dialog agent.

Fig. 4: In this example, the user requested that a pop be delivered to a novel recipient, Dennis. The dialog agent understood pop, but not Dennis (Turn 0), and it mistakenly observed Alice as the recipient. The user denied Alice (Turn 1), and confirmed pop (Turn 2). The conversation continued, as our dialog agent kept trying to identify the recipient, until the number of EFs crossed the threshold in Turn 8. Accordingly, Dennis was added into the KB as a new recipient entity. The agent continued asking clarification questions while being aware of Dennis, until the dialog manager suggested the correct reporting action and the robot delivered pop to Dennis. Although the clarification questions may frustrate the human user, they make the robot more confident in estimating the human request.

III-D Algorithm for Knowledge Augmentation

We define a few functions before introducing the main algorithm for simultaneous dialog management and knowledge augmentation. We use entropy to measure the uncertainty level of the agent’s belief distribution:

$$H(b) = -\sum_{s} b(s) \log b(s)$$

When the agent is confident about the state, the entropy value is low; when it is unconfident, the entropy value is high. In particular, a uniform belief distribution corresponds to the highest entropy level. We use entropy for the following two purposes in our algorithm.

I) Rewording Service Request

If the belief entropy is higher than threshold $h$ (meaning the agent is highly uncertain about the dialog state), we encourage the human user to state the entire request in one sentence. Otherwise, the dialog manager decides the flow of the conversation.

II) Entropy Fluctuation

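We introduce the concept of entropy fluctuation (EF):

$$EF(\mathbf{b}) = \text{sign}\big(H(\mathbf{b}[1]) - H(\mathbf{b}[0])\big) \oplus \text{sign}\big(H(\mathbf{b}[2]) - H(\mathbf{b}[1])\big)$$

where $\mathbf{b}$ is a belief queue that records dialog beliefs of the last three dialog turns, $EF(\mathbf{b})$ outputs true if there is an EF in the last three beliefs (i.e., the entropy of the second last is the highest or lowest among the three), and $\oplus$ is the xor operator.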
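The entropy measure and the EF test can be sketched as follows. This is a minimal illustration under stated assumptions: beliefs are plain lists of probabilities, the natural logarithm is used, and ties in entropy are treated as "not increasing", a boundary choice the text does not specify.

```python
import math


def entropy(belief):
    """Shannon entropy (natural log) of a belief given as a list of probabilities."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


def entropy_fluctuation(belief_queue):
    """True if the middle of the last three beliefs has the highest or lowest entropy.

    belief_queue holds exactly three beliefs, oldest first.
    """
    h0, h1, h2 = (entropy(b) for b in belief_queue)
    rose_first = h1 > h0    # direction of the first entropy change
    rose_second = h2 > h1   # direction of the second entropy change
    # xor: the two changes point in opposite directions (a peak or a valley)
    return rose_first != rose_second


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]
softer = [0.4, 0.2, 0.2, 0.2]

valley = entropy_fluctuation([uniform, peaked, softer])    # entropy drops, then rises
monotone = entropy_fluctuation([uniform, softer, peaked])  # entropy keeps dropping
```

In the first queue the agent became confident and then lost confidence again, which counts as a fluctuation; in the second the entropy decreases steadily, so no fluctuation is reported.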

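Algorithm 1 Dialog-based Knowledge Augmentation
Require: $\epsilon$, $h$, $N$, $M$, $M^+$, and a POMDP solver
Initialize $b$ and $b^+$ with uniform distributions
Initialize EF counter $\delta \leftarrow 0$
Initialize queue $\mathbf{b}$ of size 3 with $\{b, b, b\}$
repeat
     if the marginal probability of $\hat{r}$ under $b^+$ exceeds $\epsilon$ then
          Add a new recipient entity in KB
     else if the marginal probability of $\hat{i}$ under $b^+$ exceeds $\epsilon$ then
          Add a new item entity in KB
     else if $\delta > N$ then
          Add (item or recipient) entity that is more likely
     if $EF(\mathbf{b})$ is true then
          $\delta \leftarrow \delta + 1$
     if $H(b) > h$ then
          $a \leftarrow$ “Please reword your service request”
     else
          $a \leftarrow \pi(b)$
     $z \leftarrow$ observation parsed from the user’s response to $a$
     Update $b$ based on observation $z$ and action $a$
     $\mathbf{b}$.enqueue($b$)
     if $a \in A_C$ then
          Update $b^+$ based on observation $z$ and action $a$
until $s$ is $term$
return the request based on the last (reporting) action, and the (possibly updated) knowledge base.

Algorithm for Dialog-Knowledge Management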

Algorithm 1 shows the main operation loop of the dialog system. $M$ and $M^+$ are models for dialog-track and knowledge-track control respectively; $\epsilon$ is a probability threshold; $h$ is an entropy threshold; and $N$ is a threshold over the number of EFs.

The algorithm starts by initializing the two beliefs with uniform distributions. $\delta$, which counts the number of EFs, is initialized to $0$. If the marginal probability over $\hat{i}$ (or $\hat{r}$) of the knowledge belief is higher than threshold $\epsilon$, or the number of EFs is higher than $N$, we add a new entity into the KB. If the entropy of the dialog belief is higher than $h$, then the agent asks the user to reword the original service request. Otherwise, we use the dialog POMDP to maintain the dialog flow. Finally, the knowledge belief is only updated by confirming questions, which are able to invalidate the agent’s hypothesis of unknown entities. The algorithm returns the request and the updated knowledge base. When adding a new entity, the agent explicitly asks the user to help specify the name of the unknown item (or person). The KB is updated along with the robot’s lexicon for the semantic parser. The index for the unknown item or person is associated with the new entry. We utilized two functions to calculate two of the algorithm’s parameters.

With the new knowledge added to KB, POMDPs are dynamically constructed so that the dialog can continue seamlessly, and the belief is replaced with reinitialized . Fig. 4 illustrates an example dialog.
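The decision of when and where to augment the KB, as described for Algorithm 1, can be sketched as a small function over the knowledge belief. The encoding below is hypothetical: states are `(task, item, recipient)` tuples, an `UNKNOWN` marker stands for the unknown item or recipient, and the threshold values in the usage example are arbitrary.

```python
UNKNOWN = "<unknown>"  # hypothetical marker for the unknown item or recipient
ITEM, RECIPIENT = 1, 2  # slot positions inside a (task, item, recipient) state


def marginal_unknown(knowledge_belief, slot):
    """Total belief mass on states whose given slot is the unknown entity."""
    return sum(
        p for state, p in knowledge_belief.items()
        if isinstance(state, tuple) and state[slot] == UNKNOWN
    )


def augmentation_decision(knowledge_belief, ef_count, prob_threshold, ef_threshold):
    """Return "recipient", "item", or None (no KB augmentation needed yet)."""
    p_item = marginal_unknown(knowledge_belief, ITEM)
    p_recipient = marginal_unknown(knowledge_belief, RECIPIENT)
    if p_recipient > prob_threshold:
        return "recipient"
    if p_item > prob_threshold:
        return "item"
    if ef_count > ef_threshold:
        # Too many entropy fluctuations: add whichever entity type is more likely.
        return "recipient" if p_recipient >= p_item else "item"
    return None


belief = {
    ("delivery", "coffee", "alice"): 0.2,
    ("delivery", "coffee", UNKNOWN): 0.7,
    ("delivery", UNKNOWN, "alice"): 0.1,
    "term": 0.0,
}
by_probability = augmentation_decision(belief, ef_count=0, prob_threshold=0.6, ef_threshold=3)
undecided = augmentation_decision(belief, ef_count=0, prob_threshold=0.8, ef_threshold=3)
by_fluctuation = augmentation_decision(belief, ef_count=4, prob_threshold=0.8, ef_threshold=3)
```

The two triggers are complementary: the probability threshold fires when the knowledge belief itself points to an unknown entity, while the EF counter catches dialogs that keep oscillating without the belief ever becoming decisive.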

IV Experiments

We have evaluated our dialog agent both in simulation and with human participants. When the user verbalizes the entire request, the agent receives a sequence of three (unreliable) observations on task, item and recipient in a row. Unreliable language understanding is modeled in POMDP observations, e.g., the agent can correctly recognize “coffee” with probability ( in our case), and this probability decreases given more items in the KB. The reward of confirming questions is , and the reward of wh-questions is . The above settings were shared in experiments both in simulation and with human participants. POMDPs are solved using an off-the-shelf system [30].

Experiments were mainly designed to evaluate the following hypotheses. Our algorithm is able to: I) efficiently and accurately identify whether KB augmentation is needed; II) augment the KB with a higher F1 score under the noise in language understanding; and III) both augment the KB and recognize human intention in the service request with a higher success rate while minimizing QA cost.

We compared our algorithm with two learning baselines that use predefined strategies to update their KB: Baseline-I augments the KB only when the marginal probability of $\hat{i}$ ($\hat{r}$) reaches $\epsilon$, and Baseline-II augments the KB only when the number of EFs reaches threshold $N$.

Evaluation metrics used in the experiments consist of: QA cost, the total cost of QA actions; Accuracy, where a trial is deemed accurate if the robot correctly identifies that its KB lacks a needed entity; Success rate, where a trial is deemed successful if the service request is correctly identified and (if needed) the KB is correctly augmented; and Dialog reward, where QA cost and bonus/penalty are considered together. Focusing on the knowledge augmentation accuracy, we also use the F1 score, i.e., the harmonic mean of precision and recall, in evaluation.
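For reference, the F1 score can be computed from counts of correct and incorrect KB updates. The sketch below uses the standard definition; how individual trials map to true positives, false positives, and false negatives is an assumption here (for instance, a true positive could be a trial in which the added entity matches the user's intention), since the paper does not spell this out.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```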

IV-A Experiments in Simulation

Fig. 5: Our dialog agent is able to detect the need for KB augmentation with higher accuracy in fewer dialog turns compared to the baselines. Covariance error ellipses are calculated over 5 batches for each domain size. The numbers next to the data points denote the KB size.

Agent | KB Size | F1 Score (std.)
Dual Track Manager | 17 | 0.79 (0.020)
Baseline I | 17 | 0.59 (0.030)
Baseline II | 17 | 0.61 (0.017)
Dual Track Manager | 26 | 0.77 (0.025)
Baseline I | 26 | 0.52 (0.016)
Baseline II | 26 | 0.66 (0.007)
Dual Track Manager | 37 | 0.62 (0.011)
Baseline I | 37 | 0.47 (0.019)
Baseline II | 37 | 0.46 (0.022)
TABLE I: F1 Score of KB Update Given Different KB Sizes

Fig. 6: Comparison between our agent and baselines in terms of both dialog and knowledge management.

To evaluate each of the hypotheses, we simulated 3,000 trials over various domain sizes. In each trial, a task, an item, and a recipient are sampled. $S$ consists of all possible combinations of task, item, and recipient, plus the terminal state. For instance, $|S| = 17$ corresponds to a domain with 1 task, 4 items and 4 recipients. In the first experiment, we evaluated KB update accuracy versus the dialog turn in which the KB update has occurred (Hypothesis-I). Our algorithm consistently detects the need for KB augmentation earlier (fewer dialog turns) and with higher accuracy, while baselines require longer conversations to figure out if they need a KB update (Fig. 5).

We further evaluated whether the entity added to the KB matches the human intention (Hypothesis-II). As presented in Table I, our algorithm consistently maintains a higher F1 score in comparison to the baseline agents. Finally, we evaluated whether our algorithm is capable of both correct KB augmentation and correct execution of the task (Hypothesis-III). Figure 6 shows that our agent consistently maintains higher dialog reward while achieving lower QA cost and higher overall success. As the domain size increases, the agent gives up asking further questions, which results in lower overall success and reward.

IV-B Experiments with Human Participants

Twelve students aged 19-30 volunteered to participate in an experiment where they asked the robot to conduct delivery tasks using the items and recipients shown in Fig. 7. Two items and two recipients in the lists were unknown to the robot, so that a portion of the service requests did not require knowledge augmentation. The participants were not aware of this setting, and arbitrarily chose any item-recipient pair to form a delivery task. Each participant conducted the experiment in two (randomly ordered) trials, where the robot used our dialog agent and a baseline agent with a static KB respectively.

At the end of each dialog, each participant filled out a survey form that includes the prompts: Q1, Task is easy for participants; Q2, Robot understood participant; Q3, Robot frustrated participant; Q4, Participant will use the robot in the future. The response choices range from 0 (Strongly disagree) to 4 (Strongly agree). Fig. 8 shows results from the survey papers, and Table II shows the average scores. Our dialog agent performed significantly better in the responses to Q3 (frustration) and Q4 (usefulness). There is no significant difference observed in responses to the other two questions.

Fig. 7: Left: A user is interacting with our robot. Right: Items and recipients used for specifying delivery tasks.
Mechanical Turk Experiment

Experiments have been conducted with 103 human participants via MTurk. The setup was the same as the robot experiment, except that we used a more challenging baseline. The baseline agent augments its KB after a fixed number of dialog turns. Each worker participated in only one trial (using our agent or the baseline, randomly selected). At a confidence level of 0.1, we found our agent to use significantly fewer dialog turns and achieve a significantly higher success rate on average. Despite the quantitative improvements, there was no significant difference observed from the scores collected from the survey prompts.²

² This could be because the testing domain is static and relatively simple, in the sense of the numbers of items and people. Additionally, MTurk workers are less invested in qualitative feedback than human users in the presence of a real robot.

Agent | Q1 | Q2 | Q3 | Q4
Our dialog agent | 3.42 | 2.50 | 1.50 | 2.50
A baseline agent with static KB | 3.33 | 1.83 | 2.17 | 1.75
TABLE II: Results of the human participant experiment.

Fig. 8: Results of survey papers from participants, including four statements with Likert-scale responses.
An Example Trial on a Mobile Robot

Table III shows the dialog in a human-robot trial.³ The semantic parser failed to correctly parse the request, because it did not know the meaning of get. As a result, the dialog agent made a random guess that is “Alice requesting hamburger”. The robot asked again for clarification on what it should deliver. The user responded “coffee”, making the robot confused between coffee and hamburger. The robot asked for confirmation about “hamburger”, and got denied by the user. Although the user was explicitly guided to specify the recipient by names provided in Figure 7, this participant used “me”, an unknown word to the robot. When the robot heard “Nate” the first time, it could not understand, and mistakenly interpreted it as “Ellen”. As a result, the agent confirmed “Ellen”, and got denied. After the negative responses, it added (Nate) to its KB, and became confident about the request.

³ The agent was implemented using Robot Operating System (ROS) [31].

Robot | Human
How can I help you? | Get me coffee
What item should I bring? | Coffee
Do you want me to deliver hamburger? | No
Who should I bring the item to? | Nate
Is this delivery for ellen? | No
It seems I do not know the person you are talking about. Please write their name so I can learn it. | Nate
Who should I bring the item to? | Nate
Is this delivery for Nate? | Yes
What item should I bring? | Coffee

Execute: Robot brings coffee for Nate; the dialog is over.
TABLE III: An example dialog from a human participant.

V Conclusions & Future Work

We introduced a dialog agent that simultaneously supports human intention identification and knowledge augmentation on an as-needed basis. Experiments show that our dual-track POMDP controller enables the agent to simultaneously conduct dialog and knowledge management. In comparison to a baseline that augments its knowledge base after a fixed number of turns, our dialog agent consistently produces higher overall dialog success. Experiments with human participants show that our agent is more successful in augmenting knowledge and estimating human intention, and people are more willing to use our system. In the future, we intend to improve our agent by minimizing the dialog duration (e.g., fewer double-checking attempts) as well as augmenting its knowledge with more complex structures, e.g., to model subclasses of items.


We are grateful to the BWI team at UT Austin for making their software available to the public. Part of this work has taken place in the Autonomous Intelligent Robotics (AIR) group at SUNY Binghamton. AIR research is supported in part by SUNY RF, and Ford.


  • [1] P. Khandelwal, S. Zhang, J. Sinapov, M. Leonetti, J. Thomason, F. Yang, I. Gori, M. Svetlik, P. Khante, V. Lifschitz et al., “BWIbots: A platform for bridging the gap between AI and human–robot interaction research,” The International Journal of Robotics Research, 2017.
  • [2] N. Hawes, C. Burbridge, F. Jovan et al., “The strands project: Long-term autonomy in everyday environments,” IEEE Robotics & Automation Magazine, vol. 24, no. 3, pp. 146–156, 2017.
  • [3] Y. Chen, F. Wu, W. Shuai, and X. Chen, “Robots serve humans in public places — kejia robot as a shopping assistant,” International Journal of Advanced Robotic Systems, vol. 14, no. 3, 2017.
  • [4] M. M. Veloso, “The increasingly fascinating opportunity for human-robot-ai interaction: The cobot mobile service robots,” ACM Transactions on Human-Robot Interaction, 2018.
  • [5] J. Thomason, S. Zhang, R. J. Mooney, and P. Stone, “Learning to interpret natural language commands through human-robot dialog,” in Proceedings of the 24th International Conference on Artificial Intelligence, 2015, pp. 1923–1929.
  • [6] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox, “Learning to parse natural language commands to a robot control system,” in Experimental Robotics, 2013, pp. 403–415.
  • [7] D. K. Misra, J. Sung, K. Lee, and A. Saxena, “Tell me dave: Context-sensitive grounding of natural language to manipulation instructions,” The International Journal of Robotics Research, vol. 35, no. 1-3, pp. 281–300, 2016.
  • [8] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy, “Understanding natural language commands for robotic navigation and mobile manipulation,” in Proceedings of the 25th AAAI Conference, 2011, pp. 1507–1514.
  • [9] M. Spranger and L. Steels, “Co-acquisition of syntax and semantics: an investigation in spatial language,” in Proceedings of the 24th International Conference on Artificial Intelligence, 2015.
  • [10] Z. Gong and Y. Zhang, “Temporal spatial inverse semantics for robots communicating with humans,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2018.
  • [11] S. Singh, D. Litman, M. Kearns, and M. Walker, “Optimizing dialogue management with reinforcement learning: Experiments with the njfun system,” JAIR, vol. 16, pp. 105–133, 2002.
  • [12] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming.    John Wiley & Sons, 2014.
  • [13] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.    MIT press Cambridge, 1998.
  • [14] H. Cuayáhuitl, “Simpleds: A simple deep reinforcement learning dialogue system,” in Dialogues with Social Robots, 2017.
  • [15] K. Lu, S. Zhang, and X. Chen, “Goal-oriented dialogue policy learning from failures,” in AAAI, 2019.
  • [16] S. Zhang and P. Stone, “CORPP: commonsense reasoning and probabilistic planning, as applied to dialog with a mobile robot,” in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 1394–1400.
  • [17] D. Lu, S. Zhang, P. Stone, and X. Chen, “Leveraging commonsense reasoning and multimodal perception for robot spoken dialog systems,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 6582–6588.
  • [18] A. Padmakumar, J. Thomason, and R. J. Mooney, “Integrated learning of dialog strategies and semantic parsing,” in Proceedings of the 15th Conference of the European Chapter of the ACL, 2017, pp. 547–557.
  • [19] Ç. Meriçli, S. D. Klee, J. Paparian, and M. Veloso, “An interactive approach for situated task specification through verbal instructions,” in Proceedings of the 2014 international AAMAS conference, 2014.
  • [20] V. Perera, R. Soetens, T. Kollar, M. Samadi, Y. Sun, D. Nardi, R. van de Molengraft, and M. Veloso, “Learning task knowledge from dialog and web access,” Robotics, vol. 4, no. 2, pp. 223–252, 2015.
  • [21] A. Azaria, J. Krishnamurthy, and T. M. Mitchell, “Instructable intelligent personal agent.” in AAAI, 2016, pp. 2681–2689.
  • [22] M. Tucker, D. Aksaray, R. Paul, G. J. Stein, and N. Roy, “Learning unknown groundings for natural language interaction with mobile robots,” in ISRR, 2017.
  • [23] L. She and J. Chai, “Interactive learning of grounded verb semantics towards human-robot communication,” in Proceedings of ACL, 2017, pp. 1634–1644.
  • [24] J. Y. Chai, Q. Gao, L. She, S. Yang, S. Saba-Sadiya, and G. Xu, “Language to action: Towards interactive task learning with physical agents.” in IJCAI, 2018, pp. 2–9.
  • [25] Y. Jiang, N. Walker, J. Hart, and P. Stone, “Open-world reasoning for service robots,” in Proceedings of the 29th International Conference on Automated Planning and Scheduling (ICAPS), 2019.
  • [26] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial intelligence, vol. 101, no. 1-2, pp. 99–134, 1998.
  • [27] S. Young, M. Gašić, B. Thomson, and J. D. Williams, “POMDP-based statistical spoken dialog systems: A review,” Proceedings of the IEEE, vol. 101, no. 5, pp. 1160–1179, 2013.
  • [28] Y. Artzi, “Cornell SPF: Cornell semantic parsing framework,” arXiv preprint arXiv:1311.3011, 2013.
  • [29] M. Gelfond and Y. Kahl, Knowledge representation, reasoning, and the design of intelligent agents: The answer-set programming approach.    Cambridge University Press, 2014.
  • [30] H. Kurniawati, D. Hsu, and W. S. Lee, “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces,” in RSS, Zurich, Switzerland, 2008.
  • [31] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating system,” in ICRA workshop on open source software, 2009.