Learning Hybrid Object Kinematics for Efficient Hierarchical Planning Under Uncertainty
Abstract
Sudden changes in the dynamics of robotic tasks, such as contact with an object or the latching of a door, are often viewed as inconvenient discontinuities that make manipulation difficult. However, when these transitions are well understood, they can be leveraged to reduce uncertainty or aid manipulation; for example, wiggling a screw to determine if it is fully inserted or not. Current model-free reinforcement learning approaches require large amounts of data to learn to leverage such dynamics, scale poorly as problem complexity grows, and do not transfer well to significantly different problems. By contrast, hierarchical POMDP planning-based methods scale well via plan decomposition, work well on novel problems, and directly consider uncertainty, but often rely on precise hand-specified models and task decompositions. To combine the advantages of these opposing paradigms, we propose a new method, MICAH, which, given unsegmented data of an object’s motion under applied actions, (1) detects changepoints in the object motion model using action-conditional inference, (2) estimates the individual local motion models with their parameters, and (3) converts them into a hybrid automaton that is compatible with hierarchical POMDP planning. We show that model learning under MICAH is more accurate and robust to noise than prior approaches. Further, we combine MICAH with a hierarchical POMDP planner to demonstrate that the learned models are rich enough to be used for performing manipulation tasks under uncertainty that require the objects to be used in novel ways not encountered during training.
I Introduction
Robots working in human environments need to perform dexterous manipulation on a wide variety of objects. Such tasks typically involve making or breaking contacts with other objects, leading to sudden discontinuities in the task dynamics. Furthermore, many objects exhibit configuration-dependent dynamics, such as a refrigerator door that stays closed magnetically. While the presence of such nonlinearities in task dynamics can make it challenging to represent good manipulation policies and models, if well understood, these nonlinearities can also be leveraged to improve task performance and reduce uncertainty. For example, when inserting a screw into the underside of a table, if direct visual feedback is not available, indirect feedback from wiggling the screw (a semi-rigid connection between the screw and the table) can be leveraged to ascertain whether the screw is inserted or not. In other words, the sensed change in dynamics (from free-body motion to rigid contact) serves as a landmark, partially informing the robot about the state of the system and reducing uncertainty. Such dynamics can be naturally represented as hybrid dynamics models or hybrid automata [14], in which a discrete state represents which continuous dynamics model is active at any given time.
Current model-free reinforcement learning approaches [12, 7, 16, 17, 11] can learn to cope with hybrid dynamics implicitly, but require large amounts of data to do so, scale poorly as the problem complexity grows, face representational issues near discontinuities, and do not transfer well to significantly different problems. Conversely, hierarchical POMDP planning-based methods [11, 8, 3, 22, 27] can represent and reason about hybrid dynamics directly, scale well via plan decomposition, work well on novel problems, and reason about uncertainty, but typically rely on precise hand-specified models and task decompositions. We propose a new method, Model Inference Conditioned on Actions for Hierarchical Planning (MICAH), that bridges this gap and enables hierarchical POMDP planning-based methods to perform novel manipulation tasks given noisy observations. MICAH infers hybrid automata for objects with configuration-dependent dynamics from unsegmented sequences of observed poses of object parts. These automata can then be used to perform motion planning under uncertainty for novel manipulation tasks involving these objects.
MICAH consists of two parts, corresponding to our two main contributions: (1) a novel action-conditional inference algorithm called ActCHAMP for kinematic model estimation and changepoint detection from unsegmented data, and (2) an algorithm to construct hybrid automata for objects using the detected changepoints and estimated local models from ActCHAMP. Due to action-conditional inference, MICAH is more robust to noise and less vulnerable to several modes of failure than existing model inference approaches [18, 10, 23]. These prior approaches assume that the visual pose observations alone provide sufficient information for model estimation, which does not hold in many scenarios and can lead to poor performance. For example, an observation-only approach cannot distinguish between observations obtained by applying force against a rigid object and taking no action at all on a free body, estimating the model to be rigid in both cases.
To evaluate the proposed method, we first show that for articulated objects, MICAH can correctly infer changepoints and the associated local models with higher fidelity and less data than a state-of-the-art observation-only algorithm, CHAMP [18]. We also consider four classes of noisy data to demonstrate its robustness to noise. Next, to test the planning compatibility of the learned models, we learn hybrid automata for a microwave and a drawer from human demonstrations and use them with a recently proposed hierarchical POMDP planner, POMDP-HD [8], to successfully manipulate them in new situations. Finally, we show that the models learned through MICAH are rich enough to be leveraged creatively by a hierarchical planner to complete novel tasks efficiently: we learn a hybrid automaton for a stapler and use it to dexterously place the stapler at a target point that is reachable only through a narrow corridor in the configuration space.
II Related Work
Learning kinematic models for articulated objects directly from visual data has been studied via different approaches in the literature [23, 21, 18, 2, 20, 24, 25, 9, 10, 15, 1, 13, 29]. Sturm et al. [23] proposed a probabilistic framework to learn motion models of articulated bodies from human demonstrations. However, the framework assumes that the objects are governed by a single articulation model, which may not hold true for all objects. For example, a stapler intrinsically changes its articulation state (e.g. rigid vs. rotational) based on the relative angle between its arms. To address this, Niekum et al. [18] proposed an online changepoint detection algorithm, CHAMP, to detect both the governing articulation model and the temporal changepoints in the articulation relationships of objects. However, all these approaches are observation-only and may fail to correctly infer the object motion model under noisy demonstrations or in cases when actions are critical for inference.
In other closely related works, interactive perception approaches aim at leveraging the robot’s actions to better perceive objects and build accurate kinematic models [9, 10, 15]. Katz et al. first used this approach to learn articulated motion models for planar objects [9], and later extended it to use RGB-D data to learn 3D kinematics of articulated objects [10]. Though these approaches use the robot’s actions to generate perceptual signals for model estimation, they require the robot’s interaction behavior to be pre-scripted by an expert. Also, these approaches do not explicitly reason about the effects of actions while performing model inference.
An alternative method for learning object motion models is to learn them directly from raw visual data [28, 4, 5, 1, 13, 29]. While deep neural network-based approaches have shown much potential, the biggest hurdle in using such approaches on a wide variety of real-world robotics tasks is the need for a vast amount of training data, which is often not readily available. Also, these approaches tend to transfer poorly to new tasks. In this work, we combine model learning with generalizable planning under uncertainty to address these challenges, though deep learning methods may be useful in future work in place of our more traditional perception pipeline.
III Background
III-A Kinematic Graphs
We represent the kinematic structure of articulated objects using kinematic graphs [23]. A kinematic graph $G = (V_G, E_G)$ consists of a set of vertices $V_G$, corresponding to the parts of the articulated object, and a set of undirected edges $E_G$, each describing the kinematic link between two object parts. An example kinematic graph for a microwave is shown in Figure 1. Sturm et al. [23] proposed to associate a single kinematic link model $M_{ij}$ with model parameter vector $\theta_{ij}$ with each edge $(ij) \in E_G$. However, there are many articulated objects with links that are not governed by a single kinematic link model. For example, in most configurations, a microwave door is a revolute joint with respect to the microwave; however, due to the presence of a latch, this relationship changes to a rigid one when the door is closed. In this work, we extend kinematic graphs so that they can represent the hybrid kinematic structure of such objects (see Figure 2).
III-B Changepoint Detection
Given a time series of observations, a changepoint model introduces a number of temporal changepoints that split the data into a set of disjoint segments, with each segment assumed to be governed by a single model (though different models can govern different segments). We build on the online MAP (maximum a posteriori) changepoint detection model proposed by Fearnhead and Liu [6], which was specialized for detecting motion models of articulated objects by Niekum et al. [18]. Given a time series of observations $y_{1:T}$ and a set of parametric candidate models $\mathcal{Q}$, the changepoint model infers the MAP set of changepoint times $\tau_1 < \tau_2 < \dots < \tau_m$, where $\tau_0 = 0$ and $\tau_{m+1} = T$, giving us $m+1$ segments. Thus, the $i$th segment consists of observations $y_{\tau_{i-1}+1:\tau_i}$, and has an associated model $q_i \in \mathcal{Q}$ with parameters $\theta_i$.
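As an indexing sketch (our own helper, not part of the paper), the segments induced by a set of changepoint times can be recovered from the observation sequence as follows:

```python
def split_segments(y, changepoints):
    """Split observations y_1..y_T into the disjoint segments delimited by
    changepoint times, with tau_0 = 0 and tau_{m+1} = T: segment i holds
    observations y_{tau_{i-1}+1 : tau_i} (0-indexed slices below)."""
    taus = [0] + sorted(changepoints) + [len(y)]
    return [y[s:t] for s, t in zip(taus, taus[1:])]
```

For example, changepoints at times 4 and 7 split a series of 10 observations into segments of lengths 4, 3, and 3.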
Assuming that the data after a changepoint is independent of the data prior to that changepoint, we model the position of changepoints in the time series as a Markov chain in which the transition probabilities are defined by the time since the last changepoint,

$$p(\tau_{i+1} = t \mid \tau_i = s) = g(t - s), \tag{1}$$
where $g(\cdot)$ is a probability distribution over time. For a segment from time $s$ to $t$, the model evidence for the governing model being $q$ is defined as:

$$L(s, t, q) = p(y_{s+1:t} \mid q) = \int p(y_{s+1:t} \mid q, \theta)\, p(\theta)\, d\theta. \tag{2}$$
The distribution over the position of the most recent changepoint prior to time $t$, $C_t$, can be efficiently estimated using the standard Bayesian filtering recursions and an online Viterbi algorithm [6]. We define $E_j$ as the event that, given a changepoint at time $j$, the MAP choice of changepoints has occurred prior to time $j$. Then, the probability of having a changepoint at time $j$, $P_t(j)$, is defined as:

$$P_t(j) = p(C_t = j, E_j, y_{1:t}), \tag{3}$$

which results in

$$P_t(j) = (1 - G(t - j))\, \max_q \big[L(j, t, q)\, p(q)\big]\, P_j^{MAP}, \tag{4}$$

where $G(\cdot)$ is the cumulative distribution function of $g(\cdot)$. By finding the values of $j$ that maximize $P_t(j)$, the Viterbi path can be recovered at any point. This process can be repeated until the time $T$ is reached to estimate all changepoints that occurred in the given time series $y_{1:T}$.
The algorithm is fully online, but requires $O(t)$ computations at each time step, since $P_t(j)$ values must be calculated for all $j < t$. The computation time is reduced to a constant by using a particle filter that keeps a constant number of particles, $M$, at each time step, each of which represents a support point in the approximate density $p(C_t = j \mid y_{1:t})$. If at any time step the number of particles exceeds $M$, stratified optimal resampling [6] is used to choose which particles to keep such that the Kolmogorov-Smirnov distance from the true distribution is minimized in expectation.
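A simplified sketch of this online MAP recursion is below. It is a hypothetical outline under strong simplifications, not the paper's implementation: the per-segment model evidence is passed in as a black box, and top-scoring particles are kept in place of stratified optimal resampling.

```python
def online_changepoint_map(data, log_evidence, g_logpdf, max_particles=50):
    """Simplified online MAP changepoint detection (after Fearnhead & Liu).

    data: observations y_1..y_T
    log_evidence(j, t): stand-in for log max_q [L(j, t, q) p(q)], the best
        single-model fit to the segment y_{j+1:t} (data[j:t] in 0-indexing)
    g_logpdf(l): log-probability of a segment of length l
    Returns the MAP changepoint times, recovered via Viterbi back-pointers.
    """
    T = len(data)
    P_map = {0: 0.0}   # log P_j^MAP for each candidate changepoint time j
    back = {0: None}   # Viterbi back-pointer to the previous changepoint
    particles = [0]    # support points of the approximate density
    for t in range(1, T + 1):
        # score each candidate "most recent changepoint before t"
        scores = {j: g_logpdf(t - j) + log_evidence(j, t) + P_map[j]
                  for j in particles}
        j_star = max(scores, key=scores.get)
        P_map[t] = scores[j_star]
        back[t] = j_star
        particles.append(t)
        if len(particles) > max_particles:
            # stand-in for stratified optimal resampling: keep best-scoring
            particles = sorted(particles,
                               key=lambda j: scores.get(j, P_map[j]),
                               reverse=True)[:max_particles]
    # walk the back-pointers from T to recover the changepoint set
    cps, j = [], back[T]
    while j:
        cps.append(j)
        j = back[j]
    return sorted(cps)
```

With a mean-shift toy series and a constant-model evidence (sum of squared errors plus a fixed per-segment penalty), the recursion recovers the single true changepoint.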
III-C Hybrid Automaton
A hybrid automaton describes a dynamical system that evolves both in the continuous space and over a finite set of discrete states with time [14]. Formally, a hybrid automaton is a collection $H = (Q, X, U, \mathit{Init}, f, \mathcal{I}, \mathcal{E}, \mathcal{G}, \mathcal{R})$, where each discrete state $q \in Q$ of the system can be interpreted as representing a separate local dynamics model $f(q, \cdot, \cdot)$ that governs the evolution of the continuous states $x \in X$ under applied actions $u \in U$. $\mathit{Init} \subseteq Q \times X$ represents the set of initial states. The discrete state transitions can be represented as a directed graph with each possible discrete state corresponding to a node and edges ($e = (q, q') \in \mathcal{E}$) marking possible transitions between the nodes. These transitions are conditioned on the continuous states through guards $\mathcal{G}$: a transition from the discrete state $q$ to another state $q'$ happens if the continuous states are in the guard $\mathcal{G}(e)$ of the edge $e = (q, q')$. $\mathcal{I}$ assigns to each $q \in Q$ an input-dependent invariant set $\mathcal{I}(q) \subseteq X$, such that there exists a control law under which the continuous states remain within $\mathcal{I}(q)$ while the system stays in the discrete state $q$. The reset map $\mathcal{R}$ assigns to each edge $e \in \mathcal{E}$, continuous state $x \in X$, and input $u \in U$ a map that relates the values of the continuous states before and after the discrete state transition through the edge $e$. The set of admissible inputs for each state is defined using the invariant sets $\mathcal{I}$.
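For concreteness, the collection above can be captured in a small data structure. This is our own minimal sketch with a one-dimensional continuous state; the field names are shorthand for the components named in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

@dataclass
class HybridAutomaton:
    """Sketch of the collection H = (Q, X, U, Init, f, I, E, G, R) with a
    scalar continuous state x and per-state flow functions."""
    discrete_states: Set[str]                                # Q
    edges: Set[Tuple[str, str]]                              # E: feasible transitions
    flow: Dict[str, Callable[[float, float], float]]         # f: x' = f_q(x, u)
    guards: Dict[Tuple[str, str], Callable[[float], bool]]   # G(e): condition on x
    invariants: Dict[str, Callable[[float], bool]]           # I(q): where f_q is valid
    reset: Callable[[Tuple[str, str], float], float] = lambda e, x: x  # R: identity

    def step(self, q: str, x: float, u: float) -> Tuple[str, float]:
        """Apply one control step: flow under the active local model, then
        take any outgoing edge whose guard is satisfied."""
        x_next = self.flow[q](x, u)
        for (src, dst) in self.edges:
            if src == q and self.guards[(src, dst)](x_next):
                return dst, self.reset((src, dst), x_next)
        return q, x_next
```

A two-state automaton (e.g. a door that becomes rigid when its angle reaches zero) can then be stepped forward to see the discrete transition fire.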
IV Approach
Given a sequence of object part pose observations $y_{1:T}$ (e.g. from a visual tracking algorithm) and a sequence of applied actions $a_{1:T}$ on an articulated object, MICAH creates a planning-compatible hybrid automaton for the object. It does so in two steps: (1) it estimates the kinematic graph $G$ representing the kinematic structure of the object given the sequence of pose observations $y_{1:T}$ and the applied actions $a_{1:T}$, and then (2) constructs a hybrid automaton representing the motion model for the object given $G$.
For the first step, we extend the framework proposed by Sturm et al. [23] in two important ways to better learn the kinematic structure of articulated objects. First, we include reasoning about the applied actions along with the observed motion of the object while estimating its kinematic structure. Second, we extend the framework to be able to learn the kinematic structure of more complex articulated objects that may exhibit configuration-dependent kinematics, e.g., a microwave. The original framework [23] assumes that each link of an articulated body is governed by a single kinematic model. For complex articulated objects that exhibit configuration-dependent kinematics, the transition points in the kinematic model, along with the set of governing local models and their parameters, need to be estimated to learn the complete kinematic structure of the object.
To facilitate these extensions, we introduce a novel action-conditional changepoint detection algorithm, Action-conditional Changepoint detection using Approximate Model Parameters (ActCHAMP), that can detect the changepoints in the relative motion between two rigid objects (or two object parts), given a time series of observations of the relative motion between the objects and the corresponding applied actions. The algorithm is described in section IV-A.
Kinematic trees have the property that their edges are independent of each other. As a result, when learning the kinematic relationship between object parts $i$ and $j$ of an articulated object, only their relative transformations are relevant for estimating the edge model. MICAH first uses the ActCHAMP algorithm to learn the kinematic relationships between different parts of the articulated object separately, and then combines them to estimate the complete kinematic graph for the object. Once the kinematic graph for an articulated object is known, MICAH constructs a hybrid automaton to represent its motion model. We choose hybrid automata as they present a natural choice for modeling the motion of objects that may exhibit different motion models based on their configuration. The steps to construct a hybrid automaton from the learned kinematic graph are described in section IV-B.
IV-A Action-conditional Model Inference
Following Sturm et al. [23], we define the relative transform between two objects with poses $x_{1,t}$ and $x_{2,t}$ at time $t$ as:

$$\Delta_t = x_{1,t} \ominus x_{2,t}.$$
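With poses represented as 4x4 homogeneous matrices (as in the footnote on the motion composition operators), the relative transform reduces to a matrix inverse and a product. A minimal sketch:

```python
import numpy as np

def relative_transform(x1, x2):
    """Delta_t = x1 (-) x2: the pose of object 2 expressed in the frame of
    object 1. With 4x4 homogeneous transforms, (-) is inv(x1) @ x2 (see the
    footnote on the motion composition operators)."""
    return np.linalg.inv(x1) @ x2
```

For two poses translated by 1 and 3 along the same axis, the relative transform is a translation by 2, as expected.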
We propose a novel algorithm, ActCHAMP, that performs action-conditional changepoint detection to estimate the set of changepoint times and associated local models given an input time series of observations $y_{1:T}$ and the corresponding applied actions $a_{1:T}$. ActCHAMP builds upon the CHAMP algorithm proposed by Niekum et al. [18]. The CHAMP algorithm reasons only about the observed relative motion between the objects for estimating their kinematic relationship. However, an observation-only approach can easily lead to false detection of changepoints and result in an inaccurate system model. Consider the example of deducing the motion model of a drawer from a noisy demonstration in which the majority of applied actions are orthogonal to the axis of motion of the drawer. Due to intermittent displacements, an observation-only approach might model the motion of the drawer as a sequence of multiple rigid joints. On the other hand, an action-conditional inference can maintain an equal likelihood of observing either a rigid or a prismatic model under off-axis actions, leading to a more accurate model.
Given the two time series inputs $y_{1:T}$ and $a_{1:T}$, we define the model evidence for model $q$ being the governing model for the time segment between times $s$ and $t$ as:

$$L(s, t, q) = p(y_{s+1:t} \mid a_{s:t-1}, q) = \int p(y_{s+1:t} \mid a_{s:t-1}, q, \theta)\, p(\theta)\, d\theta. \tag{5}$$
Each model $q$ admits two functions: a forward kinematics function, $f_q$, and an inverse kinematics function, $f_q^{-1}$, which maps the relative pose between the objects to a unique configuration $d$ for the model (e.g. a position along the prismatic axis, or an angle with respect to the axis of rotation) as:

$$\Delta = f_q(\theta, d), \qquad d = f_q^{-1}(\theta, \Delta).$$
We consider three candidate models, rigid, prismatic, and revolute, to define the kinematic relationship between two objects. Complete definitions of the forward and inverse kinematics functions for these models are beyond the scope of this work; for more details, see Sturm et al. [23].
Additionally, we define the Jacobian and inverse Jacobian functions for the model $q$ as

$$\delta\Delta \approx J_q(\theta, d)\, \delta d, \qquad \delta d \approx J_q^{-1}(\theta, \Delta)\, \delta\Delta,$$

where $\delta\Delta$ and $\delta d$ represent small perturbations applied to the relative pose and the configuration, respectively.
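As an illustration of these functions, a toy prismatic model over positions only (ignoring orientation, with our own simplified parameterization $\theta = $ (origin, axis)) might look like the following; the full models in Sturm et al. [23] operate on full SE(3) poses:

```python
import numpy as np

class PrismaticModel:
    """Illustrative prismatic candidate model: a fixed origin plus translation
    along a unit axis. Positions only; a simplified stand-in for the full
    SE(3) model definitions of Sturm et al. [23]."""

    def __init__(self, origin, axis):
        self.origin = np.asarray(origin, dtype=float)
        self.axis = np.asarray(axis, dtype=float) / np.linalg.norm(axis)

    def fk(self, d):
        """Forward kinematics f_q: configuration d -> relative position."""
        return self.origin + d * self.axis

    def ik(self, pose):
        """Inverse kinematics f_q^{-1}: project the pose onto the axis."""
        return float(np.dot(np.asarray(pose, dtype=float) - self.origin, self.axis))

    def jac_inv(self, dpose):
        """Inverse Jacobian J_q^{-1}: map a small pose perturbation to the
        corresponding configuration change."""
        return float(np.dot(np.asarray(dpose, dtype=float), self.axis))
```

Note that `ik(fk(d)) == d` holds for any configuration, which is what makes the configuration unique.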
Using these functions, we can define the likelihood of obtaining observation $y_t$ upon applying action $a_{t-1}$ under model $q$ as:

$$p(y_t \mid a_{t-1}, q, \theta) = p(y_t \mid \hat{y}_t), \tag{6}$$

where $\hat{y}_t$ is the predicted relative pose under the model $q$ at time $t$, and can be calculated using the observation and applied action at time $t-1$ as:

$$\hat{y}_t = f_q\big(\theta,\; f_q^{-1}(\theta, y_{t-1}) + J_q^{-1}(\theta, y_{t-1})\, a_{t-1}\big). \tag{7}$$
The probability $p(y_t \mid \hat{y}_t)$ can be calculated by defining an observation model, given an observation error covariance $\Sigma_y$ for the perception system, as:

$$p(y_t \mid \hat{y}_t) = (1 - \gamma)\, \mathcal{N}(y_t;\, \hat{y}_t, \Sigma_y) + \gamma\, \mathcal{U}(y_t), \tag{8}$$
where the probability of observation $y_t$ being an outlier is $\gamma$, in which case it is drawn from a uniform distribution $\mathcal{U}(y_t)$. The data likelihood is then defined as:

$$p(y_{s+1:t} \mid a_{s:t-1}, q, \theta) = \prod_{i=s+1}^{t} p(y_i \mid a_{i-1}, q, \theta), \tag{9}$$

$$p(y_i \mid a_{i-1}, q, \theta) = p(y_i \mid \hat{y}_i)^{\lambda}\; p(y_i \mid \tilde{y}_i)^{(1-\lambda)}, \tag{10}$$

$$\tilde{y}_i = f_q\big(\theta, f_q^{-1}(\theta, y_i)\big), \tag{11}$$

where $\tilde{y}_i$ is the projection of the observation $y_i$ onto the model $q$, and $\lambda$ is a weighting constant.
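The per-observation mixture in Eq. (8) can be sketched for a scalar observation as follows; the uniform outlier density is passed in as a constant, and all parameter values below are illustrative:

```python
import math

def observation_likelihood(y, y_hat, sigma, gamma, u_density):
    """Eq. (8) sketch for a scalar observation: with probability gamma the
    observation is an outlier drawn from a uniform density u_density;
    otherwise it is Gaussian around the prediction y_hat with std sigma."""
    gauss = math.exp(-0.5 * ((y - y_hat) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return (1.0 - gamma) * gauss + gamma * u_density
```

The uniform component keeps the likelihood bounded away from zero for outliers, so a single bad tracking frame does not dominate the segment's model evidence.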
Finally, similar to Niekum et al. [18], we can define our BIC-penalized log-likelihood function as:

$$\ln \hat{L}(s, t, q) \approx \ln p(y_{s+1:t} \mid a_{s:t-1}, q, \hat{\theta}_q) - \frac{k_q}{2}\ln(t - s), \tag{12}$$

where the estimated parameters $\hat{\theta}_q$ are inferred using MLESAC (Maximum Likelihood Estimation Sample Consensus) [26] and $k_q$ is the number of parameters of model $q$. This likelihood function can be used in conjunction with the changepoint detection algorithm described in section III-B to infer the MAP set of changepoint times $\tau_{1:m}$ along with the associated local models and their parameters. The detected changepoints and the local models can later be combined to obtain a set $\mathcal{S}$ consisting of tuples $(q_i, \theta_i, \tau_{i-1}, \tau_i)$, where $\tau_{i-1}$ and $\tau_i$ denote the starting and the end changepoints for the $i$th time segment in the input time series $y_{1:T}$.
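The BIC penalty of Eq. (12) and the resulting model selection can be sketched as follows, with the segment log-likelihoods assumed to be already computed (the candidate names and values below are illustrative):

```python
import math

def bic_penalized_log_likelihood(log_lik, n_params, n_obs):
    """Penalized segment log-likelihood: ln L - (k/2) ln n (Eq. 12 sketch)."""
    return log_lik - 0.5 * n_params * math.log(n_obs)

def select_model(candidates, n_obs):
    """candidates: {model_name: (log_likelihood, n_params)}.
    Returns the candidate with the highest BIC-penalized score, so a model
    with more parameters must fit substantially better to be chosen."""
    return max(candidates,
               key=lambda m: bic_penalized_log_likelihood(*candidates[m], n_obs))
```

The penalty is what keeps the changepoint search from preferring an over-parameterized model on every segment.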
The transition conditions between the local models can be made independent of the changepoint times $\tau_{1:m}$ by making use of the observations corresponding to the changepoint times. If an observation $y_{\tau_k}$ corresponds to the changepoint denoting the transition from local model $q_k$ to the model $q_{k+1}$, then the inverse kinematics function can be used to find an equivalent configurational changepoint $\mu_k = f_{q_k}^{-1}(\theta_k, y_{\tau_k})$, a fixed configuration for model $q_k$ that marks the transition from model $q_k$ to the next model $q_{k+1}$. We can thus convert the set $\mathcal{S}$ to a set $\mathcal{S}'$, consisting of tuples $(q_i, \theta_i, \mu_{i-1}, \mu_i)$, that is independent of the input time series.
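The conversion from temporal to configurational changepoints is a one-line mapping through each segment's inverse kinematics; a sketch, where any object with an `ik` method stands in for a fitted local model:

```python
def configurational_changepoints(changepoint_obs, models):
    """Convert changepoint observations into configuration-space thresholds.

    changepoint_obs[k] is the relative pose observed at the k-th temporal
    changepoint; models[k] is the fitted local model active before it,
    exposing an inverse-kinematics method `ik`. The returned thresholds are
    independent of the time series they were detected in."""
    return [models[k].ik(changepoint_obs[k]) for k in range(len(changepoint_obs))]
```

The same thresholds can then be reused for any new trajectory of the object, which is what makes them suitable as guard conditions later.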
The complete kinematic structure of the articulated object can then be estimated by finding the set of edges $E_G$, denoting the kinematic connections between its parts, that maximizes the posterior probability of observing $y_{1:T}$ under the applied actions $a_{1:T}$ [23]. However, to account for complex articulated objects that exhibit configuration-dependent kinematics, each edge of the kinematic graph can now correspond to multiple kinematic link models, unlike the original framework [23], in which each edge corresponds to only one kinematic link model. To denote the change, we call such kinematic graphs extended kinematic graphs. An example extended kinematic graph for a microwave is shown in Figure 2.
IV-B Hybrid Automaton Construction
Hybrid automata present a natural choice for representing an articulated object that can have a discrete number of configuration-dependent kinematic models. A hybrid automaton can effectively model a system that evolves over both discrete and continuous states with time, which facilitates robot manipulation planning for tasks involving that object. We define the hybrid automaton for the articulated object as follows:

- Discrete states: $Q$ is the Cartesian product of the sets of local models defining the kinematic relationship between each pair of connected object parts;
- Continuous states: $X = \{d\}$, where we use a single variable $d$ to represent the configuration value under all models, as each of the candidate articulation models admits a single-dimensional configuration variable;
- Inputs: $U = U_c \times U_d$, where $U_c = \{\delta d\}$ is the input delta to be applied to the continuous state, and the set of discrete input variables $U_d$ is the null set, as we cannot control the discrete states directly;
- $\mathit{Init}$ is defined as per the task definition;
- The vector field governing the evolution of the continuous state vector with time is defined as $\dot{x} = f(q, x, u)$, where $q \in Q$, $x \in X$, and $u \in U_c$. The vector field is defined element-wise: its $k$th element applies the continuous input to the $k$th dimension of $x$, which corresponds to the kinematic relationship between object parts $i$ and $j$ under the active local model for that edge;
- For each discrete state $q \in Q$, an invariant set $\mathcal{I}(q)$ is defined such that within it the time evolution of the continuous states is governed by the vector field $f(q, \cdot, \cdot)$. We define $\mathcal{I}(q)$ as the set of configurations lying between the configurational changepoints that bound the local model, $\mathcal{I}(q_k) = \{d \mid \mu_{k-1} \le d \le \mu_k\}$;
- The set of edges $\mathcal{E} \subseteq Q \times Q$ defines the set of feasible transitions between the discrete states;
- Guards can be constructed using the configurational changepoints estimated for the object. If an edge $e_{k,k+1}$ corresponds to a transition from a local model $q_k$ to model $q_{k+1}$, then the guard for the edge can be defined as $\mathcal{G}(e_{k,k+1}) = \{d \mid d \ge \mu_k\}$. Analogously, the guard for the reverse transition is $\mathcal{G}(e_{k+1,k}) = \{d \mid d \le \mu_k\}$. To handle the corner cases when $d \le \mu_0$ for the first model or $d \ge \mu_{m+1}$ for the last model (assuming $\mu_0 < \mu_1 < \dots < \mu_{m+1}$), we define two additional self-transition edges such that $d$ is lower-bounded at $\mu_0$ for the first model and upper-bounded at $\mu_{m+1}$ for the last model;
- The reset map $\mathcal{R}$ is an identity map;
- The set of admissible inputs for each state is defined using the invariant sets $\mathcal{I}$.
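The guard construction described above can be sketched directly from a list of local models and their configurational changepoints. The microwave threshold used in the test below is a hypothetical value, not an estimate from the paper's data:

```python
def make_guards(local_models, thresholds):
    """Build guard predicates for a chain of local models q_1..q_m separated
    by configurational changepoints mu_1..mu_{m-1} (assumed increasing).
    The guard of edge (q_k, q_{k+1}) fires when d crosses mu_k upward; the
    reverse edge fires when d crosses back below mu_k."""
    guards = {}
    for k in range(len(local_models) - 1):
        mu = thresholds[k]
        # bind mu per-iteration via a default argument to avoid late binding
        guards[(local_models[k], local_models[k + 1])] = lambda d, mu=mu: d >= mu
        guards[(local_models[k + 1], local_models[k])] = lambda d, mu=mu: d < mu
    return guards
```

For a microwave whose door is rigid below a hypothetical latch angle of 0.05 rad and revolute above it, the forward guard fires only once the door angle clears the latch.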
V Experiments and Discussions
In the first set of experiments, we compare the performance of ActCHAMP with the CHAMP algorithm [18] in estimating changepoints and local motion models for a microwave and a drawer. Next, we test the complete method, MICAH, to construct planning-compatible hybrid automata for the microwave and drawer, and discuss the results of manipulation experiments to open and close the microwave door and the drawer using the learned models. Finally, we show that MICAH can be combined with a recent hierarchical POMDP planner, POMDP-HD [8], to develop a complete pipeline that can learn a hybrid automaton from demonstrations and leverage it to perform a novel manipulation task, in this case with a stapler.
V-A Learning Kinematic Models for Objects
We collected six sets of demonstrations to estimate motion models for the microwave and the drawer. We provided kinesthetic demonstrations to a two-armed robot, in which a human expert physically moved the right arm of the robot, while the left arm shadowed the motion of the right arm to interact with the objects, allowing unobstructed visual data collection. The first two sets provide low-noise data, obtained by manipulating the door handle or drawer knob via a solid grasp. The next two sets provide data in which random periods of no actions on the objects were deliberately included while giving demonstrations. The last two sets consist of high-noise cases, in which the actions were applied by pushing with the end-effector without a grasp. Relative poses of object parts were recorded as time-series observations with an RGB-D sensor using the SimTrack object tracker [19]. For each time step $t$, the demonstrator's action on the object was defined as the difference between the positions of the right end-effector at times $t+1$ and $t$.
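The action labeling described here is a simple finite difference over end-effector positions; a sketch:

```python
def actions_from_trajectory(ee_positions):
    """Label demonstrator actions as finite differences of end-effector
    positions: a_t = p_{t+1} - p_t, component-wise. Returns one fewer
    action than there are positions."""
    return [tuple(b - a for a, b in zip(p, p_next))
            for p, p_next in zip(ee_positions, ee_positions[1:])]
```

A trajectory of N recorded positions thus yields N-1 action labels aligned with the observation sequence.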
With grasp: Both algorithms (CHAMP and ActCHAMP) detected a single changepoint in the articulated motion of the microwave door and determined the trajectory to be composed of two motion models, namely rigid and revolute. For the drawer, both algorithms were able to successfully determine its motion to be composed of a single prismatic motion model (see Table I). This demonstrates that for clean, information-rich demonstrations, ActCHAMP performs on par with the baseline.
No actions: When no action is applied to an object, due to the lack of motion, an observation-only model inference algorithm can infer the object motion model to be rigid. Moreover, if the agent stops applying actions after interacting with the object for some time, an observation-only approach can falsely detect a changepoint in the motion model. We hypothesize that an action-conditional inference algorithm such as ActCHAMP will not suffer from these shortcomings, as it can reason that no motion is expected if no actions are applied. To test this, we conducted experiments in which the demonstrator stopped applying actions on the object midway through a demonstration for an extended time, randomly at two distinct locations. As expected, the observation-only CHAMP algorithm falsely detected changepoints in the object motion model and performed poorly (see Table I). However, as ActCHAMP reasons about the applied actions as well, it performed much better (see Table I).
Without grasp: When actions are applied directly on the object (the microwave door and the drawer, respectively), the majority of the applied actions are orthogonal to the axis of motion, leading to low-information demonstrations. In this case, while CHAMP almost completely failed to detect correct motion models for the microwave (5% success), ActCHAMP was able to correctly detect the models in almost one-third of the trials (see Table I). For the drawer, CHAMP falsely detected a changepoint and determined that the articulation motion model is composed of two separate prismatic articulation models with different model parameters (Figure 4). However, due to action-conditional inference, ActCHAMP correctly classified the motion to be composed of only one articulation model (Figure 4, see Table I).
TABLE I: Changepoint detection success rates and errors in the estimated model parameters.

| Case | Object | Algorithm | Changepoint Detection | Error in Model Parameters |
|---|---|---|---|---|
| With grasp | Microwave | CHAMP | 20/20 (100%) | Center: , Axis: , Radius: |
| | | ActCHAMP | 20/20 (100%) | Center: , Axis: , Radius: |
| | Drawer | CHAMP | 20/20 (100%) | Axis: |
| | | ActCHAMP | 20/20 (100%) | Axis: |
| No actions | Microwave | CHAMP | 11/20 (55%) | Center: , Axis: , Radius: |
| | | ActCHAMP | 14/20 (70%) | Center: , Axis: , Radius: |
| | Drawer | CHAMP | 4/20 (20%) | Axis: |
| | | ActCHAMP | 12/20 (60%) | Axis: |
| Without grasp | Microwave | CHAMP | 1/20 (5%) | Center: , Axis: , Radius: |
| | | ActCHAMP | 6/20 (30%) | Center: , Axis: , Radius: |
| | Drawer | CHAMP | 9/20 (45%) | Axis: |
| | | ActCHAMP | 15/20 (75%) | Axis: |
V-B Object Manipulation Using Learned Models
To test the effectiveness of the hybrid automata learned using MICAH, we used them to perform the tasks of opening and closing a microwave door and a drawer with a robot manipulator. We used the POMDP-HD planner [8] to develop manipulation plans. Figure 5 shows the belief space and actual trajectories for the microwave and drawer manipulation tasks. For both objects, low final errors were reported for the microwave and for the drawer (averaged over 5 different tasks), validating the effectiveness of the learned automata.
V-C Leveraging Learned Models for Novel Manipulations
Finally, we show that our learned models and planner are rich enough to be used to complete novel tasks under uncertainty that require intelligent use of object kinematics. To do so, we combine MICAH with the POMDP-HD planner to perform the manipulation task of placing a desk stapler at a target point on top of a tall stack of books. Due to the height of the stack, it is challenging to plan a collision-free path that delivers the stapler to the target location through a narrow corridor in the free configuration space of the robot; if the robot attempts to place the stapler at the target point while its governing kinematic model is revolute, the lower arm of the stapler will swing freely and collide with the obstacle. However, a feasible collision-free motion plan can be obtained if the robot first closes and locks the stapler (i.e. rigid articulation), and then proceeds towards the goal. To change the state of the stapler from revolute to rigid, the robot can plan to make contact with the table surface to press down and lock the stapler in a nonprehensile fashion.
As the task involves making and breaking contacts with the environment, we need to extend the learned hybrid motion model of the stapler to include local models arising from contacts. We approximately define the contact state between the stapler and the table to be either a line contact (an edge of the lower arm of the stapler in contact with the table), a surface contact (the lower arm lying flat on the table), or no contact. The set of possible local models for the hybrid task kinematics can be obtained by taking the Cartesian product of the set of possible discrete states of the stapler's hybrid automaton and the set of possible contact states between the stapler and the table. However, if the stapler is in the rigid mode, its motion is the same under all contact states. Hence, a compact task kinematics model consists of four local models: the stapler in revolute mode with no contact with the table, the stapler in revolute mode with a line contact with the table, the stapler in revolute mode with a surface contact with the table, and the stapler in rigid mode.
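The counting argument above can be checked mechanically; the state names below are our own labels:

```python
from itertools import product

stapler_modes = ["revolute", "rigid"]
contact_states = ["no_contact", "line_contact", "surface_contact"]

# Full Cartesian product of articulation modes and contact states.
full = list(product(stapler_modes, contact_states))

# All contact states collapse into one when the stapler is rigid,
# leaving the four local models described in the text.
compact = [s for s in full if s[0] == "revolute"] + [("rigid", "any_contact")]
```

The full product has six states, while the compact task model keeps only the four that are kinematically distinct.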
Given a human demonstration of the robot's interaction with the stapler as input, MICAH first learns a hybrid automaton for the stapler and then extends it to the hybrid task model using the provided task-specific parameters. Next, the POMDP-HD planner uses the learned task model to develop motion plans that complete the task with minimum final state uncertainty. Note that only the final Cartesian position of the stapler was specified as the target for the task, and not the articulation state of the stapler (rigid/revolute). Motion plans generated by the planner are shown in Figure 6. As can be seen from the plots, the planner plans to make contact with the table to reduce the relative angle between the stapler arms and change the articulation model of the stapler. The plan drags the stapler along the surface of the table, indicating that it waits until it is highly confident that the stapler has become rigid before breaking contact. Making contacts with the table along the path also helps in funneling down the uncertainty in the stapler's location relative to the table in a direction parallel to the table plane normal, thereby increasing the probability of reaching the goal successfully. Figure 7 shows snapshots of the motion plan and the actual execution of the robot performing the task.
Vi Conclusion
Robots working in human environments require a fast and data-efficient way to learn motion models of the objects around them in order to interact with them dexterously. We present a novel method, MICAH, that performs action-conditional model inference from unsegmented human demonstrations via a novel algorithm, ActCHAMP, and then uses the resulting models to construct hybrid automata for articulated objects. Action-conditional inference enables articulation motion models to be learned with higher accuracy than prior methods in the presence of noise, and leads to models that can be used directly for manipulation planning. Furthermore, we demonstrate that the learned models are rich enough to be used for performing novel tasks with such objects in ways not previously observed. One advantage of an action-conditional model inference approach over observation-only approaches is that it can enable robots to take informative exploratory actions for learning object motion models autonomously. Hence, future work may include the development of an active learning framework that a robot can use to autonomously learn the motion models of objects in a small number of trials.
Footnotes
 The operators $\oplus$ and $\ominus$ represent motion composition operations. For example, if poses $a$, $b$ are represented as homogeneous transformation matrices $A$, $B$, then these operators correspond to matrix multiplication, $AB$, and its inverse multiplication, $B^{-1}A$, respectively.
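The composition operators in the footnote can be illustrated with planar homogeneous transforms (a sketch only; the paper operates on full 6-DoF poses, and the exact operator convention used here is an assumption matching the standard pose-composition notation):

```python
import numpy as np

def make_pose(theta, tx, ty):
    """2D homogeneous transform: rotation by theta, translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def oplus(a, b):
    """Motion composition a (+) b: apply motion b in the frame of pose a."""
    return a @ b

def ominus(a, b):
    """Inverse composition a (-) b: the relative motion taking pose b to pose a."""
    return np.linalg.inv(b) @ a

A = make_pose(0.5, 1.0, 0.0)
B = make_pose(-0.2, 0.0, 2.0)
# Composing and then inverse-composing recovers the relative motion.
assert np.allclose(ominus(oplus(A, B), A), B)
```

With this convention, `ominus` extracts the relative motion between two observed poses, which is what changepoint inference over articulation models consumes.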
References
 (2019) Learning to generalize kinematic models to novel objects. In Proceedings of the Third Conference on Robot Learning, Cited by: §II, §II.
 (2014) Interactive Bayesian identification of kinematic mechanisms. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 2013–2020. Cited by: §II.
 (2008) Continuous-State POMDPs with Hybrid Dynamics. In Symposium on Artificial Intelligence and Mathematics, pp. 13–18. Cited by: §I.
 (2017) SE3-Nets: learning rigid body motion using deep neural networks. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp. 173–180. Cited by: §II.
 (2018) SE3-Pose-Nets: structured deep dynamics models for visuomotor control. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–8. Cited by: §II.
 (2007) Online inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69 (4), pp. 589–605. Cited by: §IIIB, §IIIB, §IIIB.
 (2016) One-shot learning of manipulation skills with online dynamics adaptation and neural network priors. In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on, pp. 4019–4026. Cited by: §I.
 (2018) Efficient hierarchical robot motion planning under uncertainty and hybrid dynamics. In Conference on Robot Learning, pp. 757–766. Cited by: §I, §I, §VB, §V.
 (2008) Manipulating articulated objects with interactive perception. In 2008 IEEE International Conference on Robotics and Automation, pp. 272–277. Cited by: §II, §II.
 (2013) Interactive segmentation, tracking, and kinematic modeling of unknown 3d articulated objects. In 2013 IEEE International Conference on Robotics and Automation, pp. 5003–5010. Cited by: §I, §II, §II.
 (2019) A review of robot learning for manipulation: challenges, representations, and algorithms. arXiv preprint arXiv:1907.03146. Cited by: §I.
 (2015) Learning contactrich manipulation skills with guided policy search. In 2015 IEEE international conference on robotics and automation (ICRA), pp. 156–163. Cited by: §I.
 (2019) Category-level articulated object pose estimation. arXiv preprint arXiv:1912.11913. Cited by: §II, §II.
 (2012) Hybrid systems: foundations, advanced topics and applications. under copyright to be published by Springer Verlag. Cited by: §I, §IIIC.
 (2014) Online interactive perception of articulated objects with multi-level recursive estimation based on task-specific priors. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2494–2501. Cited by: §II, §II.
 (2017) Prediction and control with temporal segment models. arXiv preprint arXiv:1703.04070. Cited by: §I.
 (2018) Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7559–7566. Cited by: §I.
 (2015) Online Bayesian changepoint detection for articulated motion models. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1468–1475. Cited by: §I, §I, §II, §IIIB, §IVA, §IVA, §V.
 (2015) SimTrack: a simulation-based framework for scalable real-time object pose detection and tracking. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany. Cited by: §VA.
 (2017) C-LEARN: learning geometric constraints from demonstrations for multi-step manipulation in shared autonomy. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp. 4058–4065. Cited by: §II.
 (2015) Learning articulated motions from visual demonstration. arXiv preprint arXiv:1502.01659. Cited by: §II.
 (2002) Policy-contingent abstraction for robust robot control. In Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence, pp. 477–484. Cited by: §I.
 (2011) A probabilistic framework for learning kinematic models of articulated objects. Journal of Artificial Intelligence Research 41, pp. 477–526. Cited by: §I, §II, §IIIA, §IVA, §IVA, §IVA, §IV.
 (2018) Recognizing geometric constraints in human demonstrations using force and position signals. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II.
 (2014) Object–object interaction affordance learning. Robotics and Autonomous Systems 62 (4), pp. 487–496. Cited by: §II.
 (2000) MLESAC: a new robust estimator with application to estimating image geometry. Computer vision and image understanding 78 (1), pp. 138–156. Cited by: §IVA.
 (2008) Hierarchical POMDP controller optimization by likelihood maximization. In UAI, Vol. 24, pp. 562–570. Cited by: §I.
 (2015) Embed to control: a locally linear latent dynamics model for control from raw images. In Advances in neural information processing systems, pp. 2746–2754. Cited by: §II.
 (2019) Unsupervised discovery of parts, structure, and dynamics. arXiv preprint arXiv:1903.05136. Cited by: §II, §II.