Realtime Simulation of Thin-Shell Deformable Materials using CNN-Based Mesh Embedding


Qingyang Tan, Zherong Pan, Lin Gao, and Dinesh Manocha
Qingyang Tan and Dinesh Manocha are with the Department of Computer Science and the Department of Electrical & Computer Engineering, University of Maryland at College Park. Zherong Pan is with the Department of Computer Science, University of North Carolina at Chapel Hill. Lin Gao is with the Institute of Computing Technology, Chinese Academy of Sciences.

We address the problem of accelerating thin-shell deformable object simulations by dimension reduction. We present a new algorithm to embed a high-dimensional configuration space of deformable objects in a low-dimensional feature space, where the configurations of objects and feature points have an approximate one-to-one mapping. Our key technique is a graph-based convolutional neural network (CNN) defined on meshes with arbitrary topologies, and a new mesh embedding approach based on a physics-inspired loss term. We have applied our approach to accelerate high-resolution thin-shell simulations corresponding to cloth-like materials, where the configuration space has tens of thousands of degrees of freedom. We show that our physics-inspired embedding approach leads to higher accuracy than prior mesh embedding methods. Finally, we show that the temporal evolution of the mesh in the feature space can also be learned using a recurrent neural network (RNN), leading to fully learnable physics simulators. After training, our learned simulator runs significantly faster and its accuracy is high enough for robot manipulation tasks.



I Introduction

A key component in robot manipulation tasks is a dynamic model of the target objects to be manipulated. Typical applications include cloth manipulation [26, 31], liquid manipulation [42], and in-hand rigid object manipulation [44]. Of these objects, cloth is unique in that it is modeled as a thin shell, i.e., a 2D deformable object embedded in a 3D workspace. To model the dynamic behaviors of thin-shell deformable objects, people typically use high-resolution meshes (e.g., with thousands of vertices). Many techniques have been developed to derive a dynamic model under a mesh-based representation, including the finite element method [30], the mass-spring system [6, 11], and the thin-shell model [20]. However, the complexity of these techniques grows superlinearly with the number of DOFs [19], which makes them computationally costly on high-resolution meshes. For example, [38] reported an average computational time of over a minute for predicting a single future state of a high-resolution thin-shell mesh. This simulation overhead is a major cost in various cloth manipulation algorithms, including [26, 31, 29].

To reduce the computational cost, one recent trend is to develop machine learning methods that compute low-dimensional embeddings of these meshes. Low-dimensional embeddings were originally developed for applications such as image compression [28] and dimension reduction [55]. The key idea is to find a low-dimensional feature space with an approximate one-to-one mapping between low-dimensional feature points and high-dimensional mesh shapes, so that a feature point can be treated as an efficient surrogate representation of the original mesh.

However, computing low-dimensional embeddings for general meshes poses new challenges because, unlike 2D images, meshes are represented by a set of unstructured vertices connected by edges and these vertices can undergo large distortions when cloth deforms. As a result, a central problem in representing mesh deformation data is to find an effective parameterization of the feature space that can handle arbitrary mesh topologies and large, nonlinear deformations. Several methods for low-dimensional mesh embeddings are based on PCA [2], localized PCA [39], and Gaussian Process [54]. However, these methods are based on vertex-position features and cannot handle large deformations.
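To make the limitation of the linear baselines concrete, the sketch below (an assumed, minimal setup; `pca_embed` and its shapes are illustrative, not the paper's code) shows PCA embedding of flattened vertex-position features. PCA can only reconstruct shapes lying in a linear subspace of the training data, which is why it fails on the large, nonlinear deformations discussed above.

```python
import numpy as np

def pca_embed(X, k):
    """X: (num_meshes, 3 * num_vertices) vertex positions; k: feature dim."""
    mean = X.mean(axis=0)
    # Principal directions of the centered data matrix.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                # top-k principal directions, (k, 3V)
    Z = (X - mean) @ basis.T      # low-dimensional features, (N, k)
    return Z, basis, mean

def pca_reconstruct(Z, basis, mean):
    # Linear decoding: only deformations inside the subspace are recovered.
    return Z @ basis + mean
```

Reconstruction is exact only when the deformations span at most `k` linear modes; wrinkled cloth violates this badly, motivating the nonlinear, graph-based encoder used in this paper.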

Main Results: We present a novel approach that uses physics-based constraints to improve the accuracy of low-dimensional embedding of arbitrary meshes for deformable simulation. We further present a fully learnable physics simulator of clothes in the feature space. The novel components of our algorithm include:

  • A graph-based CNN [15] that maps high-DOF configurations to low-DOF features, extending conventional CNNs to handle arbitrary mesh topologies, with a loss function defined in the ACAP feature space.

  • A mesh embedding approach that is aware of the inertial and internal potential forces used by a physics simulator, achieved by introducing a physics-inspired, vertex-level loss term (PB-loss) that better preserves the material properties of the meshes.

  • A stateful, recurrent feature-space physics simulator that predicts the temporal changes of meshes in the feature space, which is accurate enough for learning cloth features and training cloth manipulation controllers (see Fig. 4).

To test the accuracy of our method, we construct multiple datasets by running cloth simulations on high-resolution meshes under different material models, material parameters, and mesh topologies. We show that our embedding approach leads to better accuracy in terms of physics-rule preservation than a prior method [46] that uses only a data term. We have also observed improvements in mesh embedding accuracy on commonly used metrics such as RMSE and STED. Finally, we show that our feature-space physics simulator can robustly predict the dynamic behaviors of clothes undergoing unseen robot manipulations, while achieving a substantial speedup over simulators running in the high-dimensional configuration space.


Fig. 1: Overview of our method: Each generated mesh is represented as vertices connected by edges. (a): We use a graph-based CNN where each convolutional layer is a local filter and the filter stencil is the 1-ring neighborhood (red arrow). (b): We build an autoencoder using the filter-based convolutional layers. The decoder mirrors the encoder, and both use convolutional layers and one fully connected layer. The input of the encoder and the output of the decoder are defined in the ACAP feature space, in which we define the reconstruction loss. We recover the vertex-level features using the inverse ACAP transform, on which we define our PB-loss and a vertex-level regularization. The PB-loss can be formulated using two methods. (c): In the mass-spring model, the stretch resistance term is modeled as springs between each vertex and its 1-ring neighbors (blue), and the bend resistance term as springs between each vertex and its 2-ring neighbors (green). (d): FEM models the stretch resistance term as a linear elastic energy on each triangle and the bend resistance term as a quadratic penalty on the dihedral angle between each pair of neighboring triangles (yellow).

The paper is organized as follows. We first review related work in Sec. II. We define our problem and introduce the basic method of low-dimensional mesh embedding in Sec. III. We introduce our novel PB-loss and the learnable simulation architecture in Sec. IV. Finally, we describe the applications in Sec. V and highlight the results in Sec. VI.

II Related Work and Background

We summarize related work in mesh deformations and representations, deformable object simulations, and machine learning methods for mesh deformations.

Deformable Simulation for Robotics: Deformable objects are frequently encountered in service robot applications such as laundry cleaning [8, 29] and automatic cloth dressing [12]. Studying these objects can also benefit the design of soft robots [37, 16]. While these soft robots are usually 3D volumetric deformable objects, we focus on 2D shell-like deformable objects, i.e., clothes. In some applications, such as visual servoing [25] and tracking [10], deformable objects are represented using point clouds. In other applications, including model-based control [41] and reconstruction [48], deformable objects are represented using meshes and their dynamics are modeled by discretizing the governing equations using the finite element method (FEM). Solving the discretized governing equation is a major bottleneck in training a cloth manipulation controller; e.g., [5] reported up to 5 hours of CPU time spent on thin-shell simulation, which is 4-5 times more costly than the control algorithm.

Deformable Object Simulation is a key component in various model-based control algorithms, such as virtual surgery [3, 4, 32] and soft robot controllers [41, 14, 26]. However, physics simulators based on the finite element method [30], the boundary element method [9], or simplified models such as the mass-spring system [11] have superlinear complexity; an analysis is given in [19]. In a high-resolution simulation, the number of DOFs can be in the tens of thousands. As a result, learning-based methods have recently been used to accelerate physics simulations. This can be done by simulating at a low resolution using FEM and then upsampling [51], or by learning the dynamic behaviors of clothes [40] and fluids [50]. However, these methods are either not based on meshes [50] or not able to handle arbitrary topologies [40].

Machine Learning Methods for Mesh Deformations have been in use for over two decades; most of these methods are essentially low-dimensional embedding techniques. Early works are based on principal component analysis (PCA) [2, 55, 39], which can only represent small, local deformations, or on Gaussian processes [49, 54], which are computationally costly to train and do not scale to large datasets. Recently, deep neural networks have been used to embed high-dimensional nonlinear functions [28, 43]. However, these methods rely on regular data structures such as 2D images. To handle meshes with arbitrary topologies, earlier methods [36] represent a mesh as a 3D voxelized grid or reconstruct 3D shapes from 2D images [52] using a projection layer. Recently, methods have been proposed to define CNNs directly on mesh surfaces, such as CNNs on parametrized texture space [35] and CNNs based on spatial filtering [15]. The latter has been used in [46] to embed large-scale deformations of general meshes. Our contribution is orthogonal to these techniques and can be used to improve the embedding accuracy of any of these methods.

III Low-Dimensional Mesh Embedding

In this section, we provide an overview of low-dimensional embedding of thin-shell meshes such as clothes. Our goal is to represent a set of deformed meshes, each represented using a set of vertices. These vertices are connected by edges, so we can define the 1-ring neighbor set and the 2-ring neighbor set of each vertex, as shown in Fig. 1 (c). Our goal is to find a map from each mesh to a low-dimensional feature point such that each mesh can be recovered from its feature point up to a small error. To define such a function, we use a graph-based CNN and ACAP features [17] to represent large-scale deformations.

III-A ACAP Feature

For each mesh, the ACAP feature is computed by first finding the deformation gradient $T_i$ on each vertex:

$$T_i = \mathop{\mathrm{argmin}}_{T_i} \sum_{j \in \mathcal{N}_i^1} c_{ij} \left\| (x_i - x_j) - T_i (x_i^0 - x_j^0) \right\|^2,$$

where $c_{ij}$ are cotangent weights [13] and $x^0$ is a reference shape. Next, we perform a polar decomposition $T_i = R_i S_i$, where $R_i$ is orthogonal and $S_i$ is symmetric. Finally, $R_i$ is transformed into log-space in an as-consistent-as-possible manner using mixed-integer programming. Due to the symmetry of $S_i$, the final per-vertex ACAP feature stores the log-rotation together with the distinct entries of $S_i$. We denote the ACAP feature transform as $q = ACAP(x)$. It has been suggested, e.g., in [23], that mapping to the ACAP feature space is more effective for representing large-scale deformations. Therefore, we define our mapping function on ACAP features and recover vertex positions via the inverse feature transform $x = ACAP^{-1}(q)$.
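The per-vertex algebra can be sketched as below (a simplified illustration, not the paper's implementation): the polar decomposition splits a deformation gradient into rotation and symmetric stretch, and the log-map turns the rotation into an axis-angle vector. The as-consistent-as-possible branch selection via mixed-integer programming is omitted here; only one branch of the log-map is taken.

```python
import numpy as np

def polar_decompose(T):
    """Split T = R S with R a proper rotation and S symmetric."""
    U, sigma, Vt = np.linalg.svd(T)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])           # flip one axis if det < 0
    R = U @ D @ Vt                       # proper rotation, det(R) = +1
    S = Vt.T @ D @ np.diag(sigma) @ Vt   # symmetric factor, T = R S
    return R, S

def rotation_log(R):
    """Axis-angle vector theta * axis (one branch of the log-map)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * w / (2.0 * np.sin(theta))
```

The log-rotation (3 values) plus the distinct entries of the symmetric factor (6 values) give the 9 per-vertex ACAP components.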

III-B Graph-Based CNN for Feature Embedding

The key idea in handling arbitrary mesh topologies is to define the decoder $D$, mapping a low-dimensional feature $z$ to an ACAP feature vector $q$, as a graph-based CNN using local filters [15]:

$$q = D(z) = f_1^T \circ \cdots \circ f_C^T \circ f_{fc}^T(z),$$

where $C$ is the number of convolutional layers, each $f_k^T$ is the transpose of a graph-based convolutional operator, and $f_{fc}^T$ is the transpose of a fully connected layer. Each layer is followed by a leaky ReLU activation layer. A graph-based convolutional layer is a linear operator defined as:

$$y_i = W_{self}\, x_i + \frac{W_{nbr}}{|\mathcal{N}_i^1|} \sum_{j \in \mathcal{N}_i^1} x_j + b,$$

where $W_{self}, W_{nbr}$ and $b$ are optimizable weights and biases, respectively. All the weights in the CNN are trained in a self-supervised manner using an autoencoder and the reconstruction loss:

$$\mathcal{L}_{recon} = \sum_q \left\| D(E(q)) - q \right\|^2,$$

where $E$ is a mirrored encoder of $D$ defined as:

$$E(q) = f_{fc} \circ f_C \circ \cdots \circ f_1(q).$$

The construction of this CNN is illustrated in Fig. 1 (a). In the next section, we extend this framework to make it aware of physics rules.
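A single convolutional layer with a 1-ring stencil can be sketched as follows (assumed weight names and the neighbor-averaging variant; the actual layer in [15] may differ in normalization): each output mixes a vertex's own feature with the average of its neighbors' features, followed by a leaky ReLU.

```python
import numpy as np

def graph_conv(X, neighbors, W_self, W_neigh, b):
    """One graph-convolution layer.

    X: (V, d_in) per-vertex features; neighbors[i]: 1-ring indices of
    vertex i; W_self, W_neigh: (d_in, d_out); b: (d_out,).
    """
    Y = np.empty((X.shape[0], W_self.shape[1]))
    for i, nbr in enumerate(neighbors):
        avg = X[nbr].mean(axis=0) if nbr else np.zeros(X.shape[1])
        Y[i] = X[i] @ W_self + avg @ W_neigh + b
    return np.maximum(Y, 0.1 * Y)       # leaky ReLU activation
```

Because the same small filters are shared across all vertices, the layer works on any mesh connectivity, which is the property the encoder relies on.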

IV Physics-Based Loss Term

We present a novel physics-inspired loss term that improves the accuracy of low-dimensional mesh embedding. Our goal is to combine physics-based constraints with graph-based CNNs, where our physics-based constraints take a general form and can be used with any material model, such as FEM [38] or the mass-spring system [11]. We assume that the dataset is generated using a physics simulator that solves a continuous-time PDE of the form:

$$M \frac{d^2 x}{dt^2} = f\!\left(x, \frac{dx}{dt}, c\right),$$

where $M$ is the mass matrix and $t$ is the time. This form of governing equation is the basis for state-of-the-art thin-shell simulators, including [11, 38]. The right-hand side $f$ models the internal and external forces affecting the current mesh. The force is also a function of the current control parameters $c$, which are the positions of the grasping points on the mesh (red dots in Fig. 5). This continuous-time PDE can be discretized into timesteps such that $x_i$ is the configuration at time instance $i\Delta t$, where $\Delta t$ is the timestep size. A discrete physics simulator can determine all $x_i$ given the initial condition and the sequence of control parameters by the recurrent function:

$$x_{i+1} = f_{SIM}(x_i, x_{i-1}, c_{i+1}),$$

where $f_{SIM}$ is a discretization of the governing PDE. To define this discretization, we use a derivation of [33] that reformulates $f_{SIM}$ as the following optimization:

$$x_{i+1} = \mathop{\mathrm{argmin}}_{x}\; E(x_i, x_{i-1}, x) \triangleq \frac{1}{2\Delta t^2} \left\| x - 2x_i + x_{i-1} \right\|_M^2 + P(x, c_{i+1}).$$

Note that this variational form is just one possible implementation of $f_{SIM}$. Here the first term models the kinetic energy, which requires each vertex to keep moving at its current velocity if no external forces are exerted. The second term, $P$, models forces caused by various potential energies at the configuration $x$. In this work, we consider three kinds of potential energy:

  • Gravitational energy, $E_{grav} = -\sum_i m_i\, g \cdot x_i$, where $g$ is the gravitational acceleration vector and $m_i$ is the vertex mass.

  • Stretch resistance energy, which models the potential force induced by stretching the material.

  • Bending resistance energy, which models the potential force induced by bending the material.

There are many ways to discretize these potential energies, such as the finite element method used in [38] or the mass-spring model used in [33, 11]. Both formulations are evaluated in this work.

  • [11] models the stretch resistance term as a set of Hooke's springs between each vertex and the vertices in its 1-ring neighborhood. In addition, the bend resistance term is defined as another set of Hooke's springs between each vertex and the vertices in its 2-ring neighborhood (Fig. 1 (c)).

  • [38] models the stretch resistance term as a linear elastic energy resisting the in-plane deformations of each mesh triangle. In addition, the bend resistance term is defined as a quadratic penalty resisting the change of the dihedral angle between each pair of neighboring triangles (Fig. 1 (d)).
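The mass-spring variant above can be sketched as follows (assumed stiffness constants and pair lists; a simplified stand-in for the actual discretization): Hooke's springs over 1-ring pairs resist stretch, springs over 2-ring pairs resist bend, and gravity is linear in the vertex positions.

```python
import numpy as np

def spring_energy(x, pairs, rest_len, k):
    """x: (V, 3) positions; pairs: (E, 2) vertex indices; rest_len: (E,)."""
    d = np.linalg.norm(x[pairs[:, 0]] - x[pairs[:, 1]], axis=1)
    return 0.5 * k * np.sum((d - rest_len) ** 2)

def total_potential(x, one_ring, two_ring, rest1, rest2, ks, kb, m, g):
    """Stretch (1-ring springs) + bend (2-ring springs) + gravity."""
    stretch = spring_energy(x, one_ring, rest1, ks)
    bend = spring_energy(x, two_ring, rest2, kb)
    gravity = -m * np.sum(x @ g)     # E_grav = -sum_i m * g . x_i
    return stretch + bend + gravity
```

The same interface could host the FEM variant by swapping the two spring terms for per-triangle elastic and dihedral-angle penalties.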

Our approach uses the per-step simulator objective as an additional loss function for training the autoencoder. Since this objective governs data generation, using it for mesh deformation embedding should improve the accuracy of the embedded shapes. However, there are two inherent difficulties in using it as a loss function. First, the objective is defined on the vertex level, as a function of vertex positions, not on the feature level, as a function of ACAP features. To address this issue, we use the inverse transform $ACAP^{-1}$ to reconstruct vertex positions from ACAP features; its implementation is introduced in Sec. IV-A. By combining it with the decoder $D$, we can train the mesh deformation embedding network using the following loss:

$$\mathcal{L} = \mathcal{L}_{recon} + \lambda_{phys}\, \mathcal{L}_{phys},$$

where $\lambda_{phys}$ balances the data term and the PB-loss. Our second difficulty is that the embedding network is stateless and does not account for temporal information: the encoder takes only the current frame as input, while the simulator objective requires the two previous frames. To address this issue, we use a small, fully connected, recurrent network to represent the physics simulation procedure in the feature space. The training of this stateful network is introduced in Sec. IV-B. Finally, in addition to the PB-loss, we also add an autoencoder reconstruction loss on the vertex level as a regularization:

$$\mathcal{L}_{reg} = \sum_q \left\| ACAP^{-1}(D(E(q))) - x \right\|^2.$$
IV-A The Inverse of the ACAP Feature Extractor

The inverse of the ACAP transform (black block in Fig. 1) involves three steps. Fortunately, each step can easily be implemented in a modern neural network toolbox such as TensorFlow [1]. The first step computes the rotation $R_i$ from the log-rotation using Rodrigues' rotation formula, which involves only basic mathematical functions such as the dot product, the cross product, and the cosine function. The second step computes $T_i = R_i S_i$, which is a matrix-matrix product. The final step recovers the vertex positions from the deformation gradients. This amounts to pre-multiplying the inverse of a fixed sparse matrix $G$ representing the Poisson reconstruction. However, $G$ is rank-3 deficient because it is invariant to rigid translation. Therefore, we choose to define a pseudo-inverse by fixing the positions of the grasping points $c$:

$$x = \left( \begin{pmatrix} G \\ S \end{pmatrix}^T \begin{pmatrix} G \\ S \end{pmatrix} \right)^{-1} \begin{pmatrix} G \\ S \end{pmatrix}^T \begin{pmatrix} b \\ c \end{pmatrix},$$

which can be pre-factorized. Here $S$ is a matrix selecting the grasping points and $b$ is the Poisson right-hand side assembled from the deformation gradients.
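The pinned least-squares solve can be sketched as below (assumed matrix names; a 1D toy stand-in for the actual Poisson system): appending selector rows for the grasping points removes the translational null space, and the normal equations are factorized once so each later reconstruction is a cheap triangular solve.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def build_pinned_solver(G, grasp_idx, n_verts):
    """Return a solver x = pinv([G; S]) [rhs; pinned], prefactorized.

    G: sparse Poisson/gradient matrix (translation-invariant);
    grasp_idx: indices of pinned (grasped) vertices.
    """
    S = sp.csr_matrix(
        (np.ones(len(grasp_idx)),
         (np.arange(len(grasp_idx)), grasp_idx)),
        shape=(len(grasp_idx), n_verts))
    A = sp.vstack([G, S]).tocsc()
    factor = spla.splu((A.T @ A).tocsc())   # factorize once, reuse per frame
    return lambda rhs, pinned: factor.solve(
        A.T @ np.concatenate([rhs, pinned]))
```

In practice the solve is applied per coordinate axis; because the factorization is fixed, the whole step is a linear layer from the network's point of view.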

IV-B Stateful Recurrent Neural Network

A physics simulation procedure is Markovian, i.e., the current configuration only depends on the previous configurations of the mesh. As a result, the physics-violation measure is a function of the current frame and the two previous frames. However, our embedding network is stateless and only models single frames. In order to learn the entire dynamic behavior, we augment the embedding network with a stateful, recurrent network represented as a multilayer perceptron (MLP). This MLP represents a physically correct simulation trajectory in the feature space and is also Markovian, denoted as:

$$z_{i+1} = MLP(z_i, z_{i-1}, c_{i+1}).$$
Here the control parameters $c_{i+1}$ are given to the MLP as additional information. We can build a simple reconstruction loss to optimize the MLP:

$$\mathcal{L}_{MLP} = \sum_i \left\| MLP(z_i, z_{i-1}, c_{i+1}) - z_{i+1} \right\|^2,$$

where the $z_i$ are features encoded from the ground-truth frames. In addition, we can also add the PB-loss to train this MLP, for which we define the loss on a sequence of meshes by unrolling the recurrent network:

$$\mathcal{L}_{phys} = \sum_i E(x_i, x_{i-1}, x_{i+1}), \quad x_i = ACAP^{-1}(D(z_i)),$$

where $E$ is the per-step simulator objective.
However, we argue that this unrolled loss will lead to a physically incorrect result and cannot be directly used for training. To see this, we note that the per-step objective $E(x_i, x_{i-1}, x_{i+1})$ is the variational form of the governing PDE, so a trajectory is physically correct when each $E$ is at its local minimum with respect to its last argument, i.e., the following partial derivative vanishes:

$$\frac{\partial E(x_i, x_{i-1}, x_{i+1})}{\partial x_{i+1}} = 0.$$

However, if we sum up $E$ over a sequence of meshes and require the summed-up loss to be at a local minimum, then we are essentially requiring the following derivatives to vanish:

$$\frac{\partial E(x_i, x_{i-1}, x_{i+1})}{\partial x_{i+1}} + \frac{\partial E(x_{i+1}, x_i, x_{i+2})}{\partial x_{i+1}} + \frac{\partial E(x_{i+2}, x_{i+1}, x_{i+3})}{\partial x_{i+1}} = 0,$$

because each frame appears in three consecutive terms. The difference between these two conditions is the reason that the naively unrolled loss gives an incorrect result. To resolve the problem, we slightly modify the back-propagation procedure of our training process by setting the partial derivatives of $E$ with respect to its first two parameters to zero:

$$\frac{\partial E(x_i, x_{i-1}, x_{i+1})}{\partial x_i} \leftarrow 0, \qquad \frac{\partial E(x_i, x_{i-1}, x_{i+1})}{\partial x_{i-1}} \leftarrow 0,$$

which reduces the summed condition back to the per-step condition. (We add similar gradient constraints when optimizing the embedding loss.) This procedure is equivalent to an alternating optimization, where we first compute a sequence of feature-space coordinates using the recurrent network and then, for each $E$ term, fix the first two parameters and optimize with respect to its third parameter.
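The gradient surgery above can be sketched in TensorFlow (assumed shapes, step function, and loss callables; a minimal stand-in for the actual training loop): `tf.stop_gradient` on the two history arguments of the loss makes each summed term enforce stationarity only through its own predicted frame.

```python
import tensorflow as tf

def unrolled_pb_loss(step_fn, pb_loss_fn, z0, z1, controls):
    """Unroll the recurrent step and sum the PB-loss with cut history.

    step_fn(z_prev2, z_prev1, c) -> z_next (the feature-space MLP);
    pb_loss_fn(z_prev2, z_prev1, z_next) -> scalar per-step objective.
    """
    z_prev2, z_prev1, total = z0, z1, 0.0
    for c in controls:
        z = step_fn(z_prev2, z_prev1, c)
        # The loss's first two arguments receive no gradient: each term
        # then only penalizes the step that produced its last argument.
        total += pb_loss_fn(tf.stop_gradient(z_prev2),
                            tf.stop_gradient(z_prev1), z)
        z_prev2, z_prev1 = z_prev1, z
    return total
```

Gradients still flow through the step function itself (the recurrence is trained end to end); only the loss's view of the history is frozen, matching the per-step stationarity condition.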

V Applications

The two novel components in our method, the inverse ACAP operator and the stateful PB-loss, enable a range of new applications, including realtime cloth inverse kinematics and feature-space physics simulation.

V-A Cloth Inverse Kinematics

Our first application allows a robot to grasp several points on a piece of cloth and then infer the full kinematic configuration of the cloth. Such inverse kinematics can be achieved by minimizing a high-dimensional nonlinear potential energy, such as the ARAP energy [45], which is computationally costly. Using the inverse of the ACAP feature extractor, our method allows vertex-level constraints. Therefore, we can solve for the cloth configuration by a fast, low-dimensional minimization in the feature space:

$$\min_{z}\; P\!\left(ACAP^{-1}(D(z)), c\right),$$

where we treat all the grasped vertices as the control parameters $c$ used in the pseudo-inverse. This application is stateless and the user controls a single mesh, so we drop the kinetic term and only retain the potential term $P$. Some inverse kinematics examples generated using this formulation are shown in Fig. 2. Note that detailed wrinkles and cloth-like deformations are synthesized in the unconstrained parts of the meshes.
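The feature-space minimization can be sketched as below (assumed callables `decode` and `potential`; the real system would plug in the trained decoder composed with the inverse ACAP transform, and a gradient supplied by the network toolbox rather than finite differences).

```python
import numpy as np
from scipy.optimize import minimize

def cloth_ik(z_init, decode, potential, grasp_targets):
    """Minimize the potential over the low-dimensional feature z.

    decode: feature -> vertex positions (decoder + inverse ACAP);
    potential: (positions, grasp targets) -> scalar energy.
    """
    def objective(z):
        x = decode(z)
        return potential(x, grasp_targets)
    res = minimize(objective, z_init, method="L-BFGS-B")
    return res.x
```

Because the search space has only a few hundred dimensions at most, each solve is far cheaper than the equivalent vertex-level ARAP optimization.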

Fig. 2: Three examples of cloth inverse kinematics with fixed vertices marked in red. Note that our method can synthesize detailed wrinkles and cloth-like deformations in unconstrained parts of the meshes (black box).

V-B Feature Space Physics Simulation

For our second application, we approximate an entire cloth simulation sequence in the 128-dimensional feature space. Starting from the initial frames, we can generate an entire sequence of frames by using the recurrent MLP and recover the meshes via the decoder and the inverse ACAP transform. Such a latent-space physics model has been previously proposed in [50] for voxelized grids, while our model works on surface meshes. We show two synthesized simulation sequences in Fig. 3.
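The rollout itself is a simple recurrence (sketch with an assumed step function; the real step is the trained feature-space MLP, and meshes are only decoded when a frame must be displayed):

```python
def rollout(step_fn, z0, z1, controls):
    """Advance the feature-space state through a control sequence.

    step_fn(z_prev2, z_prev1, c) -> z_next; returns the full trajectory.
    """
    traj = [z0, z1]
    for c in controls:
        traj.append(step_fn(traj[-2], traj[-1], c))
    return traj
```

Since every step costs one small MLP evaluation instead of a sparse solve over tens of thousands of DOFs, this loop is what delivers the simulation speedup.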

V-C Accuracy of Learned Simulator for Robotic Cloth Manipulation

We show three benchmarks (Fig. 4) from the robot cloth manipulation tasks defined in prior work [26]. In these benchmarks, the robot collaborates with a human to maintain a target shape of a piece of cloth. To design such a collaborative robot controller, we use imitation learning, teaching the robot to recognize cloth shapes under various, uncertain human movements. Our learnable simulator can be used to efficiently generate these cloth shapes for training the controller. To this end, we train our neural network using the original dataset from [26], obtained by running the FEM-based simulator [38], which takes 3 hours. During test time, we perturb the human hands' grasp points along random directions. Our learned physical model can faithfully predict the dynamic movements of the cloth.


Fig. 3: Two examples of simulation sequence generation in our feature space. (a): 5 frames in the simulation of a cloth swinging down. (b): Synthesized simulation sequence. (c): Another example where two diagonal points are grasped. (d): Synthesized simulation sequence.
Fig. 4: We reproduce benchmarks from [26] where the robot collaborates with a human to manipulate a piece of cloth (a). We randomly perturb two grasp points on the left (gray arms) and the robot controls the other two grasp points (purple arms) using a visual-servoing method to maintain the cloth at a target state, e.g., keeping the cloth flat (b), twisted (c), or bent (d). The red cloth is the groundtruth acquired by running the accurate FEM-based cloth simulator [38], which takes 3 hours. The difference between our result (blue) and the groundtruth is indistinguishable.

VI Results

To evaluate our method, we create two datasets of cloth simulations using the variational simulator. Our first dataset is called SHEET; it contains animations of a square cloth sheet swinging down under different conditions, as shown in Fig. 5 (a). This dataset involves four simulation sequences. The first sequence uses the mass-spring model [11] to discretize the governing PDE, and the cloth mesh has no holes (denoted SHEET+[11]). The second sequence uses the mass-spring model, and the cloth mesh has holes, as shown in Fig. 3 (a,b) (denoted SHEET+[11]+holes). The third sequence uses FEM [38], and the cloth mesh has no holes (denoted SHEET+[38]). The fourth sequence uses FEM, and the cloth interacts with an obstacle, as shown in Fig. 5 (c) (denoted SHEET+[38]+obstacle). In the SHEET dataset, the cloth mesh without holes has 4225 vertices. Our second dataset is called BALL; it contains animations of a cloth ball being dragged up and down under different conditions, as shown in Fig. 5 (d). This dataset also involves four simulation sequences. Using the same notation as the SHEET dataset, the sequences in the BALL dataset are BALL+[11], BALL+[38], and two BALL+[38] variants with scaled material parameters, where we multiply the stretch/bend resistance terms by a factor smaller than one, making the material softer and less resilient when stretched or bent. During comparison, for each dataset, we select the first 12 frames of every 17 frames to form the training set. The other frames are used as the test set.
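The train/test split described above can be sketched as a tiny helper (illustrative naming):

```python
def split_frames(num_frames, window=17, train_per_window=12):
    """First 12 frames of every 17-frame window -> train; rest -> test."""
    train, test = [], []
    for f in range(num_frames):
        (train if f % window < train_per_window else test).append(f)
    return train, test
```

Keeping the held-out frames contiguous within each window preserves the consecutive runs needed by the temporal STED metric.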


Fig. 5: A visualization of our two datasets. The SHEET dataset contains four simulation sequences. (a,b): We generate the dataset by grasping two corners of the cloth (red dots) and moving the grasping points back and forth. (c): In two sequences of the SHEET dataset, we add a spherical obstacle that interacts with the cloth. (d): The BALL dataset contains four simulation sequences. We generate the dataset by grasping the topmost vertex of the cloth ball (red dot) and moving the grasping point back and forth.

VI-A Implementation

We implement our method using TensorFlow [1], and we implement the PB-loss as a special network layer. When an obstacle interacts with the cloth, we model the collision between the cloth and the obstacle using a special potential term proposed in [18]. For better conditioning and a more robust initial guess, our training procedure is broken into three stages. During the first stage, we train the autoencoder using its reconstruction losses. During the second stage, we train the MLP with the PB-loss added to its reconstruction loss. Finally, we add a fine-tuning stage that optimizes both the autoencoder and the MLP jointly. Notice that, in order to train the mesh embedding network and the MLP at the same time, we feed features encoded from the ground-truth previous frames to the MLP for better stability during the third stage.

VI-B Physics Correctness of Low-Dimensional Embedding

We first compare the quality of mesh deformation embeddings using two different methods. The quality of an embedding is measured using three metrics. The first metric is the root mean square error, RMSE [27], which measures the averaged vertex-level error over all shapes and vertices. Our second metric is the STED metric [47]. This metric linearly combines several aspects of error crucial to visual quality, including relative edge-length changes and temporal smoothness. However, since STED is only meaningful for consecutive frames, we compute it on the consecutive frames of the test set. Finally, we introduce a third metric, physics correctness, which measures how well the physics rules are preserved. It is measured by the norm of the partial derivative of the per-step simulator objective with respect to the predicted frame. Note that the absolute value of this metric can vary case by case; for example, under the FEM method it can be orders of magnitude larger than under the mass-spring system on our dataset. Therefore, only its relative value indicates improvement in physics correctness.
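One common form of the first metric can be sketched as follows (an assumed, standard definition of the vertex-position RMSE; STED and the physics-correctness gradient norm are computed separately):

```python
import numpy as np

def rmse(pred, gt):
    """Root mean square vertex-position error.

    pred, gt: (num_meshes, num_vertices, 3) vertex positions; averages
    the squared per-vertex displacement over all meshes and vertices.
    """
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=-1))))
```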

Dataset Method
SHEET+[11] ours
SHEET+[38] ours
Dataset Method
BALL+[11] ours
BALL+[38] ours
TABLE I: We compare the embedding quality using our method (with the PB-loss weight tuned per dataset) and [46]+. From left to right: name of dataset, method used, RMSE, STED, and physics correctness.

Our first experiment compares the accuracy of mesh embedding with and without the PB-loss. The version without the PB-loss is our baseline, which is equivalent to adding the vertex-level loss to [46]. In addition, we remove the sparsity regularization from [46] to make it consistent with our formulation. We denote this baseline as [46]+. A complete summary of our experimental results is given in Table I, and the benefit of three-stage training is shown in Table IV. From Table I, we can see that including the PB-loss significantly and consistently improves physics correctness, with the largest improvement on the SHEET+[38] dataset. In addition, by adding the PB-loss, our method also better captures the relationships between models when embedding them, thus improving RMSE in all cases. However, our method sometimes sacrifices STED, since temporal smoothness is not modeled explicitly in our method.

VI-C Discriminability of Feature Space

In our second experiment, we evaluate the discriminability of the mesh embedding by classifying meshes using their feature-space coordinates. Note that our datasets (Fig. 5) are generated by moving the grasping points back and forth. We use these movement directions as the labels for classification. For the SHEET dataset, we have 6 labels corresponding to the movement axes; for the BALL dataset, we have 2 labels. Note that it would be trivial to classify the meshes if we knew the velocities of the grasping points. However, this information is missing from our feature-space coordinates because ACAP features are invariant to global rigid translation, which makes the classification challenging. Fig. 6 shows the feature-space visualization using t-SNE [34] compressed to 2 dimensions. We also report retrieval performance in the KNN neighborhoods across different K's, using the method suggested by [53], and report the normalized discounted cumulative gain (DCG) on the test sets of SHEET+[11] and BALL+[11].

Fig. 6: A feature space visualization for SHEET+[11] using t-SNE.

VI-D Sensitivity to Training Parameters

In our third experiment, we evaluate the sensitivity of our method to the weights of the loss terms, as summarized in Table II. Our method outperforms [46]+ under a range of different parameters. We have also compared our method with other baselines, such as [39] and [24]. As shown in the last two columns of Table II, they generate even worse results, which indicates that [46]+ is the strongest baseline.

Method (, , ) (, , ) (, , ) [46]+ [39] [24]
TABLE II: We compare the performance of our method with several previous methods in terms of RMSE and STED under different weights of the loss terms. The experiment is done on the SHEET+[11] dataset. Our method outperforms [46]+ over a wide range of parameters. Previous methods, including [39] and [24], generate even worse results, which supports our choice of using a convolutional neural network and the ACAP feature for mesh deformation embedding.

VI-E Robustness to Mesh Resolutions

In our final experiment, we highlight the robustness of our method to different mesh resolutions by lowering the resolution of our dataset. For SHEET+[11], we create a mid-resolution counterpart with 1089 vertices and a low-resolution counterpart with 289 vertices. On these two new datasets, we compare the accuracy of mesh embedding with and without the PB-loss. The results are given in Table III. Including the PB-loss consistently improves physics correctness and the overall embedding quality, regardless of the resolution used.

Dataset #Vertices Method
SHEET+[11] 4225 ours
4225 [46]+
1089 ours
1089 [46]+
289 ours
289 [46]+
TABLE III: We profile the improvement in various metrics under different mesh resolutions, compared with [46]+. From left to right: name of dataset, number of vertices, method used, , , and . Our method consistently outperforms [46]+.
Dataset Method
SHEET+[11] baseline
2nd stage
3rd stage
SHEET+[11]+holes baseline
2nd stage
3rd stage
TABLE IV: We compare the physical simulation performance of after training with (baseline), training with (2nd stage), and fine-tuning (3rd stage). For the consecutive meshes in every frames (the test set), we give the first frames and predict the remaining frames to generate this table.
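The evaluation protocol behind Table IV, namely seeding the learned simulator with the first frames of a test sequence and rolling it forward autoregressively, can be sketched as below. The linear map standing in for the learned feature-space time-stepper is hypothetical; in the paper the stepper is a trained network.

```python
import numpy as np

# Sketch (our construction): autoregressive rollout of a learned
# feature-space time-stepper f. The first frames of a test sequence seed
# the rollout; the remaining frames are predicted, as in Table IV.
def rollout(f, seed_frames, total_frames):
    frames = list(seed_frames)
    while len(frames) < total_frames:
        frames.append(f(frames[-1]))   # next latent state from current one
    return np.stack(frames)

# stand-in "simulator": a fixed contractive linear map on the latent code
A = 0.9 * np.eye(8)
f = lambda z: A @ z
traj = rollout(f, seed_frames=[np.ones(8)], total_frames=50)
assert traj.shape == (50, 8)
```

Prediction error is then measured between the rolled-out frames and the ground-truth simulation frames held out from training.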

VI-F Difficulty in Contact Handling

One exception appears in SHEET+[38]+obstacle (blue row in \prettyreftable:comparisonA), where our method degrades physical correctness. This is the only dataset in which the mesh interacts with an obstacle. The degradation is due to the additional loss term penalizing penetration between the mesh and the obstacle. This term is non-smooth and has very high value and gradient when the mesh is in penetration, making the training procedure unstable. This means that directly learning a feature mapping for meshes with contacts and collisions can become unstable. However, we can address this problem using a two-stage method, where we first learn a feature mapping for meshes without contacts and collisions, and then handle contacts and collisions at runtime using a conventional method [21], as is done in [7].
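To illustrate why such a penalty destabilizes training, consider the following toy penetration term for a spherical obstacle. This is our own construction, not the paper's exact formulation: the loss is identically zero outside the obstacle and grows steeply inside it, so its gradient changes abruptly at the surface and becomes very large under penetration.

```python
import numpy as np

# Toy sketch (not the paper's exact term): a stiff quadratic penalty on
# penetration depth into a sphere. Zero gradient outside, a large gradient
# inside; the kink at the surface is what makes training unstable.
def penetration_loss(vertices, center, radius, stiffness=1e4):
    d = np.linalg.norm(vertices - center, axis=1)   # distance to sphere center
    depth = np.maximum(radius - d, 0.0)             # per-vertex penetration depth
    return stiffness * np.sum(depth ** 2)

verts_outside = np.array([[2.0, 0.0, 0.0]])
verts_inside = np.array([[0.5, 0.0, 0.0]])
assert penetration_loss(verts_outside, np.zeros(3), 1.0) == 0.0
assert penetration_loss(verts_inside, np.zeros(3), 1.0) > 0.0
```

The two-stage remedy sidesteps this entirely: the embedding is trained contact-free, and a conventional collision handler enforces non-penetration at runtime.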

VII Conclusion & Limitations

In this paper, we present a new method that bridges the gap between mesh embedding and physical simulation to enable efficient dynamic models of clothes. We achieve low-dimensional mesh embedding using a stateless, graph-based CNN that can handle arbitrary mesh topologies. To make the method aware of physical laws, we augment the embedding network with a stateful feature-space simulator represented as an MLP. The learnable simulator is trained to minimize a physics-inspired loss term (PB-loss). This loss term is formulated at the vertex level, and the transformation from the ACAP feature level to the vertex level is achieved using the inverse of the ACAP feature extractor.

Our method can be used for several applications, including fast inverse kinematics of clothes and realtime feature-space physics simulation. We have evaluated the accuracy and robustness of our method on two datasets of physics simulations with different material properties, mesh topologies, and collision configurations. Compared with previous embedding models, our method achieves consistently better accuracy in terms of physical correctness and the mesh-change smoothness metric ([47]).

A future research direction is to apply our method to other kinds of deformable objects, e.g., volumetric objects [22]. Every step of our method can be trivially extended to handle volumetric objects by replacing the triangle surface mesh with a tetrahedral volume mesh. A minor limitation of the current method is that the stateful MLP and the stateless mesh embedding cannot be trained in a fully end-to-end fashion. We would like to explore new optimization methods to train the two networks end-to-end while achieving good convergence behavior.


  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283. External Links: Link Cited by: §IV-A, §VI-A.
  • [2] M. Alexa and W. Müller (2000) Representing animations by principal components. Computer Graphics Forum 19 (3), pp. 411–418. Cited by: §I, §II.
  • [3] R. Alterovitz, M. Branicky, and K. Goldberg (2008) Motion planning under uncertainty for image-guided medical needle steering. The International journal of robotics research 27 (11-12), pp. 1361–1374. Cited by: §II.
  • [4] R. Alterovitz, K. Y. Goldberg, J. Pouliot, and I. Hsu (2009) Sensorless motion planning for medical needle insertion in deformable tissues. IEEE Transactions on Information Technology in Biomedicine 13 (2), pp. 217–225. Cited by: §II.
  • [5] Y. Bai, W. Yu, and C. K. Liu (2016) Dexterous manipulation of cloth. In Computer Graphics Forum, Vol. 35, pp. 523–532. Cited by: §II.
  • [6] D. Baraff and A. Witkin (1998) Large steps in cloth simulation. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’98, New York, NY, USA, pp. 43–54. External Links: ISBN 0-89791-999-8, Link, Document Cited by: §I.
  • [7] J. Barbič and D. L. James (2010) Subspace self-collision culling. ACM Trans. on Graphics (SIGGRAPH 2010) 29 (4), pp. 81:1–81:9. Cited by: §VI-F.
  • [8] C. Bersch, B. Pitzer, and S. Kammel (2011-Sep.) Bimanual robotic cloth manipulation for laundry folding. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. , pp. 1413–1419. External Links: Document, ISSN 2153-0866 Cited by: §II.
  • [9] C. A. Brebbia and M. H. Aliabadi (Eds.) (1993) Industrial applications of the boundary element method. Computational Mechanics, Inc., Billerica, MA, USA. External Links: ISBN 1853121835 Cited by: §II.
  • [10] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla (2008) Segmentation and recognition using structure from motion point clouds. In Proceedings of the 10th European Conference on Computer Vision: Part I, ECCV ’08, Berlin, Heidelberg, pp. 44–57. External Links: ISBN 978-3-540-88681-5 Cited by: §II.
  • [11] K. Choi and H. Ko (2002) Stable but responsive cloth. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’02, New York, NY, USA, pp. 604–611. External Links: ISBN 1-58113-521-1, Link, Document Cited by: §I, §II, 1st item, §IV, Fig. 6, §VI-C, §VI-E, TABLE I, TABLE II, TABLE III, TABLE IV, §VI.
  • [12] A. Clegg, W. Yu, J. Tan, C. K. Liu, and G. Turk (2018-12) Learning to dress: synthesizing human dressing motion via deep reinforcement learning. ACM Trans. Graph. 37 (6), pp. 179:1–179:10. External Links: ISSN 0730-0301 Cited by: §II.
  • [13] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr (1999) Implicit fairing of irregular meshes using diffusion and curvature flow. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’99, New York, NY, USA, pp. 317–324. External Links: ISBN 0-201-48560-5, Link, Document Cited by: §III-A.
  • [14] C. Duriez (2013) Control of Elastic Soft Robots based on Real-Time Finite Element Method. In ICRA 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, France. Cited by: §II.
  • [15] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In NIPS, pp. 2224–2232. Cited by: 1st item, §II, §III-B.
  • [16] J. Fras, Y. Noh, M. Macias, H. Wurdemann, and K. Althoefer (2018-05) Bio-inspired octopus robot based on novel soft fluidic actuator. In 2018 IEEE International Conference on Robotics and Automation (ICRA), Vol. , pp. 1583–1588. Cited by: §II.
  • [17] L. Gao, Y. Lai, J. Yang, L. Zhang, L. Kobbelt, and S. Xia (2017) Sparse data driven mesh deformation. arXiv:1709.01250. Cited by: §III.
  • [18] T. F. Gast, C. Schroeder, A. Stomakhin, C. Jiang, and J. M. Teran (2015) Optimization integrator for large time steps. IEEE transactions on visualization and computer graphics 21 (10), pp. 1103–1115. Cited by: §VI-A.
  • [19] A. George and E. Ng (1988) On the complexity of sparse $qr$ and $lu$ factorization of finite-element matrices. SIAM Journal on Scientific and Statistical Computing 9 (5), pp. 849–861. External Links: Document Cited by: §I, §II.
  • [20] E. Grinspun, A. N. Hirani, M. Desbrun, and P. Schröder (2003) Discrete shells. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA ’03, Aire-la-Ville, Switzerland, Switzerland, pp. 62–67. External Links: ISBN 1-58113-659-5 Cited by: §I.
  • [21] G. Hirota, S. Fisher, and M. Lin (2000) Simulation of non-penetrating elastic bodies using distance fields. Cited by: §VI-F.
  • [22] Z. Hu, T. Han, P. Sun, J. Pan, and D. Manocha (2019-07) 3-d deformable object manipulation using deep neural networks. IEEE Robotics and Automation Letters PP, pp. 1–1. External Links: Document Cited by: §VII.
  • [23] J. Huang, Y. Tong, K. Zhou, H. Bao, and M. Desbrun (2011-07) Interactive shape interpolation through controllable dynamic deformation. IEEE Transactions on Visualization and Computer Graphics 17 (7), pp. 983–992. External Links: Document, ISSN 1077-2626 Cited by: §III-A.
  • [24] Z. Huang, J. Yao, Z. Zhong, Y. Liu, and X. Guo (2014) Sparse localized decomposition of deformation gradients. Comp. Graph. Forum 33 (7), pp. 239–248. Cited by: §VI-D, TABLE II.
  • [25] B. Jia, Z. Pan, Z. Hu, J. Pan, and D. Manocha (2018) Cloth manipulation using random forest-based controller parametrization. CoRR abs/1802.09661. Cited by: §II.
  • [26] B. Jia, Z. Pan, and D. Manocha (2018) Fast motion planning for high-dof robot systems using hierarchical system identification. External Links: arXiv:1809.08259 Cited by: §I, §II, Fig. 4, §V-C.
  • [27] L. Kavan, P.-P. Sloan, and C. O’Sullivan (2010) Fast and efficient skinning of animated meshes. Computer Graphics Forum 29 (2), pp. 327–336. External Links: Document, Link, Cited by: §VI-B.
  • [28] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes.. arXiv:1312.6114 . Cited by: §I, §II.
  • [29] K. Lakshmanan, A. Sachdev, Z. Xie, D. Berenson, K. Goldberg, and P. Abbeel (2013) A constraint-aware motion planning algorithm for robotic folding of clothes. In Experimental Robotics, pp. 547–562. Cited by: §I, §II.
  • [30] M. G. Larson and F. Bengzon (2013) The finite element method: theory, implementation, and applications. Springer Publishing Company, Incorporated. External Links: ISBN 3642332862, 9783642332869 Cited by: §I, §II.
  • [31] Y. Li, Y. Yue, D. Xu, E. Grinspun, and P. K. Allen Folding deformable objects using predictive simulation and trajectory optimization. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6000–6006. Cited by: §I.
  • [32] Y. Lim and S. De (2007) Real time simulation of nonlinear tissue response in virtual surgery using the point collocation-based method of finite spheres. Computer Methods in Applied Mechanics and Engineering 196 (31-32), pp. 3011–3024. Cited by: §II.
  • [33] T. Liu, A. W. Bargteil, J. F. O’Brien, and L. Kavan (2013-11) Fast simulation of mass-spring systems. ACM Transactions on Graphics 32 (6), pp. 209:1–7. Note: Proceedings of ACM SIGGRAPH Asia 2013, Hong Kong External Links: Link Cited by: §IV.
  • [34] L. van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: §VI-C.
  • [35] H. Maron, M. Galun, N. Aigerman, M. Trope, N. Dym, E. Yumer, V. G. Kim, and Y. Lipman (2017-07) Convolutional neural networks on surfaces via seamless toric covers. ACM Trans. Graph. 36 (4), pp. 71:1–71:10. External Links: ISSN 0730-0301, Link, Document Cited by: §II.
  • [36] D. Maturana and S. Scherer (2015-Sept) VoxNet: a 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. , pp. 922–928. External Links: Document, ISSN Cited by: §II.
  • [37] K. Nakajima (2017) Muscular-hydrostat computers: physical reservoir computing for octopus-inspired soft robots. In Brain Evolution by Design, pp. 403–414. Cited by: §II.
  • [38] R. Narain, A. Samii, and J. F. O’Brien (2012-11) Adaptive anisotropic remeshing for cloth simulation. ACM Transactions on Graphics 31 (6), pp. 147:1–10. Note: Proceedings of ACM SIGGRAPH Asia 2012, Singapore Cited by: §I, 2nd item, §IV, Fig. 4, §V-C, §VI-B, §VI-F, TABLE I, §VI.
  • [39] T. Neumann, K. Varanasi, S. Wenger, M. Wacker, M. Magnor, and C. Theobalt (2013-11) Sparse localized deformation components. ACM Trans. Graph. 32 (6), pp. 179:1–179:10. External Links: ISSN 0730-0301 Cited by: §I, §II, §VI-D, TABLE II.
  • [40] Y. J. Oh, T. M. Lee, and I. Lee (2018) Hierarchical cloth simulation using deep neural networks. arXiv:1802.03168 . Cited by: §II.
  • [41] Z. Pan and D. Manocha (2018-05) Realtime planning for high-dof deformable bodies using two-stage learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), Vol. , pp. 1–8. Cited by: §II, §II.
  • [42] Z. Pan, C. Park, and D. Manocha (2016) Robot motion planning for pouring liquids. In Twenty-Sixth International Conference on Automated Planning and Scheduling, Cited by: §I.
  • [43] A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 . Cited by: §II.
  • [44] A. Rajeswaran*, V. Kumar*, A. Gupta, G. Vezzani, J. Schulman, E. Todorov, and S. Levine (2018) Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. In Proceedings of Robotics: Science and Systems (RSS), Cited by: §I.
  • [45] O. Sorkine and M. Alexa (2007) As-rigid-as-possible surface modeling. In Proceedings of the Fifth Eurographics Symposium on Geometry Processing, SGP ’07, Aire-la-Ville, Switzerland, Switzerland, pp. 109–116. External Links: ISBN 978-3-905673-46-3, Link Cited by: §V-A.
  • [46] Q. Tan, L. Gao, Y. Lai, J. Yang, and S. Xia (2018) Mesh-based autoencoders for localized deformation component analysis. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §I, §II, §VI-B, §VI-D, TABLE I, TABLE II, TABLE III.
  • [47] L. Vasa and V. Skala (2011-02) A perception correlated comparison method for dynamic meshes. IEEE Transactions on Visualization and Computer Graphics 17 (2), pp. 220–230. External Links: ISSN 1077-2626, Link, Document Cited by: §VI-B, §VII.
  • [48] B. Wang, L. Wu, K. Yin, L. Liu, and H. Huang (2015) Deformation capture and modeling of soft objects. ACM Transactions on Graphics(Proc. of SIGGRAPH 2015) 34 (4), pp. 94:1–94:12. Cited by: §II.
  • [49] J. M. Wang, D. J. Fleet, and A. Hertzmann (2008-02) Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2), pp. 283–298. External Links: Document, ISSN 0162-8828 Cited by: §II.
  • [50] S. Wiewel, M. Becher, and N. Thuerey (2018) Latent-space physics: towards learning the temporal evolution of fluid flow. arXiv:1802.10123 . Cited by: §II, §V-B.
  • [51] Y. Xie, E. Franz, M. Chu, and N. Thuerey (2018) TempoGAN: a temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Transactions on Graphics (TOG) 37 (4), pp. 95. Cited by: §II.
  • [52] X. Yan, J. Yang, E. Yumer, Y. Guo, and H. Lee (2016) Perspective transformer nets: learning single-view 3d object reconstruction without 3d supervision. arXiv:1612.00814 . Cited by: §II.
  • [53] Z. Yang, J. Peltonen, and S. Kaski (2014) Optimization equivalence of divergences improves neighbor embedding. In International Conference on Machine Learning, pp. 460–468. Cited by: §VI-C.
  • [54] J. Zhu, S. C. H. Hoi, and M. R. Lyu (2009-06) Nonrigid shape recovery by gaussian process regression. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, Vol. , pp. 1319–1326. External Links: Document, ISSN 1063-6919 Cited by: §I, §II.
  • [55] H. Zou, T. Hastie, and R. Tibshirani (2004) Sparse principal component analysis. J. Comp. Graph. Statistics 15, pp. 2006. Cited by: §I, §II.