KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations


Detecting keypoints of 3D objects is of great interest to both the graphics and computer vision communities. Several 2D and 3D keypoint datasets aim to address this problem in a data-driven way. These datasets, however, either lack scalability or bring ambiguity to the definition of keypoints. Therefore, we present KeypointNet: the first large-scale and diverse 3D keypoint dataset, containing 83,060 keypoints and 8,329 3D models from 16 object categories, built by leveraging numerous human annotations. To handle the inconsistency between annotations from different people, we propose a novel method to aggregate these keypoints automatically, through minimization of a fidelity loss. Finally, ten state-of-the-art methods are benchmarked on our proposed dataset.


1 Introduction

Detection of 3D keypoints is essential in many applications such as object matching, object tracking, shape retrieval and registration [17, 4, 32]. Utilization of keypoints to match 3D objects has its advantage of providing features that are semantically significant and such keypoints are usually made invariant to rotations, scales and other transformations.

Figure 1: We propose the large-scale KeypointNet dataset. It contains 8K+ models and 83K+ keypoint annotations.

With the rise of deep learning, 2D semantic keypoint detection has been boosted by a large quantity of high-quality datasets [2, 18]. However, there are few 3D datasets focusing on the keypoint representation of an object. Dutagaci et al. [9] collect 43 models and label them according to annotations from various persons, which are aggregated by geodesic clustering. The ShapeNetCore keypoint dataset [36] and a similar dataset [12] instead resort to a single expert's annotations, making them vulnerable to bias.

In order to alleviate the bias of experts' definitions of keypoints, we ask a large group of people to annotate various keypoints according to their own understanding. Challenges arise because different people may annotate different keypoints, and we need to identify the consensus and patterns in these annotations. Finding such patterns is not trivial when a large set of keypoints spreads across the entire model. A simple clustering would require a predefined distance threshold and fail to identify closely spaced keypoints. As shown in Figure 1, there are four closely spaced keypoints on each airplane empennage, and it is extremely hard for simple clustering methods to distinguish them. Besides, clustering algorithms do not give semantic labels of keypoints, since it is ambiguous to link clustered groups with each other. In addition, people's annotations are not always exact, and errors in annotated keypoint locations are inevitable. To solve these problems, we propose a novel method that aggregates a large number of keypoint annotations from distinct people by optimizing a fidelity loss. After this automatic aggregation process, we verify the generated keypoints based on simple priors such as symmetry.

In this paper, we build the first large-scale and diverse 3D keypoint dataset, named KeypointNet, which contains 8,329 models with 83,060 keypoints. These keypoints are of high fidelity and rich in structural and semantic meaning. Some examples are given in Figure 1. We hope this dataset can boost semantic understanding of common objects.

In addition, we propose two large-scale keypoint prediction tasks: keypoint saliency estimation and keypoint correspondence estimation. We benchmark ten state-of-the-art algorithms with mIoU, mAP and PCK metrics. Results show that the detection and identification of keypoints remain a challenging task.

In summary, we make the following contributions:

  • To the best of our knowledge, we provide the first large-scale dataset on 3D keypoints, both in number of categories and keypoints.

  • We come up with a novel approach on aggregating people’s annotations on keypoints, even if their annotations are independent from each other.

  • We benchmark ten state-of-the-art methods on our dataset, including point cloud, graph, voxel and local geometry based keypoint detection methods.

2 Related Work

2.1 Detection of Keypoints

Detection of 3D keypoints has long been an important task for 3D object understanding, with applications such as object pose estimation, reconstruction, matching and segmentation. Researchers have proposed various methods to produce interest points on objects to aid further object processing. Traditional methods like 3D Harris [26], HKS [27], Salient Points [5], Mesh Saliency [14], Scale Dependent Corners [20], CGF [11] and SHOT [30] exploit local reference frames (LRF) to extract geometric features as local descriptors. However, because similar local frames repeat across a shape, most of the interest points detected by traditional methods are redundant and unsuitable for describing objects. Besides, these methods only consider local geometric information without semantic knowledge, diverging from human understanding.

Recent deep learning methods like SyncSpecCNN [36] and deep functional dictionaries [28] have been proposed to detect keypoints. Unlike traditional methods, they do not handle rotations well. Though some recent methods like S2CNN [7] and PRIN [37] try to fix this, deep learning methods still rely on ground-truth keypoint labels annotated by humans with expert verification.

2.2 Keypoint Datasets

Keypoint datasets have their origin in 2D images, where plenty of datasets on human skeletons and object interest points have been proposed. For human skeletons, the MPII human pose dataset [2], the MSCOCO keypoint challenge [19] and PoseTrack [1] annotate millions of keypoints on humans. For more general objects, SPair-71k [18] contains 70,958 image pairs with diverse variations in viewpoint and scale, with corresponding keypoints on each image pair. CUB [31] provides 15 part locations on 11,788 images from 200 bird categories, and PASCAL [3] provides keypoint annotations for 20 object categories.

Keypoint datasets on 3D objects include Dutagaci et al. [9], SyncSpecCNN [36] and Kim et al. [12]. Dutagaci et al. [9] aggregate multiple annotations from different people with an ad-hoc method, but their dataset is extremely small. Though SyncSpecCNN [36], Pavlakos et al. [21] and Kim et al. [12] give relatively large keypoint datasets, they rely on a manually designed template of keypoints, which is inevitably biased and flawed.

3 KeypointNet: A Large-scale 3D Keypoint Dataset

3.1 Data Collection

KeypointNet is built on ShapeNetCore [6]. ShapeNetCore covers 55 common object categories with about 51,300 unique 3D models.

We filter out those models that deviate from the majority and keep at most 1000 instances for each category in order to provide a balanced dataset. In addition, a consistent canonical orientation is established (e.g., upright and front) for every category because of the incomplete alignment in ShapeNetCore.

We let annotators determine which points are important, requiring that the same keypoint index indicate the same meaning for each annotator. Though annotators are free to give their own keypoints, three general principles should be obeyed: (1) each keypoint should describe an object's semantic information shared across instances of the same object category, (2) keypoints of an object category should spread over the whole object and (3) different keypoints should have distinct semantic meanings. After that, we utilize a heuristic method to aggregate these points, which will be discussed in Section 4.

Keypoints are annotated on meshes and these annotated meshes are then downsampled to 2,048 points. Our final dataset is a collection of point clouds, with keypoint indices.
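The mesh-to-point-cloud step above can be sketched as follows. `snap_keypoints_to_cloud` is a hypothetical helper, not code from the dataset pipeline: it subsamples a vertex set (the real dataset samples 2,048 points from mesh surfaces rather than raw vertices) and stores each annotated keypoint as the index of its nearest sampled point.

```python
import numpy as np

def snap_keypoints_to_cloud(mesh_vertices, keypoints_xyz, n_points=2048, seed=0):
    """Toy sketch of converting mesh keypoint annotations into point-cloud
    keypoint indices: subsample the vertices, then record each keypoint as
    the index of its nearest sampled point."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(mesh_vertices),
                     size=min(n_points, len(mesh_vertices)), replace=False)
    cloud = mesh_vertices[idx]
    # distance from every keypoint to every sampled point
    d = np.linalg.norm(cloud[None, :, :] - keypoints_xyz[:, None, :], axis=-1)
    kp_idx = d.argmin(axis=1)  # nearest sampled point per keypoint
    return cloud, kp_idx
```

The returned `kp_idx` plays the role of the per-model keypoint index list stored in the final dataset.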

3.2 Annotation Tools

We develop an easy-to-use web annotation tool based on NodeJS. Each user may click up to 20 interest points according to his/her own understanding. The interface is shown in Figure 2. Annotated models are shown in the left panel, while the next unprocessed model is shown in the right panel.

Figure 2: Web interface of the annotation tool.
Figure 3: Dataset Visualization. Here we plot ground-truth keypoints for several categories. We can see that by utilizing our automatic aggregation method, keypoints of high fidelity are extracted.

3.3 Dataset Statistics

At the time of this work, our dataset covers 16 common categories from ShapeNetCore, with 8,329 models. Each model contains 3 to 20 keypoints. Our dataset is divided into train, validation and test splits with a 7:1:2 ratio. Table 1 gives detailed statistics of our dataset, and some visualizations are given in Figure 3.

Figure 4: Keypoint aggregation pipeline. We first infer dense embeddings from human labeled raw annotations. Then fidelity error maps are calculated by summing embedding distances to human labeled keypoints. Non Minimum Suppression is conducted to form a potential set of keypoints. These keypoints are then projected onto 2D subspace with t-SNE and verified by humans.
Models per category:

Category     Train   Val   Test    All   #Annotators
Airplane       719   103    205   1027   21
Bathtub        351    50    100    502   11
Bed            111    16     32    158    6
Bottle         277    40     79    396    8
Cap             29     4      8     42    6
Car            704   101    201   1005   14
Chair          715   102    204   1021   15
Guitar         431    62    123    615   11
Helmet          71    10     20    101    5
Knife          217    31     62    310    5
Laptop         312    45     89    446   10
Motorcycle     211    30     60    301    7
Mug            134    19     38    192    8
Skateboard     106    15     30    151    6
Table          793   113    227   1133   13
Vessel         650    93    186    929   13
Total         5830   833   1666   8329  159

Keypoints per category:

Category     Train    Val   Test    All
Airplane      7035   1005   2010  10050
Bathtub       4731    676   1352   6759
Bed           1416    202    405   2023
Bottle        2487    355    711   3553
Cap            146     21     42    209
Car          11152   1593   3186  15932
Chair         8472   1210   2421  12103
Guitar        2684    383    767   3834
Helmet         480     69    137    685
Knife          650     93    186    929
Laptop        1858    265    531   2654
Motorcycle    1670    239    477   2386
Mug           1413    202    404   2019
Skateboard     790    113    226   1128
Table         6381    912   1823   9116
Vessel        6776    968   1936   9680
Total        58142   8306  16612  83060
Table 1: Keypoint Dataset statistics. Left: number of models in each category. Right: number of keypoints in each category.

4 Keypoint Aggregation

Given all human labeled raw keypoints, we leverage a novel method to aggregate them into a set of ground-truth keypoints.

This aggregation is necessary for two reasons: 1) distinct people may annotate different sets of keypoints, and human labeled keypoints are sometimes erroneous, so we need a principled way to aggregate them; 2) a simple clustering algorithm would fail to distinguish closely spaced keypoints and cannot give consistent semantic labels.

4.1 Problem Statement

Given a 2-dimensional sub-manifold $\mathcal{M}_i \subset \mathbb{R}^3$, where $i$ is the index of the model, a valid annotation from the $j$-th person is a keypoint set $\mathcal{A}_{ij} = \{a_{ij}^{(l)}\}_{l=1}^{n_{ij}}$, where $l$ is the keypoint index and $n_{ij}$ is the number of keypoints annotated by person $j$. Note that different people may have different sets of keypoint indices and these indices are independent.

Our goal is to aggregate a set of potential ground-truth keypoints $\mathcal{K}_i = \{k_i^{(m)}\}_{m=1}^{N_i}$, where $N_i$ is the number of proposed keypoints for each model $\mathcal{M}_i$, so that $k_{i_1}^{(m)}$ and $k_{i_2}^{(m)}$ share the same semantics for any two models $i_1$, $i_2$.

4.2 Keypoint Saliency

Each annotation is allowed to be erroneous within a small region, so that a keypoint distribution is defined as follows:

$$p_{ij}(x) = \frac{1}{Z} \sum_{l=1}^{n_{ij}} g\big(x, a_{ij}^{(l)}\big), \qquad g(x, a) = \exp\!\Big(\!-\frac{\|x - a\|^2}{2\sigma^2}\Big),$$

where $g$ is a Gaussian kernel and $Z$ is a normalization constant. This contradicts many previous methods on annotating keypoints, where a $\delta$-function is implicitly assumed. We argue that humans commonly make small mistakes when annotating keypoints and that, by the central limit theorem, the keypoint distribution would form a Gaussian.
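A minimal sketch of this distribution, evaluated on a point cloud by summing Gaussian kernels centered at the annotated locations. The bandwidth `sigma` is an assumed value for illustration, not one specified in the paper.

```python
import numpy as np

def keypoint_density(points, annotations, sigma=0.05):
    """Unnormalized keypoint distribution evaluated at `points` (n,3):
    a sum of Gaussian kernels centered at annotated locations (k,3),
    then normalized so the values sum to one."""
    d2 = ((points[:, None, :] - annotations[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)
    return density / density.sum()
```

Points near an annotation receive high density, and small annotation errors only smear the peak rather than moving it.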

4.3 Ground-truth Keypoint Generation

We propose to jointly output a dense mapping function $f_\theta: \mathbb{R}^3 \to \mathbb{R}^d$, whose parameters are $\theta$, and the aggregated ground-truth keypoint set $\mathcal{K} = \{\mathcal{K}_i\}$. $f_\theta$ transforms each point into a high-dimensional embedding vector in $\mathbb{R}^d$. Specifically, we solve the following optimization problem:

$$\min_{\theta,\, \mathcal{K}} \ \mathcal{L}_f(\theta, \mathcal{K}) + \lambda\, \mathcal{L}_{reg}(\theta), \quad \text{s.t.} \ f_\theta\big(k_{i_1}^{(m)}\big) = f_\theta\big(k_{i_2}^{(m)}\big), \ \forall\, i_1, i_2, m, \tag{1}$$

where $\mathcal{L}_f$ is the data fidelity loss and $\mathcal{L}_{reg}$ is a regularization term to avoid trivial solutions like $f_\theta \equiv \mathbf{0}$. The constraint states that the embeddings of ground-truth keypoints with the same index should be the same.

Fidelity Loss

We define $\mathcal{L}_f$ as:

$$\mathcal{L}_f(\theta, \mathcal{K}) = \sum_{i} \sum_{j} \sum_{l} \min_{m}\ \mathbb{E}_{x \sim g(\cdot,\, a_{ij}^{(l)})}\Big[d\big(k_i^{(m)}, x\big)\Big], \tag{2}$$

where $d$ is the L2 distance between two vectors in embedding space:

$$d(x, y) = \big\|f_\theta(x) - f_\theta(y)\big\|_2.$$
Unlike previous methods such as Dutagaci et al.[9] where a simple geodesic average of human labeled points is given as ground-truth points, we seek a point whose expected embedding distance to all human labeled points is smallest. The reason is that geodesic distance is sensitive to the misannotated keypoints and could not distinguish closely spaced keypoints, while embedding distance is more robust to noisy points as the embedding space encodes the semantic information of an object.
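The fidelity term can be illustrated with a toy sketch. `fidelity_loss` is a hypothetical helper under simplifying assumptions: the embedding function is passed in as a plain callable, the expectation over annotation noise is dropped, and each human annotation is matched to its nearest proposed keypoint in embedding space.

```python
import numpy as np

def fidelity_loss(embed, proposals, annotations):
    """Toy fidelity term: for every annotator, match each labeled point to
    its nearest proposed keypoint in embedding space and sum the matched
    embedding distances. `embed` maps an (n,3) array to (n,d) embeddings;
    `annotations` is a list of (n_j,3) arrays, one per person."""
    e_prop = embed(proposals)            # (M, d) proposal embeddings
    total = 0.0
    for person in annotations:
        e_ann = embed(person)            # (n_j, d) annotation embeddings
        d = np.linalg.norm(e_ann[:, None, :] - e_prop[None, :, :], axis=-1)
        total += d.min(axis=1).sum()     # nearest proposal per annotation
    return total
```

With a semantically meaningful embedding, a mislabeled outlier point contributes a bounded nearest-neighbor term instead of dragging an average, which is the robustness argument made above.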

Equation 1 involves both $\theta$ and $\mathcal{K}$, and it is impractical to solve this problem in closed form. In practice, we use alternating minimization with a deep neural network to approximate the embedding function $f_\theta$, so that we solve the following dual problem instead (by slightly loosening the constraints):

$$\theta^{(t+1)} = \arg\min_{\theta}\ \mathcal{L}_f\big(\theta, \mathcal{K}^{(t)}\big) + \lambda\, \mathcal{L}_{reg}(\theta),$$
$$\mathcal{K}^{(t+1)} = \arg\min_{\mathcal{K}}\ \mathcal{L}_f\big(\theta^{(t+1)}, \mathcal{K}\big), \tag{3}$$

and alternate between the two equations until convergence.

By solving this problem, we find both an optimal embedding function $f_\theta$ and intra-class consistent ground-truth keypoints $\mathcal{K}$, while keeping their embedding distance to human-labeled keypoints as small as possible. The ground-truth keypoints can be viewed as the projection of human labeled data onto the embedding space.

Figure 5: Visualizations of detected keypoints for six algorithms.
Airplane Bathtub Bed Bottle Cap Car Chair Guitar Helmet Knife Laptop Motor Mug Skate Table Vessel Average
PointNet 9.1/8.5 0.5/3.6 6.4/6.4 0.0/1.3 0.0/3.2 0.0/2.3 4.5/9.6 0.0/1.0 0.0/0.4 0.0/16.3 11.6/14.5 1.9/2.6 0.0/3.4 0.0/1.7 11.0/12.0 0.0/2.2 2.8/5.6
PointNet++ 20.5/33.6 10.2/5.7 16.1/18.4 22.0/26.0 30.7/32.2 40.3/49.9 27.3/39.7 31.5/36.6 42.3/47.2 20.5/27.8 29.8/38.9 15.7/14.3 22.0/35.4 48.2/31.3 18.0/25.9 12.4/16.7 25.5/30.0
RSNet 20.5/31.1 12.8/17.8 19.2/29.5 13.1/12.8 15.7/21.8 15.1/21.8 13.9/15.4 16.4/16.1 8.4/6.1 18.3/31.5 22.8/35.0 20.2/26.1 16.8/23.2 4.0/5.4 15.4/45.3 9.7/12.8 15.1/22.0
SpiderCNN 22.2/25.8 7.2/6.7 17.7/19.8 4.1/2.7 2.7/4.0 5.5/6.5 15.9/18.9 7.1/10.5 0.0/0.4 30.0/28.4 22.4/34.3 14.5/15.0 4.9/5.3 0.0/1.7 23.9/30.2 8.5/8.9 11.7/13.7
PointConv 25.3/28.1 15.2/24.6 32.4/45.8 7.3/10.1 13.5/15.7 20.3/24.6 21.7/30.8 21.2/21.7 2.1/2.0 5.0/17.3 27.8/46.5 18.9/29.3 21.7/27.3 13.2/18.9 26.8/42.4 13.9/22.6 17.9/25.5
RSCNN 21.0/34.4 11.9/17.3 19.3/28.4 11.6/16.8 18.9/31.6 15.8/16.3 17.6/21.7 17.9/18.2 0.0/3.1 24.2/30.6 25.3/37.9 13.4/23.8 17.2/25.4 5.9/9.7 23.7/41.4 10.1/14.7 15.9/23.2
DGCNN 32.3/43.8 17.7/26.2 21.6/33.4 15.0/20.7 21.5/27.6 15.1/21.4 23.8/30.3 20.7/22.9 3.5/4.8 29.4/40.5 30.1/46.4 23.5/29.2 18.1/24.9 12.8/17.3 31.7/52.1 15.6/19.7 20.8/28.8
GraphCNN 22.9/25.9 12.5/14.9 0.7/0.8 1.8/5.2 0.2/0.3 10.3/10.5 12.7/14.8 1.0/5.7 0.3/0.4 0.1/0.2 0.3/0.3 18.7/23.0 10.8/11.4 0.4/0.5 24.2/34.4 8.9/9.7 7.9/9.9
Harris3D 0.4/- 0.3/- 1.0/- 1.0/- 0.0/- 0.7/- 1.4/- 1.6/- 0.2/- 0.0/- 0.0/- 0.3/- 0.3/- 0.5/- 0.7/- 3.3/- 0.7/-
SIFT3D 4.5/- 0.9/- 0.9/- 0.7/- 1.0/- 1.2/- 0.9/- 0.2/- 0.9/- 0.0/- 0.5/- 0.7/- 0.7/- 0.3/- 1.0/- 2.2/- 1.0/-
ISS3D 0.4/- 1.0/- 0.5/- 0.9/- 1.9/- 2.0/- 0.0/- 0.6/- 0.8/- 0.0/- 0.2/- 0.3/- 0.5/- 0.5/- 0.0/- 3.3/- 0.8/-
Table 2: mIoU and mAP results (in percent, reported as mIoU/mAP) for the compared methods with distance threshold 0.01.

Non Minimum Suppression

Equation 3 may be hard to solve since $N_i$ is also unknown beforehand. For each model $\mathcal{M}_i$, the fidelity error associated with each potential keypoint $x \in \mathcal{M}_i$ is:

$$E_i(x) = \sum_{j} \min_{l}\ d\big(x, a_{ij}^{(l)}\big),$$

where $d$ is the embedding distance defined in Equation 2.

Then $\mathcal{K}_i$ is found by conducting Non Minimum Suppression (NMS), such that:

$$\mathcal{K}_i = \Big\{x \in \mathcal{M}_i \ \Big|\ E_i(x) \le E_i(y),\ \forall\, y \in \mathcal{B}(x, r)\Big\},$$

where $\mathcal{B}(x, r)$ is the neighborhood of $x$ and $r$ is some neighborhood threshold.
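The error-map and NMS step might look like the following sketch, assuming per-point embeddings have already been computed by the network; Euclidean balls stand in for neighborhoods on the manifold, and `radius` is an assumed threshold.

```python
import numpy as np

def aggregate_keypoints(points, embeddings, annotations, radius=0.1):
    """Sketch of the error-map + Non Minimum Suppression step.
    points:      (N,3) point cloud for one model
    embeddings:  (N,d) precomputed per-point embeddings
    annotations: list of index arrays into `points`, one per annotator
    Returns indices of points that are local minima of the fidelity error."""
    # fidelity error: sum over annotators of the embedding distance from
    # each point to that annotator's nearest labeled point
    err = np.zeros(len(points))
    for ann_idx in annotations:
        d = np.linalg.norm(
            embeddings[:, None, :] - embeddings[ann_idx][None, :, :], axis=-1)
        err += d.min(axis=1)
    # non-minimum suppression: keep points whose error is minimal within
    # a Euclidean ball of the given radius
    dxyz = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    keep = [i for i in range(len(points))
            if err[i] <= err[dxyz[i] < radius].min()]
    return np.array(keep)
```

The pairwise-distance matrices make this quadratic in the number of points, which is acceptable at 2,048 points per model; a KD-tree would be the natural replacement at larger scales.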

After NMS, we get several ground-truth points for each manifold $\mathcal{M}_i$. However, the arbitrarily assigned index within each model does not provide a consistent semantic correspondence across different models. Therefore we cluster these points according to their embeddings by first projecting them onto a 2D subspace with t-SNE [16].

Ground-truth Verification

Though the above method automatically aggregates a set of potential keypoints with high precision, it occasionally omits keypoints. As the last step, experts manually verify the keypoints based on simple priors such as the rotational symmetry and centrosymmetry of an object.

4.4 Implementation Details

At the start of the alternating minimization, we initialize $\mathcal{K}$ with samples from the raw annotations and then run one iteration, which is enough for convergence. We choose PointConv with hidden dimension 128 as the embedding function $f_\theta$. During the optimization of Equation 3, we classify each point into keypoint classes with a SoftMax layer and extract the features of the penultimate layer as the embedding. The learning rate for PointConv is 1e-3 and the optimizer is Adam [13].

4.5 Pipeline

The whole pipeline is shown in Figure 4. We first infer dense embeddings from human labeled raw annotations. Then fidelity error maps are calculated by summing embedding distances to human labeled keypoints. Non Minimum Suppression is conducted to form a potential set of keypoints. These keypoints are then projected onto 2D subspace with t-SNE and verified by humans.

5 Tasks and Benchmarks

In this section, we propose two keypoint prediction tasks: keypoint saliency estimation and keypoint correspondence estimation. Keypoint saliency estimation requires evaluated methods to give a set of potential keypoints without semantic labels, while keypoint correspondence estimation asks them to localize a fixed number of semantically distinguishable keypoints.

5.1 Keypoint Saliency Estimation

Dataset Preparation

For keypoint saliency estimation, we only consider whether a point is the keypoint or not, without giving its semantic label. Our dataset is split into train, validation and test sets with the ratio 70%, 10%, 20%.

Evaluation Metrics

Two metrics are adopted to evaluate the performance of keypoint saliency estimation. First, we evaluate the mean Intersection over Union [29] (mIoU), which can be calculated as

$$\text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c},$$

where $C$ is the number of categories and a predicted keypoint counts as a true positive if its geodesic distance to a ground-truth keypoint is below the error tolerance. mIoU is calculated under different error tolerances from 0 to 0.1. Second, for those methods that output keypoint probabilities, we evaluate their mean Average Precision (mAP) over all categories.
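Under the assumption that a prediction is a true positive when it lies within the distance tolerance of some ground-truth keypoint, the per-category IoU can be sketched as follows (Euclidean distance stands in for the geodesic distance used in the benchmark):

```python
import numpy as np

def keypoint_iou(pred, gt, points, tol=0.01):
    """IoU between predicted and ground-truth keypoint index sets on one
    point cloud: tp = predictions within `tol` of some ground truth,
    fp = unmatched predictions, fn = unmatched ground-truth keypoints."""
    d = np.linalg.norm(points[pred][:, None, :] - points[gt][None, :, :],
                       axis=-1)
    tp = int((d.min(axis=1) <= tol).sum())   # matched predictions
    fp = len(pred) - tp                      # unmatched predictions
    fn = int((d.min(axis=0) > tol).sum())    # unmatched ground truth
    return tp / (tp + fp + fn)
```

Averaging this quantity over the models of a category, and then over categories, yields a number comparable to the mIoU columns in Table 2.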

Benchmark Algorithms

We benchmark eight state-of-the-art deep learning algorithms on point cloud semantic analysis: PointNet [22], PointNet++ [23], RSNet [10], SpiderCNN [35], PointConv [34], RSCNN [15], DGCNN [33] and GraphCNN [8]. Three traditional local geometric keypoint detectors are also considered: Harris3D [26], SIFT3D [24] and ISS3D [25].

Evaluation Results

For deep learning methods, we use the default network architectures and hyperparameters to predict the keypoint probability of each point, and mIoU and mAP are adopted to evaluate their performance. For local geometry based methods, mIoU is used. Each method is tested with various geodesic error thresholds. In Table 2, we report mIoU and mAP results under a restrictive threshold of 0.01. Figure 6 shows the mIoU curves under distance thresholds from 0 to 0.1 and Figure 7 shows the mAP results. Under the restrictive distance threshold of 0.01, both geometric and deep learning methods fail to predict qualified keypoints.

Figure 5 shows some visualizations of the results from RSNet, RSCNN, DGCNN, GraphCNN, ISS3D and Harris3D. Deep learning methods can predict some of the ground-truth keypoints, though predicted keypoints are sometimes missing. Local geometry based methods like ISS3D and Harris3D give many more interest points spread over the entire model, but these points are agnostic to semantic information. Learning discriminative features for localizing accurate and distinct keypoints across various objects remains a challenging task.

Figure 6: mIoU results under various distance thresholds (0-0.1) for compared algorithms.
Figure 7: mAP results under various distance thresholds (0-0.1) for compared algorithms.

5.2 Keypoint Correspondence Estimation

Keypoint correspondence estimation is a more challenging task than keypoint saliency estimation. One needs to predict not only the keypoints, but also their semantic labels. The semantic labels should be consistent across different objects in the same category.

Dataset Preparation

For keypoint correspondence estimation, each keypoint is labeled with a semantic index. For those keypoints that do not exist on some objects, index -1 is given. Similar to SyncSpecCNN [36], the maximum number of keypoints for each category is bounded. The data split is the same as for keypoint saliency estimation.

Figure 8: Visualizations of detected keypoints and their semantic labels. Same colors indicate same semantic labels.
Airplane Bath Bed Bottle Cap Car Chair Guitar Helmet Knife Laptop Motor Mug Skate Table Vessel Average
PointNet 42.7 19.5 28.9 25.0 85.0 22.9 13.2 18.8 2.8 54.2 47.9 25.0 30.2 15.9 39.6 16.7 30.5
PointNet++ 42.3 24.1 32.7 21.5 45.0 30.4 25.3 23.3 6.5 26.7 42.2 32.5 22.4 12.1 55.8 17.6 28.8
RSCNN 36.9 27.8 34.6 15.6 38.3 30.8 21.5 32.6 4.7 33.3 52.2 40.0 25.5 17.2 49.2 19.4 30.0
RS-Net 38.3 28.8 46.3 24.0 20.0 39.3 24.1 29.2 9.6 57.8 60.0 45.8 31.7 18.2 36.7 19.4 33.1
SpiderCNN 44.3 19.4 32.2 12.6 80.0 18.3 23.7 26.7 6.5 24.4 40.0 34.2 21.2 19.8 54.2 22.4 30.0
PointConv 40.3 0.0 0.0 15.6 55.0 13.3 22.6 20.9 3.7 33.3 50.0 35.0 25.5 25.0 42.5 21.2 25.2
DGCNN 38.9 20.3 21.9 14.2 10.0 16.2 13.9 19.8 8.3 38.9 44.4 21.9 16.0 9.8 36.8 9.0 21.3
GraphCNN 41.1 23.3 25.0 16.0 13.3 18.5 20.0 17.5 3.0 37.8 44.4 31.7 15.0 11.5 40.0 24.4 23.9
Table 3: PCK results under distance threshold 0.01 for various deep learning networks.

Evaluation Metric

The prediction of the network is evaluated by the percentage of correct keypoints (PCK), which has been used to evaluate the accuracy of keypoint prediction in many previous works [36, 28].
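Assuming predictions and ground truth are aligned by semantic index, PCK reduces to the fraction of predictions that fall within the distance tolerance of their corresponding ground-truth keypoint:

```python
import numpy as np

def pck(pred_xyz, gt_xyz, tol=0.01):
    """Percentage of correct keypoints: row m of each array is the keypoint
    with semantic index m; a prediction is correct if it lies within `tol`
    of the ground-truth keypoint with the same index."""
    d = np.linalg.norm(pred_xyz - gt_xyz, axis=-1)
    return float((d <= tol).mean())
```

Unlike the saliency-task IoU, a prediction near the wrong semantic keypoint scores zero here, which is why the correspondence task is the harder of the two.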

Figure 9: PCK results under various distance thresholds (0-0.1) for compared algorithms.

Benchmark Algorithms

We benchmark eight state-of-the-art algorithms on point cloud semantic analysis: PointNet [22], PointNet++ [23], RSNet [10], SpiderCNN [35], PointConv [34], RSCNN [15], DGCNN [33] and GraphCNN [8].

Evaluation Results

Similarly, we use the default network architectures. Table 3 shows the PCK results with an error distance threshold of 0.01. Figure 9 illustrates the percentage of correct points with distance thresholds varied from 0 to 0.1. RSNet performs relatively better than other methods when the distance threshold is under 0.02, while RSCNN gives better results by a large margin above 0.02. However, all eight methods face great difficulty in predicting exact, consistent semantic keypoints.

Figure 8 shows some visualizations of the results for different methods. Same colors denote same semantic labels. We can see that most methods can accurately predict some of keypoints. However, there are still some missing keypoints and inaccurate localizations.

Keypoint saliency estimation and keypoint correspondence estimation are both important for object understanding. Keypoint saliency estimation gives a sparse representation of an object by extracting meaningful points, while keypoint correspondence estimation establishes relations between points on different objects. From the results above, we can see that both tasks remain challenging. The reason is that object keypoints from the human perspective are not simply geometrically salient points but abstract the semantic meaning of the object.

6 Conclusion

In this paper, we propose a large-scale and high-quality KeypointNet dataset. In order to generate ground-truth keypoints from raw human annotations where identification of their modes are non-trivial, we transform the problem into an optimization problem and solve it in an alternating fashion. By optimizing a fidelity loss, ground-truth keypoints, together with their correspondences are generated. In addition, we evaluate and compare several state-of-the-art methods on our proposed dataset and we hope this dataset could boost the semantic understanding of 3D objects.


References


  1. M. Andriluka, U. Iqbal, E. Insafutdinov, L. Pishchulin, A. Milan, J. Gall and B. Schiele (2018) Posetrack: a benchmark for human pose estimation and tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5167–5176. Cited by: §2.2.
  2. M. Andriluka, L. Pishchulin, P. Gehler and B. Schiele (2014-06) 2D human pose estimation: new benchmark and state of the art analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §1, §2.2.
  3. L. Bourdev and J. Malik (2009) Poselets: body part detectors trained using 3d human pose annotations. In 2009 IEEE 12th International Conference on Computer Vision, pp. 1365–1372. Cited by: §2.2.
  4. M. Bueno, J. Martínez-Sánchez, H. González-Jorge and H. Lorenzo (2016) DETECTION of geometric keypoints and its application to point cloud coarse registration.. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences 41. Cited by: §1.
  5. U. Castellani, M. Cristani, S. Fantoni and V. Murino (2008) Sparse points matching by combining 3d mesh saliency with statistical descriptors. In Computer Graphics Forum, Vol. 27, pp. 643–652. Cited by: §2.1.
  6. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song and H. Su (2015) Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012. Cited by: §3.1.
  7. T. S. Cohen, M. Geiger, J. Köhler and M. Welling (2018) Spherical cnns. arXiv preprint arXiv:1801.10130. Cited by: §2.1.
  8. M. Defferrard, X. Bresson and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §5.1, §5.2.
  9. H. Dutagaci, C. P. Cheung and A. Godil (2012) Evaluation of 3d interest point detection techniques via human-generated ground truth. The Visual Computer 28 (9), pp. 901–917. Cited by: §1, §2.2, §4.3.
  10. Q. Huang, W. Wang and U. Neumann (2018) Recurrent slice networks for 3d segmentation of point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2626–2635. Cited by: §5.1, §5.2.
  11. M. Khoury, Q. Zhou and V. Koltun (2017) Learning compact geometric features. In Proceedings of the IEEE International Conference on Computer Vision, pp. 153–161. Cited by: §2.1.
  12. V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. DiVerdi and T. Funkhouser (2013) Learning part-based templates from large collections of 3d shapes. ACM Transactions on Graphics (TOG) 32 (4), pp. 70. Cited by: §1, §2.2.
  13. D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.4.
  14. C. H. Lee, A. Varshney and D. W. Jacobs (2005) Mesh saliency. ACM transactions on graphics (TOG) 24 (3), pp. 659–666. Cited by: §2.1.
  15. Y. Liu, B. Fan, S. Xiang and C. Pan (2019) Relation-shape convolutional neural network for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8895–8904. Cited by: §5.1, §5.2.
  16. L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §4.3.
  17. A. S. Mian, M. Bennamoun and R. Owens (2006) Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE transactions on pattern analysis and machine intelligence 28 (10), pp. 1584–1601. Cited by: §1.
  18. J. Min, J. Lee, J. Ponce and M. Cho (2019) SPair-71k: a large-scale benchmark for semantic correspondence. arXiv preprint arXiv:1908.10543. Cited by: §1, §2.2.
  19. (2016) MSCOCO keypoint challenge. Website. Cited by: §2.2.
  20. J. Novatnack and K. Nishino (2007) Scale-dependent 3d geometric features. In 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8. Cited by: §2.1.
  21. G. Pavlakos, X. Zhou, A. Chan, K. G. Derpanis and K. Daniilidis (2017) 6-dof object pose from semantic keypoints. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2011–2018. Cited by: §2.2.
  22. C. R. Qi, H. Su, K. Mo and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §5.1, §5.2.
  23. C. R. Qi, L. Yi, H. Su and L. J. Guibas (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108. Cited by: §5.1, §5.2.
  24. B. Rister, M. A. Horowitz and D. L. Rubin (2017) Volumetric image registration from invariant keypoints. IEEE Transactions on Image Processing 26 (10), pp. 4900–4910. Cited by: §5.1.
  25. S. Salti, F. Tombari, R. Spezialetti and L. Di Stefano (2015) Learning a descriptor-specific 3d keypoint detector. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2318–2326. Cited by: §5.1.
  26. I. Sipiran and B. Bustos (2011) Harris 3d: a robust extension of the harris operator for interest point detection on 3d meshes. The Visual Computer 27 (11), pp. 963. Cited by: §2.1, §5.1.
  27. J. Sun, M. Ovsjanikov and L. Guibas (2009) A concise and provably informative multi-scale signature based on heat diffusion. In Computer graphics forum, Vol. 28, pp. 1383–1392. Cited by: §2.1.
  28. M. Sung, H. Su, R. Yu and L. J. Guibas (2018) Deep functional dictionaries: learning consistent semantic structures on 3d models from functions. In Advances in Neural Information Processing Systems, pp. 485–495. Cited by: §2.1, §5.2.
  29. L. Teran and P. Mordohai (2014) 3D interest point detection via discriminative learning. In European Conference on Computer Vision, pp. 159–173. Cited by: §5.1.
  30. F. Tombari, S. Salti and L. Di Stefano (2010) Unique signatures of histograms for local surface description. In European conference on computer vision, pp. 356–369. Cited by: §2.1.
  31. C. Wah, S. Branson, P. Welinder, P. Perona and S. Belongie (2011) The Caltech-UCSD Birds-200-2011 Dataset. Technical report Technical Report CNS-TR-2011-001, California Institute of Technology. Cited by: §2.2.
  32. H. Wang, J. Guo, D. Yan, W. Quan and X. Zhang (2018) Learning 3d keypoint descriptors for non-rigid shape matching. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19. Cited by: §1.
  33. Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein and J. M. Solomon (2019) Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG) 38 (5), pp. 146. Cited by: §5.1, §5.2.
  34. W. Wu, Z. Qi and L. Fuxin (2019) Pointconv: deep convolutional networks on 3d point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9621–9630. Cited by: §5.1, §5.2.
  35. Y. Xu, T. Fan, M. Xu, L. Zeng and Y. Qiao (2018) Spidercnn: deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 87–102. Cited by: §5.1, §5.2.
  36. L. Yi, H. Su, X. Guo and L. J. Guibas (2017) Syncspeccnn: synchronized spectral cnn for 3d shape segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2282–2290. Cited by: §1, §2.1, §2.2, §5.2, §5.2.
  37. Y. You, Y. Lou, Q. Liu, Y. Tai, W. Wang, L. Ma and C. Lu (2018) Prin: pointwise rotation-invariant network. arXiv preprint arXiv:1811.09361. Cited by: §2.1.