Geometric Property Guided Semantic Analysis of 3D Point Clouds


Lulu Tang, Ke Chen, Chaozheng Wu, Yu Hong, Kui Jia, and Zhixin Yang (equal contribution to this work; corresponding author). This work is supported in part by the Ministry of Science and Technology of China (Grant No. 2016YFE0121700), the National Natural Science Foundation of China (Grant Nos. 61771201, 61902131), the Science and Technology Development Fund of Macao SAR (FDCT) (MoST-FDCT Joint Grant No. 015/2015/AMJ, FDCT/194/2017/A3), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X183), and the University of Macau (Grant Nos. MYRG2016-00160-FST, MYRG2018-00248). L. Tang and Z. Yang are with the State Key Laboratory of Internet of Things for Smart City and the Department of Electromechanical Engineering, University of Macau, Macau SAR, China. E-mails: lulu.tang@connect.umac.mo; zxyang@umac.mo. K. Chen, C. Wu, Y. Hong and K. Jia are with the School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China.


Abstract

Existing deep learning algorithms for point cloud analysis mainly focus on discovering semantic patterns from the global configuration of local geometries in a supervised manner. However, very few works exploit geometric properties that reveal local surface manifolds embedded in 3D Euclidean space as additional supervision signals for discriminating semantic classes or object parts. This paper is the first attempt to propose a multi-task geometric learning network that improves semantic analysis through auxiliary learning of local shape properties, which can either be generated by physical computation from the point clouds themselves as self-supervision signals or be provided as privileged information. By explicitly encoding local shape manifolds in favor of semantic analysis, the proposed geometric self-supervised and privileged learning algorithms achieve superior performance to their backbone baselines and other state-of-the-art methods, as verified by experiments on popular benchmarks.

Index Terms: Geometric properties, point clouds, semantic analysis, self-supervised learning, privileged learning.

I Introduction

Point clouds, which collect a set of orderless points to represent the 3D geometry of objects, have been verified as a powerful shape representation in a number of recent works [27, 14, 33, 34, 26, 41, 42, 38]. Semantic analysis on a point set aims to categorize the points globally into semantic classes (e.g. planes, chairs, mugs) [33, 34, 41, 43, 24, 47] or locally into object parts [33, 41, 24] according to their topological configuration. Such a problem plays a vital role in many applications, especially those demanding visual perception and interaction between machines and their surrounding environment, such as augmented reality, robotics, and autonomous driving. Semantic patterns of point clouds can be discovered from the global configuration of local geometric patterns, but discovering and exploiting such local geometries is challenging due to the inherently missing point-wise connectivity within their neighborhoods.

A number of recent works have been proposed for feature learning on point sets, either by designing locally-connected convolutional/pooling layers on irregular non-Euclidean points, such as PointNet [33], PointCNN [26], Dynamic Graph CNN (DGCNN) [41], and GeoNet [14], or by hierarchically aggregating features revealing geometric patterns across scales, e.g. PointNet++ [34] and SO-Net [24]. These existing methods work in a supervised manner, utilizing pre-defined annotations to implicitly learn a global topology and local geometries sensitive to semantic classes. Very few works pay attention to explicitly constraining 3D neural classifiers with auxiliary regression onto local geometric properties.

Fig. 1: A flow chart of the proposed geometric self-supervised learning (GeoSSL): geometric properties generated by physical computation are used as self-supervision signals to support supervised semantic shape analysis. By additionally fitting geometric properties, the backbone methods (e.g. PointNet++ [34] and DGCNN [41] in our experiments) are improved for semantic analysis on point sets.

Local geometric properties such as point-wise normal vectors, curvatures, and tangent spaces are the primitive properties of local point groupings that reveal local geometric manifolds. For example, to compute the normal of a point, the typical solution is to first fit a plane to a set of its nearest neighboring points and take the normal of that plane, which yields a point-wise geometric property describing local connectivity across nearby points. Some works [13, 3] design deep networks to directly estimate these geometric properties from point clouds. However, local geometric properties can be obtained for free by physical computation, with no additional manual annotation effort, especially for the massive amounts of auxiliary data that are usually produced by computer-aided design (CAD).

In most existing works [34, 26], point-wise geometric properties are concatenated with the corresponding point coordinates as a rich point-based feature representation, which is then fed directly into deep networks for semantic analysis. Alternatively, geometric properties can serve as auxiliary self-supervision signals, inspired by the recent success of self-supervised learning in visual recognition [7, 19, 12, 29, 30, 11, 40], which generates supervision signals from the data itself to avoid expensive manual annotations and then learns a proxy loss for network optimization. Moreover, high-quality local properties preserving finer geometric details can be computed more accurately from denser point samplings, and can be provided as privileged supervision signals available only during training.

Existing geometric learning methods focus on discovering semantic patterns from the global shape, which consists of local geometric patterns. It remains an open problem whether capturing local geometric patterns has any positive effect on the semantic analysis of their global configuration. This paper is the first attempt to design a novel geometric learning method that explicitly fits local geometric properties, in either a self-supervised or a privileged learning manner, as an additional optimization goal to support semantic analysis on point sets. Fig. 1 shows the main difference between the proposed geometric learning and a conventional supervised classifier. Specifically, our deep model shares the low-level feature encoding layers and has two branches, for semantic analysis (e.g. 3D object classification and part/scene segmentation) and geometric property estimation respectively, in a multi-task learning style.

The main contributions of this paper are as follows.

  • This work for the first time explores geometric properties of point-based surfaces, which capture the underlying local connectivity, as auxiliary supervision signals to improve 3D semantic analysis.

  • A novel geometric self-supervised learning method is proposed to jointly encode features that are discriminative for semantic analysis on point sets and that also fit local geometric properties well, in a multi-task learning manner.

  • Beyond geometric properties obtained via physical computation, high-quality geometric properties provided as privileged information can further boost performance on semantic analysis.

Experimental evaluation on three public benchmarks demonstrates our motivation of exploiting local geometric patterns to improve the learning of semantic patterns of point clouds, consistently achieving superior performance to the backbone competitor DGCNN [41] and other state-of-the-art methods in 3D object classification and part/scene segmentation. Source codes and pre-trained models will be released.

II Related Work

Semantic analysis of point clouds – As a pioneer, PointNet [33] started the trend of designing deep networks that operate on irregular point-based surfaces, with the permutation invariance of points encoded by point-wise manipulation in multi-layer perceptrons (MLPs) and a symmetric function for accumulating features. Its follow-up work, PointNet++ [34], hierarchically aggregates multi-scale features to inherently capture different modes of semantic patterns. However, both PointNet and PointNet++ only implicitly model semantic-concept-aware geometric patterns in local regions via deep feature encoding, and miss the neighborhood information of points that could benefit semantic analysis. Recently, SO-Net [24] explicitly regularizes the spatial correlation across points via k-NN search on a 2D projection of the 3D points during feature encoding, while GeoNet [14] implicitly incorporates local connectivity, via an autoencoder and geodesic matching, into extra point-wise features for further fusion. An alternative solution for analyzing point clouds is the family of recently proposed geometric deep learning methods, such as spectral networks [4], which apply convolution operations on graphs representing the irregularly distributed structure of points. Their followers focus either on reducing computational cost by replacing the Laplacian eigendecomposition with polynomial [5, 18] or rational [23] spectral alternatives, or on improving generalization capability [8, 50, 31]. Recently, the dynamic graph CNN (DGCNN) was proposed by Wang et al. [41] to discover the local geometric manifold of each point by an edge convolution operation on a dynamic k-NN graph, which is iteratively updated with the nearest neighbours. The DGCNN model achieves state-of-the-art performance on semantic analysis and is thus adopted as the backbone CNN model in our methods.
The key difference between our methods and the DGCNN baseline lies in incorporating an extra branch (as shown in Fig. 2) to learn local geometric patterns with self-supervision or privileged supervision signals. The superior performance of our methods is illustrated in Tables I and VI of Sec. IV.

Geometric analysis of point clouds – Geometric analysis on point clouds aims to obtain point-wise geometric properties such as normals and curvatures. A typical solution for obtaining the local geometric properties of a point is direct computation based on principal component analysis (PCA) [15] within a local region, e.g. a plane best fitting the point and its k-nearest neighbours. Such a method is simple but sensitive to noise and to the strategy used to generate local regions. A number of advanced geometric computation techniques [28, 16] have been developed to improve robustness against these challenges, but remain impractical due to their poor generalization. On the other hand, geometric shape analysis can be learning-based, i.e. learning a regression mapping from point sets to point-wise geometric properties. The recent deep learning based PCPNet [13] performs robustly against noise and shape variation under a wide variety of settings, given sufficiently large-scale training data. Our goal in this paper is to directly mine local geometric patterns to additionally support semantic analysis on point-based shapes via an auxiliary supervised mapping onto geometric properties. In our proposed multi-task network, more robust estimation of geometric properties can be achieved than with the fixed backbone baseline (see Figs. 6 and 7), while also improving classification accuracy for semantic analysis (see Table I).

Fig. 2: The proposed networks are based on the DGCNN architecture and aim to estimate local geometric properties and thereby augment semantic analysis of point clouds. In Geometric Self-supervised Learning (GeoSSL), the classification network takes points as input and shares the first three edge convolution layers and one MLP layer, after which it is divided into two task branches. The top branch estimates geometric properties and consists of three fully-connected layers followed by a mean square loss on local geometric properties, while the bottom branch estimates classification scores with the cross-entropy loss. The segmentation network shares most of the architecture of the classification network; the only difference lies in the bottom branch, which outputs a segmentation score for each point. Note that Geometric Privileged Learning (GeoPL) employs the same network, but feeds high-quality geometric properties as supervision signals to the top branch.

Deep self-supervised learning – Deep learning has gained significant success in visual recognition [20, 49, 44, 25] and semantic shape analysis [33, 34, 41, 43, 24, 47], but heavily hinges on large-scale labelled training samples. Data augmentation is a simple yet effective pre-processing step that alleviates the demand for sufficient data to fit the network parameters, especially when the network capacity exceeds the size of the training set. To avoid label acquisition for supervision-starved tasks and to use vast numbers of unlabelled data, self-supervised learning is a powerful alternative that relaxes the impractical requirement of large-scale labelled data by generating supervision labels from the data itself. In other words, the self-supervised learning paradigm is typically formulated as a pretext learning task, such as motion segmentation in videos [32], or relative positions [6] and exemplars [9] in the image domain. In light of this, the target task can be solved by transferring knowledge from self-supervised learning on a proxy loss. Inspired by this concept, this paper for the first time develops a novel geometric self-supervised learning (GeoSSL) method that exploits local geometric patterns discovered by self-supervised learning to improve the semantic analysis of point clouds. With local geometric regularization on the deep feature encoding for semantic analysis, the proposed GeoSSL beats its direct competitor DGCNN (the backbone network) as well as other comparative methods in our experiments (see Table I).

Learning with privileged information – Information only available during training is referred to as privileged information, and has been exploited in classification [37, 22], regression [46] and ranking [35]. For image-based semantic analysis, text [35], attributes [35], bounding boxes [35], head pose [46], and gender [46] have been exploited as privileged information to boost performance, but this paper is, to the best of our knowledge, the first work on geometric learning with high-quality properties from more densely sampled points as privileged information. Similar to the aforementioned GeoSSL method, our geometric privileged learning (GeoPL) employs the identical multi-task network structure; the only difference between GeoSSL and GeoPL lies in the quality of the geometric properties used to discover local patterns of the 3D geometry that support semantic classification and segmentation. Experimental verification in this paper demonstrates that our model with privileged geometric properties performs better than the state-of-the-art methods in Table I as well as its self-supervised variant.

III Methodology

III-A Supervised Semantic Learning

Existing deep algorithms on point clouds focus on analysing the semantic patterns of 3D geometry, in view of only semantic labels being available in 3D object classification [41] or part segmentation [26]. Given a pair of a 3D observation in the representation of a point cloud \mathcal{X} = \{\mathbf{x}_i\}_{i=1}^{n} and its semantic label y, the typical network architecture of supervised semantic learning frameworks such as PointNet [33], PointNet++ [34], PointCNN [26] and DGCNN [41] consists of several feature encoding layers (e.g. convolutional layers, MLP layers, or a hybrid of both). Take the DGCNN [41] (the backbone network of the proposed GeoSSL and GeoPL) as an example, shown in the gray box of Fig. 2. The DGCNN introduces an edge convolution operation on a directed graph representation for the local connectivity of points. In detail, a directed k-Nearest Neighbour (k-NN) graph G = (V, E) models the correlation across closest vertices, where V and E denote its vertices and edges. A parametric mapping function h_{\Theta} on edges is adopted for capturing global and local shape patterns, where \Theta denotes the parameters to be optimized in each edge convolution layer. In this sense, the output of the edge convolution on the k-NN graph at each vertex is calculated by aggregating edge features, and is thus invariant to the total number of points in the set.

The shared part of the DGCNN is made up of three MLP-based edge convolution blocks and a fully-connected layer that encodes each point into a 1024-dimensional feature, plus task-specific layers for object classification and part segmentation respectively. On one hand, for semantic object classification, another multi-layer perceptron, with the output dimensions of its hidden layers fixed to {512, 256, c}, where c denotes the number of object classes, is added to the shared part of the DGCNN. On the other hand, for part segmentation, the shared part of the DGCNN is followed by a multi-layer perceptron with {256, 128, s}, where s denotes the number of object part classes. However, such a model cannot provide supervision signals to incorporate local geometric structural information, which encourages us to design a novel network that improves semantic analysis by learning the primitive geometric properties of points in their local neighbourhood.
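The edge convolution described above can be sketched in NumPy as follows. This is a simplified illustration rather than the DGCNN implementation: a single shared ReLU-linear map stands in for the edge MLP h_Θ, and max is used as the symmetric aggregation; all function names are ours.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of each point's k nearest neighbours (brute force)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude the point itself
    return np.argsort(d2, axis=1)[:, :k]  # (N, k)

def edge_conv(points, feats, k, weight):
    """One edge-convolution step on a k-NN graph (DGCNN-style sketch).

    Each edge concatenates the centre feature x_i with the relative
    feature x_j - x_i; a shared ReLU-linear map (a stand-in for the
    edge MLP) is applied, and edges are max-aggregated per vertex,
    making the per-point output independent of the point ordering.
    """
    idx = knn_indices(points, k)                       # (N, k)
    centre = np.repeat(feats[:, None, :], k, axis=1)   # (N, k, C)
    neighb = feats[idx]                                # (N, k, C)
    edge = np.concatenate([centre, neighb - centre], axis=-1)  # (N, k, 2C)
    out = np.maximum(edge @ weight, 0.0)               # shared linear map + ReLU
    return out.max(axis=1)                             # symmetric max over edges
```

In the full network, several such layers are stacked and the k-NN graph is recomputed in feature space after each layer, which is what makes the graph "dynamic".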

III-B Generation of Local Geometric Properties

Given a point set \mathcal{X} = \{\mathbf{x}_i\}_{i=1}^{n}, point-wise geometric properties can be either measured or calculated directly. A typical solution for generating the i-th point's normal is first to find its k-nearest neighbors \{\mathbf{x}_j\}_{j=1}^{k} and then calculate the covariance matrix as

\mathbf{C}_i = \frac{1}{k}\sum_{j=1}^{k}(\mathbf{x}_j-\bar{\mathbf{x}}_i)(\mathbf{x}_j-\bar{\mathbf{x}}_i)^{\top} \qquad (1)

where \mathbf{x}_j denotes the neighboring points in the cloud and \bar{\mathbf{x}}_i = \frac{1}{k}\sum_{j=1}^{k}\mathbf{x}_j is their centroid. Eigenvectors and eigenvalues of \mathbf{C}_i can be obtained via spectral decomposition [2]. The eigenvector corresponding to the minimal eigenvalue defines the estimated surface normal \mathbf{n}_i of point \mathbf{x}_i, as defined in [36]. Similarly, the second-order geometric property, curvature, can also be calculated based on the eigendecomposition of the covariance matrix [2]. In particular, the ratio of the minimal eigenvalue to the sum of all eigenvalues can be used to estimate the change of geometric curvature. Mathematically, for the i-th point \mathbf{x}_i, the change of curvature can be approximated as

c_i = \frac{\lambda_{\min}}{\lambda_1+\lambda_2+\lambda_3} \qquad (2)

where \lambda_{\min} = \min(\lambda_1, \lambda_2, \lambda_3) denotes the minimal eigenvalue of \mathbf{C}_i. Additionally, for the i-th point \mathbf{x}_i, the curvature can also be computed from the normal vectors of that point and its neighbors as

c_i = \frac{1}{k}\sum_{j=1}^{k}\|\mathbf{n}_i-\mathbf{n}_j\|_2 \qquad (3)

Although geometric properties can be directly computed from point clouds, they can also be estimated via supervised regression learning algorithms [13, 3].
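Eqs. (1) and (2) can be illustrated with a short brute-force NumPy sketch (function names are ours, and a real pipeline would use a spatial index for the k-NN search rather than a full distance matrix):

```python
import numpy as np

def normals_and_curvature(points, k=10):
    """Per-point normals and change-of-curvature ratios via PCA on
    each point's k-nearest neighbours, as in Eqs. (1)-(2)."""
    n = len(points)
    normals = np.zeros((n, 3))
    curvature = np.zeros(n)
    d2 = ((points[:, None] - points[None, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, :k]           # includes the point itself
    for i in range(n):
        q = points[nbrs[i]]
        c = q - q.mean(axis=0)
        cov = c.T @ c / k                          # covariance matrix, Eq. (1)
        eigval, eigvec = np.linalg.eigh(cov)       # ascending eigenvalues
        normals[i] = eigvec[:, 0]                  # eigenvector of min eigenvalue
        curvature[i] = eigval[0] / eigval.sum()    # curvature change, Eq. (2)
    return normals, curvature
```

For points sampled from a plane, the sketch returns normals aligned with the plane's axis (up to sign, since PCA leaves the orientation ambiguous) and near-zero curvature.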

Normals and curvatures approximating the local geometric patterns of a shape are vital for semantic analysis, which has encouraged a number of works [33, 34] to concatenate such point-wise geometric properties with the corresponding coordinates and feed them into a supervised learning model as feature input. However, very few works consider the normals and curvatures of points as auxiliary supervision signals for improving the analysis of semantic patterns, even though such supervision encourages features to encode local manifold structure and yields superior robustness against noisy point sets, especially when the model is trained on clean data. Beyond point-wise normals and curvatures self-generated by computation from the point clouds, more accurate and higher-quality geometric properties can be provided as privileged information available only during training, e.g. via physical computation from denser points.

III-C Multi-Task Geometric Learning

In view of the lack of local connectivity across orderless points, our motivation is to design an auxiliary task (regression learning with geometric properties) to explicitly incorporate the local neighborhood information underlying surface manifolds. To this end, we propose a multi-task geometric learning network that simultaneously learns semantic and geometric patterns for 3D object classification and part segmentation, whose pipelines are visualized in Fig. 2. Given the input and output pairs of an ordinary supervised learning network, i.e. a point cloud \mathcal{X} and its semantic class label y, geometric properties \mathcal{G} can either be generated by the physical computation of Sec. III-B as extra self-supervision signals or be provided as privileged information extracted from high-quality point clouds, i.e. Geometric Self-Supervised Learning (GeoSSL) and Geometric Privileged Learning (GeoPL) respectively. Note that, regardless of the quality of the auxiliary labels, the proposed networks have an identical network structure for classification or segmentation. Training pairs for our multi-task geometric learning network are thus \{(\mathcal{X}_i, y_i, \mathcal{G}_i)\}_{i=1}^{N}, where \mathcal{G}_i denotes point-wise geometric properties and N is the number of training samples.

Based on the backbone DGCNN depicted in Sec. III-A, the proposed geometric learning network consists of the shared layers and an application-specific block (blue or green boxes in Fig. 2): the two tasks share the first three edge convolution blocks followed by one MLP layer, and then divide into two task-specific branches. The top branch is an auxiliary task that regresses point-wise local geometric properties, while the bottom one performs the original task of semantic analysis (i.e. classification or part/scene segmentation). To jointly optimize both branches, we introduce the following combinational loss function, which utilizes the mean square loss to control the quality of normal/curvature estimation in the geometric learning branch and the cross-entropy loss for task-specific semantic analysis on point sets:

\mathcal{L}(\Theta_0,\Theta_s,\Theta_g) = \mathcal{L}_{ce}\big(f_s(\mathcal{X};\Theta_0,\Theta_s),\,y\big) + \lambda\,\mathcal{L}_{mse}\big(f_g(\mathcal{X};\Theta_0,\Theta_g),\,\mathcal{G}\big) \qquad (4)

where f_s and f_g denote the outputs of the two branches of the proposed model, and \Theta_0, \Theta_s and \Theta_g are the weighting parameters of the proposed geometric learning model: \Theta_0 denotes the shared weights in the lower shared layers, while \Theta_s and \Theta_g are the weights of the classification/segmentation branch and the geometry regression branch, respectively. \lambda is a trade-off parameter between the two loss terms.

The key merit of the aforementioned cost function is that it brings an additional objective function to discover the geometric patterns missed by existing supervised point cloud classifiers trained with semantic labels only. During training, we adopt the mean square loss for \mathcal{L}_{mse} and the cross-entropy loss for \mathcal{L}_{ce}. Note that the regression loss is not limited to the mean square loss; we select it owing to its solid performance on the estimation of geometric properties. Specifically, in our experiments we explored the Euclidean distance and cosine similarity for the oriented normal vector, and the unoriented Euclidean distance and RMS angle difference between the estimated normal and the ground-truth normal. Without loss of generality, we also employ the mean square loss for supervising geometric curvature. As a result, with both normals and curvatures, the loss function can be written as
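The alternative normal-error measures mentioned above can be sketched as follows (a NumPy illustration; the function name and the exact formulas are our assumptions, shown for reference rather than as the authors' implementation):

```python
import numpy as np

def normal_errors(n_pred, n_gt):
    """Oriented Euclidean distance, mean cosine similarity, unoriented
    Euclidean distance, and RMS angle difference (degrees) between
    predicted and ground-truth unit normals."""
    n_pred = n_pred / np.linalg.norm(n_pred, axis=1, keepdims=True)
    n_gt = n_gt / np.linalg.norm(n_gt, axis=1, keepdims=True)
    cos = (n_pred * n_gt).sum(axis=1)
    oriented = np.linalg.norm(n_pred - n_gt, axis=1).mean()
    # unoriented: a normal and its flip describe the same surface
    unoriented = np.minimum(np.linalg.norm(n_pred - n_gt, axis=1),
                            np.linalg.norm(n_pred + n_gt, axis=1)).mean()
    angles = np.degrees(np.arccos(np.clip(np.abs(cos), 0.0, 1.0)))
    rms_angle = np.sqrt((angles ** 2).mean())
    return oriented, cos.mean(), unoriented, rms_angle
```

The unoriented variants treat a normal and its flipped counterpart as equivalent, which matters because PCA-based normals carry an arbitrary sign.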

\mathcal{L}_{mse} = \frac{1}{n}\sum_{i=1}^{n}\Big(\|\mathbf{n}_i-\hat{\mathbf{n}}_i\|_2^2 + (c_i-\hat{c}_i)^2\Big) \qquad (5)

where \mathbf{n}_i and \hat{\mathbf{n}}_i denote the ground-truth normal (self-generated in GeoSSL or provided as privileged information in GeoPL) and the predicted normal, and c_i and \hat{c}_i denote the ground-truth curvature and the predicted curvature.
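A minimal sketch of the combined objective of Eqs. (4)-(5), assuming the geometric targets stack the normal and curvature into four values per point (names are ours; the actual model optimizes this within a deep learning framework):

```python
import numpy as np

def multitask_loss(sem_logits, labels, pred_geo, true_geo, lam=1.0):
    """Cross-entropy on semantic scores plus a lambda-weighted
    mean-square error on geometric properties, as in Eq. (4)."""
    # cross-entropy with a numerically stable log-softmax
    z = sem_logits - sem_logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -logp[np.arange(len(labels)), labels].mean()
    # mean-square regression loss over the stacked normal + curvature
    mse = ((pred_geo - true_geo) ** 2).sum(axis=-1).mean()
    return ce + lam * mse
```

When the geometric predictions exactly match the targets, the second term vanishes and the loss reduces to the ordinary supervised cross-entropy.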

Fig. 3: Visualization of predicted normals with GeoSSL.
Fig. 4: Visualization of predicted curvatures with GeoSSL, with a color bar on the right-hand side. The darker the color, the smaller the curvature value.

IV Experiments

We evaluate the proposed geometric learning algorithms (i.e. GeoSSL and GeoPL) introduced in Sec. III on three popular semantic analysis tasks, i.e. 3D object classification, part segmentation and scene segmentation.

Datasets and Settings – Evaluation of 3D object classification was conducted on the commonly used ModelNet40 benchmark [45], which contains 12,311 CAD models belonging to 40 pre-defined categories. In our experiments, we split the dataset into two parts, i.e. 9,843 models for training and 2,468 for testing. We followed the same experimental settings as in [33, 41]. Specifically, 1,024 points are sampled from the mesh faces by farthest point sampling and normalized into a unit sphere. We evaluated our model architectures for part segmentation on the ShapeNet part dataset [48], containing 16,880 3D shapes from 16 object categories, annotated with 50 parts in total. We followed the data split of [26], i.e. 14,006 shapes for training and 2,874 for testing. Part category labels are assigned to each point in the point cloud, which consists of 2,048 points uniformly sampled from the mesh surfaces of the training samples. It is worth mentioning that we assume each object contains fewer than six parts. The S3DIS dataset [1] is adopted for evaluating our method on scene segmentation. Unlike the samples in ModelNet40 and ShapeNet, which are made with 3D modeling tools, the S3DIS samples are collected from real scans of indoor environments. In detail, this dataset contains 3D scans from Matterport scanners in 6 areas covering 271 rooms. Each point in a scan is annotated with one semantic label from 13 categories.
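The sampling and normalization steps above can be sketched as follows (a NumPy illustration of standard farthest point sampling and unit-sphere normalization, not the authors' preprocessing code):

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Greedy farthest-point sampling: iteratively pick the point
    farthest from everything chosen so far, yielding m well-spread
    samples (used here to subsample e.g. 1,024 points per shape)."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(points)))]
    dist = ((points - points[chosen[0]]) ** 2).sum(-1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())               # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, ((points - points[nxt]) ** 2).sum(-1))
    return points[chosen]

def normalize_to_unit_sphere(points):
    """Centre the cloud at the origin and scale it into the unit sphere."""
    centred = points - points.mean(axis=0)
    return centred / np.linalg.norm(centred, axis=1).max()
```

Farthest point sampling preserves the shape's coverage better than uniform random sampling at the same budget, which is why it is the common choice for this preprocessing step.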

Performance Metrics – For the classification task, we use mean accuracy, the evaluation metric widely adopted in recent works [33, 34, 41]. In the part segmentation task, Intersection-over-Union (IoU) is used to evaluate our model and the other comparative methods, following the same evaluation protocol as the DGCNN [41]: the IoU of a shape is obtained by averaging the IoUs of the different parts present in that shape, while the mean IoU (mIoU) is calculated by averaging the IoUs of all the testing samples. In the scene segmentation task, mean Intersection-over-Union (mIoU) and overall accuracy (OA) are utilized for evaluating our method.
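The per-shape IoU protocol described above can be sketched as follows (a NumPy illustration under our reading of the protocol; function names are ours):

```python
import numpy as np

def shape_iou(pred, gt, part_ids):
    """Per-shape IoU: average the IoU over each part the shape's
    category can contain; a part absent from both prediction and
    ground truth contributes an IoU of 1 (common convention)."""
    ious = []
    for p in part_ids:
        inter = np.sum((pred == p) & (gt == p))
        union = np.sum((pred == p) | (gt == p))
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))

def mean_iou(preds, gts, part_ids):
    """mIoU: average of the per-shape IoUs over all test samples."""
    return float(np.mean([shape_iou(p, g, part_ids)
                          for p, g in zip(preds, gts)]))
```

Averaging per shape first (rather than pooling all points) prevents large shapes from dominating the score.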

Implementation Details – To efficiently use the geometric cues, we independently pre-train the shared layers and the geometric learning branch (the top branches in Fig. 2) with generated geometric properties on the ShapeNetCore dataset; this pre-training network is similar to the DGCNN architecture for part segmentation, with the only change lying in the last layer, which outputs 4 continuous values. The model parameters learned by this network are then used to initialize the shared layers of both the classification and segmentation variants of GeoSSL. Their learning rates are set to 0.01 and 0.001 respectively, and are decayed with an exponential schedule every 20 epochs. The overall number of training epochs in our experiments is 200.

IV-A Comparison with State-of-the-Art

3D object classification – A comparative evaluation of 3D object classification on the ModelNet40 is shown in Table I. Our GeoSSL achieves superior performance to its direct competitor DGCNN [41] as well as to the other state-of-the-art methods. Given the identical input and output as well as the same backbone CNN model, the performance gain can only be explained by the auxiliary incorporation of local geometric properties into the DGCNN. We also evaluate our geometric privileged learning (GeoPL) for classification on the ModelNet40, with privileged geometric properties available only during training, whose normals and curvatures are generated from denser point-based surfaces and are thus more accurate than those directly computed from sparse points. For example, in our experiments we generate privileged normals and curvatures from a dense point cloud consisting of 10,000 points, compared to the ordinary one with 1,024 points. The experimental results in Table I show significantly better performance than the other comparative algorithms given accurate geometric properties, which further verifies the effectiveness of our concept of improving semantic analysis by exploiting local geometric priors.

Methods | Mean Class Accuracy | Overall Accuracy
VoxNet [?] | 83.0 | 85.9
PointNet [33] | 86.0 | 89.2
PointNet++ [34] | - | 90.7
SO-NET [24] | 87.3 | 90.9
PointCNN [26] | - | 92.2
DGCNN [41] | 88.2 | 91.2
DGCNN+ [41] | 90.2 | 92.2
GeoSSL (ours) | 90.3 | 92.9
GeoPL (ours) | 90.8 | 93.5
TABLE I: Comparison of classification accuracy on ModelNet40. Note that DGCNN and DGCNN+ here denote the DGCNN in [41] without and with the spatial transformer, respectively. Our GeoSSL and GeoPL adopt the former as their backbone.

3D Part segmentation – The part segmentation network is evaluated on the ShapeNet Part benchmark; results in terms of Intersection-over-Union (IoU) are reported in Table II. Regardless of the network structure, e.g. PointNet++ [34], PointCNN [26], or DGCNN [41], the proposed GeoSSL consistently outperforms its backbone competitors. The original PointNet++ [34] achieves slightly better performance than our GeoSSL but demands high-quality point-wise geometric properties as input, which can be impractical since accurate point-wise normals are rarely available in the real world. Note that we re-implement PointNet++, PointCNN and DGCNN following the settings of the original works; their results are reported (see Footnote 2) and noted as the backbone in each block of Table II. Given the identical network structure for capturing semantic properties in GeoSSL and its backbone baselines, the performance gain can only be explained by exploiting the fine-detailed local geometries of objects, which again supports our motivation. More part segmentation results are illustrated in Fig. 5, from which we can see that our segmentation results are very close to the ground truth.

Methods | Mean IoU
PointNet [33] | 83.7
SO-NET [24] | 84.9
PointNet++ [34] | 85.1
PointNet++ (backbone) | 84.3
GeoSSL (ours) | 84.8
PointCNN [26] | 86.1
PointCNN (backbone) | 85.3
GeoSSL (ours) | 85.6
DGCNN+ [41] | 85.1
DGCNN (backbone) | 84.5
GeoSSL (ours) | 85.7
TABLE II: Comparisons of part segmentation results on the ShapeNet part dataset with Mean IoU (%).
Fig. 5: Visualization of part segmentation results with GeoSSL, where GT denotes the ground-truth label and P the predicted result.

Indoor Scene Segmentation – We also apply our GeoSSL to semantic scene segmentation, which replaces the object part labels of part segmentation with semantic object classes in a scene. We conduct experiments on S3DIS [1], which is collected from real scans of indoor environments. For a fair comparison, we follow the same setting as DGCNN, where each room is sliced into 1×1 square-meter blocks and 4096 points are sampled per block. Based on the sampled points, we then calculate point-wise geometric properties (i.e. normal and curvature) using the method in Sec. III-B. Finally, we use 6-fold cross-validation over the 6 areas and report the mean of the evaluation results. Comparisons with state-of-the-art methods on S3DIS are shown in Table III. Our method consistently achieves superior segmentation performance to its direct competitor DGCNN [41] and outperforms most state-of-the-art methods except PointCNN [26] and SPGraph [21]. Note that the concept of our method is generic and can be applied to other backbone CNN models that achieve state-of-the-art scene segmentation performance, such as PointCNN [26] and SPGraph [21].
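The block-slicing preprocessing described above can be sketched as a simplified numpy routine. The function name `sample_blocks` and the sampling-with-replacement fallback for blocks holding fewer than 4096 points are our assumptions, not the exact S3DIS pipeline:

```python
import numpy as np

def sample_blocks(points, block=1.0, n=4096, rng=None):
    """Slice a room into block x block meter columns in the x-y plane
    and sample n points per non-empty block (S3DIS-style preprocessing).

    points: (P, 3+) array; the first three columns are xyz in meters.
    Returns a list of (n, points.shape[1]) arrays, one per block.
    """
    rng = rng or np.random.default_rng(0)
    ij = np.floor(points[:, :2] / block).astype(int)  # block index per point
    blocks = []
    for key in {tuple(k) for k in ij}:
        idx = np.where((ij == key).all(axis=1))[0]
        # Assumed fallback: sample with replacement when a block
        # holds fewer than n points.
        pick = rng.choice(idx, size=n, replace=len(idx) < n)
        blocks.append(points[pick])
    return blocks
```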

Methods | Mean IoU | Overall Accuracy
PointNet (baseline) [33] | 20.1 | 53.2
PointNet [33] | 47.6 | 78.5
PointCNN [26] | 65.4 | 88.1
G+RCU [10] | 49.7 | 81.1
SGPN [39] | 50.4 | -
RSNet [17] | 56.5 | -
SPGraph [21] | 62.1 | 85.5
DGCNN+ [41] | 56.1 | 84.1
DGCNN | 54.5 | 83.6
GeoSSL | 59.1 | 86.3
TABLE III: Segmentation comparisons on S3DIS in mean IoU (mIoU, %) and overall accuracy (OA, %).

IV-B More Results and Discussions

Input | DGCNN [41] | GeoSSL
P | 91.2 | 92.2
P + N | 91.7 | 92.5
P + C | 91.4 | 92.3
P + N + C | 91.9 | 92.9
TABLE IV: Point coordinates (P) with vs. without geometric properties, namely normal (N) and curvature (C), as measured by mean classification accuracy (%).

Ablation studies on geometric properties – An evaluation of combinations of different geometric properties is shown in Table IV. In DGCNN [41], geometric properties are concatenated as additional input features, while our GeoSSL exploits them as self-supervision signals of an auxiliary task. All methods with geometric properties, whether as input features or as self-supervision signals, boost classification performance, which supports our motivation: local geometric properties reveal rich local geometries of 3D semantic classes. Moreover, geometric properties used as self-supervision signals (the right column) consistently perform better than the same properties used as input features (the middle column). The main reason is that our GeoSSL takes the form of multi-task learning, where self-supervision serves as an auxiliary task to regularize learning of the main, supervised task. This differs from alternatives such as pre-training based self-supervision, where features are learned via self-supervision alone and are subsequently used for supervised tasks. Given the large capacity of deep networks, GeoSSL regularizes feature learning (via self-supervised prediction of local geometric properties), reduces the risk of over-fitting, and thus improves the generalization of the learned features for the supervised tasks. Finally, the combination of normal and curvature is preferable as the self-supervision signal, since it exploits both first- and second-order geometric smoothness of point sets.
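The multi-task objective described above, a semantic loss plus a weighted auxiliary regression on geometric properties, can be sketched as below. This is an illustrative numpy version (real training would use autograd tensors); the name `geossl_loss` and the use of plain mean-squared error for the 4-D geometric target (normal + curvature) are our assumptions:

```python
import numpy as np

def geossl_loss(sem_logits, sem_labels, geo_pred, geo_gt, lam=0.01):
    """Total training objective: semantic cross-entropy plus a
    lam-weighted regression loss on geometric properties.

    sem_logits: (B, K) class scores; sem_labels: (B,) int labels.
    geo_pred, geo_gt: (B, 4) predicted / target geometric properties.
    """
    # Numerically stable softmax cross-entropy over semantic classes.
    z = sem_logits - sem_logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_p[np.arange(len(sem_labels)), sem_labels].mean()
    # Assumed auxiliary loss: MSE on the 4-D geometric target.
    mse = np.mean((geo_pred - geo_gt) ** 2)
    return ce + lam * mse
```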

Fig. 6: Comparison of geometric properties learned by the proposed GeoSSL (ours) and DGCNN.
Methods | Cosine Similarity
DGCNN [41] (trained from scratch) | 0.99
DGCNN (fixed pre-trained lower layers) | 0.97
GeoSSL | 0.99
TABLE V: Comparison of cosine similarity for normal estimation among the involved methods.

Effects of learning geometric patterns in typical supervised semantic learning – We are interested in whether the features learned by supervised semantic learning on point clouds can be used to estimate geometric properties. We therefore conduct a normal estimation experiment comparing the following models: in the first setting, we train DGCNN for normal estimation from scratch, denoted as DGCNN in Table V; in the second setting, the lower-layer parameters of another DGCNN are copied from a DGCNN pre-trained on ModelNet40 for classification and kept fixed during training, while the higher-layer parameters are tuned (the fixed DGCNN). Results are reported in Table V using the cosine similarity metric, which reflects the angular difference between the predicted and ground-truth normals: the larger its value, the better. We also illustrate the qualitative difference between our method and DGCNN in Fig. 6, which shows that the proposed method predicts more accurate normals than its competitor. Quantitative comparisons of normal estimation errors can be found in Fig. 7. Table V and Figs. 6 and 7 show that the fixed DGCNN performs worst among the compared models. This implies that existing point cloud analysis methods supervised only by semantic labels pay little attention to whether the networks learn local geometric patterns. In contrast, our method with geometric self-supervised learning lets the two tasks benefit each other, capturing local geometric patterns that further augment semantic recognition.
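The cosine similarity metric can be computed as below. Taking the absolute value to ignore the sign ambiguity of normals is our assumption, since the exact convention is not stated:

```python
import numpy as np

def normal_cosine_similarity(pred, gt, eps=1e-8):
    """Mean absolute cosine similarity between predicted and
    ground-truth normals.

    pred, gt: (N, 3) arrays of (possibly unnormalized) normals.
    Returns a scalar in [0, 1]; larger is better.
    """
    pred = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    gt = gt / (np.linalg.norm(gt, axis=1, keepdims=True) + eps)
    # abs() treats a normal and its flipped counterpart as identical
    # (assumed convention for orientation ambiguity).
    return float(np.mean(np.abs(np.sum(pred * gt, axis=1))))
```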

Fig. 7: Quantitative comparison of normal estimation for GeoSSL and DGCNN. Point colors correspond to the angular difference (estimation error) between the predicted and ground-truth normals, mapped to a heat-map ranging from 0 to 30 degrees. The smaller the value, the better.
Methods | Overall Accuracy
PointNet++ [34] | 90.7
PointCNN [26] | 92.2
DGCNN+ [41] | 92.2
GeoSSL (PointNet++ backbone) | 91.7
GeoSSL (PointCNN backbone) | 92.8
GeoSSL (DGCNN backbone) | 92.9
TABLE VI: Classification performance (%) of GeoSSL with different baseline CNN models on ModelNet40.

Evaluation across CNN backbone models – Evaluation results on different CNN baselines (i.e. PointNet++ [34], DGCNN [41], and PointCNN [26]) are shown in Table VI. Our proposed method consistently outperforms its baseline models, which further confirms the benefit of auxiliary geometric learning for semantic point cloud recognition.

Methods | Classification Accuracy (%) | Segmentation IoU (%)
DGCNN+ [41] | 98.8 | 85.1
MTNet | 98.9 | 84.4
GeoSSL (classification) | 99.4 | -
GeoSSL (segmentation) | - | 85.7
TABLE VII: Comparison of multi-task learning variants on ShapeNet Part.

Evaluation on multi-task learning architecture – To this end, we additionally conduct experiments on the ShapeNet Part dataset. The network architecture is the same as in Fig. 2; the only difference lies in the task setting. Table VII compares different options for combining the two tasks in a multi-task learning framework. When classification and segmentation are simply combined in a multi-task manner (denoted as MTNet), the classification performance (98.9%) is only slightly better than the baseline DGCNN (98.8%), and the segmentation performance (84.4%) is even worse than the baseline. In contrast, our models with auxiliary fitting of geometric properties achieve superior results to both DGCNN and MTNet on classification and segmentation, which further demonstrates that the performance gain of our method can be credited to the additional regression learning branch.

Evaluation on estimation of geometric properties – Figs. 3 and 4 visualize the normals and curvatures predicted by the proposed GeoSSL, respectively, from which we can see that our estimates are very close to the ground truth. Furthermore, when neural networks are trained on clean point sets, they can predict more accurate normals than those obtained by geometric computation, especially on noisy testing sets. This can be attributed to their ability to learn statistical regularities from the training data. For verification, we train a DGCNN-based normal estimation network on clean point sets from the ModelNet40 training split; for testing, we add Gaussian perturbations to each point-set instance, where the noise level for each point is (clean point sets are normalized in a unit sphere). Geometric computation produces an averaged error of against the GT normals (measured in the cosine distance, ranging in ), while our trained neural model gives a lower one of , which verifies our claim that learning-based methods trained on clean data can predict more accurate geometric properties.
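The "geometric computation" baseline, estimating normals and curvature from local neighborhoods, is commonly done via PCA over each point's k nearest neighbours: the normal is the eigenvector of the smallest eigenvalue, and curvature is approximated by the surface variation. A minimal sketch (our own illustrative implementation, not necessarily the exact method of Sec. III-B):

```python
import numpy as np

def estimate_normal_curvature(points, k=16):
    """Classical PCA-based estimation of per-point normals and curvature.

    points: (P, 3). Returns (P, 3) normals (sign-ambiguous) and (P,)
    curvatures approximated as l0 / (l0 + l1 + l2).
    """
    normals = np.empty_like(points)
    curvatures = np.empty(len(points))
    # Brute-force pairwise distances (fine for small clouds;
    # use a KD-tree at scale).
    d = np.linalg.norm(points[:, None] - points[None], axis=-1)
    knn = np.argsort(d, axis=1)[:, :k]
    for i, idx in enumerate(knn):
        nbrs = points[idx] - points[idx].mean(axis=0)
        cov = nbrs.T @ nbrs / len(idx)
        evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        normals[i] = evecs[:, 0]             # direction of least variance
        curvatures[i] = evals[0] / (evals.sum() + 1e-12)
    return normals, curvatures
```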

Setting (λ) | 1 | 1e-1 | 1e-2 | 1e-3 | 1e-4 | 1e-5
Accuracy (%) | 87.1 | 89.3 | 92.9 | 92.3 | 91.9 | 91.7
TABLE VIII: Effect of the trade-off λ between the two losses in GeoSSL. The smaller λ is, the less local geometric learning contributes.

Evaluation on the ratio between losses – In our classification setting, λ is an important parameter that determines the proportion of the two loss functions (i.e. the regression loss for fitting local geometries and the classification/segmentation loss). We hold out 20% of the training data as a validation set. We observe that the best trade-off varies across network architectures and tasks, but when λ is set between 1e-3 and 1e-2, our model steadily performs well. As a result, we select either 0.01 or 0.001 for λ in our experiments. Specifically, Table VIII illustrates the trend of classification accuracy on ModelNet40 with varying λ in GeoSSL; λ = 1e-2 reaches the best classification performance.
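The validation-based selection of λ described above can be sketched generically; `select_lambda`, `train_fn`, and `eval_fn` are hypothetical names for the procedure, not part of the released code:

```python
def select_lambda(train_fn, eval_fn,
                  candidates=(1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5)):
    """Pick the loss trade-off lambda by held-out validation accuracy,
    as done with the 20% split described above.

    train_fn(lam) -> trained model; eval_fn(model) -> validation accuracy.
    Returns the best lambda and the full score table.
    """
    scores = {lam: eval_fn(train_fn(lam)) for lam in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```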

Experiment Setting | Accuracy
Random initialization | 92.5
Pre-trained on ShapeNetCore | 92.9
TABLE IX: Evaluation on transferring knowledge for 3D classification using the GeoSSL.

Effects of pre-training with auxiliary data – We evaluate the effect of auxiliary data by pre-training the proposed models on the ShapeNetCore dataset. Results in Table IX show a moderate improvement of the pre-trained models over the identical network with random initialization, which encourages us to adopt pre-training to boost performance.

V Conclusion

This paper, for the first time, systematically introduces self-supervised learning into 3D point cloud semantic analysis. The method is generic: its backbone can readily be replaced by any other deep geometric learning network. Rather than employing geometric properties as additional input features, our network utilizes them as auxiliary supervision signals, which consistently improves performance on semantic analysis. Given accurate privileged local shape information, our method can further be boosted to 93.5% mean classification accuracy on ModelNet40.

Footnotes

  1. https://github.com/Necole123/GeoSSL
  2. Our implementation is slightly worse than the results reported in the original works.

References

  1. I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer and S. Savarese (2016) 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1534–1543. Cited by: §IV-A, §IV.
  2. K. Bae and D. D. Lichti (2008) A method for automated registration of unorganised point clouds. ISPRS-J. Photogramm. Remote Sens. 63 (1), pp. 36–54. Cited by: §III-B.
  3. Y. Ben-Shabat, M. Lindenbaum and A. Fischer (2018) Nesti-net: normal estimation for unstructured 3d point clouds using convolutional neural networks. arXiv preprint arXiv:1812.00709. Cited by: §I, §III-B.
  4. J. Bruna, W. Zaremba, A. Szlam and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §II.
  5. M. Defferrard, X. Bresson and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §II.
  6. C. Doersch, A. Gupta and A. A. Efros (2015) Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1422–1430. Cited by: §II.
  7. C. Doersch and A. Zisserman (2017) Multi-task self-supervised visual learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2051–2060. Cited by: §I.
  8. M. Dominguez, R. Dhamdhere, A. Petkar, S. Jain, S. Sah and R. Ptucha (2018) General-purpose deep point cloud feature extractor. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1972–1981. Cited by: §II.
  9. A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller and T. Brox (2016-Sep.) Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 38 (9), pp. 1734–1747. External Links: ISSN 0162-8828 Cited by: §II.
  10. F. Engelmann, T. Kontogianni, A. Hermans and B. Leibe (2017) Exploring spatial context for 3d semantic segmentation of point clouds. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 716–724. Cited by: TABLE III.
  11. B. Fernando, H. Bilen, E. Gavves and S. Gould (2017) Self-supervised video representation learning with odd-one-out networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3636–3645. Cited by: §I.
  12. C. Gan, B. Gong, K. Liu, H. Su and L. J. Guibas (2018) Geometry guided convolutional neural networks for self-supervised video representation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5589–5597. Cited by: §I.
  13. P. Guerrero, Y. Kleiman, M. Ovsjanikov and N. J. Mitra (2018) PCPNet learning local shape properties from raw point clouds. In Comput. Graph. Forum, Vol. 37, pp. 75–85. Cited by: §I, §II, §III-B.
  14. T. He, H. Huang, L. Yi, Y. Zhou, C. Wu, J. Wang and S. Soatto (2019) GeoNet: deep geodesic networks for point cloud analysis. arXiv preprint arXiv:1901.00680. Cited by: §I, §I, §II.
  15. H. Hoppe, T. DeRose, T. Duchamp, J. McDonald and W. Stuetzle (1992) Surface reconstruction from unorganized points. Vol. 26, ACM. Cited by: §II.
  16. H. Huang, S. Wu, M. Gong, D. Cohen-Or, U. Ascher and H. R. Zhang (2013) Edge-aware point set resampling. ACM Trans. Graph. 32 (1), pp. 9. Cited by: §II.
  17. Q. Huang, W. Wang and U. Neumann (2018) Recurrent slice networks for 3d segmentation of point clouds. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2626–2635. Cited by: TABLE III.
  18. T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §II.
  19. A. Kolesnikov, X. Zhai and L. Beyer (2019) Revisiting self-supervised visual representation learning. arXiv preprint arXiv:1901.09005. Cited by: §I.
  20. A. Krizhevsky, I. Sutskever and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §II.
  21. L. Landrieu and M. Simonovsky (2017) Large-scale point cloud semantic segmentation with superpoint graphs. arXiv preprint arXiv:1711.09869. Cited by: §IV-A, TABLE III.
  22. M. Lapin, M. Hein and B. Schiele (2014) Learning using privileged information: SVM+ and weighted SVM. Neural Netw. 53, pp. 95–108. Cited by: §II.
  23. R. Levie, F. Monti, X. Bresson and M. M. Bronstein (2017) Cayleynets: graph convolutional neural networks with complex rational spectral filters. arXiv preprint arXiv:1705.07664. Cited by: §II.
  24. J. Li, B. M. Chen and G. Hee Lee (2018) So-net: self-organizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9397–9406. Cited by: §I, §I, §II, §II, TABLE I, TABLE II.
  25. S. Li, K. Jia, Y. Wen, T. Liu and D. Tao (2019) Orthogonal deep neural networks. IEEE Trans. Pattern Anal. Mach. Intell. (), pp. 1–1. External Links: Document, ISSN 1939-3539 Cited by: §II.
  26. Y. Li, R. Bu, M. Sun, W. Wu, X. Di and B. Chen (2018) PointCNN: convolution on x-transformed points. In Adv Neural Inf Process Syst, pp. 828–838. Cited by: §I, §I, §I, §III-A, §IV-A, §IV-A, §IV-B, TABLE I, TABLE II, TABLE III, TABLE VI, §IV.
  27. M. Liu (2015) Robotic online path planning on point cloud. IEEE T. Cybern. 46 (5), pp. 1217–1228. Cited by: §I.
  28. Q. Mérigot, M. Ovsjanikov and L. J. Guibas (2010) Voronoi-based curvature and feature estimation from point clouds. IEEE Trans. Vis. Comput. Graph. 17 (6), pp. 743–756. Cited by: §II.
  29. M. Noroozi, A. Vinjimoor, P. Favaro and H. Pirsiavash (2018) Boosting self-supervised learning via knowledge transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9359–9367. Cited by: §I.
  30. D. Novotny, S. Albanie, D. Larlus and A. Vedaldi (2018) Self-supervised learning of geometrically stable features through probabilistic introspection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3637–3645. Cited by: §I.
  31. S. Pan, R. Hu, S. Fung, G. Long, J. Jiang and C. Zhang (2019) Learning graph embedding with adversarial training methods. IEEE T. Cybern. (), pp. 1–13. External Links: ISSN 2168-2275 Cited by: §II.
  32. D. Pathak, R. Girshick, P. Dollár, T. Darrell and B. Hariharan (2017) Learning features by watching objects move. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2701–2710. Cited by: §II.
  33. C. R. Qi, H. Su, K. Mo and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §I, §I, §II, §II, §III-A, §III-B, TABLE I, TABLE II, TABLE III, §IV, §IV.
  34. C. R. Qi, L. Yi, H. Su and L. J. Guibas (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108. Cited by: Fig. 1, §I, §I, §I, §II, §II, §III-A, §III-B, §IV-A, §IV-B, TABLE I, TABLE II, TABLE VI, §IV.
  35. V. Sharmanska, N. Quadrianto and C. H. Lampert (2013) Learning to rank using privileged information. In Proceedings of the IEEE International Conference on Computer Vision, pp. 825–832. Cited by: §II.
  36. M. Tatarchenko, J. Park, V. Koltun and Q. Zhou (2018) Tangent convolutions for dense prediction in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3887–3896. Cited by: §III-B.
  37. V. Vapnik and A. Vashist (2009) A new learning paradigm: learning using privileged information. Neural Netw. 22 (5), pp. 544–557. Cited by: §II.
  38. K. Wang, K. Chen and K. Jia (2019) Deep cascade generation on point sets. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pp. 3726–3732. Cited by: §I.
  39. W. Wang, R. Yu, Q. Huang and U. Neumann (2018) Sgpn: similarity group proposal network for 3d point cloud instance segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2569–2578. Cited by: TABLE III.
  40. X. Wang, K. He and A. Gupta (2017) Transitive invariance for self-supervised visual representation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1329–1338. Cited by: §I.
  41. Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein and J. M. Solomon (2019) Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. 38 (5), pp. 1–12. Cited by: Fig. 1, §I, §I, §I, §II, §II, §III-A, §IV-A, §IV-A, §IV-A, §IV-B, §IV-B, TABLE I, TABLE II, TABLE III, TABLE IV, TABLE V, TABLE VI, TABLE VII, §IV, §IV.
  42. Y. Wen, J. Lin, K. Chen and K. Jia (2019) Geometry-aware generation of adversarial and cooperative point clouds. arXiv preprint arXiv:1912.11171. Cited by: §I.
  43. J. Wu, C. Zhang, T. Xue, B. Freeman and J. Tenenbaum (2016) Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in neural information processing systems, pp. 82–90. Cited by: §I, §II.
  44. L. Wu, Y. Wang, X. Li and J. Gao (2018) Deep attention-based spatially recursive networks for fine-grained visual recognition. IEEE T. Cybern. 49 (5), pp. 1791–1802. Cited by: §II.
  45. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao (2015) 3d shapenets: a deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1912–1920. Cited by: §IV.
  46. H. Yang and I. Patras (2013) Privileged information-based conditional regression forest for facial feature detection. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1–6. Cited by: §II.
  47. Y. Yang, C. Feng, Y. Shen and D. Tian (2018) Foldingnet: point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206–215. Cited by: §I, §II.
  48. L. Yi, V. G. Kim, D. Ceylan, I. Shen, M. Yan, H. Su, C. Lu, Q. Huang, A. Sheffer and L. Guibas (2016) A scalable active framework for region annotation in 3d shape collections. ACM Trans. Graph. 35 (6), pp. 1–12. Cited by: §IV.
  49. Y. Zhang, K. Jia and Z. Wang (2019) Part-aware fine-grained object categorization using weakly supervised part detection network. IEEE Trans. Multimedia. Cited by: §II.
  50. X. Zhou, F. Shen, L. Liu, W. Liu, L. Nie, Y. Yang and H. T. Shen (2018) Graph convolutional network hashing. IEEE T. Cybern.. Cited by: §II.