Curriculum Learning for Deep Generative Models with Clustering

Deli Zhao1,  Jiapeng Zhu 1,2,  Zhenfang Guo 1,3,  Bo Zhang1
1Xiaomi AI Lab
2Chinese University of Hong Kong
3Peking University
{zhaodeli,jengzhu0}@gmail.com, guozhenfang@pku.edu.cn, zhangbo@xiaomi.com
Abstract

Training generative models, such as generative adversarial networks (GANs), on noisy data is challenging. A novel curriculum learning algorithm pertaining to clustering is proposed to address this issue in this paper. The curriculum construction is based on the centrality of the underlying clusters in the data: data points of high centrality take priority of being fed into generative models during training. To make our algorithm scalable to large-scale data, the active set is devised, in the sense that every round of training proceeds only on an active subset containing a small fraction of already trained data and the incremental data of lower centrality. Moreover, a geometric analysis is presented to interpret the necessity of cluster curriculum for generative models. The experiments on cat and human-face data validate that our algorithm is able to learn the optimal generative models (e.g. ProGAN) with respect to specified quality metrics for noisy data. An interesting finding is that the optimal cluster curriculum is closely related to the critical point of the geometric percolation process formulated in the paper.

1 Introduction

Deep generative models have piqued researchers' interest over the past decade. Fruitful progress has been achieved on this topic, such as the auto-encoder (Hinton and Salakhutdinov, 2006) and variational auto-encoder (VAE) (Kingma and Welling, 2013; Rezende et al., 2014), the generative adversarial network (GAN) (Goodfellow et al., 2014; Radford et al., 2016; Arjovsky et al., 2017), normalizing flows (Rezende and Mohamed, 2015; Dinh et al., 2015, 2017; Kingma and Dhariwal, 2018), and auto-regressive models (van den Oord et al., 2016b, a, 2017). However, it is non-trivial to train a deep generative model that converges to a proper minimum of the associated optimization. For example, GANs suffer from training instability, mode collapse, and generative distortion. Many insightful algorithms have been proposed to circumvent these issues, including feature engineering (Salimans et al., 2016), various discrimination metrics (Mao et al., 2016; Arjovsky et al., 2017; Berthelot et al., 2017), distinctive gradient penalties (Gulrajani et al., 2017; Mescheder et al., 2018), spectral normalization of the discriminator (Miyato et al., 2018), and orthogonal regularization of the generator (Brock et al., 2019). Of particular interest is that a breakthrough for GANs has been made with the simple technique of progressively growing the neural networks of the generator and discriminator from low-resolution images to high-resolution counterparts (Karras et al., 2018a). This kind of progressive growing also helps push the state of the art to a new level by enabling StyleGAN to produce photo-realistic and detail-sharp results (Karras et al., 2018b), shedding new light on wide applications of GANs to real problems. This idea of progressive learning is actually a general characteristic of the cognitive process (Elman, 1993; Oudeyer et al., 2007), and has been formally named curriculum learning in machine learning (Bengio et al., 2009). The central topic of this paper is to explore a new curriculum for training deep generative models.

To facilitate robust training of deep generative models with noisy data, we propose curriculum learning with clustering. The key contributions are listed as follows:

  • We first summarize four representative curricula for generative models, i.e. architecture (generation capacity), semantics (data content), dimension (data space), and cluster (data distribution). Among these curricula, cluster curriculum is newly proposed in this paper.

  • Cluster curriculum treats data according to the centrality of each data point, which is pictorially illustrated and explained in detail. To foster large-scale learning, we devise the active set algorithm that only needs an active data subset of small fixed size for training.

  • The geometric principle is formulated to analyze the hardness of noisy data and the advantage of cluster curriculum. The geometry pertains to counting the small spheres packed in an ellipsoid, on which the percolation analysis we use is based.

The research on curriculum learning is diverse. Our work focuses on curricula that are closely related to data attributes; curricula beyond data attributes are outside the scope of this paper.

2 Curriculum learning

Curriculum learning has been a basic learning approach to promoting performance of algorithms in machine learning. We quote the original words from the seminal paper (Bengio et al., 2009) as its definition:

Curriculum learning.

“The basic idea is to start small, learn easier aspects of the task or easier sub-tasks, and then gradually increase the difficulty level” according to pre-defined or self-learned curricula.

From a cognitive perspective, curriculum learning is common in human and animal learning when they interact with environments (Elman, 1993), which is why it is natural as a learning rule for machine intelligence. The learning process of cognitive development is gradual and progressive (Oudeyer et al., 2007). In practice, the design of curricula is task-dependent and data-dependent. Here we summarize the representative curricula that have been developed for generative models.

Architecture curriculum. The deep neural architecture itself can be viewed as a curriculum from the viewpoint of learning concepts (Hinton and Salakhutdinov, 2006; Bengio et al., 2006) or disentangling representations (Lee et al., 2011). For example, the different layers decompose distinctive features of objects for recognition (Lee et al., 2011; Zeiler and Fergus, 2014; Zhou et al., 2016) and generation (Bau et al., 2018). Besides, progressive growing of neural architectures has been successfully exploited in GANs (Karras et al., 2018a; Heljakka et al., 2018; Korkinof et al., 2018; Karras et al., 2018b).

Semantics curriculum. The most intuitive content for each datum is the semantic information that the datum conveys. The hardness of semantics determines the difficulty of learning knowledge from data. Therefore, the semantics can be a common curriculum. For instance, the environment for a game in deep reinforcement learning (Justesen et al., 2018) and the number sense of learning cognitive concepts with neural networks (Zou and McClelland, 2013) can be such curricula.

Dimension curriculum. High dimension usually poses difficulty for machine learning due to the curse of dimensionality (Donoho, 2000), in the sense that the amount of data needed for learning grows exponentially with the dimension of the variables (Vershynin, 2018). Therefore, algorithms are expected to benefit from gradually growing dimensions. The effectiveness of the dimension curriculum is evident from recent progress on deep generative models, such as ProGANs (Karras et al., 2018a, b) that gradually enlarge image resolution, and language generation that proceeds from short sequences to longer sequences of more complexity (Rajeswar et al., 2017; Press et al., 2017).

3 Cluster curriculum

For fitting distributions, dense data points are generally easier to handle than sparse data or outliers. To train generative models robustly, therefore, it is plausible to propose the cluster curriculum, meaning that generative algorithms first learn from data points close to cluster centers and then from data progressively approaching cluster boundaries. The stream of data points fed to the models is thus ordered by the cluster centrality explained in section 3.2. The toy example in Figure 1 illustrates how the cluster curriculum is formed.

3.1 Why clusters matter

The importance of clusters for data points is actually obvious from a geometric point of view. The data sparsity in high-dimensional spaces causes difficulty in fitting the underlying distribution of data points (Vershynin, 2018). So generative algorithms may benefit from proceeding first in the local spaces where data points are relatively dense. Such data points form clusters that are generally informative subsets with respect to the entire dataset. In addition, clusters contain common regular patterns of data points, on which generative models converge more easily. Most importantly, noisy data points deteriorate the performance of algorithms. For classification, curriculum learning has been theoretically proven to circumvent the negative influence of noisy data (Gong et al., 2016). We will analyze this aspect for generative models with geometric facts.

Figure 1: Cluster curriculum. From magenta to black, the centrality of data points decreases. The values indicate the number of data points taken in centrality order.

3.2 Generative models with cluster curriculum

With cluster curriculum, we are allowed to gradually learn generative models from dense clusters to cluster boundaries and finally to all data points. In this way, generative algorithms are capable of avoiding the direct harm of noise or outliers. To this end, we first need a measure called centrality, a terminology from graph-based clustering. It quantifies the compactness of a cluster in data points or a community in complex networks (Newman, 2010). A large centrality implies that the associated data point is close to one of the cluster centers. For easy reference, we provide the algorithm of the centrality we use in the appendix. For the experiments in this paper, all cluster curricula are constructed by the centrality of the stationary probability distribution, i.e. the eigenvector corresponding to the largest eigenvalue of the transition probability matrix drawn from the data.

To be specific, let c denote the centrality vector of the data points, i.e. the i-th entry c_i of c is the centrality of data point x_i. Sorting c in descending order and adjusting the order of the original data points accordingly gives the data points arranged by cluster centrality. Let {D_0, D_1, …, D_m} signify the partition of the centrality-sorted dataset D, where D_0 is the base set that guarantees a proper convergence of generative models, and the rest of D is evenly divided into m subsets D_1, …, D_m according to the centrality order. In general, the number of data points in each D_i (i ≥ 1) is moderate compared to D_0 and determined according to the scale of D. Such a division of D serves the efficiency of training, because we do not need to train models from a very small dataset. Cluster-curriculum learning is carried out by incrementally feeding the subsets D_i into generative algorithms. In other words, the algorithms are successively trained on the curricula C_i = D_0 ∪ D_1 ∪ ⋯ ∪ D_i, with C_i after C_{i−1}, meaning that the curriculum for each round of training is accumulated with D_i.
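As a minimal sketch of this construction (our own illustration, not the authors' released code; the function name `build_curriculum` and the split sizes are hypothetical), the partition into a base set and incremental subsets can be written as:

```python
import numpy as np

def build_curriculum(data, centrality, base_size, increment_size):
    """Sort data by descending centrality and split it into a base set D_0
    plus incremental subsets D_1, ..., D_m."""
    order = np.argsort(-centrality)                 # highest centrality first
    data = data[order]
    base, rest = data[:base_size], data[base_size:]
    num_subsets = max(1, int(np.ceil(len(rest) / increment_size)))
    return [base] + np.array_split(rest, num_subsets)

# Curriculum C_i is the union of the first i+1 subsets:
# C_i = np.concatenate(subsets[:i + 1], axis=0)
```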

In order to determine the optimal curriculum, we need the aid of a quality metric for generative models, such as the Fréchet inception distance (FID) or the sliced Wasserstein distance (SWD) (Borji, 2018). For the generative model trained with each curriculum C_i, we calculate the associated score s_i via the specified quality metric. The optimal curriculum C_{i*} for effective training can then be identified by the minimal score, i.e. i* = arg min_i s_i for i = 0, 1, …, m. The interesting phenomenon of this score curve will be illustrated in the experiments. The minimum of the score is apparently metric-dependent. One can refer to (Borji, 2018) for a review of evaluation metrics. In practice, we can opt for one reliable metric, or use multiple metrics for the decision-making of the optimal model.

There are two ways of using the incremental subsets D_i during training. One is that the parameters of the model are re-randomized when new data are used, the procedure of which is given in Algorithm 1 in the appendix. The other is that the parameters are fine-tuned from the pre-trained model of the previous loop, which will be presented with a fast learning algorithm in the following section.

3.3 Active set for scalable training

To obtain a precise minimum of the score curve, the cardinality of each D_i needs to be set much smaller than that of D, meaning that the number m of training loops will be large even for a dataset of moderate scale. Training with many loops is time-consuming. Here we propose the active set to address this issue, in the sense that for each loop of cluster curriculum, generative models are always trained with a subset of small fixed size instead of the curriculum C_i whose size becomes incrementally large.

Figure 2: Schematic illustration of the active set for cluster curriculum. The cardinality n_a of the active set is fixed. When the incremental subset D_i is taken for training, we randomly sample another n_a − |D_i| data points from the history data C_{i−1} to form T_{i−1}. The complete active set is then composed as A_i = D_i ∪ T_{i−1}. We can see that the data points in the history region become less dense after sampling.

To form the active set A_i, the subset T_{i−1} of data points is randomly sampled from the already used data C_{i−1} and combined with D_i for the next loop, where |A_i| = |D_i| + |T_{i−1}| = n_a. For easy understanding, we illustrate the active set with the toy example in Figure 2. In this scenario, progressive pre-training must be applied, meaning that the update of model parameters for the current loop starts from the parameters of the previous loop. The procedure of cluster curriculum with the active set is detailed in Algorithm 2 in the appendix.

The active set allows us to train generative models with a small dataset that is actively adapted, thereby significantly reducing the training time for large-scale data.
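The following snippet sketches how an active set of fixed size could be assembled at each loop (a hedged illustration with hypothetical names; it assumes the data are stored as NumPy arrays and that training warm-starts from the previous parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def next_active_set(new_subset, history, active_size):
    """Combine the incremental subset D_i with a random sample T_{i-1} of the
    already used data C_{i-1}, so the training set keeps a fixed size n_a."""
    n_old = active_size - len(new_subset)            # points to keep from history
    keep = rng.choice(len(history), size=n_old, replace=False)
    return np.concatenate([new_subset, history[keep]], axis=0)

# Each loop: train on the active set starting from the previous parameters
# (progressive pre-training), then append new_subset to the history data.
```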

4 Geometric view of cluster curriculum

Cluster curriculum bears an interesting relation to high-dimensional geometry, which provides a geometric understanding of our algorithm. Without loss of generality, we work on one cluster obeying a normal distribution; the characteristics of this cluster extend to other clusters of the same kind of distribution. For easy analysis, let us begin with a toy example. As Figure 3(a) shows, the confidence ellipse fitted from the subset of centrality-ranked data points is nearly conformal to the ellipse of all data points, which allows us to relate these two ellipses by virtue of the confidence-level equation. Let μ and Σ signify the center and the covariance matrix of the cluster of interest, where x ∈ ℝⁿ and Σ ∈ ℝⁿˣⁿ. To make it formal, we can write the equation as

Figure 3: Illustration of growing one cluster in cluster curriculum. (a) Data points taken with large centrality. (b) The annulus formed by removing the inner ellipse from the outer one.
(x − μ)ᵀ Σ⁻¹ (x − μ) = χ²_n(α),   (1)

where χ²_n(α) can be the quantile of the chi-squared distribution or the squared Mahalanobis distance, n is the degree of freedom, and α is the confidence level. For conciseness, we write χ²_n(α) as χ²_α in the following context. Then the ellipses E_β and E_α correspond to χ²_β and χ²_α, respectively, where χ²_β < χ²_α.

To analyze the hardness of training generative models, a fundamental aspect is to examine the number of given data points falling in a geometric entity (for cluster curriculum, it is an annulus explained shortly) compared with the number of lattice points in it. The fewer data points there are compared to lattice points, the harder the problem will be. However, the enumeration of lattice points is computationally prohibitive in high dimensions. Inspired by the information theory of encoding data of normal distributions (Roman, 1996), we instead count the number of small spheres of radius r packed in the ellipsoid. This number can replace the role of the lattice-point count as long as the sphere radius r is set properly. With a little abuse of notation, we still use N to denote the packing number in the following context. Theorem 1 gives the exact form of N.

Theorem 1.

For a set X of data points drawn from the normal distribution N(μ, Σ), the ellipsoid E_α of confidence α is defined as E_α = {x ∈ ℝⁿ : (x − μ)ᵀ Σ⁻¹ (x − μ) ≤ χ²_α}, where Σ has no zero eigenvalues and ζ_α = √(χ²_α). Let N_r(ζ_α) be the number of spheres of radius r packed in the ellipsoid E_α. Then we can establish

N_r(ζ_α) = (ζ_α / r)ⁿ √(det Σ).   (2)

We see that N_r(ζ_α) admits a tidy form with the Mahalanobis radius ζ_α, the dimension n, and the sphere radius r as variables. The proof is provided in the appendix.
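As a concrete numerical check (our own sketch, not from the paper), the packing number in (2) can be evaluated in the log domain to avoid overflow in high dimensions; the function name is ours:

```python
import numpy as np

def packing_number(zeta, cov, radius):
    """Number of spheres of radius `radius` packed in the ellipsoid of
    Mahalanobis radius `zeta` with covariance `cov`, computed as the ratio
    of the ellipsoid volume to the sphere volume."""
    n = cov.shape[0]                                   # dimension
    sign, logdet = np.linalg.slogdet(cov)              # stable log-determinant
    assert sign > 0, "covariance must be positive definite"
    log_count = n * (np.log(zeta) - np.log(radius)) + 0.5 * logdet
    return np.exp(log_count)

# Example: identity covariance in 64 dimensions, zeta = 10, sphere radius 1
print(packing_number(10.0, np.eye(64), 1.0))           # (10/1)^64 = 1e64
```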

The geometric region of interest for cluster curriculum is the annulus formed by removing the ellipsoid E_β from the ellipsoid E_α (the ellipse refers to the surface and the ellipsoid refers to the elliptic ball), as Figure 3(b) displays. We investigate the varying law between the data count and the sphere count in the annulus when the inner ellipsoid grows with the cluster curriculum. For this purpose, we need the following two corollaries that follow immediately from Theorem 1.

Corollary 1.

Let N_r(ζ_α, ζ_β) be the number of spheres of radius r packed in the annulus formed by removing the ellipsoid E_β from the ellipsoid E_α, where ζ_β < ζ_α. Then the following identity holds

N_r(ζ_α, ζ_β) = N_r(ζ_α) − N_r(ζ_β) = (ζ_αⁿ − ζ_βⁿ) √(det Σ) / rⁿ.   (3)
Figure 4: Comparison between the number of data points sampled from isotropic normal distributions and the number of spheres (lattice) packed in the annulus, with respect to the chi quantile ζ_β. n is the dimension of the data points. For each dimension, we sample 70,000 data points from N(0, I). The scales of the two axes are normalized (by 10,000 for the data count).
Corollary 2.

N_r(ζ_β) / N_r(ζ_α) = (ζ_β / ζ_α)ⁿ.

It is obvious that N_r(ζ_α, ζ_β) goes to infinity when n → ∞ under the conditions that ζ_β < ζ_α and r is bounded. Besides, when ζ_β (the cluster) grows, N_r(ζ_α, ζ_β) reduces with exponent n if the ellipsoid E_α is fixed.

In light of Corollary 1, we can now demonstrate the functional law between the counts in the annulus and ζ_β. First, we determine ζ_α as follows

ζ_α = max_{x ∈ X} √((x − μ)ᵀ Σ⁻¹ (x − μ)),   (4)

which means that E_α is the ellipsoid of minimal Mahalanobis radius centered at μ that contains all the data points of the cluster. In addition, we need to estimate a suitable sphere radius r such that the packing number and the number of data points have comparable scales, in order to make the two counting curves comparable. To achieve this, we define an oracle ellipse E_o for which the packing number equals the number of data points. For simplicity, we let E_α be the oracle ellipse. Thus we can determine r with Corollary 3.

Corollary 3.

If we let E_α be the oracle ellipse such that N_r(ζ_α) = |X|, then the free parameter r can be computed as r = ζ_α (det Σ)^{1/(2n)} |X|^{−1/n}.

To make the demonstration amenable to handle, the data points we use for simulation are assumed to obey an isotropic normal distribution, meaning that they are generated with nearly equal variance along each dimension. Figure 4 shows that the number of data points in the annulus gradually exhibits the critical phenomenon of percolation processes (percolation theory is a fundamental tool for studying the structure of complex systems in statistical physics and mathematics; the critical point is the percolation threshold where the transition takes place; one can refer to (Stauffer and Aharony, 1994) if interested) when the dimension goes large, implying that the data points in the annulus are significantly reduced when ζ_β grows a little bigger near the critical point. In contrast, the number of packed spheres (lattice points) is still large and varies negligibly until ζ_β approaches the boundary ζ_α. This discrepancy indicates clearly that fitting the data points in the annulus is pretty hard and guaranteeing precision is nearly impossible after crossing the critical point of ζ_β, even for a moderate dimension. Therefore, the plausibility of cluster curriculum follows naturally from this geometric fact.
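The toy simulation below reproduces the spirit of Figure 4 under our assumptions (isotropic normal cluster with identity covariance, sphere counts from the reconstructed formula (3), and the oracle radius of Corollary 3); it is an illustration rather than the authors' exact experimental script.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_points = 128, 70_000                      # dimension and sample size
x = rng.standard_normal((num_points, n))         # isotropic normal cluster
dist = np.linalg.norm(x, axis=1)                 # Mahalanobis distance (Sigma = I)
zeta_alpha = dist.max()                          # outer ellipsoid covers all points
r = zeta_alpha / num_points ** (1.0 / n)         # oracle radius (Corollary 3, det = 1)

for zeta_beta in np.linspace(0.0, zeta_alpha, 11):
    data_count = int(np.sum((dist > zeta_beta) & (dist <= zeta_alpha)))
    sphere_count = (zeta_alpha ** n - zeta_beta ** n) / r ** n
    print(f"zeta_beta={zeta_beta:7.2f}  data={data_count:6d}  spheres={sphere_count:12.1f}")
```

The data count collapses sharply once ζ_β passes the thin shell where the normal samples concentrate, whereas the sphere count stays near its maximum until ζ_β approaches ζ_α, mirroring the discrepancy discussed above.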

Figure 5: Examples from the LSUN cat dataset and the CelebA face dataset. The samples in the first row are of high centrality; the samples of low centrality in the second row are the noisy data or outliers that we refer to in the context.

5 Experiment

The generative model that we use for the experiments is the progressively growing GAN (ProGAN) (Karras et al., 2018a). This algorithm is chosen because ProGAN is a state-of-the-art GAN algorithm with official source code available. Following convention, we opt for the Fréchet inception distance (FID) (Borji, 2018) as the quality metric for ProGAN.

5.1 Dataset and experimental setting

We randomly sample 200,000 cat images from the LSUN dataset (Yu et al., 2015). These cat images are captured in the wild, so their styles vary significantly. Figure 5 shows cat examples of high and low centrality. We can see that the noisy cat images differ much from the clean ones; some actually contain very few informative cat features, and these are the outliers we refer to. The curriculum parameters are set as |D_0| = 20,000 and |D_i| = 10,000 (i ≥ 1), which means that the algorithm is first trained with 20,000 images and, after the initial training, another 10,000 images taken in centrality order are merged into the current training data for further re-training. For the active set, its size n_a is fixed.

The CelebA dataset is a large-scale face attribute dataset (Liu et al., 2015). We use the cropped and well-aligned faces, with a bit of image background preserved, for the generation task. For cluster-curriculum learning, we randomly sample 70,000 faces as the training set. Face examples of different centralities are shown in Figure 5. The curriculum parameters |D_0| and |D_i| are set accordingly. We bypass the active-set experiment on faces because the active set is designed for large-scale data.

Each image in the two datasets is resized to a fixed resolution. To form the cluster curricula, we exploit ResNet34 (He et al., 2016) pre-trained on ImageNet (Russakovsky et al., 2015) to extract a 512-dimensional feature vector for each face and cat image. The directed graphs are built from these feature vectors. We determine the free parameter σ of the edge weights by enforcing the geometric mean of the weights to be 0.8. The number k of nearest neighbors is fixed for both datasets. The centrality is the stationary probability distribution. All code is written with TensorFlow.
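A rough sketch of the graph construction described here is given below (our own illustration with hypothetical names). It assumes the Gaussian-kernel weights of section A.1 and shows how σ can be calibrated so that the geometric mean of the edge weights equals 0.8; brute-force distances are used for clarity, whereas an approximate nearest-neighbor library would be needed at the scale of this paper's datasets.

```python
import numpy as np

def knn_graph_weights(features, k, target_geomean=0.8):
    """Directed k-NN graph with Gaussian-kernel edge weights whose geometric
    mean is forced to `target_geomean` by calibrating sigma."""
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                        # exclude self-loops
    nn_idx = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors per node
    nn_d2 = np.take_along_axis(d2, nn_idx, axis=1)
    # geometric mean of exp(-d^2/sigma^2) equals exp(-mean(d^2)/sigma^2) = target
    sigma2 = nn_d2.mean() / (-np.log(target_geomean))
    weights = np.exp(-nn_d2 / sigma2)
    return nn_idx, weights                              # sparse form of W
```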

5.2 Experimental result

From Figure 6a, we can see that the FID curves are all nearly V-shaped, indicating that global minima exist amid the training process. This is clear evidence that noisy data and outliers deteriorate the quality of generative models during training. From the optimal curricula found by the two training schemes (i.e. curricula at 110,000 and 100,000), we can see that the curriculum of the active set differs from that of normal training only by one step of data increment, implying that the active set is reliable for fast cluster-curriculum learning. The FID of the active set is much worse than that of normal training, especially when more noisy data are fed into the generative model. However, this does not change the whole V-shape of the curve. Namely, the active set is applicable as long as it admits the metric minimum corresponding to the appropriate curriculum.

The V-shape of the centrality-FID curve on the cat data is due to the fact that the noisy data of low centrality contain little effective information to characterize cats, as displayed in Figure 5. The situation is different for the CelebA face dataset, where the face images of low centrality still convey part of the face features. As evidenced by Figure 6b, ProGAN keeps being optimized by the majority of the data until the late curricula. To highlight the meaning of this nearly negligible minimum, we also conduct exactly the same experiment on the FFHQ dataset containing face images of high quality (Karras et al., 2018b). For the FFHQ data, the noisy face data can be ignored. The gray curve of normal training in Figure 6b indicates that the FID of ProGAN decreases monotonically for all curricula. This gentle difference between the ends of the FID curves on CelebA and FFHQ clearly demonstrates the difficulty that noisy data pose to generative algorithms.

Figure 6: FID curves of cluster-curriculum learning for ProGAN on (a) the LSUN cat dataset and (b) the CelebA face dataset. The centrality and the FID share the x-axis because they follow the same order of data points. Matching colors of the y-axis labels and the curves denote the correspondence. The network parameters for “normal training” are randomly re-initialized for each re-training, whereas the active set is based on progressive pre-training with a small dataset of fixed size. The scale of the x-axis is normalized by 10,000.

5.3 Geometric investigation

To understand cluster curriculum more deeply, we employ the geometric method formulated in section 4 to analyze the cat and face data. The percolation processes are both conducted with the 512-dimensional features from ResNet34. Figure 7 displays the curves of the data count in the annulus, the variable of interest in this scenario. As expected, the critical point of the percolation process occurs in both cases, as shown by the blue curves. An obvious fact is that the optimal curricula (red strips) both fall into the (feasible) domains of the percolation processes after the critical points, as indicated by the gray color. This is a desirable property, because data become rather sparse in the annuli after crossing the critical points, and noisy data then play a non-negligible role in tuning the parameters of generative models. Therefore, a fast learning strategy can be derived from the percolation process: the training may begin from the curriculum specified by the critical point, thus significantly accelerating cluster-curriculum learning.

Figure 7: Geometric phenomenon of cluster curriculum on (a) the LSUN cat data and (b) the CelebA face data. The pink strips are the intervals of optimal curricula derived from the generative models. For example, the value of the pink interval in (a) is obtained by 200,000 − 110,000 = 90,000, where 110,000 is one of the minima in Figure 6a; the others are derived in the same way. The subtraction transforms the number of data points in the cluster into the number in the annulus. The critical points are determined by searching the maxima of the absolute discrete difference of the associated curves. The scales of the x-axes are normalized by 10,000.

Another intriguing phenomenon is that the noisier the data, the closer the optimal interval (red strip) is to the critical point. We can see that the optimal interval of the cat data is much closer to the critical point than that of the face data. What surprises us here is that the optimal interval of cluster curricula associated with the cat data nearly coincides with the critical point of the percolation process in the annulus! This means that, for heavily noisy data, the optimal curriculum may be found in the interval close to the critical point of percolation, thus affording great convenience for learning an appropriate generative model on such datasets.
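The critical points in Figure 7 are located by the maximum of the absolute discrete difference of the curves, as stated in the caption; a minimal sketch (with hypothetical variable names) is:

```python
import numpy as np

def critical_point(x, y):
    """Estimate the percolation threshold as the x-location of the largest
    absolute jump of the curve y(x)."""
    jumps = np.abs(np.diff(y))           # absolute discrete difference
    i = int(np.argmax(jumps))
    return 0.5 * (x[i] + x[i + 1])       # midpoint of the steepest segment
```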

6 Analysis and conclusion

The cluster curriculum is proposed for robust training of generative models, and the active set of cluster curriculum is devised to facilitate scalable learning. The geometric principle behind cluster curriculum is analyzed in detail as well. The experimental results on the LSUN cat dataset and the CelebA face dataset demonstrate that generative models trained with cluster curriculum are capable of learning the optimal parameters with respect to the specified quality metric, such as the Fréchet inception distance or the sliced Wasserstein distance. The geometric analysis indicates that the optimal curricula obtained from generative models are closely related to the critical points of the associated percolation processes established in this paper. This intriguing geometric phenomenon is worth exploring deeply in terms of the theoretical connection between generative models and high-dimensional geometry.

It is worth emphasizing that the notion of model optimality here refers to the global minimum of the centrality-FID curve. As already noted, this optimality is metric-dependent. We are able to obtain the optimal model with cluster curriculum, but this does not mean that the algorithm only serves this purpose. We know that more informative data can help learn a more powerful model covering larger data diversity. A trade-off thus arises between robustness against noise and the capacity of fitting more data. The centrality-FID curve provides a visual tool for monitoring the state of model training, thus aiding us in understanding the learning process and selecting suitable models according to the noise level of the given data. For instance, we can pick the trained model close to the optimal curriculum for heavily noisy data, or the one near the end of the centrality-FID curve for datasets with little noise. In fact, this may be the most common way of using cluster curriculum.

In this paper, we do not investigate cluster-curriculum learning for the multi-class case, e.g. the ImageNet dataset with BigGAN (Brock et al., 2019). Cluster-curriculum learning with multiple classes is more complex than what we have analyzed on the face and cat data. We leave this study for future work.

References

Appendix A

A.1 Centrality measure

The centrality, or clustering coefficient, pertaining to a cluster in data points or a community in a complex network is a well-studied topic in machine learning and complex systems. Here we introduce the graph-theoretic centrality used for cluster curriculum. Firstly, we construct a directed graph (digraph) with k nearest neighbors. The weighted adjacency matrix W of the digraph is formed in this way: W_{ij} = exp(−d_{ij}²/σ²) if x_j is one of the k nearest neighbors of x_i, and W_{ij} = 0 otherwise, where d_{ij} is the distance between x_i and x_j and σ is a free parameter.

The density of data points can be quantified by the stationary probability distribution of a Markov chain. For the digraph built from data, the transition probability matrix P is derived by row normalization, i.e. P_{ij} = W_{ij} / Σ_j W_{ij}. Then the stationary probability π can be obtained by solving the eigenvalue problem

Pᵀ π = π,   (5)

where ᵀ denotes the matrix transpose. It is straightforward to see that π is the eigenvector of Pᵀ corresponding to the largest eigenvalue (i.e. 1). π is also defined as a kind of PageRank in many scenarios.

For the density-based cluster curriculum, the centrality c coincides with the stationary probability π. Figure 1 in the main text shows the plausibility of using the stationary probability distribution to quantify data density.
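A compact sketch of this computation is shown below (our own illustration; power iteration is used in place of a full eigen-decomposition, the dense-matrix form is for clarity only, and a small teleport term as in PageRank may be needed in practice if the k-NN digraph is not irreducible):

```python
import numpy as np

def stationary_centrality(W, num_iters=1000, tol=1e-10):
    """Centrality as the stationary distribution of the random walk on the
    digraph with weighted adjacency matrix W (row-normalized into P)."""
    P = W / W.sum(axis=1, keepdims=True)        # transition probability matrix
    pi = np.full(W.shape[0], 1.0 / W.shape[0])  # uniform initial distribution
    for _ in range(num_iters):                  # power iteration on P^T
        new_pi = pi @ P                         # equivalent to P.T @ pi
        if np.abs(new_pi - pi).sum() < tol:
            pi = new_pi
            break
        pi = new_pi
    return pi / pi.sum()
```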

A.2 Theorem 1 and Proof

Theorem 1. For a set X of data points drawn from the normal distribution N(μ, Σ), the ellipsoid E_α of confidence α is defined as E_α = {x ∈ ℝⁿ : (x − μ)ᵀ Σ⁻¹ (x − μ) ≤ χ²_α}, where Σ has no zero eigenvalues and ζ_α = √(χ²_α). Let N_r(ζ_α) be the number of spheres of radius r packed in the ellipsoid E_α. Then we can establish

N_r(ζ_α) = (ζ_α / r)ⁿ √(det Σ).   (6)
Proof.

As explained in the main text, the ellipse equation with respect to the confidence α can be expressed by the following equation

(x − μ)ᵀ Σ⁻¹ (x − μ) = χ²_α.   (7)

Suppose that λ₁, …, λₙ are the eigenvalues of Σ and let y = Uᵀ(x − μ), where U is the matrix of the corresponding orthonormal eigenvectors. Then equation (7) can be written as

∑_{i=1}^{n} y_i² / λ_i = χ²_α.   (8)

Further, eliminating χ²_α on the right side gives

∑_{i=1}^{n} y_i² / (λ_i χ²_α) = 1.   (9)

Then we derive the length of the i-th semi-axis with respect to λ_i, i.e.

a_i = √(λ_i χ²_α) = ζ_α √(λ_i).   (10)

For an n-dimensional ellipsoid E, the volume of E is

V(E) = π^{n/2} / Γ(n/2 + 1) · ∏_{i=1}^{n} a_i,   (11)

where a_i is the length of the i-th semi-axis of E and Γ is the Gamma function. Substituting (10) into the above equation, we obtain the final formula of the volume

V(E_α) = π^{n/2} / Γ(n/2 + 1) · ∏_{i=1}^{n} ζ_α √(λ_i)   (12)
       = π^{n/2} / Γ(n/2 + 1) · ζ_αⁿ √(det Σ).   (13)

Using the volume formula in (11), it is straightforward to get the volume of the packing sphere S_r of radius r

V(S_r) = π^{n/2} / Γ(n/2 + 1) · rⁿ.   (14)

By the definition of N_r(ζ_α), we can write

N_r(ζ_α) = V(E_α) / V(S_r)   (15)
         = (ζ_α / r)ⁿ √(det Σ).   (16)

We conclude the proof of the theorem. ∎

A.3 Procedures of Algorithm 1 and Algorithm 2

1:  Input:
2:  D = {x_1, …, x_|D|}, dataset containing |D| data points
3:  GenerativeModel(), generative model (e.g. GAN)
4:  QualityScore(), metric for generative results (e.g. FID)
5:  m, number of incremental subsets
6:  Procedure:
7:  ⊳ Solve centralities
8:  c ← Centrality(D)        ⊳ section A.1
9:  ⊳ Cluster curriculum
10: Sort D according to the descending order of c
11: Divide the sorted D into {D_0, D_1, …, D_m}
12: ⊳ Train generative models
13: for i = 0 to m do
14:     Initialize model parameters θ_i randomly
15:     θ_i ← GenerativeModel(C_i, θ_i), where C_i = D_0 ∪ ⋯ ∪ D_i
16:     Store θ_i
17: end for
18:
19: ⊳ Search the optimal model
20: for i = 0 to m do
21:     Generate data with the model of parameter θ_i
22:     s_i ← QualityScore(generated data)
23: end for
24: i* ← arg min_i s_i,  θ* ← θ_{i*}
25: Return the optimal model parameter θ*
Algorithm 1 Cluster Curriculum for Generative Models
1:  Input:
2:  D, GenerativeModel(), QualityScore() as in Algorithm 1
3:  n_a, cardinality of the active set
4:  Procedure:
5:  ⊳ Solve centralities
6:  c ← Centrality(D)        ⊳ section A.1
7:  ⊳ Cluster curriculum
8:  Sort D according to the descending order of c
9:  Divide the sorted D into {D_0, D_1, …, D_m}
10: ⊳ Train generative models
11: Initialize model parameters θ_0 randomly
12: θ_0 ← GenerativeModel(D_0, θ_0)        ⊳ e.g. GAN
13: for i = 1 to m do
14:     Derive T_{i−1} by randomly sampling n_a − |D_i| points from C_{i−1} = D_0 ∪ ⋯ ∪ D_{i−1}
15:     A_i ← D_i ∪ T_{i−1}
16:     θ_i ← θ_{i−1}        ⊳ Use the pre-trained model
17:     θ_i ← GenerativeModel(A_i, θ_i)
18:     Store θ_i
19: end for
20:
21: ⊳ Search the optimal model
22: for i = 0 to m do
23:     Generate data with the model of parameter θ_i by sampling a prior, e.g. Gaussian
24:     s_i ← QualityScore(generated data)        ⊳ e.g. FID
25: end for
26:
27: i* ← arg min_i s_i,  θ* ← θ_{i*}
28: Return the optimal model parameter θ*
Algorithm 2 Cluster Curriculum with Active Set