Spectral Overlap and a Comparison of Parameter-Free, Dimensionality Reduction Quality Metrics

Jonathan Johannemann
jonjoh@stanford.edu
   Robert Tibshirani
tibs@stanford.edu
Stanford University
Abstract

Nonlinear dimensionality reduction methods are a popular tool for data scientists and researchers to visualize complex, high dimensional data. However, while these methods continue to improve and grow in number, it is often difficult to evaluate the quality of a visualization due to a variety of factors such as lack of information about the intrinsic dimension of the data and additional tuning required for many evaluation metrics. In this paper, we seek to provide a systematic comparison of dimensionality reduction quality metrics using datasets where we know the ground truth manifold. We utilize each metric for hyperparameter optimization in popular dimensionality reduction methods used for visualization and provide quantitative metrics to objectively compare visualizations to their original manifold. In our results, we find a few methods that appear to consistently do well and propose the best performer as a benchmark for evaluating dimensionality reduction based visualizations.

1 Introduction

In a variety of modern applications, researchers find dimensionality reduction to be a beneficial way of visualizing high dimensional data. Some examples of successful applications include HIV analysis [2] and analyzing gene data [6], while some researchers use dimensionality reduction to showcase algorithmic output as opposed to the raw data [26], [18]. But, while dimensionality reduction for visualization is often regarded as a helpful way to explore data, a quantitative measure of the low dimensional output’s similarity to its high dimensional input is infrequently mentioned.

The task of dimensionality reduction for a data set can be specified as follows. The researcher begins with a set of N observations in D dimensions, which can be represented as X ∈ R^{N×D}. Next, a target dimension d is chosen, which could be any d < D in order to reduce the number of unfavorable properties that come with high dimensional spaces [10], or can be set to 2 or 3 dimensions as a means to visualize the high dimensional data. The dimensionality reduction algorithm is then applied to map X to a low dimensional set Y ∈ R^{N×d} while seeking to maintain as much of the original structure from X in Y. This problem is difficult in real applications due to a variety of reasons such as lack of information about the effective dimension or the geometry of the manifold on which the data lives [23].
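As a concrete linear baseline for this X to Y mapping, one can project onto the top d principal components. The sketch below is illustrative only; it is not one of the methods compared in this paper, and all names are our own:

```python
import numpy as np

def pca_embed(X, d=2):
    """Project N x D data onto its top d principal components.

    A linear baseline for the X -> Y mapping described above; the
    nonlinear methods discussed later replace this projection.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                         # N x d embedding Y

# Data lying on a 2D plane inside R^3 is embedded losslessly,
# so every pairwise distance is preserved exactly.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
X = A @ np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # a plane in R^3
Y = pca_embed(X, d=2)
```

When the data truly lives on a linear subspace of dimension d, this projection is an isometry; the nonlinear methods in this paper target the case where it is not.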

One of the challenges for researchers using tunable nonlinear dimensionality reduction algorithms is determining which performance metric in the literature is best suited for their application. The first difficulty in the current literature is the prevalence of quality metrics that require additional tuning from the user. Without a proper prior, this task can be difficult because different choices of metric parameter values result in different optimal choices for dimensionality reduction hyperparameters. Improper tuning of a quality metric can then lead to misleading visualizations. The second is the lack of lower dimensional exploration of quality metric performance to ensure that a given metric is useful for hyperparameter optimization. In the experiments section, we compare these methods on more trivial cases in low dimensions to explore each method’s ability to act as a performance metric when we can visualize the ground truth for the low dimensional local structure.

As the primary target of dimensionality reduction evaluation literature, nonlinear dimensionality reduction methods continue to grow in number and have succeeded in a variety of applications. The following methods are just a few of the algorithms that we use in our experiments to assess the quality metrics. An early method is Sammon mapping [19], where squared differences in high and low dimensional pairwise distances are scaled by the Euclidean distance in the original space. Another very popular method, t-SNE [9], uses pairwise Euclidean distances to generate conditional probabilities in the high and low dimensional spaces and then seeks to minimize the KL divergence between a high dimensional Gaussian and a low dimensional Student t-distribution with 1 degree of freedom. Also, [5] propose their method “local multidimensional scaling”, which utilizes a combination of MDS and concepts from “force-directed” graph drawing. Our emphasis is especially on methods such as local MDS and t-SNE, which both have hyperparameters that can dramatically change the visualization and mislead a user into believing that there is additional underlying structure or that the classes are very easily separable.

The contributions of the paper are as follows. We provide a joint comparison of recent “parameter-free” quality metrics and promising nonlinear dimensionality reduction algorithms for visualization. Based on our findings, we recommend the metric of [16] and the Spectral Overlap quality metric developed in this paper.

The remainder of the paper is outlined as follows. In Section 2, we introduce existing challenges and past approaches. In Section 3, we discuss the quality metric literature and some of the pitfalls. Finally, in Section 4, we run experiments to evaluate popular and promising nonlinear dimensionality reduction algorithms for visualization using quality metrics that do not require the researcher to specify parameters such as the number of relevant nearest neighbors.

2 Algorithms

In some of the more basic dimensionality reduction algorithms for visualization, such as multidimensional scaling and PCA, there are no additional tunable parameters to influence the output Y, and the means by which one can interpret the data are straightforward. However, these linear techniques are limited and suffer from various pitfalls, such as overly focusing on large pairwise distances and thereby compromising the small pairwise distances which make up the local structure of the data [23]. In addition, some of the most successful applications of dimensionality reduction in areas such as computer vision, natural language processing, and audio signal processing have made tremendous strides through the use of highly nonlinear models. Both of these observations suggest that there is potentially more to gain by using nonlinear methods which focus on local structure in the input data.

But, since these methods are nonlinear and many do not develop an explicit mapping, additional concerns arise including the stability of visualizations and how well the newly generated output dimension represents the original high dimensional space. Interestingly, [4] tackle this by posing dimensionality reduction as an optimization problem and learn mapping functions which could potentially act as one way to visualize the stability of the dimensionality reduction process.

Another area of difficulty for these nonlinear dimensionality reduction methods is that they are often favored for high dimensional tasks where there is no known equivalent to a “ground truth”. In regression tasks, we have a ground truth response, and in classification tasks, we have ground truth class labels. Since the field of dimensionality reduction does not have a universal and similarly objective method for comparison, there is a reliance on the researcher to provide the remaining assumptions to correctly map some high dimensional X to low dimensional Y. We can see that this is the case in [9] and [17], where much of the analysis is qualitative and the evaluation is primarily based on an algorithm’s ability to separate like-classes into separate clusters or groups. However, no numerical measurement is used to bolster the qualitative claims and provide formal comparison with respect to the original input dataset. This can be concerning if it turns out that a visualization was spurious or has little to no relation to the original high dimensional data. For cases where the data may be used downstream in tasks like regression or classification, it is reasonable to have some measure of similarity to the high dimensional space in order to better gauge whether observed group separability or low dimensional structure is robust to new data.

A demonstration conducted by [25] expresses the same concern by displaying how changing just one of t-SNE’s tuning parameters can result in a variety of possible misleading visualizations. The authors discuss how, when one is using t-SNE, cluster sizes, distances between clusters, and interesting geometry in the visualization might be random or inconsistent with the original input data. Furthermore, they show how smaller values of t-SNE’s “perplexity” hyperparameter, which is a smooth equivalent to a kNN graph, can result in different visualizations after each run of the algorithm.

Ultimately, as we see in Section 3, the approach we propose is similar to the intuition behind the heuristic [14] which is to maintain as much local structure as possible but with a secondary consideration for global structure.

3 Metrics

The beginning of dimensionality reduction quality metrics dates back to the 1960s with methods such as Kruskal’s Stress Measure and Sammon Stress. Since then, numerous methods have been developed to evaluate the effectiveness of an algorithm’s ability to replicate high dimensional structure in a lower dimensional space [7], with an added emphasis on local structure. Many of these methods, such as trustworthiness and continuity [24], the local continuity meta-criterion [5], and mean relative rank errors [12], depend on an additional tunable parameter K which requires the user to define performance by an algorithm’s ability to maintain the same K nearest neighbors in some fashion. While a subjective choice of K might not mean much for a small number of data points, it can become unclear as N grows whether a carefully chosen value for K is reasonable.

Cognizant of this concern, many authors, such as [15], [14], and [16], have tried evaluating projections qualitatively by plotting the curve created by evaluating tunable metrics for multiple values of K. Some complications with this approach can be seen in [14] if one reviews Figure 5, which plots dimensionality reduction performance on the swiss roll dataset, and Figure 6, which plots dimensionality reduction performance on 1000 images drawn from the MNIST dataset. We see an immediate local maximum in the performance curve followed by a decline in values before increasing once again for large values of the parameter K. CCA using geodesic distances appears to be the best in Figure 5, but t-SNE using Euclidean distance is labeled the best performer based on their rule of judging performance by the best “local” score. We see that what appear to be the first few neighbors are kept very well, but that same level of performance is not achieved again until K is much greater than the first location of the local maximum. This phenomenon occurs when there is an overemphasis on local neighborhoods, such as in cases in [25]. In the examples with small perplexity, t-SNE forces immediate neighbors to be close but produces tiny clusters that bear very little resemblance to the original dataset.

In addition, [13] propose the co-ranking matrix, which is a more comprehensive means of evaluating the dimensionality reduction process. The co-ranking matrix is an (N−1)-by-(N−1) matrix that is a joint histogram of the pairwise distance ranks, with observations above the diagonal called “extrusions” and below the diagonal “intrusions”. If used as an evaluation tool, one can identify hard intrusions and extrusions based on a choice of K which determines the nearest neighbors. Lee and Verleysen also show how T&C, MRRE, and LCMC amount to penalizing different portions of the co-ranking matrix, and go on to offer the co-ranking matrix as a framework for future development of dimensionality reduction quality metrics. Some of the insights gained through visualizing dimensionality reduction performance through this lens include allowing the user to identify less harmful, small intrusions/extrusions which may come with noise flattening, and large amplitude extrusions which characterize a tearing of the manifold.
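As a sketch of how such a co-ranking matrix can be computed, the following NumPy snippet builds it from pairwise Euclidean distance ranks. This is our own minimal implementation, not code from [13]:

```python
import numpy as np

def rank_matrix(X):
    """ranks[i, j] = rank of j among i's neighbors (1 = nearest).

    The diagonal gets rank N because a point is never its own neighbor.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(len(X))[:, None], order] = np.arange(1, len(X) + 1)
    return ranks

def coranking_matrix(X, Y):
    """Q[k-1, l-1] = #{(i, j): rank_X(i, j) = k and rank_Y(i, j) = l}."""
    n = len(X)
    rx, ry = rank_matrix(X), rank_matrix(Y)
    Q = np.zeros((n - 1, n - 1), dtype=int)
    off_diag = ~np.eye(n, dtype=bool)
    np.add.at(Q, (rx[off_diag] - 1, ry[off_diag] - 1), 1)
    return Q
```

For a perfect embedding (identical neighbor ranks in both spaces) all mass lands on the diagonal, which is the ideal case described above.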

While tunable metrics do provide substantial information about the relationship between the high and low dimensional spaces, quality metrics without tunable parameters require fewer assumptions on the part of the user, and therefore one can guarantee more consistency in the quality of the output. [3] propose using a metric that measures the preservation of distance orderings and offered Spearman’s rho as one of the early measures for determining the preservation of topology. Some newer methods, such as “entropy” and “mutual information”, were proposed by [1] and treat the dimensionality reduction process as a communication channel model which transfers data points from high to low dimensional space.

4 Spectral Overlap

Building on this understanding of nonlinear dimensionality reduction quality metrics, we propose the following method, which we call “Spectral Overlap”. The intuition here is that we want to penalize any lack of overlap in every KNN graph for K = 1, …, N − 1. This provides equal weight across all neighborhood sizes.

We specify “Spectral Overlap” as follows: Let G_K^X be the KNN graph in the input space with parameter K and G_K^Y be the KNN graph in the output space with parameter K, and write N_K^X(i) and N_K^Y(i) for the K nearest neighbors of point i in each space. We begin by calculating the overlap penalty:

P(X, Y) = \sum_{K=1}^{N-1} \sum_{i=1}^{N} |N_K^X(i) \setminus N_K^Y(i)|    (1)

The overlap measure above captures the mismatch in the high and low dimensional KNN graphs. From here, we scale by a normalizing constant Z_N to characterize the decay in performance as one increases the number of data points. This yields the quality metric:

SO(X, Y) = P(X, Y) / Z_N    (2)

If there is a tear in the manifold, more nearest-neighbor relationships will not be upheld and the penalty will be larger. Additionally, this also penalizes cases where one group can potentially occlude another due to the limits of the dimensionality reduction algorithm. We find that for linear dimensionality reduction algorithms, this can happen often such as in the case of the clusters data set we explore in Section 5.
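A minimal NumPy sketch of this idea follows. It sums, over every neighborhood size K, the number of input-space K-nearest neighbors missing from the corresponding output-space neighborhood. The normalizer used here, the maximum attainable mismatch so that scores fall in [0, 1], is our assumption rather than the paper's exact constant:

```python
import numpy as np

def _neighbor_ranks(Z):
    """ranks[i, j] = rank of j among i's neighbors (1 = nearest); self gets rank N."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(len(Z))[:, None], order] = np.arange(1, len(Z) + 1)
    return ranks

def spectral_overlap(X, Y):
    """Total KNN-graph mismatch summed over K = 1, ..., N-1, in [0, 1].

    0 means every KNN graph is preserved exactly; the normalizer is an
    illustrative choice (maximum possible mismatch), an assumption on our part.
    """
    n = len(X)
    rx, ry = _neighbor_ranks(X), _neighbor_ranks(Y)
    # (rx <= K) & (ry > K): member of i's K-neighborhood in X but not in Y
    penalty = sum(int(np.sum((rx <= K) & (ry > K))) for K in range(1, n))
    max_penalty = n * sum(min(K, n - 1 - K) for K in range(1, n))
    return penalty / max_penalty
```

A torn manifold places true neighbors at very large output ranks, so they are counted as missing in many of the nested graphs and the penalty grows accordingly.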

Ultimately, due to the limitations of metric spaces, Spectral Overlap’s objective is to measure the preservation of nearest neighbor relationships, and it cannot properly address a lack of transitivity in cases such as word embeddings [22]. Thus, we alternatively recommend representing high dimensional data in multiple maps if the transitivity of data points is not guaranteed.

5 Experiments

To evaluate some of these more objective measures of local structure performance, we evaluate four methods on six data sets using three publicly available CRAN packages and a package we developed for local multidimensional scaling. We apply t-SNE from the R package Rtsne [9], UMAP from umap [17], Sammon mapping from stats [19], and local MDS from our package lmds [5]. Next, while the synthetic Swiss roll dataset [21] and the popular MNIST dataset [11] are common datasets for comparing nonlinear dimensionality reduction visualizations, we instead propose a series of datasets where otherwise-confounding attributes such as the intrinsic dimension and local structure are known, and utilize a Bayes error metric that complements each specific data generating process to evaluate the performance of the more general quality metrics that have been proposed.

5.1 Datasets

The six datasets are: (1) Two Lines, (2) Trefoil Knot, (3) Three Gaussians, (4) Noisy Circles, (5) Curved X’s, and (6) High Dimensional Clusters. Datasets (1) and (2) are directly from [25] and (3) is inspired by the three cluster dataset except we increase the variance for one cluster. Dataset (4) comes from [20] and (5) is inspired by the parabola example in [8]. Finally, we add (6) which is a set of separable clusters on the corners of a 4-dimensional hyper-cube.

Figure 1: LMDS High Dimensional Clusters Result. Titles correspond to quality metrics optimized and the resulting performance.

5.2 Setup

Since we have the data generating processes for each method, we begin by generating 250 data points for each 2D dataset and 800 data points for the high dimensional cluster data set from the same random seed. Next, for each nonlinear dimensionality reduction algorithm, we pre-specify a range of values for each algorithm’s set of hyperparameters and create a grid of all combinations for a grid search for each quality metric. The list of hyperparameters for each method is provided in the lmds GitHub repository.

For each algorithm, we then set 5 different randomized seeds and evaluate each algorithm with a given group of hyperparameters from its corresponding grid using each of the different quality metrics. We then select the best scoring hyperparameter configurations for each algorithm and quality metric and re-run those algorithms with the same seed. These resulting visualizations are then representative of good performance for each nonlinear dimensionality reduction algorithm’s parameters with respect to each quality metric.
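The selection procedure described above can be sketched as follows. Here `embed` and `quality` stand in for any dimensionality reduction algorithm and quality metric, and all names and grid values are illustrative rather than taken from our actual implementation:

```python
import itertools

def grid_search(embed, quality, X, grid, seeds=range(5)):
    """Exhaustively search hyperparameter combinations, averaging a quality
    metric over several random seeds (lower score = better, as with the
    penalty-style metrics in this paper)."""
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = sum(quality(X, embed(X, seed=s, **params))
                    for s in seeds) / len(seeds)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The winning configuration is then re-run with a fixed seed to produce the representative visualization for that algorithm-metric pair.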

Finally, we evaluate each algorithm-quality metric pair for each data set using Bayes error metrics that are appropriate for the data generating processes. For the 2-dimensional data sets, since we are projecting back to the same dimension, we are able to use the Procrustes Distance between the input data X and the output visualization Y. The Procrustes Distance [8] compares two data sets of equal dimension and provides the L2 distance after scaling, rotating, and shifting one of the data sets. This provides a means for measuring how well an algorithm can exactly replicate the original manifold if we are mapping to the intrinsic dimension of the data and know its topology exactly. Next, for the high dimensional clusters data set, we use the accuracy of the first nearest neighbor in Y in classifying the class we assigned in the high dimensional space. Once again, this is only relevant given that we have specifically devised a case where the groups are completely separable with no overlap. In both cases, the Bayes error metrics are hypothetical and complement the specific data generating process. However, by identifying these base cases, we can compare more general quality metrics such as those in Appendix C to get a better sense of the robustness of said metrics prior to using them in more complex cases.

5.3 Results & Discussion

Results from our simulations are shown in Table 1. Since scale varies across datasets, we rank the performance of each metric on each algorithm-dataset pair from 1 to 8 for the 8 metrics, including the “Bayes error” metric. Average performance is broken down by dimension first, and then the cumulative performance for 2-dimensional and higher dimensional cases is averaged to generate a final score.

Based on our simulations, we see that the metric of [16] and Spectral Overlap tend to do the best in reducing the defined Bayes error metrics for each dataset and algorithm. This is then followed by Local Error and then Entropy. We see that the maximum-LCMC metric of [13], which is based on a heuristic for tunable quality metrics, and Mutual Information perform the worst.

For the 2-dimensional data sets, we see that on average, across these metrics, Sammon mapping does the best, followed by local MDS, then t-SNE, and finally UMAP. Local MDS, UMAP, and t-SNE had the most difficulty with the Curved X’s dataset, with higher Procrustes distances on average than on the other datasets. The next-worst dataset was the 3 Gaussians dataset. For a more visual exploration of the results, we provide visualizations of the original datasets as well as the best performers for each algorithm-quality metric pair in the supplementary material.

However, on the multi-dimensional cluster data set, we see that Sammon mapping performs the worst and that more nonlinear methods tend to outperform across almost all quality metrics. Across most metrics, t-SNE appears to have the most robust performance, with UMAP following closely. We see that the metric of [16] performs the best, followed by Local Error and Spectral Overlap. Perhaps one of the most interesting aspects of these results can be seen in the visualization in Figure 1. While the maximum-LCMC metric has good numerical performance based on our choice of Bayes error, we can see that it is prone to tearing of the manifold. The metric of [16] and Spectral Overlap sought to maintain the relationship such that clusters were relatively equidistant from each other. While these did not perform as well numerically, we can see that they did a better job of preserving what one could conceptualize as the original manifold.

Metric 2D Avg. Rank High Dim Avg. Rank Overall Avg.
Max LCMC [13] 5.88 4.50 5.19
Metric of [16] 4.28 3.13 3.70
Entropy 4.38 5.13 4.75
Mutual Info 6.63 6.75 6.69
Local Error 4.60 4.25 4.43
Spectral Error 4.23 4.38 4.30
Spearman 3.83 6.00 4.91
Procrustes/KNN 2.20 1.88 2.04
Table 1: Average Ranked Performance

Based on the outcomes, we see that the maximum-LCMC metric is predisposed to certain edge cases that can mislead a user into thinking he or she has found a high quality visualization. As stated in Section 3, the problem appears in the case where an algorithm puts too much weight on maintaining a small number of nearest neighbors but subsequently compromises the remaining neighbor rankings. We see that, on average, the metric of [16] and Spectral Overlap perform well in both the low and high dimensional data sets, which leads us to believe for now that these are good proxies for the quality of the output Y.

Ultimately, this demonstration shows how tuning nonlinear dimensionality reduction methods using various meta-criteria performs in the most ideal case, when we know (1) the intrinsic dimension, (2) the true local structure or any topological structure, and (3) the true global structure.

6 Conclusion

We present a simple exploration of some of the existing quality metrics that researchers can use without a prior, together with some of the leading nonlinear dimensionality reduction algorithms. By creating data generating processes that allow performance to be compared to a reasonable “Bayes error” metric, we are able to provide the beginnings of a more rigorous study of dimensionality reduction quality metrics. Based on our observations, we find that metrics based on maximizing the overlap in nearest neighbors, such as the metric of [16] and Spectral Overlap, tend to have robust performance in both the 2-dimensional and higher dimensional cases. As shown in the case of LMDS and the Clusters data set, using these methods can provide potentially informative visualizations without tearing the manifold. Finally, we provide an R wrapper for local MDS which is available for download on GitHub.

For future work, we look to explore a few avenues. First, by defining a unifying metric to go by, this comparison acts as a first step toward a fuller study of the existing nonlinear dimensionality reduction algorithms. We also intend to further investigate the realm of dimensionality reduction quality metrics in order to define more intuitive or analytically promising measures of local structure and then global structure. Finally, we aim to develop a method that more directly uses the [16] and Spectral Overlap quality metrics and will potentially become a competitor to leading methods.

Appendix A Results

Algorithm Data [13] [16] Entropy Mutual Info Local Error Spectral Error Spearman 1-NN
LMDS Clusters 0.955 0.880 0.688 0.241 0.698 0.911 0.506 0.958
tSNE Clusters 0.960 0.968 0.965 0.960 0.965 0.965 0.965 0.975
UMAP Clusters 0.964 0.966 0.963 0.959 0.965 0.963 0.783 0.975
Sammon Clusters 0.451 0.451 0.451 0.451 0.451 0.451 0.451 0.451
Table 2: High Dimensional Dataset
Method Avg. Procrust. Dist.
Local MDS 98.26
Sammon Mapping 9.51
t-SNE 103.51
UMAP 78.28
Table 3: Procrustes Distances
Algorithm Data [13] [16] Entropy Mutual Info Local Error Spectral Error Spearman Procrustes
LMDS 2 Lines 22.91 8.73 8.73 34.86 8.73 8.73 8.73 5.42
tSNE 2 Lines 27.07 19.79 19.93 10.00 11.57 19.91 19.91 8.28
UMAP 2 Lines 13.97 17.06 13.66 27.54 11.24 17.06 13.66 11.14
Sammon 2 Lines 4.61 4.61 4.61 4.61 4.61 4.61 4.61 4.61
LMDS Circles 2.05 2.05 2.05 8.93 2.05 2.05 2.05 0.34
tSNE Circles 0.96 0.51 0.95 8.81 0.94 0.45 0.45 0.45
UMAP Circles 8.91 1.46 1.46 8.58 4.33 1.46 1.46 1.46
Sammon Circles 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23
LMDS Trefoil 5.22 5.22 5.22 24.19 5.22 5.22 5.22 1.05
tSNE Trefoil 1.92 1.66 1.66 24.29 2.61 1.55 1.55 1.51
UMAP Trefoil 9.68 3.74 4.37 26.21 5.57 3.74 3.74 3.74
Sammon Trefoil 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53
LMDS Xs 442.34 53.39 53.39 637.30 72.36 63.48 72.36 23.09
tSNE Xs 900.26 141.65 101.41 146.97 279.97 78.61 78.61 78.61
UMAP Xs 342.33 211.24 140.10 542.88 91.91 211.24 211.24 60.54
Sammon Xs 1.51 1.51 1.51 1.51 1.51 1.51 1.51 1.51
LMDS 3 Gaussians 79.12 14.55 38.44 56.75 38.59 38.59 14.55 14.55
tSNE 3 Gaussians 48.56 48.69 48.26 75.47 48.26 48.45 48.45 46.76
UMAP 3 Gaussians 47.11 48.52 60.76 78.24 48.83 56.97 47.75 46.57
Sammon 3 Gaussians 43.95 43.95 43.95 43.95 43.95 43.95 43.95 43.95
Table 4: Procrustes Distances

Appendix B Data Generating Process

Below outlines the mathematical formulation for each of the data generating processes:

2 Lines

The 2 Lines dataset from [25] is generated as two parallel line segments. This can be done as such:

3 Gaussians

The 3 Gaussians dataset inspired by [25] is generated as 3 separable clusters with 2 closer to each other and 1 with large variance. We formulate the data as such:

Trefoil Knot

The trefoil knot also comes from [25] and we add a small amount of noise. The data generating process is:

where our final input matrix is X.

Curved X’s

The Curved X’s data set is inspired by [8] who show how using linear methods does not often guarantee that the nearest neighbor relationships are kept and that nonlinear methods can do a better job by focusing on local structure. This data set is a slight change and consists of two curved X’s which are specified as such:

where our final input matrix is X.

Noisy Circles

The Noisy Circles data set comes from [20] and is one circle drawn inside a larger circle. We also add noise and the data generating process is therefore as such:

High Dimensional Clusters

For the high dimensional clusters data set, we place clusters on the corners of a 4-dimensional hyper-cube as such:

Appendix C Quality Metrics

Below we outline the formulation for each of the quality metrics:

Max-LCMC [13] and [16] Quality Metrics

The first metric was proposed in [13] and chooses the maximum LCMC score from [5] over all available K. The metric begins with the quantity Q_NX(K) used in [16], which is calculated as:

Q_{NX}(K) = \frac{1}{KN} \sum_{i=1}^{N} |N_K^X(i) \cap N_K^Y(i)|    (3)

which counts the number of points that remained in the same local neighborhood defined by K, where N_K^X(i) and N_K^Y(i) denote the K nearest neighbors of point i in the input and output spaces. This can also be conceptualized as penalizing the number of mismatched ranks when comparing ranked Euclidean distances in the high and low dimensional spaces. One can tweak the number of nearest neighbors K or, as mentioned in [16], aggregate over all K, removing the tuning element. Next, the LCMC is calculated as:

LCMC(K) = Q_{NX}(K) - \frac{K}{N-1}    (4)

And finally, we achieve our choice of K via:

K^* = \arg\max_K LCMC(K)    (5)
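Under the standard definitions of Q_NX and LCMC from the co-ranking literature, these quantities can be computed with a short NumPy sketch (our own implementation):

```python
import numpy as np

def _neighbor_ranks(Z):
    """ranks[i, j] = rank of j among i's neighbors (1 = nearest); self gets rank N."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(len(Z))[:, None], order] = np.arange(1, len(Z) + 1)
    return ranks

def lcmc_curve(X, Y):
    """Return Q_NX(K) and LCMC(K) for K = 1, ..., N-2, plus K* = argmax LCMC."""
    n = len(X)
    rx, ry = _neighbor_ranks(X), _neighbor_ranks(Y)
    ks = np.arange(1, n - 1)
    # Q_NX(K): average fraction of K-neighborhoods shared by both spaces
    qnx = np.array([np.sum((rx <= K) & (ry <= K)) / (K * n) for K in ks])
    lcmc = qnx - ks / (n - 1)           # subtract the chance-overlap baseline
    k_star = int(ks[np.argmax(lcmc)])
    return qnx, lcmc, k_star
```

For a perfect embedding, Q_NX(K) = 1 for every K and the LCMC curve decays linearly from its maximum at K = 1.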

Entropy and Mutual Information

Both methods come from [1] and first require the user to calculate the co-ranking matrix. This starts from the neighborhood ranks

\rho_{ij} = \text{rank of } x_j \text{ among the neighbors of } x_i \text{ in the input space}    (6)

for all N data points and

r_{ij} = \text{rank of } y_j \text{ among the neighbors of } y_i \text{ in the output space}    (7)

which define the co-ranking matrix Q with entries Q_{kl} = |\{(i, j) : \rho_{ij} = k, r_{ij} = l\}|. Next, the joint probability distribution is specified as follows:

p_{kl} = \frac{Q_{kl}}{N(N-1)}    (8)

The Entropy quality metric is:

H = -\sum_{k,l} p_{kl} \log p_{kl}    (9)

The Mutual Information metric is:

MI = \sum_{k,l} p_{kl} \log \frac{p_{kl}}{p_{k\cdot} \, p_{\cdot l}}    (10)

where p_{k\cdot} and p_{\cdot l} are the marginal distributions of the high and low dimensional ranks.
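A self-contained NumPy sketch of both quantities, treating the normalized co-ranking matrix as a joint distribution over (high-dimensional rank, low-dimensional rank); the formulas follow our reconstruction of [1]:

```python
import numpy as np

def _neighbor_ranks(Z):
    """ranks[i, j] = rank of j among i's neighbors (1 = nearest); self gets rank N."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(len(Z))[:, None], order] = np.arange(1, len(Z) + 1)
    return ranks

def entropy_and_mi(X, Y):
    """Entropy and mutual information of the normalized co-ranking matrix."""
    n = len(X)
    rx, ry = _neighbor_ranks(X), _neighbor_ranks(Y)
    Q = np.zeros((n - 1, n - 1))
    off_diag = ~np.eye(n, dtype=bool)
    np.add.at(Q, (rx[off_diag] - 1, ry[off_diag] - 1), 1)
    p = Q / Q.sum()                        # joint probability p_kl
    nz = p > 0
    h = -np.sum(p[nz] * np.log(p[nz]))     # entropy
    pk = p.sum(axis=1, keepdims=True)      # marginal over high-dim ranks
    pl = p.sum(axis=0, keepdims=True)      # marginal over low-dim ranks
    mi = np.sum(p[nz] * np.log((p / (pk * pl))[nz]))
    return h, mi
```

For a perfect embedding with no rank ties, all probability mass sits on the diagonal and both quantities equal log(N − 1).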

Local Error

Similar to Kruskal Stress, we devise a meta-criterion that focuses more on getting the immediate neighbors correct by linearly weighting lower-ranked (nearer) neighbors higher than higher-ranked neighbors. The result is a sum over the cumulative sums of the squared distance errors. More succinctly, we can think of this as linearly weighting the squared distance errors:

LE = \sum_{i=1}^{N} \sum_{K=1}^{N-1} \sum_{j \in N_K^X(i)} (d_{ij} - \delta_{ij})^2    (11)

where d_{ij} and \delta_{ij} are the Euclidean distances between points i and j in the high and low dimensional spaces and N_K^X(i) is the neighborhood of point i defined by its K nearest neighbors in the input space.
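Because the k-th nearest neighbor of a point appears in N − k of the nested neighborhoods, the sum over cumulative neighborhoods collapses to a single linearly weighted sum, which the sketch below (our reconstruction) exploits:

```python
import numpy as np

def local_error(X, Y):
    """Sum over K of squared distance errors within each K-neighborhood.

    Equivalently, the error for i's k-th nearest neighbor is weighted by
    (N - k), the number of nested neighborhoods that contain it, giving
    the linear weighting described in the text.
    """
    n = len(X)
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    D = dx.copy()
    np.fill_diagonal(D, np.inf)                  # self is never a neighbor
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    w = n - ranks                                # self has rank n -> weight 0
    return float(np.sum(w * (dx - dy) ** 2))
```

The weighting means an error on a point's first neighbor costs nearly N times as much as the same error on its most distant neighbor.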

Spectral Overlap

Let G_K^X be the KNN graph in the input space with parameter K and G_K^Y be the KNN graph in the output space with parameter K, with N_K^X(i) and N_K^Y(i) the corresponding K-nearest-neighbor sets of point i. The metric is calculated as:

SO(X, Y) = \frac{1}{Z_N} \sum_{K=1}^{N-1} \sum_{i=1}^{N} |N_K^X(i) \setminus N_K^Y(i)|    (12)

The intuition here is that we want to have overlap in every KNN graph for K = 1, …, N − 1. Because a point’s nearest neighbors appear in every one of the nested graphs, this places more weight on the more immediate neighbors and results in a light penalty if pairwise neighbor relations are off by only one or two ranks. However, if there is a drastic tear in the manifold, this will more heavily penalize points that should be a nearest neighbor of a given point but are placed very far from it.

Spearman

The Spearman rank correlation coefficient is calculated jointly on all pairwise distances in the input space and the corresponding pairwise distances in the output space, yielding a single coefficient.
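A minimal sketch of this computation follows; it uses plain argsort ranks (no midranking of ties), which is a simplification on our part:

```python
import numpy as np

def spearman_metric(X, Y):
    """Spearman rank correlation between all pairwise distances in X and Y."""
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)            # count each pair once
    ra = np.argsort(np.argsort(dx[iu])).astype(float)
    rb = np.argsort(np.argsort(dy[iu])).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(np.sum(ra * rb) / np.sqrt(np.sum(ra**2) * np.sum(rb**2)))
```

Any embedding that monotonically transforms all pairwise distances achieves a coefficient of 1, which is why this metric captures global distance orderings rather than local neighborhoods.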

Procrustes Distance

From [8], the objective is the L2 norm of the rotated, shifted, and scaled image Y with respect to the reference X:

D_P(X, Y) = \min_{\beta, \Gamma, \gamma} \| X - (\beta Y \Gamma + 1_N \gamma^T) \|_F    (13)

where \Gamma is an orthogonal rotation matrix, \beta a scaling, and \gamma a translation vector.

1-NN

From [23], one metric that can be used when the true groups are known is the accuracy of the first nearest neighbor classifier. We can calculate this quickly via the following process. Let R be the matrix of column-wise ranks of the pairwise distances of Y, excluding the diagonal elements i = j, which are given a default distance of \infty. Let each class c of the C total classes have N_c elements in X.

M = 1\{R = 1\}    (14)
B = \text{blkdiag}(1_{N_1 \times N_1}, \ldots, 1_{N_C \times N_C})    (15)

where blkdiag corresponds to a block diagonal matrix of all-ones diagonal blocks of size N_c \times N_c such that \sum_{c=1}^{C} N_c = N, the total number of points in the input matrix X (assuming the points are ordered by class).

We then calculate the accuracy as:

\text{acc} = \frac{1}{N} \sum_{i,j} M_{ij} B_{ij}    (16)
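Equivalently, without building the block diagonal matrix, one can check each point's nearest neighbor label directly (our own sketch, which does not require the points to be ordered by class):

```python
import numpy as np

def one_nn_accuracy(Y, labels):
    """Fraction of points whose nearest neighbor in the embedding Y shares
    their high dimensional class label (self excluded via infinite distance)."""
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)       # a point cannot be its own neighbor
    nearest = D.argmin(axis=1)        # index of each point's first NN
    return float(np.mean(labels[nearest] == labels))
```

Because our high dimensional clusters are completely separable by construction, a faithful embedding should score exactly 1.0 on this metric.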

References

  • [1] M. Babaee, M. Datcu, and G. Rigoll (2013) Assessment of dimensionality reduction based on communication channel model; application to immersive information visualization. In 2013 IEEE international conference on big data, pp. 1–6. Cited by: Appendix C, §3.
  • [2] B. L. Betechuoh, T. Marwala, and T. Tettey (2006) Autoencoder networks for hiv classification. Current Science, pp. 1467–1473. Cited by: §1.
  • [3] J. C. Bezdek and N. R. Pal (1995) An index of topological preservation for feature extraction. Pattern Recognition 28 (3), pp. 381–391. Cited by: §3.
  • [4] K. Bunte, M. Biehl, and B. Hammer (2012) A general framework for dimensionality-reducing data visualization mapping. Neural Computation 24 (3), pp. 771–804. Cited by: §2.
  • [5] L. Chen and A. Buja (2009) Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. Journal of the American Statistical Association 104 (485), pp. 209–219. Cited by: Appendix C, §1, §3, §5.
  • [6] S. Ekins, K. V. Balakin, N. Savchuk, and Y. Ivanenkov (2006) Insights for human ether-a-go-go-related gene potassium channel inhibition using recursive partitioning and kohonen and sammon mapping techniques. Journal of medicinal chemistry 49 (17), pp. 5059–5071. Cited by: §1.
  • [7] A. Gracia, S. González, V. Robles, and E. Menasalvas (2014) A methodology to compare dimensionality reduction algorithms in terms of loss of quality. Information Sciences 270, pp. 1–27. Cited by: §3.
  • [8] T. Hastie, R. Tibshirani, and J. Friedman (2009) The elements of statistical learning. New York: Springer. Cited by: Appendix B, Appendix C, §5.1, §5.2.
  • [9] L. van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9, pp. 2579–2605. Cited by: §1, §2, §5.
  • [10] L. O. Jimenez and D. A. Landgrebe (1998) Supervised classification in high-dimensional space: geometrical, statistical, and asymptotical properties of multivariate data. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 28 (1), pp. 39–54. Cited by: §1.
  • [11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §5.
  • [12] J. A. Lee and M. Verleysen (2007) Nonlinear dimensionality reduction. Springer Science & Business Media. Cited by: §3.
  • [13] J. A. Lee and M. Verleysen (2009) Quality assessment of dimensionality reduction: rank-based criteria. Neurocomputing 72 (7-9), pp. 1431–1443. Cited by: Appendix C, §3.
  • [14] J. A. Lee and M. Verleysen (2010) Scale-independent quality criteria for dimensionality reduction. Pattern Recognition Letters 31 (14), pp. 2248–2257. Cited by: §2, §3.
  • [15] J. A. Lee, M. Verleysen, et al. (2008) Rank-based quality assessment of nonlinear dimensionality reduction.. In ESANN, pp. 49–54. Cited by: §3.
  • [16] J. Liang, S. Chenouri, and C. G. Small (2017) A new method for performance analysis in nonlinear dimensionality reduction. arXiv preprint arXiv:1711.06252. Cited by: Appendix C, §1, §3.
  • [17] L. McInnes and J. Healy (2018) Umap: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. Cited by: §2, §5.
  • [18] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §1.
  • [19] J. W. Sammon (1969) A nonlinear mapping for data structure analysis. IEEE Transactions on computers 100 (5), pp. 401–409. Cited by: §1, §5.
  • [20] sklearn (2018) Comparing different clustering algorithms on toy datasets. External Links: Link Cited by: Appendix B, §5.1.
  • [21] J. B. Tenenbaum, V. De Silva, and J. C. Langford (2000) A global geometric framework for nonlinear dimensionality reduction. science 290 (5500), pp. 2319–2323. Cited by: §5.
  • [22] L. Van der Maaten and G. Hinton (2012) Visualizing non-metric similarities in multiple maps. Machine learning 87 (1), pp. 33–55. Cited by: §4.
  • [23] L. Van Der Maaten, E. Postma, and J. Van den Herik (2009) Dimensionality reduction: a comparative review. Cited by: Appendix C, §1, §2.
  • [24] J. Venna and S. Kaski (2006) Local multidimensional scaling. Neural Networks 19 (6-7), pp. 889–899. Cited by: §3.
  • [25] M. Wattenberg, F. Viégas, and I. Johnson (2016) How to use t-sne effectively. Distill. External Links: Link, Document Cited by: Appendix B, Appendix B, Appendix B, §2, §3, §5.1.
  • [26] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. arXiv preprint arXiv:1806.01973. Cited by: §1.