Manifold Alignment by Feature Correspondence

Jay S. Stanley III
Comp. Bio. & Bioinfo. Prog., Yale University, New Haven, CT, USA
jay.stanley@yale.edu

Scott Gigante
Comp. Bio. & Bioinfo. Prog., Yale University, New Haven, CT, USA
scott.gigante@yale.edu

Guy Wolf*
Dept. of Math. and Stat., Université de Montréal, Montreal, QC, Canada
guy.wolf@umontreal.ca

Smita Krishnaswamy*†
Depts. of Genetics & Comp. Sci., Yale University, New Haven, CT, USA
smita.krishnaswamy@yale.edu

* These authors contributed equally; † Corresponding author.
Abstract

We propose a novel framework for combining datasets via alignment of their intrinsic geometry. This alignment can be used to fuse data originating from disparate modalities, or to correct batch effects while preserving intrinsic data structure. Importantly, we do not assume any pointwise correspondence between datasets, but instead rely on correspondence between a (possibly unknown) subset of data features. We leverage this assumption to construct an isometric alignment between the data. This alignment is obtained by relating the expansion of data features in harmonics derived from diffusion operators defined over each dataset. These expansions encode each feature as a function of the data geometry. We use this to relate the diffusion coordinates of each dataset through our assumption of partial feature correspondence. Then, a unified diffusion geometry is constructed over the aligned data, which can also be used to correct the original data measurements. We demonstrate our method on several datasets, showing in particular its effectiveness in biological applications including data fusion and batch effect removal.

 

Preprint. Under review.

1 Introduction

Manifold alignment aims to map disparate datasets into a common representation, under the assumption that the datasets all originate from noisy sampling of a common manifold determined by the data generation process. Under this assumption, the intrinsic geometry of the data should be similar across datasets, and global differences between the geometry of different datasets are considered as noise or data collection artifacts. Therefore, a common representation, which aligns the datasets from their original measurements onto a shared data manifold, both eliminates such artifacts and recovers clean intrinsic relations across measured datasets. Furthermore, such alignment enables data fusion and transfer of knowledge between separate domains when datasets are taken from different data collection environments (e.g., different sensors, technologies, or subjects in biomedical data).

This problem is particularly relevant in biomedical data analysis nowadays, with the advent of many modalities of high throughput measurements generated in biomedical and cellular systems. For example, a population of cells can be measured using both single cell ATAC-sequencing (scATAC-seq) Buenrostro et al. (2015), which measures regions of open chromatin DNA in each cell, and single cell RNA-sequencing (scRNA-seq) Klein et al. (2015), which reveals gene expression profiles of cells. However, since both measurements are destructive, they are not made on the same cells but rather on cells sampled from the same population. To integrate such data types it is important to register neighboring (i.e., likely matching) cells from both measurements. We note that often we can expect such registration to be feasible, even for different technologies, since their measurements reflect related properties capturing the underlying state of equivalent cell populations. For example, in the scATAC-seq/scRNA-seq case, if a gene is expressed then it must be in an open chromatin region, so the resulting measurements are different but not independent of each other.

Furthermore, even when using the same measurement technology, biomedical data is often systematically different based on machine calibration, day-to-day temperature variation and underlying background biological differences between individuals or staining and treatment differences Haghverdi et al. (2018). For example, patients with kidney disease at two different hospitals may have creatinine readings in two different ranges, simply due to machine calibration differences Delanaye et al. (2014), which are not always known a priori. As a result, studying real experimental differences, comparing outcomes across patient cohorts, and generalizing results to different hospitals is challenging, if not impossible, without proper alignment to make varied collection environments comparable while alleviating such systematic batch effects.

Here, we propose an alignment approach that explicitly takes advantage of the typical correspondence between the underlying features quantified by measurement and data collection systems that can be aligned. Indeed, related systems often observe similar “entities” (e.g., cells, patients) and aim to capture related properties in them. Our approach uses graph signal processing tools (see Sec. 3) to relate measured data features (seen as graph or manifold signals) to coordinates over the intrinsic geometry of each dataset, which are revealed via diffusion maps Coifman and Lafon (2006). Then, as explained in Secs. 4 and 5, we leverage feature correspondence to capture pairwise relations between the intrinsic diffusion map coordinates of the separate data manifolds (i.e., of each dataset). Finally, we use these relations to compute an isometric transformation that aligns the data manifolds on top of each other without distorting their internal structure.

We demonstrate the results of our method on artificial manifolds created from corrupted MNIST digits, and on single-cell biological data for both batch effect removal and multimodal data fusion. In each case, our method successfully aligns the data manifolds to recover appropriate data neighborhoods both within and across the two datasets. Further, we show an application of our approach to transfer learning by applying a lazy classifier to one unlabeled dataset based on labels provided by another dataset (with batch effects between them), and compare lazy classification accuracy before and after alignment. Finally, comparisons with recently developed methods, such as the MNN-based method from Haghverdi et al. (2018) and the GAN-based method from Amodio and Krishnaswamy (2018), show significant improvements in alignment and neighborhood recovery achieved by our harmonic alignment method.

2 Related Work

Algorithms for semi-supervised and unsupervised manifold alignment exist in classical statistics (Gower, 1975; Thompson, 1984), deep learning (Zhu et al., 2017; Kim et al., 2017; Amodio and Krishnaswamy, 2018), and manifold learning (Haghverdi et al., 2018; Wang and Mahadevan, 2008, 2009). A classic method for aligning linear structures (such as ones captured by PCA) is canonical correlation analysis (CCA) (Thompson, 1984), which can be used to project two datasets on a common basis formed by directions that maximize feature correlation between them. More recently, a linear manifold alignment method was presented in Wang and Mahadevan (2009) based on embedding a joint graph built over both datasets to preserve local structure in both manifolds. This method provides a mapping from both original feature spaces to a new feature space defined by the joint graph, which is shared by both datasets with no assumption of feature correspondence. Finally, in biomedical data analysis, mutual nearest neighbors (MNN) batch correction (Haghverdi et al., 2018) focuses on families of manifold deformations that are often encountered in biomedical data. There, locally linear manifold alignment is provided by calculating a correction vector for each point in the data, as defined by the distances from the point to all points for which it is a mutual $k$-nearest neighbor. This correction vector is then smoothed by taking a weighted average over a Gaussian kernel.

Beyond manifold learning settings, deep learning methods have been proposed to provide alignment and transfer learning between datasets. For example, cycle generative adversarial networks (CycleGANs) (Zhu et al., 2017; Kim et al., 2017) are a class of deep neural networks in which a generative adversarial network (GAN) is used to learn a nonlinear mapping from one domain to another, and a second GAN is used to map back to the original domain. These networks are then optimized to (approximately) satisfy cycle consistency constraints such that the result of applying the full cycle to a data point reproduces the original point. Manifold Aligning GAN (MAGAN) (Amodio and Krishnaswamy, 2018) is a particular cycle GAN that adds a supervised partial feature correspondence to enforce alignment of two data manifolds over the mapping provided by the trained network.

In contrast, this work provides a nonlinear method for aligning two datasets using their diffusion maps under the assumption of a partial feature correspondence. However, unlike MAGAN, we do not need to know which features correspond. In doing so, we obtain more information from datasets with partial feature correspondence than methods that assume no correspondence, but without the burden of determining in advance which or how many features correspond. To evaluate our method, in Section 6 we compare it to MAGAN, as a leading representative of deep learning approaches, and MNN, as a leading representative of manifold learning approaches. We note that to the best of our knowledge, no standard implementation of the method in Wang and Mahadevan (2009) is available, and our attempts at implementing the algorithm have significantly underperformed other methods. For completeness, a partial comparison to this method is presented in Appendix F.

3 Preliminaries

Manifold Learning

High dimensional data can often be conceptually modeled as originating from an intrinsically low dimensional manifold that is mapped via nonlinear functions to observable high dimensional measurements; this is commonly referred to as the manifold assumption. Formally, given a dataset $X$ of high dimensional observations, manifold learning methods assume its data points originate from a sampling of an underlying low dimensional manifold $\mathcal{M}$, mapped via a nonlinear function to the high dimensional feature space. Then, these methods aim to learn a low dimensional intrinsic representation that approximates the manifold geometry of $X$ (see, for example, Van Der Maaten et al. (2009); Izenman (2012); Lin et al. (2015); Moon et al. (2017a) and references within).

Diffusion Maps

To learn a manifold geometry from collected data, we use the popular diffusion maps construction Coifman and Lafon (2006). This construction starts by considering local similarities, which we quantify via an anisotropic kernel

$$\mathcal{K}(x, y) = \frac{\mathcal{G}(x, y)}{\|\mathcal{G}(x, \cdot)\|_1^{1/2}\,\|\mathcal{G}(y, \cdot)\|_1^{1/2}}, \qquad (1)$$

where $\mathcal{G}(x, y) = \exp(-\|x - y\|^2 / \epsilon)$ is the Gaussian kernel with neighborhood radius $\epsilon$. As shown in Coifman and Lafon (2006), this kernel provides a neighborhood construction that is robust to sampling density variations and enables separation of data geometry from its distribution. Next, the kernel is normalized to define transition probabilities that define a Markovian diffusion process over the data. Finally, a diffusion map is defined by organizing these probabilities in a row stochastic matrix (typically referred to as the diffusion operator) $\mathbf{P} = \mathbf{D}^{-1} \mathbf{K}$, where $\mathbf{K}_{ij} = \mathcal{K}(x_i, x_j)$ and $\mathbf{D}$ is the diagonal degree matrix with $\mathbf{D}_{ii} = \|\mathcal{K}(x_i, \cdot)\|_1$, and using its eigenvalues $1 = \lambda_1 \geq \lambda_2 \geq \cdots$ and (corresponding) eigenvectors $\phi_1, \phi_2, \ldots$ to map each $x_i$ to the diffusion coordinates $\Phi^t(x_i) = [\lambda_j^t \phi_j(x_i)]_j$. The parameter $t$ in this construction represents a diffusion time, i.e., the number of transitions considered in the diffusion process. To simplify notations, we also use $\Phi^t$ to denote the diffusion map of the entire dataset $X$. We note that in general, as $t$ increases, most of the eigenvalue weights $\lambda_j^t$ become numerically negligible, and thus truncated diffusion map coordinates (i.e., using only non-negligible ones) can be used for dimensionality reduction purposes, as discussed in Coifman and Lafon (2006).
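To make this construction concrete, the following is a minimal numpy sketch of a diffusion map under the anisotropic kernel of Eq. 1, assuming a fixed bandwidth $\epsilon$; the function name and defaults are ours, not part of the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(X, eps, t=1, n_coords=10):
    """Sketch of a diffusion map with the anisotropic kernel of Eq. 1.

    X: (N, d) data matrix; eps: Gaussian bandwidth (assumed fixed here);
    t: diffusion time. Returns eigenvalues and diffusion coordinates."""
    G = np.exp(-cdist(X, X, "sqeuclidean") / eps)   # Gaussian kernel
    q = G.sum(axis=1)
    K = G / np.outer(np.sqrt(q), np.sqrt(q))        # anisotropic normalization (Eq. 1)
    d = K.sum(axis=1)
    # The symmetric conjugate A = D^{-1/2} K D^{-1/2} of P = D^{-1} K shares
    # its spectrum, so a symmetric eigensolver can be used.
    A = K / np.outer(np.sqrt(d), np.sqrt(d))
    lam, V = np.linalg.eigh(A)
    order = np.argsort(-lam)                        # sort by decreasing eigenvalue
    lam, V = lam[order], V[:, order]
    phi = V / np.sqrt(d)[:, None]                   # right eigenvectors of P
    # Drop the trivial first coordinate; weight remaining ones by lambda^t.
    return lam[1:n_coords + 1], (lam[1:n_coords + 1] ** t) * phi[:, 1:n_coords + 1]
```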

Graph Fourier Transform

A classic result in spectral graph theory (see, e.g., Brooks et al. (1994)) shows that the discrete Fourier basis (i.e., pure harmonics, such as sines and cosines, organized by their frequencies) can be derived as Laplacian eigenvectors of ring graphs. More recently, this result was used in graph signal processing Shuman et al. (2013) to define a graph Fourier transform (GFT) by treating eigenvectors of the graph Laplacian as generalized Fourier harmonics (i.e., intrinsic sines and cosines over a graph). Further, as discussed in Coifman and Lafon (2006); Nadler et al. (2006), diffusion coordinates are closely related to these Laplacian eigenvectors, and can essentially serve as geometric harmonics over data manifolds. Indeed, the kernel $\mathcal{K}$ can be considered as defining edge weights on a graph whose vertices are the data points in $X$. It can be verified that for this graph, the normalized graph Laplacian is given by $\mathcal{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{K} \mathbf{D}^{-1/2}$, where $\mathbf{D}$ is a diagonal matrix with $\mathbf{D}_{ii} = \|\mathcal{K}(x_i, \cdot)\|_1$. Therefore, the eigenvectors of $\mathcal{L}$ can be written as $\psi_j = \mathbf{D}^{1/2} \phi_j$ with corresponding eigenvalues $\omega_j = 1 - \lambda_j$. The resulting GFT of a signal (or function) $f$ over $X$ can thus be written as $\hat{f}[j] = \langle f, \psi_j \rangle$. We note that here we treat either $\omega_j$ or $\lambda_j$ as providing a “frequency” organization of their corresponding eigenvectors $\psi_j$ or $\phi_j$ (treated as intrinsic harmonics). In the latter case, eigenvectors with higher eigenvalues correspond to lower frequencies on the data manifold, and vice versa. This frequency-based organization, and more generally the duality between diffusion coordinates and harmonics, will be leveraged here to provide an isometric alignment between the intrinsic coordinates of data manifolds with (partially) corresponding features.
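The GFT described above amounts to an eigendecomposition of the normalized Laplacian followed by a change of basis. Below is a minimal sketch, assuming a precomputed symmetric kernel matrix; the function name is ours.

```python
import numpy as np

def graph_fourier(K, signals):
    """Sketch of the GFT above: expand signals in the eigenbasis of the
    normalized graph Laplacian L = I - D^{-1/2} K D^{-1/2}.

    K: (N, N) symmetric kernel matrix; signals: (N, d) feature matrix.
    Returns Laplacian eigenvalues (frequencies) and GFT coefficients."""
    d = K.sum(axis=1)
    L = np.eye(len(K)) - K / np.outer(np.sqrt(d), np.sqrt(d))
    omega, psi = np.linalg.eigh(L)   # ascending omega = increasing frequency
    f_hat = psi.T @ signals          # row i holds <f_j, psi_i> for every feature j
    return omega, f_hat
```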

4 Harmonic alignment

Algorithm 1 Harmonic Alignment
Input: Dataset $X$ with $d$ features and two samples (i.e., sub-datasets) $X^{(1)}, X^{(2)}$, where sample $k$ has $N_k$ observations
Output: Aligned diffusion map $\hat{\Phi}$ over the combined data
1:  for $k = 1, 2$ do
2:     Compute the anisotropic kernel $\mathbf{K}^{(k)}$ (Eq. 1) over the sample $X^{(k)}$.
3:     Compute the diffusion operator $\mathbf{P}^{(k)}$, its eigendecomposition, and the corresponding diffusion map $\Phi^{(k)}$ (see Sec. 3)
4:  end for
5:  Compute the bandlimited harmonic correlation matrix $M$ (see Sec. 5)
6:  Orthogonalize $M$ via SVD to get the isometric transformation $T$
7:  Compute the unified diffusion map $\hat{\Phi}$ (Eq. 2)

Let $X = \{X^{(1)}, X^{(2)}, \ldots\}$ be a collection of samples with features that also serve as the observed ambient-space coordinates of the data. We assume that at least a subset of these features aim to measure the same quantities in the data, but are also affected by sample-dependent artifacts. Further, while in general some features may be unique to specific samples (e.g., when collected by different technologies), for simplicity we focus here on overlapping features that conceptually correspond with each other across samples. As a result, we assume the number of features is independent of the specific sample and can be denoted by $d$. The entire data is thus given by $X = \bigcup_k X^{(k)}$, where each $X^{(k)}$ is an $N_k \times d$ matrix.

For simplicity, we describe here our proposed approach for aligning two samples $X^{(1)}, X^{(2)}$ and processing data across them, but this can naturally be generalized to any number of samples. In order to find an isometric transformation between the diffusion geometries of the two samples, we compute a harmonic correlation matrix $M$ between the harmonics that serve (up to appropriate weighting) as coordinates of the diffusion maps $\Phi^{(1)}$ and $\Phi^{(2)}$ of the two samples. Notice that since we do not have any correspondence between data points, the correlations in $M$ cannot be computed directly between the two sets of Laplacian eigenvectors when these are expressed in terms of data points. However, since we assume feature correspondence between samples, we can compute these correlations by expressing the eigenvectors in terms of the GFT of the data features. Namely, for each sample $k$ we construct an $N_k \times d$ matrix $\hat{F}^{(k)}$ whose $j$-th column is the GFT of the $j$-th feature. Then, $M$ can be computed with correlations between rows of $\hat{F}^{(1)}$ and $\hat{F}^{(2)}$.

We note that in fact, not all correlations should be computed, since each harmonic $\psi_i^{(k)}$ and its corresponding row of GFT coefficients in $\hat{F}^{(k)}$ represent intrinsic variations at a specific frequency, expressed by the corresponding Laplacian eigenvalue $\omega_i^{(k)}$. Therefore, only information between similar frequency components should be considered for correlation, while components with very different frequencies can be considered uncorrelated to begin with. As explained in Sec. 5, we incorporate this understanding in the construction of $M$ by only populating near-diagonal elements of this matrix with correlations that are computed using a sliding bandpass filter that, at each step, selects only a local band of coefficients to be correlated from $\hat{F}^{(1)}$ and $\hat{F}^{(2)}$.

Given the cross-sample harmonic correlation matrix $M$, we use its singular value decomposition (SVD) $M = U \Sigma V^{\mathsf{T}}$ to obtain its nearest orthogonal approximation $T = U V^{\mathsf{T}}$ (e.g., as shown in Schönemann (1966)), which defines an isometric transformation between the diffusion maps of the two samples. Finally, we compute a unified diffusion map, which can be written in (block) matrix form as

$$\hat{\Phi} = \begin{bmatrix} \Phi^{(1)} (\Lambda^{(1)})^t & \Phi^{(1)} (\Lambda^{(1)})^t T \\ \Phi^{(2)} (\Lambda^{(2)})^t T^{\mathsf{T}} & \Phi^{(2)} (\Lambda^{(2)})^t \end{bmatrix}, \qquad (2)$$

where $\Lambda^{(1)}, \Lambda^{(2)}$ are diagonal matrices with the diffusion eigenvalues as their main diagonal and $\Phi^{(k)}$ are the corresponding eigenvector matrices. A succinct summary of the described algorithm, which we call harmonic alignment, is presented in Algorithm 1. While our presentation here is given in terms of two samples for simplicity, it naturally extends to multiple samples by considering $k \times k$ blocks in (2), instead of $2 \times 2$ blocks, with an isometric transformation $T^{(i,j)}$ in each off-diagonal block $(i, j)$, $i \neq j$.
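For illustration, here is a hedged sketch of steps 6 and 7 of Algorithm 1, assuming Eq. 2 takes the block form written above; the name `align_embeddings` is ours.

```python
import numpy as np

def align_embeddings(phi1, lam1, phi2, lam2, M, t=1):
    """Sketch of Algorithm 1, steps 6-7: orthogonalize the harmonic
    correlation matrix M and assemble the unified diffusion map of Eq. 2.

    phi_k: (N_k, m) eigenvector matrices; lam_k: (m,) eigenvalues;
    M: (m, m) bandlimited correlation matrix between the harmonic bases."""
    U, _, Vt = np.linalg.svd(M)
    T = U @ Vt                      # nearest orthogonal matrix to M (Schoenemann, 1966)
    E1 = phi1 * (lam1 ** t)         # eigenvalue-weighted diffusion coordinates
    E2 = phi2 * (lam2 ** t)
    # Block form of Eq. 2: each sample keeps its own coordinates and gains
    # the other sample's coordinates mapped through the isometry T.
    top = np.hstack([E1, E1 @ T])
    bottom = np.hstack([E2 @ T.T, E2])
    return np.vstack([top, bottom])
```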

To complete the alignment process, we construct a new kernel over the combined diffusion coordinates in and build a robust unified diffusion geometry over the entire multi-sample data that is invariant to batch effects and also naturally denoises various sample-specific artifacts. This diffusion geometry can naturally be incorporated in diffusion-based methods for several data processing tasks, such as dimensionality reduction & visualization (Moon et al., 2017b), denoising & imputation (van Dijk et al., 2018), latent variable inference (Haghverdi et al., 2016; Lederman and Talmon, 2018), and data generation (Lindenbaum et al., 2018). In Sec. 6.3 we demonstrate the application of harmonic alignment to batch effect removal and multimodal data fusion, and in particular in single-cell data analysis.

5 Bandlimited correlation

As discussed in Sec. 3, diffusion coordinates are organized by frequency and thus, we can constrain the isometric transformation learned in harmonic alignment to maintain the general frequency structure between the two data manifolds. Indeed, if the two manifolds can be aligned, then intrinsic low-frequency trends in one dataset should map to low-frequency ones in the other, and similarly any frequency band should map to an equivalent one across the aligned datasets. Therefore, the frequency structure of diffusion maps already provides a coarse alignment, which we leverage here by only applying our alignment within local frequency bands of the diffusion operator spectrum instead of globally over all diffusion coordinates.

To take advantage of the described frequency structure, we propose to partition harmonic correlations using spectral graph wavelets Hammond et al. (2011). Since these wavelets are defined as functions of the Laplacian eigenvalues, they provide a natural extension of the Fourier-based alignment we have proposed. The definition of spectral graph wavelets in the Fourier basis allows one to select a wavelet basis that guarantees (1) uniform frequency response (called a tight frame) and (2) smooth partitioning. In particular, we use here the itersine (iterated sine) wavelets of Perraudin et al. (2014), which are formed by translation and dilation of the generating kernel

$$h(x) = \sin\left(\tfrac{\pi}{2} \cos^2(\pi x)\right) \mathbb{1}_{[-\frac{1}{2}, \frac{1}{2}]}(x). \qquad (3)$$

This kernel yields a tight frame with a parameterized overlap between bands, as shown in the appendix (see Figure 5 there). In practice, we choose an overlap that creates smooth frequency response transitions between each wavelet band.

These wavelets are applied in the Fourier domain via element-wise multiplication of the sampled itersine kernel $h_\xi$, which defines the frequency response of the wavelet at scale $\xi \in \{1, \ldots, K\}$, with the Fourier coefficients of an input signal. As there are $K$ wavelet scales, $K$ spectrally partitioned signals are produced by this transform. A correlation matrix $M_\xi$ is then generated for each scale by correlating the GFT of the wavelet-transformed features of the two samples, denoted $\hat{F}_\xi^{(1)}$ and $\hat{F}_\xi^{(2)}$. The resulting correlation is bandlimited and preserves the distribution of manifold harmonics. Finally, each correlation band smoothly transitions into its surrounding bands, and these partial correlations are combined to obtain sparse inter-band correlations. This is done by summing all bands together to obtain a full “fuzzy-diagonal” correlation matrix $M = \sum_{\xi=1}^{K} M_\xi$, which is composed of overlapping bandlimited correlations (see Figure 5 in the appendix). For further discussion of this construction, and the robustness it provides to signal noise that degrades nonlimited correlations, we refer the reader to Appendix D.
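The sketch below assembles a bandlimited correlation matrix from the GFT coefficients of two samples (e.g., as returned by the `graph_fourier` sketch above). The uniform covering of the spectrum by half-overlapping itersine translates is our reading of the GSPBox design, and plain inner products stand in for correlations (exact when features are standardized); the function names are ours.

```python
import numpy as np

def itersine_bank(omega, n_filters, lmax=2.0):
    """Sketch of an itersine wavelet bank (Eq. 3): half-overlapping translates
    of h(x) = sin(pi/2 cos^2(pi x)) on [-1/2, 1/2], which form a tight frame.
    omega: Laplacian eigenvalues in [0, lmax] (lmax = 2 for the normalized
    Laplacian). Returns a (K, N) array of filter responses."""
    def h(x):
        return np.sin(0.5 * np.pi * np.cos(np.pi * x) ** 2) * (np.abs(x) <= 0.5)
    xi = omega * (n_filters - 1) / (2.0 * lmax)   # filter k is centered at k * lmax / (K - 1)
    return np.stack([h(xi - 0.5 * k) for k in range(n_filters)])

def bandlimited_correlation(omega1, fhat1, omega2, fhat2, n_filters):
    """Sum of per-band correlations between GFT coefficient rows (Sec. 5)."""
    g1 = itersine_bank(omega1, n_filters)
    g2 = itersine_bank(omega2, n_filters)
    M = np.zeros((len(omega1), len(omega2)))
    for k in range(n_filters):
        F1 = g1[k][:, None] * fhat1   # wavelet-filtered coefficients, scale k
        F2 = g2[k][:, None] * fhat2
        M += F1 @ F2.T                # cross-correlate harmonic rows across samples
    return M
```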

6 Empirical results

6.1 Artificial feature corruption

Figure 1: Recovery of k-nearest neighborhoods under feature corruption. Mean over 3 iterations is reported for each method. (a) At each iteration, two sets $X^{(1)}$ and $X^{(2)}$ of points were sampled from MNIST. $X^{(2)}$ was then distorted by a corruption matrix for various identity percentages (see Section 6.1). Subsequently, a lazy classification scheme was used to classify points in $X^{(2)}$ using a 5-nearest neighbor vote from $X^{(1)}$. Results for harmonic alignment with different filterbank sizes (see Sec. 5), mutual nearest neighbors (MNN), and classification without alignment are shown. (b) Reconstruction of digits with only 25% uncorrupted features. Left: input digits. Left middle: 75% of the pixels in the input are corrupted. Right middle: reconstruction without harmonic alignment. Right: reconstruction after harmonic alignment. (c) Lazy classification accuracy relative to input size with unlabeled randomly corrupted digits with 35% preserved pixels. (d) Transfer learning performance. For each ratio, 1K uncorrupted, labeled digits were sampled from MNIST, and then 1K, 2K, 4K, and 8K (x-axis) unlabeled points were sampled and corrupted with 35% column identity.

To demonstrate the alignment provided by our method, we assess its ability to recover $k$-nearest neighborhoods after random feature corruption, and compare it to MNN (Haghverdi et al., 2018) and MAGAN (Amodio and Krishnaswamy, 2018), which are leading manifold- and deep-learning methods respectively, as discussed in Sec. 2. To this end, we drew two random samples $X^{(1)}$ and $X^{(2)}$ of MNIST digit images. Then, for each trial, we drew samples from a unit-variance Normal distribution to create a random matrix, which we orthogonalized to yield the corruption matrix $O$. To vary the amount of feature corruption, we produced partial corruption matrices $O_p$ (for several values of $p$) by randomly substituting $p\%$ of the columns in $O$ with columns of the identity matrix. Right multiplication of $X^{(2)}$ by these matrices yields corrupted images with only $p\%$ preserved pixels (see Figure 1, ‘Corrupted’). To assess the recovery of $k$-nearest neighborhoods, we performed lazy classification on digits (i.e., rows) in $X^{(2)} O_p$ by only using the labels of neighbors from $X^{(1)}$. The results of this experiment, performed over a range of $p$, are reported in Figure 1. For robustness, at each $p$ we sampled three different non-overlapping pairs $(X^{(1)}, X^{(2)})$, and for each pair we sampled three matrices $O_p$, each with random identity columns, for a total of nine trials per $p$. It should be noted that while we report results in terms of mean classification accuracy, we do not aim to provide an optimal classifier here. Our evaluation merely aims to provide a quantitative assessment of neighborhood quality before and after alignment. We regard a lazy learner as ideal for such evaluation since it directly exposes the quality of data neighborhoods, rather than obfuscating it via a trained model. Results for harmonic alignment in conjunction with an SVM classifier (in transfer learning settings) are discussed in Sec. 6.2 and demonstrated in Figure 1.
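As a sketch, the corruption procedure can be reproduced with a few lines of numpy; `partial_corruption` is our hypothetical helper, not code from the paper.

```python
import numpy as np

def partial_corruption(d, p, rng=None):
    """Sketch of the corruption in Sec. 6.1: a random orthogonal d x d matrix
    whose columns are partially replaced by identity columns, so that a
    fraction p of pixels is preserved."""
    rng = np.random.default_rng() if rng is None else rng
    O, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrix
    keep = rng.choice(d, size=int(p * d), replace=False)
    O[:, keep] = np.eye(d)[:, keep]                    # restore p% identity columns
    return O

# e.g., corrupted = X2 @ partial_corruption(784, 0.35)  # 35% preserved pixels
```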

In general, none of the methods recovers $k$-nearest neighborhoods under total corruption, showing 10% accuracy for very small $p$, which is essentially random chance given that MNIST has ten classes. Note that total corruption clearly violates our (partial) feature correspondence assumption. However, when using sufficiently many bandlimited filters, our harmonic alignment quickly recovers high accuracy and consistently outperforms both MNN and MAGAN, except under very high correspondence (i.e., at the largest values of $p$). The method proposed by Wang and Mahadevan (2009) was excluded since it did not show improvement over unaligned classification, but is discussed in Appendix F for completeness.

Next, we examined the ability of harmonic alignment to reconstruct the corrupted data (see Figure 1). We performed the same corruption procedure as before with $p = 25\%$ and selected ten examples of each MNIST digit. Ground truth digits from $X^{(2)}$ and their corrupted counterparts are shown in Figure 1. Then, reconstruction was performed by setting each pixel in a new image to the dominant class average of the nearest neighbors from $X^{(1)}$. In the unaligned case, we see that most examples turn into smeared fives or ones; this is likely a random intersection formed by $X^{(2)} O_p$ and $X^{(1)}$ (e.g., accounting for the baseline random chance classification accuracy). On the other hand, the reconstructions produced by harmonic alignment resemble their original input examples.

Finally, in Figure 1, we consider the effect of data size on the obtained alignment. To this end, we fix $p = 35\%$ and vary the size of the two aligned datasets. We compare harmonic alignment, MNN, and MAGAN on input sizes ranging from 200 to 1600 MNIST digits, while again using lazy classification accuracy to measure neighborhood preservation and quantify alignment quality. The results in Figure 1 show that both MNN and MAGAN are not significantly affected by dataset size, and in particular do not improve with additional data. Harmonic alignment, on the other hand, not only outperforms them significantly, but its alignment quality also increases monotonically with input size.

6.2 Transfer learning

An interesting use of manifold alignment algorithms is transfer learning. In this setting, an algorithm is trained to perform well on a small (e.g., pilot) dataset, and the goal is to extend the algorithm to a new larger dataset (e.g., as more data is being collected) after alignment. In Figure 1 we explore the utility of harmonic alignment in transfer learning and compare it to MNN Haghverdi et al. (2018) and MAGAN Amodio and Krishnaswamy (2018). In this experiment, we first randomly selected 1K uncorrupted, labeled examples of MNIST digits and constructed their diffusion map to use as our training set. Next, we took 35%-corrupted unlabeled points (see Section 6.1) in batches of 1K, 2K, 4K, and 8K as a test set for classification using the labels from the uncorrupted examples. As shown in Figure 1, with a 5-nearest neighbor lazy classifier, harmonic alignment consistently improves as the dataset gets larger, even with up to eight test samples for every one training sample. When the same experiment is performed with a linear SVM, harmonic alignment again consistently outperforms the other methods, with performance being independent of test set size (or train-to-test ratio). This is due to the increased robustness and generalization capabilities of the trained SVM. Further discussion of transfer learning is given in Appendix E. In addition to showing the use of manifold alignment in transfer learning, this example also demonstrates the robustness of our algorithm to imbalance between samples.
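A minimal sketch of this evaluation, assuming the unified diffusion map has been split into its labeled (training) and unlabeled (test) blocks; classifier defaults follow sklearn, as in Appendix E, and the function name is ours.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

def lazy_transfer(E_train, y_train, E_test, classifier="knn"):
    """Sketch of the transfer-learning evaluation: fit on the labeled block of
    the unified diffusion map and predict on the aligned, unlabeled block."""
    clf = KNeighborsClassifier(n_neighbors=5) if classifier == "knn" else LinearSVC()
    return clf.fit(E_train, y_train).predict(E_test)
```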

6.3 Biological data

6.3.1 Batch effect correction

Figure 2: (a)-(c) Batch effect removal. 4K cells were subsampled from two single-cell immune profiles obtained via mass cytometry on blood samples of two patients infected with Dengue fever. Top: both patients exhibit heightened IFNγ (x-axis), a pro-inflammatory cytokine associated with tumor necrosis factor alpha (TNFα, y-axis). Bottom: IFNγ histograms for each batch. (a) Data before denoising. (b) Denoising of unaligned data enhances a technical effect between samples in IFNγ. (c) Harmonic alignment corrects the IFNγ shift. (d) Multimodal data fusion. Percentage overlap of cell neighborhoods from joint gene expression and chromatin profiling of single cells. Harmonic alignment most accurately recovers the pointwise relationship between the two manifolds.

To illustrate the need for robust manifold alignment in computational biology, we turn to a simple real-world example obtained from Amodio et al. (2018) (see Figure 2). This dataset was collected by mass cytometry (CyTOF) of peripheral blood mononuclear cells (PBMC) from patients who contracted dengue fever. Subsequently, the Montgomery lab at Yale University experimentally introduced these PBMCs to Zika virus strains.

The canonical response to dengue infection is upregulation of interferon gamma (IFNγ), as discussed in Chesler and Reiss (2002); Chakravarti and Kumaria (2006); Braga et al. (2001). During early immune response, IFNγ works in tandem with acute phase cytokines such as tumor necrosis factor alpha (TNFα) to induce febrile response and inhibit viral replication Ohmori et al. (1997). We thus expect to see upregulation of these two cytokines together, which we explore in Figure 2.

In Figure 2, we show the relationship between IFNγ and TNFα without denoising. Note that there is a substantial difference between the IFNγ distributions of sample 1 and sample 2 (Earth Mover’s Distance (EMD) = 2.699). In order to identify meaningful relationships in CyTOF data, it is common to denoise it first Moon et al. (2017a). We used a graph low-pass filter proposed in van Dijk et al. (2018) to denoise the cytokine data; the results are shown in Figure 2. This procedure introduced further technical artifacts by enhancing the difference between batches, as seen by the increased EMD (3.127) between the IFNγ distributions of both patients. This is likely due to a substantial connectivity difference between the two batch subgraphs in the total graph of the combined dataset.
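For intuition, a graph low-pass filter in the spirit of van Dijk et al. (2018) can be sketched as repeated application of the diffusion operator; the diffusion time $t$ and the function name are our choices, not the exact pipeline used here.

```python
import numpy as np

def lowpass_denoise(K, signals, t=3):
    """Sketch of graph low-pass denoising: diffuse each signal with the
    row-stochastic operator P = D^{-1} K, attenuating high-frequency noise."""
    P = K / K.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, t) @ signals
```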

Next, we performed harmonic alignment of the two patient profiles; the results are shown in Figure 2. Harmonic alignment corrected the difference between the IFNγ distributions and restored the canonical correlation of IFNγ and TNFα (EMD = 0.135). This example illustrates the utility of harmonic alignment for biological data, where it can be used for integrated analysis of data collected across different experiments, patients, and time points.

6.3.2 Multimodal Data Fusion

Since cells contain numerous types of components that are informative of their state (genes, proteins, epigenetics), modern experimental technologies are starting to obtain measurements of each of these components from different assays at the single cell level. Since most single-cell assays are destructive, it is challenging or impossible to obtain all desired measurements in the same cells. It is therefore desirable to perform each assay on a different subset of cells from a single sample, and align these datasets in silico in order to obtain a pseudo-joint profile of the multiple data types.

To demonstrate the utility of harmonic alignment in this setting, we use a dataset obtained from Cao et al. (2018) of 11,296 cells from adult mouse kidney, collected by a joint measurement technique named sci-CAR, which measures both gene expression (scRNA-seq) and chromatin accessibility (scATAC-seq) in the same cells simultaneously. The datasets are normalized separately as in van Dijk et al. (2018), using a square root transformation for the scRNA-seq data and a log transformation with a pseudocount of 1 for the scATAC-seq data, and finally the dimensionality of each dataset is reduced to 100 using truncated SVD. After randomly permuting the datasets to scramble the correspondence between them, we align the two manifolds in order to recover the known bijection between data modalities. Let $r_i$ be the scRNA-seq measurement of cell $i$, and $a_i$ be the scATAC-seq measurement of cell $i$. Figure 2 shows the average percentage overlap of the neighborhood of $r_i$ in the scRNA-seq data with the neighborhood of $a_i$ in the scATAC-seq data, before and after alignment with MAGAN, MNN, and harmonic alignment. Harmonic alignment most accurately recovers the cell neighborhoods, thereby allowing the generation of in silico joint profiles across data types and obviating the need for expensive or infeasible in vitro joint profiling.
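One plausible reading of this metric is sketched below, under the assumption that row $i$ of each embedding is the same cell and with our choice of $k$; the function name is ours.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_overlap(emb1, emb2, k=100):
    """Sketch of the fusion metric in Sec. 6.3.2: average fraction of shared
    k-nearest neighbors for cells with a known bijection between modalities
    (row i of emb1 and row i of emb2 are the same cell)."""
    nn1 = NearestNeighbors(n_neighbors=k).fit(emb1).kneighbors(return_distance=False)
    nn2 = NearestNeighbors(n_neighbors=k).fit(emb2).kneighbors(return_distance=False)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn1, nn2)]
    return float(np.mean(overlaps))
```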

7 Conclusion

We presented a novel method for aligning or batch-normalizing datasets, which involves learning and aligning their intrinsic geometries. Our method is based on the principle that corresponding features across samples or datasets should have similar “frequency” components on these intrinsic data geometries, represented as manifolds. Our harmonic alignment leverages this understanding to compute cross-dataset similarity between manifold harmonics, which is then used to construct an isometric transformation that aligns the data manifolds. Results show that our method successfully aligns artificially misaligned samples, as well as biological data containing batch effects. Our method has the advantage of aligning data geometry rather than density, and thus, it is insensitive to sampling differences. Further, our method inherently denoises the data as it obtains alignment of significant manifold dimensions rather than noise. We expect future applications of harmonic alignment to include, for example, the use of multimodal data fusion to understand complex molecular processes through three or more different data modalities.

References

  • Buenrostro et al. [2015] Jason D Buenrostro, Beijing Wu, Ulrike M Litzenburger, Dave Ruff, Michael L Gonzales, Michael P Snyder, Howard Y Chang, and William J Greenleaf. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature, 523(7561):486, 2015.
  • Klein et al. [2015] Allon M Klein, Linas Mazutis, Ilke Akartuna, Naren Tallapragada, Adrian Veres, Victor Li, Leonid Peshkin, David A Weitz, and Marc W Kirschner. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell, 161(5):1187–1201, 2015.
  • Haghverdi et al. [2018] Laleh Haghverdi, Aaron TL Lun, Michael D Morgan, and John C Marioni. Batch effects in single-cell rna-sequencing data are corrected by matching mutual nearest neighbors. Nature biotechnology, 36(5):421, 2018.
  • Delanaye et al. [2014] Pierre Delanaye, Etienne Cavalier, Jean-Paul Cristol, and Joris R Delanghe. Calibration and precision of serum creatinine and plasma cystatin c measurement: impact on the estimation of glomerular filtration rate. Journal of nephrology, 27(5):467–475, 2014.
  • Coifman and Lafon [2006] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
  • Amodio and Krishnaswamy [2018] Matthew Amodio and Smita Krishnaswamy. Magan: Aligning biological manifolds. arXiv preprint arXiv:1803.00385, 2018.
  • Gower [1975] John C Gower. Generalized procrustes analysis. Psychometrika, 40(1):33–51, 1975.
  • Thompson [1984] Bruce Thompson. Canonical correlation analysis: Uses and interpretation. Number 47 in Quantitative applications in the social sciences. Sage, 1984.
  • Zhu et al. [2017] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2242–2251, 2017. doi: 10.1109/ICCV.2017.244. URL https://doi.org/10.1109/ICCV.2017.244.
  • Kim et al. [2017] Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1857–1865, 2017. URL http://proceedings.mlr.press/v70/kim17a.html.
  • Wang and Mahadevan [2008] Chang Wang and Sridhar Mahadevan. Manifold alignment using procrustes analysis. In Proceedings of the 25th international conference on Machine learning, pages 1120–1127. ACM, 2008.
  • Wang and Mahadevan [2009] Chang Wang and Sridhar Mahadevan. Manifold alignment without correspondence. In IJCAI, volume 2, page 3, 2009.
  • Van Der Maaten et al. [2009] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10:66–71, 2009.
  • Izenman [2012] Alan Julian Izenman. Introduction to manifold learning. Wiley Interdisciplinary Reviews: Computational Statistics, 4(5):439–446, 2012.
  • Lin et al. [2015] Binbin Lin, Xiaofei He, and Jieping Ye. A geometric viewpoint of manifold learning. Applied Informatics, 2(1):3, 2015.
  • Moon et al. [2017a] Kevin R Moon, Jay Stanley, Daniel Burkhardt, David van Dijk, Guy Wolf, and Smita Krishnaswamy. Manifold learning-based methods for analyzing single-cell rna-sequencing data. Current Opinion in Systems Biology, 2017a.
  • Brooks et al. [1994] Robert Brooks, Carolyn Gordon, and Peter A Perry, editors. Geometry of the Spectrum, volume 173 of Proc. 1993 Joint Summer Res. Conf. on Spectral Geometry, University of Washington, Seattle, July 17-23, 1993; Contemporary Mathematics. American Mathematical Society, Providence, 1994.
  • Shuman et al. [2013] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
  • Nadler et al. [2006] Boaz Nadler, Stephane Lafon, Ioannis Kevrekidis, and Ronald R Coifman. Diffusion maps, spectral clustering and eigenfunctions of fokker-planck operators. In Advances in neural information processing systems, pages 955–962, 2006.
  • Schönemann [1966] Peter H Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10, 1966.
  • Moon et al. [2017b] Kevin R Moon, David van Dijk, Zheng Wang, Daniel Burkhardt, William Chen, Antonia van den Elzen, Matthew J Hirn, Ronald R Coifman, Natalia B Ivanova, Guy Wolf, et al. Visualizing transitions and structure for high dimensional data exploration. bioRxiv, page 120378, 2017b.
  • van Dijk et al. [2018] David van Dijk, Roshan Sharma, Juozas Nainys, Kristina Yim, Pooja Kathail, Ambrose Carr, Cassandra Burdziak, Kevin R Moon, Christine L Chaffer, Diwakar Pattabiraman, Brian Bierie, Linas Mazutis, Guy Wolf, Smita Krishnaswamy, and Dana Pe’er. Recovering gene interactions from single-cell data using data diffusion. Cell, 174(3):716 – 729.e27, 2018. doi: 10.1016/j.cell.2018.05.061.
  • Haghverdi et al. [2016] Laleh Haghverdi, Maren Buettner, F Alexander Wolf, Florian Buettner, and Fabian J Theis. Diffusion pseudotime robustly reconstructs lineage branching. Nature methods, 13(10):845, 2016.
  • Lederman and Talmon [2018] Roy R Lederman and Ronen Talmon. Learning the geometry of common latent variables using alternating-diffusion. Applied and Computational Harmonic Analysis, 44(3):509–536, 2018.
  • Lindenbaum et al. [2018] Ofir Lindenbaum, Jay S Stanley III, Guy Wolf, and Smita Krishnaswamy. Data generation based on diffusion geometry. arXiv preprint arXiv:1802.04927, 2018.
  • Hammond et al. [2011] David K Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011.
  • Perraudin et al. [2014] Nathanaël Perraudin, Johan Paratte, David Shuman, Lionel Martin, Vassilis Kalofolias, Pierre Vandergheynst, and David K. Hammond. GSPBOX: A toolbox for signal processing on graphs. ArXiv e-prints, August 2014.
  • Amodio et al. [2018] Matthew Amodio, David van Dijk, Krishnan Srinivasan, William S Chen, Hussein Mohsen, Kevin R Moon, Allison Campbell, Yujiao Zhao, Xiaomei Wang, Manjunatha Venkataswamy, Anita Desai, Ravi V., Priti Kumar, Ruth Montgomery, Guy Wolf, and Smita Krishnaswamy. Exploring single-cell data with deep multitasking neural networks. bioRxiv, 2018. doi: 10.1101/237065.
  • Chesler and Reiss [2002] David A Chesler and Carol Shoshkes Reiss. The role of ifn- in immune responses to viral infections of the central nervous system. Cytokine & growth factor reviews, 13(6):441–454, 2002.
  • Chakravarti and Kumaria [2006] Anita Chakravarti and Rajni Kumaria. Circulating levels of tumour necrosis factor-alpha & interferon-gamma in patients with dengue & dengue haemorrhagic fever during an outbreak. Indian Journal of Medical Research, 123(1):25, 2006.
  • Braga et al. [2001] Elzinandes LA Braga, Patrícia Moura, Luzia MO Pinto, Sonia Ignácio, Maria José C Oliveira, Marly T Cordeiro, and Claire F Kubelka. Detection of circulant tumor necrosis factor-alpha, soluble tumor necrosis factor p75 and interferon-gamma in brazilian patients with dengue fever and dengue hemorrhagic fever. Memorias do Instituto Oswaldo Cruz, 96(2):229–232, 2001.
  • Ohmori et al. [1997] Yoshihiro Ohmori, Robert D Schreiber, and Thomas A Hamilton. Synergy between interferon- and tumor necrosis factor- in transcriptional activation is mediated by cooperation between signal transducer and activator of transcription 1 and nuclear factor b. Journal of Biological Chemistry, 272(23):14899–14907, 1997.
  • Cao et al. [2018] Junyue Cao, Darren A Cusanovich, Vijay Ramani, Delasa Aghamirzaie, Hannah A Pliner, Andrew J Hill, Riza M Daza, Jose L McFaline-Figueroa, Jonathan S Packer, Lena Christiansen, et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science, 361(6409):1380–1385, 2018.
  • Shuman et al. [2016] David I Shuman, Benjamin Ricaud, and Pierre Vandergheynst. Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis, 40(2):260–291, 2016.
  • Pedregosa et al. [2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

Appendix A Pearson correlation as a measure of matrix diagonality

Given an $N \times N$ matrix $M$ of frequencies, we can measure its diagonality by the sample correlation of the row indices with the column indices, weighted by the matrix entries, noting that for a perfectly diagonal matrix the row and column indices of the mass coincide and hence the correlation will be $1$. First note that we can write the sample correlation of two draws $x, y$ from a random distribution as

$$\rho(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}. \qquad (4)$$

We now adapt this definition to the setting where the samples are weighted by the frequencies in $M$. Let $\mathbf{1}$ be the all-ones vector, and $n = (1, 2, \ldots, N)^{\mathsf{T}}$. Then we define the corresponding expressions in Equation 4 via the frequency-weighted means of the row and column indices,

$$w = \mathbf{1}^{\mathsf{T}} M \mathbf{1}, \qquad \mu_r = \frac{n^{\mathsf{T}} M \mathbf{1}}{w}, \qquad \mu_c = \frac{\mathbf{1}^{\mathsf{T}} M n}{w},$$

which gives the sample correlation for rows and columns of a matrix as

$$\rho(M) = \frac{\sum_{ij} M_{ij} (i - \mu_r)(j - \mu_c)}{\sqrt{\sum_{ij} M_{ij} (i - \mu_r)^2}\,\sqrt{\sum_{ij} M_{ij} (j - \mu_c)^2}}. \qquad (5)$$

We note that in Figure 5, this measure of diagonality correlates well with the ratio of off-diagonal elements to diagonal elements, given by $\|M - \operatorname{diag}(M)\|_F / \|\operatorname{diag}(M)\|_F$, when considering a range of signal-to-noise ratios. In the following, we discuss this measure with respect to the number of filters $K$.
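Both diagonality measures are straightforward to compute; a sketch with our helper names, using entry magnitudes as the weights, follows.

```python
import numpy as np

def diagonality_pearson(M):
    """Sketch of Eq. 5: correlation between row and column indices, weighted
    by entry magnitudes; equals 1 for a diagonal matrix."""
    W = np.abs(M)
    n = np.arange(len(W))
    w = W.sum()
    mu_r = (n[:, None] * W).sum() / w    # weighted mean row index
    mu_c = (n[None, :] * W).sum() / w    # weighted mean column index
    cov = ((n[:, None] - mu_r) * (n[None, :] - mu_c) * W).sum()
    var_r = (((n[:, None] - mu_r) ** 2) * W).sum()
    var_c = (((n[None, :] - mu_c) ** 2) * W).sum()
    return cov / np.sqrt(var_r * var_c)

def offdiagonal_ratio(M):
    """Ratio of off-diagonal to diagonal Frobenius norms (cf. Appendix D)."""
    D = np.diag(np.diag(M))
    return np.linalg.norm(M - D) / np.linalg.norm(D)
```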

Appendix B Filter count affects alignment diagonality

In Figure 5, we showed that with a sufficient number of filters, bandlimited correlations outperform direct Fourier correlations robustly across a broad range of SNR. To identify the optimal number of filters for this example, we examined the relationship between diagonality measured by the off-diagonal ratio and by the Pearson correlation (Equation 5). In this experiment (Figure 3), we repeated the identical-swiss-roll experiment of Appendix D, instead fixing the SNR and dyadically varying the number of filters $K$.

Figure 3: Filter count and alignment diagonality. Two identical swiss roll graphs were generated [Perraudin et al., 2014]; the ground truth mapping between these manifolds is the identity matrix. Subsequently, signals were generated by sampling the identity matrix with random Gaussian noise added at a fixed signal-to-noise ratio. Next, itersine wavelet banks of varying size (x-axis) were used to obtain a harmonic alignment matrix, and the off-diagonal ratio (blue) and Pearson diagonal correlation (Equation 5, orange) were recorded. These metrics are inversely correlated and are different measurements of the diagonality of a matrix. While the off-diagonal ratio decays as more filters are added, the Pearson diagonality peaks at an intermediate filter count.
Figure 4: An example of a low-frequency feature incurring artificially high correlation with a highly coherent harmonic over a sensor network $G$. (a) Magnitude of the low-frequency input feature $f$; (b) magnitude of the second harmonic of $G$; (c) magnitude of the high-frequency 14th harmonic of $G$, with high coherence around the vertex on which $f$ is centered; (d) magnitude of the Fourier transform of $f$ over $G$; the second-largest Fourier coefficient corresponds to the 14th harmonic; (e) the correlation matrix of $f$ with the harmonics of $G$ contains spurious correlations with the 14th harmonic.

We note that, as discussed in Section A, the off-diagonal ratio is inversely correlated with the Pearson correlation across the range of $K$. However, while the off-diagonal ratio continues to decay as $K$ grows, the Pearson correlation reaches a maximum at an intermediate number of filters. This represents a tradeoff between the diagonal strictness of the correlations and the flexibility to find correlations in low SNR settings.

Appendix C Bandlimiting prevents spurious correlations caused by highly coherent eigenvectors

In Section 5, we mentioned that localization in the graph eigenbasis can lead to spurious correlations, which we proposed to eliminate by using spectral graph wavelets. To see an example of such a localized eigenbasis, we generated a random sensor graph of 20 nodes using GSPBox [Perraudin et al., 2014]. Letting $\mathcal{L}$ be the normalized Laplacian of this graph, we identified the vertex and eigenvector pair with the largest coherence, $\max_{i,j} |\langle \delta_i, \psi_j \rangle|$, where $\delta_i$ is a Dirac delta centered at the $i$-th vertex and $\psi_j$ is the $j$-th eigenvector of $\mathcal{L}$. In matrix form, this is the entry of largest magnitude in the eigenvector matrix $\Psi$. For this graph, the maximizing pair was the 8th vertex and the 14th eigenvector. We show this harmonic in Figure 4(c); it is very close to a delta. Next, we generated a low-frequency signal $f$ localized on vertex 8, and added a small impulse to it (Figure 4(a)). This signal’s Fourier transform is shown in Figure 4(d); its two largest coefficients correspond to the second eigenvector (letting the trivial eigenvector be the first; Figure 4(b)) and the 14th eigenvector. The contribution of these vectors to $f$ is apparent when one considers the vertex-domain representations in Figure 4(a-c). However, when the Fourier transform of $f$ is used to generate correlations, it adds off-diagonal correlations between the second eigenvector and the 14th eigenvector. Such correlations are a deleterious product of coherence, which is a common feature of general graphs and signals Shuman et al. [2016].
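The coherence search itself is a one-liner over the eigenvector matrix; a sketch with our hypothetical helper name follows.

```python
import numpy as np

def max_coherence(L):
    """Sketch of the coherence search in Appendix C: find the (vertex,
    eigenvector) pair maximizing |<delta_i, psi_j>| = |Psi[i, j]|."""
    _, psi = np.linalg.eigh(L)   # eigenvectors of the normalized Laplacian
    i, j = np.unravel_index(np.argmax(np.abs(psi)), psi.shape)
    return i, j, np.abs(psi[i, j])
```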

Appendix D Demonstration of bandlimited correlations

To empirically measure the effect of partitioning on learned correlations, we demonstrate a simple example in Figure 5. We used a Swiss roll dataset (obtained from Perraudin et al. [2014]) to generate a ground truth graph $G_1$. Then, we produced a second graph $G_2$ that is identical to $G_1$, so that the ground truth mapping between them is the identity. We can then measure how near an alignment is to ground truth by its “diagonality”. To measure this, we used (1) the ratio of the norm of off-diagonal elements to the norm of diagonal elements of the alignment matrix and (2) the correlation of the columns and the rows of the matrix (see Appendix A for a discussion of this correlation). Next, we sampled 10 copies of the identity matrix with normally distributed noise over a range of signal-to-noise ratios (SNR). At high SNR, these signals are close to identity and the learned transformations converge to the ground truth mapping for both bandlimited and non-limited alignments. At low SNR, the resulting signals are incompatible and naive alignment is infeasible. Despite this, we observe that alignment with bandlimited filters (shown spectrally in Figure 5) is robust to extreme noise regimes. In Appendix B we discuss the selection of the number of filters $K$, which optimizes the Pearson correlation in this example. In Figure 5, we show the ground truth (identity), non-limited, and bandlimited alignments obtained for one realization at low SNR. This illustrates the patterns obtained by bandlimited alignment, which imposes mappings between compact ranges of the spectrum.


Figure 5: Bandlimited correlations restricted by itersine wavelets more accurately recover signals close to the diagonal. (a) Itersine wavelets are defined in the spectral domain as translates of a kernel of the Laplacian eigenvalues (see Section 5 and Equation 3). Each scale is represented as a different color; the frame bounds are shown in black; itersine wavelets form a tight frame. (b) Alignment diagonality is robust to noise when alignment is performed with itersine wavelets. Top: Pearson diagonality correlation (see Appendix A). Bottom: ratio of the norm of off-diagonal elements to the norm of diagonal elements. Blue: non-limited correlations. Orange: bandlimited correlations. See Appendix D for experimental discussion. (c) Top: ground truth mapping for a test case of aligning a swiss roll graph to itself; the mapping is the identity. Middle: the non-limited correlation matrix of graph harmonics gives many spurious correlations and poorly approximates the ground truth. Bottom: the bandlimited correlation matrix more closely approximates the ground truth by limiting entries far off the diagonal.

Appendix E Transfer learning with alternative classifiers

Transfer learning classification was performed in sklearn [Pedregosa et al., 2011] using default parameters for the 5-nearest neighbours, linear SVM and naive Bayes classifiers. Figure 6 shows results for all three classifiers. Although the classifiers’ individual dependence on data imbalance varies, in all three cases harmonic alignment outperforms MAGAN and MNN. Specifically, k-nearest neighbour classification improves as the test set gets larger due to harmonic alignment’s increased robustness with more data (see Figure 1), and approaches the performance of linear SVM as the test set increases in size. On the other hand, naive Bayes increasingly overfits to the training set as the test set increases in size. Further, naive Bayes is highly sensitive to changes in data density, which causes near-random performance with MNN and MAGAN, neither of which preserves the intrinsic density of the data. In contrast to this, harmonic alignment strictly preserves data density and hence achieves significantly improved performance in comparison.

Figure 6: Transfer learning performance with various classifiers. For each ratio, 1K uncorrupted, labeled digits were sampled from MNIST, and then 1K, 2K, 4K, and 8K (x-axis) unlabeled points were sampled and corrupted with 35% column identity.

Appendix F Comparison to Wang and Mahadevan

Despite being a natural candidate for comparison to our method, unfortunately no standard implementation of the method proposed by Wang and Mahadevan [2009] is available. Our implementation of their method performed extremely poorly (worse than random) on the comparisons and is extremely computationally intensive. The method is therefore not shown in the main comparisons; however, for completeness, the results are shown in Figure 7.

Figure 7: Recovery of k-nearest neighborhoods under feature corruption. Mean over 3 iterations is reported for each method. (a) Lazy classification accuracy relative to input size with unlabeled randomly corrupted digits with 35% preserved pixels. (b) Transfer learning performance. For each ratio, 1K uncorrupted, labeled digits were sampled from MNIST, and then 1K, 2K, 4K, and 8K (x-axis) unlabeled points were sampled and corrupted with 35% column identity.