Metric recovery from directed unweighted graphs

Abstract

We analyze directed, unweighted graphs obtained from $x_i \in \mathbb{R}^d$ by connecting vertex $i$ to $j$ iff $|x_i - x_j| < \varepsilon(x_i)$. Examples of such graphs include $k$-nearest neighbor graphs, where $\varepsilon(x_i)$ varies from point to point, and, arguably, many real world graphs such as co-purchasing graphs. We ask whether we can recover the underlying Euclidean metric $\varepsilon(x_i)$ and the associated density $p(x_i)$ given only the directed graph and $d$.

We show that consistent recovery is possible up to isometric scaling when the vertex degree is at least $\omega(n^{2/(d+2)} \log(n)^{d/(d+2)})$. Our estimator is based on a careful characterization of a random walk over the directed graph and the associated continuum limit. As an algorithm, it resembles the PageRank centrality metric. We demonstrate empirically that the estimator performs well on simulated examples as well as on real-world co-purchasing graphs even with a small number of points and degree scaling as low as $\log(n)$.

1 Introduction

Data for unsupervised learning is increasingly available in the form of graphs or networks. For example, we may analyze gene networks, social networks, or general co-occurrence graphs (e.g., built from purchasing patterns). While classical unsupervised tasks such as density estimation or clustering are naturally formulated for data in vector spaces, these tasks have analogous problems over graphs such as centrality and community detection. We provide a step towards unifying unsupervised learning by recovering the underlying density and metric directly from graphs.

We consider “unweighted directed geometric graphs” that are assumed to have been built from underlying (unobserved) points $x_i \in \mathbb{R}^d$, $i = 1, \dots, n$. In particular, we assume that graphs are formed by drawing an arc from each vertex $i$ to its neighbors within distance $\varepsilon_n(x_i)$. Note that the graphs are typically not symmetric since the distance (the $\varepsilon$-ball) may vary from point to point. By allowing $\varepsilon_n(x_i)$ to be stochastic, e.g., to depend on the set of points, the construction also subsumes typical $k$-nearest neighbor graphs. Arguably, graphs built from top friends/products, or co-association graphs, may also be approximated in this manner.

The key property of our family of geometric graphs is that their structure is completely characterized by two functions over the latent space: the local density $p(x)$ and the local scale $\varepsilon(x)$. Indeed, global properties such as the distances between points can be recovered by integrating these quantities. We show that the asymptotic behavior of random walks on the directed graphs relates to the density and metric. In particular, we show that random walks on such graphs with minimal degree at least $\omega(n^{2/(d+2)} \log(n)^{d/(d+2)})$ can be completely characterized in terms of $p$ and $\varepsilon$ using drift-diffusion processes. This enables us to recover both the density and distance given only the observed graph and the (hypothesized) underlying dimension $d$.

The fact that we may recover the density (up to isometry) is surprising. For example, in $k$-nearest neighbor graphs, each vertex has degree exactly $k$. There is no immediate local information about the density, i.e., whether the corresponding point lies in a high-density region with small ball radii, or in a low-density region with large ball radii. The key insight of this paper is that random walks over such graphs naturally drift toward higher density regions, allowing for density recovery.

While the paper is primarily focused on the theoretical aspects of recovering the metric and density, we believe our results offer useful strategies for analyzing real-world networks. For example, we analyzed the Amazon co-purchasing graph where an edge is drawn from an item $i$ to $j$ if $j$ is among the top $k$ co-purchased items with $i$. These Amazon products may be co-purchased if they are similar enough to be complementary, but not so similar that they are redundant. We extend our model to deal with connectivity rules shaped like an annulus, and demonstrate that our estimator can simultaneously recover product similarities, product categories, and central products by metric embedding.

1.1 Relation to prior work

The density estimation problem addressed by this paper was proposed and partially solved by von Luxburg-Alamgir in [14] using integration of local density gradients over shortest paths. This estimator has since been used for drawing graphs with ordinal constraints in [14] and graph down-sampling in [1]. However, the recovery algorithm is restricted to 1-dimensional $k$-nearest neighbor graphs under a constraint on the growth rate of $k$. Our paper provides an estimator that works in all dimensions, applies to a more general class of graphs, and strongly outperforms that of von Luxburg-Alamgir in practice.

On a technical level, our work has similarities to the analysis of convergence of graph Laplacians and random walks on manifolds in [16, 6]. For example, in [13], Ting-Huang-Jordan used infinitesimal generators to capture the convergence of a discrete Laplacian to its continuous equivalent on $k$-nearest neighbor graphs. However, their analysis was restricted to the Laplacian and did not consider the latent recovery problem. In addition, our approach proves convergence of the entire random walk trajectory and allows us to analyze the stationary distribution function directly.

2 Main results and proof outline

2.1 Problem setup

Let $X = \{x_1, x_2, \dots\}$ be an infinite sequence of latent coordinate points drawn independently from a distribution with probability density $p(x)$ in $\mathbb{R}^d$. Let $\varepsilon_n(x_i)$ be a radius function which may depend on the draw of $X$. In this paper, we fix a single draw of $X$ and analyze the quenched setting. Let $G_n$ be the unweighted directed neighborhood graph with vertex set $X_n = \{x_1, \dots, x_n\}$ and with a directed edge from $i$ to $j$ if and only if $|x_i - x_j| < \varepsilon_n(x_i)$.
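The graph construction just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names (`sample_points`, `ball_graph`, `knn_graph`), the uniform sampling density, and the particular radius function are all assumptions made for the example.

```python
import math
import random


def sample_points(n, d, seed=0):
    """Draw n points uniformly from the unit cube [0, 1]^d (a stand-in density)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def ball_graph(points, radius):
    """Directed graph with an arc i -> j iff |x_i - x_j| < radius(x_i).

    `radius` maps a point to its own connection radius, so arcs need not
    be reciprocated when the radius varies over space.
    """
    out_nbrs = []
    for i, xi in enumerate(points):
        r = radius(xi)
        out_nbrs.append([j for j, xj in enumerate(points)
                         if j != i and math.dist(xi, xj) < r])
    return out_nbrs


def knn_graph(points, k):
    """Directed k-nearest neighbor graph: arcs from i to its k closest points."""
    out_nbrs = []
    for i, xi in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(xi, points[j]))
        out_nbrs.append(others[:k])
    return out_nbrs


points = sample_points(60, 2)
g_knn = knn_graph(points, 5)
# Hypothetical spatially varying radius: larger toward the right of the cube.
g_ball = ball_graph(points, lambda x: 0.15 + 0.2 * x[0])
```

Both constructions return out-neighbor lists only, which matches the setting where the coordinates themselves are unobserved and only the arcs are available to the estimator.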

Fix now a large $n$. We consider the random directed graph model given by observing the single graph $G_n$. The model is completely specified by the latent function $p(x)$ and the possibly stochastic $\varepsilon_n(x)$. Under the conditions ($\star$) to be specified below, we solve the following problem:

  • Given only $G_n$ and $d$, form a consistent estimate of $p(x_i)$ and $\varepsilon_n(x_i)$ up to proportionality constants.

The conditions we impose on $p(x)$, $\varepsilon_n(x)$, and the stationary density function $\pi_{X_n}(x)$ of the simple random walk on $G_n$ are the following, which we refer to as ($\star$). We assume ($\star$) holds throughout the paper.

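  • The density $p(x)$ is differentiable with bounded $\nabla \log p(x)$ on a path-connected compact domain $D \subset \mathbb{R}^d$ with smooth boundary $\partial D$.

  • There is a deterministic continuous function $\bar{\varepsilon}(x) > 0$ on $\bar{D}$ and scaling constants $g_n$ satisfying

    $$g_n \to 0 \quad\text{and}\quad g_n\, n^{1/(d+2)} \log(n)^{-1/(d+2)} \to \infty$$

    so that, a.s. in the draw of $X$, $g_n^{-1}\varepsilon_n(x)$ converges uniformly to $\bar{\varepsilon}(x)$.

  • The rescaled density functions $n\,\pi_{X_n}(x)$ are a.s. uniformly equicontinuous.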

Remark.

We conjecture that the last condition in ($\star$) holds for any $p$ and $\bar{\varepsilon}$ satisfying the other conditions in ($\star$) (see the corresponding conjecture in the supplement).

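Let $NB_n(x)$ denote the set of out-neighbors of $x$ so that $y$ is in $NB_n(x)$ if there is a directed edge from $x$ to $y$. The second condition in ($\star$) implies for all $x$ that

$$|NB_n(x)| = \omega\!\left(n^{2/(d+2)} \log(n)^{d/(d+2)}\right). \tag{1}$$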

2.2 Statement of results

Our approach is based on the simple random walk $X_n(t)$ on the graph $G_n$. Let $\pi_{X_n}(x)$ denote the stationary density of $X_n(t)$. We first show that when appropriately renormalized, $\pi_{X_n}(x)$ converges to an explicit function of $p(x)$ and $\bar{\varepsilon}(x)$.

Theorem 2.1.

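Given ($\star$), a.s. in $X$, we have

$$n\,\pi_{X_n}(x) \to c\,\frac{p(x)}{\bar{\varepsilon}(x)^2} \tag{2}$$

for the normalization constant $c^{-1} = \int p(x)^2\, \bar{\varepsilon}(x)^{-2}\, dx$.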

Combining this result with an estimate on the out-degree of points in $G_n$ gives our general result on recovery of density and scale. Let $V_d$ be the volume of the unit $d$-ball.

Corollary 2.2.

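Assuming ($\star$), we have a.s. in $X$ that

$$\left(\frac{n^{d-2}\,\pi_{X_n}(x)^{d}\,|NB_n(x)|^{2}}{c^{d}\,V_d^{2}\,g_n^{2d}}\right)^{1/(d+2)} \to p(x) \quad\text{and}\quad \left(\frac{c\,|NB_n(x)|}{n^{2}\,V_d\,g_n^{d}\,\pi_{X_n}(x)}\right)^{1/(d+2)} \to \bar{\varepsilon}(x).$$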

Proof.

Immediate from the out-degree estimate and Theorem 2.1. ∎
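The exponents of the resulting estimator can be checked with a short heuristic calculation. The sketch below rests on forms that are assumptions of this note rather than quotations: an out-degree estimate $|NB_n(x)| \approx n\,p(x)\,V_d\,\varepsilon_n(x)^d$, the scaling $\varepsilon_n \approx g_n \bar{\varepsilon}$, and a stationary-density limit of the form $n\,\pi_{X_n}(x) \approx c\,p(x)/\bar{\varepsilon}(x)^2$.

```latex
\begin{align*}
\varepsilon_n(x)^2 &\approx \left(\frac{|NB_n(x)|}{n\,V_d\,p(x)}\right)^{2/d}, \\
n\,\pi_{X_n}(x) &\approx \frac{c\,g_n^2\,p(x)}{\varepsilon_n(x)^2}
  \approx c\,g_n^2\,(n V_d)^{2/d}\,p(x)^{(d+2)/d}\,|NB_n(x)|^{-2/d}, \\
p(x) &\propto \pi_{X_n}(x)^{d/(d+2)}\,|NB_n(x)|^{2/(d+2)}, \\
\varepsilon_n(x) &\propto \left(\frac{|NB_n(x)|}{\pi_{X_n}(x)}\right)^{1/(d+2)}.
\end{align*}
```

Two sanity checks follow from the last two lines: if the stationary density is proportional to the degree, the density estimate is proportional to the degree itself, and if the degree is constant, the density estimate is a monotone function of the stationary density alone.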

Remark.

If $\varepsilon_n$ is constant, every edge is bidirectional, so $\pi_{X_n}(x)$ is proportional to the degree of $x$, and we recover the standard $\varepsilon$-ball density estimator.

Our estimator for density closely resembles the PageRank algorithm without damping [10]. In particular, for the $k$-nearest neighbor graph, it gives the same rank ordering as PageRank, and it reduces to PageRank as $d \to \infty$.
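A PageRank-like estimator of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponents $d/(d+2)$ and $2/(d+2)$ are assumed (they are consistent with the two remarks above, since they give a degree-proportional estimate on symmetric graphs and approach the stationary distribution itself as $d$ grows), the helper names are hypothetical, and a lazy walk is used only to make the power iteration converge on periodic graphs.

```python
def stationary_distribution(out_nbrs, iters=5000, tol=1e-13):
    """Power iteration for the simple random walk on a directed graph.

    The walk stays put with probability 1/2 (a lazy walk), which leaves the
    stationary distribution unchanged but avoids periodicity. Every vertex
    must have at least one out-neighbor; with no damping, the graph should be
    strongly connected for the result to be meaningful.
    """
    n = len(out_nbrs)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * mass for mass in pi]
        for i, nbrs in enumerate(out_nbrs):
            share = 0.5 * pi[i] / len(nbrs)
            for j in nbrs:
                nxt[j] += share
        change = max(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if change < tol:
            break
    return pi


def density_estimate(out_nbrs, pi, d):
    """Assumed form: p_hat proportional to pi^(d/(d+2)) * outdegree^(2/(d+2))."""
    a, b = d / (d + 2.0), 2.0 / (d + 2.0)
    raw = [(pi[i] ** a) * (len(out_nbrs[i]) ** b) for i in range(len(pi))]
    total = sum(raw)
    return [v / total for v in raw]  # normalized to sum to one


def scale_estimate(out_nbrs, pi, d):
    """Assumed form: eps_hat proportional to (outdegree / pi)^(1/(d+2))."""
    return [(len(out_nbrs[i]) / pi[i]) ** (1.0 / (d + 2.0))
            for i in range(len(pi))]


# Symmetric path graph 0 - 1 - 2 - 3: the stationary distribution is
# proportional to the degree, so the density estimate is too.
path = [[1], [0, 2], [1, 3], [2]]
pi = stationary_distribution(path)
p_hat = density_estimate(path, pi, d=2)
```

On the symmetric path graph the estimate reduces to the normalized degree sequence, mirroring the remark about constant radius above.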

When specializing to the $k$-nearest neighbor density estimation problem posed by von Luxburg-Alamgir in [14], we obtain the following.

Corollary 2.3.

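If $\varepsilon_n$ is selected via the $k$-nearest neighbors procedure with $k = \omega(n^{2/(d+2)} \log(n)^{d/(d+2)})$ and $p$ satisfies the first and last conditions in ($\star$), we have a.s. in $X$ that

$$\left(\frac{n\,\pi_{X_n}(x)}{c\,V_d^{2/d}}\right)^{d/(d+2)} \to p(x).$$

Proof.

By [4], the empirical $\varepsilon_n$ induced by the $k$-nearest neighbors procedure satisfies the second condition of ($\star$) with $g_n = (k/n)^{1/d}$ and $\bar{\varepsilon}(x) = (V_d\, p(x))^{-1/d}$. The claim then follows from Corollary 2.2. ∎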

2.3 Outline of approach

Our proof proceeds via the following steps.

  1. As $n \to \infty$, the simple random walks $X_n(t)$ on $G_n$ converge weakly to an Itô process $Y(t)$, yielding weak convergence between stationary measures. (Theorem 3.4)

  2. The stationary density of $Y(t)$ is explicitly determined via the Fokker-Planck equation. (Lemma 4.1)

  3. Uniform equicontinuity of $n\,\pi_{X_n}(x)$ yields convergence in density after rescaling. (Theorem 2.1)

An intuitive explanation for our results is as follows. For large $n$, the simple random walk on $G_n$, when considered with its original metric embedding, closely approximates the behavior of a drift-diffusion process. Both the process and the approximating walk move preferentially toward regions where $p$ is large and diffuse more slowly out of regions where $\varepsilon$ is small. Occupation times therefore give us information about $p$ and $\varepsilon$, which allows us to recover them.

Formally, the convergence of $X_n(t)$ to $Y(t)$ follows by verifying the conditions of the Stroock-Varadhan criterion (Theorem 3.1) for convergence of discrete time Markov processes to Itô processes [12]. This criterion states that if the variance, expected value, and higher-order moments of a jump are continuous and well-controlled in the limit, then the process converges to an Itô process under mild technical conditions. By using the Fokker-Planck equation, we can express the stationary density of this Itô process solely in terms of $p(x)$ and the out-degree $|NB_n(x)|$. This allows us to estimate the density using only the unweighted graph.

Let $\bar{D}$ and $\partial D$ be the closure and boundary of the support of $p$. Let $B(x, \varepsilon)$ be the ball of radius $\varepsilon$ centered at $x$. Let $h_n = g_n^2$ be the time rescaling necessary for $X_n(t)$ to have timescale equal to that of $Y(t)$.

3 Convergence of the simple random walk to an Itô process

We will verify the regularity conditions of the Stroock-Varadhan criterion (see [12, Section 6]).

Theorem 3.1 (Stroock-Varadhan).

Let be discrete-time Markov processes defined over a domain with boundary . Define the discrete time drift and diffusion coefficients by

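If the drift and diffusion coefficients converge uniformly to continuous limits, the higher-order jump moments vanish in the limit, and regularity conditions ensure reflection at $\partial D$ (see the tightness and Stroock-Varadhan theorems in the supplement), the time-rescaled stochastic processes converge weakly in Skorokhod space to an Itô process with reflecting boundary condition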

with a standard -dimensional Brownian motion and .

Remark.

The original result of Stroock-Varadhan was stated for the Skorokhod space $D([0, T])$ for all finite $T$; our version for $D([0, \infty))$ is equivalent by [15, Theorem 2.8].

The technical conditions of Theorem 3.1 enforcing reflecting boundary conditions are checked in the supplement. We focus on convergence of the drift and diffusion coefficients.

Lemma 3.2 (Strong LLN for local moments).

For a function such that , given () we have uniformly on that

Proof.

Denote the claimed value of the limit by . For convergence in expectation, we condition on and apply iterated expectation to get

For , we have , so Hoeffding’s inequality yields

(3)

for by (1). Borel-Cantelli then yields a.s. convergence. ∎

Remark.

This limit holds even for stochastic $\varepsilon_n$ as long as $g_n^{-1}\varepsilon_n$ a.s. converges uniformly to a deterministic continuous $\bar{\varepsilon}$. All statements up to the concentration bound (3) hold regardless of stochasticity of $\varepsilon_n$, and the overall bound only requires convergence of $g_n^{-1}\varepsilon_n$. An example of such a graph is the $k$-nearest neighbors graph.

We now compute the drift and diffusion coefficients in terms of $p$ and $\bar{\varepsilon}$.

Theorem 3.3 (Drift diffusion coefficients).

Almost surely on the draw of $X$, as $n \to \infty$, we have

where $\delta_{ij}$ is the Kronecker delta function.

Proof.

By Lemma 3.2, , , and converge a.s. to their expectations, so it suffices to verify that the integrals in Lemma 3.2 have the claimed limits. Because is differentiable on , for any we have the Taylor expansion

of $p$ at $x$, where the convergence is uniform on compact sets. For large $n$ so that the ball of radius $\varepsilon_n(x)$ around $x$ lies completely inside $D$, substituting this expansion into the definitions of the local moments and integrating over spheres yields the result. Full details are in the supplement. ∎
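The integration step relies on the second moments of a point drawn uniformly from a $d$-dimensional ball of radius $\varepsilon$, which are $\mathbb{E}[y_i y_j] = \delta_{ij}\,\varepsilon^2/(d+2)$. The Monte Carlo sketch below checks this standard identity numerically; it is only an illustration of that one ingredient (the function names and sample sizes are assumptions), not a reproduction of the theorem's coefficients.

```python
import math
import random


def sample_in_ball(d, eps, rng):
    """Uniform draw from the d-ball of radius eps: Gaussian direction, radius eps * U^(1/d)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = eps * rng.random() ** (1.0 / d)
    return [r * v / norm for v in g]


def second_moments(d, eps, n_samples=50000, seed=0):
    """Empirical matrix of E[y_i y_j] for y uniform in the ball."""
    rng = random.Random(seed)
    m = [[0.0] * d for _ in range(d)]
    for _ in range(n_samples):
        y = sample_in_ball(d, eps, rng)
        for i in range(d):
            for j in range(d):
                m[i][j] += y[i] * y[j]
    return [[v / n_samples for v in row] for row in m]


d, eps = 3, 2.0
moments = second_moments(d, eps)
target = eps ** 2 / (d + 2)  # diagonal value predicted by the identity
```

The diagonal entries of `moments` should be close to `target` and the off-diagonal entries close to zero, which is the isotropy that makes the limiting diffusion term a multiple of the identity.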

Theorem 3.4.

Under (), as a.s. in the draw of the process converges in to the isotropic -valued Itô process with reflecting boundary condition defined by

(4)

Proof.

Lemma 3.2 and Theorem 3.3 show that fulfills the conditions of Theorem 3.1. The result follows from the Stroock-Varadhan criterion using the drift and diffusion terms from Theorem 3.3. ∎

4 Convergence and computation of the stationary distribution

4.1 Graphs satisfying condition ($\star$)

The Itô process $Y(t)$ is an isotropic drift-diffusion process, so the Fokker-Planck equation [11] implies its density $f(t, x)$ at time $t$ satisfies

(5)

where and are given by

Lemma 4.1.

The process $Y(t)$ defined by (4) has absolutely continuous stationary measure with density

$$\pi_Y(x) = c\,p(x)^2\,\bar{\varepsilon}(x)^{-2},$$

where $c$ was defined in (2).

Proof.

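By (5), to check that the claimed density is stationary, it suffices to show that the right-hand side of (5) vanishes when this density is substituted for the time-dependent density.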
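The stationarity check can be written out as a zero-flux computation. The sketch below assumes (these forms are not quoted from the text) an isotropic process with drift $\mu(x) = a\,\bar{\varepsilon}(x)^2\,\nabla \log p(x)$ and diffusion $\sigma(x)^2 = a\,\bar{\varepsilon}(x)^2$ for some constant $a > 0$, and a candidate density $\pi(x) = c\,p(x)^2/\bar{\varepsilon}(x)^2$.

```latex
\begin{align*}
\partial_t f &= -\nabla \cdot J[f], \qquad
  J[f] = \mu f - \tfrac{1}{2}\nabla\!\left(\sigma^2 f\right), \\
\mu\,\pi &= a\,\bar{\varepsilon}^2\,\frac{\nabla p}{p}\cdot \frac{c\,p^2}{\bar{\varepsilon}^2}
  = a\,c\,p\,\nabla p, \\
\tfrac{1}{2}\nabla\!\left(\sigma^2 \pi\right) &= \tfrac{1}{2}\nabla\!\left(a\,c\,p^2\right)
  = a\,c\,p\,\nabla p, \\
J[\pi] &= 0 .
\end{align*}
```

Because the flux vanishes identically, the candidate density is stationary and is also compatible with a reflecting (zero-flux) boundary condition; the value of the constant $a$ does not affect the result.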

We now prove Theorem 2.1 by showing that a rescaling of converges to .

Proof of Theorem 2.1.

The a.s. convergence of processes of Theorem 3.4 implies by Ethier-Kurtz [5, Theorem 4.9.12] that the empirical stationary measures

converge weakly to the stationary measure for . For any and , weak convergence against yields

By uniform equicontinuity of , for any there is small enough so that for all we have

which implies that

Combining with Lemma 4.1 yields the desired

4.2 Extension to isotropic graphs

To obtain our stationary distribution in Theorem 2.1 we require only convergence to some Itô process via the Stroock-Varadhan criterion. We can achieve this under substantially more general conditions. We define a class of neighborhood graphs on termed isotropic over which we have consistent metric recovery without knowledge of the graph construction method.

Definition 1 (Isotropic).

A graph edge connection procedure on is isotropic if it satisfies:

Distance kernel:

The probability of placing a directed edge from to is defined by a kernel function mapping locally scaled distances

with obeying () to probabilities

Nonzero mass:

The kernel function has nonzero integral .

Bounded tails:

For all , .

Continuity:

The scaling of the stationary distribution is uniformly equicontinuous.

This class of graph preserves the property that the random graph is entirely determined by the underlying density and local scale ; this allows us to have the same tractable form for the stationary distribution.

Both constant $\varepsilon$ and $k$-nearest neighbor graphs are isotropic upon assumption of uniform equicontinuity. Another interesting class of graphs allowed by this generalization is truncated Gaussian kernels, where connectivity probability decreases exponentially. Note that the kernel might not be monotonic or continuous in the scaled distance; one surprising example is an indicator kernel supported on an interval bounded away from zero, which deterministically connects points in an annulus.
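A kernel-based edge rule of this type can be sketched as follows. The code is illustrative only: `kernel_graph`, `ball_kernel`, `annulus_kernel`, and the annulus bounds 0.5 and 1.0 are hypothetical choices, and the kernel is applied to the distance divided by the local scale of the source vertex.

```python
import math
import random


def kernel_graph(points, radius, kernel, seed=0):
    """Place an arc i -> j with probability kernel(|x_i - x_j| / radius(x_i))."""
    rng = random.Random(seed)
    out_nbrs = []
    for i, xi in enumerate(points):
        r_i = radius(xi)
        nbrs = []
        for j, xj in enumerate(points):
            if j == i:
                continue
            scaled = math.dist(xi, xj) / r_i
            if rng.random() < kernel(scaled):
                nbrs.append(j)
        out_nbrs.append(nbrs)
    return out_nbrs


def ball_kernel(r):
    """Indicator of the unit ball: recovers the plain radius rule."""
    return 1.0 if r < 1.0 else 0.0


def annulus_kernel(r, inner=0.5, outer=1.0):
    """Indicator of an annulus (bounds are hypothetical): not monotone in r."""
    return 1.0 if inner <= r <= outer else 0.0


rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(80)]
radius = lambda x: 0.2 + 0.1 * x[1]  # hypothetical spatially varying scale
g_annulus = kernel_graph(points, radius, annulus_kernel)
g_ball = kernel_graph(points, radius, ball_kernel)
```

With an indicator kernel the construction is deterministic, since a uniform draw in $[0, 1)$ is always below 1 and never below 0; smooth kernels make the same code produce genuinely random graphs.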

Corollary 4.2 (Generalization).

If a neighborhood graph is isotropic, then the limiting stationary distribution follows Theorem 2.1, and the density and distances can be estimated by Corollary 2.2.

Proof.

We check the Stroock-Varadhan condition stated in Theorem 3.1. For this, we use a version of Lemma 3.2 for isotropic graphs, which requires that the ball radius vanishes and that the neighborhood size scales as .

Vanishing neighborhood radius follows because bounded tails and the fact that the kernel is evaluated on locally scaled distances ensure the isotropic graph is a subgraph of an $\varepsilon$-ball graph with vanishing radius. Kolmogorov’s strong law implies that the stochastic out-degree concentrates around its expectation. It has the correct scaling because the argument of the kernel is scaled by $\varepsilon_n$. See the supplement for details. Thus the analogue of Lemma 3.2 holds.

We then check in the supplement that the limiting local moments for isotropic graphs are proportional to those of $\varepsilon$-ball graphs. All but one of the conditions for the Stroock-Varadhan criterion follow from this; the last follows from the bounded ball structure of the connectivity kernel.

To check that we obtain the same limiting process and stationary measure, note the ratios of integrals in Theorem 3.3 are unchanged in the isotropic setting. See the supplement for details. Recovering the stationary distribution, density, and local scale is then done in the same manner as in the $\varepsilon$-ball setting. ∎

5 Distance recovery via paths

Our results in Theorem 2.1 give a consistent estimator for the density $p$ and the local scale $\varepsilon_n$. These two quantities specify up to isometry the latent metric embedding of $X_n$.

In order to reconstruct distances between non-neighbor points, we weight each edge of $G_n$ by the estimated local scale at its source vertex and find the shortest paths over this weighted graph. The results of Alamgir-von Luxburg [2, Section 4.1] show that in the $k$-nearest neighbor graph case, weights built from a consistent estimator of the local scale result in consistent recovery of pairwise distances.
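The shortest-path step can be sketched with Dijkstra's algorithm. This is a minimal illustration under an assumption: each arc leaving vertex `u` is given the weight `eps_hat[u]`, a hypothetical list of local-scale estimates, and the toy graph and values are made up for the example.

```python
import heapq


def shortest_path_lengths(out_nbrs, edge_weight, source):
    """Dijkstra's algorithm on a directed graph given as out-neighbor lists.

    `edge_weight(u, v)` must return a non-negative weight for the arc u -> v.
    Unreachable vertices keep distance infinity.
    """
    dist = [float("inf")] * len(out_nbrs)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v in out_nbrs[u]:
            cand = d_u + edge_weight(u, v)
            if cand < dist[v]:
                dist[v] = cand
                heapq.heappush(heap, (cand, v))
    return dist


# Toy directed graph; vertex 3 has no incoming arcs, so it is unreachable from 0.
out_nbrs = [[1], [2], [0], [0]]
eps_hat = [1.0, 2.0, 3.0, 0.5]  # hypothetical local-scale estimates
dists = shortest_path_lengths(out_nbrs, lambda u, v: eps_hat[u], source=0)
```

Running the routine from every source gives a full (possibly asymmetric) distance matrix, which can be symmetrized before being passed to an embedding method.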

In the supplement, we give a straightforward extension of this approach to show that given any uniformly convergent estimator of the local scale, the shortest path on the weighted graph converges to the geodesic distance. Applying standard metric multidimensional scaling then allows us to embed these distances and recover the latent space up to isometry.

Figure 1: Accuracy vs sample and neighborhood size. Path integral (green, maroon) is from Alamgir-von Luxburg [14]. Our estimator (red, blue, black) is nearly perfect at all sample sizes and neighborhood sizes.
Figure 2: Examples of four density estimates: our method (red) using no metric information is indistinguishable from metric $k$-nearest neighbor (blue) and close to ground truth (black). Path integral estimator of Alamgir-von Luxburg [14] (green) shows higher error in all cases.

6 Empirical results

We demonstrate extremely good finite sample performance of our estimator in simulated density reconstruction problems and two real-world datasets. Some details such as exact graph degrees and distribution parameters are in the supplementary code which reproduces all figures in this paper. Standard graph statistics such as centrality and Jaccard index are calculated via the igraph package [3].

$k$-nearest neighbor graphs

We compared our random-walk based estimator and the path-integral based estimator of von Luxburg-Alamgir [14] to the metric $k$-nearest neighbor density estimator. The number of samples was varied along with the sparsity level (Figure 1).

While our theoretical results suggest that both our algorithm and the path-integral estimator of von Luxburg-Alamgir [14] might fail to converge at and sparsity levels, in practice our estimator performs nearly perfectly at both low sparsity levels.

For constant degree we achieve near-perfect performance for all choices of , while the path-integral estimator fails to converge in the regime.

Some specific examples of our density estimator with are shown in Figure 2. The examples are mixture of uniforms (left), mixture of Gaussians (center), and -distribution (right). As predicted, our estimator tracks extremely closely with the metric -nearest neighbor estimator (red and blue), as well as the true density (black). The path integral estimator has high estimate variance at points with large density and fails to cope with the two mixture densities.

Varying the dimension $d$ for an isotropic multivariate normal, we find that a large number of points are required to maintain high accuracy as $d$ grows large (red and blue lines in Figure 3). However, this is due to a global ‘flattening’ of the density. Measuring the correlation between the true and estimated log probabilities shows that, up to a global concentration parameter, the estimator maintains high accuracy across a large number of dimensions (black lines).

Figure 3: Estimate performance degrades in high dimensions due to over-smoothing (blue and red), but the estimator is still highly accurate up to log concentration parameter (black).
Figure 4: Example isotropic graphs. Our estimator (black) agrees with the true density (red) in all cases. Degree and stationary distribution (green and maroon) based density estimates work for some cases (right two panels) but cannot work if the degree is tied to spatial location (left).

Kernel graphs

We validate the nonparametric estimator in Corollary 4.2 for kernel graphs by constructing three drastically different kernel graphs. In all cases, we sampled 5000 points with the connection probability following . We varied the neighborhood structure in three ways: a constant kernel, ; -nearest neighbor kernel: ; and spatially varying kernel .

In Figure 4, we find that our nonparametric estimator (black) always matches the ground truth (red). This example also shows that both the degree and the stationary distribution can be valid density estimators under certain assumptions, but only our estimator can deal with arbitrary isotropic graph construction methods without assumptions.

Figure 5: Reconstruction closely matches projection of the true metric.
Figure 6: Distances estimated by our method are globally close to the true metric.
Figure 7: Items close in our weighted graph (bottom) are more similar than those under the Jaccard index (top).

Metric recovery on real data

As an example of metric reconstruction, we take the first 2000 examples in the U.S. postal service (USPS) digits dataset [7] and construct an unweighted $k$-nearest neighbor graph. We use our method to reconstruct the metric and perform similarity queries; the Jaccard index was used to break ties among direct neighbors.

The USPS digits dataset is known to have a high-density cluster of ones digits (orange). Results in Figure 5 show that we are able to successfully recover the density structure of the data (top). Inter-point distances estimated by our method (Figure 6) show nearly linear agreement with the true metric at short distances and high similarity globally.

Performing a similarity query on the data (Figure 7) shows that our reconstructed distances (bottom row) yield a more coherent set of similar digits than the Jaccard index (top row) [8]. The behavior of the unweighted Jaccard similarity is due to a known problem with shortest paths in $k$-nearest neighbor graphs preferring low density regions [14].
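For reference, the Jaccard index used as the baseline here is the size of the intersection of two sets divided by the size of their union. A minimal sketch on out-neighbor lists follows; applying it to out-neighbor sets (rather than some other neighborhood definition) is an assumption of this example, as is the convention of returning 0 for two empty sets.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of vertex ids (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy out-neighbor lists: vertices 0 and 1 share neighbors {2, 3} out of {1, 2, 3, 4}.
out_nbrs = [[1, 2, 3], [2, 3, 4], [5]]
sim_01 = jaccard_index(out_nbrs[0], out_nbrs[1])
sim_02 = jaccard_index(out_nbrs[0], out_nbrs[2])
```

Because the index only counts shared neighbors, it carries no information about how far apart two vertices are once their neighborhoods are disjoint, which is one reason a path-based distance can rank similar items more coherently.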

Amazon co-purchasing data

Figure 8: Density estimates in the graph correlate well with sales rank, unlike other measures of centrality.
Figure 9: Embeddings from estimated distances recover the separation between different product categories.
Classics | Literature | Classical music | Philosophy
The Prince | The Stranger | Beethoven: Symphonien Nos. 5 & | The Practice of Everyday Life
The Communist Manifesto | The Myth of Sisyphus | Mozart: Symphonies Nos. 35-41 | The Society of the Spectacle
The Republic | The Metamorphosis | Mozart: Violin Concertos | The Production of Space
Wealth of Nations | Heart of Darkness | Tchaikovsky: Concerto No. 1/Rac | Illuminations
On War | The Fall | Beethoven: Symphonies Nos. 3 & | Space and Place: The Perspectiv
Table 1: Top 4 clusters formed by mapping each item to its mode (first row). Each group is a coherent genre.

Finally, we recover density and metric on a real network dataset with no ground truth. We analyzed the largest connected component of the Amazon co-purchasing network dataset [9]. Each vertex is a product on amazon.com along with its category and sales rank, and each directed edge represents a co-purchasing recommendation of the form “person who bought $x$ also bought $y$.” This dataset naturally fulfills our assumptions: its edges are asymmetric and represent a notion of similarity in some space.

The items that lie in regions of highest density should be archetypal products for a category, and therefore be more popular. The density estimates from our method show a strong positive association between density and sales (Figure 8). We found that this effect persisted regardless of the choice of the assumed dimension $d$. Other popular measures of network centrality such as betweenness and closeness fail to display this effect.

We then attempted metric recovery using our random walk based reconstruction (Figure 9). For visualization purposes, we used multidimensional scaling on the recovered metric to embed points belonging to categories with at least two hundred items. The embedding shows that our method captures the separation across different product categories. Notably, nonfiction and history have substantial overlap as expected, while classical music CD’s and computer science books have little overlap with the other clusters.

Analyzing the modes of the density estimate by clustering each point to its local mode, we find coherent clusters where top items serve as archetypes for the cluster (Table 1). This suggests that there may be a close connection between clustering in a metric space and community detection in network data. The overall performance of our method on density estimation and metric recovery for the Amazon dataset suggests that when a metric assumption is appropriate, our random walk based metric quantities can be used directly for centrality and cluster estimates on a network.
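One simple way to map each point to a local mode is hill climbing over the graph: repeatedly move to the out-neighbor with the highest estimated density until no neighbor improves on the current vertex. The sketch below shows this rule on a toy graph; it is an assumed procedure for illustration (the text does not spell out the exact rule used), and the graph and density values are made up.

```python
def assign_to_modes(out_nbrs, density):
    """Label each vertex by the local density mode reached via greedy ascent.

    From a vertex, move to the out-neighbor with the strictly highest density
    if it beats the current vertex; stop when no neighbor does. Strict
    improvement guarantees termination.
    """
    def step(i):
        best = i
        for j in out_nbrs[i]:
            if density[j] > density[best]:
                best = j
        return best

    labels = []
    for i in range(len(out_nbrs)):
        cur = i
        nxt = step(cur)
        while nxt != cur:
            cur, nxt = nxt, step(nxt)
        labels.append(cur)
    return labels


# Symmetric path graph 0 - 1 - ... - 6 with two density peaks (vertices 2 and 5).
out_nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
density = [1.0, 2.0, 3.0, 2.0, 1.0, 5.0, 4.0]
labels = assign_to_modes(out_nbrs, density)
```

Vertices sharing a label form one cluster, and the labeled vertex itself plays the role of the cluster's archetype.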

7 Conclusions

We have presented a simple explicit identity linking the stationary distribution of a random walk on a neighborhood graph to the density and neighborhood size.

The density estimator constructed by inverting this identity shows extremely rapid convergence to the metric $k$-nearest neighbor density estimator across a range of data point counts, sparsity levels, and distribution types (Figures 1 and 2). We also generalized the theorem to a large class of graph construction techniques and demonstrated that the choice of construction technique matters little for accuracy (Figure 4).

Our estimator performed well on real-world data, recovering underlying metric information in test data (Figures 5 and 6) and predicting popular Amazon products through density estimates (Figure 8).

There are several open questions left unanswered by our work. Our results required that the graphs be of degree $\omega(n^{2/(d+2)} \log(n)^{d/(d+2)})$ rather than the $\omega(\log n)$ required for connectivity. Our simulation results seem to suggest that even near the $\log(n)$ regime our estimator performs nearly perfectly, suggesting that the true degree lower bound may be much lower.

The close connection of our density estimate to PageRank suggests that combining the latent spatial map with vector space estimates may lead to highly effective and theoretically principled network algorithms.

References

  1. M. Alamgir, G. Lugosi, and U. von Luxburg. Density-preserving quantization with application to graph downsampling. In COLT, 2014.
  2. M. Alamgir and U. V. Luxburg. Shortest path distance in random k-nearest neighbor graphs. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1031–1038, 2012.
  3. G. Csardi and T. Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems:1695, 2006.
  4. L. P. Devroye and T. Wagner. The strong uniform consistency of nearest neighbor density estimates. The Annals of Statistics, pages 536–540, 1977.
  5. S. N. Ethier and T. G. Kurtz. Markov processes: characterization and convergence. John Wiley & Sons, 1986.
  6. M. Hein, J.-y. Audibert, U. V. Luxburg, and S. Dasgupta. Graph Laplacians and their convergence on random neighborhood graphs. Journal of Machine Learning Research, page 2007, 2006.
  7. J. J. Hull. A database for handwritten text recognition research. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(5):550–554, 1994.
  8. P. Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:547–579, 1901.
  9. J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.
  10. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1999.
  11. H. Risken. Fokker-Planck Equation. Springer, 1984.
  12. D. Stroock and S. Varadhan. Diffusion processes with boundary conditions. Communications on Pure and Applied Mathematics, 24:147–225, 1971.
  13. D. Ting, L. Huang, and M. I. Jordan. An analysis of the convergence of graph Laplacians. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 1079–1086, 2010.
  14. U. Von Luxburg and M. Alamgir. Density estimation from unweighted k-nearest neighbor graphs: a roadmap. In Advances in Neural Information Processing Systems, pages 225–233. Springer, 2013.
  15. W. Whitt. Some useful functions for functional limit theorems. Math. Oper. Res., 5(1):67–85, 1980.
  16. W. Woess. Random walks on infinite graphs and groups - a survey on selected topics. Bull. London Math. Soc, 26:1–60, 1994.