Single Pass Spectral Sparsification in Dynamic Streams

      Michael Kapralov
MIT
kapralov@mit.edu
   Yin Tat Lee
MIT
yintat@mit.edu
   Cameron Musco
MIT
cnmusco@mit.edu
      Christopher Musco
MIT
cpmusco@mit.edu
   Aaron Sidford
MIT
sidford@mit.edu
Abstract

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph $G$, our algorithm maintains a randomized linear sketch of the incidence matrix of $G$ into dimension $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$. Using this sketch, at any point, the algorithm can output a $(1 \pm \epsilon)$ spectral sparsifier for $G$ with high probability.

While $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$ space algorithms are known for computing cut sparsifiers in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in insertion-only streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension $\Omega(\frac{1}{\epsilon^2} n^{5/3})$ [AGM13].

To achieve our result, we show that, using a coarse sparsifier of $G$ and a linear sketch of $G$'s incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of $\ell_2/\ell_2$ sparse recovery, a natural extension of the $\ell_0$ methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers.

Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix $A^\top A$ given a stream of updates to rows in $A$.

1 Introduction

1.1 The Dynamic Semi-Streaming Model

When processing massive graph datasets arising from social networks, web topologies, or interaction graphs, computation may be as limited by space as it is by runtime. To cope with this issue, one might hope to apply techniques from the streaming model of computation, which restricts algorithms to few passes over the input and space polylogarithmic in the input size. Streaming algorithms have been studied extensively in various application domains (see [Mut05] for an overview). However, the model has proven too restrictive for even the simplest graph algorithms. For example, testing $s$-$t$ connectivity requires $\Omega(n)$ space [HRR99].

The less restrictive semi-streaming model, in which the algorithm is allowed $\tilde{O}(n)$ space, is more suited for graph algorithms [FKM05], and has received significant attention in recent years. In this model, a processor receives a stream of edges over a fixed set of $n$ nodes. Ideally, the processor should only have to perform a single pass (or few passes) over the edge stream, and the processing time per edge, as well as the time required to output the final answer, should be small.

In the dynamic semi-streaming model, the graph stream may include both edge insertions and deletions [AGM12a]. This extension captures the fact that large graphs are unlikely to be static. Dynamic semi-streaming algorithms allow us to quickly process general updates in the form of edge insertions and deletions to maintain a small-space representation of the graph from which we can later compute a result. Sometimes the dynamic model is referred to as the insertion-deletion model, in contrast to the more restrictive insertion-only model.

Work on semi-streaming algorithms in both the dynamic and insertion-only settings is extensive. Researchers have tackled connectivity, bipartiteness, minimum spanning trees, maximal matchings, and spanners among other problems [FKM05, ELMS11, Elk11, AGM12a, AGM12b]. In [McG14], McGregor surveys much of this progress and provides a more complete list of citations.

1.2 Streaming Sparsification

There has also been a focus on computing general purpose graph compressions in the streaming setting. The goal is to find a subgraph $H$ of an input graph $G$ that has significantly fewer edges than $G$, but still maintains important properties of the graph. Hopefully, this sparsified graph can be used to approximately answer a variety of questions about $G$ with reduced space and time complexity. Typically, the goal is to find a subgraph with just $O(n \log n)$ edges in comparison to the possible $O(n^2)$ edges in $G$.

First introduced by Benczúr and Karger [BK96], a cut sparsifier of a graph $G$ is a weighted subgraph with only $O(\frac{1}{\epsilon^2} n \log n)$ edges that preserves the total edge weight over every cut in $G$ to within a $(1 \pm \epsilon)$ multiplicative factor. Cut sparsifiers can be used to compute approximations for minimum cut, sparsest cut, maximum flow, and a variety of other problems over $G$. In [ST11], Spielman and Teng introduce the stronger spectral sparsifier, a weighted subgraph whose Laplacian spectrally approximates the Laplacian of $G$. In addition to maintaining the cut approximation of Benczúr and Karger, spectral sparsifiers can be used to approximately solve linear systems over the Laplacian of $G$, and to approximate effective resistances, spectral clusterings, random walk properties, and a variety of other computations.

The problem of computing graph sparsifiers in the semi-streaming model has received a lot of attention. Given just $\tilde{O}(n) = O(n \,\mathrm{polylog}(n))$ space, the hope is to compute a sparsifier using barely more space than required to store the sparsifier, which will typically have $O(n \log n)$ edges. Ahn and Guha give the first single pass, insertion-only algorithm for cut sparsifiers [AG09]. Kelner and Levin give a single pass, insertion-only algorithm for spectral sparsifiers [KL13]. Both algorithms store a sparse graph: edges are added as they are streamed in and, when the graph grows too large, it is resparsified. The construction is very clean, but inherently does not extend to the dynamic model since, to handle edge deletions, we need more information than just a sparsifier itself. Edges eliminated to create an intermediate sparsifier may become critically important later if other edges are deleted, so we need to maintain information that allows recovery of such edges.

Ahn, Guha, and McGregor make a very important insight in [AGM12a], demonstrating the power of linear graph sketches in the dynamic model. They present the first dynamic algorithm for cut sparsifiers, which initially required space and passes over the graph stream. However, the result was later improved to a single pass and space [AGM12b, GKP12]. Our algorithm extends the sketching and sampling approaches from these papers to the spectral problem.

In [AGM13], the authors show that linear graph sketches that capture connectivity information can be used to coarsely approximate spectral properties and they obtain spectral sparsifiers using space in the dynamic setting. However, they also show that their coarse approximations are tight, so a new approach is required to obtain spectral sparsifiers using just space. They conjecture that a dynamic algorithm for doing so exists. The development of such an algorithm is also posed as an open question in [McG14]. A two-pass algorithm for constructing a spectral sparsifier in the dynamic streaming model using space is presented in [KW14]. The approach is very different from ours: it leverages a reduction from spanner constructions to spectral sparsification presented in [KP12]. It is not known if this approach extends to a space efficient single pass algorithm.

1.3 Our Contribution

Our main result is an algorithm for maintaining a small graph sketch from which we can recover a spectral sparsifier. For simplicity, we present the algorithm in the case of unweighted graphs. However, in Section 6, we show that it is easily extended to weighted graphs. This model matches what is standard for dynamic cut sparsifiers [AGM12b, GKP12].

Theorem 1 (Main Result).

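There exists an algorithm that, for any $\epsilon > 0$, processes a list of edge insertions and deletions for an unweighted graph $G$ in a single pass and maintains a set of linear sketches of this input in $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$ space. From these sketches, it is possible to recover, with high probability, a weighted subgraph $H$ with $O(\frac{1}{\epsilon^2} n \log n)$ edges such that $H$ is a $(1 \pm \epsilon)$ spectral sparsifier of $G$. The algorithm recovers $H$ in $O(\frac{1}{\epsilon^2} n^2 \,\mathrm{polylog}(n))$ time.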

It is well known that independently sampling edges from a graph $G$ according to their effective resistances (i.e. leverage scores) gives a $(1 \pm \epsilon)$ spectral sparsifier of $G$ with $O(\frac{1}{\epsilon^2} n \log n)$ edges [SS11]. We can 'refine' any coarse sparsifier for $G$ by using it to approximate effective resistances and then resampling edges according to these approximate resistances. We show how to perform this refinement in the streaming setting, extending graph sketching techniques initially used for cut sparsifiers ([AGM12b, GKP12]) and introducing a new sampling technique based on an $\ell_2$ heavy hitters algorithm. Our refinement procedure is combined with a clever recursive method for obtaining a coarse sparsifier introduced by Miller and Peng in a recent paper on iterative row sampling for matrix approximation [MP12].

The fact that our algorithm maintains a linear sketch of the streamed graph allows for the simple handling of edge deletions, which are treated as negative edge insertions. Additionally, due to their linearity, our sketches are composable – sketches of subgraphs can simply be added to produce a sketch of the full graph. Thus, our techniques are directly applicable in distributed settings where separate processors hold different subgraphs or each processes different edge substreams.
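The linearity and composability described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the construction used in this paper: a dense Gaussian matrix `Pi` stands in for the sketching matrix and is stored explicitly (which a real semi-streaming algorithm could not afford), and the names `LinearSketch`, `edge_index`, and `edge_row` are hypothetical.

```python
import numpy as np

def edge_row(n, u, v):
    """Incidence-matrix row for edge (u, v): +1 in column u, -1 in column v."""
    row = np.zeros(n)
    row[u], row[v] = 1.0, -1.0
    return row

def edge_index(n, u, v):
    """Position of edge (u, v), with u < v, among the n*(n-1)/2 possible edges."""
    return u * n - u * (u + 1) // 2 + (v - u - 1)

class LinearSketch:
    """Maintains Pi @ B under edge updates without ever storing B itself."""

    def __init__(self, n, rows, seed):
        self.n = n
        m = n * (n - 1) // 2
        # Assumption: a Gaussian Pi, stored explicitly for illustration only.
        self.Pi = np.random.default_rng(seed).standard_normal((rows, m))
        self.state = np.zeros((rows, n))

    def update(self, u, v, sign):
        """sign = +1 for an edge insertion, -1 for an edge deletion."""
        u, v = min(u, v), max(u, v)
        col = self.Pi[:, edge_index(self.n, u, v)]
        self.state += sign * np.outer(col, edge_row(self.n, u, v))
```

Two processors that share the same seed can sketch disjoint substreams and add their `state` arrays afterwards; an insertion followed by a deletion of the same edge cancels out.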

Our application of linear sketching also gives a nice information theoretic result on graph compression. A spectral sparsifier is a powerful compression for a graph. It maintains, up to a $(1 \pm \epsilon)$ factor, all spectral information about the Laplacian using just $O(\frac{1}{\epsilon^2} n \log n)$ space. At first glance, it may seem that such a compression requires careful analysis of the input graph to determine what information to keep and what to discard. However, the non-adaptive linear sketches used in our algorithm are completely oblivious: at each edge insertion or deletion, we do not need to examine the current compression at all to make the appropriate update. As in sparse recovery or dimensionality reduction, we essentially just multiply the vertex edge incidence matrix by a random projection matrix, decreasing its height drastically in the process. Nevertheless, the oblivious compression obtained holds as much information as a spectral sparsifier: in fact, we show how to extract a spectral sparsifier from it. Furthermore, the compression is only larger than $O(\frac{1}{\epsilon^2} n \log n)$ by log factors. Our result is the first of this kind in the spectral domain. The only other streaming algorithm for spectral sparsification that uses $\tilde{O}(n)$ space is distinctly non-oblivious [KL13], and oblivious subspace embeddings for compressing general matrices inherently require $\Omega(n^2)$ space, even when the matrix is sparse (as in the case of an edge vertex incidence matrix) [Sar06, CW13, MM13, NN13].

Finally, it can be noted that our proofs rely very little on the fact that our data stream represents a graph. We show that, with a few modifications, given a stream of row updates for a general structured matrix , it is possible to maintain a sized sketch from which a spectral approximation to can be recovered. By structured, we mean any matrix whose rows are selected from some fixed dictionary of size . Spectral graph sparsification is a special case of this problem: set to be the vertex edge incidence matrix of our graph. The dictionary is the set of all possible edge rows that may appear in and is the graph Laplacian.

1.4 Road Map

Section 2

Lay out notation, build linear algebraic foundations for spectral sparsification, and present lemmas for graph sampling and sparse recovery required by our algorithm.

Section 3

Give an overview of our central algorithm, providing intuition and motivation.

Section 4

Present an algorithm of Miller and Peng ([MP12]) for building a chain of coarse sparsifiers and prove our main result, assuming a primitive for sampling edges by effective resistance in the streaming model.

Section 5

Develop this sampling primitive, our main technical contribution.

Section 6

Show how to extend the algorithm to weighted graphs.

Section 7

Show how to extend the algorithm to general structured matrices.

Section 8

Remove our assumption of fully independent hash functions, using a pseudorandom number generator to achieve a final small space algorithm.

2 Notation and Preliminaries

2.1 Graph Notation

Let $B_n \in \mathbb{R}^{\binom{n}{2} \times n}$ be the vertex edge incidence matrix of the undirected, unweighted complete graph over $n$ vertices. The row $b_e$ corresponding to edge $e = (u, v)$ contains a $1$ in column $u$, a $-1$ in column $v$, and $0$'s elsewhere.

We write the vertex edge incidence matrix of an unweighted, undirected graph $G(V, E)$ as $B = S B_n$, where $S$ is a $\binom{n}{2} \times \binom{n}{2}$ diagonal matrix with ones at positions corresponding to edges contained in $G$ and zeros elsewhere. (Typically, rows of $B$ that are all $0$ are removed, but we find this formulation more convenient for our purposes.) The Laplacian matrix of $G$ is given by $K = B^\top B$.
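As a concrete illustration of this notation, the following NumPy sketch builds the complete-graph incidence matrix, zeroes out the rows of absent edges with a diagonal selector, and forms the Laplacian. The function names and the variable names `Bn`, `S`, `B`, and `K` are illustrative choices, and the dense matrices are for exposition only.

```python
import itertools
import numpy as np

def complete_incidence(n):
    """Incidence matrix of the complete graph on n vertices, one row per pair u < v."""
    pairs = list(itertools.combinations(range(n), 2))
    Bn = np.zeros((len(pairs), n))
    for i, (u, v) in enumerate(pairs):
        Bn[i, u], Bn[i, v] = 1.0, -1.0
    return Bn, pairs

def graph_incidence(n, edges):
    """Incidence matrix of a graph: rows of absent edges are zeroed, not removed."""
    Bn, pairs = complete_incidence(n)
    present = {tuple(sorted(e)) for e in edges}
    S = np.diag([1.0 if p in present else 0.0 for p in pairs])
    return S @ Bn

# Path graph 0 - 1 - 2 - 3.
B = graph_incidence(4, [(0, 1), (1, 2), (2, 3)])
K = B.T @ B  # Laplacian: degrees on the diagonal, -1 for each edge
```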

2.2 Spectral Sparsification

For any matrix $B \in \mathbb{R}^{m \times n}$, $\tilde{K}$ is a $(1 \pm \epsilon)$ spectral sparsifier of $K = B^\top B$ if, for all $x \in \mathbb{R}^n$, $(1-\epsilon)\, x^\top K x \le x^\top \tilde{K} x \le (1+\epsilon)\, x^\top K x$. This condition can also be written as $(1-\epsilon) K \preceq \tilde{K} \preceq (1+\epsilon) K$, where $C \preceq D$ indicates that $D - C$ is positive semidefinite. More succinctly, $\tilde{K} \approx_\epsilon K$ denotes the same condition. We also use the slightly weaker notation $(1-\epsilon) K \preceq_r \tilde{K} \preceq_r (1+\epsilon) K$ to indicate that $(1-\epsilon)\, x^\top K x \le x^\top \tilde{K} x \le (1+\epsilon)\, x^\top K x$ for all $x$ in the row span of $K$. If $\tilde{K}$ has the same row span as $K$, this notation is equivalent to the initial notion of spectral sparsification.

While these definitions apply to general matrices, for our purposes, $B$ is typically the vertex edge incidence matrix of a graph $G$ and $K = B^\top B$ is a graph Laplacian. We do not always require our approximation $\tilde{K}$ to be the graph Laplacian of a weighted subgraph, which is a standard assumption. For this reason, we avoid the standard $L_G$ notation for the Laplacian. For our purposes, $\tilde{K}$ is always a sparse symmetric diagonally dominant matrix with no more than $O(n \log n)$ non-zero entries. In fact, it will always be the Laplacian of a sparse subgraph, but possibly with weight added to its diagonal entries. Furthermore, the final approximation returned by our streaming algorithm will be a bona fide spectral graph sparsifier, i.e. the Laplacian matrix of a weighted subgraph of $G$.
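For small dense matrices, the two-sided condition in the definition of a spectral sparsifier can be checked numerically through eigenvalues. The helper below is a minimal sketch with a hypothetical name (`is_spectral_approx`) and an assumed numerical tolerance; it tests the condition over all vectors rather than only over the row span.

```python
import numpy as np

def is_spectral_approx(K, K_tilde, eps, tol=1e-9):
    """True if (1 - eps) K <= K_tilde <= (1 + eps) K in the PSD order."""
    upper = np.linalg.eigvalsh((1 + eps) * K - K_tilde).min()
    lower = np.linalg.eigvalsh(K_tilde - (1 - eps) * K).min()
    return bool(upper >= -tol and lower >= -tol)

# Laplacian of the path graph on three vertices.
K = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
```

Scaling `K` by 1.05 yields a valid approximation for `eps = 0.1` but not for `eps = 0.01`.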

2.3 Leverage Scores and Row Sampling

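For any $B \in \mathbb{R}^{m \times n}$ with rank $r$, consider the reduced singular value decomposition, $B = U \Sigma V^\top$. $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ have orthonormal columns and $\Sigma \in \mathbb{R}^{r \times r}$ is diagonal and contains the non-zero singular values of $B$. Then, $K = B^\top B = V \Sigma U^\top U \Sigma V^\top = V \Sigma^2 V^\top$. We let $K^+$ denote the Moore-Penrose pseudoinverse of $K$:

$$K^+ = V (\Sigma^2)^{-1} V^\top.$$

The leverage score, $\tau_i$, for a row $b_i$ in $B$ is defined as

$$\tau_i = b_i^\top K^+ b_i = u_i^\top \Sigma V^\top \left(V \Sigma^{-2} V^\top\right) V \Sigma u_i = \|u_i\|_2^2 \le 1,$$

where $u_i^\top$ denotes the $i$-th row of $U$.

The last inequality follows from the fact that every row in a matrix with orthonormal columns has norm at most 1. In a graph, $\tau_i = r_i w_i$, where $r_i$ is the effective resistance of edge $i$ and $w_i$ is the edge's weight. Furthermore,

$$\sum_i \tau_i = \mathrm{tr}\left(B K^+ B^\top\right) = \|U\|_F^2 = \mathrm{rank}(B) \le n.$$

It is well known that by sampling the rows of $B$ according to their leverage scores it is possible to obtain a matrix $\tilde{B}$ such that $\tilde{K} = \tilde{B}^\top \tilde{B} \approx_\epsilon K$ with high probability. Furthermore, if obtaining exact leverage scores is computationally difficult, it suffices to sample by upper bounds on the scores. Typically, rows are sampled with replacement with probability proportional to their leverage score [SS11, LMP13]. We require an alternative procedure for sampling edges independently.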
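A small numerical check may help make leverage scores concrete. The sketch below computes them directly from the pseudoinverse for a triangle graph; the function name `leverage_scores` is an illustrative choice, and the dense pseudoinverse is used only because the example is tiny.

```python
import numpy as np

def leverage_scores(B):
    """Leverage score of each row b_i of B, computed as b_i^T (B^T B)^+ b_i."""
    K_pinv = np.linalg.pinv(B.T @ B)
    return np.einsum("ij,jk,ik->i", B, K_pinv, B)

# Incidence matrix of a triangle on vertices {0, 1, 2}.
B = np.array([[1.0, -1.0, 0.0],
              [1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])
tau = leverage_scores(B)
```

Each edge of the triangle is a unit resistor in parallel with a path of two unit resistors, so every score equals 2/3, and the scores sum to the rank of `B`, which is 2.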

Lemma 1 (Spectral Approximation via Leverage Score Sampling).

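Let $\tilde{\tau}$ be a vector of leverage score overestimates for $B$'s rows such that $\tilde{\tau}_i \ge \tau_i$ for all $i \in [m]$. For $0 < \epsilon < 1$ and fixed constant $c$, define the sampling probability for row $b_i$ to be $p_i = \min\{1, c \cdot \epsilon^{-2} \log n \cdot \tilde{\tau}_i\}$. Define a diagonal sampling matrix $W$ with $W_{i,i} = 1/p_i$ with probability $p_i$ and $W_{i,i} = 0$ otherwise. With high probability,

$$\tilde{K} = B^\top W B \approx_\epsilon K.$$

Furthermore, $W$ has $O(\|\tilde{\tau}\|_1 \cdot \epsilon^{-2} \log n)$ non-zeros with high probability.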

A proof of Lemma 1 based on a matrix concentration result from [Tro12] can be found in [CLM15] (Lemma 4). Note that, when applied to the vertex edge incidence matrix of a graph, leverage score sampling is equivalent to effective resistance sampling, as introduced in [SS11] for graph sparsification.
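Outside the streaming model, the independent sampling scheme of Lemma 1 is short to write down. The following is a minimal sketch, assuming the sampling probability takes the form min(1, c * log(n) * tau_i / eps^2) and that kept rows are reweighted by the inverse of that probability; the function name `sample_rows` and the default `c = 1.0` are illustrative and not taken from the paper.

```python
import numpy as np

def sample_rows(B, tau_tilde, eps, c=1.0, rng=None):
    """Keep each row of B independently and return (B^T W B, diagonal of W)."""
    rng = np.random.default_rng() if rng is None else rng
    n = B.shape[1]
    # Assumed form of the sampling probability, capped at 1.
    p = np.minimum(1.0, c * np.log(n) * np.asarray(tau_tilde) / eps**2)
    keep = rng.random(B.shape[0]) < p   # one independent coin flip per row
    w = np.zeros(B.shape[0])
    w[keep] = 1.0 / p[keep]             # kept rows are reweighted by 1 / p_i
    return B.T @ (w[:, None] * B), w
```

When every overestimate is large enough that all probabilities are capped at 1, no row is dropped and the output equals the original `B.T @ B` exactly.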

2.4 Sparse Recovery

While we cannot sample by leverage score directly in the streaming model, we can use a sparse recovery primitive to sample edges from a set of linear sketches. We use an $\ell_2$ heavy hitters algorithm that, for any vector $x$, lets us recover from a small linear sketch $\Phi x$ the index $i$ and the approximate value of $x_i$ for all $i$ such that $x_i > \frac{1}{\mathrm{polylog}(n)} \|x\|_2$.

Lemma 2 ($\ell_2$ Heavy Hitters).

For any $\eta > 0$, there is a decoding algorithm $D$ and a distribution on matrices $\Phi$ in $\mathbb{R}^{O(\eta^{-2} \mathrm{polylog}(N)) \times N}$ such that, for any $x \in \mathbb{R}^N$, given $\Phi x$, the algorithm $D$ returns a vector $w$ such that $w$ has $O(\eta^{-2} \mathrm{polylog}(N))$ non-zeros and satisfies

$$\|x - w\|_\infty \le \eta \|x\|_2$$

with probability $1 - N^{-c}$ over the choice of $\Phi$. The sketch $\Phi x$ can be maintained and decoded in $O(\eta^{-2} \mathrm{polylog}(N))$ space.

This procedure allows us to distinguish from a sketch whether a specified entry in $x$ is equal to 0 or has value $> 2\eta \|x\|_2$. We give a proof of Lemma 2 in Appendix A.
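To give a feel for how a linear sketch can expose heavy entries, here is a small CountSketch-style estimator. It is a swapped-in textbook technique, not necessarily the construction proved in Appendix A, and it stores its bucket and sign tables explicitly for readability (an assumption that ignores the small-space requirement). The class name and the parameters `width` and `depth` are hypothetical.

```python
import numpy as np

class CountSketch:
    """Linear sketch of a vector x supporting signed updates and point estimates."""

    def __init__(self, N, width, depth, seed=0):
        rng = np.random.default_rng(seed)
        # Explicit random tables stand in for hash functions (illustration only).
        self.bucket = rng.integers(0, width, size=(depth, N))
        self.sign = rng.choice([-1.0, 1.0], size=(depth, N))
        self.table = np.zeros((depth, width))

    def update(self, i, delta):
        """Apply x[i] += delta; negative delta handles deletions."""
        for r in range(self.table.shape[0]):
            self.table[r, self.bucket[r, i]] += self.sign[r, i] * delta

    def estimate(self, i):
        """Median over rows of the signed bucket holding coordinate i."""
        rows = np.arange(self.table.shape[0])
        return np.median(self.sign[:, i] * self.table[rows, self.bucket[:, i]])

N = 200
cs = CountSketch(N, width=64, depth=9, seed=1)
for i in range(N):
    cs.update(i, 0.1)   # light background entries
cs.update(7, 9.9)       # coordinate 7 now has value 10 and dominates the norm
```

Querying `cs.estimate(7)` should return a value close to 10, and after `cs.update(7, -10.0)` the same query should return a value close to 0, which is the zero-versus-heavy distinction the text relies on.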

3 Algorithm Overview

Before formally presenting a proof of our main result, Theorem 1, we give an informal overview of the algorithm to provide intuition.

3.1 Effective Resistances

As explained in Section 2.3, spectral sparsifiers can be generated by sampling edges, i.e. rows of the vertex edge incidence matrix. For an unweighted graph $G$, each edge $e$ is sampled independently with probability proportional to its leverage score, $\tau_e$. After sampling, we reweight and combine any sampled edges. The result is a subgraph of $G$ containing, with high probability, $O(\frac{1}{\epsilon^2} n \log n)$ edges and spectrally approximating $G$.

If we view $G$ as an electrical circuit, with each edge representing a unit resistor, the leverage score of an edge $e = (u, v)$ is equivalent to its effective resistance. This value can be computed by forcing 1 unit of current out of vertex $u$ and 1 unit of current into vertex $v$. The resulting voltage difference between the two vertices is the effective resistance of $e$. Qualitatively, if the voltage drop is low, there are many low resistance (i.e. short) paths between $u$ and $v$. Thus, maintaining a direct connection between these vertices is less critical in approximating $G$, so $e$ is less likely to be sampled. Effective resistance can be computed as:

$$\tau_e = b_e^\top K^+ b_e.$$

Note that $\tau_e$ can be computed for any pair of vertices, $(u, v)$, or in other words, for any possible edge in $G$. We can evaluate $\tau_e$ even if $e$ is not present in the graph. Thus, we can reframe our sampling procedure. Instead of just sampling edges actually in $G$, imagine we run a sampling procedure for every possible $e$. When recombining edges to form a spectral sparsifier, we separately check whether each edge $e$ is in $G$ and only insert it into the sparsifier if it is.

3.2 Sampling in the Streaming Model

With this procedure in mind, a sampling method that works in the streaming setting requires two components. First, we need to obtain a constant factor approximation to for any . Known sampling algorithms, including our Lemma 1, are robust to this level of estimation. Second, we need to compress our edge insertions and deletions in such a way that, during post-processing of our sketch, we can determine whether or not a sampled edge actually exists in .

The first requirement is achieved through the recursive procedure given in [MP12]. We will give the overview shortly but, for now, assume that we have access to a coarse sparsifier, $\tilde{K} \approx_{1/2} K$. Computing $b_e^\top \tilde{K}^+ b_e$ gives a 2 factor multiplicative approximation of $\tau_e$ for each $e$. Furthermore, as long as $\tilde{K}$ has sparsity $O(n \,\mathrm{polylog}(n))$, the computation can be done in small space using an iterative system solver (e.g. conjugate gradient) or a nearly linear time solver for symmetric diagonally dominant matrices (e.g. [KMP11]).

Solving part two (determining which edges are actually in $G$) is a bit more involved. As a first step, consider writing

$$\tau_e = b_e^\top K^+ K K^+ b_e = \|B K^+ b_e\|_2^2 = \|S B_n K^+ b_e\|_2^2.$$

Referring to Section 2, recall that $B = S B_n$ is exactly the same as a standard vertex edge incidence matrix except that rows in $B_n$ corresponding to nonexistent edges are zeroed out instead of removed. Denote $x_e = S B_n K^+ b_e$. Each nonzero entry in $x_e$ contains the voltage difference across some edge (resistor) in $G$ when one unit of current is forced from $u$ to $v$.

When $e$ is not in $G$, then the $e$-th entry of $x_e$, $x_e(e)$, is $0$. If $e$ is in $G$, $x_e(e) = \tau_e$. Furthermore, $\|x_e\|_2^2 = \tau_e$. Given a space allowance of $\mathrm{polylog}(n)$, the sparse recovery algorithm from Lemma 2 allows us to recover an entry if it accounts for at least an $\Omega(1/\mathrm{polylog}(n))$ fraction of the total $\ell_2$ norm. Currently, $x_e(e)/\|x_e\|_2 = \sqrt{\tau_e}$, which could be much smaller than $O(1/\mathrm{polylog}(n))$. However, suppose we had a sketch of $x_e$ with all but a $\tau_e$ fraction of edges randomly sampled out. Then, we would expect $\|x_e\|_2^2 \approx \tau_e^2$ and thus $x_e(e)/\|x_e\|_2 = O(1) = \Omega(1/\mathrm{polylog}(n))$, and sparse recovery would successfully indicate whether or not $e \in G$. What's more, randomly zeroing out entries of $x_e$ can serve as our main sampling routine for edge $e$. This process will set $x_e(e) = 0$ with probability $(1 - \tau_e)$, exactly what we wanted to sample by in the first place.
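The identities used in this argument can be verified numerically on a small graph. The sketch below builds the zero-padded incidence matrix, forms the voltage vector for every possible edge, and leaves the two facts to be checked: its squared norm equals the effective resistance between the endpoints, and its own coordinate equals that resistance when the edge is present and zero otherwise. The example graph and all variable names are illustrative assumptions.

```python
import itertools
import numpy as np

n = 5
pairs = list(itertools.combinations(range(n), 2))
Bn = np.zeros((len(pairs), n))
for i, (u, v) in enumerate(pairs):
    Bn[i, u], Bn[i, v] = 1.0, -1.0

# A 5-cycle plus one chord; absent edges keep an all-zero row in B.
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 2)}
S = np.diag([1.0 if p in edges else 0.0 for p in pairs])
B = S @ Bn
K_pinv = np.linalg.pinv(B.T @ B)

def voltage_vector(e_idx):
    """x_e = B K^+ b_e: voltage drops across the edges of G for a unit current on pair e."""
    return B @ K_pinv @ Bn[e_idx]

def resistance(e_idx):
    """tau_e = b_e^T K^+ b_e, defined for every vertex pair, present or not."""
    return Bn[e_idx] @ K_pinv @ Bn[e_idx]
```

For a present edge such as `(0, 1)` the vector's own entry equals `resistance`, while for an absent pair such as `(1, 3)` that entry is exactly zero even though the resistance is positive.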

However, how do we go about sketching every appropriately sampled $x_e$? Consider subsampling our graph at geometrically decreasing rates, $1/2^s$ for $s \in \{0, 1, \ldots, O(\log n)\}$. Maintain linear sketches $\Pi_0 B_0, \Pi_1 B_1, \ldots$ of the vertex edge incidence matrix $B_s$ of every subsampled graph, using the sparse recovery sketch distribution from Lemma 2. When asked to output a spectral sparsifier, for every possible edge $e$, we compute using $\tilde{K}$ a rate $1/2^s$ that approximates $\tau_e$.

Since each sketch is linear, we can just multiply $\Pi_s B_s$ on the right by $\tilde{K}^+ b_e$ to compute

$$\Pi_s B_s \tilde{K}^+ b_e \approx \Pi_s x_e^{(s)},$$

where $x_e^{(s)}$ is $x_e$ sampled at rate $1/2^s \approx \tau_e$. Then, as explained, we can use our sparse recovery routine to determine whether or not $e$ is present. If it is, we have obtained a sample for our spectral sparsifier.

3.3 A Chain of Coarse Sparsifiers

The final required component is access to some sparse $\tilde{K} \approx_{1/2} K$. This coarse sparsifier is obtained recursively by constructing a chain of matrices, each weakly approximating the next. Specifically, imagine producing $K(d)$ by adding a fairly light identity matrix to $K$. As long as the identity's weight is small compared to $K$'s spectrum, $K(d)$ approximates $K$. Add even more weight to the diagonal to form $K(d-1)$. Again, as long as the increase is small, $K(d-1)$ approximates $K(d)$. We continue down the chain until $K(0)$, which will actually have a heavy diagonal after all the incremental increases. Thus, $K(0)$ can be approximated by an appropriately scaled identity matrix, which is clearly sparse. Miller and Peng show that parameters can be chosen such that $d = O(\log n)$ [MP12].

Putting everything together, we maintain $O(\log n)$ sketches for $K, K(d), K(d-1), \ldots, K(0)$. We first use a weighted identity matrix as a coarse approximation for $K(0)$, which allows us to recover a good approximation to $K(0)$ from our sketch. This approximation will in turn be a coarse approximation for $K(1)$, so we can recover a good sparsifier of $K(1)$. Continuing up the chain, we eventually recover a good sparsifier for our final matrix, $K$.

4 Recursive Sparsifier Construction

In this section, we formalize a recursive procedure for obtaining a chain of coarse sparsifiers that was introduced by Miller and Peng – “Introduction and Removal of Artificial Bases” [MP12]. We prove Theorem 1 by combining this technique with the sampling algorithm developed in Section 5.

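Theorem 2 (Recursive Sparsification – [MP12], Section 4).

Consider any PSD matrix $K$ with maximum eigenvalue bounded from above by $\lambda_u$ and minimum non-zero eigenvalue bounded from below by $\lambda_l$. Let $d = \lceil \log_2(\lambda_u / \lambda_l) \rceil$. For $\ell \in \{0, 1, 2, \ldots, d\}$, define

$$\gamma(\ell) = \frac{\lambda_u}{2^\ell}.$$

So, $\gamma(d) \le \lambda_l$ and $\gamma(0) = \lambda_u$. Then the chain of PSD matrices, $[K(0), K(1), \ldots, K(d)]$ with

$$K(\ell) = K + \gamma(\ell) I_{n \times n}$$

satisfies the following relations:

  1. $K \preceq_r K(d) \preceq_r 2K$,

  2. $K(\ell) \preceq K(\ell - 1) \preceq 2K(\ell)$ for all $\ell \in \{1, \ldots, d\}$,

  3. $K(0) \preceq 2\gamma(0) I \preceq 2K(0)$.

When $K$ is the Laplacian of an unweighted graph, its largest eigenvalue $\lambda_{\max}(K) \le 2n$ and its smallest non-zero eigenvalue $\lambda_{\min}(K) \ge 8/n^2$. Thus the length of our chain, $d = \lceil \log_2(\lambda_u / \lambda_l) \rceil$, is $O(\log n)$.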
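The chain construction can be sanity-checked on a small Laplacian. The sketch below assumes the diagonal weights halve at each level, starting from an upper bound on the largest eigenvalue, as in the Miller and Peng scheme; the helper names `sparsifier_chain` and `psd_leq` are hypothetical, and exact eigenvalues are used as the two bounds purely for convenience.

```python
import math
import numpy as np

def sparsifier_chain(K, lam_u, lam_l):
    """Return [K(0), ..., K(d)] with K(l) = K + gamma(l) I and gamma(l) = lam_u / 2^l."""
    d = math.ceil(math.log2(lam_u / lam_l))
    gammas = [lam_u / 2**l for l in range(d + 1)]
    return [K + g * np.eye(K.shape[0]) for g in gammas], gammas

def psd_leq(A, B, tol=1e-9):
    """True if B - A is positive semidefinite up to a numerical tolerance."""
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)

# Laplacian of the path graph on four vertices.
K = np.array([[1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]])
eigs = np.linalg.eigvalsh(K)
lam_u, lam_l = eigs[-1], eigs[1]   # largest and smallest non-zero eigenvalue
chain, gammas = sparsifier_chain(K, lam_u, lam_l)
```

Each consecutive pair in `chain` differs by at most a factor of 2 in the PSD order, the heaviest matrix `chain[0]` is within a factor of 2 of a scaled identity, and the lightest matrix `chain[-1]` is within a factor of 2 of `K` on its row span.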

For completeness, we include a proof of Theorem 2 in Appendix B. Now, to prove our main result, we need to state the sampling primitive for streams that we develop in Section 5. This procedure maintains a linear sketch of a vertex edge incidence matrix , and using a coarse sparsifier of , performs independent edge sampling as required by Lemma 1, to obtain a better sparsifier of .

Theorem 3.

Let be the vertex edge incidence matrix of an unweighted graph , specified by an insertion-deletion graph stream. Let be a fixed parameter and consider . For any , there exists a sketching procedure that outputs an sized sketch . There exists a corresponding recovery algorithm RefineSparsifier running in space, such that, if is a spectral approximation to with non-zeros and for some constant then:

returns, with high probability, , where , and contains only reweighted rows of with high probability. RefineSparsifier runs in time.

Using this sampling procedure, we can initially set and use it to obtain a sparsifier for from a linear sketch of . This sparsifier is then used on a second sketch of to obtain a sparsifier for , and so on. Working up the chain, we eventually obtain a sparsifier for our original . While sparsifier recovery proceeds in several levels, we construct all required sketches in a single pass over edge insertions and deletions. Recovery is performed in post-processing.

Proof of Theorem 1.

Let be the Laplacian of our graph . Process all edge insertions and deletions, using MaintainSketches to produce a sketch, for each . We then use Theorem 3 to recover an approximation, , for any given an approximation for . First, consider the base case, . Let:

By Theorem 2, Relation 3:

Thus, with high probability, and contains entries.

Now, consider the inductive case. Suppose we have some such that . Let:

By Theorem 2, Relation 2:

Furthermore, by assumption we have the inequalities:

Thus:

So, with high probability RefineSparsifier returns such that and contains just nonzero elements. It is important to note that there is no “compounding of error” in this process. Every is an approximation for . Error from using instead of is absorbed by a constant factor increase in the number of rows sampled from . The corresponding increase in sparsity for does not compound – in fact Theorem 3 is completely agnostic to the sparsity of the coarse approximation used.

Finally, to obtain a bonafide graph sparsifier (a weighted subgraph of our streamed graph), let:

As in the inductive case,

Thus, it follows that, with high probability, has sparsity and . Since we set to 0 for this final step, simply equals for some that contains reweighted rows of . Any vector in the kernel of is in the kernel of , and thus any vector in the kernel of is in the kernel of . Thus, we can strengthen our approximation to:

We conclude that is the Laplacian of some graph containing reweighted edges and approximating spectrally to precision . Finally, note that we require recovery steps, each running in time. Thus, our total recovery time is . ∎

5 Streaming Row Sampling

In this section, we develop the sparsifier refinement routine required for Theorem 1.

Proof of Theorem 3.

Outside of the streaming model, given full access to rather than just a sketch it is easy to implement RefineSparsifier via leverage score sampling. Letting denote appending the rows of one matrix to another, we can define , so . Since and , for any row of we have

Let be the leverage score of approximated using . Let be the vector of approximate leverage scores, with the leverage scores of the rows corresponding to rounded up to . While not strictly necessary, including rows of the identity with probability will simplify our analysis in the streaming setting. Using this in Lemma 1, we can obtain with high probability. Since , we can write , where contains reweighted rows of with high probability.

The challenge in the semi-streaming setting is actually sampling edges given only a sketch of . The general idea is explained in Section 3, with detailed pseudocode included below.

Streaming Sparsifier Refinement

:
  1. For let be a uniform hash function. Let be with all rows except those with zeroed out. So is with rows sampled independently at rate . is simply .

  2. Maintain sketches where are drawn from the distribution from Lemma 2 with .

  3. Output all of these sketches stacked: .



:
  1. Compute for each .

  2. For every edge in the set of possible edges:

    1. Compute and , where is the oversampling constant from Lemma 1. Choose such that .

    2. Compute and run the heavy hitters algorithm of Lemma 2. Determine whether or not or by checking whether the returned .

    3. If it is determined that set .

  3. Output .

We show that every required computation can be performed in the dynamic semi-streaming model and then prove the correctness of the sampling procedure.

Implementation in the Semi-Streaming Model.

Assuming access to uniform hash functions, MaintainSketches requires space in total and can be implemented in the dynamic streaming model. When an edge insertion comes in, use to compute which ’s should contain the inserted edge, and update the corresponding sketches. For an edge deletion, simply update the sketches to add to each appropriate .
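The update rule above depends on deciding, consistently for insertions and deletions, which subsampled graphs contain a given edge. One way to sketch this is to read successive bits of a hash of the edge as the per-level coin flips, so that an edge survives to level s only if its first s bits are all 1. The code below is an illustration under stated assumptions: SHA-256 stands in for the uniform hash functions, and the names `max_level` and `levels_to_update` are hypothetical.

```python
import hashlib

def max_level(u, v, num_levels, seed=0):
    """Deepest sampling level containing edge (u, v); level s keeps an edge w.p. about 2^-s."""
    key = f"{seed}:{min(u, v)}:{max(u, v)}".encode()
    # Assumption: the low 64 bits of SHA-256 behave like independent fair coins.
    bits = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    level = 0
    while level < num_levels - 1 and (bits >> level) & 1:
        level += 1
    return level

def levels_to_update(u, v, num_levels, seed=0):
    """Levels whose sketch must be updated when edge (u, v) is inserted or deleted."""
    return range(max_level(u, v, num_levels, seed) + 1)
```

Because the decision depends only on the edge and the seed, a deletion touches exactly the sketches that the matching insertion touched, so the two updates cancel at every level.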

Unfortunately, storing uniform hash functions over requires space, and is thus impossible in the semi-streaming setting. In Section 8 we show how to cope with this issue by using a small-seed pseudorandom number generator.

Step 1 of RefineSparsifier can also be implemented in space. Since has non-zeros and has rows, computing requires linear system solves in . We can use an iterative algorithm or a nearly linear time solver for symmetric diagonally dominant matrices to find solutions in space total.

For step 2(a), the chosen to guarantee could in theory be larger than the index of the last sketch maintained. However, if we take samplings, our last will be empty with high probability. Accordingly, all samplings for higher values of can be considered empty as well and we can just skip steps 2(b) and 2(c) for such values of . Thus, sampling levels are sufficient.

Finally, by our requirement that is able to compute factor leverage score approximations, with high probability, Step 2 samples at most edges in total (in addition to selecting identity edges). Thus, the procedure’s output can be stored in small space.

Correctness

To apply our sampling lemma, we need to show that, with high probability, RefineSparsifier independently samples each row of with probability where . Since the algorithm samples the rows of with probability , and since for all , by Lemma 1, with high probability, is a spectral sparsifier for . Furthermore, contains reweighted rows of .

In RefineSparsifier, an edge is only included in if it is included in the where

The probability that is included in the sampled matrix is simply , and sampling is done independently using uniform hash functions. So, we just need to show that, with high probability, any included in its respective is recovered by Step 2(b).

Let and . As explained in Section 3,

(1)

Furthermore, we can compute:

(Since )
(Since )
(2)

Now, writing , we expect to equal . We want to argue that the norm falls close to this value with high probability. This follows from claiming that no entry in is too large. For any edge define:

Lemma 3.

.

Proof.

Consider . Let and . If we have then

which implies as desired.

Now, is a weighted graph Laplacian added to a weighted identity matrix. Thus it is full rank and diagonally dominant. Since it has full rank, . Since is diagonally dominant and since is zero everywhere except at and , it must be that is the maximum value of and is the minimum value. So and .

From Lemma 3, the vector has all entries (and thus all squared entries) in so we can apply a Chernoff/Hoeffding bound to show concentration for . Specifically, we use the standard multiplicative bound [Hoe63]:

(3)

Since

(4)

we can set and conclude that

Accordingly, with high probability for some constant and .

Now, if , then our sparse recovery routine must return an estimated value for that is . We set