Decoding binary node labels from censored edge measurements

Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery

Emmanuel Abbe, Afonso S. Bandeira, Annina Bracher, and Amit Singer Emmanuel Abbe is with the Program in Applied and Computational Mathematics (PACM) and the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA. E-mail: eabbe@princeton.edu. Afonso S. Bandeira is with PACM, Princeton University, Princeton, NJ 08544, USA. E-mail: ajsb@math.princeton.edu. Annina Bracher is with the Department of Electrical Engineering, Swiss Federal Institute of Technology, Zurich, ZH 8092, CH. E-mail: bracher@isi.ee.ethz.ch. Amit Singer is with the Department of Mathematics and PACM, Princeton University, Princeton, New Jersey 08544, USA. E-mail: amits@math.princeton.edu.
Abstract.

We consider the problem of clustering a graph into two communities by observing a subset of the vertex correlations. Specifically, we consider the inverse problem with observed variables $Y = B_G x \oplus Z$, where $B_G$ is the incidence matrix of a graph $G$, $x$ is the vector of unknown vertex variables (with a uniform prior), and $Z$ is a noise vector with i.i.d. Bernoulli($\varepsilon$) entries. All variables and operations are Boolean. This model is motivated by coding, synchronization, and community detection problems. In particular, it corresponds to a stochastic block model or a correlation clustering problem with two communities and censored edges. Without noise, exact recovery (up to global flip) of $x$ is possible if and only if the graph $G$ is connected, with a sharp threshold at the edge probability $\log(n)/n$ for Erdős-Rényi random graphs. The first goal of this paper is to determine how the edge probability needs to scale to allow exact recovery in the presence of noise. Defining the degree rate of the graph by , it is shown that exact recovery is possible if and only if . In other words, is the information theoretic threshold for exact recovery at low-SNR. In addition, an efficient recovery algorithm based on semidefinite programming is proposed and shown to succeed in the threshold regime up to twice the optimal rate. For a deterministic graph $G$, defining the degree rate as , where is the minimum degree of the graph, it is shown that the proposed method achieves the rate , where is the spectral gap of the graph $G$.

A preliminary version of this paper appeared in ISIT 2014 [conferenceversion]. This version will appear in the IEEE Transactions on Network Science and Engineering.

1. Introduction

A large variety of problems in information theory, machine learning, and image processing are concerned with inverse problems on graphs, i.e., problems where a graphical structure governs the dependencies between the variables that are observed and the variables that are unknown. In simple cases, the dependency model is captured by an undirected graph with the unknown variables attached at the vertices and the observed variables attached at the edges. Let be a graph with vertex set and edge set , and let be the vertex- and the edge-variables. In many cases of interest (detailed below), the probabilistic model for the edge-variables conditionally on the vertex-variables has a simple structure: it factorizes as

$$P(y_E \mid x_V) = \prod_{e \in E} Q(y_e \mid x[e]), \tag{1}$$

where $y_e$ denotes the variable attached to edge $e$, $x[e]$ denotes the two vertex-variables incident to edge $e$, and $Q$ is a local probability kernel. In this paper, we consider Boolean edge- and vertex-variables, and assume that the kernel $Q$ is symmetric and depends only on the XOR of the vertex-variables. [Footnote 1: Symmetry means that for some that satisfies .] The edge-variables can then be viewed as a random vector $Y_E$ that satisfies

$$Y_E = B_G x_V \oplus Z_E, \tag{2}$$

where is the incidence matrix of the graph, i.e., the matrix, with and , such that if and only if edge is incident to vertex , and is a random vector of dimension representing the noise.
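As an illustration of model (2), the following Python sketch builds the incidence matrix of a small graph and generates the edge observations. The function names, the toy graph, and the use of Python's `random` module are assumptions made for this example only; they are not part of the paper.

```python
import random

def incidence_matrix(n, edges):
    """Return the |E| x |V| Boolean incidence matrix B_G as a list of rows."""
    rows = []
    for (u, v) in edges:
        row = [0] * n
        row[u] = 1
        row[v] = 1
        rows.append(row)
    return rows

def observe(B, x, eps, rng):
    """Compute Y_E = B_G x_V XOR Z_E with Z_E i.i.d. Bernoulli(eps)."""
    y = []
    for row in B:
        parity = sum(b * xi for b, xi in zip(row, x)) % 2  # x_u XOR x_v
        noise = 1 if rng.random() < eps else 0
        y.append(parity ^ noise)
    return y

rng = random.Random(0)                      # fixed seed, illustrative only
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]    # hypothetical toy graph
x = [0, 1, 1, 0]                            # hypothetical vertex labels
B = incidence_matrix(4, edges)
y_clean = observe(B, x, 0.0, rng)           # eps = 0: noiseless parities
y_flipped = observe(B, [1 - b for b in x], 0.0, rng)
```

In the noiseless case `y_clean` and `y_flipped` coincide, which reflects the fact that the labels can only be identified up to a global flip.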

In the above setting, the forward problem of recovering the most likely edge-variables given the vertex-variables is trivial and amounts to maximizing for each edge. The inverse problem, however, is more challenging: the most likely vertex-variables (say with a uniform prior) given the edge-variables cannot be found by local maximization.

This problem can be interpreted as a community detection problem with censored edges: Consider a population with vertices and two communities, the blues and the reds. The colors of the vertices, encoded by the binary variables , are unknown and the goal is to recover them by observing pairwise interactions of these nodes. However, not all interactions are observed, only the ones encoded by the graph $G$. In the noiseless case, the observation is perfect and allows one to determine whether and are in the same community or not, i.e., . Hence, recovering the partition in this case requires a connected graph $G$, and the recovery is obtained by picking a vertex label and recovering the other vertices along any spanning tree. Note that we can only hope to recover the partition and not the exact colors, as a global flip of all the colors gives the same observations. In the more interesting setting, the observations are assumed to be noisy, i.e., with probability $\varepsilon$ an error is made on the parity of the two colors: , where the ’s are i.i.d. Bernoulli. In this case, the connectivity of $G$ is a necessary condition, but it is in general not sufficient to cope with the noise. This paper investigates how to strengthen the connectivity assumption, in terms of the edge probability for random graphs or in terms of the spectral gap for deterministic graphs, in order to recover the partition despite the noise.
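The noiseless recovery along a spanning tree can be sketched in a few lines of Python. This is a minimal sketch under the assumption that edges are given as vertex pairs with their observed parities; the function name and the breadth-first traversal are illustrative choices, and any spanning tree would serve.

```python
from collections import deque

def recover_noiseless(n, edges, y):
    """Propagate labels along a BFS spanning tree rooted at vertex 0.

    Returns None if the graph is disconnected, in which case the
    partition cannot be recovered even without noise.
    """
    adj = [[] for _ in range(n)]
    for (u, v), parity in zip(edges, y):
        adj[u].append((v, parity))
        adj[v].append((u, parity))
    labels = [None] * n
    labels[0] = 0  # arbitrary choice that fixes the global flip
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, parity in adj[u]:
            if labels[v] is None:
                labels[v] = labels[u] ^ parity  # x_v = x_u XOR y_uv
                queue.append(v)
    return None if any(l is None for l in labels) else labels

edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical path graph
y = [1, 0, 1]                     # noiseless parities of x = [0, 1, 1, 0]
x_hat = recover_noiseless(4, edges, y)
disconnected = recover_noiseless(4, [(0, 1), (2, 3)], [1, 1])
```

With noisy observations a single corrupted tree edge flips every label beyond it, which is why connectivity alone is not sufficient in the noisy setting.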

There are various interpretations and models that connect to this problem.

• Community detection: It is worth connecting the above model to other existing models for community networks. The model in (1) can be seen as a general probabilistic model of networks, that extends the basic Erdős-Rényi model [ER-seminal], which often turns out to be too simplistic since all vertices have the same expected degree and no cluster structure appears. One possibility to obtain cluster structure is precisely to attach latent variables to the vertices and assume an edge distribution that depends on these variables. There are various models with latent variables, such as the exchangeable, inhomogeneous or stochastic block models [airoldi, sbm1, sbm2, dyer, newman2, sbm-book]. The general model in (1) can be used for this purpose, as explained above in the special case of (2). The vertex-variables represent the community assignment, the edge-variables the connectivity, and the graph encodes where the information is available. The model (2) is related to the stochastic block model through the following censored block model, introduced in [random] in a different context. Given a base-graph and a community assignment , the following random graph is generated on the vertex set with ternary edge labels drawn independently with the following probability distribution:

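$$
\begin{aligned}
P\{E_{ij} = * \mid E(G)_{ij} = 0\} &= 1, && \text{(3a)} \\
P\{E_{ij} = 1 \mid X_i = X_j,\ E(G)_{ij} = 1\} &= q_1, && \text{(3b)} \\
P\{E_{ij} = 1 \mid X_i \neq X_j,\ E(G)_{ij} = 1\} &= q_2. && \text{(3c)}
\end{aligned}
$$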

Put differently, (3) is a graph model where information is only available on the base-graph $G$, the $*$-variable encodes the absence of information, and when information is available, two vertices are connected with probability $q_1$ if they are in the same community and with probability $q_2$ if they are in different communities. When $G$ is the complete graph and is uniformly distributed, this is the standard stochastic block model with two communities, and , gives the sparse regime of [decelle, mossel-sbm]. In the case of (2), the linear structure implies , which may both be of order 1, whereas the base-graph may be sparse. This raises an important distinction: in the sparse stochastic block model, it is assumed that most node pairs are unlikely to be connected, whereas in the model of this paper, it is assumed that information is not available for most node pairs. These are not the same, and the latter may help prevent false-alarm type errors. However, we restrict ourselves in this paper to the symmetric case , which simplifies the computations.

• Correlation clustering: [corr_cluster] considers the problem of clustering a complete graph with edges labeled in in order to maximize the number of agreeing edges (having a label within a cluster and a label otherwise). Another variant is proposed in [nicolo]. The original motivation behind correlation clustering is to let the number of clusters be a design parameter, although the case of constraining the number of clusters has also been considered [corr_cluster_k]. In our setting, the number of clusters is fixed and assumed to be 2. More importantly, our goal is to understand how sparse the measurement graph can be in order to still be able to recover the original clustering, which is planted. In that regard, we are proposing a planted correlation clustering problem with a fixed number of clusters, censored measurements, and with a probabilistic model.

• Coding: Equation (2) provides the output on a binary symmetric channel of a code whose generator matrix is the incidence matrix of the graph $G$. More precisely, since $G$ here is assumed to be a graph and not a hyper-graph, this is a very simple code, namely a 2-right-degree LDGM code. While this is not a particularly interesting code by itself (e.g., at any fixed rate, it has a constant fraction of isolated vertices), it is a relevant primitive for the construction of other codes such as LT or Raptor codes [kumar, LT-shokro]. Note that this paper will consider such a code at a vanishing rate, namely , and determine for which values of the successful decoding of this code is still possible. Somewhat unexpectedly, the Shannon capacity will also arise in this regime as shown in our main results.

• Constraint satisfaction problems: (1) is a particular case of the graphical channel studied in [random] in the context of hypergraphs. This class of models allows in particular to recover instances of planted constraint satisfaction problems (CSPs) by choosing uniform kernels , where the vertex-variables represent the planted assignment and the edge-variables represent the clauses. In the case of a simple graph and not a hypergraph, this provides a model for planted formulae such as 2-XORSAT (model (2)).

• Synchronization: Equation (2) also results from the synchronization problem studied in [ASinger_2011_angsync, Bandeira_Singer_Spielman_OdCheeger, Wang_RobustSynchronization, Alexeev_PhaseRetrievalPolarization, boumal], if the dimension is one (e.g., when each vertex-variable is the 1-bit quantization of the reflection of a signal). The goal in synchronization over , the group of orthogonal matrices of size [Footnote 2: Note that denotes the group of orthogonal matrices of size and does not refer to the big-O notation frequently used in algorithm analysis.], is to recover the original values of the node-variables in given the relative measurements , where is randomly drawn in if the vertices and are adjacent and all-zero otherwise. [Footnote 3: If is the identity matrix, then the measurement is noise-free.] When , we have and the synchronization problem is equivalent to (2).

While the above-mentioned problems are all concerned with related inverse problems on graphs, there are various recovery goals that can be considered. This paper focuses on exact recovery, which requires all vertex-variables to be recovered simultaneously with high probability as the number of vertices diverges. The probability measure may depend on the graph ensemble or simply on the kernel if the graph is deterministic. Note, as mentioned previously, that exact recovery of all variables in the model (2) is not quite possible: the vertex-variables and produce the same output . Exact recovery is meant “up to a global flipping of the variables”. For partial recovery, only a strictly dominant constant fraction of the vertex-variables needs to be recovered correctly with high probability as the number of vertices diverges. Put differently, the true assignment need only be positively correlated with the reconstruction. [Footnote 4: We recently became aware that [Heimlicher_SBM] studies partial recovery for the model of this paper.] The recovery requirements vary with the applications, e.g., exact recovery is typically required in coding theory to ensure reliable communication, while both exact and partial recovery are of interest in community detection problems.

This paper focuses on exact recovery for the linear model (2) with Boolean variables, and on random Erdős-Rényi and deterministic base-graphs . For this setup, we identify the information theoretic (IT) phase transition for exact recovery in terms of the edge density of the graph and the noise level and devise an efficient algorithm based on semidefinite programming (SDP), which approaches the threshold up to a factor of in the Erdős-Rényi case. This SDP based method was first proposed in [ASinger_2011_angsync], and it shares many aspects with the SDP methods in several other problems [So_MIMO, Huang_Guibas_Graphics].

2. Related work

While writing this paper we became aware of various exciting related work that was being independently developed:

A similar exact recovery sufficient condition, as (LABEL:conditionneededdual) for the SDP, was independently obtained by Huang and Guibas [Huang_Guibas_Graphics] in the context of consistent shape map estimation (see Theorem 5.1. in [Huang_Guibas_Graphics]). Their analysis goes on to show, essentially, that as long as the probability of a wrong edge is a constant strictly smaller than , the probability of exact recovery converges to as the size of the graph is arbitrarily large. In the context of our particular problem, that claim was also shown in [Wang_RobustSynchronization]. Later, this analysis was improved by Chen, Huang, and Guibas [Chen_Huang_Guibas_Graphics] and, when restricted to our setting, it includes guarantees on the rates at which this phase transition happens. However, these rates are, to the best of our knowledge, only optimal up to polylog factors. On the other hand, we are able to show near tight rates. For a given that is arbitrarily close to we give an essentially-tight bound (off by at most a factor of ) on the size of the graph and edge density needed for exact recovery (Theorem LABEL:SDP_ErdosReinyi). To the best of our knowledge, our Theorem LABEL:SDPtheoremfordregular is the only available result for deterministic graphs.

On the IT side, both converse and direct guarantees were independently obtained by Chen and Goldsmith [Chen_Goldsmith_ISIT2014]. However, while considering a more general problem, the results they obtain are only optimal up to polylog factors.

3. Model and results

In this paper, we focus on the linear Boolean model

$$Y_E = B_G x_V \oplus Z_E, \tag{4}$$

where the vector components are in $\{0, 1\}$ and the addition is modulo 2. We require exact recovery for $x_V$ and consider for the underlying graph $G$, with , both the Erdős-Rényi model where the edges are drawn i.i.d. with probability $p$, and deterministic $d$-regular graphs. We assume that the noise vector $Z_E$ has i.i.d. components, equal to 1 with probability $\varepsilon$. We assume w.l.o.g. that $\varepsilon \in [0, 1/2]$, where $\varepsilon = 0$ means no noise (and exact recovery amounts to having a connected graph) and $\varepsilon = 1/2$ means maximal noise (and exact recovery is impossible no matter how connected the graph is). [Footnote 5: The noise model is assumed to be known, hence the regime $\varepsilon > 1/2$ can be handled by adding an all-one vector to $Y_E$.] The prior on $x_V$ is assumed to be uniform. Note that the inverse problem would be much easier if the noise model caused erasures with probability $\varepsilon$, instead of errors. Exact recovery would then be possible if and only if the graph were still connected after the noisy edges had been erased. Since there is a sharp threshold for connectedness at $p = \log(n)/n$, this would happen a.a.s. if for some . Hence is a sharp threshold in for the exact recovery problem with erasures and base-graph .

The goal of this paper is to find the replacement to the erasure threshold for the setting where the noise causes errors. Similarly to channel coding where the Shannon capacity of the BSC differs from the BEC capacity, we obtain for the considered inverse problem the expression

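$$
\begin{aligned}
D(1/2\|\varepsilon) &= (1-2\varepsilon)^2/2 + o\big((1-2\varepsilon)^2\big) \\
&= \log(2) - H(\varepsilon) + o\big((1-2\varepsilon)^2\big),
\end{aligned} \tag{5}
$$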

where $D(1/2\|\varepsilon)$ is the Kullback-Leibler divergence between $1/2$ and $\varepsilon$. [Footnote 6: All logarithms have base $e$, i.e., we denote by $D(1/2\|\varepsilon)$ the Kullback-Leibler divergence between $1/2$ and $\varepsilon$ and by $H(\varepsilon)$ the entropy (in nats) of a binary random variable that assumes the value $1$ with probability $\varepsilon$.] Hence the Shannon capacity provides the threshold for the low-SNR regime, although the considered inverse problem is a priori not related to the channel coding theorem.
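The approximation in (5) can be checked numerically. The following Python sketch evaluates the three expressions for noise levels close to 1/2; the function names are illustrative, and natural logarithms are used throughout, as in the text.

```python
import math

def kl_half(eps):
    """D(1/2 || eps) in nats between Bernoulli(1/2) and Bernoulli(eps)."""
    return 0.5 * math.log(0.5 / eps) + 0.5 * math.log(0.5 / (1.0 - eps))

def entropy(eps):
    """Binary entropy H(eps) in nats."""
    return -eps * math.log(eps) - (1.0 - eps) * math.log(1.0 - eps)

for eps in (0.4, 0.45, 0.49):
    half_snr = (1.0 - 2.0 * eps) ** 2 / 2.0
    capacity = math.log(2.0) - entropy(eps)  # BSC capacity in nats
    print(eps, kl_half(eps), half_snr, capacity)
```

The three columns agree to leading order as the noise level approaches 1/2, which is the low-SNR regime in which the comparison with the Shannon capacity is made.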

More precisely, this paper establishes an IT necessary condition that holds for every graph (Theorem 4.1), an IT sufficient condition for Erdős-Rényi graphs (Theorem 4.2), and an IT sufficient condition that holds for any graph (Theorem LABEL:IT_sufficient_CheegerConstant) and depends on the graph’s Cheeger constant, a common measure of the connectivity of a graph (see (LABEL:def:Cheegerconstant44)) related to its spectral gap by Cheeger’s inequality (see Theorem LABEL:thm:Cheegersineq). Moreover, we also give a recovery guarantee that holds for an efficient algorithm based on SDP (Theorems LABEL:SDP_ErdosReinyi and LABEL:SDPtheoremfordregular).

In particular, we show that, for and for every : The bounds for the necessary condition for a general graph and the IT sufficient condition for the Erdős-Rényi graph match. [Footnote 7: The regime is frequently studied in the synchronization problem in dimension .] Remarkably, the sufficient condition for the efficient SDP-based method to achieve exact recovery matches the IT bound up to a factor of .

If the noise parameter is bounded away from both zero and , then all conditions imply , where is the expected average degree: . The factors by which the bounds differ decrease with an increasing noise parameter . Since in the noise-free case exact recovery is possible if and only if the graph is connected, which is true for trees (with ) and, for Erdős-Rényi graphs only when , the factors between the necessary condition and the sufficient conditions necessarily approach infinity when decreases to zero (since diverges).

4. Information Theoretic Bounds

This section presents necessary and sufficient conditions for exact recovery of the vertex-variables from the edge-variables . We speak of exact recovery if there is a decoding algorithm that recovers the vertex-variables up to an unavoidable additive offset with some probability that converges to as the number of vertices approaches infinity.

By definition, maximum a posteriori (MAP) decoding always maximizes the probability of recovering the correct vertex-variables. Since we assume uniform priors, maximum likelihood (ML) and MAP decoding coincide. Hence, our definition of exact recovery is tantamount to requiring that ML decoding recovers the vertex-variables up to an unavoidable additive offset with some probability that converges to as the number of vertices approaches infinity. Note that an ML decoder produces vertex-variables that minimize the number of edges of for which is non-zero.
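The ML rule described above can be written as an exhaustive search for very small graphs. The sketch below is only meant to make the objective concrete: it is exponential in the number of vertices, it is not the SDP-based algorithm of this paper, and the function name and the example are assumptions.

```python
from itertools import product

def ml_decode(n, edges, y):
    """Exhaustive ML decoding for the Boolean model.

    Minimises the number of edges whose observed parity disagrees with
    x_u XOR x_v. Fixing x_0 = 0 removes the global-flip ambiguity.
    """
    best, best_cost = None, None
    for tail in product((0, 1), repeat=n - 1):
        x = (0,) + tail
        cost = sum((x[u] ^ x[v]) != obs for (u, v), obs in zip(edges, y))
        if best_cost is None or cost < best_cost:
            best, best_cost = list(x), cost
    return best, best_cost

# Complete graph on 4 vertices, true labels [0, 1, 1, 0], one corrupted edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
y = [1, 1, 0, 0, 1, 0]  # the last entry, edge (2, 3), has been flipped
x_hat, cost = ml_decode(4, edges, y)
```

In this example the single corrupted edge is outvoted by the remaining measurements, so the decoder returns the true labels with one disagreeing edge.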

4.1. A Necessary Condition for Successful Recovery

For each graph (drawn from the Erdős-Rényi model or not), the following result holds:

Theorem 4.1.

Let and let be the average degree of . If then, recovery with high probability is possible only if

$$\frac{d}{\log n} \geq \frac{1 - 3\tau/2}{D(1/2\|\varepsilon)} - \frac{1}{\log n} + o\left(\frac{1}{D(1/2\|\varepsilon)}\right). \tag{6}$$

If , this condition implies

 (7)

Before proving this theorem, we compare it with the necessary condition previously shown in [ASinger_2011_angsync, Section 5]. If does not depend on , then this condition only implies and is thus weaker than , which follows from Theorem 4.1. If , then , and we can write the condition in [ASinger_2011_angsync] as . If there is a for which , then Theorem 4.1 is tighter: it implies . However, if there is no such , then Theorem 4.1 cannot be applied. [Footnote 8: Using Slud’s inequality [slud77] to lower-bound , one can improve the bound for and show that whenever there is a for which , then a necessary condition is .]

Proof of Theorem 4.1.

Fix a vertex $v_j$, and let $E_j$ denote the event that the variables attached to at least half of the edges that are incident to vertex $v_j$ are noisy. As we argue next, if event $E_j$ occurs, then ML decoding recovers vertex-variables other than or with probability at least . Indeed, if ML decoding correctly recovers the vertex-variables that are attached to the vertices adjacent to $v_j$ up to a global additive offset , then, by the assumption that event $E_j$ occurs, the probability that ML decoding recovers with offset is at least . In particular, this implies that ML decoding can only be successful if the event occurs. Let $Q$ be an independent subset of , i.e., a set such that no two vertices in it are adjacent. Since the noise is drawn i.i.d., the events are independent and the probability of the event is easily computable. Moreover, the event can only occur if occurs. A necessary condition for exact recovery thus is that the probability of the event converges to one as the number of vertices increases. In the following, we prove the claim by identifying an independent set $Q$ and by upper-bounding the probability of the event .

Let $\deg(v_j)$ be the degree of vertex $v_j$, and assume w.l.o.g. $\deg(v_1) \leq \deg(v_2) \leq \dots \leq \deg(v_n)$. For every $\delta \in (0, 1)$

$$dn \geq \sum_{j=\lceil \delta n \rceil}^{n} \deg(v_j) \geq \lceil (1-\delta) n \rceil \deg\big(v_{\lceil \delta n \rceil}\big). \tag{8}$$

For , we therefore find

 (9)

This implies that for every set , the vertices are disconnected from at least

$$\lceil \delta n \rceil - |L| \left(1 + \frac{d}{1-\delta}\right) \tag{10}$$

vertices in the set . We can construct an independent set by iteratively including vertices in while keeping independence, until no vertex can be added. In fact, using the degree bound in (10), it is easy to see that this process constructs an independent set such that

$$|Q| \geq \frac{\lceil \delta n \rceil}{1 + \frac{d}{1-\delta}} \geq \frac{\delta (1-\delta) n}{d + 1 - \delta}. \tag{11}$$
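The iterative construction of the independent set can be sketched as a greedy scan over the lowest-degree vertices. The following Python sketch is an illustration under assumed names and a toy path graph; it is not taken from the paper.

```python
def greedy_independent_set(n, edges, candidates):
    """Scan `candidates` in order and keep a vertex if none of its
    neighbours has been kept already. The result is a maximal
    independent subset of `candidates`."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    chosen = []
    blocked = set()
    for v in candidates:
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked |= adj[v]  # neighbours of v can no longer be added
    return chosen

# Hypothetical path graph 0-1-2-3-4-5; keep the 4 lowest-degree vertices,
# playing the role of the ceil(delta * n) vertices of smallest degree.
n, edges = 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
degree = [sum(v in e for e in edges) for v in range(n)]
order = sorted(range(n), key=lambda v: degree[v])
Q = greedy_independent_set(n, edges, order[:4])
```

Each kept vertex blocks itself and at most its neighbours, which is the counting step behind the lower bound on the size of the independent set.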

To simplify notation, we introduce the variables

$$a_j = \left\lfloor \frac{\deg(v_j)}{2} \right\rfloor, \qquad b_j = \left\lceil \frac{\deg(v_j)}{2} \right\rceil.$$

If , then

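$$
\begin{aligned}
\mathrm{Prob}[E_j] &= \sum_{k=b_j}^{\deg(v_j)} \binom{\deg(v_j)}{k} \varepsilon^{k} (1-\varepsilon)^{\deg(v_j)-k} \\
&\geq \binom{\deg(v_j)}{b_j} \varepsilon^{b_j} (1-\varepsilon)^{a_j} \\
&\stackrel{a)}{\geq} \frac{\sqrt{2\pi \deg(v_j)}\; \deg(v_j)^{\deg(v_j)}\; \varepsilon^{b_j} (1-\varepsilon)^{a_j}}{e^{2} \sqrt{b_j a_j}\; b_j^{b_j} a_j^{a_j}} \\
&\stackrel{b)}{\geq} e^{-\frac{1}{2}\log\left(\frac{1-\varepsilon}{\varepsilon}\right) - \log 2 - \deg(v_j) D(1/2\|\varepsilon) - \frac{1}{2}\log(\deg(v_j))},
\end{aligned} \tag{12}
$$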

where is due to Stirling’s formula

$$1 \leq \frac{\ell!}{\sqrt{2\pi\ell}\,(\ell/e)^{\ell}} \leq \frac{e}{\sqrt{2\pi}}, \qquad \ell \in \mathbb{N},$$

is due to the inequality of arithmetic and geometric means, the relation , the fact that for every

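$$\frac{\left(t+\frac{1}{2}\right)^{t+\frac{1}{2}} \left(t-\frac{1}{2}\right)^{t-\frac{1}{2}}}{t^{2t}} = \left(1 - \frac{1}{4t^{2}}\right)^{t} \sqrt{\frac{1+\frac{1}{2t}}{1-\frac{1}{2t}}} < 1.3,$$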

and the inequality , and is due to (9).

Since the events are jointly independent,

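$$
\begin{aligned}
\mathrm{Prob}\Big[\bigcap_{j \in Q} E_j^{c}\Big] &= \prod_{j \in Q} \big(1 - \mathrm{Prob}[E_j]\big) \\
&\stackrel{a)}{\leq} \exp\Big(-\sum_{j \in Q} e^{-\frac{1}{2}\log\left(\frac{1-\varepsilon}{\varepsilon}\,\frac{d}{1-\delta}\right) - \log(2) - \frac{d\, D(1/2\|\varepsilon)}{1-\delta}}\Big) \\
&\stackrel{b)}{\leq} \exp\Big(-e^{\log\left(\frac{\delta(1-\delta)n}{2(d+1-\delta)} \sqrt{\frac{1-\delta}{d}} \sqrt{\frac{\varepsilon}{1-\varepsilon}}\right) - \frac{d\, D(1/2\|\varepsilon)}{1-\delta}}\Big),
\end{aligned} \tag{13}
$$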

where holds since for and because of (12), and is due to (11). Clearly, a necessary condition for the RHS of (13) to converge to is

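$$\frac{d\, D(1/2\|\varepsilon)}{1-\delta} \geq \log\left(\frac{\delta(1-\delta)n}{2(d+1-\delta)} \sqrt{\frac{1-\delta}{d}} \sqrt{\frac{\varepsilon}{1-\varepsilon}}\right). \tag{14}$$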

Take . Clearly, the average degree must be nonnegative. If , then

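$$
\begin{aligned}
\log\left(\frac{\delta(1-\delta)n}{2(d+1-\delta)} \sqrt{\frac{1-\delta}{d}} \sqrt{\frac{\varepsilon}{1-\varepsilon}}\right)
&\geq \log n + \log\left(\frac{\delta(1-\delta)^{3/2}}{2(2-\delta)}\right) - \frac{1}{2}\log\left(\frac{1-\varepsilon}{\varepsilon}\right) \\
&\stackrel{(a)}{\geq} \log n + \log\left(\frac{\delta(1-\delta)^{3/2}}{2(2-\delta)}\right) - \frac{1}{2}\log\left(\frac{1}{\varepsilon(1-\varepsilon)}\right) \\
&\stackrel{(b)}{\geq} \log n + \Theta(\log\log n) - D(1/2\|\varepsilon),
\end{aligned} \tag{15}
$$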

where is due to , and holds because and since . If , then

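$$
\begin{aligned}
\log\left(\frac{\delta(1-\delta)n}{2(d+1-\delta)} \sqrt{\frac{1-\delta}{d}} \sqrt{\frac{\varepsilon}{1-\varepsilon}}\right)
&= \log\left(n d^{-3/2}\right) + \log\left(\frac{\delta(1-\delta)^{3/2}}{2\left(1+\frac{1-\delta}{d}\right)}\right) + \frac{1}{2}\log\left(\frac{\varepsilon}{1-\varepsilon}\right) \\
&\stackrel{(a)}{\geq} \left(1 - \frac{3\tau}{2}\right)\log n - D(1/2\|\varepsilon) + \Theta(\log\log n),
\end{aligned} \tag{16}
$$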

where holds since , because , since , and because . For , we thus obtain from (15) (if ) or (16) (if ) that (14) cannot hold unless (6) holds.

4.2. Sufficient Conditions for Successful Recovery

We next present sufficient conditions for exact recovery. We first focus on graphs from the Erdős-Rényi model. Then, we consider arbitrary graphs and present a condition that is sufficient for every graph and depends only on the graph’s Cheeger constant.

For a random base-graph from the Erdős-Rényi model, we require the vertex-variables to be recoverable from the edge-variables except with some probability that vanishes as the number of vertices increases.

Theorem 4.2.

Suppose the base-graph is drawn from the Erdős-Rényi model with , and let denote its expected average degree, i.e., . Then the condition

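$$\frac{d}{\log n} \geq \frac{1}{\left(1 - \sqrt{\frac{2\log n}{d}}\right) D(1/2\|\varepsilon)} + o\left(\frac{1}{D(1/2\|\varepsilon)}\right) \tag{17}$$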

is sufficient to guarantee exact recovery with high probability. If , the condition is

$$\frac{d}{\log n} \geq \frac{2}{(1-2\varepsilon)^{2}} + o\left(\frac{1}{(1-2\varepsilon)^{2}}\right). \tag{18}$$