Nonlocal $p$-Laplacian variational problems on graphs

Yosra Hafiene, Jalal M. Fadili, Abderrahim Elmoataz
Normandie Univ, ENSICAEN, UNICAEN, CNRS, GREYC, France.
Abstract

In this paper, we study a nonlocal variational problem which consists of minimizing the sum of a quadratic data fidelity and a regularization term corresponding to the $L^p$-norm of the nonlocal gradient. In particular, we study the convergence of the numerical solution of a discrete version of this nonlocal variational problem to the unique solution of the continuous one. To do so, we derive an error bound and highlight the role of the initial data and of the kernel governing the nonlocal interactions. When applied to variational problems on graphs, this error bound allows us to show the consistency of the discretized variational problem as the number of vertices goes to infinity. More precisely, for networks on convergent graph sequences (simple and weighted deterministic dense graphs, as well as random inhomogeneous graphs), we prove convergence and provide rates of convergence of the solutions of the discrete models to the solution of the continuous problem as the number of vertices grows.

Key words. Variational problems, nonlocal $p$-Laplacian, discrete solutions, error bounds, graph limits.

AMS subject classifications. 65N12, 65N15, 41A17, 05C80, 05C90, 49M25, 65K15.

1 Introduction

1.1 Problem statement

We study the following variational problem

($\mathcal{VP}$)   $\min_{u \in L^2(\Omega)} \Big\{ E(u, g, K) := \frac{1}{2\lambda} \|u - g\|^2_{L^2(\Omega)} + R(u, K) \Big\},$

where $g \in L^2(\Omega)$ and

(1)   $R(u, K) := \frac{1}{2p} \int_{\Omega \times \Omega} K(x, y)\, |u(y) - u(x)|^p \, dx \, dy, \qquad p \in [1, +\infty[.$

$\Omega \subset \mathbb{R}$ is a bounded domain, and without loss of generality we take $\Omega = [0, 1]$, and the kernel $K$ is a symmetric, nonnegative and bounded function. Here $\lambda > 0$ is a positive regularization parameter that balances the relative importance of the smoothness of the minimizer and fidelity to the initial data $g$. The chief goal of this paper is to study numerical approximations of the nonlocal variational problem ($\mathcal{VP}$), which in turn will allow us to establish consistency estimates of the discrete counterpart of this problem on graphs.
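With the normalization chosen in (1), two special cases of the regularizer are worth keeping in mind: $p = 1$ yields a nonlocal weighted total variation, while $p = 2$ yields a quadratic nonlocal Dirichlet-type energy,

$$R(u, K)\big|_{p=1} = \frac{1}{2} \int_{\Omega \times \Omega} K(x, y)\, |u(y) - u(x)| \, dx \, dy, \qquad R(u, K)\big|_{p=2} = \frac{1}{4} \int_{\Omega \times \Omega} K(x, y)\, |u(y) - u(x)|^2 \, dx \, dy.$$

The first is the continuum analogue of the graph total variation discussed in Section 1.3, while the second is differentiable and leads to a linear nonlocal diffusion operator (see Section 3).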

In the context of image processing, smoothing and denoising are key processing tasks. Among the existing methods, the variational ones, based on nonlocal regularization such as ($\mathcal{VP}$), provide a popular and versatile framework to achieve these goals. Such variational problems are in general formulated and studied in the continuum and then discretized on sampled images. On the other hand, many data sources, such as point clouds or meshes, are discrete by nature. Thus, handling such data necessitates a discrete counterpart of ($\mathcal{VP}$), which reads

($\mathcal{VP}_n$)   $\min_{u_n \in \mathbb{R}^n} \Big\{ E_n(u_n, g_n, K_n) := \frac{1}{2\lambda n} \sum_{i=1}^{n} (u_{ni} - g_{ni})^2 + R_n(u_n, K_n) \Big\},$

where $g_n \in \mathbb{R}^n$ and $K_n \in \mathbb{R}^{n \times n}$ are discrete counterparts of $g$ and $K$ (their construction is made precise in Section 2.1), and

(2)   $R_n(u_n, K_n) := \frac{1}{2 p n^2} \sum_{i,j=1}^{n} K_{nij}\, |u_{nj} - u_{ni}|^p.$

Our aim is to study the relationship between the variational problems ($\mathcal{VP}$) and ($\mathcal{VP}_n$). More specifically, we aim at deriving error estimates between the corresponding minimizers, respectively $u^\star$ and $u^\star_n$.
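As an illustration only (this is not the authors' code, and the kernel, data and step size below are hypothetical choices), the following Python sketch assembles a small instance of ($\mathcal{VP}_n$) with the scaling used in (2) and minimizes it by plain gradient descent for $p = 2$:

```python
import numpy as np

def discrete_energy(u, g, K, lam, p):
    """Discrete energy of the form assumed for (VP_n): quadratic fidelity
    plus p-th powers of nonlocal differences, with 1/n and 1/n^2 normalizations."""
    n = u.size
    fid = np.sum((u - g) ** 2) / (2.0 * lam * n)
    diff = u[None, :] - u[:, None]                 # diff[i, j] = u_j - u_i
    reg = np.sum(K * np.abs(diff) ** p) / (2.0 * p * n ** 2)
    return fid + reg

def grad_energy(u, g, K, lam, p):
    """Gradient of the discrete energy (smooth case p >= 2)."""
    n = u.size
    diff = u[None, :] - u[:, None]
    # row-wise nonlocal p-Laplacian: (1/n^2) * sum_j K_ij |u_j - u_i|^{p-2} (u_j - u_i)
    lap = np.sum(K * np.abs(diff) ** (p - 2) * diff, axis=1) / n ** 2
    return (u - g) / (lam * n) - lap

# Toy instance: n cells on [0, 1], an illustrative Gaussian kernel, noisy step data.
rng = np.random.default_rng(0)
n = 200
x = (np.arange(n) + 0.5) / n
K = np.exp(-((x[None, :] - x[:, None]) ** 2) / 0.01)
g = (x > 0.5).astype(float) + 0.1 * rng.standard_normal(n)

u, step = g.copy(), 10.0
for _ in range(500):                               # plain gradient descent
    u -= step * grad_energy(u, g, K, lam=0.1, p=2)
print("final energy:", discrete_energy(u, g, K, lam=0.1, p=2))
```

For $p = 1$ the regularizer is nonsmooth, and a proximal or subgradient scheme would be needed instead.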

1.2 Contributions

In this work, we focus on studying the consistency of ($\mathcal{VP}_n$), in which we investigate functionals with a nonlocal regularization term corresponding to the $p$-Laplacian operator. We first give a general error estimate, in the $L^2(\Omega)$-norm, controlling the error between the continuous extension of the numerical solution to the discrete variational problem ($\mathcal{VP}_n$) and the solution of its continuum analogue ($\mathcal{VP}$). The dependence of the error bound on the error induced by discretizing the kernel $K$ and the initial data $g$ is made explicit. Under very mild conditions on $K$ and $g$, typically belonging to a large class of Lipschitz functional spaces (see Section 2.3 for details on these spaces), convergence rates can be exhibited.

Secondly, we apply these results, using the theory of graph limits (in particular graphons), to networks on simple and weighted dense graphs, and show that the minimizers of the discrete problems on simple and weighted graph sequences converge to those of the continuous problem. This settles the question of whether solving a discrete variational problem on graphs has indeed a continuum limit. Under very mild conditions on $K$ and $g$, typically belonging to Lipschitz functional spaces, precise convergence rates can be exhibited. These functional spaces allow us to cover a large class of graphs (through $K$) and initial data $g$, including functions of bounded variation. For simple graph sequences, we also show how the accuracy of the approximation depends on the regularity of the boundary of the support of the graph limit.

Finally, building upon these error estimates, we study networks on random inhomogeneous graphs. We combine the estimates with sharp deviation inequalities to establish nonasymptotic convergence claims and give the rate of convergence of the discrete solution to its continuous limit with high probability, under the same assumptions on the kernel $K$ and the initial data $g$.

1.3 Relation to prior work

Nonlocal regularization in machine learning

The authors in [17] studied the consistency of rescaled total variation minimization on random point clouds in $\mathbb{R}^d$, with a clustering application. They considered the total variation on graphs with a radially symmetric kernel rescaled by a parameter $\varepsilon > 0$. This corresponds to an instance of the regularizer $R$ with $p = 1$ and such a rescaled kernel. For an appropriate scaling of $\varepsilon$ with respect to the number of points $n$, and under some assumptions on the kernel, those authors proved that the discrete total variation on graphs $\Gamma$-converges, in an appropriate topology, as $n \to \infty$, to a weighted local total variation, where the weight function is the density of the point cloud distribution. This work was extended in [30] to the graph $p$-Laplacian for semi-supervised learning in $\mathbb{R}^d$. More precisely, the authors considered a constrained and penalized minimization of a graph $p$-Laplacian regularizer with a radially symmetric and rescaled kernel as explained before. They investigated the asymptotic behavior as the number of unlabeled points increases, with a fixed number of training points. They uncovered ranges on the scaling of $\varepsilon$ with respect to $n$ for the asymptotic consistency (in the $\Gamma$-convergence sense) to hold. For the same problem, the authors of [13] obtained iterated pointwise convergence of graph $p$-Laplacians to the continuum $p$-Laplacian; see [30] for a thorough review in the context of machine learning. Note, however, that all these results on the asymptotic behavior of minimizers do not provide any error estimates for finite $n$, nor precise guidance on what would lead to the best approximation.

Nonlocal regularization in imaging

Several edge-aware filtering schemes have been proposed in the literature [38, 31, 35, 32]. The nonlocal means filter [9] averages pixels that can be arbitrarily far away, using a similarity measure based on the distance between patches. As shown in [33, 28], these filters can also be interpreted within the variational framework with nonlocal regularization functionals. They correspond to one step of gradient descent on ($\mathcal{VP}$), where the kernel $K$ is computed from the input noisy image using either a distance between the pixels $x$ and $y$ [38, 35, 32] or a distance between the patches around $x$ and $y$ [9, 34]. This nonlocal variational denoising can be related to sparsity in an adapted basis of eigenvectors of the nonlocal diffusion operator [11, 34, 28]. This nonlocal variational framework was also extended to handle several linear inverse problems [33, 18, 8, 19]. In [29, 14, 37], the authors proposed a variational framework with nonlocal regularizers on graphs to solve linear inverse problems in imaging, where both the image to recover and the graph structure are inferred.

Consistency of the ROF model

For local variational problems, the only work on consistency that we are aware of is that of [36], who studied the numerical approximation of the Rudin-Osher-Fatemi (ROF) model, which amounts to minimizing the well-known energy functional

$\frac{1}{2\lambda} \|u - g\|^2_{L^2(\Omega)} + |u|_{\mathrm{TV}(\Omega)},$

where $|\cdot|_{\mathrm{TV}(\Omega)}$ denotes the total variation seminorm. They bound the difference between the continuous solution and the solutions to various finite-difference approximations (including the upwind scheme) of this model. They gave an error estimate in $L^2$ of the difference between these two solutions and showed that it scales as a power of the mesh size governed by the smoothness parameter $s$ of the Lipschitz space containing $g$.

However, to the best of our knowledge, there is no such consistency result in the nonlocal variational setting. In particular, the problem of the continuum limit and consistency of ($\mathcal{VP}_n$) with error estimates is still open in the literature. It is our aim in this work to rigorously settle this question.

1.4 Paper organisation

The rest of this paper is organized as follows. Section 2 collects some notation and preliminaries that we will need in our exposition. In Section 3, we briefly discuss the well-posedness of problems ($\mathcal{VP}$) and ($\mathcal{VP}_n$) and recall some properties of the corresponding minimizers. Section 4 is devoted to the main result of the paper (Theorem 4.1), in which we give a bound on the $L^2$-norm of the difference between the unique minimizers of ($\mathcal{VP}$) and ($\mathcal{VP}_n$). In this section, we also state a key regularity result on the minimizer of ($\mathcal{VP}$). This result is then used to study networks on deterministic dense graph sequences in Section 5. First we deal with networks on simple graphs, and show in Corollary 5.1 the influence of the regularity of the boundary of the support of the graphon on the convergence rate. Secondly, in Section 5.2, we study networks on weighted graphs. Section 6 deals with networks on random inhomogeneous graphs, where we quantify the rate of convergence with high probability. Numerical results are finally reported in Section 7 to illustrate our theoretical findings.

2 Notations and preliminaries

To provide a self-contained exposition, we recall the two key frameworks our work relies on. The first is graph limit theory, i.e., the notion of convergence for graph sequences developed for the analysis of networks on graphs. The second is that of Lipschitz spaces, which will be instrumental in quantifying the rate of convergence in our error bounds.

2.1 Projector and injector

Let $n \in \mathbb{N}^*$, and divide $\Omega = [0, 1]$ into $n$ intervals

$\Omega_i := \,]x_{i-1}, x_i], \quad i = 1, \dots, n, \qquad 0 = x_0 < x_1 < \dots < x_n = 1,$

and let $\mathcal{Q}_n := \{\Omega_i, \ i = 1, \dots, n\}$ denote the corresponding partition of $\Omega$. Without loss of generality, we assume that the points $x_i$ are equispaced so that $|\Omega_i| = 1/n$, where $|\Omega_i|$ is the measure of $\Omega_i$. The discussion can be easily extended to non-equispaced points by appropriate normalization; see Section 6.

We also consider the operator

$P_n : u \in L^1(\Omega) \mapsto (P_n u)_i := \frac{1}{|\Omega_i|} \int_{\Omega_i} u(x) \, dx, \quad i = 1, \dots, n.$

This operator can also be seen as a piecewise constant projector of $u$ on the space of discrete functions. For simplicity, and with a slight abuse of notation, we keep the same notation $P_n$ for the projector acting on functions defined on $\Omega^2$.

We assume that the discrete initial data $g_n \in \mathbb{R}^n$ and the discrete kernel $K_n \in \mathbb{R}^{n \times n}$ are constructed as

(3)   $g_n := P_n g \qquad \text{and} \qquad K_n := P_n K,$

where

(4)   $(P_n g)_i = n \int_{\Omega_i} g(x) \, dx \qquad \text{and} \qquad (P_n K)_{ij} = n^2 \int_{\Omega_i \times \Omega_j} K(x, y) \, dx \, dy, \quad i, j = 1, \dots, n.$

Our aim is to study the relationship between the minimizer of ($\mathcal{VP}$) and the discrete minimizer of ($\mathcal{VP}_n$), and to estimate the error between the solutions of the discrete approximations and the solution of the continuous model. But the solution of problem ($\mathcal{VP}_n$) being discrete, it is convenient to introduce an intermediate model which is the continuous extension of the discrete solution. Towards this goal, we consider the piecewise constant injector $I_n$ of the discrete functions $u_n$ and $g_n$ into $L^2(\Omega)$, and of the discrete kernel $K_n$ into $L^\infty(\Omega^2)$, respectively. This injector is defined as

(5)   $I_n u_n(x) := \sum_{i=1}^{n} u_{ni}\, \chi_{\Omega_i}(x) \qquad \text{and} \qquad I_n K_n(x, y) := \sum_{i,j=1}^{n} K_{nij}\, \chi_{\Omega_i}(x)\, \chi_{\Omega_j}(y),$

where we recall that $\chi_{\Omega_i}$ is the characteristic function of the set $\Omega_i$, i.e., it takes the value $1$ on $\Omega_i$ and $0$ otherwise.

With these definitions, we have the following well-known properties, whose proofs are immediate. For a given vector $u_n = (u_{n1}, \dots, u_{nn}) \in \mathbb{R}^n$ and $q \in [1, +\infty[$, we define the norm

$\|u_n\|_{q,n} := \Big( \frac{1}{n} \sum_{i=1}^{n} |u_{ni}|^q \Big)^{1/q},$

with the usual adaptation for $q = +\infty$.
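The following minimal Python sketch (illustrative only; the midpoint quadrature is a cheap stand-in for the exact cell averages in (4)) implements the projector $P_n$, the injector $I_n$ and the normalized discrete norm on the uniform partition of $[0, 1]$:

```python
import numpy as np

def projector(f, n, m=50):
    """P_n: cell averages of f over the n cells of the uniform partition of [0, 1]
    (midpoint quadrature with m points per cell approximates the exact integral)."""
    x = (np.arange(n * m) + 0.5) / (n * m)
    return f(x).reshape(n, m).mean(axis=1)

def injector(u_n):
    """I_n: piecewise-constant extension of the vector u_n back to a function on [0, 1]."""
    n = u_n.size
    def u(x):
        idx = np.clip(np.ceil(np.asarray(x) * n).astype(int) - 1, 0, n - 1)
        return u_n[idx]
    return u

def discrete_norm(u_n, q=2):
    """Normalized discrete q-norm; it equals the L^q norm of I_n u_n."""
    return np.mean(np.abs(u_n) ** q) ** (1.0 / q)

f = lambda x: np.sin(2 * np.pi * x)
u_n = projector(f, 64)
print(discrete_norm(u_n), abs(injector(u_n)(0.3) - f(0.3)))
```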

Lemma 2.1.

For a function $u \in L^q(\Omega)$, $q \in [1, +\infty]$, we have

(6)   $\|P_n u\|_{q,n} \le \|u\|_{L^q(\Omega)},$

and for $u_n \in \mathbb{R}^n$

(7)   $\|I_n u_n\|_{L^q(\Omega)} = \|u_n\|_{q,n}.$

In turn

(8)   $\|I_n P_n u\|_{L^q(\Omega)} \le \|u\|_{L^q(\Omega)}.$

It is immediate to see that the composition of the operators $I_n$ and $P_n$ yields the operator $I_n \circ P_n$, which is the orthogonal projector of $L^2(\Omega)$ on the subspace of functions that are piecewise constant on the cells of the partition $\mathcal{Q}_n$.

2.2 Graph limit theory

We now briefly review some definitions and results from the theory of graph limits that we will need later, since it is the key to our study of the discrete counterpart of problem ($\mathcal{VP}$) on dense deterministic graphs. We mostly follow [6, 23], in which much more detail can be found.

An undirected graph $G = (V(G), E(G))$, where $V(G)$ stands for the set of nodes and $E(G)$ denotes the set of edges, without loops and parallel edges, is called simple.

Let $\{G_n\}_{n \in \mathbb{N}}$ be a sequence of dense, finite, and simple graphs, i.e., graphs whose number of edges $|E(G_n)|$ is of the order of $|V(G_n)|^2$, where $|\cdot|$ now denotes the cardinality of a set.

For two simple graphs $F$ and $G$, $\hom(F, G)$ indicates the number of homomorphisms (adjacency-preserving maps) from $V(F)$ to $V(G)$. Then, it is worthwhile to normalize the homomorphism numbers and consider the homomorphism densities

$t(F, G) := \frac{\hom(F, G)}{|V(G)|^{|V(F)|}}.$

(Thus $t(F, G)$ is the probability that a random map of $V(F)$ into $V(G)$ is a homomorphism.)
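As a simple illustration, standard in the graph limit literature and not specific to this paper, take $F = K_2$, the complete graph on two vertices. Since a homomorphism from $K_2$ to $G$ is an ordered pair of adjacent vertices,

$$t(K_2, G) = \frac{\hom(K_2, G)}{|V(G)|^2} = \frac{2\,|E(G)|}{|V(G)|^2},$$

which is the edge density of $G$; for a dense graph sequence this quantity remains of order one as the number of vertices grows.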

Definition 2.1.

(cf. [23]) The sequence of graphs $\{G_n\}_{n \in \mathbb{N}}$ is called convergent if $t(F, G_n)$ is convergent for every simple graph $F$.

Convergent graph sequences have a limit object, which can be represented as a measurable symmetric function $W : \Omega^2 \to [0, 1]$; here $\Omega^2$ stands for $\Omega \times \Omega$. Such functions are called graphons.

Let $\mathcal{W}$ denote the space of all bounded measurable symmetric functions $W : \Omega^2 \to \mathbb{R}$, i.e., such that $W(x, y) = W(y, x)$ for all $(x, y) \in \Omega^2$. We also define $\mathcal{W}_0 := \{W \in \mathcal{W} : 0 \le W \le 1\}$, the set of all graphons.

Proposition 2.1 ([6, Theorem 2.1]).

For every convergent sequence $\{G_n\}_{n \in \mathbb{N}}$ of simple graphs, there is $W \in \mathcal{W}_0$ such that

(9)   $t(F, G_n) \to t(F, W) := \int_{\Omega^{|V(F)|}} \prod_{(i,j) \in E(F)} W(x_i, x_j) \, dx_1 \cdots dx_{|V(F)|}$

for every simple graph $F$. Moreover, for every $W \in \mathcal{W}_0$, there is a sequence of graphs $\{G_n\}_{n \in \mathbb{N}}$ satisfying (9).

The graphon $W$ in (9), which is uniquely determined up to measure-preserving transformations, is the limit of the convergent sequence $\{G_n\}_{n \in \mathbb{N}}$. Indeed, every finite simple graph $G$ with vertex set $\{1, \dots, n\}$ can be represented by a pixel kernel $W_G \in \mathcal{W}_0$ (see the construction sketched below). Hence, geometrically, the graphon $W$ can be interpreted as the limit of $W_{G_n}$ for the standard cut-distance; see [6, Theorem 2.3]. An interesting consequence of this interpretation is that the space of graphs, or equivalently of pixel kernels, is not closed under the cut-distance. The space of graphons (which is larger than the space of graphs) defines the completion of this space.
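For completeness, here is the standard pixel-kernel construction (the indexing convention is the usual one from the graph limit literature, stated here for reference): a simple graph $G$ with vertex set $\{1, \dots, n\}$ is represented by the piecewise constant graphon

$$W_G(x, y) := \begin{cases} 1 & \text{if } (\lceil n x \rceil, \lceil n y \rceil) \in E(G), \\ 0 & \text{otherwise}, \end{cases} \qquad (x, y) \in \Omega^2,$$

which is constant on each cell $\Omega_i \times \Omega_j$ of the partition introduced in Section 2.1.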

2.3 Lipschitz spaces

We introduce the Lipschitz spaces $\mathrm{Lip}(s, L^q(\Omega))$, $s > 0$, $q \in [1, +\infty]$, which contain functions with, roughly speaking, $s$ "derivatives" in $L^q(\Omega)$ [12, Ch. 2, Section 9].

Definition 2.2.

For $f \in L^q(\Omega)$, $q \in [1, +\infty]$, we define the (first-order) $L^q(\Omega)$ modulus of smoothness by

(10)   $\omega(f, h)_q := \sup_{0 < z \le h} \Big( \int_{\{x \in \Omega : \ x + z \in \Omega\}} |f(x + z) - f(x)|^q \, dx \Big)^{1/q}, \qquad h > 0.$

The Lipschitz spaces $\mathrm{Lip}(s, L^q(\Omega))$ consist of all functions $f \in L^q(\Omega)$ for which

$|f|_{\mathrm{Lip}(s, L^q(\Omega))} := \sup_{h > 0} h^{-s}\, \omega(f, h)_q < +\infty.$

We restrict ourselves to values $s \in ]0, 1]$ as, for $s > 1$, only constant functions are in $\mathrm{Lip}(s, L^q(\Omega))$. It is easy to see that $|\cdot|_{\mathrm{Lip}(s, L^q(\Omega))}$ is a semi-norm. $\mathrm{Lip}(s, L^q(\Omega))$ is endowed with the norm

$\|f\|_{\mathrm{Lip}(s, L^q(\Omega))} := \|f\|_{L^q(\Omega)} + |f|_{\mathrm{Lip}(s, L^q(\Omega))}.$

The space $\mathrm{Lip}(s, L^q(\Omega))$ is the Besov space $B^s_{q,\infty}$ [12, Ch. 2, Section 10], which is very popular in approximation theory. In particular, $\mathrm{Lip}(1, L^1(\Omega))$ contains the space $\mathrm{BV}(\Omega)$ of functions of bounded variation on $\Omega$, i.e., the set of functions $f \in L^1(\Omega)$ such that their variation is finite:

$V_\Omega(f) := \sup_{h > 0} h^{-1} \sum_{k=1}^{d} \int_{\{x \in \Omega : \ x + h e_k \in \Omega\}} |f(x + h e_k) - f(x)| \, dx < +\infty,$

where $\{e_k\}_{k=1}^{d}$ are the coordinate vectors in $\mathbb{R}^d$; see [12, Ch. 2, Lemma 9.2]. Thus, Lipschitz spaces are rich enough to contain functions with both discontinuities and fractal structure.
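As a concrete example (a standard computation included here for illustration), consider the step function $f = \chi_{[0, 1/2]}$ on $\Omega = [0, 1]$. For $0 < z \le h$, the functions $f(\cdot + z)$ and $f$ differ only on a set of measure at most $z$, so that

$$\omega(f, h)_q \le h^{1/q}, \qquad \text{and hence} \qquad f \in \mathrm{Lip}(1/q, L^q(\Omega)) \quad \text{for every } q \in [1, +\infty[.$$

Discontinuous data and kernels whose support has a nonsmooth boundary are thus naturally covered by these spaces.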

Let us define the piecewise constant approximation of a function $f \in L^q(\Omega)$ (a similar reasoning holds of course on $\Omega^2$) on a partition of $\Omega$ into cells $\{\Omega_i\}_{i=1}^{n}$ of maximal mesh size $\delta := \max_i |\Omega_i|$:

$f_n(x) := \sum_{i=1}^{n} \Big( \frac{1}{|\Omega_i|} \int_{\Omega_i} f(y) \, dy \Big) \chi_{\Omega_i}(x).$

One may have recognized in this expression a non-equispaced version of the composition of the projector and injector defined above.

We have the following error bounds, whose proofs use standard arguments from approximation theory; see [21, Section 6.2.1] for details.

Lemma 2.2.

There exists a positive constant $C_s$, depending only on $s$, such that for all $f \in \mathrm{Lip}(s, L^q(\Omega))$, $s \in ]0, 1]$, $q \in [1, +\infty]$, $n \in \mathbb{N}^*$,

(11)   $\|f - f_n\|_{L^q(\Omega)} \le C_s \, \delta^s \, |f|_{\mathrm{Lip}(s, L^q(\Omega))}.$

Let $q' \in [q, +\infty[$. If, in addition, $f \in L^\infty(\Omega)$, then there exists a positive constant $C$, depending on $s$, $q'$ and $\|f\|_{L^\infty(\Omega)}$, such that

(12)   $\|f - f_n\|_{L^{q'}(\Omega)} \le C \, \delta^{s q / q'} \, |f|_{\mathrm{Lip}(s, L^q(\Omega))}^{q / q'}.$

3 Well-posedness

We start by proving existence and uniqueness of the minimizers of ($\mathcal{VP}$) and ($\mathcal{VP}_n$).

Theorem 3.1.

Suppose that $p \in [1, +\infty[$, $K$ is a nonnegative measurable mapping, and $g \in L^2(\Omega)$. Then $E(\cdot, g, K)$ has a unique minimizer in $L^2(\Omega)$, and $E_n(\cdot, g_n, K_n)$ has a unique minimizer.

Proof: The arguments are standard (coercivity, lower semicontinuity and strict convexity), but we provide a self-contained proof (only for ($\mathcal{VP}$)). Let $\{u_k\}_{k \in \mathbb{N}}$ be a minimizing sequence in $L^2(\Omega)$. By optimality and Jensen's inequality, we have

(13)

Moreover

(14)

Thus $\{u_k\}_{k \in \mathbb{N}}$ is bounded uniformly in $k$, so that the Banach-Alaoglu theorem for $L^2(\Omega)$ and weak compactness provide a weakly convergent subsequence (not relabelled) with a limit $\bar{u} \in L^2(\Omega)$. By lower semicontinuity of the norm with respect to weak convergence and that of $R(\cdot, K)$, $\bar{u}$ must be a minimizer. The uniqueness follows from strict convexity of the data fidelity term and convexity of $R(\cdot, K)$.

Remark 3.1.

Theorem 3.1 can be extended to linear inverse problems where the data fidelity in ($\mathcal{VP}$) is replaced by $\frac{1}{2\lambda}\|A u - g\|^2_{L^2(\Omega)}$, where $A$ is a continuous linear operator. The case where $A$ is injective is immediate. The general case is more intricate and would necessitate appropriate assumptions on $A$ and a Poincaré-type inequality. For instance, if $p \in ]1, +\infty[$ and the kernel of $A$ intersects constant functions trivially, then using the Poincaré inequality in [1, Proposition 6.19], one can show existence and uniqueness in $L^p(\Omega)$, and thus in $L^2(\Omega)$ when $p \ge 2$. We omit the details here as this is beyond the scope of the paper.

We now turn to providing a useful characterization of the minimizers $u^\star$ and $u^\star_n$. We stress that the minimization problem ($\mathcal{VP}$) that we deal with is considered over $L^2(\Omega)$, over which the functional $R(\cdot, K)$ may not be finite. In correspondence, we will consider the subdifferential of the proper lower semicontinuous convex function $R(\cdot, K)$ on $L^2(\Omega)$, defined as

$\partial R(u, K) := \big\{ \eta \in L^2(\Omega) : \ R(v, K) \ge R(u, K) + \langle \eta, v - u \rangle_{L^2(\Omega)} \ \text{ for all } v \in L^2(\Omega) \big\}$

if $R(u, K) < +\infty$, and $\partial R(u, K) := \emptyset$ if $R(u, K) = +\infty$.

Lemma 3.1.

Suppose that the assumptions of Theorem 3.1 hold. Then $u^\star$ is the unique solution to ($\mathcal{VP}$) if and only if

(15)   $\frac{g - u^\star}{\lambda} \in \partial R(u^\star, K).$

Moreover, the corresponding proximal mapping $g \mapsto u^\star$ is non-expansive on $L^2(\Omega)$, i.e., for $g_1, g_2 \in L^2(\Omega)$, the corresponding minimizers $u^\star_1, u^\star_2$ obey

(16)   $\|u^\star_1 - u^\star_2\|_{L^2(\Omega)} \le \|g_1 - g_2\|_{L^2(\Omega)}.$

A similar claim is easily obtained for ($\mathcal{VP}_n$) as well.

Proof: The proof is again classical. By the first-order optimality condition, and since the squared $L^2$-norm is Fréchet differentiable, $u^\star$ is the unique solution to ($\mathcal{VP}$) if, and only if,

$0 \in \frac{u^\star - g}{\lambda} + \partial R(u^\star, K),$

and the first claim follows. Writing the subgradient inequality for $u^\star_1$ and $u^\star_2$ we have

$R(u^\star_2, K) \ge R(u^\star_1, K) + \frac{1}{\lambda} \langle g_1 - u^\star_1, u^\star_2 - u^\star_1 \rangle \qquad \text{and} \qquad R(u^\star_1, K) \ge R(u^\star_2, K) + \frac{1}{\lambda} \langle g_2 - u^\star_2, u^\star_1 - u^\star_2 \rangle.$

Adding these two inequalities we get

$\|u^\star_1 - u^\star_2\|^2_{L^2(\Omega)} \le \langle g_1 - g_2, u^\star_1 - u^\star_2 \rangle,$

and we conclude upon applying the Cauchy-Schwarz inequality.

We now formally derive the directional derivative of $R(\cdot, K)$ when $p \in ]1, +\infty[$. For this, the symmetry assumption on $K$ is needed as well. Let $u, v \in L^p(\Omega)$. Then the following derivative exists

$\lim_{t \to 0} \frac{R(u + t v, K) - R(u, K)}{t} = \frac{1}{2} \int_{\Omega \times \Omega} K(x, y)\, |u(y) - u(x)|^{p-2} (u(y) - u(x)) (v(y) - v(x)) \, dx \, dy.$

Since $K$ is symmetric, we apply the integration by parts formula in [21, Lemma A.1] (or split the integral in two terms and apply the change of variable $(x, y) \mapsto (y, x)$), to conclude that

$\lim_{t \to 0} \frac{R(u + t v, K) - R(u, K)}{t} = - \int_{\Omega} \Delta_p^K u(x)\, v(x) \, dx,$

where

$\Delta_p^K u(x) := \int_{\Omega} K(x, y)\, |u(y) - u(x)|^{p-2} (u(y) - u(x)) \, dy, \qquad x \in \Omega,$

is precisely the nonlocal $p$-Laplacian operator; see [1, 21]. This shows that, under the above assumptions, $R(\cdot, K)$ is Fréchet differentiable (hence Gâteaux differentiable) on $L^p(\Omega)$ with Fréchet gradient $-\Delta_p^K$.
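For instance, with the convention adopted above for $R$, the case $p = 2$ reduces to the linear nonlocal Laplacian

$$\Delta_2^K u(x) = \int_{\Omega} K(x, y)\, \big( u(y) - u(x) \big) \, dy,$$

a kernel-weighted diffusion operator, while $p \ne 2$ makes the operator genuinely nonlinear.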

4 Error estimate for the discrete variational problem

4.1 Main result

Our goal is to bound the difference between the unique minimizer $u^\star$ of the continuous functional $E(\cdot, g, K)$ defined on $L^2(\Omega)$ and the continuous extension by $I_n$ of the minimizer $u^\star_n$ of $E_n(\cdot, g_n, K_n)$. We are now ready to state the main result of this section.

Theorem 4.1.

Suppose that $p \in [1, +\infty[$ and $K$ is a nonnegative measurable, symmetric and bounded mapping. Let $u^\star$ and $u^\star_n$ be the unique minimizers of ($\mathcal{VP}$) and ($\mathcal{VP}_n$), respectively. Then, we have the following error bounds.

  1. If , then

    (17)

    where the constant is positive and independent of $n$.

  2. If , then for any ,

    (18)

    where the constant is positive and independent of $n$.

Observe that, by standard embeddings of $L^q(\Omega)$ spaces for $\Omega$ bounded, our bound in (17) not only does not require an extra assumption but is also sharper than (18). The assumption made in the second statement seems difficult to remove or weaken. Whether this is possible or not is an open question that we leave to future work.

Proof :

  1. Since $E(\cdot, g, K)$ is a strongly convex function (its data fidelity term is $\frac{1}{\lambda}$-strongly convex), we have

    (19)   $\frac{1}{2\lambda} \|I_n u^\star_n - u^\star\|^2_{L^2(\Omega)} \le E(I_n u^\star_n, g, K) - E(u^\star, g, K).$

    A closer inspection of $E$ and $E_n$, together with equality (7), allows us to assert that

    (20)   $E_n(u_n, g_n, K_n) = E(I_n u_n, I_n g_n, I_n K_n) \quad \text{for any } u_n \in \mathbb{R}^n.$

    Now, applying the Cauchy-Schwarz inequality and using (20), we have

    (21)

    As we work under the assumptions of the first statement, and since $I_n u^\star_n$ is the (unique) minimizer of $E(\cdot, I_n g_n, I_n K_n)$ (by virtue of (20)), it is immediate to see, using (8), that

    and thus

    (22)

    Since $g \in L^2(\Omega)$, by the Hölder and triangle inequalities, and (13), we have that

    (23)