Optimising Spatial and Tonal Data for PDEbased Inpainting^{†}^{†}thanks: Partly funded by the Deutsche Forschungsgemeinschaft (DFG).
Abstract
Some recent methods for lossy signal and image compression store only a few selected pixels and fill in the missing structures by inpainting with a partial differential equation (PDE). Suitable operators include the Laplacian, the biharmonic operator, and edgeenhancing anisotropic diffusion (EED). The quality of such approaches depends substantially on the selection of the data that is kept. Optimising this data in the domain and codomain gives rise to challenging mathematical problems that shall be addressed in our work.
In the 1D case, we prove results that provide insights into the difficulty of this problem, and we give evidence that a splitting into spatial and tonal (i.e. function value) optimisation does hardly deteriorate the results. In the 2D setting, we present generic algorithms that achieve a high reconstruction quality even if the specified data is very sparse. To optimise the spatial data, we use a probabilistic sparsification, followed by a nonlocal pixel exchange that avoids getting trapped in bad local optima. After this spatial optimisation we perform a tonal optimisation that modifies the function values in order to reduce the global reconstruction error. For homogeneous diffusion inpainting, this comes down to a least squares problem for which we prove that it has a unique solution. We demonstrate that it can be found efficiently with a gradient descent approach that is accelerated with fast explicit diffusion (FED) cycles. Our framework allows to specify the desired density of the inpainting mask a priori. Moreover, is more generic than other data optimisation approaches for the sparse inpainting problem, since it can also be extended to nonlinear inpainting operators such as EED. This is exploited to achieve reconstructions with stateoftheart quality.
Apart from these specific contributions, we also give an extensive literature survey on PDEbased image compression methods.
Keywords: Inpainting, Image Compression, Optimisation, Free Knot Problem, Diffusion, Partial Differential Equations (PDE’s), Interpolation, Approximation
Mathematics Subject Classification (2000) MSC 94A08, MSC 65Nxx, MSC 65Kxx
1 Introduction
One of the most fascinating properties of variational methods and partial differential equations (PDE’s) in image analysis is their property to fill in missing data. This fillingin effect has a long tradition: It can be found already in the seminal optic flow paper of Horn and Schunck [67], where the smoothness term propagates information to regions in which the data term is small or even vanishes. More explicit interpolation ideas are exploited in socalled inpainting approaches. They go back to Masnou and Morel [94], became popular by the work of Beltalmío et al. [9], and have been modified and extended by many others such as [12, 24, 60, 134]. Inpainting problems arise when certain parts of the image domain are degraded in such a way that any useful data is unavailable. In the remaining parts of the image domain, the data is available and undegraded. The goal is to reconstruct the missing information by solving a suitable boundary value problem where the data in the undegraded regions serve as Dirichlet boundary conditions.
PDEbased image compression methods constitute a challenging application area of inpainting ideas; see Section 2 for a detailed review and many references. These lossy compression approaches drive inpainting to the extreme: They store only a very small fraction of the data and inpaint the missing data with a suitable differential operator. However, PDEbased compression differs fundamentally from classical inpainting by the fact that it has the liberty to select the data that is kept. Typically one specifies only that a certain fraction of the image data is stored, and the codec^{1}^{1}1A codec is a system for image coding and decoding. can optimise this data in order to minimise the reconstruction error. Obviously this involves a spatial optimisation step that selects the locations of the kept pixels. We call these locations the inpainting mask. Interestingly, it may also be beneficial to optimise the grey values of the selected pixels: While changing the grey values deteriorates the approximation quality inside the mask, it can improve the approximation in the (usually much larger) inpainting domain where no data is specified. This grey value optimisation is also called tonal optimisation.
In order to judge the potential of PDEbased compression approaches, it is important to go to the extreme and find the limits that inpainting with optimised sparse data can offer. This should include both spatial and tonal optimisation for different inpainting operators. In a first step, it can make sense to have a clear focus on quality and postpone considerations of computational efficiency and coding costs to later work when the quality questions are answered in a satisfactory way.
Following this philosophy, the goal of our contribution is to explore the potential of PDEbased inpainting of sparse data that can be optimised both spatially and tonally. In order to evaluate different inpainting operators in a fair way, we aim at algorithms that are as generic and widely applicable as possible, and we optimise quality rather than speed. Computation time becomes only relevant for us when different methods lead to identical results. First our insights and algorithms are derived in more restricted settings. Afterwards, we investigate how these ideas can be extended and generalised.
Our work is organised as follows. We start by presenting a survey on PDEbased image compression in Section 2. Afterwards we discuss the homogeneous diffusion inpainting framework that is used for most of our optimisation algorithms in this work. In Section 4, we present methods that aim at optimal spatial and greyvalue data for homogeneous diffusion interpolation in 1D. The subsequent Section 5 deals with optimisation strategies in 2D. In Section 6, we present extensions to higher order and nonlinear diffusion inpainting operators. Finally we conclude with a summary in Section 7.
Our discussions are based on a conference paper [91], in which we have introduced probabilistic sparsification and tonal optimisation for homogeneous diffusion inpainting. Here we extend these results in a number of ways. Our main innovations are:

We present a new section that gives a detailed review of PDEbased image compression methods. Since this area has developped in a very fruitful way and no review article is available so far, we feel that such a review section can be a useful starting point for readers who would like to learn more on this topic.

In another new section we derive a mathematically sound approach to solve the data optimisation problem for homogeneoues diffusion inpainting of strictly convex signals in 1D. This restricted 1D setting allows us to gain insights into the nonconvexity of the problem and to quantify the errors that are caused by separating spatial and tonal optimisation.

We introduce a new algorithm for tonal optimisation, based on a gradient descent approach with an acceleration by a fast explicit diffusion (FED) strategy. While it achieves the same quality as previous approaches, we show that it is more efficient.

We explain how to extend our methods to more advanced linear or nonlinear inpainting operators such as biharmonic inpainting and inpainting by edgeenhancing anisotropic diffusion. In this way we achieve sparse PDEbased reconstructions of hitherto unparalleled quality.
2 A Review of PDEBased Image Compression
Before describing our approach on spatial and tonal data optimisation in the forthcoming sections, we first review existing approaches to PDEbased image compression. These methods belong to the class of lossy compression techniques. Hence, they aim at finding very compact file representations such that the resulting reconstructions approximate the original image well.
PDEbased appoaches are alternatives to established transformbased codecs such as the JPEG standard [104] which uses the discrete cosine transform (DCT), or its successor JPEG 2000 [126] that involves the wavelet transform. However, since they follow a completely different concept, it is worthwhile to give an overview of their various aspects.
To this end, we review data optimisation questions, the choice of appropriate inpainting operators, and strategies to store the optimised data efficiently. Afterwards we describe PDEbased codecs that are based on specific image features, and we discuss algorithmic aspects, variants and extensions, as well as relations to other approaches.
Our review focuses on methods that use PDE’s as a main tool for compression. This means that we do not discuss the numerous PDEbased or variational techniques that have been advocated as a preprocessing step before coding images or videos (see e.g. [128, 78]) or as a postprocessing tool for removing coding artifacts (such as [52, 3, 16]).
2.1 Data Optimisation
Galić et al. [53, 54] have pioneered research on data selection methods for PDEbased image compression by proposing a subdivision strategy that inserts mask points at locations where the approximation error is too large. While these authors used a triangular subdivision, Schmaltz et al. [118] modified this concept to rectangular subdivisions and added several algorithmic improvements that allowed them to beat the quality of JPEG 2000 with an anisotropic diffusion operator. Belhachmi et al. [8] have used the theory of shape optimisation to derive analytic results for the spatial optimisation problem in the case of homogeneous diffusion inpainting. Discrete approaches for the spatial and tonal optimisation problem with homogeneous diffusion inpainting have been presented by Mainberger et al. [91]. The spatial optimisation is based on a probabilistic sparsification strategy with nonlocal pixel exchange, while tonal optisation is formulated as a linear least squares problem. Hoeltgen et al. [63] considered a control theoretic approach to the problem of data optimisation, also for homogeneous diffusion inpainting. They minimised the quadratic reconstruction error with a sparsity prior on the mask and a nonconvex inpainting constraint. Their numerical approach solves a series of convex optimisation problems with linear constraints. A similar constrained optimisation model was considered by Ochs et al. [101] who used this problem as a test scenario for their iPiano algorithm. Qualitatively both techniques gave similar results, but the iPiano approach offered advantages in terms of efficiency.
While all the before mentioned strategies pay much attention to spatial optimisation, less work has been devoted to tonal optimisation so far. Early heuristic attempts go back to Galić et al. [54]. They lifted the grey values up or down to the next quantisation levels in order to improve the approximation quality in the vicinity of the data. A first systematic treatment in terms of a least squares optimisation problem was given by Mainberger et al. [91], who used a randomised GaußSeidel algorithm. Faster numerical algorithms for the same problem have been proposed by Chen et al. [29], who applied the LBFGS method, and by Hoeltgen and Weickert [64], who advocated the LSQR algorithm and a primal–dual technique.
2.2 Finding Good Inpainting Operators
Although many theoretical investigations on data selection methods are based on homogeneous diffusion inpainting, the task of finding better inpainting operators has been a research topic for a decade. Already in 2005, Galić et al. [53] have shown that edgeenhancing anisotropic diffusion (EED) [131] is better suited for PDEbased image compression than homogeneous diffusion. The favourable performance of anisotropic diffusion approaches of EED type has been confirmed by more detailed comparisons later on, both for randomly scattered data [54, 14] and for compressionoptimised data [29, 118]. Some of these evaluations involve many other inpainting operators such as biharmonic interpolation [43] and its triharmonic counterpart, the absolute minimal Lipschitz extension [22], isotropic nonlinear diffusion [105, 23] and its approximations of total variation (TV) inpainting [26, 114], as well as the inpainting operators of Bornemann and März [12] and of Tschumperlé [127]. Total variation inpainting performed consistently bad, showing that operators which work fairly well for denoising are not necessarily good for sparse inpainting. Bihamonic inpainting, on the other hand, turned out to be an interesting alternative to EED, when linearity or absence of any need for parameter specification are important, and over and undershoots are acceptable.
2.3 Storing the Data
Besides finding sparse inpainting data that allow to approximate the original image in high quality, every practically useful codec must be able to store these data efficiently. Attempting this in a naive way would create a huge overhead, and the resulting codec would not be competitive to the quality that established transformbased codecs such as JPEG or JPEG 2000 can achieve for the same file size. The search for data that can be stored efficiently may even lead to compromises w.r.t. the data optimality: In order to use less bits, it is common to round the intensity values to a relatively small set of quantisation levels. Regarding the data localisation, it can be helpful to avoid a completely free configuration of data points and allow only masks which are represented efficiently with binary trees that result from subdivision strategies [53, 54, 118]. Moreover, also lossless entropy coders such as Huffman coding [68], JBIG [72], or PAQ [89] are highly useful to remove the redundancy of the data.
2.4 Featurebased Methods
Methods that rely on specific – often perceptually relevant – features rather than on data that are optimised for compression applications can be seen as predecessors or variants of PDEbased codecs. Let us now discuss the main types of such features and their relevance for coding.
2.4.1 Contourbased Features
Edges are perceptually relevant features that have been analysed for a long time. Already in 1935, Werner [136] has investigated fillingin mechanisms from edges in biological vision systems. In computer vision, there are numerous appoaches in scalespace and wavelet theory that attempt to reconstruct an image from its edge information [88, 32, 142, 28, 69, 58, 93, 42, 115, 45], often over multiple scales and supplemented with additional information.
Also within coding, publications that exploit information from edges or segment boundaries and combine it with inpainting processes have a long tradition [21, 1, 36, 80, 87, 137, 7, 90, 143, 27, 55, 84, 65, 57, 18]. However, for general images these features are often suboptimal as inpainting data. On the other hand, for specific applications where the images are piecewise constant or piecewise smooth, such codecs can achieve competitive results. This includes cartoonlike images [90] or depth maps [27, 55, 65, 84]. For homogeneous diffusion inpainting and piecewise almost constant data, choosing data near edges can be justified by the theory of Belhachmi et al. [8]. It suggests to select the mask density as an increasing function of the Laplacian magnitude. Contourbased approaches also benefit from the fact that encoding of contour data is relatively inexpensive: One can use e.g. chain codes, and a smooth intensity variation along a contour allows subsampling without substantial quality degradations; see e.g. [90].
Related ideas are also advocated in computer graphics for image editing and vectorised image representations [46, 70, 103, 102]. Moreover, contourbased codecs that rely on inpainting processes with PDE’s have been used successfully for encoding digital elevation maps [48, 121, 138], where information is available in terms of level lines.
2.4.2 Pointbased Features
In order to end up with more compact image representations, it appears tempting to consider pointbased features instead of contourbased ones. Also here several results can be found in the literature.
Johansen et al. [71] and Kanters et al. [74] have performed research on reconstructing images from their top points in scalespace. Lillholm et al. [85] present more general discussions on how to reconstruct an image from a suitable set of feature points and their derivatives (local jet). More recently, also reconstruction attempts are described that are based on SIFT features [135] or local binary descriptors [33].
So far, all image reconstructions from isolated feature points that allow some perceptual interpretations or have shown their merits in other applications such as image matching do not offer a quality level that is sufficient for compression applications. It appears that these features are too sparse while lacking optimality.
2.5 Fast Algorithms and RealTime Aspects
In realtime applications, the efficiency of a codec becomes a central criterion. Here one should distinguish between realtime capabilities during encoding and realtime decoding. Often the latter one is more important: For instance, when a company encodes a movie, it is acceptable if the creation of an optimised file takes many hours, as long as the customer is able to decode it in realtime.
On the encoding side, the approach of Belhachmi et al. [8] is algorithmically sufficiently simple to allow realtime performance: It computes the Laplacian magnitude of a Gaussiansmoothed image and converts the result to a impainting binary mask by means of a halftoning method such as Floyd–Steinberg dithering [51]. Also the codec of Mainberger et al. [90] is of comparable computational simplicity. Most other approaches are not realtime capable during encoding since they apply a sophisticated optimisation of data and parameters. However, also for data optimisation, substantial algorithmic accelerations have been achieved recently [101, 64].
On the decoder side, no timeconsuming optimisation steps are required, and the main computational effort for PDEbased codecs consists of solving an inpainting problem. Thus, realtime decoding is possible if one uses appropriate algorithms and exploits the capabilities of modern hardware. In 2007, Köstler et al. [79] presented realtime algorithms for the subdivision approach of Galić et al. [53]. With multigrid methods for homogeneous diffusion inpainting or lagged diffusivity multilevel approaches for anisotropic diffusion inpainting, they could process more than 25 greyscale images per second of size on a Sony PlayStation 3. Also Mainberger et al. [90] proposed a multigrid approach for their edgebased homogeneous diffusion codec. On a PC CPU they reported runtimes around frames per second for decoding a colour image. This is about times higher than JPEG 2000 and times higher than JPEG. For linear inpainting problems with very sparse data, algorithms based on discrete Green’s functions can be an interesting alternative to multigrid methods [66]. Recently, Peter et al. [106] managed to decode greyscale images per second with VGA resolution by means of anisotropic diffusion inpainting on a PC architecture with GPU support. This was accomplished with socalled FED schemes [59] that are wellsuited for parallel implementations.
2.6 Hybrid Image Compression Methods
The usefulness of embedding inpainting concepts into existing transformbased image compression methods is studied in a number of publications [87, 111, 130, 139, 140]. Their goal is to benefit simultaneously from the advantages of inpainting methods and transformbased codecs. While PDEbased inpainting approaches often perform better at edges, transformbased codecs are favoured inside the regions, in particular if they are dominated by texture. By construction, such hybrid methods restrict themselves to the main application fields of both paradigms rather than pushing inpainting ideas to the limit.
2.7 Modifications, Extensions, and Applications
A progressive mode option is a useful feature of a codec. It allows to encode and transmit data in a coarsetofine way. Thus, in the decoding phase, the representation can be subsequently refined while reading the data stream. In [117], two progressive modes have been suggested for the EEDbased codec, and it has been shown that for high compression rates, their quality is competitive to JPEG and JPEG 2000.
By construction, PDEbased codecs are wellsuited for encoding specific regions of interest with higher precision. All one has to do is to increase the density of the inpainting data in the region of interest. For more details, we refer to [106].
Since a large fraction of modern imagery consists of colour image data, codecs should be able to handle colour images efficiently. While for most PDE approaches, extensions to vector and even matrixvalued data exist, these modifications are not necessarily optimal for compression applications. As a remedy, in [107] an extension of the EEDbased codec to colour images is proposed that uses the YCbCr representation. The perceptually relevant luma channel is stored with fairly high accuracy, while the chroma channels are encoded with very sparse data. In the reconstruction phase the chroma inpainting is guided by the structural information from the luma channel.
Extending PDEbased codecs from 2D images to threedimensional data sets does not create severe difficulties. Most PDE’s have natural extensions to higher dimensions, and also other concepts generalise in a natural way: For instance a rectangular subdivision in 2D is replaced by a cuboidal subdivision in 3D [118]. Because of the richer neighbourhood structure, the redundancy in higher dimensional data allows to achieve a higher compression rate for the same quality as a lower dimensional codec.
PDEbased compression strategies have also been investigated for encoding contours [118] and surfaces [6, 113]. These publications follow the philosophy of PDEbased image compression, but replace the Laplacian by the contour curvature or the Laplace–Beltrami operator.
Obviously one can apply any method for compressing 3D data also to video coding, if one models a video or parts of it as a spatiotemporal data block. Other video coding approaches have been suggested that use inpainting concepts within the H.264/MPEG4 AVC video coding standard [41, 97, 86]. In [119], the authors have implemented a video compression system that combines PDEbased image compression with pose tracking.
Cryptographic applications are studied in [92], where the authors combine PDEbased compression with steganographic concepts in order to hide one image in another one, or parts of an image in other parts.
2.8 Relations to Other Methods
PDEbased codecs that select the inpainting mask with triangular or rectangular subdivisions have structural similarities to piecewise polynomial image approximations based on adaptive triangulations or quadtrees. Extensive research has been performed on these approximations; see [44, 124, 125, 40, 35, 75, 82, 77, 13, 122, 30, 83] and the references therein. Often such approaches use linear polynomials within each triangle or rectangle. In this case, one may also interpret them as a solution of the Laplace equation with Dirichlet boundary data obtained from a linear interpolation of the vertex data. Thus, they can be seen as specific localised PDEbased codecs. Alternative interpretations can be given in terms of finite element approximations. In the onedimensional case, the linear spline approximation is even fully equivalent to homogeneous diffusion inpainting.
Inpainting with linear differential operators allows also an analytical representation of its solution in terms of the Green’s function of the operator. This has been used in [66] to relate linear PDEbased inpainting to sparsity concepts: Discrete Green’s functions serve as atoms in a dictionary that gives a sparse representation of the inpainting solution. In the onedimensional case, discrete analytic derivations are presented in [110].
On the other hand, continuous Green’s functions are also used as radial basis functions in scattered data interpolation [20]. Moreover, some radial basis functions can be seen as rotationally invariant multidimensional extensions of spline interpolation. This establishes a connection between PDEbased inpainting and image representations in terms of radial basis functions and splines, such as [4, 129, 49, 38].
3 Inpainting with Homogeneous Diffusion
In this section, we will briefly present the homogeneous diffusion reconstruction method (also known as Laplace interpolation) that lies at the basis of our approach and will be used for most of the results presented here. Homogeneous diffusion is among the simplest inpainting processes that one can consider. This makes it suitable for theoretical investigations. Nevertheless, one should emphasise that even such a simple method can yield very good results if the interpolation data is chosen in an appropriate way; see e.g. [8, 63, 91, 101].
Let be a smooth function on some bounded domain with a sufficiently regular boundary . Throughout this work, we will restrict ourselves to the case (1D signals) and (greyscale images), although many results will also be valid for arbitrary . Moreover, let us assume that there exists a set of known data . Homogeneous diffusion inpainting considers the following partial differential equation with mixed boundary conditions.
(1)  
(2)  
(3) 
where denotes the derivative of in outer normal direction. We assume that both boundary sets and are nonempty. Equations of this type are commonly referred to as mixed boundary value problems and sometimes also as Zaremba’s problem named after Stanislaw Zaremba, who studied such equations already in 1910 [141]. The existence and uniqueness of solutions has been extensively analysed during the last century. Showing that (1)–(3) is indeed solvable is by no means trivial. Generally, one can either show the existence of solutions in very weak settings or one has to impose strong regularity conditions on the domain. The references [5, 96] discuss the solvability in a general way. In [95] it is shown that a Hölder continuous solution exists if the data is sufficiently regular. In [19], the author discusses the regularity of solutions on Lipschitz domains. A more general existence theory for solutions is given in [50]. Further investigations on mixed boundary value problems can also be found in [56, 81]. A particularly easy case is the 1D setting, where the solution can obviously be expressed using piecewise linear splines interpolating data on .
For convenience we introduce the confidence function which states whether a point is known or not:
(4) 
Then it is possible to write (1)–(3) as
(5)  
(6) 
For most parts of this text we will prefer this formulation, as it is more comfortable to work with in the discrete setting, which can be obtained as follows. Let be the set of indices enumerating the discrete sample positions, and the subset of indices of known samples. Then we can express the discrete version of as a vector and the corresponding solution as a vector . The binary mask , where if and otherwise, indicates the positions of the Dirichlet boundary data. At last, the Laplacian is discretised by standard means of finite differences [98]. Hence, a straightforward discretisation of (5)–(6) on a regular grid yields
(7) 
where is the identity matrix, is a diagonal matrix with the components of as its entries, and is a symmetric matrix, describing the discrete Laplace operator with homogeneous Neumann boundary conditions on . Its entries are given by
(8) 
where are the neighbours of pixel in direction, and is the corresponding grid size. By a simple reordering of the terms, (7) can be rewritten as the following linear system:
(9) 
It has been shown in [90] that this linear system of equations has a unique solution and that it can be solved efficiently with bidirectional multigrid methods.
4 Optimisation Strategies in 1D
The choice of the position of the Dirichlet boundary data has a strong influence on the quality of the reconstruction. In order to gain some analytical insight on this problem, we restrict ourselves in this section to the 1D continuous setting and consider in Section 4.1 how to find the optimal positions of the mask points . Optimising position and value of the Dirichlet data simultaneously will be treated in Section 4.2.
4.1 Optimal Knots for Interpolating Convex Functions
In this section, we assume that is always a strictly convex and continuously differentiable function. Our goal is to find a distribution of knot sites in the interval such that the interpolation error with piecewise linear splines becomes minimal in the norm. For 1D formulations, this is equivalent to determining the Dirichlet boundary data in (1)–(3) such that the solution gives the best possible reconstruction to in the sense. In concrete terms, this means that we seek positions inside the interval with
(10) 
and a piecewise linear spline interpolating at the positions , such that the error
(11) 
becomes minimal. This optimisation problem is also called free knot problem and has been studied for more than fifty years. We refer to [61, 76] for similar considerations as in our work and to [11, 37, 39, 73] and the references therein for more general approaches. Note that it is quite common to relax the interpolation condition and to generalise the problem to explicitly allow approximating functions. Further details on interpolation and approximation techniques can be found in [112]. Alternative ways to optimise linear spline interpolation are discussed e.g. in [10].
For technical reasons, the knots and in (10) are fixed at the boundary of the considered interval. The choice of the norm is especially attractive in this case: Due to the convexity of , the integrand
is nonnegative for all in . Thus, we can simply omit the absolute value in (11). This observation simplifies the derivation of optimality conditions below. Furthermore, the requirement that the linear spline must coincide with at the knot sites allows us to state the interpolating function in an analytic form: If , then
(12) 
Note that requiring that the are distinct is necessary to avoid a division by . A straightforward computation gives
(13) 
This expression also corresponds to the error of the composite trapezoidal rule for the numerical integration of with nonequidistant integration intervals.
Equation (4.1) will be our starting point for developing an algorithm to determine the optimal knot sites. However, before we will do so, we present a result which shows the difficulty of the free knot optimisation problem.
Proposition 1
If the function is strictly convex and twice continuously differentiable, then the energy (11) is convex in for 3 knots (i.e. ). In general, it is not convex for any other number of knots larger than 3 (i.e. ).
Proof
In the case of three knots, we only have one free variable, and it follows from (4.1) that the error is given by
(14) 
Further, the second derivative of is given by
(15) 
In order to demonstrate that the error function is nonconvex for a higher number of interpolation points, it suffices to provide a counterexample. Let us consider the function on the interval as well as the two knot sets and . If were convex, then it must also be convex along the line in that connects and (interpreting both knot sets as vectors in ). However, the plot of with depicted in Fig. 1 displays a nonconvex behaviour. ∎
We remark that the previous proposition does not claim that the energy can never be convex for more than three knots. Indeed, for affine functions of the form with real coefficients and the energy is identical 0 for any number of knots, and thus also convex. This shows that even under weaker conditions as in the proposition, the energy may be convex.
4.1.1 A New Algorithm for the Free Knot Problem for Linear Spline Interpolation
A necessary condition for a minimiser of (4.1) is . However, it follows from Proposition 1 that this condition is not sufficient. There may exist several global and/or local minima. A simple computation leads to the following system of nonlinear equations in the unknowns ,…,:
(16) 
It should be noted that each knot only depends on its direct neighbours. Therefore, odd indexed knots only depend on even indexed knots and vice versa. Since is strictly convex, it follows that is strictly monotonically increasing. Thus, its inverse exists and is unique at every point of the considered interval. This motivates the following iterative scheme.
 Algorithm 1:

Spatial Optimisation in 1D
 Input:

, the desired number of knots.
 Initialisation:

Choose any initial distribution with and , e.g. a uniform distribution of the knots on the interval .
 Repeat until a fixed point is reached:

 Update even knots for all possible :

(17)  Update odd knots for all possible :

(18)
 Output:

The final knot distribution .
Observe that the above scheme is similar to a RedBlack Gauß–Seidel scheme for the solution of linear systems: We update the variables iteratively and use newly gained information as soon as it becomes available without interfering with the direct neighbours of the data point. An important issue is that the knots are not allowed to fall together. The following proposition shows that this cannot happen.
Proposition 2
Proof
Since is differentiable on for all , the mean value theorem guarantees the existence of a in such that
(20) 
Thus, our iterative scheme must necessarily preserve the order of the knots.∎
The next theorem shows that the iterates from our algorithm monotonically decrease the considered energy.
Theorem 4.1
Proof
By alternating between the update of the odd and even indexed sites, the problem decouples. The new value of only depends on and , which are fixed. Therefore, the problem is localised, and we can update all the even/odd indexed knots independently of each other. It follows that one iteration step is equivalent to finding the optimal such that the interpolation error becomes minimal on for all even/odd . The global error can now be written as the sum of all the errors over the intervals and will necessarily decrease when each term of this sum decreases. Proposition 1 shows that the considered energy is convex for three knots. Thus, is not only a necessary, but also a sufficient condition for being a minimum on . This means that (17) will not increase the error when updating even indexed knots and subsequently, (18) will not increase the error while updating the odd numbered sites. Therefore, it follows that the overall error cannot increase in an iteration step.∎
Since the error is bounded from below by 0, we also obtain the following result.
Corollary 1
The sequence is convergent.
Note that the previous statements do not claim the convergence of the sequence . Since the problem is nonconvex, the global minimum of the considered energy is not necessarily unique. In that case, our algorithm might alternate between several of the minimisers. These minimisers are, from a qualitative point of view, all equivalent, since they yield the same error. However, they might not be the global minimiser. Also note that the theorem of BolzanoWeierstrass asserts that contains at least one convergent subsequence since all the must necessarily lie in the interval . Finally, we remark that in our test cases the results were always of very good quality. This gives rise to the conjecture that the found knot distributions are close to a global minimum.
4.1.2 Numerical Experiments
Let us now perform experiments with our new algorithm. We consider the convex function
on the interval . Figure 2 depicts the evolution of the error, while Fig. 3 exhibits the resulting distribution of the knots. The experiments were done with a randomised initial distribution of the knots and iterations. Interestingly, the iterates always converged already after very few iterations. In accordance with the theory from the previous section, we also note that the error is monotonically decreasing, both with respect to the number of knots and with the number of iterations. Another interesting observation is the influence of an additionally introduced knot on the other knots: Adding further interpolation sites has a global influence on all the other knots. Moreover, we encounter a higher knot density in regions with large curvature than in flat regions. This is in agreement with the theory of Belhachmi et al. [8].
4.2 Optimal Knots for Approximating Convex Functions
In the previous section, we have seen how to optimise the location of the interpolation data. A next step would be to investigate how much an optimisation of the grey value data at these sites could further improve the result. To do so, we no longer require that the function value and the value of the interpolant must coincide at the knot locations. Instead, we consider an approximation problem and require that the overall reconstruction minimises the error on the considered domain. As in the previous section, we again use piecewise linear splines. Such approximations by means of first degree splines have a long history, and there exist results for many special cases. In [123], the best approximation of strictly convex functions in the least squares sense has been analysed, while [34] cites general conditions for the determination of best approximations of strictly convex functions in the sense. Theoretical results can also be found in [73]. In [100], it is shown that an optimal approximation of convex functions with splines is not necessarily unique, a problem which we already mentioned in the stricter case of convex spline interpolation. Finally, [31, 109] supply algorithms for determining such approximations.
In order to minimise the error between a strictly convex function and a piecewise linear spline in an approximation setting, we use an algorithm by Hamideh [61]. It relies on a classical result from approximation theory:
Theorem 4.2
For any function , which is strictly convex in , the optimal straight line approximation to in the sense on interpolates the function at the points
(22) 
Proof
This result can be verified by direct computation. Alternatively, a detailed proof can also be found in [112]. ∎
This gives rise to the following algorithm.
 Algorithm 2:

Optimal Approximation in 1D
 Input:

the number of desired knots.
 Initialisation:

Choose an arbitrary distribution of knots with and .
 Compute:

Repeat these steps until a fixed point is reached.

On each subinterval , define the points
(23) as well as the corresponding line passing through these points:
(24) 
Determine for all the new knot position by intersecting the lines and , e.g. solve
(25) for .

 Output:

The final knot distribution .
The algorithm of Hamideh is similar to our Algorithm 1 for interpolation: Both use iteratively locally optimal solutions to perform a global optimisation. In addition, the following properties are shown in [61]:

The resulting sequence of errors is convergent.

For all , it holds that
(26a) (26b) where is our sequence of knot sets. Thus, two distinct knots cannot fall together.

If the unknown optimal spline fulfils certain continuity conditions, one can guarantee that the above algorithm convergences towards an optimal solution.
Further analytic results on the optimal knot distribution can also be found in [76].
4.2.1 Numerical Experiments
We investigate the knot distribution for different number of knots and repeat the experiments of Section 4.1.2 using now the algorithm of Hamideh. Our main interest lies in the performance of a combined optimisation over a sequential optimisation of the position and the corresponding value of the interpolation data. To this end, we also consider a tonal optimisation that adjusts the function values for our free knot result. This additional task is carried out as a postprocessing step. Since the position of the mask values is optimised in an setting, we use the same framework for the tonal optimisation, too. Due to the fact that for a fixed set of knots, we can express our interpolating spline as a linear combination of first degree BSplines, we can express the tonal optimisation as a linear regression task with respect to the norm. It is wellknown that such problems can be reduced to linear programs which can efficiently be solved by standard solvers from the literature. Our findings are summarised in Tab. 1, and a visual comparison between the obtained mask sets for a set of seven knots is given in Fig. 4.
Our method  Hamideh  

no tonal optim.  with tonal optim.  
5  12.501  4.229  3.982 
7  5.134  1.810  1.748 
9  2.785  0.999  0.977 
Although the knot distribution in Fig. 4 for the approach of Hamideh is similar to the one found with interpolation, the corresponding errors in Tab. 1 are significantly lower, since the reconstruction can adapt much better to the original function when compared to the interpolation framework. Nevertheless, we also observe a substantial gain of the tonal optimisation. Even though we cannot outperform the combined optimisation, we achieve competitive results, in particular for larger values of .
Our investigations in the 1D case show that a careful optimisation of the positions (spatial information) and greyvalues (tonal information) of the data can lead to significant improvements compared to a purely spatial tuning. Moreover, a sequential optimisation is almost as good as a combined optimisation of the data. The next step in our strategy will be to adapt these ideas to the twodimensional setting such that we can efficiently apply them on discrete image data.
5 Optimisation Strategies in 2D
Unfortunately, our interpolation algorithm from Section 4 can hardly be used directly on 2D image data. First of all, we would be restricted to convex/concave images. Of course, one could always segment an arbitrary image into convex and concave regions and treat them separately, but in many cases this would lead to heavily oversegmented images and a suboptimal global distribution of the data points for the reconstruction. Secondly, the solution of (5)–(6) in higher dimensions cannot be written in terms of simple piecewise linear interpolation: To characterise it analytically would require more complicated expressions that involve Green’s functions [66]. Therefore, we want to consider other approaches here. Nevertheless, they exploit the basic ideas and findings of a spatial and tonal optimisation from the previous section.
For practical reasons, we present a twostep optimisation strategy: First we consider an interpolation approach to optimise the spatial data. Afterwards we optimise the tonal information at the points obtained in the first step. This strategy can be justified by the fact that in the 1D case, the obtained knot distributions for the interpolation and approximation algorithms were similar. For the optimisation of the data sites we investigate two methods: An analytic approach proposed in [8] that exploits the theory of shape optimisation, and our probabilistic sparsification approach from [91]. Finally we complement our pure interpolation framework with a best approximation scheme that incorporates tonal optimisation in our model.
5.1 Optimising Spatial Data
5.1.1 Analytic Approach
In order to approach the question about the optimal data selection, Belhachmi et al. [8] use the mathematical theory of shape optimisation. This theory optimises topological properties of given objects. Belhachmi et al. seek the optimal shape of the set of Dirichlet boundary data in (1)–(3). They show that the density of the data points should be chosen as an increasing function of the Laplacian magnitude of the original image. However, this optimality result returns a continuous density function rather than a discrete pixel mask. This yields an additional problem that is also discussed in [8], namely, how to obtain the best discrete (binary) approximation to a continuous density function. Belhachmi et al. suggest the following strategy to obtain a point mask based upon . First one applies a small amount of Gaussian presmoothing with standard deviation to obtain . This is a common procedure in image analysis to ensure the differentiability of the data. Then one computes the Laplacian magnitude and rescales it such that its mean represents the desired point density given as fraction of all pixels. Finally any dithering algorithm that preserves the average grey value can be applied to obtain the binary point mask.
In [8] the classical error diffusion method of Floyd and Steinberg [51] is used. However, we favour the more sophisticated electrostatic halftoning [116] over simpler dithering approaches, since it has proven to yield very good results for discretising a continuous distribution function. Figure 5 shows the superiority of electrostatic halftoning over FloydSteinberg dithering in terms of the mean squared error (MSE)
(27) 
where denotes the reconstruction, the original image and the set of all pixel indices. Since any dithering method introduces errors, it remains an open question if this is the most suitable approach to discretise the continuous optimality result.
The theory of Belhachmi et al. [8] demands the data points to be chosen as an increasing function of . However, the optimal increasing function depends on the details of the underlying model. One option is to use the identity function of the Laplacian magnitude, as was done in [91]. In the present work, we introduce an additional parameter and dither instead. This choice can also be motivated from the original paper [8] and allows to tune the density of the selected points in homogeneous regions. The complete method, which we call the analytic approach, is summarised below.
 Algorithm 3:

Analytic Approach
 Input:

Original image , Gaussian standard deviation , exponent , desired pixel density .
 Compute:


Perform Gaussian presmoothing with standard deviation : .

Compute .

Rescale to
where is the maximal possible grey value.

Apply electrostatic halftoning to obtain .

 Output:

Discrete pixel mask .
In order to evaluate the analytic approach, let us apply it on the test image trui (see Fig. 5(a)). We aim at a mask pixel density of 4% () of all pixels. The parameters and are chosen such that the MSE of the reconstruction becomes minimal. This was achieved with and . For comparison, we also consider two masks with the same amount of pixels: a mask with points on a regular grid and a randomly sampled mask. Figure 8 shows these two masks and the one created with the analytic approach in the first column as well as the corresponding reconstructions in the second column. We observe that the reconstruction quality highly benefits from a dedicated point selection. These results are confirmed by the Columns 3, 4 and 5 of Tab. 4, which depicts quantitative results for several test images from Fig. 5(a), Fig. 6(a), and Fig. 6(c).
As already mentioned, the analytic approach is realtime capable if a fast dithering method is used. However, rather than on speed, the focus of our present work is to maximise the quality of the reconstruction. Therefore, in the next subsection we present an alternative algorithm that is slower but outperforms the analytic approach in terms of quality.
5.1.2 Probabilistic Sparsification
We have seen that the analytic approach offers a clean strategy how to choose optimal spatial data in a continuous image. However, due to certain degrees of freedom in the modelling, errors caused by the dithering algorithm, and discretisation effects, its results on digital images cannot be optimal. As an alternative, we consider now a discretise–then–optimise strategy. This way we can directly search for a binaryvalued mask by working with the discrete inpainting formulation from (7). An immediate consequence of this approach is that there are only finitely many combinations for . Unfortunately, already for an image of size pixels and a desired pixel density of 4% there are possible masks. To tackle this combinatorial problem we suggest a method called probabilistic sparsification. It uses a greedy strategy to reduce the search space.
Let be a given discrete image and let be the function that computes the solution of the discrete homogeneous inpainting process (9) with a mask :
(28) 
The goal is to find the pixel mask that selects a given fraction of all pixels and minimises .
Starting with a full mask, where every pixel is chosen, probabilistic sparsification iteratively removes the least significant mask pixels until a desired density is reached. More specifically, we randomly choose a fraction of candidate pixels from the current mask. These pixels are removed from the mask, and an inpainting reconstruction is calculated. The significance of a candidate pixel can then be estimated by computing the local error, i.e. the squared grey value difference of the inpainted and original image in this pixel. Then we permanently remove the fraction of the candidates that exhibit the smallest local error from the mask, and we insert back again the remaining fraction of the candidates. A detailed description of our algorithm is given below.
 Algorithm 4:

Stochastic Sparsification
 Input:

Original image , fraction of mask pixels used as candidates, fraction of candidate pixels that are removed in each iteration, desired pixel density .
 Initialisation:

and .
 Compute:

Do

Choose randomly a candidate set of pixel indices from .

For all set .

Compute .

For all compute the error .

For all of the largest values of reassign .

Remove the indices from and clear .
while .

 Output:

Pixel mask , such that
The larger the parameters and are chosen, the faster the algorithm converges, since in each step, pixels are removed. After steps there are mask pixels left. Hence, for a density , the algorithm terminates after at most iterations, where denotes the ceiling function, giving the smallest integer not less than its argument.
Because there is a global interdependence between all selected mask pixels, probabilistic sparsification cannot guarantee to give optimal solutions. Therefore, the question arises how the parameters and influence the quality of the resulting mask. To this end, we run several experiments with different and . The results are depicted in Tab. 2. Note that we set the candidate set as well as the set of pixels that are removed to if or would lead to sets smaller than one pixel. The optimal is usually not very large: If too many candidates are removed from the mask, the local error does not provide enough information to select good pixels to remove permanently. The parameter can usually be chosen as small as possible, i.e. such that only one candidate is removed in each iteration. For our test images, this was the case for . With larger values for , the probability that we remove important pixels increases.
0.01  0.02  0.05  0.1  0.2  0.3  0.4  

103.7 1.88  98.2 1.72  87.9 1.61  77.6 1.40  67.7 1.40  66.1 1.36  70.7 1.76  
103.7 1.77  98.0 1.87  87.7 1.74  77.4 1.33  67.8 1.26  66.3 1.41  70.5 1.67  
103.1 1.69  98.3 1.91  88.9 1.76  81.8 1.68  73.0 1.44  68.9 1.81  69.4 1.84  
103.7 1.99  98.7 1.74  92.6 1.57  85.3 1.71  76.4 1.78  71.4 1.56  70.6 1.83  
104.9 2.19  102.7 1.96  98.1 2.07  91.6 1.82  82.7 2.02  77.1 2.03  74.6 1.79  
110.3 2.36  107.2 2.63  103.7 2.26  97.7 2.14  89.5 2.00  83.9 2.22  80.8 2.24 
Tab. 2 illustrates the robustness of the algorithm. Note that the approach is not deterministic and thus always returns different results since the candidates are chosen randomly at each iteration. Although the obtained masks for different seeds differ in most of the selected pixels, we obtain qualitatively comparable results: The standard deviation does not exceed a value of and is even smaller for optimal values for and , when the algorithm is run several times.
In Fig. 8, the images in the third row show the results of the probabilistic sparsification for the test image trui with a mask pixel density of 4% (). To optimise the quality, we use and (cf. Tab. 2). Both the visual as well as the quantitative results outperform the ones of the analytic approach. This can also be observed for the two test images walter and peppers, as is shown in Tab. 4. The parameters and are optimised for each image individually.
5.1.3 Nonlocal Pixel Exchange
As we have seen in the previous section, probabilistic sparsification outperforms the analytic approach. Nevertheless, it is not guaranteed to find an optimal solution. An obvious drawback of probabilistic sparsification is the fact that due to its greedy nature, once a point is removed, it will never be put back into the mask again. Thus, especially at later stages, where only few mask pixels are left, important points might have been removed, so that we end up in a suboptimal local minimum. We now present a method called nonlocal pixel exchange that allows to further improve the results of any previously obtained mask, in our case the one from probabilistic sparsification. It starts with a sparse, possibly suboptimal mask that contains already the desired density of mask pixels. In each step, it randomly selects a set of nonmask pixels as candidates. The candidate that exhibits the largest local error is then exchanged with a randomly chosen mask pixel. If the inpainting result with the new mask is worse than before, we revert the switch. Otherwise we proceed with the new mask. By construction, the nonlocal pixel exchange can only improve the result. This algorithm always converges towards an optimal solution in terms of a exchange of two pixels. Since we exchange at each iteration the same number of candidate pixels it follows that this approach is not equivalent to an exhaustive search through all possible combinations. Thus, one cannot guarantee convergence towards the global minimum. The description below shows the details of the algorithm.
 Algorithm 5:

Nonlocal Pixel Exchange
 Input:

Original image , pixel mask , size of candidate set, the set of pixel indices of the mask .
 Initialisation:

and .
 Compute:

Repeat

Choose randomly pixel indices from and compute the local error .

Exchange step:
Choose randomly a and set .
For the largest value of , set . 
Compute .

If

and .

Update .
else

Reset .

until no pairs can be found for exchange.

 Output:

Optimised mask .
As for the previous algorithms, we are interested in an optimal parameter selection. Table 3 shows the results for different choices of when the nonlocal pixel exchange is applied to the masks from the previous section (randomly selected, regular grid, and analytic approach from Fig. 8) and in addition the one we obtained by probabilistic sparsification (also Fig. 8). Technically one could always choose , but a noticeable speedup can be obtained by choosing a larger . In our experiments, values around result in fastest convergence.
mask  

1  5  10  20  30  40  50  100  
randomly selected  49.88  46.43  44.99  44.78  45.00  45.07  45.25  48.22 
regular grid  49.67  45.79  45.19  44.76  45.16  45.56  45.76  48.33 
analytic approach  49.19  45.88  45.14  44.70  45.33  45.56  46.13  49.31 
probabilistic sparsification  43.72  42.45  42.34  41.92  41.97  42.49  42.19  43.65 
The nonlocal pixel exchange improves the masks from any method we have considered so far; see Fig. 7. Especially within the first few iterations we achieve significant quality gains. After iterations, we reach with all three masks an MSE below . The best reconstruction with an MSE of is obtained with the mask from probabilistic sparsification. It is depicted in Fig. 8. As for the previous methods, Tab. 4 provides also quantitative results for the test images walter and peppers. They support the above observations.
5.2 Optimising Tonal Data
So far, all our 2D optimisation approaches only propose a solution for the spatial optimisation problem. However, the results in Section 4.2 show that a tonal optimisation, i.e. an optimisation of the grey values, can be very worthwhile. From a data compression point of view, it is important to notice that changing the grey values at the chosen data points does not increase the amount of data that needs to be stored. The quality improvements, on the other hand, can be remarkable. In this section, we present an approach that allows us to determine the optimal grey values for any given mask.
In order to find the optimal grey values for a fixed mask , we consider the following minimisation approach:
(29) 
where is the standard Euclidean norm, denotes the original image, and the reconstruction from (28). Due to the linearity of with respect to , this is a linear least squares problem. In our next steps, we analyse its wellposedness properties and propose an efficient numerical algorithm.
5.2.1 Existence and Uniqueness Results
Let denote the th canonical basis vector of , i.e. if , and otherwise. Then we call the inpainting echo of pixel . Since is linear in we can express the reconstruction as a superposition of its inpainting echoes:
(30) 
Since is for (i.e. for ), we can simplify this to a summation over :
(31) 
For our minimisation problem (29), this means that the coefficients can be chosen arbitrarily if . For simplicity, we fix them at 0. The remaining with can be obtained by considering the least squares problem
(32) 
where is a vector of size , and is a matrix that contains the vectors as columns. The associated normal equations are given by
(33) 
By construction, the matrix is positive semidefinite. It is, however, not obvious that the eigenvalue cannot appear. The theorem below excludes such a singular situation.
Theorem 5.1
Let be nonempty. Then the matrix is invertible, and thus the linear system (33) has a unique solution.
Proof
Mainberger et al. [90] have proven that for a nonempty set , the matrix is invertible, i.e. exists. Moreover, for we have
(34) 
This shows that is the th column of . Since is invertible, also is regular. Thus, all column vectors of have to be linearly independent. In particular, this implies that also all columns of the matrix are linearly independent. Therefore, is invertible, and the linear system (33) has a unique solution. ∎
5.2.2 Numerical Algorithm
To find the solution of the minimisation problem (29), Mainberger et al. [91] have solved the associated normal equations (33) iteratively. In particular, they have chosen a randomised Gauß–Seidel scheme that updates the grey values at the individual mask points one after another. Alternatively, one could also employ different iterative solvers or solve the system of equations with direct methods such as LU or QRdecompositions [62]. In general, however, one has to state that typical methods which rely on inpainting echoes suffer from a relatively high computational cost to obtain the individual echoes. Although one can precompute and reuse them for the subsequent iteration steps, they still need to be computed at least once. This is particularly inefficient when a large amount of mask points is present. Moreover, precomputing inpainting echoes leads to higher memory requirements. The more recent strategies in [29, 64] avoid the direct computation of the inpainting echoes. Instead, the original minimisation problem (29) is solved directly with the help of primaldual methods or related sophisticated optimisation strategies. This gives more efficient algorithms for tonal optimisation.
In the following, we propose an alternative approach to find optimal tonal data. It is based on an accerated gradient descent strategy that allows an efficient grey value optimisation. Similar to [29, 64], we consider the original minimisation problem (29) directly. Thus, we also avoid computing inpainting echoes. This leads to a reduced runtime as well as to low memory requirements. Below we first explain the classical gradient method with exact line search. Afterwards we present a novel variant that benefits from an accelation with a fast explicit diffusion scheme in the sense of Grewenig et al. [59].
Our goal is to minimise the objective function
(35) 
Starting with an initialisation , a gradient descent scheme minimises this energy by iteratively updating the current grey values for as
(36) 
with a step size and the gradient depending on the iterates . Denoting the inpainting solution at iteration step by , i.e. , the gradient can be written as
(37) 
where is the Jacobian of with respect to the second component. With the definition of from (28), we obtain
(38) 
where we have exploited the symmetries of the matrices and . Computing the iterates and the gradient means to solve a linear system of equations for each of them. This can be done efficiently with a bidirectional multigrid solver as is suggested in [90].
The main parameter that we have to specify is the step size . As one possibility, it can be optimised to yield the largest possible decay of the energy in each step. This comes down to the least squares problem
(39) 
Exploiting the linearity of in the second component allows to obtain the minimiser in closed form:
(40) 
This strategy is also known as exact line search. It guarantees that the sequence converges to the minimum of the energy [15]. As stopping criterion, we consider the relative norm of the gradient, which should approach zero at the optimum. In other words, we stop as soon as
(41) 
with some small number . An algorithmic overview of the classical gradient descent method with exact line search is shown below. It serves as our baseline method.
 Algorithm 6:

Grey Value Optimisation with Exact Line Search
 Input:

Original image , pixel mask .
 Initialisation:

, .
 Compute:
 Output:

Optimised grey values .
In order to speed up the gradient descent approach, we propose an accelerated algorithm based on a socalled fast explicit diffusion (FED) scheme. First applications of FED to image processing problems go back to Grewenig et al. [59]. FED can be used to speed up any explicit diffusionlike algorithm that involves a symmetric matrix. While classical explicit schemes employ a constant time step size that has to satisfy a restrictive stability limit, FED schemes involve cycles of time step sizes where up to 50 % of them can violate this stability limit. Nevertheless, at the end of each cycle, stability in the Euclidean norm is achieved. In contrast to classical explicit schemes that reach a stopping time of order in steps, FED schemes with cycle length progress to . This allows a very substantial acceleration.
Since the gradient descent scheme can be seen as an explicit scheme with a symmetric matrix, FED is applicable: If denotes a fixed step size for which (36) is stable in the Euclidean norm, one replaces it by the cyclically varying step sizes
(44) 
For large cycle lengths , one should permute the order of the step sizes to tame rounding errors; see [133] for more details on this and an exhaustive explanation of the FED framework in general. The FED cycles should be iterated until the stopping criterion (41) is fulfilled. A related cyclic optimisation strategy has also been investigated in [120].
In order to determine the individual step sizes within each cycle, we have to find a step size that is within the stability limit. The following result is wellknown in optimisation theory [99]: If is continuous and its gradient is Lipschitz continuous, i.e. there is a constant sucht that
(45) 
for all and , then the gradient descent scheme is stable for all step sizes fulfilling
(46) 
It is straightforward to verify that in our case can be chosen as the squared spectral norm of the inpainting matrix , i.e.
(47) 
where denotes the spectral radius of the symmetric matrix . One possibility to estimate the spectral radius is to use Gershgorin’s circle theorem. However, this may give a too pessimistic estimate. Instead, we propose to use the power method to determine the maximum eigenvalue of ; see e.g. [2]. The convergence of this method turns out to be relatively fast, such that one obtains already a reasonable estimate of the spectral radius after iterations.
Although it may appear tempting to choose close to the stability limit , this can result in a suboptimal convergence speed, since high frequent error components are damped too slowly. Our experiments suggest that a good choice for is two third of the stability limit:
(48) 
Similar strategies are also common e.g. in multigrid approaches that use a damped Jacobi method with damping factor as a baseline solver [17].
Below we give an algorithmic overview over all steps to perform tonal optimisation with FEDaccelerated gradient descent:
 Algorithm 7:

Grey Value Optimisation with FED
 Input:

Original image , pixel mask , FED cycle length .
 Initialisation:

, .
 Compute:
 Output:

Optimised grey values .
Since the grey value optimisation problem is strictly convex, all convergent algorithms yield the same minimiser and are therefore qualitatively equivalent. They only differ by their run times. The following experiment gives an impression of realistic run times of both gradient descent algorithms for greyvalue optimisation. As before, we consider the inpainting problem with the test image trui ( pixels) and 4 % mask density. The stopping parameter for our iterations was set to . With a C implementation on a desktop PC with Intel Xeon processor (3.2GHz), the exact line search algorithm requires seconds to perform iterations. A corresponding FED algorithm with needs only seconds to compute cycles of length . In comparison, a CPU implementation of the primal–dual approach of Hoeltgen and Weickert [64] requires for the same problem a run time of seconds. This illustrates the favourable performance of the FEDaccelerated gradient descent method.
As the FED algorithm is based on an explicit scheme, it is also wellsuited for implementations on parallel hardware such as GPUs. This can lead to very substantial additional accelerations.
5.2.3 Qualitative Evaluation
In order to evaluate the capabilities of the grey value optimisation, we apply it to the masks obtained so far for the test image trui. In all cases this results in a clear improvement of the reconstruction quality, both visually and in terms of their MSE; see Fig. 8 as well as Tab. 4. This confirms also our impressions from the 1D scenario in Section 4.2. Especially suboptimal masks benefit a lot from grey value optimisation: It can compensate for many deficiencies that are caused by an inferior spatial selection strategy. This becomes particularly apparent when considering the result for the random mask or the mask on the regular grid. It is remarkable how much gain in quality is possible by simply choosing different grey values. However, also the reconstruction we obtain with the best mask using probabilistic sparsification and nonlocal pixel exchange can be improved substantially: The MSE is reduced from to . Similar improvements can also be observed for the images walter and peppers; see Tab. 4. The best reconstructions are depicted in Fig. 6.

mask  reconstruction  reconstruction with optimal tonal data 

random mask 


MSE: 273.10  MSE: 151.25  
regular grid 


MSE: 181.72  MSE: 101.62  
anal. approach 


MSE: 84.04  MSE: 42.00  
probab. sparsif. 


MSE: 66.11  MSE: 36.04  
nonl. pixel exch. 


MSE: 41.92  MSE: 27.24 
image  GVO  randomly  regular  analytic  probabilistic  nonlocal  

selected  grid  approach  sparsification  pixel exchange  
trui  no  273.10  181.72  84.04  (, )  66.11  (, )  41.92  () 
yes  151.25  101.62  42.00  36.04  27.24  
walter  no  297.07  184.00  39.85  (, )  32.96  (, )  18.37  () 
yes  155.50  91.97  20.19  19.24  12.45  
peppers  no  278.61  185.04  70.05  (, )  44.85  (, )  29.63  () 
yes  156.15  104.41  43.77  28.58  25.10 
6 Extensions to Other Inpainting Operators
We have seen that optimising the interpolation data allows to obtain high quality reconstructions with only of all pixels. These results are also remarkable in view of the fact that so far we have used a very simple interpolation operator: the Laplacian. It is unlikely that it offers the best performance. In this section we investigate the question if the algorithms of Section 5 can be used with, or extended to more advanced inpainting operators. We focus on two representative operators that have proven their good performance in the context of PDEbased image compression with sparse data [53, 54, 29, 118]: the biharmonic operator and the EED operator.
A straightforward extension of the Laplacian is the biharmonic operator, i.e. we replace in (5) by . Using it for interpolation comes down to thin plate spline interpolation [43], a rotationally invariant multidimensional generalisation of cubic spline interpolation. Compared to the Laplace operator, it yields a smoother solution around the interpolation data, since its Green’s function is twice differentiable. This avoids the typical singularities that distort the visual quality with homogeneous diffusion inpainting. These artifacts are caused by the logarithmic singularities of the Green’s function of the twodimensional Laplacian. On the other hand, biharmonic inpainting is prone to over and undershoots, i.e. the values of may leave the range of the inpainting data in . This cannot happen for homogeneous diffusion inpainting which fulfils a maximum–minimum principle. Nevertheless, a number of evaluations show that biharmonic inpainting can offer a better reconstruction quality than homogeneous diffusion inpainting [53, 54, 29, 118].
Secondly we consider an anisotropic nonlinear diffusion operator, namely edge enhancingdiffusion (EED). Originally it has been introduced for image denoising [131], and its application in image compression goes back to Galić et al. [53]. Using EED means that we replace in (5) by . The positive definite matrix is the socalled diffusion tensor. It steers the diffusion process by its eigenvectors and eigenvalues. They depend on the gradient of a Gaussiansmoothed version of the image , where denotes the standard deviation of the Gaussian. The first eigenvector of is chosen to be orthogonal to