# Parametric estimation in noisy blind deconvolution model: a new estimation procedure

Emmanuelle Gautherat (gauthera@ensae.fr), CREST, Timbre J340, 3 av. P. Larousse, 92241 Malakoff Cedex, and Economic Faculty, University of Reims, France.
Ghislaine Gayraud (gayraud@ensae.fr), CREST, Timbre J340, 3 av. P. Larousse, 92241 Malakoff Cedex, and LMRS, University of Rouen, France.
###### Abstract

In a parametric framework, this paper is devoted to the study of a new estimation procedure for the inverse filter and the noise level in a complex noisy blind discrete deconvolution model. Our estimation method results from a sharp exploitation of the specific properties of Hankel forms. The distribution of the input signal is also estimated. The strong consistency and the asymptotic distribution of all estimates are established. A simulation study is included in order to demonstrate empirically the computational performance of our estimation procedures.


AMS subject classifications: Primary 62M10, 62M09, 62F12; secondary 60E07.

Keywords: identification, noisy blind deconvolution, linear discrete system, digital signal, inverse filter, infinite divisibility, Hankel matrix, contrast function.

## 1 Introduction

Let $Y=(Y_t)_{t\in\mathbb{Z}}$ be the output process of an unknown deterministic linear time-invariant filter $u=(u_k)_{k\in\mathbb{Z}}$, which is driven by an unobservable input sequence $X=(X_t)_{t\in\mathbb{Z}}$ to which a noise sequence $\sigma_0(W_t)_{t\in\mathbb{Z}}$ is added, where $\sigma_0$ is an unknown noise level. In other words, $Y$ arises from the noisy blind deconvolution model defined by

$$Y_t \;=\; (u \star X)_t + \sigma_0 W_t \;=\; \sum_{k \in \mathbb{Z}} u_k X_{t-k} + \sigma_0 W_t, \qquad \forall t \in \mathbb{Z}, \tag{1.1}$$

where $X$ is assumed to be a complex, discrete, finitely valued input process and the real-valued filter $u$ is supposed to be invertible.
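To fix ideas, model (1.1) with a finite-length filter can be simulated as follows. This is a minimal sketch in Python (the paper's own experiments use MATLAB); the alphabet, filter, and the noise normalization $\mathbb{E}|W_t|^2 = 1$ are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def simulate_output(n, support, probs, u, sigma0, seed=0):
    """Simulate n outputs of model (1.1): Y_t = (u * X)_t + sigma0 * W_t,
    with an i.i.d. discrete complex input X drawn from a finite alphabet,
    a finite-length real filter u, and circular complex Gaussian noise W
    (normalized here so that E|W_t|^2 = 1 -- an assumption)."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=float)
    # i.i.d. discrete complex input drawn from the finite alphabet
    x = rng.choice(support, size=n + len(u) - 1, p=probs)
    # finite convolution (u * X)_t, one output per observed time point
    y_clean = np.convolve(x, u, mode="valid")
    # circular complex Gaussian noise: independent real and imaginary parts
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return y_clean + sigma0 * w
```

With `sigma0 = 0` and the identity filter `u = [1.0]`, the output reduces to the input itself, which is a quick sanity check on the convolution indexing.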

Since the sequence $u$ is invertible, there exists an inverse filter $\theta$ of $u$ such that $\theta \star u$ is the identity filter. Note that if $u$ has finite length, the system is a noisy moving average, and if $\theta$ has finite length, the system is a noisy autoregression. From the observations, the objective is to recover the distribution of the input process, which requires the estimation of the noise level $\sigma_0$ and of the filter $u$. In the digital signal framework, that is, when $X$ is a discrete-valued input, Bayesian theory combined with MCMC methods is often used to obtain the a posteriori distribution of the signal process (Liu & Chen [1995]; Li & Shedden [2001]). Here, we adopt the approach which consists in estimating the inverse filter $\theta$ instead of the filter $u$ itself. The problem of estimating $\theta$ as well as the distribution of the input process from the observed output data is known as an identification problem. This identification problem, when the distribution of the input signal is discrete with a finite number of possible values, is discussed in a number of papers: Li [1992, 1993, 1995, 1999, 2003], Li & Mbarek [1997], Gamboa & Gassiat [1996] or Gassiat & Gautherat [1998, 1999] and Gautherat [1997, 2002]. In the real and complex cases without noise, Li [1995] proposed an estimation method for the inverse filter when the support points of the input signal are supposed to be known. Gamboa & Gassiat [1996], in the non-noisy real case under a general setting (unknown distribution of the input signal), proved the consistency of the inverse filter estimate and of the estimate of the number of support points. Gassiat & Gautherat [1998] extended the previous paper in some sense by considering data with additive noise, and proposed consistent estimates for the support points and also for the noise level.
Gassiat & Gautherat [1999] studied the rate of convergence of the signal estimate and of the inverse filter estimate in the parametric framework, whereas Gautherat [2002] also established the asymptotic distribution of the support points and of their corresponding masses in the signal distribution. To construct the cost function, some of these authors explicitly used the alphabet of the signal values (Li [2003, 1992, 1993, 1995, 1999]; Li & Mbarek [1997]), whereas Gamboa & Gassiat [1996], Gassiat & Gautherat [1998, 1999] and Gautherat [1997, 2002] used only the cardinality of the alphabet (non-communicative situation).

Chen & Li [1995], Gamboa & Gassiat [1996, 1997a, 1997b], and Gunther & Swindlehurst [2000] showed that incorporating the finite-alphabet information into blind deconvolution procedures can greatly improve the accuracy of the filter estimation and that of the signal distribution in the non-noisy situation. Due to the judicious use of the finite-alphabet information, these methods enjoy a number of desirable properties in the non-noisy situation, such as the ability to handle (with high statistical efficiency) a large class of filters, and the ability to handle non-stationary (see the papers of Li) or non-i.i.d. signals (Gamboa & Gassiat [1996]) without modeling or estimating their statistical characteristics.

In model (1.1), the blind deconvolution of the data is much more complicated due to the presence of the noise. Gassiat & Gautherat [1998, 1999] and Gautherat [1997, 2002] proposed a consistent estimation procedure based on the minimization of a penalized empirical contrast function. In practice, this method requires adjusting the penalty term and needs a starting point close enough to the true value in order to avoid a local minimizer.

Among these works, the main contribution of our paper is to provide, in the complex case, a new estimation procedure for the noise level and the inverse filter, which results from a sharp study of the Hankel matrix when the noise is supposed to be Gaussian. Our estimation procedure is built on an explicit empirical criterion (unlike those in the papers of Gassiat & Gautherat [1998, 1999] and Gautherat [1997, 2002]) and is based on the roots of an empirical function. Our estimation method is competitive from a theoretical point of view (consistency and asymptotic distribution of all estimates) and from a practical one (our numerical results are quite good).

The paper is organized as follows. The assumptions on the model defined by (1.1) are given in Section 2. In the same section, the noise level, the inverse filter and the law of the input signal are characterized. These characterizations are used in Section 3 to define our estimation procedures. The strong consistency and the asymptotic distribution of all the estimates are stated in Section 4. The proofs are postponed to Section 7. Section 5 deals with a simulation study in which the computational performance of our estimation procedures is empirically demonstrated. Some concluding remarks, including a comparison of our numerical results with those given in Gassiat & Gautherat [1998], are made in Section 6.

## 2 Assumptions and Characterization

### 2.1 Assumptions

• (M1) $(X_t)$ is a sequence of discrete complex random variables with a common discrete support; the support is unknown. Its cardinality $p$ is known. The components of the support are given in lexicographical order.

• (M1b) $(X_t)$ is identically distributed with known number of support points and unknown positive masses. The components of the support are given in lexicographical order.

• (M2) The input process is stationary and ergodic.

• (M3) , ,

• (M4) $W_t = W_t^{(1)} + i W_t^{(2)}$, where $(W_t^{(1)})$ and $(W_t^{(2)})$ are independent. The random variables $W_t^{(1)}$ and $W_t^{(2)}$ are real, centered, i.i.d. Gaussian with a common variance, and they are both independent of the input process.

• (M5) The transfer function of the filter $u$ is continuous and does not vanish.

Note that Assumption (M5) guarantees that $u$ is invertible and that both $u$ and its inverse are in $\ell^1(\mathbb{Z})$. Since $u$ is invertible, the initial observed process can be transformed by applying any filter $s$ to $Y$. The resulting process is then defined by

$$\forall t \in \mathbb{Z}, \quad Z_t(s) = (s \star Y)_t. \tag{2.1}$$
• (M6) The set of filters is $\{s(\xi),\, \xi \in K\}$, where the function $\xi \mapsto s(\xi)$ is known and injective. The true inverse filter belongs to this set. The set $K$ is compact and chosen so that scale and delay are fixed. The parameter $\xi$ is unknown.

###### Remark 2.1.

(i) Assumption (M6) describes the set to which the filters we consider belong. This set is parametric via the unknown parameter vector $\xi$, so that estimating the inverse filter reduces to estimating $\xi$.

(ii) Since the true inverse filter $\theta$ belongs to this set and since the function $\xi \mapsto s(\xi)$ is injective, there exists $\xi_0$ in the interior of $K$ such that $s(\xi_0) = \theta$. Moreover, the last part of Assumption (M6) guarantees the identifiability of the model; in particular, it avoids problems of scale and delay.

(iii) In the non-noisy case, note that $Z_t(\theta) = X_t$, where $Z$ is defined by equation (2.1). Our estimation procedures will be based on the process $Z(s(\xi))$, $\xi \in K$.

### 2.2 Characterizations

In the non-noisy framework ($\sigma_0 = 0$), Gamboa & Gassiat [1996] stated that, under (M1), (M3), (M4) and (M5), $Z(s)$ takes at most $p$ distinct values if and only if $s = \theta$ up to scale and delay. The definition of $K$ as a subset of $\ell^1(\mathbb{Z})$ avoids identifiability problems which could be generated by possible changes in scale or delay. But since the characterizations of $\sigma_0$ and $\theta$ are valid for any inverse filter in $\ell^1(\mathbb{Z})$, from now on, and only for this paragraph, let us consider any filter $s$ in $\ell^1(\mathbb{Z})$.

Since the random variable $Z_0(s)$ may have more than $p$ support points, the characterization of $\sigma_0$ and $\theta$ is made via a contrast function which is able to distinguish discrete random variables whose support is of cardinality $p$ from others whose support is of cardinality greater than $p$.
Let $d(s)$ be the conjugate moment column vector of dimension $(p+1)^2$ defined by

$$d_{j(p+1)+(k+1)}(s) = \mathbb{E}\Big((Z_0(s))^k \, \overline{(Z_0(s))}^{\,j}\Big), \qquad \forall (j,k) \in \{0,\dots,p\}^2, \ \forall s \in \ell^1(\mathbb{Z}). \tag{2.2}$$

Note that if one rewrites $d(s)$ as a $(p+1)\times(p+1)$ matrix, where $k$ and $j$ correspond respectively to the columns and the rows, then $d(s)$ is a Hankel matrix. For all $\sigma \ge 0$ and $s \in \ell^1(\mathbb{Z})$, it is always possible to derive the conjugate pseudo-moment vector $\tilde d(\sigma, s)$ from $d(s)$ by inverting the following system:

$$d(s) = A(\sigma \|s\|_2)\, \tilde d(\sigma, s), \tag{2.3}$$

where $\|s\|_2$ denotes the $\ell^2$-norm of $s$. The matrix $A(\beta)$, which depends on $\beta = \sigma\|s\|_2$, is an invertible $(p+1)^2 \times (p+1)^2$ matrix such that for any $(j,k)$ and $(m,l)$ in $\{0,\dots,p\}^2$,

$$A_{j(p+1)+k+1,\; m(p+1)+l+1}(\beta) = C_k^l \, C_j^m \, \gamma_{j-m,\,k-l}\, \beta^{\,k-l+j-m},$$

where $C_k^l$ denotes the binomial coefficient and $\gamma_{a,b}$ is a moment of the noise. Note that under (M4), $\gamma_{a,b}$ vanishes unless $a=b$; denote then $\gamma_a = \gamma_{a,a}$. The inversion of system (2.3) is explicit and is given by the following: for all $(j,k)$ and all $m \le j$,

$$A^{-1}_{j(p+1)+k+1,\; m(p+1)+k-j+m+1}(\beta) = (-1)^{j-m}\, C_k^{j-m}\, C_j^m\, \gamma_{j-m}\, \beta^{2(j-m)}. \tag{2.4}$$
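The explicit inversion (2.4) can be checked numerically. The sketch below builds both $A(\beta)$ and the claimed $A^{-1}(\beta)$ from the same index conventions and verifies that they are mutual inverses; since the text's definition of $\gamma$ was lost, we assume throughout that $W$ is a standard circular complex Gaussian, so that $\gamma_{a,b} = a!\,\mathbb{1}_{\{a=b\}}$.

```python
import numpy as np
from math import comb, factorial

def A_matrix(beta, p, inverse=False):
    """Build the (p+1)^2 x (p+1)^2 matrix A(beta) of eq. (2.3), or its
    explicit inverse (2.4), linking observed conjugate moments d(s) to the
    noise-corrected pseudo-moments d~(sigma, s).
    Row (j, k) and column (m, l) are flattened as j*(p+1)+k and m*(p+1)+l.
    Assumption (the paper's definition of gamma was lost in extraction):
    W is a standard circular complex Gaussian, so gamma_{a,b} = a! if a == b,
    and 0 otherwise."""
    dim = (p + 1) ** 2
    A = np.zeros((dim, dim))
    for j in range(p + 1):
        for k in range(p + 1):
            for m in range(j + 1):
                r = j - m            # order of the Gaussian correction term
                l = k - r            # gamma_{a,b} vanishes unless k - l == j - m
                if l < 0:
                    continue
                sign = (-1) ** r if inverse else 1
                A[j * (p + 1) + k, m * (p + 1) + l] = (
                    sign * comb(k, r) * comb(j, m) * factorial(r) * beta ** (2 * r)
                )
    return A
```

Both matrices are lower-triangular in the chosen ordering, which is why the system (2.3) can be called triangular and solved exactly.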
###### Remark 2.2.

Denote by $R(s)$ the noise-free filtered process and by $V(s)$ an independent filtered Gaussian process. Then, for $\sigma \le \sigma_0$ and due to the infinite divisibility of the Gaussian distribution, the pseudo-moment $\tilde d_{j(p+1)+k+1}(\sigma,s)$ is equal to

$$\mathbb{E}\Big[\big((R(s))_0 + \sqrt{\sigma_0^2-\sigma^2}\,(V(s))_0\big)^k \big(\overline{(R(s))_0} + \sqrt{\sigma_0^2-\sigma^2}\,\overline{(V(s))_0}\big)^j\Big].$$

On the other hand, if $\sigma > \sigma_0$, $\tilde d(\sigma, s)$ has no explicit form and one does not know whether it corresponds to the moments of a random variable. This explains why $\tilde d(\sigma,s)$ is called the pseudo-moment vector.

Next, transform the pseudo-moment vector into a $(p+1)\times(p+1)$ matrix $\tilde D(\sigma,s)$, where $k$ and $j$ correspond respectively to the columns and the rows. Then, let $J$ be the real function defined by:

$$\forall \sigma \ge 0, \ \forall s \in \ell^1(\mathbb{Z}), \quad J(\sigma, s) = \det\big(\tilde D(\sigma, s)\big). \tag{2.5}$$

The characterizations of $\sigma_0$, $\theta$, and the law of the input signal in the model described by relation (1.1), or equivalently by relation (2.1), are made through the function $J$ defined by (2.5). They are established in Gassiat & Gautherat [1999] for $\sigma_0$ and $\theta$, and in Gautherat [1997, 2002] for the support points and their masses under a more general setting:

• (i) Under assumptions (M1), (M3), (M4) and (M5), the true noise level $\sigma_0$ and the true inverse filter $\theta$ satisfy

$$\forall \sigma < \sigma_0, \quad J(\sigma, s(\xi)) > 0, \ \forall \xi \in K, \tag{2.6}$$
$$J(\sigma_0, s(\xi)) = 0 \ \text{ iff } \ s(\xi) = \theta \ \text{ up to scale and delay}. \tag{2.7}$$
• (ii) Under (M1), the support points of the distribution are the roots of the polynomial function in $z$ whose coefficients are the components of $v$, where $v$ denotes the eigenvector associated with the smallest eigenvalue of the matrix $\tilde D(\sigma_0, \theta)$.

• (iii) Under (M1b), the distribution is uniquely determined as the solution of the following linear system in $\Pi$:

Note that (i) implies that $\sigma_0 = \min\{\sigma \in \mathbb{R}_+;\ \exists \xi \in K : J(\sigma, s(\xi)) = 0\}$.
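Characterization (ii) is easy to illustrate numerically in the noise-free case: the conjugate moment matrix of a variable with $p$ support points is singular, and the kernel eigenvector encodes the polynomial vanishing on the support. A minimal Python sketch, with a hypothetical three-point alphabet chosen for illustration:

```python
import numpy as np

def support_from_moments(D):
    """Characterization (ii), noise-free sketch: given the (p+1)x(p+1)
    conjugate-moment matrix D[j, k] = E[X^k conj(X)^j] of a discrete variable
    with p support points, the eigenvector of the smallest eigenvalue of D
    gives the coefficients of a degree-p polynomial whose roots are the
    support points."""
    eigval, eigvec = np.linalg.eigh(D)      # D is Hermitian positive semidefinite
    v = eigvec[:, np.argmin(eigval)]        # kernel direction of D
    # polynomial sum_k v_k z^k; np.roots expects highest-degree coefficient first
    return np.roots(v[::-1])

# toy check: X uniform on three known points (p = 3, a hypothetical alphabet)
points = np.array([1.0 + 0j, -1.0 + 0j, 1j])
phi = np.vander(points, 4, increasing=True)   # rows (1, x, x^2, x^3)
D = phi.conj().T @ phi / 3                    # D[j, k] = E[X^k conj(X)^j]
roots = support_from_moments(D)
```

Here `D` has rank 3 but size 4, so its smallest eigenvalue is (numerically) zero and the recovered roots coincide with the alphabet.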

## 3 Estimation procedures

To construct our estimates, we consider again filters of the form $s(\xi)$, where $\xi$ is unknown and $s(\cdot)$ is a known function. For any $\xi \in K$, let us consider the truncated sequence $\bar s(\xi)$ of $s(\xi)$, obtained by keeping the coordinates $s_k(\xi)$ for $|k| \le k(n)$, where $(k(n))_n$ is a sequence of nonnegative integers increasing with $n$. Define the truncated version of relation (2.1), that is,

$$\forall \xi \in K, \ \forall t = 1+k(n), \dots, n-k(n), \quad Z_t(\bar s(\xi)) = \sum_{k=-k(n)}^{k(n)} s_k(\xi) \, Y_{t-k}.$$

Denote by $d_n(\xi)$ the empirical conjugate moment vector of dimension $(p+1)^2$, whose general term is defined as the empirical version of (2.2):

$$\forall \xi \in K, \quad d_{j(p+1)+k+1,\,n}(\xi) = \frac{1}{n-2k(n)} \sum_{t=1+k(n)}^{n-k(n)} Z_t^k(\bar s(\xi)) \; \overline{Z_t^j(\bar s(\xi))}.$$

Then, similarly to (2.3), define $\tilde d_n(\sigma,\xi)$ as the empirical conjugate pseudo-moment vector whose general term is the solution of the following triangular system:

$$\forall \xi \in K, \ \forall \sigma \ge 0, \quad \tilde d_n(\sigma, \xi) = A^{-1}(\sigma \|\bar s(\xi)\|_2) \, d_n(\xi), \tag{3.1}$$

where $A^{-1}$ is the matrix defined by (2.4). Then, let $J_n$ be the empirical version of $J$ defined by (2.5):

$$\forall \xi \in K, \ \forall \sigma \ge 0, \quad J_n(\sigma, \xi) = \det\big(\tilde D_n(\sigma, \bar s(\xi))\big),$$

where $\tilde D_n(\sigma, \bar s(\xi))$ is the $(p+1)\times(p+1)$ matrix corresponding to the rewriting of the empirical pseudo-moment vector in matrix form. Finally, let us define all the estimates:

• $(\hat\sigma_0, \hat\xi_0)$ is the solution of the following system:

$$\begin{cases} J_n(\hat\sigma_0, \hat\xi_0) = 0, \\[2pt] \hat\sigma_0 = \min\{\sigma \in \mathbb{R}_+;\ \exists \xi \in K : J_n(\sigma, \xi) = 0\}. \end{cases}$$
• $\hat\theta$ is defined as $s(\hat\xi_0)$.

• The support point estimates in $\hat a$ are the roots, rearranged in lexicographic order, of the polynomial function in $z$ whose coefficients are the components of $\hat v$, where $\hat v$ denotes the eigenvector associated with the smallest eigenvalue of the matrix $\tilde D_n(\hat\sigma_0, \bar s(\hat\xi_0))$.

• The probability vector $\hat\Pi$ is uniquely determined as the solution of the following linear system in $\Pi$:

Existence of $\hat\sigma_0$ for any $n$.
Fix any $\xi \in K$ and observe $J_n(\cdot, \xi)$. Then, $\tilde d_n(\sigma, \xi)$ is obtained via equation (3.1) and depends only on the unknown parameter $\sigma$, since $\xi$ is supposed to be fixed. For any $v \in \mathbb{C}^{p+1}$, consider the hermitian form

$$Q_v(\sigma) = \sum_{k=0}^{p} \sum_{j=0}^{p} \tilde d_{j(p+1)+(k+1),\,n}(\sigma, \xi) \, v_k \bar v_j.$$

Note that $Q_v(\sigma)$ is a polynomial function in $\sigma$ with real coefficients. For any $v$, note that $Q_v(\sigma) = v^* \tilde D_n(\sigma, \bar s(\xi)) v$, where $\tilde D_n$ is the $(p+1)\times(p+1)$ Hankel matrix which corresponds to the rewriting of the vector $\tilde d_n(\sigma,\xi)$. Note also the highest degree of $Q_v$ in $\sigma$.
Then, if this degree is an odd number, the coefficient of the highest-degree term is negative. It follows that there exists a positive real $\sigma$ at which the hermitian form is degenerate, and then $J_n(\sigma,\xi)=0$.
On the other hand, if this degree is an even number, choose any $v$ such that the term of highest degree is negative. As previously, this allows one to conclude that there exists a positive real $\sigma$ which is a zero of $J_n(\cdot,\xi)$.
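As a minimal numerical illustration of this existence argument, consider the simplest case of a single support point ($p = 1$) with the identity filter: then $J_n(\sigma) = \det\begin{pmatrix} 1 & m_1 \\ \bar m_1 & m_2 - \sigma^2 \end{pmatrix}$ is a downward polynomial in $\sigma$ with $J_n(0) \ge 0$, so its smallest nonnegative root exists and is explicit. A Python sketch, under our running assumption that the noise is normalized so that $\mathbb{E}|W_t|^2 = 1$ (the paper's exact normalization was lost):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma0 = 20_000, 0.5
# one support point (p = 1) and identity filter: Y_t = c + sigma0 * W_t,
# with W a circular complex Gaussian normalized so that E|W_t|^2 = 1
c = 1.0 + 2.0j                                    # hypothetical support point
w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = c + sigma0 * w

# empirical conjugate moments: m1 = mean of Z, m2 = mean of |Z|^2 (here Z = Y)
m1 = y.mean()
m2 = np.mean(np.abs(y) ** 2)

# For p = 1, the Gaussian correction (2.4) gives d~_{11} = m2 - sigma^2, so
#   J_n(sigma) = (m2 - sigma^2) - |m1|^2,
# a downward polynomial in sigma whose smallest nonnegative root is explicit:
sigma_hat = np.sqrt(m2 - np.abs(m1) ** 2)
```

Here `sigma_hat` is exactly the empirical standard deviation of the observations, and it converges to the true noise level as $n$ grows.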

## 4 Main results

From now on, for any function with one or more arguments, $D^k$ denotes the $k$-th differential and $\partial$ denotes a partial derivative, where the superscript gives the order of the derivative with respect to the coordinate indicated in the subscript. Denote also by $d(\xi)$ the $(p+1)^2$-vector with components defined by relation (2.2).
Some additional assumptions are needed to establish the consistency and the asymptotic distribution of all estimates.

• (M7) The empirical moment vector satisfies a central limit theorem; denote by $\Gamma_1$ the asymptotic variance of $\sqrt{n}\, d_n(\xi_0)$.

• (M8) The application $\xi \mapsto s(\xi)$ is twice continuously differentiable. For any $\xi \in K$, its first and second derivatives are in $\ell^1(\mathbb{Z})$. Moreover, the first-order partial derivatives are linearly independent.

###### Theorem 4.1.

Suppose that assumptions (M1)-(M7) hold; then, as $n$ goes to infinity, $\hat\sigma_0$ converges a.s. to $\sigma_0$ and $\|\hat\xi_0 - \xi_0\|$ converges a.s. to 0, where $\|\cdot\|$ denotes the Euclidean norm.

###### Corollary 4.1.

Suppose that assumptions (M1b), (M2)-(M7) hold; then, as $n$ goes to infinity, both $\|\hat a - a\|$ and $\|\hat\Pi - \Pi\|$ converge a.s. to 0, where $\|\cdot\|$ denotes the Euclidean norm.

Before giving the asymptotic distribution of our estimates, recall that the vector $v^*$ is the eigenvector associated with the smallest eigenvalue of $\tilde D(\sigma_0, \theta)$, normalized such that the $v^*_k$'s are the complex coefficients of the polynomial function in $z$ having the components of $a$ as roots. In particular, note that $v^*_p \neq 0$.

###### Theorem 4.2.

Under assumptions (M1)-(M8) and (P),

$$\sqrt{n}\begin{pmatrix} \hat\xi_0 - \xi_0 \\ \hat\sigma_0 - \sigma_0 \end{pmatrix} \ \xrightarrow[n \to +\infty]{\mathcal L}\ \mathcal N\Big(0_{d+1},\; N\, D^1 h(\tilde d(\sigma_0, \xi_0))\, M\, \big(D^1 h(\tilde d(\sigma_0, \xi_0))\big)' N'\Big),$$

where $0_{d+1}$ is the $(d+1)$-dimensional zero vector and

$$\begin{aligned} M &= A^{-1}(\sigma_0\|\theta\|_2)\; \Gamma_1\; \big(A^{-1}(\sigma_0\|\theta\|_2)\big)', \\ N &= \frac{1}{\alpha}\begin{pmatrix} \big(\partial_2^2 J(\sigma_0,\xi_0)\big)^{-1} \partial^{1,1}_{1,2} J(\sigma_0,\xi_0) \\ 1 \end{pmatrix}, \\ \alpha &= -\partial_1^1 J(\sigma_0, \xi_0), \\ h(\tilde d(\sigma_0,\xi_0)) &= \det\Big(\big(\tilde d_{j(p+1)+i+1}(\sigma_0,\xi_0)\big)_{i,j \in \{0,\dots,p\}}\Big). \end{aligned}$$
###### Corollary 4.2.

Under (M1b), (M2)-(M8) and (P) and using the previous notations, one gets

$$\sqrt{n}(\hat a - a) \ \xrightarrow[n\to+\infty]{\mathcal L}\ \mathcal N\Big(0_p,\; \frac{1}{4|v^*_p|^4}\, C^{-1} B R M \bar B' \bar R' C^{-1}\Big), \qquad \sqrt{n}(\hat\Pi - \Pi) \ \xrightarrow[n\to\infty]{\mathcal L}\ \mathcal N\big(0_p,\; G R M \bar R' \bar G'\big),$$

where,

$$\begin{aligned} C &= \mathrm{diag}(K_1,\dots,K_p), \ \text{ with } K_j = \mathbb{E}\Big(\prod_{i=1,\, i\neq j}^{p} |X_0 - a_i|^2\Big), \\ B &\text{ is a } p \times (p+1)^2 \text{ matrix defined by its columns as follows: } \forall l \in \{1,\dots,p\},\ (i,j) \in \{0,\dots,p\}^2, \\ &\qquad B_{l,\, i(p+1)+j+1} = \overline{\partial^1_l v_i(a)}\, v^*_j + \partial^1_l v_j(a)\, \overline{v^*_i}, \\ R &= \mathrm{Id}_{(p+1)^2} + \partial^{1,1}_{1,2}\, \tilde d(\sigma_0,\xi_0)\, D^1 h(\tilde d(\sigma_0,\xi_0))\, N, \ \text{ where } \mathrm{Id}_d \text{ is the identity matrix of size } d, \\ G &= L^{-1}\Big(\mathrm{Proj} + F C^{-1} \tfrac{1}{2|v^*_p|^2} B\Big), \\ L &= (a_j^{\,i})_{0 \le i \le p-1;\ 1 \le j \le p}, \ \text{ where } i \text{ denotes the rows and } j \text{ the columns}, \\ F &= \Big(0'_p \ \big(\pi_j\, i\, a_j^{\,i-1}\big)_{i=2,\dots,p;\ j=1,\dots,p}\Big)', \ \text{ where } i \text{ denotes the rows and } j \text{ the columns}, \\ \mathrm{Proj} &\text{ is the projection of } \mathbb{C}^{(p+1)^2} \text{ onto } \mathbb{C}^{p} \text{ such that } \forall w \in \mathbb{C}^{(p+1)^2},\ \mathrm{Proj}(w) = (w_1,\dots,w_p)'. \end{aligned}$$
###### Remark 4.1.

The proof of Theorem 4.2 uses arguments similar to those in the proof of Theorem 4.2 in Gassiat & Gautherat [1999]. The gain we obtain with our estimation procedures (without a penalty term) is the asymptotic marginal distribution of both $\hat\sigma_0$ and $\hat\xi_0$. This is an essential point in obtaining the asymptotic distribution of both $\hat a$ and $\hat\Pi$.

## 5 Simulation study

In this section, the estimates of the noise level, the inverse filter, the support points and their masses are computed using our theoretical estimation procedures. To give stable results, each estimation value is an average over independent simulations of sequences of observations; these values are reported in the tables below. The stability of each estimation value is measured by "std", the empirical standard deviation calculated over the simulation runs. The simulations which lead to a negative estimated value are eliminated, and the number of those eliminated simulations is reported in the tables below.

Two models are considered in this simulation study: the mixture model and the autoregressive model; that is, each simulated sequence arises from one of these models, which are particular cases of the model defined by (1.1). In both cases, the filter has finite length, so we identify the inverse filter with $\theta$. We deal with two values of $\sigma_0$; a small $\sigma_0$ (respectively a large $\sigma_0$) gives small perturbations (resp. large perturbations) of the corresponding non-noisy model defined by (1.1), so that estimation is more difficult in the case of a large $\sigma_0$. To illustrate the asymptotic efficiency of our method, we deal with several sample sizes, but only a few significant results are presented here. We use the function "fsolve" in MATLAB version 7, which finds a root of a given function. The drawback of such a function is its high sensitivity to the starting point, since it searches for a zero near the starting point. To overcome this difficulty, we try several starting points and select as the initial point the one which seems most stable during the simulations. To avoid problems related to both the scale and the delay of $\theta$, we fix the scale and delay by considering $\theta$ normalized so that a prescribed component has the greatest value among all the components of $\theta$.

Mixture model. The observed sequence is given by

$$Y_t = X_t + \sigma_0 W_t, \qquad \forall t \in \{1,\dots,n\},$$

so that the filter and its inverse coincide. One must note that the model is over-parameterized, since the filter length chosen in the first and second tables exceeds that of the true inverse filter, which reduces to a single value. When the noise level is small, the estimated values are really good even for a small sample size. For both sample sizes and both filter lengths, the estimated values differ strongly between the small and the large noise level: they are much better when the sample size is large. These tables illustrate that it is more difficult to obtain a good estimation when the noise level is large and/or the sample size is small. It is worth noting the number of non-used sequences in some configurations, which probably means that we did not find a good starting point; moreover, this entails a strong variability in the estimated values (see the std). Figure 1 represents the observations, the support points and their estimates. When the noise level is small, the observations are concentrated around the true support points and the estimation is visually very good even for a small sample (Figure 1, left side). When the noise level is large, the observations are more scattered over the square, even if one can distinguish three attraction areas where the support points lie. In this case, the estimation with the larger number of observations is visually much better (Figure 1, right side). The same phenomenon is observed in Figure 2; in addition, Figure 2 illustrates the improvement of the estimation: as the sample size increases, the estimates approach the true values and their empirical standard deviations tend to zero (except in one case).
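The mixture experiment can be reproduced in outline. The Python sketch below (our own, not the paper's MATLAB code; the alphabet $\{1, -1, i\}$, $\sigma_0 = 0.3$, the grid search replacing fsolve, and the normalization $\mathbb{E}|W_t|^2 = 1$ with $\gamma_r = r!$ are all illustrative assumptions) simulates the mixture model and recovers $\hat\sigma_0$ as the first zero of $J_n(\cdot)$, using the Gaussian pseudo-moment correction of Section 2 with $s = \delta_0$, so $\|\bar s\|_2 = 1$:

```python
import numpy as np
from math import comb, factorial

rng = np.random.default_rng(7)
n, sigma0, p = 200_000, 0.3, 3
support = np.array([1.0 + 0j, -1.0 + 0j, 1j])      # hypothetical p = 3 alphabet

# mixture model: Y_t = X_t + sigma0 * W_t (the filter and its inverse coincide)
x = rng.choice(support, size=n)
w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # E|W|^2 = 1
y = x + sigma0 * w

# empirical conjugate moments d_{jk} = (1/n) sum_t Y_t^k conj(Y_t)^j
powers = np.vstack([y ** k for k in range(p + 1)])
d = np.array([[np.mean(powers[k] * powers[j].conj()) for k in range(p + 1)]
              for j in range(p + 1)])

def J_n(sigma):
    """Determinant of the noise-corrected pseudo-moment matrix D~_n(sigma),
    via the triangular correction (2.4):
    d~_{jk} = sum_r (-1)^r C(k,r) C(j,r) r! sigma^(2r) d_{j-r,k-r},
    assuming gamma_r = r! (standard circular complex Gaussian noise)."""
    dt = np.zeros((p + 1, p + 1), dtype=complex)
    for j in range(p + 1):
        for k in range(p + 1):
            dt[j, k] = sum((-1) ** r * comb(k, r) * comb(j, r) * factorial(r)
                           * sigma ** (2 * r) * d[j - r, k - r]
                           for r in range(min(j, k) + 1))
    return np.linalg.det(dt).real              # dt is Hermitian, so det is real

# sigma_hat: smallest sigma >= 0 where J_n changes sign on a grid
grid = np.arange(0.0, 1.0, 0.005)
vals = np.array([J_n(s) for s in grid])
cross = np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0][0]
sigma_hat = 0.5 * (grid[cross] + grid[cross + 1])
```

As in characterization (i), $J_n$ stays positive below the true noise level (the corrected matrix is a nondegenerate moment matrix there) and first vanishes near $\sigma_0$.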

Second order autoregressive model. The observed sequence is given by

$$Y_t = \tilde Y_t + \sigma_0 W_t, \qquad \forall t \in \{1,\dots,n\},$$

where

$$\tilde Y_t = \sum_{k=-\infty}^{0} u_k X_{t-k} \ \Longleftrightarrow\ X_t = \sum_{k=0}^{2} \theta_k \tilde Y_{t-k},$$

with $\theta = (\theta_0,\theta_1,\theta_2)$. One must note that when we choose a longer filter, the model is over-parameterized, since the true model corresponds to a shorter one. In the left-hand side of the third table (small noise level), the estimates of all the parameters are quite good, and they are quite similar for both large and small sample sizes. All other cases, with a large noise level, lead to poorer estimation values, since some estimates are very far from the true values (see both the right-hand side of the third table and the fourth table). However, in some cases it seems that we did not take a good starting point, since the variability of the results is too large (see the std), combined with a large number of eliminated simulations. One can see in Figure 3 that the observations are more dispersed over the square than those of the mixture model (Figure 1). This would suggest that an autoregressive model is more difficult to estimate than a mixture model. When the noise level is small, Figures 3 and 4 show that the support points are well estimated even for a small sample; this is not the case for the large noise level, since the estimates seem visually far from the true values (see Figure 4) even for a large sample size.
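The autoregressive data generation, and the fact that applying the inverse filter to the observations returns the input plus filtered noise, can be sketched as follows in Python (the filter coefficients and the alphabet are hypothetical: the paper's specific values were lost in extraction):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma0 = 5_000, 0.1
theta = np.array([1.0, -0.5, 0.2])        # hypothetical inverse filter (stable AR)
support = np.array([1.0 + 0j, -1.0 + 0j, 1j])
x = rng.choice(support, size=n)

# AR(2) recursion: X_t = theta_0*Y~_t + theta_1*Y~_{t-1} + theta_2*Y~_{t-2}
y_clean = np.zeros(n, dtype=complex)
for t in range(n):
    back = sum(theta[k] * y_clean[t - k] for k in (1, 2) if t - k >= 0)
    y_clean[t] = (x[t] - back) / theta[0]

# noisy observations Y_t = Y~_t + sigma0 * W_t, with E|W_t|^2 = 1 (assumption)
w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = y_clean + sigma0 * w

# applying the inverse filter theta to the observations gives
# Z_t = X_t + sigma0 * (theta * W)_t: the input plus filtered noise
z = theta[0] * y[2:] + theta[1] * y[1:-1] + theta[2] * y[:-2]
```

The residual `z - x[2:]` is exactly the filtered noise `sigma0 * (theta * w)`, whose mean squared modulus is $\sigma_0^2 \sum_k \theta_k^2$ under the assumed normalization; this is the extra difficulty, relative to the mixture model, that the deconvolution procedure has to overcome.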

Numerical illustration of the existence of the estimate $\hat\sigma_0$. Let us consider the second-order autoregressive model with a large noise level, which corresponds to the most difficult estimation problem, as illustrated in the above simulations. Figure 5 represents the graph of a function which has the same roots as the function $J_n$. On the left-hand side, we consider the particular case of the true value of the inverse filter, whereas on the right-hand side, we consider a filter which differs from the true value of the inverse filter (this filter corresponds to a mixture model). In both cases, one can see that the function admits some zeros. On the left-hand side, one must note that the convergence is achieved very quickly and accurately.

Importance of the choice of the starting point in the algorithm. One can see in Figure 5 that the first zero of the function is neither over nor under . Actually, the Matlab toolbox algorithm searches the zero of the function which is the nearest to the starting point. This starting point being a -dimensional vector, the algorithm searches a zero in directions. So an iterative stochastic algorithm, which is able to find a multidimensional zero, would be an useful tool since the gradient of could be formally computed.