# Bayesian kernel-based system identification with quantized output data

###### Abstract:

In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.

Giulio Bottegal, Gianluigi Pillonetto, Håkan Hjalmarsson

ACCESS Linnaeus Centre, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden (e-mail: {bottegal; hjalmars}@kth.se)

*The research leading to these results has received funding from the Swedish Research Council under contract 621-2009-4017, the European Union Seventh Framework Programme [FP7/2007-2013] under grant agreement no. 257462 (HYCON2 Network of Excellence), and the MIUR FIRB project RBFR12M3AC "Learning meets time: a new computational approach to learning in dynamic systems".

## Introduction

Identification of systems from quantized data finds applications in a wide range of areas such as communications, networked control systems, bioinformatics (see e.g. [bae2004gene] and [wang2010system]).

From a system identification perspective, identification of systems having quantized output data constitutes a challenging problem. In fact, the presence of a quantizer cascaded to a dynamic system causes a loss of information on the behavior of that system. Thus, standard system identification techniques, such as least-squares or the prediction error method (PEM) [Ljung], [Soderstrom], may give poor performance. For this reason, in recent years several techniques for system identification from quantized data have been proposed. Some of these methods are specifically tailored for identification of systems with binary measurements [wang2003system], [wang2006joint], and are possibly implemented in a recursive fashion [guo2013recursive], [jafari2012convergence]. Other methods, such as [colinet2010weighted], exploit the knowledge of a dithering signal to improve the identification performance. Specific input design techniques are studied in [godoy2014novel], [casini2011input] and [casini2012input]. Methods for handling general types of quantization have been proposed recently [godoy2011identification], [chen2012impulse]. In such contributions, the problem of identifying a linear dynamic system with quantized data is posed as a likelihood problem. In particular, in [chen2012impulse] the authors exploit the recently proposed Bayesian kernel-based formulation of the linear system identification problem (see [pillonetto2014kernel] for a survey).

Similarly to [chen2012impulse], the starting point of this paper is the formulation of the problem of identifying a linear dynamic system with quantized data using a Bayesian approach. We model the impulse response of the unknown system as a realization of a Gaussian random process. Such a process has zero mean and its covariance matrix (in this context also called a kernel) is given by the recently introduced stable spline kernels [SS2010], [SS2011], [bottegal2013regularized], which are specifically designed for linear system identification purposes. The structure of this type of kernel depends on two hyperparameters, namely a scaling parameter and a shaping parameter, whose tuning permits more flexibility in the identification process and can be seen as a model selection step. In the standard setting (i.e. when there is no quantizer), kernel hyperparameters are chosen as those maximizing the marginal likelihood of the output measurements, obtained by integrating out the dependence on the system. Once the hyperparameters are chosen, the impulse response of the system is computed as the minimum mean-square Bayes estimate given the observed input/output data (see e.g. [SS2010], [ChenOL12]).

A key assumption in kernel-based methods is that the output data and the system admit a joint Gaussian description. Such an assumption does not hold with quantized data, and we need a different approach. In this paper we propose a solution based on Markov Chain Monte Carlo (MCMC) techniques [Gilks]. To this end, we define a target probability density; the estimate of the system can be obtained by drawing samples from it. Such a probability density is a function of the following random variables: the (unavailable) non-quantized output of the linear system, the scaling hyperparameter of the kernel, the unknown measurement noise variance, and the impulse response of the system. The main contribution of this paper is to show how to design a Gibbs sampler [Gilks] by exploiting the knowledge of the conditional densities of the target distribution. The main advantage of the Gibbs sampler is that it does not require any rejection criterion for the generated samples and quickly converges to the target distribution. Note that MCMC-based approaches have recently gained popularity in system identification [Ninness2010], [lindsten2012], [bottegal2014outlier].

The paper is organized as follows. In the next section, we introduce the problem of the identification of dynamic systems from quantized data. We then give a Bayesian description of the variables entering the system, and subsequently describe the proposed identification method. Finally, we present numerical simulations assessing the performance of the proposed method; some conclusions end the paper.

## Problem statement

We consider the following linear time-invariant BIBO-stable output-error system

 z_t = (g ∗ u)_t + v_t,  (1)

where g is the impulse response characterizing the unknown system, which is fed by the input u_t, t ∈ T. The set T corresponds to either ℝ⁺ or ℕ, depending on whether the system is continuous-time or discrete-time. The output is corrupted by the additive white Gaussian noise v_t, which has zero mean and unknown variance σ², and is measured at the time instants t_1, …, t_N. If the system is continuous-time, then the t_i can represent any non-uniform sampling, whereas in the discrete-time case we shall consider t_i = i (i.e., no downsampling). For ease of exposition, in this paper we shall derive our algorithm in the discrete-time case only; the extension to continuous time is quite straightforward (see e.g. [Wahba1990], [SS2010]).

Actually, the output z_t is not directly measurable, and only a quantized version y_t is available, namely

 y_t = Q[z_t],  (2)

where Q is a known map (our quantizer) of the type

 Q[x] = p_k  if x ∈ (q_{k−1}, q_k],  (3)

with the quantization levels p_k and the thresholds q_k being known.
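To make the quantizer map (3) concrete, here is a minimal sketch; the particular thresholds and levels below are illustrative, not taken from the paper:

```python
import numpy as np

def quantize(x, thresholds, levels):
    """General quantizer (3): Q[x] = levels[k] when x falls in the
    half-open interval (thresholds[k], thresholds[k+1]].
    thresholds must be increasing and bracket the real line."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # index of the interval (q_{k-1}, q_k] containing each x
    k = np.searchsorted(thresholds[1:], x, side="left")
    return np.asarray(levels)[k]

# Illustrative 4-level quantizer
q = [-np.inf, -1.0, 0.0, 1.0, np.inf]   # thresholds q_0, ..., q_K
p = [-1.5, -0.5, 0.5, 1.5]              # levels p_1, ..., p_K
```

Note that `side="left"` makes the intervals half-open on the left, matching the convention x ∈ (q_{k−1}, q_k] of (3).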

###### Remark 2.1

A particular and well-studied case is the binary quantizer, defined as

 Q[x] = {  1  if x ≥ q
        { −1  if x < q  (4)

It is well-known that a condition on the threshold q to guarantee identifiability of the system is q ≠ 0. In fact, when q = 0, the system can be determined only up to a scaling factor [godoy2011identification].

Without loss of generality, let us assume the system to be strictly causal, i.e. g_0 = 0. We assume that N input-output data samples {u_t, y_t}, t = 1, …, N, are collected during an experiment. We formulate our system identification problem as the problem of estimating the impulse response for n time instants, namely obtaining g_1, …, g_n. Recall that, if n is sufficiently large, these samples can be used to approximate the dynamics of the system with arbitrary accuracy [ljung1992]. Introducing the vector notation

 g := [g_1 ⋯ g_n]^T,  y := [y_1 ⋯ y_N]^T,  z := [z_1 ⋯ z_N]^T,  v := [v_1 ⋯ v_N]^T,

 U := ⎡ u_0      0        ⋯  0
       u_1      u_0      ⋯  0
       ⋮                 ⋱  ⋮
       u_{N−1}  u_{N−2}  ⋯  u_{N−n} ⎤ ∈ ℝ^{N×n},

the input-output relation for the available samples can be written as

 z = Ug + v,
 y_t = Q[z_t],  t = 1, …, N,

so that our estimation problem can be cast in, say, a “linear regression plus quantization” form.
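The "linear regression plus quantization" form can be sketched numerically. Below, the input, impulse response and noise level are synthetic placeholders of our own choosing, used only to illustrate how U, z and y are formed:

```python
import numpy as np

def build_U(u, n):
    """N x n lower-triangular Toeplitz regression matrix:
    U[i, j] = u[i - j] when i >= j, zero otherwise."""
    N = len(u)
    U = np.zeros((N, n))
    for j in range(n):
        U[j:, j] = u[:N - j]
    return U

rng = np.random.default_rng(0)
N, n = 200, 20
u = rng.standard_normal(N)             # white-noise input
g = 0.8 ** np.arange(1, n + 1)         # an illustrative stable impulse response
U = build_U(u, n)
z = U @ g + 0.1 * rng.standard_normal(N)   # non-quantized noisy output (1)
y = np.sign(z - 1.0)                       # binary quantizer with threshold 1, as in (2)
```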

## Bayesian modeling

### Prior for the impulse response

In this paper we cast the system identification problem into a Bayesian framework. Our starting point is setting a proper prior on g. Following a Gaussian regression approach [Rasmussen], we model g as a zero-mean Gaussian random vector, i.e. we assume the following probability density function:

 p(g | λ, β) ∼ N(0, λK_β),  (5)

where K_β is a covariance matrix whose structure depends on the value of the shaping hyperparameter β, and λ is the scaling hyperparameter. In this context, K_β is usually called a kernel and determines the properties of the realizations of g. In this paper, we choose K_β from the family of stable spline kernels [SS2010], [SS2011]. Such kernels are specifically designed for system identification purposes and give clear advantages compared to other standard kernels [bottegal2013regularized], [SS2010] (like the quadratic kernel or the Laplacian kernel, see [Scholkopf01b]). In this paper we make use of the first-order stable spline kernel (or TC kernel in [ChenOL12]). It is defined by

 {K_β}_{i,j} := β^{max(i,j)},  0 < β < 1.  (6)

The above kernel is parameterized by β, which regulates the decay rate of the generated impulse responses.
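The first-order stable spline (TC) kernel (6) is straightforward to construct; a sketch:

```python
import numpy as np

def tc_kernel(n, beta):
    """First-order stable spline / TC kernel (6):
    K[i, j] = beta ** max(i, j), 1-based indices i, j = 1, ..., n,
    with 0 < beta < 1 controlling the decay rate."""
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

K = tc_kernel(5, 0.5)
```

Smaller β generates realizations that decay to zero faster, consistent with the exponential stability encoded by the kernel.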

### Kernel-based identification with non-quantized data

Since we have assumed a Gaussian distribution for the noise v, the joint distribution of the vectors z and g, given λ, β and the noise variance σ², is Gaussian, namely

 p([z; g] | λ, β, σ²) ∼ N( [0; 0], [Σ_z, Σ_zg; Σ_gz, λK_β] ),  (7)

where

 Σ_z = λUK_βU^T + σ²I_N  (8)

and Σ_zg = Σ_gz^T = λUK_β. It follows also that the posterior distribution of g, given the knowledge of z, is Gaussian, namely

 p(g | z, λ, β, σ²) = N(Cz, P),  (9)

where

 P = (U^T U / σ² + (λK_β)^{−1})^{−1},  C = P U^T / σ².  (10)

In [SS2010], an impulse response estimator is derived starting from (9). In fact, the minimum mean-squared error (MMSE) estimate of g is (see e.g. [Anderson:1979])

 ĝ = E[g | z, λ, β, σ²] = Cz.  (11)

Such an estimator depends on the kernel hyperparameters and the noise variance. A common strategy to choose the kernel hyperparameters is to maximize the marginal likelihood of z, that is

 (λ̂, β̂) = argmax_{λ,β} log p(z | λ, β) = argmin_{λ,β} log det Σ_z + z^T Σ_z^{−1} z.  (12)

An estimate of σ² can be computed by means of the following steps:

1. Compute the least-squares estimate of g, i.e.

 ĝ_LS = (U^T U)^{−1} U^T z,  (13)

in order to obtain an estimate of the noise vector v;

2. Compute the empirical estimate of σ²:

 σ̂² = (z − Uĝ_LS)^T (z − Uĝ_LS) / (N − n).  (14)
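The standard (non-quantized) estimator above, i.e. the posterior mean (11) with P and C from (10), can be sketched as follows, reusing the TC kernel (6):

```python
import numpy as np

def posterior_mean(U, z, lam, beta, sigma2):
    """MMSE estimate (11): g_hat = C z, with
    P = (U^T U / sigma^2 + (lam * K_beta)^(-1))^(-1)  and
    C = P U^T / sigma^2, as in (10)."""
    n = U.shape[1]
    idx = np.arange(1, n + 1)
    K = beta ** np.maximum.outer(idx, idx)          # TC kernel (6)
    P = np.linalg.inv(U.T @ U / sigma2 + np.linalg.inv(lam * K))
    g_hat = P @ U.T @ z / sigma2
    return g_hat, P
```

In practice λ, β and σ² would come from (12) and (14); here they are passed in directly.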

Clearly, the system identification method described above is not applicable in our problem, since z is not available. However, we can draw some information on such a vector from the quantized output y. First, note that from (7) it follows that

 p(z | g, σ²) = N(Ug, σ²I).  (15)

Note also that, once g is given, (15) is independent of λ and β. Let U_t denote the t-th row of U. Then, for each entry z_t it holds that (see e.g. [bae2004gene])

 p(z_t | g, σ², y_t = p_k) = N_{(q_{k−1}, q_k]}(U_t g, σ²),  (16)

where N_{(q_{k−1}, q_k]} denotes a Gaussian distribution truncated below q_{k−1} and above q_k, whose original mean and variance are U_t g and σ², respectively. Note that, for t ≠ j,

 p(z_t, z_j | g, σ², y) = p(z_t | g, σ², y) p(z_j | g, σ², y),  (17)

due to the assumption of whiteness of the noise.
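Drawing z_t from the truncated Gaussian (16) can be done, for instance, with `scipy.stats.truncnorm`; a sketch, where `lo[t]` and `hi[t]` (our naming) are the bounds of the quantization interval that y_t identifies:

```python
import numpy as np
from scipy.stats import truncnorm

def sample_z(U, g, sigma2, lo, hi, rng=None):
    """Draw z_t ~ N(U_t g, sigma^2) truncated to (lo[t], hi[t]],
    i.e. the conditional (16). Thanks to (17) all entries can be
    drawn independently in one vectorized call."""
    mu = U @ g
    s = np.sqrt(sigma2)
    # truncnorm takes bounds standardized w.r.t. loc and scale
    a = (lo - mu) / s
    b = (hi - mu) / s
    return truncnorm.rvs(a, b, loc=mu, scale=s, random_state=rng)
```

For the binary quantizer of Remark 2.1 the bounds are simply (q, +∞) when y_t = 1 and (−∞, q) when y_t = −1.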

### Bayesian description of the hyperparameters

As mentioned in the previous subsection, knowing the values of the hyperparameters is of paramount importance in kernel-based methods. The marginal likelihood maximization approach (12) is not applicable here, since z is not available, so we have to devise alternative ways to estimate the hyperparameters.

Let us denote by Ga(a, b) the Gamma distribution with parameters a and b. The following result, drawn from [MagniPAMI], shows the posterior distribution of the inverse of the hyperparameter λ given g and β.

###### Lemma 3.1

The posterior probability distribution of λ^{−1} given g and β is

 p(λ^{−1} | g, β) ∼ Ga(n/2, g^T K_β^{−1} g / 2)  (18)
###### Remark 3.2

To obtain the result of the above lemma, we have implicitly set an improper prior on λ^{−1} with non-negative support, i.e., p(λ^{−1}) ∝ 𝟙_{[0,+∞)}(λ^{−1}), where 𝟙_{[0,+∞)} is the indicator function with support [0, +∞).

A similar argument holds for the noise variance σ². Recalling that v is Gaussian with zero mean and covariance σ²I_N and readapting Lemma 3.1 to this case, it turns out that

 p(σ^{−2} | v) ∼ Ga(N/2, v^T v / 2),  (19)

where also here we have assumed the improper prior p(σ^{−2}) ∝ 𝟙_{[0,+∞)}(σ^{−2}).
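The two Gamma posteriors (18) and (19) can be sampled directly. Note that NumPy parameterizes the Gamma distribution by shape and scale, so the second parameter b of Ga(a, b) enters as scale = 1/b; a sketch:

```python
import numpy as np

def sample_lambda_inv(g, K, rng):
    """Draw lambda^{-1} ~ Ga(n/2, g^T K^{-1} g / 2), eq. (18)."""
    n = len(g)
    b = g @ np.linalg.solve(K, g) / 2.0
    return rng.gamma(shape=n / 2.0, scale=1.0 / b)

def sample_sigma2_inv(v, rng):
    """Draw sigma^{-2} ~ Ga(N/2, v^T v / 2), eq. (19)."""
    N = len(v)
    b = v @ v / 2.0
    return rng.gamma(shape=N / 2.0, scale=1.0 / b)
```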

In some situations, e.g. when the quantizer has mild effects on the measured signal (e.g., when the resolution of the quantizer is high), a sufficiently reliable estimate of σ² can be obtained by using (14), with z replaced by y. However, note that in general this is not a consistent estimate of σ².

We conclude this section by recalling that, unfortunately, establishing a Bayesian model for β is still an unsolved problem. In this paper, we shall consider such a hyperparameter as deterministic. A method for its choice will be discussed in the next section.

## The proposed identification method

In this section we show how to exploit the Bayesian models introduced in the previous section to derive our system identification method. Assume for the moment that β is known and let us drop it from the notation below. The impulse response estimate can be obtained by computing the following integral

 ĝ = ∫ g p(z, λ, σ², g | y) dz dλ dσ² dg,  (20)

where

 p(z, λ, σ², g | y)  (21)

denotes the joint distribution of the random quantities described in the previous section, given the quantized output y. The above integral can be computed using Markov Chain Monte Carlo (MCMC) methods [Gilks], by drawing a large number of samples from the distribution (21) (usually called the target distribution) and then averaging over the samples of g. In general, drawing samples from a distribution is a hard problem if its probability density function does not admit a closed-form expression. However, if all the conditional probability densities of such a distribution are available in closed form, the problem of sampling from the target distribution can be solved efficiently by resorting to a special case of the Metropolis-Hastings method, namely the Gibbs sampler (see e.g. [Gilks]). The basic idea is that each conditional random variable is the state of a Markov chain; then, by drawing samples from each conditional probability density iteratively, we converge to the stationary state of this Markov chain and generate samples of the target distribution. Here, the conditionals of (21) are as follows.

1. p(z | λ, σ², g, y). As discussed above, once g is given, this conditional density does not depend on λ. Moreover, due to the assumptions on the noise, this density factorizes as follows

 ∏_{t=1}^{N} p(z_t | σ², g, y_t),  (22)

where each of the factors is a truncated Gaussian according to (16).

2. p(λ^{−1} | z, σ², g, y). Once g is given, λ becomes independent of all the other variables, i.e. such conditional density becomes p(λ^{−1} | g) and corresponds to (18), namely a Gamma distribution with parameters (n/2, g^T K_β^{−1} g / 2).

3. p(σ^{−2} | z, λ, g, y). Once z and g are given, one can compute v = z − Ug. Recalling (19), it follows that this conditional density can be written as p(σ^{−2} | v) and is distributed as a Gamma random variable with parameters (N/2, v^T v / 2).

4. p(g | z, λ, σ², y). Given z, the information carried by y becomes redundant and can be discarded, so that this conditional probability density corresponds to p(g | z, λ, σ²). Its closed-form expression is given by (9), namely a Gaussian distribution with mean Cz and covariance matrix P (see (10)).

Given the above conditional densities, we are in a position to illustrate the proposed identification algorithm.

**Algorithm: Bayesian system identification with quantized output measurements**

Input: {u_t, y_t}, t = 1, …, N
Output: ĝ

1. Initialization: compute initial values g⁰, (σ²)⁰, and set k = 1.

2. For k = 1 to M + M₀:

   1. Draw the samples z_t^k, t = 1, …, N, from p(z_t | (σ²)^{k−1}, g^{k−1}, y_t);

   2. Draw the sample (λ^{−1})^k from p(λ^{−1} | g^{k−1});

   3. Draw the sample (σ^{−2})^k from p(σ^{−2} | z^k, g^{k−1});

   4. Draw the sample g^k from p(g | z^k, λ^k, (σ²)^k).

3. Compute ĝ = (1/M) Σ_{k=M₀+1}^{M₀+M} g^k.

In the above algorithm, the parameters M and M₀ are introduced. M is the number of samples to be generated; clearly, large values of M should guarantee more accurate estimates of g. M₀ is the number of initial samples to be discarded, also known as burn-in [meyn2009markov]. In fact, the conditionals from which those samples are drawn are to be considered non-stationary, since the Gibbs sampler takes a certain number of iterations to get close to the stationary distribution.
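Putting the four conditional draws together, the whole sampler can be sketched as follows. The helper names, the crude regularized-least-squares initialization, and the arrays `lo`, `hi` (the quantization-interval bounds implied by each y_t) are our own conventions, not the paper's:

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_quantized_id(U, y, lo, hi, beta, M=400, M0=200, seed=0):
    """Gibbs sampler for quantized-output identification (steps 1-4).
    Returns the average of the g samples after discarding the burn-in."""
    rng = np.random.default_rng(seed)
    N, n = U.shape
    idx = np.arange(1, n + 1)
    K = beta ** np.maximum.outer(idx, idx)       # TC kernel (6)
    Kinv = np.linalg.inv(K)
    # crude initialization: regularized LS on the quantized data
    g = np.linalg.solve(U.T @ U + np.eye(n), U.T @ y)
    sigma2 = float(np.var(y - U @ g))
    samples = np.zeros((M + M0, n))
    for k in range(M + M0):
        # 1) z | sigma2, g, y: independent truncated Gaussians (16)
        mu, s = U @ g, np.sqrt(sigma2)
        z = truncnorm.rvs((lo - mu) / s, (hi - mu) / s,
                          loc=mu, scale=s, random_state=rng)
        # 2) lambda^{-1} | g: Gamma (18)  (numpy uses shape/scale)
        lam = 1.0 / rng.gamma(n / 2.0, 2.0 / (g @ Kinv @ g))
        # 3) sigma^{-2} | z, g: Gamma (19)
        v = z - U @ g
        sigma2 = 1.0 / rng.gamma(N / 2.0, 2.0 / (v @ v))
        # 4) g | z, lambda, sigma2: Gaussian (9)-(10)
        P = np.linalg.inv(U.T @ U / sigma2 + Kinv / lam)
        P = (P + P.T) / 2.0                      # enforce symmetry
        g = rng.multivariate_normal(P @ U.T @ z / sigma2, P)
        samples[k] = g
    return samples[M0:].mean(axis=0)
```

Here β is fixed, consistent with treating it as deterministic; the default M and M₀ are placeholders and should be checked with a convergence diagnostic as discussed below.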

### Initialization of the algorithm

The initial estimate g⁰ can be computed using the kernel-based method introduced in [SS2010] and briefly revisited above. Replacing z with y in (11), one can obtain a (very) rough estimate of g, which can serve as initial condition for the Gibbs sampler.

Similarly, the initial value (σ²)⁰ can be computed from (14), again by replacing z with y.

### Estimation of β

It remains to set a scheme for estimating β. In [chen2012impulse], an exact marginal likelihood maximization approach is proposed, but it is shown that such an approach needs the solution of a complicated integral. In this paper, we adopt a simple (and approximate) way to estimate β. It consists in maximizing the cost function of (12), where, instead of using the unavailable data z, we plug in the quantized output y. Clearly, one should not expect very good results in general, especially when the difference between z and y is large (e.g. when the quantizer is binary). However, numerical experiments have shown that the accuracy of the estimation of β with this strategy is satisfactory enough to obtain a good performance of the algorithm.

### Comparison with [chen2012impulse]

Although the method proposed here and the method proposed in [chen2012impulse] exploit the same Bayesian modeling of the unknown system, the techniques used to carry out the estimate of g are substantially different. In [chen2012impulse], the impulse response is obtained as the maximum a posteriori (MAP) estimate given the quantized output data. Here instead, ĝ is computed by means of (20), that is, a minimum mean-square error Bayes estimator.

## Numerical experiments

We test the proposed algorithm by means of two Monte Carlo experiments of 100 runs each. For each Monte Carlo run, we generate a linear system by picking 10 pairs of complex conjugate zeros with randomly chosen magnitude and random phase, and, similarly, 10 pairs of complex conjugate poles with randomly chosen magnitude and random phase. The goal is to estimate n samples of the impulse response from N input-output data. The inputs are realizations of white noise with unit variance. We compare the following estimators.

• B-Q-GS: This is the method described in this paper, namely a Bayesian system identification method for Quantized output data that uses the Gibbs Sampler. The parameter M, denoting the number of samples generated by the sampler, is set to a large value, and the first M₀ samples are discarded. The validity of the choice of M and M₀ is checked by assessing that the quantiles 0.25, 0.5 and 0.75 are estimated with good precision [Raftery1996].

• SS-ML: This is the nonparametric kernel-based method proposed in [SS2010] and revisited in [ChenOL12], which is not designed for handling quantized data. This method requires the estimation of the same parameters as our proposed method. The kernel adopted for identification is (6). Its hyperparameters are estimated using (12), while σ² is estimated through (14) (in both cases replacing z with y).

• LS: This is the least-squares estimator, where the data employed to estimate g are the quantized output measurements y. Note that, in principle, here the parameter n should be estimated from data using complexity criteria such as AIC or BIC [Ljung]. Here, for simplicity, we fix it to 50.

• SS-ML-NQ: Same as SS-ML; however, here we make use of the non-quantized vector z. Hence, this estimator exploits information which is not available in practice in this problem.

• LS-NQ: Least-squares estimator having access to the vector z. The parameter n is fixed as for the LS estimator.

The performance of the estimators is evaluated by means of the fitting score, computed as

 FIT_i = 1 − ‖g^i − ĝ^i‖₂ / ‖g^i − ḡ^i‖₂,  (23)

where g^i is the impulse response generated at the i-th run, ḡ^i its mean and ĝ^i the estimate computed by the tested method.
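The fitting score (23) can be computed as:

```python
import numpy as np

def fit_score(g_true, g_hat):
    """Fitting score (23): 1 - ||g - g_hat||_2 / ||g - mean(g)||_2.
    Equals 1 for a perfect estimate and 0 for the trivial
    constant predictor equal to the mean of g."""
    g_true = np.asarray(g_true, dtype=float)
    g_hat = np.asarray(g_hat, dtype=float)
    return 1.0 - (np.linalg.norm(g_true - g_hat)
                  / np.linalg.norm(g_true - g_true.mean()))
```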

### Binary quantizer

The first experiment employs the following binary quantizer

 Q_b[x] := {  1  if x ≥ 1
           { −1  if x < 1.

For each Monte Carlo run, the noise variance σ² is chosen such that the ratio between the variance of the noiseless (non-quantized) output and that of the noise is equal to 10. We then generate the N data samples.

Figure 2 shows the results of the Monte Carlo runs. The advantage of using the proposed identification technique, compared to methods which do not account for the quantizer, is evident. Despite the large loss of information caused by the quantizer, the proposed method gives a fit which is quite comparable to the oracle methods. Figure 3 reports one of the generated scenarios. It can be seen that there is a substantial difference between z and y. Nonetheless, the accuracy of the estimation of the impulse response is acceptable.

### Ceil quantizer

In the second experiment we test the performance of our method on systems followed by a ceil-type quantizer, which is defined as

 Qc[x]:=⌈x⌉.

Again, for each Monte Carlo run, the noise variance is chosen as in the previous experiment, and we generate the N data samples.

As shown in Figure 4, in this case, if one compares the oracle-type methods (i.e. SS-ML-NQ and LS-NQ) with the same methods using quantized data (SS-ML and LS), the loss of accuracy is relatively low. This is because this type of quantizer has a mild effect on the measurements. It can be seen, however, that the proposed method is able to give a fit that is comparable to the standard kernel-based method that uses non-quantized data (SS-ML-NQ). Moreover, it outperforms the least-squares estimator equipped with the knowledge of the non-quantized data.

## Conclusions

In this paper, we have introduced a novel method for system identification when the output is subject to quantization. We have proposed an MCMC scheme that exploits the Bayesian description of the unknown system. In particular, we have shown how to design an integration scheme based on the Gibbs sampler by exploiting the knowledge of the conditional probability density functions of the variables entering the system. We have highlighted, through some numerical experiments, the advantages of employing our method when quantizers affect the accuracy of measurements.

Important questions such as consistency of the method and robust selection of the kernel hyperparameter are currently under study.

## References
