Bayesian kernel-based system identification with quantized output data
Abstract:
In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.
Giulio Bottegal*, Gianluigi Pillonetto**, Håkan Hjalmarsson*

*ACCESS Linnaeus Centre, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden (email: {bottegal; hjalmars}@kth.se)
**Department of Information Engineering, University of Padova, Padova, Italy (email: giapi@dei.unipd.it)
¹ The research leading to these results has received funding from the Swedish Research Council under contract 621-2009-4017, the European Union Seventh Framework Programme [FP7/2007-2013] under grant agreement no. 257462 HYCON2 Network of Excellence, and the MIUR FIRB project RBFR12M3AC "Learning meets time: a new computational approach to learning in dynamic systems".
Identification of systems from quantized data finds applications in a wide range of areas, such as communications, networked control systems, and bioinformatics (see e.g. [bae2004gene] and [wang2010system]).
From a system identification perspective, identification of systems with quantized output data constitutes a challenging problem. In fact, the presence of a quantizer cascaded with a dynamic system causes a loss of information on the behavior of that system. Thus, standard system identification techniques, such as least-squares or the prediction error method (PEM) [Ljung], [Soderstrom], may perform poorly. For this reason, in recent years several techniques for system identification from quantized data have been proposed. Some of these methods are specifically tailored to identification of systems with binary measurements [wang2003system], [wang2006joint], possibly implemented in a recursive fashion [guo2013recursive], [jafari2012convergence]. Other methods, such as [colinet2010weighted], exploit the knowledge of a dithering signal to improve identification performance. Specific input design techniques are studied in [godoy2014novel], [casini2011input] and [casini2012input]. Methods handling general types of quantization have been proposed recently [godoy2011identification], [chen2012impulse]. In these contributions, the problem of identifying a linear dynamic system with quantized data is posed as a likelihood problem. In particular, in [chen2012impulse] the authors exploit the recently proposed Bayesian kernel-based formulation of the linear system identification problem (see [pillonetto2014kernel] for a survey).
Similarly to [chen2012impulse], the starting point of this paper is the formulation of the problem of identifying a linear dynamic system with quantized data using a Bayesian approach. We model the impulse response of the unknown system as a realization of a zero-mean Gaussian random process whose covariance matrix (in this context also called a kernel) is given by the recently introduced stable spline kernels [SS2010], [SS2011], [bottegal2013regularized], which are specifically designed for linear system identification purposes. The structure of this type of kernel depends on two hyperparameters, namely a scaling parameter and a shaping parameter, whose tuning permits more flexibility in the identification process and can be seen as a model selection step. In the standard setting (i.e., when there is no quantizer), the kernel hyperparameters are chosen as those maximizing the marginal likelihood of the output measurements, obtained by integrating out the dependence on the system. Once the hyperparameters are chosen, the impulse response of the system is computed as the minimum mean-square error Bayes estimate given the observed input/output data (see e.g. [SS2010], [ChenOL12]).
A key assumption in kernel-based methods is that the output data and the system admit a joint Gaussian description. Such an assumption does not hold with quantized data, so we need a different approach. In this paper we propose a solution based on Markov Chain Monte Carlo (MCMC) techniques [Gilks]. To this end, we define a target probability density; the estimate of the system can be obtained by drawing samples from it. Such a probability density is a function of the following random variables: the (unavailable) nonquantized output of the linear system, the scaling hyperparameter of the kernel, the unknown measurement noise variance, and the impulse response of the system. The main contribution of this paper is to show how to design a Gibbs sampler [Gilks] by exploiting the knowledge of the conditional densities of the target distribution. The main advantage of the Gibbs sampler is that it does not require any rejection criterion for the generated samples and quickly converges to the target distribution. Note that MCMC-based approaches have recently gained popularity in system identification [Ninness2010], [lindsten2012], [bottegal2014outlier].
The paper is organized as follows. In the next section, we introduce the problem of identifying dynamic systems from quantized data. We then give a Bayesian description of the variables entering the system, and subsequently describe the proposed identification method. Afterwards, we present simulations assessing the performance of the proposed method. Some conclusions end the paper.
We consider the following linear time-invariant BIBO stable output error system

(1) $y(t) = (g \ast u)(t) + e(t), \quad t \in \mathcal{T},$

where $g$ is the impulse response characterizing the unknown system, which is fed by the input $u$, and $\ast$ denotes convolution. The set $\mathcal{T}$ corresponds to either $\mathbb{R}_+$ or $\mathbb{N}$, depending on whether the system is continuous-time or discrete-time. The output is corrupted by the additive white Gaussian noise $e(t)$, which has zero mean and unknown variance $\sigma^2$, and is measured at the time instants $t_1, \ldots, t_N$. If the system is continuous-time, then the $t_i$ can represent any nonuniform sampling, whereas in the discrete-time case we shall consider $t_i = i$ (i.e., no downsampling). For ease of exposition, in this paper we shall derive our algorithm in the discrete-time case only; the extension to continuous time is quite straightforward (see e.g. [Wahba1990], [SS2010]).
Actually, the output $y(t)$ is not directly measurable, and only a quantized version is available, namely

(2) $z(t_i) = \mathcal{Q}[y(t_i)], \quad i = 1, \ldots, N,$
where $\mathcal{Q}$ is a known map (our quantizer) of the type

(3) $\mathcal{Q}[y] = q_i \quad \text{if } y \in (a_{i-1}, a_i], \quad i = 1, \ldots, M,$

with the thresholds $\{a_i\}$ and the levels $\{q_i\}$ being known (and typically $a_0 = -\infty$ and $a_M = +\infty$).
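As a concrete illustration, a quantizer of this type can be implemented in a few lines (a minimal sketch; the helper name `make_quantizer` and the example thresholds below are ours, not from the paper):

```python
import numpy as np

def make_quantizer(thresholds, levels):
    """Build a map Q of the type (3): Q[y] = q_i when y falls in the
    interval (a_{i-1}, a_i], with a_0 = -inf and a_M = +inf implied.

    thresholds: increasing inner thresholds a_1 < ... < a_{M-1}
    levels:     output values q_1, ..., q_M (one more than thresholds)
    """
    thresholds = np.asarray(thresholds, dtype=float)
    levels = np.asarray(levels, dtype=float)

    def Q(y):
        # searchsorted returns the index of the interval containing y;
        # side='left' realizes the half-open intervals (a_{i-1}, a_i]
        return levels[np.searchsorted(thresholds, y, side='left')]

    return Q

# A 3-level example quantizer and a binary quantizer with threshold C = 0.5
three_level = make_quantizer([-1.0, 1.0], [-1.0, 0.0, 1.0])
binary = make_quantizer([0.5], [-1.0, 1.0])
```

The same function handles both scalar inputs and whole output sequences, since `np.searchsorted` operates elementwise.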
Remark 2.1
A particular and well-studied case is the binary quantizer, defined as

(4) $\mathcal{Q}[y] = \begin{cases} 1 & \text{if } y \geq C \\ -1 & \text{if } y < C \end{cases}$

It is well-known that a condition on the threshold $C$ to guarantee identifiability of the system is $C \neq 0$. In fact, when $C = 0$, the system can be determined only up to a scaling factor [godoy2011identification].
Without loss of generality, let us assume the system to be strictly causal, i.e. $g(0) = 0$. We assume that $N$ input-output data samples $\{u(t_i), z(t_i)\}_{i=1}^N$ are collected during an experiment. We formulate our system identification problem as the problem of estimating the impulse response for $n$ time instants, namely obtaining $g := [g(1) \, \ldots \, g(n)]^T$. Recall that, if $n$ is sufficiently large, these samples can be used to approximate the dynamics of the system with arbitrary accuracy [ljung1992]. Introducing the vector notation

$y := [y(1) \, \ldots \, y(N)]^T, \quad z := [z(1) \, \ldots \, z(N)]^T, \quad e := [e(1) \, \ldots \, e(N)]^T,$

and letting $U \in \mathbb{R}^{N \times n}$ denote the Toeplitz matrix of the input, the input-output relation for the available samples can be written

$y = Ug + e, \quad z = \mathcal{Q}[y],$

so that our estimation problem can be cast in, say, a "linear regression plus quantization" form.
In this paper we cast the system identification problem into a Bayesian framework. Our starting point is the choice of a proper prior on $g$. Following a Gaussian regression approach [Rasmussen], we model $g$ as a zero-mean Gaussian random vector, i.e. we assume the following probability density function for $g$:

(5) $p(g) \propto \exp\!\left(-\frac{1}{2\lambda}\, g^T K_\beta^{-1} g\right),$
where $K_\beta$ is a covariance matrix whose structure depends on the value of the shaping hyperparameter $\beta$, and $\lambda$ is the scaling hyperparameter. In this context, $K_\beta$ is usually called a kernel and determines the properties of the realizations of $g$. In this paper, we choose $K_\beta$ from the family of stable spline kernels [SS2010], [SS2011]. Such kernels are specifically designed for system identification purposes and give clear advantages compared to other standard kernels [bottegal2013regularized], [SS2010] (like the quadratic kernel or the Laplacian kernel, see [Scholkopf01b]). In this paper we make use of the first-order stable spline kernel (or TC kernel in [ChenOL12]). It is defined by
(6) $\{K_\beta\}_{i,j} = \beta^{\max(i,j)}, \qquad 0 \leq \beta < 1.$
The above kernel is parameterized by $\beta$, which regulates the velocity of decay of the generated impulse responses.
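The kernel (6) and draws from the prior (5) can be sketched numerically as follows (our illustration; the function name `tc_kernel` and the parameter values are not from the paper):

```python
import numpy as np

def tc_kernel(n, beta):
    """First-order stable spline (TC) kernel (6):
    K[i, j] = beta**max(i, j), for i, j = 1, ..., n, with 0 < beta < 1."""
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

# Realizations g ~ N(0, lam * K_beta) from the prior (5); smaller beta
# yields faster-decaying impulse responses.
rng = np.random.default_rng(0)
n, lam, beta = 50, 1.0, 0.8
K = tc_kernel(n, beta)
g_sample = rng.multivariate_normal(np.zeros(n), lam * K)
```

Note that the prior variance of $g(k)$ is $\lambda \beta^k$, which decays exponentially in $k$, reflecting the stability assumption.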
Since we have assumed a Gaussian distribution for the noise $e$, the joint distribution of the vectors $y$ and $g$, given the hyperparameters $\lambda$, $\beta$ and the noise variance $\sigma^2$, is jointly Gaussian, namely

(7) $\begin{bmatrix} y \\ g \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} \Sigma_y & \lambda U K_\beta \\ \lambda K_\beta U^T & \lambda K_\beta \end{bmatrix}\right),$

where

(8) $\Sigma_y := \lambda U K_\beta U^T + \sigma^2 I_N,$
and $I_N$ denotes the $N \times N$ identity matrix. It follows also that the posterior distribution of $g$, given the knowledge of $y$, is Gaussian, namely

(9) $p(g \mid y, \lambda, \beta, \sigma^2) \sim \mathcal{N}(Cy, P),$

where

(10) $P = \left(\frac{U^T U}{\sigma^2} + (\lambda K_\beta)^{-1}\right)^{-1}, \qquad C = \frac{P U^T}{\sigma^2}.$
In [SS2010], an impulse response estimator is derived starting from (9). In fact, the minimum mean-squared error (MMSE) estimate of $g$ given $y$ is (see e.g. [Anderson:1979])

(11) $\hat{g} = \mathbb{E}[g \mid y] = \lambda K_\beta U^T \Sigma_y^{-1} y.$
Such an estimator depends on the kernel hyperparameters and the noise variance. A common strategy for choosing the kernel hyperparameters is to maximize the marginal likelihood of $y$, that is

(12) $(\hat{\lambda}, \hat{\beta}) = \arg\max_{\lambda, \beta} \, p(y; \lambda, \beta) = \arg\min_{\lambda, \beta} \, \log\det \Sigma_y + y^T \Sigma_y^{-1} y.$
An estimate of $\sigma^2$ can be computed by means of the following steps:

1. Compute the least-squares estimate of $g$, i.e.

(13) $\hat{g}_{LS} = (U^T U)^{-1} U^T y,$

in order to obtain an estimate $\hat{e} = y - U \hat{g}_{LS}$ of the noise;

2. Compute the empirical estimate of $\sigma^2$

(14) $\hat{\sigma}^2 = \frac{1}{N - n} \sum_{i=1}^{N} \hat{e}_i^2.$
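These two steps amount to a few lines (our sketch; we use the unbiased $1/(N-n)$ normalization in (14), though a $1/N$ normalization would serve equally well here):

```python
import numpy as np

def estimate_sigma2(U, y):
    """Steps (13)-(14): least-squares fit of g, then the empirical
    variance of the residuals as an estimate of sigma^2."""
    N, n = U.shape
    g_ls, *_ = np.linalg.lstsq(U, y, rcond=None)   # (13)
    resid = y - U @ g_ls
    return np.sum(resid ** 2) / (N - n)            # (14)
```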
Clearly, the system identification method described above is not applicable in our problem, since $y$ is not available. However, we can draw some information on such a vector from the quantized output $z$. First, note that from (7) it follows that

(15) $p(y \mid g, \sigma^2) \sim \mathcal{N}(Ug, \sigma^2 I_N).$

Note also that, once $y$ is given, (15) is independent of $z$ and $\lambda$. Let $U_i$ denote the $i$-th row of $U$. Then, for each entry $y_i$ it holds that (see e.g. [bae2004gene])

(16) $p(y_i \mid g, \sigma^2, z_i) \sim \mathcal{TN}(U_i g, \sigma^2; a_{j-1}, a_j), \quad \text{where } z_i = q_j,$

and $\mathcal{TN}(\mu, \sigma^2; a, b)$ denotes a Gaussian distribution truncated below $a$ and above $b$, whose original mean and variance are $\mu$ and $\sigma^2$ respectively. Note that, for $i \neq j$,

(17) $p(y_i, y_j \mid g, \sigma^2, z) = p(y_i \mid g, \sigma^2, z_i)\, p(y_j \mid g, \sigma^2, z_j),$

due to the assumption of whiteness of the noise.
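Sampling from the truncated Gaussian (16) is standard; a sketch using scipy (illustrative, with the quantizer described by its thresholds and levels as in (3)):

```python
import numpy as np
from scipy.stats import truncnorm

def sample_y_given_z(mean, sigma2, z, thresholds, levels, rng):
    """Draw y_i from (16): a Gaussian with mean (U g)_i and variance
    sigma2, truncated to the quantization interval that produced z_i.
    By (17) the entries are independent, so they are drawn separately."""
    sd = np.sqrt(sigma2)
    edges = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    # interval index j such that z_i = q_j
    bins = np.array([int(np.argmax(np.asarray(levels) == zi)) for zi in z])
    # truncnorm expects bounds standardized by the (mean, sd) pair
    a = (edges[bins] - mean) / sd
    b = (edges[bins + 1] - mean) / sd
    return truncnorm.rvs(a, b, loc=mean, scale=sd, random_state=rng)
```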
As mentioned in the previous subsection, knowing the values of the hyperparameters is of paramount importance in kernel-based methods. The marginal likelihood maximization approach (12) is not applicable here, so we have to devise alternative ways to estimate the values of the hyperparameters.
Let us denote by $\Gamma(k, \theta)$ the Gamma distribution with shape parameter $k$ and rate parameter $\theta$. The following result, drawn from [MagniPAMI], gives the posterior distribution of the inverse of the hyperparameter $\lambda$ given $g$ and $\beta$.
Lemma 3.1
The posterior probability distribution of $\lambda^{-1}$ given $g$ and $\beta$ is

(18) $p(\lambda^{-1} \mid g, \beta) \sim \Gamma\!\left(\frac{n}{2} - 1, \; \frac{g^T K_\beta^{-1} g}{2}\right).$
Remark 3.2
To obtain the result of the above lemma, we have implicitly set an improper prior on $\lambda$ with nonnegative support, i.e., $p(\lambda) \propto \chi(\lambda \geq 0)$, where $\chi(\cdot)$ is the indicator function with support $\{\lambda \geq 0\}$.
A similar argument holds for the noise variance $\sigma^2$. Recalling that $e$ is Gaussian with covariance matrix equal to $\sigma^2 I_N$ and readapting Lemma 3.1 to this case, it turns out that

(19) $p(\sigma^{-2} \mid g, y) \sim \Gamma\!\left(\frac{N}{2} - 1, \; \frac{\|y - Ug\|^2}{2}\right),$

where also here we have assumed the improper prior $p(\sigma^2) \propto \chi(\sigma^2 \geq 0)$.
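In code, draws from (18) and (19) are one-liners with numpy's Gamma generator (a sketch under the shape/rate constants stated above; note that numpy parameterizes the Gamma by shape and scale, with scale = 1/rate):

```python
import numpy as np

def draw_lambda_inv(g, K, rng):
    """Sample lambda^{-1} from the Gamma posterior (18)."""
    n = len(g)
    rate = 0.5 * (g @ np.linalg.solve(K, g))
    return rng.gamma(n / 2 - 1, 1.0 / rate)

def draw_sigma2_inv(y, Ug, rng):
    """Sample sigma^{-2} from the Gamma posterior (19), given the
    residuals y - U g."""
    N = len(y)
    rate = 0.5 * np.sum((y - Ug) ** 2)
    return rng.gamma(N / 2 - 1, 1.0 / rate)
```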
In some situations, e.g. when the quantizer has mild effects on the measured signal (e.g., when the resolution of the quantizer is high), a sufficiently reliable estimate of $\sigma^2$ can be obtained by using (14), with $y$ replaced by $z$. However, note that in general this does not yield a consistent estimate of $\sigma^2$.
We conclude this section by recalling that, unfortunately, establishing a Bayesian model for $\beta$ is still an unsolved problem. In this paper, we shall treat this hyperparameter as deterministic. A method for its choice will be discussed in the next section.
In this section we show how to exploit the Bayesian models introduced in the previous section to derive our system identification method. Assume for the moment that $\beta$ is known and let us drop it from the notation below. The impulse response estimate can be obtained by computing the following integral

(20) $\hat{g} = \mathbb{E}[g \mid z] = \int g \; p(g, y, \lambda^{-1}, \sigma^{-2} \mid z)\; dg\, dy\, d\lambda^{-1}\, d\sigma^{-2},$

where

(21) $p(g, y, \lambda^{-1}, \sigma^{-2} \mid z)$

denotes the joint distribution of the random quantities described in the previous section, given the quantized output $z$. The above integral can be computed using Markov Chain Monte Carlo (MCMC) methods [Gilks], by drawing a large number of samples from the distribution (21) (usually called the target distribution) and then averaging over the samples of $g$. In general, drawing samples from a distribution is a hard problem if its probability density function does not admit a closed-form expression. However, if all the conditional probability densities of such a distribution are available in closed form, the problem of sampling from the target distribution can be solved efficiently by resorting to a special case of the Metropolis-Hastings method, namely the Gibbs sampler (see e.g. [Gilks]). The basic idea is that each conditional random variable is the state of a Markov chain; then, by drawing samples from each conditional probability density iteratively, we converge to the stationary state of this Markov chain and generate samples of the target distribution. Here, the conditionals of (21) are as follows.

$p(y \mid g, \lambda, \sigma^2, z)$. As discussed in the previous section, this conditional density does not depend on $\lambda$. Moreover, due to the assumptions on the noise, this density factorizes as follows

(22) $p(y \mid g, \sigma^2, z) = \prod_{i=1}^{N} p(y_i \mid g, \sigma^2, z_i),$

where each of the factors is a truncated Gaussian according to (16).

$p(\lambda^{-1} \mid g, y, \sigma^2, z)$. Once $g$ is given, $\lambda^{-1}$ becomes independent of all the other variables, i.e. such conditional density becomes $p(\lambda^{-1} \mid g)$ and corresponds to (18), namely a Gamma distribution with parameters $\big(\tfrac{n}{2} - 1, \tfrac{1}{2} g^T K_\beta^{-1} g\big)$.

$p(\sigma^{-2} \mid g, y, \lambda, z)$. Once $g$ and $y$ are given, one can compute the residuals $y - Ug$. Recalling (19), it follows that this conditional density can be written as $p(\sigma^{-2} \mid g, y)$ and is distributed as a Gamma random variable with parameters $\big(\tfrac{N}{2} - 1, \tfrac{1}{2}\|y - Ug\|^2\big)$.

$p(g \mid y, \lambda, \sigma^2, z)$. Once $y$ is given, $g$ is conditionally independent of $z$, and this conditional density is the Gaussian posterior given by (9)-(10).
Given the above conditional densities, we are in a position to illustrate the proposed identification algorithm.
Algorithm: Bayesian system identification with quantized output measurements

Input: $\{u(t_i), z(t_i)\}_{i=1}^{N}$
Output: $\hat{g}$

1. Initialization: compute initial values $g^{(0)}$ and $(\sigma^2)^{(0)}$.
2. For $k = 1$ to $M + M_0$:
   (a) Draw the sample $y^{(k)}$ from $p(y \mid g^{(k-1)}, (\sigma^2)^{(k-1)}, z)$;
   (b) Draw the sample $(\lambda^{-1})^{(k)}$ from $p(\lambda^{-1} \mid g^{(k-1)})$;
   (c) Draw the sample $(\sigma^{-2})^{(k)}$ from $p(\sigma^{-2} \mid g^{(k-1)}, y^{(k)})$;
   (d) Draw the sample $g^{(k)}$ from $p(g \mid y^{(k)}, \lambda^{(k)}, (\sigma^2)^{(k)})$.
3. Compute $\hat{g} = \frac{1}{M} \sum_{k = M_0 + 1}^{M + M_0} g^{(k)}$.
In the above algorithm, the parameters $M$ and $M_0$ are introduced. $M$ is the number of samples to be generated; clearly, large values of $M$ should guarantee more accurate estimates of $\hat{g}$. $M_0$ is the number of initial samples drawn from the conditionals to be discarded, also known as the burn-in [meyn2009markov]. In fact, the conditionals from which those samples are drawn are to be considered nonstationary, since the Gibbs sampler takes a certain number of iterations to get close to a stationary distribution.
The initial estimate $g^{(0)}$ can be computed using the kernel-based method introduced in [SS2010] and briefly revisited earlier. Replacing $y$ with $z$ in (11), one can obtain a (very) rough estimate of $g$, which can serve as the initial condition for the Gibbs sampler.
Similarly, the initial value $(\sigma^2)^{(0)}$ can be computed from (14), again by replacing $y$ with $z$.
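Putting the pieces together, the whole sampling scheme can be sketched in a self-contained form (an illustrative implementation, not the authors' code; the quantizer is passed through its thresholds and levels as in (3), a plain least-squares fit on $z$ replaces the kernel-based initialization for brevity, and the Gamma shape/rate constants follow (18)-(19)):

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_identify(U, z, thresholds, levels, beta, M=2000, M0=500, seed=0):
    """Gibbs sampler for quantized-output identification (sketch).
    U: N x n regression matrix built from the input, z: quantized outputs,
    beta: shaping hyperparameter, M / M0: retained samples / burn-in."""
    rng = np.random.default_rng(seed)
    N, n = U.shape
    idx = np.arange(1, n + 1)
    K = beta ** np.maximum.outer(idx, idx)       # TC kernel (6)
    Kinv = np.linalg.inv(K)
    edges = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    bins = np.array([int(np.argmax(np.asarray(levels) == zi)) for zi in z])

    # crude initialization: treat z as if it were the nonquantized output y
    g = np.linalg.lstsq(U, z, rcond=None)[0]
    sigma2 = np.var(z - U @ g) + 1e-6
    g_sum = np.zeros(n)
    for k in range(M + M0):
        # 1) y | g, sigma2, z: independent truncated Gaussians (16), (22)
        m = U @ g
        sd = np.sqrt(sigma2)
        a, b = (edges[bins] - m) / sd, (edges[bins + 1] - m) / sd
        y = truncnorm.rvs(a, b, loc=m, scale=sd, random_state=rng)
        # 2) lambda^{-1} | g: Gamma posterior (18)
        lam = 1.0 / rng.gamma(n / 2 - 1, 2.0 / (g @ Kinv @ g))
        # 3) sigma^{-2} | g, y: Gamma posterior (19)
        sigma2 = 1.0 / rng.gamma(N / 2 - 1, 2.0 / np.sum((y - m) ** 2))
        # 4) g | y, lam, sigma2: Gaussian posterior (9)-(10)
        P = np.linalg.inv(U.T @ U / sigma2 + Kinv / lam)
        g = rng.multivariate_normal(P @ (U.T @ y) / sigma2, P)
        if k >= M0:                              # discard burn-in samples
            g_sum += g
    return g_sum / M                             # MMSE estimate (20)
```

Each iteration costs one $n \times n$ matrix inversion plus $N$ truncated-Gaussian draws, so the sampler remains cheap for moderate $n$.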
It remains to set a scheme for estimating $\beta$. In [chen2012impulse], an exact marginal likelihood maximization approach is proposed, but it is shown that such an approach requires the solution of a complicated integral. In this paper, we adopt a simple (and approximate) way to estimate $\beta$. It consists in maximizing the marginal likelihood criterion (12), where, instead of using the unavailable data $y$, we plug in the quantized output $z$. Clearly, one should not expect very good results in general, especially when the difference between $y$ and $z$ is large (e.g. when the quantizer is binary). However, numerical experiments have shown that the accuracy of the estimation of $\beta$ with this strategy is sufficient to obtain good performance of the algorithm.
Although the method proposed here and the method proposed in [chen2012impulse] exploit the same Bayesian modeling of the unknown system, the techniques used to carry out the estimation of $g$ are substantially different. In [chen2012impulse], the impulse response is obtained as the maximum a posteriori (MAP) estimate given the quantized output data. Here, instead, $\hat{g}$ is computed by means of (20), that is, a minimum mean-square error Bayes estimator.
We test the proposed algorithm by means of 2 Monte Carlo experiments of 100 runs each. For each Monte Carlo run, we generate a linear system by picking 10 pairs of complex conjugate zeros with randomly chosen magnitude and random phase; similarly, we pick 10 pairs of complex conjugate poles with randomly chosen magnitude and random phase. The goal is to estimate $n$ samples of the impulse response from input-output data. The inputs are realizations of white noise with unit variance. We compare the following estimators.

BQGS: This is the method described in this paper, namely a Bayesian system identification method for Quantized output data that uses the Gibbs Sampler. The parameter $M$ denotes the number of samples generated by the sampler, and the first $M_0$ samples are discarded. The validity of the choice of $M$ and $M_0$ is checked by assessing that the quantiles 0.25, 0.5 and 0.75 are estimated with good precision [Raftery1996].

SS-ML: This is the nonparametric kernel-based method proposed in [SS2010] and revisited in [ChenOL12], which is not designed for handling quantized data. This method requires the estimation of the same parameters as our proposed method. The kernel adopted for identification is (6). Its hyperparameters are estimated using (12), while $\sigma^2$ is estimated through (14) (in both cases replacing $y$ with $z$).

LS: This is the least-squares estimator, where the data employed to estimate $g$ are the quantized output measurements $z$. Note that, in principle, here the parameter $n$ should be estimated from data using complexity criteria such as AIC or BIC [Ljung]. Here, for simplicity, we fix it to 50.

SS-ML-NQ: Same as SS-ML; however, here we make use of the nonquantized vector $y$. Hence, this estimator exploits information which is not available in practice in this problem.

LS-NQ: Least-squares estimator having access to the vector $y$. The parameter $n$ is fixed as for the LS estimator.
The performance of the estimators is evaluated by means of the fitting score, computed as

(23) $\mathrm{fit}_j = 100 \left(1 - \frac{\|g_j - \hat{g}_j\|}{\|g_j - \bar{g}_j\|}\right),$

where $g_j$ is the impulse response generated at the $j$-th run, $\bar{g}_j$ its mean, and $\hat{g}_j$ the estimate computed by the tested methods.
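For reference, the fitting score (23) reads in code (straightforward; the helper name is ours):

```python
import numpy as np

def fit_score(g_true, g_hat):
    """Fitting score (23): 100 * (1 - ||g - g_hat|| / ||g - mean(g)||).
    Scores 100 for a perfect estimate and roughly 0 for an estimate no
    better than the constant mean of the true impulse response."""
    num = np.linalg.norm(g_true - g_hat)
    den = np.linalg.norm(g_true - np.mean(g_true))
    return 100.0 * (1.0 - num / den)
```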
The first experiment makes use of a binary quantizer of the type (4). For each Monte Carlo run, the noise variance $\sigma^2$ is set such that the ratio between the variance of the noiseless (nonquantized) output and that of the noise is equal to 10. We generate $N$ data samples.
Figure 2 shows the results of the Monte Carlo runs. The advantage of using the proposed identification technique, compared to methods which do not account for the quantizer, is evident. Despite the large loss of information caused by the quantizer, the proposed method gives a fit which is comparable to that of the oracle-type methods. Figure 3 reports one of the generated scenarios. It can be seen that there is a substantial difference between $y$ and $z$. Nonetheless, the accuracy of the estimated impulse response is acceptable.
In the second experiment we test the performance of our method on systems followed by a ceil-type quantizer, which is defined as

$\mathcal{Q}[y] = \lceil y \rceil.$

Again, for each Monte Carlo run, the noise variance is set such that the ratio between the variance of the noiseless output and that of the noise is equal to 10. We generate $N$ data samples.
As shown in Figure 4, in this case, if one compares the oracle-type methods (i.e. SS-ML-NQ and LS-NQ) with the same methods using quantized data (SS-ML and LS), the loss of accuracy is relatively low. This is because this type of quantizer has a mild effect on the measurements. It can be seen, however, that the proposed method is able to give a fit that is comparable to the standard kernel-based method using nonquantized data (SS-ML-NQ). Moreover, it outperforms the least-squares estimator equipped with the knowledge of the nonquantized data.
In this paper, we have introduced a novel method for system identification when the output is subject to quantization. We have proposed an MCMC scheme that exploits the Bayesian description of the unknown system. In particular, we have shown how to design an integration scheme based on the Gibbs sampler by exploiting the knowledge of the conditional probability density functions of the variables entering the system. We have highlighted, through numerical experiments, the advantages of employing our method when quantizers affect the accuracy of the measurements.
Important questions such as consistency of the method and robust selection of the kernel hyperparameter are currently under study.