
# Particle Filters for Partially-Observed Boolean Dynamical Systems

###### Abstract

Partially-observed Boolean dynamical systems (POBDS) are a general class of nonlinear models with application in estimation and control of Boolean processes based on noisy and incomplete measurements. The optimal minimum mean square error (MMSE) algorithms for POBDS state estimation, namely, the Boolean Kalman filter (BKF) and Boolean Kalman smoother (BKS), are intractable in the case of large systems, due to computational and memory requirements. To address this, we propose approximate MMSE filtering and smoothing algorithms based on the auxiliary particle filter (APF) method from sequential Monte-Carlo theory. These algorithms are used jointly with maximum-likelihood (ML) methods for simultaneous state and parameter estimation in POBDS models. In the presence of continuous parameters, ML estimation is performed using the expectation-maximization (EM) algorithm; we develop for this purpose a special smoother which reduces the computational complexity of the EM algorithm. The resulting particle-based adaptive filter is applied to a POBDS model of Boolean gene regulatory networks observed through noisy RNA-Seq time series data, and performance is assessed through a series of numerical experiments using the well-known cell cycle gene regulatory model.

Mahdi Imani, Ulisses Braga-Neto

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA

Key words:  Adaptive Filtering, Partially-Observed Boolean Dynamical Systems, Boolean Kalman Filter, Auxiliary Particle-Filter, Fixed-Interval Smoother, Maximum-Likelihood Estimation, Expectation Maximization, Gene Regulatory Networks, RNA-Seq data.

## 1 Introduction

Partially-observed Boolean dynamical systems consist of a Boolean state process, also known as a Boolean network, observed through an arbitrary noisy mapping to a measurement space [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Instances of POBDSs abound in fields such as genomics [12], robotics [13], digital communication systems [14], and more. The optimal recursive minimum mean-square error (MMSE) state estimators for this model are called the Boolean Kalman Filter (BKF) [1] and the Boolean Kalman Smoother (BKS) [15]. These filters have many desirable properties; in particular, it can be shown that the MMSE estimate of the state vector provides both the MMSE and the maximum-a-posteriori (MAP) estimates of each state vector component. The software tool “BoolFilter” [16] is available as an R package for estimation and identification of partially-observed Boolean dynamical systems.

However, for systems with a large number of state variables, the computation of both the BKF and BKS becomes impractical due to large computational and memory requirements. In [3], an approximate sequential Monte-Carlo (SMC) algorithm was proposed to compute the BKF using sequential importance resampling (SIR). By contrast, we develop here SMC algorithms for both the BKF and fixed-interval BKS based on the more efficient auxiliary particle filter (APF) algorithm [17].

The BKF and BKS require for their application that all system parameters be known. In the case where noise intensities, the network topology, or observational parameters are not known or only partially known, an adaptive scheme to simultaneously estimate the state and parameters of the system is required. An exact adaptive filtering framework to accomplish that task was proposed recently in [18], which is based on the BKF and BKS in conjunction with maximum-likelihood estimation of the parameters. In this paper, we develop an accurate and efficient particle filtering implementation of the adaptive filtering framework in [18], which is suitable for large systems.

In the case where the parameter space is discrete (finite), the adaptive filter corresponds to a bank of particle filters in parallel, which is reminiscent of the multiple model adaptive estimation (MMAE) procedure for linear systems [19]. If the parameter space is continuous, then a particle-based version of the Expectation Maximization (EM) algorithm [20] is developed. The computational complexity of the EM method arises from three main parts:

1. The computational complexity of applying smoothing at the E-step.

2. Memory necessary to store the required matrices and vectors (e.g. the posterior probability vectors) from the E-Step to the M-Step.

3. The complexity of each iteration in the M-step in which several function evaluations are required.

Our proposed particle-based implementation addresses each of the above issues.

Our application of interest in this paper is the modeling of Boolean gene regulatory networks [12, 22] observed through a single time series of RNA-seq data [23]. Using the POBDS model, we employ the proposed approximate adaptive ML algorithm to estimate the gene expression state simultaneously with inference of the network topology and the noise and expression parameters. Performance is assessed through a series of numerical experiments using the well-known cell-cycle gene regulatory model [24]. The influence of transition noise, expression parameters, and RNA-seq measurement noise (data dispersion) on performance is studied, and the consistency of the adaptive ML filter (i.e., convergence to true parameter values) is empirically established.

The article is organized as follows. In Section 2, the POBDS signal model and the Boolean Kalman Filter and Boolean Kalman Smoother are reviewed, while in Section 3, a detailed description of the APF-based filtering and smoothing algorithms proposed in this paper is provided. In Section 4, the particle-based ML adaptive filter is developed for discrete and continuous parameter spaces. A POBDS model for gene regulatory networks observed through RNA-seq measurements is reviewed in Section 5. Results for the numerical experiments with the cell-cycle network are presented in Section 6. Finally, Section 7 contains concluding remarks.

## 2 Optimal State Estimators for POBDS

In this section, we review the POBDS model and exact algorithms for computation of its optimal state estimators. For more details see [1, 4, 5, 15]. For a proof of optimality of the BKF, see [18].

We assume that the system is described by a state process $\{X_k;\, k = 0, 1, \ldots\}$, where $X_k \in \{0,1\}^d$ is a Boolean vector of size $d$. The sequence of states is observed indirectly through the observation process $\{Y_k;\, k = 1, 2, \ldots\}$, where $Y_k$ is a vector of (typically non-Boolean) measurements. The states are assumed to be updated and observed at each discrete time $k$ through the following nonlinear signal model:

$$X_k = f_k(X_{k-1}) \oplus n_k \quad \text{(state model)} \tag{1}$$
$$Y_k = h_k(X_k, v_k) \quad \text{(observation model)}$$

for $k = 1, 2, \ldots$ Here, $n_k \in \{0,1\}^d$ is Boolean transition noise, “$\oplus$” indicates componentwise modulo-2 addition, $f_k : \{0,1\}^d \rightarrow \{0,1\}^d$ is a Boolean function, called the network function, whereas $h_k$ is a general function mapping the current state and observation noise $v_k$ into the measurement space, for $k = 1, 2, \ldots$ The noise processes $\{n_k\}$ and $\{v_k\}$ are assumed to be “white” in the sense that the noises at distinct time points are independent random variables. It is also assumed that the noise processes are independent from each other and from the initial state $X_0$; their distribution is otherwise arbitrary.

We will assume further that the Boolean process noise $n_k$ is zero-mode, i.e., $0 \in \{0,1\}^d$ is the most probable value of the noise vector at each time $k$. This implies that the most likely value of $X_k$ at each time $k$ is $f_k(X_{k-1})$; this could be seen as the counterpart of the zero-mean noise assumption in continuous state-space models. As is the case with nonzero-mean noise, nonzero-mode noise introduces a systematic error component, which can always be removed by moving it into the function $f_k$. Hence, the state model in (1) is a general model for a first-order Markov Boolean stochastic process. For a specific example, which will be adopted in Section 6, one has $P(n_k(i) = 1) = p$, for $i = 1, \ldots, d$, with the components of $n_k$ independent. In this case, $n_k$ is zero-mode if and only if $p \leq 1/2$. The systematic bias introduced in the case $p > 1/2$ can be removed by considering the equivalent state process with network function $\bar{f}_k = f_k \oplus \mathbf{1}$, where $\mathbf{1}$ is the vector with all components equal to 1, and noise $\bar{n}_k = n_k \oplus \mathbf{1}$, i.e., by moving the systematic bias into the model. Therefore, one effectively needs only to consider the case $p \leq 1/2$.
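
As an illustration of the state model just described, one noisy transition can be simulated in a few lines (a minimal sketch; the 3-gene majority-vote network function used here is a hypothetical example, not one from the paper):

```python
import numpy as np

def network_function(x):
    # Hypothetical 3-gene majority-vote rule: every gene turns ON when at
    # least two of the three genes are currently ON.
    return np.full(3, int(x.sum() >= 2), dtype=int)

def state_transition(x_prev, p, rng):
    # X_k = f(X_{k-1}) XOR n_k, with n_k(i) ~ Bernoulli(p), i.i.d. across i.
    noise = rng.binomial(1, p, size=x_prev.shape)
    return network_function(x_prev) ^ noise

rng = np.random.default_rng(0)
x_next = state_transition(np.array([1, 1, 0]), p=0.0, rng=rng)  # noiseless step
```

With $p = 0$ the transition is deterministic and returns $f(X_{k-1})$ exactly; for $0 < p \leq 1/2$ each component is flipped independently with probability $p$.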

A Boolean estimator $\hat{X}_{k|r}$ predicts the Boolean state $X_k$ based on the sequence of observations $Y_{1:r} = (Y_1, \ldots, Y_r)$. The estimator is called a filter, smoother, or predictor according to whether $r = k$, $r > k$, or $r < k$, respectively. The set of all Boolean estimators for a given $k$ and $r$ shall be denoted by $\mathcal{X}_{k|r}$. The (conditional) mean-square error (MSE) of $\hat{X}_{k|r}$ given $Y_{1:r}$ is:

$$\text{MSE}(\hat{X}_{k|r} \mid Y_{1:r}) = E\left[\,||\hat{X}_{k|r} - X_k||^2 \mid Y_{1:r}\right]. \tag{2}$$

We would like to obtain the Boolean MMSE estimator, i.e., a Boolean estimator $\hat{X}^{MS}_{k|r}$ such that

$$\hat{X}^{MS}_{k|r} = \operatorname*{argmin}_{\hat{X}_{k|r} \in \mathcal{X}_{k|r}} \text{MSE}(\hat{X}_{k|r} \mid Y_{1:r}), \tag{3}$$

at each value of $Y_{1:r}$ (so that it also minimizes the expected MSE over all possible realizations of $Y_{1:r}$). For a vector $v \in [0,1]^d$, define the binarized vector $\bar{v}$, such that $\bar{v}(i) = 1$ if $v(i) > 1/2$ and $\bar{v}(i) = 0$ otherwise, for $i = 1, \ldots, d$; the complement vector $v^c$, such that $v^c(i) = 1 - v(i)$, for $i = 1, \ldots, d$; and the $L_1$-norm $||v||_1 = \sum_{i=1}^d |v(i)|$. It can be proved [18] that the solution to (3) is given by

$$\hat{X}^{MS}_{k|r} = \overline{E[X_k \mid Y_{1:r}]}, \tag{4}$$

with optimal MMSE

$$\text{MSE}(\hat{X}^{MS}_{k|r} \mid Y_{1:r}) = \left|\left|\min\left\{E[X_k \mid Y_{1:r}],\; E[X_k \mid Y_{1:r}]^c\right\}\right|\right|_1, \tag{5}$$

where the minimum is computed componentwise.

The optimal Boolean MMSE estimator will be called a Boolean Kalman Filter (BKF), Boolean Kalman Smoother (BKS), or Boolean Kalman Predictor (BKP), according to whether $r = k$, $r > k$, or $r < k$, respectively. The terminology is justified by the fact that we seek the MMSE estimator for a nonstationary process, as in the case of the classical Kalman Filter, as opposed to, say, the Maximum-A-Posteriori (MAP) estimator, which is more common in discrete settings. Interestingly, we can show that each component $\hat{X}^{MS}_{k|r}(i)$ of the MMSE estimator is both the MMSE and the MAP estimator of the corresponding state variable $X_k(i)$, for $i = 1, \ldots, d$. Perhaps surprisingly, the MAP estimator $\hat{X}^{MAP}_{k|r}$ of the state vector does not enjoy in general the property that $\hat{X}^{MAP}_{k|r}(i)$ is the MAP estimator of $X_k(i)$, for $i = 1, \ldots, d$. In cases where optimal estimation performance for each component of $X_k$ is required (e.g., estimating the state of each gene in a gene regulatory network), this is an important distinction.

Let $(x^1, \ldots, x^{2^d})$ be an arbitrary enumeration of the possible state vectors, define the state conditional probability distribution vector $\Pi_{k|r}$ of length $2^d$ via

$$\Pi_{k|r}(i) = P(X_k = x^i \mid Y_{1:r}), \quad i = 1, \ldots, 2^d, \tag{6}$$

and let $A = [x^1 \cdots x^{2^d}]$ be a matrix of size $d \times 2^d$. Then it is clear that $E[X_k \mid Y_{1:r}] = A\,\Pi_{k|r}$, so it follows from (4) and (5) that

$$\hat{X}^{MS}_{k|r} = \overline{A\,\Pi_{k|r}}, \tag{7}$$

with optimal MSE

$$\text{MSE}(\hat{X}^{MS}_{k|r} \mid Y_{1:r}) = \left|\left|\min\left\{A\,\Pi_{k|r},\; (A\,\Pi_{k|r})^c\right\}\right|\right|_1. \tag{8}$$
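
Given the distribution vector $\Pi_{k|r}$ and the matrix $A$ of enumerated states, (7) and (8) can be evaluated directly (a minimal sketch for $d = 2$; the particular posterior vector is illustrative):

```python
import numpy as np

def mmse_estimate(A, Pi):
    # Expected state E[X_k | Y_{1:r}] = A @ Pi, binarized by thresholding at 1/2.
    expected = A @ Pi
    x_hat = (expected >= 0.5).astype(int)
    # Optimal conditional MSE: ||min{A Pi, (A Pi)^c}||_1, minimum componentwise.
    mse = np.minimum(expected, 1.0 - expected).sum()
    return x_hat, mse

# d = 2: states enumerated as 00, 01, 10, 11 (the columns of A).
A = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])
Pi = np.array([0.1, 0.1, 0.2, 0.6])  # posterior concentrated on state 11
x_hat, mse = mmse_estimate(A, Pi)
```

Here the expected state is $(0.8, 0.7)$, so the binarized MMSE estimate is $(1,1)$ with conditional MSE $0.2 + 0.3 = 0.5$.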

The distribution vector $\Pi_{k|r}$ can be computed by a matrix-based procedure similar to the “forward-backward” algorithm [25]. Briefly, let $M_k$ of size $2^d \times 2^d$ be the transition matrix of the Markov chain defined by the state model:

$$(M_k)_{ij} = P(X_k = x^i \mid X_{k-1} = x^j) = P(n_k = f_k(x^j) \oplus x^i), \quad i, j = 1, \ldots, 2^d. \tag{9}$$

Additionally, given a value of the observation vector $Y_k$ at time $k$, let $T_k(Y_k)$ be a diagonal matrix of size $2^d \times 2^d$ defined by:

$$(T_k(Y_k))_{ii} = p(Y_k \mid X_k = x^i), \quad i = 1, \ldots, 2^d, \tag{10}$$

where $p(Y_k \mid X_k = x^i)$ is either a probability density or a probability mass function, according to the nature of the measurement $Y_k$.

At this point, we distinguish two cases. The first is a recursive implementation of the Boolean Kalman Filter (BKF), which does not need a backward iteration, and can be iterated forward as new observations arrive, for as long as desired. In this case, we use (7) and (8) with $r = k$ to get the optimal filter estimator and its minimum MSE [1]. The entire procedure is given in Algorithm 1.
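
In matrix form, one forward step of the exact BKF reduces to a prediction by $M_k$ followed by a componentwise update with the likelihoods in $T_k(Y_k)$ and a normalization (a sketch of the recursion underlying Algorithm 1; the matrices in the toy example are assumed given):

```python
import numpy as np

def bkf_step(Pi_prev, M, T_diag):
    # Prediction: Pi_{k|k-1} = M_k @ Pi_{k-1|k-1}.
    Pi_pred = M @ Pi_prev
    # Update: multiply componentwise by p(Y_k | X_k = x^i) and normalize.
    beta = T_diag * Pi_pred          # unnormalized posterior
    return beta / beta.sum()

# Toy 1-gene example (2 states): sticky chain, observation favoring state 1.
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])
T_diag = np.array([0.3, 0.7])        # diagonal of T_k(Y_k)
Pi0 = np.array([0.5, 0.5])
Pi1 = bkf_step(Pi0, M, T_diag)
```

The normalization constant $||\beta||_1$ discarded here is precisely the quantity reused later for likelihood computations in Section 4.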

The second case is a fixed-interval Boolean Kalman Smoother, where a fixed batch of observations of length $T$ is available, and it is desired to obtain estimates of the state at all points in the interval $k = 0, \ldots, T$. In this case, a backward iteration will be needed (unless $k = T$). Define the probability distribution vector $\Delta_{k|s}$ of length $2^d$ via

$$\Delta_{k|s}(i) = p(Y_{s+1}, \ldots, Y_T \mid X_k = x^i), \quad i = 1, \ldots, 2^d, \tag{11}$$

for $s \geq k - 1$, where $\Delta_{k|T}$ is defined to be $\mathbf{1}$, the vector with all components equal to 1. It can be shown that

$$\Pi_{k|T} = \frac{\Pi_{k|k-1} \bullet \Delta_{k|k-1}}{||\Pi_{k|k-1} \bullet \Delta_{k|k-1}||_1}, \tag{12}$$

where “$\bullet$” denotes componentwise vector multiplication. We then use (7) and (8) with $r = T$ to get the optimal smoothed estimator and its minimum MSE [15]. The entire procedure is given in Algorithm 2.

## 3 Particle Filters for State Estimation

When the number of states is large, the exact computation of the BKF and the BKS becomes intractable, due to the large size of the matrices involved, which contain $2^{2d}$ elements each, and approximate methods must be used, such as sequential Monte-Carlo methods, also known as particle filter algorithms, which have been successfully applied in various fields [26, 27, 28, 29, 30]. In the next subsections, we describe particle filter implementations of the BKF and BKS.

### 3.1 Auxiliary Particle Filter Implementation of the BKF (APF-BKF)

The basic algorithm for particle filtering is called sequential importance resampling (SIR). Importance sampling is used when direct sampling of the target distribution is difficult. The idea is to approximate the target distribution $p$ using sample points (“particles”) drawn from a proposal distribution $q$, which is easier to sample from than the target distribution. The discrepancy created by sampling from $q$ instead of $p$ is compensated by weighting each particle. After a few iterations of the algorithm, a condition is usually reached where only a few of the particles have significant weights, whereas most particles have negligible weight. To address this degeneracy problem, SIR performs resampling of the particles, whereby a fresh set of particles is drawn (with replacement) from the approximate current posterior distribution.

The original particle filtering implementation of the BKF in [3] was based on the SIR algorithm; we therefore call it the SIR-BKF algorithm. We present here a more sophisticated implementation based on the Auxiliary Particle Filter (APF) of [17]. The APF algorithm can be seen as a variation of SIR, and is thus also known as auxiliary SIR (ASIR). Basically, APF is a look-ahead method that at time step $k-1$ tries to predict the location of particles with high probability at time $k$, with the purpose of making the subsequent resampling step more efficient. Without the look-ahead, the basic SIR algorithm blindly propagates all particles, even those in low-probability regions. As put in [31], “it is natural to ask whether it is possible to employ knowledge about the next observation before resampling to ensure that particles which are likely to be compatible with that observation have a good chance of surviving.”

The APF algorithm augments the state vector $X_k$ to $(X_k, \zeta_k)$, where $\zeta_k \in \{1, \ldots, N\}$ is an auxiliary variable. Particles are drawn from the filtering distribution $P(X_k, \zeta_k \mid Y_{1:k})$ (to be specified below), and the auxiliary variable is simply dropped to obtain particles from $P(X_k \mid Y_{1:k})$. Given particles $\{x_{k-1,i}\}_{i=1}^N$ at time $k-1$, with associated weights $\{W_{k-1,i}\}_{i=1}^N$, the APF algorithm defines

$$P(X_k, \zeta_k \mid Y_{1:k}) \propto p(Y_k \mid X_k)\, P(X_k \mid x_{k-1,\zeta_k})\, W_{k-1,\zeta_k}, \tag{13}$$

for $\zeta_k = 1, \ldots, N$. The auxiliary variable thus functions as an index for the particles at the previous time point. As will be seen below, sampling from (13) will have the effect of “selecting” the particles that are compatible with the observation at time $k$.

One can sample from (13) by using SIR on the following approximation:

$$P(X_k, \zeta_k \mid Y_{1:k}) \propto p(Y_k \mid \mu_{k,\zeta_k})\, P(X_k \mid x_{k-1,\zeta_k})\, W_{k-1,\zeta_k}, \tag{14}$$

for $\zeta_k = 1, \ldots, N$, where $\mu_{k,\zeta_k}$ is a characteristic of $X_k$ given $x_{k-1,\zeta_k}$, which can be the mean, the mode, or even a sample from $P(X_k \mid x_{k-1,\zeta_k})$ [17]. In our implementation, we use the mode:

$$\mu_{k,i} = \text{Mode}[X_k \mid x_{k-1,i}] = \text{Mode}[f(x_{k-1,i}) \oplus n_k] = f(x_{k-1,i}), \tag{15}$$

for $i = 1, \ldots, N$, where we used (1). The approximation on the right is accurate as long as the “noise intensity” is low, i.e., the probability of nonzero $n_k$ is small.

Sampling from (14) is done in two steps. In the first step, $\mu_{k,i}$ is obtained from the particles $\{x_{k-1,i}\}_{i=1}^N$ using (15) and the first-stage weights are computed as:

$$V_{k,i} = p(Y_k \mid \mu_{k,i})\, W_{k-1,i}, \tag{16}$$

for $i = 1, \ldots, N$. In the second step, the auxiliary variables (i.e., the indices of the selected particles) are obtained as a sample from the discrete distribution defined by the weights $\{V_{k,i}\}_{i=1}^N$ (after proper normalization). For example, if $N = 4$ and $V_{k,1} = V_{k,2} = 1/3$ and $V_{k,3} = V_{k,4} = 1/6$, then the indices will be independent and each will be twice as likely to be 1 or 2 than 3 or 4. We denote this by $\zeta_{k,i} \sim \text{Cat}(V_{k,1}, \ldots, V_{k,N})$, where “Cat” stands for the categorical (discrete) distribution.
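
This first-stage selection step can be sketched as follows (the particular weight values mirror the illustrative example above):

```python
import numpy as np

def select_auxiliary_indices(V, N, rng):
    # Draw N auxiliary indices from the categorical distribution defined by
    # the first-stage weights V_{k,i}, normalized to sum to one.
    probs = V / V.sum()
    return rng.choice(len(V), size=N, p=probs)

rng = np.random.default_rng(1)
V = np.array([2.0, 2.0, 1.0, 1.0])     # particles 1-2 twice as likely as 3-4
zeta = select_auxiliary_indices(V, N=10000, rng=rng)
frac_first_two = np.mean(zeta < 2)     # should be close to 2/3
```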

Finally, the new particles and associated second-stage weights can be obtained as follows:

$$x_{k,i} = \mu_{k,\zeta_{k,i}} \oplus n_k \sim P(X_k \mid x_{k-1,\zeta_{k,i}}), \tag{17}$$
$$\tilde{W}_{k,i} = \frac{p(Y_k \mid x_{k,i})}{p(Y_k \mid \mu_{k,\zeta_{k,i}})}. \tag{18}$$

It can be shown that an unbiased estimator of the unnormalized posterior probability at each time step can be obtained by [32]

$$||\hat{\beta}_k||_1 = \left(\frac{1}{N}\sum_{i=1}^N V_{k,i}\right)\left(\frac{1}{N}\sum_{i=1}^N \tilde{W}_{k,i}\right). \tag{19}$$

This quantity will be needed in Section 4 when the particle filter for maximum-likelihood adaptive estimation is discussed.

Given the normalized second-stage weights $W_{k,i} = \tilde{W}_{k,i} / \sum_{j=1}^N \tilde{W}_{k,j}$, one can write

$$E[X_k \mid Y_{1:k}] \approx z_k = \sum_{i=1}^N W_{k,i}\, x_{k,i}. \tag{20}$$

From (4) and (5), it follows that the MMSE state estimate and conditional MSE at time step $k$ are approximated as:

$$\hat{X}^{MS}_{k|k} = \bar{z}_k, \tag{21}$$

with optimal MMSE

$$\text{MSE}(\hat{X}^{MS}_{k|k} \mid Y_{1:k}) = ||\min\{z_k,\, z_k^c\}||_1. \tag{22}$$

The entire procedure of APF-BKF is summarized in Algorithm 3.
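
The two stages above can be put together in a single function (a sketch of one APF-BKF step under stated assumptions: a generic network function `f`, i.i.d. Bernoulli($p$) transition noise, and a user-supplied likelihood $p(Y_k \mid x)$; the identity network and Gaussian-like likelihood in the usage example are hypothetical):

```python
import numpy as np

def apf_bkf_step(particles, weights, f, p, likelihood, y, rng):
    N, d = particles.shape
    # First stage: look ahead with mu_{k,i} = f(x_{k-1,i}), the mode of the
    # transition distribution, and compute first-stage weights, eq. (16).
    mu = np.array([f(x) for x in particles])
    V = np.array([likelihood(y, m) for m in mu]) * weights
    zeta = rng.choice(N, size=N, p=V / V.sum())
    # Second stage: propagate the selected particles through the state model
    # X_k = f(X_{k-1}) XOR n_k and compute second-stage weights, eq. (18).
    noise = rng.binomial(1, p, size=(N, d))
    new_particles = mu[zeta] ^ noise
    num = np.array([likelihood(y, x) for x in new_particles])
    den = np.array([likelihood(y, mu[z]) for z in zeta])
    W_tilde = num / den
    # Unbiased estimate of the unnormalized posterior norm, eq. (19).
    beta_norm = V.mean() * W_tilde.mean()
    W = W_tilde / W_tilde.sum()
    # Approximate MMSE estimate: binarized weighted particle average, eq. (21).
    z = (W[:, None] * new_particles).sum(axis=0)
    return new_particles, W, (z >= 0.5).astype(int), beta_norm

# Toy usage: hypothetical 3-gene identity network with a Gaussian-like likelihood.
rng = np.random.default_rng(2)
f = lambda x: x
likelihood = lambda y, x: float(np.exp(-np.sum((y - x) ** 2)))
particles = rng.binomial(1, 0.5, size=(500, 3))
weights = np.full(500, 1.0 / 500)
y = np.array([1.0, 1.0, 0.0])
parts, W, x_hat, beta = apf_bkf_step(particles, weights, f, 0.01, likelihood, y, rng)
```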

### 3.2 Auxiliary Particle Filter Implementation of the BKS (APF-BKS)

There are a few different approximate Monte-Carlo smoothing methods in the literature on nonlinear and non-Gaussian systems [33, 29, 34]. It should be noted that some of these particle smoothers suffer from degeneracy problems or can only be applied under special conditions (such as Markov chains with good forgetting properties). We follow an approach similar to the well-known fixed-interval smoother of [35] to approximate the Boolean Kalman Smoother.

As described in Section 2, a fixed-interval smoother is a forward-backward method, such that the filtering distributions $P(X_k \mid Y_{1:k})$ for $k = 0, \ldots, T$ are computed in the forward step, and the smoothed distributions $P(X_k \mid Y_{1:T})$ are found in a backward step. The forward process is obtained here by running the APF-BKF algorithm of Section 3.1, while the backward process is performed by correcting the filtering weights in the backward iteration. We explain next how the backward step is applied efficiently.

First, assume $\{x_{k,i}, W_{k,i}\}_{i=1}^N$, $k = 0, \ldots, T$, are the forward particles and weights obtained by the APF-BKF algorithm for the sequence of measurements $Y_{1:T}$. Due to the finite number of states in the POBDS, one can compute the unique particles and their associated weights at different time steps as:

$$\{x^u_{k,i}, W^u_{k,i}\}_{i=1}^{F_k} \xleftarrow{\text{Unique}} \{x_{k,i}, W_{k,i}\}_{i=1}^N, \quad k = 0, \ldots, T, \tag{23}$$

where $F_k \leq N$ is the number of unique forward particles, and $x^u_{k,i}$ is the $i$th unique particle with aggregated weight $W^u_{k,i}$, both at time step $k$.
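
The deduplication in (23) can be implemented directly with `numpy.unique` (a minimal sketch; weights of coinciding particles are summed):

```python
import numpy as np

def unique_particles(particles, weights):
    # Collapse duplicate Boolean particles, summing their weights.
    uniq, inverse = np.unique(particles, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    agg = np.zeros(len(uniq))
    np.add.at(agg, inverse, weights)   # aggregate weights per unique row
    return uniq, agg

particles = np.array([[1, 0], [0, 1], [1, 0], [1, 1]])
weights = np.array([0.25, 0.25, 0.25, 0.25])
uniq, agg = unique_particles(particles, weights)
```

Here the duplicated particle $(1,0)$ is collapsed into a single row carrying weight $0.5$.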

The backward process is based on the following equation:

$$\begin{aligned} P(X_s, X_{s+1} \mid Y_{1:T}) &= P(X_s \mid X_{s+1}, Y_{1:T})\, P(X_{s+1} \mid Y_{1:T}) \\ &= P(X_s \mid X_{s+1}, Y_{1:s})\, P(X_{s+1} \mid Y_{1:T}) \\ &= \frac{P(X_{s+1} \mid X_s)\, P(X_s \mid Y_{1:s})\, P(X_{s+1} \mid Y_{1:T})}{P(X_{s+1} \mid Y_{1:s})}, \end{aligned} \tag{24}$$

where $0 \leq s \leq T-1$ and $P(X_{s+1} \mid Y_{1:T})$ is the smoothed distribution at time step $s+1$. Summation over $X_{s+1}$ on both sides of equation (24) results in

$$P(X_s \mid Y_{1:T}) = P(X_s \mid Y_{1:s}) \sum_{X_{s+1}} \frac{P(X_{s+1} \mid X_s)\, P(X_{s+1} \mid Y_{1:T})}{P(X_{s+1} \mid Y_{1:s})}. \tag{25}$$

As mentioned before, the filter and smoother estimates at the final time $T$ coincide. Therefore, the smoothed weights $W_{T|T,i}$ are defined in the same way as the forward unique weights $W^u_{T,i}$, so that

$$P(X_T \mid Y_{1:T}) \approx \sum_{i=1}^{F_T} W_{T|T,i}\, \delta_{x^u_{T,i}}. \tag{26}$$

Now, using equation (25), the smoothed weights at time $s$ can be obtained as:

$$W_{s|T,j} = W^u_{s,j} \sum_{i=1}^{F_{s+1}} \frac{P(x^u_{s+1,i} \mid x^u_{s,j})\, W_{s+1|T,i}}{\sum_{l=1}^{F_s} P(x^u_{s+1,i} \mid x^u_{s,l})\, W^u_{s,l}}. \tag{27}$$

The smoothed weights are obtained by solving equation (27) in a backward fashion using the terminal condition $W_{T|T,i} = W^u_{T,i}$, for $i = 1, \ldots, F_T$. The computational complexity of equation (27) is of order $F_s \times F_{s+1}$, which can be much smaller than $N^2$ in practice.
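
A direct implementation of the backward recursion (27) over the unique particles might look as follows (a sketch; `trans` evaluates the transition probability $P(x' \mid x)$ under the state model, and the two-state usage example is illustrative):

```python
import numpy as np

def backward_smooth(uniq_particles, uniq_weights, trans):
    # uniq_particles[s]: (F_s, d) unique particles at time s
    # uniq_weights[s]:   (F_s,)  aggregated forward weights W^u_{s,i}
    # trans(x_next, x):  transition probability P(x_next | x)
    T = len(uniq_particles) - 1
    smoothed = [None] * (T + 1)
    smoothed[T] = uniq_weights[T].copy()   # terminal condition W_{T|T} = W^u_T
    for s in range(T - 1, -1, -1):
        Xs, Xn = uniq_particles[s], uniq_particles[s + 1]
        # P[i, j] = P(x^u_{s+1,i} | x^u_{s,j})
        P = np.array([[trans(xn, xs) for xs in Xs] for xn in Xn])
        denom = P @ uniq_weights[s]        # predictive prob. of each x^u_{s+1,i}
        ratio = smoothed[s + 1] / denom
        smoothed[s] = uniq_weights[s] * (P.T @ ratio)   # eq. (27)
    return smoothed

# Toy 1-bit example over T = 2: sticky chain, illustrative filtered weights.
parts = [np.array([[0], [1]])] * 3
wts = [np.array([0.5, 0.5]), np.array([0.6, 0.4]), np.array([0.7, 0.3])]
trans = lambda x_next, x: 0.8 if x_next[0] == x[0] else 0.2
sm = backward_smooth(parts, wts, trans)
```

Note that (27) preserves normalization: each vector of smoothed weights sums to one whenever the forward weights do.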

Using the smoothed weights, we can write

$$E[X_s \mid Y_{1:T}] \approx z_s = \sum_{i=1}^{F_s} W_{s|T,i}\, x^u_{s,i}. \tag{28}$$

From (4) and (5), it follows that the MMSE state estimate and conditional MSE at time step $s$ are approximated as:

$$\hat{X}^{MS}_{s|T} = \bar{z}_s, \tag{29}$$

with optimal MMSE

$$\text{MSE}(\hat{X}^{MS}_{s|T} \mid Y_{1:T}) = ||\min\{z_s,\, z_s^c\}||_1. \tag{30}$$

The entire procedure is summarized in Algorithm 4.

This particle smoother is an efficient method for state estimation, as will be shown in Section 6, but it is not appropriate for parameter estimation, as we will argue in the next section, where a different particle smoother is introduced to perform continuous parameter estimation.

## 4 Particle Filters For Maximum-Likelihood Adaptive Estimation

Suppose that the nonlinear signal model in (1) is incompletely specified. For example, the deterministic functions $f_k$ and $h_k$ may be only partially known, or the statistics of the noise processes $n_k$ and $v_k$ may need to be estimated. By assuming that the missing information can be coded into a finite-dimensional vector parameter $\theta \in \Theta$, where $\Theta$ is the parameter space, we propose next particle filtering approaches for simultaneous state and parameter estimation in POBDS. For simplicity and conciseness, we consider two cases: a Boolean Kalman Filter algorithm with finite (i.e., discrete) $\Theta$ and a Boolean Kalman Smoother algorithm with continuous $\Theta$, but the algorithms can be modified and even combined to perform other estimation tasks. Exact algorithms for such filters can be found in [18].

### 4.1 APF Implementation of the Discrete-Parameter ML Adaptive BKF (APF-DPMLA-BKF)

In this case, $\Theta = \{\theta_1, \ldots, \theta_M\}$. Given the observations up to time $k$, the log-likelihood function can be written as

$$\begin{aligned} L_k(\theta_i) &= \log p_{\theta_i}(Y_{1:k}) \\ &= \log p_{\theta_i}(Y_k \mid Y_{1:k-1}) + \log p_{\theta_i}(Y_{1:k-1}) \\ &= \log p_{\theta_i}(Y_k \mid Y_{1:k-1}) + L_{k-1}(\theta_i), \end{aligned} \tag{31}$$

for $i = 1, \ldots, M$, where $p_{\theta_i}(Y_k \mid Y_{1:k-1})$ can be approximated by running the APF-BKF algorithm discussed in the previous section tuned to parameter vector $\theta_i$.

The approximate log-likelihood is updated via

$$\hat{L}_k(\theta_i) = \hat{L}_{k-1}(\theta_i) + \log ||\hat{\beta}^{\theta_i}_k||_1, \quad i = 1, \ldots, M, \tag{32}$$

with $\hat{L}_0(\theta_i) = 0$, for $i = 1, \ldots, M$, and the ML estimator for both parameter and state at time $k$ can be obtained directly by running $M$ particle filters in parallel, each tuned to a candidate parameter $\theta_i$, for $i = 1, \ldots, M$:

$$\hat{\theta}^{ML}_k = \operatorname*{argmax}_{\theta \in \{\theta_1, \ldots, \theta_M\}} \hat{L}_k(\theta), \tag{33}$$

$$\hat{X}^{ML}_k = \hat{X}^{MS}_{k|k}(\hat{\theta}^{ML}_k), \tag{34}$$

for $k = 1, 2, \ldots$

The computation in (32)–(34) is parallelizable, on-line, and entirely recursive: as a new observation at time $k$ arrives, the ML estimator can be updated easily without restarting the computation from the beginning. The procedure is summarized in Figure 1 and Algorithm 5.
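
The bank-of-filters update (32)–(33) amounts to accumulating per-filter log-likelihood increments and taking an argmax (a minimal sketch; the increments $\log ||\hat{\beta}^{\theta_i}_k||_1$ are assumed to be produced by $M$ APF-BKF instances running in parallel, and the numeric values below are illustrative):

```python
import numpy as np

class DiscreteMLAdaptiveFilter:
    """Track running log-likelihoods for M candidate parameter vectors."""

    def __init__(self, M):
        self.log_lik = np.zeros(M)            # \hat{L}_0(theta_i) = 0

    def update(self, log_beta_norms):
        # log_beta_norms[i] = log ||beta^{theta_i}_k||_1 from the i-th filter.
        self.log_lik += np.asarray(log_beta_norms)
        return int(np.argmax(self.log_lik))   # index of current ML parameter

bank = DiscreteMLAdaptiveFilter(M=3)
bank.update(np.log([0.2, 0.5, 0.3]))          # filter 2 best after one step
best = bank.update(np.log([0.3, 0.4, 0.3]))   # cumulatively still filter 2
```

The state estimate (34) is then simply read off from the particle filter whose index is returned.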

### 4.2 APF Implementation of the Continuous-Parameter ML Adaptive BKS (APF-CPMLA-BKS)

Here, $\Theta$ is continuous, and the approach developed in the last subsection for discrete parameter spaces is not directly applicable. There are two options: 1) discretize the parameters using a suitable quantization grid and apply an approach similar to the one in the last subsection; 2) attempt to obtain a good approximation of the MLE in the continuous parameter space directly. In this section, we describe how to implement the second option using the expectation-maximization (EM) algorithm for a particle filter implementation of a fixed-interval Boolean Kalman Smoother.

In our case, maximum-likelihood estimation attempts to find the value of $\theta$ that maximizes the “incomplete” log-likelihood function $\log p_{\theta}(Y_{1:T})$. The EM algorithm considers instead the “complete” log-likelihood function $\log p_{\theta}(X_{0:T}, Y_{1:T})$, which includes the unknown state sequence, the assumption being that maximizing the complete log-likelihood is easier than maximizing the incomplete one (the reader is referred to [18] for more information on the EM algorithm for POBDS).

The EM algorithm obtains a sequence of parameter estimates $\{\theta^{(n)}\}_{n \geq 0}$. Given the current estimate $\theta^{(n)}$, the algorithm obtains the next estimate in the sequence by computing (E-step) the function $Q(\theta, \theta^{(n)})$ (see [18]):

$$\begin{aligned} Q(\theta, \theta^{(n)}) &= \sum_{X_{0:T}} \log p_{\theta}(X_{0:T}, Y_{1:T})\, p_{\theta^{(n)}}(X_{0:T} \mid Y_{1:T}) \\ &= I_1(\theta, \theta^{(n)}) + I_2(\theta, \theta^{(n)}) + I_3(\theta, \theta^{(n)}), \end{aligned} \tag{35}$$

where

$$I_1(\theta, \theta^{(n)}) = \sum_{i=1}^{2^d} \log P_{\theta}(X_0 = x^i)\, P_{\theta^{(n)}}(X_0 = x^i \mid Y_{1:T}), \tag{36}$$
$$I_2(\theta, \theta^{(n)}) = \sum_{s=1}^{T} \sum_{i=1}^{2^d} \sum_{j=1}^{2^d} \log P_{\theta}(X_s = x^i \mid X_{s-1} = x^j)\, P_{\theta^{(n)}}(X_s = x^i, X_{s-1} = x^j \mid Y_{1:T}), \tag{37}$$
$$I_3(\theta, \theta^{(n)}) = \sum_{s=1}^{T} \sum_{i=1}^{2^d} \log p_{\theta}(Y_s \mid X_s = x^i)\, P_{\theta^{(n)}}(X_s = x^i \mid Y_{1:T}), \tag{38}$$

and then maximizing (M-step) this function:

$$\theta^{(n+1)} = \operatorname*{argmax}_{\theta}\, Q(\theta, \theta^{(n)}). \tag{39}$$

In [18] this computation is carried out exactly. For large systems, this is impractical, for the following reasons:

1. The E-Step (computing the $Q$ function) requires performing a Boolean Kalman Smoother, which is too expensive computationally.

2. The transition matrix and filtered and smoothed posterior probability vectors at all time steps must be stored, demanding large amounts of memory.

3. In certain cases, such as when $f$ and $h$ are linear in the parameter vector $\theta$, it is possible to maximize $Q$ using closed-form expressions (e.g., [34]). However, in general, one needs to resort to gradient-based optimization methods in the M-step. This requires evaluating $Q$ and computing its derivatives, which is analytically intractable.
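
The particle-based remedy replaces the exact $Q$ with a Monte-Carlo average of the complete-data log-likelihood over sampled state trajectories, which can then be maximized numerically. A minimal sketch, assuming a scalar noise parameter $p$, a grid-based M-step, and a hypothetical `log_joint` function (an identity network is assumed in the toy check so that flips between consecutive states are exactly the noise realizations):

```python
import numpy as np

def em_step(trajectories, Y, log_joint, grid):
    # Monte-Carlo E-step: approximate Q(theta, theta^(n)) by averaging the
    # complete-data log-likelihood over the sampled trajectories.
    def Q(theta):
        return np.mean([log_joint(theta, traj, Y) for traj in trajectories])
    # M-step over a grid (a gradient-based optimizer could be used instead).
    values = np.array([Q(theta) for theta in grid])
    return float(grid[int(np.argmax(values))])

# Toy check: recover a Bernoulli bit-flip rate p from sampled trajectories.
def log_joint(p, traj, Y):
    flips = np.abs(np.diff(traj, axis=0))
    return float(np.sum(flips * np.log(p) + (1 - flips) * np.log(1 - p)))

traj = np.zeros((11, 2), dtype=int)
traj[3:5, 0] = 1                      # two bit flips out of 20 transitions
trajectories = [traj, traj.copy()]    # 4 flips out of 40: MLE is p = 0.1
p_hat = em_step(trajectories, None, log_joint, np.linspace(0.05, 0.5, 10))
```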

To address all these issues, we develop our EM algorithm based on the Forward Filter Backward Simulation method [27]. This method tries to capture the most probable state trajectories and use them to find the smoothing particles. The method contains two main steps: 1) Forward Step: the APF-BKF algorithm is employed to obtain the particles and their weights $\{x_{k,i}, W_{k,i}\}_{i=1}^N$ from time 0 to $T$. 2) Backward Step: the backward simulation procedure, which is explained in detail in the sequel, computes $L$ trajectories $\{\tilde{x}_{0:T,j}\}_{j=1}^L$, where

$$P(X_{0:T} \mid Y_{1:T}) = P(X_T \mid Y_{1:T}) \prod_{s=0}^{T-1} P(X_s \mid X_{s+1:T}, Y_{1:T}), \tag{40}$$