Particle Filters for
Partially-Observed Boolean Dynamical Systems
Abstract
Partially-observed Boolean dynamical systems (POBDS) are a general class of nonlinear models with application in estimation and control of Boolean processes based on noisy and incomplete measurements. The optimal minimum mean-square error (MMSE) algorithms for POBDS state estimation, namely, the Boolean Kalman filter (BKF) and Boolean Kalman smoother (BKS), are intractable in the case of large systems, due to computational and memory requirements. To address this, we propose approximate MMSE filtering and smoothing algorithms based on the auxiliary particle filter (APF) method from sequential Monte Carlo theory. These algorithms are used jointly with maximum-likelihood (ML) methods for simultaneous state and parameter estimation in POBDS models. In the presence of continuous parameters, ML estimation is performed using the expectation-maximization (EM) algorithm; we develop for this purpose a special smoother which reduces the computational complexity of the EM algorithm. The resulting particle-based adaptive filter is applied to a POBDS model of Boolean gene regulatory networks observed through noisy RNA-seq time series data, and performance is assessed through a series of numerical experiments using the well-known cell-cycle gene regulatory model.
Mahdi Imani, Ulisses Braga-Neto
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
Key words: Adaptive Filtering, Partially-Observed Boolean Dynamical Systems, Boolean Kalman Filter, Auxiliary Particle Filter, Fixed-Interval Smoother, Maximum-Likelihood Estimation, Expectation Maximization, Gene Regulatory Networks, RNA-seq data.
1 Introduction
Partially-observed Boolean dynamical systems consist of a Boolean state process, also known as a Boolean network, observed through an arbitrary noisy mapping to a measurement space [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Instances of POBDS abound in fields such as genomics [12], robotics [13], digital communication systems [14], and more. The optimal recursive minimum mean-square error (MMSE) state estimators for this model are called the Boolean Kalman Filter (BKF) [1] and the Boolean Kalman Smoother (BKS) [15]. These filters have many desirable properties; in particular, it can be shown that the MMSE estimate of the state vector provides both the MMSE and the maximum-a-posteriori (MAP) estimates of each state vector component. The software tool "BoolFilter" [16] is available as an R package for estimation and identification of partially-observed Boolean dynamical systems.
However, for large systems with a large number of state variables, the computation of both the BKF and BKS becomes impractical due to large computational and memory requirements. In [3], an approximate sequential Monte Carlo (SMC) algorithm was proposed to compute the BKF using sequential importance resampling (SIR). By contrast, we develop here SMC algorithms for both the BKF and the fixed-interval BKS based on the more efficient auxiliary particle filter (APF) algorithm [17].
The BKF and BKS require for their application that all system parameters be known. When the noise intensities, the network topology, or the observational parameters are unknown or only partially known, an adaptive scheme that simultaneously estimates the state and parameters of the system is required. An exact adaptive filtering framework to accomplish this task was proposed recently in [18], based on the BKF and BKS in conjunction with maximum-likelihood estimation of the parameters. In this paper, we develop an accurate and efficient particle-filtering implementation of the adaptive filtering framework in [18], which is suitable for large systems.
In the case where the parameter space is discrete (finite), the adaptive filter corresponds to a bank of particle filters running in parallel, which is reminiscent of the multiple model adaptive estimation (MMAE) procedure for linear systems [19]. If the parameter space is continuous, a particle-based version of the Expectation-Maximization (EM) algorithm [20] is developed instead. The computational complexity of the EM method arises from three main sources:

1. The computational complexity of applying smoothing in the E-step.

2. The memory necessary to store the required matrices and vectors (e.g., the posterior probability vectors) from the E-step to the M-step.

3. The complexity of each iteration of the M-step, in which several function evaluations are required.
Our proposed particle-based implementation addresses each of these issues.
Our application of interest in this paper is the modeling of Boolean gene regulatory networks [12, 22] observed through a single time series of RNA-seq data [23]. Using the POBDS model, we employ the proposed approximate adaptive ML algorithm to estimate the gene expression state simultaneously with the inference of the network topology, noise, and expression parameters. Performance is assessed through a series of numerical experiments using the well-known cell-cycle gene regulatory model [24]. The influence of transition noise, expression parameters, and RNA-seq measurement noise (data dispersion) on performance is studied, and the consistency of the adaptive ML filter (i.e., convergence to the true parameter values) is empirically established.
The article is organized as follows. In Section 2, the POBDS signal model and the Boolean Kalman Filter and Boolean Kalman Smoother are reviewed, while in Section 3, a detailed description of the APF-based filtering and smoothing algorithms proposed in this paper is provided. In Section 4, the particle-based ML adaptive filter is developed for discrete and continuous parameter spaces. A POBDS model for gene regulatory networks observed through RNA-seq measurements is reviewed in Section 5. Results for the numerical experiments with the cell-cycle network are presented in Section 6. Finally, Section 7 contains concluding remarks.
2 Optimal State Estimators for POBDS
In this section, we review the POBDS model and exact algorithms for computation of its optimal state estimators. For more details see [1, 4, 5, 15]. For a proof of optimality of the BKF, see [18].
We assume that the system is described by a state process $\{\mathbf{X}_k;\, k = 0, 1, \ldots\}$, where $\mathbf{X}_k \in \{0,1\}^d$ is a Boolean vector of size $d$. The sequence of states is observed indirectly through the observation process $\{\mathbf{Y}_k;\, k = 1, 2, \ldots\}$, where $\mathbf{Y}_k$ is a vector of (typically non-Boolean) measurements. The states are assumed to be updated and observed at each discrete time through the following nonlinear signal model:

(1)
$$\mathbf{X}_k \,=\, \mathbf{f}\left(\mathbf{X}_{k-1}\right) \oplus \mathbf{n}_k \quad \text{(state model)}$$
$$\mathbf{Y}_k \,=\, \mathbf{h}\left(\mathbf{X}_k, \mathbf{v}_k\right) \quad\ \text{(observation model)}$$

for $k = 1, 2, \ldots$ Here, $\mathbf{n}_k \in \{0,1\}^d$ is Boolean transition noise, "$\oplus$" indicates component-wise modulo-2 addition, $\mathbf{f}: \{0,1\}^d \rightarrow \{0,1\}^d$ is a Boolean function, called the network function, whereas $\mathbf{h}$ is a general function mapping the current state $\mathbf{X}_k$ and observation noise $\mathbf{v}_k$ into the measurement space, for $k = 1, 2, \ldots$ The noise processes $\{\mathbf{n}_k\}$ and $\{\mathbf{v}_k\}$ are assumed to be "white" in the sense that the noises at distinct time points are independent random variables. It is also assumed that the noise processes are independent of each other and of the initial state $\mathbf{X}_0$; their distribution is otherwise arbitrary.
We will assume further that the Boolean process noise is zero-mode, i.e., $\mathbf{n}_k = \mathbf{0}$ is the most probable value of the noise vector at each time $k$. This implies that the most likely value of $\mathbf{X}_k$ at each time $k$ is $\mathbf{f}(\mathbf{X}_{k-1})$; this could be seen as the counterpart of the zero-mean noise assumption in continuous state-space models. As is the case with nonzero-mean noise, nonzero-mode noise introduces a systematic error component, which can always be removed by moving it into the function $\mathbf{f}$. Hence, the state model in (1) is a general model for a first-order Markov Boolean stochastic process. For a specific example, which will be adopted in Section 6, one has $P(\mathbf{n}_k(i) = 1) = p$, for $i = 1, \ldots, d$, and independently for distinct $i$ and $k$. In this case, $\mathbf{n}_k$ is zero-mode if and only if $p \leq 1/2$. The systematic bias introduced in the case $p > 1/2$ can be removed by considering the equivalent state process with noise $\tilde{\mathbf{n}}_k = \mathbf{n}_k \oplus \mathbf{1}_d$, where $\mathbf{1}_d$ is the vector with all components equal to 1, and network function $\tilde{\mathbf{f}} = \mathbf{f} \oplus \mathbf{1}_d$, i.e., by moving the systematic bias into the model. Therefore, one effectively needs only to consider the case $p \leq 1/2$.
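To make the state model concrete, the following minimal Python sketch simulates the Boolean state process under i.i.d. Bernoulli($p$) transition noise; the 3-gene network function `net` is a hypothetical example chosen only for illustration, not a model from this paper.

```python
import numpy as np

def simulate_pobds_states(f, x0, p, T, rng):
    """Simulate the Boolean state process X_k = f(X_{k-1}) XOR n_k,
    where each n_k(i) ~ Bernoulli(p) independently (zero-mode when p <= 1/2)."""
    d = len(x0)
    states = [np.asarray(x0, dtype=int)]
    for _ in range(T):
        noise = (rng.random(d) < p).astype(int)  # Boolean transition noise n_k
        states.append(f(states[-1]) ^ noise)     # XOR = component-wise modulo-2 addition
    return np.array(states)

# Hypothetical 3-gene network function: gene 3 represses gene 1,
# gene 1 activates gene 2, and gene 2 activates gene 3.
def net(x):
    return np.array([1 - x[2], x[0], x[1]])

rng = np.random.default_rng(0)
traj = simulate_pobds_states(net, [0, 0, 0], p=0.05, T=10, rng=rng)
print(traj.shape)  # (11, 3): the initial state plus 10 noisy transitions
```

With `p=0` the trajectory follows the network function deterministically, which is a quick sanity check on any implementation.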
A Boolean estimator $\hat{\mathbf{X}}_k$ predicts the Boolean state $\mathbf{X}_k$ based on the sequence of observations $\mathbf{Y}_{1:T} = \{\mathbf{Y}_1, \ldots, \mathbf{Y}_T\}$. The estimator is called a filter, smoother, or predictor according to whether $k = T$, $k < T$, or $k > T$, respectively. The set of all Boolean estimators for a given $k$ and $T$ shall be denoted by $\mathcal{X}_{k|T}$. The (conditional) mean-square error (MSE) of $\hat{\mathbf{X}}_k \in \mathcal{X}_{k|T}$ given $\mathbf{Y}_{1:T}$ is:
(2)
$$\mathrm{MSE}\left(\hat{\mathbf{X}}_k \mid \mathbf{Y}_{1:T}\right) \,=\, E\left[\,\big\|\hat{\mathbf{X}}_k - \mathbf{X}_k\big\|^2 \,\Big|\, \mathbf{Y}_{1:T}\right].$$
We would like to obtain the Boolean MMSE estimator, i.e., a Boolean estimator $\hat{\mathbf{X}}_{k|T}$ such that
(3)
$$\hat{\mathbf{X}}_{k|T} \,=\, \arg\min_{\hat{\mathbf{X}}_k}\, \mathrm{MSE}\left(\hat{\mathbf{X}}_k \mid \mathbf{Y}_{1:T}\right)$$
at each value of $\mathbf{Y}_{1:T}$ (so that it also minimizes the frequentist expected MSE over all possible realizations of $\mathbf{Y}_{1:T}$). For a vector $\mathbf{v} \in [0,1]^d$, define the binarized vector $\overline{\mathbf{v}}$, such that $\overline{\mathbf{v}}(i) = 1$ if $\mathbf{v}(i) > 1/2$ and $\overline{\mathbf{v}}(i) = 0$ otherwise, for $i = 1, \ldots, d$; the complement vector $\mathbf{v}^c$, such that $\mathbf{v}^c(i) = 1 - \mathbf{v}(i)$, for $i = 1, \ldots, d$; and the norm $\|\mathbf{v}\|_1 = \sum_{i=1}^d |\mathbf{v}(i)|$. It can be proved [18] that the solution to (3) is given by
(4)
$$\hat{\mathbf{X}}_{k|T} \,=\, \overline{E\left[\mathbf{X}_k \mid \mathbf{Y}_{1:T}\right]}$$
with optimal MMSE
(5)
$$\mathrm{MSE}\left(\hat{\mathbf{X}}_{k|T} \mid \mathbf{Y}_{1:T}\right) \,=\, \left\|\min\left\{E\left[\mathbf{X}_k \mid \mathbf{Y}_{1:T}\right],\, E\left[\mathbf{X}_k \mid \mathbf{Y}_{1:T}\right]^c\right\}\right\|_1,$$
where the minimum is computed componentwise.
The optimal Boolean MMSE estimator will be called a Boolean Kalman Filter (BKF), Boolean Kalman Smoother (BKS), or Boolean Kalman Predictor (BKP), according to whether $k = T$, $k < T$, or $k > T$, respectively. The terminology is justified by the fact that we seek the MMSE estimator for a nonstationary process, as in the case of the classical Kalman Filter, as opposed to, say, the maximum-a-posteriori (MAP) estimator, which is more common in discrete settings. Interestingly, we can show that each component $\hat{\mathbf{X}}_{k|T}(i)$ of the MMSE estimator is both the MMSE and the MAP estimator of the corresponding state variable $\mathbf{X}_k(i)$, for $i = 1, \ldots, d$. Perhaps surprisingly, the MAP estimator of the state vector does not in general enjoy the property that its $i$th component is the MAP estimator of $\mathbf{X}_k(i)$, for $i = 1, \ldots, d$. In cases where optimal estimation performance for each component of $\mathbf{X}_k$ is required (e.g., estimating the state of each gene in a gene regulatory network), this is an important distinction.
Let $(\mathbf{x}^1, \ldots, \mathbf{x}^{2^d})$ be an arbitrary enumeration of the $2^d$ possible state vectors, and define the state conditional probability distribution vector $\Pi_{k|T}$ of length $2^d$ via

(6)
$$\Pi_{k|T}(i) \,=\, P\left(\mathbf{X}_k = \mathbf{x}^i \mid \mathbf{Y}_{1:T}\right), \quad i = 1, \ldots, 2^d,$$
and let $A = [\mathbf{x}^1 \cdots \mathbf{x}^{2^d}]$ be a matrix of size $d \times 2^d$. Then it is clear that $E[\mathbf{X}_k \mid \mathbf{Y}_{1:T}] = A\,\Pi_{k|T}$, so it follows from (4) and (5) that
(7)
$$\hat{\mathbf{X}}_{k|T} \,=\, \overline{A\,\Pi_{k|T}}$$
with optimal MSE
(8)
$$\mathrm{MSE}\left(\hat{\mathbf{X}}_{k|T} \mid \mathbf{Y}_{1:T}\right) \,=\, \left\|\min\left\{A\,\Pi_{k|T},\, \left(A\,\Pi_{k|T}\right)^c\right\}\right\|_1,$$
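The estimator (7) and its MSE (8) are straightforward to compute once the distribution vector is available; the short Python sketch below illustrates this on a toy two-gene posterior (the enumeration order and the numbers are arbitrary illustrative choices).

```python
import itertools
import numpy as np

def boolean_mmse(pi, d):
    """MMSE state estimate and conditional MSE from a distribution
    vector pi over the 2^d enumerated Boolean states."""
    # Columns of A are the enumerated state vectors x^1, ..., x^{2^d}.
    A = np.array(list(itertools.product([0, 1], repeat=d))).T  # d x 2^d
    m = A @ pi                        # E[X_k | Y_{1:T}] = A * Pi
    x_hat = (m > 0.5).astype(int)     # component-wise binarization
    mse = np.minimum(m, 1 - m).sum()  # || min{m, m^c} ||_1
    return x_hat, mse

# Toy example: d = 2, posterior mass 0.7 on state (1,0) and 0.3 on (1,1).
pi = np.zeros(4)                      # enumeration order: 00, 01, 10, 11
pi[2], pi[3] = 0.7, 0.3
x_hat, mse = boolean_mmse(pi, d=2)
print(x_hat, mse)  # [1 0] 0.3
```

Note that the first component is estimated with zero conditional MSE (both probable states agree on it), so the entire MSE comes from the second component.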
The distribution vector $\Pi_{k|T}$ can be computed by a matrix-based procedure similar to the "forward-backward" algorithm [25]. Briefly, let $M$ of size $2^d \times 2^d$ be the transition matrix of the Markov chain defined by the state model:

(9)
$$M_{ij} \,=\, P\left(\mathbf{X}_k = \mathbf{x}^i \mid \mathbf{X}_{k-1} = \mathbf{x}^j\right), \quad i, j = 1, \ldots, 2^d.$$
Additionally, given a value of the observation vector $\mathbf{Y}_k$ at time $k$, let $T(\mathbf{Y}_k)$ be a diagonal matrix of size $2^d \times 2^d$ defined by:

(10)
$$T(\mathbf{Y}_k)_{ii} \,=\, p\left(\mathbf{Y}_k \mid \mathbf{X}_k = \mathbf{x}^i\right), \quad i = 1, \ldots, 2^d,$$

where $p(\mathbf{Y}_k \mid \mathbf{X}_k = \mathbf{x}^i)$ is either a probability density or a probability mass function, according to the nature of the measurement $\mathbf{Y}_k$.
At this point, we distinguish two cases. The first is a recursive implementation of the Boolean Kalman Filter (BKF), which does not need a backward iteration and can be iterated forward as new observations arrive, for as long as desired. In this case, we use (7) and (8) with $T = k$ to get the optimal filter estimator and its minimum MSE [1]. The entire procedure is given in Algorithm 1.
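A single forward step of this recursion, assuming the matrices $M$ of (9) and $T(\mathbf{Y}_k)$ of (10) have already been built, can be sketched as follows; the one-gene numbers are purely illustrative:

```python
import numpy as np

def bkf_step(pi_prev, M, T_yk):
    """One forward step of the Boolean Kalman Filter: prediction by the
    transition matrix M, then Bayesian update by the diagonal
    observation-likelihood matrix T(Y_k), followed by normalization."""
    beta = T_yk @ (M @ pi_prev)       # unnormalized posterior
    return beta / beta.sum()          # Pi_{k|k}

# Toy one-gene example (2 states), purely illustrative numbers:
M = np.array([[0.9, 0.2],             # M[i, j] = P(X_k = x^i | X_{k-1} = x^j)
              [0.1, 0.8]])
T_yk = np.diag([0.3, 0.7])            # likelihoods p(Y_k | X_k = x^i)
pi0 = np.array([0.5, 0.5])
pi1 = bkf_step(pi0, M, T_yk)
print(pi1)  # [0.34375 0.65625]
```

Iterating this step and applying the binarization of (7)-(8) to each $\Pi_{k|k}$ reproduces the structure of Algorithm 1.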
The second case is a fixed-interval Boolean Kalman Smoother, where a fixed batch of observations of length $T$ is available, and it is desired to obtain estimates of the state at all points in the interval $k = 1, \ldots, T$. In this case, a backward iteration is needed (unless $k = T$). Define the probability distribution vector $\beta_k$ of length $2^d$ via

(11)
$$\beta_k(i) \,\propto\, p\left(\mathbf{Y}_{k+1}, \ldots, \mathbf{Y}_T \mid \mathbf{X}_k = \mathbf{x}^i\right), \quad i = 1, \ldots, 2^d,$$
for $k = 1, \ldots, T$, where $\beta_T$ is defined to be $\mathbf{1}_{2^d}$, the vector with all components equal to 1. It can be shown that

(12)
$$\Pi_{k|T} \,=\, \frac{\beta_k \circ \Pi_{k|k}}{\left\|\beta_k \circ \Pi_{k|k}\right\|_1}\,,$$
where "$\circ$" denotes component-wise vector multiplication. We then use (7) and (8) with the smoothed vector $\Pi_{k|T}$ to get the optimal smoothed estimator and its minimum MSE [15]. The entire procedure is given in Algorithm 2.
3 Particle Filters for State Estimation
When the number $2^d$ of states is large, the exact computation of the BKF and the BKS becomes intractable, due to the large size of the matrices involved, which contain up to $2^{2d}$ elements, and approximate methods must be used, such as sequential Monte Carlo methods, also known as particle filter algorithms, which have been successfully applied in various fields [26, 27, 28, 29, 30]. In the next subsections, we describe particle filter implementations of the BKF and BKS.
3.1 Auxiliary Particle Filter Implementation of the BKF (APF-BKF)
The basic algorithm to perform particle filtering is called sequential importance resampling (SIR). Importance sampling is used when direct sampling of the target distribution is difficult. The idea is to approximate the target distribution using sample points ("particles") drawn from a proposal distribution, which is easier to sample than the target distribution. The discrepancy created by sampling from the proposal instead of the target is compensated by weighting each particle. After a few iterations of the algorithm, a condition is usually reached where only a few of the particles have significant weights, whereas most particles have negligible weight. To address this degeneracy problem, SIR performs resampling of the particles, whereby a fresh set of particles is drawn (with replacement) from the approximate current posterior distribution.
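The resampling step just described can be sketched in a few lines of Python (plain multinomial resampling; the particle values and weights below are arbitrary illustrative numbers):

```python
import numpy as np

def resample(particles, weights, rng):
    """SIR resampling: draw a fresh particle set (with replacement) from the
    current weighted posterior approximation; weights reset to 1/N."""
    N = len(particles)
    idx = rng.choice(N, size=N, p=weights)     # multinomial resampling
    return particles[idx], np.full(N, 1.0 / N)

rng = np.random.default_rng(1)
particles = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights = np.array([0.05, 0.05, 0.05, 0.85])   # severe weight degeneracy
new_particles, new_weights = resample(particles, weights, rng)
```

After resampling, high-weight particles (here the fourth one) tend to appear several times in the fresh set, while low-weight particles tend to disappear.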
The original particle filtering implementation of the BKF in [3] was based on the SIR algorithm; we therefore call it the SIR-BKF algorithm. We present here a more sophisticated implementation based on the Auxiliary Particle Filter (APF) of [17]. The APF algorithm can be seen as a variation of SIR, and is thus also known as auxiliary SIR (ASIR). Basically, APF is a look-ahead method that at time step $k-1$ tries to predict the location of particles with high probability at time $k$, with the purpose of making the subsequent resampling step more efficient. Without the look-ahead, the basic SIR algorithm blindly propagates all particles, even those in low-probability regions. As put in [31], "it is natural to ask whether it is possible to employ knowledge about the next observation before resampling to ensure that particles which are likely to be compatible with that observation have a good chance of surviving."
The APF algorithm augments the state vector $\mathbf{x}_k$ to $(\mathbf{x}_k, i)$, where $i$ is an auxiliary variable. Particles are drawn from the filtering distribution $p(\mathbf{x}_k, i \mid \mathbf{Y}_{1:k})$ (to be specified below), and the auxiliary variable is simply dropped to obtain particles from $p(\mathbf{x}_k \mid \mathbf{Y}_{1:k})$. Given particles $\{\mathbf{x}_{k-1}^i\}_{i=1}^N$ at time $k-1$, with associated weights $\{w_{k-1}^i\}_{i=1}^N$, the APF algorithm defines

(13)
$$p\left(\mathbf{x}_k, i \mid \mathbf{Y}_{1:k}\right) \,\propto\, p\left(\mathbf{Y}_k \mid \mathbf{x}_k\right) p\left(\mathbf{x}_k \mid \mathbf{x}_{k-1}^i\right) w_{k-1}^i,$$
for $i = 1, \ldots, N$. The auxiliary variable thus functions as an index for the particles at the previous time point. As will be seen below, sampling from (13) will have the effect of "selecting" the particles that are compatible with the observation at time $k$.
One can sample from (13) by using SIR on the following approximation:

(14)
$$p\left(\mathbf{x}_k, i \mid \mathbf{Y}_{1:k}\right) \,\propto\, p\left(\mathbf{Y}_k \mid \mu_k^i\right) p\left(\mathbf{x}_k \mid \mathbf{x}_{k-1}^i\right) w_{k-1}^i,$$
for $i = 1, \ldots, N$, where $\mu_k^i$ is a characteristic of $\mathbf{x}_k$ given $\mathbf{x}_{k-1}^i$, which can be the mean, the mode, or even a sample from $p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^i)$ [17]. In our implementation, we use the mode:

(15)
$$\mu_k^i \,=\, \arg\max_{\mathbf{x}_k}\, p\left(\mathbf{x}_k \mid \mathbf{x}_{k-1}^i\right) \,=\, \mathbf{f}\left(\mathbf{x}_{k-1}^i\right),$$

for $i = 1, \ldots, N$, where we used (1) and the zero-mode noise assumption. The approximation in (14) is accurate as long as the "noise intensity" is low, i.e., the probability of a nonzero $\mathbf{n}_k$ is small.
Sampling from (13) is done in two steps. In the first step, $\mu_k^i$ is obtained from the particles using (15) and the first-stage weights are computed as:

(16)
$$v_k^i \,\propto\, p\left(\mathbf{Y}_k \mid \mu_k^i\right) w_{k-1}^i,$$

for $i = 1, \ldots, N$. In the second step, the auxiliary variables $i^1, \ldots, i^N$ (i.e., the indices of the selected particles) are obtained as a sample from the discrete distribution defined by $v_k^1, \ldots, v_k^N$ (after proper normalization). For example, if $N = 4$ and $v_k^1 = v_k^2 = 1/3$ and $v_k^3 = v_k^4 = 1/6$, then the indices will be independent and each will be twice as likely to be 1 or 2 than 3 or 4. We denote this by $i^j \sim \mathrm{Cat}(v_k^1, \ldots, v_k^N)$, for $j = 1, \ldots, N$, where "Cat" stands for the categorical (discrete) distribution.
Finally, the new particles and associated second-stage weights can be obtained as follows:

(17)
$$\mathbf{x}_k^j \,\sim\, p\left(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{i^j}\right),$$

(18)
$$w_k^j \,\propto\, \frac{p\left(\mathbf{Y}_k \mid \mathbf{x}_k^j\right)}{p\left(\mathbf{Y}_k \mid \mu_k^{i^j}\right)}\,,$$

for $j = 1, \ldots, N$.
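Putting (15)-(18) together, one time step of the APF-BKF can be sketched as follows in Python; the network function, the observation likelihood `lik` (already evaluated at the current observation), and the noise intensity are hypothetical stand-ins:

```python
import numpy as np

def apf_bkf_step(particles, weights, f, lik, p, rng):
    """One APF-BKF time step (sketch). `f` is the network function,
    `lik(x)` evaluates p(Y_k | x) at the current observation, and `p`
    is the Bernoulli transition-noise intensity."""
    N, d = particles.shape
    mu = np.array([f(x) for x in particles])        # modes f(x_{k-1}^i)
    v = weights * np.array([lik(m) for m in mu])    # first-stage weights
    v = v / v.sum()
    idx = rng.choice(N, size=N, p=v)                # auxiliary indices ~ Cat(v)
    noise = (rng.random((N, d)) < p).astype(int)    # Bernoulli(p) noise
    new = mu[idx] ^ noise                           # propagate selected particles
    w = np.array([lik(x) for x in new]) / np.array([lik(m) for m in mu[idx]])
    return new, w / w.sum()                         # second-stage weights

def net(x):              # hypothetical 2-gene network function
    return np.array([x[1], x[0]])

def lik(x):              # hypothetical likelihood of the current observation
    return 0.9 if x[0] == 1 else 0.1

rng = np.random.default_rng(2)
particles = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights = np.full(4, 0.25)
particles, weights = apf_bkf_step(particles, weights, net, lik, 0.05, rng)
```

The state estimate at each step is then obtained by binarizing the weighted particle average, exactly as in (7).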
It can be shown that an unbiased estimator of the unnormalized posterior probability $p(\mathbf{Y}_{1:k})$ at each time step can be obtained as [32]

(19)
$$\hat{p}\left(\mathbf{Y}_{1:k}\right) \,=\, \hat{p}\left(\mathbf{Y}_{1:k-1}\right) \left(\frac{1}{N}\sum_{j=1}^{N} w_k^j\right)\left(\sum_{i=1}^{N} v_k^i\right),$$

where $v_k^i$ and $w_k^j$ denote the unnormalized first- and second-stage weights.
This quantity will be needed in Section 4, when the particle filter for maximum-likelihood adaptive estimation is discussed.
3.2 Auxiliary Particle Filter Implementation of the BKS (APF-BKS)
There are a few different approximate Monte Carlo smoothing methods in the literature on nonlinear and non-Gaussian systems [33, 29, 34]. It should be noted that some of these particle smoothers suffer from degeneracy problems or can only be applied under special conditions (such as Markov chains with good forgetting properties). We follow an approach similar to the well-known fixed-interval smoother of [35] to approximate the Boolean Kalman Smoother.
As described in Section 2, a fixed-interval smoother is a forward-backward method, in which the filtering distributions for $k = 1, \ldots, T$ are computed in the forward step, and the smoothed distributions are found in a backward step. The forward process is obtained here by running the APF-BKF algorithm of Section 3.1, while the backward process is performed by correcting the filtering weights in a backward iteration. We explain next how the backward step is applied efficiently.
First, assume $\{\mathbf{x}_k^i, w_k^i\}_{i=1}^N$, for $k = 1, \ldots, T$, are the forward particles and weights obtained by the APF-BKF algorithm for the sequence of measurements $\mathbf{Y}_{1:T}$. Due to the finite number of states in the POBDS, one can collect the unique particles and their associated aggregated weights at the different time steps as:

(23)
$$\left\{\tilde{\mathbf{x}}_k^i,\, \tilde{w}_k^i\right\}_{i=1}^{N_k}, \quad k = 1, \ldots, T,$$

where $N_k$ is the number of unique forward particles and $\tilde{\mathbf{x}}_k^i$ is the $i$th unique particle with aggregated weight $\tilde{w}_k^i$, both at time step $k$.
The backward process is based on the following equation:

(24)
$$p\left(\mathbf{x}_k, \mathbf{x}_{k+1} \mid \mathbf{Y}_{1:T}\right) \,=\, p\left(\mathbf{x}_{k+1} \mid \mathbf{Y}_{1:T}\right)\, \frac{p\left(\mathbf{x}_{k+1} \mid \mathbf{x}_k\right) p\left(\mathbf{x}_k \mid \mathbf{Y}_{1:k}\right)}{p\left(\mathbf{x}_{k+1} \mid \mathbf{Y}_{1:k}\right)}\,,$$

where $p(\mathbf{x}_k \mid \mathbf{Y}_{1:k})$ is the filtering distribution and $p(\mathbf{x}_k \mid \mathbf{Y}_{1:T})$ is the smoothed distribution at time step $k$. Summation over $\mathbf{x}_{k+1}$ on both sides of equation (24) results in

(25)
$$p\left(\mathbf{x}_k \mid \mathbf{Y}_{1:T}\right) \,=\, p\left(\mathbf{x}_k \mid \mathbf{Y}_{1:k}\right) \sum_{\mathbf{x}_{k+1}} \frac{p\left(\mathbf{x}_{k+1} \mid \mathbf{x}_k\right) p\left(\mathbf{x}_{k+1} \mid \mathbf{Y}_{1:T}\right)}{p\left(\mathbf{x}_{k+1} \mid \mathbf{Y}_{1:k}\right)}\,.$$
As mentioned before, the filter and smoother estimates at the final time $T$ coincide. Therefore, the smoothed weights at time $T$ are defined to be the forward unique weights, so that

(26)
$$\tilde{w}_{T|T}^i \,=\, \tilde{w}_T^i, \quad i = 1, \ldots, N_T.$$
Now, using equation (25), the smoothed weights at time $k$ can be obtained as:

(27)
$$\tilde{w}_{k|T}^i \,=\, \tilde{w}_k^i \sum_{j=1}^{N_{k+1}} \tilde{w}_{k+1|T}^j\, \frac{p\left(\tilde{\mathbf{x}}_{k+1}^j \mid \tilde{\mathbf{x}}_k^i\right)}{\sum_{l=1}^{N_k} \tilde{w}_k^l\, p\left(\tilde{\mathbf{x}}_{k+1}^j \mid \tilde{\mathbf{x}}_k^l\right)}\,.$$
The smoothed weights are obtained by solving equation (27) in a backward fashion using the terminal condition (26). The computational complexity of equation (27) is of order $O(N_k N_{k+1})$ per time step, which can be much smaller than $2^{2d}$ in practice.
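The backward correction over the unique particles can be transcribed almost directly into Python; below, `trans_prob` plays the role of $p(\tilde{\mathbf{x}}_{k+1}^j \mid \tilde{\mathbf{x}}_k^i)$, and the one-gene transition numbers are hypothetical:

```python
import numpy as np

def backward_smooth(unique_particles, unique_weights, trans_prob):
    """Backward pass over the unique forward particles: corrects the
    aggregated filtering weights into smoothed weights, starting from
    the terminal condition (filter = smoother at the final time)."""
    T = len(unique_particles) - 1
    smoothed = [None] * (T + 1)
    smoothed[T] = unique_weights[T]
    for k in range(T - 1, -1, -1):
        wk, xk = unique_weights[k], unique_particles[k]
        out = np.zeros(len(wk))
        for j, xj in enumerate(unique_particles[k + 1]):
            pj = np.array([trans_prob(xj, x) for x in xk])
            denom = (wk * pj).sum()          # predictive normalizer
            out += smoothed[k + 1][j] * wk * pj / denom
        smoothed[k] = out
    return smoothed

def tp(x_next, x):   # hypothetical one-gene transition probabilities
    return 0.9 if x_next == x else 0.1

parts = [np.array([0, 1])] * 3               # unique states at each time step
wts = [np.array([0.5, 0.5]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]
sm = backward_smooth(parts, wts, tp)
```

A useful invariant for testing any implementation: each vector of smoothed weights still sums to one whenever the forward weight vectors do.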
Using the smoothed weights, we can write

(28)
$$p\left(\mathbf{x}_k \mid \mathbf{Y}_{1:T}\right) \,\approx\, \sum_{i=1}^{N_k} \tilde{w}_{k|T}^i\, \delta\left(\mathbf{x}_k - \tilde{\mathbf{x}}_k^i\right),$$

where $\delta$ denotes the Kronecker delta.
From (4) and (5), it follows that the MMSE state estimate and conditional MSE at time step $k$ are approximated as:

(29)
$$\hat{\mathbf{X}}_{k|T} \,=\, \overline{\sum_{i=1}^{N_k} \tilde{w}_{k|T}^i\, \tilde{\mathbf{x}}_k^i}$$

with optimal MMSE

(30)
$$\mathrm{MSE}\left(\hat{\mathbf{X}}_{k|T} \mid \mathbf{Y}_{1:T}\right) \,=\, \left\|\min\left\{\sum_{i=1}^{N_k} \tilde{w}_{k|T}^i\, \tilde{\mathbf{x}}_k^i,\, \Big(\sum_{i=1}^{N_k} \tilde{w}_{k|T}^i\, \tilde{\mathbf{x}}_k^i\Big)^{\!c}\right\}\right\|_1.$$
The entire procedure is summarized in Algorithm 4.
This particle smoother is an efficient method for state estimation, as will be shown in Section 6, but it is not appropriate for parameter estimation; a different particle smoother, introduced in the next section, will be used there to perform continuous parameter estimation.
4 Particle Filters for Maximum-Likelihood Adaptive Estimation
Suppose that the nonlinear signal model in (1) is incompletely specified. For example, the deterministic functions $\mathbf{f}$ and $\mathbf{h}$ may be only partially known, or the statistics of the noise processes $\{\mathbf{n}_k\}$ and $\{\mathbf{v}_k\}$ may need to be estimated. By assuming that the missing information can be coded into a finite-dimensional vector parameter $\theta \in \Theta$, where $\Theta$ is the parameter space, we propose next particle filtering approaches for simultaneous state and parameter estimation for POBDS. For simplicity and conciseness, we consider two cases: a Boolean Kalman Filter algorithm with finite (i.e., discrete) $\Theta$ and a Boolean Kalman Smoother algorithm with continuous $\Theta$, but the algorithms can be modified and even combined to perform other estimation tasks. Exact algorithms for such filters can be found in [18].
4.1 APF Implementation of the Discrete-Parameter ML Adaptive BKF (APF-DPMLA-BKF)
In this case, $\Theta = \{\theta_1, \ldots, \theta_M\}$. Given the observations up to time $k$, the log-likelihood function can be written as

(31)
$$L_k(\theta_i) \,=\, \log p_{\theta_i}\left(\mathbf{Y}_{1:k}\right) \,=\, \sum_{t=1}^{k} \log p_{\theta_i}\left(\mathbf{Y}_t \mid \mathbf{Y}_{1:t-1}\right),$$
for $i = 1, \ldots, M$, where each term $p_{\theta_i}(\mathbf{Y}_t \mid \mathbf{Y}_{1:t-1})$ can be approximated by running the APF-BKF algorithm discussed in the previous section tuned to the parameter vector $\theta_i$.
The approximate log-likelihood is updated via

(32)
$$\hat{L}_k(\theta_i) \,=\, \hat{L}_{k-1}(\theta_i) + \log \hat{p}_{\theta_i}\left(\mathbf{Y}_k \mid \mathbf{Y}_{1:k-1}\right),$$
with $\hat{L}_0(\theta_i) = 0$, for $i = 1, \ldots, M$, and the ML estimators of both parameter and state at time $k$ can be obtained directly by running $M$ particle filters in parallel, each tuned to a candidate parameter $\theta_i$, for $i = 1, \ldots, M$:

(33)
$$\hat{\theta}_k \,=\, \arg\max_{\theta_i,\, i = 1, \ldots, M}\, \hat{L}_k(\theta_i),$$

(34)
$$\hat{\mathbf{X}}_k \,=\, \hat{\mathbf{X}}_k^{\hat{\theta}_k},$$

for $k = 1, 2, \ldots$, where $\hat{\mathbf{X}}_k^{\hat{\theta}_k}$ is the state estimate produced by the filter tuned to $\hat{\theta}_k$.
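The following Python sketch illustrates the bank-of-filters idea on two candidate one-gene models; for brevity each candidate runs the exact matrix recursion of Section 2 in place of an APF-BKF, and all model numbers are hypothetical:

```python
import numpy as np

def ml_adaptive_select(pis, Ms, T_yk_fn, logL, y):
    """One step of the discrete-parameter ML adaptive filter (sketch):
    each candidate parameter runs its own filter; the per-step predictive
    likelihood updates that candidate's log-likelihood, and the best
    candidate is selected by the running argmax."""
    for i, (pi, M) in enumerate(zip(pis, Ms)):
        beta = T_yk_fn(y, i) @ (M @ pi)       # unnormalized posterior
        logL[i] += np.log(beta.sum())         # log p_theta(Y_k | Y_{1:k-1})
        pis[i] = beta / beta.sum()
    return int(np.argmax(logL))

# Two hypothetical candidate transition models for a single gene:
Ms = [np.array([[0.9, 0.1], [0.1, 0.9]]),     # "sticky" chain
      np.array([[0.5, 0.5], [0.5, 0.5]])]     # memoryless chain
def T_fn(y, i):                                # shared observation likelihoods
    return np.diag([0.8, 0.2]) if y == 0 else np.diag([0.2, 0.8])

pis = [np.array([0.5, 0.5]) for _ in Ms]
logL = np.zeros(2)
for y in [0, 0, 0, 1, 1, 1]:                   # run-structured observations
    best = ml_adaptive_select(pis, Ms, T_fn, logL, y)
print(best)  # 0: the sticky chain explains the run-structured data better
```

In the particle implementation, the only change is that each candidate's predictive likelihood comes from the estimator (19) of its own APF-BKF instead of the exact matrix recursion.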
4.2 APF Implementation of the Continuous-Parameter ML Adaptive BKS (APF-CPMLA-BKS)
Here, $\Theta$ is continuous, and the approach developed in the last subsection for discrete parameter spaces is not directly applicable. There are two options: 1) discretize the parameter space using a suitable quantization grid and apply an approach similar to the one in the last section; or 2) attempt to obtain a good approximation of the ML estimator in the continuous parameter space directly. In this section, we describe how to implement the second option using the expectation-maximization (EM) algorithm for a particle filter implementation of a fixed-interval Boolean Kalman Smoother.
In our case, maximum-likelihood estimation attempts to find the value of $\theta$ that maximizes the "incomplete" log-likelihood function $\log p_\theta(\mathbf{Y}_{1:T})$. The EM algorithm considers instead the "complete" log-likelihood function $\log p_\theta(\mathbf{X}_{0:T}, \mathbf{Y}_{1:T})$, which includes the unknown state sequence, the assumption being that maximizing the complete log-likelihood is easier than maximizing the incomplete one (the reader is referred to [18] for more information on the EM algorithm for POBDS).
The EM algorithm obtains a sequence of parameter estimates $\{\theta_l;\, l = 0, 1, \ldots\}$. Given the current estimate $\theta_l$, the algorithm obtains the next estimate in the sequence by computing (E-step) the function (see [18]):

(35)
$$Q\left(\theta \mid \theta_l\right) \,=\, E_{\theta_l}\left[\log p_\theta\left(\mathbf{X}_{0:T}, \mathbf{Y}_{1:T}\right) \mid \mathbf{Y}_{1:T}\right] \,=\, I_1 + I_2 + I_3,$$

where

(36)
$$I_1 \,=\, E_{\theta_l}\left[\log p_\theta\left(\mathbf{X}_0\right) \mid \mathbf{Y}_{1:T}\right],$$

(37)
$$I_2 \,=\, \sum_{k=1}^{T} E_{\theta_l}\left[\log p_\theta\left(\mathbf{X}_k \mid \mathbf{X}_{k-1}\right) \mid \mathbf{Y}_{1:T}\right],$$

(38)
$$I_3 \,=\, \sum_{k=1}^{T} E_{\theta_l}\left[\log p_\theta\left(\mathbf{Y}_k \mid \mathbf{X}_k\right) \mid \mathbf{Y}_{1:T}\right],$$

and then maximizing (M-step) this function:

(39)
$$\theta_{l+1} \,=\, \arg\max_{\theta \in \Theta}\, Q\left(\theta \mid \theta_l\right).$$
In [18] this computation is carried out exactly. For large systems, this is impractical, for the following reasons:

1. The E-step (computing the $Q$ function) requires running a Boolean Kalman Smoother, which is too expensive computationally.

2. The transition matrix and the filtered and smoothed posterior probability vectors at all time steps must be stored, demanding large amounts of memory.

3. In certain cases, such as when $\mathbf{f}$ and $\mathbf{h}$ are linear in the parameter vector $\theta$, it is possible to maximize $Q$ using closed-form expressions (e.g., [34]). However, in general, one needs to resort to gradient-based optimization methods in the M-step. This requires evaluating $Q$ and computing its derivatives, which is analytically intractable.
To address all these issues, we develop our EM algorithm based on the Forward Filter Backward Simulation method [27]. This method tries to capture the most probable state trajectories and use them to find the smoothing particles. The method contains two main steps: 1) Forward step: the APF-BKF algorithm is employed to obtain the particles and their weights from time $0$ to $T$. 2) Backward step: the backward simulation procedure, which is explained in detail in the sequel, computes trajectories $\{\tilde{\mathbf{x}}_{0:T}^j\}_{j=1}^{M}$, where

(40)
$$\tilde{\mathbf{x}}_{0:T}^j \,\sim\, p\left(\mathbf{x}_{0:T} \mid \mathbf{Y}_{1:T}\right).$$
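A minimal Python sketch of the backward-simulation step, assuming the forward particles and their normalized weights have been stored at every time step (the one-gene transition model is hypothetical):

```python
import numpy as np

def backward_simulate(particles, weights, trans_prob, n_traj, rng):
    """Forward-Filter Backward-Simulation (sketch): draws state trajectories
    approximately distributed as p(x_{0:T} | Y_{1:T}), using the stored
    forward particles particles[k] and normalized weights weights[k]."""
    T = len(particles) - 1
    trajs = []
    for _ in range(n_traj):
        # sample the terminal state from the final filtering weights
        j = rng.choice(len(weights[T]), p=weights[T])
        traj = [particles[T][j]]
        for k in range(T - 1, -1, -1):
            # backward weights proportional to w_k^i * p(x_{k+1} | x_k^i)
            bw = weights[k] * np.array([trans_prob(traj[0], x) for x in particles[k]])
            i = rng.choice(len(bw), p=bw / bw.sum())
            traj.insert(0, particles[k][i])
        trajs.append(np.array(traj))
    return trajs

def tp(x_next, x):   # hypothetical one-gene transition probabilities
    return 0.9 if x_next == x else 0.1

rng = np.random.default_rng(4)
parts = [np.array([0, 1])] * 4                 # stored states at each k
wts = [np.array([0.5, 0.5]), np.array([0.7, 0.3]),
       np.array([0.4, 0.6]), np.array([0.1, 0.9])]
trajs = backward_simulate(parts, wts, tp, n_traj=5, rng=rng)
print(len(trajs), trajs[0].shape)  # 5 trajectories, each of length T+1 = 4
```

Averaging the relevant $\log p_\theta$ terms over the sampled trajectories then approximates the expectations in (36)-(38) without storing any $2^d \times 2^d$ matrices.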