Modelling modal gating of ion channels with hierarchical Markov models
Abstract.
Many ion channels spontaneously switch between different levels of activity. Although this behaviour known as modal gating has been observed for a long time it is currently not well understood. Despite the fact that appropriately representing activity changes is essential for accurately capturing time course data from ion channels, systematic approaches for modelling modal gating are currently not available. In this paper, we develop a modular approach for building such a model in an iterative process. First, stochastic switching between modes and stochastic opening and closing within modes are represented in separate aggregated Markov models. Second, the continuoustime hierarchical Markov model, a new modelling framework proposed here, then enables us to combine these components so that in the integrated model both mode switching as well as the kinetics within modes are appropriately represented. A mathematical analysis reveals that the behaviour of the hierarchical Markov model naturally depends on the properties of its components. We also demonstrate how a hierarchical Markov model can be parameterised using experimental data and show that it provides a better representation than a previous model of the same data set. Because evidence is increasing that modal gating reflects underlying molecular properties of the channel protein, it is likely that biophysical processes are better captured by our new approach than in earlier models.
1. Introduction
Ion channels regulate the flow of ions across the cell membrane by stochastic opening and closing. As soon as it became possible to detect currents generated by the movement of charged ions through the channel via the patchclamp technique [21], Colquhoun and Hawkes [7] developed the theory of modelling single ion channels with continuoustime Markov models which describe the timecourse of opening and closing that is reflected in singlechannel currents by stochastic jumps between zero (closed) and one or more small nonzero current levels in the pA range (open). The activity of an ion channel is usually measured by its open probability . But by 1983, Magleby and Pallotta [19, 18] had already observed spontaneous changes between different levels of channel activity in the calciumactivated potassium channel. Since then this phenomenon, known as modal gating, has been ubiquitously observed across a wide range of ion channels but the significance of modal gating has remained unclear.
In this study we present a general framework for building datadriven models of ion channels that account for modal gating. This is essential for accurately representing the dynamics of an ion channel—instead of producing a misleading constant intermediate open probability , a model should represent the switching between highly different levels of activity characteristic of each mode. This is illustrated in Figure 1 where data points labelled M form a segment characterised by a low open probability whereas, the segment labelled M is characterised by a high open probability. In a realistic time series, the changes between M and M occur on a time scale so slow that a model fitted directly to the sequence of closed and open events would not be able to resolve this. Thus, instead of infrequent switching between high and low open probabilities, a model fitted directly to the data would show an intermediate open probability rather than switching between high and low open probabilities. On the other hand, modes of an ion channel have been associated with biophysical properties of the channel protein [26]. Therefore, a model accounting for modal gating is more likely to appropriately relate the dynamics of ion channels to underlying biophysical states of the channel protein.
Nevertheless, except for two recent models of the inositol trisphosphate receptor (IPR), see Ullah et al. [29], Siekmann et al. [27], modal gating is usually not considered in ion channel models. One difficulty in appropriately representing modal gating of ion channels in a model is the fact that for a time series of measurements collected from an ion channel it is impossible to infer directly in which mode the channel is at a given point in time. However, Siekmann et al. [26] have shown how this information can be obtained by statistical changepoint analysis, see Figure 1. The method identifies significant changes of the open probability between adjacent segments in time series of open and closed events recorded from an ion channel.
:  M  M  M  M  M  M  M  M  M  M  
:  O  C  C  O  C  O  O  O  C  C 
As a result, after this analysis has been carried out, for each point in the time series it is not only known if the channel is open (O) or closed (C) but also, in which of the modes M, M, … the channel is. Previously, we observed stochastic switching between a nearly inactive mode M and a highly active mode M in data from the IPR [26]. In this paper we will represent the stochastic process of switching between an arbitrary number of different modes by a continuoustime Markov model with infinitesimal generator . For data by Wagner and Yule [31], empirical histograms suggest that the sojourn time distribution within mode M is not exponential (see Figures 5 and 6 in Siekmann et al. [26] and Figure (a)a). For this reason, in general, more than one state is needed for accurately representing the process of switching between modes. This means that modal sojourn times are represented by phasetype distributions, a class of distributions which is defined by the time a Markov chain spends in a set of transient states until exiting to an absorbing state [22, 23]. We assume that the infinitesimal generator representing the switching between modes , , has the following block structure:
(1) 
where the block matrices , , on the diagonal describe transitions between states that represent the same mode whereas the offdiagonal blocks represent transitions between states representing different modes and , . An example for a model for switching between two modes M and M is shown in Figure (a)a.
Our modal gating analysis illustrated in Figure 1 not only enables us to represent the stochastic process of switching between modes but by studying the dynamics within representative segments we can investigate the processes of stochastic opening and closing characteristic of each mode. For the example in Figure 1 the dynamics within mode M can be analysed by considering the sequence of open and closed events between and . The dynamics within a mode can be represented by a Markov model with infinitesimal generator which is obtained by fitting to representative segments of the same mode [27]. Similar to the sojourn times in the modes , the open and closed time distributions and , respectively, are nonexponential and more than one open or closed state may be needed for accurately representing the dynamics. For the example shown in Figure 1 we obtain two models with infinitesimal generators and , see Figure (b)b.
In this paper we develop a new mathematical model, the continuoustime hierarchical Markov model, that accounts simultaneously for both transitions between modes as well as the stochastic opening and closing within modes. Whereas a hierarchical Markov model in discrete time has been previously described [8] we are not aware of a continuoustime version discussed in the literature, so we develop the mathematical theory in detail and prove some fundamental properties. For the example of modal gating we assume that switching between modes is a toplevel process that regulates the bottomlevel process, the opening and closing of the channel characteristic of a particular mode . This is illustrated in Figure 4.
The states are numbered consecutively by subscripts whereas the superscripts indicate the mode . While the model is in mode Mor analogously within one of the states or (Figure (a)a), its opening and closing is described by the infinitesimal generator (Figure (b)b). As soon as M is left to state , the current state of model is vacated and a state of model is entered. Now, opening and closing is accounted for by until the state and mode M is left and state is entered.
The transitions between modes described via and the dynamics within modes captured by illustrated in Figure 4 can be represented in a Markov model with infinitesimal generator that is derived from the individual components and . The idea is illustrated in Figure 5 and developed formally in Section 2.
In order to account for the states as well as the states and representing the opening and closing within , the state space of the full model consists of the Cartesian products of the with the and . Thus, the state space of the full model consists of open and closed states and , respectively, where the superscripts , refer to the state in the model shown in Figure (a)a whereas the subscript is the index of the state within a model shown in Figure (b)b. For the example shown in the figure, the closed states and as well as the open states and are connected by the transition rates and . Because M is modelled by two states and , two “copies” of appear in the full model whereas there is only one “copy” of which is represented by only one state in . For transitions between modes, the rates exiting M and exiting M are weighted with stochastic vectors and that can be interpreted as initial distributions when entering M or M. The mathematical details of the construction of this model are presented in Section 2.
It is a strength of our approach that it enables us to build datadriven models of modal gating in a modular way. After segmenting ion channel data with the method by Siekmann et al. [26] we obtain a stochastic sequence of events that describes the time course of transitions between different modes. The infinitesimal generators and the can then be parameterised from these data. We demonstrate the practical implementation of this approach in Section 3 using experimental data by Wagner and Yule [31] and compare the results with our previously published model of the same data set [27].
We investigate the mathematical structure of the continuoustime hierarchical Markov model in more detail in Section 4. In particular we show that many important properties of the infinitesimal generator of the full model can be derived from the generators and . We expect that similar to its discretetime counterpart [8], the continuoustime hierarchical Markov model will have a variety of applications beyond the modelling of modal gating considered here.
We discuss our approach to modal gating in Section 5. In particular we explain why our new modelling framework is not only a better representation of ion channel dynamics but also more likely than other modelling approaches to provide a structure that realistically captures biophysical processes.
2. Methods
2.1. Preliminaries
We now develop formally the hierarchical Markov model illustrated graphically in Figures 4 and 5. First, let us describe the structure of the probability distribution over the states of the hierarchical Markov model. Let denote a state probability distribution of the model . That is, for , is the probability distribution of the states in mode . In general, we will allow to be an aggregated Markov model so that each of the components of the vector may itself be a vector. We make the convention that components and that are meant to refer to a vector are separated by semicolons, whereas components of a vector are separated by commas. Let us first assume for simplicity that all modes are represented by only one state so that the components are scalars. Then the distribution over the states of the full model is a weighting of the distributions over the distributions over the states of the models . Thus, we obtain . Here ‘’ denotes scalar multiplication of vectors with scalars . If more than one state is needed for representing the modes we must generalise appropriately the “weighting” of a vector with a vector . Such a generalisation is provided by the tensor product ‘’.
Definition 2.1 (Kronecker product ).
We will only need the special case of the tensor product for matrices, the Kronecker product. Let , . Then
(2) 
The Kronecker product also applies to vectors by identifying column vectors with  and row vectors with matrices.
Definition 2.2 (Kronecker sum ).
The Kronecker sum of square matrices and is
(3) 
where and are the identity matrices of the respective dimensions.
For some properties of Kronecker product and sum that we require for our analysis of the hierarchical Markov model (Section 4) we refer to Appendix A. For a distribution over the states of an aggregated Markov model, subvectors that represent the distributions over the states of the same mode can be naturally described by partitions.
Definition 2.3 (Partitioned vectors, multiindices).
A multiindex is any
vector . We define the absolute
value and
denote the dimension of .
A vector is
partitioned by a multiindex if
and for each we have .
Selection of the th partition of is written as
The vector space of partitioned vectors is denoted .
How distributions over the states of a hierarchical Markov model relate to distributions over the states of and can be clarified by the tensor product of partitioned vector spaces.
Definition 2.4 (Tensor product of partitioned vector spaces).
Let , , be partitioned vectors. Then the tensor product of partitioned vectors and is defined by
(4) 
with the componentwise product of and . With the tensor product ‘’ we obtain the vector space
of the partitioned vector spaces and .
Remark 2.1.
We make some remarks regarding the interpretation of Definition 2.4:

It can be easily verified that ‘’ fulfils the properties of a tensor product on the vector space .

Vectors can be written as linear combinations
(5) where . By choosing bases , , , , we obtain systems of linearly independent vectors
Thus, from (5) it is easy to see that
where again denotes the componentwise product of and .
2.2. A hierarchical Markov model for modal gating
Based on the block structure (1) of we now show how a transition matrix for the full model can be calculated from its components . Let and be the multiindices defined above. The transitions within the modes are represented in the full model by block matrices . It follows that . Moreover, we define the matrix of initial conditions for a transition from to by
(6) 
where the row vector is the initial condition for from Definition 2.5, and is a column vector of ones. We observe that so that, for we have . We can now define the components of a continuoustime hierarchical Markov model and calculate its infinitesimal generator:
Analogous to the discretetime hierarchical Markov model by Fine et al. [8], we define a continuoustime hierarchical Markov model.
Definition 2.5 (Components of a continuoustime hierarchical Markov model).
A continuoustime hierarchical Markov model (with a twolevel hierarchy) is specified by the components :

An infinitesimal generator of a Markov model with initial distribution with aggregates of states , . The are referred to as modes.

For each mode a Markov model with infinitesimal generator and initial distribution .
Then the infinitesimal generator of the aggregated model for modal gating is calculated as follows:
(7) 
It is straightforward to generalise this definition recursively to an arbitrary number of hierarchies. From Definition 2.4 and (4) we know that an arbitrary distribution over the states of the full model can be represented by a linear combination of tensor products of the form (4). We now require for initial distributions that they should arise from a single tensor product of initial distributions over the states of and initial distributions over the states of the .
Definition 2.6 (Initial distribution over the states of a hierarchical Markov model).
Let be the initial distribution over the states of the toplevel model and , a vector whose components are initial distributions over the states of the models . Then the initial distribution over the states of the full model is calculated by the tensor product ‘’ introduced in Definition 2.4:
(8) 
Remark 2.2.
We make some remarks regarding the interpretation of Definition 2.6:

Note that whereas is a stochastic vector, is not. It is easy to see that is a stochastic vector.

Algebraically, Definition 2.6 constrains initial distributions to socalled pure tensors which can be written as a single tensor product rather than a linear combination of tensor products.
It is an interesting question if the timedependent solution or the stationary distribution of the full model remain in the form for . In fact, this is generally not the case.
Remark 2.3.
Caution: In most situations, cannot be written as a pure tensor for . As discussed in Proposition 4.4 we obtain a solution for a solution of and a vector of stationary solutions of if and only if we choose initial conditions for all .
2.3. Example
As an example for the construction of the infinitesimal generator from the components we present a model that will be used in Section 3 for experimental data from the inositol trisphosphate receptor (IPR).
Let the infinitesimal generator for the switching between modes be
(9) 
and the models representing the intramodal kinetics
(10) 
with initial conditions
(11) 
Then
(12) 
with .
2.4. Parameterising the model with experimental data
In order to parameterise the components of our model, the infinitesimal generators and have to be inferred from ion channel data. We assume that the original data, a sequence of current measurements recorded with a constant sampling interval has been statistically analysed so that it has the form of Figure 1. Then each measurement has been classified as open (O) or closed (C) and it has also been determined in which mode the channel was at this point in time. The Markov model is inferred from the sequence of modes whereas the models are parameterised from sequences of that are representative of a particular mode. For example, in Figure 1, the five data points between and could be used for inferring the model representing the stochastic opening and closing within mode M.
All models are parameterised with the Bayesian method developed in Siekmann et al. [28, 25]. For inferring the infinitesimal generator the likelihood has the form
(13) 
where is a sequence of observations of modes separated by the sampling interval , is the infinitesimal generator of an aggregated Markov model, is the stationary distribution of and is a column vector of ones. The matrices project to the states of the model that represent the mode observed at data point . For example,
(14) 
with the same block structure as in (1) projects to states representing mode M, the other projection matrices are defined equivalently. The likelihood for inferring the infinitesimal generators from representative segments of of open (O) and closed (C) events (Figure 1) is analogous to (13). See Siekmann et al. [28, 25] for a detailed description of the method.
3. Datadriven modelling of modal gating
Our new framework enables us to easily construct and parameterise models for modal gating following a transparent iterative process:

Model the process of mode switching by parameterising an infinitesimal generator (Figure (a)a).

From segments of representative for the opening of closing within each of the modes M, M, … (Figure (b)b) parameterise infinitesimal generators , , …

Choose initial distributions and and combine all components by calculating the infinitesimal generator of the full model (Figure 5).
Inferring and using the Bayesian approach briefly described in Section 2.4 ensures that the resulting model will be highly parsimonious because at each step a model with the optimal number of parameters for representing stochastic switching between modes, and opening and closing within modes, is determined. We demonstrate the practical implementation of this process using data collected by Wagner and Yule [31] and compare the results with our previously published model of the same data set [27].
3.1. Step (i): Statistical analysis of modal gating
Previously, we have statistically analysed mode switching exhibited in the data by Wagner and Yule [31] and found two modes, the nearly inactive mode M with a very low open probability and the highly active mode M with , see Siekmann et al. [26] for details. As illustrated in Figure 1 we have a stochastic sequence of events M and M that are separated by a sampling interval . We have results from two types of the inositol trisphosphate receptor (type I IPR and type II IPR) for various calcium concentrations (Ca), , and , at fixed concentrations of inositol trisphosphate (IP) and adenosine trisphosphate (ATP). Empirical histograms of the sojourn times in M and M for all except one data set indicate that whereas time spent in the active mode M may be represented satisfactorily by one state, accurately representing sojourn times in the nearly inactive mode M seems to require at least two states, see Figure 16 for an example. Whereas one state accounts for the support of the sojourn time density in mode M (Figure (b)b) the more widespread sojourn time density in mode M is better approximated by two states (Figure (a)a). Thus, for five of our six data sets we parameterise with the structure of (9). For one data set (type II IPR at Ca), the histograms suggests that we need a model with two states representing M and two states representing M (Figure 20). Thus, for these data we use the following infinitesimal generator:
(15) 
3.2. Step (ii): Parameterising
Fitting to a time series of M and M using our MCMC method [28, 25] is a challenging problem. Because in a time series of a few hundred thousand up to about a million data points the number of transitions between the two modes is only in the order of hundreds, the data from which the rate constants have to be inferred are effectively very limited—despite the large number of data points. An example of a convergence plot shown in Figure 11 demonstrates that values of the two rates, and , alternate. This is due to symmetry in the model structure chosen for the model where the two states and can be swapped without changing the model. This effect can be removed by considering only one mode of the multimodal posterior, in this case by considering only samples where exceeds a certain threshold. Nevertheless, even after this correction some parameters such as the rate show a high degree of uncertainty indicated by a widespread marginal distribution (Figure 11). Mean values and standard deviations of the distributions of the model parameters are summarised in Tables 1 and 2.
Type I IPR  

Ca[M]  
0.00236708 0.000201138  0.0545511 0.00294464  
0.01  0.069589 0.0510011  0.00318407 0.00203495 
0.0020581 0.000389371  0.0619096 0.00894936  
0.05  0.01873 0.0111274  0.0101597 0.00693104 
0.00311881 0.00027425  0.0564093 0.00404197  
5  0.160984 0.06707  0.00472598 0.00189248 
Type II IPR  

Ca[M]  
0.00134665 0.000250273  0.0724154 0.00817478  
0.01  0.0714618 0.0454381  0.0139203 0.00588837 
0.00435935 0.00027004  0.0284326 0.00151654  
5  0.146953 0.0424637  0.00230764 0.000712072 
0.00112896 0.000402963  0.0732717 0.034058  
0.05  0.000756359 0.000133923  0.083959 0.0184139 
0.0451628 0.0168949  0.001816 0.000335203 
3.3. Step (iii): Parameterising and
In our previous study Siekmann et al. [27] we have already fitted a model with two states to representative segments of the inactive mode M and a model with four states for representing M, see (10) for the form of the infinitesimal generators and . Interestingly, we could show that and were independent of the concentrations of IP, ATP and Ca. The parameter values from the Supplementary Material of Siekmann et al. [27] are reproduced here for convenience (Table 3).
M  
Type I IPR  11.1 1.01  3.33 0.27 
Type II IPR  4.14 6.7  3.42 0.496 
M  
1.24 0.121  0.0879 0.0117  
Type I IPR  3.32 1.64  0.0694 0.0266 
10.5 0.0771  4.01 0.0293  
1.14 0.0956  0.0958 0.00945  
Type II IPR  4.75 1.53  0.0119 0.00357 
10.1 0.0668  3.27 0.0221 
3.4. Step (iv): The generator of the full model
After the models , and have been obtained, we finally need to specify the initial distributions , and . Consistent with the experimental assumption that recording of the data was started when the channel has reached steady state we set , and where , and are the stationary distributions of , and , respectively. After all components of our model have been specified, the infinitesimal generator of the full model can be calculated using (7).
3.5. Results
Due to the problems with fitting the infinitesimal generator (9) mentioned in Section 3.2 one may ask if a simpler twostate model representing the dynamics of modal gating would be preferable. However, the ability of a threestate model to approximate the sojourn distribution of the nearly inactive mode M more accurately (Figure (a)a) was found to be crucial for obtaining a better fit of the closed time distribution in comparison with the model from [27] (Figure (c)c). That the model structure of the hierarchical model proposed here is better able to capture the properties of the entire time series data seems even more convincing because it has—unlike the original model from Siekmann et al. [27]— been built without directly fitting to the time series at any step of its construction.
In Figure 17 we show that the bimodal closed time distribution observed for some combinations of ligand concentrations arises due to the mixing of the closed time distributions within nearly inactive mode M and active mode M both of which only have one distinct maximum.
Stronger differences between both models are observed for a data set collected from type II IPR for IP, ATP and Ca. For this experimental condition, the effect of modal gating can be observed without statistical analysis (Figure (a)a). Figure 20 shows that both modes M and M exhibit a widespread distribution of sojourn times which can only approximately be captured by a fourstate model with two states each for both M and M. Whereas the new hierarchical model can approximate the empirical distributions of both modes relatively well, the model from Siekmann et al. [27] fails due to the fact that only one characteristic sojourn time for each mode can be captured by the pair of transition rates accounting for modal gating in this model (Figure 20).
Due to the failure to account for the modal sojourn time distributions, we expect the model from Siekmann et al. [27] to reproduce the kinetics observed in the data much less accurately than the new hierarchical model. In order to illustrate this we simulated both the Siekmann et al. [27] model (Figure (c)c) and the new model (Figure (b)b). The sample path was plotted in blue when the channel was in mode M whereas it was plotted in brown when the channel was in mode M. The same colours were used for colouring the data (Figure (a)a) based on the results of the statistical analysis from Siekmann et al. [26]. The comparison shows that mode switching happens much more frequently in the model from Siekmann et al. [27] than observed in the data and the proportion of relatively long sojourns is increased with respect to the data. The frequency of mode switching and the widespread distribution of sojourn lengths is better approximated by the hierarchical model. The burst of activity observed in the data starting after approximately is not captured by either model. This would require separate statistical analysis of the trace before and after , the observed change of activity.
4. Mathematical analysis of the hierarchical Markov model
In the previous section we demonstrated that the hierarchical Markov model introduced in Section 2 provides a statistically efficient framework for systematically building models for modal gating. Now, we focus on some interesting aspects of the mathematical structure of the hierarchical Markov model and show that many important properties of the infinitesimal generator of the full model can be derived from the components of the model.
In Section 4.1 we calculate the eigenvalues of . The spectrum of consists of two parts: the eigenvalues of and a subset of the eigenvalues of the blocks . But whereas the eigenvalues of the submatrices appear in the spectrum of the submatrices , they are not eigenvalues of the full model .
From a modelling point of view it is an important question if properties of the components are preserved when they are combined in the full model. In Section 4.2 we demonstrate that the sojourn time distribution in the states representing a particular mode in the model is preserved for the analogous distribution calculated for the augmented state space of .
When the initial distributions coincide with the stationary distributions, , we calculate the full timedependent solution and the stationary distribution of from the components