Imprecise Continuous-Time Markov Chains

Thomas Krak, Jasper De Bock (jasper.debock@ugent.be), Arno Siebes
Abstract

Continuous-time Markov chains are mathematical models that are used to describe the state-evolution of dynamical systems under stochastic uncertainty, and have found widespread applications in various fields. In order to make these models computationally tractable, they rely on a number of assumptions that—as is well known—may not be realistic for the domain of application; in particular, the ability to provide exact numerical parameter assessments, and the applicability of time-homogeneity and the eponymous Markov property. In this work, we extend these models to imprecise continuous-time Markov chains (ICTMC’s), which are a robust generalisation that relaxes these assumptions while remaining computationally tractable.

More technically, an ICTMC is a set of “precise” continuous-time finite-state stochastic processes, and rather than computing expected values of functions, we seek to compute lower expectations, which are tight lower bounds on the expectations that correspond to such a set of “precise” models. Note that, in contrast to e.g. Bayesian methods, all the elements of such a set are treated on equal grounds; we do not consider a distribution over this set. Together with the conjugate notion of upper expectation, the bounds that we provide can then be intuitively interpreted as providing best- and worst-case scenarios with respect to all the models in our set of stochastic processes.

The first part of this paper develops a formalism for describing continuous-time finite-state stochastic processes that does not require the aforementioned simplifying assumptions. Next, this formalism is used to characterise ICTMC’s and to investigate their properties. The concept of lower expectation is then given an alternative operator-theoretic characterisation, by means of a lower transition operator, and the properties of this operator are investigated as well. Finally, we use this lower transition operator to derive tractable algorithms (with polynomial runtime complexity w.r.t. the maximum numerical error) for computing the lower expectation of functions that depend on the state at any finite number of time points.

Keywords: Continuous-Time Markov Chain; Imprecise Probability; Model Uncertainty; Lower and Upper Expectation; Lower Transition Operator

1 Introduction

Continuous-time Markov chains are mathematical models that can describe the behaviour of dynamical systems under stochastic uncertainty. In particular, they describe the stochastic evolution of such a system through a discrete state space and over a continuous time-dimension. This class of models has found widespread applications in various fields, including queueing theory [3, 9], mathematical finance [19, 38, 41], epidemiology [18, 27, 30], system reliability analysis [8, 20, 48], and many others [51].

In order to model a problem by means of such a continuous-time Markov chain, quite a lot of assumptions need to be satisfied. For example, it is common practice to assume that the user is able to specify an exact value for all the parameters of the model. A second important assumption is the Markov condition, which states that the future behaviour of the system only depends on its current state, and not on its history. Other examples are homogeneity, which assumes that the dynamics of the system are independent of time, and some technical differentiability assumptions. As a result of all these assumptions, continuous-time Markov chains can be described by means of simple analytic expressions.

However, we would argue that in many cases, these assumptions are not realistic and are grounded more in pragmatism than in informed consideration of the underlying system. In those cases, despite the fact that such issues are to be expected in any modelling task, we think that it is best to try and avoid these assumptions. Of course, since they are typically imposed to obtain a tractable model, relaxing these assumptions while maintaining a workable model is not straightforward. Nevertheless, as we will see, this can be achieved by means of imprecise continuous-time Markov chains [42, 44] (ICTMC’s). These ICTMC’s are quite similar to continuous-time Markov chains: they model the same type of dynamical systems, and they therefore have the same fields of application. However, they do not impose the many simplifying assumptions that are traditionally adopted, and are therefore far more robust. Notably, as we will show in this work, these models allow us to relax these assumptions while remaining computationally tractable.

The following provides a motivating toy example. It is clearly too simple to be of any practical use, but it does allow us to illustrate the simplifying assumptions that are usually adopted, and to provide a basic idea of how we intend to relax them.

Example 1.1.

Consider a person periodically becoming sick, and recovering after some time. If we want to model this behaviour using a continuous-time Markov chain with a binary state space $\mathcal{X} = \{\text{healthy}, \text{sick}\}$, we need to specify a rate parameter for each of the two possible state-transitions: from healthy to sick, and from sick to healthy. Loosely speaking, such a rate parameter characterises how quickly the corresponding state-transition happens. Technically, it specifies the derivative of a transition probability. For example, for the transition from healthy to sick, the corresponding rate parameter is the derivative of $P(X_t = \text{sick} \mid X_s = \text{healthy})$ with respect to $t$, for $t = s$, where $P(X_t = \text{sick} \mid X_s = \text{healthy})$ is the probability of a person being sick at time $t$, given that he or she is healthy at time $s$. Together with the initial probabilities of a person being sick or healthy at time zero, these rate parameters uniquely characterise a continuous-time Markov chain, which can then be used to answer various probabilistic queries of interest. For instance, to compute the probability that a person will be sick in ten days, given that he or she is healthy today.
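To make this concrete, here is a minimal numerical sketch of such a query for a homogeneous two-state chain, using SciPy's matrix exponential; the two rate values are illustrative assumptions, not part of the example above.

```python
import numpy as np
from scipy.linalg import expm

# Assumed illustrative rates (per day): healthy -> sick and sick -> healthy.
a, b = 0.1, 0.3

# Transition rate matrix; state order: 0 = healthy, 1 = sick.
# Rows sum to zero and off-diagonal entries are non-negative.
Q = np.array([[-a,  a],
              [ b, -b]])

# For a homogeneous continuous-time Markov chain, the transition
# probabilities over a duration t are given by the matrix exponential.
T = expm(Q * 10.0)  # ten days

print(T[0, 1])  # P(sick in ten days | healthy today)
```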

In this example, the defining assumptions of a continuous-time Markov chain impose rather strict conditions on our model. First of all: it is necessary to provide exact values—that is, point-estimates—for the initial probabilities and for the rate parameters. If these values are incorrect, this will affect the resulting conclusions. Secondly, in order to be able to define the rate parameters, the transition probabilities of the model need to be differentiable. Thirdly, the Markov assumption implies that for any time points $s_1 < s_2 < \dots < s_n < s < t$,

$$P(X_t = \text{sick} \mid X_s, X_{s_n}, \dots, X_{s_1}) = P(X_t = \text{sick} \mid X_s),$$

that is, once we know the person’s health state at time $s$, his or her probability of being sick at time $t$ does not depend on whether he or she has ever been sick before time $s$. If it is possible to develop immunity to the disease in question, then clearly, such an assumption is not realistic. Also, it implies that the rate parameters can only depend on the current state, and not on the previous ones. Fourthly, the rate parameters are assumed to remain constant over time, which, for example, excludes the possibility of modelling seasonal variations. This fourth condition can easily be removed by considering a continuous-time Markov chain that is not homogeneous. However, in that case, the rate parameters become time-dependent, which requires us to specify (or learn from data) even more point-estimates. Similarly, although a more complex model would be able to account for history-dependence, for example by adding more states to the model, this would vastly increase the computational complexity of working with the model.

In the imprecise continuous-time Markov chains that we consider, these four conditions are relaxed in the following way. First of all, instead of providing a point-estimate for the initial probabilities and the rate parameters, we allow ourselves to specify a set of values. For the purpose of this example, we can take these sets to be intervals. The other three conditions are then dropped completely, provided that they remain compatible with these intervals. For example, the rate parameters do not need to be constant, but can vary in time in an arbitrary way, as long as they remain within their interval. Similarly, the rate parameters are also allowed to depend on the history of the process, that is, the value of the previous states. In fact, the rate parameters—as defined above—do not even need to exist, since we do not require differentiability either.

The idea of relaxing the defining assumptions of a continuous-time Markov chain is not new, and several variations on it have been developed over the years. For example, there are plenty of results to be found that deal with non-homogeneous continuous-time Markov chains [36, 1, 28]. Dropping the Markov condition is less common, but nevertheless definitely possible [24]. However, the approaches that drop this Markov assumption will typically replace it by some other, weaker assumption, instead of dropping it altogether.

A common property of all of these approaches is that they still require the parameters of the model to be specified exactly. Furthermore, since these models are typically more complex, the number of parameters that needs to be specified is a lot larger than before. Therefore, in practice, specifying such a generalised model is a lot more difficult than specifying a normal, homogeneous continuous-time Markov chain. In contrast, our approach does not introduce a large number of new parameters, but simply considers imprecise versions of the existing parameters. In particular, we provide constraints on the traditional parameters of a continuous-time Markov chain, by requiring them to belong to some specified set of candidate parameters. In this sense, our approach can be regarded as a type of sensitivity analysis on the parameters of a continuous-time Markov chain. However, instead of simply varying the parameters of a continuous-time Markov chain, as a more traditional sensitivity analysis would do, we also consider what happens when these parameters are allowed to be time- and history-dependent. Furthermore, a sensitivity analysis is typically interested in the effect of infinitesimal parameter variations, whereas we consider the effect of variations within some freely chosen set of candidate parameters.

Our approach should also not be confused with Bayesian methods [26]. Although these methods also consider parameter uncertainty, they model this uncertainty by means of a prior distribution, thereby introducing even more (hyper)parameters. Furthermore, when integrating out this prior, a Bayesian method ends up with a single ‘averaged’ stochastic process. In contrast, our approach considers set-valued parameter assessments, and does not provide a prior distribution over these sets. Every possible combination of parameter values gives rise to a different process, and we treat all of these processes on equal grounds, without averaging them out.

Among the many extensions of continuous-time Markov chains that are able to deal with complex parameter variations, the ones that resemble our approach the most are continuous-time Markov decision processes [22] and continuous-time controlled Markov chains [23]. Similar to what we do, these models vary the parameters of a continuous-time Markov chain, and allow these variations to depend on the history of the process. However, their variations are more restrictive, because the parameters are assumed to remain constant in between two transitions, whereas we allow them to vary in more arbitrary ways. The most important difference with our approach though, is that in these models, the parameter changes are not at all uncertain. In fact, the parameters can be chosen freely, and the goal is to control the evolution of the process in some optimal way, by tuning its parameters as the process evolves.

Having situated our topic within the related literature, let us now take a closer look at what we actually mean by an imprecise continuous-time Markov chain. In order to formalise this concept, we turn in this work to the field of imprecise probability [47, 43, 4]. The basic premise of this field is that, whenever it is impossible or unrealistic to specify a single probabilistic model, say $P$, it is better to instead consider a set of probabilistic models $\mathcal{P}$, and to then draw conclusions that are robust with respect to variations in this set. In the particular case of an imprecise continuous-time Markov chain, $\mathcal{P}$ will be a set of stochastic processes. Some of these processes are homogeneous continuous-time Markov chains. However, the majority of them are not. As explained in Example 1.1, we also consider other, more general stochastic processes, which are not required to be homogeneous, do not need to satisfy the Markov condition, and do not even need to be differentiable.

From a practical point of view, once a probabilistic model has been formulated and its parameters have been specified, one is typically interested in computing inferences, such as the probability of some event, or the expectation of some function. For example, as in Example 1.1, we might want to know the probability that a person will be sick in ten days, given that he or she is healthy today. Similarly, for expectations, one might for example like to know the expected utility of some financial strategy [41], the expected time until some component in a system breaks down [8] or the expected speed at which the clinical symptoms of a disease develop [18]. However, these inferences might depend crucially on the estimated values of the model parameters, and on the defining assumptions of the model. It is of interest, therefore, to study the robustness of such inferences with respect to parameter changes and relaxations of the defining assumptions of the model.

In our approach, we investigate this type of robustness as follows: we simply consider the probability or expectation of interest for each of the stochastic processes in $\mathcal{P}$, and then report the corresponding lower and upper bound. Intuitively, this can be regarded as providing best- and worst-case scenarios with respect to all the models in $\mathcal{P}$. Of course, since the stochastic processes in $\mathcal{P}$ are not required to satisfy the Markov condition, a naive optimisation method will be highly inefficient, and in most cases not even possible. However, as we will see, it is possible to develop other, more efficient methods for computing the lower and upper bounds that we are after. If the lower and upper bounds that we obtain are similar, we can conclude that the corresponding inference is robust with respect to the variations that are represented by $\mathcal{P}$. If the lower and upper bounds are substantially different, then the inference is clearly sensitive to these variations, and policy may then have to be adapted accordingly.
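As a naive illustration of these best- and worst-case bounds, the sketch below simply sweeps a grid over constant rate matrices whose entries lie in given intervals, and reports the resulting range of a transition probability. This only scans the homogeneous Markov chains in $\mathcal{P}$, so it merely hints at the bounds that the efficient methods developed later in the paper compute; the interval endpoints are assumed for illustration.

```python
import itertools
import numpy as np
from scipy.linalg import expm

# Assumed illustrative rate intervals (per day).
a_range = (0.05, 0.15)   # healthy -> sick
b_range = (0.20, 0.40)   # sick -> healthy

t = 10.0
probs = []
# Sweep a grid over the candidate constant rate matrices.
for a, b in itertools.product(np.linspace(*a_range, 20),
                              np.linspace(*b_range, 20)):
    Q = np.array([[-a, a], [b, -b]])
    probs.append(expm(Q * t)[0, 1])   # P(sick at t | healthy at 0)

print(min(probs), max(probs))   # naive lower and upper bounds
```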

Readers that are familiar with the literature on imprecise continuous-time Markov chains [42, 44] should recognise the main ideas behind the approach that we have just described, but will also notice that our presentation differs from the one that is adopted in References [42] and [44]. Indeed, the seminal work by Škulj [42], which provided the first—and so far only—theoretical study of imprecise continuous-time Markov chains, characterised these models by means of the conditional lower and upper expectations of functions that depend on a single time point. In particular, this work defined such lower and upper expectations directly, through a generalisation of the well-known differential equation characterisation of “normal” continuous-time Markov chains. This approach allowed the author to focus on developing algorithms for solving this generalised differential equation, thereby yielding methods for the efficient computation of lower and upper expectations of functions that depend on a single time point. These results have since been successfully applied to conduct a robust analysis of failure-rates and repair-times in power-grid networks [44].

However, due to its direct characterisation of lower and upper expectations, this pioneering work left open a number of questions about which sets of stochastic processes these quantities correspond to; for instance, the question of whether or not these sets also include non-Markov processes. Furthermore, due to the focus on functions that depend on a single time point, computational methods for more general functions do not follow straightforwardly from this earlier work.

In contrast, we will in this present paper address these issues directly, by characterising imprecise continuous-time Markov chains explicitly as sets of stochastic processes. There are a number of advantages to working with such sets of processes. First of all, it removes any ambiguity as to the elements of such a set, which allows us to state exactly which assumptions the model robustifies against. Secondly, this approach allows us to prove some properties of such a set’s corresponding lower and upper expectations which, in turn, allow us to derive tractable algorithms to compute lower and upper expectations of functions that depend on any finite number of time points. Finally, our approach allows us to derive algorithms that, for the special case of functions that depend on a single time point, improve upon the computational complexity of the algorithm in Reference [42].

In summary then, our aims with the present paper are threefold. First of all, to solidify the theoretical foundations of imprecise continuous-time Markov chains. Secondly, to extend and generalise existing methods for computing the corresponding lower expectations and, as a particular case, upper expectations and lower and upper probabilities. Thirdly, to provide analytical tools that can be used for future analysis of these models, and for the development of new algorithms.

Our main contributions can be summarised as follows.

  1. We provide a unified framework for describing finite-state stochastic processes in continuous time, using the formalism of full conditional probabilities. This framework covers the full range of (non-)homogeneous, (non-)Markovian, and (non-)differentiable stochastic processes.

  2. We use this framework to formalise imprecise continuous-time Markov chains: sets of stochastic processes that are in a specific sense consistent with user-specified set-valued assessments of the parameters of a continuous-time Markov chain. We conduct a thorough theoretical study of the properties of these sets of processes, and of the lower expectations that correspond to them.

  3. We introduce a lower transition operator for imprecise continuous-time Markov chains, and show that this operator satisfies convenient algebraic properties such as homogeneity, differentiability, and Markovian-like factorisation—even if the underlying set of processes does not. Furthermore, and perhaps most importantly, we show that we can use this operator to compute lower expectations of functions that depend on the state at an arbitrary but finite number of time points.

To be upfront, we would like to conclude this introduction with a cautionary remark to practitioners: the main (computational) methods that we present do not enforce the homogeneity and—in some cases also—the Markovian independence assumptions of a traditional Markov chain, but allow these to be relaxed. Therefore, if one is convinced that the (true) system that is being modelled does satisfy these properties, then it is in general quite likely that the lower and upper bounds that are reported by our methodology will be conservative. This is not to say that the methods that we present lead to vacuous or non-informative conclusions—see, e.g., Reference [39] for a successful application of our approach in telecommunication. However, tighter bounds—i.e. more informative conclusions—might then be obtainable by methodologies that do enforce these properties, provided of course that such methods are available and tractable. If one is uncertain, however, about whether these properties hold for one’s system of interest, then the methods that we present are exactly applicable, and the bounds that we derive will be tight with respect to this uncertainty.

1.1 Finding Your Way Around in This Paper

Given the substantial length of this paper, we provide in this section some suggestions as to what readers with various interests might wish to focus on. In principle, however, the paper is written to be read in chronological order, and is organised as follows.

First, after we introduce our notation in Section 2 and discuss some basic mathematical concepts that will be used throughout, Section 3 discusses some crucial algebraic notions that will allow us to describe stochastic processes. Section 4 then goes on to formally introduce stochastic processes and provides some powerful tools for describing their dynamics.

Next, once we have all our mathematical machinery in place, we shift the focus to imprecise continuous-time Markov chains. We start in Section 5 by considering the special case of—precise—continuous-time Markov chains and then, in Section 6, we finally formalise the imprecise version that is the topic of this paper, and prove some powerful theoretical properties of this model.

The next three sections discuss computational methods: Section 7 introduces a lower transition operator for imprecise continuous-time Markov chains, and in Sections 8 and 9, we use this operator to compute lower expectations of functions that depend on the state at an arbitrary but finite number of time points.

The last two sections provide some additional context: Section 10 relates and compares our results to previous work on imprecise continuous-time Markov chains, and in Section 11, we conclude the paper and provide some ideas for future work.

Finally, the proofs of our results are gathered in an appendix, where they are organised by section and ordered by chronological appearance. This appendix also contains some additional lemmas that may be of independent interest, a basic exposition of the gambling interpretation for coherent conditional probabilities, and proofs for some of the claims in our examples.

For readers with different interests, then, we recommend to focus on the following parts of the paper. First, if one is interested in the full mathematical theory, we strongly encourage the reader to go through the paper in chronological order. However, if one has a more passing interest in the mathematical formalism, and is content with a more conceptual understanding of what we mean by an imprecise continuous-time Markov chain, then he or she may wish to start in either Section 5 or 6. Finally, readers who are mainly interested in how to compute lower expectations for ICTMC’s might want to focus on Sections 8 and 9, referring back to Section 6.3—for details about the specific types of ICTMC’s that we consider—and Section 7.2—for the requisite details about lower transition rate operators.

Of course, skipping parts of the paper may introduce some ambiguity about the meaning and definition of the many technical concepts that we refer to throughout. In order to alleviate this as much as possible, Table 1 provides a short glossary of the most important notation and terminology.

Symbol or term | Explanation | Section
$u$, $v$ | finite, ordered sequence of time points | 2.1
$\mathcal{U}$ | set of all finite, ordered sequences of time points | 2.1
$\mathcal{U}_{<t}$ | set of all finite, ordered sequences of time points before time $t$ | 2.1
$\mathcal{U}_{[s,t]}$ | set of all finite, ordered sequences of time points that partition the time interval $[s,t]$ | 2.1
$\mathcal{X}$ | finite state space of a stochastic process | 2.2
$\mathcal{X}_u$ | joint state space at time points $u$ | 2.2
$\mathcal{L}(\mathcal{X})$ | set of all functions $f : \mathcal{X} \to \mathbb{R}$ | 2.2
$\mathcal{L}(\mathcal{X}_u)$ | set of all functions $f : \mathcal{X}_u \to \mathbb{R}$ | 2.2
$\|\cdot\|$ | maximum norm of a function, operator, or set of matrices | 2.3
$\mathcal{T}$ | transition matrix system: a family of transition matrices satisfying certain properties | 3.3
$\mathcal{T}_Q$ | exponential transition matrix system that corresponds to the transition rate matrix $Q$ | 3.5
$\mathcal{T}_P$ | transition matrix system that corresponds to the stochastic process $P$ | 4.5
$\mathcal{T}_{[s,t]}$ | restriction of the transition matrix system $\mathcal{T}$ to the closed time interval $[s,t]$ | 3.3
$\otimes$ | concatenation operator for two restricted transition matrix systems | 3.6
well-behaved | loosely speaking, this means “with bounded rate-of-change”; see Definitions 3.4 and 4.4 | 3.4, 4.4
$\mathbb{P}$ | set of all continuous-time stochastic processes | 4.3
$\mathbb{P}^{\mathrm{W}}$ | set of all well-behaved continuous-time stochastic processes | 4.4
$\mathbb{P}^{\mathrm{WM}}$ | set of all well-behaved continuous-time Markov chains | 5.1
$\mathbb{P}^{\mathrm{WHM}}$ | set of all well-behaved, homogeneous continuous-time Markov chains | 5.2
$\partial^+ T$, $\partial^- T$, $\partial T$ | outer partial derivatives of the transition matrix of a stochastic process, for time $t$ and history $x_u$ | 4.8
$\mathcal{R}$ | set of all transition rate matrices | 3.2
$\mathcal{Q}$ | set of transition rate matrices | 6.1
$\mathcal{M}$ | initial set of probability mass functions | 6.1
$\mathbb{P}^{\mathrm{W}}_{\mathcal{Q},\mathcal{M}}$, $\mathbb{P}^{\mathrm{WM}}_{\mathcal{Q},\mathcal{M}}$, $\mathbb{P}^{\mathrm{WHM}}_{\mathcal{Q},\mathcal{M}}$ | three different types of ICTMC’s; also see Definition 6.3 | 6.4
$\underline{E}^{\mathrm{W}}$, $\underline{E}^{\mathrm{WM}}$, $\underline{E}^{\mathrm{WHM}}$ | lower expectation operators of the three types of ICTMC’s | 6.5
$\underline{Q}$ | lower transition rate operator | 7.2
$\underline{T}_s^t$ | lower transition operator from time point $s$ to $t$, corresponding to some given $\underline{Q}$ | 7.4
$\underline{\mathcal{T}}_{\underline{Q}}$ | family of lower transition operators corresponding to some given $\underline{Q}$ | 7.5

Table 1: Glossary of important notation and terminology.

2 Preliminaries

We denote the reals as $\mathbb{R}$, the non-negative reals as $\mathbb{R}_{\geq 0}$, the positive reals as $\mathbb{R}_{>0}$ and the negative reals as $\mathbb{R}_{<0}$. For any $c \in \mathbb{R}$, $\mathbb{R}_{\geq c}$, $\mathbb{R}_{>c}$ and $\mathbb{R}_{<c}$ have a similar meaning. The natural numbers are denoted by $\mathbb{N}$, and we also define $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. The rationals will be denoted by $\mathbb{Q}$.

Infinite sequences of quantities will be denoted $\{y_i\}_{i \in \mathbb{N}}$, possibly with limit statements of the form $y_i \to y$, which should be interpreted as $\lim_{i \to +\infty} y_i = y$. If the elements of such a sequence belong to a space that is endowed with an ordering relation, we may write $y_i \downarrow y$ or $y_i \uparrow y$ if the limit is approached from above or below, respectively.

When working with suprema and infima, we will sometimes use the shorthand notation $\sup_x f(x) > c$ to mean that there exists an $x$ such that $f(x) > c$, and similarly for infima.

For any set $\mathcal{Y}$ and any subset $A$ of $\mathcal{Y}$, we use $\mathbb{I}_A$ to denote the indicator of $A$, defined for all $y \in \mathcal{Y}$ by $\mathbb{I}_A(y) := 1$ if $y \in A$ and $\mathbb{I}_A(y) := 0$, otherwise. If $A$ is a singleton $A = \{y'\}$, we may instead write $\mathbb{I}_{y'}$.

2.1 Sequences of Time Points

We will make extensive use of finite sequences of time points. Such a sequence is of the form $u = t_0, t_1, \dots, t_n$, with $n \in \mathbb{N}_0$ and, for all $i \in \{0, \dots, n\}$, $t_i \in \mathbb{R}_{\geq 0}$. These sequences are taken to be ordered, meaning that for all $i, j \in \{0, \dots, n\}$ with $i \leq j$, it holds that $t_i \leq t_j$. Let $\mathcal{U}$ denote the set of all such finite sequences that are non-degenerate, meaning that $t_i < t_{i+1}$ for all $i \in \mathbb{N}_0$ such that $i < n$. Note that this does not prohibit empty sequences. We therefore also consider the empty sequence $\emptyset$, which we take to be an element of $\mathcal{U}$.

For any finite sequence of time points $u \in \mathcal{U}$, let $\max u$ denote its largest element. For any time point $t$, we then write $u < t$ if $\max u < t$, and similarly for other inequalities. If $u = \emptyset$, then $u < t$ is taken to be trivially true, regardless of the value of $t$. We use $\mathcal{U}_{<t}$ to denote the subset of $\mathcal{U}$ that consists of those sequences $u$ for which $u < t$, and, again, similarly for other inequalities.

Since a sequence $u \in \mathcal{U}$ is a subset of $\mathbb{R}_{\geq 0}$, we can use set-theoretic notation to operate on such sequences. The result of such operations is again taken to be ordered. For example, for any $u, v \in \mathcal{U}$, we use $u \cup v$ to denote the ordered union of $u$ and $v$. Similarly, for any $u \in \mathcal{U}$ and any $t \in \mathbb{R}_{\geq 0}$ with $u < t$, we use $u \cup \{t\}$ to denote the sequence $t_0, \dots, t_n, t$.

As a special case, we consider finite sequences of time points that partition a given time interval $[s, t]$, with $s, t \in \mathbb{R}_{\geq 0}$ such that $s \leq t$. Such a sequence is taken to include the end-points of this interval. Thus, the sequence is of the form $u = t_0, t_1, \dots, t_n$, with $t_0 = s$ and $t_n = t$. We denote the set of all such sequences by $\mathcal{U}_{[s,t]}$. Since these sequences are non-degenerate, it follows that $\mathcal{U}_{[s,s]}$ consists of a single sequence $u = s$. For any $u \in \mathcal{U}_{[s,t]}$ with $s < t$, we also define the sequential differences $\Delta_i := t_i - t_{i-1}$, for all $i \in \{1, \dots, n\}$. We then use $\sigma(u) := \max\{\Delta_i : i \in \{1, \dots, n\}\}$ to denote the maximum such difference.
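As a small illustration, the helper below computes the sequential differences of a partition and their maximum; the function name is ours.

```python
def sequential_differences(u):
    """For an ordered partition u = [t0, ..., tn] of [t0, tn], return the
    differences t_i - t_{i-1} and the maximum such difference."""
    assert all(s < t for s, t in zip(u, u[1:])), "u must be strictly increasing"
    diffs = [t - s for s, t in zip(u, u[1:])]
    return diffs, max(diffs)

print(sequential_differences([0.0, 0.25, 0.5, 1.0]))  # ([0.25, 0.25, 0.5], 0.5)
```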

2.2 States and Functions

Throughout this work, we will consider some fixed finite state space $\mathcal{X}$. A generic element of this set is called a state and will be denoted by $x$. Without loss of generality, we can assume the states to be ordered, and we can then identify $\mathcal{X}$ with the set $\{1, \dots, |\mathcal{X}|\}$, where $|\mathcal{X}|$ is the number of states in $\mathcal{X}$.

We use $\mathcal{L}(\mathcal{X})$ to denote the set of all real-valued functions on $\mathcal{X}$. Because $\mathcal{X}$ is finite, a function $f \in \mathcal{L}(\mathcal{X})$ can be interpreted as a vector in $\mathbb{R}^{|\mathcal{X}|}$. Hence, we will in the sequel use the terms ‘function’ and ‘vector’ interchangeably when referring to elements of $\mathcal{L}(\mathcal{X})$.

We will often find it convenient to explicitly indicate the time point that is being considered, in which case we write $\mathcal{X}_t$ to denote the state space at time $t$, and $x_t \in \mathcal{X}_t$ to denote a state at time $t$. This notational trick also allows us to introduce some notation for the joint state at (multiple) explicit time points. For any finite sequence of time points $u \in \mathcal{U}$ such that $u \neq \emptyset$, we use

$$\mathcal{X}_u := \times_{t \in u} \mathcal{X}_t$$

to denote the joint state space at the time points in $u$. A joint state $x_u \in \mathcal{X}_u$ is a tuple that specifies a state $x_t$ for every time point $t$ in $u$. Note that if $u$ only contains a single time point $t$, then we simply have that $\mathcal{X}_u = \mathcal{X}_t$. If $u = \emptyset$, then $x_u$ is a “dummy” placeholder, which typically leads to statements that are vacuously true. For any $u \in \mathcal{U}$, we use $\mathcal{L}(\mathcal{X}_u)$ to denote the set of all real-valued functions on $\mathcal{X}_u$.

2.3 Norms and Operators

For any $u \in \mathcal{U}$ and any $f \in \mathcal{L}(\mathcal{X}_u)$, let the norm $\|f\|$ be defined as

$$\|f\| := \max\{|f(x_u)| : x_u \in \mathcal{X}_u\}.$$

As a special case, we then have for any $f \in \mathcal{L}(\mathcal{X})$ that $\|f\| = \max\{|f(x)| : x \in \mathcal{X}\}$.

Linear maps from $\mathcal{L}(\mathcal{X})$ to $\mathcal{L}(\mathcal{X})$ will play an important role in this work, and will be represented by matrices. Because the state space is fixed throughout, we will always consider square, $|\mathcal{X}| \times |\mathcal{X}|$, real-valued matrices.

As such, we will for the sake of brevity simply refer to them as ‘matrices’. If $M$ is such a matrix, we will index its elements as $M(x, y)$ for all $x, y \in \mathcal{X}$, where the indexing is understood to be row-major. Furthermore, $M(x, \cdot)$ will denote the $x$-th row of $M$, and $M(\cdot, y)$ will denote its $y$-th column. The symbol $I$ will be reserved throughout to refer to the identity matrix.

Because we will also be interested in non-linear maps, we consider as a more general case operators that are non-negatively homogeneous. An operator $A$ from $\mathcal{L}(\mathcal{X})$ to $\mathcal{L}(\mathcal{X})$ is non-negatively homogeneous if $A(\lambda f) = \lambda A f$ for all $\lambda \in \mathbb{R}_{\geq 0}$ and all $f \in \mathcal{L}(\mathcal{X})$. Note that this includes matrices as a special case.

For any non-negatively homogeneous operator $A$ from $\mathcal{L}(\mathcal{X})$ to $\mathcal{L}(\mathcal{X})$, we consider the induced operator norm

$$\|A\| := \sup\{\|A f\| : f \in \mathcal{L}(\mathcal{X}), \|f\| = 1\}. \quad (1)$$

If $A$ is a matrix, it is easily verified that then

$$\|A\| = \max_{x \in \mathcal{X}} \sum_{y \in \mathcal{X}} |A(x, y)|. \quad (2)$$

Finally, for any set $\mathcal{A}$ of matrices, we define $\|\mathcal{A}\| := \sup\{\|A\| : A \in \mathcal{A}\}$.

These norms satisfy the following properties; Reference [13] provides a proof for the non-trivial ones.

Proposition 2.1.

For all $f, g \in \mathcal{L}(\mathcal{X})$, all $A, B$ from $\mathcal{L}(\mathcal{X})$ to $\mathcal{L}(\mathcal{X})$ that are non-negatively homogeneous, all matrices $C, D$ and all $\lambda \in \mathbb{R}$, we have that

  1. $\|f + g\| \leq \|f\| + \|g\|$ and $\|\lambda f\| = |\lambda| \, \|f\|$;

  2. $\|A f\| \leq \|A\| \, \|f\|$;

  3. $\|A + B\| \leq \|A\| + \|B\|$ and $\|A B\| \leq \|A\| \, \|B\|$;

  4. $\|C + D\| \leq \|C\| + \|D\|$ and $\|C D\| \leq \|C\| \, \|D\|$.

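To make Equations (1) and (2) concrete, the sketch below computes the induced norm of a matrix in both ways: from the closed-form maximum absolute row sum, and directly from the definition, using the fact that for a matrix the supremum in Equation (1) is attained at a sign vector.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# Equation (2): the induced norm of a matrix is its maximum absolute row sum.
norm_closed_form = np.max(np.sum(np.abs(A), axis=1))

# Equation (1): sup of ||A f|| over all f with ||f|| = 1. For a matrix,
# the supremum is attained at some f in {-1, +1}^n, so we enumerate those.
norm_from_definition = max(np.max(np.abs(A @ np.array(f)))
                           for f in itertools.product([-1.0, 1.0], repeat=3))

print(norm_closed_form, norm_from_definition)  # the two values coincide
```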
3 Transition Matrix Systems

We provide in this section some definitions that will later be useful for characterising continuous-time Markov chains. Because we have not yet formally introduced the concept of a continuous-time Markov chain, for now, the definitions below can be taken to be purely algebraic constructs. Nevertheless, whenever possible, we will of course attempt to provide them with some intuition.

For the purposes of this section, it suffices to say that a continuous-time Markov chain is a process which at each time $t \in \mathbb{R}_{\geq 0}$ is in some state $x \in \mathcal{X}$. As time elapses, the process moves through the state space $\mathcal{X}$ in some stochastic fashion. We here define tools with which this stochastic behaviour can be conveniently described.

3.1 Transition Matrices and Transition Rate Matrices

A transition matrix is a matrix $T$ that is row-stochastic, meaning that, for each $x \in \mathcal{X}$, the row $T(x, \cdot)$ is a probability mass function on $\mathcal{X}$.

Definition 3.1 (Transition Matrix).

A real-valued matrix $T$ is said to be a transition matrix if

  1. $T(x, y) \geq 0$ for all $x, y \in \mathcal{X}$;

  2. $\sum_{y \in \mathcal{X}} T(x, y) = 1$ for all $x \in \mathcal{X}$.

We will use $\mathbb{T}$ to denote the set of all transition matrices.

Proposition.

For any two transition matrices $T_1$ and $T_2$, their composition $T_1 T_2$ is also a transition matrix.

The interpretation in the context of Markov chains goes as follows. The elements $T(x, y)$ of a transition matrix $T$ describe the probability of the Markov chain ending up in state $y$ at the next time point, given that it is currently in state $x$. In other words, the row $T(x, \cdot)$ contains the state-transition probabilities, conditional on currently being in state $x$. We will make this connection more explicit when we formalise continuous-time stochastic processes in Section 4.

For now, we note that in a continuous-time setting, this notion of “next” time point is less obvious than in a discrete-time setting, because the state-transition probabilities are then continuously dependent on the evolution of time. To capture this aspect, the notion of transition rate matrices [33] is used.

Definition 3.2 (Transition Rate Matrix).

A real-valued matrix $Q$ is said to be a transition rate matrix, or sometimes simply rate matrix, if

  1. $\sum_{y \in \mathcal{X}} Q(x, y) = 0$ for all $x \in \mathcal{X}$;

  2. $Q(x, y) \geq 0$ for all $x, y \in \mathcal{X}$ such that $x \neq y$.

We use $\mathcal{R}$ to denote the set of all transition rate matrices.
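For concreteness, Definitions 3.1 and 3.2 translate directly into numerical checks; the function names below are ours.

```python
import numpy as np

def is_transition_matrix(T, tol=1e-12):
    """Every row is a probability mass function on the state space."""
    T = np.asarray(T, dtype=float)
    return bool(np.all(T >= -tol) and np.allclose(T.sum(axis=1), 1.0, atol=tol))

def is_rate_matrix(Q, tol=1e-12):
    """Rows sum to zero and all off-diagonal entries are non-negative."""
    Q = np.asarray(Q, dtype=float)
    off_diagonal = Q - np.diag(np.diag(Q))
    return bool(np.all(off_diagonal >= -tol)
                and np.allclose(Q.sum(axis=1), 0.0, atol=tol))

print(is_transition_matrix([[0.9, 0.1], [0.3, 0.7]]))  # True
print(is_rate_matrix([[-0.1, 0.1], [0.3, -0.3]]))      # True
```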

The connection between transition matrices and rate matrices is perhaps best illustrated as follows. Suppose that at some time point $t$, we want to describe for any state $x$ the probability of ending up in state $y$ at some time $s \geq t$. Let $T_t^s$ denote the transition matrix that contains all these probabilities. Note first of all that it is reasonable to assume that, if time does not evolve, then the system should not change. That is, if we are in state $x$ at time $t$, then the probability of still being in state $x$ at time $t$, should be one. Hence, we should have $T_t^t = I$, with $I$ the identity matrix.

A rate matrix $Q$ is then used to describe the transition matrix after a small period of time, $\Delta$, has elapsed. Specifically, the scaled matrix $\Delta Q$ serves as a linear approximation of the change from $T_t^t = I$ to $T_t^{t+\Delta}$. The following proposition states that, for small enough $\Delta$, this linear change still results in a transition matrix.

Proposition.

Consider any transition rate matrix $Q$, and any $\Delta \in \mathbb{R}_{\geq 0}$ such that $\Delta \|Q\| \leq 2$. Then the matrix $I + \Delta Q$ is a transition matrix.

This also explains the terminology used; a rate matrix describes the “rate of change” of a (continuously) time-dependent transition matrix over a small period of time.

Of course, this notion can also be reversed; given a transition matrix $T$, what is the change that it underwent compared to $I$? The following proposition states that such a change can always be described using a rate matrix.

Proposition.

Consider any transition matrix $T$, and any $\Delta \in \mathbb{R}_{>0}$. Then, the matrix $\frac{1}{\Delta}(T - I)$ is a transition rate matrix.

Note that this proposition essentially states that the finite-difference $\frac{1}{\Delta}(T_t^{t+\Delta} - I)$ is a rate matrix. Intuitively, if we now take the limit as this $\Delta$ goes to zero, this states that the derivative of a continuously time-dependent transition matrix is given by some rate matrix $Q$—assuming that this limit exists, of course. We will make this connection more explicit in Section 4.
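Both directions are easy to verify numerically; the rate matrix and the step size below are assumed for illustration.

```python
import numpy as np

Q = np.array([[-0.4, 0.4],
              [ 0.2, -0.2]])   # a rate matrix
delta = 0.5                    # small enough: delta * ||Q|| <= 2

# Forward: I + delta * Q is a transition matrix.
T = np.eye(2) + delta * Q
print(T.sum(axis=1), bool((T >= 0).all()))   # rows sum to 1, entries >= 0

# Reverse: the finite difference (T - I) / delta is again a rate matrix.
R = (T - np.eye(2)) / delta
print(R.sum(axis=1))                         # rows sum to 0
```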

We next introduce a function that is often seen in the context of continuous-time Markov chains: the matrix exponential [45] $e^{Qt}$, with $Q$ a rate matrix and $t \in \mathbb{R}_{\geq 0}$. There are various equivalent ways in which such a matrix exponential can be defined. We refer to [45] for some examples, and will consider some specific definitions later on in this work. For now, we restrict ourselves to stating the following well-known result.

Proposition 3.1.

[33, Theorem 2.1.2] Consider a rate matrix $Q$ and any $t \in \mathbb{R}_{\geq 0}$. Then $e^{Qt}$ is a transition matrix.
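A quick numerical check of this result, using SciPy's matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.4, 0.4],
              [ 0.2, -0.2]])

T = expm(Q * 3.0)            # the matrix exponential e^{Qt} for t = 3
print(T.sum(axis=1))         # every row sums to 1
print(bool((T >= 0).all()))  # all entries are non-negative
```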

We conclude this section with some comments about sets of rate matrices. First, note that the set of all rate matrices, $\mathcal{R}$, is closed under finite sums and multiplication with non-negative scalars. Consider now any set $\mathcal{Q} \subseteq \mathcal{R}$ of rate matrices. Then $\mathcal{Q}$ is said to be non-empty if $\mathcal{Q} \neq \emptyset$, and is said to be bounded if $\|\mathcal{Q}\| < +\infty$. The following proposition provides a simple alternative characterisation of boundedness.

Proposition.

A set $\mathcal{Q} \subseteq \mathcal{R}$ of rate matrices is bounded if and only if

$$\sup\{|Q(x, x)| : Q \in \mathcal{Q}, x \in \mathcal{X}\} < +\infty. \quad (3)$$

3.2 Transition Matrix Systems that are Well-Behaved

In the previous section, we used the notation $T_t^s$ to refer to a transition matrix that contains the probabilities of moving from a state $x_t$ at time $t$, to a state $x_s$ at time $s$. We now consider families of these transition matrices. Such a family specifies a transition matrix $T_t^s$ for every $t, s \in \mathbb{R}_{\geq 0}$ such that $t \leq s$.

We already explained in the previous section that it is reasonable to assume that $T_t^t = I$. If the transition matrices of a family satisfy this property, and if they furthermore satisfy the semi-group property—see Equation (4) below—we call this family a transition matrix system. We will use $\mathscr{T}$ to refer to the set of all transition matrix systems.

Definition 3.3 (Transition Matrix System).

A transition matrix system $\mathcal{T}$ is a family of transition matrices $T_t^s$, defined for all $t, s \in \mathbb{R}_{\geq 0}$ with $t \leq s$, such that for all $t, r, s \in \mathbb{R}_{\geq 0}$ with $t \leq r \leq s$, it holds that

$$T_t^s = T_t^r T_r^s \quad (4)$$

and, for all $t \in \mathbb{R}_{\geq 0}$, $T_t^t = I$.

It will turn out that there is a strong connection between transition matrix systems and continuous-time Markov chains. We will return to this in Section 5.

In the previous section, we have seen that for any transition matrix $T$ and any $\Delta \in \mathbb{R}_{>0}$, the matrix $\frac{1}{\Delta}(T - I)$ is a rate matrix, and therefore, in particular, that the finite difference $\frac{1}{\Delta}(T_t^{t+\Delta} - I)$ is a rate matrix. We here note that this is also the case for the term $\frac{1}{\Delta}(T_{t-\Delta}^t - I)$ whenever $t - \Delta \geq 0$.

We now consider this property in the context of a transition matrix system $\mathcal{T}$. For all $t \in \mathbb{R}_{\geq 0}$ and all $\Delta \in \mathbb{R}_{>0}$, such a transition matrix system specifies a transition matrix $T_t^{t+\Delta}$ and—if $t - \Delta \geq 0$—a transition matrix $T_{t-\Delta}^t$. We now consider the behaviour of these matrices for various values of $\Delta$. In particular, we look at what happens to these finite differences if we take $\Delta$ to be increasingly smaller.

For each $\Delta$, due to the property that we have just recalled, there will be a rate matrix that corresponds to these finite differences. If the norm of these rate matrices never diverges to $+\infty$ as we take $\Delta$ to zero, we call the family well-behaved.

Definition 3.4 (Well-Behaved Transition Matrix System).

A transition matrix system $\mathcal{T}$ is called well-behaved if, for all $t \in \mathbb{R}_{\geq 0}$,

$$\limsup_{\Delta \to 0^+} \frac{1}{\Delta} \left\| T_t^{t+\Delta} - I \right\| < +\infty \quad \text{and, if } t > 0, \quad \limsup_{\Delta \to 0^+} \frac{1}{\Delta} \left\| T_{t-\Delta}^{t} - I \right\| < +\infty. \quad (5)$$

Observe that this notion of well-behavedness does not imply differentiability; the limit $\lim_{\Delta \to 0^+} \frac{1}{\Delta}(T_t^{t+\Delta} - I)$ need not exist. Rather, it implies that the rate of change of the transition matrices in $\mathcal{T}$ is bounded at all times. In this sense, this notion of well-behavedness is similar to a kind of local Lipschitz continuity. The locality stems from the fact that, although the rate of change must be bounded for each $t$, Equation (5) does not impose that it must be uniformly bounded (at all $t$) by a single “Lipschitz constant”. That said, we do not stress this connection any further, because we will shortly consider a more involved notion of well-behavedness for which this connection is less immediate; see Section 4.3.

We finally consider an important special type of transition matrix systems. We have seen in the previous section that for any $Q \in \mathcal{R}$ and any $t \in \mathbb{R}_{\geq 0}$, the matrix exponential $e^{Qt}$ is a transition matrix. We here consider for any $Q \in \mathcal{R}$ the family of transition matrices that is generated by such matrix exponentials.

Definition 3.5.

For any rate matrix $Q \in \mathcal{R}$, we use $\mathcal{T}_Q$ to denote the family of transition matrices that is defined by

$$T_t^s := e^{Q(s - t)} \quad \text{for all } t, s \in \mathbb{R}_{\geq 0} \text{ with } t \leq s.$$

We call this family the exponential transition matrix system corresponding to $Q$.

Proposition.

For any $Q \in \mathcal{R}$, the family $\mathcal{T}_Q$ is a well-behaved transition matrix system.

This exponential transition matrix system corresponding to a rate matrix $Q$ will turn out to play a large role in the context of continuous-time Markov chains. We return to this in Section 5.
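The defining properties of such an exponential system are easy to check numerically; the sketch below verifies that $T_t^t = I$ and that the semi-group property of Equation (4) holds, for an assumed rate matrix and assumed time points.

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.4, 0.4],
              [ 0.2, -0.2]])

def T(t, s):
    """Transition matrix of the exponential system: e^{Q(s - t)}."""
    return expm(Q * (s - t))

t, r, s = 1.0, 2.5, 4.0
print(np.allclose(T(t, t), np.eye(2)))          # T_t^t = I
print(np.allclose(T(t, s), T(t, r) @ T(r, s)))  # semi-group property (4)
```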

3.3 Restricted Transition Matrix Systems

We finally consider yet another construct that will be useful later: the restriction of a transition matrix system to a closed interval in the time-line $\mathbb{R}_{\geq 0}$.

By a closed interval $I$, we here mean a non-empty closed subset $I \subseteq \mathbb{R}_{\geq 0}$ that is connected, in the sense that for any $t, s \in I$ such that $t \leq s$, and any $r \in \mathbb{R}_{\geq 0}$ such that $t \leq r \leq s$, it holds that $r \in I$. Note that for any $t, s \in \mathbb{R}_{\geq 0}$ with $t \leq s$, $[t, s]$ is such a closed interval.

For any transition matrix system $\mathcal{T}$ and any such closed interval $I$, we use $\mathcal{T}_I$ to denote the restriction of $\mathcal{T}$ to $I$. Such a restriction is a family of transition matrices $T_t^s$ that is defined for all $t, s \in I$ such that $t \leq s$. We call such a family a restricted transition matrix system on $I$. The set of all restricted transition matrix systems on $I$ is denoted by $\mathscr{T}_I$.

Proposition.

Consider any closed interval $I$, and let $\mathcal{T}_I$ be a family of transition matrices $T_t^s$ that is defined for all $t, s \in I$ with $t \leq s$. Then $\mathcal{T}_I$ is a restricted transition matrix system on $I$ if and only if, for all $t, r, s \in I$ with $t \leq r \leq s$, it holds that $T_t^s = T_t^r T_r^s$ and $T_t^t = I$.

We call a restricted transition matrix system on $I$ well-behaved if it is the restriction to $I$ of a well-behaved transition matrix system.

Proposition.

Consider any closed interval $I$, and let $\mathcal{T}_I$ be a restricted transition matrix system on $I$. Then $\mathcal{T}_I$ is well-behaved if and only if, for all $t \in I$,

$$\limsup_{\Delta \to 0^+} \frac{1}{\Delta} \left\| T_t^{t+\Delta} - I \right\| < +\infty \quad \text{and} \quad \limsup_{\Delta \to 0^+} \frac{1}{\Delta} \left\| T_{t-\Delta}^{t} - I \right\| < +\infty, \quad (6)$$

where the first condition is only imposed when $t + \Delta \in I$, and the second only when $t - \Delta \in I$.

Now, because these restricted transition matrix systems are only defined on some given closed interval, it will be useful to define a concatenation operator between two such systems.

Definition 3.6 (Concatenation Operator).

For any two closed intervals $I_1, I_2$ such that $\max I_1 = \min I_2$, and any two restricted transition matrix systems $\mathcal{T}_{I_1}$ and $\mathcal{T}_{I_2}$, the concatenation of $\mathcal{T}_{I_1}$ and $\mathcal{T}_{I_2}$ is denoted by $\mathcal{T}_{I_1} \otimes \mathcal{T}_{I_2}$, and defined as the family of transition matrices that is given by

$$T_t^s := \begin{cases} {}^{1}T_t^s & \text{if } t, s \in I_1, \\ {}^{2}T_t^s & \text{if } t, s \in I_2, \\ {}^{1}T_t^{r} \, {}^{2}T_{r}^s & \text{if } t \in I_1 \text{ and } s \in I_2, \end{cases}$$

where $r := \max I_1 = \min I_2$, and ${}^{1}T$ and ${}^{2}T$ denote the transition matrices corresponding to $\mathcal{T}_{I_1}$ and $\mathcal{T}_{I_2}$, respectively.

Proposition.

Consider two closed intervals $I_1, I_2$ such that $\max I_1 = \min I_2$, and any two restricted transition matrix systems $\mathcal{T}_{I_1}$ and $\mathcal{T}_{I_2}$. Then their concatenation $\mathcal{T}_{I_1} \otimes \mathcal{T}_{I_2}$ is a restricted transition matrix system on $I_1 \cup I_2$. Furthermore, if both $\mathcal{T}_{I_1}$ and $\mathcal{T}_{I_2}$ are well-behaved, then $\mathcal{T}_{I_1} \otimes \mathcal{T}_{I_2}$ is also well-behaved.

Example 3.1.

Consider any two rate matrices $Q_1, Q_2 \in \mathcal{R}$ such that $Q_1 \neq Q_2$, and let $\mathcal{T}_{Q_1}$ and $\mathcal{T}_{Q_2}$ be their exponential transition matrix systems, which, as we know from Proposition 3.2, are well-behaved. Now choose any $r \in \mathbb{R}_{>0}$ and define

$$\mathcal{T} := (\mathcal{T}_{Q_1})_{[0, r]} \otimes (\mathcal{T}_{Q_2})_{[r, +\infty)}.$$

It then follows from Proposition 3.3 that $\mathcal{T}$ is a well-behaved transition matrix system. Furthermore, for any $t, s \in \mathbb{R}_{\geq 0}$ such that $t \leq r \leq s$, the transition matrix $T_t^s$ that corresponds to $\mathcal{T}$ is given by $T_t^s = e^{Q_1(r - t)} e^{Q_2(s - r)}$.
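A numerical sketch of this concatenated system, with assumed rate matrices and an assumed switch-point $r$:

```python
import numpy as np
from scipy.linalg import expm

Q1 = np.array([[-0.4, 0.4], [0.2, -0.2]])
Q2 = np.array([[-1.0, 1.0], [0.5, -0.5]])
r = 2.0   # the system follows Q1 on [0, r] and Q2 on [r, infinity)

def T(t, s):
    """Transition matrix of the concatenated system, for 0 <= t <= s."""
    if s <= r:
        return expm(Q1 * (s - t))
    if t >= r:
        return expm(Q2 * (s - t))
    return expm(Q1 * (r - t)) @ expm(Q2 * (s - r))

# The concatenation still satisfies the semi-group property of Equation (4).
print(np.allclose(T(1.0, 4.0), T(1.0, 2.5) @ T(2.5, 4.0)))
```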

We also introduce a metric between restricted transition matrix systems that are defined on the same interval $I$. For any two such restricted transition matrix systems $\mathcal{T}_I$ and $\mathcal{S}_I$, we let

$$d(\mathcal{T}_I, \mathcal{S}_I) := \sup\left\{ \|T_t^s - S_t^s\| : t, s \in I, \, t \leq s \right\}, \quad (7)$$

where, for all $t, s \in I$ with $t \leq s$, it is understood that $T_t^s$ corresponds to $\mathcal{T}_I$ and $S_t^s$ to $\mathcal{S}_I$. This metric allows us to state the following result.

Proposition.

Consider any closed interval $I$ and let $d$ be the metric that is defined in Equation (7). The metric space $(\mathscr{T}_I, d)$ is then complete.

Note that this result includes as a special case that the set of all (unrestricted) transition matrix systems is complete. The following example illustrates how this result can be used.

Example 3.2.

Consider some positive constant $c \in \mathbb{R}_{>0}$ and let $\{Q_i\}_{i \in \mathbb{N}}$ be a sequence of rate matrices such that, for all $i \in \mathbb{N}$, $\|Q_i\| \leq c$. We can then construct the following sequence of transition matrix systems: for $i = 1$, we let $\mathcal{T}_1 := \mathcal{T}_{Q_1}$ and, for all $i > 1$, we let $\mathcal{T}_i$ be obtained by concatenating restrictions of the exponential transition matrix systems of $Q_1, \dots, Q_i$ over consecutive subintervals.

The resulting sequence $\{\mathcal{T}_i\}_{i \in \mathbb{N}}$ is then clearly a subset of $\mathscr{T}$ and, because of Propositions 3.2 and 3.3, every transition matrix system in this sequence is well-behaved. Furthermore, as is proved in Appendix I, $\{\mathcal{T}_i\}_{i \in \mathbb{N}}$ is a Cauchy sequence, which basically means that its elements become arbitrarily close to each other as the sequence progresses.

The reason why this is of interest to us is because in a complete metric space, every Cauchy sequence converges to a limit that belongs to the same space. Hence, since $\{\mathcal{T}_i\}_{i \in \mathbb{N}}$ is Cauchy, Proposition 3.3 allows us to infer that it converges to a limit in $\mathscr{T}$.

As this example illustrates, Proposition 3.3 allows us to (a) establish the existence of limits of sequences of (restricted) transition matrix systems and (b) prove that these limits are restricted transition matrix systems themselves. In order to make this concept of a limit of transition matrix systems less abstract, we now provide, for a particular case of the sequence in Example 3.2, closed-form expressions for some of the transition matrices that correspond to its limit.

Example 3.3.

Let $Q_1, Q_2 \in \mathcal{R}$ be two commuting rate matrices, meaning that $Q_1 Q_2 = Q_2 Q_1$. For example, let $Q_1$ be an arbitrary rate matrix and let $Q_2 := \lambda Q_1$, with $\lambda \in \mathbb{R}_{\geq 0}$.

Now let $\{Q_i\}_{i \in \mathbb{N}}$ be defined by $Q_i := Q_1$ if $i$ is odd and $Q_i := Q_2$ if $i$ is even, and consider the corresponding sequence $\{\mathcal{T}_i\}_{i \in \mathbb{N}}$ of transition matrix systems that was defined in Example 3.2. Since $\|Q_i\| \leq \max\{\|Q_1\|, \|Q_2\|\}$ for all $i \in \mathbb{N}$, the sequence clearly satisfies the conditions in Example 3.2—just choose $c := \max\{\|Q_1\|, \|Q_2\|\}$—and therefore, as we have seen, converges to a limit in $\mathscr{T}$.

As proved in Appendix I, it then holds that for any $t, s \in \mathbb{R}_{\geq 0}$ with $t \leq s$, the transition matrix from $t$ to $s$ that corresponds to this limit transition matrix system is equal to

(9)

with

(10)

and

(11)

Furthermore, it can be shown that this limit transition matrix system is well-behaved—again, see Appendix I for a proof.
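The key fact behind this example is that matrix exponentials of commuting matrices factorise: $e^{Q_1 \Delta} e^{Q_2 \Delta} = e^{(Q_1 + Q_2)\Delta}$. A quick check, with an assumed rate matrix $Q_1$ and $Q_2 := 2 Q_1$:

```python
import numpy as np
from scipy.linalg import expm

Q1 = np.array([[-0.4, 0.4], [0.2, -0.2]])
Q2 = 2.0 * Q1     # commutes with Q1 by construction

d = 0.7           # an arbitrary duration
print(np.allclose(expm(Q1 * d) @ expm(Q2 * d), expm((Q1 + Q2) * d)))  # True
```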

The transition matrix system in our previous example was well-behaved, and was constructed as a limit of well-behaved transition matrix systems. Therefore, one might think that the former is implied by the latter. However, as our next example illustrates, this is not the case. A limit of well-behaved transition matrix systems need not be well-behaved itself.

Example 3.4.

Consider any rate matrix such that and, for all , define and let and be defined as in Example 3.2. Then since satisfies the conditions of Example 3.2 with , the sequence has a limit in .

However, despite the fact that we know from Example 3.2 that each of the transition matrix systems $\mathcal{T}_i$, $i \in \mathbb{N}$, is well-behaved, the limit itself is not well-behaved; see Appendix I for a proof.

4 Continuous-Time Stochastic Processes

We will in this section formalise the notion of a continuous-time stochastic process. However, we do not adopt the classical—Kolmogorovian, measure-theoretic—setting, but will instead be using the framework of full conditional probabilities. Our reasons for doing so are the following.

First of all, our results on imprecise continuous-time Markov chains will be concerned with events or functions that depend on a finite number of time points only. Therefore, we do not require the use of typical measure-theoretic concepts such as $\sigma$-algebras, $\sigma$-additivity, measurability, etcetera. Instead, we will impose only the bare minimum of assumptions that are required for our results. The extra structural and continuity assumptions that are typically imposed in a measure-theoretic setting are regarded as optional.

Secondly, our approach does not suffer from some of the traditional issues with probability zero. In standard settings, conditional probabilities are usually derived from unconditional probabilities through Bayes’s rule, which makes them undefined whenever the conditioning event has probability zero. Instead, we will regard conditional probabilities as primitive concepts. As a result, we can do away with some of the usual ‘almost surely’ statements and replace them with statements that are certain.

Finally, and most importantly, in our ‘imprecise’ setting, we will be working with a set of stochastic processes rather than with a single stochastic process. In this context, we will often need to prove that such a set contains a stochastic process that meets certain specific requirements. Full conditional probabilities provide a convenient framework for constructing such existence proofs, through the notion of coherence.

We start in Section 4.1 by introducing full conditional probabilities and explaining their connection with coherence. In Section 4.2, we then use these concepts to formalise continuous-time stochastic processes. Section 4.3 describes a specific subclass of stochastic processes, which we call well-behaved, and on which we will largely focus throughout this work. Finally, Section 4.4 provides some tools with which we can describe the dynamics of stochastic processes.

4.1 Full and Coherent Conditional Probabilities

Consider a variable $Y$ that takes values in some non-empty—possibly infinite—outcome space $\Omega$. The actual value of $Y$ is taken to be uncertain, in the sense that it is unknown. This uncertainty may arise because $Y$ is the outcome of a random experiment that has yet to be conducted, but it can also simply be a consequence of our lack of information about $Y$. We call any subset $A$ of $\Omega$ an event, we use $\mathcal{E}(\Omega)$ to denote the set of all such events, and we let $\mathcal{E}_{\neq\emptyset}(\Omega) := \mathcal{E}(\Omega) \setminus \{\emptyset\}$ be the set of all non-empty events. A subject’s uncertainty about the value of $Y$ can then be described by means of a full conditional probability [17].

Definition 4.1 (Full Conditional Probability).

A full conditional probability $P$ is a real-valued map from $\mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$ to $\mathbb{R}$ that satisfies the following axioms. For all $A, B \in \mathcal{E}(\Omega)$ and all $C, D \in \mathcal{E}_{\neq\emptyset}(\Omega)$:

  1. $P(A \mid C) \geq 0$;

  2. $P(A \mid C) = 1$ if $C \subseteq A$;

  3. $P(A \cup B \mid C) = P(A \mid C) + P(B \mid C)$ if $A \cap B = \emptyset$;

  4. $P(A \cap D \mid C) = P(A \mid D \cap C) \, P(D \mid C)$ if $D \cap C \neq \emptyset$.

For any $A \in \mathcal{E}(\Omega)$ and $C \in \mathcal{E}_{\neq\emptyset}(\Omega)$, we call $P(A \mid C)$ the probability of $A$ conditional on $C$. Also, for any $A \in \mathcal{E}(\Omega)$, we use the shorthand notation $P(A) := P(A \mid \Omega)$ and then call $P(A)$ the probability of $A$. The following additional properties can easily be shown to follow from the axioms of Definition 4.1; see Appendix B for a proof. For all $A, B \in \mathcal{E}(\Omega)$ and all $C \in \mathcal{E}_{\neq\emptyset}(\Omega)$:

  1. $0 \leq P(A \mid C) \leq 1$;

  2. $P(A \mid C) = P(A \cap C \mid C)$;

  3. $P(A \mid C) \leq P(B \mid C)$ if $A \cap C \subseteq B \cap C$;

  4. $P(A \cup B \mid C) = P(A \mid C) + P(B \mid C) - P(A \cap B \mid C)$.

Basically, axioms 1-4 are just the standard rules of probability. However, there are four rather subtle differences with the more traditional approach. The first difference is that a full conditional probability takes conditional probabilities as its basic entities: $P(A \mid C)$ is well-defined even if $P(C) = 0$. The second difference, which is related to the first, is that Bayes’s rule—axiom 4—is stated in a multiplicative form; it is not regarded as a definition of conditional probabilities, but rather as a property that connects conditional probabilities to unconditional ones. The third difference is that we consider all events, and do not restrict ourselves to some specific subset of events—such as a $\sigma$-algebra. The fourth difference, which is related to the third, is that we only require finite additivity—axiom 3—and do not impose $\sigma$-additivity.

The ‘full’ in full conditional probability refers to the fact that the domain of $P$ is the complete set $\mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$. At first sight, this might seem unimportant, and one might be inclined to introduce a similar definition for functions whose domain is some subset $\mathcal{D}$ of $\mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$. However, unfortunately, as our next example illustrates, such a definition would have the property that it does not guarantee the possibility of extending the function to a larger domain $\mathcal{D}'$, with $\mathcal{D} \subseteq \mathcal{D}' \subseteq \mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$.

Example 4.1.

Let $\Omega := \{1, 2, 3, 4, 5, 6\}$ be the set of possible values for the throw of a—possibly unfair—die and let $\mathcal{D} := \{(A_{\mathrm{odd}}, \Omega), (A_{\mathrm{even}}, \Omega)\}$, where the events $A_{\mathrm{odd}} := \{1, 3, 5\}$ and $A_{\mathrm{even}} := \{2, 4, 6\}$ correspond to an odd or even outcome of the die throw, respectively. The map $P$ that is defined by

$$P(A_{\mathrm{odd}} \mid \Omega) := \frac{2}{3} \quad \text{and} \quad P(A_{\mathrm{even}} \mid \Omega) := \frac{2}{3}$$

then satisfies axioms 1-4 on its domain $\mathcal{D}$. However, if we extend the domain by adding the trivial couple $(\Omega, \Omega)$, it becomes impossible to satisfy axioms 1-4, because axioms 2 and 3 would then require that

$$1 = P(\Omega \mid \Omega) = P(A_{\mathrm{odd}} \mid \Omega) + P(A_{\mathrm{even}} \mid \Omega) = \frac{4}{3},$$

which is clearly a contradiction.

In order to avoid the situation in this example, that is, in order to guarantee the possibility of extending the domain of a conditional probability in a sensible way, we use the concept of coherence [6, 16, 35, 49, 50].

Definition 4.2 (Coherent conditional probability).

Let $P$ be a real-valued map from $\mathcal{D} \subseteq \mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$ to $\mathbb{R}$. Then $P$ is said to be a coherent conditional probability on $\mathcal{D}$ if, for all $n \in \mathbb{N}$ and every choice of $(A_i, C_i) \in \mathcal{D}$ and $\lambda_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$,

$$\max\left\{ \sum_{i=1}^n \lambda_i \, \mathbb{I}_{C_i}(\omega) \big( \mathbb{I}_{A_i}(\omega) - P(A_i \mid C_i) \big) \,:\, \omega \in C_0 \right\} \geq 0,$$

with $C_0 := \bigcup_{i=1}^n C_i$.

(Many authors replace the maximum in this expression by a supremum, and also impose an additional inequality, where the maximum—supremum—is replaced by a minimum—infimum—and where the inequality is reversed [7, 6, 35]. This is completely equivalent to our definition. First of all, if the maximum is replaced by a supremum, then since $n$ is finite and because, for every $\omega \in C_0$, $\mathbb{I}_{C_i}(\omega)$ and $\mathbb{I}_{A_i}(\omega)$ can only take two values—$0$ or $1$—it follows that this supremum is taken over a finite set of real numbers, which implies that it is actually a maximum. Secondly, replacing the maximum by a minimum and reversing the inequality is equivalent to replacing the $\lambda_i$ in our expression by their negation, which is clearly allowed because the coefficients $\lambda_i$ can take any arbitrary real value.)

The interested reader is invited to take a look at Appendix H, where we provide this abstract concept with an intuitive gambling interpretation. However, for our present purposes, this interpretation is not required. Instead, our motivation for introducing coherence stems from the following two results. First, if $\mathcal{D} = \mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$, then coherence is equivalent to the axioms of probability, that is, properties 1-4.

Theorem 4.1.

[35, Theorem 3] Let $P$ be a real-valued map from $\mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$ to $\mathbb{R}$. Then $P$ is a coherent conditional probability if and only if it is a full conditional probability.

Secondly, for coherent conditional probabilities on arbitrary domains, it is always possible to extend their domain while preserving coherence.

Theorem 4.2.

[35, Theorem 4] Let $P$ be a coherent conditional probability on $\mathcal{D} \subseteq \mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$. Then for any $\mathcal{D}'$ with $\mathcal{D} \subseteq \mathcal{D}' \subseteq \mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$, $P$ can be extended to a coherent conditional probability on $\mathcal{D}'$.

In particular, it is therefore always possible to extend a coherent conditional probability on $\mathcal{D}$, to a coherent conditional probability on $\mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$. Due to Theorem 4.1, this extension is a full conditional probability. The following makes this explicit.

Corollary.

Let $P$ be a real-valued map from some $\mathcal{D} \subseteq \mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$ to $\mathbb{R}$. Then $P$ is a coherent conditional probability if and only if it can be extended to a full conditional probability.

Note, therefore, that if $P$ is a coherent conditional probability on $\mathcal{D}$, we can equivalently say that it is the restriction of a full conditional probability. Hence, any coherent conditional probability on $\mathcal{D}$ is guaranteed to satisfy properties 1-4. However, as was essentially already illustrated in Example 4.1, and as our next example makes explicit, the converse is not true.

Example 4.2.

Let $\Omega$, $A_{\mathrm{odd}}$, $A_{\mathrm{even}}$, $\mathcal{D}$ and $P$ be defined as in Example 4.1. Then as we have seen in that example, $P$ satisfies axioms 1-4 on its domain $\mathcal{D}$. However, $P$ is not a coherent conditional probability on $\mathcal{D}$, because if it was, then according to Corollary 4.1, $P$ could be extended to a full conditional probability. Since $\mathcal{E}(\Omega) \times \mathcal{E}_{\neq\emptyset}(\Omega)$ includes $(\Omega, \Omega)$, the argument at the end of Example 4.1 implies that this is impossible. A similar conclusion can be reached by verifying Definition 4.2 directly; we leave this as an exercise.
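Since the domain in this example is finite, incoherence can also be exposed by brute force: searching a small grid of coefficients $\lambda_i$ in Definition 4.2 for a combination whose maximum is negative. The sketch below does this for the values used in Example 4.1; the grid and the helper names are ours.

```python
import itertools

outcomes = range(1, 7)
A_odd, A_even, Omega = {1, 3, 5}, {2, 4, 6}, set(range(1, 7))

# The assessments of Example 4.1, extended with the trivial couple.
assessments = [(A_odd, Omega, 2/3), (A_even, Omega, 2/3), (Omega, Omega, 1.0)]

def max_gain(lams):
    """The maximum over outcomes of the sum in Definition 4.2."""
    return max(sum(lam * (w in C) * ((w in A) - p)
                   for lam, (A, C, p) in zip(lams, assessments))
               for w in outcomes)

# Coherence requires max_gain >= 0 for every choice of coefficients.
worst = min(max_gain(lams) for lams in itertools.product([-1, 0, 1], repeat=3))
print(worst)  # -1/3 < 0, so the extended map is not coherent
```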

4.2 Stochastic Processes as a Special Case

A (continuous-time) stochastic process is now simply a coherent conditional probability on a specific domain $\mathcal{D}$, or equivalently, the restriction of a full conditional probability to this domain $\mathcal{D}$—see Definition 4.3. However, before we get to this definition, let us first provide some intuition.

Basically, a continuous-time stochastic process describes the behaviour of a system as it moves through the—finite—state space $\mathcal{X}$ over a continuous time line $\mathbb{R}_{\geq 0}$. A single realisation of this movement is called a path or a trajectory. We are typically uncertain about the specific path that will be followed, and a stochastic process quantifies this uncertainty by means of a probabilistic model, which, in our case, will be a coherent conditional probability. These ideas are formalised as follows.

A path $\omega$ is a function from $\mathbb{R}_{\geq 0}$ to $\mathcal{X}$, and we denote with $\omega(t)$ the value of $\omega$ at time $t \in \mathbb{R}_{\geq 0}$. For any sequence of time points $u \in \mathcal{U}$ and any path $\omega$, we will write $\omega_{\mid u}$ to denote the restriction of $\omega$ to $u$. Using this notation, we write for any $x_u \in \mathcal{X}_u$ that $\omega \in x_u$ if, for all $t \in u$, it holds that $\omega(t) = x_t$.

The outcome space $\Omega$ of a stochastic process is a set of paths. Three commonly considered choices are to let $\Omega$ be the set of all paths, the set of all right-continuous paths [33], or the set of all cadlag paths (right-continuous paths with left-sided limits) [37]. However, our results do not require such a specific choice. For the purposes of this paper, all that we require is that

$$\{\omega \in \Omega : \omega_{\mid u} = x_u\} \neq \emptyset \quad \text{for all } u \in \mathcal{U} \text{ with } u \neq \emptyset \text{ and all } x_u \in \mathcal{X}_u. \quad (12)$$

Thus, $\Omega$ must be chosen in such a way that, for any non-empty finite sequence of time points $u$ and any state assignment $x_u$ on those time points, there is at least one path that agrees with $x_u$ on $u$. Essentially, this condition guarantees that $\Omega$ is “large enough to be interesting”. It serves as a nice exercise to check that the three specific sets in the beginning of this paragraph each satisfy this condition.

For any set of events $\mathcal{A} \subseteq \mathcal{E}(\Omega)$, we use $\langle \mathcal{A} \rangle$ to denote the algebra that is generated by them. That is, $\langle \mathcal{A} \rangle$ is the smallest subset of $\mathcal{E}(\Omega)$ that contains all elements of $\mathcal{A}$, and that is furthermore closed under complements in $\Omega$ and finite unions, and therefore also under finite intersections. Furthermore, for any $u \in \mathcal{U}$ with $u \neq \emptyset$ and any $x_u \in \mathcal{X}_u$, we define the elementary event

$$\Gamma(x_u) := \{\omega \in \Omega : \omega_{\mid u} = x_u\}$$

and, for any , we let

be the set of elementary events whose time point is either preceded by or belongs to , and we let be the algebra that is generated by this set of elementary events.

Consider now any . Then on the one hand, for any , it clearly holds that . On the other hand, for any , the event

belongs to , because it follows from Equation (12) that this event is non-empty. Hence, for any and , we find that . Since this is true for every , it follows that

is a subset of . It is also worth noting that if , then is vacuously true, which implies that in that case, .

That being said, we can now fi