Deciding Probabilistic Program Equivalence in NetKAT
We tackle the problem of deciding whether a pair of probabilistic programs are equivalent in the context of Probabilistic NetKAT, a formal language for reasoning about the behavior of packet-switched networks. We show that the problem is decidable for the history-free fragment of the language. The main challenge lies in reasoning about iteration, which we address by a reduction to finite-state absorbing Markov chains.
This approach naturally leads to an effective decision procedure based on stochastic matrices that we have implemented in an OCaml prototype. We demonstrate how to use this prototype to reason about probabilistic network programs.
Program equivalence is one of the most fundamental problems in Computer Science: given a pair of programs, do they describe the same computation? The problem is undecidable in the general case, but it can often be solved in the context of domain-specific languages based on restricted computational models. For example, a classical approach for deciding whether a pair of regular expressions denote the same language is to first convert the expressions to deterministic finite automata, which then admit an equivalence check in almost linear time (Tarjan75). In addition to the obvious theoretical motivation, there is also an important practical reason to study program equivalence: it is a powerful tool that can be used to solve a wide range of problems in verification, compilation, and synthesis.
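The near-linear bound for DFA equivalence comes from union-find. The Python sketch below is a rough illustration, assuming both DFAs are complete and that transitions are encoded as dictionaries. It checks equivalence in the style of Hopcroft and Karp: it merges states that must accept the same language, and it fails as soon as a merged pair disagrees on acceptance. The function name `dfa_equivalent` and the encoding are hypothetical choices for this example and are not taken from any cited work.

```python
from collections import deque


def dfa_equivalent(delta1, start1, accept1, delta2, start2, accept2, alphabet):
    """Union-find equivalence check for two complete DFAs.

    delta1/delta2 map (state, symbol) -> state. States are tagged with
    1 or 2 so that both automata can share a single union-find structure.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        # Merge the classes of x and y; report whether they were distinct.
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[rx] = ry
        return True

    def accepting(state):
        tag, q = state
        return q in (accept1 if tag == 1 else accept2)

    def step(state, symbol):
        tag, q = state
        return (tag, (delta1 if tag == 1 else delta2)[(q, symbol)])

    start = ((1, start1), (2, start2))
    union(*start)
    work = deque([start])
    while work:
        p, q = work.popleft()
        if accepting(p) != accepting(q):
            return False  # the word leading to (p, q) separates the languages
        for symbol in alphabet:
            p2, q2 = step(p, symbol), step(q, symbol)
            if union(p2, q2):  # only explore pairs not already identified
                work.append((p2, q2))
    return True


# Over the alphabet {a}: both automata accept exactly the even-length words.
even2 = {(0, "a"): 1, (1, "a"): 0}
cycle4 = {(i, "a"): (i + 1) % 4 for i in range(4)}
print(dfa_equivalent(even2, 0, {0}, cycle4, 0, {0, 2}, ["a"]))  # True
print(dfa_equivalent(even2, 0, {0}, cycle4, 0, {0}, ["a"]))     # False
```

Each successful union shrinks the number of classes by one, so at most (number of states − 1) pairs are ever explored. Together with the near-constant amortized cost of `find`, this gives the almost-linear running time.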
This paper tackles the problem of deciding equivalence in Probabilistic NetKAT (ProbNetKAT), a language for modeling and reasoning about the behavior of packet-switched networks. As its name suggests, ProbNetKAT is based on NetKAT (AFGJKSW13a; FKMST15a; compilekat), which is in turn based on Kleene algebra with tests (KAT), an algebraic system obtained by combining Boolean predicates and regular expressions. ProbNetKAT extends NetKAT with a random choice operator and a semantics based on Markov kernels (probnetkat-scott). The framework can be used to encode and reason about the behavior of randomized protocols (e.g., a routing scheme that uses random paths to forward packets to balance load (valiant82)); uncertainty about traffic demands (e.g., the diurnal/nocturnal fluctuation in access patterns commonly seen in networks for large content providers (roy15)); and failures (e.g., switches or links that are known to fail with some probability (gill11)).
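On a finite state space, a Markov kernel is simply a row-stochastic matrix, which gives the random choice operator a concrete reading. The minimal Python sketch below assumes a hypothetical two-state space, which is not ProbNetKAT's actual state space. Under the usual reading of r-weighted choice, it models random choice as the convex combination r·P + (1 − r)·Q and sequential composition as the matrix product. The helper names `choice`, `seq`, and `is_stochastic` are illustrative only.

```python
from fractions import Fraction


def is_stochastic(m):
    """Each row must be a probability distribution over output states."""
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in m)


def choice(r, p, q):
    """Run p with probability r and q with probability 1 - r.

    For kernels this is the entrywise convex combination r*p + (1 - r)*q.
    """
    return [[r * a + (1 - r) * b for a, b in zip(row_p, row_q)]
            for row_p, row_q in zip(p, q)]


def seq(p, q):
    """Run p, then q: composition of kernels is the matrix product p @ q."""
    return [[sum(row[k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for row in p]


# Hypothetical two-state space: state 0 and state 1.
skip = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]  # stay put
swap = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]  # move to the other state
coin = choice(Fraction(1, 2), skip, swap)
print(coin == [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 2), Fraction(1, 2)]])  # True
print(is_stochastic(coin), seq(swap, swap) == skip)  # True True
```

Exact rationals are used so that checks like `sum(row) == 1` are not disturbed by floating-point rounding.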
The semantics of ProbNetKAT is surprisingly subtle. In particular, because the language provides an iteration operator, it is possible to write programs that generate continuous distributions over the uncountable space of history sets (probnetkat-cantor, Theorem 3). This makes reasoning about convergence non-trivial and raises the issue of representing infinitary objects in an implementation. To address these issues, prior work (probnetkat-scott) developed a domain-theoretic characterization of ProbNetKAT that provides notions of approximation and continuity, which can be used to reason about programs using only discrete distributions. However, that work left the decidability of program equivalence as an open problem. In this paper, we settle this question positively for the history-free fragment of the language. This is a subtle and challenging problem, as many problems in probabilistic extensions of regular languages turn out to be undecidable: for example, emptiness of probabilistic automata or, more generally, the threshold problem (i.e., is some word accepted with probability at least a given threshold?). Hence, the problem we tackle in this paper lies at the edge of decidability and requires care in its formulation.
At a technical level, our decision procedure for history-free ProbNetKAT follows a general approach: we transform programs into canonical representations for which checking equivalence is straightforward. Specifically, we define a big-step semantics that interprets each program as a finite stochastic matrix, or equivalently, a Markov chain that transitions from input to output in a single step. Equivalence is trivially decidable on this representation; the challenge lies in computing the big-step matrix in the case of iteration. Intuitively, the matrix needs to capture the result of an infinite stochastic process. We address this by embedding the system in a second Markov chain with a larger state space that models iteration in the spirit of a small-step semantics. With some care, this chain can be transformed into an absorbing Markov chain, which admits a closed-form analytic solution, computable with elementary matrix operations, that represents the limit of the iteration. We prove the soundness of this approach.
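For reference, the closed form in question is the standard one for absorbing chains. Arrange the transition matrix so that Q holds the transient-to-transient probabilities and R the transient-to-absorbing ones. The absorption probabilities are then B = (I − Q)^{-1} R, which is well defined whenever every transient state can reach an absorbing state. The Python sketch below computes B exactly over the rationals. It is our own illustration, not the OCaml prototype (which uses Eigen and BLAS). The chain shown is hypothetical and is not built from a ProbNetKAT program.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a @ x = b exactly by Gauss-Jordan elimination (a must be invertible)."""
    n = len(a)
    # Augmented matrix [a | b], converted to exact rationals.
    m = [[Fraction(x) for x in list(a[i]) + list(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def absorption_probabilities(q, r):
    """B = (I - Q)^-1 R: probability of ending in each absorbing state
    (columns) when starting from each transient state (rows)."""
    n = len(q)
    i_minus_q = [[(1 if i == j else 0) - q[i][j] for j in range(n)] for i in range(n)]
    return solve(i_minus_q, r)


# Hypothetical chain with transient states t0, t1 and absorbing states a0, a1:
# t0 -> t1 or a0 with probability 1/2 each; t1 -> t0 w.p. 1/3, a1 w.p. 2/3.
Q = [[Fraction(0), Fraction(1, 2)],
     [Fraction(1, 3), Fraction(0)]]
R = [[Fraction(1, 2), Fraction(0)],
     [Fraction(0), Fraction(2, 3)]]
B = absorption_probabilities(Q, R)
print(B == [[Fraction(3, 5), Fraction(2, 5)], [Fraction(1, 5), Fraction(4, 5)]])  # True
```

Because the result is exact, two such matrices can be compared with `==` directly, with no floating-point tolerance. That is the property an equivalence check on the big-step representation relies on.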
Although the history-free fragment of ProbNetKAT is a restriction of the general language, it captures the “input-output” behavior of a network and is still expressive enough to handle a wide range of practical problems of interest. Many practical networking problems, such as reachability, loop freedom, and isolation, concern end-to-end behavior and do not require knowledge of specific routes. Indeed, several other contemporary network verification tools, including Anteater (anteater), Header Space Analysis (hsa), and Veriflow (veriflow), are also limited to a history-free model. In ProbNetKAT, the main advantage of the restriction is that it lowers the complexity of the implementation by an exponential factor, which is critical for a tool that tracks probabilities in addition to packet-forwarding behavior.
Readers familiar with prior work on probabilistic automata might wonder whether we could directly apply known results on the (un)decidability of probabilistic rational languages. We cannot: probabilistic automata accept distributions over words, while ProbNetKAT programs encode distributions over languages. Hence, we believe that having a domain-specific tool for deciding equivalence of probabilistic network programs is of value. Similarly, probabilistic programming languages, which have gained popularity in the last decade, motivated by applications in machine learning, focus largely on Bayesian inference. They typically come equipped with a primitive for probabilistic conditioning and often have a semantics based on sampling. ProbNetKAT is somewhat different in that it focuses on verification. Thus, having a precise and complete semantics, given by a denotational model that interprets programs as functions mapping sets of input packet histories to distributions over sets of output histories, is crucial.
We have built a prototype implementation of our approach in OCaml. It leverages Eigen and BLAS as back-end libraries for representing and transforming matrices and incorporates a number of optimizations to improve performance. Although building a scalable implementation would require much more engineering (and is not the primary focus of this paper), our prototype is already able to handle inputs of moderate size. We have used it to carry out several case studies, including one based on modeling and verifying load (im)balance in data centers. Importantly, unlike an earlier implementation of ProbNetKAT (probnetkat-scott), which implemented iteration through an infinite convergent sequence of approximations with no guaranteed bounds on the rate of convergence, our new implementation computes fixpoints directly.
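The difference between iterating approximations and computing a fixpoint directly can be seen in a simplified setting. The Python sketch below is an analogy only and does not reproduce the earlier implementation's algorithm. It truncates the series Q^0 R + Q^1 R + … + Q^{n-1} R, whose limit is the closed form (I − Q)^{-1} R, for a hypothetical one-state loop that repeats with probability 1/2. Each truncation falls short of the exact answer by (1/2)^n, so no finite number of steps reaches it. With a repeat probability q closer to 1, the gap q^n shrinks arbitrarily slowly, whereas the closed form is obtained in one step.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def approximate(q, r, steps):
    """Truncated series Q^0 R + Q^1 R + ... + Q^(steps-1) R.

    Entry (i, j) of Q^k R is the probability, starting from transient
    state i, of being absorbed into state j at step k + 1. All terms are
    non-negative, so every truncation under-approximates (I - Q)^-1 R.
    """
    total = [[Fraction(0)] * len(r[0]) for _ in r]
    term = r  # Q^0 R
    for _ in range(steps):
        total = [[x + y for x, y in zip(t_row, k_row)] for t_row, k_row in zip(total, term)]
        term = matmul(q, term)
    return total


# Hypothetical one-state loop: repeat with probability 1/2, exit with 1/2.
Q = [[Fraction(1, 2)]]
R = [[Fraction(1, 2)]]
exact = R[0][0] / (1 - Q[0][0])  # closed form in one step: 1
for n in (1, 5, 20):
    approx = approximate(Q, R, n)[0][0]
    print(n, approx, exact - approx)  # the gap is exactly (1/2)**n, never 0
```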