# A Network Coding Approach to Loss Tomography

## Abstract

Network tomography aims at inferring internal network characteristics based on measurements at the edge of the network. In loss tomography, in particular, the characteristic of interest is the loss rate of individual links and multicast and/or unicast end-to-end probes are typically used. Independently, recent advances in network coding have shown that there are advantages from allowing intermediate nodes to process and combine, in addition to just forward, packets. In this paper, we study the problem of loss tomography in networks with network coding capabilities. We design a framework for estimating link loss rates, which leverages network coding capabilities, and we show that it improves several aspects of tomography including the identifiability of links, the trade-off between estimation accuracy and bandwidth efficiency, and the complexity of probe path selection. We discuss the cases of inferring link loss rates in a tree topology and in a general topology. In the latter case, the benefits of our approach are even more pronounced compared to standard techniques, but we also face novel challenges, such as dealing with cycles and multiple paths between sources and receivers. Overall, this work makes the connection between active network tomography and network coding.

Keywords: Network Coding, Network Tomography, Link Loss Inference.

## I Introduction

Distributed Internet applications often need to know information about the characteristics of the network. For example, an overlay or peer-to-peer network may want to detect and recover from failures or degraded performance of the underlying Internet infrastructure. A company with several geographically distributed campuses may want to know the behavior of one or several Internet service providers (ISPs) connecting the campuses, in order to optimize traffic engineering decisions and achieve the best end-to-end performance. To achieve this high-level goal, it is necessary for the nodes participating in the application or overlay to monitor Internet paths, assess and predict their behavior, and eventually make efficient use of them by taking appropriate control and traffic engineering decisions both at the network and at the application layers. Therefore, accurate monitoring at minimum overhead and complexity is of crucial importance in order to provide the input needed to take such informed decisions. However, there is currently no incentive for ISPs to provide detailed information about their internal operation and performance or to collaborate with other ISPs for this purpose. As a result, distributed applications usually rely on their own end-to-end measurements between nodes they have control over, in order to infer performance characteristics of the network.

Over the past decade, a significant research effort has been devoted to a class of monitoring problems that aim at inferring internal network characteristics using measurements at the edge [tomography-survey]. This class of problems is commonly referred to as tomography due to its analogy to medical tomography. In this work, we are particularly interested in loss tomography, i.e., inferring the loss probabilities (or loss rates) of individual links using active end-to-end measurements [minc, general, nowak, tomo-unicast-1, tomo-unicast-2]. The topology is assumed known and sequences of probes are sent and collected between a set of sources and a set of receivers at the network edge. Link-level parameters, in this case loss rates of links, are then inferred by the observations at the receivers. The bandwidth efficiency of these methods can be measured by the number of probes needed to estimate the loss rates of interest within a desired accuracy. Despite its significance and the research effort invested, loss tomography remains a hard problem for a number of reasons, including complexity (of optimal probe routing and of estimation), bandwidth overhead, and identifiability (the fundamental fact that tomography is an inverse problem and we cannot directly observe the parameters of interest). Moreover, there are some practical limitations such as the lack of cooperation of ISPs, the need for synchronization of sources in some schemes, etc.

Recently, a new paradigm for routing information has emerged with the advent of network coding [nc1, nc2, netcodingwebpage]. The main idea in network coding is that, if we allow intermediate nodes to not only forward but also combine packets, we can obtain significant benefits in terms of throughput, delay and robustness of distributed algorithms. Our work is based on the observation that, in networks equipped with network coding capabilities, we can leverage these capabilities to significantly improve several aspects of loss tomography. For example, with network coding, we can combine probes from different paths into one, thus reducing the bandwidth needed to cover a general graph and also increasing the information per packet. Furthermore, the problem of optimal probe routing, which is known to be NP-hard, can be solved with linear complexity when network coding is used.

This paper proposes a framework for loss tomography (including mechanisms for probe routing, probe and code design, estimation, and identifiability guarantees) in networks that already have network coding capabilities. Such capabilities do not exist yet on the Internet today, but are available in wireless mesh networks, peer-to-peer and overlay networks and we expect them to appear in more environments as network coding becomes more widely adopted. We show that, in those settings, our network coding-based approach improves the following aspects of the loss tomography problem: how many links of the network we can infer (identifiability); the tradeoff between how well we can infer link loss rates (estimation accuracy) and how many probes we need in order to do so (bandwidth efficiency); how to select sources and receivers and how to route probes between them (optimal probe routing). Overall, this is a novel application of network coding techniques to a practical networking problem, and it opens a promising research direction.

The structure of the paper is as follows. Section II discusses related work. Section III states the problem and summarizes the challenges and main results. Section IV presents a motivating example and provides the conditions of identifiability. Sections V and VI present in detail the framework and mechanisms in the cases of trees and general topologies, respectively. The final section concludes the paper.

## II Related Work

Network Tomography. The term network tomography typically refers to a family of problems that aim at inferring internal network characteristics from measurements at the edge of the network. Internal characteristics of interest may include link-level parameters (such as loss and delay metrics) or the network topology. Another type of tomography problem aims at inferring path-level traffic intensity (e.g., traffic matrices) from link-level measurements [vardi]. Our paper focuses on inferring the loss rates of internal links using active end-to-end measurements and assuming that the topology is known. Therefore, it is related to the literature on loss tomography, part of which is discussed below.

Caceres et al. considered a single multicast tree with a known topology and inferred the link loss rates from the receivers’ observations [minc]. In particular, they developed a low-complexity algorithm to compute the maximum likelihood estimator (MLE), by taking into account the dependencies introduced by the tree hierarchy to factorize the likelihood function and eventually compute the MLE in a recursive way. Throughout this paper, we refer to the MLE for a multicast tree, developed in [minc], as MINC, and we build on it. Bu et al. used multiple multicast trees to cover a general topology and proposed an EM algorithm for link loss rate estimation [general]. Follow-up approaches have been developed for unicast probes [tomo-unicast-1, tomo-unicast-2], joint inference of topology and link loss rates [nowak], and adaptive tomography and delay inference [tomo-delay]. The above list of references is not comprehensive. Good surveys of network tomography can be found in [tomography-survey, gmichael-survey].

Active vs. Passive Tomography. Tomography can be based either on active (generating probe traffic) or on passive (monitoring traffic flows and sampling existing traffic) measurements. Passive approaches have been most commonly used for estimating path-level information, in particular, origin-destination traffic matrices, from data collected at various nodes of the network [vardi]. This approach and problem statement are well-suited for the needs of a network provider. For the problem of inferring link loss rates, active probes are typically used, and information about individual packets received or lost is analyzed at the edge of the network. This approach is better suited for end users that do not have access to the network. However, there are also papers that study link loss inference by using existing traffic flows to sample the state of the network [passive1, passive2]. Once measurements have been collected following either of the two methods, statistical inference techniques are applied to determine network characteristics that are not directly observed.

The passive approach has the advantage that it does not impose additional burden on the network and that it measures the actual loss experienced by real traffic. However, it must also ensure that the characteristics of the traffic (e.g., TCP) do not bias the sample. In the active approach, one has more control over designing the probes, which can thus be optimized for efficient estimation. The downside is that we inject measurement traffic that may increase the load of the network, may be treated differently than regular traffic, or may even be dropped e.g., due to security concerns.

Our Work. We make the connection between active network tomography and network coding capabilities. In [allerton05], we introduced the basic idea of leveraging network coding capabilities to improve network monitoring. In [ita07], we studied link loss estimation in tree topologies. In [globecom08], we extended the approach to general graphs. In [netcod2011], we built on MINC [minc], and we provided the MLEs of the loss rates for all links simultaneously, in multiple-source tree topologies with multicast and network coding; similarly to MINC, we presented an efficient algorithm for computing the MLEs, we proved the correctness, and we analyzed the rate of convergence. This paper combines ideas from these preliminary conference papers into a common framework, and extends them by a more in-depth analysis of identifiability, routing, estimation and code design.

Our approach is active in that probes are sent/received from/to the edge of the network and observations at the receivers are used for statistical inference. Intermediate nodes forward packets using unicast, multicast and simple coding operations. However, the operations at the intermediate nodes need to be set-up once, fixed for all experiments, and be known for inference. Therefore, our approach requires more support from the network than traditional tomography, for the benefit of more accurate/efficient estimation. Our methods may also be applicable to passive tomography, where instead of sending specialized probes, one can view the coding coefficients on a network coded packet as the “probe”, thus overloading them with both communication and tomographic goals, as it is the case in [jaggi, jaggi-arxiv]. In this paper, we focus exclusively on the tomographic goals by taking an active approach, i.e., sending, collecting, and analyzing specialized probes for tomography.

## III Problem Statement

### III-A Model and Definitions

#### Network and Monitoring Scheme

We consider a network represented as a graph $G=(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges corresponding to logical links$^{1}$. We use the notation $e=AB$ for the link connecting vertex $A$ to vertex $B$. We assume that $G$ has no self-loops and that there is a loss rate associated with every edge in $E$.$^{2}$ The topology is assumed to be known.

We assume that packet loss on a link $e$ is i.i.d. Bernoulli with probability $\bar{\alpha}_e$, where $\bar{\alpha}_e=1-\alpha_e$, and $\alpha_e$ is the success probability of link $e$. Losses are assumed to be independent across links. Let $\alpha=(\alpha_e)_{e\in E}$ be the vector of the link success probabilities$^{3}$. In loss tomography, we are interested in estimating all or a subset of the parameters in $\alpha$. We use additional notation for the case of tree topologies, as we explain in Section V-B1.

A set of source nodes in the periphery of the network can inject probe packets, while a set of receivers can collect such packets. Several problem variations in the choice of sources and receivers are possible, and we will discuss the following in this paper: (i) the set of sources and the set of receivers are given and fixed; (ii) a set of nodes that can act as either sources or receivers is given (and we can select among them); (iii) we are allowed to select any node to act as a source or a receiver. We assume that intermediate nodes are equipped with unicast, multicast and network coding capabilities. Probe packets are routed and coded inside the network following specific paths and according to specified coding operations. We assume that the packets incur zero transmission, propagation and processing delay as they travel through the network. The routes selected and the operations the intermediate nodes perform are part of the design of the tomography scheme: they are chosen once at set-up time and are kept the same throughout all experiments; all operations of intermediate nodes are known during estimation. For the theoretical results of this paper, we focus on synchronized acyclic networks with zero delay$^{4}$; for cyclic networks, we convert them to acyclic networks by a proper choice of routing and sources/receivers.

In general, a probe packet is a vector of symbols, with each symbol being in a finite field $\mathbb{F}_q$. This includes as special cases: scalar network coding (packets consisting of a single symbol), operations over binary vectors ($q=2$), and more generally, vector network coding (packets consisting of several symbols)$^{5}$. In one experiment, we send probes from all sources and we collect probes at the receivers: each source injects one probe packet in the network, and each receiver receives one probe. The observations at all receivers form a vector $X(R)$ in the space $\Omega$. For a given set of link success probabilities $\alpha$, the probability distribution of all observations will be denoted by $P_\alpha$. The probability mass function for a single observation $x\in\Omega$ is $p(x;\alpha)$.

To estimate the success rates of links, we perform a sequence of $n$ independent experiments. Let $n(x)$ denote the number of probes for which the observation $x\in\Omega$ is obtained, where $\sum_{x\in\Omega}n(x)=n$. The probability of $n$ independent observations $x_1,\dots,x_n$ (each $x_t\in\Omega$) is:

$$p(x_1,\dots,x_n;\alpha)=\prod_{t=1}^{n}p(x_t;\alpha)=\prod_{x\in\Omega}p(x;\alpha)^{n(x)} \tag{1}$$

It is convenient to work with the log-likelihood function, which calculates the logarithm of this probability:

$$L(\alpha)=\log p(x_1,\dots,x_n;\alpha)=\sum_{x\in\Omega}n(x)\log p(x;\alpha) \tag{2}$$

We make two assumptions, which are both realistic in practice and standard in the tomography literature:

• We perform sufficient measurements so that each observation at the receivers occurs at least once, i.e., $n(x)>0$ for all $x\in\Omega$. This ensures that no term in the likelihood function becomes a constant (due to a zero exponent). Note that the final equality in Eq.(1) and Eq.(2) is valid due to this assumption.

• The probability of loss on a link is not 1, i.e., $\alpha_e\in(0,1]$ for all $e\in E$. This ensures that the log-likelihood function is well-defined and differentiable.
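The grouping step behind Eq. (1) and Eq. (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the single-link toy model, the value `alpha = 0.8`, and the function name `log_likelihood` are assumptions introduced only for illustration.

```python
import math
from collections import Counter

def log_likelihood(observations, pmf):
    """Grouped form of Eq. (2): sum over distinct outcomes x of n(x) * log p(x; alpha)."""
    counts = Counter(observations)  # n(x) for every outcome x that was observed
    return sum(n_x * math.log(pmf[x]) for x, n_x in counts.items())

# Hypothetical toy model: one link with success probability alpha.
# Outcome 1 means the probe was received, outcome 0 means it was lost.
alpha = 0.8
pmf = {1: alpha, 0: 1.0 - alpha}
observations = [1, 1, 0, 1, 1]

# Per-probe form: logarithm of the product over t in Eq. (1).
direct = sum(math.log(pmf[x]) for x in observations)
# Per-outcome form: sum over x weighted by the counts n(x).
grouped = log_likelihood(observations, pmf)
```

Both forms give the same value; the grouped form only needs the counts $n(x)$, which is what the receivers have to record.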

The goal is to use the observations at the receivers, the knowledge of the network topology, and the knowledge of the routing/coding scheme to estimate the success rates of internal links of interest. We may be interested in estimating the success rate on a subset of links, or on all the links.

###### Definition 1

A monitoring scheme for a given graph refers to a set of source nodes, a set of receivers, a set of paths that connect the sources to the receivers, the probe packets that sources send, and the operations that intermediate nodes perform on these packets.

We use the notion of link identifiability as it was defined in [minc] (Theorem 3, Condition (i)):

###### Definition 2

A link $e$ is called identifiable under a given monitoring scheme iff: $\alpha,\alpha'\in(0,1]^{|E|}$ and $P_\alpha=P_{\alpha'}$ implies $\alpha_e=\alpha'_e$.

To illustrate the concept, consider two consecutive links $e_1=AB$ and $e_2=BC$ in a row, where node $B$ has degree 2, and is neither a source nor a receiver. These links are not identifiable, as maximizing the log-likelihood function would only allow us to identify the value of the product $\alpha_{e_1}\alpha_{e_2}$, and thus, would lead to an infinite number of solutions. This is because it is not possible to distinguish whether a packet gets dropped on link $e_1$ or $e_2$. Note, however, that the case of having two links in a row is ruled out by our assumption of working on a graph with logical links (all vertices in the graph have degree three or greater). Another case in which two links are not identifiable, which is possible even on a graph with logical links, is when both links belong to every path used from any source to any receiver.

Identifiability is not only a property of the network topology, but also depends on the monitoring scheme. One of the main goals of the monitoring scheme design is to maximize the number of identifiable links. However, our definition of identifiability does not depend on the estimator employed. Essentially, identifiability depends on the probability distribution $P_\alpha$ and on whether this uniquely determines $\alpha$.
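The two-links-in-a-row case can be made concrete with a small sketch. It assumes a single source at one end and a single receiver at the other; the function name `series_pmf` and the numeric values are hypothetical.

```python
def series_pmf(a1, a2):
    """Distribution of the receiver's observation for two links in series.

    The probe arrives only if both links succeed, so the distribution
    depends on the two success probabilities only through their product.
    """
    p_received = a1 * a2
    return {"received": p_received, "lost": 1.0 - p_received}

# Two different parameter vectors with the same product 0.72.
pmf_a = series_pmf(0.9, 0.8)
pmf_b = series_pmf(0.8, 0.9)
```

Since `pmf_a` and `pmf_b` coincide although the per-link values differ, no estimator can tell the two parameter vectors apart from end-to-end observations alone.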

#### Estimation

The maximum likelihood estimator (MLE) identifies the parameters that maximize the probability of the observations $x_1,\dots,x_n$:

$$\hat{\alpha}=\arg\max_{\alpha\in(0,1]^{|E|}}L(\alpha) \tag{3}$$

Candidates for the MLE are the solutions of the likelihood equation:

$$\frac{\partial L}{\partial\alpha_e}(\alpha)=0,\quad e\in E \tag{4}$$

We can compute the MLE for tree networks as we see in Section V-B. However, it becomes computationally hard for large networks; this creates the need for faster algorithms that provide good approximate performance in practice.

To measure the per-link estimation accuracy, we use the mean-squared error (MSE): $\mathrm{MSE}_e=E[(\hat{\alpha}_e-\alpha_e)^2]$. In order to measure the estimation performance on all links $e\in E$, we need a metric that summarizes all links. We use an entropy measure that captures the residual uncertainty. Since we expect the scaled estimation errors to be asymptotically Gaussian (similar to the case in [minc]), we define the quality of the estimation across all links as

$$\mathrm{ENT}=\sum_{e\in E}\log\left(E\left[(\hat{\alpha}_e-\alpha_e)^2\right]\right), \tag{5}$$

which is a shifted version of the entropy of independent Gaussian random variables with the given variances [CoverThomas91]. If the entire error covariance matrix is available, then we can compute the metric as the logarithm of its determinant, which captures also the correlations among the errors on different links. The metric defined above captures only the diagonal elements of the covariance matrix, i.e., the MSE for each link independently of the others.
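A short sketch of the ENT metric follows. It takes the full-covariance version of the metric to be the log-determinant of the error covariance matrix, which is an assumption of this sketch; the helper names and the numeric MSE values are hypothetical, and the determinant is written out for two links only.

```python
import math

def ent(mse_per_link):
    """Eq. (5): sum over links of the log of the per-link mean-squared error."""
    return sum(math.log(m) for m in mse_per_link)

def log_det_2x2(cov):
    """Log-determinant of a 2x2 covariance matrix (two links only)."""
    (a, b), (c, d) = cov
    return math.log(a * d - b * c)

mse = [1e-3, 4e-3]                               # hypothetical per-link MSEs
ent_diag = ent(mse)

uncorrelated = [[1e-3, 0.0], [0.0, 4e-3]]        # errors on the two links independent
correlated = [[1e-3, 1e-3], [1e-3, 4e-3]]        # same variances, correlated errors

ent_uncorrelated = log_det_2x2(uncorrelated)     # coincides with ent_diag
ent_correlated = log_det_2x2(correlated)         # smaller: correlation reduces uncertainty
```

When the errors are uncorrelated, the two quantities agree; with correlated errors the log-determinant is smaller, which is the information that the diagonal-only metric leaves out.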

In some cases, we approximate the error covariance matrix using the Fisher information matrix $I(\alpha)$. Under mild regularity conditions (see for example Chapter 7 in [Lehmann99]), the scaled asymptotic covariance matrix of the optimal estimator is lower-bounded by the Cramér-Rao bound $I^{-1}(\alpha)$. The Fisher information matrix is an $|E|\times|E|$ square matrix with $(p,q)$ element defined as

$$I_{p,q}(\alpha)=E\left[\frac{\partial}{\partial\alpha_p}\log p(X(R);\alpha)\,\frac{\partial}{\partial\alpha_q}\log p(X(R);\alpha)\right] \tag{6}$$

where $\alpha_p$ and $\alpha_q$ are the success probabilities of two links. In particular, under the regularity conditions, the MLE is asymptotically efficient; i.e., it achieves this lower bound asymptotically in the sample size.
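As a sanity check on the definition, consider the simplest possible case, which is an illustrative assumption and not a configuration studied in the paper: a single link with success probability $\alpha$ whose outcome $X\in\{0,1\}$ is observed directly. Then $\log p(X;\alpha)=X\log\alpha+(1-X)\log(1-\alpha)$, and

$$\begin{aligned}
\frac{\partial}{\partial\alpha}\log p(X;\alpha) &= \frac{X}{\alpha}-\frac{1-X}{1-\alpha},\\
I(\alpha) &= E\left[\left(\frac{X}{\alpha}-\frac{1-X}{1-\alpha}\right)^{2}\right]
 = \alpha\cdot\frac{1}{\alpha^{2}}+(1-\alpha)\cdot\frac{1}{(1-\alpha)^{2}}
 = \frac{1}{\alpha(1-\alpha)}.
\end{aligned}$$

With $n$ independent probes, the Cramér-Rao bound on the variance of an unbiased estimator is therefore $\alpha(1-\alpha)/n$. The sample mean $\hat{\alpha}=\frac{1}{n}\sum_{t=1}^{n}X_t$, which is the MLE in this toy case, has exactly this variance, so the bound is attained.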

### III-B Subproblems

Given a certain network topology, a monitoring scheme for loss tomography can be designed by solving the following subproblems.

1) Identifiability: For each link $e$, derive conditions that the scheme should satisfy so that the edge is identifiable. Whether the goal is to maximize the number of identifiable edges, or to measure the link success rate on a particular set of edges, the identifiability conditions will guide the routing and code design choices.

2) Routing: Select the sources and receivers of probe packets, the paths through which probes are routed, and the nodes where they will be linearly combined.$^{6}$ The design goals include minimizing the utilized bandwidth, and improving the estimation accuracy, while respecting the required identifiability conditions.

3) Probe and Code Design: Select the contents of the probes sent by the sources and the operations performed at intermediate nodes. The goal is to use the simplest operations and the smallest finite field, while ensuring that the identifiability conditions are met.

4) Estimation Algorithm: This is the algorithm that processes the collected probes at the receivers and estimates the link loss rates. The objective is low complexity with good estimation performance. There is clearly a tradeoff between the estimation error and the measurement bandwidth.

We note that these steps are not independent from each other. In fact, the design of routing, probe and code design needs to be done with identifiability and estimation in mind.

### III-C Main Results

In this paper, we propose a monitoring scheme for loss tomography in networks that have multicast and network coding capabilities. In Sections V and VI, we present our design for the cases of trees and general topologies, respectively. We evaluate all our schemes through extensive simulation results. Below we preview the main results, in each subproblem.

1) Identifiability: (1) We provide simple necessary and sufficient conditions for identifying the loss rate of a single link. In (logical) tree topologies, all links are identifiable, using a very simple monitoring scheme$^{7}$. In general topologies, where identifiability depends on the routing and code design as well, these conditions still apply. (2) We also prove a structural property, which we call reversibility: if a link is identifiable under a given monitoring scheme, it remains identifiable if we reverse the directionality of all paths and exchange the role of sources and receivers (which we call the dual configuration).

2) Routing: (1) For a given set of sources and receivers over an arbitrary topology, the problem of selecting a routing that meets the identifiability conditions while minimizing the employed bandwidth is NP-hard when only traditional (unicast or multicast) probes are used. We prove that, when network coding is used, this problem can be solved in polynomial time. (2) Moreover, we demonstrate, via simulation, that the choice of sources and receivers affects the estimation accuracy. (3) Finally, we present heuristic orientation algorithms for general graphs, designed to achieve identifiability, small number of receivers, and high estimation accuracy.

3) Probe and Code Design: (1) In trees, we show that binary vectors sent by the sources and deterministic code design with XOR operations at the intermediate nodes are sufficient. (2) In general graphs, we need to use operations over higher finite fields. We provide bounds on the required alphabet size, and we propose and evaluate deterministic code design.

4) Loss Estimation: (1) In a tree topology (under mild conditions on the selection of sources and receivers), we develop a low-complexity method for computing the MLE of the loss rates for all links simultaneously. Our algorithm builds on and extends MINC (the well-known ML estimator [minc] for a multicast tree) to multiple-source multiple-destination tree topologies (with multicast at branching points and network coding at joining points). We describe the algorithm, prove its correctness, and analyze its rate of convergence. (2) A key property that we formulate, prove, and extensively use in this work, is reversibility, i.e., the fact that the MLE’s for a configuration and its dual (defined as the same topology, but with the role of sources and receivers reversed) have the same functional form. For example, the MLE for a reverse multicast tree (with several sources and one receiver) has the same functional form as MINC for a multicast tree (with the role of the source and the receivers reversed); we refer to the MLE for the reverse multicast tree as RMINC. (3) For topologies other than trees, no efficient MLE algorithm is known for estimating the loss rates of all links simultaneously. Therefore, we propose a number of heuristic algorithms, including belief propagation and subtree decomposition algorithms, and we evaluate their performance through simulation. (4) We provide a simple algorithm for computing the MLE of a single link at a time in any topology. This is particularly useful in practice because: (i) a few bottleneck links are typically congested, thus of interest; and (ii) the method is applicable to any topology, even if it is not of the type (1) above.

The use of network coding at intermediate nodes, in addition to unicast and multicast, offers several benefits for loss tomography: it increases the number of identifiable links; it improves the tradeoff between number of probes and estimation accuracy; and it reduces the complexity of selecting probe paths for minimum cost monitoring of a general graph from NP-hard to linear. The approach gracefully generalizes from trees to general topologies (e.g., having the same identifiability conditions, using the same estimation algorithm, and avoiding the use of overlapping trees or paths), where its advantages are amplified.

## IV Motivating Example

In this section, we present a motivating example to demonstrate the benefits of network coding in identifying the link loss rates; we derive the conditions of identifiability for a single link; and we discuss the identifiability of all links in the network.

###### Example 1

Consider the 5-link topology depicted in Fig. 1. Nodes $A$ and $B$ send probes and nodes $E$ and $F$ receive them. Every link $e$ can drop a packet according to an i.i.d. Bernoulli distribution, with probability $\bar{\alpha}_e$, independently of other links. We are interested in estimating the success probabilities of all links, namely $\alpha_{AC}$, $\alpha_{BC}$, $\alpha_{CD}$, $\alpha_{DE}$, and $\alpha_{DF}$.

The traditional multicast-based tomography approach would use two multicast trees rooted at nodes $A$ and $B$ and ending at $E$ and $F$. This approach is depicted in Fig. 1(a) and (b). At each experiment, source $A$ sends packet $x_1$ and source $B$ sends packet $x_2$. The receivers $E$ and $F$ infer the link loss rates by keeping track of how many times they receive packets $x_1$ and $x_2$. Note that, due to the overlap of the two trees, for each experiment, links $CD$, $DE$, and $DF$ are used twice, leading to inefficient bandwidth usage. Moreover, from this set of experiments, we cannot calculate $\alpha_{CD}$, and thus edge $CD$ is not identifiable. Indeed, by observing the outcomes of experiments on each multicast tree, we cannot distinguish whether packet $x_1$ is dropped on edge $AC$ or $CD$; similarly, we cannot distinguish whether packet $x_2$ is dropped on edge $BC$ or $CD$. (Note that if we restricted ourselves to unicast only, four unicast probes from $A$, $B$ to $E$, $F$ would be needed to cover all five links. Not only would the problems of identifiability and overlap of probe paths still be present, but they would be further amplified.)

If network coding capabilities are available, they can help alleviate these problems. Assume that the intermediate node $C$ can combine incoming packets before forwarding them to outgoing links. Node $A$ sends to node $C$ a probe packet with payload that contains the binary string $x_1=[1\ 0]$. Similarly, node $B$ sends probe packet $x_2=[0\ 1]$ to node $C$. If node $C$ receives only $x_1$ or only $x_2$, then it just forwards the received packet to node $D$; if $C$ receives both packets $x_1$ and $x_2$, then it creates a new packet, with payload their linear combination $x_3=[1\ 1]$, and forwards it to node $D$; more generally, $x_3=x_1\oplus x_2$, where $\oplus$ is the bit-wise XOR operation. Node $D$ multicasts the incoming packet to both outgoing links $DE$ and $DF$. The flow of packets in this experiment is shown in Fig. 1(c). In every experiment, probe packets are sent from $A$, $B$, and may or may not reach $E$, $F$, depending on the state of the links. Observe that with the network coding approach, link $CD$ becomes identifiable. Moreover, we have avoided the overlap of probes on link $CD$ during each experiment.
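One experiment of the coded scheme can be sketched as a small simulation. This is an illustration under stated assumptions rather than the authors' code: the link names follow the 5-link example, the concrete payloads `(1, 0)` and `(0, 1)` are illustrative two-bit probes, and `run_experiment` is a hypothetical helper.

```python
import random

X1, X2 = (1, 0), (0, 1)  # illustrative binary probes sent by sources A and B

def xor(u, v):
    """Bit-wise XOR of two equal-length binary vectors."""
    return tuple(a ^ b for a, b in zip(u, v))

def run_experiment(alpha, rng):
    """Simulate one experiment on the 5-link topology A,B -> C -> D -> E,F.

    alpha maps each link name to its success probability.
    Returns the observations at (E, F); None means nothing was received.
    """
    up = {e: rng.random() < alpha[e] for e in alpha}  # state of every link
    # Probes that actually reach the coding point C.
    at_c = [x for x, e in ((X1, "AC"), (X2, "BC")) if up[e]]
    if not at_c or not up["CD"]:
        return (None, None)
    # C forwards a single probe unchanged, or the XOR if both arrived.
    packet = at_c[0] if len(at_c) == 1 else xor(at_c[0], at_c[1])
    # D multicasts the same packet on DE and DF.
    return (packet if up["DE"] else None, packet if up["DF"] else None)

rng = random.Random(0)
alpha = {"AC": 0.9, "BC": 0.8, "CD": 0.95, "DE": 0.7, "DF": 0.85}
sample = [run_experiment(alpha, rng) for _ in range(5)]
```

Each receiver sees at most one packet per experiment, and the content of that packet reveals which of the links $AC$ and $BC$ delivered their probe, which is the extra information that the two separate multicast trees do not provide.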

Table I lists the 10 possible observed outcomes, the state of the links that leads to a particular outcome, the probability $p_i$ of observing this outcome, and the number of times $n_i$ we observe this outcome in a sequence of $n$ independent experiments. The probability of observing an outcome can be computed from the success probabilities of the five links. E.g., for outcomes 1-4:

$$\begin{aligned}
p_0 &= 1-p_1-\cdots-p_9 = 1-(1-\bar{\alpha}_{AC}\bar{\alpha}_{BC})\,\alpha_{CD}\,(1-\bar{\alpha}_{DE}\bar{\alpha}_{DF})\\
p_1 &= \alpha_{AC}\,\bar{\alpha}_{BC}\,\alpha_{CD}\,\alpha_{DE}\,\bar{\alpha}_{DF}\\
p_2 &= \bar{\alpha}_{AC}\,\alpha_{BC}\,\alpha_{CD}\,\alpha_{DE}\,\bar{\alpha}_{DF}\\
p_3 &= \alpha_{AC}\,\alpha_{BC}\,\alpha_{CD}\,\alpha_{DE}\,\bar{\alpha}_{DF}\\
&\;\;\vdots
\end{aligned} \tag{7}$$

and we can write similar expressions for the probabilities of the remaining observations. Thus, we can explicitly write down the probability distribution of the observations $P_\alpha$.

In a sequence of $n$ independent experiments, the frequency of each event is $n_i/n$. After sending $n$ independent probes, the log-likelihood function of the observations given the set of parameters $\alpha$ is: $L(\alpha)=\sum_{i=0}^{9}n_i\log p_i$. The MLE would compute the $\alpha_e$'s that maximize $L(\alpha)$.
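The outcome probabilities can also be obtained by brute force, which is a convenient cross-check of the closed-form expressions. The sketch below enumerates all $2^5$ link states of the 5-link example; the integer encoding of packets (0 for nothing, 1 and 2 for the probes of the two sources, 3 for their XOR), the function names, and the numeric link success rates are assumptions made for illustration.

```python
from itertools import product

LINKS = ("AC", "BC", "CD", "DE", "DF")

def outcome(up):
    """Observation at (E, F) for a given link state.

    Packets are encoded as integers: 0 = nothing, 1 = probe from A,
    2 = probe from B, 3 = XOR of both probes.
    """
    packet = (1 if up["AC"] else 0) | (2 if up["BC"] else 0)  # what C holds
    if not up["CD"]:
        packet = 0
    return (packet if up["DE"] else 0, packet if up["DF"] else 0)

def pmf(alpha):
    """Exact distribution of the observations, summing over all link states."""
    dist = {}
    for states in product((True, False), repeat=len(LINKS)):
        up = dict(zip(LINKS, states))
        prob = 1.0
        for e in LINKS:
            prob *= alpha[e] if up[e] else 1.0 - alpha[e]
        key = outcome(up)
        dist[key] = dist.get(key, 0.0) + prob
    return dist

alpha = {"AC": 0.9, "BC": 0.8, "CD": 0.95, "DE": 0.7, "DF": 0.85}
dist = pmf(alpha)

# Closed forms for the empty outcome and for "probe from A at E only".
p0 = 1 - (1 - 0.1 * 0.2) * 0.95 * (1 - 0.3 * 0.15)
p1 = 0.9 * 0.2 * 0.95 * 0.7 * 0.15
```

The enumeration yields ten distinct outcomes, and the entries `dist[(0, 0)]` and `dist[(1, 0)]` agree with the closed forms `p0` and `p1`.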

In general, we may be interested in estimating one of the variables, some of them, or all five of them. In the next Section, we discuss a single link, namely link $CD$. Note that the remaining four links can depict the equivalent paths connecting $CD$ to the sources and receivers. In Section IV-B, we discuss the identifiability of all links.

### IV-A Identifiability of One Link

Let us focus on a single link $CD$ with success probability $\alpha_{CD}$. Consider Fig. 2, which generalizes the motivating example of the previous Section. Note that links other than $CD$ can be viewed as summarizing paths: e.g., $AC$ could correspond to a path from $A$ to $C$, possibly consisting of the concatenation of several links.

For a given choice of sources and receivers and a coding scheme described in Section V-B1 (which is extremely simple: just pick any leaf or leaves as sources and the remaining leaves as receivers; sources send binary vectors; intermediate nodes simply code using bit-wise XOR or multicast), we want to translate the conditions for identifiability of link $CD$ in Definition 2 to graph properties of the network. Our intuition is that a link $CD$ is identifiable if $C$ is a source, a coding point or a branching point, and $D$ is a receiver, a coding point or a branching point. These are the structures depicted in Fig. 2, where we want to identify the link success rate associated with edge $CD$, and interpret the remaining edges as corresponding to paths. The top two cases of Fig. 2 depict the simple cases where node $C$ is a source, or node $D$ is a receiver; the four bottom cases depict the cases where $C$ and $D$ are coding or branching points.

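To formalize this intuition, consider the following two conditions, where $S$ and $R$ denote the sets of sources and receivers, respectively:

• Condition 1: At least one of the following holds:
(a) $C\in S$.
(b) There exist two edge-disjoint paths $(X_1,C)$ and $(X_2,C)$ that do not employ edge $CD$, with distinct $X_1,X_2\in S$.
(c) There exist two paths $(X_1,C)$ and $(C,X_2)$ that do not employ edge $CD$, with $X_1\in S$, $X_2\in R$.

• Condition 2: At least one of the following holds:
(a) $D\in R$.
(b) There exist two edge-disjoint paths $(D,X_1)$ and $(D,X_2)$ that do not employ edge $CD$, with distinct $X_1,X_2\in R$.
(c) There exist two paths $(X_1,D)$ and $(D,X_2)$ that do not employ edge $CD$, with $X_1\in S$, $X_2\in R$.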

###### Theorem IV.1

For a given choice of sources and receivers and for the simple coding scheme described above, link $CD$ is identifiable if and only if both Conditions 1 and 2 hold.

The proof is provided in Appendix A.1.

### IV-B Identifiability of All Links

In fact, we can identify all links at the same time. It is sufficient to ensure that each link is identifiable, according to the conditions of Theorem IV.1. This is true in all directed trees, where each leaf node is either a source or a receiver, and each intermediate node satisfies the following mild conditions: (i) it has degree at least three (which is true in all logical topologies); (ii) it has in-degree at least one (otherwise, the node should be a source); and (iii) it has out-degree at least one (otherwise, the node should be a receiver).
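The three conditions on intermediate nodes are easy to check mechanically. The sketch below assumes the directed tree is given as a list of `(tail, head)` edges and treats every node of total degree one as a leaf; `violating_nodes` is a hypothetical helper name.

```python
def violating_nodes(edges):
    """Return the intermediate nodes that break conditions (i)-(iii).

    (i) degree at least three, (ii) in-degree at least one,
    (iii) out-degree at least one. Leaves (degree one) are skipped,
    since they act as sources or receivers.
    """
    in_deg, out_deg = {}, {}
    for tail, head in edges:
        out_deg[tail] = out_deg.get(tail, 0) + 1
        in_deg[head] = in_deg.get(head, 0) + 1
    bad = []
    for node in sorted(set(in_deg) | set(out_deg)):
        i, o = in_deg.get(node, 0), out_deg.get(node, 0)
        if i + o == 1:
            continue  # leaf node: a source or a receiver
        if i + o < 3 or i == 0 or o == 0:
            bad.append(node)
    return bad

# The 5-link example: sources A and B, receivers E and F.
five_link = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]
# Two links in a row: the middle node has degree two.
chain = [("A", "B"), ("B", "C")]

ok_nodes = violating_nodes(five_link)
bad_nodes = violating_nodes(chain)
```

For the 5-link example the list of violations is empty, while for the chain the middle node is reported, matching the earlier observation that two links in a row cannot be told apart.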

###### Example 2

Table II lists which links are identifiable in the four bottom cases of Fig. 2, if we use our approach vs. if we use multicast tomography. All four configurations depict the same basic 5-link topology, but they differ in the choice of sources and receivers. Our approach is able to identify all links for any sets of sources and receivers. This is not always the case for multicast tomography.

## V Tree topologies

In this Section, we consider tree topologies and describe our design choices for the four subproblems. We have already discussed identifiability in the previous Section. Next, we describe routing in Section V-A, probe and code design in Section V-B1 (operation of sources and intermediate nodes), and estimation algorithms in Sections V-B, V-C, and V-D.

### V-A Routing, Selection of Sources and Receivers

Routing in trees is well defined: there exists a single path that connects a source to a receiver, through which probes flow. For a tree with leaf nodes, some leaves act as sources and the remaining leaves act as receivers . Intermediate nodes simply combine (XOR) the probes coming on all incoming links and forward (multicast) to all their outgoing links. This Section looks at situations where we may have some freedom in the choice of the nodes that act as sources and receivers. If such flexibility is not available (as it is assumed in most tomography work), this step can be skipped. We study the effect of the selection of sources and receivers on estimation accuracy and we come up with empirical guidelines for source selection, obtained through a number of examples and simulation scenarios.

In Example 2, we saw that, with network coding, all links are identifiable, while if we use two multicast trees, they are not. In Appendix B.2, we revisit the basic 5-link topology of Fig. 2 and we show that, even though with network coding links are identifiable for all four cases, the estimation accuracy differs depending on the number of sources and their relative positions in the tree. This idea also applies to larger topologies. For example, in [technicalReport], we consider a 9-link tree, run simulations for different numbers and locations of sources, and summarize the intuition obtained.

Link loss tomography is essentially a parameter estimation problem, and different choices of sources and receivers lead to different estimators. That is, for a fixed number of probes, each topology leads to a different estimation accuracy; put differently, to achieve the same mean square error (MSE), we may need a different number of probes for each topology. In general, the optimal selection of the number and location of sources depends on the network topology, the values of link loss rates, and possibly the number of employed probes. This is currently an open problem.

### V-B Maximum Likelihood Estimation of All Link Loss Rates

In this Section, we focus on tree topologies and we develop an efficient maximum likelihood estimator to estimate all link loss rates from the observations at the receivers. In the special case where the topology is a multicast tree, i.e., probes are sent between one source and several receivers, an efficient ML estimator (MINC) has been designed in the pioneering paper [minc]. We build on MINC, and we extend it to multiple-source multiple-receiver trees, where multicast is used at all branching points and network coding is used at all joining points [netcod2011]. We propose Alg. 1 in Section V-B4, which provides an efficient way to compute the MLE of all links at the same time.

A key property that we formulate, prove, and extensively use in this Section, is reversibility, as discussed in Section III-C, and as we describe in detail in Section V-B2. In Section V-C, we also describe how to efficiently compute the MLE for a single link at a time (in both trees and general topologies). In Section V-D, we describe heuristic estimation algorithms, some of which apply to general topologies as well.

#### Model and framework

We first describe the model of tree networks for which we derive the MLE.

Logical Tree. We consider a tree topology, like the one depicted in Fig. 3, consisting of the set of nodes and the set of directed links. leaf nodes, shown on top of the tree, act as sources of probe packets. The remaining leaves, shown at the bottom of the tree, act as receivers. As typically assumed in tomography problems (as described in Section III), this is a “logical” tree topology, i.e., every intermediate node has degree at least three. An intermediate node is either a coding point (with multiple incoming links and one outgoing link) or a branching point (with one incoming link and multiple outgoing links). For each node , we denote the set of its parents (nodes with a link outgoing to ) by and the set of its children (nodes with a link coming from ) by . The source nodes have no parent and the receiver nodes have no children. are considered known and fixed throughout the experiments.

In this Section, we focus on the tree topology shown in Fig. 3, which has the property that all coding points are located above all branching points. This is actually a mild assumption: starting from an undirected tree, if one is allowed to choose the sources among the leaf nodes, then one can always ensure this property. Note that this tree model includes all cases in Fig. 2 (except for Case 3 in the 5-link topology, which is treated separately in Section V-C).

Operation of Sources. Each source sends a probe packet , which is a vector of length in the form of:

 xi=[M0,⋯,0,1i,0,⋯,0],i=1,2,⋯,M

Operation of Intermediate Nodes. Each coding point (bit-wise) XORs all packets it receives from its parents, and forwards the result to its child. This very simple design effectively keeps the presence of each source orthogonal from every other source. This ensures versatility, in the sense that no matter which probe packets get XOR-ed, they will not cancel each other out. For most practical purposes, this simple probe design is sufficient: a single IP packet can be up to 1500B (including the headers) and thus, can accommodate roughly 12,000 probe sources (bits). In large networks, one can also spatially reuse probe packets by allocating the same probe packet to all sources whose packets do not meet. Finally, each branching point multicasts the packet it receives from its parent to all its children.
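The probe design and the coding operation can be sketched in a few lines. This is only an illustration under simple assumptions: probes are represented as Python lists of bits, sources are indexed from 1 to M, and the helper names `make_probe` and `xor_probes` are hypothetical.

```python
def make_probe(i, num_sources):
    """Probe of source i (1-based): a length-M vector with a single 1 at position i."""
    return [1 if j == i else 0 for j in range(1, num_sources + 1)]

def xor_probes(packets, num_sources):
    """Coding point: bit-wise XOR of the packets that actually arrived.

    With no arrivals the result is the all-zero vector, which here stands
    for "nothing to forward".
    """
    out = [0] * num_sources
    for p in packets:
        out = [a ^ b for a, b in zip(out, p)]
    return out

# Each source owns a distinct bit position and, in a tree, each probe reaches
# a coding point over a single path, so XOR-ed contributions never cancel.
coded = xor_probes([make_probe(1, 4), make_probe(3, 4)], 4)
```

Because each bit position belongs to exactly one source, a receiver can read off which source probes survived directly from the positions of the ones in the packet it observes.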

One can see that there will be a node after which flows through the network. We denote this node by . Node is the last coding point in the tree. Node has parents , and only one child, which we denote by node . Node multicasts the packet it receives from node to all its children .

We use the notation that , when is a descendant of , and that when is an ancestor of . Every node has multiple parents and only one child, while every node has one parent and multiple children. We are going to treat these two sets of nodes differently in the rest of Section V-B. We name any link of the tree that is above node by its starting point, and we name any link that is below node by its end point. In other words, link denotes a link between nodes if and , while link denotes a link between nodes if and .

Loss Model. As described in Section III, we model the loss rate of individual links by an i.i.d. Bernoulli process, independent across links. In particular, we use the following notation:

• A packet that traverses a link above node is lost with probability and arrives at node with probability .

• A packet that traverses a link below node is lost with probability and arrives at node with probability .

• Finally, we denote the loss rate of link by .

In general, we use the notation for any quantity .

Let denote the packet observed at node , and let , denote the set of all ’s. is a binary vector of length . Its element, , represents the probe packet of source : indicates that the probe packet of source reaches node , and 0 that it does not. For the sources, , thus and , . For any node , if for a parent of , with probability , and with probability , independently for all the parents of . For any node , if (the all-zero vector), then , for the children of (and hence for all descendants of ). If , then for a child of , with probability , and with probability , independently for all the children of .
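To make the propagation rules concrete, the following is a minimal simulation sketch of a single experiment. It rests on several assumptions that are not in the text: links are given as `(u, v)` pairs listed in topological order (every link into a node appears before any link out of it), the all-zero vector represents "no packet", and the function name `run_experiment` and the node labels are hypothetical.

```python
import random

def run_experiment(edges, success, sources, rng):
    """Simulate one probe round; return the packet seen at every node.

    edges:   list of (u, v) links in topological order (assumption).
    success: dict mapping each link to its success probability.
    sources: list of source nodes; source number i owns bit position i.
    rng:     a random.Random instance, for reproducibility.
    """
    m = len(sources)
    zero = [0] * m
    packet = {s: [1 if j == i else 0 for j in range(m)]
              for i, s in enumerate(sources)}
    for u, v in edges:
        # Independent Bernoulli loss on each link.
        delivered = rng.random() < success[(u, v)]
        arrived = packet.get(u, zero) if delivered else zero
        # Coding points XOR what arrives; a node with a single incoming
        # link simply takes the arriving packet.
        packet[v] = [a ^ b for a, b in zip(packet.get(v, zero), arrived)]
    return packet
```

On a 5-link tree with sources A and B, coding point C, branching point D, and receivers E and F, setting every success probability to 1 delivers both bits to both receivers, while a failure of link (C, D) leaves both receivers with all-zero vectors.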

Data, Likelihood, and Inference. As described in Section III-A, in each experiment, one probe is dispatched from each source. The outcome of a single experiment is a record of whether or not each source probe was received at each receiver, which is the set of vectors observed at receiver . It is denoted by and is an element of the space of all such outcomes. For a given set of link probabilities , the distribution of the outcomes on will be denoted by . The probability mass function for a single outcome is .

We perform experiments. The probability of independent observations (each ) is given by Eq.(1). Our task is to estimate using maximum likelihood, from the data . We work with the log-likelihood function given in Eq.(2). The MLE of the loss rates is the that maximizes , as given by Eq.(3).

#### The Likelihood Equation and its Solution

Candidates for the MLE are solutions of the likelihood equation:

$$\frac{\partial L}{\partial \alpha_k}(\alpha) = 0, \quad k \in V \tag{8}$$

We need to define some additional variables to compute the MLEs. For each node , let be the set of outcomes such that for at least one source that is an ancestor of and for any arbitrary set of receivers . Let ; an estimate of can be computed from:

$$\hat{\gamma}^r_k = \sum_{x \in \Omega^r(k)} \hat{p}(x), \quad \text{where} \quad \hat{p}(x) = \frac{n(x)}{n} \tag{9}$$

is the observed proportion of experiments with outcome . shows the probability of the set of outcomes in which link has definitely worked. Note that link may have worked for some other outcomes as well, but they are not included in . Also note that can be directly estimated from the observations at the receivers.

For each node , we define to be the set of outcomes such that for at least one receiver which is a descendant of . Let ; an estimate of is:

$$\hat{\gamma}^m_k = \sum_{x \in \Omega^m(k)} \hat{p}(x) \tag{10}$$

is the probability of the outcomes in which link has definitely worked; and it can be directly estimated from the observations at the receivers. Our goal is to compute from .
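Both estimates are empirical frequencies, so they can be computed in one pass over the recorded outcomes. The sketch below assumes each outcome is stored as a dictionary from receiver name to its observed binary vector, and it reads the first definition as "the bit of some source above the node shows up at one or more receivers"; the function names and the data layout are illustrative only.

```python
def gamma_m_hat(outcomes, receivers_below_k):
    """Fraction of experiments in which some receiver below node k
    observed a non-zero vector (empirical version of Eq. (10))."""
    hits = sum(1 for x in outcomes
               if any(any(x[r]) for r in receivers_below_k))
    return hits / len(outcomes)

def gamma_r_hat(outcomes, source_bits_above_k):
    """Fraction of experiments in which the bit of some source above
    node k was observed at one or more receivers (empirical Eq. (9))."""
    hits = sum(1 for x in outcomes
               if any(vec[s] for vec in x.values() for s in source_bits_above_k))
    return hits / len(outcomes)

# Four recorded experiments with receivers E, F and source bits 0 and 1.
outcomes = [
    {"E": (1, 1), "F": (0, 0)},
    {"E": (0, 0), "F": (0, 0)},
    {"E": (0, 1), "F": (0, 1)},
    {"E": (0, 0), "F": (1, 0)},
]
```

Summing indicator values over experiments and dividing by the number of experiments is the same as summing the observed proportions over the outcomes in the corresponding set.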

Special Case (i): Multicast Tree (MINC). If , the general model turns into a multicast tree with a single source, which is the case considered in [minc]. We represent the source node by . Each node other than the source node, has one parent , and a set of children. We denote the link loss rates by , where is the end point. We simply assume that .

The outcome of each experiment is , where each is a single binary value (instead of a binary vector of length in the general case), corresponding to whether the source probe is observed at each receiver or not. The state space of the observations is . We say that a link is at level if there is a chain of ancestors leading back to the source.

Only is used for each node in the multicast tree; it is the set of outcomes where for at least one receiver that is a descendant of . The definition of is like before.

The MLE for the multicast tree has been computed in [minc]: Let show the probability that the path from the source to node works, which we denote by . Its estimate can be computed as follows. For the source node, , for the leaf nodes , , and for all other nodes , is the unique solution in of:

$$1 - \frac{\hat{\gamma}^m_k}{\hat{A}^m_k} = \prod_{j \in d(k)} \left(1 - \frac{\hat{\gamma}^m_j}{\hat{A}^m_k}\right) \tag{11}$$

can then be computed from , i.e., , as follows:

$$\hat{\alpha}_k = \frac{\hat{A}^m_k}{\hat{A}^m_{f(k)}}, \quad k \in V \setminus \{0\} \quad (\hat{\alpha}_0 = 1) \tag{12}$$

We refer to Eq.(12) as MINC in the rest of the paper.
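A minimal numerical sketch of this estimator follows. It is not the authors' implementation: it solves Eq. (11) by plain bisection on the interval from the node's own estimate to 1, and it assumes, as in the standard MINC setting, that the path estimate equals 1 at the source and equals the observed frequency at the leaves. The names `solve_A` and `minc`, and the dictionary-based tree encoding, are hypothetical.

```python
from math import prod

def solve_A(gamma_k, gamma_children, tol=1e-12):
    """Solve Eq. (11) for A by bisection on [gamma_k, 1].

    Assumes gamma_k > 0 and a sign change on the interval, which holds
    when the estimates are consistent with a logical tree.
    """
    def h(a):
        return (1 - gamma_k / a) - prod(1 - g / a for g in gamma_children)

    lo, hi = gamma_k, 1.0
    if not (h(lo) < 0 <= h(hi)):
        raise ValueError("no sign change on [gamma_k, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def minc(children, gamma, root=0):
    """Return the estimated success rate of every link, named by its end point.

    children: dict node -> list of child nodes (leaves may be omitted).
    gamma:    dict node -> estimated gamma for that node.
    """
    A, alpha = {root: 1.0}, {root: 1.0}
    stack = [(c, root) for c in children.get(root, [])]
    while stack:
        k, parent = stack.pop()
        kids = children.get(k, [])
        # Leaves: A equals the observed frequency; otherwise solve Eq. (11).
        A[k] = solve_A(gamma[k], [gamma[j] for j in kids]) if kids else gamma[k]
        alpha[k] = A[k] / A[parent]  # Eq. (12)
        stack.extend((j, k) for j in kids)
    return alpha
```

For a node with exactly two children, Eq. (11) reduces to a linear equation in the unknown, so the bisection result can be checked against the closed form given by the product of the two child estimates divided by their sum minus the node's own estimate.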

Note. Eq.(11) is obtained from the following relations, after some computations in [minc], which we repeat here for completeness. Let denote the conditional probability of given that has observed something. Failure can be due to either (failure of link ), or all paths towards the destinations failing. Therefore, the obey the following recursion:

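$$\bar{\beta}^m_k = \bar{\alpha}_k + \alpha_k \prod_{j \in d(k)} \bar{\beta}^m_j, \quad k \in V \setminus R \tag{13}$$

$$\beta^m_k = \alpha_k, \quad k \in R \tag{14}$$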

Eq.(11) then follows from the following relation between and :

$$\gamma^m_k = \beta^m_k \prod_{i=1}^{l^m(k)} \alpha_{f^i(k)} \tag{15}$$
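The intermediate computations can be sketched as follows. The only assumption added here is the explicit form of the path probability, $A^m_k = \alpha_k A^m_{f(k)}$ with $A^m_0 = 1$, which matches its description as the probability that the path from the source to node $k$ works. With it, the relation between $\gamma$ and $\beta$ reads $\gamma^m_k = \beta^m_k A^m_{f(k)}$ and, for each child $j \in d(k)$, $\gamma^m_j = \beta^m_j A^m_k$.

$$
\begin{aligned}
1 - \beta^m_k &= (1 - \alpha_k) + \alpha_k \prod_{j \in d(k)} \left(1 - \beta^m_j\right)
&&\Longrightarrow\quad
\beta^m_k = \alpha_k \left[ 1 - \prod_{j \in d(k)} \left(1 - \beta^m_j\right) \right], \\
\gamma^m_k = \beta^m_k A^m_{f(k)} &= A^m_k \left[ 1 - \prod_{j \in d(k)} \left(1 - \frac{\gamma^m_j}{A^m_k}\right) \right]
&&\Longrightarrow\quad
1 - \frac{\gamma^m_k}{A^m_k} = \prod_{j \in d(k)} \left(1 - \frac{\gamma^m_j}{A^m_k}\right).
\end{aligned}
$$

Replacing each $\gamma$ by its empirical estimate gives the equation that is solved for the estimate of $A^m_k$.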

Special Case (ii): Reverse Multicast Tree (RMINC). If , the general model turns into a reverse multicast tree with a single receiver, which we denote by . Each node other than has one child , and a set of parents. We denote link loss rates by , where is the starting point. We assume that .

The outcome of each experiment, , is a binary vector of length . Each of its elements, , represents whether the probe packet of source is observed at the receiver or not. The state space of the observations is . We say that a link is at level if there is a chain of descendants leading down to the receiver.

Only is used for each node in the reverse multicast tree; it is the set of outcomes where for at least one source that is an ancestor of . The definition of is like before.

The MLE for the reverse multicast tree is similar to the multicast tree. Let show the probability that the path from node to the receiver node works, which we denote by . Its estimate can be computed as follows. For the receiver node, , for the source nodes , , and for all other nodes , is the unique solution in of:

$$1 - \frac{\hat{\gamma}^r_k}{\hat{A}^r_k} = \prod_{j \in f(k)} \left(1 - \frac{\hat{\gamma}^r_j}{\hat{A}^r_k}\right) \tag{16}$$

We can then compute from , i.e., , as follows:

$$\hat{\alpha}_k = \frac{\hat{A}^r_k}{\hat{A}^r_{d(k)}}, \quad k \in V \setminus \{0\} \quad (\hat{\alpha}_0 = 1) \tag{17}$$

We refer to Eq.(17) as RMINC in the rest of the paper.

Note. Eq.(16) results from the following relations. Let denote the conditional probability of given that the path from to the receiver works. We have that:

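$$\bar{\beta}^r_k = \bar{\alpha}_k + \alpha_k \prod_{j \in f(k)} \bar{\beta}^r_j, \quad k \in V \setminus S \tag{18}$$

$$\beta^r_k = \alpha_k, \quad k \in S \tag{19}$$

$$\gamma^r_k = \beta^r_k \prod_{i=1}^{l^r(k)} \alpha_{d^i(k)} \tag{20}$$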

Comparison of MINC and RMINC. The reader will notice that the MLE for the multicast tree and the reverse multicast tree have the same functional form. This is a special case of the more general “reversibility” property, first observed in [globecom08]. Indeed, there is a 1-1 correspondence between the observable outcomes in the two cases; furthermore, the corresponding outcomes have the same probability, as a function of ’s, thus leading to the same MLE. In the following, we describe the reversibility property in more detail.

Reversibility – A Structural Property. Consider a tree topology with leaf nodes, some of which act as sources and the remaining ones, , act as receivers of probes. Routing from to is given (e.g., determined in the routing subproblem) and defines a direction on every link , along which probes flow.

###### Definition 3

We call the triplet ( a configuration.

We define as dual the configuration that results from reversing the orientation of all links in the network, and from having the sources become receivers, while the receivers act as sources. More formally:

###### Definition 4

Consider the original configuration . Consider the graph that has the same nodes but reversed edges, i.e., iff , and success rate , associated with every edge . Select sources and receivers . We call the the dual configuration of .

For example, a multicast tree is the dual configuration of a reverse multicast tree (Cases and in Fig. 2). In Appendix B, we show that the two dual configurations in panels (a) and (b) of the figure discussed there result in the same mean square error bound. In fact, a closer look reveals that not only the values but also the functional forms of these two ML estimators coincide. The following theorem generalizes this notion to general trees.

###### Theorem V.1

Consider a configuration with observations at the receivers , and probability distribution . Consider its dual configuration , with observations and probability distribution . Then, there is a bijection between outcomes and their probabilities in the original and in the dual configuration .

Proof.

Let be the original tree graph, and its dual. In every experiment, there exist possible error events, depending on which subset of the links fail. Observing the outcomes at the receivers corresponds to observing unions of events, that occur with the corresponding probability (e.g., as in the example of Table I). We show that for each observable outcome, which occurs with probability in , there exists exactly one observable outcome that occurs with the same probability in and vice-versa. This establishes a bijection.

With every edge of , we can associate a set of sources that flow through this edge, and a set of receivers that observe the flow through . Our main observation is that the pair uniquely identifies , i.e., no other edge has the same pair. In the dual configuration , edge is uniquely identified by the pair . If in , edge fails while all other edges do not, the receivers will not receive the contribution in the probe packets of the sources . If in , edge fails while all other edges do not, the receivers will not receive the contribution in the probe packets of the sources . Thus, there is a one-to-one mapping between these events. Using this equivalence, an observable outcome consisting of a union of events can be mapped to an observable outcome in the reverse tree.

###### Corollary V.2

The maximum likelihood estimators for a configuration and its dual have the same functional form.

Proof.

The bijection established above implies that a configuration and its dual have the same set of observable outcomes, with the same probabilities. Therefore, they have the same likelihood function and thus, the same maximum likelihood estimator. We note that this corollary establishes reversibility only for the maximum likelihood estimation. The performance of suboptimal algorithms may differ when applied to a configuration and its dual.
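The bijection can be checked by brute force on a small example. The sketch below enumerates all link states of a 5-link tree and compares the outcome distribution of a configuration with that of its dual. It relies on assumptions that go beyond the text: an outcome is encoded as the set of (source, receiver) pairs whose probe bit arrived, a bit arrives exactly when every link on the unique path works, each reversed link keeps the success rate of the original link, and all names (`outcome_distribution`, `dual_paths`, `flip`, the link labels) are hypothetical. Under this encoding the agreement holds almost by construction, because a configuration and its dual use the same paths.

```python
from itertools import product

def outcome_distribution(paths, alpha):
    """Return {outcome: probability} by enumerating all link states.

    paths: dict (source, receiver) -> tuple of links on the unique path.
    alpha: dict link -> success probability.
    """
    links = sorted(alpha)
    dist = {}
    for state in product([0, 1], repeat=len(links)):
        up = dict(zip(links, state))
        p = 1.0
        for e in links:
            p *= alpha[e] if up[e] else 1 - alpha[e]
        seen = frozenset(sr for sr, path in paths.items()
                         if all(up[e] for e in path))
        dist[seen] = dist.get(seen, 0.0) + p
    return dist

def dual_paths(paths):
    """Swap the roles of sources and receivers; the links on each path are unchanged."""
    return {(r, s): path for (s, r), path in paths.items()}

def flip(outcome):
    """Map an outcome of the original configuration to its dual counterpart."""
    return frozenset((r, s) for s, r in outcome)

# 5-link tree: sources A, B -> coding point C -> branching point D -> receivers E, F.
paths = {
    ("A", "E"): ("AC", "CD", "DE"), ("A", "F"): ("AC", "CD", "DF"),
    ("B", "E"): ("BC", "CD", "DE"), ("B", "F"): ("BC", "CD", "DF"),
}
alpha = {"AC": 0.9, "BC": 0.8, "CD": 0.95, "DE": 0.7, "DF": 0.6}
original = outcome_distribution(paths, alpha)
dual = outcome_distribution(dual_paths(paths), alpha)
matches = all(abs(dual[flip(o)] - p) < 1e-12 for o, p in original.items())
```

Since the two distributions agree outcome by outcome, the likelihood functions coincide as functions of the link success rates, which is what the corollary uses.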

A note on directional networks. It is also important to note that the notion of dual configurations does not assume that the loss rates in both directions of a link are the same. Reversibility means that the two ML estimators for a configuration and its dual are described by the same function. However, the loss parameters we try to estimate (using the same estimator function) in the two directions may have different values.

#### Maximum Likelihood Estimation of Loss Rates

We now present how to “reduce” the original tree to a multicast and to a reverse multicast tree, and how to estimate . These intermediate results are then used in the MLE algorithm in Section V-B4.

Reduction to a Multicast Tree (m). If we take the upper part of the original tree in Fig. 3 and consider it as an aggregate link, we obtain the reduced multicast tree in Fig. 4(a). The aggregate link summarizes the operation of all links above node and link . Node receives a packet if at least one path from the sources to node works and link works. In other words, the success probability of the aggregate link, , depends on the paths from the sources to node , and also link .

More formally, we map the outcomes of the original tree to the outcomes of the multicast tree, as follows. Each is a set of binary vectors, each of length , while each is a single binary vector of length . Any outcome is obtained by taking a set of outcomes , in all of which the same receivers have observed all-zero vectors and the same receivers have observed non-zero vectors, and by replacing each non-zero vector (that may contain any of the source probes ) by value 1, and each all-zero vector by value 0. That is:

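$$\sum_{x:\; x_{R_t} \neq [0, 0, \cdots, 0],\; x_{R_{t'}} = [0, 0, \cdots, 0]} n(x) = n^m(x^m), \quad x^m_{R_t} = 1, \; x^m_{R_{t'}} = 0, \quad t, t' \in \{1, \cdots, N\}, \; t \neq t' \tag{21}$$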
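In code, this mapping amounts to collapsing every received vector to a single bit and accumulating the counts. The following minimal sketch assumes each outcome is stored as a dictionary from receiver name to its observed binary vector; the function name `reduce_to_multicast` and the data layout are illustrative only.

```python
from collections import Counter

def reduce_to_multicast(outcomes, receivers):
    """Collapse each outcome to a multicast outcome and count occurrences.

    A receiver maps to 1 if it observed a non-zero vector and to 0 if it
    observed the all-zero vector; outcomes that collapse to the same tuple
    have their counts added, as in Eq. (21).
    """
    counts = Counter()
    for x in outcomes:
        xm = tuple(1 if any(x[r]) else 0 for r in receivers)
        counts[xm] += 1
    return counts

observed = [
    {"E": (1, 1), "F": (0, 0)},
    {"E": (0, 1), "F": (0, 0)},
    {"E": (1, 0), "F": (0, 1)},
    {"E": (0, 0), "F": (0, 0)},
]
reduced = reduce_to_multicast(observed, ["E", "F"])
```

In this toy record, the first two experiments differ in which source probes reached receiver E, yet both collapse to the same multicast outcome, so their counts are merged.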

If the original tree has link success rates and an associated probability distribution of outcomes , then the multicast tree is defined with parameters and associated probability distribution , such that:

 αmk=αk,k

can be directly calculated from , since each event in