Fitting a Model to Data in Loss Tomography
Abstract
Loss tomography has received considerable attention in recent years and a number of estimators have been proposed. Although most of these estimators are claimed to be maximum likelihood estimators, the claim is only partially true since the maximum likelihood estimate can be obtained for at most one class of data sets. Unfortunately, few people are aware of this restriction, which leads to the misconception that an estimator is applicable to all data sets as long as it returns a unique solution. To correct this, we point out the risk of this misconception and illustrate the inconsistency between data and model in the most influential estimators. To ensure that the model used in estimation is consistent with the data collected from an experiment, the data sets used in estimation are divided into 4 classes according to the characteristics of the observations. Based on this classification, the validity of an estimator is defined and the validity of the most influential estimators is evaluated. In addition, a number of estimators are proposed, one for each class of data sets that has been overlooked. Further, a general estimator is proposed that is applicable to all data classes. The discussion starts from the tree topology and ends at the general topology.
Applicability, Data-driven modeling, Intersection, Partition, Loss tomography.
1 Introduction
Network measurement has received considerable attention in recent years since it can not only provide necessary information for network modeling but also verify some of the assumptions made on an ad hoc basis for network modeling. In contrast to direct measurement, which is only suitable for a small network, network tomography was proposed for large networks [1]. Network tomography, as a methodology, differs from direct measurement in many ways, the most important being the use of statistical inference to accomplish a task that involves probing, modeling, and estimation. A large number of works have been carried out on this methodology in recent years, and the results reported so far cover loss tomography [2], [3], [4], [5], [6], delay tomography [7], [8], [9], [10], [11], loss pattern tomography [12], source-destination traffic matrices [13], and shared congestion flows [14]. Despite the enthusiasm and the wealth of publications in this area, network tomography is in its infancy and more work needs to be done. In addition, some of the characteristics that seem to be fully investigated still require further study. For instance, although loss tomography has been studied for more than 10 years and the likelihood equation of the tree topology has been available for over 10 years [2], few are aware that the likelihood equation is restricted to a specific class of data. That is also true of almost all other estimators proposed in the past. Using those estimators without checking the data type may result in unexpected consequences. This paper addresses the validity of an estimator in the context of data classes and proposes estimators for the previously unidentified data classes.
The fundamental principle of loss tomography is built on statistical inference, where parametric estimation is frequently used to fit a statistical model to data (observations), and maximum likelihood estimation is then used to find the unknown parameters of the model. Unfortunately, this principle has been partially overlooked by almost all previous works. The few exceptions that consider the issue, such as [2] and [6], handle it differently: rather than fitting a model to data, they go in the opposite direction and try to fit data to a predefined model. If a data set does not fit the model, the data set is either discarded or skipped. Moreover, the discussions presented in [2], [6] are far from complete; they fail to identify all unsuitable data sets and consequently fail to propose alternative likelihood equations for the unidentified data sets. Furthermore, there has been no discussion of the validity of an estimator (a likelihood equation) with regard to data sets. Without this knowledge, an estimator may be mistakenly applied to a data set that it does not fit and return an incorrect estimate of the unknown parameter(s). The incorrect estimate can differ so much from the correct one that it cannot be used in traffic control or modeling. Apart from presenting the problems of previous works, we also provide solutions: a number of estimators, one for each data class, are proposed in this paper to overcome the problems. In addition, a general estimator applicable to all types of data is proposed. The discussion covers both tree and general topologies.
Previous works on loss tomography focused on proposing estimators and proving that the proposed estimator has a unique solution. If an estimator cannot return a unique solution, or cannot return a solution at all, the data set used in the estimation is considered inconsistent with the estimator and is skipped or discarded [2] [5]. Although ensuring a unique solution to a likelihood equation is fundamentally important in terms of identifiability, the likelihood equation is derived from a likelihood function, and the likelihood function is derived from a data set. Under the maximum likelihood principle, the unique solution obtained from the equation is the maximum likelihood estimate (MLE) only if the likelihood equation used in estimation fits the nature of the data set. To remedy this, data obtained from experiments are divided into 4 mutually exclusive classes according to the characteristics of intersection and partition in the observations. The 4 classes are called perfect, chained-only, partitioned, and chained partition, respectively. Each class requires its own likelihood equation, and the likelihood equations proposed in [2] [5] only suit data sets of the perfect class. Although the perfect class is the most likely scenario among the 4 classes, in particular when the number of probes sent to receivers approaches infinity, the other scenarios do occur from time to time when the number of probes is finite.
1.1 Contribution and Paper Organization
Data consistency, as raised in [2], aims to eliminate 3 types of data from estimation since the likelihood equation proposed in that paper cannot find a solution for them. Nevertheless, [2] falls short of considering whether a unique solution returned by the likelihood equation is the MLE. In other words, a unique solution to a likelihood equation is only a necessary condition for the MLE, while the sufficient condition also requires that the likelihood equation fit the data set. We therefore present the relationship between data and likelihood equations, and emphasize that the MLE is obtained if and only if the likelihood equation fits the data and has a unique solution. The contribution of this paper can be divided into two parts, identifying problems and finding solutions, which are detailed as follows:

To improve our understanding of statistical inference in loss tomography, this paper reiterates the importance of the statistical principle of fitting a model to data in the context of loss tomography. It points out that an estimator is only valid if the model used by the estimator fits the data collected from observation. It further examines the validity of the most influential estimators proposed for loss tomography and identifies the pitfalls of the estimators.

To solve the problems, data used in estimation are divided into 4 classes on the basis of intersection and partition in the observations of descendants. The estimators proposed so far fit only one of the 4 classes. A number of estimators are then proposed for the other 3 classes that have been overlooked, including a general estimator that is able to handle all data classes.
The rest of the paper is organized as follows. In Section 2 we present the essential background, including the notation and statistics used in this paper. In Section 3, we present in detail the problems that exist in the most influential estimators. Section 4 provides the solutions to the problems presented in Section 3 for the tree topology. Section 5 extends the solutions obtained for the tree topology to the general topology. The last section is devoted to concluding remarks.
2 Notations and Related Works
The two most influential works in loss tomography, one for the tree topology [2] and the other for the general topology [5], are introduced in this section, where the latter is developed on top of the former. Because of this relation, both impose the same restriction on the data used in estimation.
2.1 Notation
To assist the following discussion, the symbols used in this paper and their definitions are introduced briefly in this section. Readers who want the details may refer to [5].
To collect information from a large network, a number of probes are multicast from a number of sources located on one side of the network to a number of receivers located on the other side of the network. The paths from sources to receivers cover the links of interest. If there is only a single source, the paths from the source to the receivers form a special tree, called a multicast tree, where the root has only one child. Let denote the multicast tree, where is a set of nodes representing routers and switches of a network; is a set of directed links connecting the nodes of ; and is an m-element vector, with one element per link describing the loss rate of that link. is used to denote all receivers. As a hierarchical structure, each node in a tree except the root has a parent. Each node except the leaves has a number of descendants. Let denote the descendants of node and denote the number of descendants in . Further, each multicast subtree is named by the number assigned to the child node of the root, where denotes the multicast subtree rooted at node , where , and are the nodes, links and parameters of the subtree. Note that a multicast subtree is different from an ordinary one, where multicast subtree is rooted at node that uses link to connect subtree . The group of receivers attached to is denoted by . If probes are dispatched from the source, each probe gives rise to an independent realization of the loss process , if probe passes link ; otherwise . The observations of and comprise the data set for inference. Therefore, observations are also called data or data sets in the following discussion.
To estimate the loss rates of a tree topology, a set of sufficient statistics is introduced in [5] one for a node to denote the number of probes reaching node confirmed from observation of , . In addition, let denote the number of probes that are observed simultaneously by at least one receiver attached to subtree and at least one receiver attached to subtree . Similarly, denotes the number of probes observed simultaneously by the receivers attached to subtrees , and . Furthermore, we can have statistics to count the number of probes observed simultaneously by more descendants. This process continues until all descendants are included, i.e., .
Given , we have a algebra on , and a measure on . Then, a measurable space, , is established for each node to obtain the statistics used in estimation and to divide observations into classes, where counts the number of probes observed simultaneously by the members of . Note that .
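The counting statistics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [5]: the observation matrix `obs`, the helper names `reach_count` and `joint_count`, and the mapping from descendants to receiver columns are all assumptions made for the example.

```python
from itertools import combinations

def reach_count(obs, receivers):
    """Number of probes seen by at least one of the given receivers."""
    return sum(1 for row in obs if any(row[r] for r in receivers))

def joint_count(obs, groups):
    """Number of probes seen by at least one receiver in every group."""
    return sum(1 for row in obs if all(any(row[r] for r in g) for g in groups))

# Hypothetical data: 6 probes (rows), 3 receivers (columns); 1 = probe observed.
obs = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
# Hypothetical subtrees under one node: each descendant owns one receiver.
subtrees = {"a": [0], "b": [1], "c": [2]}

# Probes confirmed to reach each descendant.
per_descendant = {name: reach_count(obs, rcv) for name, rcv in subtrees.items()}
# Probes observed simultaneously in two subtrees.
pairwise = {
    (x, y): joint_count(obs, [subtrees[x], subtrees[y]])
    for x, y in combinations(subtrees, 2)
}
# Probes observed simultaneously in all three subtrees.
all_three = joint_count(obs, list(subtrees.values()))
# Probes confirmed to reach the parent node (seen by any receiver below it).
at_node = reach_count(obs, [r for rcv in subtrees.values() for r in rcv])
```

A zero in any of the joint counts signals a mutual exclusion between the corresponding descendants, which is the property later used to separate the data classes.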
In contrast to the tree topology, there are multiple intersecting trees in a general network. The nodes located in a shared area, called a shared segment, can then observe probes sent by multiple sources. To accommodate multiple sources, the notation defined above needs to be extended. In most cases, an extra symbol is added to the corresponding notation to represent the source. For instance, denotes the number of probes sent by source passing link , denotes the observation of for the probes sent by source , where is the number of probes sent by . Note that a link instead of a node is used as the reference in the general topology since there is no longer a one-to-one mapping between nodes and links in the general topology.
2.2 Related Works
Multicast-based Inference of Network-internal Characteristics (MINC) pioneered the use of multicast probes to create correlated observations at the receivers of the tree topology [2], [15], [16], where a Bernoulli model is used to describe the loss process of a link. Using this model, the authors of [2] derive a direct expression of the pass rate of a path connecting the source to an internal node as follows:
(1) 
Using empirical probability and to replace and , (1) becomes a single-variable polynomial that has roots according to the fundamental theorem of algebra. However, we are only interested in the roots falling into the support of , i.e. . Lemma 1 of [2] proves that there is a unique solution to (1) in the support of if . In addition, three extreme cases are identified and ruled out from estimation since the likelihood equation has no solution if the data set used for estimation falls into any of them.
Recently, Zhu proposed an analytical solution for the general topology [6], where the likelihood equation is as follows:
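As a hedged illustration of how such a root can be located numerically, the sketch below assumes that (1) takes the familiar MINC form 1 - gamma_k / A = prod_j (1 - gamma_j / A), where gamma_k and gamma_j are the empirical probabilities that a probe is observed below node k and below its descendant j, and A is the unknown pass rate. The function names, the bisection strategy, and the search interval are choices made for this example and are not taken from [2].

```python
from math import prod

def likelihood_gap(a, gamma_k, gamma_children):
    """LHS minus RHS of the assumed form of (1) at candidate pass rate a."""
    return (1.0 - gamma_k / a) - prod(1.0 - g / a for g in gamma_children)

def solve_pass_rate(gamma_k, gamma_children, tol=1e-12):
    """Bisection on [gamma_k, 1]; returns None when there is no sign change."""
    lo, hi = gamma_k, 1.0
    f_lo = likelihood_gap(lo, gamma_k, gamma_children)
    f_hi = likelihood_gap(hi, gamma_k, gamma_children)
    if f_lo * f_hi > 0:
        return None  # no root in the interval, e.g. complete mutual exclusion
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = likelihood_gap(mid, gamma_k, gamma_children)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical statistics: two descendants that share some probes.
estimate = solve_pass_rate(0.7, [0.6, 0.5])
# Hypothetical statistics with no shared probes (gamma_k equals the sum).
no_root = solve_pass_rate(0.5, [0.3, 0.2])
```

With the first set of hypothetical values the bisection converges to roughly 0.75, while the second set, in which the descendants never observe the same probe, yields no root in the interval.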
(2) 
where is the set of sources sending probes to node and is one of the sources. It is also proved that (1) is a special case of (2). Like [2], [6] discusses the data consistency problem along the same lines as its predecessor, but for the general topology. In the paper, Zhu points out the difference between the tree topology and the general one in terms of data consistency. Despite this, the paper, like its predecessor, treats the data falling into the 3 types as exceptions and eliminates them from estimation.
3 Problem Formulation
As stated, previous works fail to consider the impact of data on likelihood equations. To illustrate the impact, this section examines the likelihood equations proposed in [2] and [6] with imperfect data to assess their validity.
3.1 Statistical Implication
Given observation , a likelihood function is constructed as a probability measure where is a variable. The maximum likelihood principle proposed by Fisher aims to find the that maximizes . The structure of the likelihood function depends on , and so does the likelihood equation since it is derived from the likelihood function. Thus, the likelihood equation, as a statistical model, connects some random variables to others and expresses the relation between them. The relation can be analyzed on the basis of matching a model to data. Taking (1) as an example, both sides of the equation denote the loss rate of subtree , where the left-hand side (LHS) uses the data obtained from observation directly to express the loss rate of subtree while the right-hand side (RHS) uses probabilistic reasoning to achieve the same. The LHS can be viewed as the data and the RHS as the model. The correspondence between data and model becomes obvious if we expand both sides of (1), where the LHS is as follows:
(3)  
which is constructed from . It is easy to prove
(4) 
since and the values of the terms on the RHS of (3) monotonically decrease from left to right, i.e. a term in a left summation is larger than a term in a right summation whose subscript has one more number than its left correspondent. Thus, .
In contrast to the LHS, the RHS of (1) is built on the frequentist view that expresses the loss rate of subtree by the product of the loss rates of the subtrees rooted at node . Expanding the RHS, we have
(5) 
Subtracting 1 from both (3) and (5) and multiplying each by , one can see the one-to-one correspondence between the terms of (3) and those of (5). The correspondence reflects that the MLE can only be achieved if the RHS (model) matches the LHS (data).
The above discussion shows that if the LHS of (1) matches the RHS term by term, the solution obtained from the equation is the MLE. The model used by (1) is based on the assumption that every term of (3) exists. Note that matching the RHS to the LHS, term by term, is also the condition under which (1) holds. As the number of probes increases, the variance of the estimate decreases according to the Fisher information. This is also reflected in the corresponding terms. As , one can even use a single pair of corresponding terms to form an explicit estimator. For instance, the estimator proposed in [17] is based on the last pair of corresponding terms. However, if , some terms on the LHS may not exist, and the following cases must then be considered.
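The data side of this correspondence can be checked numerically. The sketch below assumes that (3) is the inclusion-exclusion expansion of the number of probes observed by at least one descendant; the matrix `obs` (one column per descendant) and the helper names are hypothetical and serve only to illustrate the term-by-term structure.

```python
from itertools import combinations

def seen_by_all(obs, members):
    """Number of probes observed simultaneously by every listed descendant."""
    return sum(1 for row in obs if all(row[m] for m in members))

def level_sums(obs, descendants):
    """Sum of joint counts over all subsets of size 1, 2, ..., len(descendants)."""
    return [
        sum(seen_by_all(obs, subset) for subset in combinations(descendants, size))
        for size in range(1, len(descendants) + 1)
    ]

# Hypothetical observations of three descendants over six probes.
obs = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
sums = level_sums(obs, [0, 1, 2])
# Alternating sum of the levels: singles minus pairs plus triples.
alternating = sum(s if i % 2 == 0 else -s for i, s in enumerate(sums))
# Direct count of probes observed by at least one descendant.
union = sum(1 for row in obs if any(row))
```

If any joint count is zero, the corresponding term is missing on the data side even though the product on the model side still generates its counterpart.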
3.2 No Solution because of Invalidity
Recall that lemma 1 of [2] states that there is a unique solution to (1) in the support of if . In practice, this condition means that there is at least one intersection in the observations of the descendants of node k. Mathematically, the condition can be written as . Note that lemma 1 does not ensure that the solution is the MLE of .
In contrast, if
we must have . We call this complete mutual exclusion at node . Once complete mutual exclusion occurs at node , there is no correlated information about in the observations. Thus, lemma 1 of [2] concludes that there is no solution to (1). This can be explained either by considering equation (1) alone or by considering the validity of equation (1). [2] takes the former view and considers (1) a concave function that does not intersect the axis in . We take the latter view: if complete mutual exclusion occurs at node , the loss rate of subtree is equal to instead of . This means that, given complete mutual exclusion, (1) no longer holds, let alone has a solution.
3.3 Incorrect Solution because of Invalidity
As stated, if or , some of the terms on the LHS of (1) may not exist; however, their counterparts on the RHS do as long as . If so, there is at least one mismatch between the LHS and the RHS, and the unique solution obtained from (1) may not be the MLE. We call this partial mutual exclusion, mathematically
As with complete mutual exclusion, (1) does not hold if a partial mutual exclusion occurs in the observations. For instance, assume node has 3 descendants, a, b, and c, , , , and . Putting the available information into the expansion of (1), we have
(6)  
Although there is a unique solution to (6), the solution is certainly not the MLE of since the data and model are obviously mismatched. In fact, the observation of does not provide any information for and should not be considered. If we remove the terms related to subtree c, we have
where the model fits the data.
As in the previous subsection, (1) is no longer valid if there is a mutual exclusion in the observations. The following questions then arise:

how many types of exclusions exist in observation?

is there an estimator applicable to all types of observations?
We will address those issues in the next section.
4 Data Classification and Solutions
4.1 Classification of Data Set
The discussion presented in the last section reveals the impact of observations on estimation and details the incompleteness of previous works on loss tomography. Considering the variation in observations, we propose a new strategy in loss tomography that matches a model to data. To make this possible, we divide the data sets used in estimation into a number of classes on the basis of intersection and partition in the observations of descendants and introduce a number of models, one for each class of data. The classification is presented in Table 1, where

the perfect class denotes the data sets that satisfy the following:

the chained-only class is for the data sets that are not in the perfect class, but for which the observations of the descendants cannot be divided into two exclusive groups, i.e. and
if 
in contrast to the chained-only class, the partition-only class is for mutual exclusions in which the observation of can be divided into a number of exclusive partitions and at least one partition has more than 2 descendants. Within a multi-descendant partition, the observation is perfect.

the chained partition class is for the data sets that combine the characteristics of the above two classes, i.e. the observation can be divided into a number of exclusive partitions, at least one partition has more than 2 members, and in a multi-member partition the observation falls into the chained class.
Figure 1 illustrates the four classes, where each circle represents the observation of a descendant. Among the subfigures, (a) is for the perfect class, (b) for the chained-only class, (c) for the partition-only class, and (d) for the chained partition class.
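One possible way to assign a data set to these classes is sketched below. It is an interpretation rather than the authors' procedure: it assumes that the perfect class requires at least one probe observed by all descendants of a group, that partitions are the connected components of the pairwise-intersection graph, and that a partition counts as multi-member when it has two or more descendants. The matrix layout (one column per descendant) and all function names are hypothetical.

```python
def joint(obs, members):
    """Probes observed simultaneously by every descendant in members."""
    return sum(1 for row in obs if all(row[m] for m in members))

def components(obs, descendants):
    """Group descendants whose observations are linked by pairwise intersections."""
    groups = []
    unseen = set(descendants)
    while unseen:
        stack = [unseen.pop()]
        group = set(stack)
        while stack:
            cur = stack.pop()
            linked = {d for d in unseen if joint(obs, (cur, d)) > 0}
            unseen -= linked
            group |= linked
            stack.extend(linked)
        groups.append(sorted(group))
    return groups

def is_perfect(obs, members):
    # A probe seen by all members is seen by every subset, so one check suffices.
    return joint(obs, members) > 0

def classify(obs, descendants):
    groups = components(obs, descendants)
    multi = [g for g in groups if len(g) > 1]
    if not multi:
        return "complete mutual exclusion"
    all_perfect = all(is_perfect(obs, g) for g in multi)
    if len(groups) == 1:
        return "perfect" if all_perfect else "chained-only"
    return "partitioned" if all_perfect else "chained partition"

# Hypothetical example: a and b intersect, b and c intersect, a and c never do.
label = classify([[1, 1, 0], [0, 1, 1]], [0, 1, 2])
```

In this hypothetical example the three observations form a single chain without a common probe, so the sketch reports the chained-only class.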
4.2 Perfect Observation
As stated, most of the likelihood equations proposed previously do not consider the variation of observations. With the introduction of data classes, the likelihood equations proposed in the past need to be examined to determine their applicability with regard to the data classes. The equations of concern in this paper are (1) for the tree topology and (2) for the general topology because they are the most influential ones for their respective topologies. The following theorem establishes the validity of (1).
Theorem 1.
The estimate obtained from (1) is the MLE iff the data used in estimation falls into the perfect class.
The estimate obtained from (1) has been proved to be the MLE [2], where mutual exclusions in observations were not considered. On the basis of the analysis presented in the last section, the RHS of (1) contains all possible correlation terms, from pairs of descendants to the product of all descendants. This requires the LHS to have all the corresponding terms to match the RHS. Therefore, there must not be any form of mutual exclusion in the observations. On the other hand, if there is a mutual exclusion between the observations of siblings, (1) no longer holds. Then, the theorem follows.
Another estimator was proposed recently in [5] to avoid the use of approximation in solving (1) when a node has 5 or more descendants. [5] proposes an equivalent transformation that turns (1) into a linear function by merging the descendants into two groups. The transformation takes advantage of the self-similarity of (1) and derives the following:
(7)  
where and denote the two groups which satisfy and . Further, and can be considered two virtual descendants of node and each connects the descendants of its group. Note that the derivation of (7) takes advantage of theorem 1, where it is assumed the observations of and to fall into the perfect class. If , it is easy to prove there is a unique solution and the solution can be obtained analytically. On the other hand, if there is a unique solution to (1), the same solution should be obtained from (7) no matter how the two groups are constructed. This requires . If , the can be selected as one of the two groups. However, the itself is not in the perfect class. Thus, the assumption made previously does not hold, and neither does (7). The following corollary provides the detail.
Corollary 1.
If a data set belongs to the perfect class, the maximum likelihood estimate can be obtained from (7).
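The effect of merging descendants into two groups can be illustrated with a short sketch. It assumes the MINC form of (1): with two virtual descendants, 1 - gamma_k / A = (1 - gamma_1 / A)(1 - gamma_2 / A) together with gamma_k = gamma_1 + gamma_2 - gamma_12 reduces to A = gamma_1 * gamma_2 / gamma_12. Whether (7) is written in exactly this form is an assumption, and the function name and data layout (one column per receiver) are hypothetical.

```python
def two_group_pass_rate(obs, group_1, group_2):
    """Closed-form estimate from two merged groups; None if they never intersect."""
    n = len(obs)
    n1 = sum(1 for row in obs if any(row[r] for r in group_1))
    n2 = sum(1 for row in obs if any(row[r] for r in group_2))
    n12 = sum(
        1 for row in obs
        if any(row[r] for r in group_1) and any(row[r] for r in group_2)
    )
    if n12 == 0:
        return None  # the two groups are mutually exclusive
    # gamma_1 * gamma_2 / gamma_12 expressed with raw counts.
    return (n1 * n2) / (n * n12)

# Hypothetical data: 10 probes, one receiver per group.
obs = [[1, 1]] * 4 + [[1, 0]] * 2 + [[0, 1]] * 1 + [[0, 0]] * 3
rate = two_group_pass_rate(obs, [0], [1])
```

For these hypothetical counts (6 and 5 probes seen by the two groups, 4 seen by both, 10 sent) the estimate is 0.75. The `None` branch reflects the case discussed above in which the two groups do not intersect and the transformation cannot be applied.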
4.3 Chained Only
As stated, (1) is valid only for perfect data sets. If the data set obtained from an experiment falls into the chained-only class, a new likelihood equation is needed, which can be obtained by removing some of the terms from the RHS of (1) that correspond . Let denote the set of the terms that need to be removed; if , denotes the number of descendants involved in the term. Then, the likelihood equation takes the following form
(8) 
where the summation on the RHS is for the terms that need to be removed from the first term on the RHS.
Given (8) as the likelihood equation, the next question is whether there is a unique solution to it in the support of . The following lemma answers the question.
Lemma 1.
Let C be the set of with and . The equation of has a unique solution if the summation term is a part of the product one. Moreover, is continuously differentiable on C.
See the appendix. The lemma extends lemma 1 of [2]. Like (1), (8) is a polynomial, but with a degree lower than that of (1) since at least . Using the lemma, we can prove that the solution to (8) is the MLE.
Suppose the chained observations of can be divided into two exclusive groups, and , where the observations of fall into the perfect class and the observations of are exclusively partitioned, i.e. there is no intersection in the observations of any two descendants of . If, in addition, the observation of each descendant of intersects with all of , then (7) can be used to obtain the MLE since . For instance, if node has 3 descendants, , , and , and the observations of the 3 descendants belong to the chained-only class, where , , and , then if and are in and is in , we have the MLE from (7), where
Hence, merging the descendants having exclusive observations into is equivalent to removing from the model part those terms that do not occur in the data part. Since each descendant in intersects with every one of , multiplying the statistic of by that of maintains the correspondence between data and model.
The above discussion shows that, with appropriate grouping, (7) can obtain the MLE for some of the data sets falling into the chained class. However, if a given data set cannot be divided into two groups as above, the estimate obtained by (7) is not the MLE. Despite this, the estimate obtained by (7) is still slightly better than that obtained by (1). Using the previous example, assume , , and . In this case, the observations cannot be merged into two groups as above. If we merge b and c and put the statistics into the expansion of (7), we have
(9) 
As with (6), there is an inconsistency between the LHS and the RHS of (9). The estimate obtained from the equation is therefore not the MLE either, although the error here is smaller than that of (6). On the other hand, if we merge and , we cannot even obtain a solution since (7) fails to hold.
4.4 PartitionOnly
For simplicity, the discussion starts from the case where the observation of consists of two exclusive partitions and is then extended to multiple partitions.
Let and be the two partitions. The LHS of the likelihood equation is then equal to 1 minus the sum of the pass rates of and since their observations are mutually exclusive, and the RHS of the equation, according to (8), is obtained by subtracting those terms that involve the members of the two partitions from the terms of the perfect class. The likelihood equation is given in theorem 2.
Theorem 2.
For a network of the tree topology, if the observation of is partitioned into two exclusive parts, the likelihood equation of the observation is as follows:
(10) 
Let denote the terms that need to be removed from (1) because of the mutual exclusion. Then, according to (8), we have
Removing is equivalent to dividing the descendants into two groups according to the mutual exclusion and adding the terms of one group to those of the other. Rearranging the terms of the above, we have (10).
Given (10), we have a polynomial whose degree is one less than the number of descendants in the larger exclusive group. If the degree is larger than or equal to 5, there is no closed-form solution to the polynomial unless some of the descendants can be merged when we estimate the pass rate from the source to their parent. Fortunately, this is achievable since the observation of each group is perfect: (7) can be used on each partition to turn (10) into a linear equation, and a closed-form solution follows.
If the observation of node , , is divided into Q exclusive partitions (Q2), we have the following theorem for its likelihood equation.
Theorem 3.
If the observations of are divided into Q partitions, the likelihood equation is
(11) 
If the observation of is divided into Q exclusive partitions, it is equivalent to have Q independent subtrees connected to node and the pass rate of subtree is equal to the sum of the pass rates of the Q independent subtrees. For each of the subtrees, there is a likelihood equation as
(12) 
where .
Because of the independence in the observation of the Q partitions, the likelihood equation is equal to the sum of the likelihood equations of the Q independent ones. Then, the theorem follows.
We can also prove (11) from the likelihood function constructed directly from the observation. Given the partitioned data set, the log-likelihood function of the observation with respect to can be written as
Differentiating it with respect to and setting the derivative to 0, we have
Solving it, we have
Then, we have
(13) 
Note that if the observation of falls into the perfect class, we have
Using the above to substitute from (13) and rearranging the terms afterwards, we have (11).
Since (12) is a concave function, (11) is a concave function because (11) is a sum of (12). In addition, there is a common support for each of the member functions of (11). Then, (11) has a maximum point in that can be obtained directly by
where (7) is used in each of the exclusive partitions to divide the descendants into two groups and merge their statistics, where , , is the number of probes observed simultaneously by the receivers of the two groups of the exclusive partition. In addition, and are the empirical pass rates from the source to the two groups of the exclusive partition, respectively.
4.5 Chained Partition
As defined, a data set falling into this class can be divided into a number of exclusive partitions, where each partition consists of the observations of a number of descendants. In addition, the observation of a partition that has more than 2 members is not in the perfect class but chained. Thus, a new likelihood equation is needed for this class of data, and the equation must combine the features of the likelihood equations proposed for the chained-only and partition-only classes. Since the observation of a partition is not in the perfect class, the likelihood equation for the partition is in the form of (8). If there are Q () partitions, since the observations between partitions are exclusive, the likelihood equation for this class is in the form of
4.6 Complete Mutual Exclusion
Given Theorem 3, we know if there is no intersection between the observations of the receivers attached to the subtrees rooted at node , the observations of the descendants do not provide any information about the path connecting the source to their parent. Thus, there is no need to add the correlation between them into the model part of (1). The RHS of (1) is equal to
(15) 
Using empirical probability to replace from (15), (1) collapses and therefore has no solution.
To solve the problem, we can either send more probes to break the tie or use the bootstrap to produce synthetic probes that create intersections. Then, we can use an appropriate likelihood equation to estimate .
4.7 Independent Path
During the presentation, we always assume that either there are at least 2 descendants in a partition or at least there is a partition that has 2 descendants of node . Without this assumption, a data set may be in the class of complete mutual exclusion. Under this assumption, if each partition has more than 2 descendants, the pass rate from the source to the parent of the descendants can be estimated independently and the total pass rate is equal to the sum of the pass rates of the partitions.
Using Theorem 3, we are able to obtain the same solution as that obtained from (1) and revive the estimator based on the equivalent transformation. A new question then arises: does an estimator need to consider a partition that has only a single descendant? The following theorem answers this question.
Theorem 4.
The observation of single-descendant groups has no impact on the estimation of the pass rate of the path connecting the source to the parent of the descendant.
According to the condition, (11) is the likelihood equation fitting the data. In this case, a number of identical terms can occur in the summations on the LHS and RHS of the equation, one for the loss rate of each single-descendant group. Those terms cancel each other without affecting the estimation. Then, the theorem follows. For instance, in the previous example descendant is independent from and . Based on theorem 4, c is removed from estimation and we have
(16) 
Using Lemma 2, we have
Simplifying the above, we have (16). Theorem 4 also explains why there is no solution for a data set falling into the complete mutual exclusion class. In that case each descendant forms its own partition, so each of them is canceled from (11), which leads to the collapse of (11).
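The effect of Theorem 4 on a data set can be illustrated with a short sketch. It assumes, as a reading of the text rather than a definition taken from it, that two descendants belong to the same partition when they are linked by a chain of shared probes; the names `partitions` and `obs` are hypothetical.

```python
def partitions(observations):
    """Group descendants whose observations are linked by shared probes.

    `observations` maps a descendant label to the set of probe ids observed
    in its subtree. Returns a list of sets of labels (assumed partitions).
    """
    groups = []  # list of (labels, probes) pairs with pairwise disjoint probes
    for name, probes in observations.items():
        merged_names, merged_probes = {name}, set(probes)
        rest = []
        for names, seen in groups:
            if seen & merged_probes:
                # This group shares a probe with the new descendant: merge.
                merged_names |= names
                merged_probes |= seen
            else:
                rest.append((names, seen))
        groups = rest + [(merged_names, merged_probes)]
    return [names for names, _ in groups]


# Descendants a and b share probe 2; c shares nothing with either.
obs = {"a": {1, 2, 3}, "b": {2, 4}, "c": {5}}
parts = partitions(obs)

# Following Theorem 4, single-descendant partitions are dropped.
kept = [p for p in parts if len(p) >= 2]
print(parts, kept)
```

When every partition is a singleton, `kept` is empty, which corresponds to the complete mutual exclusion case in which (11) collapses.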
5 Multisources
In the general topology, a node may have more than one parent, and thus may observe probes sent by multiple sources. Because of this, estimating the pass rate of a link must consider the probes sent from all related sources, regardless of whether the probes pass the link of interest or only pass its ancestors. Therefore, (2) is the likelihood equation of for a path connecting source to node regardless of whether node is a joint node or not, where a joint node is a node that has more than one parent [6]. The difference between (2) and (1) lies at those nodes that can receive probes from multiple sources: the former considers all probes sent by related sources to node and uses
(17) 
as the empirical pass rate of link j, while the latter has
Note that in (17) is the parent node of link and is the set of sources sending probes to node . Despite the differences between the likelihood equations of the tree and the general topologies, both take into account all probes reaching the end node of the path of interest. More importantly, the principle of fitting a model to data becomes more obvious in the general topology than in the tree topology. If we regard the two sides of (2) as data and model, respectively, the LHS of (2), as the data, comes from the observation of a single source, called the individual observation, while the RHS, as the model, considers the probes sent by multiple sources, called the global observation. As previously proved, the MLE can be obtained if and only if the RHS matches the LHS and there is a unique solution to (2).
When fitting a model to data in the general topology, we can use the same classification as that defined for the tree topology to divide data sets into 4 classes. Since the nodes in a general topology can have multiple parents, and even a node that has only one parent can have multiple ancestors located on different paths to the node, the nodes are divided into 3 types: single-parent, single-source nodes (single nodes), multiple-parent nodes (joint nodes), and single-parent, multiple-source nodes (shared nodes). For single nodes, the methods proposed for the tree topology can be used to estimate the loss rate of the path connecting the source to the node given the loss rates of the subtrees rooted at the node, in particular if there are shared segments in the subtree. Our focus is on the joint nodes because single nodes need to know the pass rate of the shared subtrees and a shared node can be considered a special joint node. The difference between a joint node and a shared node lies in the paths connecting the sources to the node: the former has distinct paths, whereas the paths of the latter share a common part. Therefore, they can be handled in the same manner in terms of estimation. In addition, if we use the divide-and-conquer approach proposed in [5] to divide a general topology into a number of trees, there is no need to consider the shared nodes separately.
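The contrast between an individual and a pooled empirical pass rate can be sketched as follows. The exact form of (17) is not reproduced here; the sketch assumes only what the text states, namely that the pooled rate is a ratio of probe counts taken over all sources sending probes to the node. The function names and the count dictionaries are hypothetical.

```python
def pooled_pass_rate(reached_node, reached_parent):
    """Ratio of probe counts pooled over all sources (assumed reading of (17)).

    reached_node[s]:   probes from source s observed to reach the end node of the link.
    reached_parent[s]: probes from source s reaching the parent node of the link.
    """
    total_parent = sum(reached_parent.values())
    if total_parent == 0:
        raise ValueError("no probes reach the parent node")
    total_node = sum(reached_node.get(s, 0) for s in reached_parent)
    return total_node / total_parent


def individual_pass_rate(reached_node, reached_parent, source):
    """The same ratio restricted to the probes of a single source."""
    return reached_node[source] / reached_parent[source]


reached_parent = {"s1": 100, "s2": 50}
reached_node = {"s1": 80, "s2": 30}

pooled = pooled_pass_rate(reached_node, reached_parent)
per_source = {s: individual_pass_rate(reached_node, reached_parent, s)
              for s in reached_parent}
print(pooled, per_source)
```

In this made-up example the two sources give different individual ratios, while the pooled ratio lies between them; this is the kind of gap between individual and global views that the classification in this section is concerned with.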
5.1 Joint Node
For a joint node, say , there are up to likelihood equations, one for each path connecting a source to the node.
Since all the paths connect to an ordinary subtree rooted at node , we need a unique pass rate for the subtree that maximizes the likelihood function constructed from the observation. As previously analyzed, each of the likelihood equations is determined by the observation. (2), like (1), holds if and only if the observation is in the perfect class, i.e. is in the perfect class. In this case, the likelihood equations are equivalent to each other with regard to the shared subtree and the LHS of (2) matches its RHS. Knowing the pass rate of one of the paths, say from , the pass rate of another path, say from , can be obtained easily by
since the likelihood equations are in the form of . Given the fact that subtree is a common part of the paths from the sources of to , the pass rates of the paths from the sources to node are proportional to the pass rates from the source to .
If is not in the perfect class, the situation becomes more complex than that of the tree topology and needs to be analyzed further. To assist the following discussion, we divide the observations of a shared subtree into 4 classes on the basis of individual and global observation and present them in Table 2. The global observation of node is defined as
Apart from the (perfect, perfect) class, we need to consider data consistency again, in a way different from that defined in [2] and [5]. The previous concern was the consistency between data and model, and the approach used was to eliminate those data sets that are inconsistent with the model. Here, consistency is extended to cover the consistency between individual observations, and the consistency between an individual observation and the global one. With multiple sources sending probes to receivers, each source creates its own individual observation, which is the view of the source on the shared segment. The views created by different sources can differ from each other even when the same views are expected. However, when , different views may occur that make estimation impossible since there is no consistent model.
For a data set in the (others, perfect) class, although the data created by different sources compensate each other to create a perfect view, the individual data are not consistent with the global one, which implies different models for the likelihood equations. Because of this, estimation cannot proceed, and we need to either send more probes until the data fall into the (perfect, perfect) class or skip the estimation. This also applies to data of the (others, others) class since if is not consistent with , the likelihood equations are different from one another.
The (others, identical others) class is designated for the observation: are identical in terms of correlation. Hence, if the individual observations are identical to the global one in terms of correlation, a model that is consistent with the data can be created, and so can a consistent likelihood equation for every source. This extends (2) to cover imperfect data to some degree, and the following theorem presents the likelihood equation for the pass rate of a shared subtree.
Theorem 5.
Given data in the (others, identical others) class, the likelihood equation of the pass rate of the shared subtree is as follows:
(18) 
where is the ratio between the number of probes reaching and the number of probes reaching node as in (17) and as defined in the tree topology, denotes the terms that need to be removed from the likelihood equation (see the proof for details).
Let that is the pass rate of the shared subtree. If is not perfect, the corresponding terms on the RHS of (2) should be removed, as discussed for the tree topology. Let the terms be for that is a function of . We then have the following likelihood equation for each source:
There are equations as above, one for each source. Adding the equations together yields the theorem. As before, we can prove that the solution space is concave and that there is a unique solution in the support of the pass rate. The solution is then the MLE of the pass rate of the shared subtree.
Given the pass rate , the loss rate of the path connecting a source to node can be obtained directly by . Since (18) is a polynomial, if its degree is 5 or higher there is so far no explicit method to solve it other than approximation. To minimize the use of approximation, we can use the divide-and-conquer approach proposed in [6] to break a general network into a number of trees, where (2) or (18) is used on each joint node to obtain the MLEs of the number of probes reaching the joint node. With these numbers, a general network can be divided into a number of trees and the methods proposed in the previous section can be used to obtain the MLE of each path.
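When no closed form is available, a likelihood equation of this kind can be solved numerically. The sketch below does not reproduce (18); as a stand-in it assumes the tree likelihood equation in the form given in [2], 1 - gamma_k/A = prod_j (1 - gamma_j/A), where gamma_k and gamma_j are empirical pass rates and A is the unknown path pass rate. The function name, the bracketing interval [gamma_k, 1], and the example values are assumptions made for illustration.

```python
from math import prod


def solve_pass_rate(gamma_k, gamma_children, tol=1e-12):
    """Bisection on f(A) = (1 - gamma_k/A) - prod_j (1 - gamma_j/A) over [gamma_k, 1]."""
    def f(a):
        return (1 - gamma_k / a) - prod(1 - g / a for g in gamma_children)

    lo, hi = gamma_k, 1.0
    if f(lo) >= 0 or f(hi) <= 0:
        # No sign change: the equation has no root here, so the data do not
        # fit this model (for example, complete mutual exclusion).
        raise ValueError("no sign change on [gamma_k, 1]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two children with gamma_j = 0.4 each and gamma_k = 0.6; for two children the
# closed form g1*g2 / (g1 + g2 - gamma_k) gives 0.8, which bisection should match.
estimate = solve_pass_rate(0.6, [0.4, 0.4])
print(estimate)
```

With gamma_k equal to the sum of the children's rates (no intersections at all), the function raises an error instead of returning a value, which mirrors the collapse of the likelihood equation discussed for the complete mutual exclusion class.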
6 Conclusion
Loss tomography is built on statistical inference, which requires a correct model to describe the observation received from an experiment. The model must match the nature of the data. Nevertheless, the dependency of a model on a set of data has been either overlooked or misunderstood, which has led to the misconception that an estimator is applicable to all sorts of data sets as long as it returns a unique solution. In this paper we attempt to correct this and consider the validity of an estimator, which demands a match between data and model in estimation. We then revisit two of the most influential estimators proposed previously and find that they, like other estimators proposed previously, are at most the maximum likelihood estimator for one type of data only. To overcome this, fitting a model to data is emphasized in this paper, and the necessary and sufficient condition of the maximum likelihood estimator is presented, which requires 1) a likelihood equation matching a model to the data; and 2) a unique solution to the likelihood equation. The necessary and sufficient condition indicates that in order to obtain an MLE, we need to use all available information in the observation, eliminate redundant information, and match a model to the data. To generalize the results, data obtained from experiments are divided into 4 classes, and 4 likelihood equations are presented, one for each data class. Apart from the tree topology, this issue is also considered for the general topology, where data consistency is extended to cover the difference between individual views and between an individual view and the global view. The connection and relation between them have been analyzed and an estimator is proposed for the case of identical individual views.
Appendix
Lemma 1.
Proof. Let , , and , we have , , and . Let , we have , , and , if . Note that is a small part of that have two or more multiplied together. Let , that is strictly concave on .
References
[1] Y. Vardi. Network Tomography: Estimating Source-Destination Traffic Intensities from Link Data. Journal of Amer. Stat. Assoc.
[2] R. Cáceres, N.G. Duffield, J. Horowitz, and D. Towsley. Multicast-based inference of network-internal loss characteristics. IEEE Trans. on Information Theory, 45, 1999.
[3] N.G. Duffield, J. Horowitz, F. Lo Presti, and D. Towsley. Multicast topology inference from measured end-to-end loss. IEEE Trans. on Information Theory, 48, Jan. 2002.
[4] W. Zhu and Z. Geng. A bottom up inference of loss rate. Computer Communications, 28, 2005.
[5] W. Zhu and K. Deng. Loss tomography from tree topology to general topology. Submitted for publication, 2009.
[6] W. Zhu. Loss rate inference in multi-source and multicast-based general topologies. Submitted for publication, 2009.
[7] G. Liang and B. Yu. Maximum pseudo likelihood estimation in network tomography. IEEE Trans. on Signal Processing, 51(8), 2003.
[8] Y. Tsang, M. Coates, and R. Nowak. Network delay tomography. IEEE Trans. on Signal Processing, (8), 2003.
[9] F. Lo Presti, N.G. Duffield, J. Horowitz, and D. Towsley. Multicast-based inference of network-internal delay distribution. IEEE/ACM Trans. on Networking, (6), 2002.
[10] Meng-Fu Shih and Alfred O. Hero III. Unicast-based inference of network link delay distributions with finite mixture models. IEEE Trans. on Signal Processing, (8), 2003.
[11] E. Lawrence, G. Michailidis, and V. Nair. Network delay tomography using flexicast experiments. Journal of Royal Statist. Soc, (Part5), 2006.
[12] V. Arya, N.G. Duffield, and D. Veitch. Multicast inference of temporal loss characteristics. Performance Evaluation, (9-12), 2007.
[13] G. Liang, N. Taft, and B. Yu. A fast lightweight approach to origin-destination IP traffic estimation using partial measurements. IEEE/ACM Trans. on Networking, 14(6), 2006.
[14] D. Rubenstein, J. Kurose, and D. Towsley. Detecting Shared Congestion of Flows via End-to-End Measurement. IEEE/ACM Trans. on Networking, 10(3), 2002.
[15] R. Cáceres, N.G. Duffield, S.B. Moon, and D. Towsley. Inference of Internal Loss Rates in the MBone. In IEEE/ISOC Global Internet’99, 1999.
[16] R. Cáceres, N.G. Duffield, S.B. Moon, and D. Towsley. Inferring link-level performance from end-to-end multicast measurements. Technical report, University of Massachusetts, 1999.
[17] N. Duffield, J. Horowitz, F. Lo Presti, and D. Towsley. Explicit loss inference in multicast tomography. IEEE Trans. on Information Theory, 52(8), Aug. 2006.