A Stochastic Kaczmarz Algorithm for Network Tomography\thanksreffootnoteinfo
Abstract
We develop a stochastic approximation version of the classical Kaczmarz algorithm that is incremental in nature and takes as input noisy real time data. Our analysis shows that with probability one it mimics the behavior of the original scheme: starting from the same initial point, our algorithm and the corresponding deterministic Kaczmarz algorithm converge to precisely the same point. The motivation for this work comes from network tomography where network parameters are to be estimated based upon end–to–end measurements. Numerical examples via Matlab based simulations demonstrate the efficacy of the algorithm.
TIFR]Gugan Thoppe, IITB]Vivek Borkar, IITB]D. Manjunath
School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai 400005, India
Department of Electrical Engineering, Indian Institute of Technology, Powai, Mumbai 400076, India
Key words: Kaczmarz algorithm; Stochastic approximation; Network tomography; Online algorithm.
The authors thank the referees for their constructive comments that have helped us to improve the article significantly.
1 Introduction
1.1 Kaczmarz algorithm
Kaczmarz algorithm [20] is a successive projection based iterative scheme for solving ill posed linear systems of equations. Since its introduction, its convergence properties have been extensively analyzed [18] and it has found diverse applications in areas ranging from tomography [29], synchronization in sensor networks [17], to learning and adaptive control [4, 27, 32]. The original algorithm is deterministic, but some applications, notably network tomography which we describe later, call for a stochastic version. In this article, we introduce and analyze a stochastic approximation version based on the RobbinsMonro paradigm [19] that has become a standard workhorse of signal processing and learning control [6], [21]. We use the ‘o.d.e.’ approach [12, 24] to analyze the scheme and argue that it has the same asymptotic behavior as the original deterministic scheme ‘almost surely’. While we apply our results to network tomography in this article, we believe that this analysis will be of use in other areas mentioned above. In particular, networked control is one potential application area.
A significant development in this line of research has been a randomized Kaczmarz scheme having provable strong convergence properties [34], [23], with recent modifications to further improve performance by weighted sampling [17], [38]. The important difference between these works and ours is as follows. For them, the randomization is over the choice of rows, which is a part of algorithm design and can be chosen at will. In our case, however, a part of the randomness is due to noise and not under our control, as also in the choice of rows which a priori we allow to be uncontrolled.
1.2 Network tomography
Network tomography is inference of spatially localized network behavior using only measurements of endtoend aggregates. Recent work can be classified into traffic volume and link delay tomography. A basic paradigm in both these is to infer the statistics of the random vector from an ill posed measurement model where the matrix is assumed to be known a priori. See [11, 9, 22] for excellent surveys.
In the transportation literature, the aim is to estimate the traffic volume on the endtoend routes assuming access to only traffic volumes on a subset of links [25, 5, 33, 35]. An excellent survey is given in [1]. An analogous problem has been addressed in packet networking [16, 36, 37]. In all of these works, one sample of is assumed available and is estimated by a suitable regularization.
Link delay tomography deals with estimation of link delay statistics from path delay measurements. Here the network is usually assumed to be in the form of a tree. Multicast probe packets, real or emulated, are sent from the root node to the leaves. For each probe packet, a set of delay measurements for paths from the root node to the leaf nodes is collected. These delays are correlated and this correlation is exploited to estimate the link delay statistics [8, 2, 14, 31]. Using independent samples of the path delay vector, an expectation maximization based algorithm is derived to obtain the maximum likelihood estimates of the link parameters. There is also work on estimating link level loss statistics [8], link level bandwidths [13], linklevel cross traffic [30] and network topology [15] using endtoend measurements.
1.3 Summary of our work
We show that, starting from the same initial point, the stochastic approximation variant of the Kaczmarz (SAK) algorithm and the deterministic Kaczmarz algorithm converge to the same point. Using this, we develop a novel online algorithm for estimation of the means (more generally, moments and crossmoments) of the elements of the vector from a sequence of measurements of the elements of the vector Our scheme can be used for both traffic volume and link delay tomography. An important advantage of our scheme is that it is real time—taking observed data as inputs as they arrive and making incremental adaptation. Also, unlike previous approaches, our scheme allows for elements of to be correlated. While our analysis is under the simplifying statistical assumption that the samples are IID, we point out later in Section 5 that these can be relaxed considerably. For link delay tomography, our algorithm does away with the need for multicast probe packet measurements and can be used even for networks with topologies other than tree.
2 Model and problem description
2.1 Basic notation
For For vectors, we use to denote their Euclidean norm and for inner product. For a matrix denotes its th row, its th entry, its row space and its transpose. We use to denote the derivative of the map with respect to
Let denote the random vector with finite variance whose statistics we wish to estimate. Let be an a priori known matrix with full row rank and let
(1) 
where is a zero mean, bounded variance random variable denoting noise in the measurement. Let be a random variable taking values in such that, Let be IID copies of , that are jointly independent, and . (The IID assumption is purely for simplicity of analysis. We point out later that these results extend to much more general situations.) We assume that at each time step we know only the value of and the th component of i.e.
Our objective is to develop a realtime algorithm, with provable convergence properties, to estimate the moments and cross moments of the random vector
3 Preliminaries
3.1 Stochastic approximation algorithms
The archetypical stochastic approximation algorithm is
(2) 
where is Lipschitz, is a positive stepsize sequence satisfying and and represents noise. As (3.1) can be viewed as a noisy discretization of the o.d.e.
(3) 
This is the ‘o.d.e. approach’ [12, 24]. More specifically, suppose that the following assumptions hold.
(A1) is a squareintegrable martingale difference sequence w.r.t. the fields satisfying a.s. for some
(A2) exists ( will be necessarily Lipschitz) and the o.d.e. has origin as its globally asymptotically stable equilibrium.
(A3) Also, a continuously differentiable Lyapunov function such that for
Then, as in Chapters 2,3 of [7], we have:
Lemma 1
The iterates of (3.1) a.s. converge to
3.2 Kaczmarz algorithm
Consider the inverse problem of finding a fixed from where is as defined in Section 2. W.l.o.g., let rows of be of unit norm. Given an approximation of a natural optimization problem to consider is
(4) 
Elementary calculation shows that its solution is
(5) 
Clearly, As has full row rank, is the only point in that satisfies The Kaczmarz algorithm uses this fact to solve (3.2). With prescribed initial point stepsize and its update rule is given by
(6) 
Theorem 1
[10] If then as .
Let Since are translations of As Thus, Hence, Thus we have:
Lemma 2
For any if and only if
4 The SAK Algorithm
We develop here a SAK algorithm to estimate for the model of Section 2. Let an approximation to be given. Observe from (2.1) that
(7) 
By rescaling equations, we assume w.l.o.g. that the rows of are of unit norm. This saves some notation without affecting the analysis. not being known exactly, one may estimate it offline and use the classical Kaczmarz to determine From (3.2), note that the classical Kaczmarz would have converged to
(8) 
As against this offline scheme, a better alternative is to use an online algorithm. Using the notations and assumptions of Section 2, a SAK algorithm to estimate based on (3.2), is:
(9) 
where is as defined below (3.1). Note in (4) the noisy measurements of the elements of and the real time estimates of
We now analyze its behaviour. Clearly, the iterates of (4) always remain confined to the affine space defined below (3.2). Since has full row rank, for each there exists unique such that
(10) 
Thus one can equivalently analyze the algorithm
(11) 
where is the matrix with in its th diagonal position and zero elsewhere and is the dimensional vector with its th position occupied by and zero elsewhere.
Let Defining where note that (4) can be rewritten as
(12) 
Theorem 2
Proof. For each let Lipschitz property of and (A1) are easily verified. If , then pointwise. Let . This vanishes only at the origin. Further, for any solution to the o.d.e. again with equality only at the origin. Thus is a Lyapunov function for the o.d.e. with the origin as its globally asymptotically stable equilibrium. Thus holds. By Lemma 1, to exhibit it now suffices to show i.e. is the globally asymptotically stable equilibrium of the o.d.e. given in (4). Towards this, consider the function As has full row rank, if and only if For any solution of (4), But Thus with equality only when This shows that is a Lyapunov function. Thus is the sole globally asymptotically stable equilibrium of (4) as desired.
5 Extensions

We had assumed to be IID. The final result, however, can be established under much more general conditions. For example, can be:

ergodic Markov, as in Part II, Chapter 1, [6], with ’s the corresponding stationary probabilities.

asymptotically stationary as in Chapter 6, [21],

‘controlled’ Markov, as in Chapter 6, [7], which allows for nonstationarity under the mild restriction that the relative frequency of remain bounded away from zero a.s. .
Likewise, can be ergodic Markov or asymptotically stationary as long as it is independent of In fact, we can allow it to be long range dependent and heavy tailed [3], which is often the case with real communications networks.


The above analysis was for estimation of means. We can also extend it to cover higher moments. For simplicity, we neglect measurement noise from (2.1) and consider the model Observe that, for any and each
(14)
where and Let denote a dimensional vector, and let Also, let denote the matrix, whose th entry is the coefficient associated with component of as given in (5). The set of relations in (5) can thus be compactly written as
(15) Note that is generically full row rank \@footnotemark\@footnotetextFix Let be an nonsingular matrix made up of the independent columns of . Clearly, columns of are also columns of where is as defined above (5). Let Each contains the identity matrix, hence is nonempty. Since is a multivariate polynomial, is a nonempty Zariski open set in , therefore open dense in the usual topology. Thus is an open dense set of .. Hence, (5) is of the same spirit as (2.1). One can thus use (4), after replacing samples of with those of and with to estimate in realtime for any For desired the only condition one needs to ensure is that if are the components of that are positive, then such that are simultaneously nonzero. Clearly, by choosing appropriate in (5), one can estimate the moments of any desired order.
Given finite moment estimates, one can then postulate a maximum entropy distribution. For e.g., if , then the maximum entropy distribution is , where is for normalization and are chosen so as to ensure .

Since (4) is of the form for suitable vector and martingale difference noise , we can iterate this to obtain . Note that the matrix is similar to the positive definite matrix . Hence, if denotes the minimum eigenvalue of then , where denotes the weighted norm defined by . This can be used to obtain estimates for finite time error and convergence rate. We do not pursue this here, see, however, [26].
6 Experimental Results
We illustrate the application of SAK algorithm in real time delay tomography for the network of Figure 1. The goal here is to use the measurements of endtoend delay experienced by probe packets while traversing different paths in the network to obtain, in real time, the estimates of link delay statistics.
In the framework of Section 2, the experimental setup is as follows. A priori we choose six paths in the network. This is described by the pathlink matrix (rows paths, columns links)
Its entry is one if link is present on path Thus, row four denotes the path that connects the nodes 2811124. The delay a probe packet experiences while traversing link is a random variable with arbitrary nonnegative distribution. The delay across path is where are IID standard Gaussian random variables denoting measurement error. We generate a million probe packets, where the packet is sent along a path whose index, denoted is chosen uniformly randomly from Thus each path gets about 167,000 samples. We use to record the delay, packet experiences while traversing the path We also run our SAK Algorithm of (4) for a million iterations, first for (2.1) and then for (5) with The chosen start point, the actual value and final estimated value of moments are given in Tables 1 and 2. In both cases the initial point satisfies the assumption of Lemma 2 and hence the final estimates are close to the actual values. In Table 2 we give only a subset of results. Figure 2 compares the realtime estimates of expected delay for candidate links and obtained using the SAK algorithm and the averaged SAK algorithm. The iterates of the averaged SAK algorithm are samples averages of the SAK algorithm iterates. Observe that, although we run the simulation for a million packets, the estimates are very nearly the true values in after about 300 iterations. Also, note that the error in the estimates does not decrease monotonically. This is because of the direct use of noisy measurements. The fluctuations, however, get suppressed as the stepsizes decrease with iterations.
LinkId  Initial  True expected  Final 

guess  delay  estimate  
1  00.00  50.25  45.09 
2  00.00  26.32  33.17 
3  12.15  41.84  39.96 
4  00.00  09.10  11.92 
5  25.34  23.04  19.98 
6  00.00  48.08  46.87 
7  00.00  41.49  39.05 
8  00.00  49.75  50.97 
9  00.00  34.72  37.34 
10  00.00  03.78  07.82 
11  28.86  44.05  42.06 
12  39.90  48.54  53.54 
13  00.00  29.07  26.82 
Moment  Initial  True  Final 

guess  value  estimate  
17388  20539  20570  
0  277.85  286.29  
15985  158.83  164.34  
126  2427.8  2390.5 
References
 [1] T. Abrahamsson. Estimation of origindestination matrices using traffic counts–a literature survey. IIASA Interim Report IR98021/May, 27:76, 1998.
 [2] A. Adams, T. Bu, T. Friedman, J. Horowitz, D. Towsley, R. Caceres, N. Duffield, F. Presti, S. B. Moon, and V. Paxson. The use of endtoend multicast measurements for characterizing internal network behavior. Communications Magazine, IEEE, 38(5):152–159, 2000.
 [3] V. Anantharam and V. S. Borkar. Stochastic approximation with long range dependent and heavy tailed noise. Queueing Systems, 71(12):221–242, 2012.
 [4] K.J. Åström. Theory and applications of adaptive control–a survey. Automatica, 19(5):471–486, 1983.
 [5] M. G. H. Bell. The estimation of origindestination matrices by constrained generalised least squares. Transportation Research Part B: Methodological, 25(1):13–22, 1991.
 [6] A. Benveniste, M. Métivier, and P. Priouret. Adaptive algorithms and stochastic approximations. Springer Publishing Company, Incorporated, 1990.
 [7] V. S. Borkar. Stochastic approximation: a dynamical systems viewpoint. Cambridge University Press Cambridge, 2008.
 [8] R. Cáceres, N. Duffield, J. Horowitz, and D. Towsley. Multicastbased inference of networkinternal loss characteristics. Information Theory, IEEE Transactions on, 45(7):2462–2480, 1999.
 [9] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu. Network tomography: recent developments. Statistical science, pages 499–517, 2004.
 [10] E. Chong and H. Zak. An introduction to optimization. John Wiley & Sons, 2001.
 [11] A. Coates, A. Hero III, R. Nowak, and B. Yu. Internet tomography. Signal Processing Magazine, IEEE, 19(3):47–65, 2002.
 [12] D. Derevitskii and A. Fradkov. Two models for analyzing the dynamics of adaptation algorithms. Automation and Remote Control, 35(1):59–67, 1974.
 [13] A. Downey. Using pathchar to estimate internet link characteristics. In ACM SIGCOMM Computer Communication Review, volume 29(4), pages 241–250. ACM, 1999.
 [14] N. Duffield, J. Horowitz, F. Presti, and D. Towsley. Network delay tomography from endtoend unicast measurements. In Evolutionary Trends of the Internet, pages 576–595. Springer, 2001.
 [15] N. Duffield, J. Horowitz, F. Presti, and D. Towsley. Multicast topology inference from measured endtoend loss. Information Theory, IEEE Transactions on, 48(1):26–45, 2002.
 [16] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True. Deriving traffic demands for operational ip networks: Methodology and experience. IEEE/ACM Transactions on Networking (ToN), 9(3):265–280, 2001.
 [17] N. Freris and A. Zouzias. Fast distributed smoothing of relative measurements. In Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, pages 1411–1416. IEEE, 2012.
 [18] A. Galántai. Projectors and projection methods, volume 6. Springer, 2004.
 [19] Robbins H and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
 [20] S. Kaczmarz. Angenäherte auflösung von systemen linearer gleichungen. Bulletin International de lâAcademie Polonaise des Sciences et des Lettres, 35:355–357, 1937.
 [21] H. Kushner and G. Yin. Stochastic approximation algorithms and applications. Springer New York, 2003.
 [22] E. Lawrence, G. Michailidis, and V. Nair. Statistical inverse problems in active network tomography. Lecture NotesMonograph Series, pages 24–44, 2007.
 [23] D. Leventhal and A. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Mathematics of Operations Research, 35(3):641–654, 2010.
 [24] L. Ljung. Analysis of recursive stochastic algorithms. Automatic Control, IEEE Transactions on, 22(4):551–575, 1977.
 [25] M. Maher. Inferences on trip matrices from observations on link volumes: a bayesian statistical approach. Transportation Research Part B: Methodological, 17(6):435–447, 1983.
 [26] G. Moustakides. Exponential convergence of products of random matrices: Application to adaptive algorithms. International Journal of Adaptive Control and Signal Processing, 12(7):579–597, 1998.
 [27] P. Parks and J. Militzer. A comparison of five algorithms for the training of cmac memories for learning control systems. Automatica, 28(5):1027–1035, 1992.
 [28] B. Polyak and A. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992.
 [29] C. Popa and R. Zdunek. Kaczmarz extended algorithm for tomographic image reconstruction from limiteddata. Mathematics and Computers in Simulation, 65(6):579–598, 2004.
 [30] R. Prasad, C. Dovrolis, M. Murray, and K. Claffy. Bandwidth estimation: metrics, measurement techniques, and tools. Network, IEEE, 17(6):27–35, 2003.
 [31] F. Presti, N. Duffield, J. Horowitz, and D. Towsley. Multicastbased inference of networkinternal delay distributions. Networking, IEEE/ACM Transactions on, 10(6):761–775, 2002.
 [32] J. Richalet, A. Rault, J. Testud, and J. Papon. Model predictive heuristic control: Applications to industrial processes. Automatica, 14(5):413–428, 1978.
 [33] H. Sherali, R. Sivanandan, and A. Hobeika. A linear programming approach for synthesizing origindestination trip tables from link traffic volumes. Transportation Research Part B: Methodological, 28(3):213–233, 1994.
 [34] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, 15(2):262–278, 2009.
 [35] Y. Vardi. Network tomography: Estimating sourcedestination traffic intensities from link data. Journal of the American Statistical Association, 91(433):365–377, 1996.
 [36] Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg. Fast accurate computation of largescale ip traffic matrices from link loads. In ACM SIGMETRICS Performance Evaluation Review, volume 31(1), pages 206–217. ACM, 2003.
 [37] Y. Zhang, M. Roughan, C. Lund, and D. Donoho. An informationtheoretic approach to traffic matrix estimation. In Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, pages 301–312. ACM, 2003.
 [38] A. Zouzias and N. Freris. Randomized extended Kaczmarz for solving leastsquares. arXiv preprint arXiv:1205.5770, 2012.