A Stochastic Kaczmarz Algorithm for Network Tomography\thanksreffootnoteinfo

# A Stochastic Kaczmarz Algorithm for Network Tomography\thanksreffootnoteinfo

[    [    [
###### Abstract

We develop a stochastic approximation version of the classical Kaczmarz algorithm that is incremental in nature and takes as input noisy real time data. Our analysis shows that with probability one it mimics the behavior of the original scheme: starting from the same initial point, our algorithm and the corresponding deterministic Kaczmarz algorithm converge to precisely the same point. The motivation for this work comes from network tomography where network parameters are to be estimated based upon end–to–end measurements. Numerical examples via Matlab based simulations demonstrate the efficacy of the algorithm.

TIFR]Gugan Thoppe, IITB]Vivek Borkar, IITB]D. Manjunath

School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai 400005, India

Department of Electrical Engineering, Indian Institute of Technology, Powai, Mumbai 400076, India

Key words:  Kaczmarz algorithm; Stochastic approximation; Network tomography; Online algorithm.

11footnotetext: The work of V. Borkar is supported in part by a J. C. Bose Fellowship from the Government of India, grant 11IRCCSG014 from IRCC, IIT Bombay, and a grant from the Dept. of Science and Technology for ‘Distributed Computation for Optimization over Large Networks and High Dimensional Data Analysis’. D. Manjunath is also affiliated with the Bharti Centre for Communication, IIT Bombay and is also supported in part by the above DST grant. This paper was not presented at any IFAC meeting. Corresponding author G. Thoppe. Tel. +91-22-22782930.
The authors thank the referees for their constructive comments that have helped us to improve the article significantly.

## 1 Introduction

### 1.1 Kaczmarz algorithm

Kaczmarz algorithm [20] is a successive projection based iterative scheme for solving ill posed linear systems of equations. Since its introduction, its convergence properties have been extensively analyzed [18] and it has found diverse applications in areas ranging from tomography [29], synchronization in sensor networks [17], to learning and adaptive control [4, 27, 32]. The original algorithm is deterministic, but some applications, notably network tomography which we describe later, call for a stochastic version. In this article, we introduce and analyze a stochastic approximation version based on the Robbins-Monro paradigm [19] that has become a standard workhorse of signal processing and learning control [6], [21]. We use the ‘o.d.e.’ approach [12, 24] to analyze the scheme and argue that it has the same asymptotic behavior as the original deterministic scheme ‘almost surely’. While we apply our results to network tomography in this article, we believe that this analysis will be of use in other areas mentioned above. In particular, networked control is one potential application area.

A significant development in this line of research has been a randomized Kaczmarz scheme having provable strong convergence properties [34], [23], with recent modifications to further improve performance by weighted sampling [17], [38]. The important difference between these works and ours is as follows. For them, the randomization is over the choice of rows, which is a part of algorithm design and can be chosen at will. In our case, however, a part of the randomness is due to noise and not under our control, as also in the choice of rows which a priori we allow to be uncontrolled.

### 1.2 Network tomography

Network tomography is inference of spatially localized network behavior using only measurements of end-to-end aggregates. Recent work can be classified into traffic volume and link delay tomography. A basic paradigm in both these is to infer the statistics of the random vector from an ill posed measurement model where the matrix is assumed to be known a priori. See [11, 9, 22] for excellent surveys.

In the transportation literature, the aim is to estimate the traffic volume on the end-to-end routes assuming access to only traffic volumes on a subset of links [25, 5, 33, 35]. An excellent survey is given in [1]. An analogous problem has been addressed in packet networking [16, 36, 37]. In all of these works, one sample of is assumed available and is estimated by a suitable regularization.

Link delay tomography deals with estimation of link delay statistics from path delay measurements. Here the network is usually assumed to be in the form of a tree. Multicast probe packets, real or emulated, are sent from the root node to the leaves. For each probe packet, a set of delay measurements for paths from the root node to the leaf nodes is collected. These delays are correlated and this correlation is exploited to estimate the link delay statistics [8, 2, 14, 31]. Using independent samples of the path delay vector, an expectation maximization based algorithm is derived to obtain the maximum likelihood estimates of the link parameters. There is also work on estimating link level loss statistics [8], link level bandwidths [13], link-level cross traffic [30] and network topology [15] using end-to-end measurements.

### 1.3 Summary of our work

We show that, starting from the same initial point, the stochastic approximation variant of the Kaczmarz (SAK) algorithm and the deterministic Kaczmarz algorithm converge to the same point. Using this, we develop a novel online algorithm for estimation of the means (more generally, moments and cross-moments) of the elements of the vector from a sequence of measurements of the elements of the vector Our scheme can be used for both traffic volume and link delay tomography. An important advantage of our scheme is that it is real time—taking observed data as inputs as they arrive and making incremental adaptation. Also, unlike previous approaches, our scheme allows for elements of to be correlated. While our analysis is under the simplifying statistical assumption that the samples are IID, we point out later in Section 5 that these can be relaxed considerably. For link delay tomography, our algorithm does away with the need for multicast probe packet measurements and can be used even for networks with topologies other than tree.

## 2 Model and problem description

### 2.1 Basic notation

For For vectors, we use to denote their Euclidean norm and for inner product. For a matrix denotes its -th row, its -th entry, its row space and its transpose. We use to denote the derivative of the map with respect to

Let denote the random vector with finite variance whose statistics we wish to estimate. Let be an a priori known matrix with full row rank and let

 Y≡(Y(1),…,Y(m))′=AX+W, (1)

where is a zero mean, bounded variance random variable denoting noise in the measurement. Let be a random variable taking values in such that, Let be IID copies of , that are jointly independent, and . (The IID assumption is purely for simplicity of analysis. We point out later that these results extend to much more general situations.) We assume that at each time step we know only the value of and the -th component of i.e.

Our objective is to develop a real-time algorithm, with provable convergence properties, to estimate the moments and cross moments of the random vector

## 3 Preliminaries

### 3.1 Stochastic approximation algorithms

The archetypical stochastic approximation algorithm is

 xk+1=xk+ηk[h(xk)+ξk+1], (2)

where is Lipschitz, is a positive stepsize sequence satisfying and and represents noise. As (3.1) can be viewed as a noisy discretization of the o.d.e.

 ˙x(t)=h(x(t)). (3)

This is the ‘o.d.e. approach’ [12, 24]. More specifically, suppose that the following assumptions hold.

(A1) is a square-integrable martingale difference sequence w.r.t. the fields satisfying a.s. for some

(A2) exists ( will be necessarily Lipschitz) and the o.d.e. has origin as its globally asymptotically stable equilibrium.

(A3) Also, a continuously differentiable Lyapunov function such that for

Then, as in Chapters 2,3 of [7], we have:

###### Lemma 1

The iterates of (3.1) a.s. converge to

### 3.2 Kaczmarz algorithm

Consider the inverse problem of finding a fixed from where is as defined in Section 2. W.l.o.g., let rows of be of unit norm. Given an approximation of a natural optimization problem to consider is

 minu∈RN∥u−x0∥, subject to Au=Av∗. (4)

Elementary calculation shows that its solution is

 x∗=x0+A′(AA′)−1(Av∗−Ax0). (5)

Clearly, As has full row rank, is the only point in that satisfies The Kaczmarz algorithm uses this fact to solve (3.2). With prescribed initial point stepsize and its update rule is given by

 xk+1=xk+κ[⟨ark,v∗⟩−⟨ark,xk⟩]ark. (6)
###### Theorem 1

[10] If then as .

Let Since are translations of As Thus, Hence, Thus we have:

###### Lemma 2

For any if and only if

## 4 The SAK Algorithm

We develop here a SAK algorithm to estimate for the model of Section 2. Let an approximation to be given. Observe from (2.1) that

 EY=AEX. (7)

By rescaling equations, we assume w.l.o.g. that the rows of are of unit norm. This saves some notation without affecting the analysis. not being known exactly, one may estimate it off-line and use the classical Kaczmarz to determine From (3.2), note that the classical Kaczmarz would have converged to

 x∗=x0+A′(AA′)−1(E(Y)−Ax0). (8)

As against this off-line scheme, a better alternative is to use an on-line algorithm. Using the notations and assumptions of Section 2, a SAK algorithm to estimate based on (3.2), is:

 xk+1=xk+ηk[Yk+1−⟨aZk+1,xk⟩]aZk+1, (9)

where is as defined below (3.1). Note in (4) the noisy measurements of the elements of and the real time estimates of

We now analyze its behaviour. Clearly, the iterates of (4) always remain confined to the affine space defined below (3.2). Since has full row rank, for each there exists unique such that

 xk=x0+A′αk. (10)

Thus one can equivalently analyze the algorithm

 αk+1=αk+ηk[~Yk+1−eZk+1A(x0+A′αk)], (11)

where is the matrix with in its -th diagonal position and zero elsewhere and is the dimensional vector with its -th position occupied by and zero elsewhere.

Let Defining where note that (4) can be rewritten as

 αk+1=αk+ηk[Λ(EY−A(x0+A′αk))+ξk+1]. (12)

If then, clearly, (4) is in the form given in (3.1). Its limiting o.d.e. is thus

 ˙α(t)=Λ(EY−A(x0+A′α(t))). (13)
###### Theorem 2

Proof. For each let Lipschitz property of and (A1) are easily verified. If , then pointwise. Let . This vanishes only at the origin. Further, for any solution to the o.d.e. again with equality only at the origin. Thus is a Lyapunov function for the o.d.e. with the origin as its globally asymptotically stable equilibrium. Thus holds. By Lemma 1, to exhibit it now suffices to show i.e. is the globally asymptotically stable equilibrium of the o.d.e. given in (4). Towards this, consider the function As has full row rank, if and only if For any solution of (4), But Thus with equality only when This shows that is a Lyapunov function. Thus is the sole globally asymptotically stable equilibrium of (4) as desired.

Because of (4), it follows that the SAK algorithm of (4) converges to of (4), the same point that the corresponding classical Kaczmarz converges to.

## 5 Extensions

1. We had assumed to be IID. The final result, however, can be established under much more general conditions. For example, can be:

• ergodic Markov, as in Part II, Chapter 1, [6], with ’s the corresponding stationary probabilities.

• asymptotically stationary as in Chapter 6, [21],

• ‘controlled’ Markov, as in Chapter 6, [7], which allows for non-stationarity under the mild restriction that the relative frequency of remain bounded away from zero a.s. .

Likewise, can be ergodic Markov or asymptotically stationary as long as it is independent of In fact, we can allow it to be long range dependent and heavy tailed [3], which is often the case with real communications networks.

2. The above analysis was for estimation of means. We can also extend it to cover higher moments. For simplicity, we neglect measurement noise from (2.1) and consider the model Observe that, for any and each

 [Y(i)]q=∑r∈ΔN,q(qr1,…,rN)N∏ℓ=1[aiℓX(ℓ)]rℓ, (14)

where and Let denote a dimensional vector, and let Also, let denote the matrix, whose -th entry is the coefficient associated with component of as given in (5). The set of relations in (5) can thus be compactly written as

 Yq=Aq~Xq. (15)

Note that is generically full row rank \@footnotemark\@footnotetextFix Let be an non-singular matrix made up of the independent columns of . Clearly, columns of are also columns of where is as defined above (5). Let Each contains the identity matrix, hence is non-empty. Since is a multivariate polynomial, is a non-empty Zariski open set in , therefore open dense in the usual topology. Thus is an open dense set of .. Hence, (5) is of the same spirit as (2.1). One can thus use (4), after replacing samples of with those of and with to estimate in real-time for any For desired the only condition one needs to ensure is that if are the components of that are positive, then such that are simultaneously nonzero. Clearly, by choosing appropriate in (5), one can estimate the moments of any desired order.

Given finite moment estimates, one can then postulate a maximum entropy distribution. For e.g., if , then the maximum entropy distribution is , where is for normalization and are chosen so as to ensure .

3. We have taken the process as given, i.e., not within our control. If instead one can schedule , randomization policies such as [34, 23, 17, 38] can be used to advantage. Further performance improvements are possible by adapting additional averaging as in [28]. We do not pursue this here.

4. Since (4) is of the form for suitable vector and martingale difference noise , we can iterate this to obtain . Note that the matrix is similar to the positive definite matrix . Hence, if denotes the minimum eigenvalue of then , where denotes the weighted norm defined by . This can be used to obtain estimates for finite time error and convergence rate. We do not pursue this here, see, however, [26].

## 6 Experimental Results

We illustrate the application of SAK algorithm in real time delay tomography for the network of Figure 1. The goal here is to use the measurements of end-to-end delay experienced by probe packets while traversing different paths in the network to obtain, in real time, the estimates of link delay statistics.

In the framework of Section 2, the experimental setup is as follows. A priori we choose six paths in the network. This is described by the path-link matrix (rows paths, columns links)

Its entry is one if link is present on path Thus, row four denotes the path that connects the nodes 2-8-11-12-4. The delay a probe packet experiences while traversing link is a random variable with arbitrary non-negative distribution. The delay across path is where are IID standard Gaussian random variables denoting measurement error. We generate a million probe packets, where the packet is sent along a path whose index, denoted is chosen uniformly randomly from Thus each path gets about 167,000 samples. We use to record the delay, packet experiences while traversing the path We also run our SAK Algorithm of (4) for a million iterations, first for (2.1) and then for (5) with The chosen start point, the actual value and final estimated value of moments are given in Tables 1 and 2. In both cases the initial point satisfies the assumption of Lemma 2 and hence the final estimates are close to the actual values. In Table 2 we give only a subset of results. Figure 2 compares the real-time estimates of expected delay for candidate links and obtained using the SAK algorithm and the averaged SAK algorithm. The iterates of the averaged SAK algorithm are samples averages of the SAK algorithm iterates. Observe that, although we run the simulation for a million packets, the estimates are very nearly the true values in after about 300 iterations. Also, note that the error in the estimates does not decrease monotonically. This is because of the direct use of noisy measurements. The fluctuations, however, get suppressed as the stepsizes decrease with iterations.

## References

• [1] T. Abrahamsson. Estimation of origin-destination matrices using traffic counts–a literature survey. IIASA Interim Report IR-98-021/May, 27:76, 1998.
• [2] A. Adams, T. Bu, T. Friedman, J. Horowitz, D. Towsley, R. Caceres, N. Duffield, F. Presti, S. B. Moon, and V. Paxson. The use of end-to-end multicast measurements for characterizing internal network behavior. Communications Magazine, IEEE, 38(5):152–159, 2000.
• [3] V. Anantharam and V. S. Borkar. Stochastic approximation with long range dependent and heavy tailed noise. Queueing Systems, 71(1-2):221–242, 2012.
• [4] K.J. Åström. Theory and applications of adaptive control–a survey. Automatica, 19(5):471–486, 1983.
• [5] M. G. H. Bell. The estimation of origin-destination matrices by constrained generalised least squares. Transportation Research Part B: Methodological, 25(1):13–22, 1991.
• [6] A. Benveniste, M. Métivier, and P. Priouret. Adaptive algorithms and stochastic approximations. Springer Publishing Company, Incorporated, 1990.
• [7] V. S. Borkar. Stochastic approximation: a dynamical systems viewpoint. Cambridge University Press Cambridge, 2008.
• [8] R. Cáceres, N. Duffield, J. Horowitz, and D. Towsley. Multicast-based inference of network-internal loss characteristics. Information Theory, IEEE Transactions on, 45(7):2462–2480, 1999.
• [9] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu. Network tomography: recent developments. Statistical science, pages 499–517, 2004.
• [10] E. Chong and H. Zak. An introduction to optimization. John Wiley & Sons, 2001.
• [11] A. Coates, A. Hero III, R. Nowak, and B. Yu. Internet tomography. Signal Processing Magazine, IEEE, 19(3):47–65, 2002.
• [12] D. Derevitskii and A. Fradkov. Two models for analyzing the dynamics of adaptation algorithms. Automation and Remote Control, 35(1):59–67, 1974.
• [13] A. Downey. Using pathchar to estimate internet link characteristics. In ACM SIGCOMM Computer Communication Review, volume 29(4), pages 241–250. ACM, 1999.
• [14] N. Duffield, J. Horowitz, F. Presti, and D. Towsley. Network delay tomography from end-to-end unicast measurements. In Evolutionary Trends of the Internet, pages 576–595. Springer, 2001.
• [15] N. Duffield, J. Horowitz, F. Presti, and D. Towsley. Multicast topology inference from measured end-to-end loss. Information Theory, IEEE Transactions on, 48(1):26–45, 2002.
• [16] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True. Deriving traffic demands for operational ip networks: Methodology and experience. IEEE/ACM Transactions on Networking (ToN), 9(3):265–280, 2001.
• [17] N. Freris and A. Zouzias. Fast distributed smoothing of relative measurements. In Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, pages 1411–1416. IEEE, 2012.
• [18] A. Galántai. Projectors and projection methods, volume 6. Springer, 2004.
• [19] Robbins H and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
• [20] S. Kaczmarz. Angenäherte auflösung von systemen linearer gleichungen. Bulletin International de lâAcademie Polonaise des Sciences et des Lettres, 35:355–357, 1937.
• [21] H. Kushner and G. Yin. Stochastic approximation algorithms and applications. Springer New York, 2003.
• [22] E. Lawrence, G. Michailidis, and V. Nair. Statistical inverse problems in active network tomography. Lecture Notes-Monograph Series, pages 24–44, 2007.
• [23] D. Leventhal and A. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Mathematics of Operations Research, 35(3):641–654, 2010.
• [24] L. Ljung. Analysis of recursive stochastic algorithms. Automatic Control, IEEE Transactions on, 22(4):551–575, 1977.
• [25] M. Maher. Inferences on trip matrices from observations on link volumes: a bayesian statistical approach. Transportation Research Part B: Methodological, 17(6):435–447, 1983.
• [26] G. Moustakides. Exponential convergence of products of random matrices: Application to adaptive algorithms. International Journal of Adaptive Control and Signal Processing, 12(7):579–597, 1998.
• [27] P. Parks and J. Militzer. A comparison of five algorithms for the training of cmac memories for learning control systems. Automatica, 28(5):1027–1035, 1992.
• [28] B. Polyak and A. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992.
• [29] C. Popa and R. Zdunek. Kaczmarz extended algorithm for tomographic image reconstruction from limited-data. Mathematics and Computers in Simulation, 65(6):579–598, 2004.
• [30] R. Prasad, C. Dovrolis, M. Murray, and K. Claffy. Bandwidth estimation: metrics, measurement techniques, and tools. Network, IEEE, 17(6):27–35, 2003.
• [31] F. Presti, N. Duffield, J. Horowitz, and D. Towsley. Multicast-based inference of network-internal delay distributions. Networking, IEEE/ACM Transactions on, 10(6):761–775, 2002.
• [32] J. Richalet, A. Rault, J. Testud, and J. Papon. Model predictive heuristic control: Applications to industrial processes. Automatica, 14(5):413–428, 1978.
• [33] H. Sherali, R. Sivanandan, and A. Hobeika. A linear programming approach for synthesizing origin-destination trip tables from link traffic volumes. Transportation Research Part B: Methodological, 28(3):213–233, 1994.
• [34] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, 15(2):262–278, 2009.
• [35] Y. Vardi. Network tomography: Estimating source-destination traffic intensities from link data. Journal of the American Statistical Association, 91(433):365–377, 1996.
• [36] Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg. Fast accurate computation of large-scale ip traffic matrices from link loads. In ACM SIGMETRICS Performance Evaluation Review, volume 31(1), pages 206–217. ACM, 2003.
• [37] Y. Zhang, M. Roughan, C. Lund, and D. Donoho. An information-theoretic approach to traffic matrix estimation. In Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, pages 301–312. ACM, 2003.
• [38] A. Zouzias and N. Freris. Randomized extended Kaczmarz for solving least-squares. arXiv preprint arXiv:1205.5770, 2012.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters