# Fine tuning consensus optimization for distributed radio interferometric calibration

###### Abstract

We recently proposed the use of consensus optimization as a viable and effective way to improve the quality of calibration of radio interferometric data. We showed that it is possible to obtain far more accurate calibration solutions and also to distribute the compute load across a network of computers by using this technique. A crucial aspect in any consensus optimization problem is the selection of the penalty parameter used in the alternating direction method of multipliers (ADMM) iterations. This affects the convergence speed as well as the accuracy. In this paper, we use the Hessian of the cost function used in calibration to appropriately select this penalty. We extend our results to a multi-directional calibration setting, where we propose to use a penalty scaled by the squared intensity of each direction.

 Sarod Yatawatta††thanks: This work is supported by the European Research Council project LOFARCORE, grant #339743. ASTRON, The Netherlands Institute for Radio Astronomy, The Netherlands. Email: yatawatta@astron.nl

Index Terms—  Calibration, Interferometry: Radio interferometry

## 1 Introduction

Modern radio interferometric arrays deliver large volumes of data in order to reach the higher sensitivities that yield new science. To reach the full potential of such arrays, estimation of systematic errors in the data and correction for such errors (also called calibration) is essential. This is not a trivial task for an array with hundreds of receivers that collect data over many hours and at thousands of different frequencies, a case in point being the Square Kilometre Array (SKA), which is in the planning phase. Thus, there is an urgent need for computationally efficient and robust calibration algorithms. On the other hand, there is a surge in research related to large scale and distributed data processing (also called big data), which we can exploit to solve some of these problems.

Our recent work [1] introduced distributed calibration as a way of distributing the computational burden over a network of computers while at the same time improving the quality of calibration. We essentially exploited the continuity of systematic errors over frequency to enforce an additional constraint on calibration. This reduces calibration to a consensus optimization [2] problem, and we used the alternating direction method of multipliers (ADMM) [3] as the underlying algorithm in the proposed distributed calibration scheme.

Consensus optimization, practically implemented with ADMM, has been extensively studied and is deployed in a wide variety of application areas (some recent examples are [4, 5, 6]). In addition, similar work is beginning to appear in radio astronomical imaging [7, 8, 9]. However, compared with other users of ADMM, we observe several unique properties of the calibration problem that we face. First, the cost function used in calibration is non-linear and non-convex. The systematic errors are mainly caused by directional effects such as the ionosphere and the receiver beam shape. Although we know the general properties of such errors, building an entirely accurate model (for instance, of their variation with frequency) is not feasible. Hence, we enforce consensus only by using an approximate model, which makes our problem clearly different from, and more involved than, most other applications. Indeed, other applications such as consensus averaging, where consensus is enforced on a constant value, use a perfect model. Furthermore, while most other applications use complicated network topologies (which in turn affect the performance of ADMM), in our case we have a much simpler, fully connected network with one fusion center.

Of particular interest is the convergence rate of ADMM, which depends on many factors including the penalty parameter and the network topology [10]. In most cases, the penalty parameter is selected by trial and error, following some general guidelines [3]. However, for specific problems, better methods to select the penalty have been proposed [10, 11, 12]. Recent work [13] has suggested selecting the penalty parameter large enough to make the objective function strongly convex. Hence, for our problem, we study the Hessian of the cost function to select appropriate values for the penalty parameter. For calibration along multiple directions in the sky, we can select a different penalty value for each direction. Intuitively, we select a large penalty along directions with a stronger signal, where we have more confidence in our model. These directions are mostly close to the center of the field of view. On the other hand, for directions far away from the center, we select a smaller penalty.

The rest of the paper is organized as follows: In section 2 we give an overview of radio interferometric calibration. Next, in section 3, we present distributed calibration based on consensus optimization. We also present a scheme based on the Hessian of the cost function to select the penalty parameter. Simulation results are presented in section 4 where we demonstrate the improved performance with a refined penalty parameter. Finally, we draw our conclusions in section 5.

Notation: Matrices and vectors are denoted by bold upper and lower case letters as $\mathbf{A}$ and $\mathbf{a}$, respectively. The transpose and the Hermitian transpose are given by $(\cdot)^T$ and $(\cdot)^H$. The matrix Frobenius norm is given by $\|\cdot\|$. The sets of real and complex numbers are denoted by $\mathbb{R}$ and $\mathbb{C}$. The identity matrix is given by $\mathbf{I}$. The matrix trace operator is given by $\mathrm{trace}(\cdot)$.

## 2 Radio Interferometric Calibration

Consider a radio interferometric array with $N$ receivers. The sky is composed of many discrete sources and we consider calibration along $K$ directions in the sky. The observed data at a baseline formed by two receivers, $p$ and $q$, is given by [14]

$$\mathbf{V}_{pq}=\sum_{k=1}^{K}\mathbf{J}_{pk}\mathbf{C}_{pqk}\mathbf{J}_{qk}^{H}+\mathbf{N}_{pq}\qquad(1)$$

where $\mathbf{V}_{pq}$ ($\in\mathbb{C}^{2\times 2}$) is the observed visibility matrix (or the cross correlations). The systematic errors that need to be calibrated for stations $p$ and $q$ along the $k$-th direction are given by the Jones matrices $\mathbf{J}_{pk}$ and $\mathbf{J}_{qk}$ ($\in\mathbb{C}^{2\times 2}$), respectively. Note that since $K$ directions are calibrated, for each station there are $K$ Jones matrices (so $KN$ in total). The sky signal (or coherency) along the $k$-th direction is given by $\mathbf{C}_{pqk}$ ($\in\mathbb{C}^{2\times 2}$) and is known a priori. The values of $\mathbf{J}_{pk}$ and $\mathbf{C}_{pqk}$ in (1) are implicitly dependent on the sampling time and the frequency of the observation. The noise matrix $\mathbf{N}_{pq}$ ($\in\mathbb{C}^{2\times 2}$) is assumed to have complex, zero mean, circular Gaussian elements.
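For concreteness, the data model (1) can be sketched numerically as follows. This is only a toy illustration: the array size, number of directions, and the baseline-independent coherencies are all assumed values, not part of any real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3  # assumed (small) numbers of receivers and directions

# Per-station, per-direction Jones matrices J_pk (2x2, complex).
J = rng.standard_normal((N, K, 2, 2)) + 1j * rng.standard_normal((N, K, 2, 2))
# Known sky coherencies C_pqk; for brevity, one 2x2 matrix per direction,
# independent of the baseline (p, q).
C = rng.standard_normal((K, 2, 2)) + 1j * rng.standard_normal((K, 2, 2))

def visibility(p, q, noise_sigma=0.0):
    """Observed V_pq = sum_k J_pk C_pqk J_qk^H + N_pq, as in eq. (1)."""
    V = sum(J[p, k] @ C[k] @ J[q, k].conj().T for k in range(K))
    noise = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    return V + noise_sigma * noise

V01 = visibility(0, 1)  # noiseless visibility on baseline (0, 1)
```

For an array of $N$ stations, one such $2\times 2$ visibility is produced per baseline, per time sample, per frequency channel.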

Estimating the Jones matrices in (1) can be further simplified by using the space alternating generalized expectation maximization (SAGE) algorithm [15, 16]. In a nutshell, using the SAGE algorithm, we can simplify calibration along $K$ directions to $K$ single-direction calibration subproblems (see [16] for details). Calibration along the $k$-th direction is done by using the effective observed data

$$\mathbf{V}_{pqk}=\mathbf{V}_{pq}-\sum_{l=1,\,l\neq k}^{K}\widehat{\mathbf{J}}_{pl}\mathbf{C}_{pql}\widehat{\mathbf{J}}_{ql}^{H}\qquad(2)$$

using the current estimates $\widehat{\mathbf{J}}_{pl}$ and $\widehat{\mathbf{J}}_{ql}$. For an array with $N$ receivers, we can form at most $N(N-1)/2$ baselines that collect visibilities as in (2), for any given time and frequency sample. We define our objective function (for the $k$-th direction) under a Gaussian noise model as

$$g_k(\mathbf{J}_{1k},\mathbf{J}_{2k},\ldots)=\sum_{p,q}\|\mathbf{V}_{pqk}-\mathbf{J}_{pk}\mathbf{C}_{pqk}\mathbf{J}_{qk}^{H}\|^{2}\qquad(3)$$

where the summation is over the baselines that have data. By increasing the time and frequency interval within which data are collected, this summation can be expanded (thus improving the signal to noise ratio). By defining $\mathbf{J}$ ($\in\mathbb{C}^{2N\times 2}$) as the augmented matrix of Jones matrices of all stations along the $k$-th direction,

$$\mathbf{J}\overset{\triangle}{=}[\mathbf{J}_{1k}^{T},\mathbf{J}_{2k}^{T},\ldots,\mathbf{J}_{Nk}^{T}]^{T},\qquad(4)$$

and $\mathbf{A}_p$ ($\in\mathbb{R}^{2\times 2N}$, and likewise $\mathbf{A}_q$) as the canonical selection matrix

$$\mathbf{A}_p\overset{\triangle}{=}[\mathbf{0},\mathbf{0},\ldots,\mathbf{I},\ldots,\mathbf{0}],\qquad(5)$$

(only the $p$-th block of (5) is an identity matrix) we can rewrite (3) as

$$g_k(\mathbf{J})=\sum_{p,q}\|\mathbf{V}_{pqk}-\mathbf{A}_p\mathbf{J}\mathbf{C}_{pqk}(\mathbf{A}_q\mathbf{J})^{H}\|^{2}.\qquad(6)$$

Calibration along the $k$-th direction is the estimation of $\mathbf{J}$ by minimizing (6). Note that (6) has to be minimized for each of the $K$ directions, and the updated values of (2) are re-used until convergence is reached in the SAGE algorithm. We also note that (6) only gives solutions for one frequency and time interval, and to calibrate the full dataset, many such solutions are obtained for data observed at different time and frequency intervals.
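The directional subproblems above can be sketched as follows: the code forms the effective data of eq. (2) and evaluates the cost of eq. (3) for one direction. Dimensions and data are toy values, and the "observed" visibilities are generated from the model itself so that the cost at the generating solution is numerically zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 2  # assumed numbers of receivers and directions

# Current estimates of the Jones matrices: J_hat[p, l] for station p, direction l.
J_hat = rng.standard_normal((N, K, 2, 2)) + 1j * rng.standard_normal((N, K, 2, 2))
C = rng.standard_normal((K, 2, 2)) + 1j * rng.standard_normal((K, 2, 2))

# Noiseless data generated by the model itself, one visibility per baseline.
V = {(p, q): sum(J_hat[p, l] @ C[l] @ J_hat[q, l].conj().T for l in range(K))
     for p in range(N) for q in range(p + 1, N)}

def effective_data(p, q, k):
    """V_pqk of eq. (2): peel off the current estimates of all other directions."""
    Vpqk = V[(p, q)].copy()
    for l in range(K):
        if l != k:
            Vpqk -= J_hat[p, l] @ C[l] @ J_hat[q, l].conj().T
    return Vpqk

def g_k(J_dir, k):
    """Cost of eq. (3) for direction k; J_dir[p] holds J_pk for station p."""
    return sum(np.linalg.norm(effective_data(p, q, k)
                              - J_dir[p] @ C[k] @ J_dir[q].conj().T)**2
               for p in range(N) for q in range(p + 1, N))

# At the generating solution the residual cost vanishes (up to rounding).
cost = g_k(J_hat[:, 0], 0)
```

In the SAGE iterations, each such cost would be minimized in turn over its direction's Jones matrices while the other directions are held at their current estimates.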

## 3 Distributed Calibration

We have introduced calibration along $K$ directions in section 2, but working only on a single frequency and time sample. In this section, we consider calibrating data observed at many different frequencies, but only along one direction, because this can easily be extended to $K$ directions using the SAGE algorithm. We impose an additional constraint that tries to preserve the continuity of $\mathbf{J}$ in (6) over frequency. To solve this, we introduced the use of consensus optimization in [1], where the objective function is modified into an augmented Lagrangian

$$L_f(\mathbf{J}_f,\mathbf{Z},\mathbf{Y}_f)=g_f(\mathbf{J}_f)+\|\mathbf{Y}_f^{H}(\mathbf{J}_f-\mathbf{B}_f\mathbf{Z})\|+\frac{\rho}{2}\|\mathbf{J}_f-\mathbf{B}_f\mathbf{Z}\|^{2}\qquad(7)$$

where the subscript $f$ denotes data (and parameters) at frequency $f$. In (7), $g_f(\mathbf{J}_f)$ is the original cost function as in (6), except that the subscripts denote frequency $f$. The Lagrange multiplier is given by $\mathbf{Y}_f$ ($\in\mathbb{C}^{2N\times 2}$). The calibration parameters are given by $\mathbf{J}_f$ ($\in\mathbb{C}^{2N\times 2}$). The continuity in frequency is enforced by the frequency model given by $\mathbf{B}_f$, which is essentially a set of basis functions in frequency, evaluated at $f$. The global variable $\mathbf{Z}$ is shared by the data at all frequencies.

The ADMM iterations for solving (7) are given as

$$(\mathbf{J}_f)^{n+1}=\arg\min_{\mathbf{J}}\ L_f(\mathbf{J},(\mathbf{Z})^{n},(\mathbf{Y}_f)^{n})\qquad(8)$$

$$(\mathbf{Z})^{n+1}=\arg\min_{\mathbf{Z}}\ \sum_f L_f((\mathbf{J}_f)^{n+1},\mathbf{Z},(\mathbf{Y}_f)^{n})\qquad(9)$$

$$(\mathbf{Y}_f)^{n+1}=(\mathbf{Y}_f)^{n}+\rho\left((\mathbf{J}_f)^{n+1}-\mathbf{B}_f(\mathbf{Z})^{n+1}\right)\qquad(10)$$

where we use the superscript $n$ to denote the $n$-th iteration. The steps (8) and (10) are done for each $f$ in parallel. The update of the global variable (9) is done at the fusion center. More details of these steps can be found in [1].
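As an illustration only, the ADMM steps (8)-(10) can be sketched on a toy consensus problem in which the calibration cost $g_f$ is replaced by a simple quadratic per-frequency term, so that each update has a closed form. All quantities (the scalar local variables, the linear-in-frequency basis, the data) are assumed for the sketch and do not correspond to the actual calibration cost.

```python
import numpy as np

rng = np.random.default_rng(2)
freqs = np.linspace(0.0, 1.0, 8)            # assumed normalized frequencies
d = 1.0 + 2.0 * freqs + 0.01 * rng.standard_normal(freqs.size)  # noisy "data"
B = np.stack([np.ones_like(freqs), freqs], axis=1)  # rows b_f of a linear basis
rho = 1.0                                   # penalty parameter

j = np.zeros_like(freqs)    # local variables, one per frequency
y = np.zeros_like(freqs)    # Lagrange multipliers
z = np.zeros(2)             # global shared variable

for _ in range(100):
    # (8): per-frequency update; closed form because g_f(j) = (j - d_f)^2 here
    j = (2.0 * d - y + rho * (B @ z)) / (2.0 + rho)
    # (9): fusion-center update of the global variable
    z = np.linalg.solve(rho * B.T @ B, B.T @ (y + rho * j))
    # (10): dual ascent on the multipliers
    y = y + rho * (j - B @ z)
```

At convergence the local solutions agree with the global frequency model, i.e. $j_f \approx \mathbf{b}_f^T \mathbf{z}$, which is the consensus that (8)-(10) enforce.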

In this paper, we study strategies for selecting the penalty parameter $\rho$ to get faster convergence and accurate results. In order to do this, we use the Hessian operator of the cost function (6), which is given as [1, 17]

$$\mathrm{Hess}_f(g_f(\mathbf{J}),\mathbf{J},\boldsymbol{\eta})=\sum_{p,q}\Big(\mathbf{A}_p^{T}\big((\mathbf{V}_{pqf}-\mathbf{A}_p\mathbf{J}\mathbf{C}_{pqf}\mathbf{J}^{H}\mathbf{A}_q^{T})\mathbf{A}_q\boldsymbol{\eta}-\mathbf{A}_p(\mathbf{J}\mathbf{C}_{pqf}\boldsymbol{\eta}^{H}+\boldsymbol{\eta}\mathbf{C}_{pqf}\mathbf{J}^{H})\mathbf{A}_q^{T}\mathbf{A}_q\mathbf{J}\big)\mathbf{C}_{pqf}^{H}+\mathbf{A}_q^{T}\big((\mathbf{V}_{pqf}-\mathbf{A}_p\mathbf{J}\mathbf{C}_{pqf}\mathbf{J}^{H}\mathbf{A}_q^{T})^{H}\mathbf{A}_p\boldsymbol{\eta}-\mathbf{A}_q(\mathbf{J}\mathbf{C}_{pqf}\boldsymbol{\eta}^{H}+\boldsymbol{\eta}\mathbf{C}_{pqf}\mathbf{J}^{H})^{H}\mathbf{A}_p^{T}\mathbf{A}_p\mathbf{J}\big)\mathbf{C}_{pqf}\Big)\qquad(11)$$

where $\boldsymbol{\eta}\in\mathbb{C}^{2N\times 2}$.

For convexity, we need a positive definite Hessian. Since we have a Hessian operator (instead of a matrix), we need to find the smallest eigenvalue of the Hessian, and for convexity, this should be positive. In order to find this, we define a cost function as

$$h(\boldsymbol{\eta})\overset{\triangle}{=}\frac{1}{2}\mathrm{trace}\left(\boldsymbol{\eta}^{H}\mathrm{Hess}_f(g_f(\mathbf{J}),\mathbf{J},\boldsymbol{\eta})+\mathrm{Hess}_f^{H}(g_f(\mathbf{J}),\mathbf{J},\boldsymbol{\eta})\,\boldsymbol{\eta}\right)\qquad(12)$$

and we find the smallest eigenvalue by solving

$$\lambda=\arg\min_{\boldsymbol{\eta}}\ h(\boldsymbol{\eta})\quad\text{subject to}\quad\boldsymbol{\eta}^{H}\boldsymbol{\eta}=\mathbf{I}.\qquad(13)$$

The constraint restricts the minimization of (13) onto a complex Stiefel manifold [18], and the problem can be easily solved by using the Riemannian trust region method [19, 20]. In order to do this, we require the gradient and Hessian of $h(\boldsymbol{\eta})$, which are given as

$$\mathrm{grad}(h(\boldsymbol{\eta}),\boldsymbol{\eta})=\mathrm{Hess}_f(g_f(\mathbf{J}),\mathbf{J},\boldsymbol{\eta})\qquad(14)$$

and

$$\mathrm{Hess}(h(\boldsymbol{\eta}),\boldsymbol{\eta},\boldsymbol{\zeta})=\mathrm{Hess}_f(g_f(\mathbf{J}),\mathbf{J},\boldsymbol{\zeta}),\qquad(15)$$

where $\boldsymbol{\zeta}\in\mathbb{C}^{2N\times 2}$.

After obtaining $\lambda$ from (13), our strategy is to select $\rho$ such that $\rho+\lambda\geq 0$, so that the Hessian of the augmented Lagrangian (7) is positive semi-definite [13]. In order to do this, we need an estimate for $\mathbf{J}$ in (13). We can find this by initial calibration with a pre-determined value of $\rho$ (say $\rho_0$). Once we obtain $\mathbf{J}$, we use (13) to find $\lambda$ and afterwards we update $\rho$. Note that $\lambda$ depends on the frequency $f$, but we ignore this frequency dependence and use one frequency (typically the middle of the band) to estimate it.
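The penalty rule can be illustrated on a small stand-in Hessian matrix. In the paper the Hessian is an operator and its smallest eigenvalue is found on the Stiefel manifold via (13); here, for illustration only, an ordinary eigendecomposition of an assumed symmetric matrix plays that role.

```python
import numpy as np

# Stand-in symmetric "Hessian"; the real one is the operator in the text.
H = np.array([[1.0, 2.0],
              [2.0, -1.0]])
lam = np.linalg.eigvalsh(H)[0]      # smallest eigenvalue, the role of eq. (13)
rho = max(0.0, -lam)                # choose rho so that rho + lambda >= 0

# With this rho the augmented Hessian H + rho*I is positive semi-definite.
assert np.all(np.linalg.eigvalsh(H + rho * np.eye(2)) >= -1e-12)
```

If the cost is already locally convex ($\lambda\geq 0$), this rule leaves $\rho$ free to be chosen by other considerations, such as convergence speed.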

So far, we have considered calibration along one direction only. The next question that we must answer is how to select $\rho$ for calibration along $K$ directions in the sky. For each direction, $\mathbf{C}_{pqf}$ will influence the value of $\lambda$ obtained in (13). If the centroid of the $k$-th source (cluster) [21] is along direction $(l,m)$ in the sky and if its effective (unpolarized) intensity is $\alpha$, we have

$$\mathbf{C}_{pqf}\approx\exp\left(\jmath\,\phi(l,m,p,q)\right)\alpha\,\mathbf{I}\qquad(16)$$

where $\phi(l,m,p,q)$ is the phase contribution and $\mathbf{I}$ is a $2\times 2$ identity matrix. Hence $\mathbf{C}_{pqf}$ is a diagonal scalar matrix. If $\mathbf{J}$ is close to the true solution, the term $\mathbf{V}_{pqf}-\mathbf{A}_p\mathbf{J}\mathbf{C}_{pqf}\mathbf{J}^{H}\mathbf{A}_q^{T}$ becomes negligible compared with the other terms in (11). The remaining terms contain the product $\mathbf{C}_{pqf}\mathbf{C}_{pqf}^{H}$, and the phase terms in (16) cancel out. Therefore, for different clusters, the value of $\lambda$ obtained by (13) is mainly determined by the squared effective intensity of each source. Hence, once we have determined a suitable value of $\rho$ for one direction, the corresponding values for the other directions can be determined by scaling $\rho$ by the squared effective intensity.
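The resulting per-direction scaling rule is one line of arithmetic; the intensities below are hypothetical, chosen only to show the scaling.

```python
import numpy as np

# Per-direction penalty rule: rho_k scales with the squared effective
# intensity alpha_k of each source cluster.
rho0 = 5.0                                  # penalty tuned for a unit-intensity direction
alphas = np.array([10.0, 4.0, 1.0, 0.5])    # assumed effective intensities per cluster
rhos = rho0 * alphas**2                     # rho_k = rho0 * alpha_k^2
```

A bright central source (large $\alpha$) thus gets a much stronger pull toward the frequency model than a faint source far from the field center.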

## 4 Simulation Results

We simulate an array of $N$ receivers that calibrate along $K$ directions in the sky. The Jones matrices in (1) are generated with their elements having values drawn from a complex uniform distribution, multiplied by a frequency dependence given by a random polynomial. The intensities of the sources are randomly generated, and their positions are randomly chosen within the field of view. The variation of the intensities with frequency is given by a power law with a randomly generated exponent. The noise matrices in (1) are simulated to have complex circular Gaussian random variables. The variance of the noise is changed according to the signal to noise ratio (SNR)

$$\mathrm{SNR}\overset{\triangle}{=}\frac{\sum_{p,q}\|\mathbf{V}_{pq}\|^{2}}{\sum_{p,q}\|\mathbf{N}_{pq}\|^{2}}.\qquad(17)$$

With this setup, we generate data for a number of frequency channels across the observing band. For calibration, we set up a 3rd order polynomial model, using Bernstein basis functions [22] for the matrix $\mathbf{B}_f$ in (7). Note that we intentionally use a lower order frequency dependence than what is actually present in the data, to create a realistic scenario in which the exact model is not known. During calibration, the initial values of the calibration parameters are always set to identity values. Unless stated otherwise, all directions have the same value of $\rho$. We use 50 ADMM iterations, and after the first iteration, we solve (13) to estimate $\lambda$ for a source with unit amplitude. Regardless, we perform calibration with various values of $\rho$ to compare performance.
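A sketch of the Bernstein basis underlying $\mathbf{B}_f$ follows; the function name and the frequency range are illustrative, not the paper's implementation.

```python
import numpy as np
from math import comb

def bernstein_basis(f, f_min, f_max, order=3):
    """Bernstein basis functions of the given order, evaluated at frequency f,
    with f mapped to t in [0, 1]."""
    t = (f - f_min) / (f_max - f_min)
    return np.array([comb(order, i) * t**i * (1 - t)**(order - i)
                     for i in range(order + 1)])

b = bernstein_basis(125.0, 100.0, 150.0)   # illustrative frequencies in MHz
```

Bernstein bases are non-negative and form a partition of unity on the interval, which makes them well conditioned for fitting smooth frequency behavior.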

We find the normalized (averaged over all directions) mean squared error (NMSE) between the true $\mathbf{J}_f$ and its estimate $\widehat{\mathbf{J}}_f$ as

$$\mathrm{NMSE}\overset{\triangle}{=}\frac{1}{\sqrt{2KN}}\sqrt{\sum_{k}\|\mathbf{J}_f-\widehat{\mathbf{J}}_f\mathbf{U}\|^{2}}\qquad(18)$$

to measure the accuracy of calibration. In (18), $\mathbf{U}$ is a unitary matrix that removes the unitary ambiguity in the estimated $\widehat{\mathbf{J}}_f$ [23].
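The removal of the unitary ambiguity can be sketched with a Procrustes-type solve. This is an assumption about how $\mathbf{U}$ may be computed for illustration; see [23] for the actual method. The toy estimate below differs from the truth only by a unitary factor, so the resulting error is numerically zero.

```python
import numpy as np

rng = np.random.default_rng(3)
# True stacked Jones matrices (toy size), and an estimate that differs only
# by a 2x2 unitary ambiguity Q.
J_true = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
Q = np.linalg.qr(rng.standard_normal((2, 2))
                 + 1j * rng.standard_normal((2, 2)))[0]
J_est = J_true @ Q.conj().T

# Unitary Procrustes solve: U = argmin || J_true - J_est U ||_F over unitary U.
W, _, Vh = np.linalg.svd(J_est.conj().T @ J_true)
U = W @ Vh
nmse = np.linalg.norm(J_true - J_est @ U) / np.linalg.norm(J_true)
```

Without the correction by $\mathbf{U}$, a perfectly calibrated solution could still show a large apparent error, since (6) is invariant under a common unitary factor.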

In Fig. 1, we show the NMSE for various values of $\rho$, with increasing number of ADMM iterations. We see that an intermediate value of $\rho$ gives the best performance, while increasing $\rho$ much beyond this value shows no additional improvement. A notable behavior of the NMSE is the enhancement of the error at the band edges (especially at low ADMM iteration counts), which we attribute to Runge's phenomenon [24] in polynomial interpolation.

In Fig. 2, we show the final NMSE after 50 ADMM iterations, which once again shows that the same value of $\rho$ gives the best result.

In Fig. 3, we show the NMSE for a simulation with sources of different intensities at mid frequency along the different directions. In one calibration, we use the same regularization for all directions, and in the other, we scale $\rho$ in proportion to the squared intensity of each direction. We see that varying $\rho$ in proportion to the squared intensity gives the better NMSE.

## 5 Conclusions

In this paper, we have investigated refining the performance of distributed calibration based on consensus optimization. We have used the Hessian of the cost function to select the penalty parameter such that the augmented Lagrangian becomes convex. Furthermore, in a multi-directional calibration scheme, we have proposed to scale the penalty parameter in proportion to the squared intensity along each direction. According to our simulations, such fine tuning of parameters gives superior performance in terms of accuracy and convergence of the distributed calibration scheme.

## References

• [1] S. Yatawatta, “Distributed radio interferometric calibration,” Monthly Notices of the Royal Astronomical Society, vol. 449, no. 4, pp. 4506–4514, 2015.
• [2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
• [3] D. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 2nd edition, Singapore: Athena Scientific, 1997.
• [4] T.-H. Chang, M. Hong, and X. Wang, “Multi-agent distributed optimization via inexact consensus ADMM,” Signal Processing, IEEE Transactions on, vol. 63, no. 2, pp. 482–497, Jan. 2015.
• [5] E. Wei and A. Ozdaglar, “Distributed alternating direction method of multipliers,” in Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, Dec 2012, pp. 5445–5450.
• [6] T. Erseghe, “A distributed and scalable processing method based upon ADMM,” Signal Processing Letters, IEEE, vol. 19, no. 9, pp. 563–566, Sept 2012.
• [7] A. Ferrari, D. Mary, R. Flamary, and C. Richard, “Distributed image reconstruction for very large arrays in radio astronomy,” in Sensor Array and Multichannel Signal Processing Workshop (SAM), 2014 IEEE 8th, June 2014, pp. 389–392.
• [8] R. E. Carrillo, J. D. McEwen, and Y. Wiaux, “PURIFY: a new approach to radio-interferometric imaging,” Monthly Notices of the Royal Astronomical Society, vol. 439, pp. 3591–3604, Apr. 2014.
• [9] A. Onose, R. E. Carrillo, A. Repetti, J. D. McEwen, J.-P. Thiran, J.-C. Pesquet, and Y. Wiaux, “Scalable splitting algorithms for big-data interferometric imaging in the SKA era,” ArXiv e-prints, Jan. 2016.
• [10] R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. I. Jordan, “A general analysis of the convergence of ADMM,” in International Conference on Machine Learning 32, 2015.
• [11] A. Teixeira, E. Ghadimi, I. Shames, H. Sandberg, and M. Johansson, “The ADMM algorithm for distributed quadratic problems: Parameter selection and constraint preconditioning,” Signal Processing, IEEE Transactions on, vol. 64, no. 2, pp. 290–305, Jan 2016.
• [12] E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson, “Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems,” Automatic Control, IEEE Transactions on, vol. 60, no. 3, pp. 644–658, March 2015.
• [13] M. Hong, Z.-Q. Luo, and M. Razaviyayn, “Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, April 2015, pp. 3836–3840.
• [14] J. P. Hamaker, J. D. Bregman, and R. J. Sault, “Understanding radio polarimetry, paper I,” Astronomy and Astrophysics Supp., vol. 117, no. 137, pp. 96–109, 1996.
• [15] J.A. Fessler and A.O. Hero, “Space alternating generalized expectation maximization algorithm,” Signal Processing, IEEE Transactions on, vol. 42, no. 10, pp. 2664–2677, Oct. 1994.
• [16] S. Kazemi, S. Yatawatta, S. Zaroubi, P. Labropoluos, A.G. de Bruyn, L. Koopmans, and J. Noordam, “Radio interferometric calibration using the SAGE algorithm,” Monthly Notices of the Royal Astronomical Society, vol. 414, no. 2, pp. 1656–1666, June 2011.
• [17] S. Yatawatta, “Radio interferometric calibration using a Riemannian manifold,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013, pp. 3866–3870.
• [18] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton Univ. Press, Princeton NJ, 2008.
• [19] P.-A. Absil, C. G. Baker, and K. A. Gallivan, “Trust-region methods on Riemannian manifolds,” Found. Comput. Math., vol. 7, no. 3, pp. 303–330, July 2007.
• [20] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, “Manopt, a Matlab toolbox for optimization on manifolds,” Journal of Machine Learning Research, vol. 15, pp. 1455–1459, 2014.
• [21] S. Kazemi, S. Yatawatta, and S. Zaroubi, “Clustered calibration: an improvement to radio interferometric direction-dependent self-calibration,” Monthly Notices of the Royal Astronomical Society, vol. 430, no. 2, pp. 1457–1472, 2013.
• [22] R. T. Farouki and V. T. Rajan, “Algorithms for polynomials in Bernstein form,” Comput. Aided Geom. Des., vol. 5, no. 1, pp. 1–26, June 1988.
• [23] S. Yatawatta, “On the interpolation of calibration solutions obtained in radio interferometry,” Monthly Notices of the Royal Astronomical Society, vol. 428, no. 1, pp. 828–833, Jan. 2013.
• [24] G. Dahlquist and A. Björck, Numerical Methods in Scientific Computing, Volume I, Society for Industrial and Applied Mathematics, 2008.