Radio interferometric gain calibration as a complex optimization problem
Recent developments in optimization theory have extended some traditional algorithms for least-squares optimization of real-valued functions (Gauss-Newton, Levenberg-Marquardt, etc.) into the domain of complex functions of a complex variable. This employs a formalism called the Wirtinger derivative, and derives a full-complex Jacobian counterpart to the conventional real Jacobian. We apply these developments to the problem of radio interferometric gain calibration, and show how the general complex Jacobian formalism, when combined with conventional optimization approaches, yields a whole new family of calibration algorithms, including those for the polarized and direction-dependent gain regime. We further extend the Wirtinger calculus to an operator-based matrix calculus for describing the polarized calibration regime. Using approximate matrix inversion results in computationally efficient implementations; we show that some recently proposed calibration algorithms such as StefCal and peeling can be understood as special cases of this, and place them in the context of the general formalism. Finally, we present an implementation and some applied results of CohJones, another specialized direction-dependent calibration algorithm derived from the formalism.
keywords: Instrumentation: interferometers, Methods: analytical, Methods: numerical, Techniques: interferometric
In radio interferometry, gain calibration consists of solving for the unknown complex antenna gains, using a known (prior, or iteratively constructed) model of the sky. Traditional (second generation, or 2GC) calibration employs an instrumental model with a single direction-independent (DI) gain term (which can be a scalar complex gain, or a complex-valued Jones matrix) per antenna, per some time/frequency interval. Third-generation (3GC) calibration also addresses direction-dependent (DD) effects, which can be represented by independently solvable DD gain terms, or by some parameterized instrumental model (e.g. primary beams, pointing offsets, ionospheric screens). Different approaches to this have been proposed and implemented, mostly in the framework of the radio interferometry measurement equation (RIME; see ME1, with a recent overview in RRIME1, RRIME2, RRIME3). In this work we restrict ourselves specifically to calibration of the DI and DD gain terms (the latter in the sense of being solved independently per direction).
Gain calibration is a non-linear least squares (NLLS) problem, since the noise on observed visibilities is almost always Gaussian (though other treatments have been proposed by Kazemi2013a). Traditional approaches to NLLS problems involve various gradient-based techniques (for an overview, see Madsen-NLLS), such as Gauss-Newton (GN) and Levenberg-Marquardt (LM). These have been restricted to functions of real variables, since complex differentiation can be defined in only a very restricted sense (in particular, $\partial\bar{z}/\partial z$ does not exist in the usual definition). Gains in radio interferometry are complex variables: the traditional way out of this conundrum has been to recast the complex NLLS problem as a real problem by treating the real and imaginary parts of the gains as independent real variables.
Recent developments in optimization theory (CR-Calculus; ComplexOpt) have shown that using a formalism called the Wirtinger complex derivative (WirtingerDeriv) allows for a mathematically robust definition of a complex gradient operator. This leads to the construction of a complex Jacobian, which in turn allows traditional NLLS algorithms to be applied directly to the complex variable case. We summarize these developments and introduce basic notation in Sect. LABEL:sec:Wirtinger. In Sect. LABEL:sec:unpol, we follow on from Tasse-cohjones to apply this theory to the RIME, and derive complex Jacobians for (unpolarized) DI and DD gain calibration.
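As a small numerical illustration of the Wirtinger derivative, the sketch below estimates $\partial f/\partial z = \frac{1}{2}(\partial f/\partial x - i\,\partial f/\partial y)$ and $\partial f/\partial\bar{z} = \frac{1}{2}(\partial f/\partial x + i\,\partial f/\partial y)$ by central differences. The function name `wirtinger_derivatives`, the step size `h` and the test point `z0` are assumptions made for this example only; they are not part of the formalism developed later.

```python
import numpy as np

def wirtinger_derivatives(f, z, h=1e-6):
    """Central-difference estimates of (df/dz, df/dz_bar) at the complex point z."""
    # Partial derivatives with respect to the real and imaginary parts of z.
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    # Wirtinger definitions: treat z and its conjugate as independent variables.
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dzbar = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dzbar

# f(z) = |z|^2 = z * conj(z) is not complex-differentiable in the usual sense,
# yet both Wirtinger derivatives are well defined: df/dz = conj(z), df/dz_bar = z.
z0 = 1.5 - 0.7j
dz, dzbar = wirtinger_derivatives(lambda z: abs(z) ** 2, z0)
print(dz, dzbar)
```

The same helper applied to $f(z)=\bar{z}$ gives $\partial f/\partial z \approx 0$ and $\partial f/\partial\bar{z} \approx 1$, which is the sense in which $z$ and $\bar{z}$ behave as independent variables.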
In principle, the use of Wirtinger calculus and complex Jacobians ultimately results in the same system of LS equations as the real/imaginary approach. It does offer two important advantages: (i) equations with complex variables are more compact, and are more natural to derive and analyze than their real/imaginary counterparts, and (ii) the structure of the complex Jacobian can yield new and valuable insights into the problem. This is graphically illustrated in Fig. LABEL:fig:JHJ (in fact, this figure may be considered the central insight of this paper). Methods such as GN and LM hinge on a large matrix, $\mathbf{J}^H\mathbf{J}$, with dimensions corresponding to the number of free parameters; construction and/or inversion of this matrix is often the dominant algorithmic cost. If $\mathbf{J}^H\mathbf{J}$ can be treated as (perhaps approximately) sparse, these costs can be reduced, often drastically. Figure LABEL:fig:JHJ shows the structure of an example $\mathbf{J}^H\mathbf{J}$ matrix for a DD gain calibration problem. The left column shows versions of $\mathbf{J}^H\mathbf{J}$ constructed via the real/imaginary approach, for four different orderings of the solvable parameters. None of the orderings yields a matrix that is particularly sparse or easily invertible. The right column shows a complex $\mathbf{J}^H\mathbf{J}$ for the same orderings. Panel (f) reveals sparsity that is not apparent in the real/imaginary approach. This sparsity forms the basis of a new fast DD calibration algorithm discussed later in the paper.
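For orientation, the textbook GN and LM update steps show where the product of the Jacobian with its Hermitian transpose, $\mathbf{J}^H\mathbf{J}$, enters. This is a generic sketch, not the notation of later sections: the symbols $\mathbf{z}$ (parameter vector), $\mathbf{d}$ (data), $\mathbf{v}(\mathbf{z})$ (model), $\mathbf{r}$ (residual), $\lambda$ (damping factor) and $\mathbf{D}$ (damping matrix) are assumptions introduced here. In the complex formulation the parameter vector and Jacobian also carry the conjugate variables, a detail omitted below.

```latex
\begin{align}
  \mathbf{r} &= \mathbf{d} - \mathbf{v}(\mathbf{z}), \qquad
  \mathbf{J} = \frac{\partial \mathbf{v}}{\partial \mathbf{z}}, \\
  % Gauss-Newton step
  \delta\mathbf{z}_{\mathrm{GN}} &= (\mathbf{J}^H\mathbf{J})^{-1}\,\mathbf{J}^H\mathbf{r}, \\
  % Levenberg-Marquardt step, with Marquardt's choice of damping matrix
  \delta\mathbf{z}_{\mathrm{LM}} &= (\mathbf{J}^H\mathbf{J} + \lambda\mathbf{D})^{-1}\,\mathbf{J}^H\mathbf{r},
  \qquad \mathbf{D} = \mathrm{diag}(\mathbf{J}^H\mathbf{J}), \\
  \mathbf{z} &\leftarrow \mathbf{z} + \delta\mathbf{z}.
\end{align}
```

Both steps require solving a linear system in $\mathbf{J}^H\mathbf{J}$, which is why a sparse or block-diagonal structure translates directly into cheaper iterations.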
In Sect. LABEL:sec:separability, we show that different algorithms may be derived by combining different sparse approximations to $\mathbf{J}^H\mathbf{J}$ with conventional GN and LM methods. In particular, we show that StefCal, a fast DI calibration algorithm recently proposed by Stefcal, can be straightforwardly derived from a diagonal approximation to a complex $\mathbf{J}^H\mathbf{J}$. We show that the complex Jacobian approach naturally extends to the DD case, and that other sparse approximations yield a whole family of DD calibration algorithms with different scaling properties. One such algorithm, CohJones (Tasse-cohjones), has been implemented and successfully applied to simulated LOFAR data: this is discussed in Sect. LABEL:sec:implementations.
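To give a feel for the kind of algorithm meant here, the following is a minimal sketch of a StefCal-style iteration for unpolarized DI gains. It assumes the scalar model $d_{pq} = g_p m_{pq} \bar{g}_q$ and updates each antenna gain by the linear least-squares solution obtained with all other gains held fixed, averaging with the previous iterate on every second iteration. The function name `stefcal_sketch`, its arguments, the averaging schedule and the synthetic test data are all assumptions for illustration; this is not the reference StefCal or CohJones implementation.

```python
import numpy as np

def stefcal_sketch(data, model, n_iter=100, tol=1e-10):
    """Estimate per-antenna gains from (N, N) data and model visibility matrices."""
    n_ant = data.shape[0]
    mask = ~np.eye(n_ant, dtype=bool)       # ignore autocorrelations (p == q)
    gains = np.ones(n_ant, dtype=complex)   # initial guess: unity gains
    for k in range(n_iter):
        # y[p, q] = m_pq * conj(g_q): the model seen by antenna p, other gains fixed.
        y = model * np.conj(gains)[np.newaxis, :]
        # Per-antenna linear LS solution of d_pq = g_p * y_pq over all q != p.
        num = np.sum(np.conj(y) * data * mask, axis=1)
        den = np.sum(np.abs(y) ** 2 * mask, axis=1)
        new_gains = num / den
        if k % 2 == 1:
            # Average with the previous iterate to damp oscillations.
            new_gains = 0.5 * (new_gains + gains)
        converged = np.linalg.norm(new_gains - gains) <= tol * np.linalg.norm(new_gains)
        gains = new_gains
        if converged:
            break
    return gains

# Synthetic, noise-free example: a unit point source at the phase centre.
rng = np.random.default_rng(0)
n = 7
true_gains = rng.uniform(0.8, 1.2, n) * np.exp(1j * rng.uniform(-0.5, 0.5, n))
model = np.ones((n, n), dtype=complex)
data = np.outer(true_gains, np.conj(true_gains)) * model
est = stefcal_sketch(data, model)
# Gains are only determined up to a common phase, so compare corrupted models.
residual = (np.outer(est, np.conj(est)) * model - data)[~np.eye(n, dtype=bool)]
print(np.max(np.abs(residual)))
```

Each iteration costs on the order of $N^2$ operations for $N$ antennas and needs no matrix inversion, which is the practical benefit of treating the per-antenna updates as decoupled.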
In Sect. LABEL:sec:pol we extend this approach to the fully polarized case, by developing a Wirtinger-like operator calculus in which the polarization problem can be formulated succinctly. This naturally yields fully polarized counterparts to the calibration algorithms defined previously. In Sect. LABEL:sec:variations, we discuss other algorithmic variations, and make connections to older DD calibration techniques such as peeling (JEN:peeling).
While the scope of this work is restricted to LS solutions to the DI and DD gain calibration problem, the potential applicability of complex optimization to radio interferometry is perhaps broader. We will return to this in the conclusions.