Revisiting the radio interferometer measurement equation. IV. A generalized tensor formalism

Key words:
Methods: numerical - Methods: analytical - Methods: data analysis - Techniques: interferometric - Techniques: polarimetric

Abstract

Context: The radio interferometer measurement equation (RIME), especially in its $2\times2$ form, has provided a comprehensive matrix-based formalism for describing classical radio interferometry and polarimetry, as shown in the previous three papers of this series. However, recent practical and theoretical developments, such as phased array feeds (PAFs), aperture arrays (AAs) and wide-field polarimetry, are exposing limitations of the formalism.

Aims: This paper aims to develop a more general formalism that can be used to both clearly define the limitations of the matrix RIME, and to describe observational scenarios that lie outside these limitations.

Methods: Some assumptions underlying the matrix RIME are explicated and analysed in detail. To this purpose, an array correlation matrix (ACM) formalism is explored. This proves of limited use; it is shown that matrix algebra is simply not a sufficiently flexible tool for the job. To overcome these limitations, a more general formalism based on tensors and the Einstein notation is proposed and explored both theoretically, and with a view to practical implementations.

Results: The tensor formalism elegantly yields generalized RIMEs describing beamforming, mutual coupling, and wide-field polarimetry in one equation. It is shown that under the explicated assumptions, tensor equations reduce to the RIME. From a practical point of view, some methods for implementing tensor equations in an optimal way are proposed and analysed.

Conclusions: The tensor RIME is a powerful means of describing observational scenarios not amenable to the matrix RIME. Even in cases where the latter remains applicable, the tensor formalism can be a valuable tool for understanding the limits of such applicability.

Introduction

Since its formulation by Hamaker et al. (1996), the radio interferometer measurement equation (RIME) has been adopted by the calibration and imaging algorithm development community as the mathematical formalism of choice when describing new methods and techniques for processing radio interferometric data. In its $2\times2$ matrix version (also known as the Jones formalism, or JF) developed by Hamaker (2000), it has achieved remarkable simplicity and economy of form.

Recent developments, however, have begun to expose some limitations of the matrix RIME. In particular, phased array feeds (PAFs) and aperture arrays (AAs), while perfectly amenable to a JF on the systems level (in the sense that the response of a pair of PAF or AA compound beams can be described by a Jones matrix), do not seem to fit the same formalism on the element level. In general, since a Jones matrix essentially maps two complex electromagnetic field (EMF) amplitudes onto two feed voltages, it cannot directly describe a system incorporating more than two receptors per station (as in, e.g., the “tripole” design of Bergman et al. 2003). And on the flip side of the coin, Carozzi & Woan (2009) have shown that two complex EMF amplitudes are insufficient – even when dealing with only two receptors – to properly describe wide-field polarimetry, and that a three-dimensional Wolf formalism (WF) is required. Other “awkward” effects that don’t seem to fit into the JF include mutual coupling of receptors.

These circumstances seem to suggest that the JF is a special case of some more general formalism, one that is valid only under certain conditions. The second part of this paper presents one such generalized formalism. However, given the JF’s inherent elegance and simplicity, the degree to which it is understood in the community, and (pragmatically but very importantly) the availability of software implementations, it will in any case continue to be a very useful tool. It is therefore important to establish the precise limits of applicability of the JF, which in turn can only be done in the context of a broader theory.

The first part of this paper therefore re-examines the basic tenets of the RIME, and highlights some underlying assumptions that have not been made explicit previously. It then proposes a generalized formalism based on tensors and Einstein notation. As an illustration, some tensor RIMEs are then formulated, for observational scenarios that are not amenable to the JF. The tensor formalism is shown to reduce to the JF under the previously established assumptions. Finally, the paper discusses some practical aspects of implementing such a formalism in software.

1 Why is the RIME $2\times2$?

As a starting point, I will consider the RIME formulations derived in Paper I of this series (Smirnov 2011a). A few crucial equations are reproduced here for reference. Firstly, the RIME of a point source gives the visibility matrix $\mathbf{V}_{pq}$ measured by interferometer $pq$ as the product of $2\times2$ matrices: the intrinsic source brightness matrix $\mathbf{B}$, and the per-antenna Jones matrices $\mathbf{J}_p$ and $\mathbf{J}_q$:

$$\mathbf{V}_{pq} = \mathbf{J}_p \mathbf{B} \mathbf{J}_q^H \tag{1}$$

The Jones matrix $\mathbf{J}_p$ describes the total signal propagation path from source to antenna $p$. For any specific observation and instrument, it is commonly represented by a Jones chain of individual propagation effects:

$$\mathbf{J}_p = \mathbf{J}_{p,n} \mathbf{J}_{p,n-1} \cdots \mathbf{J}_{p,1} \tag{2}$$

which leads to the onion form of the RIME:

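$$\mathbf{V}_{pq} = \mathbf{J}_{p,n} ( \cdots ( \mathbf{J}_{p,2} ( \mathbf{J}_{p,1} \mathbf{B} \mathbf{J}_{q,1}^H ) \mathbf{J}_{q,2}^H ) \cdots ) \mathbf{J}_{q,m}^H \tag{3}$$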
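As a numerical illustration of how Eqs. (1) to (3) relate, the following NumPy sketch builds two Jones chains from arbitrary complex $2\times2$ matrices and checks that collapsing each chain first (Eqs. (1) and (2)) gives the same visibility matrix as applying the layers one by one (Eq. (3)). The Stokes values, the chain lengths and all function names are assumptions made purely for the example, and the brightness matrix uses one common convention for linear feeds.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_jones():
    """Arbitrary complex 2x2 matrix standing in for one propagation effect."""
    return rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

def total_jones(chain):
    """Eq. (2): multiply the chain so that chain[0] acts on the signal first."""
    J = np.eye(2, dtype=complex)
    for term in chain:
        J = term @ J
    return J

# Brightness matrix for assumed Stokes values (linear feeds, assumed convention).
sI, sQ, sU, sV = 1.0, 0.1, 0.05, 0.01
B = np.array([[sI + sQ, sU + 1j * sV],
              [sU - 1j * sV, sI - sQ]])

chain_p = [random_jones() for _ in range(3)]  # J_{p,1}, J_{p,2}, J_{p,3}
chain_q = [random_jones() for _ in range(2)]  # J_{q,1}, J_{q,2}

# Eq. (1): collapse each chain first, then form the visibility matrix.
V_basic = total_jones(chain_p) @ B @ total_jones(chain_q).conj().T

# Eq. (3): build the "onion" from the inside out, one layer at a time.
V_onion = B.copy()
for term in chain_p:
    V_onion = term @ V_onion
for term in chain_q:
    V_onion = V_onion @ term.conj().T

print(np.allclose(V_basic, V_onion))
```

The two chains deliberately have different lengths, since nothing in the onion form requires the signal paths to $p$ and $q$ to contain the same number of effects.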

The individual terms in the matrix product above correspond to different propagation effects along the signal path. Any practical application of the RIME requires a set of matrices describing specific effects, which are then inserted into Eq. (3). These specific matrices tend to have standard single-letter designations (see e.g. Noordam & Smirnov 2010, Sect. 7.3). In particular, the $\mathbf{K}_p$ term describes the geometric (and fringe stopped) phase delay to antenna $p$. The rest of the Jones chain can be partitioned into direction-independent effects (DIEs, or $uv$-plane effects) on the right, and direction-dependent effects (DDEs, or image-plane effects) on the left, designated as $\mathbf{G}_p$ and $\mathbf{E}_p$. We can then write a RIME for multiple discrete sources as

$$\mathbf{V}_{pq} = \mathbf{G}_p \left( \sum_s \mathbf{E}_{sp} \mathbf{K}_{sp} \mathbf{B}_s \mathbf{K}_{sq}^H \mathbf{E}_{sq}^H \right) \mathbf{G}_q^H \tag{4}$$

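Substituting the exponent for the $\mathbf{K}_p$ term then gives us the Fourier transform (FT) kernel in the full-sky RIME:

$$\mathbf{V}_{pq} = \mathbf{G}_p \left( \iint_{lm} \mathbf{E}_p \mathbf{B} \mathbf{E}_q^H \, e^{-2\pi i (u_{pq} l + v_{pq} m + w_{pq} (n-1))} \, dl \, dm \right) \mathbf{G}_q^H \tag{5}$$

where all matrix terms under the integration sign are functions of direction $l, m$.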
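The step from a per-antenna phase term to the FT kernel can be checked numerically. The sketch below assumes the usual sign convention for the geometric phase and defines the baseline coordinates as the difference of the antenna coordinates; the coordinates, the direction cosines and the function name `K` are hypothetical values chosen for the example.

```python
import numpy as np

# Assumed antenna coordinates (u, v, w) in wavelengths, and a source direction.
uvw_p = np.array([120.0, -35.0, 4.0])
uvw_q = np.array([-60.0, 80.0, -2.5])
l, m = 0.01, -0.02
n = np.sqrt(1.0 - l**2 - m**2)
direction = np.array([l, m, n - 1.0])

def K(uvw):
    """Scalar K term: geometric phase delay relative to the phase centre."""
    return np.exp(-2j * np.pi * (uvw @ direction)) * np.eye(2)

rng = np.random.default_rng(2)
E_p = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary DDE of p
E_q = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary DDE of q
B = np.array([[1.1, 0.05 + 0.01j], [0.05 - 0.01j, 0.9]])

# One term of the sum in Eq. (4), with K kept in its place in the chain.
term_eq4 = E_p @ K(uvw_p) @ B @ K(uvw_q).conj().T @ E_q.conj().T

# A scalar K commutes with everything, so the two phase terms merge into a
# single complex exponential in the baseline coordinates: the kernel of Eq. (5).
uvw_pq = uvw_p - uvw_q
kernel = np.exp(-2j * np.pi * (uvw_pq @ direction))
term_eq5 = (E_p @ B @ E_q.conj().T) * kernel

print(np.allclose(term_eq4, term_eq5))
```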

The first fundamental assumption of the RIME is linearity. The second assumption is that the signal is measured in a narrow enough frequency band to be essentially monochromatic, and at short enough timescales that $\mathbf{J}_p$ is essentially constant; departures from these assumptions cause smearing or decoherence, which has already been reviewed in Paper I (Smirnov 2011a, Sect. 5.2). These assumptions are obvious and well-understood. It is more interesting to consider why the RIME can describe instrumental response by a $2\times2$ Jones matrix. Any such matrix corresponds to a linear transform of two complex numbers into two complex numbers, so why two and not some other number? This actually rests on some further assumptions.

1.1 Dual receptors

In general, an EMF is described by a complex 3-vector. However, an EMF propagating as a transverse plane wave can be fully described by only two complex numbers, $\mathbf{e} = (e_x, e_y)^T$, corresponding to the first two components of that 3-vector in a coordinate system where the third axis is along the direction of propagation. At the antenna feed, the EMF is converted into two complex voltages $\mathbf{v} = (v_a, v_b)^T$. Given a transverse plane wave, two linearly independent complex measurements are necessary and sufficient to fully sample the polarization state of the signal.

In other words, a $2\times2$ RIME works because we build dual-receptor telescopes; we do the latter because two receptors are what’s needed to fully measure the polarization state of a transverse plane wave. PAFs and AAs have more than two receptors, but once these have been electronically combined by a beamformer into a pair of compound beams, any such pair of beams can be considered as a virtual receptor pair for the purposes of the RIME.

Carozzi & Woan (2009) have pointed out that in the wide-field case, the EMF arriving from off-axis sources is no longer parallel to the plane of the receptors, so we can no longer measure the polarization state with the same fidelity as for the on-axis case. In the extreme case of a source lying in the plane of the receptors, the loss of polarization information is irrecoverable. Consequently, proper wide-field polarimetry requires three receptors. With only two receptors, the loss-of-fidelity effect can be described by a Jones matrix of its own, but a fully three-dimensional formalism is required to derive that matrix itself.

1.2 The closed system assumption

When going from the basic RIME of Eq. (1) to Eq. (3), we decompose the total Jones matrix $\mathbf{J}_p$ into a chain of propagation effects associated with the signal path from source to station $p$. This is the traditional way of applying the RIME pioneered in the original paper (Hamaker et al. 1996), and continued in subsequent literature describing applications of the RIME (Noordam 1996; Rau et al. 2009; Myers et al. 2010; Smirnov 2011a).

Consider an application of Eq. (3) to real life. Depending on the application, individual components of the Jones chains may be derived from a priori physical considerations and models (e.g. models of the ionosphere), and/or solved for in a closed-loop manner, such as during self-calibration. Crucially, Eq. (3) postulates that the signal measured by interferometer $pq$ is fully described by the source brightness $\mathbf{B}$ and the set of matrices $\mathbf{J}_{p,i}$ and $\mathbf{J}_{q,j}$, and does not depend on any effect in the signal propagation path to any third antenna $r$. If, however, antenna $r$ is somehow electromagnetically coupled to $p$ and/or $q$, the measured voltages $\mathbf{v}_p$ and $\mathbf{v}_q$ will contain a contribution received via the signal path to $r$, and thus will have a non-trivial dependence on, e.g., the Jones chain of $r$ that cannot be described by the $2\times2$ formalism alone.

To be absolutely clear, the basic RIME of Eq. (1) still holds as long as any such coupling is linear. In other words, there is always a single effective $\mathbf{J}_p$ that ties the voltage $\mathbf{v}_p$ to the source EMF vector $\mathbf{e}$. In some applications, e.g. traditional selfcal, where we solve for this $\mathbf{J}_p$ in a closed-loop manner, the distinction on whether $\mathbf{J}_p$ depends on propagation path $p$ only, or whether other effects are mixed in, is entirely irrelevant. However, when constructing more complicated RIMEs (as is being done currently for simulation of new instruments, or for new calibration techniques), an implicit assumption is made that we may decompose $\mathbf{J}_p$ into per-station Jones chains, as in Eq. (3). This is tantamount to assuming that each station forms a closed system.

Consider the effect of electrical cross-talk, or mutual coupling in a densely-packed array. If cross-talk or coupling is restricted to the two receptors within a station, such a station forms a closed system. For a closed system, the Jones chain approach is perfectly valid. If, however, cross-talk occurs between receptors associated with different stations, the receptor voltages $\mathbf{v}_p$ will not only depend on $\mathbf{J}_p$, but also on $\mathbf{J}_q$, $\mathbf{J}_r$, etc. (See Sect. 2.1 for a more thorough discussion of this point.) With the emergence of AA and PAF designs for new instruments, we can no longer safely assume that two receptors form a closed system; in fact, even traditional interferometers can suffer from troublesome cross-talk in certain situations (Subrahmanyan & Deshpande 2004).

Some formulations of the RIME can incorporate coupling within each pair of stations $p$ and $q$ via an additional $4\times4$ matrix (see e.g. Noordam 1996) used to describe multiplicative interferometer-based effects. By definition, this approach cannot incorporate coupling with a third station $r$; any such coupling requires additional formulations that are extrinsic to the RIME, such as the ACM formalism of Sect. 2, or the tensor formalism that is the main subject of this paper.

The closed system assumption has not been made explicit in the literature. This is perhaps due to the fact that the RIME is nominally formulated for a single interferometer $pq$. Consider, however, that for an interferometer array composed of $N$ stations, the “full” RIME is actually a set of equations, one for each pair of stations. By treating the equations independently, we’re implicitly assuming that each equation corresponds to a closed system. The higher-order formalisms derived below will make this issue clear.

1.3 The colocation assumption

A final seldom explicated assumption is that each pair of receptors is colocated. While not required for the general RIME formulation of Eq. (1) per se, colocation becomes important (and is quietly assumed) in specific applications for two reasons. Firstly, it allows us to consider the geometric phase delay of both receptors to be the same, which makes the $\mathbf{K}_p$ matrix scalar, and allows us to commute it around the Jones chain. $\mathbf{K}_p$ and $\mathbf{K}_q^H$ can then be commuted together to form the FT kernel, which is essential for deriving the full-sky variants of the RIME such as Eq. (5). And secondly, although the basic RIME of Eq. (1) may be formulated for any four arbitrarily-located receptors, when we proceed to decompose $\mathbf{J}_p$ into per-station terms, we implicitly assume a single propagation path per each pair of receptors (same atmosphere, etc.), which implies colocation. In practice the second consideration may be negligible, but not so the first.

Classical single-feed dish designs have colocated receptors as a matter of course, but a PAF system such as APERTIF (van Cappellen & Bakker 2010) or ASKAP (Johnston et al. 2008) typically has horizontally and vertically oriented dipoles at slightly different positions. The effective phase centres of the beamformed signals may be different yet again. The $\mathbf{K}_p$ matrix then becomes diagonal but not scalar, and can no longer be commuted around the RIME. In principle, we can shoehorn the case of non-colocated receptors into the RIME formulations by picking a reference point (e.g., the mid-point between the two receptors), and decomposing $\mathbf{K}_p$ into a product of a scalar phase delay corresponding to the reference point, and a non-scalar differential delay term. The scalar term can then be commuted around the RIME to yield the FT kernel of Eq. (5), while the differential term becomes a DIE that can be absorbed into the overall phase calibration (or cause instrumental polarization if it isn’t). The exact form of both terms can be derived from geometric considerations (or analysis of instrument optics), but such a derivation is extrinsic to the RIME per se. This situation is similar to that of the wide-field Jones term derived by Carozzi & Woan (2009), and is another reason behind the multidimensional formalism proposed later on in this paper.
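The commutation argument is easy to verify numerically. In the sketch below, the two receptor phases are arbitrary assumed values, and taking the mean phase as the reference point is a hypothetical choice made only for illustration; the actual terms would follow from the instrument geometry.

```python
import numpy as np

rng = np.random.default_rng(5)
J = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary Jones term

# Assumed geometric phases (radians) of the two receptors of one station.
phi = np.array([0.70, 0.90])

K_colocated = np.exp(-1j * phi[0]) * np.eye(2)  # same phase for both: scalar matrix
K_offset = np.diag(np.exp(-1j * phi))           # different phases: diagonal only

commutes_colocated = np.allclose(K_colocated @ J, J @ K_colocated)
commutes_offset = np.allclose(K_offset @ J, J @ K_offset)

# Split the non-scalar K into a scalar delay at a reference point (here the
# mean phase, a hypothetical choice) times a differential delay term.
phi_ref = phi.mean()
K_ref = np.exp(-1j * phi_ref) * np.eye(2)
K_diff = np.diag(np.exp(-1j * (phi - phi_ref)))

print(commutes_colocated, commutes_offset)
```

Only the scalar factor `K_ref` can be moved freely through the chain; `K_diff` has to stay where it is, which is why it behaves as an additional direction-independent term.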

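Note that conventional FT-based imaging algorithms also assume colocated receptors when converting visibilities to Stokes parameters. For example, the conventional formulae for Stokes $Q$ and $U$ (written here for linearly polarized feeds, in the convention of Paper I),

$$Q = \tfrac{1}{2} (v_{xx} - v_{yy}), \qquad U = \tfrac{1}{2} (v_{xy} + v_{yx}),$$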

implicitly assume that the constituent visibilities are measured on the same baseline. Some leeway is acceptable here: since the measured visibilities are additionally convolved by the aperture illumination function (AIF), the formulae above still apply, as long as the degree of non-colocation is negligible compared to the effective station size. Note also that some of the novel approaches of expectation-maximization imaging (Leshem & van der Veen 2000; Levanda & Leshem 2010) formulate the imaging problem in such a way that the colocation requirement can probably be done away with altogether.

2 The array correlation matrix formalism

I will first explore the limitations of the closed-system assumption a little bit further. Consider an AA, PAF, or conventional closely-packed interferometer array where mutual coupling affects more than two receptors at a time. Such an array cannot be partitioned into pairs of receptors, with each pair forming a closed system. The normal RIME of Eq. (3) is then no longer valid. An alternative is to describe the response of such an array in terms of a single array correlation matrix (ACM, also called the signal voltage covariance matrix), as has been done by Wijnholds (2010) for AAs, and Warnick et al. (2011) for PAFs. Since the ACM provides a valuable conceptual link between the RIME and the tensor formalism described later in this paper, I will consider it in some detail in this section.

Let’s assume an arbitrary set of $n$ receptors (e.g. dipoles) in arbitrary orientation, and a single point source of radiation. If we represent the voltage response of the full array by the column vector $\mathbf{v} = (v_1, \ldots, v_n)^T$, then we can express it (assuming linearity as usual) as the product of an $n\times2$ matrix $\mathbf{J}$ with the source EMF vector $\mathbf{e}$:

$$\mathbf{v} = \mathbf{J} \mathbf{e}$$

If all pairwise combinations of receptors are correlated, we end up with an $n\times n$ ACM $\mathbf{V}$:

$$\mathbf{V} = \mathbf{J} \mathbf{B} \mathbf{J}^H \tag{6}$$

where $\mathbf{B}$ is the $2\times2$ source brightness matrix, and $\mathbf{J}$ is an $n\times2$ Jones-like matrix for the entire array. Note that for $n=2$, this equation becomes the autocorrelation matrix given by the RIME of Eq. (1) with $p=q$.

To derive the $\mathbf{J}$ matrix for a given observation, we need to decompose it into a product of “physical” terms that we can analyse individually. As an example, let’s consider only three effects: primary beam (PB) gain, geometric phase, and cross-talk. The $\mathbf{J}$ matrix can then be decomposed as follows:

$$\mathbf{v} = \mathbf{J} \mathbf{e} = \mathbf{Q} \mathbf{K} \mathbf{E} \, \mathbf{e} \tag{7}$$

and the full ME then becomes:

$$\mathbf{V} = \mathbf{Q} \mathbf{K} \mathbf{E} \mathbf{B} \mathbf{E}^H \mathbf{K}^H \mathbf{Q}^H \tag{8}$$

The $n\times2$ matrix $\mathbf{E}$ corresponds to the PB gain, the $n\times n$ diagonal matrix $\mathbf{K}$ corresponds to the individual phase terms (different for every receptor), and the $n\times n$ matrix $\mathbf{Q}$ corresponds to the cross-talk and/or mutual coupling between the receptors. The equation does not include an explicit term for the complex receiver gains: these can be described either by a separate diagonal matrix, or absorbed into $\mathbf{Q}$.

In the case of a classical array of $N$ dishes, we have $n=2N$ receptors, with each adjacent pair forming a closed system. In this case, $\mathbf{Q}$ becomes block-diagonal – that is, composed of $2\times2$ blocks along the diagonal, equivalent to the “leakage” matrices of the original RIME formulation (Hamaker et al. 1996). $\mathbf{K}$ becomes block-scalar (i.e. the two receptors of each station share the same phase term), and Eq. (8) dissolves into the familiar set of independent RIMEs of Eq. (3).
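This reduction can be checked with a small NumPy experiment. The sketch below builds the ACM of Eq. (8) for a hypothetical three-station array with random terms, and compares each $2\times2$ block of the ACM against a per-station RIME. All names (`acm`, `station_jones`, and so on) and all numerical values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Random complex array, used as a stand-in for unknown instrumental terms."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

N = 3                      # stations
n = 2 * N                  # receptors, two per station
B = np.array([[1.1, 0.05 + 0.01j], [0.05 - 0.01j, 0.9]])

E = crandn(n, 2)                                     # PB gain, n x 2
phases = np.repeat(rng.uniform(0, 2 * np.pi, N), 2)  # one phase per station
K = np.diag(np.exp(-1j * phases))                    # block-scalar phase term
Q_blocks = [np.eye(2) + 0.1 * crandn(2, 2) for _ in range(N)]
Q = np.zeros((n, n), dtype=complex)                  # block-diagonal cross-talk
for p, block in enumerate(Q_blocks):
    Q[2 * p:2 * p + 2, 2 * p:2 * p + 2] = block

def acm(Q, K, E, B):
    """Eq. (8): full array correlation matrix."""
    J = Q @ K @ E
    return J @ B @ J.conj().T

def station_jones(p):
    """2x2 Jones matrix of station p, built from station-p quantities only."""
    s = slice(2 * p, 2 * p + 2)
    return Q_blocks[p] @ K[s, s] @ E[s, :]

def matches_per_station_rimes(V):
    """True if every 2x2 block of V equals J_p B J_q^H as in Eq. (1)."""
    return all(
        np.allclose(V[2 * p:2 * p + 2, 2 * q:2 * q + 2],
                    station_jones(p) @ B @ station_jones(q).conj().T)
        for p in range(N) for q in range(N))

closed = matches_per_station_rimes(acm(Q, K, E, B))

# Break the closed-system assumption: receptor 0 (station 0) now picks up
# some signal from receptor 2 (station 1).
Q_coupled = Q.copy()
Q_coupled[0, 2] = 0.2
coupled = matches_per_station_rimes(acm(Q_coupled, K, E, B))

print(closed, coupled)
```

With a block-diagonal coupling matrix every block is reproduced by the per-station Jones matrices; a single cross-station coupling coefficient is enough to break the correspondence.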

Note that the ordering of terms in this equation is not entirely physical – in the actual signal path, the phase delay represented by $\mathbf{K}$ occurs before the beam response $\mathbf{E}$. To be even more precise, phase delay may be a combination of geometric phase that occurs “in space” before $\mathbf{E}$, and fringe stopping that occurs “in the correlator” after $\mathbf{Q}$. Such an ordering of effects becomes very awkward to describe with this matrix formalism, but will be fully addressed by the tensor formalism of Sect. 3.

2.1 Image-plane effects and cross-talk

If we now consider additional image-plane effects, things get even more awkward. In the simple case, if these effects do not vary over the array (i.e. for a given direction, are the same along each line of sight to each receptor), we can replace the $\mathbf{e}$ vector in Eq. (7) by $\mathbf{Z}\mathbf{e}$, where $\mathbf{Z}$ is a Jones matrix describing the image-plane effect. We can then combine the $\mathbf{E}$ matrix and the $\mathbf{Z}$ matrix into a single term $\mathbf{R} = \mathbf{E}\mathbf{Z}$, which is an $n\times2$ matrix describing the voltage gain and all other image-plane effects, and define an “apparent sky” matrix as $\mathbf{B}_\mathrm{app} = \mathbf{R} \mathbf{B} \mathbf{R}^H$. Equation (8) then becomes

$$\mathbf{V} = \mathbf{Q} \mathbf{K} \mathbf{B}_\mathrm{app} \mathbf{K}^H \mathbf{Q}^H$$

If image-plane effects do vary across receptors, then a matrix formalism is no longer sufficient! The expression for each receptor must somehow incorporate its own Jones matrix. We need to describe signal propagation along $n$ lines of sight, and each propagation effect needs a $2\times2$ matrix. A full description of the image-plane term then needs $n\times2\times2$ complex numbers.

Another way to look at this conundrum is as follows. As long as each receptor pair is colocated and forms a closed system (as is the case for traditional interferometers), the voltage response of each receptor depends only on the EMF vector at its location. The correlations between stations $p$ and $q$ can then be fully described in terms of the EMF vectors at locations $p$ and $q$. This allows us to write the RIME in a matrix form, as in Eq. (3) or (8). In the presence of significant cross-talk between more than two receptors, the voltage response of each receptor depends on the EMF vectors at multiple locations. In effect, the cross-talk term $\mathbf{Q}$ in Eq. (8) “scrambles up” image plane effects between different receptor locations; describing this is beyond the capability of ordinary matrix algebra.

In practice, receptors that are sufficiently separated to see any conceivable difference in image-plane effects would be too far apart for any mutual coupling, while today’s all-digital designs have also eliminated most possibilities of cross-talk. Mathematically, this corresponds to the cross-talk term vanishing for any pair of receptors whose image-plane effects differ, which means that image-plane effects can, in principle, be shoehorned into the matrix formalism of Eq. (8). This, however, does not make the formalism any less clumsy – we still need to describe different image-plane effects for far-apart receptors, and mutual coupling for close-together ones, and the two effects together are difficult to fit into ordinary matrix multiplication.

2.2 The non-paraxial case

Carozzi & Woan (2009) have shown that the EMF can only be accurately described by a 2-vector in the paraxial or nearly-paraxial case. For wide-field polarimetry, we must describe the EMF by a three-component column vector, and the sky brightness distribution by a $3\times3$ matrix $\mathbf{B}^{(3)}$. The intrinsic sky brightness is still given by a $2\times2$ matrix $\mathbf{B}$; once an $xyz$ Cartesian system is established, this maps to $\mathbf{B}^{(3)}$ via a $3\times2$ transformation matrix $\mathbf{T}$ (ibid., Eqs. (20) and (21)):

$$\mathbf{B}^{(3)} = \mathbf{T} \mathbf{B} \mathbf{T}^H$$

It is straightforward to incorporate this into the ACM formalism: the $\mathbf{B}$ term of Eq. (8) is replaced by $\mathbf{B}^{(3)}$, and the dimensions of the $\mathbf{E}$ matrix become $n\times3$.

3 A tensor formalism for the RIME

The ACM formalism of the previous section turns out to be only marginally useful for the purposes of this paper. It does aid in understanding the effect of mutual coupling and the closed system assumption a little bit better, but it is much too clumsy in describing image-plane effects, principally because the rules of matrix multiplication are too rigid to represent this particular kind of linear transform. What we need is a more flexible scheme for describing arbitrary multi-linear transforms, one that can go beyond vectors and matrices. Fortunately, mathematicians have already developed just such an apparatus in the form of tensor algebra. In this section, I will apply this to derive a generalized multi-dimensional RIME.

3.1 Tensors and the Einstein notation: a primer

Tensors are a large and sprawling subject, and one not particularly familiar to radio astronomers at large. Appendix A provides a brief but formal description of the concepts required for the formulations of this paper. This is intended for the in-depth reader (and to provide rigorous mathematical underpinnings for what follows). For an executive overview, only a few basic concepts are sufficient:

Tensors are a generalization of vectors and matrices. An $(n,m)$-type tensor is given by an $(n+m)$-dimensional array of numbers, and written using $n$ upper and $m$ lower indices: e.g. $T^{i_1 \ldots i_n}_{j_1 \ldots j_m}$. Superscripts are indices just like subscripts, and not exponentiation! For example, a vector is typically a (1,0)-type tensor, denoted as $x^i$. A matrix is a (1,1)-type tensor, denoted as $A^i_j$.

Upper and lower tensor indices are quite distinct, in that they determine how the components of a tensor behave under coordinate transforms. Upper indices are called contravariant, since components with an upper index (such as the components of a vector $x^i$) transform reciprocally to the coordinate frames. As a simple example, consider a “new” coordinate frame whose basis vectors are scaled up by a factor of $a$ with respect to those of the “old” frame. In the “new” frame, the same vector is then described by coordinate components that are scaled by a factor of $a^{-1}$ w.r.t. the “old” components. By contrast, for a linear form (that is, a linear function mapping vectors to scalars), the “new” components are scaled by a factor of $a$. Lower indices are thus said to be covariant.

In physical terms, upper indices tend to refer to vectors, and lower indices to linear functions on vectors. An $n\times n$ matrix can be thought of as a “vector” of linear functions on vectors, and thus has one upper and one lower index in tensor notation, and transforms both co- and contravariantly. This is manifest in the familiar $\mathbf{T}^{-1}\mathbf{A}\mathbf{T}$ (or $\mathbf{T}\mathbf{A}\mathbf{T}^{-1}$, depending which way the coordinate transform matrix is defined) formula for matrix coordinate transforms. For higher-ranked tensors, the general rules for coordinate transforms are covered in Sect. A.2.1.
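The co- and contravariant transformation rules can be made concrete with a few lines of NumPy. In this sketch the matrix `T` holds the new basis vectors as columns, expressed in the old coordinates; this is one of the two possible conventions, assumed here for illustration, and all values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Columns of T are the new basis vectors written in old coordinates
# (an assumed convention; the diagonal shift keeps T safely invertible).
T = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)
T_inv = np.linalg.inv(T)

x_old = rng.normal(size=3)       # vector components (upper index)
f_old = rng.normal(size=3)       # linear form components (lower index)
A_old = rng.normal(size=(3, 3))  # matrix (one upper, one lower index)

x_new = T_inv @ x_old            # contravariant: transforms with T^-1
f_new = f_old @ T                # covariant: transforms with T
A_new = T_inv @ A_old @ T        # mixed: one factor of each

# The scalar f(x) and the action of A on x do not depend on the frame.
scalar_invariant = np.isclose(f_new @ x_new, f_old @ x_old)
action_consistent = np.allclose(A_new @ x_new, T_inv @ (A_old @ x_old))

# Special case from the text: basis vectors scaled up by a factor of a.
a = 2.0
x_scaled = np.linalg.inv(a * np.eye(3)) @ x_old   # equals x_old / a
f_scaled = f_old @ (a * np.eye(3))                # equals a * f_old

print(scalar_invariant, action_consistent)
```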

Einstein notation (or Einstein summation) is a convention whereby repeated upper and lower indices in a product of tensors are implicitly summed over. For example,

$$y^i = A^i_j x^j$$

is a way to write the matrix/vector product $\mathbf{y} = \mathbf{A}\mathbf{x}$ in Einstein notation. The index $j$ is a summation index since it is repeated, and the index $i$ is a free index. Another useful convention is to use Greek letters for the summation indices. For example, a matrix product $\mathbf{C} = \mathbf{A}\mathbf{B}$ may be written as $C^i_j = A^i_\alpha B^\alpha_j$.

The tensor conjugate is a generalization of the Hermitian transpose. This is indicated by a bar over the symbol and a swapping of the upper and lower indices. For example, $\bar{x}_i$ is the conjugate of $x^i$, and $\bar{A}^j_i$ is the conjugate of $A^i_j$.
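For readers who prefer code to index gymnastics, these conventions map directly onto NumPy's `einsum` function, where an index letter that is absent from the output is summed over. The sketch below is only an illustration: the storage order (upper index first, lower index second) and the variable names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(4)

def crandn(*shape):
    """Random complex array for illustration."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Arrays are stored with the upper index first and the lower index second.
A = crandn(2, 2)   # A^i_j
M = crandn(2, 2)   # a second matrix
x = crandn(2)      # x^i

# y^i = A^i_j x^j : j is repeated and absent from the output, so it is summed.
y = np.einsum('ij,j->i', A, x)

# C^i_j = A^i_alpha M^alpha_j : the matrix product, summing over alpha ('a').
C = np.einsum('ia,aj->ij', A, M)

# Tensor conjugate: complex-conjugate the components and swap index positions.
x_bar = x.conj()      # conjugate of x^i
A_bar = A.conj().T    # conjugate of A^i_j, i.e. the Hermitian transpose

print(np.allclose(y, A @ x), np.allclose(C, A @ M))
```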

3.2 Recasting the RIME in tensor notation

As an exercise, let’s recast the basic RIME of Eq. (1) using tensor notation. This essentially repeats the derivations of Paper I (Smirnov 2011a) using tensor terminology (compare to Sect. 1 therein).

For starters, we must define the underlying vector space. The classical Jones formalism (JF) corresponds to two-component vectors, i.e. the $\mathbb{C}^2$ space. We can also use $\mathbb{C}^3$ space instead, which results in a version of the Wolf formalism (WF) suggested by Carozzi & Woan (2009). Remarkably, both formulations look exactly the same in tensor notation, the only difference being the implicit range of the tensor indices. I’ll stick to the familiar terminology of the JF here, but the same analysis applies to the WF.

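An EMF vector is then just a (1,0)-type tensor $e^i$. Linear transforms of vectors (i.e. Jones matrices) correspond to (1,1)-type tensors, $[J_p]^i_j$ (note that $p$ is not, as yet, a tensor index here, but simply a station “label”, which is emphasized by hiding it within brackets). The voltage response of station $p$ is then

$$[v_p]^i = [J_p]^i_\alpha e^\alpha$$

where $\alpha$ is a summation index. The coherency of two voltage or EMF vectors is defined via the outer product $e^i \bar{e}_j$, yielding a (1,1)-type tensor, i.e. a matrix:

$$[V_{pq}]^i_j = [v_p]^i [\bar{v}_q]_j$$

Combining the two equations above gives us

$$[V_{pq}]^i_j = [J_p]^i_\alpha e^\alpha \, [\bar{J}_q]^\beta_j \bar{e}_\beta$$

And now, defining the source brightness tensor as $B^i_j = e^i \bar{e}_j$, we arrive at

$$[V_{pq}]^i_j = [J_p]^i_\alpha B^\alpha_\beta [\bar{J}_q]^\beta_j \tag{9}$$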

which is exactly the RIME of Eq. (1), rewritten using Einstein notation. Not surprisingly, it looks somewhat more bulky than the original – matrix multiplication, after all, is a more compact notation for this particular operation.

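Now, since we can commute the terms in an Einstein sum (as long as they take their indices with them, see Sect. A.5.2), we can split off the two $J$ terms into a sub-product which we’ll designate as $[J_{pq}]^{i\beta}_{j\alpha}$:

$$[V_{pq}]^i_j = [J_{pq}]^{i\beta}_{j\alpha} B^\alpha_\beta, \qquad [J_{pq}]^{i\beta}_{j\alpha} = [J_p]^i_\alpha [\bar{J}_q]^\beta_j \tag{10}$$

What is this $[J_{pq}]^{i\beta}_{j\alpha}$? It is a (2,2)-type tensor, corresponding to $2\times2\times2\times2 = 16$ numbers. Mathematically, it is the exact equivalent of the outer product $\mathbf{J}_p \otimes \mathbf{J}_q^*$, giving us the $4\times4$ form of the RIME, as originally formulated by Hamaker et al. (1996). The components of the tensor given by $[J_{pq}]^{i\beta}_{j\alpha} B^\alpha_\beta$ correspond exactly to the components of a 4-vector produced via multiplication of the $4\times4$ matrix by the brightness 4-vector (see Paper I, Smirnov 2011a, Sect. 6.1).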
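The equivalence between the (2,2)-type tensor and the $4\times4$ outer-product form can be verified numerically. The sketch below is an illustration under assumed conventions: arrays are stored with the upper index first, the 4-vectors are obtained by flattening $2\times2$ matrices in row-major order, and the Jones matrices are random.

```python
import numpy as np

rng = np.random.default_rng(6)

def crandn(*shape):
    """Random complex array for illustration."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

Jp, Jq = crandn(2, 2), crandn(2, 2)
B = np.array([[1.1, 0.05 + 0.01j], [0.05 - 0.01j, 0.9]])
Jq_bar = Jq.conj().T                    # tensor conjugate of J_q

# Eq. (9): V^i_j = [J_p]^i_a B^a_b [Jbar_q]^b_j
V9 = np.einsum('ia,ab,bj->ij', Jp, B, Jq_bar)

# Eq. (10): form the (2,2)-type tensor first; axes are ordered (i, b, j, a).
Jpq = np.einsum('ia,bj->ibja', Jp, Jq_bar)
V10 = np.einsum('ibja,ab->ij', Jpq, B)

# 4x4 form: the same 16 numbers arranged as a matrix acting on a 4-vector.
J4x4 = np.kron(Jp, Jq.conj())
v4 = J4x4 @ B.reshape(4)

# Regrouping the tensor axes as ((i, j), (a, b)) gives the Kronecker product.
Jpq_as_matrix = Jpq.transpose(0, 2, 3, 1).reshape(4, 4)

print(np.allclose(V9, V10), np.allclose(v4, V9.reshape(4)))
```

The tensor and the $4\times4$ matrix therefore differ only in how the same components are laid out in memory.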

Finally, note that we’ve been “hiding” the $p$ and $q$ station labels inside square brackets, since they don’t take part in any tensor operations above. Upon further consideration, this distinction proves to be somewhat artificial. Let’s treat $p$ and $q$ as free tensor indices in their own right. The set of all Jones matrices for the array can then be represented by a (2,1)-type tensor $J^{pi}_j$. All the visibilities measured by the array as a whole will then be represented by a (2,2)-type tensor $V^{pi}_{qj}$, and we can then rewrite Eq. (9) as

$$V^{pi}_{qj} = J^{pi}_\alpha B^\alpha_\beta \bar{J}^\beta_{qj} \tag{11}$$

…which is now a single equation for all the visibilities measured by the array, en masse (as opposed to the visibility of a single baseline given by Eq. (1) or (9)). Such manipulation of tensor indices may seem like a purely formal trick, but will in fact prove very useful when we consider generalized RIMEs below.

Note that the brightness tensor is self-conjugate (or Hermitian), in the sense that $B^i_j = (B^j_i)^*$. The visibility tensor $V^{pi}_{qj}$, on the other hand, is only Hermitian with respect to a permutation of $p$ and $q$:

$$V^{pi}_{qj} = (V^{qj}_{pi})^*$$
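A sketch of how Eq. (11) might look in code: a single `einsum` call produces the visibilities of all baselines at once. The array layout (station index first, then the upper and lower 2-indices) and the random Jones matrices are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(8)

N = 4                                       # number of stations (assumed)
# J[p, i, j] holds J^{pi}_j: one 2x2 Jones matrix per station.
J = rng.normal(size=(N, 2, 2)) + 1j * rng.normal(size=(N, 2, 2))
B = np.array([[1.1, 0.05 + 0.01j], [0.05 - 0.01j, 0.9]])  # Hermitian brightness

# Eq. (11): V^{pi}_{qj} = J^{pi}_a B^a_b Jbar^b_{qj}; output axes are (p, i, q, j).
V = np.einsum('pia,ab,qjb->piqj', J, B, J.conj())

# Each (p, q) slice is the ordinary matrix RIME of Eq. (1).
V_12 = J[1] @ B @ J[2].conj().T

# Hermitian only under a simultaneous swap of (p, i) with (q, j).
V_swapped = V.transpose(2, 3, 0, 1).conj()

print(np.allclose(V[1, :, 2, :], V_12), np.allclose(V, V_swapped))
```

Note that `V` includes the autocorrelations ($p=q$) and both orderings of every baseline, so it carries some redundancy compared to a list of independent baselines.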

4 Generalizing the RIME

In this section, I will put tensor notation to work to incorporate image-plane effects, mutual coupling and beamforming into the generalized RIME hinted at in Sect. 2. This shows how the formalism may be used to derive a few different forms of the RIME for various instrumental scenarios. Note that the resulting equations are somewhat speculative, and not necessarily applicable to any particular real-life instrument. The point of the exercise is to demonstrate the flexibility of the formalism in deriving RIMEs that go beyond the capability of the Jones formalism.

First, let’s set up some indexing conventions. I’ll use $i, j, k, \ldots$ for free indices that run from 1 to 2 (or 3, see below), i.e. for those that refer to EMF components, or voltages on paired receptors, and $\alpha, \beta, \gamma, \ldots$ for summation indices in the same range. I shall refer to such indices as 2-indices (or 3-indices). For free indices that refer to stations or disparate receptors (and run from 1 to $N$), I’ll use $p, q, r, s, \ldots$, and for the corresponding summation indices, $\sigma, \tau, \upsilon, \phi, \ldots$. I shall refer to these as station indices.

Consider again the $n$ arbitrary receptors of Sect. 2 observing a single source. The source EMF is given by the tensor $e^i$. All effects between the source and the receptor, up to and not including the voltage gain, can be described by a (2,1)-type tensor, $Z^{pi}_j$. This implies that they are different for each receptor $p$. The PB response of the receptor can be described by a (1,1)-type tensor, $E^p_i$. Finally, the geometric phase delay of each receptor is a (1,0)-type tensor, $K^p$.

Let’s take this in small steps. The EMF arriving at each receptor is given by

$$e'^{pi} = K^p Z^{pi}_\alpha e^\alpha \tag{12}$$

(remembering that we implicitly sum over $\alpha$ here, while the receptor index $p$ is a free index and is not summed). If we consider just one receptor in isolation, we can re-write the equation for one specific value of $p$. This corresponds to the familiar matrix/vector product:

$$\mathbf{e}' = K \mathbf{Z} \mathbf{e}$$

where $\mathbf{Z}$ is the Jones matrix describing the image-plane effect for this particular receptor, and $K$ is the geometric phase delay. The receptor translates the EMF vector $\mathbf{e}'$ into a scalar voltage $v'$. This is done via its PB response tensor, $E_i$, which is just a row vector:

$$v' = E_\alpha e'^\alpha$$

Now, if we put the receptor index $p$ back in the equations, we arrive at the tensor expression:

$$v'^p = E^p_\beta K^p Z^{p\beta}_\alpha e^\alpha \tag{13}$$

We’re now summing over $\alpha$ when applying image-plane effects, and over $\beta$ when applying the PB response.

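Finally, cross-talk and/or mutual coupling scrambles the receptor voltages. If $v'^p$ is the “ideal” voltage vector without cross-talk, then we need to multiply it by an $n\times n$ matrix (i.e. a (1,1)-type tensor) $Q^p_q$ to apply cross-talk:

$$v^p = Q^p_\sigma v'^\sigma \tag{14}$$

The final equation for the voltage response of the array is then:

$$v^p = Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha e^\alpha \tag{15}$$

We’re now summing over $\sigma$ (which ranges over all receptors), $\alpha$ and $\beta$.

The visibility tensor $V^p_q$, containing all the pairwise correlations between the receptors, can then be computed as $V^p_q = v^p \bar{v}_q$. Applying Eq. (15), this becomes

$$V^p_q = ( Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha e^\alpha ) \, \overline{( Q^q_\tau E^\tau_\delta K^\tau Z^{\tau\delta}_\gamma e^\gamma )}$$

This uses a different set of summation indices within each pair of brackets, since each sum is computed independently. Doing the conjugation and rearranging the terms around, we arrive at:

$$V^p_q = Q^p_\sigma E^\sigma_\beta K^\sigma Z^{\sigma\beta}_\alpha B^\alpha_\gamma \bar{Z}^\gamma_{\tau\delta} \bar{K}_\tau \bar{E}^\delta_\tau \bar{Q}^\tau_q \tag{16}$$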

This is the tensor form of a RIME for our hypothetical array. Note that structurally it is quite similar to the ACM form of Sect. 2 (e.g. Eq. (8)), but with one principal difference: the (2,1)-type tensor $Z^{pi}_j$ describes receptor-specific effects, which cannot be expressed via a matrix multiplication. Note also that the other awkwardness encountered in Sect. 2, namely the difficulty of putting geometric phase delay and fringe stopping into their proper places in the equation, is also elegantly addressed by the tensor formalism. Additional phase delay tensors can be inserted at any point of the equation.
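To make the index bookkeeping of Eqs. (15) and (16) tangible, here is a minimal NumPy sketch for a hypothetical array of five receptors with random terms. The array layouts (`Z[p, i, j]`, `E[p, i]`, `K[p]`, `Q[p, q]`) and all values are assumptions of the example. In `einsum`, the receptor index `s` appears in several operands but not in the output, so it is summed over, which is exactly what the station summation index does in the text.

```python
import numpy as np

rng = np.random.default_rng(9)

def crandn(*shape):
    """Random complex array for illustration."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

n = 5                                        # number of receptors (assumed)
e = crandn(2)                                # source EMF, e^i
Z = crandn(n, 2, 2)                          # per-receptor image-plane effects
E = crandn(n, 2)                             # per-receptor PB response
K = np.exp(-2j * np.pi * rng.uniform(size=n))  # per-receptor phase delay
Q = np.eye(n) + 0.1 * crandn(n, n)           # cross-talk / mutual coupling

# Eq. (15): voltage response of every receptor.
v = np.einsum('ps,sb,s,sba,a->p', Q, E, K, Z, e)

# Brightness tensor B^a_g = e^a conj(e)_g.
B = np.einsum('a,g->ag', e, e.conj())

# Eq. (16): the full visibility tensor, conjugated terms on the right.
V = np.einsum('ps,sb,s,sba,ag,tdg,t,td,qt->pq',
              Q, E, K, Z, B, Z.conj(), K.conj(), E.conj(), Q.conj())

# Consistency check: V^p_q must equal v^p conj(v)_q.
print(np.allclose(V, np.outer(v, v.conj())))
```

A single unoptimized `einsum` over nine operands is fine at this size, but it scales poorly; for realistic array sizes the contraction order would need more thought.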

4.1 Wolf vs. Jones formalisms

Equation (16) generalizes both the classical Jones formalism (JF), and the three-component Wolf formalism (WF). The JF is constructed on top of a two-dimensional vector space: EMF vectors have two components, the 2-indices range from 1 to 2, and the $B^i_j$ tensor is the usual $2\times2$ brightness matrix. The WF corresponds to a three-dimensional vector space, with the $B^i_j$ tensor becoming a $3\times3$ matrix.

Recall (Sect. 2.2) that the 3 × 3 brightness matrix is derived from the 2 × 2 brightness matrix via a transformation matrix . This derivation can also be expressed as an Einstein sum:

where is the tensor equivalent of the transformation matrix.
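A small sketch of this derivation as an Einstein sum follows. It assumes the transformation has the usual congruence form (a 3 × 2 matrix applied on the left and its conjugate transpose on the right); the particular matrix `T` below, which simply embeds the transverse plane into three dimensions, and the values in `B2` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 2x2 brightness matrix (Hermitian), two-component formalism.
B2 = np.array([[1.0 + 0.0j, 0.1 + 0.2j],
               [0.1 - 0.2j, 0.8 + 0.0j]])

# Illustrative 3x2 transformation. Assumption: a real one would depend on
# the direction of arrival; this one just embeds the 2D plane into 3D.
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]], dtype=complex)

# Einstein sum over a and b: B3[i, j] = T[i, a] * B2[a, b] * conj(T[j, b])
B3 = np.einsum("ia,ab,jb->ij", T, B2, T.conj())

# The same result via ordinary matrix algebra.
assert np.allclose(B3, T @ B2 @ T.conj().T)
```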

In subsequent formulations, I will make no distinction between the JF and the WF unless necessary, with the implicit understanding that the appropriate indices range from 1 to 2 or 3, depending on which version of the formalism is needed.

4.2 Decomposing the matrix

If we isolate the left-hand sub-product in Eq. (16),

and track down the free indices in this tensor expression – and – we can see that the product is a (1,1) tensor. We can then rewrite the equation in a more compact form:

(17)

Not surprisingly, this is just the ACM RIME of Eq. (6) rewritten in Einstein notation. In hindsight, this shows how we can break down the full-array response matrix into a tensor product of physically meaningful terms. Note how this parallels the situation of the form RIME: even though each visibility matrix, in principle, depends on only two Jones matrices (Eq. (1)), in real-life applications we almost always need to form them up from a chain of several different Jones terms, as in e.g. the onion form (Eq. (3)). What the tensor formulation offers is simply a more capable means of computing the response matrices (more capable than a matrix product, that is) from individual propagation tensors.

4.3 Characterizing propagation tensors

Since the original formulation of the matrix RIME, a number of standard types of Jones matrices have seen widespread use. The construction of Jones matrices actually follows fairly simple rules (even if their behaviour as a function of time, frequency and direction may be quite complicated). A number of similar rules may be proposed for propagation tensors:

  • A tensor that translates the EMF vector into another vector (e.g., Faraday rotation) must necessarily have an upper and a lower 2-index.

  • A tensor that translates both components of the EMF field equally (i.e. a scalar operation such as phase delay) does not need any 2-indices at all.

  • A tensor transforming the EMF vector into a scalar (e.g., the voltage response of a receptor) must have a lower 2-index.

  • A tensor for an effect that is different across receptors must have a station index.

  • A tensor for an effect that maps per-receptor quantities onto per-receptor quantities must have two station indices (upper and lower).

Some examples of applying these rules:

  • Faraday rotation translates vectors, so it must have an upper and a lower 2-index. If different across stations and/or receptors, it must also have a station index. This suggests that the tensor looks like (or ).

  • Phase delay operates on the EMF vector as a scalar. It is different across receptors, hence its tensor looks like .

  • PB response translates the EMF vector into a scalar voltage, and must therefore have one lower 2-index. It is usually different across stations and/or receptors, hence its tensor looks like

  • Cross-talk or mutual coupling translates receptor voltages into receptor voltages, so it needs two station indices. Its tensor looks like .

  • If mutual coupling needs to be expressed in terms of the EMF field at each receptor instead, then it may need two 2-indices and two station indices, giving a (2,2)-type tensor, . Alternatively, this may be combined with the PB response tensor , giving the voltage response of each receptor as a function of the EMF vector at all the other receptors. This would be a (1,2)-type tensor, .
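In software terms, the rules above translate directly into array shapes: each 2-index contributes an axis of length 2, each station index an axis of length equal to the number of receptors. The following sketch, with hypothetical array names and random values, shows the Faraday rotation example in both of its forms: with a station index (one 2 × 2 block per receptor) and without (a single 2 × 2 block shared by all receptors). NumPy arrays carry no notion of upper or lower indices, so that distinction is assumed to be tracked by the author of the subscript strings.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4  # hypothetical number of receptors


def crandn(*shape):
    """Complex Gaussian random array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


e = crandn(2)                                  # EMF vector: one 2-index
K = np.exp(1j * rng.uniform(0, 2 * np.pi, N))  # phase delay: station index only
E = crandn(N, 2)                               # PB response: station index + one 2-index
W = crandn(N, N)                               # cross-talk: two station indices

# Faraday rotation identical for all receptors: two 2-indices, no station index.
F_shared = crandn(2, 2)
# The same effect written with a station index (one copy per receptor).
F_per_rx = np.broadcast_to(F_shared, (N, 2, 2))

# The subscript strings differ only in whether F carries the receptor index q.
v_shared = np.einsum("pq,qi,q,ia,a->p", W, E, K, F_shared, e)
v_per_rx = np.einsum("pq,qi,q,qia,a->p", W, E, K, F_per_rx, e)

assert np.allclose(v_shared, v_per_rx)
```

Dropping an index that an effect does not need shrinks both the storage and the cost of the sum, which is the practical motivation for choosing the smallest tensor type that the physics allows.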

4.4 Describing mutual coupling

Equations (15) and (16) were derived under the perhaps simplistic assumption that the effect of mutual coupling can be fully described via cross-talk between the receptor voltages. That is, the collection of EMF vectors at each receptor’s location was described by a (2,0)-type tensor, (Eq. (12)), then converted into nominal receptor voltages by the PB tensor (Eq. (13)), and then converted into actual voltages via a (1,1)-type cross-talk tensor (Eq. (14)).

The underlying assumption here is that each receptor’s actual voltage can be derived from the nominal voltages alone. To see why this may be simplistic, consider a hypothetical array of identical dipoles in the same orientation, parallel to the x axis. Nominally, the dipoles are then only sensitive to the x component of the EMF, which, in terms of the PB tensor , means that for all . Consequently, the actual voltages given by this model will only depend on the x component of the EMF. If mutual coupling causes any dipole to be sensitive to the y component of the EMF seen at another dipole, this results in a contamination of the measured signal that cannot be described by this voltage-only cross-talk model.

A more general approach is to describe the voltage response of each receptor as a function of the EMF at all the receptor locations, rather than the nominal receptor voltages. This requires a (1,2)-type tensor:

This tensor (consisting of complex numbers) then describes the PB response and the mutual coupling together. The simpler cross-talk-only model corresponds to being decomposable into a product of two (1,1)-type tensors ( complex numbers), as This model will perhaps prove to be sufficient in real-life applications, but it is illustrative how simple it is to extend the formalism to the more complex case.
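The difference between the two models can be illustrated numerically. In the sketch below (array names, sizes and values are assumptions for illustration), `M_simple` is the decomposable tensor built from a cross-talk matrix and the PB responses of identically oriented dipoles, while `M_general` is an arbitrary tensor with one voltage index, one receptor index and one EMF component index. Perturbing only the second EMF component at every receptor leaves the cross-talk-only voltages unchanged, but changes the voltages of the general model.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3  # hypothetical number of dipoles


def crandn(*shape):
    """Complex Gaussian random array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


e_field = crandn(N, 2)  # EMF vector at each receptor location: e_field[q, i]

# Cross-talk-only model: dipoles sensitive to the first component only.
E = np.zeros((N, 2), dtype=complex)
E[:, 0] = 1.0
W = crandn(N, N)
M_simple = np.einsum("pq,qi->pqi", W, E)  # decomposable: no sum over q here

# General model: an arbitrary response to every component at every receptor.
M_general = crandn(N, N, 2)

# Perturb only the second EMF component at all receptors.
e_changed = e_field.copy()
e_changed[:, 1] += 1.0


def voltages(M, field):
    """v[p] = sum over q, i of M[p, q, i] * field[q, i]."""
    return np.einsum("pqi,qi->p", M, field)


# The cross-talk-only model cannot see the perturbation; the general one can.
simple_unchanged = np.allclose(voltages(M_simple, e_field),
                               voltages(M_simple, e_changed))
general_unchanged = np.allclose(voltages(M_general, e_field),
                                voltages(M_general, e_changed))
assert simple_unchanged and not general_unchanged

# Number of complex parameters in each model.
n_general = M_general.size    # 2 * N * N
n_simple = W.size + E.size    # N * N + 2 * N
```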

4.5 Describing beamforming

In modern PAF and AA designs, receptors are grouped into stations, and operated in beamformed mode – that is, groups of receptor voltages are added up with complex weights to form one or more compound beams. The output of a station is then a single complex voltage (strictly speaking, a single complex number, since beamforming is usually done after A/D conversion) per compound beam, rather than individual receptor voltages.

Beamforming may also be described in terms of the tensor RIME. Let’s assume stations, each being an array of receptors. The voltage vector registered at station (where ) can be described by Eq. (15). In addition, the voltages are subject to per-receptor complex gains (which we had quietly ignored up until now), which corresponds to another term, . The output of one beamformer, , is computed by multiplying this by a covector of weights, :

(18)

In a typical application, the beamformer outputs are correlated across stations. In this context, it is useful to derive a compound beam tensor, which would allow us to treat a whole station as a single receptor. To do this, we must assume that image plane effects are the same for all receptors in a station (). Furthermore, we need to decompose the phase term into a common “station phase” , and a per-receptor differential delay , so that . The latter can be derived in a straightforward way from the station (or dish) geometry. We can then collapse some summation indices:

(19)

This expression is quite similar to Eq. (15). Now, if for station the compound beam tensor is given by , then a complete RIME for an interferometer composed of beamformed stations is:

(20)

which is very similar to the RIME of Eq. (16), except that the PB tensor has been replaced by the station beam tensor , and there’s no cross-talk between stations. If each station produces a pair of compound beams (e.g., for the same pointing but sensitive to different polarizations), then this equation reduces to the classical matrix RIME, where the -Jones term is given by a tensor product. In principle, we could also combine Eqs. (18) and (20) into one long equation describing both beamforming and station-to-station correlation.

This shows that a compound beam tensor () can always be derived from the beamformer weights, receptor gains, mutual coupling terms, element PBs, and station geometry, under the assumption that image-plane effects are the same across the aperture of the station. By itself this fact is not particularly new or surprising, but it is useful to see how the tensor formalism allows it to be formulated as an integral part of the RIME.

As for the image-plane effects assumption, it is probably safe for PAFs and small AAs, but perhaps not so for large AAs. If the assumption does not hold, we’re left with an extra index in Eq. (19), and may no longer factor out an independent compound beam tensor . This situation cannot be described by the Jones formalism at all, but is easily accommodated by the tensor RIME.
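The collapse of summation indices into a compound beam can be checked with a short sketch for a single station and a single beam. All names and values are illustrative assumptions; in particular, the per-receptor gains are modelled here as a plain vector `g` multiplying each voltage, and the image-plane effect `J` is given no receptor index, in line with the assumption discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5  # hypothetical number of receptors in one station


def crandn(*shape):
    """Complex Gaussian random array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


e = crandn(2)                                   # source EMF vector
J = crandn(2, 2)                                # image-plane effect, shared by all receptors
K0 = np.exp(0.7j)                               # common "station phase"
dK = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # per-receptor differential delay
E = crandn(n, 2)                                # element PB responses
W = crandn(n, n)                                # mutual coupling within the station
g = crandn(n)                                   # per-receptor complex gains
w = crandn(n)                                   # beamformer weights (covector)

# Long form: per-receptor voltages first, then the weighted sum.
v = np.einsum("pq,qi,q,ia,a->p", W, E, K0 * dK, J, e)
beam_long = np.einsum("p,p,p->", w, g, v)

# Collapsed form: everything carrying a receptor index folds into S[i].
S = np.einsum("p,p,pq,qi,q->i", w, g, W, E, dK)
beam_short = K0 * np.einsum("i,ia,a->", S, J, e)

assert np.isclose(beam_long, beam_short)
```

If `J` carried a receptor index, it would sit inside the sum over `q` and `S` could no longer be computed independently of the image-plane effects.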

4.6 A classical dual-pol RIME

Equation (16) describes all correlations in an interferometer in a single (1,1)-type tensor (matrix). Contrast this to Eq. (11), which does the same via a (2,2)-type tensor, by grouping pairs of receptors per station. Since the latter is a more familiar form in radio interferometry, it may be helpful to recast Eq. (16) in the same manner. First, we mechanically replace each receptor index () by pairs of indices (, , , ), corresponding to a station and a receptor within the station:

Next, we assume colocation (since the per-station receptors are, presumably, colocated) and simplify some tensors. In particular, and can lose their receptor indices:

This equation cannot as yet be expressed via the Jones formalism, since for any , the sum on the right-hand side contains terms for other stations (). To get to a Jones-equivalent formalism, we need to remember the closed system assumption, i.e. no cross-talk or mutual coupling between stations (Sect. 1.2). This corresponds to for . is then equivalent to a tensor of one rank lower, with one station index eliminated:

(21)

For any , this is now exactly equivalent to a Jones-formalism RIME of the form:

where is a scalar, and the rest are full matrices. The term here incorporates the traditional G-Jones (receiver gains) and D-Jones (polarization leakage). Finally, if we assume no polarization leakage (i.e. no cross-talk between receptors), then for , and we can lose another index:

(22)

In the Jones formalism, this is equivalent to being a diagonal matrix for any given .
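The reduction to per-station Jones matrices can be verified with a small sketch. It builds the flat, per-receptor form with a block-diagonal cross-talk matrix (the closed system assumption) and colocated receptor pairs, then compares each 2 × 2 block of the full correlation matrix with the product of two per-station Jones matrices around the brightness matrix. The names, the station count and the flattening convention (receptor `a` of station `s` maps to index `2*s + a`) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3  # hypothetical number of stations, two receptors each


def crandn(*shape):
    """Complex Gaussian random array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


B = np.array([[1.0, 0.2 - 0.1j],
              [0.2 + 0.1j, 0.7]], dtype=complex)  # brightness matrix
Z = crandn(N, 2, 2)                               # image-plane effects, per station
K = np.exp(1j * rng.uniform(0, 2 * np.pi, N))     # phase, per station (colocation)
E = crandn(N, 2, 2)                               # PB response: E[s, a, i]
W = crandn(N, 2, 2)                               # cross-talk within station s only

# Flat per-receptor form: block-diagonal cross-talk, no terms between stations.
W_full = np.zeros((2 * N, 2 * N), dtype=complex)
for s in range(N):
    W_full[2 * s:2 * s + 2, 2 * s:2 * s + 2] = W[s]
E_flat = E.reshape(2 * N, 2)
K_flat = np.repeat(K, 2)
Z_flat = np.repeat(Z, 2, axis=0)
R_flat = np.einsum("pq,qi,q,qia->pa", W_full, E_flat, K_flat, Z_flat)
V_flat = np.einsum("pa,ab,qb->pq", R_flat, B, R_flat.conj())

# Per-station Jones matrices: one 2x2 matrix product chain per station.
jones = np.einsum("sab,sbi,s,sic->sac", W, E, K, Z)

# Every 2x2 block of the full matrix is a classical Jones-formalism product.
blocks_match = all(
    np.allclose(V_flat[2 * s:2 * s + 2, 2 * t:2 * t + 2],
                jones[s] @ B @ jones[t].conj().T)
    for s in range(N) for t in range(N)
)
assert blocks_match
```

Setting the off-diagonal elements of each `W[s]` to zero corresponds to the no-leakage case, in which the cross-talk term of every station becomes a diagonal matrix.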

4.7 A full-sky tensor RIME

By analogy with the matrix RIME (see Paper I, Smirnov 2011a, Sect. 3), we can extend the tensor formalism to the full-sky case. This does not lead to any new insights at present, but is given here for the sake of completeness.

When observing a real sky, each receptor is exposed to the superposition of the EMFs arriving from all possible directions. Let’s begin with Eq. (16), and assume the term is a DIE, and the rest are DDEs. For a full-sky RIME, we need to integrate the equation over all directions as

which, projected into coordinates, gives us:

(23)

Let’s isolate a few tensor sub-products and collapse indices. First, we can introduce an “apparent sky” tensor:

Note that this is an matrix. Physically, corresponds to the coherency “seen” by receptors and as a function of direction. Next, we introduce a phase tensor:

which is another matrix. Note that we reuse the letter K here, but there shouldn’t be any confusion with the “other” , since the tensor type is different. Each component of this tensor is given by

Equation (23) then becomes simply:

(24)

where the integral then corresponds to element-by-element Fourier transforms, and all the DDE-related discussions of Papers I (Smirnov 2011a, Sect. 3) and II (Smirnov 2011b, Sect. 2) apply.

The apparent coherency tensor

If we designate the value of the integral in Eq. (23) by the apparent coherency tensor , we arrive at the simple equation

which ties together the observed correlation matrix, , and the apparent coherency tensor . The physical meaning of each element of is, obviously, the apparent coherency observed by receptor pair and . The cross-talk term “scrambles up” the apparent coherencies among all receptors. Note that this is similar to the coherency matrix (or ) used in the classical formulation of the matrix RIME (Hamaker et al. 1996; Smirnov 2011a, Sect. 1.7).
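A discrete stand-in for the integral makes the role of the direction-independent cross-talk term explicit. In the sketch below, the sky is replaced by a handful of directions, all direction-dependent effects (PB response, phase, image-plane effects) are folded into one hypothetical per-direction response array `R`, and the apparent coherency `X` is the sum over directions. These simplifications and all names are assumptions of the sketch, not the formulation of Eq. (23) itself.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 4, 5  # hypothetical numbers of receptors and discrete sky directions


def crandn(*shape):
    """Complex Gaussian random array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


W = crandn(N, N)     # cross-talk, direction-independent
R = crandn(D, N, 2)  # per-direction receptor response (all DDEs folded together)
A = crandn(D, 2, 2)
B = A + A.conj().transpose(0, 2, 1)  # Hermitian brightness per direction

# Apparent coherency: sum over directions replaces the integral.
X = np.einsum("dsa,dab,dtb->st", R, B, R.conj())

# Observed correlation matrix: cross-talk applied once, outside the sum.
V = np.einsum("ps,st,qt->pq", W, X, W.conj())

# Same result when cross-talk is applied per direction and then summed.
V_sum = sum((W @ R[d]) @ B[d] @ (W @ R[d]).conj().T for d in range(D))

assert np.allclose(V, V_sum)
```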

4.8 Coordinate transforms, or whither tensors?

Einstein summation by itself is a powerful notational convenience for expressing linear combinations of multidimensional arrays, one that can be gainfully employed without regard to the finer details of tensors. The formulations of this paper may in fact be read in just such a manner, especially as they do not seem to make explicit use of that many tensor-specific constructs. It is then a fair question whether we need to invoke the deeper concepts of tensor algebra at all.

There is one tensor property, however, that is crucial within the context of the RIME, and that is behaviour under coordinate transforms. In the formulations above, I did not specify a coordinate system. As in the case of the matrix RIME, the equations hold under change of coordinate frame, but their components must be transformed following certain rules. The general rules for tensors are given in Sect. A.4 (Eq. (32)); for the mixed-dimension tensors employed in this paper, coordinate transforms only affect the core (or ) vector space, and do not apply to station indices (the latter are said to be invariant, see Sect. A.6.2).

As long as we know that something is a tensor of a certain type, we have a clear rule for coordinate transformations given by Eq. (32). However, Einstein notation can be employed to form up arbitrary expressions, which are not necessarily proper tensors unless the rigorous rules of tensor algebra are followed (see Appendix A). This argues against a merely mechanical use of Einstein summation, and makes it worthwhile to maintain the mathematical rigour that enables us to clearly follow whether something is a tensor or not.
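A two-dimensional toy example shows what can go wrong with a merely mechanical use of the notation. It uses the standard transformation rules for real vectors and covectors (vector components transform with a matrix `T`, covector components with its inverse), which I assume to agree with the rules of Sect. A.4; the numbers are arbitrary. A sum over one upper and one lower index is unchanged by the transform, whereas a sum that pairs two upper indices is not, unless `T` happens to be orthogonal.

```python
import numpy as np

# Arbitrary non-orthogonal change of coordinates (illustrative values).
T = np.array([[2.0, 1.0],
              [0.0, 1.0]])
T_inv = np.linalg.inv(T)

a = np.array([1.0, 2.0])   # vector components (upper index)
b = np.array([3.0, -1.0])  # another vector (upper index)
c = np.array([0.5, 4.0])   # covector components (lower index)

# Components in the new coordinate frame.
a_new = np.einsum("ij,j->i", T, a)
b_new = np.einsum("ij,j->i", T, b)
c_new = np.einsum("j,ji->i", c, T_inv)

# Proper contraction (one upper, one lower index): a scalar, hence invariant.
proper_old = np.einsum("i,i->", c, a)
proper_new = np.einsum("i,i->", c_new, a_new)

# Mechanical sum over two upper indices: not a tensor operation.
improper_old = np.einsum("i,i->", a, b)
improper_new = np.einsum("i,i->", a_new, b_new)

assert np.isclose(proper_old, proper_new)
assert not np.isclose(improper_old, improper_new)
```

Both expressions look equally valid as `einsum` calls, which is exactly why the index positions have to be tracked by the formalism rather than by the software.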

5 Implementation aspects

Superficially, evaluation of Einstein sums seems straightforward to implement in software, since it is just a series of nested loops. Upon closer examination, it turns out to raise some non-trivial performance and optimization issues, which I’ll look at in this section.

5.1 A general formula for FLOP counts

Consider an Einstein sum that is a product of tensors (over a -dimensional vector space), with free and summation indices. I’ll call this an (-calibre product. Let’s count the number of floating-point operations (ops for short) required to compute the result. The resulting tensor has components. Each component is a sum of individual products (thus additions); each product incurs multiplications. The total op count is thus