A Statistical Framework for the Utilization of Simultaneous Pupil Plane and Focal Plane Telemetry for Exoplanet Imaging, Part I: Accounting for Aberrations in Multiple Planes

A Statistical Framework for the Utilization of Simultaneous Pupil Plane and Focal Plane Telemetry for Exoplanet Imaging, Part I: Accounting for Aberrations in Multiple Planes

Richard A. Frazin
Dept. of Climate and Space Sciences and Engineering, University of Michigan, Ann Arbor, MI 48109

A new generation of telescopes with mirror diameters of 20 m or more, called extremely large telescopes (ELTs) has the potential to provide unprecedented imaging and spectroscopy of exo-planetary systems, if the difficulties in achieving the extremely high dynamic range required to differentiate the planetary signal from the star can be overcome to a sufficient degree. Fully utilizing the potential of ELTs for exoplanet imaging will likely require simultaneous and self-consistent determination of both the planetary image and the unknown aberrations in multiple planes of the optical system, using statistical inference based on the wavefront sensor and science camera data streams. This approach promises to overcome the most important systematic errors inherent in the various schemes based on differential imaging, such as ADI and SDI. This paper is the first in a series on this subject, in which a formalism is established for the exoplanet imaging problem, setting the stage for the statistical inference methods to follow in the future. Every effort has been made to be rigorous and complete, so that validity of approximations to be made later can be assessed. Here, the polarimetric image is expressed in terms of aberrations in the various planes of a polarizing telescope with an adaptive optics system. Further, it is shown that current methods that utilize focal plane sensing to correct the speckle field, e.g., electric field conjugation, rely on the tacit assumption that aberrations on multiple optical surfaces can be represented as aberration on a single optical surface, ultimately limiting their potential effectiveness for ground-based astronomy.

A Statistical Framework for the Utilization of Simultaneous Pupil Plane and Focal Plane Telemetry for Exoplanet Imaging, Part I: Accounting for Aberrations in Multiple Planes

Richard A. Frazin

Dept. of Climate and Space Sciences and Engineering, University of Michigan, Ann Arbor, MI 48109

00footnotetext: E-mail: rfrazin _at_ umich.edu

Keywords: exoplanet, adaptive optics, short-exposure imaging, image processing

1 Introduction and Motivation

The coming decades will see the construction and operation of a new class of ground-based telescopes with mirror diameters of 20 m or more. Such instruments are called Extremely Large Telescopes (ELTs), and one of the top science priorities of the ELTs is direct imaging and spectroscopy of exoplanetary systems. ELTs should allow direct imaging of Earth-like planets [1], but achieving this goal will require unprecedented precision in characterization of the optical system, as it is necessary to separate the starlight from the vastly fainter planetary emission. The most promising method for achieving high-contrast imaging and spectroscopy from the ground combines high-order adaptive optics (AO) with a stellar coronagraph. Typically, the AO system operates at visible wavelengths, while the science camera in the coronagraph captures images in the near-infrared. Observing in the near-IR improves the needed contrast ratios and is required for spectroscopy of key molecules such as water and methane. (For a review of high-contrast direct imaging, the reader may consult [2].) The major limitation in high-contrast imaging is the appearance of so-called ”quasi-static” speckles (QSS) in the image that can be brighter than the planetary emission.[3, 4] These speckles are caused by constructive interference of the starlight, which has been distorted by aberrations in the optical system. QSS change on wide range of timescales due to a variety of mechanical stresses on the telescope caused by, for example, winds and temperature gradients, and they may last for days.

The current paradigm for treating the QSS is background subtraction based on differential imaging methods, the most important of which are angular differential imaging (ADI) and spectral differential imaging (SDI). ADI and SDI have number of biases due to complicated self-subtraction issues.[5, 6] The self-subtraction problems make these methods unsuitable for true imaging, and they create many challenges for characterizing point sources (planets), resulting in a large and ever-growing literature. As the basis of the SDI method is scaling of telescope point spread function (PSF) with wavelength, self-subtraction makes the method unsuited to imaging objects that are extended in the radial direction, and the method is sensitive to assumptions about the spectrum of the object. SDI can work well at large separations between the planet and star, but performs poorly at small separations since the wavelength-stretching effect is proportional to the distance from the center of the image. This is unfortunate because the expected success in planet discovery is heavily dependent on being able to make observations at small star-planet separations.[7, 8] ADI relies on the diurnal field rotation to separate the PSF from the planetary emission, so that any circularly symmetric component in the image will be indistinguishable from the PSF and be subject to self-subtraction. Combining SDI and ADI does not eliminate the self-subtraction problem, and resulting response pattern is rather complex.[6] Futher, SDI and ADI are plagued by a problem that the community has come to appreciate only very recently, which is that the uncertainties have been greatly under-estimated due to the fact that small sample statistics have not been taken into account. Briefly, the background-subtracted image has residual speckles, whose amplitudes decrease with distance from the star. In order to determine an uncertainty level as a function of radius, the amplitude of the speckles as a function of radial distance is estimated from the background-subtracted image. However, speckles have a size of (where  is the wavelength and  is the telescope diameter), so, close to the host star there are very few speckles in a given small annulus. These small sample considerations lead to a statistical penalty that exponentially increases toward small separations.[9]

As the technology advances and higher levels of contrast are achieved, new problems associated with background subtraction via differential imaging are likely to surface. For example, if the star exhibits a small amount of linear polarization, the point spread matrix (PSM), which is the Stokes-vector generalization of the PSF [10], will depend on the diurnal rotation angle, further undermining the performance of ADI. Similar considerations apply to the planetary emission itself (which may be strongly linearly polarized due to Rayleigh scattering) if the PSM is not diagonally dominant. Furthermore, the PSFs of coronagraphic systems are sensitive to the pointing error, so that if the star is moving slightly relative to the pointing center during the diurnal rotation, the PSF will be changing, again, undermining the performance of ADI.

It is with this state of affairs in mind that several authors began to explore the utility of post-processing the pupil-plane telemetry, provided by a wavefront sensor (WFS), in conjunction with simultaneous image plane telemetry from a science camera, to treat the effects of the QSS.[11, 12, 13] The historical context of this approach is reviewed in [11]. As the WFS must run at a frequency of approximately 1 kHz for the AO system to keep track of the atmospheric fluctuations, these methodologies also require kHz exposure cadence in the science camera, a prospect made practical by a new generation of ultra-low noise IR cameras capable of kHz readouts, such as the SWIR single photon detector, SAPHIRA eAPD and the MKIDS.[14, 15, 16, 17]

Figure 1: Modulation of intensities at a single pixel (corresponding to the planet location) of the science camera in a stellar coronagraph simulation. The black solid line shows the time-series of the temporal variation of the planetary intensity. The dotted blue line shows the corresponding stellar speckle intensity at the same pixel. Both the planetary and stellar intensity are normalized to have a mean of unity in this figure. The stellar intensity is enhanced by a sinusoidal static aberration at the spatial frequency corresponding to the planet’s location. From [11], where this effect is demonstrated mathematically.

The combination of millisecond pupil plane and focal plane telemetry may well lead to orders of magnitude in contrast improvement over ADI and SDI, as it leverages a vastly larger and richer data set than standard exposure times, which average over the atmospheric turbulence. It is important to emphasize that the millisecond imaging techniques can be generalized to take advantage of essentially all constraints on problem proposed to date. These constraints include those imposed by diurnal rotation (used by ADI), multi-wavelength observations (used by SDI), as well polarization (used in polarization differential imaging [18]). Thus, the potential losses of information are rather limited. When one takes images that average over the atmospheric turbulence, the stellar speckles look much like planets. But, with an AO-corrected signal, planets and stellar speckles behave much differently at millisecond timescales. This is illustrated in Fig. 1, which shows a stellar coronagraph simulation result from [11]. Fig. 1 is a plot of two time series of the intensity calculated at a single pixel, corresponding to the location of a simulated planet, of the science camera. The dotted blue curve illustrates how the atmospheric turbulence (with AO correction) modulates the stellar speckle at a cadence of 1 millisecond. The solid black curve shows the much weaker modulation of the planetary light in the same pixel. These two time-series, the planetary intensity and the speckle intensity, are quite different in character, with the speckle having an approximately exponential probability density function (PDF), while the PDF of the planetary intensity is localized around its non-zero mean.[19, 11] This can be understood as follows: Much as that of the star, the planet’s wavefront is stabilized by the AO system, as the flat part of the planet’s wavefront is responsible for its intensity at this position in the image plane. However, the stellar speckle intensity at that location is entirely due to the random, non-flat part of the star’s wavefront (the coronagraph removes the flat part) and, hence, it is much more volatile. In an actual measurement, the signal seen would be a weighted sum of these two curves, with the star having much more weight, necessitating the collection of many milliseconds of data and the utilization of statistical inference methods to separate them. Taking advantage of the fleeting moments in which the starlight is reduced at the planet’s location,

Figure 2: Stellar coronagraph simulation of the effect of a vibration on a speckle in the science camera. This speckle is caused by a vibrational NCPA of the form , and the modulation is caused by the interaction of the NCPA with the random AO residual. The black dotted curve corresponds to the intensity of a speckle for Hz and the red solid curve corresponds to Hz. The ms data are clearly sensitive to the frequency of the vibration. Both curves are calculated with the same sequence of residual phase values. From [13].

was first considered by Labeyrie [20].

The richness of the millisecond telemetry is further illustrated by Fig. 2, taken from [13], which shows its sensitivity to high-frequency vibrations. (Vibrations pose a particularly challenging problem to high-contrast astronomy.) Fig. 2 shows the time-dependence of the intensity of a vibrational speckle caused by a pupil-plane aberration of the form , where  is the vibration frequency, is the time,  is the spatial frequency of the aberration and  is the 2D spatial coordinate in the pupil plane. The black and red curves correspond to two different frequencies (10 and 100 Hz), and the fact that they do not coincide shows that the millisecond data are sensitive to the frequency. Obviously, conventional observations that use exposures orders of magnitude longer would see the same speckle, quite independently of frequency. Similarly, [13] showed that when a pupil plane aberration is given by , the millisecond data are sensitive to whether  is real or imaginary as well as the phase angle , while conventional imaging cannot distinguish these effects.

In [11, 13], the author’s simulations use the millisecond telemetry from the focal and image planes to inform a large system of equations that provide a simultaneous statistical inference of both the aberrations in the optical system and the planetary image, avoiding background subtraction altogether. In contrast, in the background estimation method of [12], the millisecond telemetry is used to solve for the electric field due to the aberrations, as will be discussed later in Sec. 44.2. It is important to note that the methods of [11] and [12] are fully compatible with utilizing the same constraints that ADI and SDI, as well speckle nulling and electric field conjugation methods [21, 22], which themselves will be discussed in Sec. 4.

The objective of this series of papers is to provide a rigorous statistical inference framework that is suitable for the high-precision requirements of exoplanet imaging with ELTs that are capable millisecond telemetry in both the pupil and image planes. The developments will account for polarization effects and detailed propagation within the optical system, noise and bandwidth effects in the wavefront sensor, as well as noise in the polarimetric image measured by the science camera.

The purpose of this paper, the first in the series, is to provide a formal description of polarimetric formation of an astronomical image in the presence of atmospheric turbulence, allowing for vector propagation of the wavefront through an adaptive optics telescope system with unknown aberrations at various surfaces in the optical system. The developments here also provide a new perspective on current methods that utilize focal plane sensing to treat the unwanted speckle field. Subsequent papers in this series will build upon this foundation to treat statistical inference of the aberrations and planetary image while treating practical limitations of the hardware.

2 Elementary Concepts

The portion of the astronomical community with an interest in precise optical measurements, including those requiring ultra-high contrast, is increasingly becoming aware of the importance of polarization changes imparted to the light by the various elements of the telescope’s optical system. These (often undesired) changes are generally called ”polarization aberration,” and are most commonly caused by reflection off mirrors at non-normal incidence. The effects of polarization aberration are clearly illustrated by the point-spread matrix in [10], which operates on the Stokes vector of the incoming light. This section introduces a straightforward formalism to treat polarimetric imaging system operating on a wave that is modulated by atmospheric turbulence. In addition, notations and conventions to be used throughout this series of papers are introduced here.

2.1 timescales and Coherence Functions

The developments presented below will involve three timescales. The smallest timescale is the coherence time  of thermal light that has gone through a bandpass filter that removes all of the light except within a band of width  centered on frequency , i.e., .[23] For example, setting the wavelength m and a narrow bandpass of just 1%,  s. This timescale is relevant in the definition of the coherence functions, which require averaging over many periods of  so that these functions obtain their mean values or very nearly so. The middle timescale, here referred to as the ”Greenwood time,” , where  is the Greenwood frequency. The Greenwood time, on the order of s, is the time period in which most of the atmospheric phase distortions change significantly and defines the bandwidth requirement for an effective AO system.[24] The largest timescale, is that for which the telescope optical systems exhibit dynamical aberration (due to time-variable mechanical stresses) that change the detailed structure of the image in the science camera. Precise values of these three timescales are of little importance in the developments below so long as they satisfy , although the possibility of using millisecond observations to detect rapid mechanical vibrations, perhaps even with periods approaching  s is an intriguing one that is worth further investigation.[13]

Any spatially variant time-harmonic field can be represented in terms of an elliptically polarized wave with the electric field vector confined to some plane. However, in a field with arbitrary spatial variation, the normal vector specifying this plane will be a function of position, as will the polarization parameters.[25] In Sec. 33.2 it will be argued that atmospheric phase modulation will cause the normal vector describing the plane of polarization to wobble slightly on time-scale of . Rigorously accounting for this effect is beyond the scope of this paper and is left to future work. Thus, it will be assumed that the electric field component of the light that the telescope is collecting can be represented as a quasi-monochromatic signal admitting a Jones vector representation in a known plane.[25, 26] Consider the following representation of the analytic signal [25, 23] corresponding to the amplitudes of the electric field fluctuations in orthogonal directions of a plane-wave propagating in the  direction :


which defines the Jones vector  to be used throughout this presentation, and where a factor of , with , has been suppressed. Constraints on time-dependent functions in Eq. (1) are described below. While  and are in general complex-valued,  and  can be taken to be real. The random processes  and  are zero-mean so that , where the brackets  indicate a time average in which, for some function :


where . Thus,  is a low-pass version of .

The random processes  and  control the correlation between the and  components, and they vary on timescales on the order of  or longer, but much less than . The functions , ,  and are allowed to fluctuate on timescales on order of , but are quite constant over timescales similar to , so that, to a very high degree of accuracy, , , and . In this way, the high-frequency character of the optical field has been placed into the , and  factors.  and  are required to be in phase, so that


which have the same phase factor . It will be useful to factor  into two parts, a scalar component representing the commonalities between  and  and a polarization state representing their differences:


where  is the intensity, , , , so that , and  indicates complex conjugation. The form of Eq. (5) is convenient for easy interpretation. The scalar component controls the complex amplitude of the electric field vector, and in the polarization state,  and  set the relative amplitudes of the and  components. determines linear vs. circular polarization. The degree of polarization is set by  and .

For an astronomical signal just above the Earth’s atmosphere,  and  are essentially constant in time, so the factorization in Eq. (5) becomes:


thus defining the polarization state vector . The time-dependent effects of the atmosphere and the telescope system will be assumed to be representable as linear operators, which take the form of a Jones matrix, as described below in Sec. 2.2.2. The fluctuations related to atmospheric and telescopic dynamics on the timescales of  and , respectively, do not cause depolarization, but they can cause other changes in the polarization. When an astronomical signal of the form in Eq. (6) is multiplied by a time-dependent Jones matrix that varies on timescales of  or longer, the resulting field can be described by the more general form in Eq. (5).

Describing the intensities measured by polarization sensitive instruments requires the 2nd order statistics of , which are contained in the coherency vector (usually called the coherency matrix when arranged in the form of a matrix),[25, 23] and is given by the time-averaged Kronecker product of and , denoted as [27]:


where the  symbol indicates the imaginary part. The second equality in Eq. (7) makes explicit use timescale separations explained above. In order that the linear vs. circular polarization nature of the light is set only by , it is clear that the condition  must be satisfied, as any non-zero mean would modify the relative phase of the  and components of the field.

Most intensity measurements are more readily interpreted in terms of the Stokes vector, which is easily obtained from :


where the 0th component of is the total intensity, the 1st corresponds to vertical/horizontal polarization, the 2nd to diagonal polarization and the 3rd to circular.[25, 26] Note that  is invertible and has 4 eigenvalues that all have an absolute value of . Thus, the coherency and Stokes vectors are equivalent representations, and it follows that a measurement of the full Stokes vector fully specifies the beam, allowing determination of , , , , and .

When considering wave propagation, it is necessary to allow the field to have spatial dependence in the transverse plane. Let  and  be the transverse [i.e., ] coordinates of two points in some plane, and let and be two times. The  cross-density vector, , is a generalization of the coherency ([28] calls this the ”cross-spectral density”). In the most general case, the  cross-density vector is defined as , so that . For the purposes of image formation it is sufficient to only allow time differences  such that . As the  and  stochastic processes were assumed to be stationary, the rapidly fluctuating part of the cross-density (i.e., at timescales ) can only be a function of the time difference .[23] Further, as the fields are taken to be quasi-monochromatic, . The mutual coherency  is defined as the cross-density evaluated at 0 time difference, so, . As will be seen in Sec. 22.2, the mutual coherency plays an important role in polarimetric image formation.

Consider the field of an astronomical source, as represented in Eq. (6), at the spatial location  in some plane just above the Earth’s atmosphere. Clearly, , , , and  must be independent of  because the Stokes parameters that would be measured in that plane (at least not too far from Earth) are independent of . But, in Sec. 33.1 it will be seen that the mutual coherency  carries an encoding of the image of the astronomical source. Therefore, the image information must be contained the functions ,  and , which must be functions of the spatial coordinate . To see this, consider the definition of the mutual coherency. Similarly to Eq. (7), the mutual coherency is given by:


where the time argument of  has been dropped because it does not depend on time for astronomical signals above the atmosphere. As  is strictly real above the atmosphere, it has no influence on the amplitude of . Thus, the amplitude of  is determined by the function  and its three cousins in Eq. (9).

According to the Van Cittert-Zernike theorem for scalar fields , the astronomical image can be determined by taking the Fourier transform of the mutual coherence function and [23]. Since vector analog of the mutual coherence function is the mutual coherency in Eq. (9), the polarimetric image of the astronomical source is encoded in (which is analogous to the phase of the mutual coherence function), and the functions , and (which are analogous to the magnitude of the mutual coherence function, also known as the ”visibility” in long-baseline stellar interferometry). This will be discussed further in Sec. 33.1. As discussed after Eq. (7),  and must satisfy the condition , but the value of  has no such restriction on its behavior (and similarly for its and cousins). To understand this, consider an extended astronomical source with purely linear  polarization, so (), i.e., the coherency of the light emerging from the source is given by , in which  corresponds to the position in the sky, is a real-valued scalar function, and the vector (where  indicates transpose). In this case, the source has a spatially constant polarization, so the mutual coherency seen at the Earth must exhibit scalar behavior, i.e., , where  is a constant coherency vector and  is some scalar function. The scalar behavior of  requires . Further, one has . However, for realistic sources in which the polarization state is not spatially constant, these conditions do not hold, allowing the ”polarization ellipse,” if it were ever defined for the mutual coherency, to change as a function of .

2.2 Polarimetric Image Formation and Polarization Aberration

One may model the effect of a perfectly flat optical surface, such as an ideal flat mirror, on a plane wave with a  Jones matrix operating on the Jones vector , so that the outgoing plane wave is given by .[26] The coherency vector of the outgoing wave is given by , which makes use of the common identity for Kronecker products of matrices: .[29] As explained in the fine paper by McGuire & Chipman [27], so long as the paraxial approximation applies, this result can be extended to any optical system that is non-scattering and does not depolarize the light.[10] Let the field in entrance pupil plane, denoted with index , of some such optical system be represented by the Jones vector , where  is the 2D coordinate vector in the plane, and let the outgoing field in the exit plane (with index number ) be represented by . For example, plane 0 could correspond to the entrance pupil of a telescope and the plane 1 could correspond to the surface of a detector where an image is formed. The optical system is modeled in terms of a Jones propagation matrix :


where  is 0 when  is not within the spatial limits of the entrance pupil. At first glance, Eq. (10) implies instantaneous action and violates the laws of relativity, however, the time delays in the optical system are accounted for by phase shifts, such as a quadratic phase term describing a lens or focusing mirror.[30] As per the conventions mise en place in the Appendix, the  in Eq. (10) only refers to fluctuations on timescales far longer than  and the action can be considered to be instantaneous at such timescales. The coherency vector of the outgoing light is given by:


where is the mutual coherency in plane . The  matrix of functions  is the propagation kernel for the mutual coherency. Thus, the coherency vector output by the optical system can be expressed in terms of the Jones propagator of the optical system and the mutual coherency vector at the entrance pupil.[10]

Below [in Eq. (14)], it will be seen that  is essentially the spatial Fourier transform of the scene to be imaged. Therefore, if  in Eq. (11) has Fourier transforming properties, then  will be a polarimetric image of the scene. As is emphasized in [10], only an optical system in which  is proportional to the identity matrix will treat all Stokes parameters of the input beam the same way, and any deviation from this ideal is called polarization aberration. Examples of polarization aberration in telescopes are discussed in [10], and [27] analyzes a circular retarding lens (with corn syrup as the key ingredient) for which left and right circularly polarized light have different focal lengths.

3 Propagation Models

Eq. (11) relates the polarimetric image produced by an imaging system to the mutual coherency of the optical radiation incident upon its entrance pupil. The objective of this section to follow the mutual coherency of the light starting from the astronomical object to the Earth, through the atmosphere and, finally, through the telescope while accounting for unknown aberrations and the vector nature of the field. The derivation given in Sec. 33.1 for the polarimetric version of the van Cittert-Zernike theorem is similar to the scalar version in [23]. Henceforth, this paper will use the notation conventions given in the Appendix.

3.1 To Earth

Consider a planetary system, as viewed from the direction of Earth, and assume that the telescope is pointed somewhere close to the center of the system, where the host star is located. The pointing direction of the telescope corresponds to the  direction, so that light from the planetary system collected by the telescope is traveling in more-or-less the  direction. Then, any point in the planetary system is given by the coordinates , where  is the distance from the planetary system to Earth, and  is a two-dimensional vector, with units of radians, corresponding to the direction cosines  and .

At optical frequencies, outer space is effectively homogenous and isotropic, so there are no polarization effects and wave propagation is a scalar phenomenon. Then, the Huygens-Fresnel principle can be used to propagate each component of  independently to a plane (normal direction ) just above the Earth’s atmosphere.[30] Denote this plane with the index  (below, the plane will correspond to the telescope entrance pupil, and the positive indices will correspond to subsequent planes within the optical system). The optical radiation emerging from the planetary system can be considered to be emitted from the plane, which, say, cuts through the star, and the field emerging from the plane is given by . Then, the planetary system gives rise to the vector field  at location  in the  plane, just above Earth’s atmosphere, which is given by [23]:


where  is the field emerging from the planetary system, , and  is the obliquity factor.

As the planetary system consists of many independently radiating sources, the emergent light is effectively spatially incoherent, so that the mutual coherency of the emergent radiation is given by .*** The only truly spatially incoherent source corresponds to , and [23] advocates approximating the incoherent condition with the scale factor used here, corresponding to a field that is coherent over an area roughly the size of . Using the spatially incoherent source condition and the fact that , one finds:


where the quadratic phase factor is . For astronomical sources the  factor can be dropped since the telescope diameter squared is much smaller than (e.g., for a 100 m telescope operating at 1 m and targeting Alpha Centauri A, this ratio is at most ).

Note that in classical radiometry  has units of [energy/(area solid-angle time)] and is called the radiance, and  has units of [solid-angle].[25] In astronomy, the preference is for surface brightness, which also has units [energy/(area solid-angle time)], but is a function of the angular coordinate  instead of the linear coordinate .[31] Define the planetary surface brightness coherency vector as [note that this does not constitute a change of variables, so scaling by the Jacobian is not required], so that Eq. (13) becomes:


Eq. (14) is the vector-field generalization of the well-known van Cittert-Zernike theorem for astronomical sources, and states that the mutual coherency vector of the light arriving at the Earth is proportional to the Fourier transform of the coherency vector of the light emerging from the planetary system, . The objective of this paper is to develop a methodology for determining  in the high-contrast context. Tervo et al. [32] derive the same result in terms of the Stokes parameters and discuss the history of previous similar efforts.

The coherency of the planetary light above the atmosphere is easily calculated from Eq. (14) as , resulting in:


Note that  has no dependence on the spatial coordinate .

In principle, Eq. (14) describes the mutual coherency arising from the entire planetary system, central star included. However, in the high-contrast context the star is vastly brighter than the surrounding material and the propagation of its light must be treated with much more care than the planetary light. Therefore, the starlight and planetary light are to be treated separately in these developments. Assume that the star is unresolved, essentially acting as a point source, and is located at a small angle  from the telescope pointing direction, and let  be its coherency vector, as would be measured above the atmosphere. The star’s mutual coherency can be calculated with the aid of Eq. (14) by setting , resulting in:


Since , one can see that can be factored as per Eq. (6):


where  is star’s intensity, , is the real-valued phase, and  is the polarization state vector. With the variables so-defined, one has:


3.2 Through the Atmosphere

The journey from the plane (above the atmosphere) to the plane, corresponding to the entrance pupil of the telescope, requires the light to traverse the Earth’s atmosphere. While atmospheric polarization aberration has been discussed in the military context of propagation of beams of laser light ([33] has a useful list of references) the literature on the subject in the context of astronomy appears to be rather limited. The most important effect of the atmosphere is that of a random phase screen, and the second most important effect is scintillation, which itself is caused by phase screening at larger heights.[34] Absorption effects by the atmosphere should only affect photometry and are generally well-understood.[31] The atmosphere generally not expected to produce polarization effects, however, several studies have found polarization related to presences of specific particles or molecules or molecules in the atmosphere. For example, [Kemp_SolarPol08] found small, wavelength-dependent linear polarization in high-precision solar observations at large zenith angles, and attributed it to double scattering off of various molecules. Bailey et al.  [35] report a slight linear polarization caused by Saharan dust high in the atmosphere over the Canary Islands and cite previous studies indicating polarization effects on the order of 1 part in . There is also literature on atmospheric polarization effects in the context of laser beam propagation for military purposes. For example, [36] gives results for atmospheric polarization effects on a Gaussian Schell beam. While it seems that the case of a plane wave incident upon the Earth from an astronomical source should be representable by a Gaussian Schell beam in the limit that the beam-width parameters go to infinity, the laser propagation results given by [36] are unphysical in that limit.

Given the current state of knowledge of atmospheric polarization effects, it will be assumed that the atmosphere contributes no polarization aberration, so the star’s Stokes parameters are known and correspond to those that can be measured in a standard observational setting (with long exposures that average over the turbulence), which is likely to be correct to first order. Thus, the effect of the atmospheric turbulence will be considered to be a scalar phenomenon described by a complex-valued phase screen (imaginary values account for scintillation) . Then, as there are no optics between the plane (above the atmosphere) and the plane (the telescope entrance pupil), the Jones vector of the starlight at the entrance pupil is given by:


which makes use of Eq. (17).

Some remarks on the limitations of Eq. (19) are in order. As mentioned in the discussion leading to Eq. (1), the local plane of polarization at position must have a dependence on turbulent modulation. As a thought experiment, consider atmospheric fluctuations in the telescope entrance pupil plane in a small vicinity of the point at some time  resulting in tilt only, so that [where , and  is the tilt angle]. Since the tilt changes the direction of propagation, an unpolarized beam would have an electric field component parallel to the -axis, while the -component would be correspondingly reduced. Consider that typical seeing conditions in a world-class observatory correspond deviations in the angle of propagation on the order of 1 arc second (). A deviation  would correspond to a -component of the electric field amplitude of about  time the -component, but impact on the -component itself would only be about 1 part in . While these numbers are not large, one must keep in mind that the telescope has optics that reduce the beam diameter while amplifying the angle. Consider an ELT that reduces the beam from a primary diameter of 50 m to a diameter of 2 cm (a factor of 2500) by the time it impinges on the DM, making  on the internal pupil plane corresponding to the DM. Thus, the atmospheric fluctuations combined with the high demagnification correspond to a local beam wobble on the order of in the vicinity of the conjugate point of . This deviation now imparts -component of over  of the original -component of the electric field amplitude, which itself would be reduced by nearly 1 part in . Such considerations may not be negligible in precision optical modeling and require further investigation.

Using Eq. (19), the star’s mutual coherency at the entrance pupil is:


where Eq. (21) uses Eq. (16). From Eq. (21), the star’s coherency at the entrance pupil is simply:


To calculate the mutual coherency of the planetary light, one can assume that its angular size is small enough that anisoplanitism effects are negligible, so that the planetary field is modulated by the same factor . Any small deviations from this assumption should be inconsequential, as they would result in a slight blurring of the planetary component of the image, not the stellar component. As there is no convenient factorization of the mutual coherency of the planetary light in the plane above the atmosphere, as in Eq. (17), on can multiply  in Eq. (12) by  and follow the same steps to arrive at a blurred version of Eq. (14) (which is simple because the atmospheric modulation factor does not depend on ), to find:


Eq. (23) shows that the Fourier transform of the polarimetric planetary image, , is multiplied by a complex-valued atmospheric modulation function, . Thus, in an imaging system, which must have Fourier transforming properties, the will be convolved with the Fourier transform of , resulting in an image that is distorted by the atmospheric turbulence.

Figure 3: Schematic diagram of an optical system containing a telescope with a closed-loop AO system and a coronagraph. Modified from [37].

3.3 Through the Optical System

Consider a ground-based telescope equipped with a WFS and coronagraph, as shown schematically in Fig. 3. The telescope, coronagraph and the science camera comprise a system consisting of a number of optical surfaces, with the telescope entrance pupil corresponding to surface number and the science camera detector surface having index . Let , where  and  are the coordinate vectors in planes and , respectively, be a Jones propagator from surface to surface , including any aperture geometry or other structure on surface . The kernel explicitly excludes interaction with the  surface. The vector is a set of experimentally determined parameters that modify to account for various conditions such as alignment or other possibly significant factors.

The field on the plane before interacting with the surface is given by integrating against the propagation kernel matrix:


In order to be as general as possible and allow for unknown (scalar and polarization) aberration at each optical surface, the propagation kernel can be factored into known and unknown parts as follows:


where the ”k” superscript stands for ”known” and the ”u” superscript stands for ”unknown,”  is a  Jones matrix function of unknown (possibly complex-valued) functions representing the unknown aberration caused by surface . If the  function were parameterized, it could be included in the definition of the  vector, as an alternative formulation. Accounting for unknown aberrations in the form of Eq. (25) provides a bit of mathematical convenience as  can be regarded as a (small) deviation from the identity, i.e.:


where  is the deviation from the identity matrix ().  is quite suitable for representing unknown scalar phase aberrations, such as small bumps on a mirror that have negligible effect on the polarization. Such a scalar aberration can be written as , where  is the (possibly complex-valued) phase aberration on the optical surface . Using the Taylor expansion of the exponential (), one has . Then, using Eq. (25), the propagation kernel including an unknown a scalar aberration in plane can be written as:


The choice of the kernel matrix  may be the Jones matrix function corresponding to an optical element in plane combined with some approximation of the Huygens-Fresnel principle to propagate the field to plane . For example, if the  Jones matrix function for plane is given by (which, if desired, can be factored into known and unknown parts as ), the corresponding Huygens-Fresnel kernel can be written as:


where is the Euclidean distance between the point on surface and on surface , and  is the angle between the aperture normal on surface and the line connecting these two points.

One can propagate the field from surface to surface by using the kernel , which is defined as the contraction of the two kernels and :


In operator notation Eq. (29) is written as:


which implies integration over the coordinate that is the second argument in the first propagator and the first argument in the second propagator.

Similarly, the field from the star at the science camera, located in plane  can be determined from the propagation kernel , which is a contraction of the propagation kernels from plane through plane :


where  is a vector of DM commands,  is a column vector of instrument parameters that collects the . Note that in Eq. (31) hides the dependence on on  insides the ellipsis (””), which contains a propagation kernel corresponding to the DM that depends on . In operator notation, Eq. (31) is written as:


in which integrations over the intermediate coordinate vectors are implicit.

Using Eq. (19), the field from the star at the science camera is given by integral:


where the position in science camera plane is given by . The polarimetric image of the star can be determined with the help of Eqs. (11) and (21):


Using Eq. (23) , the polarimetric image of the planetary system (excluding the star) is given by:


Note that detailed propagation kernels in Eq. (35) are likely to be unnecessarily precise, as much more simple approximations to calculate the planetary image should be adequate. This will be discussed in future papers in this series. Adding Eqs. (34) and (35), the polarimetric image on the science camera detector is the sum of the stellar and planetary contributions:


Using the operator notation as in Eq. (50),  also represents the operator that propagates the mutual coherency through the optical system. Using Eq. (32) and the identity (for suitably dimensioned matrices) , this operator can be written as:


in which , and the  (the DM command vector) argument only affects the operators in which the second coordinate corresponds to a DM.

Referring to Fig. 3, let the kernel that propagates the field from the telescope entrance pupil to the WFS entrance pupil be denoted as , where  is the index assigned to the plane corresponding to the WFS entrance pupil. Neglecting the planetary contribution, the field in the WFS entrance pupil is given by:


In Part 2 of this series, the SC field will be expressed in terms of  and non-common path aberrations, which are unique to the separate optical path experienced by the field impinging on the WFS.

4 Methods for Treatment of Aberrations

The purpose of this section is to outline the physical principles behind various aberration correction schemes in a rigorous fashion. The level of rigor will allow the reader to understand the strengths and weaknesses of the various approaches. This section does not discuss issues concerning noise, measurement errors, statistical or computational methodologies. These critical concerns are left to later papers in the series.

4.1 Explicit Determination of Unknown Aberrations

The first step in solving for the unknown aberrations is to calculate how they manifest themselves in the polarimetric image. Using Eqs. (25) and (26), and dropping terms that are 2nd order or higher in , Eq. (32) becomes: