# ALICE Data Release: A revaluation of HST-NICMOS coronagraphic images

J. Brendan Hagan    Élodie Choquet    Rémi Soummer    Arthur Vigan
###### Abstract

The Hubble Space Telescope (HST) NICMOS instrument was used from 1997 to 2008 to perform coronagraphic observations of about 400 targets. Most of them were part of surveys looking for substellar companions or resolved circumstellar disks around young nearby stars, making the NICMOS coronagraphic archive a valuable database for exoplanet and disk studies. As part of the Archival Legacy Investigations of Circumstellar Environments (ALICE) program, we have consistently re-processed a large fraction of the NICMOS coronagraphic archive using advanced PSF subtraction methods. We present here the high-level science products of these re-analyzed data, which we delivered back to the community through the Mikulski Archive for Space Telescopes (MAST): 10.17909/T9W89V. We also present the second version of the HCI-FITS format (for High-Contrast Imaging FITS format), which we developed as a standard format for data exchange of reduced imaging science products. These re-analyzed products are openly available for population statistics studies, characterization of specific targets, or detected point source identification.

methods: data analysis — techniques: image processing — catalogs
Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218, USA

Hubble Fellow; Department of Astronomy, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA 91125, USA; Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109, USA

Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218, USA

Laboratoire d’Astrophysique de Marseille, 38, rue Frédéric Joliot-Curie, 13388 Marseille cedex 13

## 1 Introduction

Direct imaging of exoplanetary systems is one of the greatest challenges in modern astronomy due to the very high contrast between the host star and surrounding circumstellar objects at close angular separation. To obtain a direct astrophysical signal from these circumstellar objects, dedicated coronagraphic instruments are required to attenuate the starlight in the images and enable the detection of fainter objects. Yet because of imperfect starlight suppression and wavefront variations, the coronagraphic images are still dominated by the starlight at short separations. To detect the faintest objects, this residual starlight, i.e. the coronagraphic point spread function (PSF; hereafter we use “PSF” to refer to the residual starlight), must be subtracted out using post-processing data reduction techniques.

Classical differential imaging methods consist of subtracting a typical PSF image from the science image, using a single image (Lowrance2005) or a median image (Marois2006), either from a reference star or from the science target itself using angular diversity to limit self-subtraction of the astrophysical source. These classical differential imaging methods usually have limited performance due to significant wavefront variations between acquisitions, causing temporal variations in the PSF. For HST’s instruments, instrumental deformations on timescales from a few seconds to several days (Lallo2006; Makidon2006) introduce additional temporal variations in the PSF that typically limit point source detections to contrast lower than within from the star with differential imaging (e.g. Lowrance2005). Yet, these methods have been very successful at detecting brown dwarf companions and circumstellar disks (e.g. Schneider1999; Kalas2006; Schneider2014).

Over the last decade, advanced post-processing techniques have emerged that take advantage of a library of instrument PSFs. The temporal variations of the PSF sampled in the instrument PSF library can be combined to create an optimal synthetic PSF that is then subtracted from the target PSF. The synthetic PSF can be created using a linear combination of instrument PSF images (LOCI and its variants; Lafreniere2007) or using Principal Component Analysis (PCA) on the PSF library (Soummer2012; Amara2012). For first-generation instruments on the Hubble Space Telescope (HST), these techniques are more efficient when using PSFs from many different stars (Multiple Reference Star Differential Imaging, MRDI), even when observations are separated by large time intervals. This has been demonstrated by the re-discovery of the HR 8799 planets (b, c, and d) in archival NICMOS data from 1998 (Lafreniere2009; Soummer2011). Indeed, depending upon the size and quality of the PSF image library, these techniques can offer a significant improvement in comparison to classical PSF subtraction.

With the advent of these advanced post-processing techniques, it is worth considering reanalyzing some of the data obtained with first-generation coronagraphic instruments. We started the ALICE project (Archival Legacy Investigations of Circumstellar Environments) with the goal of consistently reprocessing the NICMOS coronagraphic archive with advanced post-processing methods (Choquet2014d). NICMOS operated on-board Hubble for about 8 years between 1997 and 2008. Its mid-resolution channel NIC2 (pixel size 0.076″) was equipped with a 0.3″-radius coronagraphic mask and a Lyot stop. During its time in operation approximately 400 stars were observed, most as part of surveys looking for debris disks and planets around nearby stars. The ALICE pipeline assembles and aligns large PSF libraries from consistent subsets of this database, which are used to process each individual target with the KLIP algorithm (Soummer2012). This project has revealed new images of 12 debris disks previously undetected in the NICMOS data, 11 of which had never before been imaged in scattered light (2014ApJ…786L..23S; 2016ApJ…817L…2C; Choquet2017; Choquet et al. 2018, in press). In addition, we uncovered a total of 452 point sources in the data (2015SPIE.9605E..1PC).

We are now delivering the re-processed products of the ALICE program back to the community, openly accessible as High-Level Science Products through the MAST archive, 10.17909/T9W89V. We also further improved the HCI-FITS standard format for high-contrast imaging science products to make it compatible with any high-contrast imaging instrument, and we present here version two of this format.

In Sec. 2 we present the content of the ALICE archive and the reprocessed datasets. In Sec. 3 we present the HCI-FITS format used for the delivered science products.

## 2 Released datasets

### 2.1 ALICE inputs

The input data for the ALICE program come from the Legacy Archive PSF Library and Circumstellar Environments (LAPLACE) program (HST program AR-11279, PI: G. Schneider; Schneider2010). This program delivered a homogeneous re-calibration of a large fraction of the raw NICMOS coronagraphic archive, optimized for imaging at separations close to the coronagraphic-mask inner working angle (radius of 0.3″) using PSF subtraction techniques. This re-calibration was performed using contemporary flat-field frames optimally matched to the location of the coronagraphic mask (as opposed to epochal flats), and using an improved bad-pixel correction. During the second era of NICMOS operations (after replacement of its cooling system), dark calibration observations were obtained less frequently than during the first era, and the LAPLACE program delivered two versions of the re-calibrated images: one using observation-optimized dark frames when available, and one using synthetic model dark frames.

For consistency, the ALICE program re-analyzed all the non-polarimetric data from the LAPLACE program that were calibrated with contemporary flats for NICMOS Era 1, and with contemporary flats and observed dark frames for NICMOS Era 2, which represent 72% of the non-polarimetric NICMOS archival datasets. Furthermore, we also re-analyzed a few selected datasets that were not re-calibrated by the LAPLACE program (from HST programs 7248, 10897, and 11155).

### 2.2 ALICE outputs

The outputs of the ALICE program, and the delivered data described throughout this document, all come from the input data detailed above. Data that were not reprocessed either (1) were not part of the LAPLACE inputs described above, (2) suffered from bad acquisition images not centered on the coronagraphic mask, or (3) failed sub-pixel alignment within a PSF library. The complete list of NICMOS coronagraphic programs (non-polarimetric data) is detailed in Table A. The table also reports the number of targets per HST program that have been partly or entirely re-processed as part of the ALICE program. The complete list of re-processed images for each HST program, target, and filter set can be accessed on the MAST archive. Details on how these data were re-analyzed (field of view, alignment procedure, PSF subtraction and analysis methods) are described in Choquet2014d.

NICMOS data from 401 targets were used as input for the ALICE program, some observed in multiple HST programs (451 different target observations) and/or with multiple filters. In total there were 590 different datasets, i.e. 590 unique observations with respect to program, target, and filter. The number of input images that were re-analyzed by the ALICE program amounts to a total of 4879 images (83% of them were acquired with the F110W or F160W filters). Reprocessed data are delivered for 494 datasets out of the 590 input ones. Of the 96 datasets not delivered, 56 of them were entirely composed of bad acquisition images and 40 others could not be properly aligned within a PSF library (often because the target was not a bright star or the target was a binary or contained a bright companion causing the alignment to fail). The majority of these datasets were acquired with the F110W or F160W filters (408 datasets). This amounts to a total of 3955 reprocessed images (86% in F110W or F160W), delivered back to the community.
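As a quick consistency check on the counts quoted above, the short sketch below recomputes the number of undelivered datasets and the delivered fractions. The variable names are ours and purely illustrative; only the numbers come from the text.

```python
# Dataset and image counts quoted in the text above.
input_datasets = 590
delivered_datasets = 494
bad_acquisition = 56      # datasets made only of bad acquisition images
alignment_failed = 40     # datasets that could not be aligned in a PSF library
input_images = 4879
delivered_images = 3955

not_delivered = input_datasets - delivered_datasets
# The two failure categories should account for every undelivered dataset.
assert not_delivered == bad_acquisition + alignment_failed

dataset_fraction = delivered_datasets / input_datasets
image_fraction = delivered_images / input_images
print(f"datasets delivered: {dataset_fraction:.1%}")
print(f"images delivered:   {image_fraction:.1%}")
```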

### 2.3 Note on the reprocessing efficiency

The PSF libraries assembled by the ALICE pipeline are much larger in the F110W and F160W filters than in the other filters, as the F110W and F160W filters were commonly used for surveys and specific object characterizations. The medium-band and narrow-band filters were exclusively used for characterization of known circumstellar objects. Furthermore, as ALICE’s reprocessing is based on MRDI and excludes images with detected circumstellar material from the reference PSF libraries (except in “planet mode”, which includes images of the science target acquired in a different orientation of the spacecraft, when available; Choquet2014d), the medium- and narrow-band filter data were reprocessed with PSF libraries of very small sizes (see Table 2.3). ALICE’s reprocessing for these datasets is not expected to improve much upon classical PSF subtraction techniques. The real added value from the ALICE program mostly concerns NICMOS data acquired with the F110W and F160W filters.

## 3 Released products

For each dataset re-processed by the ALICE program, we provide three kinds of outputs: 1) A high-level science FITS file, gathering all the high-contrast imaging metrics in the standard HCI-FITS format for the combined data at the target level. This is the main output product of the ALICE program and should be used for detection purposes. 2) A folder with FITS files for each reprocessed image of the dataset, gathering the material needed to perform forward modeling of an astrophysical signal. These should be used for characterization of signals detected in the main high-level science product. 3) A folder with preview PDF files of the main outputs of the dataset.

### 3.1 Definition of dataset

We refer to a dataset as the set of images acquired with the same filter element, as part of the same HST program, and under the same target identifier (HST TARGNAME keyword). Datasets are thus composed of a homogeneous group of images as designed by the PI of the program. A dataset may combine images acquired with different spacecraft orientations, with different exposure times, and at different epochs (within a year). We assume that any potential astrophysical source remains unchanged in all the images of a dataset when combining them in the main ALICE science FITS file.

A few targets have several datasets with the same filter element, in different HST programs (despite HST policies not to re-observe a target in the same mode), or within the same program (target re-observed after a failed acquisition).
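The dataset definition above amounts to grouping images by the triplet (HST program, TARGNAME, filter). A minimal sketch of that grouping is given below. The records, file names, and program numbers are invented for illustration and do not correspond to actual archive entries.

```python
from collections import defaultdict

# Hypothetical image records: (file name, HST program ID, TARGNAME, filter).
# All values are made up for this example.
images = [
    ("img_a.fits", 11111, "STAR-A", "F160W"),
    ("img_b.fits", 11111, "STAR-A", "F160W"),
    ("img_c.fits", 11111, "STAR-A", "F110W"),
    ("img_d.fits", 22222, "STAR-A", "F160W"),
]

def group_into_datasets(records):
    """Group images sharing the same (program, target identifier, filter)."""
    datasets = defaultdict(list)
    for name, program, targname, filt in records:
        datasets[(program, targname, filt)].append(name)
    return dict(datasets)

datasets = group_into_datasets(images)
for key, files in sorted(datasets.items()):
    print(key, files)
```

In this toy case the same target ends up in three datasets: two filters within one program, plus a repeat of one filter in a second program.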

### 3.2 The HCI-FITS file

The main science output of the ALICE program is a multi-extension FITS file (Pence2010) that gathers all the high-level information about the re-processed dataset.

In an effort to facilitate high-level data exchange and to make exoplanet population statistical analyses easy, we developed a specific format for these data, and we propose that it become a standard for reduced products of high-contrast imaging data (2014SPIE.9147E..51C). We describe below the main choices we made for Version 2 of this format and provide a detailed description of its content. We hereafter call this format the HCI-FITS file, for High-Contrast Imaging FITS file. It is inspired by the OI-FITS format, which is the standard for calibrated data exchange from optical interferometers (Pauls2004; Pauls2005). In addition to the products delivered by ALICE, the HCI-FITS format presented here has also been implemented in the Direct Imaging Virtual Archive (DIVA), another large-scale database dedicated to high-contrast imaging data (2017arXiv170305322V).

#### A single FITS file

To enable high-level science analyses of high-contrast imaging data, several products are mandatory (e.g. the reduced image, detection limits, source detections). In order to prevent information loss when exchanging data, all the information must be gathered in a single file. The FITS file format offers the structure needed to achieve that, through the use of extensions. Extensions may contain different types of data, including images (IMAGE extension type), which are appropriate for the reduced images, sensitivity and SNR maps, and multi-dimension tables (BINTABLE extension type), which are appropriate for radial detection limits, characteristics of potentially detected sources, or general characteristics of the reduced products. Moreover, having a single file as the reduced product makes databases more convenient to both implement and use.

#### A flexible standard

We developed the HCI-FITS format to be compatible with any type of dataset, regardless of the instrument, observing mode, or processing method used to obtain the final reduced products. It can be used for both ground-based and space observations, for coronagraphic and saturated imaging, for broad-band imaging, integral-field spectroscopy, and polarimetric imaging. Depending on the observer/analyst’s choice to present the reduced data, an HCI-FITS file may contain products for either a single image or an image cube. For example, this format supports all options between combining all reduced images from an Integral Field Spectrograph (IFS) in one broad-band image and keeping each reduced image separated in a spectral cube, while tracking specific image characteristics in all cases.

#### Structure of the HCI-FITS format

We identified five main products that are necessary for a high-level use of reduced HCI data: the reduced images, the SNR maps, the sensitivity maps (or “noise” maps), the radial detection limits (or “contrast curves”), and the characteristics of any detected point sources. The HCI-FITS format is thus composed of 6 extensions, one for each of these products, plus one which tracks the main characteristics of each image provided in the file. The extensions may appear in any order in the FITS file, but must have the mandatory EXTNAME values to enable compatibility between files. The structure of the HCI-FITS format is provided in Table 3. The SOURCE_DETECTION extension is optional but must respect the specified format if present. This structure is not exclusive and may include additional data in other extensions (e.g. intermediate products such as the instrument PSF image). Reading software should not presume the presence of such additional extensions.
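A reader can verify these structural rules from the list of EXTNAME values found in a file. The sketch below is a hedged illustration, not part of any official tool: the function name is hypothetical, and the list of names is assumed to have been extracted beforehand with a FITS reader of the user's choice. The extension names themselves are those used in this document.

```python
# EXTNAME values named in the text; SOURCE_DETECTION is the optional one.
MANDATORY = {"DATA_INFORMATION", "REDUCED_DATA", "SNR_MAP",
             "SENSITIVITY_MAP", "DETECTION_LIMIT"}
OPTIONAL = {"SOURCE_DETECTION"}

def check_hci_fits_extnames(extnames):
    """Return (missing mandatory names, unrecognised extra names).

    Extension order is irrelevant and extra extensions are allowed,
    so extras are reported for information only.
    """
    present = set(extnames)
    missing = sorted(MANDATORY - present)
    extras = sorted(present - MANDATORY - OPTIONAL)
    return missing, extras

# Example: a file with all mandatory extensions in arbitrary order,
# no SOURCE_DETECTION, and one additional (hypothetical) extension.
names = ["SNR_MAP", "REDUCED_DATA", "DETECTION_LIMIT",
         "DATA_INFORMATION", "SENSITIVITY_MAP", "INSTRUMENT_PSF"]
missing, extras = check_hci_fits_extnames(names)
print("missing:", missing, "extras:", extras)
```

Looking extensions up by name rather than by position follows from the rule that they may appear in any order.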

The mandatory products enable analyses such as detection limit comparisons and astrophysical signal comparisons, but do not enable precise characterization of unreported signal. Such characterization requires a forward modeling process (Lagrange2010; Milli2012; Soummer2012; Pueyo2016) for which intermediate products are needed (instrument PSF image, raw data, eigen-images of the PSF library). Such detailed characterization is outside the scope of the HCI-FITS format.

The DATA_INFORMATION extension is critical to identify the characteristics of each reduced image in the file. It is the extension that makes this format compatible with any collection of high-contrast images. It is a BINTABLE extension that must be composed of 12 fields that track the field orientation, polarization state, epoch, and spectral information of the images. It must have as many rows as images provided in the file, and if several images are provided, the order must be the same in the DATA_INFORMATION table as in the image cubes. We present in Table 3.2 the structure of this table.

The DETECTION_LIMIT extension is also a BINTABLE extension and reports the radial point source detection limits for each image present in the file. It must be composed of two mandatory fields reporting the separation from the star and the corresponding detection limit. The header of this extension must indicate the confidence level of the detection limit using the mandatory NSIGMA keyword. As the high-contrast imaging community is currently in the process of improving the definition of detection limits (Mawet2014; JensenClem), we note that the header of this extension may be further developed. The structure of this extension is presented in Table 3.2.

The third BINTABLE extension, SOURCE_DETECTION, is optional and reports the characteristics of the point sources detected in the data. For each source it must indicate its astrometry, photometry, and SNR in each image provided in the file.

#### Specifics of the ALICE HCI-FITS files

For the specific case of ALICE, the HCI-FITS files always contain the reduced data for each combined roll and for the combination of all images, so the ALICE products present cubes of $N_{\mathrm{roll}}+1$ images, where $N_{\mathrm{roll}}$ is the number of spacecraft orientations used to observe the target.

As the ALICE data were reprocessed with the PCA-based KLIP algorithm and large PSF libraries, we provide in the REDUCED_DATA header some specific keywords describing the reduction parameters we used for the dataset (see Table 3.2).

The images in the SENSITIVITY_MAP extension are computed from the temporal variance of the residual speckle field through the PSF library. To do so, we reprocessed the reference images from the PSF library with the same parameters as the science images, and rotated and combined groups of them with the same numbers, weights, and angles as for the science combined images. We then compute the covariance matrix of these combined reference images, convolve it with a aperture, and compute its square root to estimate the temporal speckle noise map per resolution element. The images in the SNR_MAP extension are computed by convolving the reduced images from the REDUCED_DATA extension with the same aperture, and dividing them by the images from the SENSITIVITY_MAP extension. The tables provided in the DETECTION_LIMIT extension are the radial averages of the images in the SENSITIVITY_MAP extension, computed in 2-pixel wide annuli. They are normalized by the stellar flux converted to count/s (keyword STARFLUX in the primary header) to give a measure of the point source detection limit in terms of contrast to the star. We provide in the header of these three extensions keywords describing the parameters used to compute these metrics (see Table 3.2). It is important to note that the sensitivity map and the detection limit do not include the processing throughput.
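The last step above, turning a sensitivity map into a radial contrast curve, can be sketched as follows. This is a minimal, dependency-free illustration and not the ALICE pipeline code: the function name, the use of plain Python lists instead of arrays, the skipping of non-finite pixels, and the choice to report each annulus at its mid-radius are all assumptions.

```python
import math

def radial_detection_limit(sensitivity_map, center, starflux, width=2.0):
    """Average a 2-D noise map in annuli of `width` pixels and divide by
    the stellar flux, returning (annulus mid-radius, contrast) pairs.

    `sensitivity_map` is a list of rows; `center` is (x, y) in 0-based
    pixel indices. Non-finite pixels (e.g. masked values) are skipped.
    """
    sums, counts = {}, {}
    cx, cy = center
    for y, row in enumerate(sensitivity_map):
        for x, value in enumerate(row):
            if not math.isfinite(value):
                continue
            r = math.hypot(x - cx, y - cy)
            k = int(r // width)  # index of the annulus holding this pixel
            sums[k] = sums.get(k, 0.0) + value
            counts[k] = counts.get(k, 0) + 1
    return [((k + 0.5) * width, sums[k] / counts[k] / starflux)
            for k in sorted(sums)]

# Toy example: a flat 5x5 noise map of 4 count/s around a star of 2 count/s.
flat_map = [[4.0] * 5 for _ in range(5)]
curve = radial_detection_limit(flat_map, center=(2, 2), starflux=2.0)
print(curve)
```

Because the delivered limits exclude the processing throughput, a curve obtained this way would still need a throughput correction before being compared with a companion's true contrast.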

The characteristics of the detected sources in the SOURCE_DETECTION extension are computed from a matched-filter process with a synthetic, unocculted NICMOS PSF, computed for the corresponding filter element with the TinyTIM software package (Krist2011). The source astrometry is given by the position that maximizes the cross-correlation between the reduced images and the normalized synthetic PSF. The photometry of the source is retrieved from the maximum value of the cross-correlation, after subtraction of the local background level. It is then corrected for post-processing over-subtraction with analytical forward modeling (Soummer2012), and for correlation losses between the synthetic and the real PSF by using photometric calibration data acquired on calibration white dwarfs. The contrast is computed by normalizing the photometry by the stellar flux converted to count/s (keyword STARFLUX in the primary header). Table 3.2 provides a description of the SOURCE_DETECTION extension.
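The core of the matched-filter step, locating the position that maximizes the cross-correlation with a normalized template, can be illustrated with the brute-force sketch below. It is an assumption-laden toy, not the ALICE implementation: the function name is hypothetical, the template is normalized to unit total flux and the correlation is divided by the template energy so that a noiseless source returns its own flux, and the search is restricted to whole pixels. The background, over-subtraction, and correlation-loss corrections described above are not included.

```python
def matched_filter_peak(image, psf):
    """Slide a normalized PSF template over `image` and return
    ((x, y), value) at the maximum cross-correlation.

    Both inputs are lists of rows; the PSF must have odd dimensions, and
    (x, y) is the 0-based pixel under the PSF center. Only positions where
    the template fits entirely inside the image are tested.
    """
    total = sum(sum(row) for row in psf)
    template = [[v / total for v in row] for row in psf]  # unit total flux
    energy = sum(v * v for row in template for v in row)
    half_y, half_x = len(psf) // 2, len(psf[0]) // 2
    best_pos, best_val = None, float("-inf")
    for y in range(half_y, len(image) - half_y):
        for x in range(half_x, len(image[0]) - half_x):
            corr = sum(
                image[y + dy - half_y][x + dx - half_x] * template[dy][dx]
                for dy in range(len(psf))
                for dx in range(len(psf[0]))
            )
            value = corr / energy  # flux estimate for a source centered here
            if value > best_val:
                best_pos, best_val = (x, y), value
    return best_pos, best_val
```

A real analysis would use sub-pixel interpolation and the TinyTIM model for the relevant filter in place of the toy template.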

The primary header of the ALICE HCI-FITS files is described in Table B of Appendix B. It gathers a selection of keywords useful at a science level, which come from the raw HST FITS header, from the keywords added by the LAPLACE program, and from our work.

### 3.3 Data image products

In addition to the main HCI-FITS file, which provides science metrics for the combined products of a dataset, we also provide a “Products” folder gathering intermediate products for each image in the dataset. These files are complementary to the combined HCI-FITS products. While the purpose of the main HCI-FITS file is to provide high-level science metrics to quantify the detection limits and detected sources in the dataset, the data image products are useful for diagnostics and for forward modeling of astrophysical signals.

For each exposure, we provide a multi-extension FITS file gathering the products described in Table 3.3. Unlike the main HCI-FITS file, these products are specific to ALICE and their format is not compatible with all types of high-contrast science products.

The REDUCED_IMAGE extension provides the reduced image computed by the ALICE pipeline using the KLIP algorithm. The image is not derotated and is presented with the same field orientation as the raw NICMOS image. The star is centered on pixel (41, 41) with pixel (1,1) at the bottom-left of the image. The header of the extension provides detailed information on the reduction parameters used to compute the image (see Table C).
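The pixel convention quoted above is 1-based, with pixel (1,1) at the bottom-left, whereas most array libraries index from 0 and put the row first. The helper below is a small sketch of the conversion. Its name is hypothetical, and it assumes the image is held as a list of rows (or an equivalent array) whose row 0 corresponds to the row y = 1.

```python
def fits_to_array_index(x_fits, y_fits):
    """Convert 1-based pixel coordinates (x, y) to the 0-based
    (row, column) index of an image stored as a list of rows,
    where row 0 holds the row y = 1."""
    return (y_fits - 1, x_fits - 1)

# Star center quoted in the text: pixel (41, 41).
star_row, star_col = fits_to_array_index(41, 41)
print(star_row, star_col)
```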

The RAW_IMAGE extension provides the raw NICMOS image calibrated by the LAPLACE program. The LAPLACE image has been cropped to a field of view smaller than the NICMOS full frame ( pixels or arcsec), and the star is centered on pixel (41, 41). The header of the extension lists the keywords provided in the raw HST file, as well as the position of the star center in the full NICMOS field of view (See Table C).

The REDUCTION_ZONE extension provides a binary image of the reduction zone (pixels with 0 value were excluded from the reduction). In most cases the reduction zone corresponds to the full image except for a central mask with a radius of a few pixels. The parameters used to define the reduction zone are provided in the extension header (see Table C).

The EIGEN_IMAGES extension provides the cube of the first principal components of the PSF library used for the PSF subtraction, truncated at the number of components actually used to reduce the data. This cube can be used to analytically compute the impact of the PSF subtraction process on an astrophysical source using forward modeling.
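To first order, the effect of a PCA-based subtraction on a source that is absent from the reference library is the removal of the source's projection onto the eigen-images that were used. The sketch below illustrates that projection under stated assumptions: images are flattened to plain lists over the reduction zone, the eigen-images are orthonormal, and only over-subtraction is modeled. The function name is hypothetical and this is not the ALICE code.

```python
def klip_forward_model(model, eigen_images):
    """Remove from `model` its projection onto each eigen-image.

    `model` and every eigen-image are flattened lists of pixel values
    over the reduction zone; the eigen-images are assumed orthonormal.
    """
    residual = list(model)
    for z in eigen_images:
        coeff = sum(m * zi for m, zi in zip(model, z))  # inner product
        residual = [r - coeff * zi for r, zi in zip(residual, z)]
    return residual

# Toy example with two orthonormal "eigen-images" of three pixels each.
eigen = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
processed = klip_forward_model([3.0, 4.0, 5.0], eigen)
print(processed)
```

Comparing the flux of the processed model with that of the input model gives an estimate of the throughput at the source position, which is the quantity not included in the delivered sensitivity maps.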

The REF_FILE_NAMES extension provides a table with the file-name of the NICMOS images composing the PSF library used to reduce the image. The table also provides the position of the star center in these reference images in the full NICMOS field of view, around which they have been aligned and cropped to pixels.

### 3.4 Preview Folder

Finally, we also deliver for each dataset a “Preview” folder that contains PDF and CSV files of the content of each extension of the main HCI-FITS file. The PDF files show images of each frame of the REDUCED_DATA extension (one version with the point source detections circled and one version without), of the first frame (North-combined frame) of the SNR_MAP extension, and of the DETECTION_LIMIT extension. The CSV files show all the data contained in each BINTABLE extension (DATA_INFORMATION, DETECTION_LIMIT, SOURCE_DETECTION).

## 4 Conclusion

We have presented the NICMOS data that we re-analyzed as part of the ALICE program. We deliver these science products to the community so that they may aid population studies through detection limits and substellar candidate identifications. We also presented version 2 of the HCI-FITS format, a standard format for high-contrast imaging science products that can be used with any type of high-contrast imaging dataset. We hope this effort will help gather consistent datasets throughout the community.

This project was made possible by the Mikulski Archive for Space Telescopes (MAST) at STScI, which is operated by AURA, Inc. for NASA under contract NAS5-26555. Support was provided by NASA through grants HST-AR-12652.01 (PI: R. Soummer), HST-GO-11136.09-A (PI: D. Golimowski), HST-GO-13855 (PI: E. Choquet), and HST-GO-13331 (PI: L. Pueyo), and by STScI Director’s Discretionary Research funds. E.C. acknowledges support from NASA through Hubble Fellowship grant HF2-51355 awarded by STScI. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology. This research has made use of the SIMBAD database, operated at CDS, Strasbourg, France, and of NASA’s Astrophysics Data System.

APPENDIX

## A The NICMOS coronagraphic archive

We report in Table A the comprehensive list of HST programs that used the coronagraphic mode of the NICMOS instrument for non-polarimetric observations. We also report the number of targets observed per program, as well as the number of these that have been re-calibrated as part of the LAPLACE program and re-analyzed as part of the ALICE program.

## B ALICE HCI-FITS file primary header

In the primary header of the HCI-FITS file, we selected keywords from the raw HST data and from the calibrated LAPLACE data that may be useful for a high-level science analysis of these data. We also added useful keywords specific to our work. The list of keywords present in the ALICE HCI-FITS file is presented in Table B.

## C Data image FITS file headers

In Table C we describe the header of the data image FITS file provided in the “Products” folder. Most of the keywords in the primary header come from the raw NICMOS FITS file primary header, and we only describe here the keywords that we modified or added. Similarly, the RAW_IMAGE extension header corresponds to the SCI extension header in the raw NICMOS file, and we describe here the added keywords.
