Color, 3D simulated images with shapelets
We present a method to simulate color, 3-dimensional images taken with a space-based observatory by building off of the established shapelets pipeline. The simulated galaxies exhibit complex morphologies, which are realistically correlated between, and include, known redshifts. The simulations are created using galaxies from the 4 optical and near-infrared bands (B, V, i and z) of the Hubble Ultra Deep Field (UDF) as a basis set to model morphologies and redshift. We include observational effects such as sky noise and pixelization and can add astronomical signals of interest such as weak gravitational lensing. The realism of the simulations is demonstrated by comparing their morphologies to the original UDF galaxies and by comparing their distribution of ellipticities as a function of redshift and magnitude to wider HST COSMOS data. These simulations have already been useful for calibrating multicolor image analysis techniques and for better optimizing the design of proposed space telescopes.
keywords:galaxies: fundamental parameters, statistics, methods: statistical, image simulations, gravitational lensing
, Corresponding author. , , , ,
As astronomical surveys become deeper and wider, analysis techniques correspondingly become more complex and demanding. To calibrate these methods, extensive work has already been invested in the simulation of monochromatic astronomical imaging. Simulation packages have been developed to incorporate a semi-analytic model of galaxy number counts and evolution EWBM (), or to mimic the properties of real observations Massey (2004). However, there are currently no packages able to create correlated images across several bands.
Multi-band image simulations are firstly useful to develop and calibrate analysis methods that use multicolor data. Many measurements in astronomy (for example photometry, astrometry and shape measurement) are “inverse problems,” where variation in a signal is easy to introduce but difficult to measure, usually due to complications involving observational seeing and noise. Simulated data provide the best way to calibrate such methods because these variables can be controlled. A known astronomical signal can be inserted into simulated data, and the accuracy of a method can be judged by examining any errors in its recovery.
One example is the measurement of weak gravitational lensing. In weak lensing, light from background galaxies is lensed by foreground matter distributions, causing a shear (distortion) of the background galaxies’ shapes. The distortion is easy to add during the construction of simulated data. Although the lensing signal is achromatic, color simulations can be used to test sophisticated measurement methods that take advantage of
the increased number of shear-measurable galaxies if some galaxies are only sufficiently bright in certain bands,
reduced noise on shear measurement (by ) if the intrinsic shapes of galaxies are uncorrelated between bands, and
reduced systematic bias on shear measurement if the intrinsic shapes of galaxies are correlated between bands JainJarvis ().
One common challenge in weak lensing measurement is the deconvolution of galaxy shapes from the instrumental point-spread function (PSF). Since the PSF is different in each band, PSF-dependent biases will be averaged out by looking at multiple bands. Conversely, biases inherent to a method will not be ameliorated. Developing multicolor analysis techniques to exploit these tricks requires multicolor simulations.
Multi-band image simulations are also useful to optimize the design and improve the science case for planned, multi-band imaging surveys such as SNAP Aldering () or Euclid DUNE (). These surveys require multiple bands in order to observe different types of galaxies, observe objects typically obscured in other bands, observe objects out to different redshifts, and most importantly to obtain photometric redshifts for galaxies. Engineering requirements for the design of these instruments can be derived via image simulations by measuring the (often complex and subtle) effects of engineering parameters on scientific return High (). Predictions for the scientific return of a given mission can be similarly estimated WLII (); WLIII ().
A full demonstration of the potential gains in multicolor shear measurement, or a full optimization of a future space-based lensing mission is beyond the scope of this paper. The purpose of this paper is to present a method for simulating deep, multi-color space-based images with correlated morphologies and redshifts. The simulation pipeline we present here will serve as a basis for performing the optimization of both shear measurement techniques and future space missions in future papers.
Our simulation pipeline generalizes the single-color method of Massey (2004), representing complex galaxy morphologies as “shapelets” Cartesian (); Polar (). Shapelet-based simulations are already widely used for weak lensing. The Shear TEsting Program (STEP) used similar simulated data to test and improve shape measurement and PSF correction methods Mass2006 (). Our generalization to multi-band, 3-dimensional simulations thus increases the realism and utility of a well established technique.
This paper is organized as follows. In §2 we give a brief review of shapelets and how they can be used to generate simulated images. In §3 we present the methodology by which we create multi-band, 3-dimensional simulations. In §4 we test the realism of our simulations through comparison to the real HST data. Lastly, in §5, we discuss the conclusions and summarize our findings.
Shapelets, or 2-dimensional Gaussian-weighted Laguerre polynomials, form a complete, orthonormal basis able to represent any localized image, including a galaxy shape, in a relatively small number of coefficients. Any image can be represented as a linear combination of shapelet basis functions :
where is the pixel position, are the shapelet coefficients, and is a characteristic size Polar (). Shapelets also simplify the practical processes of image convolution and deconvolution. In real space, convolution is an expensive process with computation time scaling with the square of the number of pixels. With shapelets, convolution becomes a computationally inexpensive matrix operation Cartesian (), and deconvolution merely requires a matrix inversion BR (). This is advantageous when it is necessary to deconvolve with a PSF and re-convolve with a different PSF.
Shapelet coefficients thus form a multi-dimensional parameter space that describes a galaxy. In general, any possible galaxy morphology can be thought of as a point in this multi-dimensional parameter space. When the shapelet coefficients of a set of observed galaxies are placed in this space, various correlations emerge. Different directions in parameter space correspond to characteristics of the galaxy such as size, ellipticity, or the number of spiral arms. A classic example of this effect is the Hubble tuning-fork diagram Hubble (); Sandage (); deV (), which parametrizes galaxies’ ellipsoid, bulge/disk ratio, and how tightly wound the spiral arms are. The shapelets method increases the dimensionality of the parametrization with axes corresponding to galaxies’ magnitude, size (), and polar shapelet coefficients Massey (2004).
Real galaxy morphologies only occupy a small region of this multi-dimensional parameter space. Most regions of parameter space, corresponding to random shapelet coefficients, do not produce an image that resembles a galaxy. To manufacture useful simulations, it is essential to map the region corresponding to morphologically realistic galaxies. This region will constitute a probability density function (PDF), from which we will be able to draw simulated galaxy images. To acquire the PDF, we begin with a sample of real galaxies. Of course, the PDF is only noisily sampled by this finite set of galaxies, so we smooth it to obtain an approximation to the true, underlying PDF.
In this section we present the methodology used to create the simulations. The process can be summarized in two steps: (1) shapelet catalog creation, and (2) image constitution.
3.1 Shapelet Catalog Creation
Since our goal is to simulate multicolor images, it is necessary to start from real, multicolor data. The best data for this are the Hubble Ultra-Deep Field (UDF) images. In this field, there are 8049 galaxies with a detection signal to noise ratio of at least 10 in any one band. Photometric redshifts for each galaxy are publicly available in the Coe et al photo-z () catalog.111Available for download at http://adcam.pha.jhu.edu/coe/UDF/.
We use the program shex from the Shapelets software package222Version 2.1, available at http://www.astro.caltech.edu/rjm/shapelets/. to decompose all of these galaxies, in all observed bands, into a linear combination of shapelet basis functions. We run shex up to a maximum radial oscillation value, or n_max in the basis functions, of 20 in order to optimize decomposition. Although this is computationally expensive (n_max = 20 corresponds to 231 coefficients), we are assured that the large objects are well modeled. This algorithm automatically copes with the varying pixel scale between optical and near-infrared imaging. To maximize the efficiency of the shapelet model, we iterate the center of each decomposition on the pixel grid, the maximum order , and the scale size of each decomposition independently in each band, using the algorithm discussed in Polar (). This number is recorded for later image reconstruction. We store catalogs of galaxy shapes in their raw form as well as deconvolved from the UDF PSF as modeled by the stars in the field.
To model -band imaging, we thus increase the dimensionality of the shapelet parameter space -fold. For example, while a bright object may be uniquely described in one band by 233 coefficients (including magnitude, size and redshift) if n_max=20, it is now described by 932 coefficients. Though in the UDF , our simulation software is not limited to this number and could accept an input catalog of galaxies with more bands in the future. The UDF galaxies in this highly dimensional parameter space automatically contain the correlations between shapelet coefficients necessary to produce realistic galaxy images.
We then smooth the finite number of points in shapelet parameter space, using an Epanechnikov kernel Epanech () with a different smoothing length, , for each parameter. Note that, if we choose to be too small, the galaxy appears nearly unchanged, and we shall simply reproduce UDF galaxies in the simulated images; if we choose it to be too large, the galaxy is not realistic. Following the established smoothing scheme explored by Massey (2004), we smooth the complex polar shapelet coefficients in modulus and phase space, setting for phases and the mean separation between nearest neighbors in that dimension for moduli. To perturb the galaxy redshifts slightly, we set the redshift smoothing length to be , where is a free parameter. We choose to smooth over since it is used more frequently in determining cosmological parameters. We also choose to be 6 as a reasonable limit to conservative smoothing. If is chosen to be lower than 6, the high redshift objects could get smoothed to an unrealistically high redshift.
3.2 Image Creation
For each simulated image, we generate a sufficient number of new galaxies that their density in the simulated image reproduces that in the UDF. In practice, rather than pixelating and drawing from the smoothed PDF, we use an equivalent Monte-Carlo bootstrap technique Massey (2004). For each new galaxy, an original UDF galaxy is selected at random and perturbed in shapelet space, within the smoothing kernel, to create a new galaxy. We also append a mock catalog of photometric redshifts to these new galaxies. These redshifts are slightly perturbed from the original galaxy’s observed redshift via the same smoothing process, wile their distribution still follows the observed distribution in the UDF.
At this stage, a known weak lensing shear signal can be added to the objects. Similar effects could also be added to simulate, e.g. proper motions, photometric variability, or supernovae. The galaxy is finally convolved in shapelet space with the desired point spread function.
Once new objects are created, they are formed into a multi-band shape catalog to be arranged into new images. The objects are first re-composed into pixelated postage stamps, then placed into large, empty arrays. The placement of objects is done such that the object appears at the same (RA, Dec) position in each band, in flux units of photons per second per pixel, and with (for the sake of this paper) a constant pixel scale of arcsec per pixel. Together with the mock photometric redshift catalog, a 3-dimensional, color simulation is thus created.
The images are made realistic by adding both a sky background and shot noise. Figure 1 shows an example image. The simulations could also be made more realistic by adding cosmic rays, variable sky background/read noise, or charge transfer inefficiency trailing. The background noise could also be smoothed with a small kernel to approximate the effects of the DRIZZLE routine on stacked data from multiple, dithered exposures Drizzle (). On the one hand, DRIZZLE allows for a sharper pixel scale and correction for geometric distortions. On the other hand it produces correlated pixel noise. We have not enabled these options in the standard images studied in this paper, but intend to explore their effects in future work.
We examine the morphological realism of our simulations by “blind-testing” them against the original UDF. We also create a “-function image,” whereby the smoothing length () for each coefficient is set to zero (ie, objects are not perturbed). This allows us to separate the morphological characteristics due to shapeletization and due to smoothing in shapelet space, the results of which are found in Table 1. Our tests are similar to those done in Massey (2004). We first consider general comparisons using photometry and size from SExtractor. We then examine more advanced morphological tests.
We initially test our perturbed simulations against the UDF with a size-magnitude plot by plotting fwhm vs. AB magnitude. The i-band plot is given as a representative sample, as shown in Figure 2. The distribution should be the same for the simulations as for the original UDF. We use SExtractor parameters fwhm_image and mag_best. We also test the simulations’ realism through a magnitude histogram, a histogram of ellipticity components and , and a histogram of fwhm. Since one can always increase the number of simulated galaxies in an image, the histograms are normalized to the same area to better observe the simulations’ realism. We define the ellipticity components and to be
where , , and are the SExtractor parameters a_image, b_image, and theta_image, namely the major and minor axes and the angle between the major axis and the horizontal. This convention is generally adopted in weak lensing.
The plots discussed above reveal a strong morphological agreement between the simulated images and the originals. There is a good representation of objects from AB magnitude 22 to 28 for all bands up to a normalization factor. The ellipticity histograms show a very strong representation of objects with and between . Beyond this, objects are not as well represented on account of the difficulty in representing highly elliptical objects as shapelets. Lastly, the fwhm histogram as well shows a strong agreement across all sizes.
A more demanding test is provided by the morphological classification parameters asymmetry (A), concentration (C), and clumpiness (S). These morphology characteristics have been developed in BJC (); CGW (); C (). The CAS parameters are defined in this work slightly differently than in C (). We define the asymmetry, concentration, and clumpiness to be
where is the flux intensity at a given pixel, is the intensity at the point around the origin, and are radii containing 80 and 20 of the flux respectively, and is the intensity after the image is smoothed with a Gaussian kernel of width . The definitions above do not include a correction for the background as the more typical versions do. This is noted, but the error should average to zero when many galaxies are used given that the noise characteristics are the same for both the UDF and the simulated images. We use a smoothing width for to be 5 pixels. We also use the Petrosian Radius (R), defined to be the radius where the surface brightness at that radius is equal to 20 of the surface brightness integrated within that radius Petrosian (). Massey et al (2004) demonstrated that the monochromatic shapelet image simulations are consistent with real data by plotting A vs. C, A vs. S, and R vs. C for the simulations and for the real data. Another test is to check the mean and RMS values for A,C, and S for the simulations against the original UDF. It was found that the simulations relative to the Hubble Deep Field (HDF) demonstrated a roughly equal concentration while showing a lower asymmetry and clumpiness Polar (). Though the objects in the original simulations from the HDF have this discrepancy, it is relatively small, and it is concluded that the HDF simulations are realistic Massey (2004). We demonstrate a similar recovery here.
Examining the CAS plots, we see generally a strong agreement between the simulated images and the originals. One will notice the spread in Petrosian Radius for the simulated images. This can be explained by the shapeletization of objects. We have chosen to optimize the shapelet decomposition of objects to completely model the wings of galaxies, but at the expense of sometimes truncating their central cusps. This causes an increase in the Petrosian Radius, but should not present a large problem to methods such as weak lensing since the shearing (and then PSF smearing) happens later; this is just a small change in the intrinsic shape of the individual object which varies far more than the shear signal anyway.
In computing our results, we rejected any major outliers in the simulated images, for there were, on occasion, hugely asymmetric objects with unrealistically high concentration and clumpiness indices. These objects were very large galaxies that had not properly decomposed into shapelets. These objects were flagged and not included in subsequent simulations. We also rejected any asymmetry, clumpiness, or concentration measurements in any band where a galaxy was so faint that the routines failed to converge. We also set the limiting magnitude for these statistics to be 28 as computed by SExtractor as a balance between believable measurements and sufficiently deep galaxies. The results from the i-band simulated images are presented as a representative sample in Figures 2 through 4. A summary of results for all bands is presented in Table 1. The overall agreement is quite good with very little deviation from the original UDF.
As we shall use these simulations for weak lensing, of particular interest is the intrinsic ellipticity variance (), as a function of magnitude and redshift for shear-measurable galaxies. We include in Figure 5. We calculate the shear of a galaxy with the weak lensing measurement method of Rhodes, Refregier, and Groth (hereafter RRG) RRG (). This is a mature method developed specifically for space-based weak lensing measurements and with thorough testing during analysis of many Hubble Space Telescope images: including the Groth Strip gs (), the Medium Deep Survey mds (), the STIS Parallel Survey stis_ps (), and the COSMOS 2 Square Degree Survey cosmos (). This method was also used for testing during the development of the monochromatic shapelets image simulation pipeline described in Massey (2004). We run the RRG pipeline on the simulated images exactly as we have run it on real HST data and make similar cuts on objects in order to obtain the most representative results possible.
We define a shear-measurable galaxy to be one that passes several cuts. We first discard faint galaxies with S/N less than 10, where the S/N is defined as the the ratio of the SExtractor parameters flux_auto to fluxerr_auto. We also remove galaxies with ellipticity , after correction for the PSF cut (). Note that, in the presence of image noise, especially during the PSF correction stage, it is possible for a moment-based shape measurement method to produce a non-physical ellipticity . This ellipticity cut also implicitly removes galaxies for which an iterative centroiding process in RRG failed to converge. This includes objects for which there was a large shift away from the initial position detected by SExtractor. They are usually blended objects or close pairs, for which an accurate shape measurement would be impossible anyway. We finally discard objects with size (where and are the weighted second order moments) smaller than 1.2 times that of the PSF. It is important that the cuts we make on the galaxies useful for lensing be as realistic as possible. Given the long history of the RRG method in space-based weak lensing measurements we feel that running the RRG pipeline on the simulated images is the best way to make these cuts realistic.
In Fig. 5, we compare the RMS shear for our simulations with real data from COSMOS Alexie (). The high agreement within our sample error is indicative of the simulations’ realism. Running more simulations lowers the statistical error bars, but we are limited by sample variance due to the limited number of galaxies in the UDF. The high error bars seen in the figure reflect this uncertainty.
|Filter||Image||A||rms A||C||rms C||S||rms S||rms e|
We presented a method to create an arbitrary amount of 3-dimensional, color, simulated, unique deep space images. The simulations are created by perturbing a galaxy’s polar shapelet coefficients in such a way as to create unique but realistic objects. The previous simulation pipeline has been expanded to correlate morphologies across four wavelength bands and now also include a redshift distribution. Though we currently use four wavelength bands, our simulation pipeline is flexible enough to include an arbitrary number of colors, should such a data set become available and useful.
Our simulations were tested by comparing them to the original UDF images. They were found to have similar morphologies to the original galaxies. Additionally, the weak lensing cosmological properties of the simulations were tested against COSMOS HST data. Within reasonable error, our simulations were found to be consistent.
The authors thank the Caltech SURF program, the Berkeley Physics Undergraduate Research Scholars program, and the Jet Propulsion Laboratory, run under a contract with NASA by Caltech, for their support. We would also like to thank Richard Ellis for further support and enthusiasm regarding the project and Alexandre Refregier for his ideas and enthusiasm. The project would not have gotten started on the right path were it not for Will High’s help. Thanks also to Peter Capak for useful insights in simultaneous detection with SExtractor. Dave Johnston provided the PSF for SNAP and was helpful in calibrating the measurements of . Thanks also to Alexie Leauthaud for help in shear measurement. JR and MF were supported in part by NASA grant BEFS-399131.02.02.01.07.
- (1) Erben T., van Waerbeke L., Bertin E., Mellier Y. & Schneider P. 2001, A&A, 366, 717
- (2) Jarvis M. and Jain B. 2008, JCAP, 01, 003
- Massey (2004) Massey R., Refregier A., Conselice C., Bacon D., 2004 MNRAS 349, 214
- (4) Aldering, SNAP Collaboration SPIE, 4835
- (5) Refregier A. et al, SPIE, 6265
- (6) High F., Ellis R., Massey R., Rhodes J., Lamoureux J., SNAP Collaboration 2004AAS, 205.6502H
- (7) Massey R., Rhodes J., Refregier A., Albert J., Bacon D., Bernstein G., Ellis R., Jain B., McKay T., Perlmutter S., Taylor A., 2004, AJ, 127, 3089
- (8) Refregier A., et al. 2004, AJ, 127, 3102
- (9) Refregier A. 2003 MNRAS, 338, 35
- (10) Massey R., & Refregier A. 2005 MNRAS, 363, 197
- (11) Massey et al, 2007, MNRAS 376, 13
- (12) Refregier A. & Bacon D., 2003 MNRAS, 338, 48
- (13) Hubble E. 1926, ApJ, 64, 321
- (14) Sandage A. 1961, “The Hubble Atlas of Galaxies” (Carnegie Inst. Washington Publ. 618) (Washington: Carnegie Inst.)
- (15) de Vaucouleurs G. 1959, Hand. Physik, 53, 275
- (16) Szalay A., Connolly A., & Szokoly G. 1999, AJ, 117, 68
- (17) Bertin E. & Arnouts S. 1996, A&AS, 117, 393
- (18) Coe D., Benítez N., Sánchez S., Jee M., Bouwens R., Ford H. 2006, AJ, 132, 926
- (19) Refregier A. & Bacon D. 2003, MNRAS, 338, 48
- (20) Epanechnikov, V. 1969, “Nonparametric Estimation of a Multi-variate Probability Density,” Theory of Probability and Its Applications, 14, 153
- (21) Silverman B. 1986, “Density Estimation for Statistics” and Data Analysis (Chapman and Hall, London)
- (22) Fruchter A. & Hook R. 2002, PASP, 114, 144
- (23) Bershady M., Jangren A. & Conselice C. 2000, AJ, 119, 2645
- (24) Conselice C., Gallagher J. & Wyse R. 2002, AJ, 123, 2246
- (25) Conselice C. 2003, ApJS, 147, 1
- (26) Petrosian V. 1976, ApJ, 209, 1L
- (27) Rhodes J., Refregier A., & Groth E. 2000, ApJ, 536, 79
- (28) Rhodes J., Refregier A., Groth E. 2001, ApJ, 552L, 85
- (29) Refregier A., Rhodes J., Groth E. 2002, ApJ, 572L, 131
- (30) Rhodes J., Refregier A., Collins N., Gardner J., Groth E., Hill R. 2004, ApJ, 605, 29
- (31) Massey R., et al 2007, ApJS 172, 239
- (32) Rhodes J., Refregier A., Groth E. 2001, ApJ, 552L, 85
- (33) Hoekstra H., Mellier Y., van Waerbeke L., Sembolini E., Fu L., Hudson M., Parker L., Tereno I., Benabed K. 2006, ApJ, 647, 116.
- (34) Rhodes J., Refregier A., Massey R., Albert J., Bacon D., Bernstein G., Ellis R., Lampton M., Kim A., McKay T., Perlmutter S., SNAP collaboration 2004, Astropart. Phys. 20, 377
- (35) Leauthaud A., et al 2007, ApJS 172, 219