# GOODS–Herschel: an infrared main sequence for star-forming galaxies¹

¹ Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.

###### Key words.
Galaxies: evolution – Galaxies: active – Galaxies: starburst – Infrared: galaxies

We present the deepest 100 to 500 μm far-infrared observations obtained with the Herschel Space Observatory as part of the GOODS–Herschel key program, and examine the infrared (IR) 3–500 μm spectral energy distributions (SEDs) of galaxies at $0<z<2.5$, supplemented by a local reference sample from IRAS, ISO, Spitzer and AKARI data. We determine the projected star formation densities of local galaxies from their radio and mid-IR continuum sizes.

We find that the ratio of total IR luminosity to rest-frame 8 μm luminosity, IR8 ($\equiv L_{\rm IR}^{\rm tot}/L_8$), follows a Gaussian distribution centered on IR8 = 4 ($\sigma$ = 1.6) and defines an IR main sequence for star-forming galaxies that is independent of redshift and luminosity. Outliers from this main sequence form a tail skewed toward higher values of IR8. This minority population (<20 %) is shown to consist of starbursts with compact projected star formation densities. IR8 can be used to separate galaxies with normal, extended modes of star formation from compact starbursts with high IR8, high projected IR surface brightness ($\Sigma_{\rm IR} > 3\times10^{10}$ L⊙ kpc$^{-2}$) and a high specific star formation rate (i.e., starbursts). The rest-frame UV (2700 Å) size of these distant starbursts is typically half that of main sequence galaxies, supporting the correlation between star formation density and starburst activity that is measured for the local sample.
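IR8 is a simple ratio, so classifying a galaxy against the main sequence takes only a few lines. The sketch below uses the Gaussian centre (4) and width (1.6) quoted above. The helper names and the `n_sigma = 2` cut are illustrative assumptions; they are not the starburst threshold adopted in the paper.

```python
# Main-sequence parameters quoted in the text: Gaussian with centre 4 and sigma 1.6.
IR8_MS_CENTRE = 4.0
IR8_MS_SIGMA = 1.6


def ir8(l_ir_tot: float, l_8: float) -> float:
    """Bolometric correction factor IR8 = L_IR^tot / L_8 (same units for both)."""
    if l_ir_tot <= 0 or l_8 <= 0:
        raise ValueError("luminosities must be positive")
    return l_ir_tot / l_8


def ir8_mode(l_ir_tot: float, l_8: float, n_sigma: float = 2.0) -> str:
    """Label a galaxy by how far its IR8 lies above the main-sequence centre.

    The n_sigma cut is a hypothetical choice made for illustration only.
    """
    threshold = IR8_MS_CENTRE + n_sigma * IR8_MS_SIGMA
    return "starburst" if ir8(l_ir_tot, l_8) > threshold else "main sequence"


# L_IR^tot = 1e11 L_sun and L_8 = 2.5e10 L_sun give IR8 = 4 (main-sequence centre).
print(ir8(1e11, 2.5e10), ir8_mode(1e11, 2.5e10))
# IR8 = 12 lies above the illustrative 4 + 2 * 1.6 = 7.2 cut.
print(ir8(1.2e12, 1e11), ir8_mode(1.2e12, 1e11))
```

Because the cut is expressed in units of the main-sequence width, you can explore other choices of `n_sigma` without editing the function.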

Locally, luminous and ultraluminous IR galaxies, (U)LIRGs ($L_{\rm IR}^{\rm tot} \geq 10^{11}$ L⊙), are systematically in the starburst mode, whereas most distant (U)LIRGs form stars in the “normal” main sequence mode. This confusion between two modes of star formation is the cause of the so-called “mid-IR excess” population of galaxies found at $z>1.5$ by previous studies. Main sequence galaxies have strong polycyclic aromatic hydrocarbon (PAH) emission line features, a broad far-IR bump resulting from a combination of dust temperatures ($T_{\rm dust} \sim$ 15–50 K), and an effective $T_{\rm dust} \sim 31$ K, as derived from the peak wavelength of their infrared SED. Galaxies in the starburst regime instead exhibit weak PAH equivalent widths and a sharper far-IR bump with an effective $T_{\rm dust} \sim 40$ K. Finally, we present evidence that the mid-to-far IR emission of X-ray active galactic nuclei (AGN) is predominantly produced by star formation, and that candidate dusty AGNs with power-law emission in the mid-IR systematically occur in compact, dusty starbursts. After correcting for the effect of starbursts on IR8, we identify new candidates for extremely obscured AGNs.

## 1 Introduction

It is now well established that ∼85 % of the baryon mass contained in present-day stars formed at $0<z<2.5$ (see, e.g., Marchesini et al. 2009 and references therein), and that most of the energy radiated during this epoch by newly formed stars was heavily obscured by dust. To understand how present-day galaxies were made, it is therefore imperative to determine accurately the bolometric output of dust, hence the total IR luminosity, $L_{\rm IR}^{\rm tot}$, integrated from 8 to 1000 μm. In the past, this key information on the actual star formation rate (SFR) of distant galaxies was obtained by extrapolating observations in the mid-IR and sub-millimeter (sub-mm), or by correcting their UV luminosities for extinction. These extrapolations implied that the number density per unit comoving volume of luminous IR galaxies (LIRGs, $10^{11} \leq L_{\rm IR}^{\rm tot}/{\rm L}_\odot < 10^{12}$) was ∼70 times larger at $z\sim1$, i.e., 8 Gyr ago, when LIRGs were responsible for most of the cosmic SFR density per unit comoving volume (see, e.g., Chary & Elbaz 2001 – hereafter CE01, Le Floc'h et al. 2005, Magnelli et al. 2009). Further back in time, at $z\sim2$, sub-mm and Spitzer observations revealed that the contribution to the cosmic SFR density of even more active objects, the ultraluminous IR galaxies (ULIRGs, $L_{\rm IR}^{\rm tot} \geq 10^{12}$ L⊙), was as important as that of LIRGs (Chapman et al. 2005, Papovich et al. 2007, Caputi et al. 2007, Daddi et al. 2007a, Magnelli et al. 2009, 2011). However, none of these studies used rest-frame far-IR measurements of individual galaxies at the wavelengths where the IR spectral energy distribution (SED) of star-forming galaxies is known to peak. At best, they relied on stacking far-IR data for individually undetected sources.

With the launch of the Herschel Space Observatory (Pilbratt et al. 2010), it has now become possible to measure the total IR luminosity of distant galaxies directly. Using shallower Herschel data than the present study, Elbaz et al. (2010) showed that extrapolations of $L_{\rm IR}^{\rm tot}$ from the mid-IR (24 μm passband) were correct below $z\sim1.3$, with an uncertainty of only ∼0.15 dex. These extrapolations assumed that the IR SEDs of star-forming galaxies remained the same at all epochs. However, extending this assumption to (U)LIRGs at $z>1.3$, which largely relied on stacking, typically failed by a factor of 3–5 (Elbaz et al. 2010, Nordon et al. 2010). This finding confirmed the earlier discovery of a so-called “mid-IR excess” population of galaxies (Daddi et al. 2007a, Papovich et al. 2007, Magnelli et al. 2011). In this population, the rest-frame 8 μm emission of $z\sim2$ (U)LIRGs was excessively strong compared with the IR SEDs of local galaxies of equivalent luminosity. This was the case when $L_{\rm IR}^{\rm tot}$ was derived from the radio continuum at 1.4 GHz, from stacked Spitzer-MIPS 70 μm measurements, or from the UV luminosity corrected for extinction.

Various causes have been invoked to explain this “mid-IR excess” population:

- (i) an evolution of the IR SEDs of galaxies;
- (ii) the presence of an active galactic nucleus (AGN) heating dust to temperatures of a few 100 K; or
- (iii) limitations in local libraries of template SEDs, i.e., the $k$-correction effect on distant galaxies probing regimes where the SEDs were not accurately calibrated.

Evidence for an important role of obscured AGN in explaining these discrepancies (point (ii)) came from stacking Chandra X-ray images at the positions of the most luminous $z\sim2$ BzK galaxies (Daddi et al. 2007a). The most luminous of these distant galaxies were detected in both the soft (0.5–2 keV) and hard (2–7 keV) X-ray channels of Chandra. They exhibited a flux ratio typical of heavily obscured ($N_{\rm H} > 10^{23}$ cm$^{-2}$) or even Compton-thick AGN ($N_{\rm H} > 10^{24}$ cm$^{-2}$). Surprisingly, however, a high fraction of the same objects were found to possess intense, broad polycyclic aromatic hydrocarbon (PAH; Léger & Puget 1984, Puget & Léger 1989, Allamandola et al. 1989) line features when observed with the Spitzer IR spectrograph (IRS). The equivalent widths of these features strongly dominate over the hot to warm dust continuum (Rigby et al. 2008, Farrah et al. 2008, Murphy et al. 2009, Fadda et al. 2010, Takagi et al. 2010). Deeper Chandra observations have since shown that only ∼25 % of the $z\sim2$ BzK-selected mid-IR excess galaxies host heavily obscured AGN. The rest are relatively unobscured AGN and star-forming galaxies (Alexander et al. 2011). This result instead favors points (i) or (iii) above.

In this paper, we present the deepest 100 to 500 μm far-IR observations obtained with the Herschel Space Observatory as part of the GOODS–Herschel Open Time Key Program, using the PACS (Poglitsch et al. 2010) and SPIRE (Griffin et al. 2010) instruments. Herschel's unique ability to determine the bolometric output of star-forming galaxies lets us examine the incorrect extrapolations of $L_{\rm IR}^{\rm tot}$ from 24 μm observations at $z>1.5$. We demonstrate that these extrapolations, and the associated claim of a “mid-IR excess” population, indicate neither a drastic evolution of infrared SEDs nor the ubiquity of warm, AGN-heated dust dominating the mid-IR emission. Instead, we show that the 8 μm bolometric correction factor, IR8 ($\equiv L_{\rm IR}^{\rm tot}/L_8$), is universal in the range $0<z<2.5$ and hence defines an IR “main sequence” (MS). We show that past incorrect extrapolations arose from confusing galaxies with extended star formation with those undergoing compact starbursts, which exhibit notably different infrared SEDs.

We present evidence that this IR main sequence is directly related to the redshift-dependent SFR–M* relation (Noeske et al. 2007, Elbaz et al. 2007, Daddi et al. 2007a, 2009, Pannella et al. 2009, Magdis et al. 2010a, Gonzalez et al. 2011). It can also separate galaxies experiencing a “normal” mode of extended star formation from starbursts with compact projected star formation densities. This distinction between a majority of “main sequence” (MS) galaxies and a minority of compact “starbursts” (SB) is analogous to the recent finding of two regimes of star formation in the Schmidt-Kennicutt (SK) law. In that picture, MS galaxies follow the classical SK relation, while the SFR of SB galaxies is an order of magnitude greater than expected from their projected gas surface density (Daddi et al. 2010, Genzel et al. 2010). To separate these two star-formation modes, we supplement the GOODS–Herschel observations of distant galaxies with a reference sample of local galaxies, using a compilation of data from IRAS, AKARI, Spitzer, SDSS and radio observations.

The GOODS–Herschel observations and catalogs are presented in Section 2. The main limitation of the Herschel catalogs, namely the confusion limit, is discussed in Sect. 2.3, together with a “clean index” that identifies sources with robust photometry. The high- and low-redshift galaxy samples are introduced in Section 3, along with the methods used to compute total IR luminosities, stellar masses and photometric redshifts. The IR main sequence is presented in Section 4. There we address the so-called “mid-IR excess problem” and propose a solution based on the bolometric correction factor IR8. This parameter relies on the same rest-frame wavelengths regardless of galaxy redshift, and we use it to separate star-forming galaxies into two modes: a main sequence mode and a starburst mode. In the following sections, we show that IR8 correlates closely with the IR surface brightness, and hence with the projected star formation density. It also correlates with the starburst intensity, which we quantify with a parameter named “starburstiness”. We show this for local (Section 5) and distant (Section 6) galaxies. Galaxies with enhanced IR8 values are shown to be undergoing a compact starburst phase. In Section 7, we use the universality of IR8 among main sequence star-forming galaxies to produce a prototypical IR SED for galaxies in the main sequence mode of star formation. We combine Spitzer and Herschel photometry in many passbands for distant galaxies to derive composite SEDs for both main sequence and starburst galaxies. Finally, Section 8 discusses galaxies with an AGN signature and presents a technique to identify obscured AGN candidates that previous methods would miss.

Throughout this paper, we use a cosmology with $H_0$ = 70 km s$^{-1}$ Mpc$^{-1}$, $\Omega_M$ = 0.3 and $\Omega_\Lambda$ = 0.7. We assume a Salpeter initial mass function (IMF; Salpeter 1955) when deriving SFRs and stellar masses.

## 2 GOODS–Herschel data and catalogs

### 2.1 Observations

The sample of high-redshift galaxies analyzed here consists of galaxies observed in the two Great Observatories Origins Deep Survey (GOODS) fields, in the northern and southern hemispheres. Herschel Space Observatory observations were obtained as part of the open time key program GOODS–Herschel (PI: D. Elbaz), for a total time of 361.3 hours. PACS observations at 100 and 160 μm cover the whole 10′×16′ GOODS–North field and part of GOODS–South, i.e., 10′×10′, with the largest depths reached over 64 arcmin². The total observing times were 124 hours in GOODS–N and 206.3 hours in GOODS–S, including 2.6 and 5 hours of overheads, respectively. The PACS GOODS–Herschel observations therefore reach a total integration time per sky position of 2.4 hours in GOODS–N and 15.1 hours in GOODS–S, i.e., 6.3 times longer. Owing to the larger beam size and the observing configuration, the SPIRE observations of GOODS–N cover a field of 900 arcmin², largely encompassing the central 10′×16′. They have a total observing time of 31.1 hours and an integration time per sky position of 16.8 hours.

Fig. 1 shows a montage of images (each 5′×5′) from Spitzer–IRAC at 3.6 μm to SPIRE at 500 μm. The montage illustrates the impact of the beam size increasing with wavelength: the number of clearly visible sources grows from the longest to the shortest wavelengths. The exception is the 70 μm image, which comes from Spitzer rather than Herschel. Composite three-color images of GOODS–N at 100–160–250 μm and GOODS–S at 24–100–160 μm are shown in Figs. 2 and 3.

### 2.2 Catalogs

#### Source extraction

Flux densities and their associated uncertainties were obtained from point source fitting using 24 μm prior positions. For the longest-wavelength SPIRE passbands (i.e., 350 and 500 μm), the 24 μm priors are far too numerous and would over-deblend the actual sources. We therefore defined priors as follows:

- For PACS 100 μm, we used MIPS 24 μm priors down to the 3σ limit, with a minimum flux density of 20 μJy.
- For PACS 160 μm and SPIRE 250 μm, we restricted the 24 μm priors to the 5σ (30 μJy) limit, which reduces the number of priors by about 35 %.
- For SPIRE 350 and 500 μm, we kept only the 24 μm priors of sources detected with S/N > 2 at 250 μm.

These criteria were chosen from Monte Carlo simulations (see Sect. 2.2.2). They avoid using so many priors that flux densities are artificially subdivided, while still producing residual maps with no obvious remaining sources after PSF-subtracting the sources brighter than the detection limit.
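The band-dependent prior selection can be written as one filter over a 24 μm catalog. The minimal sketch below assumes a hypothetical record layout (`Prior24`) and band labels of our own choosing. The numerical cuts are the ones quoted in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prior24:
    """A MIPS 24 um source considered as a prior (hypothetical record layout)."""
    ident: str
    f24_ujy: float                  # 24 um flux density in microJy
    snr24: float                    # signal-to-noise ratio at 24 um
    snr250: Optional[float] = None  # S/N at SPIRE 250 um, if the source was fitted there


def select_priors(sources: List[Prior24], band: str) -> List[Prior24]:
    """Return the 24 um priors used for PSF fitting in a given Herschel band."""
    if band == "pacs100":
        # 3-sigma 24 um sources, with a 20 microJy floor
        return [s for s in sources if s.snr24 >= 3 and s.f24_ujy >= 20]
    if band in ("pacs160", "spire250"):
        # only the 5-sigma (30 microJy) 24 um sources
        return [s for s in sources if s.snr24 >= 5 and s.f24_ujy >= 30]
    if band in ("spire350", "spire500"):
        # only priors that reached S/N > 2 in the 250 um fit
        return [s for s in sources if s.snr250 is not None and s.snr250 > 2]
    raise ValueError(f"unknown band: {band}")


# Toy catalog; "d" is below the 3-sigma/20 microJy cut and is never used.
catalog = [
    Prior24("a", 25.0, 3.6),
    Prior24("b", 45.0, 7.0, snr250=4.1),
    Prior24("c", 35.0, 5.5, snr250=1.5),
    Prior24("d", 15.0, 2.4),
]
for band in ("pacs100", "spire250", "spire500"):
    print(band, [s.ident for s in select_priors(catalog, band)])
```

The SPIRE 350/500 μm lists are nested inside the 250 μm list, because only sources fitted at 250 μm carry an `snr250` value.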

Fig. 4 can be used to assess how reliably the Spitzer-MIPS 24 μm images of the two GOODS fields identify potential blending issues in the Herschel data. It shows that the 20 μJy (3σ) depth at 24 μm reaches fainter sources than any of the Herschel bands, down to the confusion level and up to a redshift of 3. The technique used to estimate total IR luminosities for the Herschel sources is discussed in Sect. 3.2. The positions of 24 μm sources can therefore be used to perform robust PSF-fitting source detection and flux measurements on the Herschel maps. We validated this technique in two ways. First, we checked that no sources remain in the residual images after subtracting the detected sources. Second, we extracted sources independently with a blind source extraction technique (Starfinder: Diolaiti et al. 2000). Although most Herschel sources have a 24 μm counterpart, we found a few 24 μm-dropout galaxies, i.e., galaxies detected by Herschel but not at 24 μm. These objects represent less than 1 % of the Herschel sources and will be the subject of a companion paper (Magdis et al. 2011).

#### Limiting depths of the catalogs and flux uncertainties

The noise in the Herschel catalogs results from the combined effects of three components:

1. instrumental and photon noise;
2. background fluctuations due to sources below the detection threshold (photometric confusion noise; see Dole et al. 2004);
3. blending with neighboring sources above the detection threshold (the source-density contribution to the confusion noise).

In both the PACS and SPIRE images (except at 100 μm in GOODS–N), the depths of the GOODS–Herschel observations are limited by confusion, i.e., (2) and (3) always dominate over (1). Global confusion limits have been determined for PACS (Berta et al. 2011) and SPIRE (Nguyen et al. 2010). However, these global definitions assume no a priori knowledge of the local projected density of sources. This is as if, e.g., 500 μm sources were distributed independently of sources at shorter wavelengths such as 250 and 350 μm, or even 24 μm. Moreover, the flux limit associated with source blending, (3), is often artificially set to the flux density above which 10 % of the sources are blended. Statistical studies such as the present one can afford higher fractions, as long as the photometric uncertainty is well controlled. Actual observations demonstrate that shorter wavelengths do provide a good proxy for the density field at longer wavelengths (see Fig. 1). We therefore use the positions of 24 μm sources as priors for PSF-fitting extraction. We define the 3σ (or 5σ) sensitivity limits of the GOODS–Herschel catalogs, on the basis of Monte Carlo simulations, as the flux densities above which at least 68 % of the sources can be extracted with a photometric accuracy better than 33 % (or 20 %). Individual sources are assigned a “clean” flag depending on the underlying density field, as defined in Sect. 2.3.

Flux uncertainties were derived in two independent ways:

- (i) We added artificial sources to the real Herschel images and applied the source extraction procedure, repeating the process a large number of times (Monte Carlo, MC, simulations).
- (ii) We measured the local noise level at the position of each source on the residual images, which were produced by subtracting sources detected above the detection threshold.

The first technique gives a noise level for a given flux density, averaged over the whole map. The second provides a local noise estimate. In the MC simulations, we define the 3σ (or 5σ) sensitivity limit in each band as the flux density above which a photometric accuracy better than 33 % (or 20 %) is achieved for at least 68 % of the sources in the faintest flux density bin (as in Magnelli et al. 2009, 2011).
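The MC definition of the sensitivity limit can be made concrete as follows. Bin the injected flux densities, compute in each bin the fraction of sources recovered with a relative error below the accuracy threshold, and find the faintest bin above which every bin passes. The sketch below is hypothetical. Its scanning order and binning are our assumptions, and the synthetic "simulation" (constant Gaussian noise of 3 flux units) only stands in for real Herschel injections.

```python
import numpy as np


def mc_sensitivity_limit(f_in, f_out, bin_edges, accuracy=0.33, min_fraction=0.68):
    """Lower edge of the faintest bin above which all bins meet the MC criterion.

    Criterion: at least `min_fraction` of injected sources are recovered with
    |f_out - f_in| / f_in < `accuracy` (0.33 for 3 sigma, 0.20 for 5 sigma).
    Bins are scanned from bright to faint; the scan stops at the first failure.
    """
    f_in = np.asarray(f_in, dtype=float)
    f_out = np.asarray(f_out, dtype=float)
    rel_err = np.abs(f_out - f_in) / f_in
    edges = np.asarray(bin_edges, dtype=float)
    limit = None
    for lo, hi in reversed(list(zip(edges[:-1], edges[1:]))):
        in_bin = (f_in >= lo) & (f_in < hi)
        if not in_bin.any():
            continue
        if np.mean(rel_err[in_bin] < accuracy) < min_fraction:
            break
        limit = float(lo)
    return limit


# Synthetic injections: uniform input fluxes, constant noise of 3 units (assumption).
rng = np.random.default_rng(42)
f_in = rng.uniform(1.0, 100.0, size=200_000)
f_out = f_in + rng.normal(0.0, 3.0, size=f_in.size)
edges = np.arange(1.0, 101.0, 1.0)
limit_3sigma = mc_sensitivity_limit(f_in, f_out, edges, accuracy=0.33)
limit_5sigma = mc_sensitivity_limit(f_in, f_out, edges, accuracy=0.20)
print(limit_3sigma, limit_5sigma)
```

For constant noise $\sigma_{\rm n}$, the 68 % criterion is met roughly where $0.33\,f \approx \sigma_{\rm n}$, so the 3σ limit here should come out near 9 flux units. The stricter 20 % criterion pushes the 5σ limit to about 15.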

Technique (i) provides a statistical noise level for a given flux density. This level accounts for all three noise components but is independent of local variations of the noise. The histogram of the output minus input flux densities from the MC simulations follows a Gaussian shape, whose σ was used to define the typical limiting depths of the Herschel catalogs listed in Col. (3) of Table 1. All GOODS–Herschel images (except the PACS 100 μm image in GOODS–N) reach the 3σ confusion level. That is, the flux density at which the photometric accuracy is better than 33 % for at least 68 % of the sources is more than three times the instrumental noise level.

Technique (ii) accounts only for noise components (1) and (2), because the objects contributing to the third component (source blending) were subtracted to produce the residual images. However, imperfect subtraction of sources due to local blending may inflate the residuals in the maps after source subtraction. In the PACS images and catalogs, both techniques give very similar noise levels. We computed a statistical limiting depth by convolving the residual images with the PACS beam at each wavelength and measuring the σ of the distribution of individual pixels. This method yielded the same depths as technique (i), listed in Col. (3) of Table 1. For the SPIRE data, by contrast, local noise estimates in the residual maps were systematically lower than those measured with technique (i). On average, sources whose SPIRE flux density corresponds to the 3σ detection threshold in the MC simulations have a local signal-to-noise ratio of 5 in the residual maps. For SPIRE sources, we therefore keep only sources above the 5σ limit in the residual maps, to be consistent with the 3σ limit from the MC simulations.
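Measuring a point-source depth from a residual map amounts to filtering the map with the beam and taking the width of the filtered pixel distribution. The text does not specify the beam model or the normalization, so the sketch below makes several assumptions:

- a circular Gaussian beam;
- a map in Jy/beam, treated as periodic;
- a kernel normalized as $P/\sum P^2$ (so that each filtered pixel is a least-squares point-source flux);
- a σ estimated robustly from the median absolute deviation.

It uses only NumPy FFTs.

```python
import numpy as np


def beam_smoothed_noise(residual, fwhm_pix):
    """Point-source noise of a residual map after filtering with a Gaussian beam.

    Assumptions (not from the paper): circular Gaussian beam of FWHM `fwhm_pix`
    pixels, periodic boundaries, kernel P / sum(P**2) with P peak-normalised,
    and a MAD-based robust sigma of the filtered pixels.
    """
    ny, nx = residual.shape
    sigma_beam = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    dy = np.fft.fftfreq(ny) * ny  # signed pixel offsets with zero at index 0
    dx = np.fft.fftfreq(nx) * nx
    yy, xx = np.meshgrid(dy, dx, indexing="ij")
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_beam**2))  # peak value 1
    kernel = psf / np.sum(psf**2)
    # circular convolution through the FFT (the kernel is symmetric)
    filtered = np.fft.ifft2(np.fft.fft2(residual) * np.fft.fft2(kernel)).real
    mad = np.median(np.abs(filtered - np.median(filtered)))
    return 1.4826 * mad


# White-noise test map with unit per-pixel sigma and a 3-pixel FWHM beam.
rng = np.random.default_rng(1)
white = rng.normal(0.0, 1.0, size=(256, 256))
noise = beam_smoothed_noise(white, fwhm_pix=3.0)
sigma_b = 3.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
expected = 1.0 / np.sqrt(np.pi * sigma_b**2)  # 1 / sqrt(sum P^2) for a sampled Gaussian
print(noise, expected)
```

For uncorrelated pixel noise, this normalization predicts a filtered σ of $\sigma_{\rm pix}/\sqrt{\sum P^2}$. That prediction makes white noise a convenient check before applying the function to a real residual map.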

Due to local noise variations in the maps, there can be small numbers of sources with flux densities slightly fainter than the nominal detection limits, which explains the presence of sources below the horizontal lines in Fig. 5.

### 2.3 Local confusion limit and “clean index”

The main source of uncertainty, particularly in the SPIRE images, is the high source density relative to the beam size, i.e., the so-called confusion limit (see Condon 1974). Assuming that this limit applies equally at all positions on the sky, Nguyen et al. (2010) estimated that SPIRE sources cannot be extracted below a floor of ∼30 mJy. This floor corresponds to 5σ confusion limits of 29, 31 and 34 mJy/beam for beams of 18.1″, 24.9″ and 36.6″ FWHM at 250, 350 and 500 μm, respectively.

However, this “global confusion limit” is defined assuming no a priori knowledge of the projected density map of the underlying galaxy population. If we instead assume that shorter wavelengths, with their higher spatial resolution, can be used to define the local galaxy density at a given position, we can define a “local confusion limit”. In practice, not every SPIRE source lies where several bright PACS or MIPS 24 μm sources fall within the SPIRE beam. Following this recipe, Hwang et al. (2010a) defined a “clean index” and assigned it to all individual Herschel detections under the following conditions. A 500 μm source is flagged as “clean” if its 24 μm prior has:

- at most one bright neighbor in the Spitzer-MIPS 24 μm band within 20″ (1.1×FWHM of Herschel at 250 μm), where “bright” means F > 50 % of the central 24 μm source; and
- no bright neighbor within 1.1×FWHM in each Herschel passband at and below the wavelength considered, i.e., at 100, 160, 250, 350 and 500 μm for a 500 μm source (see Table 1).

As a result, we kept only 11 clean sources at 500 μm, for which we consider the photometry reliable. The criterion is less restrictive for the shorter bands, since we only consider the presence of bright neighbors at shorter wavelengths. Consequently, the number of 350 μm detections is an order of magnitude larger than at 500 μm. We defined this “local confusion limit” empirically, after visually inspecting the data for all individual sources. A more detailed investigation of this quality flag, using simulations of the actual GOODS sources both spatially and in redshift, confirms its robustness (Leiton et al. 2011, in prep.). For galaxies that do not meet the “clean index” condition in some bands, we observe unphysical jumps in the IR SED. Such jumps may, for example, lead to wrong dust temperature estimates, systematically shifted to colder values, since source blending preferentially affects the longest wavelengths.
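The clean-index test reduces to counting "bright" neighbors within a band-dependent radius. The sketch below is a minimal version of that count. It takes neighbor lists as (separation in arcsec, flux density) pairs, a layout of our own choosing. It applies the same 50 % brightness rule, relative to the central source in each band, in every Herschel band; that is an assumption, since the text defines "bright" explicitly only for 24 μm. The SPIRE FWHMs are those quoted in Sect. 2.3. PACS values would have to come from Table 1 and are not filled in here.

```python
from typing import Dict, List, Tuple

# Each neighbour is (separation from the prior in arcsec, flux density in that band).
Neighbours = List[Tuple[float, float]]

# SPIRE beam FWHMs (arcsec) quoted in Sect. 2.3; PACS entries intentionally omitted.
FWHM = {"spire250": 18.1, "spire350": 24.9, "spire500": 36.6}


def n_bright(neighbours: Neighbours, central_flux: float, radius_arcsec: float) -> int:
    """Count neighbours within the radius that are brighter than 50 % of the central source."""
    return sum(1 for sep, flux in neighbours
               if sep <= radius_arcsec and flux > 0.5 * central_flux)


def is_clean(f24_central: float, neighbours24: Neighbours,
             herschel: Dict[str, Tuple[float, Neighbours]],
             fwhm_arcsec: Dict[str, float]) -> bool:
    """Sketch of the clean-index test.

    `herschel` maps each Herschel band up to (and including) the band being
    flagged to (central flux density, neighbour list) in that band.
    """
    # at most one bright 24 um neighbour within 20 arcsec (1.1 x FWHM at 250 um)
    if n_bright(neighbours24, f24_central, 20.0) > 1:
        return False
    # no bright neighbour within 1.1 x FWHM in any of the listed Herschel bands
    for band, (central, neigh) in herschel.items():
        if n_bright(neigh, central, 1.1 * fwhm_arcsec[band]) > 0:
            return False
    return True


# One bright 24 um neighbour is allowed; the faint 250 um neighbour does not count.
clean = is_clean(100.0, [(12.0, 60.0), (15.0, 30.0)],
                 {"spire250": (20.0, [(16.0, 5.0)]), "spire350": (15.0, [])}, FWHM)
# A bright 350 um neighbour at 25 arcsec (< 1.1 x 24.9) spoils the photometry.
blended = is_clean(100.0, [(12.0, 60.0)], {"spire350": (15.0, [(25.0, 12.0)])}, FWHM)
# Two bright 24 um neighbours within 20 arcsec also fail the test.
crowded = is_clean(100.0, [(5.0, 80.0), (10.0, 70.0)], {}, FWHM)
print(clean, blended, crowded)
```
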

At the sensitivity limits of GOODS–Herschel, Fig. 4 shows that below a redshift of 3 the shortest wavelengths always reach deeper than the longest ones. We can therefore use these higher-resolution images to constrain the confusion limit on local rather than global scales. Moreover, the fluxes in, e.g., the SPIRE bands are not independent of those measured in the 24 μm and 100 μm passbands. They even follow a tight correlation (see Elbaz et al. 2010 and the present analysis), again over the redshift range of interest here, i.e., 0–2.5. It is therefore possible to map the density of IR sources and to flag sources lying in areas that are relatively isolated from similarly bright or brighter IR sources. If the “clean index” did not efficiently reject problematic measurements, the dispersion in the figures presented in this paper would be larger. Since we will show that these dispersions are already quite small, correcting for any such residual effect would only reinforce our results. Typically, half of the Herschel sources detected at 160 μm survive this criterion (see Table 1).

## 3 High- and low-redshift galaxy samples

### 3.1 The GOODS–Herschel galaxy sample

Both GOODS fields have been the subject of intensive follow-up campaigns, giving a spectroscopic redshift completeness greater than 70 % for the Herschel sources (Table 1). We use compilations of 3630 and 3018 spectroscopic redshifts for GOODS–N (Cohen et al. 2000, Wirth et al. 2004, Barger, Cowie & Wang 2008, and Stern et al. in prep.) and GOODS–S (Le Fèvre et al. 2004, Mignoli et al. 2005, Vanzella et al. 2008, Popesso et al. 2009, Balestra et al. 2010, Silverman et al. 2010, and Xia et al. 2010), respectively. Photometric redshifts and stellar masses are computed in both fields from U-band to IRAC 4.5 μm photometry using Z-PEG (Le Borgne & Rocca-Volmerange 2002). The templates used for both the photometric redshifts and the stellar mass estimates come from PEGASE.2 (Fioc & Rocca-Volmerange 1999). They were produced using nine star formation history scenarios (see Le Borgne & Rocca-Volmerange 2002), with various star-formation efficiencies and infall timescales. These scenarios range from a pure starburst to an almost continuous star formation rate, with ages between 1 Myr and 13 Gyr (200 ages). There is no constraint on the formation redshift; the templates are only required to be younger than the age of the Universe at the redshift considered.

Fig. 5 presents, for both fields, the redshift distributions of the sources individually detected in each Herschel band and at 24 μm with Spitzer. It illustrates the relative power of these bands to detect sources as a function of redshift. The 500 μm band samples sources at all redshifts from $z$ = 0 to 4, but it provides only a handful of objects. There are 24 galaxies in total within the 10′×15′ GOODS–N field, only 11 of which are flagged as clean, and 73 % of those have a spectroscopic redshift. In comparison, more than a thousand sources are detected in the 100 μm band; the vast majority are flagged as clean and 72 % have a spectroscopic redshift.

Table 1 also lists the characteristics of the other IR catalogs used in the present study. The GOODS Spitzer IRAC catalogs were created using SExtractor (Bertin & Arnouts 1996). Sources were detected in a weighted combination of the 3.6 and 4.5 μm images, with matched-aperture photometry in the four IRAC bands and appropriate aperture corrections to total flux. The Spitzer 24 μm and 70 μm catalogs (Magnelli et al. 2011) use data from the Spitzer GOODS and FIDEL programs (PI: M. Dickinson). Sources detected in the IRAC images serve as priors to extract the 24 μm fluxes, and a subset of those 24 μm sources in turn serves as priors to extract fluxes at 70 μm. The 16 μm data come from Spitzer IRS peak-up array imaging (Teplitz et al. 2011); here again, the 16 μm catalog fluxes are extracted using IRAC priors. In this study, we make particular use of the Spitzer data to quantify the redshift dependence of the IR SEDs while minimizing mid-infrared k-corrections. We do this by measuring the rest-frame 8 μm emission of galaxies at $z\sim0$, 1 and 2 from their observed fluxes in the IRAC 8 μm, IRS 16 μm and MIPS 24 μm passbands, respectively. Table 1 also gives the spectroscopic and the combined photometric plus spectroscopic redshift completeness (in %) of the IR catalogs from 3.6 to 500 μm within the fiducial GOODS area. As noted previously, the SPIRE images of GOODS–N cover a wider field, but here we do not count the sources detected outside the regular GOODS area.

Known AGN were excluded from the sample and are discussed separately in Section 8. X-ray/optical AGN were identified by any one of the following criteria (Bauer et al. 2004):

- $L_{[0.5-8.0\,{\rm keV}]} \geq 3\times10^{42}$ erg s$^{-1}$;
- a hardness ratio (ratio of the counts in the 2–8 keV to 0.5–2 keV passbands) higher than 0.8;
- $N_{\rm H} > 10^{22}$ cm$^{-2}$;
- broad or high-ionization AGN emission lines.

We also excluded power-law AGN, i.e., galaxies whose continuum rises through the IRAC bands because of hot dust emission (see the definition in Sect. 8).
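Because any single criterion suffices, the X-ray/optical AGN selection is a logical OR. The sketch below encodes the thresholds $L_{[0.5-8\,{\rm keV}]} \geq 3\times10^{42}$ erg s$^{-1}$, hardness ratio > 0.8 and $N_{\rm H} > 10^{22}$ cm$^{-2}$ from the criteria listed above. The record type and field names are hypothetical, and a missing measurement simply does not trigger its criterion. The power-law (IRAC) selection of Sect. 8 is not included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XrayOptical:
    """Hypothetical per-galaxy record; None means 'not measured'."""
    lx_05_8kev: Optional[float] = None      # 0.5-8 keV luminosity, erg/s
    hardness_ratio: Optional[float] = None  # counts(2-8 keV) / counts(0.5-2 keV)
    n_h: Optional[float] = None             # absorbing column density, cm^-2
    broad_or_high_ionization_lines: bool = False


def is_xray_optical_agn(g: XrayOptical) -> bool:
    """True if any one of the X-ray/optical criteria is met."""
    return (
        (g.lx_05_8kev is not None and g.lx_05_8kev >= 3e42)
        or (g.hardness_ratio is not None and g.hardness_ratio > 0.8)
        or (g.n_h is not None and g.n_h > 1e22)
        or g.broad_or_high_ionization_lines
    )


print(is_xray_optical_agn(XrayOptical(lx_05_8kev=5e42)))
print(is_xray_optical_agn(XrayOptical(lx_05_8kev=1e41, hardness_ratio=0.3, n_h=1e21)))
```
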

### 3.2 Total infrared luminosities

We determined total IR luminosities, $L_{\rm IR}^{\rm tot}$, for GOODS–Herschel galaxies by letting the normalization of the CE01 template SEDs vary and choosing the template that minimizes the χ² of the fit to the measured Herschel flux densities. At the highest redshifts considered in the present analysis, the Herschel 100 μm passband samples rest-frame mid-IR wavelengths. To avoid mixing galaxies with and without direct far-IR detections, we therefore require at least one photometric measurement at rest-frame wavelengths longer than 30 μm. This excludes a few high-redshift galaxies detected only at 100 μm. $L_{\rm IR}^{\rm tot}$ was integrated from 8 to 1000 μm over the best-fitting normalized CE01 SED. When only one or two Herschel measurements are available above 30 μm, the fit is highly degenerate. In that case we use the standard CE01 technique, i.e., we take the SED with the closest luminosity from the CE01 library without allowing any renormalization.
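Fitting a library with a free normalization has a closed-form solution per template. For each template $M$, the χ²-minimizing scale is $a = \sum (F M/\sigma^2) / \sum (M^2/\sigma^2)$, and the template with the lowest resulting χ² is kept. The sketch below implements that, plus the rest-frame > 30 μm requirement. The three-template "library" contains made-up numbers, not the CE01 SEDs. In practice, the template band fluxes would have to be computed by redshifting each SED and integrating it through the Herschel filters, which is not shown here.

```python
import numpy as np


def has_far_ir_point(obs_wavelengths_um, z, min_rest_um=30.0):
    """True if at least one band samples rest-frame wavelengths longer than min_rest_um."""
    rest = np.asarray(obs_wavelengths_um, dtype=float) / (1.0 + z)
    return bool(np.any(rest > min_rest_um))


def fit_scaled_templates(flux, err, template_fluxes, template_lir):
    """Return (index, scale, L_IR^tot) of the template with the lowest chi^2.

    template_fluxes: (n_templates, n_bands) model flux densities of each unscaled
    template in the observed bands; template_lir: 8-1000 um luminosity of each
    unscaled template. The chi^2-minimising scale is computed analytically.
    """
    flux = np.asarray(flux, dtype=float)
    weights = 1.0 / np.asarray(err, dtype=float) ** 2
    models = np.atleast_2d(np.asarray(template_fluxes, dtype=float))
    scale = (models * flux * weights).sum(axis=1) / (models**2 * weights).sum(axis=1)
    chi2 = (((flux - scale[:, None] * models) ** 2) * weights).sum(axis=1)
    best = int(np.argmin(chi2))
    lir = float(scale[best] * np.asarray(template_lir, dtype=float)[best])
    return best, float(scale[best]), lir


# Toy library of three templates in three bands (made-up numbers, not CE01).
templates = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 2.0], [3.0, 2.0, 1.0]])
template_lir = np.array([1e10, 1e11, 1e12])
observed = 2.0 * templates[1]
best, scale, lir = fit_scaled_templates(observed, 0.1 * observed, templates, template_lir)
print(best, scale, lir)

print(has_far_ir_point([100.0, 160.0], z=2.5))  # 160 / 3.5 ~ 46 um in the rest frame
print(has_far_ir_point([100.0], z=2.5))         # 100 / 3.5 ~ 29 um only
```
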

To quantify how the choice of SED set affects the fits to the Herschel measurements and hence $L_{\rm IR}^{\rm tot}$, we repeated the same exercise with another SED library, from Dale & Helou (2002, DH02). The ratio of the $L_{\rm IR}^{\rm tot}$ values derived with the two SED families has a median of 1 and an rms dispersion of 12 %. The uncertainty in $L_{\rm IR}^{\rm tot}$ is therefore dominated by the error bars on the Herschel flux measurements rather than by the choice of SED library. To account for this dominant source of uncertainty, we generated 100 realizations of the Herschel flux measurements, assuming Gaussian errors given by their error bars. We then determined 100 values of $L_{\rm IR}^{\rm tot}$ by fitting these realizations independently. The final $L_{\rm IR}^{\rm tot}$ assigned to a galaxy is the median of the 100 Monte Carlo estimates, and its error bar is the rms around the median. We repeated this procedure for each individual galaxy.
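The Monte Carlo error estimate described above can be written as: perturb the photometry, refit, then take the median and the rms around the median. In the sketch below, `fit_lir` stands for any fitting routine. The single-template fit used in the example is a hypothetical stand-in chosen so that the expected scatter can be computed by hand.

```python
import numpy as np


def mc_lir(flux, err, fit_lir, n_real=100, seed=None):
    """Median and rms-around-median of L_IR^tot over Gaussian flux realisations."""
    rng = np.random.default_rng(seed)
    flux = np.asarray(flux, dtype=float)
    err = np.asarray(err, dtype=float)
    # n_real perturbed copies of the photometry, each band drawn from N(flux, err)
    draws = rng.normal(flux, err, size=(n_real, flux.size))
    lirs = np.array([fit_lir(d) for d in draws])
    median = float(np.median(lirs))
    rms = float(np.sqrt(np.mean((lirs - median) ** 2)))
    return median, rms


# Hypothetical one-template fit: band shape and L_IR^tot of the unscaled template.
TEMPLATE = np.array([1.0, 1.5, 1.2])
TEMPLATE_LIR = 1e11
flux = 3.0 * TEMPLATE
err = 0.1 * flux
weights = 1.0 / err**2


def single_template_lir(f):
    """chi^2-optimal template scale times the template luminosity."""
    return TEMPLATE_LIR * np.sum(f * TEMPLATE * weights) / np.sum(TEMPLATE**2 * weights)


median_lir, rms_lir = mc_lir(flux, err, single_template_lir, n_real=100, seed=0)
print(f"L_IR^tot = {median_lir:.3e} +/- {rms_lir:.2e} L_sun")
```

With 10 % errors in three bands, linear error propagation gives a scale uncertainty of $1/\sqrt{\sum M^2/\sigma^2} \approx 0.17$. The MC rms should therefore land near $1.7\times10^{10}$ L⊙ around a median of $3\times10^{11}$ L⊙.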

We will compare the distant GOODS–Herschel galaxies with a reference sample of local galaxies whose $L_{\rm IR}^{\rm tot}$ is estimated from IRAS measurements alone. As a consistency check, we therefore computed the total IR luminosities that we would obtain for the GOODS–Herschel galaxies had we used Eq. 1 (taken from Sanders & Mirabel 1996),

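$$
\frac{L_{\rm IR}}{{\rm L}_\odot} = \frac{4\pi D_{\rm L}^2\,[{\rm m}] \times \left(1.8\times10^{-14}\,F_{\rm IR}\right)[{\rm W\,m^{-2}}]}{3.826\times10^{26}},
\quad \text{where} \quad
F_{\rm IR} = 13.48\,F_{12\mu{\rm m}} + 5.16\,F_{25\mu{\rm m}} + 2.58\,F_{60\mu{\rm m}} + F_{100\mu{\rm m}}, \tag{1}
$$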

as a proxy for the 8–1000 μm luminosity, instead of the actual integral over the IR SED. The IRAS flux densities $F_{12\mu{\rm m}}$, $F_{25\mu{\rm m}}$, $F_{60\mu{\rm m}}$ and $F_{100\mu{\rm m}}$ in Eq. 1 are in Jy. The two techniques give total IR luminosities that agree to within 5 %. Here again, the dominant source of discrepancy in the comparison is therefore the flux uncertainties.
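Eq. 1 translates directly into code. The sketch below takes IRAS flux densities in Jy and a luminosity distance in Mpc. It converts the distance to meters with 1 Mpc = 3.0857×10²² m, a standard value that is not stated in the text. The input fluxes in the example are made up for illustration.

```python
import math

L_SUN_W = 3.826e26  # W, the solar luminosity appearing in Eq. 1
MPC_M = 3.0857e22   # metres per megaparsec


def lir_sanders_mirabel(f12, f25, f60, f100, d_lum_mpc):
    """Total IR luminosity in L_sun from IRAS flux densities in Jy (Eq. 1)."""
    f_ir = 1.8e-14 * (13.48 * f12 + 5.16 * f25 + 2.58 * f60 + f100)  # W m^-2
    d_m = d_lum_mpc * MPC_M
    return 4.0 * math.pi * d_m**2 * f_ir / L_SUN_W


# Made-up IRAS fluxes (Jy) for a source at a luminosity distance of 100 Mpc.
lir = lir_sanders_mirabel(0.5, 1.0, 10.0, 20.0, 100.0)
print(f"L_IR = {lir:.3e} L_sun (log = {math.log10(lir):.2f})")
```
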

### 3.3 Local galaxy reference sample

The local galaxy reference sample used in this paper consists of galaxies detected with the Infrared Space Observatory (ISO), AKARI and Spitzer. We compare their rest-frame 8 μm and total IR luminosities with those of the GOODS–Herschel galaxies. Galaxies with direct IRAC 8 μm measurements from Spitzer are supplemented with galaxies that have ISO 6.75 μm or AKARI 9 μm photometry. For the latter, we computed pseudo-IRAC 8 μm luminosities, $L_8$, using the IR SED of M82 (Förster Schreiber et al. 2001, Elbaz et al. 2002). The ISO and AKARI samples span a wide range of relatively low-luminosity galaxies, together with a sample of ULIRGs. The Spitzer sample is a fairly complete sample of local LIRGs (see Table 2).

#### Local ISO galaxy sample

The mid-IR luminosities of this sample of 150 galaxies, described in CE01 and Elbaz et al. (2002), were obtained from ISO measurements. The sample includes 110 galaxies closer than 300 Mpc, spanning a wide range of mid-IR luminosities estimated from ISOCAM-LW2 (5–8.5 μm, centered at 6.75 μm). It also includes 41 ULIRGs at distances of 80 to 900 Mpc, whose mid-IR luminosities were determined with the PHOT-S spectrograph of ISOPHOT (Rigopoulou et al. 1999). We refer to CE01 for a discussion of how the PHOT-S spectra were converted into broadband luminosities equivalent to the LW2 filter. To estimate pseudo-IRAC 8 μm luminosities, $L_8$, we first convolved the ISOCAM CVF spectrum of M82 (Förster Schreiber et al. 2001, Elbaz et al. 2002) with the ISOCAM-LW2 and IRAC 8 μm bandpasses. We then normalized the resulting luminosities to the observed LW2 luminosity of each of the 150 galaxies. Since both filters are wide and largely overlapping, the conversion depends very little on the exact shape of the spectrum used. We checked that using, for example, the CE01 SEDs instead of the M82 spectrum makes negligible differences compared with the actual dispersion of galaxies in the diagram. Total IR luminosities, $L_{\rm IR}^{\rm tot}$, were derived from the four IRAS band measurements using Eq. 1.
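The pseudo-IRAC conversion takes the ratio of one template spectrum averaged through two bandpasses and applies it to the observed luminosity. The sketch below illustrates that procedure under explicit assumptions. The bandpasses are top-hats: LW2 uses the 5–8.5 μm range given in the text, while the 6.4–9.4 μm IRAC range is only an approximation. The "rising" template is a hypothetical power law, not the M82 CVF spectrum.

```python
import numpy as np


def _trapz(y, x):
    """Trapezoidal integral (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def band_average(wl_um, spectrum, band_wl_um, band_resp):
    """Response-weighted mean of a spectrum through a bandpass."""
    resp = np.interp(wl_um, band_wl_um, band_resp, left=0.0, right=0.0)
    return _trapz(spectrum * resp, wl_um) / _trapz(resp, wl_um)


def pseudo_irac8(l_observed, wl_um, template, band_obs, band_irac8):
    """Scale an observed broadband mid-IR luminosity to a pseudo-IRAC 8 um one.

    Only the template's shape matters: its normalisation cancels in the ratio.
    """
    factor = (band_average(wl_um, template, *band_irac8)
              / band_average(wl_um, template, *band_obs))
    return factor * l_observed


wl = np.linspace(3.0, 12.0, 901)
# Top-hat bandpasses (assumed shapes): LW2 5-8.5 um, IRAC 8 um approximated as 6.4-9.4 um.
lw2 = (np.array([4.99, 5.0, 8.5, 8.51]), np.array([0.0, 1.0, 1.0, 0.0]))
irac8 = (np.array([6.39, 6.4, 9.4, 9.41]), np.array([0.0, 1.0, 1.0, 0.0]))
flat = np.ones_like(wl)
rising = wl**1.5  # hypothetical continuum that rises toward longer wavelengths

print(pseudo_irac8(1e10, wl, flat, lw2, irac8))
print(pseudo_irac8(1e10, wl, rising, lw2, irac8))
```

A flat template leaves the luminosity unchanged, and a rising one boosts it only modestly. This is consistent with the point made above that two wide, overlapping filters make the conversion insensitive to the exact spectral shape.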

#### Local AKARI galaxy sample

Galaxies with mid-infrared (9 μm) measurements from AKARI were cross-matched with the IRAS Faint Source Catalog ver. 2 (FSC-2; Moshir, Kopman & Conrow 1992) and with spectroscopic redshifts from the Sloan Digital Sky Survey Data Release 7 (SDSS DR7; Abazajian et al. 2009), supplemented by a photometric sample of galaxies with redshifts available in the literature (Hwang et al. 2010b). For both IRAS and AKARI, we consider only sources with reliable flux densities. A total of 287 galaxies closer than 450 Mpc ($z < 0.1$) have 9 μm flux densities from the AKARI Infrared Camera (IRC; Onaka et al. 2007) Point Source Catalog (PSC ver. 1.0; Ishihara et al. 2010), which reaches a detection limit of 50 mJy (5σ) with a uniform distribution over the whole sky. As in Sect. 3.3.1, pseudo-IRAC 8 μm luminosities, $L_8$, were computed by convolving the ISOCAM CVF spectrum of M82 with the AKARI-IRC 9 μm bandpass to estimate the conversion factor between the IRC 9 μm and IRAC 8 μm luminosities, assuming the same IR SED for all galaxies. The effective wavelength of the AKARI 9 μm passband is 8.6 μm (Ishihara et al. 2010), not far from that of the IRAC 8 μm filter (7.9 μm; Fazio et al. 2004).

Total IR luminosities were computed from the four IRAS bands using Eq. 1; the IRC 9 μm measurements were not used in the computation of $L_{\rm IR}^{\rm tot}$. Far-IR measurements were supplemented with the AKARI Far-Infrared Surveyor (FIS; Kawada et al. 2007) all-sky survey Bright Source Catalogue (BSC ver. 1.0), which contains 427 071 sources with flux densities measured at 65, 90, 140 and 160 μm. We used these supplementary far-IR measurements for the 16 % of the sample that lack reliable IRAS measurements at both 12 μm and 25 μm. We checked the consistency of the IR luminosities estimated with AKARI against those obtained from IRAS alone and found that the AKARI luminosities were systematically lower by 10 %; we corrected the luminosities of these 16 % of galaxies for this offset.

#### Local Spitzer galaxy sample

A sample of 202 IRAS sources, consisting of 291 individual galaxies (some blended at IRAS resolution), was observed with the Infrared Spectrograph (IRS) on board Spitzer as part of the Great Observatories All-sky LIRG Survey (GOALS; Armus et al. 2009). The sources were drawn from the IRAS Revised Bright Galaxy Sample (RBGS; Sanders et al. 2003) and represent a complete sub-sample of systems with IR luminosities originally defined to be $L_{\rm IR} \ge 10^{11}\,L_\odot$. The GOALS sample includes 200 LIRGs and 22 ULIRGs. The total IR luminosities of the systems were derived from their IRAS measurements using Eq. 1 (see Armus et al. 2009 for further details on this calculation).

Using the spectral images obtained with the Short-Low module of IRS, Díaz-Santos et al. (2010) measured the spatial extent of the mid-IR continuum emission at 13.2 μm for a sub-sample of 211 individual galaxies (closer than 350 Mpc) for which data were available at the time of publication and sources could be detected. We use these size estimates in our analysis of the link between star formation compactness and the IR8 ratio. This size indicator, the fraction of extended emission (FEE), is directly related to the spatial distribution of the star formation regions and has the advantage of being measured in a wavelength range unaffected by the presence or absence of emission features such as PAHs. For the multiple systems unresolved by IRAS, Díaz-Santos et al. (2010) distributed the total IR luminosity among the galaxies in proportion to their Spitzer/MIPS 24 μm fluxes. Because of this redistribution, 44 galaxies in our sample now have IR luminosities below $10^{11}\,L_\odot$. In addition to these normal star-forming galaxies, the final sample includes 154 LIRGs and 13 ULIRGs (with $10^{12} \le L_{\rm IR}/L_\odot \le 4\times10^{12}$). The IRAC 8 μm luminosities of these galaxies are from Mazzarella et al. (in prep.). Stellar masses were derived by cross-matching the GOALS sample with 2MASS and converting the $K_s$ luminosities into stellar masses (excluding remnants) with a mass-to-light ratio $M_\star/L_{K_s} = 0.7\,M_\odot/L_\odot$ computed from PEGASE 2 (Fioc & Rocca-Volmerange 1997, 1999), assuming a Salpeter IMF and an age of 12 Gyr.
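The two bookkeeping steps above, splitting a blended IRAS system's luminosity by 24 μm flux and converting $K_s$ luminosity to stellar mass, can be sketched as below. This is a minimal illustration; the function names and the example system are hypothetical, and only the proportional split and the $M_\star/L_{K_s} = 0.7$ value come from the text.

```python
import numpy as np

ML_KS = 0.7  # M_star / L_Ks in solar units (PEGASE 2, Salpeter IMF, 12 Gyr; value quoted in the text)

def split_lir_by_24um(l_ir_system, f24_components):
    """Share a blended IRAS system's total L_IR among its member galaxies
    in proportion to their MIPS 24 um flux densities."""
    f24 = np.asarray(f24_components, dtype=float)
    return l_ir_system * f24 / f24.sum()

def stellar_mass_from_ks(l_ks_solar):
    """Stellar mass (remnants excluded) from a K_s luminosity in L_sun."""
    return ML_KS * np.asarray(l_ks_solar, dtype=float)

# Hypothetical pair: a 1.2e11 L_sun IRAS source whose two members have 24 um fluxes in a 3:1 ratio
parts = split_lir_by_24um(1.2e11, [3.0, 1.0])  # -> [9e10, 3e10], both below 1e11 L_sun
```

The example shows how a system classified as a LIRG can yield member galaxies below $10^{11}\,L_\odot$ after the split.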

## 4 Universality of IR8 (= $L_{\rm IR}^{\rm tot}/L_8$): an IR main sequence

### 4.1 The mid-infrared excess problem

Before the launch of Herschel, the derivation of $L_{\rm IR}^{\rm tot}$, and hence of the SFR, of distant galaxies had to rely on extrapolations from either mid-IR or sub-mm photometry. Although there are many reasons why extrapolations from the mid-IR could fail (evolution in metallicity, in the geometry of star formation regions, or in the relative contributions of broad emission features and continuum), they were instead found to work relatively well up to $z \sim 1.5$. Using shallower Herschel data than the present study, Elbaz et al. (2010) compared $L_{\rm IR}^{\rm tot}$, estimated from Herschel PACS and SPIRE, to $L_{\rm IR}^{24}$, the total IR luminosity extrapolated from the observed Spitzer 24 μm flux density, and found that they agreed within a dispersion of only 0.15 dex. The CE01 technique used to extrapolate $L_{\rm IR}^{24}$ assigns a single IR SED to each total IR luminosity. Hence a given 24 μm flux density is assigned the $L_{\rm IR}$ of the SED that would yield the same 24 μm flux density at that redshift.
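The one-SED-per-luminosity inversion can be sketched as a monotonic interpolation over a template grid. This is a minimal illustration, not the CE01 code: the grid values below are placeholders rather than CE01 numbers, and `lir_from_f24` is a hypothetical name. Because each luminosity maps to exactly one SED, any mismatch between the library's IR8 at a given luminosity and that of real galaxies propagates directly into the inferred $L_{\rm IR}$.

```python
import numpy as np

def lir_from_f24(f24_obs, lir_grid, f24_pred):
    """Return the L_IR of the template whose predicted 24 um flux density
    (at the galaxy's redshift) matches the observed one.

    One SED per L_IR, as in a CE01-style library; interpolation is done in
    log-log space and f24_pred must increase monotonically with lir_grid.
    """
    return 10 ** np.interp(np.log10(f24_obs), np.log10(f24_pred), np.log10(lir_grid))

# Hypothetical template grid at one redshift: L_IR (L_sun) and predicted S_24 (mJy).
# These numbers are placeholders, not CE01 values.
LIR_GRID = np.array([1e10, 1e11, 1e12, 1e13])
F24_PRED = np.array([0.01, 0.1, 1.5, 20.0])

lir = lir_from_f24(0.4, LIR_GRID, F24_PRED)
```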

Stacking Spitzer MIPS 70 μm measurements at prior positions defined by 24 μm sources in specific redshift intervals, Magnelli et al. (2009) found that the rest-frame 24/(1+$z$) μm and 70/(1+$z$) μm luminosities were fully consistent with those derived using the CE01 technique for galaxies at $z < 1.3$. Although the 70 μm passband probes the mid-IR regime at redshifts $z > 0.8$, it has the advantage, unlike the 24 μm band, of sampling the continuum IR emission of distant galaxies without being affected by the potentially uncertain contribution of PAHs. At $z > 1.5$, however, extrapolations from 24 μm measurements using local SED templates were found to systematically overestimate the 70 μm measurements (Magnelli et al. 2011). This mid-IR excess, first identified by comparison with radio, MIPS 70 μm and 160 μm stacking (Daddi et al. 2007a, Papovich et al. 2007, Magnelli et al. 2011), has recently been confirmed with Herschel by Nordon et al. (2010) for a small sample of $z \sim 2$ galaxies detected with PACS, and by stacking PACS images at 24 μm prior positions (Elbaz et al. 2010, Nordon et al. 2010).

Here, thanks to the unique depth of the GOODS–Herschel images, we are able to compare $L_{\rm IR}^{\rm tot}$ to $L_{\rm IR}^{24}$ for a much larger number of galaxies than in Elbaz et al. (2010) and, more importantly, for direct detections at $z > 1.5$. In the left-hand part of Fig. 6, we show that the mid-IR excess problem is not an artifact of the indirect stacking measurements, but is also present for individually detected galaxies at $z > 1.5$ with high 24 μm flux densities, corresponding to $L_{\rm IR} \gtrsim 10^{12}\,L_\odot$. Although known AGN were excluded from the sample, unidentified AGN may remain. Indeed, it has been proposed that the mid-IR excess could be due to unidentified AGN affected by strong extinction, possibly Compton thick (Daddi et al. 2007b; see also Papovich et al. 2007). At these high redshifts, the reprocessed radiation of a buried AGN may dominate the mid-IR light measured in the 24 μm passband, while the far-IR emission probed by Herschel would be dominated by dust-reprocessed stellar light. Studies of local dusty AGN have indeed shown that their contribution to the IR emission of a galaxy drops rapidly above 20 μm in the rest frame (Netzer et al. 2007). However, this explanation was recently called into question by Spitzer IRS mid-IR spectroscopy of $z \sim 2$ galaxies, which shows strong PAH emission features where hot-dust continuum emission would be expected to dominate if a buried AGN were responsible (Murphy et al. 2009, Fadda et al. 2010), and by deeper Chandra observations (Alexander et al. 2011).

### 4.2 Resolving the mid-IR excess problem: universality of IR8

We have seen that extrapolations of $L_{\rm IR}^{\rm tot}$ from 24 μm measurements using the CE01 technique fail at $z > 1.5$. We also find that the same technique fails in a similar way with another set of template SEDs, such as those of DH02.

We wish to test the main hypothesis on which the CE01 technique relies, namely that IR SEDs do not evolve with redshift. If this were the case, a single SED could be used to derive the $L_{\rm IR}^{\rm tot}$ of any galaxy, whatever the rest-frame wavelength probed, as long as it falls in the wavelength range dominated by dust-reprocessed stellar light. Indeed, local galaxies are observed to follow tight correlations between their mid-IR luminosities at 6.75, 12, 15 and 25 μm and $L_{\rm IR}^{\rm tot}$ (see CE01, Elbaz et al. 2002), as well as, for the Spitzer 8 and 24 μm passbands, with their SFR derived from the Paα line (Calzetti et al. 2007). This technique fails at $z > 1.5$, which until now has been interpreted as evidence that distant IR SEDs differ from local ones. However, to properly test the redshift evolution of IR SEDs, measurements must be compared in the same rest-frame wavelength range for galaxies at all redshifts. For that purpose, we now compute the same rest-frame mid-IR luminosity for all galaxies, $L_8$ (= $\nu L_\nu$[8 μm]), defined as the luminosity that would be measured in the IRAC 8 μm passband in the rest frame. We chose this wavelength range because $L_8$ can be computed from $z = 0$ to 2.5 with minimal extrapolation, using the IRAC 8 μm filter for nearby galaxies ($z < 0.5$), the IRS 16 μm peak-up array at intermediate redshifts around $z \sim 1$ ($0.5 < z < 1.5$), and the MIPS 24 μm passband at $z \sim 2$ ($1.5 < z < 2.5$). Even so, small k-corrections must be applied to calculate $L_8$ in the same rest-frame passband; this was done using the mid-IR SED of M82 for all galaxies. We verified that other SEDs, such as the CE01 or DH02 templates, would alter $L_8$ by factors that are small compared with the dispersion of the observed relation. The results are shown in the right-hand part of Fig. 6. Surprisingly, when galaxies at all redshifts and luminosities are compared in the same rest-frame wavelength range, we no longer see a discrepancy between galaxies above and below $z = 1.5$.
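The band choice described above can be sketched as a lookup on redshift, returning the rest-frame wavelength that the chosen band actually samples, $\lambda_{\rm obs}/(1+z)$. This is a minimal sketch assuming nominal band centres (8, 16 and 24 μm) rather than real effective wavelengths; `band_for_redshift` is a hypothetical name. Away from $z = 0$, 1 and 2, the sampled wavelength drifts from 8 μm, which is what the small k-corrections compensate for.

```python
# Observed bands used to sample rest-frame 8 um, with the redshift ranges from the text.
# Nominal band centres are assumed; real effective wavelengths differ slightly (e.g. 7.9 um for IRAC 8 um).
BANDS = [
    (0.5, "IRAC 8um", 8.0),
    (1.5, "IRS 16um peak-up", 16.0),
    (2.5, "MIPS 24um", 24.0),
]

def band_for_redshift(z):
    """Pick the observed band assigned to this redshift range and return the
    rest-frame wavelength (micron) it samples, lambda_obs / (1 + z)."""
    if not 0.0 <= z <= 2.5:
        raise ValueError("this scheme only covers 0 <= z <= 2.5")
    for z_max, name, lam_obs in BANDS:
        if z <= z_max:
            return name, lam_obs / (1.0 + z)
    raise AssertionError("unreachable: z <= 2.5 is always matched above")
```

For example, `band_for_redshift(0.3)` returns the IRAC band sampling about 6.2 μm in the rest frame, so a correction towards 8 μm is still needed.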
The sliding median of the IR8 ratio, defined as IR8 = $L_{\rm IR}^{\rm tot}/L_8$ and illustrated by white points connected with a solid grey line in the right-hand part of Fig. 6, remains flat and equal to IR8 = 4.9 [−2.2, +2.9] (solid and dashed lines in Fig. 6, right) from $L_8 = 10^{9}$ to $5\times10^{11}\,L_\odot$, or equivalently from $L_{\rm IR}^{\rm tot} = 5\times10^{9}$ to $3\times10^{12}\,L_\odot$. The 68 % dispersion around the median is only 0.2 dex.
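The "median [−a, +b]" notation and the luminosity-binned median can be sketched with NumPy percentiles. This is a minimal sketch with hypothetical function names; fixed bins stand in for the sliding window of Fig. 6. As an arithmetic check, the 16th and 84th percentiles implied by 4.9 [−2.2, +2.9], i.e. 2.7 and 7.8, correspond to a half-width of about 0.23 dex, consistent with the ~0.2 dex quoted.

```python
import numpy as np

def ir8_summary(ir8):
    """Median IR8 and the offsets to the 16th/84th percentiles,
    i.e. the 'median [-a, +b]' notation used in the text."""
    p16, p50, p84 = np.percentile(ir8, [16, 50, 84])
    return p50, p50 - p16, p84 - p50

def half_width_dex(p16, p84):
    """Half of the 16th-84th percentile range expressed in dex."""
    return 0.5 * np.log10(p84 / p16)

def binned_median(log_lir, ir8, edges):
    """Median IR8 in bins of log L_IR (NaN for empty bins); a simple binned
    stand-in for the sliding median shown in Fig. 6."""
    log_lir, ir8 = np.asarray(log_lir), np.asarray(ir8)
    idx = np.digitize(log_lir, edges) - 1
    return np.array([np.median(ir8[idx == k]) if np.any(idx == k) else np.nan
                     for k in range(len(edges) - 1)])
```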

To test for possible selection effects on the galaxies used to determine the IR8 ratio, we combined Herschel detections with stacked measurements at 24 μm prior positions. We defined intervals of $L_8$ luminosity, computed, e.g., from the 16 μm band for sources around $z \sim 1$ or from the 24 μm band for sources around $z \sim 2$. In each interval, we determined the median IR8 of the detections (white dots connected with a solid line in the right-hand part of Fig. 6) and, separately, measured the average PACS 100 μm and 160 μm flux densities of the sources without a Herschel detection by stacking sub-images 60″ on a side at their 24 μm prior positions. These sub-images were extracted from the residual images to avoid contamination by detected sources. The average stacked PACS 100 μm and 160 μm flux densities were converted into total IR luminosities using the CE01 library of template SEDs, selected by luminosity at the median redshift of the galaxies in that luminosity interval. We found no systematic difference between $L_{\rm IR}^{\rm tot}$ derived from the PACS 100 μm and 160 μm data when using the CE01 templates for the extrapolation (see also Elbaz et al. 2010); both PACS bands gave consistent values of $L_{\rm IR}^{\rm tot}$. The two IR8 values obtained from detected and stacked undetected sources were then combined with weights depending on the number of sources in each group within the interval and (quadratically) on the signal-to-noise ratio of the measurements, so that two measurements with the same number of sources but very different S/N ratios are not given the same weight. The resulting relation is shown with yellow open triangles, separately for each GOODS field. Since the 100 μm and 160 μm bands gave similar results, we only present the result obtained from the 100 μm band in the right-hand part of Fig. 6. Again, the typical IR8 ratio appears to be flat, independent of both luminosity and redshift.
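The weighting of detections and stacks can be sketched as a weighted mean. The exact weight is not given in the text; the sketch assumes $w = N\,({\rm S/N})^2$ (number of sources times squared S/N), which is one reading of "depending on the number of sources ... and quadratically on the S/N". `combine_ir8` is a hypothetical name.

```python
def combine_ir8(ir8_det, n_det, snr_det, ir8_stack, n_stack, snr_stack):
    """Weighted mean of the median IR8 of detections and the IR8 from stacking
    the undetected sources in the same L8 interval.

    Assumed weight: N * (S/N)**2; the text only states that the weights depend
    on N and quadratically on S/N.
    """
    w_det = n_det * snr_det ** 2
    w_stack = n_stack * snr_stack ** 2
    return (w_det * ir8_det + w_stack * ir8_stack) / (w_det + w_stack)

# Same number of sources, but the stack has half the S/N: it gets a quarter of the weight.
combined = combine_ir8(4.0, 10, 10.0, 6.0, 10, 5.0)  # -> 4.4
```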
The range of luminosities probed by GOODS–Herschel varies with redshift, as shown in the upper panel of Fig. 7, which presents the distribution of total IR luminosities measured with Herschel as a function of redshift for the galaxies classified as “clean” (Sect. 2). This results from the combination of the limited volume at low redshift, which limits the ability to detect rare luminous objects, and the limited depth at high redshift, which limits the ability to detect distant low-luminosity objects. The bottom panel of Fig. 7 shows the redshift evolution of the IR8 ratio. It is flat up to $z \sim 2$; beyond that, because the detection limit of Herschel is shallower than that of Spitzer 24 μm, IR8 is slightly larger than the typical value, since only galaxies with high $L_{\rm IR}^{\rm tot}/L_8$ can be detected by Herschel.

Hence, we do not see a mid-IR excess when $L_{\rm IR}^{\rm tot}$ is systematically compared to rest-frame 8 μm data. In particular, if AGN played a more important role at $z > 1.5$ than at lower redshifts, we would expect IR8 to change at this redshift, contrary to what is observed. The mid-IR discrepancy is therefore not specific to galaxies at $z > 1.5$, but is due instead to the templates used to represent ULIRGs. Locally, galaxies with $L_{\rm IR} \ge 10^{12}\,L_\odot$ are very rare, most probably because present-day galaxies are relatively gas-poor compared to those at high redshift. Moreover, they have infrared SEDs that are not typical of star-forming galaxies in general, including most distant ULIRGs. The majority of high-redshift galaxies, even ultraluminous ones, share the same IR properties as local, normal star-forming galaxies of lower total luminosity. Galaxies with SEDs like those of local ULIRGs do exist at high redshift, but they do not dominate the high-redshift ULIRG population by number as they do today.

### 4.3 Origin of the “mid-IR excess” discrepancy

Fig. 8 shows the original data used to build the CE01 library of template SEDs; the solid line shows the relation traced by the SED templates. Originally, the mid-IR luminosity was computed in the ISOCAM-LW2 filter at 6.75 μm, $L_{6.75}$, which we convert here to $L_8$ using the SED of M82. The conversion was validated with a sub-sample of galaxies measured with both ISOCAM-LW2 and IRAC 8 μm. While the trend followed by the CE01 templates is consistent with the GOODS–Herschel galaxies below $L_{\rm IR}^{\rm tot} \sim 10^{11}\,L_\odot$, there is a break above this luminosity threshold that was required to fit the local ULIRGs in this diagram.

In Fig. 9 (left), we supplement the original local ISO sample with the 287 AKARI galaxies introduced in Sect. 3.3.2. With this larger sample, we see galaxies extending the low-luminosity trend beyond the threshold of $10^{11}\,L_\odot$ with a flat IR8 ratio. This trend is similar to that found for the GOODS–Herschel galaxies (larger orange background symbols, as in Fig. 6), and the extended local sample is well contained within the 16th and 84th percentiles around the median of the GOODS–Herschel sample (solid and dashed lines, as in Fig. 6).

The medians of the two samples are very similar (see Eqs. 2 and 3):

$$\mathrm{IR8}_{\rm local} = 4.8\ [-1.7, +6.4] \tag{2}$$
$$\mathrm{IR8}_{\text{GOODS-Herschel}} = 4.9\ [-2.2, +2.9] \tag{3}$$

Note, however, the large upper bound of the 68 % range in Eq. 2, which is mainly due to the elevated IR8 values of the local ULIRGs, as seen in the left-hand panel of Fig. 9. The medians of both samples are shifted to higher values by asymmetric tails of galaxies with large IR8 values, as shown in the right-hand part of Fig. 9, where we compare the IR8 distribution of the local ISO+AKARI galaxies (upper panel) with that of the GOODS–Herschel galaxies (lower panel). Both distributions share the same properties: they can be fitted by a Gaussian plus a tail of high IR8 values. The central values and widths of the Gaussian distributions are very similar for both samples (Eqs. 4 and 5),

$$\mathrm{IR8}_{\rm local}^{\rm (Gaussian\ center)} = 3.9\ [\sigma = 1.25] \tag{4}$$
$$\mathrm{IR8}_{\text{GOODS-Herschel}}^{\rm (Gaussian\ center)} = 4.0\ [\sigma = 1.6], \tag{5}$$

again reinforcing the interpretation that distant galaxies behave very similarly to local galaxies. If the IR SEDs of galaxies were different at low and high redshift, one would not expect them to have the same IR8 distributions.

Hence we find no evidence for different IR SEDs in distant galaxies. Instead, both local and distant galaxies are distributed in two well-defined regimes: a Gaussian distribution containing nearly 80 % of the galaxies, which share a universal IR8 ratio of ~4, and a sub-population of ~20 % of galaxies with larger IR8 values. The exact proportion of this sub-population is not firmly determined by this analysis, since it depends on the flux limit used to define the local reference sample, while the distant sample mixes galaxies spanning a large range of redshifts and luminosities. Nevertheless, the objects in the high-IR8 tail remain a minority at both low and high redshift compared with those in the Gaussian distribution.
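The split into a Gaussian core and a high-IR8 tail can be sketched on synthetic data. This sketch swaps in iterative sigma clipping for the Gaussian-plus-tail fit used in the paper; the clipping threshold, the synthetic sample, and the function names are assumptions. Clipping at 3σ slightly underestimates the width of a pure Gaussian, which is acceptable for an illustration.

```python
import numpy as np

def gaussian_core(ir8, n_sigma=3.0, max_iter=50):
    """Centre and width of the Gaussian core of an IR8 distribution, estimated by
    iterative sigma clipping (a simple stand-in for a Gaussian-plus-tail fit)."""
    x = np.asarray(ir8, dtype=float)
    keep = np.ones(x.size, dtype=bool)
    for _ in range(max_iter):
        mu, sigma = x[keep].mean(), x[keep].std()
        new_keep = np.abs(x - mu) < n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break  # converged: mu and sigma describe the retained core
        keep = new_keep
    return mu, sigma

def tail_fraction(ir8, threshold=8.0):
    """Fraction of galaxies above an IR8 threshold (8 is ~2x the core centre)."""
    return float(np.mean(np.asarray(ir8, dtype=float) > threshold))

# Synthetic check: 80 % Gaussian core (centre 4, sigma 1.6) plus a 20 % high-IR8 tail.
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(4.0, 1.6, 8000), rng.uniform(10.0, 40.0, 2000)])
mu, sigma = gaussian_core(sample)
frac = tail_fraction(sample)
```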

In the following, we call the dominant population “main sequence” galaxies, since they follow a universal trend in IR8 valid at all redshifts and luminosities. We further justify this choice in the next sections by showing that this population also follows a main sequence in the SFR–$M_\star$ plane, while galaxies with an excess IR8 ratio systematically exhibit an excess specific SFR (sSFR = SFR/$M_\star$). In the local sample, ULIRGs clearly belong to the second population, whereas $z \sim 2$ ULIRGs mostly belong to the Gaussian distribution and are hence main sequence galaxies. It is therefore the relative weight of the two populations that has changed with time, and this change is at the origin of the mid-IR excess problem. The CE01 SED library, illustrated by a blue line in Figs. 8 and 9, reaches IR8 values more than five times larger than the typical value for main sequence galaxies. This leads to an overestimate of $L_{\rm IR}^{\rm tot}$ when the SED templates of local ULIRGs are used to extrapolate it from 24 μm photometry for main sequence galaxies at $z \sim 2$. Note, however, that there is no need to invoke new physics for the IR SEDs of these galaxies (e.g., to justify stronger PAH equivalent widths), since most distant LIRGs and ULIRGs belong to the same main sequence as local normal star-forming galaxies. It is well known that local (U)LIRGs are undergoing a starburst phase, with compact star formation regions, triggered in most cases by major mergers (see, e.g., Armus et al. 1987, Sanders et al. 1988, Murphy et al. 1996, Veilleux, Kim & Sanders 2002 for ULIRGs and Ishida 2004 for LIRGs). This motivates the investigation of the role of compactness presented in the next section: if local ULIRGs form stars in compact regions and are atypical in terms of IR8, it would be logical to expect distant ULIRGs to be less compact, perhaps as a result of their higher gas fractions.
Note also that galaxies with an excess IR8 ratio are found at all luminosities and redshifts and are not only a characteristic of ULIRGs.

## 5 IR8 as a tracer of star formation compactness and “starburstiness” in local galaxies

The size and compactness of the star formation regions in galaxies are key parameters that can affect their IR SEDs. Chanial et al. (2007) showed that the dust temperature ($T_{\rm dust}$) estimated from the IRAS 60 to 100 μm flux ratio, R(60/100), is very sensitive to the spatial scale over which most of the IR light is produced. There is a known rough correlation of R(60/100), hence of $T_{\rm dust}$, with $L_{\rm IR}$ (Soifer et al. 1987): locally, the most luminous galaxies are warmer. This relation has recently been confirmed with AKARI and Herschel in the local and distant Universe (Hwang et al. 2010a). Locally, where galaxies can be spatially resolved in the far-IR or radio, Chanial et al. (2007) showed that the dispersion in the $L_{\rm IR}$–$T_{\rm dust}$ relation is significantly reduced by replacing $L_{\rm IR}$ with the IR surface brightness, $\Sigma_{\rm IR}$. We extend this analysis to the relation between this star formation compactness indicator, $\Sigma_{\rm IR}$, and IR8, the far-IR to mid-IR luminosity ratio. In the present study, the term “compactness” refers to the overall size of the starburst and not to the local clumpiness of the individual star formation regions, which we cannot measure in most cases.

An extension of the Chanial et al. analysis to the higher IR luminosity range of (U)LIRGs has become possible thanks to the work of Díaz-Santos et al. (2010), who used Spitzer/IRS data to derive the fraction of extended emission of the 13.2 μm mid-IR continuum for the GOALS galaxy sample (Sect. 3.3.3).

### 5.1 Determination of the projected star formation density

Due to the limited angular resolution of far-IR data, we first estimate the sizes of star formation regions from radio imaging by cross-matching the local galaxy sample with existing radio continuum surveys and then convert them into far-IR sizes using a correlation determined from a small sample of galaxies resolved in both wavelength domains as in Chanial et al. (2007).

Mayya & Rengarajan (1997) fitted the IRAS 60 μm and VLA radio continuum (RC, 20 cm) azimuthally averaged surface brightness profiles of a sample of 22 nearby spiral galaxies with a combination of exponential and Gaussian functions. The angular resolution of the 60 μm and 20 cm maps used in that study was about 1′, so we deconvolved their synthesized profiles by a 1′ beam and derived the intrinsic half-light radii $r_{\rm IR}$ (at 60 μm) and $r_{\rm RC}$. The half-light radii at the two wavelengths are strongly correlated (Fig. 10); a logarithmic bisector fit to the data gives Eq. 6:

$$r_{\rm IR} = (0.86 \pm 0.05)\, r_{\rm RC} \tag{6}$$

Hence, in the following, we estimate the far-IR sizes of the star formation regions of our local galaxy sample from their radio continuum half-light radii using Eq. 6. The existence of such a correlation is not surprising, since the radio and far-IR emission of star-forming galaxies are known to be tightly correlated (Yun et al. 2001, de Jong et al. 1985, Helou, Soifer & Rowan-Robinson 1985): the radio emission is predominantly synchrotron radiation associated with supernova remnants, and the bulk of the far-IR emission is UV light from young, massive stars reprocessed by interstellar dust. We therefore consider this size estimate to be a good proxy for the global size of the star formation regions of galaxies. This is obviously an approximation, since it does not account for the clumpiness or granularity of the regions, but it is the best that can be done with existing datasets.
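The logarithmic bisector fit behind Eq. 6 can be sketched with the OLS bisector of Isobe et al. (1990), the line bisecting the OLS(Y|X) and OLS(X|Y) fits. This is a minimal sketch: the text does not state whether the slope was fixed, so both slope and intercept are fitted here, and the noiseless toy radii are hypothetical. With a slope of ~1, $10^{\rm intercept}$ gives the $r_{\rm IR}/r_{\rm RC}$ ratio.

```python
import numpy as np

def ols_bisector(x, y):
    """Slope and intercept of the OLS bisector of y on x (Isobe et al. 1990),
    i.e. the line bisecting the OLS(Y|X) and OLS(X|Y) fits."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    sxx, syy, sxy = np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)
    b1 = sxy / sxx  # OLS(Y|X) slope
    b2 = syy / sxy  # OLS(X|Y) slope, expressed as dy/dx
    slope = (b1 * b2 - 1.0 + np.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    return slope, y.mean() - slope * x.mean()

# Fit in log space: log r_IR against log r_RC.
log_r_rc = np.log10([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical radio half-light radii (kpc)
log_r_ir = log_r_rc + np.log10(0.86)            # noiseless toy data following Eq. 6
slope, intercept = ols_bisector(log_r_rc, log_r_ir)
```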

Our local galaxy sample was cross-matched with the NRAO VLA Sky Survey (NVSS, Condon et al. 1998) and the Faint Images of the Radio Sky at Twenty-cm (FIRST, Becker, White & Helfand 1995), both obtained with the VLA at 20 cm. A total of 11, 47 and 58 galaxies in our ISO, AKARI and Spitzer local galaxy samples, respectively, have radio sizes (see Table 2).

We computed the IR surface brightness using Eq. 7,

$$\Sigma_{\rm IR} = \frac{L_{\rm IR}}{2\pi r_{\rm IR}^2}, \tag{7}$$

where the IR luminosity is divided by 2 because $r_{\rm IR}$ is the far-IR (60 μm) half-light radius, derived from the 20 cm radio measurements using Eq. 6.
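Equations 6 and 7 combine into a one-line computation of $\Sigma_{\rm IR}$ from $L_{\rm IR}$ and a radio half-light radius. This is a minimal sketch with a hypothetical function name and example galaxy; the compactness threshold of $3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$ is the one adopted later in this section.

```python
import math

R_IR_OVER_R_RC = 0.86  # Eq. 6 (the +/-0.05 uncertainty is ignored here)
SIGMA_COMPACT = 3e10   # L_sun kpc^-2, compactness threshold adopted later in the text

def ir_surface_brightness(l_ir_solar, r_rc_kpc):
    """Eq. 7: half of L_IR spread over a disc of radius r_IR, with r_IR from Eq. 6.
    Returns Sigma_IR in L_sun kpc^-2."""
    r_ir = R_IR_OVER_R_RC * r_rc_kpc
    return l_ir_solar / (2.0 * math.pi * r_ir ** 2)

# Hypothetical LIRG: L_IR = 1e11 L_sun with a 1 kpc radio half-light radius
sigma_ir = ir_surface_brightness(1e11, 1.0)  # ~2.15e10 L_sun kpc^-2
is_compact = sigma_ir >= SIGMA_COMPACT       # just below the threshold
```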

#### Mid-IR compactness

Using the low spectral resolution staring mode of Spitzer/IRS, Díaz-Santos et al. (2010) measured the spatial extent of the mid-IR continuum emission at 13.2 μm for 211 local (U)LIRGs of the GOALS sample (see Sect. 3.3.3). The 13.2 μm emission probes the warm dust (very small grains, VSGs) heated by the UV continuum of young, massive stars, and hence traces regions of dust-obscured star formation. Instead of measuring the half-light radius of the sources at this wavelength, Díaz-Santos et al. (2010) calculated their fraction of extended emission (FEE), defined as the fraction of the light of a galaxy that does not arise from its spatially unresolved central component. Conversely, the compactness of a source can be defined as the percentage of light that is unresolved, i.e., 100 × (1 − FEE). At the median distance of the sample used in this work, 91 Mpc, the angular resolution of Spitzer/IRS at 13.2 μm corresponds to a spatial resolution of ~1.7 kpc.

In the following, we consider galaxies to be “compact” if their 13.2 μm compactness is greater than 60 %. With this definition, 55 % (117/211) of the GOALS galaxies are compact. Interestingly, although the fraction of galaxies showing compact star formation (i.e., compact hot dust emission) increases with increasing $L_{\rm IR}$ (hence also with SFR), the compact population is not systematically associated with the most luminous sources. On the contrary, galaxies with compact star formation are found at all luminosities (see Figure 4 of Díaz-Santos et al. 2010).
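The compactness definition and the angle-to-size conversion used above can be sketched as follows. This is a minimal sketch with hypothetical function names; the small-angle conversion ignores the small difference between luminosity and angular-diameter distance at these low redshifts.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def compactness_percent(fee):
    """Percentage of 13.2 um light in the unresolved component: 100 * (1 - FEE)."""
    return 100.0 * (1.0 - fee)

def is_compact_midir(fee, threshold_percent=60.0):
    """'Compact' in the sense used here: more than 60 % of the light unresolved."""
    return compactness_percent(fee) > threshold_percent

def physical_scale_kpc(theta_arcsec, distance_mpc):
    """Small-angle conversion of an angle to a projected size in kpc
    (luminosity vs angular-diameter distance difference ignored)."""
    return theta_arcsec * ARCSEC_TO_RAD * distance_mpc * 1.0e3

scale = physical_scale_kpc(1.0, 91.0)  # ~0.44 kpc per arcsec at the sample's median distance
```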

#### Identification of the galaxies with compact star formation

To check whether the two star formation compactness indicators are consistent, we used the 58 galaxies of the GOALS sample for which we can determine both $\Sigma_{\rm IR}$ from radio sizes (Sect. 5.1.1) and a 13.2 μm compactness (Sect. 5.1.2). The two compactness indicators are correlated, with a dispersion of 0.45 dex (Fig. 11). The critical threshold of 60 % in 13.2 μm compactness, above which we classify galaxies as compact, corresponds to $\Sigma_{\rm IR} \approx 3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$. Hence, we hereafter classify as compact the galaxies with $\Sigma_{\rm IR} \ge 3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$. This threshold is more than two orders of magnitude lower than typical upper limits for star formation on small (kpc) scales (see Soifer et al. 2001). We note that the local star formation surface density of many of these sources could be much higher if it were not averaged over large scales.

In Fig. 12, we present the distribution of the far-IR sizes, estimated from radio 20 cm imaging using Eq. 6, of extended (red) and compact (blue) galaxies. The median far-IR sizes of compact and extended galaxies are 0.5 kpc and 1.8 kpc, respectively. The typical spatial resolution reached at 13.2 μm, ~1.7 kpc, is close to the typical size of extended galaxies and significantly larger than the median size of compact galaxies. This contributes to the relatively large dispersion seen in Fig. 11. With a linear resolution of ~0.2 kpc at the average distance of the GOALS sample, the radio estimator is therefore a finer discriminant of compact galaxies when good-quality radio data exist.

In the following, we use both compactness indicators, radio and 13.2 μm, without distinction to define the projected IR surface brightness, $\Sigma_{\rm IR} = L_{\rm IR}/(2\pi r_{\rm IR}^2)$. For galaxies with a measured radio size, $\Sigma_{\rm IR}$ is computed using Eqs. 6 and 7, while for galaxies with a 13.2 μm compactness estimate but no radio size, we use the relation presented in Fig. 11.

### 5.2 IR8, a star formation compactness indicator

The IR8 ratio is compared to the IR surface brightness, $\Sigma_{\rm IR}$, in Fig. 13. This figure contains more galaxies than Fig. 11 because it also includes sources without a radio size estimate. We find that IR8 is correlated with $\Sigma_{\rm IR}$ for local galaxies, following Eq. 8,

$$\mathrm{IR8} = 0.22\,[-0.05, +0.06] \times \Sigma_{\rm IR}^{0.15}, \tag{8}$$

where $\Sigma_{\rm IR}$ is in $L_\odot\,{\rm kpc}^{-2}$. Hence IR8 is a good proxy for the projected IR surface brightness of local galaxies: galaxies with high IR8 ratios are also those with the most compact star formation.
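Equation 8 and its inverse can be sketched directly; evaluating the fit at the compactness threshold reproduces the crossing at IR8 ≈ 8 discussed next. This is a minimal sketch with hypothetical function names; the inverse is only a rough proxy given the scatter about the fit.

```python
SIGMA_COMPACT = 3e10  # L_sun kpc^-2, compactness threshold used in the text

def ir8_from_sigma(sigma_ir):
    """Best-fitting local relation, Eq. 8: IR8 = 0.22 * Sigma_IR**0.15."""
    return 0.22 * sigma_ir ** 0.15

def sigma_from_ir8(ir8):
    """Eq. 8 inverted; only a rough proxy given the scatter about the fit."""
    return (ir8 / 0.22) ** (1.0 / 0.15)

ir8_at_threshold = ir8_from_sigma(SIGMA_COMPACT)  # ~8.2, about twice the main-sequence centre
```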

We showed in Fig. 11 that galaxies with more than 60 % of their 13.2 μm emission unresolved by Spitzer/IRS, defined as compact star-forming galaxies by Díaz-Santos et al. (2010), have an IR surface brightness $\Sigma_{\rm IR} \ge 3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$. This threshold is shown in Fig. 13 by a vertical dotted line. It crosses the best-fitting relation of Eq. 8 at IR8 = 8, i.e., twice the central value of the Gaussian distribution of main sequence galaxies (Fig. 9 and Eq. 4). As a result, compact star-forming galaxies, with $\Sigma_{\rm IR} \ge 3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$, systematically present an excess in IR8, whereas nearly all galaxies with extended star formation exhibit a “normal” IR8, i.e., within the Gaussian distribution of Fig. 9. This is illustrated in Fig. 14, which reproduces the IR8 diagram for local galaxies of Fig. 9, this time including the GOALS sample. The sub-sample of galaxies with measured IR surface brightness is shown with large symbols, blue and red marking galaxies with extended and compact star formation, respectively, i.e., with $\Sigma_{\rm IR}$ below and above $3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$. Galaxies with compact star formation systematically lie above the typical range of IR8 values. The trend can be extended to the local ULIRGs without size measurements, since they are known to host compact starbursts driven by major mergers (Armus et al. 1987, Sanders et al. 1988, Murphy et al. 1996, Veilleux, Kim & Sanders 2002).

Interestingly, the proportion of galaxies with compact star formation rises with $L_{\rm IR}$ along a path very similar to that of the proportion of galaxies with IR8 > 8 (Fig. 15), the 68 % upper bound of the GOODS–Herschel galaxies (Eq. 3). IR8 can therefore be considered a good proxy for the star formation compactness of local galaxies, which is particularly useful for galaxies without a radio size measurement.

In comparison, the fraction of IR8 excess sources in the GOODS–Herschel sample remains low (around 20 %; see Fig. 6, right, and Fig. 9) and never reaches the high proportions seen among local ULIRGs. Because of the Herschel detection limit, however, only ULIRGs are individually detected at $z \sim 2$. IR8 is found to be biased towards high values in these galaxies, which are responsible for the increase of the compact fraction in the Herschel sample, from 20 to 40 %, at the highest redshifts.

Overall, this analysis suggests that compact sources have been a minor fraction of star-forming galaxies at all epochs, but that locally, owing to the low gas content of galaxies, compact sources make up the dominant population of ULIRGs. Extending the analysis of local galaxies to distant ones, it also suggests that compact star formation takes place at all luminosities but does not dominate the majority of distant ULIRGs. Conversely, knowing the compactness and mid-IR luminosity of a galaxy, one may optimize the determination of its total IR luminosity from mid-IR observations alone; this is discussed in Sect. 7.

Finally, we have assumed in this section that the compactness measured from either the radio or the mid-IR continuum is associated with star formation. We discuss the role of AGN in Sect. 8, but we note already that when an AGN contributes to the IR emission of a galaxy, it does so mainly at wavelengths shorter than 20 μm (Netzer et al. 2007, Mullaney et al. 2011a). AGN contributing to the infrared emission would therefore tend to boost $L_8$ relative to $L_{\rm IR}^{\rm tot}$, thus reducing IR8. Instead, we see that increasing compactness corresponds to an increase in IR8. This reinforces the idea that we are dealing with star formation compactness rather than an effect produced by an active nucleus. In the next section, we show that compact star-forming galaxies are generally undergoing a starburst phase.

### 5.3 IR8, a starburst indicator

In the previous section, we saw that high IR8 values are systematically found in galaxies with compact star formation regions. We now show that these galaxies are undergoing a starburst phase. In the following, a star-forming galaxy is considered to be in a starburst phase if its current SFR is at least twice its average past SFR, $\langle{\rm SFR}\rangle$, i.e., if its birthrate parameter $b = {\rm SFR}/\langle{\rm SFR}\rangle$ (Kennicutt 1983) is greater than 2. Here $\langle{\rm SFR}\rangle = M_\star/t_{\rm age}$, where $t_{\rm age}$ is the age of the galaxy. Alternatively, a star-forming galaxy may be defined as a starburst if the time it would take to produce its current stellar mass at its current SFR, i.e., its stellar mass doubling timescale $\tau$ defined in Eq. 9,

$$\tau\,[{\rm Gyr}] = \frac{M_\star\,[M_\odot]}{{\rm SFR}\,[M_\odot\,{\rm Gyr}^{-1}]} = \frac{1}{{\rm sSFR}\,[{\rm Gyr}^{-1}]}, \tag{9}$$

is small when compared to its age. Both definitions are equivalent if one assumes that galaxies at a given epoch have similar ages.
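As a quick check of the equivalence stated above: with $b={\rm SFR}/\langle{\rm SFR}\rangle$, $\langle{\rm SFR}\rangle=M_\star/t$ and $\tau=M_\star/{\rm SFR}$ (Eq. 9), one finds $b={\rm SFR}\,t/M_\star=t/\tau$, so $b>2$ is the same condition as $\tau<t/2$ for a galaxy of age $t$. The minimal Python sketch below computes both quantities; the function names and the input units (SFR in $M_\odot$ yr$^{-1}$, $M_\star$ in $M_\odot$, ages in Gyr) are illustrative assumptions, not part of the original analysis.

```python
YR_PER_GYR = 1.0e9  # years in one Gyr


def doubling_time_gyr(sfr_msun_per_yr: float, mstar_msun: float) -> float:
    """Eq. 9: tau = M*/SFR = 1/sSFR, returned in Gyr."""
    sfr_msun_per_gyr = sfr_msun_per_yr * YR_PER_GYR
    return mstar_msun / sfr_msun_per_gyr


def birthrate_parameter(sfr_msun_per_yr: float, mstar_msun: float, age_gyr: float) -> float:
    """Kennicutt (1983) b = SFR / <SFR>, with the past-averaged <SFR> = M*/t."""
    past_avg_sfr_msun_per_gyr = mstar_msun / age_gyr
    return sfr_msun_per_yr * YR_PER_GYR / past_avg_sfr_msun_per_gyr


# Hypothetical galaxy: M* = 1e10 Msun forming 2.5 Msun/yr, age 10 Gyr.
tau = doubling_time_gyr(2.5, 1.0e10)        # 4.0 Gyr, i.e. sSFR = 0.25 per Gyr
b = birthrate_parameter(2.5, 1.0e10, 10.0)  # 2.5, equal to age / tau
print(f"tau = {tau:.2f} Gyr, b = {b:.2f}, age/tau = {10.0 / tau:.2f}")
```

The identity $b=t/\tau$ is why the two starburst criteria coincide only when galaxies at a given epoch share similar ages.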

In recent years, a tight correlation between SFR and $M_\star$ has been discovered, which defines a typical specific SFR (${\rm sSFR}={\rm SFR}/M_\star$) for “normal star-forming galaxies” as opposed to “starburst galaxies”. This relation evolves with redshift, but a tight correlation between SFR and $M_\star$ is observed at all redshifts from $z=0$ to 7 (Brinchmann et al. 2004, Noeske et al. 2007, Elbaz et al. 2007, Daddi et al. 2007a, 2009, Pannella et al. 2009, Magdis et al. 2010a, Gonzalez et al. 2011). Hence, we use the sSFR definition of a starburst, since it can be applied at all lookback times. In the present section, we consider only local star-forming galaxies. The SFR–$M_\star$ relation for local AKARI galaxies is shown in the left-hand part of Fig. 16. The best fit to this relation is a one-to-one correlation (0.26 dex rms), hence a constant ${\rm sSFR}\approx0.25$ Gyr$^{-1}$, or $\tau\approx4$ Gyr (Eq. 9). Local galaxies with compact star formation (large red dots), as defined in the previous section, are systematically found to have a higher sSFR, i.e., $\tau\lesssim1$ Gyr, than normal star-forming galaxies.

Since there is a continuous distribution of galaxies ranging from the normal mode of star formation, with $\tau\approx4$ Gyr, to extreme starbursts that can double their stellar masses in 50 Myr, we quantify the intensity of a starburst by the parameter $R_{\rm SB}$, which measures the excess in sSFR of a star-forming galaxy (which we label its “starburstiness”), as defined in Eq. 10:

$$R_{\rm SB} = \frac{\rm sSFR}{{\rm sSFR}_{\rm MS}} = \frac{\tau_{\rm MS}}{\tau} \qquad [>2 \text{ for starbursts}], \qquad (10)$$

where the subscript MS indicates the typical value for main sequence galaxies at the redshift of the galaxy in question. A starburst is defined to be a galaxy with $R_{\rm SB}>2$. Of the galaxies with compact star formation ($\Sigma_{\rm IR}\geq3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$), 75 % have $R_{\rm SB}>2$, hence are also in a starburst mode, and 93 % have $R_{\rm SB}>1$. Conversely, 79 % of the starburst galaxies are “compact”. Overall, starburst galaxies with ${\rm sSFR}>2\,{\rm sSFR}_{\rm MS}$ have a median $\Sigma_{\rm IR}=1.6\times10^{11}\,L_\odot\,{\rm kpc}^{-2}$, i.e., more than 5 times higher than the critical IR surface brightness above which galaxies are considered compact. Their star-forming regions are typically 2.3 times smaller than those of galaxies below this sSFR threshold.
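The starburstiness of Eq. 10 is a simple ratio, $R_{\rm SB}={\rm sSFR}/{\rm sSFR}_{\rm MS}=\tau_{\rm MS}/\tau$, followed by a threshold at 2. The sketch below applies it to the local case using the value ${\rm sSFR}_{\rm MS}\approx0.25$ Gyr$^{-1}$ quoted earlier in this section; the helper names and the sample doubling times are hypothetical.

```python
def starburstiness(ssfr: float, ssfr_ms: float) -> float:
    """Eq. 10: R_SB = sSFR / sSFR_MS (equivalently tau_MS / tau)."""
    return ssfr / ssfr_ms


def classify(ssfr: float, ssfr_ms: float, threshold: float = 2.0) -> str:
    """Label a galaxy 'SB' if R_SB exceeds the threshold, otherwise 'MS'."""
    return "SB" if starburstiness(ssfr, ssfr_ms) > threshold else "MS"


SSFR_MS_LOCAL = 0.25  # Gyr^-1, local AKARI value (tau_MS ~ 4 Gyr)

# Hypothetical doubling times, from a slow main sequence galaxy to an extreme burst.
for tau_gyr in (8.0, 4.0, 1.0, 0.05):
    ssfr = 1.0 / tau_gyr  # Eq. 9
    r_sb = starburstiness(ssfr, SSFR_MS_LOCAL)
    print(f"tau = {tau_gyr:5.2f} Gyr -> R_SB = {r_sb:5.1f} ({classify(ssfr, SSFR_MS_LOCAL)})")
```

With the strict inequality of Eq. 10, a galaxy sitting exactly at $R_{\rm SB}=2$ is labeled MS here; this boundary choice is ours.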

The sSFR and $\Sigma_{\rm IR}$ are correlated with a 0.2 dex dispersion (Fig. 16, right), following Eq. 11, where sSFR is in Gyr$^{-1}$ and $\Sigma_{\rm IR}$ in $L_\odot\,{\rm kpc}^{-2}$:

$${\rm sSFR} = 1.81^{+1.05}_{-0.66}\times10^{-4}\times\Sigma_{\rm IR}^{0.33}. \qquad (11)$$

Both parameters measure specific quantities related to the SFR: the sSFR is the SFR per unit stellar mass, while $\Sigma_{\rm IR}$ is related to the SFR (derived from $L_{\rm IR}^{\rm tot}$) per unit area. Note, however, that it was not obvious a priori that these quantities should be correlated, since the stellar mass of most galaxies is dominated by the old stellar population, whereas the IR (or radio) size used to derive $\Sigma_{\rm IR}$ traces the spatial distribution of young, massive stars.
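Evaluating Eq. 11 at the two surface brightnesses quoted above gives a useful consistency check: at the compactness threshold, $\Sigma_{\rm IR}=3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$, it yields ${\rm sSFR}\approx0.52$ Gyr$^{-1}$, roughly twice the local main sequence value of 0.25 Gyr$^{-1}$, and at the starburst median of $1.6\times10^{11}\,L_\odot\,{\rm kpc}^{-2}$ it yields about 0.9 Gyr$^{-1}$. The sketch below is a direct transcription of Eq. 11 plus its algebraic inverse; the bounds on the normalization are taken from the quoted $[-0.66,+1.05]\times10^{-4}$ interval, and the function names are hypothetical.

```python
NORM = 1.81e-4                       # Eq. 11 normalization (Gyr^-1)
NORM_LO, NORM_HI = 1.15e-4, 2.86e-4  # 1.81 - 0.66 and 1.81 + 1.05, times 1e-4
SLOPE = 0.33


def ssfr_from_sigma_ir(sigma_ir: float, norm: float = NORM) -> float:
    """Eq. 11: sSFR [Gyr^-1] from Sigma_IR [Lsun kpc^-2]."""
    return norm * sigma_ir ** SLOPE


def sigma_ir_from_ssfr(ssfr: float, norm: float = NORM) -> float:
    """Algebraic inverse of Eq. 11 (a mean relation with 0.2 dex scatter)."""
    return (ssfr / norm) ** (1.0 / SLOPE)


for sigma in (3.0e10, 1.6e11):
    lo, mid, hi = (ssfr_from_sigma_ir(sigma, n) for n in (NORM_LO, NORM, NORM_HI))
    print(f"Sigma_IR = {sigma:.1e}: sSFR ~ {mid:.2f} Gyr^-1 "
          f"(range {lo:.2f}-{hi:.2f}), tau ~ {1.0 / mid:.1f} Gyr")
```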

Because the starburstiness and the IR8 ratio are both enhanced in compact star-forming galaxies, they are also correlated, as shown in Fig. 17. The fit to this correlation is given in Eq. 12:

$$R_{\rm SB} = \left(\frac{\rm IR8}{4}\right)^{1.2}. \qquad (12)$$

The dispersion in this relation is 0.3 dex. Hence, it is mostly compact starbursting galaxies that present atypically strong bolometric correction factors, although there is no sharp separation between the two regimes but rather a continuum of values.
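Eq. 12 ties the IR8 cuts to starburstiness: the IR main sequence value IR8 = 4 maps to $R_{\rm SB}=1$, and the IR8 ≥ 8 cut used in Sect. 7.1 corresponds to $R_{\rm SB}=2^{1.2}\approx2.3$, just above the starburst threshold of Eq. 10. The sketch below evaluates Eq. 12 and its algebraic inverse. Note that, given the 0.3 dex scatter, inverting a fitted relation is not the same as fitting IR8 as a function of $R_{\rm SB}$, so the inverse is only indicative; the function names are hypothetical.

```python
def rsb_from_ir8(ir8: float) -> float:
    """Eq. 12: R_SB = (IR8 / 4) ** 1.2 (mean local relation, 0.3 dex scatter)."""
    return (ir8 / 4.0) ** 1.2


def ir8_from_rsb(r_sb: float) -> float:
    """Algebraic inverse of Eq. 12: IR8 = 4 * R_SB ** (1 / 1.2)."""
    return 4.0 * r_sb ** (1.0 / 1.2)


print(rsb_from_ir8(4.0))  # 1.0: the IR main sequence value maps to R_SB = 1
print(rsb_from_ir8(8.0))  # ~2.3: the IR8 >= 8 starburst cut
print(ir8_from_rsb(2.0))  # ~7.1: IR8 at the R_SB = 2 starburst threshold
```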

## 6 IR8 as a tracer of star formation compactness and “starburstiness” in distant galaxies

In the previous section, we have defined two modes of star formation:

• a normal mode, which we called the infrared main sequence, in which galaxies present a universal bolometric correction factor, IR8, and a moderate star formation compactness, $\Sigma_{\rm IR}$, and

• a starburst mode, identified by an excess SFR per unit stellar mass, i.e., an excess sSFR, as compared to the typical sSFR of most local galaxies.

Galaxies with an enhanced IR8 ratio were systematically found to be forming their stars in the starburst mode and to show a strong star formation compactness ($\Sigma_{\rm IR}\geq3\times10^{10}\,L_\odot\,{\rm kpc}^{-2}$). In order to separate these two modes of star formation in distant galaxies as well, we first need to define the typical sSFR of star-forming galaxies in a given redshift range. This has become possible since the recent discovery that star-forming galaxies follow a tight correlation between their SFR and $M_\star$, with a typical dispersion of 0.3 dex, over a large range of redshifts: $z\sim0$ (Brinchmann et al. 2004), $z\sim1$ (Noeske et al. 2007, Elbaz et al. 2007), $z\sim2$ (Daddi et al. 2007a, Pannella et al. 2009), $z\sim3$ (Magdis et al. 2010a), $z\sim4$ (Daddi et al. 2009, Lee et al. 2011) and even up to $z\sim7$ (Gonzalez et al. 2011).

### 6.1 Evolution of the specific SFR with cosmic time and definition of main sequence versus starburst galaxies

In the following, we assume that the slope of the SFR–$M_\star$ relation is equal to 1 at all redshifts, hence that the specific SFR, ${\rm sSFR}={\rm SFR}/M_\star$, is independent of stellar mass at fixed redshift. A small departure from this value would not strongly affect our conclusions, and the same logic may be applied for a different slope. At $z\sim0$, our local reference sample is well fitted by a constant sSFR (see Fig. 16, left), although the best-fitting slope is 0.77 (Elbaz et al. 2007). At $z\sim1$, Elbaz et al. (2007) find a slope of 0.9, but we checked that the dispersion of the data allows a nearly equally good fit with a slope of 1. At $z\sim2$, Pannella et al. (2009) find a slope of 0.95, consistent with the value obtained by Daddi et al. (2007a) in the same redshift range. Lyman-break galaxies at $z\sim3$ (Magdis et al. 2010a) and $z\sim4$ (Daddi et al. 2009) are also consistent with a slope of unity. From a different perspective, Peng et al. (2010a) argue that a slope of unity is required to keep an invariant Schechter function for the stellar mass function of star-forming galaxies from $z=0$ to 1, as observed in COSMOS data, whereas a slope differing from unity (i.e., an sSFR that depends on stellar mass) would change the faint-end slope of the mass function in a way that is inconsistent with the observations.

However, the slope of the SFR–$M_\star$ relation is sensitive to the technique used to select the sample of star-forming galaxies. Karim et al. (2011) find two different slopes depending on the selection of their sample: a slope lower than 1 for a mildly star-forming sample, and a slope of unity when selecting more actively star-forming galaxies (see their Fig. 13). Using shallower Herschel data than the present observations, Rodighiero et al. (2010) found a slope lower than unity.

Assuming that the slope of the SFR–$M_\star$ relation remains equal to 1 at all redshifts, a main sequence mode of star formation can be defined by the median sSFR in a given redshift interval, ${\rm sSFR}_{\rm MS}$. The starburstiness, described in Eq. 10, measures the offset relative to this typical sSFR. Since at any redshift (at least in the range of interest here, i.e., $z\lesssim3$) most galaxies belong to the main sequence in the SFR–$M_\star$ plane, we assume that the median sSFR measured within a given redshift interval is a good proxy for ${\rm sSFR}_{\rm MS}$. Galaxies detected with Herschel follow the trend shown with open circles in Fig. 18 (blue for GOODS–N and black for GOODS–S). We performed the analysis independently for both GOODS fields in order to check the impact of cosmic variance on our result. To correct for incompleteness, we performed stacking measurements as for Fig. 6, but in redshift intervals. The stacking was done on the PACS 100 μm images, using the 24 μm sources as a list of prior positions. The resulting values (blue upward triangles for GOODS–N and black downward triangles for GOODS–S) were computed by weighting detections and stacking measurements by the number of sources used in each sample per redshift interval. The SFR was derived from $L_{\rm IR}^{\rm tot}$ extrapolated from the PACS 100 μm photometry using the CE01 technique. The CE01 method works well for 100 μm measurements up to $z\sim3$, as already noted by Elbaz et al. (2010), and we confirm this agreement with the extended sample of detected sources in the present analysis (Sect. 7.3). The trends found for both fields are in good agreement. The combined detection and stacking measurements for GOODS–N are slightly lower than those obtained for GOODS–S, which may result from a combination of cosmic variance and the greater depth of the GOODS–S image.
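The combination of detections and stacks described above can be sketched as a source-count-weighted mean per redshift interval. The text states only that the two measurements are weighted by the number of sources in each sample; the linear (rather than logarithmic) averaging, the function name and the numbers in the example are our assumptions.

```python
def combine_by_counts(measurements):
    """Source-count-weighted mean of (n_sources, value) pairs.

    Assumed here: a linear weighted mean; the text specifies only that
    detections and stacks are weighted by the number of sources used.
    """
    total_n = sum(n for n, _ in measurements)
    if total_n == 0:
        raise ValueError("no sources in this redshift interval")
    return sum(n * value for n, value in measurements) / total_n


# Hypothetical redshift bin: 120 Herschel detections with median sSFR 1.0 Gyr^-1
# and a stack of 80 undetected 24 um priors giving 0.5 Gyr^-1.
print(combine_by_counts([(120, 1.0), (80, 0.5)]))  # 0.8
```

Averaging in log(sSFR) instead would give a slightly different value; which choice matches the published points is not specified here.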

The redshift evolution of ${\rm sSFR}_{\rm MS}$ (Fig. 18), accounting for both detections and stacked measurements, is well fitted by Eq. 13,

$${\rm sSFR}_{\rm MS}\,[{\rm Gyr}^{-1}] = 26\times t_{\rm cosmic}^{-2.2}, \qquad (13)$$

where $t_{\rm cosmic}$ is the cosmic time elapsed since the Big Bang, in Gyr. A starburst can then be defined by its sSFR following Eq. 14,

$${\rm sSFR}_{\rm SB}\,[{\rm Gyr}^{-1}] > 52\times t_{\rm cosmic}^{-2.2}. \qquad (14)$$

The intensity of such starbursts, or “starburstiness”, is then defined by the excess sSFR: $R_{\rm SB}={\rm sSFR}/{\rm sSFR}_{\rm MS}$. Because of this evolution with cosmic time, a galaxy with an sSFR twice as large as the local MS value would be considered a starburst today, but a galaxy with the same sSFR at $z\sim1$ would be part of the main sequence.
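The redshift dependence of the starburst criterion can be made concrete with Eqs. 13 and 14, i.e., ${\rm sSFR}_{\rm MS}=26\,t_{\rm cosmic}^{-2.2}$ Gyr$^{-1}$ and a starburst threshold at twice that value. The sketch below evaluates one hypothetical galaxy with a fixed sSFR at two epochs. The cosmic ages used (about 13.5 Gyr today and about 6 Gyr at $z\sim1$) are rounded assumptions that depend on the adopted cosmology; converting redshift to $t_{\rm cosmic}$ properly requires a cosmology calculator.

```python
def ssfr_ms(t_cosmic_gyr: float) -> float:
    """Eq. 13: main sequence sSFR [Gyr^-1] at cosmic time t [Gyr]."""
    return 26.0 * t_cosmic_gyr ** -2.2


def is_starburst(ssfr: float, t_cosmic_gyr: float) -> bool:
    """Eq. 14: sSFR > 52 t^-2.2, i.e. R_SB = sSFR / sSFR_MS > 2."""
    return ssfr > 2.0 * ssfr_ms(t_cosmic_gyr)


# Approximate cosmic ages (assumed; they depend on the adopted cosmology).
T_NOW, T_Z1 = 13.5, 6.0  # Gyr
ssfr = 0.2  # Gyr^-1, a hypothetical galaxy
for label, t in (("z ~ 0", T_NOW), ("z ~ 1", T_Z1)):
    print(f"{label}: sSFR_MS = {ssfr_ms(t):.3f}, R_SB = {ssfr / ssfr_ms(t):.2f}, "
          f"starburst = {is_starburst(ssfr, t)}")
```

The same sSFR that is a starburst at the present epoch falls well below the starburst threshold at $z\sim1$, which is the point made in the paragraph above.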

We have seen that for local galaxies, the starburstiness and IR8 are correlated (see Fig. 17). The same exercise for distant GOODS–Herschel galaxies, mixing galaxies of all luminosities and redshifts, is shown in Fig. 19. Distant galaxies exhibit a non-negligible dispersion, but their sliding median, shown by a thick grey line in Fig. 19, coincides with the best-fit relation for local galaxies (solid and dashed blue lines).

We find that 80 % of the galaxies that belong to the SFR–$M_\star$ main sequence (with $0.5<R_{\rm SB}<2$) also belong to the main sequence in IR8, i.e., ${\rm IR8}=4\pm1.6$ (Eq. 5). Hence we confirm that the two definitions of “main sequence galaxies” are similar and that, on average, they represent the same galaxy population. We also note that even though there is a tail toward stronger starburstiness and compactness, i.e., increased $R_{\rm SB}$ and IR8, this region of parameter space is only sparsely populated in the GOODS–Herschel sample, which suggests that analogs of the local compact starbursts, predominantly produced by major mergers, remain a minority among the distant galaxy population. Finally, sub-mm galaxies (large open circles in Fig. 19) also follow the same trend.

### 6.2 Star formation compactness of distant galaxies

We have shown that local galaxies with high $\Sigma_{\rm IR}$ values also exhibit high IR8 ratios. We do not have IR or radio size estimates for the distant galaxy population, but we can use the high-resolution HST–ACS images to study the spatial distribution of the rest-frame UV light in the populations of MS and SB galaxies. It has been suggested that distant (U)LIRGs at $1.5<z<2.5$ (Daddi et al. 2007a) and at $z\sim3$ (Magdis et al. 2010b) are not optically thick, since the SFR derived from the UV after correcting for extinction using the Calzetti et al. (2000) law is consistent with the SFR derived from radio stacking measurements at these redshifts (see also Nordon et al. 2010).

We use HST–ACS images in the B (4350 Å), V (6060 Å) and i (7750 Å) bands to sample the same rest-frame UV wavelength of 2700 Å at $z=0.7$, 1.2 and 1.8, respectively. MS galaxies are selected to have $R_{\rm SB}=1\pm0.1$ (Eq. 10), whereas SB galaxies are defined as galaxies with $R_{\rm SB}>2$. We also tested a stricter definition for starbursts, $R_{\rm SB}>3$, to avoid contamination from MS galaxies (Table 3). The result of stacking the HST–ACS sub-images is shown in Fig. 20 for MS galaxies (left column) and SB galaxies with $R_{\rm SB}>3$ (right column). The starbursts are clearly more compact than the main sequence galaxies. The half-light radius of each stacked image was measured with GALFIT (Peng et al. 2010b) and is listed in Table 3.

These sizes are consistent with those obtained by Ferguson et al. (2004). SB galaxies typically exhibit half-light radii two times smaller than those of MS galaxies, implying projected star formation densities that are 4 times larger. We verified that this was not due to a mass selection effect by matching the stellar masses in both samples, and obtained similar results, although with larger uncertainties. These sizes are larger than the radio-derived IR half-light radii of the local sample of MS (1.8 kpc) and SB (0.5 kpc) galaxies. However, since the distant galaxy sample has a different mass and luminosity selection from that of the local reference sample, we cannot directly compare their sizes. Nevertheless, the difference in relative sizes among the high-redshift galaxies confirms that star formation in distant starbursts is more concentrated than in distant main sequence galaxies. This, again, is strong evidence for a greater concentration of star formation in galaxies with higher specific SFRs. Since we have seen that $R_{\rm SB}$ (i.e., the normalized sSFR) and IR8 are correlated (Fig. 19), this implies that in distant galaxies, as in local ones, galaxies with high IR8 ratios are likely to be compact starbursts.
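The step from "two times smaller" to "4 times larger" is the geometric scaling $\Sigma\propto L/r^2$ at fixed luminosity. The short sketch below applies it; the equal-luminosity assumption is ours, so the numbers are purely geometric factors. The same scaling links the 2.3 size ratio quoted in Sect. 5.3 to the factor of more than 5 in $\Sigma_{\rm IR}$ quoted there.

```python
def density_ratio(r_ms_kpc: float, r_sb_kpc: float) -> float:
    """Sigma_SB / Sigma_MS at equal luminosity, since Sigma ~ L / r**2 (assumption)."""
    return (r_ms_kpc / r_sb_kpc) ** 2


print(density_ratio(2.0, 1.0))  # 4.0: half-light radii two times smaller (distant sample)
print(density_ratio(2.3, 1.0))  # ~5.3: the 2.3x size ratio quoted for local starbursts
print(density_ratio(1.8, 0.5))  # ~13: local radio-derived MS vs SB half-light radii
```

For the local MS and SB samples the luminosities differ, so the last factor describes only the size contrast, not the actual ratio of their $\Sigma_{\rm IR}$.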

This result is consistent with the work of Rujopakarn et al. (2011), who measured IR luminosity surface densities for distant (U)LIRGs similar to those found in local normal star-forming galaxies. However, we find that this holds for most, but not all, high-redshift (U)LIRGs. Compact starbursts do exist in the distant Universe, even among (U)LIRGs, but they are not the dominant population.

## 7 Toward a universal IR SED for Main Sequence and Starburst galaxies

### 7.1 Medium resolution IR SED for main sequence (IR8∼4) and starburst (IR8>8) galaxies

At $z<2.5$, where we can estimate the rest-frame $L_8$ from Spitzer IRAC, IRS and MIPS photometry as well as a reliable $L_{\rm IR}^{\rm tot}$ from Herschel measurements at rest-frame wavelengths $\gtrsim30$ μm, the IR8 ($=L_{\rm IR}^{\rm tot}/L_8$) ratio follows a Gaussian distribution centered on IR8 = 4 (Eq. 5, Fig. 9), with a tail skewed toward higher values for compact starbursts. This defines two populations of star-forming galaxies or, more precisely, two modes of star formation: the MS and SB modes. Galaxies in the MS mode form the Gaussian part of the distribution and present typical sSFR values (i.e., $R_{\rm SB}\sim1$), while SB galaxies exhibit higher IR8 values (see Fig. 9) and a stronger starburstiness ($R_{\rm SB}>2$).

IR8 is universal among MS galaxies of all luminosities and redshifts. This suggests that these galaxies share a common IR SED. In the local Universe, the rest-frame mid- and far-IR luminosities measured by IRAS and ISOCAM were also found to be nearly directly proportional to $L_{\rm IR}^{\rm tot}$ (see CE01 and Elbaz et al. 2002), reinforcing this idea. To produce the typical IR SEDs of MS and SB galaxies, we use the $k$-correction as a spectroscopic tool: galaxies at different redshifts sample different rest-frame wavelengths through the same observed bands. We separate MS and SB galaxies by their IR8 ratios: ${\rm IR8}=4\pm2$ for MS galaxies (as in Eq. 5) and ${\rm IR8}\geq8$ (hence $2\sigma$ away from the MS) for SB galaxies. We then rescale each individual IR SED by the ratio of a fixed reference luminosity to its $L_{\rm IR}^{\rm tot}$, so that all galaxies are normalized to the same total IR luminosity. The result is shown with light grey dots in the left-hand part of Fig. 21 for MS galaxies and in the right-hand part for SB galaxies. A sliding median was computed in wavelength intervals that always encompass $25\pm5$ galaxies (blue points for MS in Fig. 21, left, and red points for SB in Fig. 21, right). As a result, the typical MS and SB IR SEDs have an effective resolution of $\lambda/\Delta\lambda=25$ and 10, respectively, nearly homogeneously distributed in wavelength from 3 to 350 μm.
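The stacking procedure described above (normalize each galaxy to a common $L_{\rm IR}^{\rm tot}$, pool all rest-frame points, then take medians in wavelength windows of fixed occupancy) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses non-overlapping bins with a fixed number of *points* rather than a sliding window containing a fixed number of galaxies, and the function names, the reference luminosity of 1 (arbitrary units) and the example photometry are hypothetical.

```python
import statistics


def normalize_sed(points, l_ir_tot, l_ref):
    """Rescale one galaxy's (rest wavelength, nuLnu) pairs to a common L_IR^tot = l_ref."""
    scale = l_ref / l_ir_tot
    return [(lam, nu_l_nu * scale) for lam, nu_l_nu in points]


def binned_median_sed(points, n_per_bin):
    """Median SED of pooled points, with bins that each hold n_per_bin points.

    Returns (median wavelength, median flux, lambda / delta_lambda) per bin;
    trailing points that do not fill a whole bin are dropped.
    """
    pts = sorted(points)
    out = []
    for start in range(0, len(pts) - n_per_bin + 1, n_per_bin):
        chunk = pts[start:start + n_per_bin]
        lams = [lam for lam, _ in chunk]
        lam_med = statistics.median(lams)
        width = lams[-1] - lams[0]
        resolution = lam_med / width if width > 0 else float("inf")
        out.append((lam_med, statistics.median(f for _, f in chunk), resolution))
    return out


# Two hypothetical galaxies (rest-frame um, nuLnu in Lsun), normalized to l_ref = 1.
gal_a = normalize_sed([(8.0, 2.0e10), (24.0, 3.0e10), (100.0, 6.0e10)], 2.0e11, 1.0)
gal_b = normalize_sed([(10.0, 1.0e10), (30.0, 2.0e10), (120.0, 4.0e10)], 1.0e11, 1.0)
for lam, flux, res in binned_median_sed(gal_a + gal_b, n_per_bin=2):
    print(f"{lam:6.1f} um  {flux:.3f}  lambda/dlambda ~ {res:.1f}")
```

Fixing the occupancy rather than the bin width is what makes the effective resolution depend on the sample size, which is consistent with the larger MS sample reaching a higher $\lambda/\Delta\lambda$ than the SB sample.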

The typical MS IR SED in the left-hand part of Fig. 21 has a broad far-IR bump centered around 90 μm, suggesting a wide range of dust temperatures around an effective value of ≈30 K, and strong PAH emission features. Instead, the typical IR SED of SB galaxies (Fig. 21, right) presents a narrower far-IR bump peaking around 70–80 μm, corresponding to an effective dust temperature of ≈40 K, and weak PAH emission features. We note, however, that these prototypical IR SEDs result from the combination of 267 and 111 galaxies for the MS and SB modes, respectively. They should therefore be considered as average SEDs, acknowledging that there is a continuous transition from one to the other with increasing IR8 or star formation compactness. In the next section, we fit a model to these SEDs to better describe their properties.

### 7.2 SED decomposition of main sequence and starburst galaxies

In order to interpret the physical nature of the MS and SB SEDs derived in the previous section, we adopt a simple phenomenological approach. We decompose the two classes of SEDs into a linear combination of two templates, shown in Fig. 22: (1) a “star-forming region” component including H ii regions and the surrounding photo-dissociation regions (labeled SF), and (2) a “diffuse ISM” (interstellar medium) component accounting for the quiescent regions (labeled ISM). The luminosity ratio of the two components controls the IR8 parameter. This SED decomposition is not unique, and the two components used here are not rigorously associated with physical regions of the galaxies.
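Why the luminosity ratio of the two components controls IR8 follows from additivity: if a fraction $f_{\rm ISM}$ of $L_{\rm IR}^{\rm tot}$ comes from the ISM component, each component contributes $L_8=L_{\rm IR}/{\rm IR8}$, so the total is ${\rm IR8}=1/[(1-f_{\rm ISM})/{\rm IR8}_{\rm SF}+f_{\rm ISM}/{\rm IR8}_{\rm ISM}]$, a luminosity-weighted harmonic mean. The sketch below evaluates this for hypothetical component values (IR8 = 10 for a PAH-poor SF component and 3 for a PAH-rich diffuse ISM component); these are illustrative numbers, not the fitted templates summarized in Table 4.

```python
def mixture_ir8(f_ism: float, ir8_sf: float, ir8_ism: float) -> float:
    """IR8 of a two-component SED.

    f_ism is the fraction of L_IR^tot emitted by the diffuse-ISM component;
    each component contributes L8 = L_IR / IR8, so the total IR8 is the
    luminosity-weighted harmonic mean of the component IR8 values.
    """
    if not 0.0 <= f_ism <= 1.0:
        raise ValueError("f_ism must lie in [0, 1]")
    return 1.0 / ((1.0 - f_ism) / ir8_sf + f_ism / ir8_ism)


# Hypothetical component values (not the fitted templates of Table 4).
IR8_SF, IR8_ISM = 10.0, 3.0
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"f_ISM = {f:.2f} -> IR8 = {mixture_ir8(f, IR8_SF, IR8_ISM):.2f}")
```

A larger diffuse-ISM share lowers IR8, which is the sense of the MS versus SB difference discussed in the rest of this section.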

The SED of each sub-component is given by the model of Galliano et al. (2011, in prep.; also presented by Galametz et al. 2009). This model adopts the Galactic dust properties of Zubko, Dwek & Arendt (2004). To account for the diversity of physical conditions within a galaxy, we combine the emission of grains exposed to different starlight intensities, $U$ (normalized to the solar neighborhood value, $U=1$). Following Dale et al. (2001), we assume that the mass fraction of dust exposed to a given starlight intensity follows a power law of index $\alpha$, $dM_{\rm dust}/dU\propto U^{-\alpha}$, between two cutoffs, $U_{\rm min}$ and $U_{\rm max}$. We fit the two SEDs simultaneously with the same two components, allowing only their luminosity ratio to differ between the MS and SB SEDs. We add a stellar continuum to fit the short wavelengths (see Galametz et al. 2009 for a description); this component is a minor correction. In summary, the free parameters of the fit are:

• the starlight intensity distribution parameters ($U_{\rm min}$, $U_{\rm max}$ and $\alpha$) of each sub-component;

• the PAH mass fraction and charge of each sub-component;

• the luminosity ratio of the two components for the main sequence and for the starburst;

• the contribution of the stellar continuum (negligible here).
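The starlight-intensity distribution above, $dM_{\rm dust}/dU\propto U^{-\alpha}$ between $U_{\rm min}$ and $U_{\rm max}$, has closed-form mass fractions and a closed-form mass-weighted mean $\langle U\rangle$ (to which the dust luminosity per unit mass is proportional, since grain heating scales linearly with $U$). The sketch below computes both; the symbol names follow the Dale et al. (2001) convention as written above, and the parameter values in the example are illustrative, not the fitted values of Table 4.

```python
import math


def _power_integral(p: float, a: float, b: float) -> float:
    """Integral of U**p dU from a to b (handles the logarithmic case p = -1)."""
    if math.isclose(p, -1.0):
        return math.log(b / a)
    return (b ** (p + 1.0) - a ** (p + 1.0)) / (p + 1.0)


def mass_fraction(u1, u2, u_min, u_max, alpha):
    """Dust mass fraction with u1 <= U <= u2 when dM/dU ~ U**-alpha on [u_min, u_max]."""
    lo, hi = max(u1, u_min), min(u2, u_max)
    if hi <= lo:
        return 0.0
    return _power_integral(-alpha, lo, hi) / _power_integral(-alpha, u_min, u_max)


def mean_intensity(u_min, u_max, alpha):
    """Mass-weighted mean starlight intensity <U> for the same power law."""
    return _power_integral(1.0 - alpha, u_min, u_max) / _power_integral(-alpha, u_min, u_max)


# Illustrative (hypothetical) parameters, not the fitted values of Table 4.
U_MIN, U_MAX, ALPHA = 1.0, 1.0e4, 2.0
print("mass fraction with U > 100:", round(mass_fraction(100.0, U_MAX, U_MIN, U_MAX, ALPHA), 4))
print("<U> =", round(mean_intensity(U_MIN, U_MAX, ALPHA), 2))
```

A steep power law (large $\alpha$) puts most of the dust mass near $U_{\rm min}$, i.e., cold dust, while a shallow one spreads mass toward intense radiation fields; this is the lever that sets the width of the far-IR bump.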

The fits are shown with solid black lines in Fig. 21, while the derived templates that we used for the decomposition are shown as blue and red lines in Fig. 22. The most relevant parameters are summarized in Table 4. The “diffuse ISM” component has colder dust and a larger PAH mass fraction than the “star-forming region” SED.

The main differences between galaxies in the MS and SB modes are:

• the effective dust temperature of galaxies in the SB mode is higher than that of MS galaxies, i.e., 40 K versus 31 K;

• the contribution of diffuse ISM emission to the SB SED is negligible (8 %), consistent with the strong compactness seen both for local starbursts (in radio and mid-IR imaging, see Sect. 5) and for high redshift analogs (in the rest-frame UV, see Sect. 6.2);

• the MS SED requires a wider distribution of dust temperatures, typically ranging from 15 to 50 K;

• the stronger contribution of PAH features to the broadband mid-IR emission in the MS SED is the main cause of the difference in IR8 ratios between the two populations.

Galaxies are distributed continuously between the MS and various degrees of SB strength; hence this decomposition technique can be used in the future to produce SEDs suitable for ranges of IR8 or sSFR values, in the form of a new library of template SEDs. We note, however, that this decomposition of the typical MS and SB SEDs is not unique. For example, the SB SED is very similar to the CE01 template for a local galaxy with $L_{\rm IR}^{\rm tot}=…\,L_\odot$, which turns out to be close to the observed median luminosity of the starbursts. Instead, the MS SED is closer to the CE01 SED for a local galaxy with $L_{\rm IR}^{\rm tot}=4\times10^{…}\,L_\odot$.

We also note that a direct fit of the Rayleigh-Jeans portions of both SEDs would favor an effective emissivity index of $\beta=1.5$ for the MS and $\beta=2$ for the SB. However, this is a degenerate problem. Indeed, the effective emissivity index is not necessarily equal to the intrinsic $\beta$ of the grains: a distribution of grain temperatures flattens the sub-mm SED and can yield an effective $\beta$ lower than the intrinsic one, as is the case for our star-forming region component. Finally, it is also not possible to disentangle a potential contribution from an AGN, particularly in the SB SED. AGN are known to be ubiquitous in LIRGs (Iwasawa et al. 2011) and ULIRGs (Nardini et al. 2010), and they may contribute in part to the mid-IR continuum, mostly in SB SEDs, since SB galaxies are both more compact and exhibit lower PAH equivalent widths than MS galaxies. However, even if AGN contribute some fraction of the light in these galaxies, they cannot dominate in both the mid- and far-IR regimes, since we find evidence that PAHs dominate around 8 μm in both MS and SB galaxies, even if they are stronger in the MS SED. The high IR8 values measured for SBs also suggest that star formation dominates the IR emission in these galaxies. In Sect. 8, we present a technique to search for hidden AGN activity in the GOODS–Herschel galaxies.
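The degeneracy between intrinsic and effective emissivity index can be illustrated with optically thin modified blackbodies, $S_\nu\propto M\,\nu^\beta B_\nu(T)$. The sketch below measures the two-point slope between 350 and 500 μm and subtracts the Rayleigh-Jeans value of 2. Even a single temperature gives $\beta_{\rm eff}<\beta$, because cold dust is not fully in the Rayleigh-Jeans regime at these wavelengths, and mixing in colder components lowers $\beta_{\rm eff}$ further relative to the warmest one. The choice of wavelengths, the uniform 15–50 K mixture with equal dust masses and the intrinsic $\beta=2$ are illustrative assumptions; this does not reproduce the fit described above.

```python
import math

H_OVER_K = 6.62607015e-34 / 1.380649e-23  # Planck / Boltzmann constant, in K s
C_UM_PER_S = 2.99792458e14                # speed of light in um/s


def modified_bb(nu_hz: float, temp_k: float, beta: float) -> float:
    """Optically thin modified blackbody nu**beta * B_nu(T), arbitrary normalization."""
    x = H_OVER_K * nu_hz / temp_k
    return nu_hz ** (3.0 + beta) / math.expm1(x)


def effective_beta(temps_k, weights, beta, lam1_um=350.0, lam2_um=500.0):
    """Two-point sub-mm slope minus 2 (the Rayleigh-Jeans value) for a sum of modified blackbodies."""
    nu1, nu2 = C_UM_PER_S / lam1_um, C_UM_PER_S / lam2_um
    s1 = sum(w * modified_bb(nu1, t, beta) for t, w in zip(temps_k, weights))
    s2 = sum(w * modified_bb(nu2, t, beta) for t, w in zip(temps_k, weights))
    return math.log(s1 / s2) / math.log(nu1 / nu2) - 2.0


# Hypothetical grains with intrinsic beta = 2, at one temperature or spread
# uniformly (equal dust masses) over 15-50 K.
mix_temps = [15.0 + 5.0 * i for i in range(8)]  # 15, 20, ..., 50 K
print("single 50 K:", round(effective_beta([50.0], [1.0], 2.0), 2))
print("single 15 K:", round(effective_beta([15.0], [1.0], 2.0), 2))
print("15-50 K mix:", round(effective_beta(mix_temps, [1.0] * len(mix_temps), 2.0), 2))
```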

### 7.3 Derivation of total IR luminosities from monochromatic measurements

Now that we have defined a typical IR SED for main sequenc