
Cross-correlation of the 2XMMi catalogue with Data Release 7 of the Sloan Digital Sky Survey


The Survey Science Centre of the XMM-Newton satellite released the first incremental version of the 2XMM catalogue in August 2008. Containing more than 220,000 X-ray sources, the 2XMMi was at that time the largest catalogue of X-ray sources ever published and thus constitutes an unprecedented resource for studying the high-energy properties of various classes of X-ray emitters such as AGN and stars. Thanks to the high throughput of the EPIC cameras on board XMM-Newton, accurate positions, fluxes, and hardness ratios are available for a substantial fraction of the X-ray detections. The advent of the seventh data release of the Sloan Digital Sky Survey offers the opportunity to cross-match two major surveys and extend the spectral energy distribution of many 2XMMi sources towards the optical bands. This implies building extensive homogeneous samples with a statistically controlled rate of spurious matches and completeness. We here present a cross-matching algorithm based on the classical likelihood ratio estimator. The method developed has the advantage of providing true probabilities of identification without resorting to heavy Monte-Carlo simulations. Over 30,000 2XMMi sources have SDSS counterparts with individual probabilities of identification higher than 90%. At this threshold, the sample has only 2% spurious matches and contains 77% of all expected SDSS identifications. Using spectroscopic identifications from the SDSS DR7 catalogue supplemented by extraction from other catalogues, we build an identified sample from which the way the various classes of X-ray emitters gather in the multi-dimensional parameter space can be analysed and later used to design a source classification scheme. We illustrate the interest of this clean source sample by investigating two scientific use cases. In the first example we show how these multi-wavelength data can be used to search for new QSO2s. Although no specific range of observed properties allows us to efficiently identify Compton Thick QSO2s, we show that the prospects are much better for Compton Thin AGN2 and discuss several possible multi-parameter selection strategies. In a second example, we confirm the hardening of the mean X-ray spectrum with increasing X-ray luminosity on a sample of over 500 X-ray active stars and reveal that on average X-ray active M stars display bluer colour indexes than less active ones. Although this catalogue of 2XMM-SDSS sources cannot be used directly for statistical studies, it nevertheless represents an excellent starting point to select well defined samples of X-ray-emitting objects.

1 Introduction

The growing collecting area and sensitivity of modern astronomical detectors, combined with the increasing storage and processing capabilities offered by current computer facilities, have made possible the gathering on comparatively short time scales of very large sky surveys that were beyond reach only a few years ago. Most parts of the electromagnetic spectrum benefit from this evolution. Among recently completed or ongoing projects are, for instance, the Two Micron All Sky Survey (2MASS; Cutri et al. 2003) and the Sloan Digital Sky Survey (Adelman-McCarthy et al. 2008). Space-borne missions currently in operation such as the Spitzer Space Telescope (Werner et al. 2004) observing in the infra-red or the Chandra (Weisskopf et al. 2000) and XMM-Newton (Jansen et al. 2001) X-ray observatories are collecting at a high rate a wealth of measurements on an unprecedented number of objects in their energy range. In the relatively near future, ground-based automated very large telescopes such as Pan-STARRS (Wang et al. 2010) or the Large Synoptic Survey Telescope (Tyson 2002) will collect detailed photometric information on a breathtaking number of faint galaxies.

Merging measurements arising from several instruments allows us to build spectral energy distributions in a range of wavelengths extending over a large part of the electromagnetic spectrum. The recent availability of wide angle surveys with high detection sensitivities allows us to measure with comparable accuracies and in several scientifically important wavelength ranges the spectral energy density of the main classes of X-ray emitting astrophysical sources. Building large homogeneous samples provides valuable insight on the emission mechanisms and evolutionary processes and may allow the detection of rare objects or outliers, which would be otherwise hard to unveil in smaller samples. In this respect, a good estimate of the true rate of false cross-identification is important to assess the relevance of any group of outliers.

However, the gathering of large groups of sources with well characterised multi-wavelength properties first requires a proper handling of the cross-matching process between two or more catalogues. Although spatial resolution at high energies has steadily improved in recent years and may go on improving in the future, source density also grows as a result of the improved sensitivity, and the risk of confusion between unrelated objects detected at different wavelengths does not necessarily vanish. The confusion problem can be particularly arduous when comparing catalogues with very different spatial resolutions and densities. This problem is often encountered in the identification of high-energy sources, which in several cases lack the superb spatial resolution affordable for instance in the optical domain; see e.g. Rutledge et al. (2000) for the identification of ROSAT sources and Luo et al. (2010) for a recent example involving multi-wavelength catalogues with different depths and angular resolutions.

The XMM-Newton satellite (Jansen et al. 2001) was launched by the European Space Agency late in 1999. XMM-Newton is currently the X-ray (0.2-12 keV) telescope in operation with the largest effective area. Three co-aligned telescopes feed two EPIC MOS (Turner et al. 2001) and one EPIC pn (Strüder et al. 2001) cameras. Two reflection grating arrays deviate about half of the X-ray photons from the EPIC MOS camera towards two Reflection Grating Spectrometers (RGS; den Herder et al. 2001). An Optical Monitor (OM; Mason et al. 2001), providing UV and optical images of a fraction of the field of view covered by the EPIC cameras down to the 21st magnitude, complements the X-ray instrumentation. One of the remarkable properties offered by the X-ray telescopes on board XMM-Newton is to provide a large field of view of 30′ diameter with a weakly degraded image point-spread function and low vignetting even at large off-axis angles. Accordingly, a large number of sources may be serendipitously discovered around the main target of the observation, which builds up to make an X-ray survey with an unprecedented combination of sensitivity and area covered. From the beginning of the project, ESA recognised the high scientific interest of exploiting the XMM-Newton survey and appointed the present Survey Science Centre (SSC) on a competitive basis. Led by the University of Leicester, the SSC is a consortium of ten European institutes conducting its activity on behalf of ESA. The SSC responsibilities have been presented in Watson et al. (2001). One of the most demanding tasks given to the consortium is the compilation of a catalogue of all sources serendipitously discovered in the field of view of the X-ray instruments and of their characterisation and identification at least in a statistical way.

Several spectroscopic identification campaigns and multi-wavelength studies have recently been performed by the SSC on samples of thousands of EPIC sources using follow-up observations at 4-m and 8-m class telescopes. The availability of the recently published SDSS Data Release 7 (DR7) and of the incremental version of the 2XMM catalogue (2XMMi) offers a unique opportunity to extend the identification work to a much larger sky area. Because the SDSS spectroscopic and photometric limiting magnitudes are about 2 magnitudes brighter than those typically reached for the SSC source samples, SDSS identifications of XMM-Newton sources conveniently expand the identified sample towards brighter magnitudes and at the same time provide access to a rich group of accurately quantified photometric and spectroscopic data.

As part of its scientific activities, the Survey Science Centre of the XMM-Newton satellite has developed a specific cross-correlation algorithm yielding actual probabilities of identification based on positional coincidence and applied this algorithm to the cross-identification of the 2XMMi and SDSS DR7 catalogues, thus creating one of the largest sets of optically identified X-ray sources available so far. The result of the cross-correlation is made available as a separate FITS file and is also available through the XCat-DB (Motch et al. 2007; Michel et al. 2009).

The first sections of this paper present the details of the algorithm used to identify 2XMMi X-ray sources with SDSS DR7 optical objects. We apply the commonly used likelihood ratio to quantify the chance that a SDSS object is the counterpart of the X-ray source. Identification probabilities are computed with an original method that does not rely on Monte Carlo simulations and thus offers a better efficiency when cross-correlating large sets of data. We then describe the range of optical and X-ray parameters occupied by the main astrophysical classes of X-ray emitters and show how source classification could be achieved on this basis. In the last part of this paper, we investigate two example science cases, the search for new QSO2s, and the study of the properties of the X-ray active late-type star population.

2 Description of the cross-correlated catalogues

2.1 2XMMi catalogue

The incremental Second XMM-Newton Serendipitous Source Catalogue (2XMMi) is an extended version of the 2XMM Catalogue (Watson et al. 2009). It has been built from 4117 individual pointed observations performed by the XMM-Newton Observatory and contains 289 083 heterogeneous detections for a total of 221 012 unique X-ray sources. The catalogue covers of the sky over a large range of Galactic latitudes and longitudes. Owing to the wide range of exposure times, the area covered depends sensitively on limiting flux and energy range (see Fig. 8 in Watson et al. 2009). A 90% complete relative sky coverage is reached at  = 1 and 9 10erg cm s in the 0.5-2.0 keV and 2.0-12.0 keV bands respectively. The EPIC cameras encompass a field of view (FOV) of 30′ diameter and are sensitive in the energy range of 0.2-12 keV. Source positions have a typical accuracy of . In this paper, we limit our analysis to point-like sources with a positional error smaller than or equal to . A source is defined as point-like if its extent maximum likelihood parameter (ep_ext_ml) is . The resulting 2XMMi source sample consists of 264 361 detections and 200 067 unique 2XMMi sources.

2.2 SDSS Data Release 7

The Seventh Data Release of the Sloan Digital Sky Survey (Abazajian et al. 2009), covers  deg , mostly in the northern Galactic cap. A total of 357 million objects have 5 band photometry, among which 1.6 million galaxies, quasars, and stars were spectroscopically observed. Most of the 2000 deg increment over data release 6 are located at low galactic latitude. Astrometric errors are . At the 3% error level, the catalogue reaches magnitude limits in the range of to in the five photometric bands – u, g, r, i and z –. In this paper we only consider the so-called primary sources of the SDSS DR7 Photometric Catalogue as available from the VizieR data server. Primary sources are the “main” detection of an object and have the best defined set of parameters. For most scientific applications, the primary detections are the only ones needed. Source lists have been extracted using the VO ConeSearch protocol. The central point of each query is the centre of the FOV of the XMM-Newton observation considered and the search radius is the distance from the centre to the farthest X-ray source, to which we add for completeness.

3 Counterpart identification procedure

We discuss in Sect. 3.1 how we select optical candidates, taking into account arbitrary error ellipses on the sources' spherical coordinates. We compute a likelihood ratio (LR) for each target-candidate pair (Sect. 3.2). This involves a measure of the local density using a kernel smoothing method (Appendix B). Estimating the true distribution for spurious associations (Sect. 3.3) then allows us to compute for each target-candidate pair the probability of association only based on positional coincidence (Sect. 3.4).

3.1 Selection of optical candidates

Selection criterion

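We consider a target X-ray source $X$ and a candidate optical source $O$, with $(\alpha_X, \delta_X)$ the equatorial coordinates of the X-ray source; $\sigma_{\alpha_X}$, $\sigma_{\delta_X}$ and $\rho_{\alpha\delta,X}$ the error on $\alpha_X$, the error on $\delta_X$ and the correlation between $\alpha_X$ and $\delta_X$, respectively; $(\alpha_O, \delta_O)$ the equatorial coordinates of the optical source; and $\sigma_{\alpha_O}$, $\sigma_{\delta_O}$ and $\rho_{\alpha\delta,O}$ the error on $\alpha_O$, the error on $\delta_O$ and the correlation between $\alpha_O$ and $\delta_O$, respectively.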

As is commonly done (Budavári & Szalay 2008 being an exception), we convert the spherical problem into a plane one and interpret positional errors as ordinary 2D Gaussians. We have chosen a projection on a 2D plane with a frame centred on the position of the X-ray source and having for $x$-axis the direction of the optical candidate (Fig. 1). Errors on positions become the Gaussians $N_X(0, 0, \sigma_{x_X}, \sigma_{y_X}, \rho_X)$ and $N_O(d, 0, \sigma_{x_O}, \sigma_{y_O}, \rho_O)$, where the errors and correlations are now expressed in the new frame and $d$ is the angular distance between the X-ray and the optical source. As suggested by Sinnott (1984), $d$ is computed using the haversine function. The transformation of the ellipses in the new reference frame is described in Appendix A.
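The haversine evaluation of the angular distance can be sketched as follows. This is a minimal Python illustration rather than the code used for the catalogue; the function name `angular_distance_deg` and the choice of degrees as input and output unit are assumptions made for the example.

```python
import math

def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees from the haversine formula.

    All inputs are in degrees. The haversine form stays numerically
    stable for the arcsecond-scale separations met in cross-matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin(0.5 * (dec2 - dec1))
    sin_dra = math.sin(0.5 * (ra2 - ra1))
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h))))
```

For instance, two sources at declination 60° whose right ascensions differ by 2″ are separated by very nearly 1″, because of the cos δ factor.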

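The density of probability that the two sources are at the same location, and thus are the same object, is given by the convolution product of these two distributions. It leads to a new Gaussian:

$$N_c(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho\,xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right\} \tag{1}$$

With $\sigma_x = \sqrt{\sigma_{x_X}^2 + \sigma_{x_O}^2}$, $\sigma_y = \sqrt{\sigma_{y_X}^2 + \sigma_{y_O}^2}$ and $\rho\,\sigma_x\sigma_y = \rho_X\sigma_{x_X}\sigma_{y_X} + \rho_O\sigma_{x_O}\sigma_{y_O}$, where $(\sigma_{x_X}, \sigma_{y_X}, \rho_X)$ and $(\sigma_{x_O}, \sigma_{y_O}, \rho_O)$ are the errors and correlations of the X-ray and optical positions in the projection frame.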

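If the optical source is the counterpart of the X-ray source, it falls with a probability $\gamma$ inside the ellipse defined by the equation

$$\frac{1}{1-\rho^2}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho\,xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right] = k_\gamma^2 \tag{2}$$

The completeness we have chosen is a $3\sigma$ criterion, often used as a compromise between the total number of associations and the number of counterparts missed ($1-\gamma$). This completeness, $\gamma \simeq 0.9973$, leads in 2D to $k_\gamma = \sqrt{-2\ln(1-\gamma)}$. In the frame we have chosen, the coordinates of the optical source are $x = d$ and $y = 0$. The selection criterion we adopt will retain all candidates satisfying

$$\frac{d}{\sigma_x\sqrt{1-\rho^2}} \le k_\gamma \tag{3}$$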

Figure 1: Chosen projection plane: the frame is centred on the X-ray source position $X$; the $x$-axis is the direction towards the optical candidate, located at point $O$. $d$ is the angular distance between the two sources. This frame is useful at high declinations, where the meridians (the directions of the north pole at $X$ and at $O$) can no longer be considered parallel. It allows us to deal naturally with the poles.
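The selection step can be illustrated with a short Python sketch. It assumes that the 3σ completeness corresponds to γ ≈ 0.9973 and that all errors are already expressed in the projection frame of Fig. 1; the function names (`k_gamma`, `combine_errors`, `is_candidate`) are hypothetical and do not come from the original implementation.

```python
import math

def k_gamma(completeness):
    """Radius, in units of sigma, enclosing `completeness` of a 2D Gaussian."""
    return math.sqrt(-2.0 * math.log(1.0 - completeness))

def combine_errors(sx_x, sy_x, rho_x, sx_o, sy_o, rho_o):
    """Convolve two 2D Gaussians: variances and covariances simply add.

    Returns (sigma_x, sigma_y, rho) of the resulting Gaussian.
    """
    sx = math.hypot(sx_x, sx_o)
    sy = math.hypot(sy_x, sy_o)
    cov = rho_x * sx_x * sy_x + rho_o * sx_o * sy_o
    return sx, sy, cov / (sx * sy)

def is_candidate(d, sx, rho, completeness=0.9973):
    """True if a candidate located at (d, 0) lies inside the chosen ellipse.

    The default completeness is an assumed value for the 3-sigma criterion.
    """
    return d / (sx * math.sqrt(1.0 - rho * rho)) <= k_gamma(completeness)
```

With these assumptions, `k_gamma(0.9973)` is close to 3.44, so a candidate is kept when its distance along the $x$-axis is within about 3.44 times the effective error in that direction.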

We make the following additional hypotheses. First, we neglect any systematic offset between the positions of the two catalogues. The 2XMMi catalogue as a whole is free of any systematic positional offset in a direction of the sky. This has been checked by cross-correlating the 2XMM catalogue with the SDSS DR5 Quasar catalogue (Watson et al. 2009). For a large number of cases (74% at ), it was possible to correct the astrometry by cross-correlating field X-ray sources with USNO B1.0 entries. When no reliable astrometric correction could be found, increasing the applied systematic error from 0.35″ to 1.0″ accounts for the possible remaining coordinate offset and rotation affecting all the sources detected in a given observation (Watson et al. 2009). Second, we assume that all positions and associated errors have been computed at the same epoch and therefore corrected for proper motions.

Application to XMM-SDSS DR7 data

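The 2XMMi catalogue provides a circular error on position ($\sigma_{\rm stat}$) and a systematic error ($\sigma_{\rm syst}$) for each source. The error on positions of each X-ray source is the quadratic sum of these two values:

$$\sigma_X = \sqrt{\sigma_{\rm stat}^2 + \sigma_{\rm syst}^2} \tag{4}$$

Because it is circularly symmetric, this error is unchanged by the rotation to the projection frame: we have $\rho_X = 0$, and thus $\sigma_{x_X} = \sigma_{y_X}$ is directly equal to $\sigma_X$.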

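Positional errors are elliptical in the SDSS DR7 catalogue, which provides the errors on right ascension and declination, $\sigma_{\alpha_O}$ and $\sigma_{\delta_O}$, and their correlation $\rho_{\alpha\delta,O}$. The definitions of the different parameters are summarized in Table 1.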

Table 1: Summary of the astrometric parameters for the 2XMM catalogue and for the SDSS DR7.

3.2 Likelihood ratio

We compute a likelihood ratio (LR) for each target-candidate pair meeting the criterion of Eq. 3: the probability of finding the optical counterpart at a normalised distance $r$ (see below) divided by the probability of having a spurious object at that distance.

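The density of probability that the two sources are at the same location, knowing the two positional error distributions, corresponds to the density of probability of having the counterpart at $(x, y)$, assuming that it is the same astrophysical object as the X-ray emitting one. The Gaussian (Eq. 1) can be written in its canonical form

$$N_c(x', y') = \frac{1}{2\pi\sigma_M\sigma_m}\exp\left\{-\frac{1}{2}\left[\frac{x'^2}{\sigma_M^2} + \frac{y'^2}{\sigma_m^2}\right]\right\},$$

where $\sigma_M$ and $\sigma_m$ are the semi-major and semi-minor axes, in the eigenvector frame $(x', y')$, given by the eigendecomposition of the variance-covariance matrix of $N_c$,

$$V = \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x\sigma_y \\ \rho\,\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}. \tag{5}$$

We change the scale and switch to polar coordinates, which leads to the dimensionless Rayleigh distribution:

$$f(r)\,{\rm d}r = r\,e^{-r^2/2}\,{\rm d}r, \qquad r = \sqrt{\frac{x'^2}{\sigma_M^2} + \frac{y'^2}{\sigma_m^2}}. \tag{6}$$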

Therefore, the new elementary surface becomes $2\pi\sigma_M\sigma_m\,r\,{\rm d}r$, where $\pi\sigma_M\sigma_m$ is the surface of the $1\sigma$ (or $r = 1$) ellipse.
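The change of frame and of scale described above can be sketched in Python using the closed-form eigendecomposition of a 2×2 covariance matrix. The function names `ellipse_axes` and `normalised_distance` are illustrative assumptions, and the sketch only relies on the standard library.

```python
import math

def ellipse_axes(sx, sy, rho):
    """Return (sigma_major, sigma_minor, theta) for the errors (sx, sy, rho).

    theta is the angle of the major axis measured from the x-axis,
    obtained from the eigendecomposition of the covariance matrix.
    """
    a, b, c = sx * sx, sy * sy, rho * sx * sy
    mean = 0.5 * (a + b)
    delta = math.hypot(0.5 * (a - b), c)
    sig_maj = math.sqrt(mean + delta)
    sig_min = math.sqrt(max(mean - delta, 0.0))
    theta = 0.5 * math.atan2(2.0 * c, a - b)
    return sig_maj, sig_min, theta

def normalised_distance(x, y, sx, sy, rho):
    """Dimensionless distance r of the point (x, y) in the eigenvector frame."""
    sig_maj, sig_min, theta = ellipse_axes(sx, sy, rho)
    # Rotate (x, y) into the frame aligned with the ellipse axes.
    xp = x * math.cos(theta) + y * math.sin(theta)
    yp = -x * math.sin(theta) + y * math.cos(theta)
    return math.hypot(xp / sig_maj, yp / sig_min)
```

For a candidate at $(d, 0)$ the value returned by `normalised_distance` equals $d / (\sigma_x\sqrt{1-\rho^2})$, i.e. the quantity compared with $k_\gamma$ in the selection criterion.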

The LR we use is inspired by the one described in De Ruiter et al. (1977). Like Wolstencroft et al. (1986), we do not only consider the first candidate, but all sources satisfying Eq. 3. We thus replace the probability “of finding the first confusing object at a distance lying between $r$ and $r + {\rm d}r$” by the one of finding a confusing object between $r$ and $r + {\rm d}r$.

The probability of finding the optical counterpart (cp) at a distance lying between $r$ and $r + {\rm d}r$ is

$${\rm d}p(r\,|\,{\rm cp}) = r\,e^{-r^2/2}\,{\rm d}r. \tag{7}$$

And the probability of finding a spurious object (spur) between $r$ and $r + {\rm d}r$ is given by the Poisson law (to first order in ${\rm d}r$):

$${\rm d}p(r\,|\,{\rm spur}) = 2\lambda\,r\,{\rm d}r, \qquad \lambda = \pi\sigma_M\sigma_m\,n(m), \tag{8}$$

where $\sigma_M$ and $\sigma_m$ are the semi-axes of the $1\sigma$ error ellipse.

We adopt the local surface density $n(m)$ of sources at least as bright as $m$, the magnitude of the candidate. Because more sources are available in a same given area, the densities computed with this method are more local, or more accurate, than densities computed in arbitrary bins of magnitudes. It is equivalent to computing local densities using increasingly sensitive instruments. We detail in Appendix B the method used to estimate local densities.

The likelihood ratio is the ratio of the two probability densities (7) and (8):

$$LR(r) = \frac{e^{-r^2/2}}{2\lambda}. \tag{9}$$
The formalism we apply here aims at providing probabilities of identification based on positional coincidences only. A Bayesian interpretation of the likelihood ratio method is described in Appendix C. We do not use other information on sources such as the spectral energy distribution. Hence, we do not add an extra term to the LR as is done for example in Wolstencroft et al. (1986), Sutherland & Saunders (1992) and Brusa et al. (2007). Such a term corresponds to the probability of having among the real counterparts a source of magnitude $m$, or in a bin around $m$ (see formula (48) of the appendix). In this case, the term should be local, but it then becomes hard to estimate. In general its estimate is plagued with considerable errors which, besides the error on the local density estimation, dramatically affect the error on the LR. We will see in Sect. 3.3 that this factor is somehow taken into account in our reliability function.
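As an illustration of the purely positional likelihood ratio discussed in this section, the sketch below divides the Rayleigh density of counterparts by the expected density of spurious objects in the elementary annulus. It assumes the leading-order Poisson term, and that the local density and the ellipse semi-axes are given in consistent units (for example sources per arcsec² and arcsec); the function name and its signature are hypothetical.

```python
import math

def likelihood_ratio(r, sig_maj, sig_min, density):
    """Positional likelihood ratio at normalised distance r.

    r        : dimensionless distance of the candidate
    sig_maj  : semi-major axis of the 1-sigma error ellipse
    sig_min  : semi-minor axis of the 1-sigma error ellipse
    density  : local surface density of sources at least as bright
               as the candidate (per unit area of the sigma unit)
    """
    if density <= 0.0:
        raise ValueError("local density must be positive")
    lam = math.pi * sig_maj * sig_min * density
    return math.exp(-0.5 * r * r) / (2.0 * lam)
```

The ratio decreases with distance and with local density: a close match in a sparse field gets a high LR, while the same match in a crowded field gets a lower one.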

3.3 Computing reliabilities

Although we use a different definition, a different estimator of the rate of spurious associations and a different function to fit the reliability histogram, we more or less follow the work presented in part 3 of Oyabu et al. (2005). The method originates in Rutledge et al. (2000).

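We define the reliability of an association in a given bin of LR as

$$R(LR) = \frac{N_{\rm cp}(LR)}{N_{\rm cand}(LR)} = 1 - \frac{N_{\rm spur}(LR)}{N_{\rm cand}(LR)}, \tag{10}$$

where $N_{\rm cp}$ and $N_{\rm spur}$ are the unknown numbers of candidates which are respectively real and spurious counterparts in a given bin of LR; $N_{\rm cand} = N_{\rm cp} + N_{\rm spur}$ is the number of candidates in a given bin of LR.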

We therefore have to estimate the number of spurious associations $N_{\rm spur}$. An often used method consists in correlating X-ray sources with artificial samples of optical sources. The generated samples have the same characteristics as the real sources: same density, same positional errors distribution, etc. Positions are randomly distributed. The sum of the results of these Monte-Carlo samples provides an estimate of the number of spurious associations as a function of the distances, of the LRs, etc. This approach is used by Oyabu et al. (2005) among others. In Stephen et al. (2005) the random sample consists in a list of “anti […] sources”, which are “mirrored in Galactic longitude and latitude”.

We propose here a new method to estimate the number of spurious associations, not based on Monte-Carlo simulations, but instead directly computing their expected results. This scheme offers a better computing efficiency when cross-correlating huge sets of data. The basic idea of estimating the surface of an association related to the total available area can be found in Boller et al. (1998). The method is described in Appendix D.

In order to avoid computing too many local densities for estimating the rate of spurious associations, we divide the magnitude range into bins and associate all sources in the same magnitude bin with the mean value of their local density. The width of the bins depends on the magnitude accuracy of the catalogue. We then compute for all optical and X-ray sources the factor, the and . It is thus possible to compute the histogram of the expected number of spurious associations according to values. To increase the computing efficiency, we can bin the values. However, this approach involves another loss of accuracy for a meagre reduction of computing time.

As shown in Fig. 2, the histograms used in the computation of the reliability are the number of candidates and the number of spurious associations grouped in bins of LR.
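The way the two histograms combine into a reliability histogram can be sketched as follows. The snippet assumes that both histograms share the same LR bins and that the reliability in a bin is one minus the ratio of the estimated number of spurious associations to the number of candidates; the function name is hypothetical and empty bins are simply flagged with `None`.

```python
def reliability_histogram(n_cand, n_spur):
    """Per-bin reliability from candidate and spurious-association counts.

    n_cand : number of candidates in each LR bin
    n_spur : estimated number of spurious associations in each LR bin
    Returns a list with one reliability per bin (None for empty bins).
    """
    reliabilities = []
    for cand, spur in zip(n_cand, n_spur):
        if cand <= 0:
            reliabilities.append(None)  # no candidate, reliability undefined
        else:
            # Clip to [0, 1]: the spurious estimate may exceed the count.
            reliabilities.append(min(1.0, max(0.0, 1.0 - spur / cand)))
    return reliabilities
```

The clipping only guards against statistical fluctuations in sparsely populated bins, which is one reason why the histogram is subsequently fitted with a smooth function.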

Fitting the reliability function

In order to estimate the number of spurious associations we take the relatively realistic example where the X-ray source has at most one candidate. The reliability of an association (not to be confused with the integrated reliability for all associations having a ) is directly given by (see Eq. (46) of the appendix)


The term – the probability that the optical source is a counterpart divided by the probability that it is spurious – must be independent of the dimensionless distance (see Eq. 6). It is similar to the term used in De Ruiter et al. (1977). However, may depend on the nature of the underlying X-ray source population (e.g. stars, AGN) and may thus vary with source properties such as magnitude, optical colour, or flux ratios. In order to obtain a similar to that used in Brusa et al. (2007) for instance, we would need to consider an additional parameter describing the variation of with the magnitude (or any other relevant property) of the candidate counterpart. Alternatively, may be replaced by another term such as that playing the role of in Budavári & Szalay (2008). An histogram can be built from the and histograms made using the method explained in the previous paragraph. If the ratio were independent of source properties, could be fitted with Eq. (11) using only one free parameter . Including a term in with bins of magnitude, requires us to build histograms and fit each of them with functions in Eq. (11) having different parameters. However, in general the lack of statistics does not allow us to do so.

The histogram can then be seen as the sum of histograms and consequently can be modelled by the function


where is the total number of entries in histogram number .

In practice, the histograms are not binned according to but to . Best fits were obtained using the function


with , i.e. 6 free parameters.

The fit is performed using a Levenberg-Marquardt algorithm. We compute the same number of LRs and construct and fit the same number of histograms as there are magnitude bands in the SDSS.

Figure 2: Top: Histograms of the number of associations and of the estimate of the number of spurious associations by bin of . Bottom: Reliability histogram by bin of and its fitted curve. In the example, s have been computed according to the SDSS DR7 magnitude for XMM sources with a systematic error of with a galactic latitude .

3.4 Computing probabilities of identification in the general case

We now extend the Bayesian approach to X-ray sources having $n$ candidates. We assume that at most one association is real. This assumption should be fulfilled in our case for at least two reasons. First, we only consider point-like X-ray sources. This condition decreases the probability that the detection results from two distinct unresolved sources blended in the XMM-Newton beam. Second, 95% of the XMM-Newton sources matching a SDSS entry with a probability higher than 90% have a 0.5-2.0 keV flux higher than 1.65  10 erg cm s. At this flux, source confusion is of the order of a few percent only (Cappelluti et al. 2009). The corresponding source density of about 500 deg$^{-2}$ (Cappelluti et al. 2009) is well below the value of 2000 deg$^{-2}$, above which simulations show that source confusion becomes important (Loaring et al. 2005). Similar conclusions can be drawn for the hard (2-12 keV) sources.

Let us consider $n + 1$ hypotheses:

  • $H_i$ ($1 \le i \le n$): the $i$-th optical source is the counterpart

  • $H_0$: there is no counterpart.

Then the Bayesian probability that the i source is the counterpart knowing , is


If , Eq. (14) leads to the formula below, obtained following the Rutledge et al. (2000) prescription


With Eq. (11), we easily show that . Computing and normalising the terms , as Rutledge et al. (2000) do to construct from , we obtain the equality . We thus apply Eq. (15) to compute the final probabilities of identification.

Each candidate possesses as many reliabilities as there are magnitude bands in the SDSS. The reliabilities we consider in the final probability of identification formula are, for each source, the best of all photometric bands.
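The normalisation over several candidates can be sketched in Python as below. The sketch follows our reading of the Rutledge et al. (2000) prescription, in which each candidate is weighted by its reliability times the probability that all other candidates are spurious, under the assumption that at most one association is real. The function name and the `eps` guard are assumptions of the example, and the code is not a transcription of Eq. (15).

```python
def identification_probabilities(reliabilities, eps=1e-12):
    """Probabilities of identification for the candidates of one X-ray source.

    reliabilities : individual reliabilities R_i of the n candidates
    Returns (list of P_id,i, probability that there is no counterpart).
    """
    odds = []
    for rel in reliabilities:
        rel = min(max(rel, 0.0), 1.0 - eps)  # avoid division by zero at R = 1
        odds.append(rel / (1.0 - rel))
    # Weights R_i * prod_{j != i}(1 - R_j), divided by prod_j (1 - R_j).
    norm = 1.0 + sum(odds)
    return [o / norm for o in odds], 1.0 / norm
```

With a single candidate the probability of identification reduces to its reliability, and two candidates of reliability 0.5 each share the probability equally with the no-counterpart hypothesis (one third each).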

4 Observation grouping

XMM-Newton EPIC sources are correlated FOV by FOV, i.e. observation by observation. To reduce the statistical noise in the histogram bins of individual FOVs without sacrificing resolution, we have to increase count statistics. We therefore stacked data from similar FOVs:

  • We split XMM-Newton FOVs into two groups according to their systematic error on position: 0.35″ or 1.0″.

  • Observations of the LMC and SMC regions are set apart.

  • Because they presumably share objects of the same nature and similar log N-log S relations, observations are grouped according to their galactic latitude.

As mentioned above, the relation between reliability and likelihood ratio depends on the overall properties of the X-ray populations present in the optical sample. In addition to galactic latitude, we also tested whether the X-ray flux could significantly modify the shape of the curves. Splitting further 2XMMi sources into groups of medium (10–10 erg cm s) and faint (10–10 erg cm s) 0.2-12 keV flux ranges does not change the probabilities of identification by more than 3% in most cases. The only noticeable difference is for faint sources with identification probabilities below 50%, which tend to show even lower identification probabilities by as much as 15%. We felt, however, that since the effect is relatively modest and only affects sources for which the significance of the identification is rather low, priority should be given to the gathering of sufficiently large subsamples. We thus did not consider any X-ray flux dependency in the final implementation.

5 Results of the 2XMMi-SDSS DR7 cross-correlation

Figure 3: Left: Histogram of the individual probabilities of identification. Centre: probabilities of identification versus matching distances. Right: distribution of the distance of SDSS candidates to 2XMMi sources expressed in units of the combined 2XMMi + SDSS positional error; black = all matches, red = identification probabilities > 90%.

A total of 1337 XMM-Newton FOV hold at least one source with a SDSS counterpart candidate within the combined $3\sigma$ search radius. These 1337 FOV contain 95 452 detections, corresponding to 73 636 unique 2XMMi sources. The cross-correlation of the 2XMMi catalogue with the SDSS DR7 leads to 72 169 “associations” involving 45 727 and 55 726 unique 2XMMi and SDSS DR7 sources respectively. This first number represents 20% and 62% of the unique sources available in the entire 2XMMi catalogue and in the 1337 FOV respectively. The distribution of the number of SDSS DR7 candidates by unique 2XMMi sources is given in Table 2, and the main properties of the distribution of the probabilities of identification and of their cumulative values are given in Table 3. We define the sample completeness as the fraction of 2XMMi sources having an individual probability of identification in SDSS above a given cutoff relative to the total number of 2XMMi sources with SDSS counterparts. In a similar manner, sample reliability is the fraction of non-spurious associations among 2XMMi sources having an individual probability of SDSS association above a given threshold. A total of 7 740 unique 2XMMi sources have several SDSS DR7 candidates. In this sub-sample, there are 896 and 2 672 2XMMi sources for which the candidate with the highest identification probability is not the nearest and the brightest SDSS DR7 candidate respectively.

The left panel of Fig. 3 shows the distribution of the individual SDSS source identification probabilities. Most SDSS entries found within the combined $3\sigma$ search radius from the 2XMMi source have a high likelihood to be the true optical counterpart. The small tail of very low identification probability objects reflects the expected rising contribution of SDSS entries unrelated to the X-ray source at large matching distances. Most SDSS entries with identification probability higher than 90% are found less than 3 arcsec from the X-ray position (Fig. 3, centre and right panel). The rather wide spread of the 2XMMi positional errors accounts for the scatter affecting the distances at which high-probability SDSS sources are found from the X-ray position.

Expressed in terms of combined 2XMMi + SDSS errors, the distance distribution shown in the right panel of Fig. 3 follows the usual shape of a Rayleigh distribution. Fitting this histogram with a Rayleigh function plus a linear component, we obtain for the Rayleigh curve parameter, 0.178 for the linear slope and for the ratio between the total number of real associations and the total number of spurious ones within the search radius. Fitting separately distance histograms of the sources whose positions were corrected by eposcorr and uncorrected ones leads to , and , respectively. All errors on , on the slope and on the values are about 0.003. Keeping only the best candidate for each unique XMM source, we obtain R=0.81, which is consistent with the value of 79.4% given in Table 3. The origin of this small apparent overestimate (about 14%) of the positional errors of eposcorr corrected sources is so far unclear. In any case, the effect of this slight change on the identification probabilities is small. The global effect is to slightly decrease the probabilities of SDSS entries matching at large distances and to somewhat increase the probabilities of those located close to the X-ray source.

Nc    1        2      3      4     5    6    7    8    >8
Nx    37 988   6059   1196   317   96   47   12   4    8
Table 2: Number of unique 2XMMi sources (Nx) with Nc SDSS DR7 candidates.

Probability of identification cutoffs
P_id >      0.0      0.5      0.7      0.8      0.9      0.95
# det_id    60 567   53 347   49 527   46 387   40 193   32 610
# src_id    45 727   39 839   36 943   34 605   30 055   24 327
R           79.4%    91.7%    94.7%    96.2%    98.0%    99.0%
C           100.0%   96.8%    92.1%    87.5%    77.2%    63.2%
Frac X      62.1%    54.1%    50.2%    47.0%    40.8%    33.0%
Table 3: Cross-correlation statistics.

The practical implementation is described in Appendix E. Whenever optical data are used, we discard SDSS entries with recorded magnitudes fainter than 22.2 in any of the photometric bands considered. Indeed, objects with magnitudes higher than 22.2 tend to have smaller photometric errors than brighter ones, clearly indicating that SDSS photometric uncertainties and perhaps also mean values are not reliable at faint flux. We also ignored all SDSS entries having one of the following flags set: BLENDED, DEBLENDED_AS_MOVING, SATURATED, INTERP_CENTER, EDGE, SATUR_CENTER, PSF_FLUX_INTERP, in order to ensure the best photometric quality. Unless specified otherwise, we will hereinafter only consider optical identifications with a probability larger than 90%. This threshold applies to both the spectroscopically identified sample and to the general photometric sample and corresponds to an overall sample purity of 98% (see Table 3).
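The photometric quality cuts listed above can be sketched as a simple filter. The data layout is an assumption made for the example: each SDSS entry is represented by a hypothetical dictionary holding the set of flag names that are raised and a mapping from band name to magnitude; the actual catalogue stores flags as bit masks.

```python
# Flags leading to rejection, as listed in the text.
REJECT_FLAGS = {
    "BLENDED", "DEBLENDED_AS_MOVING", "SATURATED", "INTERP_CENTER",
    "EDGE", "SATUR_CENTER", "PSF_FLUX_INTERP",
}
MAG_LIMIT = 22.2  # entries fainter than this in a used band are discarded

def is_clean(entry, bands):
    """True if an SDSS entry passes the flag and magnitude cuts.

    entry : dict with keys 'flags' (iterable of flag names) and
            'mag' (dict mapping band name to magnitude); hypothetical layout
    bands : photometric bands used in the analysis at hand
    """
    if REJECT_FLAGS & set(entry["flags"]):
        return False
    return all(entry["mag"][band] <= MAG_LIMIT for band in bands)
```

For example, an entry with `g = 21.0` and `r = 22.5` passes the cut when only the `g` band is considered, but is discarded as soon as the `r` band is used.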

6 Building an identified sample

One of the important tasks given to the SSC is the statistical identification and classification of all X-ray sources discovered in the wide field of view of the EPIC cameras. The statistical determination of the nature of any given 2XMMi source will first rely on the assessment of the reliability of its association with candidate counterparts at other wavelengths. The description of this important step and of its results is the goal of the present paper.

On the other hand, the subsequent classification stage requires knowledge of the parameter space occupied by the various groups of astrophysical sources, obtained from a “learning sample”. The cross-correlation method presented here allows us to select, in a clean and statistically controlled manner, the best optical counterparts to 2XMMi sources. It therefore constitutes the first mandatory step towards building a reliable learning sample, which can later be used to define source classification schemes using advanced statistical methods. Eventually, the classification method, whether supervised or not, will provide the most likely nature of the 2XMMi source (e.g. star, AGN, etc.) with, for some methods, an estimate of the probability of the classification. First attempts to classify 2XMMi sources into two classes (stars and extragalactic) have been presented in Pineau et al. (2009) and are now implemented in the XCat-DB for the DR3 of the 2XMM catalogue.

Yet it is also well known that a reliable classification can only be achieved when the corresponding learning sample covers the parameter space spanned by the group of objects to identify as evenly as possible (see e.g. White 2008 or Richards et al. 2004). Being aware of this important requirement, the SSC has designed a general optical identification programme able to explore the widely diverse natures of the X-ray emitting objects discovered in the XMM source catalogues. Several wide field identification campaigns are currently conducted at various X-ray flux levels and galactic latitudes, which all aim at building completely identified source samples. The nature of the high galactic latitude population is the scope of four distinct projects. The bright part is studied by the Bright Sources Survey (XBS or BSS, Della Ceca et al. 2004; Caccianiga et al. 2008). The XMM-Newton Medium Sensitivity Survey (XMS, Barcons et al. 2002; Carrera et al. 2007; Barcons et al. 2007) and the XMM-2dF Wide Angle Survey (XWAS, Tedds et al. 2006) investigate the properties of medium flux sources. The faintest source population is the scope of the Subaru/XMM-Newton Deep Survey (SXDS, Ueda et al. 2008). Finally, the Galactic plane area is covered by the XMM-SSC Galactic Plane Survey (Motch 2005; Motch et al. 2010).

The first step towards building a sample of 2XMMi sources of known astrophysical nature was to select X-ray sources with reliable SDSS DR7 spectroscopic counterparts of a known class (i.e., with the specClass attribute pointing to an astrophysical object). For our purpose, the three most important groups of spectroscopic SDSS targets are the sample of quasar candidates defined by Richards et al. (2004), the main galaxy sample described in Strauss et al. (2002) and all stars belonging to the legacy survey and to the Sloan Extension for Galactic Understanding and Exploration programme (SEGUE, Yanny et al. 2009). The AGN sample is mostly a classically UV-excess (UVX) selected sample to which is added a small number of redder targets appearing as likely high redshift QSOs. The galaxy sample is less biased because it is only selected on brightness related criteria in the band. Stars from the legacy survey were mostly selected on the basis of their extreme colours. Among them, red dwarfs and CVs are the most likely to match 2XMMi sources. The SEGUE programme opens new areas at lower galactic latitudes, and its spectroscopic target selection aims at covering all spectral types.

We therefore extracted the SDSS spectroscopic catalogue accessible via CasJobs and, following the SDSS spectral class scheme, defined the classes: stars, galaxies, AGN and X-ray accreting binaries. We list below the origin of the different groups of identified sources:

  • Stars : i) 2XMMi/SDSS associations having the specClass attribute set to 1 or 6 and ii) the sample of stars coming from the kernel density classification (see Sect. 6.1).

  • Accreting binaries : i) objects in the Downes catalogue of cataclysmic variables (Downes et al. 2001). We used here the 2006 version, which contains many SDSS discoveries; ii) the Ritter catalogue of cataclysmic variables (Ritter & Kolb 2003) and iii) the Ritter catalogue of LMXRBs (Ritter & Kolb 2003).

  • Galaxies : 2XMMi/SDSS associations with a probability of identification higher than 0.80 and with the specClass attribute set to 2.

  • AGN: i) sources from the Véron catalogue (Véron-Cetty & Véron 2006) and ii) SDSS DR7 objects associated with a 2XMMi source with a probability of identification higher than 0.80 and having the specClass attribute set to 3 or 4 (QSO or high-redshift QSOs).

We use the range of X-ray luminosity to define several groups of active galaxies and consider all extragalactic objects as a single class. In particular, we do not make any formal distinction between QSO and AGN. An X-ray source associated with both a star and an accreting binary was flagged as an accreting binary. We applied the same rules for star–AGN and binary–AGN pairs of apparently conflicting nature.
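The class assignment described in the list and paragraph above can be sketched as follows. The specClass codes and the probability cut of 0.80 are those quoted in the list; the function names, the string labels and the choice to return `None` for unresolved cases are assumptions of this example. Only the star versus accreting binary rule is spelled out explicitly in the text, so conflicts involving AGN are deliberately left unresolved here.

```python
# specClass codes as listed above: 1 or 6 -> star, 2 -> galaxy, 3 or 4 -> QSO / high-redshift QSO
SPEC_CLASS_TO_TYPE = {1: "star", 6: "star", 2: "galaxy", 3: "AGN", 4: "AGN"}
P_ID_MIN = 0.80  # cut quoted for galaxies and AGN

def type_from_spec_class(spec_class, p_id):
    """Map an SDSS specClass code to a source type, or None if not retained."""
    obj_type = SPEC_CLASS_TO_TYPE.get(spec_class)
    if obj_type in ("galaxy", "AGN") and p_id <= P_ID_MIN:
        return None
    return obj_type

def resolve(types):
    """Combine the types proposed for one X-ray source by the various catalogues."""
    # Galaxies and AGN are treated as a single extragalactic class.
    merged = {"extragalactic" if t in ("galaxy", "AGN") else t for t in types}
    if len(merged) == 1:
        return next(iter(merged))
    if merged == {"star", "accreting binary"}:
        return "accreting binary"  # rule stated in the text
    return None  # other conflicts: not resolved in this sketch
```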

We added sources identified in the XBS and XMS SSC surveys and with SDSS counterparts. For the XMS, we considered their sources with classes NELG, BLAGN, and BLLac as AGN. For the XBS, AGN2, AGN1, BLLac and elusive AGN were assigned the general AGN type.

QSO2 candidates taken from Zakamska et al. (2003) and Reyes et al. (2008), as well as a handful of X-ray selected objects (see Sect. 8.1 below), having a reliable match in the 2XMMi catalogue were added to the identified sample.

6.1 The stellar identified sample

Building a clean stellar sample turned out to be more difficult because most stellar sources detected in X-rays have optical SDSS magnitudes brighter than 15 mag and are flagged as saturated. Furthermore, the SDSS DR7 spectroscopic database provided only a few cross-matches with acceptable properties (i.e. non-saturated and probabilities of identification higher than 90%). Therefore, in order to enlarge the stellar sample, we applied a classification method allowing us to identify stars on the basis of their multi-colour properties.

We performed a kernel density classification (KDC, Richards et al. 2004) on all spatially unresolved (cl=6) SDSS candidates. This selection returns 10 533 SDSS sources with a correlation in the 2XMMi catalogue. The classification only uses the four colours , , and as parameters. The learning sample used for this classification consists of two classes, star and QSO, since we only consider point-like objects in the optical. It has been built from all unresolved SDSS sources, independently of their association with a 2XMMi entry. We only retained good quality detections (i.e. no flag SATURATED, BLENDED, DEBLENDED_AS_MOVING, INTERP_CENTER, EDGE, SATUR_CENTER or PSF_FLUX_INTERP set) that were spectroscopically identified in the DR7. The data have been retrieved from the DR7 database with CasJobs. The stellar sample contains 67 269 sources flagged by the SDSS specClass attribute as star (STAR or STAR_LATE) and therefore also contains CVs and WDs. The non-stellar sample has 75 248 sources flagged by the SDSS specClass attribute as QSO (QSO) or high-redshift QSO (HIZ_QSO) plus 253 sources flagged by specClass as galaxy (GALAXY). For simplicity, we call the non-stellar sample the AGN sample below.

Estimates of the probability densities were computed using a fixed bandwidth kernel smoothing. The kernel applied uses the Epanechnikov profile and the bandwidth was chosen to be equal to 0.2 mag. Table 4 lists the results of the self-check of the learning sample, i.e., the results of the classification method applied to the learning sample only.

Original \ Assigned  Star    AGN
Star                 96.89%  3.11%
AGN                  1.62%   98.38%
Table 4: Results of the classification method applied to the learning sample.

The prior probability p(star) has been set to 0.25 and so p(AGN) to 0.75 as a result of iterative kernel density classifications converging to this relative number of stars and AGN in the SDSS/2XMMi learning sample. In order to select SDSS/2XMMi identifications with the best chance to be normal stars, we removed 13% of all SDSS entries classified as stars, but falling in low-density regions of the parameter space (i.e. far from the centre of the stellar multi-colour locus) and thus prone to be doubtful cases such as binaries, unidentified cataclysmic variables, or even mis-identified AGN.
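A minimal sketch of a two-class kernel density classification with these ingredients (Epanechnikov profile, fixed bandwidth of 0.2 mag, priors of 0.25 and 0.75) is given below. It is not the code used for this work: in particular, the use of a radial kernel in the 4-D colour space, the function names and the handling of points with no neighbouring training source are assumptions made for the example.

```python
BANDWIDTH = 0.2     # mag, fixed bandwidth quoted in the text
PRIOR_STAR = 0.25   # prior p(star); p(AGN) = 1 - PRIOR_STAR

def epanechnikov(u2):
    """Unnormalised radial Epanechnikov profile; u2 is the squared scaled distance."""
    return 1.0 - u2 if u2 < 1.0 else 0.0

def kernel_density(x, sample, h=BANDWIDTH):
    """Kernel density estimate at colour vector x, up to a constant factor.
    The constant depends only on h and on the dimension, so it is the same
    for both classes and cancels in the posterior probability."""
    total = 0.0
    for point in sample:
        u2 = sum((a - b) ** 2 for a, b in zip(x, point)) / h**2
        total += epanechnikov(u2)
    return total / len(sample)

def p_star(x, stars, agn, prior_star=PRIOR_STAR):
    """Posterior probability that x is a star, or None if both densities vanish."""
    w_star = prior_star * kernel_density(x, stars)
    w_agn = (1.0 - prior_star) * kernel_density(x, agn)
    if w_star + w_agn == 0.0:
        return None
    return w_star / (w_star + w_agn)
```

Objects for which the stellar density itself is low, even if the posterior favours the stellar class, correspond to the low-density regions of the parameter space mentioned above.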

The star/AGN classification has been made according to the optical properties of the spectroscopically identified SDSS objects. However, by construction, the density distribution in colours of the SDSS/2XMMi sample is not likely to follow that of the non X-ray emitting SDSS objects and can thus lead to some biases. For instance, there may be a considerable overdensity of non-X-ray emitting stars in some part of the 4-d colour diagram where most objects classified as AGN appear to be strong X-ray sources. This problem indeed occurs in the region covered by the AGN branch, where there is some overlap with A stars. Although some A stars do emit X-rays for debated reasons, not all do. We thus removed from the SDSS/2XMMi learning sample all classified stars with a colour of less than 1.2 (values taken from Covey et al. 2007).

The final stellar X-ray sample arising from the KDC contains 636 unique entries with a classification probability higher than 99.7% (3 Gaussian σ). However, only 549 of these matches have a probability of identification with a 2XMMi source higher than 90% and were therefore entered in the final identified sample.

We also checked that the stellar SDSS/2XMMi sample adhered to the stellar locus derived by Covey et al. (2007) using synthetic photometry in the 4-D colour space. The agreement is good, apart from the reddest stars of spectral type later than M5.

6.2 The final identified sample

Sample Galaxy AGN QSO
All 1 0 7 21 49 95
Final 1 0 5 14 37 85
Table 5: Distribution of XMS and XBS sources in the identified samples.
Sample  Star       Accreting Binary  Extragalactic
        DR7  KDC   Dow.  R1  R2      DR7   Veron
Final   8    541   29    22  2       2021  1524
Table 6: Origins of the objects of each type in the final identified sample.
Sample  Star  Accr. Binary  Extragalactic
Sub     549   26            2336
Table 7: Distribution of number of unique 2XMMi entries with types in the final identified sample.

The origin and distribution of the various classes of identified objects in the final sample are listed in Tables 5, 6, and 7. Most of the extragalactic identified sample comes from the DR7 spectroscopic catalogue through the Véron catalogue, while the vast majority of X-ray active stars are actually extracted from the KDC source classification. A large fraction of the identified accreting binaries (mostly cataclysmic variables) also comes from follow-up SDSS discoveries through the Downes catalogue.

Finally, for all AGNs with a spectroscopic redshift we compute the observed X-ray luminosity using


where is the 0.2-12 keV X-ray flux in erg s⁻¹ cm⁻²,  km s⁻¹ Mpc⁻¹, and . The photon index was taken to be .
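For illustration, a luminosity computation consistent with the quantities named above (flux, Hubble constant, density parameter, photon index) can be sketched as follows. The specific form used here, a luminosity distance in a flat ΛCDM cosmology combined with a power-law K-correction (1+z)^(Γ-2), is an assumption of this example and not necessarily the exact expression used in the paper. No default values are given for the cosmological parameters or the photon index, which must be supplied by the caller.

```python
import math

C_KM_S = 299792.458                 # speed of light in km/s
MPC_IN_CM = 3.0856775814913673e24   # one megaparsec in cm

def luminosity_distance_cm(z, h0, omega_m, steps=1000):
    """Luminosity distance in cm for a flat universe (omega_lambda = 1 - omega_m),
    using Simpson integration of 1/E(z). h0 is in km/s/Mpc."""
    omega_l = 1.0 - omega_m

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    n = steps if steps % 2 == 0 else steps + 1  # Simpson needs an even number of intervals
    dz = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * dz)
    comoving_mpc = (C_KM_S / h0) * total * dz / 3.0
    return (1.0 + z) * comoving_mpc * MPC_IN_CM

def xray_luminosity(flux, z, h0, omega_m, gamma):
    """Luminosity in erg/s from a flux in erg/s/cm^2, with a power-law
    K-correction for photon index gamma (assumed form, see lead-in)."""
    d_l = luminosity_distance_cm(z, h0, omega_m)
    return 4.0 * math.pi * d_l**2 * flux * (1.0 + z) ** (gamma - 2.0)
```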

7 Grouping X-ray sources in parameter space

7.1 Sample properties and shortcomings

The left and centre panels of Fig. 4 display the X-ray flux distributions of 2XMMi sources with individual identification probabilities with SDSS DR7 entries and , compared to that of all 2XMMi sources present in the SDSS DR7 footprint. Many faint X-ray sources have likely counterparts in the SDSS. Figure 4 also shows that the fraction of 2XMMi sources with likely SDSS identifications does not vary strongly with X-ray flux. It steadily decreases by a factor of 2 from  erg cm s to  erg cm s. The drop in the identification rate at a flux above 10 erg cm s is probably caused by the increasing number of bright optical counterparts, which are likely to be flagged as saturated in SDSS and therefore absent from our sample. On the other hand, the shape of the decline of the SDSS identified fraction with decreasing X-ray flux (centre panel) as well as the observed distribution of the ratio with X-ray flux (right panel) are both consistent with populations of X-ray sources with a weakly varying distribution of ratios with X-ray flux. In other words, comparing the SDSS identified sample with the total sample does not reveal evidence of strong evolution of with redshift. A comparable conclusion was reached in Sect. 4 based on the weak dependency with X-ray flux of the reliability / likelihood ratio relations. It may thus be possible to extrapolate the properties of the 2XMMi/SDSS DR7 photometric identified sources to somewhat fainter X-ray and optical fluxes.

Figure 4: Left panel: X-ray flux distribution. From top to bottom, black line: all X-ray sources found in 2XMMi fields overlapping SDSS DR7; red line: all X-ray sources matching an SDSS entry; red dashed line: 2XMMi/SDSS DR7 correlations with individual identification probabilities above 90%; blue line: “final” identified sample. Centre panel: ratio of the SDSS identified to total number of 2XMMi sources. Upper curve, filled squares: all matches; lower curve, empty squares: matches with identification probabilities above 90%. Right panel: Variation of log(fx/fr) with X-ray flux for 2XMMi/SDSS DR7 sources with identification probabilities above 90% and magnitude brighter than 22.2. In all cases, the combined unique EPIC source detection likelihood is 6.

The situation of the “final” spectroscopic identified sample clearly differs. Its X-ray flux distribution strongly differs from that of the 2XMMi/SDSS DR7 photometric sample and from that of the overall 2XMMi sample. Obviously, this discrepancy arises from the higher optical brightness needed by spectroscopic observations (see left panel of Fig. 4). In addition, the choice of the spectroscopic targets results from various heterogeneous optical selection criteria and is therefore unlikely to cover equally all X-ray emitting objects above any given X-ray flux threshold. Some examples are outlined in the sections below. We therefore stress that, as it stands, this identified sample cannot in any manner be used as a learning sample suitable for a statistically reliable classification of 2XMMi sources with SDSS identifications.

However, the high number of spectroscopic SDSS matches supplemented by other identifications derived from archival catalogues allows us to build an unprecedentedly large sample of X-ray sources of known nature, enriched with accurate multicolour photometry and detailed spectral line measurements. This large collection of best quality data offers a unique opportunity to study to some extent the parameter loci occupied by the different classes of X-ray emitters. It also allows us to address two important issues: first, finding the most efficient physical parameters for separating different groups of X-ray sources; second, highlighting the parameter regions not well covered by the SDSS observing strategy and therefore in need of extended spectroscopic studies.

Figure 5: Distribution of spatially unresolved objects in the optical band in the versus diagram. Left: the identified sample. Blue: AGNs with 44, green: , magenta: QSO2s - filled squares = X-ray selected - filled triangles = optically selected - encircled = Compton Thick (see Sect. 8.1), red: stars, cyan: accreting binaries. Right: the entire SDSS photometric sample. In this case, the colour codes the range of magnitude. black: 18, red:18-20, green, 20-21, blue:21. We only show SDSS entries with a probability of identification with an X-ray source higher than 90%, and magnitudes brighter than 22.2 and errors on 0.2.
Figure 6: Distribution of spatially resolved objects in the optical band in the versus diagram. Left: the identified sample blue: AGNs with 44, green: , yellow: galaxies with 42 AGN, magenta: QSO2 - filled squares = X-ray selected - filled triangles = optically selected - encircled = Compton Thick (see §8.1), red: stars, cyan: accreting binaries. Right: Entire SDSS photometric sample. In this case, the colour codes the range of magnitude. Black: 18, red:18-20, green, 20-21, blue:21. We only show SDSS entries with a probability of identification with an X-ray source larger than 90%, and magnitudes brighter than 22.2 and errors on 0.2.

The huge merit of the dedicated identification programmes such as the ones carried out by the SSC is to extend spectroscopic identifications to very low optical fluxes and thus offer a unique opportunity to unveil the different populations of extragalactic sources which may appear at fainter optical fluxes and in general at higher X-ray to optical flux ratios. The two strategies, wide and shallow on one hand and narrow and deep on the other hand are indeed quite complementary and suited to best characterise and scientifically investigate the entire serendipitous XMM-Newton catalogues.

7.2 The main classes of X-ray sources

We investigate the distribution of the various classes of X-ray sources in the original instrumental parameter space. In principle, we could also have used the parameter space resulting from a Principal Component Analysis (PCA), thus highlighting the most significant (or information-rich) linear combinations of physical measurements. Pineau et al. (2008) showed that the first two eigenvectors derived from the PCA of the 2XMMi/SDSS DR7 sample, which gather the largest data variance, are indeed close to the two main parameters used here, namely X-ray to optical flux ratios and optical colours. However, taking into account eigen-axes of higher orders, which include X-ray spectral information in the form of hardness ratios, can slightly improve the separation between different classes of sources (Pineau et al. 2008).

We show in Figs. 5 and 6 the positions in the / diagram of the various classes of objects present in the identified sample (left panel) and of all 2XMMi sources with only a cross-identification with a photometric SDSS entry (right panel). Sources spatially unresolved and extended in the optical band are presented separately in the two figures. A majority of 2XMMi sources match SDSS DR7 photometric entries close to the limiting magnitude of the SDSS survey (mag 22), and only a relatively small fraction is bright enough to have been selected for spectroscopic observations. For instance, out of a grand total of 60 567 2XMMi detections matching an SDSS DR7 entry, 87% have an error on their magnitude below 0.2, but only 12% are SDSS spectroscopic targets as well. In addition, the distribution of the spectroscopic targets is far from uniformly covering the parameter space spanned by the optical counterparts of the serendipitous XMM-Newton sources.

7.3 Separating stellar from extragalactic sources

As expected, the ratio is a very powerful parameter to separate the late-type stellar X-ray population, in which the high-energy emission arises in a magnetically active corona, from X-ray luminous sources powered by accretion such as active galactic nuclei or cataclysmic variables. However, the distribution of low galaxies in clearly overlaps with that of active coronae. Introducing the colour index allows us to separate the bulk of the stars, especially the reddest M stars, from most galaxies. Nevertheless, many galaxies, in particular of the early type, exhibit optical energy distributions similar to those of G type stars, show comparable and consequently cannot be easily distinguished from stars in the / diagram. Obviously, taking into account the spatial extension of the optical source allows us to efficiently separate them from stars (see Figs. 5 and 6).

Interestingly, the reddest point-like optical sources located on the “stellar” branch are also the faintest ones with magnitudes in the range of 18 to 20 (see Fig. 5, right panel). They also appear to exhibit the highest ratio. This is consistent with the known increase of the ratio for M stars compared to that of earlier spectral types (see e.g. Vaiana et al. 1981). We note, however, that some high-redshift QSOs have been identified with very red point-like objects of ratios approaching those of active coronae (see Fig. 5).

Although cataclysmic variables occupy a locus in the / comparable to that of most quasars, their distribution exhibits a wider spread than that of AGNs. This large scatter can be used to provide a high likelihood identification of their class, at least for part of them. For instance, very blue objects, typically with below 0.2 as well as those with extreme , have a high probability of being cataclysmic variables.

7.4 Distinguishing between the various classes of extragalactic sources

Figure 5 shows that many of the 2XMMi sources having a counterpart in the DR7 of the SDSS cluster in a rather narrow range of blueish colours in the interval of 0.2 to 0.8. They are characterised by a 0 and appear as point-like sources in the optical. Their positions in this diagram overlap with those of the vast majority of the spectroscopic SDSS AGN found in our identified sample, most of which are UV-excess optically selected quasars.

Let us now consider all objects, both spatially resolved and unresolved, occupying the UV excess quasar region ( values between 0.2 and +0.8 and -1.2). In this range of parameters, the mean of the spectroscopically identified sample appears slightly shifted by 0.3 dex to lower values (i.e. 0.75 mag brighter for a given X-ray flux), compared to that of the photometric sample. Since the mean magnitudes of the corresponding spectroscopic and photometric-only groups are 18.86 and 20.73, respectively, as a result of the necessarily brighter optical flux limit of the spectroscopic sample, this indicates that the photometric sample is dominated by a slightly more remote population of AGN, hence fainter in X-rays and in the optical than the spectroscopic sample, albeit with a somewhat larger mean . It can also be seen in Fig. 7 that these UVX spectroscopically identified quasars are the most energetic, with X-ray luminosities in excess of 10 erg/s. More generally, Fig. 5 shows that the spectroscopically identified sample of point-like objects covers the range of parameters populated by the photometric cross-identifications for both AGN and stars relatively well, except, as noted above, for the faintest optical matches. This identified sample could thus be used as a learning sample to statistically identify and classify X-ray sources with optical counterparts of comparable brightness.
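The equivalence between the 0.3 dex shift and 0.75 mag quoted in this paragraph follows directly from the definition of the magnitude scale. As a worked step (the symbol Δlog f for the change in the logarithm of the optical flux at fixed X-ray flux is notation introduced here for illustration):

```latex
\Delta m = 2.5\,\Delta\log_{10} f = 2.5 \times 0.3 = 0.75\ \mathrm{mag}
```

A flux ratio lower by 0.3 dex at fixed X-ray flux thus corresponds to an optical counterpart that is 0.75 mag brighter.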

This is at variance with the situation prevailing for extended sources. As seen in Fig. 6, a considerable number of X-ray sources are identified with red spatially extended photometric objects, i.e., relatively faint reddish galaxies with 1.0 and corresponding 0.5. Unfortunately, the SDSS policy for selecting spectroscopic targets does not cover this region of the parameter space well. In the few cases in which an optical spectrum exists, they are assigned an AGN type. These galaxies are significantly optically brighter than most UVX quasars, most of which are in the mag range of 18 to 21. Their derived X-ray luminosities in the range of 10 erg s (Fig. 7) clearly show that the vast majority of these reddish objects are likely Seyfert galaxies. This population extends downwards to lower ratios, narrowing the range spanned and decreasing their X-ray luminosities. Eventually the brightest objects ( 18) merge with the group of “normal” galaxies with 10 erg s, which is well represented in the spectroscopically identified sample. These low X-ray luminosities could be explained in terms of ULXs, starbursts, or of a collection of low-mass X-ray binaries in elliptical galaxies.

Figure 7: Distribution of extragalactic objects in the versus diagram according to . Black; 10, red 10 to 10, green 10 to 10, blue 10 to 10, yellow 10 to 10. We only show SDSS entries with a probability of identification with an X-ray source higher than 90%, and magnitudes brighter than 22.2 and errors on 0.2.
Bin    Ntot  mag (rms)      pn HR2 (rms)    pn HR3 (rms)    pn HR4 (rms)
point-like sources
+0.25  9538  +20.20 (1.07)  -0.115 (0.228)  -0.397 (0.237)  -0.219 (0.376)
+0.75  3034  +20.59 (0.99)  -0.045 (0.250)  -0.360 (0.246)  -0.199 (0.330)
+1.25  471   +20.51 (0.84)  +0.061 (0.352)  -0.294 (0.317)  -0.188 (0.338)
+1.75  96    +20.17 (0.97)  +0.168 (0.280)  -0.229 (0.294)  -0.150 (0.347)
+2.25  31    +19.79 (1.02)  +0.200 (0.315)  -0.307 (0.282)  -0.168 (0.598)
+2.75  29    +19.04 (1.31)  -0.081 (0.368)  -0.447 (0.302)  -0.162 (0.545)
extended sources
+0.25  812   +20.97 (0.84)  -0.099 (0.222)  -0.385 (0.255)  -0.175 (0.425)
+0.75  1915  +20.23 (1.46)  -0.100 (0.237)  -0.370 (0.235)  -0.246 (0.311)
+1.25  1918  +19.77 (1.44)  -0.009 (0.307)  -0.285 (0.335)  -0.149 (0.341)
+1.75  1364  +19.89 (0.97)  +0.172 (0.377)  -0.111 (0.452)  -0.028 (0.383)
+2.25  479   +20.00 (0.60)  +0.233 (0.414)  -0.012 (0.482)  -0.018 (0.337)
+2.75  35    +19.81 (0.78)  +0.027 (0.456)  +0.056 (0.499)  +0.027 (0.395)
Table 8: Main properties of the 2XMMi/SDSS identifications with in each 0.5 bin.

8 Science cases

In the next two sections we touch upon two distinct science cases, one in the extragalactic domain and one related to a galactic source population. These two examples aim at illustrating the range of research that these clean cross-correlated samples allow and do not explore in depth all possible paths of investigations. In particular, we do not make use of the spectroscopic line data, which could provide many additional astrophysical diagnostics. The first case considered bears on the topical search for QSO2s, while the second one explores the X-ray and optical properties of active stellar coronae.

8.1 Searching for QSO2 candidates

Figure 8: Left panel: Distribution of spectroscopically identified objects (optically resolved and unresolved) in the versus EPIC pn HR2 diagram. blue: AGNs with 44, green: , yellow: galaxies with 42, magenta: Type 2 QSO - filled squares = X-ray selected - filled triangles = optically selected - encircled = Compton Thick. Right panel: the entire SDSS photometric sample. Black: unresolved objects, green: extended objects, magenta: same as in left panel. We only show SDSS entries with a probability of identification with an X-ray source higher than 90%, magnitudes brighter than 23 and errors on HR2 less than 0.3.
Figure 9: Left panel: Distribution of spectroscopically identified objects (optically resolved and unresolved) in the versus EPIC pn HR2 diagram. blue: AGNs with 44, green: , yellow: galaxies with 42, magenta: Type 2 QSO - filled squares = X-ray selected - filled triangles = optically selected - encircled = Compton Thick. Right panel: the entire SDSS photometric sample. Black: unresolved objects, green: extended objects, magenta: same as in left panel. We only show SDSS entries with a probability of identification with an X-ray source larger than 90%, magnitudes brighter than 23 and errors on HR2 less than 0.3.

The members of the high-luminosity high-obscuration part of the AGN population are commonly denominated QSO2s. The synthesis modelling of the XRB (Gilli et al. 2007; Treister et al. 2009) predicts that up to of the XRB (Gilli et al. 2007) could be produced by QSO2s. They could represent of the high-luminosity AGN population (e.g. Della Ceca et al. 2008), and they probably co-evolve with massive host galaxies (Severgnini et al. 2006). Hunting for QSO2s therefore remains one of the most topical activities, as is the search for associated X-ray and optical signatures.

Optical selection of QSO2s relies on finding objects showing only narrow emission lines with high-ionisation line ratios and high luminosity, typically from [OIII] (e.g. Zakamska et al. 2003; Reyes et al. 2008). X-ray selection looks instead for luminous ( erg/s), significantly obscured (column density ) sources, which are best selected in the  keV hard X-ray band (e.g. Mainieri et al. 2002; Caccianiga et al. 2004; Perola et al. 2004; Vignali et al. 2006; Della Ceca et al. 2008; Krumpe et al. 2008). Within the Unified Model (Antonucci 1993), obscuration of the central X-ray-emitting and Broad Line-emitting regions by an intervening torus gives rise to those consistent properties across both bands.

A priori, QSO2s should present red optical colours (since the emission of the host galaxy would dominate over the obscured AGN), high X-ray hardness ratios (because of the predominant absorption of the lower energy X-rays) and high X-ray-to-optical flux ratios (, since the X-rays are less sensitive to absorption than the optical range). However, several effects could alter this simple recipe. For instance, Compton Thick absorption (defined here as N 10cm ) would completely absorb direct X-rays up to 10 keV. This would alter the spectral shape (since scattered primary X-rays have “softer” spectra), and the ratio of optical to X-ray fluxes would be more typical of “normal” galaxies (i.e. ).
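The link between absorption and hardness ratio can be made concrete with a small sketch. The exact band definitions of the EPIC hardness ratios are not reproduced in this section; the example below simply assumes the conventional form (H - S) / (H + S) for the count rates S and H in two adjacent energy bands, and the function name is illustrative.

```python
def hardness_ratio(soft_rate, hard_rate):
    """Conventional hardness ratio (H - S) / (H + S) between two adjacent bands.
    Returns None when there are no counts in either band."""
    total = soft_rate + hard_rate
    if total <= 0.0:
        return None
    return (hard_rate - soft_rate) / total
```

Photoelectric absorption removes soft photons preferentially, so an absorbed source moves towards a hardness ratio of +1, whereas a source detected only in the softer band has a hardness ratio of -1.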

We investigate here whether the position in the overall X-ray and optical parameter space of the confirmed and candidate QSO2s discovered so far could give some hint as to how other candidates could be selected on the basis of broadband high-energy XMM-Newton data and optical photometry only.

To do so, we first assembled a sample of bona fide QSO2s from optical and X-ray surveys. The optically selected sample has been obtained by cross-correlating the SDSS samples of Zakamska et al. (2003) and Reyes et al. (2008) with the 2XMMi catalogue. We list in Table 9 the main properties of the QSO2 SDSS candidates matching a serendipitous EPIC source. The listed X-ray luminosities were computed assuming an average shape (see Sect. 6.2) for the broad-band 0.2 to 12 keV energy distribution and are not corrected for intrinsic absorption. We also marked in the table the eight SDSS QSO2s that are good candidates to be Compton Thick. To define these objects we used 10, where L is the observed 2-10 keV luminosity (see above) and L is the expected intrinsic X-ray luminosity (the latter has been computed using the observed and the ratio derived for the unobscured view of Seyfert galaxies; Heckman et al. 2005).

To this sample of optically selected QSO2s we added a small sample of five X-ray selected QSO2s (with the definition above) obtained by cross-correlating SDSS, 2XMMi, and a few selected lists of X-ray defined QSO2s (Della Ceca et al. 2008; Krumpe et al. 2008; Corral et al. 2010). It is worth stressing that, based on a detailed analysis of the X-ray and optical spectral properties (see Della Ceca et al. 2008; Corral et al. 2010), all these X-ray defined QSO2s are Compton Thin (intrinsic between cm and few times cm).

2XMMi name              id prob  z      u-g            g-i            HR2 (pn)      Log(Lx)  Log(fx/fr)
Zakamska et al. (2003)
2XMM J005621.6+003235   0.964    0.484  1.891 ± 0.643  1.800 ± 0.061  -0.56 ± 0.28  42.78    -0.626
2XMMi J011522.2+00151   0.997    0.390  6.023 ± 1.239  2.437 ± 0.046   0.73 ± 0.04  44.26     0.566
2XMM J015716.9-005305   0.988    0.540  0.565 ± 0.167  1.836 ± 0.050  -0.18 ± 0.46  42.78     0.285
2XMM J021047.0-100152   0.999    0.540  0.348 ± 0.136  1.771 ± 0.047   0.66 ± 0.12  44.40     0.980
2XMM J103951.5+643005   0.991    0.402  0.490 ± 0.075  1.209 ± 0.027  -0.68 ± 0.33  43.02    -0.353
2XMM J122656.4+013124   0.998    0.732  0.970 ± 0.256  1.134 ± 0.056   0.76 ± 0.06  44.83     1.063
2XMM J164131.6+385841   0.995    0.596  0.550 ± 0.112  1.768 ± 0.028   0.63 ± 0.05  44.96     1.245
Additional candidates from Reyes et al. (2008)
2XMMi J075821.2+392337  0.912    0.216  0.620 ± 0.026  0.891 ± 0.010  -0.04 ± 0.27  42.37    -1.058
2XMM J093952.7+355358   1.000    0.137  1.527 ± 0.051  1.534 ± 0.007  -0.08 ± 0.06  43.44    -0.040
2XMM J094506.4+035552   0.995    0.156  0.979 ± 0.043  1.092 ± 0.025  -1.00 ± 0.28  41.63    -1.730
2XMM J100327.8+554155   0.991    0.146  1.390 ± 0.066  1.340 ± 0.010  -1.00 ± 0.17  42.40    -0.772
2XMM J103408.5+600152   1.000    0.051  1.408 ± 0.009  0.888 ± 0.003  -0.48 ± 0.05  42.29    -1.232
2XMM J103456.3+393939   0.995    0.151  1.465 ± 0.049  1.412 ± 0.007  -0.68 ± 0.29  42.88    -0.512
2XMM J122709.8+124855   0.998    0.194  1.728 ± 0.162  1.808 ± 0.010  -0.66 ± 0.27  42.88    -0.686
2XMM J131104.6+272806   0.998    0.240  1.272 ± 0.070  1.702 ± 0.010  -0.34 ± 0.07  42.94    -0.784
2XMM J132419.8+053704   0.997    0.203  0.441 ± 0.027  1.226 ± 0.009   0.01 ± 0.47  42.24    -1.290
2XMM J171350.7+572955   0.883    0.113  1.279 ± 0.025  1.396 ± 0.006  -0.69 ± 0.21  42.10    -1.201
X-ray selected QSO2
2XMM J113148.6+311400   0.995    0.50   0.724 ± 0.491  1.654 ± 0.091   0.71 ± 0.16  44.70     1.437
2XMM J122656.4+013124   0.998    0.73   0.970 ± 0.256  1.134 ± 0.056   0.76 ± 0.06  44.83     1.063
2XMM J134656.6+580316   0.965    0.37   1.472 ± 0.538  2.443 ± 0.069   0.53 ± 0.23  43.88     0.057
2XMM J160645.9+081523   0.960    0.62   0.919 ± 1.085  2.565 ± 0.216   0.81 ± 0.26  44.80     1.607
2XMM J204043.2004548    0.336    0.62   1.448 ± 2.334  2.581 ± 0.304   0.70 ± 0.11  44.72     1.518
Table 9: QSO2 candidates from Zakamska et al. (2003), Reyes et al. (2008) and X-ray selected samples matching 2XMMi entries with EPIC pn observations.

The positions of the confirmed and candidate QSO2 in the parameter space obtained using , the optical colours (in particular, ) and hardness ratios (in particular HR2) are shown in Fig. 8 and Fig. 9; different symbols mark the several “flavours” of QSO2, i.e. the X-ray selected QSO2 (all Compton Thin), the optically selected QSO2 and the candidate Compton Thick QSO2.

As can be seen in Fig. 9, the QSO2 generally appear slightly redder than the bulk of the unresolved AGN, and of the identified galaxy sample having erg s. Their colours are more similar to those of the identified galaxy sample with erg s (green points in Fig. 9, left panel).

The position of the QSO2 in Fig. 8 and Fig. 9 clearly shows a separation between the “confirmed” Compton Thin QSO2 (magenta filled squares and triangles) and the “candidate” Compton Thick QSO2 (encircled). Interestingly, the four “optically selected” QSO2 occupying the same region as the X-ray selected Compton Thin QSO2 have been studied in the X-ray domain by Ptak et al. (2006); all these sources (2XMMiJ011522.2+001518, 2XMMiJ021047.0-100152, 2XMMiJ122656.4+013124, 2XMMiJ164131.6+385841) are described by an absorbed power-law model with an intrinsic cm. Therefore the upper right corner of Fig. 8 is probably the best place to look for Compton Thin QSO2; as shown by e.g. Caccianiga et al. (2004), the very positive HR2 reflects the relatively large intrinsic photoelectric absorption present in many QSO2 and responsible for their preferential discovery in hard X-ray surveys11.

Finally, we show in Fig. 10 the behaviour of the EPIC pn HR2 hardness ratio with X-ray luminosity for all spectroscopic SDSS targets. The bulk of the SDSS Type 1 QSOs with X-ray luminosities higher than 10erg s (0.2-12 keV) cluster around a hardness ratio HR2 = 0.12; the same objects cluster around hardness ratios HR3 = 0.38 and HR4 = 0.28. These hardness ratios are in excellent agreement with the values expected from a canonical Γ = 1.9 power-law X-ray spectrum undergoing negligible intrinsic absorption and a mean Galactic absorption of 1.16 10cm (the average over all directions of galaxy and QSO targets). For this group of QSOs, there is no evidence of a strong dependence of the power-law index on . There is however a small number of QSOs exhibiting a considerably harder X-ray spectrum (as testified by an increasing value of HR2) extending to the same locus occupied by confirmed Compton Thin QSO2. As one enters the AGN regime at X-ray luminosities below 10erg s, the number of extended sources with galaxy-like optical spectra rises considerably and the shape of the X-ray energy distribution shows a much larger scatter. The candidate Compton Thick QSO2 seem to populate this part of the diagram.

A number of spectroscopic SDSS entries occupy the same region of the LX / HR2 diagram as the reference Compton Thin QSO2. We explored the nature of these candidates by selecting objects with log() higher than 44 and EPIC pn HR2 greater than 0.5 with an error of less than 0.2 on the hardness ratio: twelve objects match these conditions. From an inspection of the spectroscopic SDSS data and a literature search we found that at least 60% of them are indeed characterised by absorption at some level: two objects are clearly Broad Absorption Line QSOs (SDSS J114312.32+200346.0, SDSS J141546.24+112943.4), two are “dust reddened QSOs” (SDSS J122637.02+013016.0, SDSS J143513.90+484149.2) and three are Type 2 QSOs (SDSS J105144.24+353930.7, SDSS J130005.34+163214.8, SDSS J134507.93001900.9). The remaining five objects are apparently “normal” type I AGN, without any specific comment in the literature: a detailed analysis of their optical and X-ray properties (e.g. to understand whether these latter objects could also be classified as “dust reddened QSOs”) is beyond the scope of the present paper. The main result of this exploration is that, although Compton Thin QSO2 do separate well in the LX / HR2 diagram, other rare kinds of objects occupy the same locus.
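The selection cuts quoted above can be written as a simple filter. The following is a minimal sketch only: the `Match` record and the function name are hypothetical, and the three example rows reuse values from Table 9 under the assumption that the first Log() column of that table is the logarithm of the 0.2-12 keV luminosity.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical record for one 2XMMi/SDSS match (field names are assumptions)."""
    name: str
    log_lx: float    # log10 of the X-ray luminosity in erg/s (assumed column)
    hr2: float       # EPIC pn hardness ratio 2
    hr2_err: float   # error on HR2


def select_hard_luminous(matches, min_log_lx=44.0, min_hr2=0.5, max_hr2_err=0.2):
    """Keep luminous, hard sources whose HR2 is well measured."""
    return [
        m for m in matches
        if m.log_lx > min_log_lx and m.hr2 > min_hr2 and m.hr2_err < max_hr2_err
    ]


# Illustrative rows taken from Table 9 (column identification is an assumption).
candidates = [
    Match("2XMM J122656.4+013124", 44.83, 0.76, 0.06),
    Match("2XMM J005621.6+003235", 42.78, -0.56, 0.28),
    Match("2XMM J160645.9+081523", 44.80, 0.81, 0.26),
]
selected = select_hard_luminous(candidates)
print([m.name for m in selected])
```

Only the first row passes all three cuts: the second is too faint and too soft, and the third has an HR2 error above 0.2.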

Figure 10: Variation of the EPIC pn hardness ratios HR2 with X-ray luminosity for extragalactic spectroscopic SDSS targets optically unresolved (black dots) and optically extended (green dots). Magenta: Type 2 QSO - filled squares = X-ray selected - filled triangles = optically selected - encircled = Compton Thick. Errors on hardness ratios are below 0.2 in all cases.
Figure 11: Variation of the EPIC pn hardness ratio HR2 with colour for all 2XMMi matches with extended sources in the entire SDSS catalogue only having a larger than 0.5. Each histogram corresponds to a colour bin of 0.5. Black; =0.25, red; 0.75, green; 1.25, blue; 1.75, magenta; 2.25. Errors on hardness ratios are below 0.3 in all cases. We only show identifications with an individual identification probability higher than 90%, and mag brighter than 22.2, errors on less than 0.2. Histograms are normalised to the total number of sources in the colour bins shown in Table LABEL:redids.

Finally we investigated the possible presence of Compton Thin QSO2 candidates in the SDSS photometric matches. We did this by looking for sources showing evidence of photoelectric absorption among the 2XMMi sources matching mainly optically extended SDSS objects characterised by a high ratio and relatively red colours. We thus built histograms of the EPIC pn hardness ratios for all 2XMMi/SDSS matches with for various colour intervals. The resulting histogram for the EPIC pn HR2 hardness ratio is shown in Fig. 11.

The histograms clearly show a main peak of hardness ratio corresponding to the canonical low N  = 1.9 power-law spectrum characteristic of the type I QSOs for all colour indexes. However, for the reddest objects, typically with 1.5, a secondary bump is observed at higher hardness ratios (HR2 0.5) with values consistent with those of the Compton Thin QSO2; a similar secondary bump is also present if we use the hardness ratios HR3 (HR3 0.35) and HR4 (HR4 0.15). A detailed investigation of these possibly interesting sources is beyond the scope of the present paper.

8.2 X-ray active stars

At the high galactic latitudes covered here by the legacy SDSS survey (° with a mean of 58°), most of the X-ray sources are of extragalactic origin. For instance, the Extended Medium Sensitivity Survey, which constitutes the largest optically identified sample of serendipitous Einstein X-ray sources at high galactic latitude, contains only 25% of active stars (Stocke et al. 1991). Optical identification campaigns of the ROSAT all-sky survey sources (RASS, see Schwope et al. 2000, for the bright sample) yielded similar results (e.g. 35% of active stars in Zickgraf et al. 1997), while the ROSAT deep survey of the Lockman field (Schmidt et al. 1998), which is 50 times more sensitive than the RASS, collected less than 10% of stellar sources. Stars constitute a bounded population of comparatively soft X-ray sources. Consequently, their relative contribution to the source counts at high galactic latitude is expected to decrease very significantly with increasing sensitivity in Chandra and XMM-Newton observations, which both offer a lower flux limit and a harder energy response (see for instance Fig. 1 in Hérent et al. 2006). Active coronae indeed constitute less than 10% of all sources identified in the XMM-Newton serendipitous survey of Barcons et al. (2007). The deepest Chandra surveys, such as the Chandra Deep Field North (Alexander et al. 2003), count only 3% of stars.

Figure 12: Aitoff projection of the distribution of the 2XMMi sources matching identified stars with a probability 90%.

This is at variance with the low latitude situation where most of the RASS X-ray sources were found to be associated with stars (Motch et al. 1997). XMM-Newton and Chandra have extended to lower fluxes the predominance of active coronae in the soft X-ray low latitude source population (Motch et al. 2003; Rogel et al. 2006). At higher energies, a longitude dependent population of galactic hard X-ray sources appears on top of a usually dominant background of extragalactic sources (Hands et al. 2004; Motch 2006; Motch et al. 2010).

As stated above, the number of X-ray emitting and spectroscopically identified stars available in the DR7 is relatively small, and because of the scientific goals put forward at the time of the selection of targets for spectroscopic follow-up, concentrates on the reddest M type stars. The availability of the SEGUE archive in DR7 has somewhat increased the number of spectroscopically identified stars, but its effect on the cross-correlation statistics remains small. As mentioned in Sect. 6, in order to increase the stellar sample towards earlier types, we used a kernel density classification to identify the SDSS DR7 / 2XMMi matches with multicolour properties consistent with those expected from stars of main sequence class.

Figure 12 shows the distribution in galactic coordinates of all X-ray active stars present in the identified sample. Including SEGUE data in the DR7 has allowed the identification of a few stars at low galactic latitude. However, the mean of the stellar sample remains high, ( 45°) and therefore does not change the conclusion that the present stellar sample is typical of the high Galactic latitudes.

At high and typical distances of a few hundred parsecs, interstellar absorption remains negligible compared to other uncertainties and has little effect on the observed stellar colours. We computed the total galactic reddening in the directions of each of the X-ray emitting stars following Schlegel et al. (1998). The average E(B-V) is 0.029 with an rms of 0.025. Assuming the absorption coefficients computed by Girardi et al. (2004) ( = 4500K;  = 4.5), the maximum reddening applicable on average to our sample of stars would be 0.028 and 0.018 in the and colours respectively. Only very few spatially unresolved SDSS sources matching 2XMMi entries have combined , and colours compatible with those expected from giant class III stars, not to mention supergiants. In a recent paper, Guillout et al. (2009) estimate that their sample of 30° RASS sources identified with bright Tycho stars has a mean contamination of 35% by evolved stars with a peak at 60% for K stars. However, a smaller fraction of X-ray emitting evolved stars of 10% was present in the sample of Covey et al. (2008), which is more representative of what we should expect in our case because it was selected at higher galactic latitude and fainter flux. Below, we will assume that all stars belong to the main sequence, keeping in mind that a fraction of the class III and class IV stars, in particular in short period binaries such as RS CVn systems, could contribute to some extent. Being treated as single dwarfs, these stars would have computed X-ray luminosities below those actually emitted.

Our sample comprises 549 active coronae candidates with individual probability of identification above 90%, corresponding to a total sample reliability of 98%. The interval of and colours corresponds to K4 to late M5-M6 stars. Earlier stars are more often subject to optical saturation and have in general lower KDC probabilities. They are therefore excluded from the clean identified stellar sample built here.

Neglecting reddening effects, we computed the distances using the absolute magnitude calibration listed in Covey et al. (2007) and X-ray luminosity using the broad band (0.2-12 keV) flux listed in the 2XMMi catalogue for the EPIC camera. The mean photometric distance is 340 pc and all stars have distances in the range of 40 pc to 2000 pc. The overall Log(Lx) (0.2–12 keV) distribution peaks at 29.20 and ranges from 27.1 to 30.9. This interval of X-ray luminosity covers that exhibited by old stars such as the Sun or even less active, up to that emitted by the most active T Tauri or RS CVn stars. The mean X-ray luminosity does not vary with galactic latitude in the interval covered by our sample.
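The two steps described above, a photometric distance from the distance modulus followed by a luminosity from the catalogued flux, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the photometric band of the absolute magnitude calibration is not specified here, reddening is neglected as in the text, and isotropic emission is assumed.

```python
import math

PC_IN_CM = 3.0857e18  # one parsec in centimetres


def photometric_distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus m - M = 5 log10(d) - 5."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)


def log_xray_luminosity(flux_cgs, distance_pc):
    """log10 of L = 4 pi d^2 F, with F in erg cm^-2 s^-1 and d in parsecs."""
    d_cm = distance_pc * PC_IN_CM
    return math.log10(4.0 * math.pi * d_cm ** 2 * flux_cgs)


# A star 5 magnitudes fainter than its absolute magnitude lies at 100 pc.
print(photometric_distance_pc(15.0, 10.0))
# A hypothetical flux of 1e-14 erg cm^-2 s^-1 at the mean distance of 340 pc.
print(round(log_xray_luminosity(1e-14, 340.0), 2))
```

For a flux of 1e-14 erg cm^-2 s^-1 at the mean distance of 340 pc the sketch returns a Log(Lx) slightly above 29.1, close to the peak of the distribution quoted above.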

We show in Fig. 13 the distribution of X-ray active stars in the / diagram for two ranges of X-ray luminosities. It can be readily seen that the locus of X-ray bright sources is shifted by 0.1 mag in colour above that occupied by low  active coronae and by spectroscopically identified stars from SEGUE in general. Active stars appear to be bluer in for a given . The effect is particularly clear for the reddest stars with 1.3, corresponding to M0 types and later. This shift cannot be due to the lack of reddening correction, since the most X-ray luminous stars, which are expected to be the most remote and absorbed ones, should appear redder than the closest low  stars, a trend opposite to what is observed. We also checked with the isochrones of Girardi et al. (2004), designed for the SDSS band passes, that age or metallicity effects are unable to explain the different colour/colour tracks of the low and high  stars. Similarly, an enhanced Hα emission, expected to be especially important for late M stars (), can be excluded since it would yield larger . Interestingly, Covey et al. (2008) report that the counterparts of their extended ChaMP stellar X-ray survey do exhibit a 0.1 bluer index than average low-mass stars, although they do not report any significant change in the colour index. Comparing with their work is however not straightforward. While the colour/colour track followed on average by our active coronae is consistent with that of Covey et al. (2007) for the entire SDSS and therefore does not contradict the results of Covey et al. (2008), the difference we find rather arises between two groups of active stars. The SDSS M star templates compiled by Bochanski et al. (2007) could indicate a similar trend in the colour index between active and inactive stars, albeit with a large intrinsic scatter, while no such effect occurs in . Unfortunately, we have too few good band measurements to be able to confirm the trend seen by Covey et al. (2008). We agree with these authors that low-level optical flaring might be responsible for the bluer colours seen in X-ray active M dwarfs.

Figure 13: Distribution of X-ray active stars in the / diagram for two ranges of X-ray luminosities. Black dots Log(Lx)29.3. Red crosses Log(Lx)29.3. The arrow shows the direction of interstellar reddening. Only sources with photometric errors below 0.1 mag in and are shown. Density contours show the position of all stars identified in the SEGUE programme.

Active coronae emit essentially thin thermal spectra dominated by a series of narrow emission lines superposed on a weak continuum. In most cases, two thermal components are required to satisfactorily represent the observed energy distribution (see a recent review in Güdel & Nazé 2009). X-ray studies of open clusters and field stars of different ages led to a relatively coherent picture linking stellar rotation rates, overall X-ray luminosity and X-ray temperatures. Whereas the young (age 1 Myr) stars in Orion exhibit X-ray spectra with 0.8 keV and 2.9 keV, the analysis of 115 Myr old Pleiades stars yields 0.4 keV and 1.1 keV, while the X-ray corona of our Sun can also be characterised by a 2T spectrum with 0.2 keV and 0.6 keV (see Sung et al. 2008, and references therein). Using ROSAT and ASCA observations of a dozen carefully selected stars, Guedel et al. (1997) established that the overall X-ray luminosity, the temperature of the two components, and the emission measure ratio of the hot to the cool plasma all decrease with age and rotation rate. The range of X-ray luminosity observed in our survey suggests ages younger than 2 Gyr.

We thus investigated whether the X-ray properties of our identified active coronae depend on luminosity. For that purpose we selected the 296 (149) identified stellar X-ray sources having errors on 2XMMi hardness ratio 2 (3) of less than 0.2. This sample was then split into five ranges of luminosity for which the mean HR2 was computed and, in order to accommodate the lower number of sources in the harder band, into three bands of X-ray luminosity for HR3. Hardness ratios 2 and 3 compare the count rates in the 0.5-1.0 keV and 1.0-2.0 keV bands, and in the 1.0-2.0 keV and 2.0-4.5 keV bands, respectively. They are therefore weakly dependent on the EPIC filter used and vary little within the range of N applicable to the present survey. HR2 and HR3 can thus be considered as indicators of the intrinsic shape of the X-ray energy distribution well suited to the range of temperatures exhibited by stellar coronae. We show in Fig. 14 the variation of the median HR2 and HR3 with X-ray luminosity. A clear spectral hardening accompanies the luminosity increase. Assuming a single temperature plasma undergoing N 1.7 10 cm the HR2 value would imply a thin thermal temperature of 0.45 keV for the lowest  bin and 0.72 keV for the largest , while for the same range of X-ray luminosities the HR3 value would indicate temperatures from 0.45 keV to 1.4 keV. The existence of a second, hotter thin thermal component naturally accounts for this discrepancy. Unfortunately, using only two hardness ratios, it is impossible to fit both , and the ratio of the emission measures of the hot and cool components, which is also expected to vary with X-ray luminosity (Guedel et al. 1997). In spite of this shortcoming, our data, which benefit from the large throughput of XMM-Newton and from its capability to obtain detailed spectral information for an unprecedented number of sources, confirm for the first time the spectral hardening with X-ray luminosity in large groups of mixed age field objects.
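The binning procedure described above can be sketched as follows. The sketch assumes the usual normalised-difference definition of a hardness ratio between two adjacent bands, (hard - soft) / (hard + soft); the function names, the tuple layout and the bin edges are hypothetical and chosen for illustration only.

```python
from statistics import median


def hardness_ratio(rate_soft, rate_hard):
    """Normalised difference of two band count rates (assumed definition)."""
    total = rate_soft + rate_hard
    if total <= 0:
        raise ValueError("total count rate must be positive")
    return (rate_hard - rate_soft) / total


def median_hr_by_luminosity(sources, bin_edges, max_err=0.2):
    """Median hardness ratio per luminosity bin.

    sources   : iterable of (log_lx, hr, hr_err) tuples
    bin_edges : ascending bin edges in log10(Lx); bins are half-open [lo, hi)
    Sources with hr_err >= max_err are discarded; empty bins give None.
    """
    bins = [[] for _ in range(len(bin_edges) - 1)]
    for log_lx, hr, hr_err in sources:
        if hr_err >= max_err:
            continue
        for i in range(len(bins)):
            if bin_edges[i] <= log_lx < bin_edges[i + 1]:
                bins[i].append(hr)
                break
    return [median(b) if b else None for b in bins]


# Hypothetical sources: (log Lx, HR2, error on HR2).
stars = [(28.0, -0.4, 0.1), (28.5, -0.2, 0.1), (29.5, 0.0, 0.1), (29.7, 0.1, 0.3)]
print(median_hr_by_luminosity(stars, [27.0, 29.0, 31.0]))
```

For HR2 the soft and hard rates would be those of the 0.5-1.0 keV and 1.0-2.0 keV bands, and for HR3 those of the 1.0-2.0 keV and 2.0-4.5 keV bands.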

Figure 14: Variation of the EPIC pn Hardness ratios 2 and 3 with mean X-ray luminosity for stellar X-ray sources with HR errors below 0.2. We plot here the median and associated errors.

9 Summary

We present the results of the cross-identification of the 2XMMi catalogue, containing over 220,000 XMM-Newton EPIC serendipitous sources, with Data Release 7 of the Sloan Digital Sky Survey, consisting of 357 million unique objects and over 1.6 million spectra. In order to ensure the best statistical control on the quality of the cross-correlation process, we use a likelihood ratio scheme based only on the probability of spatial coincidence of the X-ray source with the optical candidate. Using an original method that does not rely on heavy Monte Carlo simulations, we are able to compute true probabilities of identification, taking into account the varying fraction of X-ray sources expected to have an SDSS match as a function of galactic latitude. We discuss the details of the statistical method used and present reliability and completeness curves for the entire set of matches. A total of 30,000 unique X-ray sources have a match in the DR7 with a probability of identification above 90%. At this threshold, the reliability of the total sample is 98%, i.e. we expect only 2% of spurious cross-identifications, while the completeness is 77%, meaning that we miss about a quarter of the true matches, which therefore appear with individual probabilities of identification below the 90% threshold.
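When individual identification probabilities are true probabilities, the reliability and completeness of a thresholded sample follow from simple sums. The sketch below illustrates this bookkeeping under the assumption that the probabilities are well calibrated; it is not the estimator used to produce the curves of this paper, and the function name and example values are hypothetical.

```python
def sample_reliability_completeness(probabilities, threshold=0.9):
    """Expected reliability and completeness of matches with p > threshold.

    reliability  = expected true matches in the selection / number selected
    completeness = expected true matches in the selection / expected true matches overall
    """
    selected = [p for p in probabilities if p > threshold]
    expected_true_selected = sum(selected)
    expected_true_total = sum(probabilities)
    reliability = expected_true_selected / len(selected) if selected else float("nan")
    completeness = (
        expected_true_selected / expected_true_total
        if expected_true_total > 0
        else float("nan")
    )
    return reliability, completeness


# Four hypothetical matches, two of which pass the 90% threshold.
rel, comp = sample_reliability_completeness([0.99, 0.95, 0.5, 0.3])
print(round(rel, 3), round(comp, 3))
```

Raising the threshold increases the reliability at the expense of the completeness, which is the trade-off summarised by the 98% and 77% figures quoted above.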

One of the most ambitious tasks assigned to the Survey Science Centre of the XMM-Newton satellite is the statistical identification of all serendipitous X-ray sources discovered in the large field of view of the EPIC cameras. The 2XMMi/DR7 cross-identified sample offers an interesting opportunity to test various strategies and methods which could eventually lead to the classification of the XMM catalogues. Pineau et al. (2008) presented a first attempt to reduce the parameter space dimension using Principal Component Analysis tools adapted to data affected by rather large instrumental errors and taking into account the intrinsic scatter of the measured quantities. These authors also tested the relative merits of various classification methods. We use the SDSS DR7 spectroscopic catalogue to build an identified sample made of objects of known nature to which we add a few entries extracted from specific additional catalogues (e.g. Cataclysmic Variables). We find that the most discriminating diagrams involve the flux ratio and various other parameters such as the colour index, EPIC hardness ratios, and the extent of the optical source. Active galactic nuclei more luminous than 10erg s have ratios well above those of most galactic stars in their range of . Galaxies of lower X-ray luminosity overlap the region of the diagram covered by active coronae, but can be easily separated from them on the basis of the optical extent.

Not unexpectedly, active galactic nuclei constitute by far the most numerous class of X-ray emitters present in the cross-correlation. Optically unresolved UV excess quasars are the most frequent X-ray emitters encountered and cluster in a well defined area in the / diagram. The second largest group of X-ray detected AGN consists of optically extended objects, with similar or only slightly lower ratios, but exhibiting considerably redder colours. The very scarce SDSS spectroscopic coverage of this region does not allow constraining their true nature. However, relatively bright galaxies harbouring active nuclei with  somewhat lower than those of UV excess quasars are likely to account for most of these objects. The vast majority of these ”red” AGN exhibit X-ray hardness ratios consistent with those expected from a typical type I nucleus. However, a small fraction of these sources display evidence of enhanced photo-electric absorption that could be the signature of Compton Thin QSO2s, although other species of AGN can overlap the same parameter region. As  (and ) decrease, extragalactic X-ray sources become spatially resolved in the optical and their colour indices tend to cluster around 1.3. In parallel, the hardness ratios of lower  sources appear globally softer. We probably see here the effect of the increasing contribution of non-nuclear X-ray sources. Although some stars share ratios and colours similar to some galaxies, taking into account the extent of the optical object allows an almost perfect separation of the galactic and extragalactic source populations. As expected, the latest M stars display larger ratios than earlier active coronae.

Using a sample of reliable X-ray and optical Compton Thin and Compton Thick QSO2s we investigated the possibility of preferentially finding these objects in some specific range of X-ray and optical parameters. We find that Compton Thick QSO2 lack outstanding observational optical and X-ray photometric signatures that may be used to select sub-samples with greatly enhanced densities. However, the photo-electric absorption detectable in X-rays for relatively low-redshift AGN could be used to set apart groups of Compton Thin candidates with a reasonable success rate.

We also find that X-ray bright stars appear slightly bluer in than less active coronae. The origin of this effect is not clear, but could be related to the UV and blue micro flaring occurring in many of the active late type K and M stars. Assuming that most active stars are indeed main sequence, we find a strong dependence of EPIC hardness ratios 2 and 3 on the inferred X-ray luminosity, the most luminous coronae being those emitting with the highest temperature.

The cross-correlation of two large catalogues such as the 2XMMi and the DR7 of the SDSS paves the way to a wide range of investigations. In addition, the SDSS database contains line fluxes for over a million spectroscopic targets, while the 2XMMi catalogue provides X-ray spectra and time series for the 44,000 brightest serendipitous XMM-Newton sources. These rich data sets offer unprecedented opportunities for studying how the micro-physics of the optically emitting regions relates to X-ray properties. Ultimately, such studies, in particular those aiming at finding rare outliers, rely on a solid control of the cross-correlation and cross-identification statistics.

We presented the details of an algorithm based on the classical likelihood ratio, well adapted to cross-correlation of large catalogues and providing useful probabilities of identification. This algorithm is now available as a plug-in for the Aladin VO portal12.

We thank an anonymous referee for providing constructive comments and help in improving the contents of this paper. F.-X. Pineau acknowledges support from CNRS, CNES and from the Region Alsace. Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/. The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington. This research has made use of the SIMBAD database and of the VizieR catalogue access tool, operated at CDS, Strasbourg, France.

Appendix A Transformation of error ellipses in the new frame

A.1 Spherical trigonometry

On the unit sphere of centre , we consider the spherical triangle where is the north pole, and are the positions of the X-ray and the optical sources respectively. We write

  • , , the angular distances between and , and , and respectively.

  • , and the dihedral angles between planes and , and , and respectively. Those angles in the range and are defined as positive in the East direction.

We immediately have , , and the angular distance between the two sources, which is given by the Haversine formula
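As a numerical illustration, the haversine expression for the angular distance between two positions given in equatorial coordinates can be evaluated as in the following minimal sketch. The function and argument names are assumptions made for this example and do not follow the notation of this appendix.

```python
import math


def angular_separation_deg(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Angular distance in degrees between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    sin_half_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_half_dra = math.sin((ra2 - ra1) / 2.0)
    # hav(d) = hav(delta_dec) + cos(dec1) cos(dec2) hav(delta_ra)
    h = sin_half_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_half_dra ** 2
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h))))


# Two points on the equator separated by a quarter of a turn in right ascension.
print(angular_separation_deg(0.0, 0.0, 90.0, 0.0))
```

The haversine form remains numerically well behaved at the arcsecond-scale separations relevant to cross-matching, where the plain spherical law of cosines loses precision.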


The spherical trigonometry gives the formula


As , and are positive, we notice that , and have the same sign, which depends on . As the function returns values , we deduce two possible values for both angles and . The cosines of the half-angles and tell us whether and whether respectively. From the spherical trigonometry formulae


We derive the values of angles and :