The mass-sheet degeneracy and time-delay cosmography: Analysis of the strong lens RXJ1131-1231
Abstract
We present extended modelling of the strong lens system RXJ1131-1231 with archival data in two HST bands, in combination with existing line-of-sight contribution and velocity dispersion estimates. Our focus is on source size and its influence on time-delay cosmography. We therefore examine the impact of the mass-sheet degeneracy, and especially the degeneracy pointed out by Schneider & Sluse (2013) [1], using the source reconstruction scale. We also extend previous work by further exploring the effects of priors on the kinematics of the lens and on the external convergence in the environment of the lensing system. Our results from RXJ1131-1231 are given in a simple analytic form so that they can be easily combined with constraints from other cosmological probes. We find that the choice of priors on lens model parameters and source size is subdominant to the statistical errors for measurements of this system. The choice of prior for the source is subdominant at present (2% uncertainty on $H_0$) but may be relevant for future studies. More importantly, we find that the priors on the kinematic anisotropy of the lens galaxy have a significant impact on our cosmological inference. When incorporating all the above modeling uncertainties, we find km s$^{-1}$ Mpc$^{-1}$, when using kinematic priors similar to other studies. When we use a different kinematic prior motivated by Barnabè et al. (2012) [2] but covering the same anisotropic range, we find km s$^{-1}$ Mpc$^{-1}$. This means that the choice of kinematic modeling and priors has a significant impact on cosmographic inferences. The way forward is either to obtain better velocity dispersion measurements, which would down-weight the impact of the priors, or to construct physically motivated priors for the velocity dispersion model.
Simon Birrer, Adam Amara, Alexandre Refregier
Prepared for submission to JCAP

Institute for Astronomy, Department of Physics, ETH Zurich
Wolfgang-Pauli-Strasse 27, 8093, Zurich, Switzerland
Keywords: Gravitational lensing, strong lensing, cosmology, parameter estimation, Hubble constant
ArXiv ePrint: 1511.03662
Contents
 1 Introduction
 2 Theory
 3 RXJ1131-1231 system
 4 Lens modeling
 5 The mass sheet degeneracy
 6 Combined likelihood analysis
 7 Cosmological inference
 8 Joint uncertainties and comparison with other work
 9 Conclusions
 A Numerical computation of the luminosity-weighted LOS velocity dispersion
 B Residual maps
 C Analysis on WFC1 F555W
 D Bayesian description and renormalization of the imaging likelihood
 E Skewed normal distribution
 F Lens model parameter constraints
 G Source size prior
1 Introduction
Strong lensing systems and the time delays between different images of the same background source can provide information about angular diameter distance relations (see [3] and the review of [4] for the early work). Cosmographic analyses rely on measurements of time delays [see e.g., 5, 6, 7, 8, 9, 10, and the COSMOGRAIL collaboration (www.cosmograil.org)] and on estimates of the line-of-sight structure and lensing potential. This cosmography technique has been applied to determine the Hubble parameter $H_0$ using different strong lens systems [see e.g. 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] and also by applying statistics to multiple systems [see e.g. 24, 25, 26]. In the past, some of the measurements have produced a wide range of results for $H_0$ [e.g. see section 8.2 of 21]. One concern has been to evaluate the impact of potential systematic errors. In particular, the mass-sheet degeneracy (MSD) [27] and related degeneracies that cause biases due to model assumptions [e.g. 11, 28, 29, 30, 31] need special consideration. For instance, [1] show that assuming a power-law lens model can significantly bias the results.
In this paper, we introduce a new treatment of the MSD and source reconstruction for cosmographic analyses. This approach integrates information coming from imaging, velocity dispersion, external convergence and time delay measurements. For the choice of data and the parameterization of the lens we follow the work of [22], and we infer the values of the parameters using our recent framework presented in [32]. In our framework we reconstruct the source using shapelet basis sets. This allows us to explicitly set an overall scale for the reconstruction. We will show that this enables us to better disentangle the effects coming from source structure and MSD. This then makes it simpler to robustly combine the information coming from the different data sets.
The paper is organized as follows. In Section 2 we briefly review the principles of time delay cosmography. Section 3 presents the data used in this work. Section 4 describes the details of the lens modeling, including kinematics, likelihood analysis and the source reconstruction technique of [32]. In Section 5 we show that this reconstruction technique is well suited for mapping out the MSD. Section 6 describes the combined likelihood analysis and posterior sampling. Section 7 discusses the cosmological constraints in terms of angular diameter distance relations and cosmological parameters. In Section 8, we compare our results to others. We summarize our conclusions in Section 9.
2 Theory
Gravitational lensing is caused by deflection of light by matter. In this section, we review the principles of gravitational lensing and time delay cosmography and introduce our conventions.
2.1 Lensing formalism
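The lensing potential $\psi$ at an angular position $\vec{\theta}$ is given by
(2.1) $\psi(\vec{\theta}) = \frac{1}{\pi} \int d^2\theta' \, \kappa(\vec{\theta}') \ln\left|\vec{\theta} - \vec{\theta}'\right|$
where $\kappa$ is the convergence and is given by
(2.2) $\kappa(\vec{\theta}) = \frac{\Sigma(D_d \vec{\theta})}{\Sigma_{\rm crit}}$
with
(2.3) $\Sigma_{\rm crit} = \frac{c^2 D_s}{4\pi G D_d D_{ds}}$
$\Sigma_{\rm crit}$ is the critical density and $\Sigma(D_d \vec{\theta})$ is the physical projected surface mass density. $D_d$, $D_s$ and $D_{ds}$ are the angular diameter distances from the observer to the lens, to the source and from the lens to the source, respectively. (Footnote: $D_{ds}$ is not the subtraction $D_s - D_d$. In a flat universe, $D_{ds} = \frac{1}{1+z_s}\left(D^M_s - D^M_d\right)$, where $D^M$ is the transverse comoving distance.) The deflection angle is $\vec{\alpha} = \nabla\psi$ and the lens equation, which describes the mapping between the source plane position $\vec{\beta}$ and the image plane position $\vec{\theta}$, is given by
(2.4) $\vec{\beta} = \vec{\theta} - \vec{\alpha}(\vec{\theta})$
The convergence can also be written as
(2.5) $\kappa(\vec{\theta}) = \frac{1}{2} \nabla^2 \psi(\vec{\theta})$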
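As a quick numerical illustration of the relations $\vec{\alpha} = \nabla\psi$, $\vec{\beta} = \vec{\theta} - \vec{\alpha}$ and $\kappa = \frac{1}{2}\nabla^2\psi$, the following minimal Python sketch evaluates them by finite differences. It assumes a singular isothermal sphere potential, $\psi = \theta_E |\vec{\theta}|$, chosen here only because its deflection and convergence are known analytically; it is not the lens model used in this paper, and all function names are illustrative.

```python
import math

def sis_potential(x, y, theta_e=1.0):
    """Potential of a singular isothermal sphere (illustrative choice)."""
    return theta_e * math.hypot(x, y)

def deflection(psi, x, y, h=1e-5):
    """Deflection angle alpha = grad(psi), by central differences."""
    ax = (psi(x + h, y) - psi(x - h, y)) / (2.0 * h)
    ay = (psi(x, y + h) - psi(x, y - h)) / (2.0 * h)
    return ax, ay

def convergence(psi, x, y, h=1e-4):
    """Convergence kappa = 0.5 * laplacian(psi), five-point stencil."""
    lap = (psi(x + h, y) + psi(x - h, y) + psi(x, y + h) + psi(x, y - h)
           - 4.0 * psi(x, y)) / h**2
    return 0.5 * lap

# Image position on the Einstein ring (|theta| = theta_E = 1).
theta = (0.8, 0.6)
alpha = deflection(sis_potential, *theta)
kappa = convergence(sis_potential, *theta)
# Lens equation: the source position that maps to this image position.
beta = (theta[0] - alpha[0], theta[1] - alpha[1])
```

For this profile the analytic expectations are $|\vec{\alpha}| = \theta_E$ and $\kappa = \theta_E / (2|\vec{\theta}|)$, so a point on the Einstein ring maps back to the source plane origin.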
2.2 Time delays
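The Fermat potential is defined as
(2.6) $\phi(\vec{\theta}, \vec{\beta}) = \frac{(\vec{\theta} - \vec{\beta})^2}{2} - \psi(\vec{\theta})$
The excess time delay of an image at $\vec{\theta}$ with corresponding source position $\vec{\beta}$ is
(2.7) $t(\vec{\theta}, \vec{\beta}) = \frac{D_{\Delta t}}{c} \phi(\vec{\theta}, \vec{\beta})$
where
(2.8) $D_{\Delta t} \equiv (1 + z_d) \frac{D_d D_s}{D_{ds}}$
is referred to as the time delay distance, with $z_d$ the redshift of the lens. The relative time delay between two images positioned at $\vec{\theta}_i$ and $\vec{\theta}_j$, the actual observable, is then given by
(2.9) $\Delta t_{ij} = \frac{D_{\Delta t}}{c} \left[ \phi(\vec{\theta}_i, \vec{\beta}) - \phi(\vec{\theta}_j, \vec{\beta}) \right]$
Line-of-sight (LOS) structures external to the lens also affect the observed time delay distance through additional focusing or defocusing of the light rays. We parameterize the LOS structure with a single constant mass sheet parameter $\kappa_{\rm ext}$, the external convergence. The actual time delay distance $D_{\Delta t}$ relates to the one inferred by ignoring the external LOS structure, $D_{\Delta t}^{\rm model}$, by
(2.10) $D_{\Delta t} = \frac{D_{\Delta t}^{\rm model}}{1 - \kappa_{\rm ext}}$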
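To make the cosmological dependence of the time delay distance concrete, the sketch below evaluates $D_{\Delta t} = (1+z_d) D_d D_s / D_{ds}$ in a flat $\Lambda$CDM model and applies the external convergence correction $D_{\Delta t} = D_{\Delta t}^{\rm model}/(1-\kappa_{\rm ext})$. The redshifts, $H_0$ and $\Omega_m$ used in the usage lines are illustrative assumptions, not values quoted in this section, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def comoving_distance(z, h0, omega_m, steps=2000):
    """Transverse comoving distance in Mpc for flat LCDM (Simpson's rule).

    `steps` must be even.
    """
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp)**3 + 1.0 - omega_m)
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    return C_KM_S / h0 * total * h / 3.0

def time_delay_distance(z_d, z_s, h0, omega_m):
    """D_dt = (1 + z_d) * D_d * D_s / D_ds in Mpc, flat universe."""
    dm_d = comoving_distance(z_d, h0, omega_m)
    dm_s = comoving_distance(z_s, h0, omega_m)
    d_d = dm_d / (1.0 + z_d)
    d_s = dm_s / (1.0 + z_s)
    d_ds = (dm_s - dm_d) / (1.0 + z_s)  # not D_s - D_d
    return (1.0 + z_d) * d_d * d_s / d_ds

def correct_for_external_convergence(d_dt_model, kappa_ext):
    """Rescale a model time delay distance by the external mass sheet."""
    return d_dt_model / (1.0 - kappa_ext)

# Illustrative (assumed) inputs.
d_dt = time_delay_distance(z_d=0.295, z_s=0.658, h0=70.0, omega_m=0.3)
d_dt_true = correct_for_external_convergence(d_dt, kappa_ext=0.1)
```

Because every distance scales as $c/H_0$, $D_{\Delta t}$ is inversely proportional to $H_0$, which is why a measured time delay distance translates primarily into a constraint on the Hubble constant.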
3 RXJ11311231 system
The quadruple lens system RXJ1131-1231 (Figure 1) was discovered by [33] and the redshift of the lens and of the background quasar source was determined spectroscopically by [33]. The lens was modeled extensively by [34, 35, 22, 32, 36] with single band images. We use the archival HST ACS WFC1 images in filters F814W and F555W (GO 9744; PI: Kochanek). The filter F814W was also used for lens modeling in [22], [23] and [32]. We make use of the MultiDrizzle product from the HST archive. We use an image 160 pixels on a side centered at the lens position with pixel scale 0.05". This corresponds to a FOV of 8" on a side.
For the analysis in this work, we take the time delay measurements and uncertainties from [37], namely days, days, and days, where represent the quasar images in Figure 1. These data were used in [22], where they also measure the LOS velocity dispersion of km s$^{-1}$, which we use in our analysis.
For the external convergence $\kappa_{\rm ext}$, we take the estimate of [22] based on relative galaxy counts in the field [38] and their modeled external shear component compared with ray tracing of the Millennium Simulation (see their Figure 6). As their probability density function (PDF) for $\kappa_{\rm ext}$ is not given in a parameterized form, we use an approximation of their PDF in the form of a skewed normal distribution with mean , standard deviation and skewness . This function is illustrated in Figure 2 and described in Appendix E.
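A skewed normal distribution can be evaluated with the standard library alone. The sketch below uses the common location/scale/shape parameterization, which is an assumption on our side here: the text above quotes mean, standard deviation and skewness, and the mapping between the two parameter sets is left to Appendix E. The numerical values are hypothetical and are not the fitted $\kappa_{\rm ext}$ parameters.

```python
import math

def skew_normal_pdf(x, loc, scale, shape):
    """Skew normal density: (2/scale) * phi(t) * Phi(shape * t), t = (x - loc)/scale."""
    t = (x - loc) / scale
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * t / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi

def skew_normal_mean(loc, scale, shape):
    """Analytic mean of the skew normal distribution."""
    delta = shape / math.sqrt(1.0 + shape**2)
    return loc + scale * delta * math.sqrt(2.0 / math.pi)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule, used here only as a sanity check."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Hypothetical parameters for illustration only.
LOC, SCALE, SHAPE = 0.05, 0.06, 3.0
norm = trapezoid(lambda k: skew_normal_pdf(k, LOC, SCALE, SHAPE),
                 LOC - 10 * SCALE, LOC + 10 * SCALE)
mean = trapezoid(lambda k: k * skew_normal_pdf(k, LOC, SCALE, SHAPE),
                 LOC - 10 * SCALE, LOC + 10 * SCALE)
```

A positive shape parameter produces the extended tail toward high convergence that a symmetric Gaussian approximation would miss.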
4 Lens modeling
In this section, we present the parameterization of the lens model, the lens light description, the source reconstruction technique, PSF modeling, the modeling of the lens kinematics and the likelihood analysis.
4.1 Lens model parameterization
For the lens model, we use:

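An elliptical power-law mass distribution parameterized as
(4.1) $\kappa(\theta_1, \theta_2) = \frac{3 - \gamma'}{2} \left( \frac{\theta_E}{\sqrt{q \theta_1^2 + \theta_2^2 / q}} \right)^{\gamma' - 1}$ where $\theta_E$ is the Einstein radius, $q$ is the ellipticity and $\gamma'$ is the radial power-law slope.

A second spherical isothermal profile (Equation 4.1 with fixed $\gamma' = 2$ and $q = 1$) centered at the position of the visible companion of the lens galaxy, about 0.6 arc seconds away from the center.

A constant external shear yielding a potential parameterized in polar coordinates $(\theta, \varphi)$ given by
(4.2) $\psi_{\rm ext}(\theta, \varphi) = \frac{1}{2} \gamma_{\rm ext} \theta^2 \cos 2(\varphi - \phi_{\rm ext})$ where $\gamma_{\rm ext}$ is the shear strength and $\phi_{\rm ext}$ is the shear angle.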
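The role of the Einstein radius in the power-law profile can be checked numerically. The sketch below assumes the commonly used form $\kappa = \frac{3-\gamma'}{2} \left(\theta_E / \sqrt{q\theta_1^2 + \theta_2^2/q}\right)^{\gamma'-1}$ for Equation 4.1 (an assumption, since the displayed equation is not reproduced verbatim here) and verifies, for the spherical case $q = 1$, that the mean convergence inside $\theta_E$ is unity. Function names are illustrative.

```python
import math

def power_law_kappa(theta1, theta2, theta_e, gamma, q=1.0):
    """Elliptical power-law convergence (assumed parameterization)."""
    r = math.sqrt(q * theta1**2 + theta2**2 / q)
    return 0.5 * (3.0 - gamma) * (theta_e / r) ** (gamma - 1.0)

def mean_kappa_within_einstein_radius(theta_e, gamma, n=20000):
    """Area-weighted mean convergence inside theta_E for q = 1.

    Midpoint rule on thin annuli; the profile is only integrably
    singular at the center for gamma < 3.
    """
    dr = theta_e / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += power_law_kappa(r, 0.0, theta_e, gamma) * 2.0 * math.pi * r * dr
    return total / (math.pi * theta_e**2)

isothermal = mean_kappa_within_einstein_radius(theta_e=1.6, gamma=2.0)
steeper = mean_kappa_within_einstein_radius(theta_e=1.6, gamma=2.2)
```

Both slopes enclose the same mean convergence within $\theta_E$ by construction, so the slope $\gamma'$ only redistributes mass radially. This is the freedom that couples to the mass-sheet degeneracy discussed in Section 5.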
4.2 Lens light parameterization
The light distribution of the lens is modeled in a parameterized form. We use the same profiles as [22], namely two elliptical Sérsic profiles [39] with common centroid for the central elliptical galaxy and an additional spherical Sérsic profile for the companion galaxy. The intensity profile is parameterized as
(4.3) $I(\theta_1, \theta_2) = A \exp\left[ -k \left\{ \left( \frac{\sqrt{\theta_1^2 + \theta_2^2 / q_S^2}}{\theta_{\rm eff}} \right)^{1/n_{\rm sersic}} - 1 \right\} \right]$
where $A$ is the amplitude, $k$ is a constant such that $\theta_{\rm eff}$ is the effective half-light radius, $q_S$ is the axis ratio and $n_{\rm sersic}$ is the Sérsic index. We use the value of the half-light radius as the effective radius in the kinematics modeling of Section 4.5.
4.3 Source surface brightness reconstruction
We use the source reconstruction method presented in [32], based on the shapelet basis functions introduced by [40]. To apply this method, three choices have to be made. (1) The shapelet center position, which we fix to the quasar source position. The determination of the quasar source position is explained in detail in Section 4.2 of [32]. (2) The width $\beta$ of the shapelet basis function (see Section 5 for its impact). (3) The maximal order $n_{\rm max}$ of the shapelet polynomials. We set for modeling and parameter inference. With this, most of the features in the extended source can be modeled. Given these three choices, one can reconstruct the angular scales between $\beta / \sqrt{n_{\rm max} + 1}$ and $\beta \sqrt{n_{\rm max} + 1}$ around the center of the shapelet in the source plane.
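The dependence of the basis on the scale $\beta$ and the order $n_{\rm max}$ can be illustrated in one dimension. The following sketch implements the Cartesian shapelet functions of [40], $B_n(x; \beta) = \beta^{-1/2} \left[2^n \sqrt{\pi}\, n!\right]^{-1/2} H_n(x/\beta)\, e^{-x^2/2\beta^2}$, and the corresponding range of resolved scales. It is a standalone illustration and not the implementation of [32]; the values $\beta = 0.16$ and $n_{\rm max} = 8$ are arbitrary choices for the example.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * x
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def shapelet_1d(n, x, beta):
    """Dimensional 1D shapelet basis function B_n(x; beta)."""
    u = x / beta
    norm = 1.0 / math.sqrt(2**n * math.sqrt(math.pi) * math.factorial(n) * beta)
    return norm * hermite(n, u) * math.exp(-0.5 * u * u)

def inner_product(n, m, beta, half_width=12.0, steps=4000):
    """Trapezoidal estimate of the overlap integral of B_n and B_m."""
    a, b = -half_width * beta, half_width * beta
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * shapelet_1d(n, x, beta) * shapelet_1d(m, x, beta)
    return total * h

def resolved_scales(beta, n_max):
    """Smallest and largest scales captured by a basis truncated at n_max."""
    root = math.sqrt(n_max + 1.0)
    return beta / root, beta * root

theta_min, theta_max = resolved_scales(beta=0.16, n_max=8)
```

Since a truncated basis only represents structure between these two scales, changing $\beta$ at fixed $n_{\rm max}$ shifts the window of reconstructable source sizes, which is what makes $\beta$ act as an explicit source scale.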
4.4 PSF modeling
We use four bright stars in the same ACS image to model the PSF. After normalizing for flux, we apply a sub-pixel shift to recenter the stars and then stack them. When comparing the individual star images and the stack, we see significant variations that we need to consider in our analysis. We do this by measuring the scatter for each pixel and assuming that the scatter in high signal-to-noise pixels is due to a model error that we quantify as a fraction of the flux. This leads to an additional error term, beyond the Poisson and background contributions, that is important close to the center of the bright point sources (see Section 4.6). For the quasar point sources, we use a cutout of the PSF of 111 pixels to cover most of the diffraction spikes. For the extended surface brightness we apply a PSF convolution kernel of 21 pixels.
4.5 Stellar kinematics
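We follow the analysis of [21] for the modeling of the stellar velocity dispersion. The mass profile is assumed to be a spherically symmetric power law of the form
(4.4) $\rho(r) = \rho_0 \left( \frac{r_0}{r} \right)^{\gamma'}$
where $\rho_0$ is the density at radius $r_0$ and $\gamma'$ is the power-law slope of the mass profile (the same as for the lens model in Equation 4.1). The normalization of the mass profile can be expressed in terms of the lensing quantities as
(4.5) $\rho_0 r_0^{\gamma'} = (\kappa_{\rm ext} - 1)\, \Sigma_{\rm crit}\, \theta_E^{\gamma' - 1} D_d^{\gamma' - 1} \frac{\Gamma(\gamma'/2)}{\pi^{1/2}\, \Gamma\left((\gamma' - 3)/2\right)}$
where $\kappa_{\rm ext}$ is the external convergence, $\Sigma_{\rm crit}$ is the critical projected density, $\theta_E$ is the Einstein radius, $D_d$ is the angular diameter distance from the observer to the lens and $\Gamma$ is the Gamma function. The estimation of the projected velocity dispersion along the line of sight requires a description of the velocity anisotropy, split into a radial component $\sigma_r$ and a tangential component $\sigma_t$:
(4.6) $\beta_{\rm ani}(r) \equiv 1 - \frac{\sigma_t^2}{\sigma_r^2}$
Massive elliptical galaxies are assumed to have isotropic stellar motions in the center of the galaxy ($\beta_{\rm ani} = 0$) and radial motions in the outskirts ($\beta_{\rm ani} = 1$). A simplified description of the transition can be made with an anisotropy radius $r_{\rm ani}$, defining $\beta_{\rm ani}$ as a function of radius $r$ as
(4.7) $\beta_{\rm ani}(r) = \frac{r^2}{r_{\rm ani}^2 + r^2}$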
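Assuming a Hernquist profile [41] and an anisotropy radius $r_{\rm ani}$ for the stellar orbits in the lens galaxy, the three-dimensional radial velocity dispersion $\sigma_r$ at radius $r$ from Jeans modeling is given by
(4.8) 
where $a$ is related to the effective radius of the lens light profile by $a = 0.551\, r_{\rm eff}$ and ${}_2F_1$ is a hypergeometric function. The modeled luminosity-weighted projected velocity dispersion $\sigma_s$ is given by
(4.9) $I_H(R)\, \sigma_s^2 = 2 \int_R^{\infty} \left( 1 - \beta_{\rm ani}(r) \frac{R^2}{r^2} \right) \frac{\rho_*\, \sigma_r^2\, r\, dr}{\sqrt{r^2 - R^2}}$
where $R$ is the projected radius, $\rho_*$ is the stellar density and $I_H(R)$ is the projected Hernquist distribution. The luminosity-weighted LOS velocity dispersion within an aperture $\mathcal{A}$ is then (see also equation 20 in [21])
(4.10) $(\sigma^P)^2 = \frac{\int_{\mathcal{A}} \left[ I_H(R)\, \sigma_s^2 * \mathcal{P} \right] R\, dR\, d\theta}{\int_{\mathcal{A}} \left[ I_H(R) * \mathcal{P} \right] R\, dR\, d\theta}$
where $* \mathcal{P}$ indicates convolution with the seeing. In Appendix A we describe in detail how we compute the modeled $\sigma^P$ in a numerically stable way. This calculation assumes no rotation of the lensing galaxy. Priors on the anisotropy are discussed in Section 6.3.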
Equation (4.10) can be expressed as a function of angular scales, paired with a cosmology-dependent angular diameter distance relation and an external convergence factor, as
(4.11) $(\sigma^P)^2 = (1 - \kappa_{\rm ext}) \cdot \frac{D_s}{D_{ds}} \cdot c^2 \cdot J$
where $J$ captures all the computation of Equation (4.10) without the cosmological and external convergence specifications. With this calculation, we see that any estimate of the (central) velocity dispersion depends on the ratio of the angular diameter distance from us to the source to that from the deflector to the source. This fact is important when kinematic modeling is used to infer cosmographic information. In the modeling, we separate the angular and the cosmological information. This separability allows us to consistently infer cosmographic information without the need for cosmological priors in the kinematic modeling.
4.6 Likelihood analysis
We estimate the pixel uncertainty in the image with a Gaussian background contribution estimated from an empty region in the image and a Poisson contribution from the model signal scaled by the exposure map . In addition, the modeling uncertainty of the PSF of the bright point sources with amplitude , PSF kernel and model uncertainty coming from the star-by-star scatter is given as
(4.12) 
at a pixel i, where is the number of quasar images. All together, the uncertainty for each pixel sums up in quadrature as
(4.13) 
For the linear source surface brightness reconstruction is replaced by the image intensity .
The likelihood of an image given a model is
(4.14) 
with being the number of pixels in the modeled image and is the normalization
(4.15) 
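The structure of this likelihood can be sketched in a few lines. The code below assumes that the three error terms (background, Poisson noise of the model signal divided by the exposure map, and the PSF model error) add in quadrature and that the image likelihood is an independent Gaussian per pixel including its normalization. The function and argument names are hypothetical, and the Poisson term in particular is our reading of the description above rather than a transcription of Equation 4.13.

```python
import math

def pixel_variance(model_rate, exposure, sigma_bkg, sigma_psf=0.0):
    """Variance of one pixel: background + Poisson + PSF model error."""
    poisson = max(model_rate, 0.0) / exposure  # assumed form, count-rate units
    return sigma_bkg**2 + poisson + sigma_psf**2

def image_log_likelihood(data, model, exposure, sigma_bkg, sigma_psf=None):
    """Gaussian log-likelihood of the image pixels, including normalization."""
    if sigma_psf is None:
        sigma_psf = [0.0] * len(data)
    logl = 0.0
    for d, m, f, s in zip(data, model, exposure, sigma_psf):
        var = pixel_variance(m, f, sigma_bkg, s)
        logl += -0.5 * (d - m) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
    return logl

# Toy three-pixel example with made-up numbers.
model = [1.0, 2.0, 0.5]
exposure = [100.0, 100.0, 100.0]
matched = image_log_likelihood(model, model, exposure, sigma_bkg=0.1)
offset = image_log_likelihood([1.2, 2.0, 0.5], model, exposure, sigma_bkg=0.1)
```

Because the variance depends on the model, the normalization term cannot be dropped when comparing models, which is one reason to keep it explicit.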
At this stage, it is useful to separate the model into nonlinear parameters and linear parameters . The likelihood of the nonlinear parameters is given by
(4.16) 
The integral is computed in [32] (their Equation 13) assuming flat priors in , which we adopt.
The likelihood for the time delays is the product of the likelihoods of all relative delays of the quasar pairs
(4.17) 
The likelihood of the LOS central velocity dispersion is given by
(4.18) 
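A time delay likelihood of this kind can be sketched as a product of independent Gaussians over image pairs, with the predicted delay following from the Fermat potential difference and the time delay distance (Equation 2.9). The sketch below is a minimal illustration under that assumption; the Fermat potential values, uncertainties and image labels are made up, and the function names are hypothetical.

```python
import math

MPC_KM = 3.0856775814913673e19   # kilometers per megaparsec
C_KM_S = 299792.458              # speed of light in km/s
DAY_S = 86400.0                  # seconds per day
ARCSEC = math.pi / 648000.0      # radians per arcsecond

def predicted_delay_days(d_dt_mpc, fermat_i, fermat_j):
    """Delay between images i and j; Fermat potentials in arcsec^2."""
    seconds_per_rad2 = d_dt_mpc * MPC_KM / C_KM_S
    return seconds_per_rad2 / DAY_S * (fermat_i - fermat_j) * ARCSEC**2

def time_delay_log_likelihood(d_dt_mpc, fermat, measured, sigma, pairs):
    """Sum of Gaussian log-likelihoods over the measured image pairs."""
    logl = 0.0
    for (i, j), dt, s in zip(pairs, measured, sigma):
        model = predicted_delay_days(d_dt_mpc, fermat[i], fermat[j])
        logl += -0.5 * ((dt - model) / s) ** 2 - 0.5 * math.log(2.0 * math.pi * s * s)
    return logl

# Made-up Fermat potentials (arcsec^2) and pairs, for illustration only.
fermat = {"A": 0.10, "B": 0.12, "D": 0.50}
pairs = [("A", "B"), ("D", "B")]
mock = [predicted_delay_days(2000.0, fermat[i], fermat[j]) for i, j in pairs]
at_truth = time_delay_log_likelihood(2000.0, fermat, mock, [1.5, 1.5], pairs)
off_truth = time_delay_log_likelihood(2200.0, fermat, mock, [1.5, 1.5], pairs)
```

For a fixed lens model the predicted delays are linear in $D_{\Delta t}$, so the pair with the largest Fermat potential difference dominates the constraint on the distance.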
5 The mass sheet degeneracy
There exist many different degeneracies in strong lens modeling [e.g., 28, 42]. In this section we focus on the MSD [27] and in particular on its impact on time delay cosmography, as pointed out by [1]. As shown by [27], a remapping of a reference mass distribution $\kappa$ by
(5.1) $\kappa_{\lambda}(\vec{\theta}) = \lambda \kappa(\vec{\theta}) + (1 - \lambda)$
combined with an isotropic scaling of the source plane coordinates
(5.2) $\vec{\beta} \rightarrow \lambda \vec{\beta}$
will result in the same dimensionless observables (image positions, image shapes and magnification ratios) regardless of the value of $\lambda$. This type of mapping is called a mass-sheet transform (MST), and shows that imaging data, no matter how good, cannot break the MSD.
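The invariance of the image positions, and the rescaling of the time delays, can be verified with a one-dimensional toy example. The sketch below assumes a singular isothermal sphere as the reference lens (chosen only because its images are known analytically) and applies the transform $\kappa \rightarrow \lambda\kappa + (1-\lambda)$, for which the deflection becomes $\lambda\alpha + (1-\lambda)\theta$ and the potential becomes $\lambda\psi + (1-\lambda)\theta^2/2$. All names and numbers are illustrative.

```python
import math

def sis_alpha(theta, theta_e):
    """Deflection of a singular isothermal sphere along one axis."""
    return math.copysign(theta_e, theta)

def sis_psi(theta, theta_e):
    """Lensing potential of a singular isothermal sphere."""
    return theta_e * abs(theta)

def mst_alpha(theta, theta_e, lam):
    """Deflection after the mass-sheet transform with parameter lam."""
    return lam * sis_alpha(theta, theta_e) + (1.0 - lam) * theta

def mst_psi(theta, theta_e, lam):
    """Potential after the mass-sheet transform with parameter lam."""
    return lam * sis_psi(theta, theta_e) + 0.5 * (1.0 - lam) * theta**2

def fermat(theta, beta, psi_value):
    """Fermat potential for image position theta and source position beta."""
    return 0.5 * (theta - beta) ** 2 - psi_value

theta_e, beta, lam = 1.0, 0.3, 0.9
images = (beta + theta_e, beta - theta_e)  # the two SIS images for |beta| < theta_E

# The same image positions map back to a rescaled source position lam * beta.
sources_mst = [t - mst_alpha(t, theta_e, lam) for t in images]

# Fermat potential differences (proportional to time delays) scale with lam.
dphi = (fermat(images[0], beta, sis_psi(images[0], theta_e))
        - fermat(images[1], beta, sis_psi(images[1], theta_e)))
dphi_mst = (fermat(images[0], lam * beta, mst_psi(images[0], theta_e, lam))
            - fermat(images[1], lam * beta, mst_psi(images[1], theta_e, lam)))
```

The image positions are unchanged while the Fermat potential difference is multiplied by $\lambda$, so for a fixed measured delay the inferred $D_{\Delta t}$ scales as $1/\lambda$. This is the origin of the bias discussed by [1].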
The additional mass term in the MST (Equation 5.1) can be internal to the lens galaxy (affecting the lens kinematics) or due to line-of-sight structure (not affecting the lens kinematics) [see e.g., 28, 29]. The external part of the MST can be approximated by an external convergence $\kappa_{\rm ext}$, which rescales the time delays accordingly. The external contribution also rescales the source plane. Lens modeling often only explicitly models the internal structure of the lens. The inferred source scale then has to be rescaled by the external mass sheet to match the physical scale.
5.1 Source scaling and the MSD
An important parameter in the lens model inference is the physical source scale. Neither the lens model nor the source size is a direct observable, but the two are linked through the MST in each other's inference. Given a lens model, certain source sizes are preferred. The opposite is also true: given a source size, certain lens models are preferred. This is a direct consequence of the MST (Equations 5.1 and 5.2). Therefore, it is important to control the prior on the assumed source scale in the modeling. A particular source surface brightness reconstruction method, depending on the choice of regularization, basis set, pixel grid size or parameters of the source reconstruction, will potentially favor a certain size of the reconstructed source and may therefore indirectly impose priors on the internal mass model through the MST. As one does not know a priori the physical scales in the source galaxy, this may lead to significant biases in the inference of the lens model.
We use shapelets [40] as the source surface brightness basis functions, as implemented in [32]. These basis functions form a complete basis set when the order goes to infinity. When restricting the shapelet basis to a finite order $n_{\rm max}$, the reconstruction of an image depends on the chosen scale $\beta$ of the shapelet basis function. As pointed out by [40], for a given $n_{\rm max}$, there is a scale $\beta$ that best fits the data. From Equation 5.2, we see that changes in $\beta$ can be remapped into changes in the lensing potential through the linear parameter $\lambda$. Therefore, since our source reconstruction technique has an explicit scale, we have a tool to walk along the MST.
5.2 Varying source scale in the ACS WFC1 images
We have identified the source scale as having an impact on the inference of the lens model within the MST. To investigate the specific dependence on the shapelet scale $\beta$ in the source reconstruction, in combination with the lens model parameterization (Section 4.1), in our analysis of RXJ1131-1231, we model the ACS WFC1 F814W and F555W images with different choices of the shapelet scale $\beta$. For the F814W image, we use the range 0.14" - 0.19" and for the F555W image the range 0.13" - 0.18". The shapelet order was held constant at . To find the best-fit model, we used a particle swarm optimization as in [32] to maximize the likelihood (Equation 4.16). In this section, we only use the HST images for our modeling. Time-delay and kinematic data will be added in Section 6.
Figure 3 shows the source reconstruction of the best-fit models of filter F814W for six different scales $\beta$. We see that the source reconstructions are very similar but scaled by the relative factors of the chosen shapelet scale. More explicitly, we overlay in Figure 4 the intensity contours of the different source reconstructions rescaled by . We also show the reconstructions for the F555W image, which show the same behavior. On the right of Figure 4 we overplot a joint source reconstruction of the two bands in a false-color image. In Appendix B, we present the corresponding normalized residuals for this analysis of the F814W reconstruction.
The difference in the likelihood value for different scales from the imaging data exceeds the 10$\sigma$ level between each modeled scale $\beta$. This reflects the fact that the chosen lens model parameterization (see Section 4.1) does not allow for the full freedom needed for a perfect transform according to the MST (Equation 5.1). The source scale cannot be fixed to an arbitrary value, and caution is needed with any scale-dependent source reconstruction description. When assigning a prior on $\beta$ and inferring this parameter together with all the lens model parameters from the image reconstruction, we are able to determine very precisely the corresponding source scale and the parameters of the given functional form of the lens model.
5.3 Relaxing on the lens model assumption
As pointed out by [1], there can also be an internal component to the MST, namely when the lens model cannot reproduce the underlying internal mass distribution. The assumption of a power-law lens model formally sets the internal part of the MST. The parameters will preferentially fit those models whose shape, modulo an artificial MST, is the most similar to the underlying mass distribution. The only effect visible in the modeling of the imaging data is on the source scale: the inferred source scale will differ from that of the true lens model. Any assumed mass distribution which cannot be rescaled according to Equation (5.1) can thus potentially lead to biased inferences, in particular on the slope of the mass profile. This can also result in significant biases in the inferred lensing potential and lens kinematics. In particular, [1] state that the assumption of a power-law lens model can potentially lead to a significant bias in the inference of the time delay distance.
Three approaches to handle the concerns of [1] in performing cosmographic estimates are:

One chooses a more flexible lens model than a single power-law mass profile. This approach was followed in [23] in response to [1]. Different profile parameterizations may lead to different preferred source scales. It is not guaranteed that a more sophisticated lens model parameterization yields an unbiased result in the cosmographic inference.

Perform simplifications and approximations that lead to greater robustness against known degeneracies. For instance accommodating MST through careful handling of the source size inference.
In this work we choose the third option mentioned above. This option requires the fewest assumptions on the lens model, and a prior is placed on the source size rather than imposed through the functional form of the lens model. In Appendix D we state the process explicitly in Bayesian terms to make our steps and approximations clear, and show that a renormalization of the imaging likelihood for different imposed source scales is needed to explore the impact of a plausible internal MST on the cosmological inference.
5.4 Adding lens kinematics
Additional constraints on the lens model can come from kinematic data at a different scale than the Einstein ring. This becomes particularly important when weakening the constraining power of the lens model, as described in Section 5.3. Lens models with different source scales predict different lens kinematics. The prediction depends on the stellar velocity anisotropy, which cannot be determined from the existing data, and on the external convergence, which has to be inferred separately.
As long as the relative likelihood of additional kinematic data (Equation 4.18) cannot compete with the relative likelihood of the different shapelet scales (at the 10$\sigma$ level between the chosen source scales, see Section 5.2), the combined likelihood will be dominated by the lens model assumption. Only when the likelihood of the imaging data is renormalized for the different scales $\beta$ can the kinematic data have a significant impact on the determination of the lens profile, and in particular of the lens potential for time-delay cosmography.
6 Combined likelihood analysis
In this section, we discuss how we combine the different data sets and their likelihoods. We showed in the previous section that biases can emerge from choices in the lens and source modeling. These aspects have to be taken into account when the data sets are combined.
6.1 Combining imaging and time delay data
In a first step, we do a joint analysis of the independent measurements of the time delay and imaging data. The combined likelihood is
(6.1) 
with the independent likelihoods of Equation (4.16) and (4.17). We do not yet combine the kinematic data at the likelihood level. We sample all the lens model parameters and the time delay distance . We keep the lens light parameters fixed at the final position of the particle swarm process in the MCMC process to achieve a more efficient sampling of the relevant parameters. We included the full flexibility of the lens light parameters on a subset of the MCMC chains and come to the conclusion that the additional covariance of the lens light model on the cosmographic analysis is very minor, i.e. the impact on the uncertainty on is below 0.1%.
From Bayes' theorem, the posterior of the parameters given the data is (modulo a normalization):
(6.2) 
We apply flat priors on the parameters , , , , and Mpc.
At this stage, we want to emphasize that there are 3 data points in the time delay measurement compared to several thousand high signal-to-noise pixels in the imaging comparison. In principle, the provided time delay measurement can not only determine $D_{\Delta t}$, which is independent of the imaging data, but can also partially constrain the lens model. In practice, even a minor bias introduced in the image modeling can outweigh the constraining power of the two additional time delay measurements.
In the following, we present the results of the analysis of filter F814W. The results of the equivalent analysis of filter F555W can be found in Appendix C. To sample the posterior distribution of the parameter space we use CosmoHammer [43]. We fix the shapelet scale $\beta$ at [0.14", 0.15", 0.16", 0.17", 0.18", 0.19"] and do a separate inference of the parameters for each choice of $\beta$. Figure 5 shows the posterior distribution of some of the parameters for the different choices of $\beta$. The inferred parameter constraints for different $\beta$ values do not overlap. We see that is very narrowly determined for a given shapelet scale but varies from up to depending on the position in the degeneracy plane. We want to stress that the external convergence estimated by [22] is based on an external shear prior of .
6.2 Constraints from kinematic data
To investigate the potential constraining power of the velocity dispersion data, we are interested in how distinguishable different positions within the MST are in terms of their predicted central velocity dispersions. To do so, we fix the cosmology and the external convergence to fiducial values. This allows us to evaluate the predicted LOS central velocity dispersion (Equation 4.11) for all the posterior samples of Figure 5. We assume a random realization of with a flat prior in the range [0.5,5] for all the posterior positions.
In Figure 6 we illustrate the predicted samples vs the predicted time delay distance . We see that the samples cannot be fully distinguished with the current velocity dispersion measurement and the assumed anisotropy prior. The relative distances in the predicted velocity dispersion between the different samples are all within 4$\sigma$ (model given data).
There are three factors which affect the distinction of the source scales by kinematic data. (1) The uncertainty in the spectroscopic measurement, analysis and modeling of km s$^{-1}$, which is about 6%. This is visually the most obvious contribution in Figure 6, marked by the gray band. The mean values of the predicted samples of the different source scales differ by about one sigma of this estimated uncertainty. (2) The anisotropy uncertainty in the lens galaxy kinematics. This is the main driver of the spread in the predictions of the velocity dispersion within each source scale sample. This scatter has a relative spread of 10% given . (3) The predicted velocity dispersion depends strongly on the observational conditions and configuration. The PSF and the slit size of the spectrograph result in a convolution and averaging over a wide range of radial scales. The predicted velocity dispersions for different concentrations of the mass in the lens galaxy (i.e. power-law slope ) differ the most in the very center of the lens. At the Einstein radius itself, the different lens models predict essentially the same kinematics. With the PSF of 0.7" and a slit size of 0.81" × 0.7", power-law mass profiles with slopes in the range differ by about 100 km s$^{-1}$ in their predicted velocity dispersion . A smaller slit and seeing conditions of FWHM 0.1" can double this relative difference and could therefore improve the constraining power of the kinematic data significantly.
The combined effect of imperfect data and imperfect modeling of the kinematic data with prior can be translated into a relative error in the time delay distance of about 7.5% from Figure 6. Only kinematic data of the lens galaxy and its analysis can reduce this error budget.
In Section 5.2 we showed that the individual image likelihoods of the different samples differ by more than 10$\sigma$. Before including the velocity dispersion measurement in our cosmographic analysis, we renormalize the image likelihood such that it is independent of the source scale (see Section 5.3). This renormalization is done by taking the same number of MCMC posterior samples from the different source scales when doing further inferences with the lens model parameters.
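The renormalization step described above amounts to giving each source scale the same weight when the chains are merged. A minimal sketch of this bookkeeping, assuming the posterior samples of each scale are stored as a list and using hypothetical names, is:

```python
import random

def equal_weight_samples(chains, n_per_chain, seed=0):
    """Draw the same number of posterior samples from every source scale.

    `chains` maps a shapelet scale to its list of MCMC samples. Drawing
    equal numbers removes the relative imaging likelihood between scales,
    so that each scale enters subsequent inferences with equal weight.
    """
    rng = random.Random(seed)
    combined = []
    for scale, samples in chains.items():
        if len(samples) < n_per_chain:
            raise ValueError("chain for scale %s is too short" % scale)
        for sample in rng.sample(samples, n_per_chain):
            combined.append((scale, sample))
    return combined

# Toy chains of unequal length (made-up numbers standing in for samples).
toy_chains = {
    0.14: [float(i) for i in range(500)],
    0.16: [float(i) for i in range(2000)],
    0.18: [float(i) for i in range(800)],
}
merged = equal_weight_samples(toy_chains, n_per_chain=300)
```

Any later weighting, for example by the kinematic likelihood or by a source size prior, is then applied on top of this flat weighting across scales.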
6.3 Source scale and kinematic anisotropy priors
The combination and inference coming from the different data sets relies on priors on the source scale of the background galaxy and on the anisotropic behaviour of the stellar kinematics in the lens galaxy. In particular, the inference of the Hubble constant is related to the inference of the angular diameter distance as
(6.3) 
In Figure 6, we see a significant dependence between the size of the source galaxy () and . Furthermore the interpretation of the kinematic data is also dependent on the anisotropic behaviour of the lens galaxy.
The priors on the source size and the kinematic anisotropy must be chosen with care, based on information gained from other work, as these priors potentially have a significant impact on the inferred parameter posterior (i.e. ). In the following, we discuss two different priors each on the kinematic anisotropy and the source scale. (Footnote: Comments from the authors about confirmation bias: The analysis of the impact of the mentioned priors on the cosmological inference was made after posting a first version of this paper on arXiv.)
6.3.1 Source size prior
A simple form of the source size prior which does not impose any specific form of knowledge about is a uniform prior in the range . We refer to this prior as . This prior ignores any knowledge about the population of galaxies. The model parameter is directly related to the brightness of the source as
(6.4) 
The number density of galaxies as a function of luminosity is a well measured quantity (luminosity function, LF) and its faint end slope for the blue galaxy population can be well described with a single powerlaw slope as
(6.5) 
with [44]. In this form, the expected source size can be stated as
(6.6) 
This prior is weakly dependent on such that smaller source sizes are preferred. We choose as our default prior and explore the impact with in Section 7.4.
6.3.2 Anisotropic kinematic prior
Studies of early-type (lens) galaxies have been made by e.g. [45, 46], which reveal properties similar to those of local early-type galaxies. We consider two priors which cover the same range in the mean anisotropy and in their predicted velocity dispersion . (1) The prior used in Figure 6 is flat in $r_{\rm ani}$ (Equation 4.7) in the range [0.5,5]. This prior should uniformly cover the expected scale where the transition between isotropic and radial velocity dispersion occurs, and is exactly the same prior as used in [22]. We refer to this prior as .
(2) We model a global contribution of the anisotropic behaviour in the form
(6.7) 
in the range . This reflects the same range in allowed values for a given mass model. We refer to this prior as . indicates an isotropic velocity dispersion and , for which the velocity dispersion ellipsoid is very elongated along the radial direction with , corresponds to with the same mean anisotropy within the aperture. This is the same functional form of the prior as used in [2] to analyze a spiral lens galaxy, although with less range toward a purely radial dispersion.
7 Cosmological inference
In this section, we study the cosmological constraints from strong lensing using data from images, time delays, the central velocity dispersion of the lensing galaxy and independent external convergence estimates. We first show that the data can be used to constrain the angular diameter distance relations. Based on these constraints, we then introduce the likelihood that allows us to infer the parameters within the flat $\Lambda$CDM cosmological model.
7.1 Angular diameter distance posteriors
We can combine the posterior samples of Figure 5 with the independent velocity dispersion measurement to calculate the angular diameter distance relations and (Equation 4.11 and 2.8) as
(7.1) 
and
(7.2) 
To take into account the errors in , and , we importance sample the posteriors from the independent measurements ( and ) and for we uniformly sample in the range [0.5,5] times [see e.g. 47, 21, 22, for similar use].
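The conversion between the two distance relations and the lens distance can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard definition of the time-delay distance, $D_{\Delta t} = (1+z_d) D_d D_s / D_{ds}$, and a lens redshift $z_d = 0.295$ (a value commonly quoted for this system, not stated in this section). The function and variable names are hypothetical, and the numbers in the usage lines are placeholders rather than results.

```python
Z_LENS = 0.295  # assumed lens redshift for RXJ1131-1231; not given in this section


def lens_distance(d_dt, ds_over_dds, z_lens=Z_LENS):
    """Return D_d from D_dt = (1 + z_d) * D_d * (D_s / D_ds).

    d_dt        : time-delay distance in Mpc
    ds_over_dds : dimensionless ratio D_s / D_ds (e.g. from the kinematics)
    """
    return d_dt / ((1.0 + z_lens) * ds_over_dds)


# Apply the conversion sample by sample to paired posterior draws.
# The values below are illustrative placeholders only.
d_dt_draws = [1900.0, 2000.0, 2100.0]
ratio_draws = [1.9, 2.0, 2.1]
d_d_draws = [lens_distance(d, r) for d, r in zip(d_dt_draws, ratio_draws)]
```

Keeping the draws paired preserves any correlation between the two inputs, which matters here because both depend on the same lens model sample.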
The vs plane shown in Figure 7 carries the cosmological information of this analysis coming from the combined data and consistently translates the uniform prior in the source scale into the cosmological inference. This plane covers a wide range, but the constrained region is narrower. [48] did a very similar analysis in terms of folding in the velocity dispersion measurement. In our case, we get a degeneracy in the two-dimensional plane coming from the MST, whereas [48] and the forecasting of [49] assume independence of the two quantities. We overplot the posterior samples of WMAP DR9 [50] and Planck15 [PlanckCollaboration:2015p9875] converted to the angular diameter distances of the lens system. We find that the posterior samples of at least one chosen source scale parameter are consistent within 2$\sigma$ with the CMB experiment posteriors in a flat $\Lambda$CDM cosmology for the low-redshift angular diameter distance relations. Without the renormalization of the imaging likelihood (see Section 5.3), this statement cannot be made.
7.2 An analytic likelihood for cosmology
So far, we have discretized the degeneracy plane by uniformly sampling in steps of 0.01″. Effectively, this means that while all the other parameters are sampled through standard MCMC methods, the direction is sampled on a grid. This separation is needed to allow us to do the renormalization of the likelihood as described in Section 6.2. Sampling the grid finely is computationally expensive. In the following, we show how we can describe the posterior distribution analytically and fill the gaps in without additional sampling.
To do so, we first map the vs plane of Figure 7 (left panel) into a vs plane (right panel). We see a linear relation between the posterior samples, which increase monotonically and with equal spacing as a function of . Using linear regression, we fit the function
(7.3) 
with being the slope and being the intercept. The legend of Figure 7 (right panel) shows the best fit values, which we discuss in more detail later. The linear fit is a good description of the combined samples of different source scalings. The same is shown for the filter F555W analysis in Appendix C. The spread of the distribution orthogonal to the linear relation is not well fit by a Gaussian distribution, but we find a skewed normal distribution provides a good description.
The one-dimensional likelihood of the strong lens system data given a cosmological model is given by the one-dimensional probability density of the samples relative to the fitted line:
(7.4) 
where is the standard deviation, the skewness and is the reparameterized skewed normal distribution function described in Appendix E. How the different source scale priors on fold into the likelihood is described in Appendix G (equation G.3). In this section, we apply a flat prior in , , and a flat prior in , , (see Section 6.3). The inferences for the other combinations of prior choices are presented in Section 7.4.
For the analysis of the HST band F814W we fit the values , , and . For band F555W the fits result in , , and . Fitting the combined samples of the bands F814W and F555W leads to , , and . These parameters are quoted for angular diameter distances expressed in Mpc.
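The construction of such a likelihood (a linear fit, followed by a skewed density for the offsets from the line) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it uses the standard location/scale/shape form of the skew-normal distribution instead of the reparameterization in terms of standard deviation and skewness from Appendix E, it measures offsets orthogonally to the line, and all function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y = a * x + b."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def orthogonal_offset(x, y, a, b):
    """Signed distance of the point (x, y) from the line y = a * x + b."""
    return (y - a * x - b) / math.sqrt(1.0 + a * a)


def skew_normal_pdf(t, loc, scale, shape):
    """Standard skew-normal density; shape = 0 recovers a Gaussian."""
    z = (t - loc) / scale
    gauss = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 * gauss * cdf / scale


def log_likelihood(x, y, a, b, loc, scale, shape):
    """Log-density of a model prediction (x, y) relative to the fitted line."""
    p = skew_normal_pdf(orthogonal_offset(x, y, a, b), loc, scale, shape)
    return math.log(p) if p > 0.0 else float("-inf")
```

In use, `fit_line` would be run once on the posterior samples, the skew-normal parameters fitted to the offsets of those samples, and `log_likelihood` then evaluated for the pair of distance combinations predicted by each trial cosmology.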
The simple form of the likelihood enables a fast and consistent combination of different strong lensing systems also in combination with other cosmological probes.
7.3 Cosmological parameter constraints
The constraints on the angular diameter distance relations can be turned into constraints on the cosmological parameters of the background evolution. In the following we assume a flat $\Lambda$CDM cosmology. The homogeneous expansion can be described in terms of the matter density $\Omega_{\rm m}$ and the Hubble constant $H_0$. We use the likelihood of Equation (7.4) with the values of , , and from the analysis of F814W and F555W separately. First, we sample the parameters and simultaneously with uniform priors of and . Figure 8 shows the posterior distributions for the filter F814W (left panel) and F555W (middle panel) for the priors (, ) separately. The degeneracy in is strong but can be determined fairly well. A good approximation of the degeneracy shown in the plane can be described by
(7.5) 
where is the value for at fixed and is the marginalized error at fixed . This form allows us to more directly compare with other results from the literature.
For a fixed value of , we infer a Hubble constant of km s$^{-1}$ Mpc$^{-1}$ for the F814W and km s$^{-1}$ Mpc$^{-1}$ for the F555W analysis.
From the analysis of each filter separately, we find the uncertainty coming from the imaging data alone to be below 1% in the resulting inference of . Given that our estimates for the two filters F814W and F555W differ by about 4.0% while using exactly the same analysis and the same time-delay and kinematic data for all other parameters involved, we conclude that the imaging data inference is partially driven by unknown systematics in the modeling and the data. To marginalize out potential systematics in the analysis, we combine the two analyses at the level of the angular diameter distance posteriors. The two-dimensional posteriors are shown in the right panel of Figure 8. In this way, we get a Hubble constant of km s$^{-1}$ Mpc$^{-1}$. The full posteriors for both samples are shown in Figure 9.
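For reference, the mapping from the two cosmological parameters to the three angular diameter distances of the lens system can be sketched as follows. This is a minimal stand-alone illustration, not the code used for the analysis: it assumes a flat $\Lambda$CDM background with radiation neglected, uses a simple Simpson integration, and takes lens and source redshifts of 0.295 and 0.658 (commonly quoted for this system, not stated in this section). All function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z1, z2, h0, omega_m, steps=2000):
    """Line-of-sight comoving distance in Mpc between z1 and z2 (flat LCDM)."""
    def inv_e(z):
        return 1.0 / math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

    # Composite Simpson rule; `steps` must be even.
    h = (z2 - z1) / steps
    total = inv_e(z1) + inv_e(z2)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(z1 + i * h)
    return C_KM_S / h0 * total * h / 3.0


def angular_distance(z1, z2, h0, omega_m):
    """Angular diameter distance in Mpc from z1 to z2 in a flat universe."""
    return comoving_distance(z1, z2, h0, omega_m) / (1.0 + z2)


def lens_distances(h0, omega_m, z_lens=0.295, z_source=0.658):
    """Return (D_d, D_s, D_ds) in Mpc for the assumed redshifts."""
    d_d = angular_distance(0.0, z_lens, h0, omega_m)
    d_s = angular_distance(0.0, z_source, h0, omega_m)
    d_ds = angular_distance(z_lens, z_source, h0, omega_m)
    return d_d, d_s, d_ds
```

All three distances scale as $1/H_0$, so ratios such as $D_s/D_{ds}$ depend only on the matter density in this model, while combinations with units of length carry the $H_0$ dependence.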
7.4 Prior dependence
In this section we investigate the dependence of the cosmological inference on the choice of priors on the source scale and the anisotropic kinematics of the lensing galaxy . In Section 6.3 we stated two different priors for each parameter, each of them nominally uninformative and probing the same physical range. Table 1 lists the likelihood parameters and the resulting inference for fixed in flat $\Lambda$CDM. We see a strong prior dependence of the posterior distribution, which can result in a mean shift in of more than 10 km s$^{-1}$ Mpc$^{-1}$. The source scale prior can result in a weak mean shift of about 1-2 km s$^{-1}$ Mpc$^{-1}$ without a change in the uncertainty. This means that the information content of the imprinted priors is roughly the same and the systematic uncertainty is subdominant to the quoted total uncertainty. The situation changes for the kinematic prior . The flat prior approach for the two different parameterizations shifts the mean inferred value of by more than . The precision is also affected: the prior results in a significantly higher precision inference than . This implies that carries more information for the specific task of measuring than . If this prior is not representative of the distribution of early-type galaxies, the inference with can be significantly biased compared with .
^{4} For fixed in flat $\Lambda$CDM.

    
0.0012  263.8  
    
0.0014  264.2 
8 Joint uncertainties and comparison with other work
In this Section we analyze the impact of the different data sets on the cosmological inference and we compare our method and results with the literature.
8.1 Uncertainties from the different data sets
We assign uncertainty estimates on the inference of coming from the independent data sets, namely the time delays, the HST ACS images, the line-of-sight analysis of wide-field data and the spectra of the lens galaxy for the kinematic estimate ^{5}. We do so by forecasting a perfect modeling result for all data sets except the one in question. We then proceed in exactly the same way as presented in Section 7. This leads to an inference of the cosmological parameters affected only by the uncertainties coming from one single data set. We perform this analysis with the default priors and . ^{5} In this analysis we ignore the dependence of the line-of-sight analysis on the shear term from the ACS image reconstruction.
Table 2 summarizes the estimated uncertainties from the different data sets, stated as 1$\sigma$ uncertainties on for fixed . The Gaussian approximation of all these errors leads to a total uncertainty of 9.4% on . The estimate of the uncertainty coming from the full sampling results in 7.9%. This analysis does not include further potential systematics and does not question the priors chosen.
Our approach to the error analysis differs from the one chosen by [22]. We do not quote an error on the lens model itself, as this inference depends on different data sets. We quote an error on the lens model modulo an MST for the image reconstruction and, separately, an error on the kinematic estimate, which potentially can fully break the degeneracy.
We clearly see that the dominant contribution to the final uncertainty is related to the kinematic data and its modeling. As discussed in Section 6.2, high-resolution spectroscopy can provide data which can better constrain different positions in the MST and therefore significantly reduce the uncertainty on the angular diameter distance relation. The second most dominant uncertainty comes from the line-of-sight contribution.
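The Gaussian total can be checked with a few lines of arithmetic. The sketch below assumes that the four contributions of Table 2 are independent and Gaussian, so that they add in quadrature; the dictionary keys are labels chosen here for readability.

```python
import math

# Per-data-set 1-sigma uncertainties on H0 in percent, as listed in Table 2.
uncertainties = {
    "time delays": 1.6,
    "HST ACS image reconstruction": 2.8,
    "line-of-sight contribution": 4.7,
    "lens kinematics": 7.5,
}

# Independent Gaussian errors add in quadrature.
total = math.sqrt(sum(u ** 2 for u in uncertainties.values()))
print(f"total (Gaussian): {total:.1f}%")  # prints "total (Gaussian): 9.4%"
```

The kinematic term alone contributes 56.25 of the summed variance of 88.74 (in percent squared), which is why it dominates the budget.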
8.2 Comparison with other work
Cosmographic inference has been published by [22] with the same lens model parameterization and by [23] in combination with a composite (dark matter and baryonic matter separated) lens model, in response to the work of [1]. The values and uncertainties on the Hubble constant are km s$^{-1}$ Mpc$^{-1}$ for a value of in [22], a 5.5% error, and km s$^{-1}$ Mpc$^{-1}$, a 5.75% error, with for a flat $\Lambda$CDM universe.
One difference between the work of [22, 23] and the present work arises from the explicit treatment of the MSD and related degeneracies in our work and its link to the source surface brightness reconstruction method. This allows us to overcome (at least partially) systematics from the source reconstruction method and the mass profile assumption. On the other hand, this weakens the constraining power of the image reconstruction. This explains our larger uncertainties compared to [22, 23]. Furthermore, their stated values on are independent in the flat scenario while our values do depend on (see our Figure 8 vs. Figure 8 in [22]). This comes from the different description of the cosmological likelihood. The likelihood in [22] is described fully in terms of the time-delay distance, whereas our likelihood has an additional dependence on . In that sense, their stated value is independent of but ours requires a prior on .
A second difference is that we work in a 2D plane of angular diameter distance relations (Figure 7) without the need for cosmological priors to define our angular diameter distance likelihood. This results in a different shape of the posterior distribution in the  plane (Figure 8), and the inferred projected posteriors have a strong dependence.
The best comparison with the work of [22, 23] is made using the inference with the same kinematic prior (first or second row in Table 1). We want to stress that we use explicit priors on the source scale. The cosmological inference depends on this prior as the constraining power of the kinematic data is weak. Therefore a shift of about 1$\sigma$ in our stated uncertainty on the inference of is not surprising.
Comparing our results with the CMB experiments, we get a 2.5$\sigma$ shift for and a 1$\sigma$ shift for in the $\Lambda$CDM parameter inference. We conclude that the angular diameter distance at last scattering and the inferred angular diameter distance relation at lower redshift from this analysis are consistent with a flat $\Lambda$CDM cosmology. Our analysis depends on uninformative priors on the kinematics of the lens galaxy and the source reconstruction scale . Further systematics can potentially also occur and are not included in this analysis.
Description | Uncertainty
Time delays | 1.6%
HST ACS image reconstruction | 2.8%
Line-of-sight contribution | 4.7%
Lens kinematics ^{6} | 7.5%
Total (Gaussian) | 9.4%
Total (full sampling ^{7}) | 7.9%
^{6} The quoted uncertainty includes the uncertainty in the anisotropy radius with a prior of .
^{7} The uncertainty in the full sampling is given as half of the 68% confidence interval divided by the mean posterior value.
9 Conclusions
In this work we applied the newly developed source reconstruction technique of [32] to the strong lens system RXJ1131-1231 to extract cosmographic information. We showed how different source reconstruction scales probe different regimes in the MST even when the lens model is not fully transformable through the MST.
This work is built on the modeling and the data of [22] and the systematics analysis of [1]. We incorporate a renormalization of the imaging likelihood such that we have explicit priors on the source scale before combining with the kinematic data.
We introduced a cosmographic inference analysis which enables us to combine imaging, time-delay and kinematic data without relying on any cosmological priors. We derived a likelihood function based only on the angular diameter distance relations, which can be described in analytic terms.
We find that the choice of priors on lens model parameters and source size is subdominant to the statistical errors for measurements of this system. The choice of prior for the source is subdominant at present (2% uncertainty on $H_0$) but may be relevant for future studies. More importantly, we find that the priors on the kinematic anisotropy of the lens galaxy have a significant impact on our cosmological inference. When incorporating all the above modeling uncertainties, we find km s$^{-1}$ Mpc$^{-1}$ (for ), when using kinematic priors similar to other studies. When we use a different kinematic prior motivated by Barnabè et al. (2012) [2] but covering the same anisotropic range, we find km s$^{-1}$ Mpc$^{-1}$. This means that the choice of kinematic modeling and priors has a significant impact on cosmographic inferences. Further systematics in the data and modeling can also occur. The way forward is either to obtain better velocity dispersion measurements, which would down-weight the impact of the priors, or to construct physically motivated priors for the velocity dispersion model.
This inference analysis was achieved with a single strong lens system in two imaging bands. Combining the information of multiple systems with comparable data can add vital constraints about the late time expansion history of the universe, also in terms of extensions of the standard cosmological model.
Acknowledgments
We thank Sherry Suyu and the coauthors of [22] and [23] for useful comments and discussions. We thank the Referee for useful comments on the manuscript that helped us improve the text. We acknowledge the import, partial use or inspiration of the following python packages: CosmoHammer [51], FASTELL [52], numpy (www.numpy.org), scipy (www.scipy.org), astropy (www.astropy.org) and triangle (https://github.com/dfm/triangle.py). This work has been supported by the Swiss National Science Foundation (grants 200021_149442/1 and 200021_143906/1).
References
 [1] P. Schneider and D. Sluse, Mass-sheet degeneracy, power-law models and external convergence: Impact on the determination of the hubble constant from gravitational lensing, Astronomy & Astrophysics 559 (Nov, 2013) A37.
 [2] M. Barnabè, A. A. Dutton, P. J. Marshall, M. W. Auger, B. J. Brewer, T. Treu et al., The swells survey - iv. precision measurements of the stellar and dark matter distributions in a spiral lens galaxy, Monthly Notices of the Royal Astronomical Society 423 (Jun, 2012) 1073.
 [3] S. Refsdal, On the possibility of determining hubble’s parameter and the masses of galaxies from the gravitational lens effect, Monthly Notices of the Royal Astronomical Society 128 (Jan, 1964) 307.
 [4] R. D. Blandford and R. Narayan, Cosmological applications of gravitational lensing, In: Annual review of astronomy and astrophysics. Vol. 30 (A9325826 0990) 30 (Jan, 1992) 311.
 [5] F. Courbin, A. Eigenbrod, C. Vuissoz, G. Meylan and P. Magain, Cosmograil: the cosmological monitoring of gravitational lenses, Gravitational Lensing Impact on Cosmology 225 (Jun, 2005) 297.
 [6] A. Eigenbrod, F. Courbin, C. Vuissoz, G. Meylan, P. Saha and S. Dye, Cosmograil: The cosmological monitoring of gravitational lenses. i. how to sample the light curves of gravitationally lensed quasars to measure accurate time delays, Astronomy and Astrophysics 436 (Jun, 2005) 25.
 [7] F. Courbin, V. Chantry, Y. Revaz, D. Sluse, C. Faure, M. Tewes et al., Cosmograil: the cosmological monitoring of gravitational lenses. ix. time delays, lens dynamics and baryonic fraction in he 0435-1223, Astronomy & Astrophysics 536 (Dec, 2011) A53.
 [8] M. Tewes, F. Courbin, G. Meylan, C. S. Kochanek, E. Eulaers, N. Cantale et al., Cosmograil: Measuring time delays of gravitationally lensed quasars to constrain cosmology, The Messenger 150 (Dec, 2012) 49.
 [9] S. R. Kumar, M. Tewes, C. S. Stalin, F. Courbin, I. Asfandiyarov, G. Meylan et al., Cosmograil: the cosmological monitoring of gravitational lenses. xiv. time delay of the doubly lensed quasar sdss j1001+5027, Astronomy & Astrophysics 557 (Sep, 2013) A44.
 [10] M. Tewes, F. Courbin and G. Meylan, Cosmograil: the cosmological monitoring of gravitational lenses. xi. techniques for time delay measurement in presence of microlensing, Astronomy & Astrophysics 553 (May, 2013) 120.
 [11] P. Schneider and C. Seitz, Steps towards nonlinear cluster inversion through gravitational distortions. 1: Basic considerations and circular clusters, Astronomy and Astrophysics (ISSN 00046361) 294 (Feb, 1995) 411.
 [12] C. S. Kochanek, Is there a cosmological constant?, Astrophysical Journal v.466 466 (Aug, 1996) 638.
 [13] P. L. Schechter, C. D. Bailyn, R. Barr, R. Barvainis, C. M. Becker, G. M. Bernstein et al., The quadruple gravitational lens pg 1115+080: Time delays and models, The Astrophysical Journal 475 (Feb, 1997) L85.
 [14] L. V. E. Koopmans, T. Treu, C. D. Fassnacht, R. D. Blandford and G. Surpi, The hubble constant from the gravitational lens b1608+656, The Astrophysical Journal 599 (Dec, 2003) 70.
 [15] O. Wucknitz, A. D. Biggs and I. W. A. Browne, Models for the lens and source of b0218+357: a lensclean approach to determine h0, Monthly Notices of the Royal Astronomical Society 349 (Mar, 2004) 14.
 [16] T. York, N. Jackson, I. W. A. Browne, O. Wucknitz and J. E. Skelton, The hubble constant from the gravitational lens class b0218+357 using the advanced camera for surveys, Monthly Notices of the Royal Astronomical Society 357 (Feb, 2005) 124.
 [17] P. Jakobsson, J. Hjorth, I. Burud, G. Letawe, C. Lidman and F. Courbin, An optical time delay for the double gravitational lens system fbq 0951+2635, Astronomy and Astrophysics 431 (Feb, 2005) 103.
 [18] C. Vuissoz, F. Courbin, D. Sluse, G. Meylan, M. Ibrahimov, I. Asfandiyarov et al., Cosmograil: the cosmological monitoring of gravitational lenses. v. the time delay in sdss j1650+4251, Astronomy and Astrophysics 464 (Mar, 2007) 845.
 [19] D. Paraficz, J. Hjorth and Á. Elíasdóttir, Results of optical monitoring of 5 sdss double qsos with the nordic optical telescope, Astronomy and Astrophysics 499 (May, 2009) 395.
 [20] R. Fadely, C. R. Keeton, R. Nakajima and G. M. Bernstein, Improved constraints on the gravitational lens q0957+561. ii. strong lensing, The Astrophysical Journal 711 (Mar, 2010) 246.
 [21] S. H. Suyu, P. J. Marshall, M. W. Auger, S. Hilbert, R. D. Blandford, L. V. E. Koopmans et al., Dissecting the gravitational lens b1608+656. ii. precision measurements of the hubble constant, spatial curvature, and the dark energy equation of state, The Astrophysical Journal 711 (Mar, 2010) 201.
 [22] S. H. Suyu, M. W. Auger, S. Hilbert, P. J. Marshall, M. Tewes, T. Treu et al., Two accurate time-delay distances from strong lensing: Implications for cosmology, The Astrophysical Journal 766 (Apr, 2013) 70.
 [23] S. H. Suyu, T. Treu, S. Hilbert, A. Sonnenfeld, M. W. Auger, R. D. Blandford et al., Cosmology from gravitational lens time delays and planck data, The Astrophysical Journal Letters 788 (Jun, 2014) L35.
 [24] P. Saha, J. Coles, A. V. Macciò and L. L. R. Williams, The hubble time inferred from 10 time delay lenses, The Astrophysical Journal 650 (Oct, 2006) L17.
 [25] M. Oguri, Gravitational lens time delays: A statistical assessment of lens model dependences and implications for the global hubble constant, The Astrophysical Journal 660 (May, 2007) 1.
 [26] J. Coles, A new estimate of the hubble time with improved modeling of gravitational lenses, The Astrophysical Journal 679 (May, 2008) 17.
 [27] E. E. Falco, M. V. Gorenstein and I. I. Shapiro, On model-dependent bounds on h(0) from gravitational images application of q0957 + 561a,b, Astrophysical Journal 289 (Feb, 1985) L1.
 [28] P. Saha, Lensing degeneracies revisited, The Astronomical Journal 120 (Oct, 2000) 1654.
 [29] O. Wucknitz, Degeneracies and scaling relations in general power-law models for gravitational lenses, Monthly Notices of the Royal Astronomical Society 332 (Jun, 2002) 951.
 [30] J. Liesenborgs and S. D. Rijcke, Lensing degeneracies and mass substructure, Monthly Notices of the Royal Astronomical Society 425 (Sep, 2012) 1772.
 [31] P. Schneider and D. Sluse, Source-position transformation: an approximate invariance in strong gravitational lensing, Astronomy & Astrophysics 564 (Apr, 2014) A103.
 [32] S. Birrer, A. Amara and A. Refregier, Gravitational lens modeling with basis sets, The Astrophysical Journal 813 (Nov, 2015) 102.
 [33] D. Sluse, J. Surdej, J.-F. Claeskens, D. Hutsemékers, C. Jean, F. Courbin et al., A quadruply imaged quasar with an optical einstein ring candidate: 1rxs j113155.4-123155, Astronomy and Astrophysics 406 (Jul, 2003) L43.
 [34] J.-F. Claeskens, D. Sluse, P. Riaud and J. Surdej, Multi wavelength study of the gravitational lens system rxs j1131-1231. ii. lens model and source reconstruction, Astronomy and Astrophysics 451 (Jun, 2006) 865.
 [35] B. J. Brewer and G. F. Lewis, Unlensing hst observations of the einstein ring 1rxs j1131-1231: a bayesian analysis, Monthly Notices of the Royal Astronomical Society 390 (Oct, 2008) 39.
 [36] G. C. F. Chen, S. H. Suyu, K. C. Wong, C. D. Fassnacht, T. Chiueh, A. Halkola et al., Sharp - iii: First use of adaptive optics imaging to constrain cosmology with gravitational lens time delays, arXiv astro-ph.CO (Jan, 2016) , [1601.01321v1].
 [37] M. Tewes, F. Courbin, G. Meylan, C. S. Kochanek, E. Eulaers, N. Cantale et al., Cosmograil: the cosmological monitoring of gravitational lenses. xiii. time delays and 9-yr optical monitoring of the lensed quasar rx j1131-1231, Astronomy & Astrophysics 556 (Aug, 2013) 22.
 [38] C. D. Fassnacht, L. V. E. Koopmans and K. C. Wong, Galaxy number counts and implications for strong lensing, Monthly Notices of the Royal Astronomical Society 410 (Feb, 2011) 2167.
 [39] J. L. Sersic, Atlas de galaxias australes, Cordoba (Jan, 1968) .
 [40] A. Refregier, Shapelets - i. a method for image analysis, Monthly Notices of the Royal Astronomical Society 338 (Jan, 2003) 35.
 [41] L. Hernquist, An analytical model for spherical galaxies and bulges, Astrophysical Journal 356 (Jun, 1990) 359.
 [42] P. Saha and L. L. R. Williams, Gravitational lensing model degeneracies: Is steepness all-important?, The Astrophysical Journal 653 (Dec, 2006) 936.
 [43] J. Akeret, S. Seehars, A. Amara, A. Refregier and A. Csillaghy, Cosmohammer: Cosmological parameter estimation with the mcmc hammer, Astronomy and Computing 2 (Aug, 2013) 27.
 [44] S. M. Faber, C. N. A. Willmer, C. Wolf, D. C. Koo, B. J. Weiner, J. A. Newman et al., Galaxy luminosity functions to z 1 from deep2 and combo17: Implications for red galaxy formation, The Astrophysical Journal 665 (Aug, 2007) 265.
 [45] L. V. E. Koopmans, A. Bolton, T. Treu, O. Czoske, M. W. Auger, M. Barnabè et al., The structure and dynamics of massive early-type galaxies: On homology, isothermality, and isotropy inside one effective radius, The Astrophysical Journal Letters 703 (Sep, 2009) L51.
 [46] M. Barnabè, O. Czoske, L. V. E. Koopmans, T. Treu and A. S. Bolton, Two-dimensional kinematics of slacs lenses - iii. mass structure and dynamics of early-type lens galaxies beyond z ≃ 0.1, Monthly Notices of the Royal Astronomical Society 415 (Aug, 2011) 2215.
 [47] A. Lewis and S. Bridle, Cosmological parameters from cmb and other data: A monte carlo approach, Physical Review D 66 (Nov, 2002) 103511.
 [48] I. Jee, E. Komatsu and S. H. Suyu, Measuring angular diameter distances of strong gravitational lenses, Journal of Cosmology and Astroparticle Physics 11 (Nov, 2015) 033.
 [49] I. Jee, E. Komatsu, S. H. Suyu and D. Huterer, Time-delay cosmography: Increased leverage with angular diameter distances, eprint arXiv 1509 (Sep, 2015) 3310.
 [50] G. Hinshaw, D. Larson, E. Komatsu, D. N. Spergel, C. L. Bennett, J. Dunkley et al., Nine-year wilkinson microwave anisotropy probe (wmap) observations: Cosmological parameter results, The Astrophysical Journal Supplement 208 (Oct, 2013) 19.
 [51] J. Akeret, S. Seehars, A. Amara, A. Refregier and A. Csillaghy, Cosmohammer: Cosmological parameter estimation with the mcmc hammer, Astrophysics Source Code Library (Mar, 2013) 1303.003.
 [52] R. Barkana, Fast calculation of a family of elliptical mass gravitational lens models, Astrophysical Journal v.502 502 (Aug, 1998) 531.
 [53] J. Bergé, L. Gamper, A. Réfrégier and A. Amara, An ultra fast image generator (ufig) for wide-field astronomy, Astronomy and Computing 1 (Feb, 2013) 23.
 [54] C. S. Kochanek, What do gravitational lens time delays measure?, The Astrophysical Journal 578 (Oct, 2002) 25.
Appendix A Numerical computation of the luminosity-weighted LOS velocity dispersion
The computation of the luminosity-weighted LOS velocity dispersion within an aperture under certain seeing conditions (Equation 4.10) involves numerically challenging projection integrals and convolutions. In this section, we describe our approach to achieve a numerically stable and fast computation with a Monte Carlo ray-tracing approach, similar to the one used by e.g. [53] to render convolved galaxy light profiles. This method is based on drawing positions representing the total light distribution of the galaxy.
For the light in the galaxy, we take a Hernquist profile [41]
(A.1) 
where is the total flux and is related to the effective radius of the galaxy by . The radial distribution function of flux is then
(A.2) 
The cumulative distribution function is
(A.3) 
A sample of can then be drawn from the distribution
(A.4) 
where is the uniform distribution in .
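Inverse-transform sampling of this kind can be sketched as follows. The formulas are the standard ones for a Hernquist profile and are written in notation assumed here (scale radius `a`, uniform deviate `u`), since the symbols of Equations (A.1)-(A.4) are not reproduced above: the enclosed light fraction is $r^2/(r+a)^2$, and the relation $a \approx 0.551\, r_{\rm eff}$ is the usual conversion from the projected half-light radius. Function names are hypothetical.

```python
import math
import random


def hernquist_scale(r_eff):
    """Scale radius a from the projected half-light radius (a ~ 0.551 r_eff)."""
    return 0.551 * r_eff


def enclosed_fraction(r, a):
    """Fraction of the total light inside the 3D radius r."""
    return (r / (r + a)) ** 2


def radius_from_uniform(u, a):
    """Invert the enclosed fraction: map u in [0, 1) to a 3D radius."""
    s = math.sqrt(u)
    return a * s / (1.0 - s)


def draw_radii(n, a, seed=None):
    """Draw n radii distributed like the 3D light of a Hernquist profile."""
    rng = random.Random(seed)
    return [radius_from_uniform(rng.random(), a) for _ in range(n)]
```

Because the cumulative distribution has a closed-form inverse, no rejection step or numerical root finding is needed.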
In the following, we describe the steps that lead from a representative sample of the flux in the galaxy to the estimate of the aperture-averaged velocity dispersion:
1. Draw a representative sample of radii from the three-dimensional light distribution of the Hernquist profile (Equation A.4).
2. Project each radius onto a random two-dimensional plane and compute its projected radius and the projected coordinates . This sample represents the projected light profile of the galaxy.
3. Displace the two-dimensional coordinates with a random realization according to the seeing distribution to . We assume the PSF is a two-dimensional Gaussian distribution. This sample represents the convolved, projected two-dimensional light distribution of the galaxy.
4. Select the samples whose displaced position lies within the aperture . This selects a sample representative of the luminosity and radial weighting within the aperture.
5. Evaluate , the projected (but unweighted) velocity dispersion, for the remaining samples.
6. Take the sample average of the velocity dispersion . This average (once converged) corresponds to under the assumption of a Gaussian velocity dispersion.
About 100 samples evaluated in the aperture already give an accuracy in of about 1%. For this paper, the computation is done with 1000 samples.
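The six steps above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the callable `sigma2_los(r, R)` is a hypothetical stand-in for the projected velocity dispersion of the main text (not reproduced here), the aperture is assumed to be a rectangle centred on the galaxy, and averaging the squared dispersion before taking the square root is an assumption about how the sample average is formed.

```python
import math
import random


def aperture_dispersion(sigma2_los, a, fwhm, half_width_x, half_width_y,
                        n_keep=1000, seed=0):
    """Monte Carlo estimate of the aperture-averaged velocity dispersion.

    sigma2_los   : callable (r, R) -> squared LOS dispersion (hypothetical)
    a            : Hernquist scale radius
    fwhm         : seeing FWHM of the Gaussian PSF, same units as a
    half_width_* : half sizes of the assumed rectangular aperture
    n_keep       : number of samples required inside the aperture
    """
    rng = random.Random(seed)
    psf_sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    kept = []
    while len(kept) < n_keep:
        # Step 1: 3D radius drawn from the Hernquist light distribution.
        s = math.sqrt(rng.random())
        r = a * s / (1.0 - s)
        # Step 2: random orientation, projected onto the plane of the sky.
        cos_t = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        big_r = r * math.sqrt(1.0 - cos_t * cos_t)
        x, y = big_r * math.cos(phi), big_r * math.sin(phi)
        # Step 3: displace by a Gaussian seeing realization.
        x_obs = x + rng.gauss(0.0, psf_sigma)
        y_obs = y + rng.gauss(0.0, psf_sigma)
        # Step 4: keep only samples that land inside the aperture.
        if abs(x_obs) <= half_width_x and abs(y_obs) <= half_width_y:
            # Step 5: evaluate the dispersion at the undisplaced position.
            kept.append(sigma2_los(r, big_r))
    # Step 6: sample average, returned as a dispersion.
    return math.sqrt(sum(kept) / len(kept))
```

The loop counts only samples that fall inside the aperture, matching the statement that the quoted sample numbers refer to samples evaluated in the aperture.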
Appendix B Residual maps
In Figure 10 the normalized residuals corresponding to the source models with different source scales in Section 5.2 are shown. The residual maps differ significantly between the best fit values of the different shapelet scales . This reflects the fact that extended structure in the Einstein ring can constrain the local slope of the mass profile, and the given mass model cannot adapt equally well to different source scales as it cannot be rescaled according to the mass-sheet transform. The inferred lens models can be understood as the best fit power-law profiles at different positions within the MST.
Appendix C Analysis on WFC1 F555W
In the main text, we focused on the analysis of the WFC1 F814W filter band. Here we present the same analysis for filter F555W. Figure 12 shows the posterior distribution of the lens model parameters and time delay distance for F555W. Figure 11 shows the constraints on the angular diameter distance relation. The values describing the distribution can be found in the main text.
Appendix D Bayesian description and renormalization of the imaging likelihood
One of the steps presented in this paper is the renormalization of the imaging likelihood for different source scales . In Section 5.3 we provided heuristic arguments for this approach in the case of time delay cosmography. In the following, we provide a Bayesian interpretation and justification of our choice in performing this calculation.
Let us assume that there is a complete model that is able to fully describe the lens, with parameters . However, when we fit the data, we use a restricted subset of the model containing only the parameters , and the missing degrees of freedom are captured by the parameters . To complete our notation, the source scale is given as , the cosmological parameters as . We also denote the image data as , the kinematic data as and any other independent data of the time delays and the lens environment as .
Our goal is to estimate the cosmological parameters given the data, which is . We can state, using Bayes' rule
(D.1) 
Independence of , and results in
(D.2) 
The internal part of the MST is encapsulated in the term . One way to think about MST is that the source scale cannot be measured from imaging data alone. In other words, given image data and marginalizing over all possible lens models, one should recover the source size prior. The Bayesian expression for the MST is then