Skynet Algorithm for Single-Dish Radio Mapping I: Contaminant-Cleaning, Mapping, and Photometering Small-Scale Structures
Abstract
We present a single-dish mapping algorithm with a number of advantages over traditional techniques. (1) Our algorithm makes use of weighted modeling, instead of weighted averaging, to interpolate between signal measurements. This smooths the data, but without blurring the data beyond instrumental resolution. Techniques that rely on weighted averaging blur point sources sometimes as much as 40%. (2) Our algorithm makes use of local, instead of global, modeling to separate astronomical signal from instrumental and/or environmental signal drift along the telescope's scans. Other techniques, such as basket weaving, model this drift with simple functional forms (linear, quadratic, etc.) across the entirety of scans, limiting their ability to remove such contaminants. (3) Our algorithm makes use of a similar, local modeling technique to separate astronomical signal from radio-frequency interference (RFI), even if only continuum data are available. (4) Unlike other techniques, our algorithm does not require data to be collected on a rectangular grid or regridded before processing. (5) Data from any number of observations, overlapping or not, may be appended and processed together. (6) Any pixel density may be selected for the final image. We present our algorithm, and evaluate it using both simulated and real data. We are integrating it into the image-processing library of the Skynet Robotic Telescope Network, which includes optical telescopes spanning four continents, and now also Green Bank Observatory's 20-meter diameter radio telescope in West Virginia. Skynet serves hundreds of professional users, and additionally tens of thousands of students, of all ages. Default data products are generated on the fly, but will soon be customizable after the fact.
Subject headings:
techniques: image processing — radio continuum: general

1. Introduction
1.1. Skynet
Founded in 2005, Skynet is a global network of fully automated, or robotic, volunteer telescopes, scheduled through a common web interface (https://skynet.unc.edu). Currently, our optical telescopes range in size from 14 to 40 inches, and span four continents. Originally envisioned for gamma-ray burst follow-up (Reichart et al. 2005, Haislip et al. 2006, Dai et al. 2007, Updike et al. 2008, Nysewander et al. 2009, Cenko et al. 2011, Cano et al. 2011, Bufano et al. 2012, Jin et al. 2013, Morgan et al. 2014, Martin-Carrillo et al. 2014, Friis et al. 2015, De Pasquale et al. 2016, Bardho et al. 2016, Melandri et al. 2017), Skynet has also been used to study gravitational-wave sources (Abbott et al. 2017a, 2017b, Valenti et al. 2017, Yang et al. 2017), blazars (Osterman Meyer et al. 2008, Valtonen et al. 2016, Zola et al. 2016, Liu et al. 2017, Goyal et al. 2017), supernovae (Foley et al. 2010, 2012, 2013, Pignata et al. 2011, Valenti et al. 2011, 2014, Pastorello et al. 2013, Milisavljevic et al. 2013, Maund et al. 2013, Fraser et al. 2013, Stritzinger et al. 2014, Inserra et al. 2014, Takats et al. 2014, 2015, 2016, Dall'Ora et al. 2014, Folatelli et al. 2014, Barbarino et al. 2015, de Jaeger et al. 2016, Gutierrez et al. 2016, 2018, Tartaglia et al. 2017, 2018, Prentice et al. 2018), supernova remnants (Trotter et al. 2017), novae (Schaefer et al. 2011), pulsating white dwarfs and hot subdwarfs (Thompson et al. 2010, Barlow et al. 2010, 2011, 2013, 2017, Reed et al. 2012, Bourdreaux et al. 2017, Hutchens et al. 2017), a wide variety of variable stars (Layden et al. 2010, Gvaramadze et al. 2012, Wehrung et al. 2013, Miroshnichenko et al. 2014, Abbas et al. 2015, Khokhlov et al. 2017, 2018), a wide variety of binary stars (Reed et al. 2010, Sarty et al. 2011, Helminiak et al. 2011, 2012, 2015, Strader et al. 2015, Tovmassian et al. 2016, 2017, Fuchs et al. 2016, Pala et al. 2017, Neustroev et al. 2017, Lubin et al. 2017, Swihart et al. 2017, Zola et al.
2017, Kriwattanawong et al. 2018), exoplanetary systems (Fischer et al. 2006, Czesla et al. 2012, Meng et al. 2014, 2015, Kenworthy et al. 2015, Kuhn et al. 2016, Awiphan et al. 2016, Blank et al. 2018), trans-Neptunian objects and Centaurs (Braga-Ribas et al. 2013, 2014, Dias-Oliveira et al. 2015), asteroids (Descamps et al. 2009, Pravec et al. 2010, 2012, 2016, Marchis et al. 2012, Savanevych et al. 2018), and near-Earth objects (NEOs; Brozovic et al. 2011, Pravec et al. 2014). Skynet is also the leading tracker of NEOs in the southern hemisphere (R. Holmes, private communication).
Skynet's mission is split evenly between supporting professional astronomers and supporting students and the public. Although most of our observations have been for professionals, most of our users are students. We have developed, and are continuing to develop, Skynet-based curricula for undergraduates, high-school students (in partnership with Morehead Planetarium and Science Center (MPSC)), and middle school-aged students (in partnership with the University of Chicago/Yerkes Observatory, Green Bank Observatory (GBO), the Astronomical Society of the Pacific, and 4-H), as well as for blind and visually impaired students (in partnership with Associated Universities, Inc., the University of Chicago/Yerkes Observatory, the Technical Education Research Centers, and the University of Nevada at Las Vegas). These efforts have reached over 20,000 students, and our public-engagement efforts (also in partnership with MPSC) have also reached over 20,000, mostly elementary and middle school-aged students. Curriculum-based student users queue observations through the same web interface that the professionals use. Altogether, over 15 million images have been taken to date.
In partnership with GBO, and funded by the American Recovery and Reinvestment Act, Skynet has added its first radio telescope, GBO's 20-meter in West Virginia (see Figure 1). We describe the 20-meter, which has been refurbished, in §2. As with Skynet's optical telescopes, the 20-meter serves both professionals and students. Professional use consists primarily of timing observations (e.g., pulsar timing; fast radio burst searches in conjunction with Swift), but also some mapping observations, for photometry (e.g., fading of Cas A and improved flux-density calibration of the radio sky (Trotter et al. 2017); intraday variable blazar campaigns in conjunction with other radio, and optical, telescopes). Student use consists of timing, spectroscopic (e.g., Williamson et al. 2018), and mapping observations, but with an emphasis on mapping, at least for beginners.
Regarding student, as well as public, use, the 20-meter represents a significant opportunity for radio astronomy. Small optical telescopes can be found on many, if not most, college campuses. But small radio telescopes are significantly more expensive to build, operate, and maintain, and consequently are generally found only in the remote locations that make the most sense for professional use. As a result, most people – including most students of astronomy – never experience radio telescopes, let alone use them. However, under the control of Skynet, the 20-meter is not only more accessible to more professionals, it is already being used by thousands of students per year, of all ages, as well as by the public.
1.2. Single-Dish Mapping
In this paper, we present the single-dish mapping algorithm that we developed for Skynet, both for professional and for student use. Here we outline the design requirements that we set for ourselves, and describe the approaches that we adopted or developed to meet these requirements.
1.2.1 Mapping Pattern
Many single-dish mapping algorithms (e.g., Sofue & Reich 1979, Emerson & Grave 1988) require the signal to be sampled on a rectangular grid. Generally, this requires what is called "step-and-integrate" or "point-and-shoot" mapping.
However, this is an inefficient way to observe, with greater telescope acceleration, deceleration, and settling overheads, greater wear and tear on the telescope, and greater sensitivity to time-variable systematics (see §1.2.3). "On-the-fly" (OTF) mapping, in which signal is integrated while the telescope moves, minimizes these concerns, as long as integrations span no more than 0.2 beamwidths (FWHM) along the telescope's direction of motion (along the telescope's "scan"), to avoid blurring point sources by more than 1% (Mangum, Emerson & Greisen 2007).
A wide variety of OTF mapping patterns might be employed (e.g., see Figure 2):

- A "raster" pattern approximates a rectangular grid, though integrations might not line up from scan to scan.

- A "nodding" pattern, in which the telescope moves up and down, typically in elevation, as Earth's rotation carries the telescope's beam across the sky in the perpendicular direction, is typical of meridian-transit telescopes, which cannot move east-west (e.g., see §2.2). However, even with full-motion telescopes, a nodding pattern can be used to maintain a constant parallactic angle, if it is desired that the telescope's beam pattern not rotate across the image, or from image to image.

- "Spiral" and "hypocycloid" patterns are efficient ways of mapping sources and extended structures, respectively, without abrupt changes to the telescope's motion (Mangum, Emerson & Greisen 2007).

- A "daisy" pattern can also be used to map sources, with the advantage of crossing the source's peak, and sampling the background level, an arbitrarily large number of times.
As long as the gaps between scans do not exceed Nyquist sampling, or 0.4 beamwidths, in theory all information between scans can be recovered (modulo noise, interference, etc.). Given this, and not wishing to limit our users to a particular mapping pattern, or set of patterns, we require that our algorithm work independently of mapping pattern. Furthermore, since student/quick-look maps may be undersampled, we also require that our algorithm work independently of sampling density (within reason). Instead, undersampled images will be flagged as not for professional use.
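The two sampling limits above (0.2 beamwidths per integration along a scan, 0.4 beamwidths between scans) translate directly into observing parameters. A minimal sketch, with an illustrative beamwidth and slew rate (not 20-meter specifications):

```python
# Sketch of the two OTF sampling constraints, with hypothetical numbers
# (the beamwidth and slew rate below are illustrative, not instrument specs).

def max_integration_time(fwhm_deg, slew_rate_deg_s, frac=0.2):
    """Longest integration that keeps each dump within `frac` beamwidths
    along the scan (0.2 beamwidths -> <1% blurring of point sources)."""
    return frac * fwhm_deg / slew_rate_deg_s

def max_scan_spacing(fwhm_deg, frac=0.4):
    """Widest gap between scans that still satisfies Nyquist sampling
    (0.4 beamwidths) across the scan direction."""
    return frac * fwhm_deg

fwhm = 0.75   # beamwidth (FWHM) in degrees -- illustrative
rate = 1.0    # slew rate in degrees per second -- illustrative

print(max_integration_time(fwhm, rate))  # ~0.15 s per dump
print(max_scan_spacing(fwhm))            # ~0.3 deg between scan centers
```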
1.2.2 Signal Averaging vs. Signal Modeling
Some single-dish mapping algorithms get around the problem of not collecting data on a rectangular grid by resampling the data onto a rectangular grid before processing (e.g., Winkel, Floer & Kraus 2012). This is known as "regridding", and is typically done by taking a weighted average of the data around each grid point. This is essentially a convolution of the data with the weighting function, which is sometimes called the convolution kernel.
However, this blurs the image. Many adopt a kernel of width equal to the telescope's beamwidth, but this results in 40% blurring and 40% errors (near the center of the beam pattern) in the reconstructed image (see Figure 3). Others oversample so that a narrower kernel can be used. For example, Winkel, Floer & Kraus (2012) collect 3 times as much data as required by Nyquist and use a 1/2-beamwidth kernel, but this still results in 12% blurring and 18% errors (near the center of the beam pattern) in the reconstructed image (Figure 3).
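The blurring figures above follow from the fact that convolving a Gaussian beam with a Gaussian kernel yields an effective width that adds in quadrature. A quick check:

```python
# Convolving a Gaussian beam with a Gaussian kernel yields an effective
# FWHM that adds in quadrature; the fractional blurring follows directly.
from math import sqrt

def blurring(kernel_width_beams):
    """Fractional growth of a Gaussian beam's FWHM after convolution with
    a Gaussian kernel whose FWHM is `kernel_width_beams` beamwidths."""
    return sqrt(1.0 + kernel_width_beams**2) - 1.0

print(blurring(1.0))  # ~0.41: a 1-beamwidth kernel blurs by ~40%
print(blurring(0.5))  # ~0.12: a 1/2-beamwidth kernel blurs by ~12%
```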
Given that singledish maps already suffer from poor resolution (compared to interferometric maps), we do not want to degrade resolution further, simply due to processing. To this end, instead of weighted averaging, we use weighted modeling: Instead of averaging the data local to each pixel in the final image, using a weighting function, we fit a model to the data local to each pixel in the final image, using a weighting function (see Figure 4). As long as (1) the model is sufficiently flexible over the scale of the weighting function, and (2) sampling is sufficiently dense to constrain (actually, overconstrain) this model, the signal should be recoverable at any location, without blurring. For example, the weighted modeling approach that we present in §3.7 is able to recover the simulated data in Figure 3 with 1% errors (near the center of the beam pattern).
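As a minimal, one-dimensional illustration of the difference between the two approaches, consider a noiseless beam-convolved point source, reconstructed at its peak both by a weighted average and by a locally weighted quadratic fit. The kernel width and sampling below are illustrative; the actual algorithm is two-dimensional (§3.7):

```python
# Minimal 1-D sketch of weighted averaging vs. weighted modeling, on a
# noiseless Gaussian "source" sampled at the beam scale. The kernel width
# and sampling grid are illustrative, not the paper's actual choices.
import numpy as np

x = np.arange(-3.0, 3.01, 0.25)          # sample positions (beam sigmas)
y = np.exp(-0.5 * x**2)                  # beam-convolved point source
x0 = 0.0                                 # image pixel being reconstructed
w = np.exp(-0.5 * ((x - x0) / 0.5)**2)   # local Gaussian weighting kernel

# Weighted average: a convolution, which blurs the peak downward.
avg = np.sum(w * y) / np.sum(w)

# Weighted modeling: fit a local quadratic, then evaluate it at the pixel.
coeffs = np.polyfit(x, y, 2, w=np.sqrt(w))
model = np.polyval(coeffs, x0)

print(avg)    # noticeably below the true peak of 1.0
print(model)  # much closer to 1.0
```

With these (illustrative) numbers, the weighted average underestimates the peak by roughly 10%, while the local quadratic recovers it to within a couple of percent.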
Another requirement that we place on our algorithm is that this – replacing the data with a locally modeled version of itself – be the last step, not the first step. Other algorithms regrid the data before processing (e.g., before contaminant removal, see §1.2.3), because their processing operations work only on gridded data. However, this is poor modeling practice: Even if weighted averaging could be done accurately, and even if weighted modeling can be done accurately, every operation on data propagates uncertainty in ways that are both difficult to understand and even more difficult to properly account for in subsequent operations. It is always preferable to operate on the real data, for as long as possible, rather than on successive approximations thereof.
1.2.3 Contaminant Removal
Single-dish mapping algorithms must address signal contaminants, which we separate into three broad categories: (1) en-route signal drift, resulting in what is sometimes called "the scanning effect"; (2) radio-frequency interference (RFI); and (3) elevation-dependent signal. Each of these is demonstrated in Figure 5.
En-Route Drift: Even with very stable, modern receivers, the detected signal can still drift in time, due to instrumental reasons, such as 1/f, or pink, noise, and/or due to environmental reasons, such as changing atmospheric emission, or changing ground emission spilling over the edge of the dish, particularly as the dish moves. This results in low-level variations in the signal along the telescope's scans, and is most noticeable from scan to scan (the scanning effect). Components of these variations can be made to vary over shorter or longer angular scales by moving the telescope slower or faster, but this does not eliminate them.
Sofue & Reich (1979) used unsharp masking to separate en-route drift (and small-scale structure in the image) from larger-scale structure, and then modeled the en-route drift along each scan with a second-order polynomial, using sigma clipping to avoid contamination by small-scale structure. Drawbacks of this approach are: (1) Unsharp masking uses a blurred version of the data to correct the data, resulting in a blurred image, at least in the lower signal-to-noise parts of the image (something we wish to avoid; §1.2.2); (2) Low-order polynomials may adequately model en-route drift over small angular scales, but are too simple, and increasingly inaccurate, over larger angular scales, such as the length of a scan; and (3) Sigma clipping is too crude an outlier-rejection criterion (see §1.3).
Emerson & Grave (1988) Fourier transform the data, collapsing en-route drift into a single band of near-zero spatial frequencies, perpendicular to the scan direction. They then mask it in Fourier space and transform back. Ideally, two orthogonal mappings are combined (in Fourier space), so real spatial frequencies that are masked in one image can be recovered from the other. The primary advantage of this approach is that it does not assume that en-route drift can be well modeled by a low-order polynomial over the length of each scan. However, its drawbacks are: (1) We will not always have two orthogonal mappings; (2) Even if we did, OTF rasters do not populate rectangular grids, requiring the data to be regridded before processing (see, however, §1.2.2); and (3) This technique would not work well, or at all, with other mapping patterns (e.g., noddings, daisies, etc.; §1.2.1).
Haslam, Quigley & Salter (1970), Haslam et al. (1974), Sieber, Haslam & Salter (1979), and Haslam et al. (1981) introduced a technique called basket weaving, involving two mappings with intersecting scans (not necessarily orthogonal). Signal differences at the intersections are minimized, but again assuming that en-route drift can be well modeled by a low-order polynomial over the length of each scan. The procedure also requires iteration. Winkel, Floer & Kraus (2012) introduced an updated version that does not require iteration, but: (1) It does require regridding (§1.2.2); (2) It works only with two, near-orthogonal mappings, so not with single mappings, and not with noddings, daisies, etc. (§1.2.1); and (3) Although it does permit an arbitrarily high-order parameterization of the en-route drift over the length of each scan, in practice it becomes less effective as the number of parameters times the number of scans approaches the number of regridded pixels (which is typically significantly less than the original number of signal measurements). This limited them to second-order polynomials in the examples that they presented.
RFI: RFI is typically localized to specific frequencies. If spectral information is available, these frequencies can be identified and masked, or the excess signal at these frequencies can be measured and subtracted off (see §3.1 of Paper II, Dutton et al. 2018). However, sometimes only continuum data are available (e.g., see §2.2), or spectral data are available but the RFI has a continuous spectrum (e.g., lightning). In these cases, RFI can be identified only from its temporal signature. If prolonged, it might be indistinguishable from en-route drift. If localized in time, it is unlikely to occur at the same position on adjoining scans, appearing as a source that is narrower than the telescope's beamwidth, at least across scans. In most of the above references, RFI was identified by eye and excised by hand. However, given the volume, and diversity, of users that we have with the 20-meter, we need to be able to do this automatically.
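The cross-scan signature described above can be illustrated with a toy example: a time-localized spike appears in one scan but not at the same position on adjoining scans. The comparison-to-median criterion and threshold below are simplifications for illustration; the actual algorithm uses robust Chauvenet rejection (§3.6):

```python
# Toy illustration of the cross-scan idea: short-duration RFI appears in
# one scan but not at the same position on adjoining scans. Comparing each
# sample to the cross-scan median is a stand-in for the paper's actual
# robust Chauvenet criterion (SS3.6); the 1.0 threshold is arbitrary.
import numpy as np

scans = np.array([
    [0.0, 0.1, 0.0, 0.1, 0.0],   # scan 1
    [0.1, 0.0, 5.0, 0.0, 0.1],   # scan 2 -- RFI spike at position 2
    [0.0, 0.1, 0.1, 0.0, 0.0],   # scan 3
])

reference = np.median(scans, axis=0)   # per-position cross-scan level
excess = scans - reference             # deviation of each sample
rfi_mask = excess > 1.0                # toy threshold

print(np.argwhere(rfi_mask))           # flags only scan 2, position 2
```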
Elevation-Dependent Signal: Atmospheric emission increases as elevation decreases. And if the dish is over-illuminated, terrestrial spillover increases as elevation increases. If interested only in small-scale structures, these backgrounds can be removed along with en-route drift (see §3.3). However, if large-scale structures are to be retained (or added back in), these backgrounds need to be removed separately (see §2.3 of Paper II, Dutton et al. 2018).
Minimum recommended background-subtraction scales:

Telescope | Receiver | Scale (beamwidths)
20-meter | L (HI OH)^a | 7^c
20-meter | L (HI)^b | 6^c
20-meter | L (OH)^b | 6^c
20-meter | X | 6^c
40-foot | L (HI) | 3

^a Before August 1, 2014.
^b After August 1, 2014.
^c The 20-meter's beam pattern has a low-level, broad component, in both L and X bands, and consequently we recommend larger background-subtraction scales here. This component was significant in L band prior to 8/1/14, as can be seen in the third row of Figure 3, as well as in Figure 5. Post 8/1/14, it was significantly reduced, but not altogether eliminated. This component corresponds to approximately 2% – 3% and 4% – 5% of the integrated beam pattern in L and X band, respectively. If this is not a concern, these minimum recommended 1D background-subtraction scales can be lowered to 3 and 4 theoretical beamwidths, respectively.
We approach all three types of signal contaminants in the same way. First, we model them locally, not globally: We use simple parameterizations, such as first- and second-order polynomials, but we do not expect, nor need, them to hold over large angular scales. When fitting these models, we use robust, and appropriate, outlier rejection (see §1.3) to separate contaminants from astronomical signal. Finally, we combine our locally fitted models into a global model, checking the procedure against simulated data.
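The advantage of local over global low-order modeling can be seen in a small sketch. Here a sinusoid stands in for en-route drift that is too complex for any single quadratic over a whole scan, while short-window quadratics track it easily; the drift shape and window length are illustrative only:

```python
# Why local, not global: a single low-order polynomial cannot track
# en-route drift over a whole scan, but piecewise-local quadratics can.
# The sinusoidal "drift" and window length are illustrative stand-ins.
import numpy as np

t = np.linspace(0.0, 10.0, 201)   # positions along one scan
drift = np.sin(t)                 # slowly varying drift (noiseless toy)

# Global second-order model over the entire scan.
global_fit = np.polyval(np.polyfit(t, drift, 2), t)
global_err = np.max(np.abs(drift - global_fit))

# Local second-order models in short sliding windows, each evaluated
# only at its own center.
local_err = 0.0
half = 20                         # window half-width in samples
for i in range(half, len(t) - half):
    sl = slice(i - half, i + half + 1)
    fit = np.polyval(np.polyfit(t[sl], drift[sl], 2), t[i])
    local_err = max(local_err, abs(drift[i] - fit))

print(global_err)  # large: the global model cannot follow the drift
print(local_err)   # small: local models track it easily
```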
1.3. Robust Chauvenet Outlier Rejection
Our single-dish mapping algorithm is built upon a new outlier-rejection technique, called "robust Chauvenet rejection", which we developed for the Skynet Robotic Telescope Network's image-processing library in general, and for this application in particular.
Sigma clipping (§1.2.3) is one of the simplest outlier-rejection techniques, but also one of the crudest. It suffers from a number of problems, foremost of which is how to set the threshold, which should depend on the size of the sample: in a sample of 100 data points, for example, deviations of up to roughly 2.8 sigma are expected, whereas in a sample of only 10 data points, deviations beyond roughly 2 sigma are already unexpected. Chauvenet rejection is simply sigma clipping plus a reasonable rule for setting the threshold:
N P(>|z|) < 0.5,  (1)

where N is the total number of data points and P(>|z|) is the cumulative probability of a measurement being more than |z| standard deviations from the mean, assuming a Gaussian distribution (Chauvenet 1863).
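Equation 1 can be solved numerically for the rejection threshold as a function of sample size, using the Gaussian two-sided tail probability P(>|z|) = erfc(z/√2). A short sketch:

```python
# Solving Eq. 1 for the rejection threshold: find z such that
# N * P(>|z|) = 0.5, with P(>|z|) = erfc(z / sqrt(2)) for a Gaussian.
from math import erfc, sqrt

def chauvenet_threshold(n_points):
    """Sigma threshold at which N * P(>|z|) = 0.5, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if n_points * erfc(mid / sqrt(2.0)) > 0.5:
            lo = mid      # tail still too probable: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(chauvenet_threshold(10), 2))    # 1.96
print(round(chauvenet_threshold(100), 2))   # 2.81
```

Larger samples tolerate larger deviations before rejection, exactly as the threshold-setting discussion above requires.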
However, sigma clipping, and by extension Chauvenet rejection, also suffers from the following problem: If the mean and the standard deviation are not known a priori, which is almost always the case, they must be measured from the data, and both of these quantities are sensitive to the very outliers that they are being used to reject. This limits the applicability of Chauvenet rejection to very small sample-contamination fractions.
Consequently, we developed robust Chauvenet rejection, which makes use of the mean and the standard deviation, but also makes use of increasingly robust (but decreasingly precise) alternatives: namely, the median and the (half-sample; e.g., Bickel & Fruhwirth 2005) mode, and the 68.3-percentile deviation, measured in three increasingly robust ways. These quantities approximate the mean and the standard deviation, respectively, and equal them in the case of a Gaussian distribution. But they are significantly less sensitive to outliers, and are particularly effective, meaning both robust and precise, if applied in proper combination.
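The increasing robustness of mean, median, and (half-sample) mode can be demonstrated on a small contaminated sample. The half-sample mode below is a simplified rendering of the Bickel & Fruhwirth (2005) estimator, not the exact implementation of Maples et al. (2018):

```python
# Sketch of the increasingly robust location estimates: mean -> median ->
# (half-sample) mode. The mode routine is a simplified rendering of
# Bickel & Fruhwirth (2005), not the Maples et al. (2018) implementation.
import statistics

def half_sample_mode(data):
    """Recursively keep the half of the sorted sample with smallest range."""
    xs = sorted(data)
    while len(xs) > 2:
        h = (len(xs) + 1) // 2                       # half-sample size
        ranges = [xs[i + h - 1] - xs[i] for i in range(len(xs) - h + 1)]
        i = ranges.index(min(ranges))                # tightest half-sample
        xs = xs[i:i + h]
    return sum(xs) / len(xs)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]              # one gross outlier

print(sum(data) / len(data))     # mean: 12.7, dragged far off by the outlier
print(statistics.median(data))   # median: 3.0, barely affected
print(half_sample_mode(data))    # mode: 3.0, ignores the outlier entirely
```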
The applicability of this technique is very broad, spanning not only astronomy, but all quantitative disciplines. We present it in a companion paper (Maples et al. 2018), and make extensive use of it in this paper, with implementation details offered in the footnotes. As such, this paper additionally serves the purpose of "field testing" this new technique.
1.4. Overview
In §2, we provide a technical description of the refurbished 20-meter, including its L- and X-band receivers. We also provide a technical description of Green Bank Observatory's 40-foot telescope, used primarily for education and public engagement, but on which we developed many components of our algorithm before the 20-meter was ready. In §3, we introduce our algorithm for contaminant-cleaning and mapping small-scale structures (e.g., sources) from continuum observations, and test it on both simulated and real data. In §4, we demonstrate that optical-style, aperture photometry can be carried out on these maps, and introduce an algorithm for calculating photometric error bars, given the different (and more complicated, correlated) noise characteristics of these maps. We summarize our results in §5.
In Paper II (Dutton et al. 2018): (1) We expand on our small-scale structure algorithm to additionally contaminant-clean and map large-scale structures; (2) We do the same for spectral (as opposed to just continuum) observations; and (3) We carry out an X-band survey of the Galactic plane, to showcase, and further test, techniques developed in both papers.
2. Telescopes
2.1. Green Bank Observatory 20-Meter
The 20-meter diameter telescope (Figure 1) was constructed by Radiation Systems, Inc. at the Green Bank site, completed in 1994, and outfitted with a dual-frequency 2.3/8.4-GHz feed and receiver (Norrod et al. 1994). Funded by the U.S. Naval Observatory (USNO), it was part of a network of Very Long Baseline Interferometry (VLBI) stations that monitored changes in Earth's rotation axis and rate, as well as measured continental drift (Ma et al. 1998).
The antenna has an altitude/azimuth mount, and is able to observe down to 1 degree above the horizon. Its maximum slew speed is 2 degrees per second in each axis, which met the needs of the geodetic VLBI program of quickly observing many radio sources widely distributed across the sky. Its pointing accuracy is 35 arcsec rms.
The antenna’s surface is a paraboloid with a focal ratio of 0.43 at prime focus. Its surface accuracy was originally 0.8 mm rms, but this has degraded over the years to 1.4 mm rms. Its aperture efficiency is 50% at 8 GHz.
Due to budget cutbacks, USNO ended its Green Bank operations in 2000. From 2000 to 2010, the telescope was used only occasionally, primarily for testing experimental receivers.
In 2010, the University of North Carolina at Chapel Hill (UNC-CH) received a grant from the National Science Foundation to expand its Skynet Robotic Telescope Network, and this included restoring and automating the 20-meter.
Two receivers were prepared. The primary receiver is for 1.3 – 1.8 GHz (L band), and had previously been used on Green Bank's 140-foot diameter telescope. We originally operated this receiver with three interchangeable feeds, each for a different part of the band. However, changing these feeds was a many-hour, manual operation. In 2014, we built and installed a new feed, for the entire 1.3 – 1.8 GHz band.
The secondary receiver is for 8 – 10 GHz (X band). It is a stripped-down version of the original, dual-frequency receiver, outfitted with a new 8 – 10 GHz feed. This receiver is used primarily when the L-band receiver is off the telescope for maintenance, which has happened for only two extended (many-month) periods since 2010.
A new backend was also prepared. The spectrometer uses a Field-Programmable Gate Array (FPGA) design that was adapted from the Green Bank Ultimate Pulsar Processing Instrument (GUPPI). In its low-resolution mode, it produces 1024-channel spectra over a 500-MHz band. In 2013, a high-resolution mode was added; in this mode, it produces 1024-, 2048-, 4096-, 8192-, or 16384-channel spectra in each of two 15.625-MHz bands (see §3 of Paper II).
New software was also written for low-level control of (1) the antenna's pointing, and (2) data acquisition from the spectrometer, as well as for high-level control of the entire system through Skynet's web-based interface (Footnote 7). For both its optical telescopes and its radio telescope, Skynet features a dynamically updated queue, with options for assigning different users, groups of users, and collaborations of groups different levels of priority access, including target-of-opportunity access.
Beyond the initial grant, continued operation of the 20-meter has been made possible by a combination of education grants and research grants (primarily to search for fast radio bursts), as well as by operations funds and personnel of the Green Bank Observatory. Since the 20-meter went online on Skynet in 2012, over 23,000 observations have been carried out, mostly by students, for both education and research objectives.
2.2. Green Bank Observatory 40-Foot
The 40-foot diameter telescope (see Figure 6) was obtained in 1961 to determine if several bright radio sources varied on daily timescales.
Purchased from Antenna Systems, Inc., this relatively inexpensive ($40K) meridian-transit telescope was prefabricated in Hingham, Massachusetts, and assembled at the Green Bank site in two days.
The reflecting surface is aluminum mesh, similar to that of Green Bank's former 300-foot diameter telescope, but with smaller openings. The reflector backup structure is also aluminum, with a supporting pedestal of galvanized steel. The telescope's maximum slew speed is 20 degrees per minute along the local meridian, and it can point far enough south to observe the Galactic center, and as far north as a few degrees beyond the celestial pole.
The original receivers and feeds were built on-site, and operated at 750 and 1400 MHz. Backend electronics are housed in an adjacent, underground bunker, to prevent radio-frequency noise from the electronics from interfering with celestial signals.
The original electronics automatically repointed the telescope to the declination of each source sufficiently before it transited the meridian each day. Data were digitized and written out to punched paper tape. Put into operation in 1962, the 40-foot is thought to have been the world's first fully automated telescope (Bowyer & Meredith 1963; Lockman, Ghigo, & Balser 2007).
This project ran for several years, but was abandoned because none of the sources varied on daily timescales. Afterward, the sources were instead observed less frequently, but with higher precision, using Green Bank's 300- and 140-foot diameter telescopes.
The 40-foot fell into disuse for about twenty years. But in 1987, it was refurbished as an educational resource, for students from 5th grade through graduate school to learn basic observational radio astronomy. The feed was replaced with one that had been used on Green Bank's 85-foot diameter Tatel telescope for Project OZMA, sensitive to 18 – 21 cm. A new receiver was built at the National Radio Astronomy Observatory's Central Development Laboratory in Charlottesville, Virginia. Spectral-line capability was added using a synthesizer (which has since been upgraded; see below) and narrow-band filters that were inherited from Green Bank's 300-foot diameter telescope after its collapse in 1988.
Over the past thirty years, several generations of students, numbering in the high thousands, have used the 40-foot as part of their science education. For pedagogical reasons, observing with the 40-foot is now almost completely manual. Students set analog dials and switches, and annotate and measure their data on a chart recorder. When taking spectra, of the spin-flip transition of neutral hydrogen, students step through each frequency by hand.
That said, a few users have developed their own digitization and recording systems for the 40-foot, which they connect in parallel with the chart recorder. Most notably, a group that is now based at UNC-CH, and now affiliated with the Skynet Robotic Telescope Network, did this in 1991, and has been collecting data, with large groups of students, for one week each summer since then. We make use of these data throughout this paper, alongside higher-quality data that we have more recently collected with the 20-meter: The 40-foot data are useful here specifically because they are lower quality, and consequently test our algorithms more rigorously.
Even so, a subset of these data have proven to be of sufficiently high quality to help constrain the fading history of the Cassiopeia A supernova remnant (Reichart & Stephens 2000; Trotter et al. 2017). And over the past 2 – 3 years, coaxial cables from the front end have been replaced with fiber optics, and newer synthesizers, mixers, and amplifiers have been installed, making the system more reliable and robust, and eliminating some RFI.
3. Mapping Small-Scale Structures with Continuum Observations
In this section, we present our algorithm for contaminant-cleaning and mapping small-scale structures (e.g., sources) from continuum observations. In §3.1, we calibrate each of a telescope’s polarization channels against, typically very small, variations in gain. In §3.2, we measure point-to-point variations in signal, which can be related to the noise level of the data along each scan, at least on the sampling timescale. In §3.3, we make use of this noise measurement to separate small-scale structures, both astronomical and short-duration RFI, from large-scale structures (astronomical, en-route drift, long-duration RFI, and elevation-dependent signal) along each scan. In §3.4, we cross-correlate consecutive scans to measure and correct for any time delay between coordinate and signal measurements. In §3.5, we measure scan-to-scan variations in signal, which can be related to the noise level of the data across each scan. In §3.6, we make use of this noise measurement to separate astronomical small-scale structures from RFI, both along and across scans. In §3.7, we present our surface-modeling algorithm, which interpolates between the contaminant-cleaned data, without blurring these data beyond instrumental resolution (§1.2.2). A flowchart of the entire algorithm can be found in Figure 7.
3.1. Gain Calibration
Signal measurements can be calibrated against variations in gain using a noise diode in the receiver. By switching the diode on and off, preferably with the telescope tracking (not scanning) the sky, so astronomical signal does not also change, contemporaneous measurements can be calibrated by dividing by the difference between the diode-on and diode-off levels.
This common practice, however, can be contaminated by outlying measurements, due to catching the diode in transition, due to RFI, etc. Consequently, we apply robust Chauvenet rejection (§1.3) when measuring each level.^{2}We model each level with a line, in case the telescope is not tracking, or cannot track (as in the case of the 40-foot; see Figure 8). We fit this model to the data, simultaneously rejecting outliers, as described in §8 of Maples et al. 2018, using iterative bulk rejection followed by iterative individual rejection (using the generalized mode broken-line deviation technique, followed by the generalized median 68.3%-value deviation technique, followed by the generalized mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted by the number of dumps that compose each measurement. Each fitted line is evaluated at the dump-weighted mean time of the non-rejected diode-on and -off measurements; the difference between the two lines at this time is the calibration difference. We demonstrate this with lower-quality 40-foot data in Figure 8.
In practice, both the 20-meter’s and the 40-foot’s gains vary negligibly over the timescale of an observation. Consequently, we calibrate only at the beginning and end of observations, instead of more frequently, say, between scans. Users may then select the first calibration, the second calibration, or a linear interpolation between them:
(2) 
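The calibration and interpolation steps above can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the function names are ours, a plain least-squares line fit stands in for the robust, Chauvenet-rejecting fits of Maples et al. 2018, and dump weighting is omitted.

```python
import numpy as np

def diode_delta(t, y, on_mask):
    """Estimate the diode on-minus-off difference used for gain calibration.

    Fits a line to the on and off levels separately (the paper fits these
    lines with robust Chauvenet rejection; plain least squares stands in
    here), then evaluates both lines at the mean sample time and differences
    them.  t: sample times; y: raw signal; on_mask: True where diode is on.
    """
    t_mid = t.mean()
    levels = {}
    for label, mask in (("on", on_mask), ("off", ~on_mask)):
        slope, intercept = np.polyfit(t[mask], y[mask], 1)
        levels[label] = slope * t_mid + intercept
    return levels["on"] - levels["off"]

def calibrate(signal, t, delta_1, delta_2, t_1, t_2, mode="interp"):
    """Divide signal by the first, second, or time-interpolated diode delta."""
    if mode == "first":
        delta = delta_1
    elif mode == "second":
        delta = delta_2
    else:  # linear interpolation between the two calibrations (Equation 2)
        delta = delta_1 + (delta_2 - delta_1) * (t - t_1) / (t_2 - t_1)
    return signal / delta
```

After this division the data are in gain-calibration units, where one unit corresponds to the noise diode.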
We calibrate each polarization channel separately, but then process three maps, one for each polarization channel and one in which we average the two channels’ calibrated data first (after multiplying each by appropriate flux-density conversion factors, if this information is available^{3}To convert from gain-calibration units to Janskys per pixel, we will soon be implementing automatic, dense mappings of primary calibration sources Cyg A, Tau A, and Vir A, multiple times per day. These will be automatically processed (§3) and photometered (see §4), and used to determine time-stamped conversion factors, from gain-calibration units to Janskys per beam, using the flux-density calibrations of Trotter et al. 2017. For any observation, the most recently acquired conversion factors, one for each polarization channel, will be written into the raw data file’s header, and will be converted to Janskys per pixel, and applied, when these data are processed into maps (since a map’s pixel density is user-configurable; see §3.7). If for whatever reason the necessary conversion factor is, or one or both of the necessary conversion factors are, not available, we will leave the maps in gain-calibration units, where one unit corresponds to the noise diode. All maps in this paper are in gain-calibration units.).
3.2. 1D Noise Measurement
Key to the next section will be a measurement of the standard deviation of the now-calibrated data on the smallest scales available, along each scan. We refer to this as the 1D “noise” level of the data, and we measure it directly from the data, using the following technique.
For each non-rejected point, we draw a line between the immediately preceding and following non-rejected points, and measure the central point’s deviation from this line (see Figure 9). For each scan, we robust-Chauvenet reject the point with the most discrepant deviation, update the scan’s deviations, and repeat until all outliers are rejected.^{4}We reject outliers as described in §4 and §6 of Maples et al. 2018, using iterative individual rejection (using the median broken-line deviation technique, followed by the median 68.3%-value deviation technique, followed by the mean standard deviation technique), using two-sided deviation measurements. Unlike in §3.1, we do not precede this with iterative bulk rejection, because deviations change, and must be updated, after each rejection. Deviations are weighted by the number of dumps that compose the central point and by the corresponding weight of the line at the central point’s angular distance along the scan; the line’s weight follows from Equations 29 and 30 of Maples et al. 2018 and standard propagation of uncertainties, and from the weighted average (see §8.3.5 of Maples et al. 2018). Outliers are typically contaminated by residual signal from bright astronomical sources or RFI. The post-rejection mean deviation is approximately zero and the post-rejection standard deviation is what we call the “point-to-point” noise measurement.
We calibrate this technique by applying it to simulated, Gaussian random noise, of known standard deviation. We find that the point-to-point technique overestimates the noise’s true standard deviation by 22.0%. We correct each scan’s noise measurement accordingly.
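The point-to-point technique can be sketched as follows, under simplifying assumptions (equally spaced samples, no robust Chauvenet rejection, no dump weighting); the function name is ours. For equal spacing, the broken-line deviation of pure Gaussian noise has a standard deviation sqrt(3/2) ≈ 1.22 times the true noise level, consistent with the ~22% overestimate quoted above.

```python
import numpy as np

def point_to_point_noise(y):
    """Estimate the 1D noise level of a scan from broken-line deviations.

    For equally spaced samples, each interior point's deviation from the
    straight line joining its two neighbors is
    d_i = y_i - (y_{i-1} + y_{i+1}) / 2.  For Gaussian noise of standard
    deviation sigma, d_i has standard deviation sqrt(3/2) * sigma, so we
    divide by that factor.  The paper additionally rejects outliers
    (robust Chauvenet) and weights by dump counts; this sketch omits both.
    """
    d = y[1:-1] - 0.5 * (y[:-2] + y[2:])
    return d.std() / np.sqrt(1.5)
```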
Finally, we combine all of the scans’ noise measurements into a single model for the entire observation, additionally allowing for a gradual change in the noise level over the course of the observation: We fit a line to these data, again robust-Chauvenet rejecting outliers, if necessary (Figure 10).^{5}We fit this model to the data, simultaneously rejecting outliers, as described in §8 of Maples et al. 2018, using iterative bulk rejection followed by iterative individual rejection (using the generalized mode 68.3%-value deviation technique, followed by the generalized median 68.3%-value deviation technique, followed by the generalized mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted by the number of non-rejected dumps that compose each scan. The final product is a noise model for each signal measurement in the observation.
3.3. 1D Background Subtraction
Now that we have a model for the noise level at each point, we make use of it to separate small-scale structures from large-scale structures. Specifically, we separate (1) small-scale astronomical structures (e.g., sources) and (2) short-duration RFI from (1) large-scale astronomical structures, (2) long-duration RFI, (3) en-route drift, and (4) elevation-dependent signal. Since RFI and en-route drift are 1D structures, varying along the scans, we begin by modeling and subtracting off structures, larger than a user-defined scale, along the scans only. We refer to this as 1D background subtraction.
This user-defined scale should be larger than the scale over which the telescope’s beam pattern distributes signal for bright point sources. We have determined minimum recommended values for the telescopes and receivers of §2, empirically, by increasing this scale until the measured brightness (see §4) of bright sources, observed at multiple parallactic angles, plateaued. We list these values in Table 1.
Given this scale, a simple approach to 1D background subtraction, which we do not use, but improve upon below, is to draw a line from each point to another point within one scale length, say, in the forward direction, such that all other points within this scale length are above the line (see Figure 11). This is then repeated for each point, but in the backward direction. The minimum of all of the resulting linear, local background models is a nonlinear, non-parameterized, global background model that fits snugly beneath the data.
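This noiseless-limit envelope can be sketched as follows (forward direction only, with equal weighting; the function name is ours). For each anchor point, the in-window line of minimum slope is, by construction, the line that all other in-window points lie on or above.

```python
import numpy as np

def lower_envelope(x, y, scale):
    """Noiseless-limit 1D background: minimum over local linear models.

    For each anchor point, the forward local model is the line to the
    in-window point of minimum slope (all other in-window points then lie
    on or above it).  The global model at each position is the minimum of
    every local line covering it.  Forward direction only, for brevity;
    the full construction also runs backward and takes the minimum of both.
    """
    n = len(x)
    global_model = np.full(n, np.inf)
    for i in range(n - 1):
        in_window = np.nonzero((x > x[i]) & (x <= x[i] + scale))[0]
        if in_window.size == 0:
            continue
        slopes = (y[in_window] - y[i]) / (x[in_window] - x[i])
        j = in_window[np.argmin(slopes)]
        # evaluate this local line over its domain and take the running min
        seg = np.arange(i, j + 1)
        line = y[i] + slopes.min() * (x[seg] - x[i])
        global_model[seg] = np.minimum(global_model[seg], line)
    uncovered = np.isinf(global_model)
    global_model[uncovered] = y[uncovered]
    return global_model
```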
This algorithm, however, is sensitive to negative noise fluctuations, and really is appropriate only in the limit of noiseless data. Consequently, we now modify this algorithm to take into account the noise level of the data such that the final, global model passes through the middle of the background-level data, instead of below it.
We continue to anchor the local background model, in the simplest case a straight line (but see below), to a point at the beginning or end of its scale-length domain. We then fit this model to all of the points within this domain and calculate the standard deviation of these points about the model.^{6}Data are again weighted by the number of dumps that compose each signal measurement. If this standard deviation is greater than the noise level, as measured by the noise model from §3.2, we reject the greatest positive outlier and refit. We repeat this process until the standard deviation of the non-rejected points is consistent with the noise model (see Figure 12, top panel).^{7}Here, we are making an assumption, that the noise is Gaussian. This is of course true on any single timescale, such as the sampling timescale of the data. However, this may not be true across all timescales, if in the 1/f regime (i.e., if the noise is pink, instead of white). In this case, our point-to-point noise model is an underestimate, and longer-timescale noise will manifest itself as signal, in the form of en-route drift (§1.2.3). That said, en-route drift is very successfully subtracted out by our background-subtraction algorithm (see §3.3.3), and what is not removed in background subtraction is removed by our RFI-subtraction algorithm in §3.6.
We then reject the anchor point and refit to the non-rejected points. This results in a slightly lower standard deviation, so, selecting from one point below the domain of the non-rejected points to one point above (not to exceed the original domain of the points), we iteratively add the least-outlying rejected point back in and refit until the standard deviation of the non-rejected points is again consistent with the noise model. This results in a final local background model, spanning the domain of the non-rejected points. We repeat this process for every point in the scan, in both directions, resulting in a collection of final local background models, each of which goes through the middle of the non-rejected data in its domain, instead of below it (Figure 12, middle panel).
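The core fit-and-reject loop can be sketched as follows. This is a simplified stand-in, not the paper’s implementation: it fits an unanchored polynomial to a single window, omits dump weighting, and omits the add-back step; the function name is ours.

```python
import numpy as np

def local_background(x, y, noise, order=1):
    """Fit one local background model, rejecting the greatest positive
    outlier and refitting until the residual standard deviation of the
    kept points is consistent with the measured noise level.
    Returns the model evaluated at every x, plus the kept-point mask."""
    keep = np.ones(len(x), bool)
    while keep.sum() > order + 1:
        coeffs = np.polyfit(x[keep], y[keep], order)
        resid = y - np.polyval(coeffs, x)
        if resid[keep].std() <= noise:
            break
        # reject the greatest positive outlier among the kept points
        kept_idx = np.nonzero(keep)[0]
        keep[kept_idx[np.argmax(resid[kept_idx])]] = False
    return np.polyval(coeffs, x), keep
```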
Finally, we construct the global background model from the local background models, not by taking the minimum at each point, but by taking the mean, after robust Chauvenet rejection of outliers (Figure 12, bottom panel).^{8}We reject outliers as described in §4 – §6 of Maples et al. 2018, using iterative bulk rejection followed by iterative individual rejection (using the mode broken-line deviation technique, followed by the median 68.3%-value deviation technique, followed by the mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted as described below.^{9}In the occasional event that a point has no local background models associated with it, we complete the global background model by linearly interpolating between the immediately preceding and following points for which the global background model was determined. When rejecting outliers, and when computing the post-rejection mean, we weight each point (1) by the number of non-rejected dumps that contributed to its local background model, and (2) by its position in its local background model, since fitted models are better constrained near the fitted data’s center than near its end points:
(3) 
where the leading factor is the number of non-rejected dumps that contributed to the local background model, and the weight of each point from that model depends on the point’s angular distance along the scan, on the dump-weighted mean angular distance of all of the non-rejected points from that model, on the dump-weighted standard deviation of these values, and on a quantity that is analogous to standard deviation and is related to these values’ kurtosis:
(4) 
This last quantity is zero for linear local background models and one for quadratic local background models (analogous terms can be added for higher-order local background models). We justify Equation 3 in Appendix A.
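The mean-after-rejection step can be sketched with the classical, unweighted form of Chauvenet’s criterion. The paper uses the robust, weighted generalization of Maples et al. 2018, together with the weights of Equation 3, neither of which is reproduced here; the function name is ours.

```python
import math
import numpy as np

def chauvenet_mean(values):
    """Mean of the local-model values at one point, after iteratively
    rejecting the most discrepant value whenever the expected number of
    values at least that discrepant, in a Gaussian sample of this size,
    falls below one half (Chauvenet's criterion)."""
    vals = np.asarray(values, float)
    while vals.size > 2:
        mu, sigma = vals.mean(), vals.std()
        if sigma == 0.0:
            break
        z = np.abs(vals - mu) / sigma
        worst = int(np.argmax(z))
        # expected count of samples at least this discrepant (two-sided)
        if vals.size * math.erfc(z[worst] / math.sqrt(2.0)) >= 0.5:
            break
        vals = np.delete(vals, worst)
    return vals.mean()
```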
Higher-order local background models result in more flexible global background models, and sometimes additional flexibility is needed. However, too high an order can result in global background models that are too flexible, performing poorly around sources. Fortunately, one does not have to go to too high an order to find a good solution: In this paper, we use quadratic local background models (e.g., Figure 12, bottom panel), which result in global background models that are sufficiently flexible to subtract off most large-scale signal (see §3.3.3, §3.3.4), but are also sufficiently robust to make at most noise-level errors around sources (see §3.3.2). (That said, linear local background models perform marginally better when using a background-subtraction scale that is near the minimum recommended value; Table 1.)
3.3.1 Simulation: Gaussian Random Noise
We test this algorithm by applying it to simulated data of increasing complexity. We begin by applying it to just Gaussian random noise (see Figure 13), to evaluate its performance in the absence of small- or large-scale structures. We background-subtract these data on 6-, 12-, and 24-beamwidth scales (see Figure 14, top row). Residuals are presented in the bottom row of Figure 14. We find that: (1) The background-subtracted data are biased neither high nor low; and (2) The noise level of the background-subtracted data is nearly that of the original data, and the RMS of the residuals is much less than the noise level of the original data (Figure 14).
How nearly and how much less is a measure of the quality of the background subtraction, which at any given point depends (1) on the number of local background models that contributed to that point’s global background model, and (2) on the weights of these local background models at that point. For example, the RMS of the residuals is greater in the smaller background-subtraction-scale maps, and near the ends of scans, where fewer, and lower-weight, local background models inform each point’s background subtraction.
In Figure 15, we plot these background-subtracted data (top panel) and residuals (bottom panel) vs. the sum of the weights of the local background models that contributed to each point’s global background model:
(5) 
We supplement these data with 1- and 3-beamwidth background-subtraction-scale data, and with analogous data from a lower-density, 1/5-beamwidth raster. We find that the noise level of the background-subtracted data is well-modeled by:
(6) 
and that the RMS of the residuals is well-modeled by:
(7) 
relatively independently of the choice of background-subtraction scale and of the choice of sampling density. The noise level of the background-subtracted data is less than that of the original data, because some of the original data’s variability is being incorporated into what is being subtracted off, though it approaches that of the original data for moderate and large values of the summed weight. Likewise, the RMS of the residuals decreases as more information is incorporated, but to a limit. Finally, the RMS of the residuals remains below the noise level for all values of the summed weight, which means that additional variability is not being introduced by the background-subtraction technique itself.
In summary, background-subtraction errors in the absence of small- or large-scale structures range from only 47% of the original noise level for small values of the summed weight to only 13% of the original noise level for large values of it. Furthermore, these are random errors, biased neither high nor low, reducing the noise level of the background-subtracted data by only 30% for very small values of the summed weight to only 2% for moderate and large values of it.
3.3.2 Simulation: Small-Scale Structures
Next, we add simulated point sources and short-duration RFI to our simulated noise (see Figure 16). We again background-subtract these data on 6-, 12-, and 24-beamwidth scales (see Figure 17, top row). We present residuals in the middle row of Figure 17, but here we have additionally subtracted off the residuals from the bottom row of Figure 14, to help distinguish residuals that are due to the new, small-scale structures from those that are due to the Gaussian random noise. There are three categories of residuals due to the new structures:
1. Background-subtracted data are biased low in the vicinity of small-scale structures (whether point sources or short-duration RFI), but (with two exceptions, which we address in points 2 and 3, respectively) this bias is at or below the noise level, and furthermore is independent of the brightness of the proximal small-scale structure. For example, the 1000 peak-S/N source near the center of the image is as biased, and as negligibly biased, as the 10 peak-S/N sources across the image.
The biased regions are rectangular, centered on the small-scale structures, and the length of the background-subtraction scale in the scan direction. The bias level is greatest in the center of these regions, with peak values roughly given by:
(8) 
Even though this is a systematic error, vs. a random error, it is at a sufficiently low level that it can be ignored. Furthermore, in the more realistic case of a less-winged beam function (e.g., see Figure 31), this bias is smaller, by a factor of up to 2 – 3 (Figure 17, bottom row). Background-subtraction bias is further mitigated by our RFI-subtraction algorithm in §3.6.2, at least in the regions around sources, and by our large-scale structure algorithm in Paper II, which adds oversubtracted signal back in.
2. This bias can be larger when small-scale structures are blended together, resulting in a structure that is larger than the background-subtraction scale in the scan direction. For example, the two sources toward the right of the image are marginally blended in the scan direction (the low point between them, in Figure 16, is 1.5 times the noise level). The 6-beamwidth background-subtraction scale is smaller than the blended structure, resulting in residuals that are, in this case, 3 times the noise level (in the middle row, but much less in the bottom row). However, the 12- and 24-beamwidth background-subtraction scales are larger than the blended structure, resulting in typical, sub-noise-level residuals.
3. Finally, this bias can also be larger when small-scale structures occur within 1 – 2 beamwidths of the ends of scans, since there are few or no data on the other side of the structure to inform background models. For example, the integrated brightness of the source in the lower left of Figure 17 is underestimated by 15% (in the middle row, but, again, by less in the bottom row). This is a known deficiency of such approaches, but one that affects only the edges of maps where the telescope stops and reverses direction. As such, we give the user the option to clip these data before surface-modeling (§1.2.1, see §3.7), if desired.
3.3.3 Simulation: 1D Large-Scale Structures
Next, we add simulated en-route drift and long-duration RFI (see Figure 18). We again background-subtract these data on 6-, 12-, and 24-beamwidth scales (see Figure 19, top row). We present residuals in the bottom row of Figure 19, but here we have additionally subtracted off the residuals from the bottom and middle rows of Figures 14 and 17, to help distinguish residuals that are due to the new, 1D large-scale structures from those that are due to the Gaussian random noise and the small-scale structures, respectively.
Since we background subtract in the same 1D in which en-route drift and long-duration RFI occur, along the scans, we are effective at reducing these structures, especially when the background-subtraction scale is smaller than the scale over which these structures are varying (roughly 12 beamwidths in this simulation). We find that en-route drift and long-duration RFI are reduced by factors of 3 and 5, to 96% of and 53 times the noise level, respectively, when background-subtracted on double this scale (24 beamwidths); by factors of 63 and 630, to 5% and 39% of the noise level, respectively, when background-subtracted on half of this scale (6 beamwidths); and by even greater factors when background-subtracted on even smaller scales (see Figure 20). These gains are furthered, and significantly, by our RFI-subtraction algorithm in §3.6.3.
3.3.4 Simulation: 2D Large-Scale Structures
Next, we add simulated large-scale structures, including elevation-dependent signal (see Figure 21). We again background-subtract these data on 6-, 12-, and 24-beamwidth scales (see Figure 22, top row). We present residuals in the bottom row of Figure 22, but here we have additionally subtracted off the residuals from the bottom, middle, and bottom rows of Figures 14, 17, and 19, to help distinguish residuals that are due to the new, 2D large-scale structures from those that are due to the Gaussian random noise, the small-scale structures, and the 1D large-scale structures, respectively.
We are also effective at reducing these structures, especially when the background-subtraction scale is smaller than the scale over which these structures are varying (in this case, roughly 12 and 24 beamwidths, respectively). We find that large-scale astronomical and elevation-dependent signal is reduced by factors of 26 and 670, to 3 times and 18% of the noise level, respectively, when background-subtracted on the scale of the map (24 beamwidths); by factors of 830 and 3400, to 11% and 4% of the noise level, respectively, when background-subtracted on the 6-beamwidth scale; and by even greater factors when background-subtracted on even smaller scales (Figure 20). These gains are furthered by our RFI-subtraction algorithm in §3.6.4.
3.3.5 20-Meter and 40-Foot Data
We now apply this algorithm to real data. First, we apply it to the 20-meter L-band raster from Figure 5, using small and large background-subtraction scales (see Figure 23). The 1D and 2D large-scale structures are significantly reduced in both cases, and will be further reduced by our RFI-subtraction algorithm in §3.6.5 (see Figure 38).
Next, we apply the algorithm to two nodding maps of the same region of the sky, taken with the 40-foot, two days apart. This is a good test of the algorithm in that the signal stability of the 40-foot is not nearly as good as that of the 20-meter, as can be seen in the raw data in the top row of Figure 24. Despite this, the two background-subtracted maps are the same, to the noise level (Figure 24, bottom row).
Finally, we apply the algorithm to a daisy map, of 3C 84, taken with the 20-meter in X band, to demonstrate its application to a non-rectangular mapping pattern (see Figure 25). Each slew of the telescope across the diameter of the observing region is treated as a separate scan (see Figure 49).
3.4. Time-Delay Correction
In the case of the 20-meter, signal is integrated over a user-defined time, and coordinate information is recorded at the midpoint of this integration time. However, with the 40-foot, signal is run through an RC filter, with a user-defined time constant, typically 0.1 seconds, but signal and coordinate values are sampled simultaneously, resulting in an effective delay between the two due to the time constant. This results in alternating coordinate errors in alternating scans (see Figure 26, left panel). Even with the 20-meter, the same can happen if the signal and coordinate computers’ clocks become unsynchronized (see Figure 27, left panel).
To correct for this, in the case of the 40-foot, and to check for this, and correct if necessary, in the case of the 20-meter, we cross-correlate adjacent scans. The position of the maximum value of the cross-correlation gives the best angular shift between the scans, to at least 1/30 beamwidths, and the square root of this maximum value gives a weight. If the scans intersect a source, the best angular shift is well defined, and this weight is correspondingly high. If they intersect only noise, the best angular shift is not well defined and this weight is low. For all adjacent pairs of scans in an observation, we measure the best angular shift, reject outliers (in two steps; see Figure 28), and take the mean of the remaining values. Half of this value is how much each scan is misaligned, on the whole, in alternating directions.
However, the data are not misaligned in angle, but in time. Consequently, we divide this angular shift by the mean slew speed of the telescope, measured from the data, again robust-Chauvenet rejecting outliers (corresponding primarily to periods of acceleration near the beginnings of scans).^{14}We reject outliers as described in §4 – §6 of Maples et al. 2018, using iterative bulk rejection followed by iterative individual rejection (using the mode broken-line deviation technique, followed by the median 68.3%-value deviation technique, followed by the mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted equally.^{15}In the case of daisies, the telescope’s slew speed is variable, and maximum when the telescope passes through the source at, or nearly at, the center of the daisy. Since this source will dominate the angular shift measurement, instead of dividing by the mean slew speed of the observation, we divide by the mean of each scan’s maximum slew speed. We then take this time and shift each signal measurement, interpolating the telescope’s coordinate values accordingly. The results can be seen in the right panels of Figures 26 and 27.
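The per-pair shift measurement can be sketched as follows (whole-sample lags only; the paper locates the cross-correlation peak to a small fraction of a beamwidth, and weights, averages, and outlier-rejects the shifts as described above; the function name is ours).

```python
import numpy as np

def scan_shift(scan_a, scan_b, spacing):
    """Best angular shift between two adjacent, background-subtracted
    scans (sampled at the same positions along the scan), from the peak
    of their cross-correlation; sqrt(peak) serves as the pair's weight."""
    cc = np.correlate(scan_a, scan_b, mode="full")
    lag = int(np.argmax(cc)) - (len(scan_b) - 1)
    peak = float(cc.max())
    weight = np.sqrt(peak) if peak > 0.0 else 0.0
    return lag * spacing, weight
```

Half the outlier-rejected mean of these shifts, divided by the slew speed, then gives the time delay to apply to each scan.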
If any time-delay correction is required, it is best to do this after background subtraction, so the cross-correlation is dominated by sources, instead of by instrumental effects (e.g., Figure 24). It is also best to do this before RFI subtraction (§3.5, §3.6), which would treat misaligned sources as regions of at least partial RFI contamination.
3.5. 2D Noise Measurement
Key to RFI subtraction in the next section will be a re-measurement of the standard deviation of the now-background-subtracted data on the smallest scales available (§3.2), but this time across each scan instead of along each scan. We refer to this as the 2D noise level of the data, and we measure it directly from the data, using the following technique.
For each point, we draw a line between the point with the most similar position along the preceding scan and the point with the most similar position along the following scan, and measure the central point’s deviation from this line (see Figure 29). For each scan, we robust-Chauvenet reject the point with the most discrepant deviation, update the scan’s deviations, and repeat until all outliers are rejected.^{16}We reject outliers as described in §4 and §6 of Maples et al. 2018, using iterative individual rejection (using the mode broken-line deviation technique, followed by the median 68.3%-value deviation technique, followed by the mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. As in §3.2, we do not precede this with iterative bulk rejection, because deviations change, and must be updated, after each rejection. Deviations are again weighted by the number of dumps that compose the central point and by the corresponding weight of the line at the central point’s angular distance, in this case, across the scan (see Footnote 10). Outliers are typically contaminated by RFI, as well as by residual signal from bright astronomical sources. The sum in quadrature of the post-rejection mean deviation (the systematic component of the deviation) and the post-rejection standard deviation (the random component of the deviation) is what we call the “scan-to-scan” noise measurement.
As in §3.2, we calibrate this technique by applying it to simulated, Gaussian random noise, of known standard deviation. We find that the scan-to-scan technique overestimates the noise’s true standard deviation by 22.9%. We correct each scan’s noise measurement accordingly.
Finally, we combine all of the scans’ noise measurements into a single model for the entire observation, again allowing for a gradual change in the noise level over the course of the observation: As in Figure 10, we fit a line to these data, robust-Chauvenet rejecting outliers (see Figure 30).^{17}We fit this model to the data, simultaneously rejecting outliers, as described in §8 of Maples et al. 2018, using iterative bulk rejection followed by iterative individual rejection (using the generalized mode 68.3%-value deviation technique, followed by the generalized median 68.3%-value deviation technique, followed by the generalized mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted by the number of non-rejected dumps that compose each scan. The final product is a new noise model for each signal measurement in the observation.
3.6. 2D RFI Subtraction
Our algorithm for subtracting RFI, given continuum observations (or given spectral observations but in the case of broadband RFI), is similar to our algorithm for subtracting unwanted backgrounds. As in §3.3, we fit a collection of local models to the data, from which we construct a global model, but with three key differences:
1. In §3.3, we separated sub-background-subtraction-scale structures from larger structures, along the scans. In this section, we separate sub-beamwidth structures from larger structures. RFI can be smaller or larger than this scale along the scans, depending on its duration and the slew speed of the telescope. But unless it coincidentally repeats at the same position along the scan, scan after scan, it will almost always be smaller than this scale across scans (e.g., Figures 5, 16, and 18), as will any residual en-route drift (e.g., Figures 19 and 23). Consequently, instead of modeling the data in 1D, as we did in §3.3, we now model the data in 2D (making use of our 2D noise measurement; §3.5).
2. We use the following, two-parameter local model (instead of a three-parameter quadratic; §3.3):
(9) 
where the independent variable is the 2D angular distance from the model’s center, scaled by a user-defined RFI-subtraction scale; the first of the two fitted parameters normalizes the function in the first term, and the other fitted parameter adds a small, local, background value. This value should be small because the data are already background subtracted, but is allowed to be nonzero because background subtraction cannot be done perfectly. We use the function in the first term because, when the RFI-subtraction scale is comparable to the beamwidth, it mimics a point source, but with shorter wings (see Figure 31). Given this, by setting this scale to just under the true FWHM of a telescope’s beam pattern (see below), it can be used to separate astronomical signal from even marginally sub-beamwidth structures, in the following way:
Centered on each point in the observation, and on additional locations around the peaks of high-S/N sources (see below), we fit this model to all points within the RFI-subtraction scale and calculate the standard deviation of these points about the model. If this standard deviation is greater than the noise level, as measured by the noise model from §3.5, we reject the greatest positive outlier (or the greatest outlier, positive or negative, if the region is noise-dominated),^{18}Although RFI will always be positive, we do this such that noise-dominated regions, which have both positive and negative outliers, albeit only by chance and hence only occasionally, are treated in a relatively unbiased way (see §3.6.1). and refit. We repeat this process until the standard deviation of the non-rejected points is consistent with the noise model (see Figure 32, first panel, for a 1D cross-section, and Figure 33, top panel, for a 2D zoom-in).
This fit can be done analytically (see Appendix B). Furthermore, since the model is insensitive to points beyond the RFI-subtraction scale, only (non-rejected) points within this scale need be included in the fit, streamlining the computation.
This results in a single local model, defined only at its non-rejected points (again, Figure 32, first panel, and Figure 33, top panel). We repeat this process, centering the model on each point in the observation, and on additional locations around the peaks of high-S/N sources,^{19} resulting in a collection of local models, each of which goes through the middle of its non-rejected points (Figure 32, second panel). ^{19}For observation points on the sides of sufficiently high-S/N sources, where the slope is steepest, not to be rejected, a denser sampling of local-model center locations is required: for an approximately Gaussian local model to be tangent to an approximately Gaussian source function at any observation point along it, additional local-model center locations must be added within a beamwidth-scale region about the source center, on a grid fine enough to yield approximately one center location for every observation point there, which we have found to be sufficient. Note, we have also found that these additional center locations are necessary only for the highest-S/N sources, specifically for sources with a peak S/N greater than about a hundred (we add them wherever the peak S/N exceeds 75, using a centroiding algorithm to determine peak locations). For 20-meter and 40-foot observations, this corresponds to only a handful of sources, the brightest in the sky.
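In sketch form, one pass of this local fitting and rejection loop might look like the following Python. The function name and the cosine-squared profile are illustrative stand-ins (the paper's actual profile, Equation 9, is not reproduced here; it similarly mimics a point source and vanishes at the RFI-subtraction scale), and the rejection rule is simplified to "greatest positive outlier" only:

```python
import numpy as np

def fit_local_model(d, values, noise, theta_rfi):
    """Iteratively fit the two-parameter local model at one center.

    d         : 2D angular distances of points from the model center
    values    : background-subtracted measurements at those points
    noise     : local noise level (from the 2D noise model; Sec. 3.5)
    theta_rfi : RFI-subtraction scale

    The profile f below is a hypothetical stand-in for Equation 9.
    """
    keep = d < theta_rfi  # only points inside the scale inform the fit
    f = np.cos(0.5 * np.pi * np.minimum(d / theta_rfi, 1.0)) ** 2
    while True:
        # Weighted fit of amplitude S and local background z; linear in
        # both parameters, so it can be done analytically (Appendix B).
        A = np.column_stack([f[keep], np.ones(keep.sum())])
        (S, z), *_ = np.linalg.lstsq(A, values[keep], rcond=None)
        resid = values[keep] - (S * f[keep] + z)
        if keep.sum() <= 3 or resid.std(ddof=2) <= noise:
            break  # consistent with the noise model (or too few points)
        # Reject the greatest positive outlier and refit.
        keep[np.flatnonzero(keep)[np.argmax(resid)]] = False
    return S, z, keep
```

The returned model is defined only at its non-rejected points, as in the text.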
3. As in §3.3, we construct the global model from the local models by taking the mean of the local models at each point, after robust-Chauvenet rejection of outliers (Figure 32, third panel).^{20} Local-model values are weighted as described in Appendix C. The weight of the resulting global-model value is then given by summing the weights of the non-rejected local-model values, but not before dividing each of these weights by a number that, at least approximately, corrects for the non-independence of that particular weight's local-model value over a scale that is related to the RFI-subtraction scale (see Appendix D). This is the weighted number of dumps that contributed to the global-model value, which we make use of again in §3.7.2. However, if a point has no local models associated with it, instead of interpolating between surrounding global-model values as we do in §3.3, we simply excise the point from the observation and leave it to the surface-modeling algorithm (§1.2.2; see §3.7) to fill the additional gap (Figure 32, fourth and fifth panels, and Figure 33, middle and bottom panels). ^{20}We reject outliers as described in §4–§6 of Maples et al. (2018), using iterative bulk rejection followed by iterative individual rejection (using the mode broken-line deviation technique, followed by the median 68.3%-value deviation technique, followed by the mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements.
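A minimal sketch of this combination step, substituting plain iterative sigma-clipping for the robust Chauvenet rejection of Maples et al. (2018); the function name and the `nsigma` threshold are illustrative, and the weight corrections of Appendices C and D are omitted:

```python
import numpy as np

def global_model_value(local_vals, weights, nsigma=3.0):
    """Combine the local-model values at one point into a global-model
    value: iteratively clip outliers about the weighted mean, then
    return the weighted mean of the survivors and their summed weight
    (the simplified analogue of the weighted number of dumps)."""
    vals = np.asarray(local_vals, dtype=float)
    w = np.asarray(weights, dtype=float)
    keep = np.ones(vals.size, dtype=bool)
    while keep.sum() > 2:
        mean = np.average(vals[keep], weights=w[keep])
        sd = vals[keep].std(ddof=1)
        out = (np.abs(vals - mean) > nsigma * sd) & keep
        if not out.any():
            break
        # Reject the single worst surviving outlier per iteration.
        keep[np.argmax(np.abs(vals - mean) * keep)] = False
    mean = np.average(vals[keep], weights=w[keep])
    return mean, w[keep].sum()
```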
Table 2: Maximum Recommended RFI-Subtraction Scales (beamwidths)

Telescope | Receiver | Left or Right Channel | Left + Right Channels
20-meter | L (HI + OH)^{a} | 0.8 | 0.9
20-meter | L (HI)^{b} | 0.7 | 0.8
20-meter | L (OH)^{b} | 0.9 | 1.1
20-meter | X | 0.8 | 0.8
40-foot | L (HI) | 0.7 | 0.7

^{a}Before August 1, 2014. ^{b}After August 1, 2014.
Also unlike in §3.3, we do not subtract the resulting global model from the data. Rather, the global model is the RFI-subtracted result. Furthermore, given that this is a modeled version of the data, incorporating information from (usually many) nearby points at each point, it is a significantly smoother (but not additionally blurred; §1.2.2) version of the data (see §3.6.1 and §3.6.2).
In theory, the RFI-subtraction scale need only be marginally less than the true FWHM of the beam pattern. However, in practice, the beam pattern might be more peaked than Equation 9, or might be asymmetric, being more compact along one axis. In such cases, one must use a smaller RFI-subtraction scale, or astronomical signal will be mistaken for RFI and removed. Given this, we have determined maximum recommended values for the telescopes and receivers of §2 empirically, by decreasing this scale until the measured brightness (see §4) of bright sources, observed at multiple parallactic angles, plateaued. We list these values in Table 2.
3.6.1 Simulation: Gaussian Random Noise
As in §3.3, we test this algorithm by applying it to simulated data of increasing complexity. We begin by applying it to the background-subtracted Gaussian random noise from the top row of Figure 14, to evaluate its performance in the absence of small- or large-scale structures. We RFI-subtract these data on a near-full-beamwidth scale (0.95 beamwidths; see Figure 34, top row), as well as on a smaller, half-beamwidth scale (Figure 34, bottom row), for comparison. We find that: (1) the RFI-subtracted data are biased neither high nor low; and (2) the noise level of the RFI-subtracted data is indeed significantly less than the noise level of the original, background-subtracted data, having incorporated all of the information within 0.95- and 0.5-beamwidth-radius regions, respectively, about each point (Figure 34).
3.6.2 Simulation: Small-Scale Structures
Next, we apply our RFI-subtraction algorithm, adopting the 0.95-beamwidth scale, to the background-subtracted data from the top row of Figure 17, which includes simulated point sources and short-duration RFI (see Figure 35, top row). We present residuals in the middle row of Figure 35, but here we have additionally subtracted off the Gaussian-random-noise residuals from the top row of Figure 34, to help distinguish residuals that are due to the small-scale structures from those that are due to the Gaussian random noise.
We find that the short-duration RFI, which was reduced only marginally by background subtraction, is now reduced by a factor of 19,000, to 3% of the noise level, when background-subtracted on the scale of the map (24 beamwidths), and is reduced to immeasurable levels when background-subtracted on the 6-beamwidth scale, as well as on smaller scales (Figure 20).
We also find that the source residuals, compared to their post-background-subtraction/pre-RFI-subtraction values from Figure 17, (1) are reduced from sub-noise to completely negligible levels beyond where the sources intersect the noise level, (2) are increased slightly at these boundaries, but (3) are otherwise fairly consistent, within these boundaries, with the three categories of post-background-subtraction residuals that we identified in §3.3.2. As before, this bias is smaller, by a factor of 2–3, in the more realistic case of a less-winged beam function (Figure 35, bottom row).
3.6.3 Simulation: 1D Large-Scale Structures
Next, we apply our RFI-subtraction algorithm, again adopting a 0.95-beamwidth scale, to the background-subtracted data from the top row of Figure 19, which includes residual en-route drift and long-duration RFI (see Figure 36, top row). We present post-RFI-subtraction residuals in the bottom row of Figure 36, but here we have additionally subtracted off the residuals from the top and middle rows of Figures 34 and 35, to help distinguish residuals that are due to the 1D large-scale structures from those that are due to the Gaussian random noise and the small-scale structures, respectively.
Beyond where the sources intersect the noise level, we find that the long-duration RFI is now reduced by a combined (background- and RFI-subtraction) factor of 19,000, to 1% of the noise level, when background-subtracted on the scale of the map (24 beamwidths), and is reduced to immeasurable levels when background-subtracted on the 6-beamwidth scale, as well as on smaller scales (Figure 20).
Also beyond where the sources intersect the noise level, we find that the en-route drift is now reduced by a combined factor of 20, to 16% of the noise level, when background-subtracted on the scale of the map; by a factor of 730, to 0.4% of the noise level, when background-subtracted on the 6-beamwidth scale; and by even greater factors when background-subtracted on even smaller scales (Figure 20). For reference, the basket-weaving technique of Winkel, Flöer & Kraus (2012; §1.2.3) reduces en-route drift by a factor of 10, and then only under idealized circumstances.
Within these boundaries, the residuals are fairly consistent with the post-background-subtraction residuals of Figure 19. These are biased neither high nor low, and are at or below the noise level.
3.6.4 Simulation: 2D Large-Scale Structures
Next, we apply our RFI-subtraction algorithm, again adopting a 0.95-beamwidth scale, to the background-subtracted data from the top row of Figure 22, which includes residual large-scale astronomical and elevation-dependent signal (see Figure 37, top row). We present post-RFI-subtraction residuals in the bottom row of Figure 37, but here we have additionally subtracted off the residuals from the top, middle, and bottom rows of Figures 34, 35, and 36, to help distinguish residuals that are due to the 2D large-scale structures from those that are due to the Gaussian random noise, the small-scale structures, and the 1D large-scale structures, respectively.
Away from the sources, we find that the elevation-dependent signal is now reduced by a combined factor of 2800, to 4% of the noise level, when background-subtracted on the scale of the map (24 beamwidths), and is reduced to immeasurable levels when background-subtracted on the 6-beamwidth scale, as well as on smaller scales (Figure 20).
Also away from the sources, we find that the large-scale astronomical signal is now reduced by a combined factor of 800, to 11% of the noise level, when background-subtracted on the scale of the map; by a factor of 5900, to 2% of the noise level, when background-subtracted on the 6-beamwidth scale; and by even greater factors when background-subtracted on even smaller scales (Figure 20).
Within the vicinity of the sources, the residuals are fairly consistent with the post-background-subtraction residuals of Figure 22. On the smaller background-subtraction scales, these are as likely to be biased high as low, and are at or below the noise level. However, on the 24-beamwidth scale, they remain non-negligible.
3.6.5 20-Meter and 40-Foot Data
We now apply this algorithm to real data. First, we apply it to the 20-meter L-band raster from Figures 5 (raw) and 23 (background-subtracted on small and large scales; see Figure 38). RFI and en-route drift are eliminated on both scales, as is elevation-dependent signal. Large-scale astronomical signal is eliminated on the smaller background-subtraction scale and is significantly reduced on the larger background-subtraction scale.
Next, we test the algorithm in the limit of extreme RFI contamination (see Figure 39, top-left panel). The source of the RFI is a broadband transmitter at the Roanoke, VA airport, about 100 miles south of Green Bank. The signal is linearly polarized, affecting only one of the receiver's two polarization channels. Not only is the RFI almost completely eliminated (Figure 39, top-right panel; compare to the other polarization channel, bottom-left panel), but our time-delay correction algorithm (§3.4), which must be applied before RFI subtraction, correctly aligns the scans despite the extreme contamination, and despite having only lower-signal-to-noise sources to inform its measurement. It measures a time delay of 0.30 seconds, which is nearly the same as the value that it measures from the uncontaminated polarization channel (0.24 seconds).
Note, since this algorithm is independent of the telescope's mapping pattern (§1.2.2), multiple observations can be combined trivially. After determining each observation's 2D noise model (§3.5), one simply appends all of the observations' background-subtracted and time-delay-corrected scans, and each scan's 2D noise-model value, as if they were a single observation, and applies the RFI-subtraction algorithm to them jointly.^{21} We demonstrate this in Figure 40, for the two observations of Andromeda from Figure 24 (as well as in the bottom-right panel of Figure 39, for the two polarization channels of that observation). ^{21}For a sequence of observations centered on a moving object, such as a planet (e.g., see Figure 42), we switch to object-centered coordinates before appending. Failure to do this can cause the object to be partially, or completely, mistaken for RFI, and eliminated.
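The appending step itself amounts to simple concatenation; the dict layout and field names below are hypothetical, not Skynet's actual data structures:

```python
import numpy as np

def append_observations(observations):
    """Append background-subtracted, time-delay-corrected observations
    for joint RFI subtraction.  Each observation is a dict (hypothetical
    layout) holding per-sample sky coordinates, values, and
    2D-noise-model values.  Samples are simply concatenated, each
    keeping its own observation's noise estimate, so the joint pass can
    compare measurements taken at nearly the same sky location but at
    different times."""
    keys = ("x", "y", "value", "noise")
    return {k: np.concatenate([np.asarray(obs[k]) for obs in observations])
            for k in keys}
```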
A distinct advantage of this approach is that not just narrow structures, but also broad structures that exist in one, or multiple, mappings but not in others (such as an extended period of RFI spanning multiple scans, or the sun's diffraction pattern, if it differs between mappings) are eliminated. However, structures that exist in all of the mappings at the same level, at least to within each mapping's separately measured 2D noise level, survive.
Furthermore, by appending multiple mappings, one can eliminate RFI and similar unwanted structures with a substantially smaller RFI-subtraction scale, since each measurement can be compared to other measurements that were taken at nearly the same sky location but at different times, instead of only to surrounding measurements, some of which were taken at nearly the same time (and hence may be similarly contaminated). In fact, if enough observations are taken, this scale can be as small as half of the (local; see §3.7) gap in the mapping pattern. This is particularly advantageous when mapping low-S/N sources, since our RFI algorithm underestimates signal near the noise level (Figure 35). In other words, there is a tradeoff between RFI subtraction and low-S/N photometry, but this can be mitigated by taking multiple observations and using a minimal RFI-subtraction scale.^{22,23} ^{22}Note, this does not apply to two polarization channels acquired simultaneously (e.g., Figure 39), since (unpolarized, or differently polarized) RFI, as well as en-route drift, etc., will likely be correlated between the two channels. ^{23}Alternatively, or additionally, underestimated photometry can be corrected using an empirically determined expression that we introduce in §4. However, this expression is only an approximation, and consequently performs better the less that is asked of it (e.g., if the RFI-subtraction scale is already small, which appending multiple mappings can facilitate).
We demonstrate both of these advantages with six low-S/N mappings of Jupiter, one of which is significantly contaminated by an extended period of RFI, spanning multiple scans (see Figure 41). In the top row of Figure 42, we append all six of these mappings and jointly RFI-subtract them (1) on the maximum recommended RFI-subtraction scale from Table 2 (0.9 beamwidths; left panel), (2) on the minimum recommended RFI-subtraction scale for appended mappings (0.1 beamwidths, since these are 0.2-beamwidth rasters; middle panel), and (3) on an RFI-subtraction scale of almost zero, corresponding to almost no RFI subtraction (right panel). While RFI from the fourth mapping ruins the result on the near-zero scale, it is completely eliminated on the 0.1-beamwidth scale (as well as on larger scales). This can be seen by comparing the top row to the bottom row, where we have repeated the processing, but excluding the contaminated fourth mapping. As expected, the smaller the RFI-subtraction scale, the more near-noise-level signal remains (of course, more noise remains as well).
Yet another advantage of this approach is that it is very efficient computationally (especially when smaller RFI-subtraction scales can be employed). Instead of RFI-subtracting and surface-modeling (§1.2.2; see §3.7) multiple mappings, and then finding a way to combine the resulting images such that structures that differ from image to image are eliminated but the noise level is otherwise reduced, these algorithms need be applied only once. However, each mapping needs to be calibrated to the same level, else data that were calibrated higher will be eliminated in favor of data that were calibrated lower, if separated by more than their respective noise levels. Since all of the observations in Figures 41 and 42 were taken within the same day, and both of the observations in Figures 24 and 40 were taken within two days of each other, gain calibration (§3.1) is sufficient. However, if observations are separated by extended periods of time, or if they are taken from different polarization channels (e.g., Figure 39), the receiver's noise diodes should be cross-calibrated between these observations, ideally by mapping and photometering (see §4) one or more standard sources, such as Cas A, Cyg A, Tau A, or Vir A (e.g., Baars et al. 1977; Trotter et al. 2017).^{24} ^{24}These calibration sources are appropriate for small, single-dish radio telescopes, like the 20-meter and the 40-foot, but are not necessarily appropriate for larger single-dish telescopes, or for interferometers. Also note that Cas A and Tau A are fading, on timescales of years and decades, respectively (e.g., Trotter et al. 2017). However, they can still be used to cross-calibrate observations on shorter timescales.
Otherwise, one need only take care when appending mappings that do not fully coincide. Background-subtracted signal can be underestimated near the ends of scans (§3.3.2), in which case data that were not taken near the end of a scan in one mapping could be eliminated in favor of underestimated data that were taken near the end of a scan in another mapping. We recommend clipping data within 2 beamwidths of the ends of scans before appending such non-coincident mappings (§3.3.2).
When mapping high-S/N point sources, Airy rings are sometimes visible, particularly if the telescope is well focused (see Table 1). However, Airy rings are typically half the width of the point-spread function, and consequently are partially eliminated when the maximum recommended RFI-subtraction scale is used, the first Airy ring in particular (see Figure 43, left panel). If one wishes to retain these structures, one need only halve the RFI-subtraction scale (Figure 43, right panel), though at greater risk of RFI contamination.
Finally, we apply the RFI-subtraction algorithm to the 20-meter X-band daisy from Figure 25, to demonstrate its application to a non-rectangular mapping pattern (see Figure 44). Although these data were not contaminated with RFI, residual en-route drift is eliminated.
3.7. 2D Surface Modeling
In this section, we present our algorithm for "regridding" data into an image. Unlike most algorithms, (1) ours uses weighted modeling instead of weighted averaging, to avoid blurring the image beyond the telescope's native resolution, and (2) ours can be applied after the data have been contaminant-cleaned, instead of having to replace the data with an approximation of itself before contaminant-cleaning can begin (§1.2.2). On this note, our algorithm can be applied at any stage of the contaminant-cleaning process, if visualization is desired. For example, Figures 3, 5, 13, 16, 18, 21, 24 (top row), and 25 (left panel) are visualizations of raw data; Figures 14, 17, 19, 22, 23, 25 (right panel), 26 (left panel), 27 (left panel), and 39 (top-left panel) are visualizations post-background subtraction (§3.3); Figures 24 (bottom row), 26 (right panel), 27 (right panel), 33 (top panel), and 41 are visualizations post-time-delay correction (§3.4); and Figures 33 (middle and bottom panels), 34–38, 39 (all but the top-left panel), 40, 42, 43, and 44 are visualizations post-RFI subtraction (§3.6).
Another advantage of this approach is that any pixel density can be selected for these images. Most algorithms have to limit the pixel density to streamline computation, since contaminant-cleaning is done on the regridded data instead of on the original data. But since we contaminant-clean prior to regridding, this is not an issue. Our default pixel scale is 1/20th of a beamwidth, but this is user-configurable.
For each pixel, we fit a flexible surface model to all data that are within 1 beamwidth, weighting data that are closest to the pixel the highest (see below). We evaluate the fitted model only at this pixel, so it need only fit well there (e.g., Figure 4). We have found that third-order, 2D polynomials (1) are sufficiently flexible to, pixel by pixel, reproduce almost all diffraction-limited structures (e.g., Figure 3),^{25} but (2) use few enough parameters (ten) that these parameters can be well constrained for almost all sampling densities and mapping patterns:^{26}
^{25}The only exception is at the base of sufficiently high-S/N point sources, if the telescope is well focused and the point-spread function falls off steeply (i.e., does not have long wings), and especially if the first Airy ring is partially or completely eliminated by the RFI-subtraction algorithm (§3.6.5). In this case, Equation 11, which is the product of two third-order polynomials where we have dropped fourth- and higher-order terms, recovers a bit too slowly, causing the pixel-by-pixel fitted surface to undershoot in this region, albeit only marginally. Normally, this is barely noticeable, unless the final image is viewed with a nonlinear scaling, such as square-root or hyperbolic-arcsine, to emphasize fainter structures (e.g., see Figure 45). Fourth-order polynomials would be better in this case, but there is seldom enough data to constrain (actually, to overconstrain; see Footnotes 28 and 30) this many parameters. Consequently, we instead provide the user with two surface-model fitting options. The first is just regular regression for weighted data, again evaluated at each pixel in the final image. The second imposes a noise-level prior when the first option yields a negative pixel value, suppressing all negative values (Equation 10; this manifests itself only in the first term of the regression matrix). We normally employ this latter option (e.g., Figure 43), unless surface-modeling (1) pre-RFI-subtraction data, or (2) low-S/N sources post-RFI subtraction (e.g., Figure 42).
^{26}In the event of a very sparse mapping, or a very sparse region of a mapping, Equation 11 can, of course, be underdetermined. Consequently, before applying it, we confirm that, within this 1-beamwidth-radius region, we are fitting to at least ten points, spread over at least five scans, with at least one scan having at least five points. These last two conditions ensure overdetermination along each axis, preventing the possibility of wild oscillations between scans or points. If these conditions are not met, we downgrade to second-order, 2D polynomials. In this case, we first confirm that we are fitting to at least six points, spread over at least four scans, with at least one scan having at least four points (for the same reasons as above). If these conditions are also not met, we again downgrade, to first-order, 2D polynomials (i.e., a plane). In this case, we first confirm that we are fitting to at least three points, spanning at least two scans, with one scan having at least two points (two instead of three, because planes cannot oscillate). Failing this, we excise the pixel from the final image. Note: these downgraded versions of Equation 11 are built in if needed, but are rarely called upon (an exception would be the outskirts of low-petal-number, or oversized, daisies).
(11)  $S(\Delta x, \Delta y) = \sum_{i+j \le 3} a_{ij}\,\Delta x^{i}\,\Delta y^{j}$
where $S$ is the locally modeled signal, $\Delta x$ and $\Delta y$ are the angular distances from the central pixel along each coordinate, and the $a_{ij}$ are the ten polynomial coefficients. At this pixel ($\Delta x = \Delta y = 0$), Equation 11 simplifies to its constant term, which streamlines the computation. We repeat this process for all pixels in the image.
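The per-pixel fit reduces to a small weighted least-squares problem. The following sketch (with illustrative function and variable names) fits the ten third-order terms in pixel-centered coordinates, so the modeled value at the pixel is simply the fitted constant term; the lower-order fallbacks and minimum-point checks described in the footnote above are omitted:

```python
import numpy as np

def surface_value_at_pixel(dx, dy, values, weights):
    """Fit a third-order 2D polynomial (ten terms) to nearby data by
    weighted least squares, in coordinates centered on the pixel, and
    return the modeled value there.

    dx, dy  : angular offsets of the data from the central pixel
    values  : contaminant-cleaned data values
    weights : radial weights (highest for data closest to the pixel)
    """
    # Ten third-order terms: 1, x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3.
    A = np.column_stack([np.ones_like(dx), dx, dy,
                         dx**2, dx*dy, dy**2,
                         dx**3, dx**2*dy, dx*dy**2, dy**3])
    sw = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(A * sw[:, None], values * sw, rcond=None)
    return coeffs[0]  # at the pixel (dx = dy = 0), only the constant survives
```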
It is important that the (radial) weighting function drop to zero at or before this 1-beamwidth hard limit, else there will be discontinuities in the modeled surface as the central pixel's location changes and individual data points enter and exit the set of data that is being fit to. Even in the absence of the 1-beamwidth hard limit, dropping the weight to zero has the advantage of limiting the amount of data that each local model is fit to, again streamlining the computation. For the weighting function, we use a function similar to Equation 9, but raised to a power:
(12) 
where:
(13) 
and where the FWHM of the weighting function is user-defined (see Figure 46). Equation 12 approximates the beam function of the telescope when this FWHM is approximately one beamwidth (Figure 31).
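Since Equations 12 and 13 are not reproduced here, the following is only an illustrative stand-in with the stated properties: unit weight at zero separation, half weight at half the user-set FWHM, and an exact (not asymptotic) cutoff, so data points enter and leave each pixel's fit without introducing discontinuities:

```python
import numpy as np

def radial_weight(r, fwhm, p=2.0):
    """Illustrative radial weighting function (not the paper's actual
    Equations 12-13).  Equals 1 at r = 0, 1/2 at r = fwhm/2, and tapers
    smoothly to exactly zero.  r and fwhm are in beamwidths; p is an
    illustrative shape exponent."""
    r = np.asarray(r, dtype=float)
    # Scale chosen so that the half-weight radius lands at fwhm/2.
    s = (2.0 / np.pi) * np.arccos(0.5 ** (1.0 / p)) / (0.5 * fwhm)
    return np.where(r * s < 1.0,
                    np.cos(0.5 * np.pi * np.minimum(r * s, 1.0)) ** p,
                    0.0)
```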
Smaller values of this FWHM favor the data that are closest to the central pixel, resulting in a more accurately modeled value at this pixel. However, smaller values also reduce the weighted number of data points that contribute to the fit, resulting in a less precisely modeled value at this pixel, and hence a noisier image overall. We have found that Equation 11 is sufficiently flexible to reproduce almost all (Footnote 27) diffraction-limited structures, to 99% accuracy, if the FWHM is sufficiently small (see Figure 47, top two rows). That said:
1. Unless one is trying to visualize narrow contaminants, such as RFI, en-route drift, etc., there is no reason to use FWHM values below the Nyquist scale, because there is no additional diffraction-limited information to be gained on sub-Nyquist scales.
2. Moderately larger values result in only marginally reduced accuracy, but in significantly greater precision. For example, FWHMs ranging up to 1 beamwidth result in underestimates of only 2%, 4%, and 6% at the source's peak, and these underestimates are almost perfectly compensated by overestimates at the source's base (Figure 47, top two rows).^{33} This corresponds to additional blurring of the source, but only by 1/2%, 1%, and 2%, respectively. However, it does result in significantly smoother images (see Figure 48). ^{33}Compare this to 20%, 30%, and 40% underestimates at the source's peak, and consequently to significantly blurred reconstructions, in the case of weighted averaging (Figure 3).
3. The FWHM must be sufficiently large to encompass enough data to constrain, and preferably to overconstrain, Equation 11. Otherwise, the fitted 2D polynomials may vary wildly between the data points (Figure 47, bottom row). To avoid this, we find that the FWHM should be larger than a multiple of the largest gap between data points in the vicinity of the central pixel (see below).^{34} ^{34}This factor ensures that at least five scans are encompassed within the corresponding radius about the central pixel. This is the minimum number of scans necessary to overconstrain Equation 11 (see Footnote 28); scans beyond this radius carry too little weight to meaningfully contribute (Figure 46).
Consequently, we recommend that:
(14) 
where the minimum weighting scale is smaller when one is trying to visualize sub-Nyquist-scale contaminants, and larger when one is trying to visualize diffraction-limited structures, and where we have capped the weighting scale at 1 beamwidth, because larger values make the weighting function (Equation 12) increasingly flat-topped, and only diminishingly wider. For our post-RFI-subtraction images of diffraction-limited structures, we have used (1) the smaller minimum weighting scale when evaluating the performance of various aspects of the algorithm (e.g., Figures 34, 35, 36, 37, 39 (here, everywhere in the image; e.g., see Figure 49, bottom-left panel), 40, 42, 43, and 45), and (2) our larger default value when a more polished, final image is desired (e.g., Figures 38 and 44).
Although the largest gap can be modeled globally, based on the expected mapping pattern, we find it better to measure it locally, in case the telescope did not move exactly as expected (e.g., due to acceleration at the beginnings of scans, due to wind, etc.), or in case data points are removed by the RFI-subtraction algorithm. To do this, we employ a custom "bubble-blowing" algorithm about each data point (see Figure 49). Generally, this gives values equal to, or slightly greater than, whichever is larger: the gap between the data point and the closest data point in the preceding scan, or the gap between the data point and the closest data point in the following scan. If a data point is on the edge of the mapping pattern, its largest gap is infinite, corresponding to the limiting weighting scale of 1 beamwidth (Equation 14). We then determine the weighting scale that should be used at each pixel by interpolating between the weighting scales measured at each data point.^{35} ^{35}We interpolate between these values by employing proximity-weighted averaging, with a weighting function (Equation 15) chosen such that (1) the weighted average yields the same value as the data point at zero separation, and (2) only data points within one beamwidth need be included in the weighted average, keeping it computationally efficient. Its exponent (Equation 16) is chosen such that the area under the weighting function approximately matches the area under a corresponding Gaussian. This is the smallest range that one can use and still get relatively smooth transitions between data-point values; smaller ranges result in more abrupt transitions, and a tessellated, or patchwork, pattern, with one patch per data point.
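The stated behavior of this gap measurement (without the bubble-blowing machinery itself) can be sketched as:

```python
import numpy as np

def local_gap(point, prev_scan, next_scan):
    """Rough stand-in for the locally measured largest gap.

    Per the stated behavior of the paper's custom bubble-blowing
    algorithm, the result is (approximately) the larger of the distances
    from the data point to the closest point in the preceding scan and
    to the closest point in the following scan, and infinite on the edge
    of the mapping pattern (no neighboring scan on one side)."""
    def nearest(scan):
        scan = np.asarray(scan, dtype=float).reshape(-1, 2)
        if scan.shape[0] == 0:
            return np.inf  # map edge: no neighboring scan
        return np.hypot(scan[:, 0] - point[0], scan[:, 1] - point[1]).min()
    return max(nearest(prev_scan), nearest(next_scan))
```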
With rasters, the largest gap is approximately constant across the interior of the mapping (Figure 49, top right). Consequently, rasters should be designed with gaps, both between and along scans, that are no larger than 3/4 of the weighting scale; for weighting scales of 1/3 – 2/3 beamwidths, this is 1/4 – 1/2 beamwidths. Our recommendation is 1/5 – 1/3 beamwidths, since OTF mappings usually result in marginally larger gap measurements, particularly at the ends/beginnings of scans, where the telescope changes direction, and under windy conditions.
With noddings, and especially with daisies, the largest gap increases toward the extremities of the mapping, where the telescope changes direction (Figure 2; Figure 49, bottom row), and consequently this condition cannot always be met. However, as long as there are enough, and sufficiently distributed (Footnote 28), data to fit to, this algorithm will still model the surface. Although the accuracy of the reconstruction cannot be guaranteed beyond a largest gap of 0.4 beamwidths, corresponding to Nyquist sampling, undersampled mappings, or undersampled regions of mappings, can still be useful for quick-look, and student, results.
Although we have focused on three mapping patterns in this paper, this surface-modeling algorithm, as well as our RFI-subtraction algorithm, works regardless of the design and details of the mapping pattern. We demonstrate this in Figure 50, where we have instead processed the top-right panel of Figure 49 and the left panel of Figure 38 in Galactic coordinates, even though these data were acquired on a horizontal raster pattern in equatorial coordinates.^{36} We make use of this in Paper II, where we demonstrate many of these papers' techniques simultaneously, in the processing of a ~100-square-degree X-band survey of the Galactic plane. ^{36}Our time-delay correction algorithm, as well as our 2D noise-measurement algorithm, makes use of measurements that were made at, or nearly at, the same position along adjacent scans. This requires a coordinate along the scans, and consequently is most easily done in the coordinate system in which the mapping pattern was carried out. Consequently, we keep everything in this coordinate system, be it equatorial, Galactic, etc., through this part of the algorithm. If a different end-product coordinate system is requested by the user, we convert to it before applying our RFI-subtraction and surface-modeling algorithms. We currently allow users to configure and execute mapping patterns in either equatorial or Galactic coordinates, and to process their images in either of these coordinate systems, regardless of which was used when acquiring the data.
Finally, we set the edge(s) of the map. Upon entry, we translate each right ascension/Galactic longitude coordinate value such that the map is centered on zero, and then we multiply each translated value by the cosine of its corresponding declination/Galactic latitude coordinate value. This is known as the sinusoidal, or Sanson-Flamsteed, or Mercator equal-area, projection, which minimizes map distortion not only near the map's center, but also along the horizontal and vertical lines that pass through the map's center (Cossin 1570). Furthermore, the 20-meter maps in these sinusoidally corrected coordinates. Consequently, for rasters, the four edges are given by fitting straight lines to the sinusoidally corrected coordinate values of the first and last scans, and of the highest and lowest (or leftmost and rightmost) points in each scan, rejecting any outliers.^{37} For daisies, the single, circular edge is given by averaging the sinusoidally corrected angular distances between the daisy's center and the most distant point in each petal, again rejecting outliers.^{38} ^{37}We fit this model to the data, simultaneously rejecting outliers, as described in §8 of Maples et al. (2018), using iterative bulk rejection followed by iterative individual rejection (using the generalized mode 68.3%-value deviation technique, followed by the generalized median 68.3%-value deviation technique, followed by the generalized mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted equally. ^{38}We reject outliers as described in §4 – §6 of Maples et al. (2018), using iterative bulk rejection followed by iterative individual rejection (using the mode broken-line deviation technique, followed by the median 68.3%-value deviation technique, followed by the mean standard deviation technique), using the smaller of the low and high one-sided deviation measurements. Data are weighted equally.
The 40-foot, on the other hand, is a meridian-transit telescope, and consequently maps in equatorial coordinates without being able to pre-correct for map distortion. Hence, for noddings, the four edges are determined as in the raster case, but using the original, pre-correction coordinate values. All calculated edges are then carried forward into whichever final coordinate system the user selects (changing their shape if this is a different coordinate system). We also give the user the option of excising data within (typically) two beamwidths of edges that were fitted to the beginnings and ends of scans (§3.3.2), which can be important when appending mappings that do not fully coincide (§3.6.5).
3.7.1 Application to Asymmetric Structures
We have applied this algorithm to simulated and real data throughout this paper. Most of these applications have been to symmetric structures, such as point sources. However, Equation 11, due to its cross terms, imposes no artificial symmetries, allowing it to model equally well, e.g., asymmetric beam patterns (Figure 3, third row), asymmetric sources (see Figure 51), and asymmetric regions (see Figure 52).
3.7.2 Default Data Products
Upon completion of each 20-meter mapping, Skynet automatically produces the following data products:
Raw Maps: We gain-calibrate the data, but otherwise do not process them, except for the application of a surface model so the user can visualize the pre-processed, raw data. For this, we use a reduced minimum weighting scale (Equation 14), to better visualize sub-beamwidth contaminants (§3.7). We do this for each polarization channel, and for the average of both channels (after multiplying each by appropriate flux-density conversion factors, if this information is available; Footnote 9).
Contaminant-Cleaned Maps: We fully process the data using our default settings: (1) we gain-calibrate the data (§3.1); (2) we background-subtract the gain-calibrated data, using the minimum recommended scale from Table 1 (§3.3); (3) we time-delay correct the background-subtracted data (§3.4); (4) we apply our RFI-subtraction algorithm to the time-delay-corrected data, using the maximum recommended scale from Table 2, or, if the mapping contains a sufficiently high-S/N source, half of this value, to preserve visible Airy rings (§3.6); and (5) we apply our surface-modeling algorithm to the RFI-subtracted data, using our default minimum weighting scale of 2/3 beamwidths, as well as the noise-level prior from Footnote 27 (§3.7). Again, we do this for each polarization channel, and for the sum of both channels. (These settings can be changed by the user afterward, and the data reprocessed.)
Path Maps: Path maps plot one point at the coordinates of each signal integration in the OTF mapping, alternating between white and gray for alternating scans (e.g., see Figure 53, top-left panel). We generate two path maps: one with the raw maps, using pre-time-delay-corrected coordinates and including all of the measurements, and one with the contaminant-cleaned maps, using post-time-delay-corrected coordinates and including all of the measurements except those removed by the RFI-subtraction algorithm.
Path maps are useful for determining whether the telescope had difficulty maintaining the desired spacing between scans, due to momentum-driven overshooting when changing directions (in which case a slower, or possibly denser, or simply differently patterned, mapping might be considered), due to wind, or due to motor or encoder error. They are also useful for seeing which measurements, if any, were removed by the RFI-subtraction algorithm. Since our algorithms are fairly insensitive to the design and details of the mapping pattern, or path, these effects should not significantly affect the results, unless they leave too large a gap in the mapping pattern for the surface-modeling algorithm to be effective (§3.7).
Scale Maps: Scale maps are calculated from the information in the path maps, and visualize the weighting scale that is used to calculate the surface model at each pixel in the final image (§3.7). Examples are given in Figure 49 for beamwidths (Equation 14), but often , in which case the scale map, at least in such regions, is simply single-valued (Figure 53, top-right panel).
Scale maps are also important when doing photometry. For most maps, Nyquist sampling (0.4 beamwidths) results in largest-gap values of – 0.55 beamwidths, which in turn results in weighting-scale values of – 3/4 beamwidths (for beamwidths). Hence, if beamwidths within a photometric aperture, the result should be taken with a grain of salt (§3.7). (This, however, is not the case within a photometric annulus, as the background need not be sampled on Nyquist scales; see §4.)
Weight Maps: Weight maps record the weighted number of data points to which Equation 11 was fitted, for each pixel in the final image. Each data point's weight is given by (1) the product of (a) the proximity-dependent weight that was used in the fit, given by Equations 12 – 14, and (b) the weighted number of dumps that contributed to the data point's post-RFI-subtraction value (Footnote 22), (2) divided by a number that, at least approximately, corrects for the non-independence of this value over a scale that is related to the RFI-subtraction scale (see Appendix D). We demonstrate the effect of just (1) in the bottom-left panel of Figure 53, where we see larger values (a) where there is a higher density of data points, and (b) where the weighting scale is larger, resulting in larger weights per data point. We demonstrate the effect of both (1) and (2) in the bottom-right panel of Figure 53. Since (far) fewer data points contribute to each RFI-subtraction local model in the vicinity of sources (most are rejected; e.g., Figure 33, top panel), less information informs the post-RFI-subtraction values in the vicinity of sources.
Weight maps are important if stacking (averaging) images, since some maps, and some regions of the same map, are more constraining than others, and consequently should be weighted appropriately. (That said, it is still preferable to append all of the mappings' scans before RFI subtraction, and surface-model only once; §3.6.) Weight maps are also important when doing photometry, both when determining the background level within an annulus, and when determining what the error bar should be, since (again, far) less information informs the surface model in the vicinity of sources, and hence in the aperture, than in the annulus (see §4).
Correlation Maps: Correlation maps record the scale over which pixel data are correlated, which is important when calculating photometric error bars from a single image. Calculational details are provided in Appendix D.
The calculation of photometric error bars makes use of the image, the image's weight map, and the image's correlation map. If calculating photometric error bars from a weighted average of multiple images, we recommend (1) producing a new weight map by summing the individual weight maps, and (2) producing a new correlation map by taking the weighted average of the individual correlation maps. (But again, the better approach is to append all of the mappings' scans before RFI subtraction, and surface-model only once, producing a single image, a single weight map, and a single correlation map.)
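The recommended combination of companion maps when averaging multiple images can be sketched as follows. This is a minimal NumPy sketch; the function name is our own, and weighting the correlation-map average by the weight maps is our assumption:

```python
import numpy as np

def combine_maps(images, weight_maps, corr_maps):
    """Weighted-average stacking of single-dish images, combining the
    companion maps as recommended above: the new weight map is the sum
    of the individual weight maps, and the new correlation map is the
    weighted average of the individual correlation maps."""
    imgs = np.asarray(images, dtype=float)
    wts = np.asarray(weight_maps, dtype=float)
    corr = np.asarray(corr_maps, dtype=float)
    w_sum = wts.sum(axis=0)                      # new weight map
    stacked = (imgs * wts).sum(axis=0) / w_sum   # weighted-average image
    corr_avg = (corr * wts).sum(axis=0) / w_sum  # new correlation map
    return stacked, w_sum, corr_avg
```

This assumes every pixel is covered by at least one mapping (nonzero summed weight); as the text notes, appending all scans and surface-modeling once is preferable when possible.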
4. Aperture Photometry
Given an accurately modeled, contaminant-cleaned surface, spanning a rectangular grid of pixels, one can perform operations on it, much as one would on a reduced optical image. In this section, we present an aperture-photometry algorithm, which we use to test the accuracy of our small-scale structure maps.
However, unlike in optical images, where each pixel is measured directly, in these images each pixel is a reconstruction, based on the data points around it. And since we cannot guarantee the accuracy of this reconstruction in regions where the weighting scale beamwidths (§3.7.2), we do not recommend photometering sources in such regions. This is easy enough to avoid by designing a sufficiently dense mapping, though additional care must be taken with noddings and daisies, for which increases toward the extremities of the mapping (§3.7).^{39} It is not difficult to show that this condition is met within beamwidths of a daisy's center, where is the number of petals in the daisy. Our users can choose between , 8, 12, 16, and 20. We recommend for higher-quality maps, and especially for photometry. However, this condition need only be met within the aperture (which for daisies will lie, with the source, at the daisy's center). It is not necessary to meet this condition throughout the annulus (§3.7.2).
Assuming that this condition is met, we begin by centroiding the aperture and annulus on the source. We do this by fitting a simpler, second-order version of Equation 11 (see Footnote 28), with a fixed weighting scale of beamwidth, to every pixel within a 1-beamwidth radius of a first-guess pixel. The centroid is given by the extremum of this function. We then iterate until convergence.
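The iterative centroiding procedure can be sketched as follows. This is a simplified stand-in: a plain weighted second-order 2D polynomial replaces the second-order version of Equation 11, the Gaussian weighting form and its scale are illustrative assumptions, and no outlier rejection is performed:

```python
import numpy as np

def centroid(image, x, y, x0, y0, radius=1.0, scale=0.5,
             tol=1e-4, max_iter=20):
    """Iteratively centroid a source: fit a weighted second-order 2D
    polynomial to all pixels within `radius` of the current guess, move
    the guess to the fitted extremum, and repeat until convergence.
    `radius` and `scale` are in the same units as x and y (e.g.,
    beamwidths)."""
    for _ in range(max_iter):
        r2 = (x - x0) ** 2 + (y - y0) ** 2
        sel = r2 <= radius ** 2
        xs, ys, zs = x[sel] - x0, y[sel] - y0, image[sel]
        w = np.exp(-0.5 * (xs ** 2 + ys ** 2) / scale ** 2)
        # Design matrix for z = a + b*x + c*y + d*x^2 + e*x*y + f*y^2.
        A = np.column_stack([np.ones_like(xs), xs, ys,
                             xs ** 2, xs * ys, ys ** 2])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], zs * sw, rcond=None)
        a, b, c, d, e, f = coef
        # Extremum of the fit: solve [2d e; e 2f][dx dy]^T = [-b -c]^T.
        dx, dy = np.linalg.solve([[2 * d, e], [e, 2 * f]], [-b, -c])
        x0, y0 = x0 + dx, y0 + dy
        if dx * dx + dy * dy < tol ** 2:
            break
    return x0, y0
```

At convergence the weighting is centered on the estimate itself, so for a symmetric source the fitted extremum coincides with the true peak.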
Source Ratio | Brightness (Measured / Corrected / True) | Percent Error (Measured / After Correction) | Error Bars, % (Internal, Calculated / Internal, Simulated / External, from correction / Total, from calculation and correction)
2 / 1 | 0.252 / 0.252 / 0.255 | 1.3 / 1.0 | 0.2 / 0.1 / 0.2 / 0.2
3 / 1 | 0.110 / 0.111 / 0.112 | 1.7 / 0.2 | 0.2 / 0.4 / 0.6 / 0.7
4 / 1 | 0.061 / 0.064 / 0.062 | 2.5 / 2.7 | 0.4 / 0.8 / 2.0 / 2.0
5 / 1 | 0.038 / 0.040 / 0.040 | 5.2 / 0.1 | 1.0 / 1.0 / 2.0 / 2.2
6 / 1 | 0.028 / 0.032 / 0.031 | 12.4 / 1.6 | 1.7 / 0.7 / 5.3 / 5.6
7 / 1 | 0.018 / 0.026 / 0.022 | 18.8 / 16.4 | 1.9 / 3.3 / 10.6 / 10.8
8 / 1 | 0.016 / 0.019 / 0.020 | 23.1 / 4.6 | 2.1 / 1.3 / 7.3 / 7.6
9 / 1 | 0.009 / 0.015 / 0.012 | 26.8 / 22.1 | 2.7 / 2.3 / 13.1 / 13.4
10 / 1 | 0.007 / 0.012 / 0.010 | 35.1 / 14.8 | 2.3 / 2.5 / 13.8 / 14.0
Next, we measure the background level, the standard deviation about the background level, and the uncertainty in our measurement of the background level, from the pixels in the annulus (the background level should be approximately zero in our small-scale structure maps, since these have already been background-subtracted).^{40} Given the availability of our small-scale structure maps, which have been more carefully background-subtracted than one can achieve with just an annulus, we recommend using these over their accompanying large-scale structure maps (see §2 of Paper II) when photometering sources (except perhaps within 1 – 2 beamwidths of the beginnings and ends of scans, where the background-subtraction algorithm can underestimate the brightness of sources in our small-scale structure maps; §3.3.2). We measure these by calculating the weighted mean, the weighted standard deviation, and the weighted uncertainty in the mean, respectively, of the pixels in the annulus, making use of (1) the weight map (§3.7.2), and (2) robust Chauvenet rejection (§1.3), to eliminate pixels that are contaminated by Airy rings, other sources, etc. (see Figure 54).^{41} We reject outliers as described in §4 – §6 of Maples et al. (2018), using iterative bulk rejection followed by iterative individual rejection (using the mode + broken-line deviation technique, followed by the median + 68.3% value deviation technique, followed by the mean + standard deviation technique), using the low one-sided deviation measurement to reject low outliers and the high one-sided deviation measurement to reject high outliers. This is because the low deviations can be artificially suppressed by the (default) noise-level prior (Footnote 27; however, see Appendix D and Footnote 41). (Pixel) data are weighted by the weight map. We then sum the pixel values in the aperture, subtracting the weighted-mean background level from each. (Aperture and annulus sizes are user-configurable.)
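The weighted background measurement can be sketched as follows. Simple symmetric sigma clipping stands in here for the robust Chauvenet rejection (RCR) used above; the function name and clipping threshold are our own illustrative choices:

```python
import numpy as np

def annulus_background(values, weights, nsigma=3.0, max_iter=10):
    """Estimate the background level, the scatter about it, and the
    uncertainty in the level, from annulus pixel values and their
    weight-map values.  Iterative sigma clipping (a stand-in for RCR)
    removes pixels contaminated by Airy rings, other sources, etc."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    keep = np.ones(v.size, dtype=bool)
    for _ in range(max_iter):
        mean = np.average(v[keep], weights=w[keep])          # weighted mean
        std = np.sqrt(np.average((v[keep] - mean) ** 2,
                                 weights=w[keep]))           # weighted std
        new = np.abs(v - mean) <= nsigma * std
        if (new == keep).all():
            break
        keep = new
    # Uncertainty in the weighted mean, treating kept pixels as
    # independent (the weight and correlation maps correct for
    # non-independence in the full algorithm).
    err = std / np.sqrt(keep.sum())
    return mean, std, err
```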
The total photometric error bar is then given by:
(17) 
where the sum is over the number of pixels in the aperture, is each pixel's weight-map value, is the average weight-map value of the pixels in the annulus that were used to measure , is a number that, at least approximately, corrects for the non-independence of each pixel's value over both the RFI-subtraction and surface-modeling (weighting) scales, and is an empirically determined correction factor. Calculational details are provided in Appendix D.
We now test the accuracy of our small-scale structure mapping algorithm (as well as the accuracy of Equation 17) by photometering the simulated sources from §3.3 and §3.6. Results for a middle-of-the-road set of processing parameters are presented in Table 3: the measured values underestimate the true values, only marginally for the highest-S/N sources, but increasingly so for lower-S/N sources. Furthermore, these underestimates are significant relative to the expected level of uncertainty, measured both by Equation 17 and from 100 re-simulations of the data, in each of which we have randomized the noise and en-route drift.
However, this is expected: in §3.6.3, we demonstrated that the RFI-subtraction algorithm can eliminate noise-level signal from sources, dimming low-S/N sources in particular, especially when is large. To a lesser degree, this is also the case with the surface-modeling algorithm, when is large. We have measured these effects for a wide range of and values (for cases where is dominated by ; Equation 14), as well as for different aperture sizes, beam functions, and mapping densities, and have assembled the following empirical correction factor:
(18) 
where:
(19) 
and where is the signal level at the peak of the source,^{42} This is given by summing the subtracted pixel values in the aperture out to a user-selected radius, and dividing this by the sum of the peak-normalized beam function, evaluated at these same locations. For the cosine-squared beam function in Figure 31, the latter sum is approximately given by: (20) where is the user-selected radius in beamwidths, and is the number of beamwidths per pixel (our default value is 0.05; §3.7). For the Gaussian beam function in Figure 31, this is instead given by: (21) These two expressions are nearly identical for beamwidths, and differ by only 13% at beamwidth. If the point-spread function is not known, we recommend using either of these functions, but with . is the standard deviation of the pixel values in the annulus from above, and are measured in beamwidths, , and:
(22) 
The uncertainty in is approximately given by:
(23) 
where:
(24) 
We find that these expressions hold relatively independently of whether the beam function is narrow or broad-winged, and of the density of the mapping. We apply this correction in Table 3, where it raises the measured values of the sources, and of the lower-S/N sources in particular, to within, on average, one total error bar of their true values (the total error bar equals the internal error bar, given by Equation 17, and the external error bar, given by Equation 23 times the measured value, added in quadrature). (Note that we do not apply this correction to the internal error bars themselves, as they are not significant functions of source brightness; see Appendix D. Also, these expressions are valid only for values of between 1/3 beamwidths and 2/3 beamwidths; §3.7.)
Source Ratio | Measured Flux-Density Ratio (Average of 24) | Modeled Flux-Density Ratio (Baars et al. 1977) | Modeled Flux-Density Ratio (Trotter et al. 2017)
Cas A / Cyg A^{b} | | |
Tau A / Cyg A^{b} | | |
Vir A / Cyg A | | |
^{b} Baars et al. (1977) overestimated the fading of Cas A, and did not model (and hence underestimated) the fading of Tau A. These have been corrected, and the spectral models of Baars et al. (1977) also improved upon, in Trotter et al. (2017; see also Reichart & Stephens 2000).
Finally, we test the accuracy of our small-scale structure mapping algorithm by photometering real sources, specifically the primary calibration sources Cas A, Cyg A, Tau A, and Vir A, and by comparing their flux-density ratios to previously modeled expectations. Using the 20-meter in L band, we collected 24 rasters of each source, all within a few days. In this case, we background-subtracted each with a 6-beamwidth scale (Table 1), time-delay corrected each, RFI-subtracted each with a 0.8-beamwidth scale (Table 2), and surface-modeled each with our default minimum weighting scale of 2/3 beamwidths (and our, also default, noise-level prior). We photometered each with a 2.5-beamwidth-diameter aperture, corresponding to the apparent minimum between the source and the first Airy ring (e.g., Figure 54), and took ratios of these values, and averages of these ratios, as we have done previously in Trotter et al. (2017). We list these average ratios in Table 4; they match their expected values (Baars et al. 1977; Trotter et al. 2017) within the uncertainties.
5. Conclusion
In this paper, we have presented a single-dish mapping algorithm with the following features:
1. We use robust Chauvenet rejection (RCR; §1.3) to improve gain calibration, making this procedure insensitive to RFI contamination (as long as it is not total, or nearly so), to catching the noise diode in transition, and to the background level ramping up or down (linearly), for whatever reason, during the calibration (§3.1).
2. We again use RCR to measure the noise level of the data, in this case from point to point along the scans, also allowing this level to ramp up or down (again, linearly) over the course of the observation (§3.2). We then use this noise model to background-subtract the data along each scan, without significantly biasing these data high or low (§3.3). We do this by modeling the background locally, within a user-defined scale, instead of globally, and hence less flexibly (as, e.g., basket-weaving approaches do). This significantly reduces, if not outright eliminates, most signal contaminants: en-route drift, long-duration (but not short-duration) RFI, astronomical signal on larger scales, and elevation-dependent signal (e.g., Figure 20). Furthermore, this procedure requires only a single mapping (also unlike basket-weaving approaches).
3. We use RCR to correct for any time delay between our signal measurements and our coordinate measurements (§3.4). This method is robust against contamination by short-duration RFI and residual long-duration RFI. (In general, this procedure requires that the telescope's slew speed remain nearly constant throughout the mapping, or at least during its scans if not between them, though we do offer a modification such that it can also be applied to variable-speed, daisy mapping patterns centered on a source.)
4. We again measure the noise level of the data, but this time from point to point across the scans, again allowing this level to ramp up or down (again, linearly) over the course of the observation (§3.5). We then use this noise model to RFI-subtract the data, again without significantly biasing these data high or low (§3.6). We do this by modeling the RFI-subtracted signal locally, over a user-defined scale; structures that are smaller than this scale, either along or across scans, are eliminated, including short-duration RFI, residual long-duration RFI, residual en-route drift, etc. (e.g., Figure 20). This scale can be set to preserve only diffraction-limited point sources and larger structures, or it can be halved to additionally preserve Airy rings, which are visible around the brightest sources. Furthermore, this procedure can be applied to multiple observations simultaneously, in which case even smaller scales can be used (better preserving noise-level signal, and hence faint, low-S/N sources).
5. To interpolate between signal measurements, we introduce an algorithm for modeling the data over a user-defined weighting scale (though the algorithm can increase this scale, from place to place in the image, if more data are required for a stable, local solution; §3.7). Advantages of this approach are: (1) it does not blur the image beyond its native, diffraction-limited resolution; (2) it may be applied at any stage in our contaminant-cleaning algorithm, for visualization of each step, if desired; and (3) any pixel density may be selected. This stands in contrast to existing algorithms, which use weighted averaging to regrid the data: (1) this does blur the image beyond its native resolution, often significantly; (2) it is usually done before contaminant cleaning takes place, because existing contaminant-cleaning algorithms (unlike ours) require gridded data; and (3) the pixel density is then necessarily limited to what these contaminant-cleaning algorithms can handle computationally (§1.2.3).
Furthermore, since our surface-modeling algorithm does not require gridded data, images can be produced in any coordinate system, regardless of how the mapping pattern was designed. And since our surface-modeling algorithm does not assume any coordinate-system-based symmetries, it works equally well with asymmetric structures. In addition to the final image, we produce a path map, a scale map, a weight map, and a correlation map, the latter three of which are important when performing photometry on the final image.
6. Lastly, we introduce an aperture-photometry algorithm for use with these images (§4). In particular, we introduce a semi-empirical method for estimating photometric error bars from a single image, which is non-trivial given the non-independence of pixel values in these reconstructed images (unlike in, e.g., CCD images, where each pixel value is independent, and consequently the statistics are simpler). We also provide an empirical correction for low-S/N photometry, which can be underestimated in these reconstructed images.
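The local-modeling idea that recurs in items 2, 4, and 5 above can be illustrated with a minimal one-dimensional sketch: at each sample along a scan, a low-order polynomial is fitted to the neighboring samples within a user-defined window, and evaluated at that sample. The function name, the unweighted fit, and the absence of outlier rejection are simplifications of the actual procedure (which weights the local models, as in Appendix A, and rejects outliers with RCR):

```python
import numpy as np

def local_background(x, signal, scale, order=1):
    """Model the drifting background locally along one scan: for each
    sample, fit a polynomial of the given order to all samples within
    +/- scale/2 of it, and evaluate the fit at that sample."""
    bg = np.empty_like(signal, dtype=float)
    for i, xi in enumerate(x):
        sel = np.abs(x - xi) <= scale / 2.0
        coef = np.polyfit(x[sel] - xi, signal[sel], order)
        bg[i] = coef[-1]  # the fit evaluated at the sample (offset 0)
    return bg
```

Subtracting the returned model removes drift on scales larger than `scale` along the scan; in the full algorithm, the outlier rejection keeps bright sources from biasing the local fits.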
Additionally, we have provided a technical description of Green Bank Observatory's 20-meter diameter telescope, which we refurbished as part of this effort, and which is now being used by thousands of students and researchers each year, remotely, through our Skynet interface (§2.1).
And while developed because of this telescope, the single-dish mapping and aperture-photometry algorithms presented in this paper are not specific to this telescope. We demonstrated this by applying these algorithms to a smaller, and less capable, telescope, but there is no reason why they may not also be applied to larger, and more capable, telescopes (e.g., we recently applied them, successfully, to Green Bank Telescope mapping observations). These algorithms could also be applied to similarly collected data, with similar noise and beam characteristics, at other wavelengths, as well as from non-astronomical experiments.^{43} Furthermore, components of these algorithms could be used to improve interferometric reductions: e.g., RCR could be used to eliminate outlying interferometric measurements, and our surface-modeling algorithm could be modified to fill gaps in the uv plane.
In the second paper in this series (Dutton et al. 2018), we expand on the small-scale structure mapping algorithm that we have presented in this paper in two ways: (1) we introduce an algorithm to additionally contaminant-clean and map large-scale structures, and (2) we introduce an algorithm to contaminant-clean and map spectral data, as opposed to just continuum data, as in this paper. Finally, we present an X-band survey of the Galactic plane, from , to further demonstrate, and further test, the techniques that we present in both of these papers.
Appendix A. Local Model Weights for Construction of Global Background Model
The local model for background subtraction is given by:
B(x) = a + b (x − x̄) + ε c (x − x̄)²    (A1)
where x is angular distance along a scan, x̄ is the dump-weighted mean angular distance of all of the non-rejected points to which the local model is fitted, a, b, and c are the fitted parameters, and ε is zero for linear models and one for quadratic models (§3.3).
Let σ_a, σ_b, and σ_c be the uncertainties in the fitted values of a, b, and c, respectively. Given the construction of the local model, these uncertainties will be largely uncorrelated (§8 of Maples et al. 2018). Consequently, we approximate the variance of the fitted model, as a function of position in the model, by:
σ²(x) ≈ σ_a² + σ_b² (x − x̄)² + ε σ_c² (x − x̄)⁴    (A2)
Furthermore, each term will contribute to the variance approximately equally, yielding:
σ²(x) ≈ 2 σ_a²    (A3)
for linear models and
σ²(x) ≈ 3 σ_a²    (A4)
for quadratic models. Consequently:
(A5) 
where and (Equation 4).
Finally, the variance scales inversely with the number of non-rejected dumps to which the local model is fitted, and weight scales inversely with variance, yielding Equation 3.
Appendix B. Analytic Fitting of Local RFI Model
The local model for RFI subtraction is given by Equation 9, and is parameterized by and . This model, , can be fitted to data analytically. Consider the following posterior probability distribution:
(B1) 
where is the i-th background-subtracted signal measurement in the j-th scan, is its dump weight, is its angular distance from the model's fixed center, and is the 2D noise model of the j-th scan (§3.5). The first exponential is a prior probability distribution on , constraining it to be ≲ the noise level. The second exponential is the likelihood function. The best fit is given by maximizing :
(B2) 
Solving for and yields:
(B3) 
and
(B4) 
Appendix C. Local Model Weights for Construction of Global RFI Model
The local model for RFI subtraction is given by Equation 9. Let and be the uncertainties in the fitted values of and , respectively. As in Appendix A, we approximate these uncertainties to be uncorrelated, in which case the variance of the fitted model, as a function of position in the model, is given by:
(C1) 
However, unlike in Appendix A, each term does not contribute to the variance equally, because is additionally constrained by a prior probability distribution to be ≲ the noise level (Appendix B). Approximating the variance of (from zero) from the prior probability distribution to be comparable to, and almost fully correlated with, the variance of (from its, typically near-zero, best-fit value) from the likelihood function, instead yields:
(C2) 
Consequently:
(C3) 
Finally, as in Appendix A, the variance scales inversely with the number of non-rejected dumps to which the local model is fitted, and weight scales inversely with variance, yielding:
(C4) 
where is the number of non-rejected dumps that contributed to the i-th local model, is the weight of the j-th point from the i-th local model, is the 2D angular distance of this point from the model's center, and is the user-defined RFI-subtraction scale (§3.6). Admittedly, the factor of two in this equation is crudely determined, but its inclusion is fairer than using, e.g., , or . Regardless, the resulting global model is not very sensitive to how the local models are weighted.
Appendix D. Averaging and Summing Errors Correlated Over a Scale
Consider N measurements, each with uncertainty σ_i. If these measurements are independent of each other, the uncertainty of their average is given by:
σ_avg = (1/N) √( Σ_{i=1}^{N} σ_i² )    (D1)
and the uncertainty of their sum is given by:
σ_sum = √( Σ_{i=1}^{N} σ_i² )    (D2)
If all of the σ_i are equal, σ_avg = σ/√N and σ_sum = √N σ, which are the expected results.
However, these equations must be modified if some of the measurements are correlated. First, consider the extreme case where all of the measurements are correlated. Then, in the equal-σ_i case, σ_avg = σ and σ_sum = Nσ, or, in general:
σ_avg = (1/N) Σ_{i=1}^{N} σ_i    (D3)
and:
σ_sum = Σ_{i=1}^{N} σ_i    (D4)
Now, consider the intermediate, but still simple, case where the measurements consist of independent subgroups, each with n correlated measurements. Then:
σ_avg = (1/N) √( Σ_{i=1}^{N} n_i σ_i² )    (D5)
and:
σ_sum = √( Σ_{i=1}^{N} n_i σ_i² )    (D6)
where n_i is shorthand for the n of the i-th measurement's subgroup.
Finally, consider the case where each measurement is correlated with either a definable subset, or all, of its surrounding measurements, over a scale Δ. In this case, the above equations apply, with n_i instead being the number of measurements (1) within a radius Δ of the i-th measurement, (2) that are also part of this subset, and (3) that are also part of the sum.
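This counting rule can be sketched as follows, for measurements along one dimension: each measurement's variance is multiplied by the number of contributing measurements within the correlation scale of it, the products are summed, and the square root taken. Treating the full scale as the counting radius, and the function name, are our own illustrative choices:

```python
import numpy as np

def correlated_sum_err(pos, sigma, scale):
    """Uncertainty of the sum (and of the average) of measurements
    whose errors are correlated with their neighbours within `scale`:
    n_i counts the contributing measurements within `scale` of the
    i-th measurement (including itself)."""
    pos = np.asarray(pos, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_i = np.array([(np.abs(pos - p) <= scale).sum() for p in pos])
    sig_sum = np.sqrt((n_i * sigma ** 2).sum())
    sig_avg = sig_sum / pos.size
    return sig_sum, sig_avg
```

In the limit of a vanishing scale, n_i = 1 and the independent-case results are recovered; in the limit of a scale larger than the whole data set, n_i = N and the fully correlated results are recovered.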
We first make use of this when averaging RFI-subtraction local-model values to produce a global-model value (§3.6). Each local-model value is correlated with a definable subset of its surrounding local-model values, over the diameter scale of the local model (Equation 9). If the local model were a one-parameter model, each local-model value would be correlated with all of its surrounding local-model values over this scale. However, it is instead a two-parameter model, with a correlated inner region dominated by one parameter, and a separately correlated (and perhaps even partially anticorrelated) outer region dominated by the other. If a local-model value is above its corresponding 2D noise-model value, we take it to be part of the first region; else we take it to be part of the second.
Consequently, when calculating the weight of the global-model value (Footnote 22), we use Equation D5: we sum the weights of the non-rejected local-model values (Appendix C), but not before dividing each of these weights by the number of non-rejected local-model values (1) within a radius of that weight's local-model location, (2) that are above their 2D noise-model values if the global-model value is above its 2D noise-model value, or that are below their 2D noise-model values if the global-model value is below its 2D noise-model value, and (3) that are also part of the sum (i.e., that are also within a radius of the global-model location).
We again make use of this when surface-modeling the post-RFI-subtraction data (i.e., when surface-modeling the above global-model values; §3.7). These data are correlated over a scale, which we measure directly from the non-rejected local-model data that contributed to each global-model value.^{44} For a given global-model, or post-RFI-subtraction, value, consider the non-rejected local-model data that contributed to this value, with each data point weighted (1) by the number of times that it contributed, and (2) by its number of dumps. We then take the correlation scale to be proportional to twice the standard deviation of these points' locations about this value's location, where we set the factor of proportionality empirically below. Consequently, each post-RFI-subtraction data point has its own correlation-scale value, just as each has its own value (§3.7). Away from sources, where few local-model data points are rejected, . But near sources, this scale can be much smaller. Consequently, when calculating the weight map (§3.7.2), which is again proportional to Equation D5, we divide each term in the sum by the number of data points (1) within a radius of that term's data point, and (2) that are also within the 1-beamwidth-radius region over which the surface model is being fitted.
The weight map is then used in photometry (§4). We use it to measure or model (1) the background level, (2) the standard deviation of the pixel values about the background level, (3) the uncertainty in any pixel value in the image, (4) the total uncertainty from summing all of the pixel values in the aperture, (5) the uncertainty in the background level, and (6) the total photometric error:
1. The background level, B, is simply given by:
B = ( Σ_i w_i v_i ) / ( Σ_i w_i )    (D7)
where v_i is the value of the i-th pixel, w_i is its corresponding weight-map value, and the sums are over all non-rejected pixels in the annulus (Footnote 37; e.g., Figure 54).
2. The standard deviation of the pixel values about the background level is then given by: