Modeling Total Solar Irradiance Variations Using Automated Classification Software On Mount Wilson Data Solar Physics
Abstract
We present the results using the AutoClass analysis application available at \inlinecite2002Autoclass which is a Bayesian, finite mixture model classification system developed by \inlinecite1995…kdd95. We apply this system to Mount Wilson Solar Observatory (MWO) intensity and magnetogram images and classify individual pixels on the solar surface to calculate daily indices that are then correlated with total solar irradiance (TSI) to yield a set of regression coefficients. This approach allows us to model the TSI with a correlation of better than 0.96 for the period 1996 to 2007. These regression coefficients applied to classified pixels on the observed solar surface allow the construction of images of the Sun as it would be seen by TSI measuring instruments like the Solar Bolometric Imager recently flown by \inlinecite2004ApJ…611L..57F. As a consequence of the very high correlation we achieve in reproducing the TSI record, our approach holds out the possibility of creating an ongoing, accurate, independent estimate of TSI variations from groundbased observations which could be used to compare, and identify the sources of disagreement among, TSI observations from the various satellite instruments and to fill in gaps in the satellite record. Further, our spatiallyresolved images should assist in characterizing the particular solar surface regions associated with TSI variations. Also, since the particular set of MWO data on which this analysis is based is available on a daily basis back to at least 1985, and on an intermittent basis before then, it will be possible to estimate the TSI emission due to identified solar surface features at several solar minima to constrain the role surface magnetic effects have on longterm trends in solar energy output.
1 Introduction
Observations of solar surface magnetic and intensity data reveal a myriad of features which evolve on a wide range of spatial and temporal scales. These features are in turn related to variations in solar radiative output both in specific wavelengths and in total solar irradiance (TSI) [Harvey (1994)]. One challenge to ascertaining the relationship between particular solar surface features and variations in solar radiative output is the very wealth of data available both from individual solar observations and from archives of individual solar observations which in some cases, such as those available at Mount Wilson Solar Observatory (MWO), cover many decades. Traditionally, such features were identified manually and categorized by experts into familiar categories such as sunspots, faculae, quiet sun, etc., a process which inevitably produces considerable variation in how, and into which, feature categories regions from solar images are classified. Recently, researchers have applied automated Bayesian statistical pattern recognition systems to identify and characterize solar features, using expert knowledge of the magnetic and radiative characteristics of the recognized types of regions to guide the software (\opencite2002ApJ…568..396T, 1998, \opencite1997ESASP.415..477P). Our approach has a similar goal but uses classification based on individual pixel properties instead of pattern recognition based on spatial configurations of groups of pixels.
An automated classification or pattern recognition approach is particularly promising for analyzing the large database of solar observations taken with the 150foot solar tower at MWO which has been in daily operation since 1912. Since the mid1980s, the observations at MWO have included daily magnetograms and intensity grams taken in the 5250.2Å Fe i and 5237.3Å Cr ii lines. The observing system has been discussed in a number of papers (\opencite1983SoPh…87..195H, \opencite1991SoPh..135..211U, 2002, 2009) and utilizes a Babcock magnetograph system that scans an entrance aperture over the solar surface following a boustrophedonic pattern to build up a full disk image. Because the magnetic and intensity images are captured simultaneously, each pixel in a magnetic or intensity image based on a particular spectral line corresponds exactly to the solar surface area covered by the corresponding image pixels in other lines. This process permits simultaneous observation of the same part of the solar surface in different wavelengths.
One product of the observations is the magnetic field strength for which we use the absolute value of the measurement at : . We used only absolute magnetic field values (measured in gauss) for the magnetograms since, a priori, the sign of the field did not seem likely to affect the intensity of radiation from that pixel. For simplicity of notation we will use for this parameter since this is the only field we use in this study. It is important to recognize that magnetic field strength values are dependent on a variety of factors including the spectral line and spectral sampling as well as the spatial resolution. The values given for the classifications discussed in this paper are those from the 150foot solar tower synoptic program without correction to a scale as discussed by \inlinecite2009SoPh..255…53U. A second product from the and lines is an intensitylike image which we have dubbed an intensity ratiogram. These consist of solar image arrays for which each pixel has a value that is the ratio of the intensity in that pixel to the intensity in the same pixel:
(1) 
where the intensity ratiogram values are based on an arbitrary scale in which 10000 is the value of the peak of each image’s probability distribution function for . Prior to use in the classification analysis, the r images have been adjusted to remove the dependence on centertolimb angle by fitting to a polynomial in powers of up to but with the term in omitted. The fitting is done twice with intermediate steps 1) to identify and remove plages and spots using a histogram analysis and 2) to remove a small E/W gradient probably due to a telluric feature in the line. This flattening is carried out on each image prior to summing into the daily average image. Also, to avoid some observational deficiencies near the solar limb, we have restricted our analysis to pixels having . Because the magnetic sensitivities of the lines differ, the intensity ratio produces an image in which the usual surface featuresspots, faculae, etc — are visible. The advantage to the intensity ratiogram is that, by taking the ratio of the two intensities, short term variations in seeing and atmospheric conditions can be greatly reduced or removed, yielding an image which is superior in terms of solar surface feature resolution to an image in either wavelength alone.
2 Use of AutoClass to Classify Solar Surface Features in MWO Magnetogram and Intensity Ratiograms
2.1 The AutoClass Software
AutoClass is a classification code developed by \inlinecite1995…kdd95 based on a classical finite mixture model utilizing Bayesian statistics for determining the optimal class to which a particular observation belongs. The observed parameters are referred to as attributes and can be quite general. Each observation is called an instance and is described by the attributes. For us the attributes consist of two quantities: the absolute value of the magnetic field and the intensity ratio. Additional observed quantities could be added if available but for our current effort we have retained just these two. The instances are the individual pixels. These instances (data pairs for a single pixel) are treated in isolation from all other attributes such as the image to which they belong. We did explore the addition of the centertolimb position as an additional parameter but found it did not improve the performance of the classification. Although the fact that pixels belong to specific images in a time sequence does not enter into the classification analysis, we do utilize this information to understand the interrelations between the classes.
The classification can be carried out in two different modes. The first mode is a search mode in which it conducts a search for the optimal classification scheme based on posterior Bayesian statistics applied to a specific set of observed data. The search mode can be (1) unconstrained or open ended, in which AutoClass determines both the optimal number of classes and the characteristics of those classes or (2) constrained or user directed, in which certain limits, such as the number of classes to be found, can be imposed on the search. The output classification consists of a set of classes. Each class is defined by a distribution function giving the joint probability of occurance of the attributes for a class member instance. The parameters defining the probability distribution functions (pdf’s) for each attribute are the primary output from the search mode classification. In our case the data used to find the optimal classification scheme consists of all the data pairs for pixels on a selected subset of solar images spanning the full solar cycle. The second mode is a predict mode in which a userdirected classification is carried out by the application of the known classes to a new data set to determine the probability each data instance belongs to each of the classes. For the MWO data the attributes are specifically the absolute value of magnetic field and the 5250/5237 intensity ratio r.
When run in the open ended search mode, AutoClass decides both the optimal number of classes into which the data should be divided and the characteristics (in terms of observable attributes such as intensity ratio and magnetic field) of those classes. The result of running AutoClass in the search mode on a data set is an instance by instance (here, pixel by pixel) classification in which each instance/pixel in the searched dataset is given a weighted probability assignment to a subset of the total classes found in the search mode. For example, if the search mode found classes for the entire data set, then a particular instance/pixel with attributes/measurements and , would be given a set of probabilities of belonging to each of the classes . In some cases, the most likely class will have a probability near unity while in other cases two or more classes may have a high probability. For many pixels the probabilities for the least likely classes may essentially vanish. There is thus no absolute assignment of an instance/pixel to a particular class. This mode requires the determination of the number of classes as well as the parameters defining the probability distribution function (pdf) for each class so that a search mode run is computationally intensive.
When run in the predict mode, AutoClass analyzes a new data set using the results of the search mode (the value of as well as the parameters defining the pdf for each class) to assign each instance/pixel in the new data set into one or more of the classes found in the search mode and to assign a probability to each class assigned to that pixel. In this analysis, no new classes are introduced and AutoClass assumes that any pixel in the new dataset can be categorized into one or more of the classes found in the search mode. The product of AutoClass as run in the predict mode consists of the assignment of class membership probabilities to every pixel.
2.2 Application of AutoClass to MWO Data
To run AutoClass in the search mode using MWO images, we needed to use a reducedsize data set in order to achieve an acceptable running time. For this purpose we created a dataset consisting of 12 daily image pairs (MWO magnetogram; intensity ratiogram), using days spread over the period 1996 to 2008 — i.e., for each year in the time period, we selected one day and for that day used a magnetogram and intensity ratiogram for that day. In order to reduce noise, we (1) used images consisting of the average of the magnetograms and the average of the intensity ratiograms for each selected day, (2) eliminated days for which there was only 1 image and (3) selected days for which there were at least 8 pairs of magnetic and intensity images. (The number of available images per day and type varies from 0 to 16 or more depending on weather and time of year.) When assembled, this dataset contained measurements for 484,000 pixels from 12 pairs of magnetogram and intensity ratiogram images over 12 years, each pixel having a measured value for the absolute magnetic field and the intensity ratio.
As a groundbased observatory at a single site, the MWO system must cope with a variety of difficulties – weatherinduced gaps, sky transparency variations and image blurring known as seeing. Apart from the observations where the sky transparency variations impacted the data, the variations of atmospheric seeing and other instrumental effects did not degrade the image quality since the entrance aperture has a size of 12 squared and the seeing image fluctuations are generally only 2 to 4 on days of poor seeing. The effect of sky transparency variations is felt primarily through their introduction of tracking errors into the system controlling the position of the solar image. At a low level some unsatisfactory images have been left in the data set. Individual daily classification data points have been examined when their results appear discordant and the tracking error mechanism appeared to have an impact on the data quality. Since these points tend to occur on days when one or two observations were made, the easiest way to reduce their impact on the analysis was to restrict the data points to those days on which three or more observations were made. The most adverse months are January to April. For some years, the end of December also tends to have poor data but this happened less than was the case for the later 4 months. We return to this discussion below in the context of a detected discrepancy having a period of a year. In a further note about systematic effects related to the observing system we point out that prior to winter 1993 the spectrograph experienced strong spectrograph seeing during periods of cold weather with the result that the magnetic fields were noiser at those times. None of the presently used data is subject to this problem.
A core problem is to determine the number of classes required to represent solar surface features. AutoClass can make this determination based on its Bayesian statistical model of the data pairs. However, our observational parameters come from a system where the spatial resolution does not isolate solar surface features that are uniform in their characteristics. Consequently, it is probable that some classes are a result of a variable filling factor. For example if and are the observational parameters respectively in a flux tube and the quiet sun and a particular solar surface pixel has , AutoClass may create separate classes for each of a limited range of between 0 and 1 whereas in fact there are only two classes present. In the event that AutoClass finds excess classes from this mechanism, the extra classes will form a subspace where the pairs are linearly dependent and the regression analysis will yield large, nearly cancelling coefficients.
AutoClass was first run in the search mode with the number of classes to be used either fixed or left as a free parameter to be determined by the code. The initial AutoClass search results yielded 37 classes. The regression analysis discussed in the following section indicated that the filling factor mechanism might in fact be playing a role in the MWO data set. Consequently we restricted the AutoClass search to have a maximum of 18 classes. Although the ability of the 37class classification to reproduce the TSI was slightly better than that for 18 classes, we believe that this result is not physically significant. Furthermore, the 18class set is easier to relate to known solar features.
Using the search mode results for the 18class set, we then ran AutoClass in the predict mode on the daily average magnetograms and intensity ratiograms for the period from May, 1996 to July, 2008. For purposes of running AutoClass in predict mode, we modified the operation sequence so that the images for each day were treated as a single dataset containing a measurement for each pixel. We indicate that pixel is found in image with the notation . Designating the total number of satisfactory days we analyze as , the current data set has . The predict mode run takes for each and returns a set of the 10 most probable classes for these pixels. The restriction on the number of returned class probabilities per instance is a feature of the AutoClass package necessitated by the potentially large number of classes that AutoClass can identify. We were not able to modify this feature. Because this set of probabilities is truncated, the sum of the probabilities of the returned classes does not reach unity and a renormalization of the class probabilities was necessary in order to ensure that the area index defined below properly represents the total solar surface.
The next step was to use the new datasets created by AutoClass in the predict mode to create for day an index for each class :
where is the number of pixels in image . Thus, the derived index is the expected fractional area of the solar surface covered by class on day . Also, for each we retain the position as the distance from the image center in the EW and NS directions relative to the image radius. We use these positions to reconstruct a 2 dimensional image showing the locations of the different class pixels. These images help in the identification of the classes with solar surface features of recognized type such as sunspots, plages, faculae and chromospheric network.
3 Use of AutoClass Indices to Model TSI
It is important to observe that, so far in the process, there has been no introduction of any variables other than: (1) the measured intensity ratio and magnetic field values from the MWO images and (2) the AutoClass class probabilities and daily class indices, the latter of which were based solely on the observed number distribution functions of the data pairs with each pixel being treated as an independent entity. The creation of the daily class index from the sum over pixels associated with each image is a separate step outside the AutoClass system. These indices are the fractions of the area of the apparent solar disk covered by each class and having the property:
(2) 
We are also interested in averages over all the daily indices and use a bracket notation
(3) 
to indicate such an average with a similar notation for other quantities.
In order to determine whether the daily indices could be related to satellite TSI observations and solar surface features, we next performed a linear multiple regression to fit the 18 daily MWO/AutoClass indices for 3111 days between May, 1996 and July, 2008 to 3 daily TSI measurements for the same period from: (1) , obtained from the VIRGO experiment on the cooperative ESA/NASA mission SoHO, VIRGO Team (2008) (2) , the composite VIRGO dataset (version d40) obtained from PMOD/WRC, Davos, Switzerland, unpublished data from the VIRGO Experiment on the cooperative ESA/NASA Mission SoHO (the creation of which is described in \opencite2000SSRv…94…15F); and (3) the space absolute radiometric reference (SARR) Crommelynck et al. (1995); Dewitte et al. (2004). In this paper we illustrate the method using the case while a subsequent paper will discuss the application of our methods to all three TSI time series as well as UV indicators. The above symbols will be used whenever we refer to the time series data.
The regression fit we use contains only an overall constant and does not associate a constant with each class since a class with zero area must contribute nothing to the TSI. Thus we seek coefficients giving a model for the virgo TSI, , of the form:
(4) 
using the method of least squares. Each of the coefficients corresponds to the TSI value the sun would produce in case it were covered entirely by a surface of class . We can evaluate by averaging over all the daily cases:
(5) 
Since the have the value of that would prevail if and is the average contribution of class to , we see that must sum to and the value of must be zero.
We can use the fact that the area indicies must sum to unity to introduce an arbitrary offset since any arbitrary constant subtracted from the would result in an equal constant subtracted from all the coefficients :
(6) 
where the requirement of all areas summing to unity is used in the first form of the right hand side of this equation.
Thus the inclusion of a constant just provides a zero point offset for fitting parameters and is not constrained in the fit. If we drop one class from the model equation, then a nonzero constant will have value equal to the for the dropped class. We do not use this approach because we find that all the derived values of depend on which class is dropped. To avoid introducing a dependence on which class is dropped from the fit, we have adopted a bootstrap method applied to the solutions without a constant as described below.
For most classes, the values of are close to the average TSI so we adopt the arbitrary zero point offset to be the global average of the and consider deviations:
(7)  
and 

(8) 
For our current investigation concentrating on the VIRGO data the difference between the observation and simulation for day is then:
(9) 
We have chosen to define with the above a sign to simplify our discussion of the time dependence of near solar minimum in Section 6.
We find that the regression analysis is sensitive to the details of the data set to which the regression is applied. This may be due to the constraint on the sum of the indicies Golan, Judge, and Miller (1996) or due to the near linear dependence of the observed data pairs brought about by the filling factor effect. Although the quality of the fit is consistent when small changes are made to the fitted data, the coefficients themselves vary. We determine the most representative values of the coefficients from a bootstrap approach:

The base pool of data consists of the actual set of 3111 daily observations each of which is the 18 class indices combined with the for that day.

A modified data set is formed by a method called randomly selecting with replacement from the full base pool 3111 times. The modified data set contains some daily observations more than once and omits others. The temporal order of the daily observations is not preserved in the modified data set. However, including the as part of the daily observation prevents the fitting coefficients from being impacted by this temporal scrambling.

The fitting coefficients are computed from the modified pool and recorded.

Steps 2 and 3 are repeated up to 100,000 times to create distribution functions for each of the classes.

The median from each distribution function is adopted as the appropriate value for that class.
This method allows us to estimate the uncertainty in the fitting coefficients. As an additional measure of the uncertainty in the derived fitting coefficients we applied the method of dropping one class for a 17class fit. Since the constant in that case represents the for that class, we subtracted the global average from that constant so it could be treated the same way as the remaining classes. Also, to avoid special focus on any particular class, we carried out this analysis for all of the classes to obtain 18 values for the fitting coefficient for each of the 18 classes. We use the average of coefficients from these 18 cases as an estimate we term the 17class fit coefficients. We then compared the average of the coefficients for the 18 cases to the coefficients found from the bootstrap method. For all classes the coefficients derived in these two ways agreed with a difference much smaller than the error indicated from the bootstrap method. The worst cases were for classes 13, 17, 15 and 16 where the bootstrap coefficients are and Wm while the 17class fit coefficients are and Wm respectively.
The correlation between the MWO daily class indices and the VIRGO TSI measurement for the 3111 day period was 0.9565 for a fit to the 18 classes while a fit with the full 37 classes yielded 0.9625. Figure 1 showes the scatter plots for the MWO model, , and the VIRGO TSI, measurements. The tightness of the relationship between the model and observations is evident. In fact it is somewhat better than is indicated by the full set of cases since there is a longterm trend of the model relative to the observations. The color of the plotted points has been chosen so as to highlight this effect with the early part of the decade being green and the ending part of the decade being orange. The cause for this offset is not known since there have been no changes to the MWO instrumentaton that could give such a systematic drift. Figure 2 showes comparative time series for the MWO fitted simulations and the observed VIRGO TSI data. If we average over the first 6year period and compare it to the last 6year period we find that the value of has decreased by only Wm whereas the observed VIRGO TSI has decreased by Wm for the same periods. The spectrographic system at MWO has been held constant over these periods.
The deviation plots show patterns with a period of a year. This is illustrated in Figure 3 which shows the result of a superposed epoch analysis with a one year period starting at the first of the year. We do not have an explanation for this pattern. We also see the systematic trend of the observed points to fall relative to the simulation for the full time period. The simulation is essentially the same for the two minima. There is a possible effect related to the tilt angle of the magnetic field vector in the polar regions. However, the symmetry between north and south regions leads to the expectation that such an effect would have a dominant component with a period of 6 months. Occasional bad observations that are not eliminated by our quality control procedures tend to have larger than appropriate magnetic fields and their associated model TSI values tend to be high. Since such images are more common during the winter and spring months, this effect would produce a result out of phase compared to what is shown in Figure 3.
The datasets created by running AutoClass in the predict mode can also be used to create resolved images of the sun as it would be seen by a telescope which detected TSI directly. Thus, because each dataset created in the predict mode contains for each pixel the probabilities for the 10 most probable classes for that pixel, and since each predicted class has a fit coefficient derived from the multiple regression of the daily MWO indices against the various TSI measurements, it is possible to create a solar TSI image by assigning to each pixel a value equal to the sum of the per class coefficients times the class probabilities plus the constant derived from the multiple regression. Images of this sort are discussed below for groups of classes rather than for individual classes.
4 Class Groups
4.1 CrossCorrelation Analysis
The solar surface is covered by a number of recognized types of feature that typically include quiet sun, chromospheric network, plages/faculae and sunspots. The 18 (or 37) types of solar surface identified by AutoClass clearly makes a finer distinction of solar surface features than is currently recognized. In addition, we are concerned that the filling factor effect continues to generate artificial classes. AutoClass itself provides no guidance beyond the grouping of pixels into the classes. We utilize the additional information provided by the time series of daily indices to search for groupings among the classes. Thus if we find several classes that have highly correlated daily indices we can determine how this group as a whole might relate to the recognized solar surface features. Furthermore, the values of and determined by AutoClass are a significant indicator of the nature of each class.
Our grouping of the classes is accomplished through the study of the crosscorrelation matrix formed from the 18 time series of the daily class indices. AutoClass creates a class number according to its internal processing and generally gives low class numbers to those classes having the largest number of members. However, the class numbers themselves have no apparent useful information. The procedure we use to find class groups is to identify a subset of the classes that can serve as root classes for their group. Based on a tentative identification of a class as a potential root class, we can arrange all the remaining classes in order of decreasing crosscorrelation coefficient. All classes having a crosscorrelation coefficient above some threshold are then identified as being members of that group. The root class for the next group can be taken as the class having the highest crosscorrelation coefficient below the threshhold. This process can be repeated until all classes are members of some group. In some cases, the group has only one member if its root class has no other classes with a crosscorrelation coefficient above the threshold. This stepwise grouping process is simple and yielded good results. More complex problems might benefit from the global approach detailed in \inlinecite2008JMLR…9..485B, which decomposes the covariance into groups using a maximumlikelihood criterion based partly on the inverse covariance matrix where the group structure is directly encoded.
The lowest number class found by AutoClass has the largest number of pixels and examination of the values of and r strongly suggest that this class corresponds to the quiet sun. As long as we follow the above procedure of beginning each group by taking the root class to be the class with the highest cross correlation below the threshhold from the previous group, the sequence of groups will represent a systematic increase in the level of activity. In fact the values of the and determined by AutoClass for the classes within these groups confirms this expectation. The value of the threshhold crosscorrelation coefficient to define the dividing point between successive groups could not be held constant for all the groups. This process allows us to find six groups of classes that we name Quiet 0 (Q0), Quiet 1 (Q1), Network (N), Plage 0 (P0), Plage 1 (P1) and Spot (S).
The result of the above procedure is shown in Table 1. For most of the groups the assigned classes are clearly distinctly associated with that group and there is little flexibility in making the assignment. However, for the two plage groups P0 and P1, the association is less clear. For example, if the root class of group P0 were chosen as class C5 instead of class C13 then class C12 would have been assigned to P0 instead of P1. However, based on the magnetic field strength, it is more appropriate in group P1. The number of classes within each of the above groups is 3, 1, 1, 6, 5 and 2 respectively. The fact that the Plage groups have the largest number of classes as members is probably a consequence of the filling factor mechanism.
4.2 Class Group Properties
The relationship of the classes and groups to solar properties is of great interest and is shown in Figure 4 and Table 2. Although the classes in the groups are similar, their regression coefficients are often of different sign within the groups suggesting that there may be more parameters than needed for the basic fit. However, the uncertainties derived from the bootstrap analysis indicate that sign of the coefficient, especially for the P0 and P1 groups, is not significant. The concentration of the classes with the least certain fitting coefficients in these two groups is supportive of the hypothesis that uncertainty in the fit comes from the fluxtube/filling factor mechanism discussed in Section 2.2. The systematic increase in magnetic field strength in going through the sequence (Q0, Q1, N, P0, P1, S) is evident. The intensity ratio is an indicator of the brightness of the upper photosphere compared to the low photosphere with larger values indicating a brighter upper photosphere/chromosphere zone. Its dependence on the magnetic field strength is mostly along a well defined curve where the upper atmosphere is brighter where the field is stronger. Exceptions occur within the P0 and P1 groups where the dependence is erratic. In the case of the Q1 group (a single class) the chromosphere is brighter than other portions of the solar surface with similar field strength. The average areas given in the last column for each group are averaged over the full solar cycle and show that over 80% of the solar surface is covered by the Q0, Q1 and N groups. Nonetheless, these groups have low crosscorrelation coefficients and represent distinct components of the solar surface.
A useful diagnostic for the classes and groups is their distribution over the solar surface. We select only those pixels whose probability of membership in a single class is over 90% and utilize their centertolimb angles as measured by . For each image we calculate the average of to get a measure of the average pixel distance where the notation denotes an average over the pixels in image . Since we are interested in the distribution of the class pixels relative to the analyzed image, we rescale the values of by dividing by 0.95. The daily averages are then averaged over the full data set to yield . We have also considered the rms spread in denoted by . For reference we note that a quantity distributed uniformly over the solar image would have , . This analysis clarifies the distinction between class 16 and the other P1 and P0 classes with similar magnetic field strength – class 16 is systematically nearer the limb than the others and has a higher value of r as a consequence.
Much of the variability dominating the crosscorrelation coefficients comes from the rotational modulation of the solar surface coverage. Changes due to the 11year cycle are also of interest. The growth or decline of the fractional coverage of a particular class or group of classes may provide a clue as to the underlying process of the solar dynamo. The relative sizes of the coefficients allows the groups P1 and S to have a major impact on the TSI even though the areas themselves may be small. We reduce the rotational modulation in the areas and TSI model by smoothing all parameters with a Gaussian function having a width of 30 days. We show in Figures 5 and 6 respectively the smoothed trends for the contributions to the TSI and the changing breakdown of the class areas within three groups of classes. Several of the groups influence the TSI but show only a restricted dependence on the solar cycle: Q0, Q1 and N. The case of the P1 group is interesting because the D panel of Figure 5 suggests that class C8 accounts for most of the TSI variation. However, all classes within the P1 group contribute a nearly constant fractional area to the P1 group and their contributions to TSI differ primarily because of the differing values of their coefficients. The substantial cancellation effects amoung the coefficients within the P1 group are typical of an underdetermined system having more parameters than constraints. The least square fitting exploits the small differences between the class time dependencies through these cancellations.
Because the Q0 group occupies a major fraction of the available surface area, when the P0 and P1 groups increase their area of coverage, the Q0 and Q1 groups decrease their area because they provide the areas supplying the P0 and P1 groups. During the rising portion of the solar cycle, the conversion of the quiet group areas into plage group areas is associated with a rising TSI. Consequently some of the increase in the TSI is attributed to a negative coefficient for the Q0 and Q1 groups.
5 Images
The relationship between the observable parameters and the derived change in the model TSI has both expected and unexpected parts. In this section we mostly show a number of representations of the solar image for a representative day of 16 May 2002. We also use data obtained on 9 Sep. 2003 with one image from MWO and another image from the balloonborne Solar Bolometric Imager (SBI) instrument Foukal et al. (2004). 16 May 2002 has a moderate level of activity and not a high level of activity wherein the regions might cover more of the solar surface and tend to merge or interact. It is not unusual in any obvious way and is of good quality data not impacted by terrestrial sky transparency variations. Figure 7 shows images of the absolute value of the magnetic field and the intensity ratio along the top row. For comparison on the lower right is a familiar representation of the chromospheric structure as measured by a Ca K spectroheliogram adapted from the synoptic program carried out at the Paris Observatory at Meudon. This K image is displayed here as a negative in order to be more consistent with the magnetic and intensity ratio images that have been created entirely digitally. The close correspondence between magnetic field strength and the K intensity is well known. The intensity ratio has not been used before as a diagnostic of solar surface features. However, the image shown in Figure 7 is very similar to that for with two differences: 1) the strong magnetic field regions have a reduced value of the intensity ratio, and 2) in regions of very weak magnetic field, the intensity ratio continues to become lower even though the field itself is lower by only a very small amount. When these features are classified and converted to a contribution to the TSI, a map of the solar surface can be created. The result for the sample day is shown in the lower left. Note that the positive contributions to the TSI are not as widely distributed as are the magnetic and intensity ratio features.
The balloonborne Solar Bolometric Imager (SBI) has provided us with a real image of the sun of the sort that our algorithm attempts to produce. The comparison between the real SBI image and our model is shown in Figure 8. The image based on the class coverage fractions is not the local photometric contrast but instead is the local contribution to the change in the TSI. The SBI image shows the local contrast wherein the deviations near the limb are enhanced relative to ours. The model TSI image differs from the observed in the important characteristic that it has placed an excessive fraction of the irradiance deviation in a bright rim around the plage rather than leaving it distributed somewhat uniformly in the plage or facular regions. This pattern is most likely a consequence of the ring effect discussed above in Section 4.2 and is related to the illdetermined coefficients in the plage groups.
The grouping of the classifications in Section 4.2 is based exclusively on the cross correlations between the time series created using the fact that the pixels are from solar images. The names of the groups imply an association with known solar structures although some of the groups are subdivisions of these previously known structures. The sequence (Q0, Q1, N, P0, P1, S) is ordered in terms of increasing magnetic field. The values of the field in each case is consistent with the fields for these structures except perhaps for the N group (the single class C1) and the P0 group where the fields are weaker than expected. An important diagnostic of the nature of these groups comes from the distribution over solar images and the spatial proximity of pixels of each class or group to other classes and groups.
We have created diagnostic images from the class groups using a CMY (Cyan, Magenta, Yellow) encoding wherein a one to three color tiff image has each of the color strengths based on one or more class group probabilities – each pixel is fully the assigned color if its probability of membership in the assigned class is 100%. Pixels having a significant probability of being in two or three classes displayed in the figure have their colors composed of those class value and can take on an appropriate intermediate hue. Class group images created this way are shown in Figure 9 for Q0, Q1, N and (P0, P1). The three classes making up the Q0 group are shown using the full CMY encoding, the plage groups are both shown in one figure using two colors while the remaining two groups have only one class and are show separately using the cyan color alone. The S group is not shown in a panel because the areas involved are very limited and can be seen adequately as the blue and green areas on the panel of Figure 7. In order to aid in the coordination of these class group figures with the input magnetic field pattern, an overlay of contours of constant strength values is plotted on top of the class group plots.
The P1 magenta areas are clearly very concentrated about the high magnetic field positions while the Q0 areas clearly avoid the magnetic contours. The gradation of the classes within the Q0 group is clearly a sequence with the quietest C0 class forming the center of the areas and being surrounded by the C4 and C6 classes in succession. The group identified as N is widely distributed as expected for the solar chromospheric network justifying our naming of this group. The Q1 group appears to occur preferentially near the solar limb and this is probably the reason it is not well correlated with the other classes that have been grouped as Q0.
Referring back to Figure 7, it is evident that the plage areas have an alternating ring pattern where the central portions of the plage are not bright but are surrounded by a rather bright ring. This pattern is not seen in either the or the figures. According to Table 2, the P1 group has both positive and negative coefficients with the area weighted average being positive. In Figure 7 the panel is computed from the full set of classes rather than the class groups. In the parts of the figure with strong active regions characterized by sunspots or concentrations of P1 areas, this alternating ring pattern is seen quite clearly. In more isolated areas which only have P0 or P1 areas, the alternating ring pattern is not present. This pattern comes primarily from the way the coefficients have been constrained in the fit to the TSI. The most strongly positive coefficient is that for class 8 while the stronger magnetic field classes 10 to 12 have small or negative coefficients. An examination of the spatial distribution of the class 8 pixels shows that they are found both associated with the plage regions and in areas well away from the stronger plage zones. Evidently the widelydistributed pixels are essential for the fit to the TSI and lead to a high . Apparently this large coefficient yields a higherthanneeded contribution from the stronger plage regions so that an additional contribution from the class 10 to 12 pixels is not needed to match the TSI. The close spatial proximity of the class 8 pixels and the class 10 to 12 pixels makes this cancellation effective in reducing the contribution to the TSI variation. The extended pattern of smaller brighter and dimmer regions outside the active areas is a combination of contributions from the Q1, N and P0 groups.
6 The Solar Minimum Slope Difference
We discussed above the inability of AutoClassassigned areas to reproduce the longterm trend of decreasing TSI as measured by Virgo – the Virgo TSI has a greater rate of decrease than is produced by solar surface magnetic features. Our above conclusion could be a consequence of the tight constraints on coefficient values imposed by the large amplitude variations in the TSI due to the modulation of stationary plage and sunspot regions carried by rotation across the apparent solar surface. By temporally smoothing the time series to eliminate the rotational modulation of the TSI record, the least square fitting may be freed to fit the longterm trend mismatch. We have tested this supposition by carrying out a set of modified fits using both smoothed indices and the smoothed Virgo TSI data. The degree of smoothing is defined by the Gaussian width parameter which we have taken to be 3, 31 and 101 successive points. Our points are separated by one or more days since observing conditions do not permit data acquisition every day so that the actual smoothing period varies but the number of observations smoothed does not. After smoothing, the data sets were subsampled in order to retain only independent time sequence points and the regression fit using the bootstrap approach was repeated. To determine whether these smoothed fits permit a better representation of the longtime trends, we define two temporal averaging periods: and which are 12month averages of at the beginning of our series May, 1996 to April, 1997 and August, 2007 to July, 2008 respectively. Recalling that is the observed minus model, we define a solar minimum slope difference as:
(10) 
For the Virgo observations the solar minimum slope is Wmyr. The smoothing and resampling reduces the number of constraints on the model so we considered three sets of class groups:

The class groups discussed above.

The class groups discussed above except for the Q0 case which was split so that class C0 was one group and classes C4 and C6 were considered a second group.

All the original classes were considered as independent.
(Wmyr)  with trend  detrended  

(days)  (days)  (days)  
Vars.  none  3  31  none  3  31  none  3  31 
6  0.025  0.024  0.028  0.888  0.926  0.909  0.898  0.941  0.927 
7  0.023  0.023  0.026  0.890  0.928  0.909  0.897  0.940  0.926 
18  0.023  0.022  0.024  0.953  0.958  0.925  0.963  0.971  0.931 
In Table 3 we report the values of for six cases defined by the above three numbers of variables and by the two smoothing range. An 18month averaging interval for the periods at the beginning and end of the series was tried and yield essentially the same results as given in Table 3. We find that both smoothing and reducing the number of fitting variables make more negative. As an additional indication of the effect of the trend on our fits, we evaluate the Pearson correlation coefficient for these cases both with and without the trend between the two solar minima. For the detrended cases, we adjust the observed values according to:
(11) 
where the notation indicates that the revised is the result of the addition left of the arrow. The improvement in the value of for the 3point smoothed case is a result of the reduction of noise in the MWO model due to observational factors. Our highest value of indicates a remarkable degree of success in the method’s ability to reproduce the observed variations in the total solar irradiance. The scatter diagram relationship between the model and the observations is shown in Figure 10.
These results strengthen our earlier conclusion that the longtime trend in the difference between the model and the observations is not a consequence of a change in the magnetic coverage of the solar surface. If this discrepancy were to be explained by changes in the solar radius and/or surface temperature it would require a decrease of about 80 milli arcsec and/or 0.3K over the spanned period of 11.25 years. The observed radius change is opposite to what might be required to explain the slope excess or too small to influence our result Kuhn et al. (2004); Djafer, Thuillier, and Sofia (2008a); Djafer et al. (2008b). The temperature change is difficult to measure directly and is closely related to the TSI data such as we are discussing here Fazel et al. (2005). Since our analysis already includes the effect of the changes in surface coverage by different magnetic classes, this additional change must come from the temperature of one of the quiet classes – what has been referred to by \inlinecite2003SoPh..212..227L as the immutable basal temperature. The rate of change in this immutable basal temperature required by the is small compared to the amplitudes these authors discussed.
7 Discussion
The problem of identification and measurement of different parts of the solar surface along with the determination of their relationship, if any, with variability of the solar energy output has long been recognized and studied, see for example the reviews by LeanLean (1988, 1997). The strength and geometry of magnetic fields is the underlying factor that distinguishes one portion of the solar surface from others. Work by \inlinecite1975ApJ…200..747S showed that there is a direct relationship between the magnetic field strength and the strength of emission in the radiation near the core of the CaII K spectral line. Spectroheliograms utilizing this spectral band have been used to identify facular and sunspot regions and estimate their impact on the TSI White and Livingston (1978); Skumanich et al. (1984); Fligge et al. (1998); Harvey and White (1999). The He I line at Å has also been used to estimate the impact of faculae and network on the TSI Foukal and Lean (1988). Although the strength of the magnetic field is assumed to be the underlying cause, these papers have all used the proxy of the CaII K emission or the HeI Å emission as an index to estimate the field strength. The study by \inlinecite2006A&A…460..583W has used magnetic fields from the NSO512 and the NSOSPM at the KPVT to model the TSI variations in solar cycles 21 to 23. The spatial resolution of these magnetograms is and the magnetic field measurements have a noise level of 7.5 G to 9 G. This study also identified spots using continuum images. For a period from 1992 to 2003 these authors obtained an optimized reconstruction of the TSI having . The identification of features from their geometry and contiguity on the solar surface is a common goal of all these investigations. While direct inspection is sometimes used, it is desireable to have a system that carries out this identification automatically so that potential investigator bias can be removed. Two studies have implemented such automatic pattern recognition and classification systems Turmon, Pap, and Mukhtar (2002); Jones et al. (2008). The generally recognized features are sunspots, faculae and the network. Of these the identification and measurement of the network has been most problematic.
Our approach differs from previous investigations in several ways:

The classification completely abandons all imagebased information. All pixels are treated equally.

We use a new type of data based on a line intensity ratio. This quantity is sensitive to atmospheric properties not previously included – the brightness in the upper photosphere/lower chromosphere.

Our magnetic field measurement has a noise level of about 0.2 G Ulrich et al. (2009) for the images used that are averages of 4 to 15 individual magnetograms.
The resulting classification has allowed us to reproduce the observed TSI with reaching 0.97 for the case where the Virgo observations have the longterm trend removed and some of the MWO noise is reduced by smoothing with a 3point wide Gaussian. This high coefficient indicates that we need to take the classification seriously even though there are a number of deficiencies that are evident from the comparison to the SBI image. The cancellation effect illustrated by the alternating sign of the coefficients for individual classes among the groups we have called P0 and P1 are indicative of an inadequacy of the classification – probably a consequence of our low spatial resolution. However, one strong aspect of our study is the low noise for the magnetic field strengths. This is partly a consequence of our large size pixels. If a similar level of noise could be achieved with pixels with subarcsecond size, the classification might be better. Another way the classification could be improved is through the addition of more measured quantities such as a true continuum intensity that would give better identification of sunspot areas. The MWO system does not provide such a quantity so its addition will require merging of an external data set wherein the temporal and spatial scales will need to be adjusted to match those of the MWO data set. Other observational systems that are internally consistent do not provide a quantity having the properties of the r ratio we have used. Part of the success of our analysis has come from the fact that this ratio is sensitive to the upper photosphere instead of the mid chromosphere. However, it is possible that other combinations of quantities could be used in the AutoClass system with success similar to ours.
8 Conclusions
This analysis holds out the possibility of creating an ongoing, accurate, independent estimate of TSI variations from the MWO ground based observations which could be used to fill in gaps in the satellite record. Comparison of the solar surface features seen in the TSI images created from the various TSI measurements may shed light on the sources of disagreement between various TSI measurements. Further, the spatial resolution of these images should assist in identifying with greater accuracy the particular solar surface regions associated with TSI variations and could help in addressing questions such as those raised by \inlinecite2008SoPh..248….1F.
Acknowledgements.
This research has been supported over the years by a number of grants from NASA, NSF and ONR. The most recent observations and analysis has been supported by NASA grant to Stanford 1616588026967G and NSF grant to UCLA NSF ATM0517729.References
 Banerjee, O., El Ghaoui, L., dAspremont, A.: 2008, J. Mach. Learn. Res. 9, 485.
 Cheeseman, P., Stutz, J.: 1996, In: Fayyad, U.M., PiatetskyShapiro, G., Smyth, P., Uthurusamny, R. (eds.) Advances in Knowledge Discovery and Data Mining, AAAI Edited Collection, 153.
 Crommelynck, D., Fichot, A., Lee, R.B. III, Romero, J.: 1995, Advances in Space Research 16, 17.
 Dewitte, S., Crommelynck, D., Mekaoui, S., Joukoff, A.: 2004, Solar Phys. 224, 209.
 Djafer, D., Thuillier, G., Sofia, S.: 2008a, Astrophys. J. 676, 651.
 Djafer, D., Thuillier, G., Sofia, S., Egidi, A.: 2008b, Solar Phys. 247, 225.
 Fazel, Z., Rozelot, J.P., Pireaux, S., Ajabshirizadeh, A., Lefebvre, S.: 2005, Mem. Soc. Astron. Atal. 76, 961.
 Fligge, M., Solanki, S.K., Unruh, Y.C., Froehlich, C., Wehrli, C.: 1998, Astron. Astrophys. 335, 709.
 Foukal, P., Bernasconi, P.N.: 2008, Solar Phys. 248, 1.
 Foukal, P., Lean, J.: 1988, Astrophys. J. 328, 347.
 Foukal, P., Bernasconi, P., Eaton, H., Rust, D.: 2004, Astrophys. J. Lett. 611, L57.
 Fröhlich, C.: 2000, Space Science Reviews 94, 15.
 Golan, A., Judge, G.G., Miller, D.: 1996, Maximum Entropy Econometrics: Robust Estimation with Limited Data, Wiley, Somerset, NJ.
 Harvey, K.L.: 1994, In: Pap, J.M., Frohlich, C., Hudson, H.S., Solanki, S.K. (eds.) The Sun as a Variable Star: Solar and Stellar Irradiance Variations, IAU Colloquium 143, Cambridge University Press, Cambridge, 217.
 Harvey, K.L., White, O.R.: 1999, Astrophys. J. 515, 812.
 Howard, R., Boyden, J.E., Bruning, D.H., Clark, M.K., Crist, H.W., Labonte, B.J.: 1983, Solar Phys. 87, 195.
 Jones, H.P., Chapman, G.A., Harvey, K.L., Pap, J.M., Preminger, D.G., Turmon, M.J., Walton, S.R.: 2008, Solar Phys. 248, 323.
 Kuhn, J.R., Bush, R.I., Emilio, M., Scherrer, P.H.: 2004, Astrophys. J. 613, 1241.
 Lean, J.: 1988, Advances in Space Research 8, 263.
 Lean, J.: 1997, Ann. Rev. Astron. & Astrophys. 35, 33.
 Livingston, W., Wallace, L.: 2003, Solar Phys. 212, 227.
 NASA/Ames Intelligent Systems Div.: 2002, http://ti.arc.nasa.gov/project/autoclass/.
 Pap, J.M., Turmon, M., Mukhtar, S., Bogart, R., Ulrich, R., Fröhlich, C., Wehrli, C.: 1997, In: Wilson, A. (ed.) Correlated Phenomena at the Sun, in the Heliosphere and in Geospace, ESA Special Publication 415, 477.
 Skumanich, A., Smythe, C., Frazier, E.N.: 1975, Astrophys. J. 200, 747.
 Skumanich, A., Lean, J.L., Livingston, W.C., White, O.R.: 1984, Astrophys. J. 282, 776.
 Turmon, M., Pap, J.M., Mukhtar, S.: 1998, In: Korzennik, S. (ed.) Structure and Dynamics of the Interior of the Sun and Sunlike Stars, ESA Special Publication 418, 979.
 Turmon, M., Pap, J.M., Mukhtar, S.: 2002, Astrophys. J. 568, 396.
 Ulrich, R.K., Webster, L., Boyden, J.E., Magnone, N., Bogart, R.S.: 1991, Solar Phys. 135, 211.
 Ulrich, R.K., Evans, S., Boyden, J.E., Webster, L.: 2002, Astrophys. J. Suppl. 139, 259.
 Ulrich, R.K., Bertello, L., Boyden, J.E., Webster, L.: 2009, Solar Phys. 255, 53.
 VIRGO Team: 2008, ftp://ftp.pmodwrc.ch/pub/virgo/data/IrradianceData/.
 Wenzler, T., Solanki, S.K., Krivova, N.A., Fröhlich, C.: 2006, Astron. Astrophys. 460, 583.
 White, O.R., Livingston, W.: 1978, Astrophys. J. 226, 679.