Adaptive spline fitting with particle swarm optimization
In fitting data with a spline, finding the optimal placement of knots can significantly improve the quality of the fit. However, the challenging high-dimensional and non-convex optimization problem associated with completely free knot placement has been a major roadblock in using this approach. We present a method that uses particle swarm optimization (PSO) combined with model selection to address this challenge. The problem of overfitting due to knot clustering that accompanies free knot placement is mitigated in this method by explicit regularization, resulting in a significantly improved performance on highly noisy data. The principal design choices available in the method are delineated and a statistically rigorous study of their effect on performance is carried out using simulated data and a wide variety of benchmark functions. Our results demonstrate that PSO-based free knot placement leads to a viable and flexible adaptive spline fitting approach that allows the fitting of both smooth and non-smooth functions.
A spline of order is a piecewise polynomial function that obeys continuity conditions on its value and its first derivatives at the points, called knots, where the pieces join deBoor (). Splines play an important role in nonparametric regression wegman1983splines (); splinesmoothing (), simply called curve fitting when the data is one dimensional, where the outcome is not assumed to have a predetermined form of functional dependence on the predictor.
It has long been recognized wold1974spline (); jupp1978approximation (); luo1997hybrid () that the quality of a spline fit depends significantly on the locations of the knots defining the spline. Determining the placement of knots that is best adapted to given data has proven to be a challenging non-linear and non-convex, not to mention high-dimensional, optimization problem that has resisted a satisfactory solution.
A diverse set of methods have been proposed that either attempt this optimization problem head-on or solve an approximation to it in order to get a reasonable solution. In the latter category, methods based on knot insertion and deletion smith1982curve (); friedman1989flexible (); friedman1991 () have been studied extensively. In these methods, one starts with a fixed set of sites for knots and performs a step-wise addition or removal of knots at these sites. The best number of knots is determined by a model selection criterion such as Generalized Cross Validation (GCV) golub1979generalized (); luo1997hybrid (). Step-wise change in knot placement is not an efficient exploration of the continuous space of possible knot positions and the end result, while computationally inexpensive to obtain and tractable to mathematical analysis, is not necessarily the best possible.
In attempts at solving the optimization challenge directly, general purpose stochastic optimization algorithms (metaheuristics) such as Genetic Algorithm (GA) geneticalgo () or those based on Markov Chain Monte Carlo (MCMC) green1995reversible (), have been studied pittman2002adaptive (); dimatteo2001bayesian (); YOSHIMOTO2003751 (); miyata2003adaptive (). These methods have proven quite successful in solving many challenging high-dimensional optimization problems in other fields and it is only natural to employ them for the problem of free knot placement. However, GA is more suited to discrete optimization problems rather than the inherently continuous one in knot optimization, and MCMC is computationally expensive. Thus, there is plenty of scope for using other metaheuristics to find better solutions.
It was shown in galvez2011efficient (), and independently in mohanty2012particle (), that Particle Swarm Optimization (PSO) PSO (), a relatively recent entrant to the field of nature inspired metaheuristics such as GA, is a promising method for the free knot placement problem. PSO is governed by a much smaller set of parameters than GA or MCMC and most of these do not appear to require much tuning from one problem to another. In fact, as discussed later in the paper, essentially two parameters are all that need to be explored to find a robust operating point for PSO.
An advantage of free knot placement is that a subset of knots can move close enough to be considered as a single knot with a higher multiplicity. A knot with multiplicity can be used to construct splines that can fit curves with discontinuities. Thus, allowing knots to move and merge opens up the possibility of modeling even non-smooth curves. That PSO can handle regression models requiring knot merging was demonstrated in galvez2011efficient () albeit for examples with very low noise level.
A question that has not been addressed adequately so far, either with addition-deletion schemes or with metaheuristics, is that of performance of adaptive spline fitting on data with a high noise level. There is a growing need for extending non-parametric regression to highly noisy data in many scientific fields. This is especially acute in the case of gravitational wave (GW) searches that have recently achieved success ligo2018gwtc () (and the 2017 Nobel prize in Physics). The most important target for GW searches, where the data is a noisy time series, are transient chirps: signals whose instantaneous frequency, , and amplitude, , evolve adiabatically on the timescale of the instantaneous period, . For GW sources having a sufficiently accurate theoretical model for predicting and , parametric regression methods such as matched filtering Helstrom () can be used and the associated data analysis problem is well under control (except for computational issues). However, if either or , or both, are not known a priori, non-parametric regression methods must be used and the high level of noise in GW data makes this an especially challenging task. In this context, the use of a spline-based non-parametric regression method (SEECR) was shown in mohanty2017spline () to be much more effective than searches based on time-frequency analysis. In SEECR, both and are modeled by splines, and knot optimization using PSO is used for fitting .
It was found in mohanty2017spline (), and later in a simplified model problem mohanty2018swarm (), that the advantage engendered by free knot placement turns into a liability as the level of noise increases: knots can form spurious clusters to fit outliers arising from noise, producing spikes in the resulting estimate and making it worse than useless. This problem was found to be mitigated mohanty2018swarm () by introducing a penalized spline regularization ruppert2003semiparametric () of the regression problem. Penalized spline regression has also been used in combination with knot addition luo1997hybrid () but its role there – suppression of numerical instability arising from a large numbers of knots – is very different.
The progress described above in adaptive spline fitting using free knot placement has happened over decades and in somewhat isolated steps that were often limited by the available computing power. However, the tremendous growth in computing power and the development of more powerful metaheuristics has finally brought us to the doorstep of a satisfactory resolution of this problem, at least for one-dimensional regression.
In this paper, we combine PSO based knot placement with penalized spline into a single algorithm for adaptive spline fitting. The algorithm, called Swarm Heuristics based Adaptive and Penalized Estimation of Splines (SHAPES), has the flexibility to fit non-smooth functions as well as smooth ones without any change in algorithm settings. It uses model selection to determine the best number of knots, and reduces estimation bias arising from the penalized spline regulator using a least squares derived rescaling. The elements of SHAPES outlined above match most closely the method that was used in mohanty2018swarm () for the analysis of a single example. However, the crucial step of bias reduction was missing there. (The bias reduction step does not seem to have been used elsewhere to the best of our knowledge.)
Various design choices involved in SHAPES are identified clearly and their effects are examined using large-scale simulations and a diverse set of benchmark functions. Most importantly, SHAPES is applied to data with much higher noise level than has traditionally been considered in the field of adaptive spline fitting and found to have promising performance. This sets the stage for further development of the adaptive spline methodology for new application domains.
The rest of the paper is organized as follows. Sec. II provides a brief review of pertinent topics in spline fitting. The PSO metaheuristic and the particular variant used in this paper are reviewed in Sec. III. Details of SHAPES are described in Sec. IV along with the principal design choices. The setup used for our simulations is described in Sec. V. Computational aspects of SHAPES are addressed in Sec. VI. This is followed by the presentation of results in Sec. VII. Our conclusions are summarized in Sec. VIII.
Ii Fitting splines to noisy data
In this paper, we consider the one-dimensional regression problem
, , , , with unknown and drawn independently from .
The ordinary least-squares solution for , , is not unique unless the problem is regularized by restricting the space of possible solutions. One reasonable approach is to enforce “smoothness” on the estimate of . This can be done by choosing to be the minimizer of the least-squares function but with a penalty on its average curvature: higher average curvature implying sharper turning points in and more “roughness”. Thus,
where , the regulator gain, determines the strength of the smoothness requirement. It can be shown that belongs to the family of cubic splines defined by as the set of knots. Consequently, is known as the smoothing spline estimate reinsch1967smoothing ().
It is empirically observed that the large number of knots – all the values in – in spline smoothing results in a high degree of freedom and overfitting. Restricting the number of knots to be is one way to control overfitting and this leads to the regression spline wold1974spline () estimate. Analysis of the asymptotic properties of smoothing and regression spline estimates also lends support to this modification claeskens2009asymptotic ().
ii.1 B-spline functions
For a given set of knots , , , and given order of the spline polynomials, the set of splines that interpolates , , forms a linear vector space of dimensionality . A convenient choice for a basis of this vector space is the set of B-spline functions curry1947spline ().
In this paper, we need B-spline functions for the more general case of a knot sequence , , in which a knot can appear more than once. The number of repetitions of any knot cannot be greater than . Also, for , and for . The span of B-spline functions defined over a knot sequence with repetitions can contain functions that have jump discontinuities in their values or in their derivatives. (The dimensionality of the span is .)
The Cox-de Boor recursion relations DEBOOR197250 () given below provide an efficient way to compute the set of B-spline functions, , for any given order. The recursions start with B-splines of order 1, which are piecewise constant functions
In the recursion above, . Fig. 1 provides an illustration of B-spline functions.
The regression spline method is elegantly formulated in terms of B-spline functions. The estimate is assumed to belong to the parametrized family of linearly combined B-spline functions,
where . The least-squares estimate is given by , where and minimize
ii.2 Regression and penalized spline with free knot placement
The regression problem is implicitly regularized by the restriction . Explicit regularization can be restored by minimizing
thereby enforcing more smoothness on the solution. When is minimized keeping fixed, the method is called penalized spline ruppert2003semiparametric ().
While the reduction of the number of knots in regression spline coupled with the explicit regularization of penalized spline reduces overfitting, the fit is now sensitized to where the knots are placed. Thus, the complete method involves the minimization of (c.f., Eq. 9) over both and . (For lack of a better term, the method of regression spline with knot optimization and explicit regularization will also be referred to as regression spline in the following.)
Minimization of over and can be nested as follows.
The solution, , of the inner minimization is expressed in terms of the -by- matrix , with
where is the -by- identity matrix. The outer minimization over of
needs to be performed numerically.
Due to the fact that freely moveable knots can coincide, and that this produces discontinuities in B-spline functions as outlined earlier, curve fitting by regression spline can accommodate a broader class of functions – smooth with localized discontinuities – than smoothing or penalized spline.
The main bottleneck in implementing the full regression spline method is the global minimization of since it is a high-dimensional non-convex function having multiple local minima. Trapping by local minima renders greedy methods ineffective and high dimensionality makes a brute force search for the global minimum computationally infeasible. This is where PSO enters the picture and, as shown later, offers a way forward.
ii.3 Model selection
In addition to the parameters and , regression spline with explicit regularization has two hyper-parameters, namely the regulator gain and the number of interior knots , that affect the outcome of fitting. Model selection methods can be employed to adaptively fix these hyper-parameters based on the data.
In this paper, we restrict ourselves to the adaptive selection of only the number of knots. This is done by minimizing the Akaike Information Criterion (AIC) akaike1998information (): For a regression model with parameters ,
where is the likelihood function. The specific expression for AIC used in SHAPES is provided in Sec. IV.
Iii Particle swarm optimization
Under the PSO metaheuristic, the function to be optimized (called the fitness function) is sampled at a fixed number of locations (called particles). The set of particles is called a swarm. The particles move in the search space following stochastic iterative rules called dynamical equations. The dynamical equations implement two essential features called cognitive and social forces. They serve to retain “memories” of the best locations found by the particle and the swarm (or a subset thereof) respectively.
Since its introduction by Kennedy and Eberhart PSO (), the PSO metaheuristic has expanded to include a large diversity of algorithms engelbrecht2005fundamentals (). In this paper, we consider the variant called local-best (or lbest) PSO lbestTopology (). We begin with the notation normandin2018particle () for describing lbest PSO.
: the scalar fitness function to be minimized, with . In our case, is , is (c.f., Eq. 14), and .
: the search space defined by the hypercube , in which the global minimum of the fitness function must be found.
: the number of particles in the swarm.
: the position of the particle at the iteration.
: a vector called the velocity of the particle that is used for updating the position of a particle.
: the best location found by the particle over all iterations up to and including the . is called the personal best position of the particle.
: a set of particles, called the neighborhood of particle , . There are many possibilities, called topologies, for the choice of . In the simplest, called the global best topology, every particle is the neighbor of every other particle: . The topology used for lbest PSO in this paper is described later.
: the best location among the particles in over all iterations up to and including the . is called the local best for the particle.
: The best location among all the particles in the swarm, is called the global best.
The dynamical equations for lbest PSO are as follows.
Here, is a deterministic function known as the inertia weight, and are constants, and is a diagonal matrix with iid random variables having a uniform distribution over . Limiting the velocity as shown in Eq. 21 is called velocity clamping.
The iterations are initialized at by independently drawing (i) from a uniform distribution over , and (ii) from a uniform distribution over . For termination of the iterations, we use the simplest condition: terminate when a prescribed number of iterations are completed. The solutions found by PSO for the minimizer and the minimum value of the fitness are and respectively. Other, more sophisticated, termination conditions are available engelbrecht2005fundamentals (), but the simplest one has served well across a variety of regression problems in our experience.
The second and third terms on the RHS of Eq. 19 are the cognitive and social forces respectively. On the average they attract a particle towards its personal and local bests, promoting the exploitation of an already good solution to find better ones nearby. The term containing the inertia weight, on the other hand, promotes motion along the same direction and allows a particle to resist the cognitive and social forces. Taken together, the terms control the exploratory and exploitative behaviour of the algorithm. We allow the inertia weight to decrease linearly with from an initial value to a final value in order to transition PSO from an initial exploratory to a final exploitative phase.
For the topology, we use the ring topology with neighbors in which
The local best, , in the iteration is updated after evaluating the fitnesses of all the particles. The velocity and position updates given by Eq. 19 and Eq. 20 respectively form the last set of operations in the iteration.
To handle particles that exit the search space, we use the “let them fly” boundary condition under which a particle outside the search space is assigned a fitness value of . Since both and are always within the search space, such a particle is eventually pulled back into the search space by the cognitive and social forces.
iii.1 PSO tuning
Stochastic global optimizers, including PSO, that terminate in a finite number of iterations do not satisfy the conditions laid out in solis1981minimization () for convergence to the global optimum. Only the probability of convergence can be improved by tuning the parameters of the algorithm for a given optimization problem.
In this sense, most of the parameters involved in PSO are found to have fairly robust values when tested across an extensive suite of benchmark fitness functions bratton2007defining (). Based on widely prevalent values in the literature, these are: , , , , and .
Typically, this leaves the maximum number of iterations, , as the principal parameter that needs to be tuned. However, for a given , the probability of convergence can be increased by the simple strategy of running multiple, independently initialized runs of PSO on the same fitness function and choosing the best fitness value found across the runs. The probability of missing the global optimum decreases exponentially as , where is the probability of successful convergence in any one run and is the number of independent runs.
Besides , therefore, is the remaining parameter that should be tuned. If the independent runs can be parallelized, is essentially fixed by the available number of parallel workers although this should not be stretched to the extreme. If too high a value of is needed in an application (say ), it is usually an indicator that should be increased by tuning the other PSO parameters or by exploring a different PSO variant. In this paper, we follow the simpler way of tuning by setting it to , the typical number of processing cores available in a high-end desktop.
Iv SHAPES algorithm
The SHAPES algorithm is summarized in the pseudo-code given in Fig. 2. The user specified parameters of the algorithm are (i) the number, , of PSO to use per data realization; (ii) the number of iterations, , to termination of PSO; (iii) the set of models, , over which AIC based model selection (see below) is used; (iv) the regulator gain .
A model in SHAPES is specified by the number of non-repeating knots. For each model , denotes the fitness value, where is the best PSO run. The AIC value for the model is given by
which follows from the number of optimized parameters being (accounting for both knots and B-spline coefficients) and the log-likelihood being proportional to the least squares function for the noise model used here. (Additive constants that do not affect the minimization of AIC have been dropped.)
The algorithm acts on given data to produce (i) the best fit model ; (ii) the fitness value associated with the best fit model; (iii) the estimated function from the best fit model. The generation of includes a bias correction step described next.
iv.1 Bias correction
The use of a non-zero regulator gain leads to shrinkage in the estimated B-spline coefficients. As a result, the corresponding estimate, , has a systematic point-wise bias towards zero. A bias correction transformation is applied to as follows.
First, the unit norm estimated function is obtained,
where is the norm.
Next, a scaling factor is estimated as
The final estimate is given by .
iv.2 Knot merging and dispersion
In both of the mappings described in Sec. IV.3, it is possible to get knot sequences in which a subset of of interior knots falls within an interval between two consecutive predictor values. There are two possible options to handle such a situation.
Heal: Overcrowded knots are dispersed such that there is only one knot between any two consecutive predictor values. This can be done iteratively by moving a knot to the right or left depending on the difference in distance to the corresponding neighbors.
Merge: All the knots in an overcrowded set are made equal to the rightmost knot until its multiplicity saturates at . The remaining knots, to , are equalized to the remaining rightmost knot until its multiplicity staturates to , and so on. (Replacing rightmost by leftmost when merging is an equally valid alternative.) Finally, if more than one set of merged knots remain within an interval , they are dispersed by healing.
If only healing is used, SHAPES cannot fit curves that have jump discontinuities in value or derivatives. Therefore, if it is known that the unknown curve in the data is free of jump discontinuities, healing acts as an implicit regularization to enforce this condition. Conversely, merging should be used when jump discontinuities cannot be discounted.
It is important to note that in both healing and merging, the number of knots stays fixed at where .
iv.3 Mapping particle location to knots
For a given model , the search space for PSO is dimensional. Every particle location, , in this space has to be mapped to an element knot sequence before evaluating its fitness .
We consider two alternatives for the map from to .
Plain: is sorted in ascending order. After sorting, copies of and are prepended and appended respectively to . These are the repeated end knots as described in Sec. II.1.
Centered-monotonic: In this scheme calvin-siuro (), the search space is the unit hypercube: , . First, an initial set of knots is obtained from
(26) (27) (28)
This is followed by prepending and appending copies of and respectively to the initial knot sequence.
In the plain map, any permutation of maps into the same knot sequence due to sorting. This creates degeneracy in , which may be expected to make the task of global minimization harder for PSO. The centered-monotonic map is designed to overcome this problem: by construction, it assigns a unique to a given . Moreover, is always a monotonic sequence, removing the need for a sorting operation. This map also has the nice normalization that the center of the search space at , , corresponds to uniform spacing of the interior knots.
It should be noted here that the above two maps are not the only possible ones. The importance of the “lethargy theorem” (degeneracy of the fitness function) and using a good parametrization for the knots in regression spline was pointed out by Jupp jupp1978approximation () back in . A logarithmic map for knots was proposed in jupp1978approximation () that, while not implemented in this paper, should be examined in future work.
iv.4 Optimization of end knots
When fitting curves to noisy one-dimensional data in a signal processing context, a common situation is that the signal is transient and localized well away from the end points and of the predictor. However, the location of the signal in the data – its time of arrival in other words – may be unknown. In such a case, it makes sense to keep the end knots free and subject to optimization.
On the other hand, if it is known that the curve occupies the entire predictor range, it is best to fix the end knots by keeping and fixed. (This reduces the dimensionality of the search space for PSO by .)
iv.5 Retention of end B-splines
The same signal processing scenario considered above suggests that, for signals that decay smoothly to zero at their start and end, it is best to drop the end B-spline functions because they have a jump discontinuity in value (c.f., Fig 1). In the contrary case, the end B-splines may be retained so that the estimated signal can start or end at non-zero values.
V Simulation study setup
We examine the performance of SHAPES on simulated data with a wide range of benchmark functions. In this section, we present these functions, the simulation protocol used, the metrics for quantifying performance, and a scheme for labeling test cases that is used in Sec. VII.
v.1 Benchmark functions
Function has a sharp change but is differentiable everywhere. Functions and have jump discontinuities, and has a jump discontinuity in its slope. Functions and are smooth but sharply peaked. Functions to all decay to zero at both ends and serve to model smooth but transient signals; to are designed to require progressively higher number of knots for fitting; is an oscillatory signal that is typical for signal processing applications and expected to require the highest number of knots. In addition, and test the ability of SHAPES to localize time of arrival.
v.2 Data simulation
Following the regression model in Eq. 1, a simulated data realization consists of pseudorandom iid noise drawn from added to a given benchmark function that is sampled uniformly at 256 points in .
We consider the performance of SHAPES across a range of signal to noise ratio () defined as,
where is a benchmark function and is the standard deviation – set to unity in this paper – of the noise. For each combination of benchmark function and , SHAPES is applied to independent data realizations.
The principal performance metric used in this paper is the sample root mean squared error (RMSE):
where is the true function in the data and its estimate from the data realization. We use bootstrap with independently drawn samples with replacement from the set to obtain the sampling error in RMSE.
A secondary metric that is useful is the sample mean of the number of knots in the best fit model. To recall, this is the average of over the data realizations, where and were defined in Sec. IV. The error in is estimated by its sample standard deviation.
v.4 Labeling scheme
Several design choices in SHAPES were described in Sec. IV. A useful bookkeeping device for keeping track of the many possible combinations of these choices is the labeling scheme presented in Table 2.
|PSO algorithm (Sec. III)||: lbest PSO|
|Knot Map (Sec. IV.3)||:||:|
|(Number of PSO iterations)||(Numerical)|
|End knots (Sec. IV.4)||: Fixed||: Variable|
|End B-splines (Sec. IV.5)||: Keep||: Drop|
|Knot merging (Sec. IV.2)||: Merge||: Heal|
Following this labeling scheme, a string such as refers to the combination: lbest PSO; plain map from PSO search space to knots; for the true function in the data; regulator gain ; maximum number of PSO iterations set to ; end knots fixed; end B-splines retained; merging of knots allowed.
Vi Computational considerations
The results in this paper were obtained with a code implemented entirely in Matlab matlab (). Some salient points about the code are described below.
The evaluation of B-splines uses the efficient algorithm given in deBoor (). Since our current B-spline code is not vectorized, it suffers a performance penalty in Matlab. (We estimate that it is slower as a result.) Nonetheless, the code is reasonably fast: A single PSO run on a single data realization, for the more expensive case of , takes about sec on an Intel Xeon (3.0 GHz) class processor. It is important to note that the run-time above is specific to the set, , of models used. In addition, due to the fact that the number of particles breaching the search space boundary in a given PSO iteration is a random variable and that the fitness of such a particle is not computed, the actual run times vary slightly for different PSO runs and data realizations.
The only parallelization used in the current code is over the independent PSO runs. Profiling shows that of the run-time in a single PSO run is consumed by the evaluation of particle fitnesses, out of which is spent in evaluating B-splines. Further substantial saving in run-time is, therefore, possible if particle fitness evaluations are also parallelized. This dual parallelization is currently not possible in the Matlab code but, given that we use particles, parallelizing all fitness evaluations can be expected to reduce the run-time by about an order of magnitude. However, realizing such a large number of parallel processes needs hardware acceleration using, for example, Graphics Processing Units.
The operations count in the most time-consuming parts of the code (e.g., evaluating B-splines) scales linearly with the length of the data. Hence, the projected ratios above in run-time speed-up are not expected to change much with data length although the overall run-time will grow linearly.
The pseudorandom number streams used for the simulated noise realizations and in the PSO dynamical equations utilized built-in and well-tested default generators. The PSO runs were assigned independent pesudorandom streams that were initialized, at the start of processing any data realization, with the respective run number as the seed. This (a) allows complete reproducibility of results for a given data realization, and (b) does not breach the cycle lengths of the pseudorandom number generators when processing a large number of data realizations.
The presentation of results is organized as follows. Sec. VII.1 shows single data realizations and estimates for a subset of the benchmark functions. Sec. VII.2 analyzes the impact of the regulator gain on estimation. Sec. VII.3 and Sec. VII.4 contain results for and respectively. Relative to the SNRs used commonly in the literature on adaptive spline fitting, these values of SNR can be ranked respectively as being moderate to low.
In all cases, the set of models used was . The spacing between the models is set wider for higher knot numbers in order to reduce the computational burden involved in processing a large number of data realizations. In an application involving just a few realizations, a denser spacing may be used.
Fig. 4 shows the performance of lbest PSO across the set of benchmark functions as a function of the parameter . Given that the fitness values do not change in a statistically significant way when going from to in the SNR=100 case, we set it to the former as it saves computational cost. A similar plot of fitness values (not shown) for is used to set for the case.
vii.1 Sample estimates
In Fig. 5, we show function estimates obtained with SHAPES for arbitrary single data realizations. While not statistically rigorous, this allows an initial assessment of performance when the is sufficiently high.
For ease of comparison, we have picked only the benchmark functions ( to ) used in galvez2011efficient (). The SNR of each function matches the value one would obtain using the noise standard deviation tabulated in galvez2011efficient (). Finally, the algorithm settings were brought as close as possible by (a) setting the regulator gain , (b) using the plain map (c.f., Sec. IV.3), (c) keeping the end knots fixed, and (d) allowing knots to merge. Differences remain in the PSO variant (and associated parameters) used and, possibly, the criterion used for merging knots.
We find that SHAPES has excellent performance at high values: without any change in settings, it can fit benchmark functions ranging from spatially inhomogenous but smooth to ones that have discontinuities.
vii.2 Regulator gain
While the aim of restricting the number of knots in regression spline is to promote a smoother estimate, it is an implicit regularization that does not guarantee smoothness. In the absence of an explicit regularization, a fitting method based on free knot placement will exploit this loophole to form spurious clusters of knots that fit outliers arising from noise and overfit the data. This issue becomes increasingly important as the level of noise in the data increases.
Fig. 6 illustrates how adding the penalized spline regulator helps mitigate this problem of knot clustering. Shown in the figure is one data realization and the corresponding estimates obtained with high and low values of the regulator gain . For the latter, sharp spikes appear in the estimate where the function value is not high but the noise values are. The method tries to fit out these values by putting more knots in the model and clustering them to form the spikes. Since knot clustering also needs large B-spline coefficients in order to build a spike, a larger penalty on the coefficients suppresses spurious spikes.
Fig. 7 and Fig. 8 present a statistically more rigorous study of the effect of by examining the RMSE attained across the whole set of benchmark functions at different values. In both figures, the RMSE is shown for identical algorithm settings except for , and in both we observe that increasing the regulator gain improves the RMSE. (The lone case where this is not true is addressed in more detail in Sec. VII.4.) The improvement becomes more pronounced as is lowered. (There is no statistically significant effect of on the number of knots in the best fit model at either .)
The higher values of the regulator gains in Fig. 7 and Fig. 8 – and for and respectively – were chosen according to the . These pairings were chosen empirically keeping in mind that there is an optimum regulator gain for a given noise level. Too high a gain becomes counterproductive as it simply shrinks the estimate towards zero. Too low a value, as we have seen, brings forth the issue of knot clustering and spike formation. Since the latter is a more serious issue for a higher noise level, the optimum regulator gain shifts towards a correspondingly higher value.
vii.3 Results for
We have already selected some of the algorithm settings in the preceding sections, namely, the number of iterations to use and the regulator gain for a given . Before proceeding further, we need to decide on the remaining ones.
For the case, it is clear that the end knots and end B-splines must be retained because benchmark functions to do not all decay to zero and the noise level is not high enough to mask this behavior. Similarly, knot merging is an obvious choice because discontinuities in some of the benchmark functions are obvious at this and they cannot be modeled under the alternative option of healing. The remaining choice is between the two knot maps: plain or centered-monotonic.
As shown in Fig. 9, the RMSE is distinctly worsened by the centered-monotonic map across all the benchmark functions. This map also leads to a higher number of knots in the best fit model although the difference is not as significant statistically. Thus, the clear winner here is the map in which knots are merged.
With all the design choices fixed, the performance of SHAPES can be examined. This is done in Fig. 10 and Fig. 11 where the point-wise sample mean and deviation, being the sample standard deviation, are shown for all the benchmark functions. Note that the level of noise now is much higher than the examples studied in Sec. VII.1.
It is evident from these figures that SHAPES is able to resolve different types of discontinuities as well as the locations of features such as peaks and sharp changes. In interpreting the error envelope, it should be noted that the errors at different points are strongly correlated, a fact not reflected in the point-wise standard deviation. Thus, a typical single estimate is not an irregular curve bounded by the error envelopes, as would be the case for statistically independent point-wise errors, but a smooth function. Nonetheless, the error envelopes serve to indicate the extent to which an estimate can deviate from the true function.
vii.4 Results for
Here, we examine the case of high noise level at . Fig. 12 and Fig. 13 show the point-wise sample mean and deviation, being the sample standard deviation, for all the benchmark functions. The algorithm settings used are the same as in Sec. VII.3 for the case except for the regulator gain and the number of PSO iterations: and respectively.
Unlike the case, the high noise level masks many of the features of the functions. For example, the discontinuities and the non-zero end values for to are washed out. Thus, the algorithm settings to use are not at all as clear cut as before. In fact, the results presented next show that alternative settings can show substantial improvements in some cases.
First, as shown in Fig. 14, the estimation of actually improves significantly when the regulator gain is turned down to . While this is the lone outlier in the general trend between regulator gain and RMSE (c.f., Fig. 8), it points to the importance of choosing the regulator gain adaptively rather than empirically as done in this paper.
Next, Fig. 15 examines the effect of the knot map and its interplay with fixing the end knots or allowing them to vary. Allowing the end knots to vary under either knot map leads to a worse RMSE but the number of knots required in the best fit model is reduced, significantly so for to . A plausible explanation for this is that the high noise level masks the behavior of the functions at their end points, and freeing up the end knots allows SHAPES to ignore those regions and focus more on the ones where the function value is higher relative to noise.
Under a given end knot condition, the centered-monotonic map always performs worse in Fig. 15 albeit the difference is statistically significant for only a small subset of the benchmark functions. Remarkably, this behavior is reversed for some of the benchmark functions when additional changes are made to the design choices. Fig. 16 shows the RMSE when the centered-monotonic map and variable end knots are coupled with the dropping of end B-splines and healing of knots. Now, the performance is better for functions to relative to the best algorithm settings found from Fig. 15: not only is there a statistically significant improvement in the RMSE for these functions but this is achieved with a substantially smaller number of knots. This improvement comes at the cost, however, of significantly worsening the RMSE for the remaining benchmark functions.
Our results show that the challenge of free knot placement in adaptive spline fitting is solvable. The most important element of the solution is the use of an effective metaheuristic for knot optimization. We have shown that lbest PSO is effective in this task. Considering the benchmark function for example, the best model found by SHAPES reaches the vicinity of the highest number () of non-repeating knots considered in this paper. The good quality of the fit obtained for shows that PSO was able to handle this high-dimensional optimization well.
We applied SHAPES to both moderate and low SNR cases. In the former, discontinuities in function values or derivatives were well localized in all the cases. At the same time, the smooth parts of the benchmark functions were also well estimated. The moderate SNR () used in this paper is much lower than the typical examples encountered in the literature. This in itself, therefore, is a major advance in pushing non-parametric regression to higher noise levels.
The estimates from SHAPES for low SNR () had, naturally, more error. In particular, the noise level in all the cases examined was high enough to completely mask the location of discontinuities and they were not well localized. Nonetheless, even with a conservative error envelope of around the mean estimated signal, the overall shape of the true function is visually clear in all the examples. This shows that the estimated functions are responding at a statistically significant level to the presence of the true function in the data. Note that, being a non-parametric method, SHAPES can handle functions with qualitatively disparate behaviors – from a simple change between two levels to oscillatory – without requiring any special tuning.
Further characterization of the performance of SHAPES for the low case requires incorporating a hypothesis test, a topic that we are currently investigating. For the case of transient signals represented by to , the fitness value returned by SHAPES may itself serve as a powerful detection statistic since the estimated functions depart quite significantly from the expected behavior under the null hypothesis. (Hypotheses testing for was explored in this manner in mohanty2018swarm () with a method that did not include bias reduction.) However, this expectation may need to be modified for the other benchmark functions, in particular for . Another issue that comes up in a hypothesis test is that of bias correction: it must be used only if the detection threshold of the test is crossed, otherwise estimates under both the null and alternative hypotheses will get amplified. This creates an interesting interplay between hypothesis testing and estimation that needs a careful scrutiny.
The dependence of design choices on , as elucidated in this paper, does not seem to have been fully appreciated in the literature on adaptive spline fitting, probably because the typical scenario considered is that of high . While performance of SHAPES for is found to be fairly robust to the design choices made, they have a non-negligible affect at . The nature of the true function also influences the appropriate algorithm settings for the latter case. Fortunately, the settings were found to depend on only some coarse features of a function, such as its behavior at data boundaries ( to ), whether it is transient ( to ), or whether it is oscillatory (. Such features are often well-known in a real-world application domain: it is unusual to deal with signals that have discontinuities as well as signals that are smooth and transient in the same application. Hence, in most such cases, it should be straightforward to pick the best settings for SHAPES.
The inclusion of a penalized spline regulator was critical in SHAPES for mitigating the problem of knot clustering. For all except one () benchmark functions considered here, the regulator gain was determined empirically by simply examining a few realizations at each with different values of the regulator gain . Ideally, however, should be determined adaptively from given data using a method such as GCV. The case of at provides a particularly good test bed in this regard: while worked well for the other benchmark functions at , the RMSE for improved significantly when the gain was lowered to . Thus, any method for determining adaptively must be able to handle this extreme variation. We leave the additional refinement of using an adaptive regulator gain in SHAPES to future work.
The extension of SHAPES to multi-dimensional splines and longer data lengths is the next logical step in its development. It is likely that extending SHAPES to these higher complexity problems will require different PSO variants than the one used here.
The contribution of S.D.M. to this paper was supported by National Science Foundation (NSF) grant PHY-1505861. The contribution of E.F. to this paper was supported by NSF grant PHY-1757830. We acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (www.tacc.utexas.edu) for providing HPC resources that have contributed to the research results reported within this paper.
- (1) C. de Boor, A Practical Guide to Splines (Applied Mathematical Sciences), Springer, 2001.
- (2) E. J. Wegman, I. W. Wright, Splines in statistics, Journal of the American Statistical Association 78 (382) (1983) 351–365.
- (3) W. Hardle, Applied nonparametric regression, Econometric Society Monographs 19, cambridge University Press.
- (4) S. Wold, Spline functions in data analysis, Technometrics 16 (1) (1974) 1–11.
- (5) D. L. Jupp, Approximation to data by splines with free knots, SIAM Journal on Numerical Analysis 15 (2) (1978) 328–343.
- (6) Z. Luo, G. Wahba, Hybrid adaptive splines, Journal of the American Statistical Association 92 (437) (1997) 107–116.
- (7) P. L. Smith, Curve fitting and modeling with splines using statistical variable selection techniques, Tech. rep., NASA, Langley Research Center, Hampton, VA, report NASA,166034 (1982).
- (8) J. H. Friedman, B. W. Silverman, Flexible parsimonious smoothing and additive modeling, Technometrics 31 (1) (1989) 3–21.
- (9) J. H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics 19 (1) (1991) 1–67.
- (10) G. H. Golub, M. Heath, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 (2) (1979) 215–223.
- (11) M. Mitchell, An Introduction to Genetic Algorithms by Melanie Mitchell, Bradford, 1998.
- (12) P. J. Green, Reversible jump markov chain monte carlo computation and bayesian model determination, Biometrika 82 (4) (1995) 711–732.
- (13) J. Pittman, Adaptive splines and genetic algorithms, Journal of Computational and Graphical Statistics 11 (3) (2002) 615–638.
- (14) I. DiMatteo, C. R. Genovese, R. E. Kass, Bayesian curve-fitting with free-knot splines, Biometrika 88 (4) (2001) 1055–1071.
- (15) F. Yoshimoto, T. Harada, Y. Yoshimoto, Data fitting with a spline using a real-coded genetic algorithm, Computer-Aided Design 35 (8) (2003) 751 – 760.
- (16) S. Miyata, X. Shen, Adaptive free-knot splines, Journal of Computational and Graphical Statistics 12 (1) (2003) 197–213.
- (17) A. Gálvez, A. Iglesias, Efficient particle swarm optimization approach for data fitting with free knot b-splines, Computer-Aided Design 43 (12) (2011) 1683–1692.
- (18) S. D. Mohanty, Particle swarm optimization and regression analysis I, Astronomical Review 7 (2) (2012) 29–35.
- (19) J. Kennedy, R. C. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks: Perth, WA, Australia, Vol. 4, IEEE, 1995, p. 1942.
- (20) L. S. Collaboration, V. Collaboration, et al., Gwtc-1: a gravitational-wave transient catalog of compact binary mergers observed by ligo and virgo during the first and second observing runs, arXiv preprint arXiv:1811.12907.
- (21) C. W. Helstrom, Statistical Theory of Signal Detection, Pergamon, London, 1968.
- (22) S. D. Mohanty, Spline based search method for unmodeled transient gravitational wave chirps, Physical Review D 96 (10) (2017) 102008.
- (23) S. D. Mohanty, Swarm Intelligence Methods for Statistical Regression, Chapman and Hall/CRC, 2018.
- (24) D. Ruppert, M. P. Wand, R. J. Carroll, Semiparametric regression, Vol. 12, Cambridge University Press, 2003.
- (25) C. H. Reinsch, Smoothing by spline functions, Numerische mathematik 10 (3) (1967) 177–183.
- (26) G. Claeskens, T. Krivobokova, J. D. Opsomer, Asymptotic properties of penalized spline estimators, Biometrika 96 (3) (2009) 529–544.
- (27) H. B. Curry, I. J. Schoenberg, On spline distributions and their limits-the polya distribution functions, Bulletin of the American Mathematical Society 53 (11) (1947) 1114–1114.
- (28) C. de Boor, On calculating with b-splines, Journal of Approximation Theory 6 (1) (1972) 50 – 62.
- (29) H. Akaike, Information theory and an extension of the maximum likelihood principle, in: Selected Papers of Hirotugu Akaike, Springer, 1998, pp. 199–213.
- (30) A. P. Engelbrecht, Fundamentals of computational swarm intelligence, Vol. 1, Wiley Chichester, 2005.
- (31) J. Kennedy, Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance, in: Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), Vol. 3, IEEE, 1999, pp. 1931–1938.
- (32) M. E. Normandin, S. D. Mohanty, T. S. Weerathunga, Particle swarm optimization based search for gravitational waves from compact binary coalescences: Performance improvements, Physical Review D 98 (4) (2018) 044029.
- (33) F. J. Solis, R. J.-B. Wets, Minimization by random search techniques, Mathematics of Operations Research 6 (1) (1981) 19–30.
- (34) D. Bratton, J. Kennedy, Defining a standard for particle swarm optimization, in: Swarm Intelligence Symposium, 2007. SIS 2007. IEEE, IEEE, 2007, pp. 120–127.
- (35) C. Leung, Estimation of unmodeled gravitational wave transients with spline regression and particle swarm optimization, SIAM Undergraduate Research Online (SIURO) 8.
- (36) D. Denison, B. Mallick, A. Smith, Automatic bayesian curve fitting, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60 (2) (1998) 333–350.
- (37) M. Li, Y. Yan, Bayesian adaptive penalized splines, Journal of Academy of Business and Economics 2 (2006) 129–141.
- (38) T. Lee, On algorithms for ordinary least squares regression spline fitting: a comparative study, Journal of statistical computation and simulation 72 (8) (2002) 647–663.
- (39) Matlab Release2018b, The MathWorks, Inc., Natick, Massachusetts, United States.; http://www.mathworks.com/.