Reduced variable optimization methods via implicit functional dependence with applications
Optimization methods have been broadly applied to two classes of objects viz. (i) modeling and description of data and (ii) the determination of the stationary points of functions. The overwhelming majority of these methods treat explicitly the variables relevant to the function optimization, where these parameters are independently varied with a weightage factor. Here, a theoretical basis is developed which suggests algorithms that optimizes an arbitrary number of variables for classes (i) and (ii) by the minimization of a function of a single variable deemed the most significant on physical or experimental grounds form which all other variables can be derived. Often, one or more key variables play a physically more significant role than others even if all the variables are equally weighted so that these key optimized variables serve as criteria whether or not reasonable solutions have been obtained, independently of the other variables. Algorithms that focus on a reduced variable set also avoid problems associated with multiple minima and maxima that arise because of the large numbers of parameters, and could increase the accuracy of the determination by cutting down on machine errors. The methods described could have potentially significant applications in the physical sciences where the optimization of one physically significant variable has priority over the other variables as in the energy-position optimization schemes of computational quantum mechanics applied to structure determination, and in chemical kinetics trajectory calculations where the trajectory variable is linked to many other variables of secondary importance. For (i), we develop both an approximate but computationally more tractable method and an exact method where the single controlling variable of all the other variables passes through the local stationary point of the least squares (LS) metric. For (ii), an exact theory is developed whereby the optimized function of an independent variation of all parameters coincides with that due to single parameter optimization. The implicit function theorem has to be further qualified to arrive at this result. The topology of the surfaces of constant value of the target or cost function are considered for all the methods. A real world application of the above implicit methodology to rate constant and final concentration parameter determination for first and second order chemical reactions from published data is attempted to illustrate its utility. This work is different from and more general than all the reduction schemes for conditional linear parameters used for example in extracting data from mixed signal spectra of physical quantities such as found in laser spectroscopy since it is valid for conditional and nonconditional nonlinear parameters. Nor is it a subset of the Adomian decomposition method (ADM) used for estimating solutions of differential equations, which still require boundary conditions that do not feature in topics (i) and (ii).
keywords:Parameter fitting, Chemical rate determination, Single variable optimization, Generalization of conditional linear parameter optimization
Msc:41-04, 41A28, 41A58, 41A63, 41A65
MSC code 41-04 41A28 41A58 41A63 41A65
PACS code 05.10.-a 07.05.Bx 02.60.Pn 02.60.Cb 82.20.Nk
The following theory and elaboration revolve about properties of constrained and unconstrained functions that are continuous and differentiable to various specified degreescrav1 (); depree1 (), and the existence of implicit functions apos1 () and the form of the function to be optimized. The implicit function theorem is applied in a way that requires further qualified because the optimization problem is of an unconstrained kind without any redundant variables. Methods (i)a,b (Secs.(2 ) and (3) respectively) refers to modeling of data (nrc, , Chap.15,p.773-806) where the form of the function with independently varying variables have the general form
where and are datapoints and a known function, and optimizations of may be termed a least squares (LS) fit over parameters which are independently optimized. Method (ii) focuses on optimizing a general function, not necessarily LS in form. There are many standard and hybrid methods to deal with such optimization (nrc, , Ch.10), such as golden section searches in 1-D, simplex methods over multidimensions (nrc, , p.499-525), steepest descent and conjugate methods jan1 () and variable metric methods in multidimensions (nrc, , p.521-525) . Hybrid methods include multidimensional (DFP) secant methods dvd1 (), BFGS (secant optimization) broy1 () and RFO (rational function optimization) ban1 () which is a Newton-Raphson technique utilizing a rational function rather than a quadratic model for the function close to the solution point. Global deterministic optimization schemes combine several of the above approaches (wales1, , sec 6.7.6) Other ad hoc, physical methods perhaps less easy to justify analytically include probabilistic "basin-hopping" algorithms (wales1, , sec 6.7.4), simulated annealing methods kirk1 () and genetic algorithms (wales1, , p.346). An analytical justification on the other hand is attempted here, but in real-world problems some of the assumptions (e.g. continuity, compactness of spaces) may not always obtain. For what follows, the distance metric used are all Euclidean, represented by or where represents the determinant of the matrix . Reduction of the number of variables to be optimized is possible in the standard matrix regression model only if conditional linear parameters exists batwat1 (), where these variables do not appear in the final expression of the least squares function (2) to be optimized, whereas the nonconditional linear parameters do and are a subset of the variables; for the existence of each conditional linear parameter, there is a unit reduction in the number of independent parameters to be optimized. These reductions in variable number occurs for any "expectation function" which is the model or law for which a fitting is required, where there are different datapoints that must be used to determine the parameter variables (batwat1, , p.32,Ch.2). A conditionally linear parameter exists if and only if the derivative of the expectation function with respect to does not involve - in other words is independent of - . Clearly such a condition may severely limit the number of parameters that can be neglected for the expectation function variables when the prescribed matrix regressional techniques are employed (batwat1, , Sec.3.5.5, p.85) where the residual sum of squares is minimized:
The -vectors in dimensional space defines the expectation surface. If the variables are partitioned into the conditional linear parameters and the other nonlinear parameters , then the response can be written . Golub and Pereyra Gol1 () used standard Gauss-Newton algorithm to minimise that depended only on the nonlinear parameters where with being a defined pseudoinverse of (batwat1, , Sec.3.5.5,p.85) where and are matrices. The variables must be separable as discussed above and the number of variable reduction is only equal to the number of conditional linear parameters that exists for the problem. In applications, the preferred algorithm that exploits this valuable variable reduction is called Variable Projection. There are many applications in time resolved spectroscopy that is heavily dependent on this technique and many references to the method are given in the review by van Stokkum et al. stokkum1 (). Recently this method of variable projection has been extended in a restricted sense shear1 () in the field of inverse problems, which is not related to our method of either modeling or optimization, nor is the methodology related to the implicit function methods. In short, much of the reported methods developed are , meaning that they are constructed to face the specific problems at hand with no pretense to any all-encompassing generality and this work too is in the sense of suggesting variable reduction with specific classes of non-inverse problems as indicated where the work develops a method of reducing the variable number to unity for all variables in the expectation function space irrespective of whether they are conditional or not by approximating their values by a method of averages (for method(i)a) without any form of linear regression being used in determining their approximations during the minimization cycles, and without necessarily using the standard matrix theory that is valid for a very limited class of functions. Methods (i)b and (ii) are exact treatments. No "‘eliminating"’ of conditional linear parameters are involved in this nonlinear regression method because they are explicitly calculated. Nor is any projection in the mathematical sense involved. These more general methods could have useful applications in deterministic systems comprising many parameters that are all linked to one variable, the primary one (denoted here) that is considered on physical grounds to be the most important one. A generalization of this method would be to select a smaller set of variables than the full parameter list. Examples of multiparameter complex systems include those for multiple-step elementary reactions each with its own rate constant that gives rise to photochemical spectra signals that must be resolved unambiguously getoff1 (). All these complex and coupled processes in physical theories are related by postulated laws that feature parameters . Other examples include quantum chemical calculations with many topological and orientation variables that need to be optimized with respect to the energy, but in relation to one or a few variables, such as the molecular trajectory parameter during a chemical reaction where this variable is of primary significance in deciding on the ’reasonableness’ of the analysis (wales1, , Sec. 6.2.3,p.294). Method (i)a and (i)b below refer to LS data-fitting algorithms. Method (i)a is an approximate method where it is proved under certain conditions that it could be a more accurate determination of parameters compared to a standard LS fit using (1). Method (i)b develops the methodology whereby its optimum value for with domain values coincides with that of the standard LS method where the variables are varied independently. Also discussed are the relative accuracy of both methods (i)a in subsection (2.2)and (i)b (endnote at end of section 3). Method (ii) develops a single parameter optimization where the conditions of an arbitrary function are met simultaneously, viz.
We note that methods (i)a, (i)b and (ii) are not related to the Adomian decomposition method and its variants that expands polynomial coefficients waz1 () for solutions to differential equations not connected to estimation theory; indeed here there are no boundary values that determine the solution of the differential equations.
2 Method (i)a theory
Deterministic laws of nature are sometimes written - for the simplest examples- in the form
linking the variable to . The components of , and are parameters. Verification of a law of form (3) relies on an experimental dataset . The variable could be a vector of variable components of experimentally measured values or a single parameter as in the example below where denotes values of time in the domain space. The vector form will be denoted . Variables are defined as members of the ’domain space’ of the measurable system and similarly is the defined range or ’response’ space of the physical measurement. Confirmation or verification of the law is based on (a) deriving experimentally meaningful values for the parameters and (b) showing a good enough degree of fit between the experimental set and . In real world applications, to chemical kinetics for instance, several methods (hou1, ; moore1, ; went1, ; went2, , etc.) have been devised to determine the optimal parameters, but most if not all these methods consider the aforementioned parameters as autonomous and independent (e.g. moore1 ()). A similar scenario broadly holds for current state of the art applications of structural elucidation via energy functions wales1 (). To preserve the viewpoint of the inter-relationship between these parameters and the experimental data, we devise a scheme that relates to for all via the set , and optimize the fit over -space only. i.e. there is induced a dependency on via the the experimental set . The conditions that allow for this will also be stated.
2.1 Details of method (i)a
Let be the number of dataset pairs , the number of components of the parameter, and the number of singularities where the use of a particular dataset leads to a singularity in the determination of as defined below and which must be excluded from being used in the determination of . Then for the unique determination of . Let be the total number of different datasets that can be chosen which does not lead to singularities. If the singularities are not choice dependent, i.e. a particular data-set pair leads to singularities for all possible choices, then we have the following definition for where is the total number of combinations of the data-sets taken at a time that does not lead to singularities in . In general, is determined by the nature of the data sets and the way in which the proposed equations are to be solved. Write in the form
and for a particular dataset , write . Define the vector function with components . Assume defined on an open set that contains .
For any such if , the unique function (with components ) defined on where , and where for every .
The above follows from the Implicit function theorem (IFT) (apos1, , Th.13.7,p.374) where is the independent variable for the existence of the function. ∎
We seek the solutions for subject to the above conditions for our defined functions. Map as follows
where the term and its components are defined below and where is a varying parameter. For any of the combinations denoted by a combination variable where is a particular dataset pair, it is in principle possible to solve for the components of in terms of through the following simultaneous equations:
from Lemma 1. And each choice yields a unique solution where . Hence any functions of involving addition and multiplication are also in . For each , there will be different solutions, . We can define an arithmetic mean (there are several possible mean definitions that can be utilized) for the components of as
In choosing an appropriate functional form for (eq.7) we assumed equal weightage for each of the dataset combinations; however the choice is open, based on appropriate physical criteria. We verify below that the choice of satisfy the constrained variation of the LS method so as to emphasize the connection between the level-surfaces of the unconstrained LS with the line function .
Each is a function of whose derivative is known either analytically or by numerical differentiation. To derive an optimized set, then for the LS method, define
Then for an optimized , we have Defining
the optimized solution of corresponds to which has been reduced to a one dimensional problem. The standard LS variation on the other hand states that the variables in (4) are independently varied so that
with solutions for in terms of whenever . Of interest is the relationship between the single variable variation in (8 ) and the total variation in (10). Since is a function of , then (10) is a constrained variation where
subjected to (i.e. for some function of k) and where are the components of . According to the Lagrange multiplier theory (apos1, , Th.13.12,p.381) the function has an optimal value at subject to the constraints over the subset where vanishes, i.e. where when either of the following equivalent equations (12,13) are satisfied
where and the ’s are invariant real numbers. We refer to as any variable that is a function of constructed on physical or mathematical grounds, and not just to the special defined in (7). Write
where since and therefore . We abbreviate the functions and . Define
where are the experimental subspace variables as in (6) with defined above. We next verify the relation between and .
Of interest is the theoretical relationship of the variables of the function described by (11,8) denoted and those of the free variational function of (15) denoted with the variable set which can be written
which is given by the following theorem, where we abbreviate and where we note that the functional form is unique and of the same form for both these variables.
The unconstrained LS solution to for the independent variables is also a solution for the constrained variation single variable where . Further, the two solutions coincide if and only if
The unconstrained solution is derived from the equations
with being constants. If there is a dependency, then we have
If the variable set satisfies (22) and (23) in unconstrained variation, then the values when substituted into (24 ) satisfies the equation since and are the same functional form. This proves the first part of the theorem. The second part follows from the converse argument, where from (24 ), if , then setting one factor to zero in (2.1 ) leads to the implication of (26)
The theorem, verification and lemma above do not indicate topologically under what conditions a coincidence of solutions for the constrained and unconstrained models exists. Fig.(1) depicts the discussion below. From theorem (3), if set represents the solution for the unconstrained LS method and set for the constrained method , then . Define within the range . Then is in a compact space, and since , is uniformly continuous, (depree1, , TH.8,p.79). Then admissible solutions to the above constraint problem with the inequality implies where is the unconstrained minimum. The unconstrained LS function to be minimized in (10) implies
Defining the constrained function then where . Because , solutions occur when (i) corresponding to the coincidence of the local minimum of the unconstrained for the best choice for the line with coordinates as it passes through the local unconstrained minimum and (ii) where this solution is a special case of (iii) when the vector is to , i.e. is at a tangent to the surface for some where this situation is shown in Fig.(1) where the vector is tangent at some point of the surface . Whilst the above characterizes the topology of a solution, the existence of a solution for the line which passes through the point of the unconstrained minimum of is proven below under certain conditions where a set of equations are constructed to allow for this exceptionally important application. Also discussed is the case when it may be possible for unconstrained solution set to satisfy the inequality , where is a function designed to accommodate all solutions of (6).
2.2 Discussion of LS fit for a function with a possibility of a smaller LS deviation than for parameters derived from a free variation of eq.(10)
The LS function metric such as (10) implied at a stationary (minimum) point for variables . On the other hand, the sets of solutions (6), in number provides for each set exact solutions averaged to . If the solutions are in a -neighbourhood, then it may be possible that the composite function metric to be optimized over all the sets of equations , in number defined here
could be such that
implying that, under these conditions, the of (28) is a better measure of fit. For what follows, the for equation set obtains for all values of the open set , from the IFT, including which minimises (8). Another possibility that will be discussed briefly later is where in (28), all are free to vary. Here we consider the case of the values averaged to for some . We recall the Intermediate-value theorem (IVT) (apos1, , Th.4.38,p.87)for real continuous functions defined over a connected domain which is a subset of some . We assume that the functions immediately below obey the IVT. For each solution of the set for a specific we assume that the function is a strictly increasing function in the sense of definition(4)below, where
with , in the following sense:
A real function f is (strictly) increasing on a connected domain about the origin at if relative to this origin, if (of the boundaries of ball and implies both and .
Nb. A similar definition obtains for a (strictly) decreasing function with the inequalities. Since the boundaries are compact, and continuous, the maximum and minimum values are attained for all ball boundaries.
For any region bounded by and with coordinate (radius centered about coordinate ) ,
Suppose in fact then which is a contradiction and a similar proof obtains for the upper bound. ∎
Nb. Similar conditions apply for the non-strict inequalities . The function that is optimized is
Define as the solution vector for the equation set . We illustrate the conditions where the solution for a free variation for the metric given in(10) can fulfill the inequality where is as defined in (32)
with given as in (7). A preliminary result is required. Define and .
for for some
Any point would be located within an spherical annulus centred at , with radii so chosen so that by lemma (5), the following results:
where in (30). Choose so that . Define as the space bounded by the boundary of the balls centered on of radius and ( ). Then by lemma (6). Since in (28) is not equivalent to in (10)where we write here the free variation vector solution as , then the above results leads to the following:
Hence we have demonstrated that it may be more realistic or accurate to fit parameters based on a function that represents different coupling sets such as above rather than the standard LS method using (28) if lies sufficiently far away from . We note that if is the solution of the free variation of the above in (28), then from the arguments presented after the proof of theorem(3), it follows that
which implies that the independent variation of all parameters in LS optimization of the variation is the most accurate functional form to use assuming equal weightage of experimental measurements than the standard free variation of parameters using the function of (10).
3 Theory of method (i)b
Whilst it is advantageous in science data-analysis to optimize a particular multiparameter function by focusing on a few key variables (our variable of restricted dimensionality, which we have applied to a 1-dimensional optimization in the next section) , it has been shown that this method yields a solution that is always of higher value for the same function than a full and independent parameter optimization, meaning that it is less accurate. The key issue, therefore, is whether for any function, including those of the variety, it is possible to construct a parameter optimization such that the line of parameter variables passes through the minimum surface of the function. We develop a theory to construct such a function below. However, Method(i) may still be advantageous because of the greater simplicity of the equations to be solved, and the fact that functions were required, whereas here there the functions must be at least continuous.
For the function defined in (10), where each of the functions are on an open set and where is convex, the solution at any point of whenever at determines uniquely the line equation that passes the minimum of the function when .
As before so that
Define , and for an independent variation of the variables at the stationary point, we have
The above results for the functions to have a unique implicit function of denoted by the IFT (apos1, , Th.13.7,p.374) requires that on an open set . More formally, the expansion of the preceding determinant in (41) verifies that a symmetric matrix obtains for due to the commutation of second order partial derivatives of
Defining as a function of only by expanding yields the total derivative as where
Substituting (44) derived from (39) and (40) into (43) together with the condition implies that , which satisfies (40) for the free variation in . Thus for independent variation of . So fulfills the criteria of a stationary point at say , since ((depree1, , Prop. 16, p.112)). Suppose that is convex, where is a minimum point, , a convex subdomain of . Then at , , and is also the unique global minimum over according to (crav1, , Theorem 3.2,pg.46). Thus is unique, whether derived from a free variation of or via dependent parameters with the function. ∎
As before, and may be replaced with the summation of indexes as for in (28) to derive a physically more accurate fit.
4 Method (ii) theory
Methods (i)a and (i)b, which are mutual variants of each other are applications of the implicit method to modeling problems. Here, another variant of the implicit methodology for optimization of a target or cost function is presented. One can for instance consider to be an energy function with coordinates , where as before the components of is , is another coordinate so that . For bounded systems, (such as the molecular coordinates) , one can write
Thus is in a compact space . Define
Then the equilibrium conditions become
Take (46) as the defining equations for