Beyond Newton: a new root-finding fixed-point iteration for nonlinear equations
Finding roots of equations is at the heart of most computational science. A well-known and widely used iterative algorithm is the Newton’s method. However, its convergence depends heavily on the initial guess, with poor choices often leading to slow convergence or even divergence. In this paper, we present a new class of methods that improve upon the classical Newton’s method. The key idea behind the new approach is to develop a relatively simple multiplicative transformation of the original equations, which leads to a significant reduction in nonlinearities, thereby alleviating the limitations of the Newton’s method. Based on this idea, we propose two novel classes of methods and present their application to several mathematical functions (real, complex, and vector). Across all examples, our numerical experiments suggest that the new methods converge for a significantly wider range of initial guesses with minimal increase in computational cost. Given the ubiquity of Newton’s method, an improvement in its applicability and convergence is a significant step forward, and will reduce computation times several-folds across many disciplines. Additionally, this multiplicative transformation may improve other techniques where a linear approximation is used.
Newton-Raphson, or often referred to as Newton’s, fixed-point iteration method has been the gold-standard for numerically solving equations for several centuries. In order to set the symbols and nomenclature, we start by defining a generic problem.
for a given function or . In general, fixed-point iteration methods start with a guess to the solution and iteratively update it using
where depends on and the chosen numerical scheme. It is required that as for the numerical scheme to converge, and, in practice, the fastest possible convergence is desired. If we expand in finite Taylor’s series up to second order about point and evaluate it at to solve , we get
for some . The last term is called the remainder and can be written in other forms Apostol (1967). Here, is unknown. Therefore, in the standard Newton’s method, we neglect the second order term and approximate with an updated guess to the solution: . Thus, we obtain the Newton’s fixed-point iteration, also called Newton-Raphson method (using notation ):
The above step shows (at least) quadratic convergence of the Newton’s method when has only simple roots. This quadratic convergence has led to a wide adoption of Newton’s method in solving problems in all scientific disciplines. Furthermore, this attractive property has also led to a large amount of work towards developing root-finding algorithms that either imitate or approximate the Newton’s method.
Despite the aforementioned attractive convergence of the Newton’s method, it can be seen from Eq. (5) that to achieve quadratic convergence, the current guess must be “sufficiently close” to the solution. When the initial guess is not close to the solution, the convergence can be slower than quadratic, or, worse, the iterations may not converge, oscillate, or diverge. As an example, if we consider and solve it using Eq. (4) with , iterations diverge until we get numerical overflow (Fig. 1, solid red line). Instead, if we start with , the error decreases sub-quadratically before achieving quadratic convergence closer to the solution (Fig. 1, dashed red line).
Ii Methods and Results
The basin of attraction for a root is defined as the set of initial guesses from which a given fixed-point iteration converges to (this can be seen as the domain of convergence for initial guesses). Naturally, it is desirable to have a large basin of attraction for the root so that convergence is achieved for a wide range of initial guesses. Even though, for a general , determination of the size of the basin of attraction is challenging, it is clear from Eq. (5) that size of the basin depends on a measure of nonlinearity111This nonlinearity measure is similar to the number in Smale’s -theorySmale (2000) .
We pose a question: can we pre-multiply the original equation by a function so that we obtain a larger (or, if possible, infinite) basin of attraction? That is, instead of solving Eq. (1), can we solve
still using Newton’s method? The idea is to choose a that decreases the nonlinearity and, hence, gives a larger basin of attraction, while retaining at least quadratic convergence close to the root. For Eq. (6), the measure of nonlinearity is
We note that if is not a function of , we do not change the nonlinearity of the problem, and this case would fall under the purview of the highly developed field of linear preconditioning. In an attempt to minimize , Eq. (7), we equate , which gives
for arbitrary integration constants and . However, this has two problems. Multiplying the above form of by :
eliminates the original root and
introduces a new root .
In order to to solve these problems, we add another constant in the denominator of in Eq. (8)
This avoids the elimination of the root. In addition, we eliminate the undesirable new root by choosing such that . That is, . Furthermore, makes indeterminate and any value only scales the equation with a constant without affecting its nonlinearity. Therefore, without loss of generality, we set and . That is,
Thus instead of applying Newton’s method to , if we apply Newton’s method to our new equation
we get a new fixed-point iteration
The superscript denotes “Extended Newton” method.
The new fixed-point iteration Eq. (12) contains an arbitrary constant , which remains to be determined. We note that if such that , we have . That is, we will find the solution in a single iteration irrespective of the starting point and function . This is not surprising; if one could chose , the solution is already known, making its basin of attraction infinite.
However, it is not clear how the distance of from the solution will affect convergence. Even though one could study the effect of on the nonlinearity by assessing from Eq. (7), it does not give an immediate insight. For the example, the new equation to be solved is . Surprisingly, we find that our Extended Newton method, Eq. (12), converges starting from (Fig. 1, gray lines), irrespective of the choice of in a wide range . Even if this behaviour of is specific to the chosen, given the simplicity of the formulation, this is a remarkable result showing the potential of the EN method in comparison to the classical Newton’s method.
Limiting case of
Next, we note that we must choose . Choosing makes a stationary point of the fixed-point iteration Eq. (12), and the iteration gets “stuck” at this undesired point. This is because, even though , is indeterminate. Practically, this does not pose a problem as long as .
Nevertheless, we look at the limiting case when . Rewriting Eq. (12) and taking the limit we get
This gives us a new fixed-point iteration
where the superscript denotes “Corrected Newton” method (the choice of this name will become clear shortly). The advantage of this fixed-point iteration is that there is no arbitrary constant involved, however we have to pay with the price of calculating the second derivative of 222The same limit on our modified equation Eq. (11) gives (15) If we apply Newton’s method to the above equation, we get (16) There is a factor of 2 between and Eq. (16). Numerically, we verified that Eq. (14) performs better than Eq. (16)..
If we substitute within the parentheses as the one from standard Newton Eq. (4) and rearrange, we get Eq. (14). That is, Eq. (14) can be thought of as a correction to the Newton’s equation using second derivative . Thus, we can re-write Eq. (14) as
where is obtained from standard Newton’s method. While the above form is obtained with the aim of expanding the basin of attraction, we note that the Eq. (18) is identical to the Halley’s method with cubic order convergence close to the root Traub (1982). This observation will be useful in developing the extension of the method to vector functions.
We solve the problem for varying values of , , and and find a significantly improved convergence using the proposed fixed-point iterations (Fig. 2). In order to verify the generality of the new schemes, we solve several different nonlinear equations and provide the results in the Supporting information (SI).
The extension to functions on the complex plane is straightforward, and the same equations can be applied. Applying Newton’s method to gives us a Newton fractal, and the Extended Newton and Corrected Newton methods modify that fractal (Fig. 3). Surprisingly, the new schemes even reduce the dimension of the fractal (Fig. S8) and the basins of attraction appear more connected (generally speaking) than before. Choosing in Extended Newton breaks the three-fold symmetry of this system, and the solution close to is heavily favored.
Extension to multiple unknowns
Instead of single unknown, we have unknowns and a set of equations for . Our problem in -dimensions is
where all unknowns are collectively written as . Applying the standard Newton’s method gives us a system of linear equations to be solved to obtain the step
where and we have dropped the subscript to denote the current iteration value of as that is implied.
There could be several different way of extending Eq. (11) to this case of multiple unknowns. If it is possible to recognize a subset of the equations that are most nonlinear, one may use Extended Newton method for that subset of equations and standard Newton’s method for the rest. Here, we consider only the case where all equations are transformed. Thus, one straightforward extension is the following modified set of equations:
for chosen values of (), and is the vector with -th element replaced by . Accordingly, applying the Newton’s method we get our new system of linear equations
where when and 0 otherwise. For real or complex functions with single unknown, we have to calculate only once and there is no need for computing . However, in the above extension to multiple unknowns, needs to be evaluated at every iteration along with as well. Thus, the Extended Newton method for the case with multiple unknowns leads to a cumbersome set of calculations.
On the other hand, extending the Corrected Newton method, Eq. (18), to multiple unknowns is simpler:
The above method requires solution to two systems of linear equations. To alleviate this and obtain only one system of equations, we multiply both sides of Eq. (23) by
Using an approximation that and utilizing Eq. (20), we get a simplified expression
This equation gives us the step by solving only one system of equations, and we call it a quasi-Corrected Newton method (QCN). In numerical methods, computing the Jacobian is common, but calculating another gradient for Eq. (23) may not be desirable. However, in Eq. (25), we note that if we have an expression for , can be computed using finite difference in only operations, making this approach highly attractive.
As an extension of the exponential example from single variable system, we solve the following two equations:
This represents two springs in series with one end fixed and the other end under force , where each spring’s force is nonlinearly related to its change in length as . Similar to the single equation, the standard Newton’s method fails to converge for initial guess , whereas the Extended Newton (for ) and Corrected Newton methods converge in less than 10 iterations (Fig. 4).
As another example, we consider the minimization of Easom’s function with two variables:
For this function with a single minimum at , the basin of attraction is relatively small using the standard Newton’s method. Our proposed methods increase that basin significantly (Fig. 5). Surprisingly, the improvement using quasi-Corrected Newton method is even greater than the Corrected Newton method. Furthermore, the Extended Newton method can provide an even larger basin of attraction if is chosen close enough to the solution (Fig. 5 bottom row). A summary of proposed methods is provided in Table 1.
Newton’s method is one of the most widely used methods for finding roots in science and engineering. This is particularly true when the computation of the Jacobian is inexpensive. In the opposite case, significant amount of work has focused on developing methods that approximate the Newton’s method, largely through approximations of leading to the so-called quasi-Newton methods, so that convergence properties similar to the Newton’s method can be achieved with reduced computational expense Morton and Mayers (1995). On the other hand, different methods have been proposed that achieve higher than quadratic convergence rates using higher order derivatives. For example, Halley’s method, Cauchy’s method, Chebyshev method Traub (1982), class of Householder method Householder (1970), and other variants Osada (1994); Amat et al. (2003); Kou (2008); Cordero et al. (2013). However, the convergence of all of these methods is examined when the starting guess is close to the solution.
The objective of this article is to go beyond the classical approaches. The central idea is to decrease the system nonlinearity in view of improving convergence properties far from the solution. Starting with one variable and one equation, we premultiplied the original equation and formulated a new equation with the same roots (Eq. 11). Applying Newton’s method to this equation, we proposed a new fixed-point iteration, which we term as the Extended Newton (EN) method Eq. (11). This method has an arbitrary constant , and, in fact, the choice of makes Extended Newton method a family of fixed-point iterations. Using several example functions, we found that the Extended Newton method, unlike standard Newton’s method, can find the solution even when the starting point is considerably far from the solution. We found that irrespective of the choice of , the performance of EN was always better than the standard Newton’s method. It is also important to recognize that even if we choose close to the starting guess, the convergence is greatly improved. Therefore, when no prior information on the choice of is available, we recommend choosing for small .
|Method||Scalar function update||Vector function update|
We note that although we used a constant value of during iteration, there are other options of choosing its value at every iteration based on the current guess , previous guess , and/or the function form . In fact, if we choose , the EN can be thought of as a combination of secant method and Newton’s method. As a limit case , we rediscovered Halley’s method Eq. (18), which we call the Corrected Newton (CN) method. While Halley’s method was motivated by achieving cubic convergence, our approach has been motivated by increase in the size of the basin of attraction. It is, therefore, remarkable that both approaches lead to identical algorithm. Thus, one contribution of this work is to elucidate that Halley’s method not only provides cubic convergence but also results in a larger basin of attraction. We also note that the Extended Newton method provides at least quadratic convergence close to the root, because it is a Newton’s method applied to . In this sense, the EN method is no worse that Newton’s method while providing a larger basin of attraction. The advantage of the Corrected Newton over Extended Newton method is that there is no arbitrary constant to choose; however, it requires the calculation of the second derivative . With symbolic computation Buchberger (1996) and automatic differentiation Rall (1981) becoming commonplace, this limitation is becoming less important.
The application of these new methods to a sample of problems provided incredible improvement. The equation could be solved for extremely high in less than 10 iterations. Similar improvements were found for other examples (presented in the SI). We note that for a specific (known) equation form, instead of pre-multiplying, a transformation may be more suitable to decrease its nonlinearity. For example, for equation , one may write the equation as and then take a logarithm on both sides resulting in a linear equation that will always be solved in one iteration. However, this approach is problem-specific, requires identifying the two sides of the equation, and has been proposed by us for nonlinear elasticity Mei et al. (2018). On the other hand, the present approach is completely general and works for all kinds of equations and nonlinearities.
The extension of the approach to functions on the complex plane was straightforward and produced some astonishing results when applied to Newton fractal. The new method not only finds the root in fewer iterations, it also decreases the sensitivity of which root is obtained with respect to perturbations in the initial guess. This was especially true for Extended Newton method, where a choice of made almost all initial guesses converge to the same solution closest to . Corrected Newton method did not have such a bias and reduced the sensitivity equally for all the three roots.
Further extension of the Extended Newton method to arbitrary number of unknowns and equations can be accomplished in multiple ways. We presented one option (Eq. 22) and found the results improved significantly. This approach requires choosing arbitrary unknowns and calculation of the residual and Jacobian at additional points. This can lead to additional computation for large systems. Instead, one could choose a smaller subset of equations responsible for the nonlinearity and only transform those. This would decrease the extra computation required while still improving the convergence. Furthermore, for the exponential example, it was observed that the method worked only for . The underlying reason for this remains unclear, especially because the application of this method to Easom’s function did not show any such behavior.
On the other hand, the extension of Corrected Newton method to system of equations was straightforward and unique (Eq. 23). Even though this approach worked perfectly, it has two computational disadvantages: calculation and storage of with terms and solving two linear systems – first for the standard Newton (Eq. 20) and then for the correction (Eq. 23). As a simplification, we proposed its approximation (Eq. 25), which we call quasi-Corrected Newton method. This approach only requires calculation of one diagonal of , i.e. and solving only one linear system of equations. The name quasi-Corrected Newton method is inspired from multitude of quasi-Newton algorithms that are used to imitate Newton’s method while decreasing the computational requirements. We note that even if explicit expressions for may be unavailable or cumbersome to calculate, can be calculated using finite difference in only operations, and can be easily parallelized (unlike the iterations of Newton’s method).
To our knowledge, the quasi-Corrected Newton method has no hitherto reported counterpart. We believe that one of the reasons why these higher-order methods (Halley’s method, Householder’s, or other higher-order methods) are not widely used in science and engineering is that the faster convergence rates of these methods are usually offset by their higher computational cost at each iteration. In this respect, the quasi-Corrected Newton method provides a good balance between improved convergence properties and computational expense. In addition, the larger basin of attraction also outweighs the small extra computational cost of these proposed methods.
Among the different methods proposed, EN is associated with the least computational expense, and in many cases outperforms the Corrected Newton method, provided a good values of is available (see SI). We believe that in the future, different scientific communities will report optimal ranges of , obtained through numerical analysis and/or experimentation, for specific problems of wide interest. We foresee a large impact of the proposed methods in numerical solution of partial differential equations, where the nonlinearities only allow a step-by-step increase in boundary conditions using the classical Newton’s method post discretization (a common example being the nonlinear finite element analysis in structural mechanics where the applied loads can only be increased incrementally in the classical framework). The new methods will potentially allow for significantly larger step sizes, similar to those reported in Mei et al. (2018), thereby substantially reducing the solution times.
We believe there are many opportunities of extending this approach to other problems and end our discussion by mentioning a few potential avenues. For extending the EN method to vector functions, there are other options as well, which need to be explored in the future. Similarly, simplifications of CN method to give quasi-Corrected Newton methods (or quasi-Halley’s methods) would create a new area of research. In general, the extension of such root-finding methods to multiple-variables is a broad area, which also takes into account additional factors, such as sparsity and symmetry of the system, into account. The idea of multiplying with a function of instead of a constant, can be used in the development of preconditioners. Unlike linear preconditioners, which are well-understood, this will lead to nonlinear preconditioning, which is a relatively under-explored area Cai and Keyes (2002).
Application of a similar approach to nonlinear least squares fitting (or regression) will likely improve the fitting procedure, which is widely used in parameter estimation. Similarly, in optimization problems with multiple unknowns, a line search is commonly used. This approach could be used to improve the convergence of line search, which is akin to solving a single unknown equation. There is also the possibility of extending similar ideas to solving nonlinear dynamics, which is usually done by numerical time integration schemes. It is well-known that an accurate and stable solution of stiff ordinary differential equations, owing to their nonlinearity and coupling of different time scales, requires extremely small time steps. A similar approach of reducing their nonlinearity may be used to improve the time integration methods so that larger time steps can be used.
It took several decades to develop methods based on the classical Newton’s method. We anticipate that the new approach presented in this work, which resulted in some known (through classical higher-order approaches) and some new results, will spark the interest of the scientific community to fully explore the properties of these new methods, their applications, and the opportunities they present in science and engineering, over the next years.
A.A. and S.P. conceived the idea and designed the study. A.A. developed the theory and analysis. A.A. and S.P. analyzed the results and wrote the paper.
Conflict of interest
The authors declare no conflict of interest.
Acknowledgements.This work was supported by Welsh Government and Higher Education Funding Council for Wales through the Sêr Cymru National Research Network in Advanced Engineering and Materials (Grant No. F28 to A.A.), and the Engineering and Physical Sciences Research Council of the UK (Grant No. EP/P018912/1 to A.A.).
- Apostol (1967) T. M. Apostol, Jon Wiley & Sons (1967).
- Smale (2000) S. Smale, in The Collected Papers of Stephen Smale: Volume 3 (World Scientific, 2000) pp. 1263–1286.
- Traub (1982) J. F. Traub, Iterative methods for the solution of equations (American Mathematical Soc., 1982).
- Morton and Mayers (1995) K. Morton and D. Mayers, SIAM, Philadelphia (1995).
- Householder (1970) A. Householder, The Numerical Treatment of a Single Nonlinear Equation, International series in pure and applied mathematics (McGraw-Hill, 1970).
- Osada (1994) N. Osada, Journal of Computational and Applied Mathematics 51, 131 (1994).
- Amat et al. (2003) S. Amat, S. Busquier, and J. Gutiérrez, Journal of Computational and Applied Mathematics 157, 197 (2003).
- Kou (2008) J. Kou, Journal of Computational and Applied Mathematics 213, 71 (2008).
- Cordero et al. (2013) A. Cordero, J. R. Torregrosa, and P. Vindel, Applied Mathematics and Computation 219, 8568 (2013).
- Buchberger (1996) B. Buchberger, in Frontiers of Combining Systems (Springer, 1996) pp. 193–219.
- Rall (1981) L. Rall, Automatic Differentiation: Techniques and Applications, Lecture Notes in Computer Science (Springer Berlin Heidelberg, 1981).
- Mei et al. (2018) Y. Mei, D. E. Hurtado, S. Pant, and A. Aggarwal, Computer Methods in Applied Mechanics and Engineering 337, 110 (2018).
- Cai and Keyes (2002) X.-C. Cai and D. E. Keyes, SIAM Journal on Scientific Computing 24, 183 (2002).