Automatic Differentiation to Simultaneously Identify Nonlinear Dynamics and Extract Noise Probability Distributions from Data
Abstract
The sparse identification of nonlinear dynamics (SINDy) is a regression framework for the discovery of parsimonious dynamic models and governing equations from time-series data. As with all system identification methods, noisy measurements compromise the accuracy and robustness of the model discovery procedure. In this work we develop a variant of the SINDy algorithm that integrates automatic differentiation and recent time-stepping constraints motivated by Rudy et al. [66] to simultaneously (i) denoise the data, (ii) learn and parametrize the noise probability distribution, and (iii) identify the underlying parsimonious dynamical system responsible for generating the time-series data. Thus, within an integrated optimization framework, noise can be separated from signal, resulting in an architecture that is approximately twice as robust to noise as state-of-the-art methods, handling as much as 40% noise on a given time-series signal and explicitly parametrizing the noise probability distribution. We demonstrate this approach on several numerical examples, from Lotka–Volterra models to the spatiotemporal Lorenz 96 model. Further, we show the method can identify a diversity of probability distributions, including Gaussian, uniform, Gamma, and Rayleigh distributions.
1 Introduction
The data-driven discovery of governing equations is an emerging field within the machine learning and artificial intelligence communities. From neural networks (NNs) to traditional model regression techniques, a diversity of methods are emerging that transform time-series data (or spatiotemporal data) into representations of governing equations of motion [10, 52, 39, 71, 36, 13, 50, 82, 34, 2, 5, 85, 81, 47, 80, 55, 43, 58, 14, 61, 59, 60, 26, 84, 6, 72, 20, 69, 11, 65, 66, 86]. The interpretability and generalizability of these discovered equations of motion are critical for understanding, designing, and controlling complex systems. As such, the sparse identification of nonlinear dynamics (SINDy) [11] framework provides a compelling regression framework, since discovered models are interpretable and parsimonious by design. As with all system identification algorithms, noisy measurements compromise the accuracy and robustness of the model discovery procedure. Moreover, many optimization frameworks rely explicitly on the assumption of Gaussian noise, which is rarely true in the real world. Recently, Rudy et al. [66] developed a novel optimization framework for separating signal and noise from noisy time-series data by identifying a deep NN model for the signal subject to numerical time-stepping constraints, such as a Runge–Kutta scheme. In this work, we build on this framework and leverage automatic differentiation [3] in the optimization procedure to simultaneously denoise data and identify sparse nonlinear models via SINDy. This new architecture yields significant improvements in model discovery, including superior separation of the signal from noise while simultaneously characterizing the noise distribution.
SINDy has emerged as a flexible and promising architecture for model discovery due to its inherently parsimonious representation of dynamics. The SINDy framework relies on sparse regression on a library of candidate model terms to select the fewest terms required to describe the observed dynamics [11]. Specifically, SINDy is formulated as an overdetermined linear system of equations Ẋ = Θ(X)Ξ, where sparsity of the solution Ξ is promoted by an ℓ₀ penalty. Thus sparsity is a proxy for parsimony, interpretability, and generalizability. Measurement noise, however, is always present, and it corrupts the ability of the SINDy regression framework, and indeed any other model discovery paradigm, to accurately extract governing models.
There are many variants of sparse regression, all of which typically attempt to approximate the solution of an NP-hard, ℓ₀-norm penalized regression. Sparsity-promoting methods like the LASSO [74, 76] use the ℓ₁ norm as a proxy for sparsity since tractable computations can be performed. The iterative least-squares thresholding technique of the SINDy algorithm promotes sparsity through a sequential procedure. Recently, Zhang and Schaeffer [87] provided several rigorous theoretical guarantees on the convergence of the SINDy algorithm. Specifically, they proved that the algorithm approximates local minimizers of an unconstrained ℓ₀-penalized least-squares problem, which allows them to provide sufficient conditions for general convergence, the rate of convergence, and conditions for one-step recovery. Using a relaxed formulation, Champion et al. [16] show how the SINDy regression framework can accommodate additional structure, robustness to outliers, and nonlinear parameter estimation using the sparse relaxed regularized regression (SR3) formulation [90]. SINDy results in interpretable models, and it has been widely applied in many scientific disciplines [73, 41, 19, 40, 28, 42, 24, 51, 21, 30, 75, 37, 23, 70, 53, 4]. Moreover, it has been extended to incorporate control [12, 32], rational or implicit dynamics [46, 31, 88], partial differential equations [65, 48], parametric model dependencies [63], discrepancy models [30, 21], multiscale physics [15], stochastic dynamics [7], and constrained physics [40], among many other innovations [41, 69, 44, 77, 67, 68, 83, 7, 45, 25, 16, 27, 62, 49]. The PySINDy Python package implements many of these variants [22].
Despite its flexibility, modularity, and extensibility, SINDy and its variants typically rely on approximating time derivatives of the measured time-series data. Computing derivatives of noisy measurement data is known to be a challenging problem, with many algorithmic innovations and mathematical architectures developed to produce accurate derivative approximations [78]. These methods include finite differences, spectral methods [35], spline smoothing, filtering procedures, polynomial fitting, low-rank projection, and total variation, to highlight some of the diverse techniques employed for this critical task of scientific computing. The task is made even more difficult depending upon the noise statistics: Gaussian noise is often easier to learn and characterize than noise distributions that have nonzero means and are not symmetric. Ultimately, there is a need for methods that are robust to noisy measurements and diverse probability distributions.
Recent innovations in automatic differentiation have enabled the solution of optimization problems directly related to the computation of the required derivatives [66]. Since its inception, automatic differentiation has been widely used in the machine learning community to enable complicated optimization problems without manually computing Jacobians [1, 3, 57, 79, 17, 64, 66, 9, 56, 38]. More recently, this approach has been used with NNs to separate a signal from noise and model the signal when a model is unknown [66], and to improve Kalman smoothing when the governing equations are known [64]. The success of these algorithms suggests that they could be leveraged for noise–signal separation in the SINDy framework. In this work, we extend this simultaneous denoising and discovery approach to SINDy. Specifically, automatic differentiation enables differentiation with respect to the functions in the SINDy library, thus circumventing a direct differentiation of the noisy time-series data. The modified SINDy algorithm is more robust to noise and further allows for an explicit characterization (discovery) of the underlying probability distribution of the noise, something that current state-of-the-art methods cannot do and that is a unique feature of our method.
In Sec. 2, we introduce the modified SINDy algorithm. In Sec. 3, we compare modified SINDy with the NN-based noise–signal separation approach proposed by Rudy et al. [66]. In Sec. 4, we demonstrate modified SINDy on various numerical examples; we also show how modified SINDy can be used to extract noise distribution information and how it can be applied in a discrepancy modeling framework. In Sec. 5, we present our conclusions and possible future improvements.
2 Methods
In what follows, we introduce the basic mathematical architecture behind the SINDy algorithm, demonstrating explicitly its sensitivity to noisy measurements. This guides our introduction of the modified SINDy for simultaneously learning the system model and denoising the signal.
2.1 Sparse Identification of Nonlinear Dynamics
The SINDy algorithm [11] provides a principled, data-driven discovery method for nonlinear dynamics of the form

(1) ẋ(t) = f(x(t)),

where x(t) is the system state, represented as a row vector. SINDy posits a set of candidate functions θ_k(x) that would characterize the right-hand side of the governing equations. The candidate model terms form the library Θ(X) of potential right-hand-side terms, where the data matrix X is formed by stacking the measured row vectors of the state. This then allows for the formulation of a regression problem to select only the few candidate terms necessary to describe the dynamics:

(2) Ξ = argmin_{Ξ'} ‖Ẋ - Θ(X)Ξ'‖₂² + λ‖Ξ'‖₀,

where the matrix Ξ is comprised of the sparse vectors ξ_k that select candidate model terms. The amount of sparsity promotion is controlled by the parameter λ, which determines the penalization by the ℓ₀ norm. The candidate functions θ_k(x) can be any functions that may describe the system dynamics, such as trigonometric functions or polynomial functions, for example. By solving Eq. (2), we can identify a model of the system dynamics

(3) ẋ = Θ(x)Ξ.

Many different optimization techniques can be used to obtain the sparse coefficients Ξ, such as sequentially thresholded least squares (STLSQ) [11, 87], LASSO [76], sparse relaxed regularized regression (SR3) [90, 16], stepwise sparse regression (SSR) [8], and Bayesian approaches [88, 54].
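The STLSQ procedure can be sketched in a few lines of NumPy. This is a minimal illustration of the sequential thresholding idea, not the full PySINDy implementation; the toy system, library, and threshold value are our own illustrative choices.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    # Sequentially thresholded least squares: alternate a least-squares
    # solve with zeroing of coefficients whose magnitude falls below `threshold`.
    xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):          # refit each state separately
            big = ~small[:, k]
            if big.any():
                xi[big, k], *_ = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)
    return xi

# Toy demo: recover dx/dt = -2*x1 + 3*x2 from a 5-term quadratic library.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
dX = X @ np.array([[-2.0], [3.0]])
Theta = np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2, X[:, 0]*X[:, 1], X[:, 1]**2])
Xi = stlsq(Theta, dX, threshold=0.5)
```

The thresholding step zeroes the three spurious quadratic coefficients, and the refit recovers the two active linear terms.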
In practice, noise-free measurements of x are not available, and only the full-state noisy measurement

(4) y(t) = x(t) + n(t)

is provided to SINDy from sensors, where y(t) is the noisy measurement and n(t) is the noise added to the true state. Thus, Eq. (2) then becomes

(5) Ξ = argmin_{Ξ'} ‖Ẏ - Θ(Y)Ξ'‖₂² + λ‖Ξ'‖₀,  with Y = X + N,

where Y is the noisy measurement matrix formed by m row-vector measurements and N is the noise matrix, also formed by m row vectors of the same size. From Eq. (5), note that the solution Ξ is no longer the same as in Eq. (2) due to the presence of noise. Moreover, the noise will be magnified when approximating the derivatives, scaling inversely with the sampling interval [65], and it will nonlinearly corrupt the library matrix Θ(Y). Extensive research has been done to improve the robustness of the SINDy framework. The integral formulation [67] and weak formulation [62, 49, 48] improve the regression robustness by avoiding taking derivatives of noisy data. Other approaches, such as subsampling [89], increase the noise robustness of the SINDy framework by performing regression on subsampled measurements that contain less noise. Corrupt data can also be handled with methods from robust statistics [77, 16]. In the next section, we introduce an alternative approach that simultaneously learns the noise while using the denoised data to perform model identification.
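The derivative noise amplification can be seen in a short numerical experiment. The sketch below (with an illustrative signal, noise magnitude, and time step of our own choosing) differences a noisy sine wave and compares the measurement error before and after differentiation.

```python
import numpy as np

# Finite differencing amplifies measurement noise roughly by 1/dt:
# the noise on x is of size ~0.01, but after differencing, the
# derivative error grows by orders of magnitude.
dt = 0.001
t = np.arange(0.0, 10.0, dt)
x_clean = np.sin(t)
noise = 0.01 * np.random.default_rng(1).normal(size=t.size)
x_noisy = x_clean + noise

dx_clean = np.gradient(x_clean, dt)   # central differences
dx_noisy = np.gradient(x_noisy, dt)

err_signal = np.max(np.abs(x_noisy - x_clean))   # error in the signal itself
err_deriv = np.max(np.abs(dx_noisy - dx_clean))  # error in its derivative
```

With these settings, the derivative error is hundreds of times larger than the measurement error, which is exactly the failure mode that motivates avoiding direct differentiation of noisy data.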
2.2 Simultaneously Denoising and Learning System Model
To improve the noise robustness of the SINDy regression, we treat the estimated noise N̂ as a set of optimization variables and minimize the difference between the estimated derivative and the system's vector field such that

(6) min_{Ξ,N̂} L_d,  L_d = ‖(d/dt)(Y - N̂) - Θ(Y - N̂)Ξ‖₂²,

where X̂ = Y - N̂ is formed by the estimated true states x̂, and L_d is the derivative approximation error. Note that X̂ → X as N̂ → N. When N̂ = N, the effect of noise can be eliminated, and the accuracy of SINDy will be significantly improved. However, there exist many trivial solutions that minimize Eq. (6) with two uncoupled optimization variables Ξ and N̂. Thus, an additional constraint is needed to regularize Eq. (6).
The additional constraint proposed here uses the estimated vector field of the system model, similar to the one proposed by Rudy et al. [66]. Equation (3) gives the estimate of the true vector field f(x). Integrating over a segment of time t_j to t_{j+1} gives the integrated vector field, or flow map,

(7) x(t_{j+1}) = x(t_j) + ∫_{t_j}^{t_{j+1}} Θ(x(τ))Ξ dτ.

This can be generalized to integrate the system either forward or backward q time steps. This gives

(8) x(t_{j±q}) = x(t_j) + ∫_{t_j}^{t_{j±q}} Θ(x(τ))Ξ dτ.

To obtain x(t_{j±q}) in Eq. (8), a numerical simulation scheme such as Runge–Kutta can be used. In what follows, we employ a 4th-order Runge–Kutta method to simulate the dynamics forward/backward q time steps. Similar to Eq. (8), when the noisy measurement data is given, the estimated state satisfies

(9) x̂(t_{j±q}) ≈ x̂(t_j) + ∫_{t_j}^{t_{j±q}} Θ(x̂(τ))Ξ dτ
when N̂ = N and the exact value of Ξ is known. Thus, by minimizing the weighted simulation error at each time index,

(10) ℓ_j = Σ_{i=1}^{q} ω_i ( ‖x̂(t_{j+i}) - x̌(t_{j+i})‖₂² + ‖x̂(t_{j-i}) - x̌(t_{j-i})‖₂² ),

where x̌(t_{j±i}) denotes the i-step forward/backward numerical prediction initialized at x̂(t_j), the optimization parameters Ξ and N̂ are coupled, resulting in an additional structural constraint on the model. The weight ω_i is used to account for the numerical error and decays as i increases, with a constant decay rate used throughout this paper. The use of ω_i reflects that simulation error too far ahead in the future, or too far backward in the past, should be penalized less due to the error of the numerical simulation scheme. The error incurred by simulating the vector field forward/backward on the entire trajectory can be written as

(11) L_sim = Σ_j ℓ_j.
Using subscripts to represent the time step, the final cost function is then

(12) L = L_d + L_sim,

which is the summation of the derivative approximation error L_d and the simulation error L_sim. The optimization problem to simultaneously denoise and learn the system model can then be written as

(13) (Ξ*, N̂*) = argmin_{Ξ,N̂} (L_d + L_sim).
The global optimal solution of Eq. (13) needs to satisfy ∂L/∂Ξ = 0 and ∂L/∂N̂ = 0. To solve Eq. (13), it is necessary to calculate the gradients ∂L/∂Ξ and ∂L/∂N̂, which is a difficult task to perform analytically or computationally. However, recent automatic differentiation packages such as TensorFlow [1] and Julia Flux [29] make it possible to directly extract the gradients of L with respect to Ξ and N̂. This allows us to solve the optimization problem in Eq. (13) easily using a gradient descent method such as Adam [33]. Throughout this paper, we use TensorFlow 2.0 and the Adam optimizer to solve Eq. (13). Moreover, to enforce the sparsity of the identified model, a thresholding approach [11] is used and Eq. (13) is solved over multiple optimization loops. Each loop uses the previous loop's optimization result as the initial guess of the next. The value of N̂ is also used to calculate the estimated state X̂, which in turn is used to calculate the new estimate of the selection parameters Ξ. Furthermore, if elements of Ξ are smaller than a threshold λ at the end of an optimization loop, those elements are constrained to zero for the remainder of the optimization process. Figure 1 illustrates this process, and Appendix A shows the detailed algorithm for simultaneous denoising and sparse model identification. Some guidance on the selection of the hyperparameters λ, q, and the number of optimization loops is given in Appendices B, C, and D.
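As a minimal sketch of the cost function in Eq. (12), the snippet below evaluates a derivative-mismatch term plus a q-step Runge–Kutta simulation error with decaying weights in plain NumPy. In the paper, this loss is minimized over Ξ and N̂ with TensorFlow's automatic differentiation and Adam; here we only evaluate it on a known one-dimensional system to show that clean states yield a far smaller cost than noisy ones. The weight form gamma**i and all numerical values are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def rk4_step(f, x, dt):
    # One 4th-order Runge-Kutta step of dx/dt = f(x), vectorized over rows.
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def total_cost(Xhat, f, dt, q=3, gamma=0.9):
    # L_d: mismatch between a finite-difference derivative of Xhat and f(Xhat).
    dXhat = np.gradient(Xhat, dt, axis=0)
    L_d = np.mean((dXhat - f(Xhat)) ** 2)
    # L_sim: i-step forward RK4 prediction error, weighted by gamma**i
    # (a decaying-weight assumption standing in for the omega_i of Eq. (10)).
    L_sim = 0.0
    for i in range(1, q + 1):
        pred = Xhat[:-i]
        for _ in range(i):
            pred = rk4_step(f, pred, dt)
        L_sim += gamma ** i * np.mean((pred - Xhat[i:]) ** 2)
    return L_d + L_sim

# Known system dx/dt = -x with exact solution exp(-t).
f = lambda x: -x
dt = 0.01
t = np.arange(0.0, 5.0, dt)
X_true = np.exp(-t)[:, None]
c_clean = total_cost(X_true, f, dt)
X_noisy = X_true + 0.05 * np.random.default_rng(2).normal(size=X_true.shape)
c_noisy = total_cost(X_noisy, f, dt)
```

Because the cost is small exactly when the estimated states are consistent with both the library dynamics and the time-stepping constraint, a gradient-based optimizer driving this loss down simultaneously denoises the data and fits the model.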
3 Performance Comparison with Neural Network Denoising Approach
The advocated optimization framework of modified SINDy is compared with the NN denoising approach by Rudy et al. [66]. Additionally, robustness to the noise level and to the amount of training data is considered.
3.1 Performance Criteria
For ease of comparison, we use the same performance criteria developed by Rudy et al. [66]. Specifically, these are the vector field error E_f, the noise identification error E_n, and the prediction error E_pred. The vector field error is

(14) E_f = ‖f(x) - f̂(x)‖₂² / ‖f(x)‖₂²,

which calculates the relative squared error between the true vector field f and the identified vector field f̂. The noise identification error is

(15) E_n = (1/m) Σ_{j=1}^{m} ‖n_j - n̂_j‖₂,

which is the mean difference between the true noise n_j and the identified noise n̂_j. The prediction error is

(16) E_pred = (1/m) Σ_{j=1}^{m} ‖x(t_j) - x̂_sim(t_j)‖₂²,

and it calculates the difference between the forward-simulated trajectory x̂_sim and the true trajectory. For the comparison of modified SINDy and the recently published WeakSINDy [49], shown in Appendix F, two more performance criteria are used. The first is the normalized parameter error

(17) E_param = ‖Ξ_true - Ξ̂‖₂ / ‖Ξ_true‖₂,

which reflects how far the identified parameters Ξ̂ are from the true parameters Ξ_true. The other is the success rate, which describes the percentage of trials in which the model's correct structure is identified.
3.2 Robustness to Noise
The Lorenz attractor is used as an example to test the noise robustness of the approach. The model of the chaotic Lorenz system is

(18) ẋ = σ(y - x),  ẏ = x(ρ - z) - y,  ż = xy - βz,
where σ, ρ, and β are the system parameters. The Lorenz attractor is simulated from a fixed initial condition with a fixed time step and duration. The same prediction step q is used for the approaches compared and for our proposed method. Unless otherwise noted, the Adam optimizer is used to optimize the problem, with the maximum number of iterations fixed separately for modified SINDy and for the NN approach [66]. Different magnitudes of Gaussian noise are added to generate the noisy training data. The noise level is defined as
(19) noise level = 100% × σ_noise / σ_signal
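For concreteness, a noise-level computation in the spirit of Eq. (19) can be written as follows; the ratio-of-standard-deviations convention used here is an assumption (the exact normalization in Eq. (19) may differ in detail), and the signal and noise magnitudes are illustrative.

```python
import numpy as np

def noise_level_percent(x_clean, noise):
    # Noise magnitude relative to signal magnitude, as a percentage
    # (assumed convention; standing in for Eq. (19)).
    return 100.0 * np.std(noise) / np.std(x_clean)

rng = np.random.default_rng(3)
x = np.sin(np.linspace(0.0, 10.0, 1000))
n = 0.2 * rng.normal(size=x.shape)
level = noise_level_percent(x, n)   # roughly 30% for these magnitudes
```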
For each noise level, multiple different sets of noisy data are generated and used as training data for both approaches. The NN approach [66] uses the same setup as in [66], with the same number of hidden layers and neurons per layer, and with matching regularization parameters. Unless otherwise noted, we use the same setup for all the NNs in this paper. For modified SINDy, the library is constructed with polynomial terms up to second order (not including the constant term), and the value of the sparsity parameter λ varies based on the noise added. A Tikhonov regularization approach is used to presmooth the noisy data as in [66], although we have found that presmoothing does not affect the results appreciably when using zero-mean noise.
Fig. 2 shows the noise identification error of the NN approach [66] and the modified SINDy approach. The vector field error and short-term prediction error can be seen in Fig. 3. For all noise levels, modified SINDy correctly identified the Lorenz model. To calculate the prediction error, the identified model is simulated forward in time for both modified SINDy and the NN denoising approach [66]. Fig. 3 suggests that the model identified by modified SINDy has better performance when simulated forward in time. Appendix E shows a noise robustness comparison between modified SINDy and the original SINDy [11]; in general, modified SINDy is substantially more robust than the original SINDy [11]. A comparison between modified SINDy and the recently developed WeakSINDy approach [49] is presented in Appendix F.
3.3 Robustness to Data Length
We also compare the performance of the NN denoising approach by Rudy et al. [66] and modified SINDy under different amounts of training data at a fixed noise level. The minimum amount of data needed by modified SINDy to correctly identify the system model is shown using the Lorenz attractor as an example. To perform the numerical experiment, the same initial point is used to generate noise-free data of different temporal lengths. The time step is fixed, and Gaussian noise of a fixed magnitude is added to generate noisy training data. The success rate of modified SINDy is calculated to indicate the minimum amount of data needed to identify the correct system model. The prediction error is not shown since the simulation of the identified model in the low-data limit is not stable. Adam is used to optimize the problem, with the same fixed learning rate and prediction step q for both approaches. A fixed thresholding parameter λ is used for modified SINDy, and the library is constructed with polynomial terms up to second order (without the constant term). Fig. 4 suggests that when the correct parameters and library are used for modified SINDy, it outperforms the NN denoising approach by Rudy et al. given the same amount of data.
4 Results
In this section, we demonstrate the ability of modified SINDy to separate signal and noise while learning the system model. The Van der Pol oscillator is used as the test case to show that modified SINDy can identify the correct distribution of the Gaussian noise added to the system. Additionally, we highlight several other examples tested with modified SINDy and summarize the performance. Furthermore, as a more advanced example, we show that modified SINDy can separate non-Gaussian, nonzero-mean, and non-symmetric noise distributions from the dynamics. Finally, we show how modified SINDy can be integrated into the discrepancy modeling approach [30].
4.1 Van der Pol Oscillator
The Van der Pol oscillator is used as our test case to demonstrate the ability of modified SINDy to denoise and learn the system dynamics simultaneously. The Van der Pol oscillator is given by

(20) ẋ₁ = x₂,  ẋ₂ = μ(1 - x₁²)x₂ - x₁,

where the nonlinear damping/gain parameter μ is fixed for demonstration purposes. The system is simulated from a fixed initial condition with a fixed time step and duration. The Adam optimizer with a fixed learning rate is used for all noise levels. The prediction step q and sparsity parameter λ of modified SINDy are held fixed, and the library of candidate functions is constructed with polynomial terms up to third order (without the constant term). Three different levels of noise are applied, and the distribution of identified noise is shown in Fig. 5. Figure 5 shows that modified SINDy correctly identified the distribution of the true noise.
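Generating noise-free Van der Pol data with a fixed-step 4th-order Runge–Kutta integrator can be sketched as below; the value mu = 2.0, the time step, and the initial condition are illustrative choices, not the paper's exact settings.

```python
import numpy as np

mu = 2.0  # illustrative damping/gain parameter

def vdp(x):
    # Van der Pol in first-order form: x1' = x2, x2' = mu*(1 - x1^2)*x2 - x1.
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def rk4(f, x0, dt, steps):
    # Fixed-step 4th-order Runge-Kutta integration of dx/dt = f(x).
    X = np.empty((steps + 1, x0.size))
    X[0] = x0
    for k in range(steps):
        x = X[k]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        X[k + 1] = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return X

X = rk4(vdp, np.array([1.0, 0.0]), 0.01, 5000)
amp = np.max(np.abs(X[2000:, 0]))  # settles near the limit-cycle amplitude of about 2
```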
4.2 Rössler Attractor
The second example we use is the Rössler attractor, which is governed by

(21) ẋ = -y - z,  ẏ = x + ay,  ż = b + z(x - c),

where a, b, and c are the system parameters. The system is simulated from a fixed initial condition with a fixed time step and duration. The Adam optimizer with a fixed learning rate is used for all noise levels. The prediction step q and sparsity parameter λ of modified SINDy are held fixed, and the library of candidate functions is constructed with polynomial terms up to second order (with the constant term). Three different levels of noise are applied, and the denoised signal is shown in Fig. 6. Figure 6 also shows the simulated trajectories of the identified models, simulated from a separate initial condition.
4.3 Lorenz 96 Model
As our last example, we use modified SINDy to identify the Lorenz 96 model, whose equation is given by

(22) ẋᵢ = (x_{i+1} - x_{i-2})x_{i-1} - xᵢ + F

for i = 1, …, n. We assume cyclic boundary conditions on the indices, i.e., x_{-1} = x_{n-1}, x₀ = x_n, and x_{n+1} = x₁, and set the forcing term F to a value that generates chaotic behavior. The dimension n is set to a modest value so that the model has a handful of states. The system is simulated from a fixed initial condition with a fixed time step and duration. The Adam optimizer with a fixed learning rate is used for all noise levels. The prediction step q and sparsity parameter λ of modified SINDy are held fixed (with λ adjusted at the highest noise level). The library of candidate functions is constructed with polynomial terms up to third order (with the constant term included). Three different levels of noise are applied, and the denoised signal is shown in Fig. 7 (for ease of visualization, only the first three states are shown). Figure 7 also shows the simulated trajectories of the identified models, simulated from a separate initial condition.
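The Lorenz 96 right-hand side with cyclic indexing is compactly expressed with np.roll; the dimension n = 6 and forcing F = 8 below are illustrative values (F = 8 is a common choice for chaotic behavior), not necessarily the paper's settings.

```python
import numpy as np

def lorenz96(x, F=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with cyclic indices:
    # np.roll(x, -1) -> x_{i+1}, np.roll(x, 1) -> x_{i-1}, np.roll(x, 2) -> x_{i-2}.
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

# The uniform state x_i = F is an equilibrium: the quadratic term vanishes
# and the linear terms cancel the forcing.
x_eq = np.full(6, 8.0)
rhs = lorenz96(x_eq)
```

In practice one perturbs a single component of the equilibrium to kick the system onto its chaotic attractor before recording training data.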
In Fig. 8, the effectiveness of modified SINDy is demonstrated on a number of canonical dynamical systems. For all examples, Gaussian noise with zero mean is added to generate the noisy training data, and the Adam optimizer is used to perform the optimization. The models and other parameters used for each example are summarized in Appendix G. Modified SINDy correctly identified all the system models and noise distributions regardless of the noise magnitude used.
4.4 Identification of Noise Distributions
The modified SINDy algorithm has the ability to handle different kinds of noise distributions. Three different kinds of noise distribution are used to demonstrate this: Gaussian, uniform, and Gamma. To generate the Gamma noise, fixed shape and scale parameters are used, and the generated noise is scaled to the desired magnitude. The noise-free data of the Van der Pol oscillator is generated the same way as in Sec. 4.1. The prediction step q and the sparsity parameter λ are held fixed. Figure 9 shows the distributions identified by modified SINDy, and it suggests that learning a nonzero-mean noise distribution is more difficult than learning a zero-mean one. For better learning results on a nonzero-mean noise distribution, one can try the iterative learning approach shown in Appendix H. Once the noise is separated from the signal, an additional step can be taken to identify the distribution of the noise from a set of candidate distributions. This can be achieved with the fitter package in Python [18]. Appendix I shows more details of this process.
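Once N̂ is extracted, a lightweight alternative to the fitter package for a Gamma candidate is a method-of-moments fit. The shape and scale values below are illustrative stand-ins for extracted noise samples, not the paper's settings.

```python
import numpy as np

# Method-of-moments fit of a Gamma(shape k, scale theta) distribution:
# mean = k * theta and variance = k * theta**2, so theta = var / mean
# and k = mean / theta.
rng = np.random.default_rng(4)
samples = rng.gamma(shape=2.0, scale=0.5, size=200_000)  # surrogate for N-hat

m = samples.mean()
v = samples.var()
theta_hat = v / m
k_hat = m / theta_hat
```

With enough samples, the moment estimates land close to the generating parameters; a likelihood-based fit (as the fitter package performs) would refine them further.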
4.5 Discrepancy Modeling
Modified SINDy can be easily integrated with the discrepancy modeling framework of SINDy [30]. This is of great practical value since it is often the case that parts of the dynamics are known. Suppose the known (theoretical) right-hand-side dynamics in Eq. (1) is f₀(x). Discrepancy modeling assumes that the known model is not capable of modeling the data due to missing physics terms on the right-hand side. Thus there is a mismatch between the derivative ẋ and the known dynamics f₀(x). The discrepancy modeling approach tries to identify the missing dynamics g(x) such that

(23) ẋ = f₀(x) + g(x).
To illustrate this process, consider a system whose model is given as
(24)  
Eq. (24) is simulated to generate noise-free data, and training data is produced by adding Gaussian noise in order to create the noisy measurement. Assume that the noisy measurement of Eq. (24) is given. Further assume that the known dynamics is a modified version of Eq. (24), given by
(25)  
The difference between Eq. (24) and Eq. (25) is the discrepancy model that modified SINDy identifies. Note that this prior information about the dynamics, f₀(x), can be constrained to remain in the modified SINDy library during the optimization process, and its parameters can be used as an initial guess of the true parameters. Thus, the only thing left to learn is the missing dynamics. Figure 10 illustrates this process; the prediction step q, sparsity parameter λ, and the learning rate of the Adam optimizer are held fixed in this example. Fig. 10 suggests that modified SINDy can be used to learn the discrepancy model when parts of the dynamics are already known.
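The discrepancy idea in Eq. (23) can be illustrated with a toy one-dimensional regression (our own illustrative system, not the paper's example): subtract the known dynamics f₀ from the measured derivative and regress the residual on a library to recover the missing term.

```python
import numpy as np

# Toy discrepancy identification. True dynamics: dx/dt = -x + 0.3*x**3;
# known model f0(x) = -x. Regressing the residual dx/dt - f0(x) on a
# polynomial library recovers the missing cubic term g(x) = 0.3*x**3.
x = np.linspace(-2.0, 2.0, 400)
dxdt = -x + 0.3 * x**3      # measured derivative (noise-free here)
f0 = -x                      # known part of the dynamics
residual = dxdt - f0         # what the discrepancy model must explain

library = np.column_stack([x, x**2, x**3])
coef, *_ = np.linalg.lstsq(library, residual, rcond=None)
```

In modified SINDy, the same residual structure is enforced inside the loss of Eq. (13), with the known terms pinned in the library, so the optimizer only needs to denoise the data and fit the missing coefficients.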
5 Conclusion and Future Work
In this work, we introduce a new learning algorithm that leverages automatic differentiation and sparse regression for simultaneously (i) denoising time-series data, (ii) learning and parametrizing the noise probability distribution, and (iii) identifying the underlying parsimonious dynamical system responsible for generating the time-series data. The method provides a critically enabling modification to the SINDy algorithm, improving robustness to noise with less training data in comparison with the previously developed NN denoising approach by Rudy et al. [66]. Multiple numerical examples are shown to demonstrate the effectiveness of the modified SINDy method for signal and noise separation as well as model identification. It is also shown that modified SINDy can be integrated with a discrepancy modeling framework whereby prior information about the dynamical model can be used to help identify the missing dynamics. Importantly, we have shown that modified SINDy can be used to learn various types of noise distributions, including Gaussian, uniform, and nonzero-mean noise distributions, such as a Gamma distribution. Overall, modified SINDy is a robust method with practical potential for handling highly noisy data sets and/or when partial model information is known.
The modified SINDy is modular, allowing for many easily integrated improvements. An important direction for development includes the incorporation of control inputs, since many systems of practical interest are actuated, such as the pendulum on a cart system [12, 31]. Extending modified SINDy to consider the impact of control will significantly expand its application domain. Improvements in computational speed are also desirable. In comparison with the sequential least-squares thresholding of the standard SINDy algorithm, the Adam optimizer is slow. There is the potential to use the standard SINDy sparse regression algorithms to warm-start the Adam optimization routine. The modified SINDy can also be integrated with SINDy-PI to identify rational or implicit dynamics, which is quite difficult since the simulation error shown in Eq. (11) cannot be calculated easily when the dynamics take a rational form. This is a case where the use of the NN denoising approach [66] by Rudy et al. is ideal.
Finally, it is important to improve the robustness of the modified SINDy algorithm when a large number of library terms is used. Currently, modified SINDy cannot handle large libraries robustly due to the nonconvexity of the optimization problem. When the library is too large, the problem becomes unstable without decreasing the optimizer's learning rate. One potential solution is to simulate the dynamics with a variable-time-step numerical integration scheme instead of the fixed-step scheme used in this paper. Although there are still many improvements to be made, we believe the introduction of modified SINDy will help guide the use of automatic differentiation tools to improve the SINDy framework.
Acknowledgments
SLB acknowledges support from the Army Research Office (ARO W911NF1910045) and the Air Force Office of Scientific Research (AFOSR FA95501810200). JNK acknowledges support from the Air Force Office of Scientific Research (AFOSR FA95501710329). We also acknowledge valuable discussions with Samuel H. Rudy, Jared Callaham, Henning Lange, Daniel A. Messenger, and Benjamin Herrmann.
Appendix A Algorithm for Simultaneously Denoising and Learning System Model
Appendix B Effect of Thresholding Parameter
The thresholding parameter λ is the most important parameter to tune in modified SINDy, as it determines the sparsity of the model structure. Its effect can be seen in Fig. 11. In Fig. 11, the Lorenz system is simulated as in Sec. 3.2, Gaussian noise is added, and the Adam optimizer with a fixed learning rate is used to denoise the signal. The prediction step q is held fixed and different values of λ are used. For each λ, the numerical experiment is performed multiple times to calculate the median and distribution of the error shown in Fig. 11. Fig. 11 suggests that the value of λ must be properly tuned. If the value of λ is too small, the sparsity constraint will not be strong enough to enforce discovery of the correct model, and Ξ and N̂ will easily get stuck in a local minimum. If the value of λ is too large, the correct terms can be wrongly eliminated and the resulting model structure will be wrong. If the model structure is wrong, there will be a huge difference between the identified noise N̂ and the true noise N. To avoid sweeping different values of λ, our proposed method can be easily modified to use the stepwise sparse regression (SSR) approach [7]. However, the use of the SSR approach and its performance are beyond the scope of this paper.
Appendix C Effect of Prediction Step
Fig. 12 shows the effect of the prediction step q on the performance of the NN denoising approach by Rudy et al. [66] and the modified SINDy approach. The chaotic Lorenz system is used for comparison and is simulated as in Sec. 3.2 with a fixed noise level. Each prediction step is run multiple times to calculate the median of the error. The Adam optimizer with a fixed learning rate is used to perform the optimization, and λ is held fixed. Fig. 12 suggests that the performance of modified SINDy is not strongly affected by the prediction step q. However, for the NN denoising approach shown in [66], there exists a value of q that achieves optimal performance. Fig. 12 also suggests that the computational time of both approaches increases linearly as the value of q increases. Thus, q can be chosen as a small value to save computational time when using modified SINDy without sacrificing much performance.
Appendix D Effect of Optimization Iteration
The number of optimization loops determines how many times the thresholded optimization is performed. Fig. 13 shows its effect on the noise identification error and vector field error, using the Lorenz attractor as an example. The system is simulated as in Sec. 3.2, and the Adam optimizer with a fixed learning rate is used to optimize the problem. Fig. 13 suggests that the performance of modified SINDy gradually converges as the number of optimization loops increases.
Appendix E Noise Robustness Comparison with SINDy
This section shows the noise robustness comparison of SINDy [11] and modified SINDy using the Van der Pol oscillator, Lorenz attractor, and Rössler attractor. Fig. 14 shows the maximum noise percentage each algorithm can handle while still generating the correct model structure. For each noise level, multiple different noisy data sets are generated and provided to both approaches. If the tested algorithm fails to identify the correct model structure for any of the noisy data sets at a given noise level, we assume it is not robust to noise at that level. For SINDy, the derivative is computed using finite differences, and we show the effect of presmoothing the noisy data on its performance. Note that no smoothing is applied for modified SINDy. The clean data for the Lorenz attractor, Van der Pol oscillator, and Rössler attractor are generated the same way as shown in Sec. 3.2, Sec. 4.1, and Sec. 4.2. For SINDy, the sparsity parameter is swept over a hundred uniformly distributed values ranging up to the minimum of the true parameters' absolute values. For modified SINDy, fixed values of q and λ are used for all examples shown in Fig. 14; Table 1 shows the other parameters we used for modified SINDy. Note that it is possible to make modified SINDy work at a higher noise level by tuning its parameters. However, sweeping various parameters is computationally heavy for modified SINDy. Thus, the maximum noise level modified SINDy can tolerate in Fig. 14 is an approximate lower bound.
Table 1: Parameters used for modified SINDy in the comparison of Fig. 14 (models: Lorenz, Rössler, Van der Pol, Lorenz).
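The robustness-sweep protocol described above can be sketched as follows; `identify` and `simulate` are hypothetical stand-ins for the model-discovery routine and the clean-data generator, and the sweep stops at the first noise level with any structural failure:

```python
import numpy as np

def max_noise_tolerated(identify, simulate, true_support, noise_levels,
                        n_trials=10, seed=0):
    # Sweep ascending noise levels; a level passes only if the identified
    # model has the correct sparsity structure on every noisy realization.
    rng = np.random.default_rng(seed)
    x = simulate()
    scale = np.std(x, axis=0)
    tolerated = 0.0
    for level in noise_levels:
        for _ in range(n_trials):
            y = x + level * scale * rng.standard_normal(x.shape)
            Xi = identify(y)
            if not np.array_equal(Xi != 0, true_support):
                return tolerated            # first failure ends the sweep
        tolerated = level
    return tolerated
```

The reported tolerance is conservative by construction: one failed realization at a level disqualifies that level entirely.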
Appendix F Noise Robustness Comparison with WeakSINDy
This section compares the noise robustness of WeakSINDy [49] and modified SINDy, using the Lorenz attractor as an example. The Lorenz attractor is simulated by setting , , and . For both approaches, the library is constructed using terms up to second order (without the constant term). Different percentages of noise are added to the clean data to generate the noisy training data, and the parameter error and success rate are computed for both approaches. For modified SINDy, the Adam optimizer with a learning rate of is used to perform the optimization. The sparsity parameter is chosen as in most cases; if modified SINDy cannot produce the correct result, is used instead. When , we presmooth the data using the approach mentioned in Sec. 3.2, and no presmoothing is done when . For WeakSINDy, when , test functions with polynomial order are used, with width-at-half-max parameter and support size ; when , test functions with polynomial order are used, with and . different sparsity parameters, evenly spaced from to , are used, each generating a different candidate model for WeakSINDy. The final model used to calculate the parameter error for WeakSINDy is the model with the correct structure (only the correct terms selected from the library). If WeakSINDy fails to produce a model with the correct active terms, the model that best predicts the test data is used to calculate the prediction error; the test data are generated from the initial condition and simulated with and . The final comparison of the best models generated by WeakSINDy and modified SINDy is shown in Fig. 15.
Table 2: Initial condition, library order, and learning rate used for training modified SINDy on each model (Van der Pol, Duffing, Cubic, Lotka-Volterra, Lorenz).
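The weak formulation underlying WeakSINDy can be sketched in a few lines: integrating the dynamics against a compactly supported test function and integrating by parts moves the derivative onto the smooth test function, so the noisy state is never differentiated. The bump-shaped test function and the function name below are our simplification of the piecewise-polynomial test functions of [49]:

```python
import numpy as np

def weak_sindy_system(x, theta, t, centers, width):
    # Build the weak-form regression G @ xi = b, one row per test
    # function phi_k supported on [c - width, c + width]. Integration
    # by parts gives  integral(phi * Theta) xi = -integral(phi' * x).
    dt = t[1] - t[0]
    G_rows, b_rows = [], []
    for c in centers:
        s = np.clip((t - c) / width, -1.0, 1.0)
        phi = (1.0 - s**2) ** 2                 # C^1 bump, zero outside support
        dphi = -4.0 * s * (1.0 - s**2) / width  # also vanishes outside support
        G_rows.append((phi[:, None] * theta).sum(axis=0) * dt)
        b_rows.append(-(dphi[:, None] * x).sum(axis=0) * dt)
    return np.array(G_rows), np.array(b_rows)
```

Sparse regression on the resulting (G, b) system then replaces the pointwise derivative regression of standard SINDy.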
Appendix G Parameters Used in Fig. 8
In this section, we list the models used to simulate the systems in Fig. 8. The model used for simulating the Duffing oscillator is
\dot{x}_1 = x_2, \qquad \dot{x}_2 = -p_1 x_2 - p_2 x_1 - p_3 x_1^3,   (26)
with , , and . The model used for simulating the Cubic oscillator is
\dot{x}_1 = p_1 x_1^3 + p_2 x_2^3, \qquad \dot{x}_2 = p_3 x_1^3 + p_4 x_2^3,   (27)
with , , , and . The model used for simulating the LotkaVolterra system is
\dot{x}_1 = p_1 x_1 - p_2 x_1 x_2, \qquad \dot{x}_2 = p_2 x_1 x_2 - 2 p_1 x_2,   (28)
with and . Other parameters used for training modified SINDy are summarized in Table 2. For all examples, and .
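As a reference for reproducing the simulations, the three models above can be integrated with SciPy; the parameter values passed to `solve_ivp` below are illustrative placeholders, not necessarily the values used in the paper:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Right-hand sides of the Appendix G models, with symbolic parameters.
def duffing(t, x, p1, p2, p3):
    return [x[1], -p1 * x[1] - p2 * x[0] - p3 * x[0] ** 3]

def cubic(t, x, p1, p2, p3, p4):
    return [p1 * x[0] ** 3 + p2 * x[1] ** 3,
            p3 * x[0] ** 3 + p4 * x[1] ** 3]

def lotka_volterra(t, x, p1, p2):
    return [p1 * x[0] - p2 * x[0] * x[1],
            p2 * x[0] * x[1] - 2 * p1 * x[1]]

# Example run with placeholder parameters (p1, p2, p3) = (0.35, 1, 1).
sol = solve_ivp(duffing, (0.0, 10.0), [1.0, 0.0], args=(0.35, 1.0, 1.0),
                t_eval=np.linspace(0.0, 10.0, 1001), rtol=1e-9, atol=1e-9)
```

The resulting clean trajectories are then corrupted with noise before being passed to modified SINDy.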
Appendix H Tips on Learning NonZero Mean Noise
As Sec. 4.4 suggests, learning a nonzero-mean noise distribution is much harder than learning a zero-mean one. To achieve better performance on nonzero-mean noise, we propose an iterative learning approach, summarized as follows:
1. Apply modified SINDy to the noisy data to obtain a rough estimate of the noise distribution.
2. Subtract the mean of the learned noise from the noisy measurements, and use the new data to perform the learning.
3. Repeat step 2 until the result converges and the correct model is found.
The Van der Pol oscillator is used to illustrate this approach, and the clean data are generated as in Sec. 4.1. of Gamma noise is added to create the noisy data. The parameters are set as , and . Fig. 16 demonstrates this approach. However, there is no guarantee that this approach will work when the bias of the noise is too large; learning nonzero-mean noise is quite hard, and careful tuning is needed. We find that using the soft start approach also helps when denoising nonzero-mean noise.

Appendix I Identifying Noise Distribution Type
Table 3: True and identified noise distributions for the Van der Pol oscillator (columns: True Distribution, State, True Parameter, Identified Distribution, Identified Parameters; rows: Gaussian, Uniform, Gamma, Dweibull, Rayleigh).

Once the noise is identified, it is interesting to ask what type of distribution the noise follows. To illustrate this, the Van der Pol oscillator shown in Eq. (20) is simulated with initial condition , , and (for the Gamma and Rayleigh noise distributions, ). Next, of noise is added to the simulation data to generate the noisy data, which is provided to modified SINDy to learn the dynamics and identify the noise added to the signal. We set , ( for Gamma noise). The Adam optimizer with a learning rate of is used, and the library order is set to . In all cases, modified SINDy correctly identifies the model. As Table 3 shows, five different noise distributions are used to generate the noisy data. After the noise is identified, its distribution is fitted against seven candidate noise distributions: the normal, uniform, Gamma, Dweibull, Rayleigh, Cauchy, and Beta distributions. Next, the sum of squared errors between the and each fitted distribution is calculated, and the distribution that produces the lowest error is selected as the identified noise distribution. Note that when not enough data are provided, it is entirely possible for another kind of distribution to be misidentified as the true underlying distribution of the noise; the study of how many data points are needed to identify the correct distribution is beyond the scope of this paper. The final results are summarized in Table 3.

Appendix J Caveats of the Approach
This section provides some tips on using modified SINDy.
Properly design the library: Building the correct library for the regression is the most important part of this algorithm. If the library does not contain the terms present in the actual dynamics, the algorithm will fail to produce the correct noise and system model. Thus, whenever possible, one should use any prior information about the dynamics to build the library. In general, the library needs to be large enough to include all the terms that appear in the dynamics, and at the same time small enough to ensure robustness. Do not expect modified SINDy to work on a library with hundreds or thousands of terms; it will break if the library is too large. For example, for the Lorenz example with above noise, the maximum library order modified SINDy can handle is (about terms). This happens because the higher-order terms in the library tend to destabilize the forward and backward simulations, producing NaN costs and making the optimizer fail. To mitigate this, one can decrease the learning rate of the optimizer, presmooth the data, obtain a better initial estimate of , reduce the library size, or set the optimization parameter type to float64. Moreover, whether the constant term should be included is case-specific. If the actual dynamics do not contain a constant term and the measurement noise is nonzero-mean or has significant outliers, including the constant basis in the library makes modified SINDy more likely to get stuck in a local minimum. It is advised that the user try the library both with and without the constant basis.
Initial guess of and : A good initial guess of the estimated noise and the estimated selection parameters improves the conditioning of the optimization problem, allowing us to tackle harder problems with more library terms. If possible, the initial value of can be obtained by presmoothing the noisy signal, which provides a good starting point for the optimization and is also useful for estimating . If no other information is available, the initial guess of can be set to zeros.
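A minimal sketch of this presmoothing initialization, assuming a Savitzky-Golay filter (the filter choice, window, and function name are our assumptions; any reasonable smoother works):

```python
import numpy as np
from scipy.signal import savgol_filter

def initial_guesses(y, window=31, polyorder=3):
    # Presmooth the measurements; the smoothed trajectory initializes the
    # clean-signal estimate and the residual initializes the noise estimate.
    x0 = savgol_filter(y, window, polyorder, axis=0)
    n0 = y - x0
    return x0, n0
```

By construction the two guesses sum back to the measurements, so the optimization starts from a consistent signal/noise split.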
Footnotes
 Corresponding author (kadierk@uw.edu); Code available at github.com/dynamicslab/modifiedSINDy.
References
 (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283. Cited by: §1, §2.2.
 (1969) Fitting autoregressive models for prediction. Ann Inst Stat Math 21 (1), pp. 243–247. Cited by: §1.
 (2017) Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research 18 (1), pp. 5595–5637. Cited by: §1, §1.
 (2020) Formulating turbulence closures using sparse regression with embedded form invariance. arXiv preprint arXiv:2003.12884. Cited by: §1.
 (2013) Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons. Cited by: §1.
 (2007) Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sciences 104 (24), pp. 9943–9948. Cited by: §1.
 (2018) Sparse learning of stochastic dynamical equations. The Journal of Chemical Physics 148 (24), pp. 241723. Cited by: Appendix B, §1.
 (2018) Sparse learning of stochastic dynamical equations. The Journal of Chemical Physics 148 (24), pp. 241723. External Links: Document Cited by: §2.1.
 (2019) DeepMoD: deep learning for model discovery in noisy data. arXiv preprint arXiv:1904.09406. Cited by: §1.
 (2019) Data-driven science and engineering: machine learning, dynamical systems, and control. Cambridge University Press. Cited by: §1.
 (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937. Cited by: Figure 14, Appendix E, §1, §1, §2.1, §2.2, §3.2.
 (2016) Sparse identification of nonlinear dynamics with control (SINDYc). IFAC-PapersOnLine 49 (18), pp. 710–715. Cited by: §1, §5.
 (2012) Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science 22 (4), pp. 047510. Cited by: §1.
 (2019) Datadriven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences 116 (45), pp. 22445–22451. Cited by: §1.
 (2019) Discovery of nonlinear multiscale systems: sampling strategies and embeddings. SIAM Journal on Applied Dynamical Systems 18 (1), pp. 312–333. Cited by: §1.
 (2019) A unified sparse optimization framework to learn parsimonious physics-informed models from data. arXiv preprint arXiv:1906.10612. Cited by: §1, §2.1.
 (2018) Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583. Cited by: §1.
 Cokelaer/fitter: v1.2.3 synchronised on pypi External Links: Document, Link Cited by: §4.4.
 (2017) Sparse identification of a predator-prey system from simulation data of a convection model. Physics of Plasmas 24 (2), pp. 022310. Cited by: §1.
 (2015) Automated adaptive inference of phenomenological dynamical models. Nature communications 6. Cited by: §1.
 (2019) Discovery of physics from data: universal laws and discrepancy models. arXiv preprint arXiv:1906.07906. Cited by: §1.
 (2020) PySINDy: A Python package for the sparse identification of nonlinear dynamics from data. arXiv preprint arXiv:2004.08424. Cited by: §1.
 (2020) Low-order model for successive bifurcations of the fluidic pinball. Journal of Fluid Mechanics 884. Cited by: §1.
 (2018) Sparse modeling of the lift gains of a high-lift configuration with periodic Coanda blowing. In 2018 AIAA Aerospace Sciences Meeting, pp. 1054. Cited by: §1.
 (2019) Multidimensional approximation of nonlinear dynamical systems. Journal of Computational and Nonlinear Dynamics 14 (6). Cited by: §1.
 (2012) Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proceedings of the National Academy of Sciences 109 (7), pp. 2222–2227. Cited by: §1.
 (2020) Tensor network approaches for learning nonlinear dynamical laws. arXiv preprint arXiv:2002.12388. Cited by: §1.
 (2019) Reactive SINDy: discovering governing reactions from concentration data. Journal of Chemical Physics 150 (025101). Cited by: §1.
 (2018) Flux: elegant machine learning with Julia. Journal of Open Source Software. External Links: Document Cited by: §2.2.
 (2019) Learning discrepancy models from experimental data. Conference on Decision and Control. Cited by: §1, §4.5, §4.
 (2020) SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. arXiv preprint arXiv:2004.02322. Cited by: §1, §5.
 (2018) Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society of London A 474 (2219). Cited by: §1.
 (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.2.
 (2018) Datadriven model reduction and transfer operator approximation. Journal of Nonlinear Science. Cited by: §1.
 (2013) Data-driven modeling & scientific computation: methods for complex systems & big data. Oxford University Press. Cited by: §1.
 (2016) Dynamic mode decomposition: data-driven modeling of complex systems. SIAM. Cited by: §1.
 (2019) Sparse structural system identification method for nonlinear dynamic systems with hysteresis/inelastic behavior. Mechanical Systems and Signal Processing 117, pp. 813–842. Cited by: §1.
 (2020) From Fourier to Koopman: spectral methods for long-term time series prediction. arXiv preprint arXiv:2004.00574. Cited by: §1.
 (2010) Perspectives on system identification. Annual Reviews in Control 34 (1), pp. 1–12. Cited by: §1.
 (2018) Sparse reduced-order modelling: sensor-based dynamics to full-state estimation. Journal of Fluid Mechanics 844, pp. 459–490. Cited by: §1.
 (2018) Constrained sparse Galerkin regression. Journal of Fluid Mechanics 838, pp. 42–67. Cited by: §1.
 (2020) Data-driven modeling of the chaotic thermal convection in an annular thermosyphon. Theoretical and Computational Fluid Dynamics, pp. 1–27. Cited by: §1.
 (2019) DeepXDE: a deep learning library for solving differential equations. arXiv preprint arXiv:1907.04502. Cited by: §1.
 (2017) Model selection for dynamical systems via sparse regression and information criteria. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473 (2204), pp. 20170009. External Links: Document Cited by: §1.
 (2019) Model selection for hybrid dynamical systems via sparse regression. Proceedings of the Royal Society A 475 (2223), pp. 20180534. Cited by: §1.
 (2016) Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transactions on Molecular, Biological and Multi-Scale Communications 2 (1), pp. 52–63. Cited by: §1.
 (2018) VAMPnets: deep learning of molecular kinetics. Nature Communications 9 (5). Cited by: §1.
 (2020) Weak SINDy for partial differential equations. arXiv preprint arXiv:2007.02848. Cited by: §1, §2.1.
 (2020) Weak SINDy: Galerkin-based data-driven model selection. arXiv preprint arXiv:2005.04339. Cited by: Appendix F, §1, §2.1, §3.1, §3.2.
 (2013) Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics 45, pp. 357–378. Cited by: §1.
 (2018) Data-driven identification of interpretable reduced-order models using sparse regression. Computers & Chemical Engineering 119, pp. 101–111. Cited by: §1.
 (2013) Nonlinear system identification: from classical approaches to neural networks and fuzzy models. Springer. Cited by: §1.
 (2020) Sparsity-promoting algorithms for the discovery of informative Koopman invariant subspaces. arXiv preprint arXiv:2002.10637. Cited by: §1.
 (2016) A sparse Bayesian approach to the identification of nonlinear state-space systems. IEEE Transactions on Automatic Control 61 (1), pp. 182–187. External Links: Document Cited by: §2.1.
 (2018) Model-free prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach. Physical Review Letters 120 (2), pp. 024102. Cited by: §1.
 (2020) Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385. Cited by: §1.
 (2017) DifferentialEquations.jl: a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software 5 (1). Cited by: §1.
 (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
 (2017) Machine learning of linear differential equations using gaussian processes. arXiv preprint arXiv:1701.02440. Cited by: §1.
 (2018) Hidden physics models: machine learning of nonlinear partial differential equations. Journal of Computational Physics 357, pp. 125–141. Cited by: §1.
 (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030. Cited by: §1.
 (2020) Using noisy or incomplete data to discover models of spatiotemporal dynamics. Physical Review E 101 (1), pp. 010203. Cited by: §1, §2.1.
 (2019) Datadriven identification of parametric partial differential equations. SIAM Journal on Applied Dynamical Systems 18 (2), pp. 643–660. Cited by: §1.
 (2019) Smoothing and parameter estimation by soft-adherence to governing equations. Journal of Computational Physics 398, pp. 108860. Cited by: §1.
 (2017) Data-driven discovery of partial differential equations. Science Advances 3 (4), pp. e1602614. Cited by: §1, §1, §2.1.
 (2019) Deep learning of dynamics and signal-noise decomposition with time-stepping constraints. Journal of Computational Physics 396, pp. 483–506. Cited by: Figure 12, Appendix C, Automatic Differentiation to Simultaneously Identify Nonlinear Dynamics and Extract Noise Probability Distributions from Data, §1, §1, §1, §2.2, Figure 2, Figure 3, Figure 4, §3.1, §3.2, §3.2, §3.2, §3.3, §3, §5, §5.
 (2017) Sparse model selection via integral terms. Physical Review E 96 (2), pp. 023302. Cited by: §1, §2.1.
 (2018) Extracting sparse high-dimensional dynamics from limited data. SIAM Journal on Applied Mathematics 78 (6), pp. 3279–3295. Cited by: §1.
 (2017) Learning partial differential equations via data discovery and sparse optimization. In Proc. R. Soc. A, Vol. 473, pp. 20160446. Cited by: §1, §1.
 (2020) Discovery of algebraic Reynolds-stress models using sparse symbolic regression. Flow, Turbulence and Combustion 104 (2), pp. 579–603. Cited by: §1.
 (2010) Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, pp. 5–28. External Links: ISSN 0022-1120 Cited by: §1.
 (2009) Distilling free-form natural laws from experimental data. Science 324 (5923), pp. 81–85. Cited by: §1.
 (2016) Sparse identification for nonlinear optical communication systems: SINO method. Optics express 24 (26), pp. 30433–30443. Cited by: §1.
 (2017) False discoveries occur early on the Lasso path. The Annals of statistics 45 (5), pp. 2133–2150. Cited by: §1.
 (2019) Sparse identification of truncation errors. Journal of Computational Physics 397, pp. 108851. Cited by: §1.
 (1996) Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §1, §2.1.
 (2017) Exact recovery of chaotic systems from highly corrupted data. Multiscale Modeling & Simulation 15 (3), pp. 1108–1129. Cited by: §1, §2.1.
 (2020) Numerical differentiation of noisy data: a unifying multiobjective optimization framework. arXiv preprint arXiv:2009.01911. Cited by: §1.
 (2018) Automatic differentiation in ML: where we are and where we should be going. In Advances in neural information processing systems, pp. 8757–8767. Cited by: §1.
 (2018) Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A 474 (2213), pp. 20170844. Cited by: §1.
 (2018) Time-lagged autoencoders: deep learning of slow collective variables for molecular kinetics. The Journal of Chemical Physics 148 (241703), pp. 1–9. Cited by: §1.
 (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 6, pp. 1307–1346. Cited by: §1.
 (2019) Numerical aspects for approximating governing equations using data. Journal of Computational Physics 384, pp. 200–221. Cited by: §1.
 (2017) Reconstruction of normal forms by learning informed observation geometries from data. Proceedings of the National Academy of Sciences, pp. 201620045. Cited by: §1.
 (2018) Physics-informed generative adversarial networks for stochastic differential equations. arXiv preprint arXiv:1811.02033. Cited by: §1.
 (2007) Modeling and nonlinear parameter estimation with Kronecker product representation for coupled oscillators and spatiotemporal systems. Physica D: Nonlinear Phenomena 227 (1), pp. 78–99. Cited by: §1.
 (2019) On the convergence of the SINDy algorithm. Multiscale Modeling & Simulation 17 (3), pp. 948–972. Cited by: §1, §2.1.
 (2018) Robust data-driven discovery of governing physical laws with error bars. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474 (2217), pp. 20180305. External Links: Document Cited by: §1, §2.1.
 (2019) Robust data-driven discovery of governing physical laws using a new subsampling-based sparse Bayesian method to tackle four challenges (large noise, outliers, data integration, and extrapolation). arXiv preprint arXiv:1907.07788. Cited by: §2.1.
 (2018) A unified framework for sparse relaxed regularized regression: SR3. IEEE Access 7, pp. 1404–1423. Cited by: §1, §2.1.